Re: Difference in queryString and Parsed query

2019-01-22 Thread Lavanya Thirumalaisami
 
Thank you Walter Underwood for a complete, honest review. 
I will start simple by using the sample. 
Regards,
Lavanya

On Tuesday, 22 January 2019, 12:31:55 pm AEDT, Walter Underwood wrote:
 
 There are many, many problems with this analyzer chain definition.

This is a summary of the indexing chain:

* WhitespaceTokenizerFilter
* LowerCaseFilter
* SynonymFilter (with ignoreCase=true after lower-casing everything)
* StopFilter (we should have stopped using stopwords 20 years ago)
* WordDelimiterFilter (with all the transformation options set to 0, does 
nothing)
* RemoveDuplicates (this must always be last)
* KStemFilter (good choice)
* EdgeNGramFilter (!!! are you doing prefix matching? doing that with stemming 
makes bizarre matches)
* ReverseStringFilter (Yowza! Only do this on unmodified tokens, what does this 
mean on word stems? Even more bizarre)

Reversed stemmed edge ngrams should cause some really exciting matches. 

Summary of the query chain:

* WhitespaceTokenizerFilter
* LowerCaseFilter
* PorterStemFilter (different stemmer from indexing, guarantees missed matches)
* SynonymFilter (after the stemmer? never do this, all tokens need to be stemmed)
* StopFilter (bad, but extra bad after a Porter stemmer that doesn’t generate 
dictionary words)
* WordDelimiterFilter (again, doing nothing, also the results should have been 
stemmed)
* KStemFilter (two stemmers in a chain! never do that! plus the Porter stemmer 
doesn’t produce dictionary words, so KStem won’t do much)
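The index/query stemmer mismatch above is the root cause of the battery/batteri symptom. A toy sketch of why it guarantees missed matches — the two functions below are crude stand-ins, not Solr's real analyzers:

```python
# Toy model: the index-time chain uses a KStem-like stemmer, the query-time
# chain a Porter-like one. Tokens the two stemmers treat differently can
# never match, no matter what the rest of the chain does.

def porterish(token):
    # crude stand-in for PorterStemFilter: battery -> batteri
    return token[:-1] + "i" if token.endswith("y") else token

def kstemish(token):
    # crude stand-in for KStemFilter: dictionary words pass through unchanged
    return token

index = {}
for doc_id, text in enumerate(["battery charger", "access way"]):
    for tok in text.lower().split():               # index-time chain
        index.setdefault(kstemish(tok), set()).add(doc_id)

def search(query):
    # query-time chain applies the *other* stemmer
    return set.union(set(), *(index.get(porterish(t), set())
                              for t in query.lower().split()))

print(search("battery"))   # set(): "batteri" was never indexed
print(search("charger"))   # {0}: tokens untouched by the y-rule still match
```

Using one identical analyzer chain for both index and query (as the sample field types do) makes the mismatch impossible by construction.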

Short version, I’m astonished that this configuration works at all. Delete the 
whole thing, use one from the sample file (without stop words), and reindex. 
There is no way to fix this. Not to be mean, but this is the worst field type 
definition I have ever seen.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 21, 2019, at 4:24 AM, Lavanya Thirumalaisami 
>  wrote:
> 
> 
> Thank you Aman Deep.
> I tried removing the KStemFilterFactory and still get the same issue, but
> when I comment out the PorterStemFilterFactory the character 'y' does not
> get replaced. 
> 
>    On Monday, 21 January 2019, 11:16:23 pm AEDT, Aman deep singh 
> wrote:  
> 
> Hi Lavanya,
> This is probably due to the KStemFilterFactory; it is removing the 'y'
> character, since the stemmer has a rule for words ending with 'y'.
> 
> 
> Regards,
> Aman Deep Singh
> 
>> On 21-Jan-2019, at 5:43 PM, Mikhail Khludnev  wrote:
>> 
>> querystring is what goes into QParser; parsedquery is the Lucene
>> Query.toString().
>> 
>> On Mon, Jan 21, 2019 at 3:04 PM Lavanya Thirumalaisami
>>  wrote:
>> 
>>> Hi,
>>> Our solr search is not returning expected results for keywords ending with
>>> the character 'y'.
>>> For example keywords like battery, way, accessory etc.
>>> I tried debugging the Solr query in the Solr admin console and I find there is
>>> a difference between the query string and the parsed query:
>>> "querystring":"battery","parsedquery":"batteri",
>>> Also I find that if I search omitting the character 'y' I am getting all the
>>> results.
>>> This happens only for keywords ending with 'y'; for most others we do not have
>>> this issue.
>>> Could anyone please help me understand why the keyword gets changed,
>>> especially the last character? Is there any issue in my field type
>>> definition?
>>> While indexing the data we use the "text" field type, which we have
>>> defined as follows:
>>>
>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>             ignoreCase="true" expand="true" />
>>>     <filter class="solr.StopFilterFactory" words="stopwords.txt" />
>>>     <filter class="solr.WordDelimiterFilterFactory" catenateWords="1"
>>>             generateWordParts="0" generateNumberParts="0"
>>>             preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" />
>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>     <filter class="solr.KStemFilterFactory" />
>>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="255" />
>>>     <filter class="solr.ReverseStringFilterFactory" />
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>     <filter class="solr.PorterStemFilterFactory" />
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>             ignoreCase="true" expand="true" />
>>>     <filter class="solr.StopFilterFactory" words="stopwords.txt" />
>>>     <filter class="solr.WordDelimiterFilterFactory" catenateWords="0"
>>>             generateWordParts="0" generateNumberParts="0"
>>>             preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" />
>>>     <filter class="solr.KStemFilterFactory" />
>>>   </analyzer>
>>> </fieldType>
>>> 
>>> Regards,Lavanya
>> 
>> 
>> 
>> -- 
>> Sincerely yours
>> Mikhail Khludnev
  

Re: Large Number of Collections takes down Solr 7.3

2019-01-22 Thread Dave
Do you mind if I ask why you use so many collections, rather than a field in one 
collection with a filter query per customer to restrict the result set, assuming 
you’re the one controlling the middleware?
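The alternative described here can be sketched as one shared collection where every document carries a tenant field and the middleware always appends a mandatory filter query. The collection name and `customer_id` field below are hypothetical:

```python
# Single-collection multitenancy sketch: the middleware builds the query URL
# and always adds fq=customer_id:"..." so a customer only sees its own docs.
from urllib.parse import urlencode

def tenant_query(base_url, customer_id, user_query):
    params = urlencode({
        "q": user_query,
        "fq": f'customer_id:"{customer_id}"',  # per-tenant restriction
    })
    return f"{base_url}/solr/shared_collection/select?{params}"

print(tenant_query("http://localhost:8983", "acme", "battery"))
# http://localhost:8983/solr/shared_collection/select?q=battery&fq=customer_id%3A%22acme%22
```

A side benefit: filter queries are cached independently of the main query, so a hot customer's restriction clause is cheap to reuse.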

> On Jan 22, 2019, at 4:43 PM, Monica Skidmore 
>  wrote:
> 
> We have been running Solr 5.4 in master-slave mode with ~4500 cores for a 
> couple of years very successfully.  The cores represent individual customer 
> data, so they can vary greatly in size, and some of them have gotten too 
> large to be manageable.
> 
> We are trying to upgrade to Solr 7.3 in cloud mode, with ~4500 collections, 2 
> NRT replicas total per collection.  We have experimented with additional 
> servers and ZK nodes as a part of this move.  We can create up to ~4000 
> collections, with a slow-down to ~20s per collection to create, but if we go 
> much beyond that, the time to create collections shoots up, some collections 
> fail to be created, and we see some of the nodes crash.  Autoscaling brings 
> nodes back into the cluster, but they don’t have all the replicas created on 
> them that they should – we’re pretty sure this is related to the challenge of 
> adding the large number of collections on those nodes as they come up.
> 
> There are some approaches we could take that don’t separate our customers 
> into collections, but we get some benefits from this approach that we’d like 
> to keep.  We’d also like to add the benefits of cloud, like balancing where 
> collections are placed and the ability to split large collections.
> 
> Is anyone successfully running Solr 7x in cloud mode with thousands or more 
> of collections?  Are there some configurations we should be taking a closer 
> look at to make this feasible?  Should we try a different replica type?  (We 
> do want NRT-like query latency, but we also index heavily – this cluster will 
> have 10’s of millions of documents.)
> 
> I should note that the problems are not due to the number of documents – the 
> problems occur on a new cluster while we’re creating the collections we know 
> we’ll need.
> 
> Monica Skidmore
> 
> 
> 


Re: Large Number of Collections takes down Solr 7.3

2019-01-22 Thread Shawn Heisey

On 1/22/2019 2:43 PM, Monica Skidmore wrote:

Is anyone successfully running Solr 7x in cloud mode with thousands or more of 
collections?  Are there some configurations we should be taking a closer look 
at to make this feasible?  Should we try a different replica type?  (We do want 
NRT-like query latency, but we also index heavily – this cluster will have 10’s 
of millions of documents.)


That many collections will overwhelm SolrCloud.  This issue is marked as 
fixed, but it's actually not fixed:


https://issues.apache.org/jira/browse/SOLR-7191

SolrCloud simply will not scale to that many collections. I wish I had 
better news for you.  I would like to be able to solve the problem, but 
I am not familiar with that particular code.  Getting familiar with the 
code is a major undertaking.


Thanks,
Shawn



Large Number of Collections takes down Solr 7.3

2019-01-22 Thread Monica Skidmore
We have been running Solr 5.4 in master-slave mode with ~4500 cores for a 
couple of years very successfully.  The cores represent individual customer 
data, so they can vary greatly in size, and some of them have gotten too large 
to be manageable.

We are trying to upgrade to Solr 7.3 in cloud mode, with ~4500 collections, 2 
NRT replicas total per collection.  We have experimented with additional servers 
and ZK nodes as a part of this move.  We can create up to ~4000 collections, 
with a slow-down to ~20s per collection to create, but if we go much beyond 
that, the time to create collections shoots up, some collections fail to be 
created, and we see some of the nodes crash.  Autoscaling brings nodes back 
into the cluster, but they don’t have all the replicas created on them that 
they should – we’re pretty sure this is related to the challenge of adding the 
large number of collections on those nodes as they come up.

There are some approaches we could take that don’t separate our customers into 
collections, but we get some benefits from this approach that we’d like to 
keep.  We’d also like to add the benefits of cloud, like balancing where 
collections are placed and the ability to split large collections.

Is anyone successfully running Solr 7x in cloud mode with thousands or more of 
collections?  Are there some configurations we should be taking a closer look 
at to make this feasible?  Should we try a different replica type?  (We do want 
NRT-like query latency, but we also index heavily – this cluster will have 10’s 
of millions of documents.)

I should note that the problems are not due to the number of documents – the 
problems occur on a new cluster while we’re creating the collections we know 
we’ll need.

Monica Skidmore





Re: _version_ field missing in schema?

2019-01-22 Thread Alexandre Rafalovitch
What do you mean, schema.xml from managed-schema? schema.xml is the old,
non-managed approach. If you have both, schema.xml will be ignored.

I suspect you are not running with the schema you think you are. You can
check that with the API or in the Admin UI, if you get that far.
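For reference, a minimal sketch of how `_version_` is defined in the default managed-schema shipped with Solr 7.x (the field type name may differ in your setup); the `docValues="true"` on the type is what satisfies the "searchable (indexed or docValues) and retrievable" requirement in the error message:

```xml
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<field name="_version_" type="plong" indexed="false" stored="false"/>
```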

Regards,
Alex

On Tue, Jan 22, 2019, 11:39 AM Aleksandar Dimitrov <
a.dimit...@seidemann-web.com> wrote:

> Hi,
>
> I'm using solr 7.5, in my schema.xml I have this, which I took
> from the
> managed-schema:
>
>   
>   
>  stored="false" />
>  docValues="true" />
>
> However, on startup, solr complains:
>
>  Caused by: org.apache.solr.common.SolrException: _version_ field
>  must exist in schema and be searchable (indexed or docValues) and
>  retrievable(stored or docValues) and not multiValued (_version_
>  does not exist)
>   at
>
> org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:69)
>
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>   org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:95)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at org.apache.solr.update.UpdateLog.init(UpdateLog.java:404)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>   org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:161)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>   org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:116)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>
> org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:119)
>
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>   Method) ~[?:?]
>   at
>
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>
>   ~[?:?]
>   at
>
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>   ~[?:?]
>   at
>   java.lang.reflect.Constructor.newInstance(Constructor.java:488)
>   ~[?:?]
>   at
>   org.apache.solr.core.SolrCore.createInstance(SolrCore.java:799)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>   org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:861)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>   org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:1114)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:984)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:869)
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   at
>
> org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1138)
>
>   ~[solr-core-7.5.0.jar:7.5.0
>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>   2018-09-18 13:07:55]
>   ... 7 more
>
> Anyone know what I'm doing wrong?
> I've tried having the _version_ field be string, and indexed and stored,
> but that didn't help.
>
> Thanks!
>
> Aleks
>
>


_version_ field missing in schema?

2019-01-22 Thread Aleksandar Dimitrov

Hi,

I'm using Solr 7.5. In my schema.xml I have this, which I took from the
managed-schema:

 
 
  stored="false" />
  docValues="true" />


However, on startup, solr complains:

Caused by: org.apache.solr.common.SolrException: _version_ field 
must exist in schema and be searchable (indexed or docValues) and 
retrievable(stored or docValues) and not multiValued (_version_ 
does not exist)
 at 
 org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:69) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:95) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at org.apache.solr.update.UpdateLog.init(UpdateLog.java:404) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:161) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:116) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:119) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method) ~[?:?]
 at 
 jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
 ~[?:?]
 at 
 jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
 ~[?:?]
 at 
 java.lang.reflect.Constructor.newInstance(Constructor.java:488) 
 ~[?:?]
 at 
 org.apache.solr.core.SolrCore.createInstance(SolrCore.java:799) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:861) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:1114) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:984) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:869) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]
 at 
 org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1138) 
 ~[solr-core-7.5.0.jar:7.5.0 
 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 
 2018-09-18 13:07:55]

 ... 7 more

Anyone know what I'm doing wrong?
I've tried having the _version_ field be string, and indexed and stored,
but that didn't help.

Thanks!

Aleks



Best practice to deploy Solr to production

2019-01-22 Thread marotosg
Hi all,

I have a Solr index which has been evolving since Solr 1.4 and is now on
SolrCloud 6.6.
This cluster is composed of 4 servers, a few collections, and shards.
Since I first deployed to production in 2009 I have been using the same
approach to deploy. I think it's probably time to review and improve it. I
am looking for best practices and different approaches.

Here is what I have. Everything is done using ant.
1. I package Solr together with my own code. I have written some plugins.
2. 8 collections with several files per collection.
3. Ant steps:
3.1 Update config files with deployment properties per environment.
3.2 Package and copy to servers.
3.3 Run remote sh commands for installation of Solr.
3.4 Run remote sh to push config files to ZooKeeper.
3.5 Run web requests to create collections.
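The ZooKeeper and collection-creation steps above can also be expressed with Solr's own tooling instead of ad-hoc remote shell scripts. This sketch only composes the command and the URL; all hosts, paths, and names are placeholders:

```python
# Compose the ZooKeeper config upload (bin/solr zk upconfig) and the
# Collections API CREATE call corresponding to the last two ant steps.
from urllib.parse import urlencode

def upconfig_cmd(zk_hosts, config_name, conf_dir):
    # pushes a config set to ZooKeeper
    return f"bin/solr zk upconfig -z {zk_hosts} -n {config_name} -d {conf_dir}"

def create_collection_url(solr_host, name, config_name, shards=2, rf=2):
    # Collections API CREATE call, sent to any node of the cluster
    q = urlencode({"action": "CREATE", "name": name, "numShards": shards,
                   "replicationFactor": rf, "collection.configName": config_name})
    return f"http://{solr_host}/solr/admin/collections?{q}"

print(upconfig_cmd("zk1:2181,zk2:2181/solr", "coll1_conf", "./configs/coll1"))
print(create_collection_url("solr1:8983", "coll1", "coll1_conf"))
```

Keeping these as explicit commands (rather than buried in ant targets) makes the deployment reproducible per environment.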

I have the feeling this is not the proper way to go but not sure what's the
best practice either.

Can anyone point me to a nicer way to do this?

Thanks a lot.
Sergio




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


difference in behavior of term boosting between Solr 6 and Solr 7

2019-01-22 Thread Elaine Cario
We're preparing to upgrade from Solr 6.4.2 to Solr 7.6.0, and found an
inconsistency in scoring. It appears that term boosts in the query are not
applied in Solr 7.

The query itself against both versions is identical (removed un-important
params):

("one"^1) OR ("two"^2) OR ("three"^3)
edismax
max_term
AND
dictionary_id:"WKUS-TAL-DEPLURALIZATION-THESAURUS"
100
xml
on


3 documents are returned, but in the Solr 6 results the docs are returned in
order of the boosts (three, two, one), as the boosts account for the
entirety of the score, while in Solr 7 they are returned in arbitrary order, as
all the scores are 1.0.

Looking at the debug and explains, in Solr 6 the boost is multiplied to the
rest of the score:


("one"^1) OR ("two"^2) OR ("three"^3)
("one"^1) OR ("two"^2) OR ("three"^3)
(+(DisjunctionMaxQuery((max_term:" one
"))^1.0 DisjunctionMaxQuery((max_term:" two "))^2.0
DisjunctionMaxQuery((max_term:" three "))^3.0))/no_coord
+(((max_term:" one "))^1.0
((max_term:" two "))^2.0 ((max_term:" three "))^3.0)


3.0 = sum of:
  3.0 = weight(max_term:" three " in 658) [WKSimilarity], result of:
3.0 = score(doc=658,freq=1.0 = phraseFreq=1.0
), product of:
  3.0 = boost
  1.0 = idf(), for phrases, always set to 1
  1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b)
[WKSimilarity] from:
1.0 = phraseFreq=1.0
1.2 = k1a
1.2 = k1b
0.0 = b (norms omitted for field)


But in Solr 7, the boost is not there at all:


("one"^1) OR ("two"^2) OR ("three"^3)
("one"^1) OR ("two"^2) OR ("three"^3)
+((+DisjunctionMaxQuery((max_term:" one
"))^1.0) (+DisjunctionMaxQuery((max_term:" two "))^2.0)
(+DisjunctionMaxQuery((max_term:" three "))^3.0))
+((+((max_term:" one "))^1.0)
(+((max_term:" two "))^2.0) (+((max_term:" three
"))^3.0))


1.0 = sum of:
  1.0 = weight(max_term:" two " in 436) [WKSimilarity], result of:
1.0 = score(doc=436,freq=1.0 = phraseFreq=1.0
), product of:
  1.0 = idf(), for phrases, always set to 1
  1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b)
[WKSimilarity] from:
1.0 = phraseFreq=1.0
1.2 = k1a
1.2 = k1b
0.0 = b (norms omitted for field)


I noted a subtle difference in the parsedquery between the 2 versions as
well, not sure if that is causing the boost to drop out in Solr 7:

SOLR 6:  +(((max_term:" one "))^1.0 ((max_term:" two
"))^2.0 ((max_term:" three "))^3.0)
SOLR 7:  +((+((max_term:" one "))^1.0) (+((max_term:" two
"))^2.0) (+((max_term:" three "))^3.0))
For our use case, I think we can work around it using a constant-score
query, but it would be good to know if this is a bug or expected behavior,
or whether we're missing something in the query to get boosts to work again.
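For reference, one hedged sketch of the constant-score workaround mentioned above uses the `^=` operator, which pins a clause's score to a constant instead of multiplying it into the relevance score:

```
("one")^=1 OR ("two")^=2 OR ("three")^=3
```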

Thanks!


Re: Get MLT Interesting Terms for a set of documents corresponding to the query specified

2019-01-22 Thread Pratik Patel
I will certainly try it out. Thanks!
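For reference, a hedged sketch of the significantTerms expression Joel suggests, with the collection/field names from the earlier MLT query (exact parameters are in the docs linked below):

```
significantTerms(collection1,
                 q="tags:voltage",
                 field="details",
                 minDocFreq=3,
                 limit=20)
```

Unlike the MLT handler, this scores terms against the whole result set of the query, which is the behavior being asked for here.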


On Mon, Jan 21, 2019 at 8:48 PM Joel Bernstein  wrote:

> You may find the significantTerms streaming expression useful:
>
>
> https://lucene.apache.org/solr/guide/7_6/stream-source-reference.html#significantterms
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Jan 21, 2019 at 3:02 PM Pratik Patel  wrote:
>
> > Aman,
> >
> > Thanks for the reply!
> >
> > I have tried with the corrected query but it doesn't solve the problem. Also,
> > my tags filter matches multiple documents; however, the interestingTerms
> > seem to correspond to just the first document.
> > Here is an example of a query which matches 1900 documents.
> >
> >
> >
> http://localhost:8081/solr/collection1/mlt?debugQuery=on=tags:voltage=true=my_field=details=1=2=3=*:*=100=0
> >
> >
> > Thanks,
> > Pratik
> >
> >
> > On Mon, Jan 21, 2019 at 2:52 PM Aman Tandon 
> > wrote:
> >
> > > I see two rows params, looks like which will be overwritten by rows=2,
> > and
> > > then your tags filter is resulting only one document. Please remove
> extra
> > > rows and try.
> > >
> > > On Mon, Jan 21, 2019, 08:44 Pratik Patel  > >
> > > > Hi Everyone!
> > > >
> > > > I am trying to use MLT request handler. My query matches more than
> one
> > > > documents but the response always seems to pick up the first document
> > and
> > > > interestingTerms also seems to be corresponding to that single
> document
> > > > only.
> > > >
> > > > What I am expecting is that if my query matches multiple documents
> then
> > > the
> > > > InterestingTerms handler result also corresponds to that set of
> > documents
> > > > and not the first document.
> > > >
> > > > Following is my query,
> > > >
> > > >
> > > >
> > >
> >
> http://localhost:8081/solr/collection1/mlt?debugQuery=on=tags:test=true=mlt.fl=textpropertymlt=details=1=2=3=*:*=100=2=0
> > > >
> > > > Ultimately, my goal is to get interesting terms corresponding to this
> > > whole
> > > > set of documents. I don't need similar documents as such. If not with
> > > mlt,
> > > > is there any other way I can achieve this? that is, given a query
> > > matching
> > > > set of documents, find interestingTerms for that set of documents
> based
> > > on
> > > > tf-idf?
> > > >
> > > > Thanks!
> > > > Pratik
> > > >
> > >
> >
>


Re: Iterative graph/nodes query

2019-01-22 Thread Dan M
Perhaps you're looking for the traversalFilter parameter of the graph query?

https://lucene.apache.org/solr/guide/7_6/other-parsers.html#graph-query-parser
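For instance, a hedged sketch (field and value names are hypothetical): the traversal keeps expanding only through documents that match the traversalFilter, so it stops at nodes of any other type:

```
q={!graph from=parent_id to=id traversalFilter='type:folder' maxDepth=10}id:root
```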

Dan Meehl
Meehl Technology Solutions Inc


On Tue, Jan 22, 2019 at 7:13 AM Magnus Karlsson  wrote:

> Hi,
>
>
> anyone using any of the functionality of graphs either in a single
> collection (shortest path) or streaming expressions (nodes)?
>
>
> Experiences?
>
>
> / Magnus
>
> 
> From: Magnus Karlsson 
> Sent: 17 December 2018 14:51:17
> To: solr-user@lucene.apache.org
> Subject: Iterative graph/nodes query
>
> Hi,
>
>
> looking at the graph traversal capabilities of solr. Is there a
> function/feature that traverses until certain pre-requisites are met?
>
>
> For instance, in an hierarchical use case, "traverse all children until a
> child has a certain name or type"?
>
>
> Using the current nodes streaming expression I need to know beforehand the
> number of levels I need to traverse.
>
>
> Is there a feature supporting this use-case or is it planned to be
> implemented?
>
>
> Thanks in advance.
>
>
> / Magnus
>
>


Re: The parent shard will never be delete/clean?

2019-01-22 Thread Jason Gerlowski
Hi,

You might want to check out the documentation, which goes over
split-shard in a bit more detail:
https://lucene.apache.org/solr/guide/7_6/collections-api.html#CollectionsAPI-splitshard

To answer your question directly though, no.  Split-shard creates two
new subshards, but it doesn't do anything to remove or cleanup the
original shard.  The original shard remains with its data and will
delegate future requests to the result shards.
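After a successful split the parent shard becomes inactive, and if it is no longer wanted it can be removed explicitly. A hedged sketch of the Collections API call (collection name is a placeholder):

```
http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=mycollection&shard=shard1
```

Note that DELETESHARD only deletes shards that are inactive, which is exactly the state the parent is left in after the split.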

Hope that helps,

Jason

On Tue, Jan 22, 2019 at 4:17 AM zhenyuan wei  wrote:
>
> Hi,
>    If I split shard1 into shard1_0 and shard1_1, will the parent shard1
> ever be cleaned up?
>
>
> Best,
> Tinswzy


Re: modifying the export request handler

2019-01-22 Thread tom400
I use the following definition:

<requestHandler name="my_export" class="solr.ExportHandler" useParams="_EXPORT">
  <lst name="defaults">
    <str name="wt">json</str>
  </lst>
  <lst name="invariants">
    <str name="distrib">false</str>
    <str name="rq">{!xport}</str>
  </lst>
  <arr name="components">
    <str>myComponent</str>
    <str>query</str>
  </arr>
</requestHandler>

and receive a NullPointerException when I'm loading the core. The exception
is at org.apache.solr.common.params.SolrParams.toMultiMap(SolrParams.java:414).




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Iterative graph/nodes query

2019-01-22 Thread Magnus Karlsson
Hi,


anyone using any of the functionality of graphs either in a single collection 
(shortest path) or streaming expressions (nodes)?


Experiences?


/ Magnus


From: Magnus Karlsson 
Sent: 17 December 2018 14:51:17
To: solr-user@lucene.apache.org
Subject: Iterative graph/nodes query

Hi,


looking at the graph traversal capabilities of solr. Is there a 
function/feature that traverses until certain pre-requisites are met?


For instance, in an hierarchical use case, "traverse all children until a child 
has a certain name or type"?


Using the current nodes streaming expression I need to know beforehand the 
number of levels I need to traverse.


Is there a feature supporting this use-case or is it planned to be implemented?


Thanks in advance.


/ Magnus



The parent shard will never be delete/clean?

2019-01-22 Thread zhenyuan wei
Hi,
   If I split shard1 into shard1_0 and shard1_1, will the parent
shard1 ever be cleaned up?


Best,
Tinswzy


Multiplicative Boosts broken since 7.3 (LUCENE-8099)

2019-01-22 Thread Tobias Ibounig
Hello,

As described in https://issues.apache.org/jira/browse/SOLR-13126, multiplicative 
boosts (under certain conditions) seem to be broken since 7.3.
The error seems to have been introduced in 
https://issues.apache.org/jira/browse/LUCENE-8099. Reverting the Solr parts to 
the now-deprecated BoostingQuery again fixes the issue.
The filed issue contains a test case and a patch with the revert (for testing 
purposes, not really a clean fix).
We sadly couldn't find the actual issue, which seems to lie with the use of 
"FunctionScoreQuery" for boosting.

We were able to patch our 7.5 installation with the patch. As others might be 
affected as well, we hope this can be helpful in resolving this bug.

To all Solr/Lucene developers, thank you for your work. Looking through the code 
base gave me a new appreciation of your work.

Best Regards,
Tobias

PS: This issue was already posted by a colleague, "Inconsistent debugQuery 
score with multiplicative boost", but I wanted to create a new post with a 
clearer title.



Re: Single query to get the count for all individual collections

2019-01-22 Thread Jan Høydahl
+1 for the most elegant solution so far :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 22. jan. 2019 kl. 03:15 skrev Joel Bernstein :
> 
> Streaming Expressions can do this:
> 
> plist(stats(collection1, q="*:*", count(*)),
>stats(collection2, q="*:*", count(*)),
>stats(collection3, q="*:*", count(*)))
> 
> The plist function is a parallel list of expressions. It will spin each
> expression off in its own thread and concatenate the results of each
> expression into a single result set.
> Here are the docs:
> https://lucene.apache.org/solr/guide/7_6/stream-source-reference.html#stats
> https://lucene.apache.org/solr/guide/7_6/stream-decorator-reference.html#plist
> 
> plist is quite new, but "list" has been around for a while if you have an
> older version of Solr
> 
> https://lucene.apache.org/solr/guide/7_6/stream-decorator-reference.html#list_expression
> 
> 
> 
> 
> 
> 
> 
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> 
> On Mon, Jan 21, 2019 at 12:53 PM Jens Brandt  wrote:
> 
>> Hi,
>> 
>> maybe adding shards.info=true might help. In case of SolrCloud this
>> gives you numFound for each shard.
>> 
>> Regards,
>>  Jens
>> 
>>> Am 10.01.2019 um 04:40 schrieb Zheng Lin Edwin Yeo >> :
>>> 
>>> Hi,
>>> 
>>> I would like to find out, is there any way that I can send a single query
>>> to retrieve the numFound for all the individual collections?
>>> 
>>> I have tried with this query
>>> 
>> http://localhost:8983/solr/collection1/select?q=*:*=collection1,collection2
>>> However, this query is doing the sum across all the collections, instead of
>>> showing the count for each collection.
>>> 
>>> I am using Solr 7.5.0.
>>> 
>>> Regards,
>>> Edwin
>> 
>>