Re: Solr mm is field Level in case sow is false

2017-11-28 Thread Aman Deep singh
Hi Steve,
I can't use copy fields because I have multiple types of fields, which use
different types of data. Examples are:
1. Normal tokenized field (normal fields)
2. Word-delimited field
3. Synonyms field (synonyms can be applied on one or two fields, not all,
according to our requirements)
4. N-grams field (model-related field, for partial word matches)

> On 29-Nov-2017, at 8:30 AM, Steve Rowe  wrote:
> 
> Hi Aman, see my responses inline below.
> 
>> On Nov 28, 2017, at 9:11 PM, Aman Deep Singh  
>> wrote:
>> 
>> Thanks Steve,
>> I got it, but my problem is that you can't make every field use the same analysis.
> 
> I don’t understand: why can’t you use copy fields with all the same analysis?
> 
>> Is there any chance that sow and mm will work properly together? I don't see
>> this in the future pipeline either, as there is no JIRA related to it.
> 
> I wrote up a description of an idea I had about addressing it in a reply to
> Doug Turnbull's thread on this subject, linked from my blog. From that reply:
> 
>> In implementing the SOLR-9185 changes, I considered a compromise approach to
>> the term-centric / field-centric axis you describe in the case of differing
>> field analysis pipelines: finding common source-text-offset bounded slices
>> in all per-field queries, and then producing dismax queries over these
>> slices; this is a generalization of what happens in the sow=true case, where
>> slice points are pre-determined by whitespace.  However, it looked really
>> complicated to maintain source text offsets with queries (if you’re
>> interested, you can see an example of the kind of thing I’m talking about in
>> my initial patch on , which I ultimately decided against committing), so I
>> decided to go with per-field dismax when structural differences are
>> encountered in the per-field queries.  While I won’t be doing any work on
>> this short term, I still think the above-described approach could improve
>> the situation in the sow=false/differing-field-analysis case.  Patches
>> welcome!
> 
> --
> Steve
> www.lucidworks.com
> 

Thanks,
Aman Deep Singh

RE: Solr Spellcheck

2017-11-28 Thread GVK Prasad
No, this word, "cholera", is appearing in less than one percent. My corpus has
about 80 docs and "cholera" appears in fewer than 2000 docs. OK, I understand
why it is showing that way.

Can you please suggest a suitable configuration for spellcheck to work
correctly? I am indexing all the words in one field. With the current
configuration I am not getting good suggestions.

Regards,
Prasad.


From: alessandro.benedetti
Sent: Tuesday, November 28, 2017 11:28 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Spellcheck

Your spellcheck configuration is quite extensive!
In particular you specified:

  <float name="thresholdTokenFrequency">0.01</float>

This means that if a term appears in less than 1% of total docs it will be
considered misspelled.
Is "cholera" occurring in more than 1% of the total docs in your corpus?



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html





Re: Solr mm is field Level in case sow is false

2017-11-28 Thread Steve Rowe
Hi Aman, see my responses inline below.

> On Nov 28, 2017, at 9:11 PM, Aman Deep Singh  
> wrote:
> 
> Thanks Steve,
> I got it, but my problem is that you can't make every field use the same analysis.

I don’t understand: why can’t you use copy fields with all the same analysis?

> Is there any chance that sow and mm will work properly together? I don't see
> this in the future pipeline either, as there is no JIRA related to it.

I wrote up a description of an idea I had about addressing it in a reply to
Doug Turnbull's thread on this subject, linked from my blog. From that reply:

> In implementing the SOLR-9185 changes, I considered a compromise approach to
> the term-centric / field-centric axis you describe in the case of differing
> field analysis pipelines: finding common source-text-offset bounded slices
> in all per-field queries, and then producing dismax queries over these
> slices; this is a generalization of what happens in the sow=true case, where
> slice points are pre-determined by whitespace.  However, it looked really
> complicated to maintain source text offsets with queries (if you’re
> interested, you can see an example of the kind of thing I’m talking about in
> my initial patch on , which I ultimately decided against committing), so I
> decided to go with per-field dismax when structural differences are
> encountered in the per-field queries.  While I won’t be doing any work on
> this short term, I still think the above-described approach could improve
> the situation in the sow=false/differing-field-analysis case.  Patches
> welcome!

--
Steve
www.lucidworks.com



Re: Solr mm is field Level in case sow is false

2017-11-28 Thread Aman Deep Singh
Thanks Steve,
I got it, but my problem is that you can't make every field use the same analysis.
Is there any chance that sow and mm will work properly together? I don't see
this in the future pipeline either, as there is no JIRA related to it.

Thanks,
Aman Deep Singh


On 28-Nov-2017 8:02 PM, "Steve Rowe"  wrote:

Hi Aman,

From the last bullet in the “Caveats and remaining issues” section of my
query-time multi-word synonyms blog, in part:

> sow=false changes the queries edismax produces over multiple fields when
> any of the fields’ query-time analysis differs from the other fields’ [...]
> This can change results in general, but quite significantly when combined
> with the mm (min-should-match) request parameter: since min-should-match
> applies per field instead of per term, missing terms in one field’s analysis
> won’t disqualify docs from matching.

One effective way of addressing this issue is to make all queried fields
use the same analysis, e.g. by copy-fielding the subset of fields that are
different into ones that are the same, and then querying against the target
fields instead.

--
Steve
www.lucidworks.com

> On Nov 28, 2017, at 5:25 AM, Aman Deep singh wrote:
>
> Hi,
> When sow is set to false, the Solr query is generated a little differently
> compared to sow=true
>
> Solr version -6.6.1
>
> User query -Asus ZenFone Go ZB5 Smartphone
> mm is set to 100%
> qf=nameSearch^7 brandSearch
>
> field definition
>
> 1. nameSearch—
> [field type definition stripped by the mail archiver; a partial copy
> survives in Steve Rowe's reply later in this digest]
>
> 2. brandSearch
> [field type definition stripped by the mail archiver]
>
>
> with sow=false
> "parsedquery":"(+DisjunctionMaxQuerybrandSearch:asus brandSearch:zenfone
> brandSearch:go brandSearch:zb5 brandSearch:smartphone)~5) | ((nameSearch:asus
> nameSearch:zen nameSearch:fone nameSearch:go nameSearch:zb nameSearch:5
> nameSearch:smartphone)~7)^7.0)))/no_coord",
>
> with sow=true
> "parsedquery":"(+(DisjunctionMaxQuery((brandSearch:asus |
> (nameSearch:asus)^7.0)) DisjunctionMaxQuery((brandSearch:zenfone |
> ((nameSearch:zen nameSearch:fone)~2)^7.0))
> DisjunctionMaxQuery((brandSearch:go | (nameSearch:go)^7.0))
> DisjunctionMaxQuery((brandSearch:zb5 | ((nameSearch:zb nameSearch:5)~2)^7.0))
> DisjunctionMaxQuery((brandSearch:smartphone |
> (nameSearch:smartphone)^7.0)))~5)/no_coord",
>
>
>
> If you see the difference in the parsed queries: in the sow=false case mm is
> working at field level, while in the sow=true case mm is working across the
> fields.
>
> We need to use sow=false as it is the only way to use multiword synonyms.
> Any idea why it is behaving in this manner, and any way to fix it so that mm
> will work across the fields in qf?
>
> Thanks,
> Aman Deep Singh


Re: /export LongPointField

2017-11-28 Thread Erick Erickson
Looks like a duplicate of SOLR-10835, fixed in 6.7 (unreleased) and 7.0.
Have you tried any 7.x versions?
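
(For reference, a minimal /export request of the kind being discussed; the
collection and field names here are made up for illustration:

  curl "http://localhost:8983/solr/mycollection/export?q=*:*&sort=id+asc&fl=my_long_field"

The /export handler requires explicit sort and fl parameters, and the fl list
is where the field-type check that throws this exception is applied.)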

On Tue, Nov 28, 2017 at 4:08 PM, nandakishorek
 wrote:
> My index has certain fields of type LongPointField with docValues=true i.e.,
> they are not multivalued.
> When I use the export handler to export the data, there is an exception
> thrown 'Export fields must either be one of the following types:
> int,float,long,double,string,date,boolean'.
> I see the ExportWriter cannot write fields of type LongPointFields. Should I
> raise a JIRA for this?
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


/export LongPointField

2017-11-28 Thread nandakishorek
My index has certain fields of type LongPointField with docValues=true i.e.,
they are not multivalued.
When I use the export handler to export the data, there is an exception
thrown 'Export fields must either be one of the following types:
int,float,long,double,string,date,boolean'.
I see the ExportWriter cannot write fields of type LongPointFields. Should I
raise a JIRA for this?






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


LTR model upload

2017-11-28 Thread Brian Yee
When I upload, I can see my model when I hit 
"solr/collection_name/schema/model-store/" but when I reload the collection, 
it's gone.

Is there a size limit for LTR models? I have a 1.6 MB / 49,000-line LambdaMART
model (that's what RankLib spit out for me), which I didn't think would be a
huge problem. I decided to test by cutting the model down by deleting 99% of
it, and it worked after reload. Does this mean my model is too big, or do I
possibly have a syntax bug somewhere in my model.json?


Brian


TemplateTransformer is not working for few new fields

2017-11-28 Thread Abhijit Pawar
Hello All,

I have added two new fields to an entity in the data-source-config.xml file.
On running full-import it gets indexed fine and I can see the values for
those two fields just like other fields.

However, on adding a nested entity under that entity and using a
TemplateTransformer on the nested entity, I am unable to see values for those
two fields. The rest of the fields from the parent entity show values without
any issues.

[data-config entity definition stripped by the mail archiver]
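
(Since the archiver stripped the XML, here is a minimal sketch of the shape
being described, i.e. a parent entity with a nested entity carrying
TemplateTransformer; all table, column, and template values are made up for
illustration:

  <entity name="parent" query="SELECT id, name, field1, field2 FROM parent_table">
    <field column="field1" name="field1"/>
    <field column="field2" name="field2"/>
    <entity name="child" transformer="TemplateTransformer"
            query="SELECT code FROM child_table WHERE parent_id = '${parent.id}'">
      <!-- TemplateTransformer builds "label" from the template string -->
      <field column="label" template="prefix-${child.code}"/>
    </entity>
  </entity>

One thing worth checking in a setup like this: TemplateTransformer does not
apply a template when any variable in it resolves to null, which can make
fields appear to vanish.)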



Here's the schema definition for those two fields:

[field definitions stripped by the mail archiver]

Any idea on what could be the issue here?

Regards,

Abhijit


Re: OutOfMemoryError in 6.5.1

2017-11-28 Thread Walter Underwood
I’m pretty sure these OOMs are caused by uncontrolled thread creation, up to
4000 threads. That requires an additional 4 GB (1 MB per thread). It is like
Solr doesn’t use thread pools at all.

I set this in jetty.xml, but it still created 4000 threads:

[jetty.xml thread-pool setting stripped by the mail archiver]
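(A sketch of the kind of jetty.xml setting presumably meant here; Solr 6.x
ships a QueuedThreadPool configured roughly like this, and the values below
are illustrative, not Walter's actual ones:

  <New id="threadPool" class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">500</Set>
  </New>

Jetty's pool only caps Jetty's own request threads; threads created by Solr's
internal executors are not bounded by it, which may be why the cap did not
hold.)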

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 23, 2017, at 7:02 PM, Damien Kamerman  wrote:
> 
> I found the suggesters very memory hungry. I had one particularly large
> index where the suggester should have been filtering a small number of
> docs, but was mmap'ing the entire index. I only ever saw this behavior with
> the suggesters.
> 
> On 22 November 2017 at 03:17, Walter Underwood 
> wrote:
> 
>> All our customizations are in solr.in.sh. We’re using the one we
>> configured for 6.3.0. I’ll check for any differences between that and the
>> 6.5.1 script.
>> 
>> I don’t see any arguments at all in the dashboard. I do see them in a ps
>> listing, right at the end.
>> 
>> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
>> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
>> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc
>> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
>> -XX:+PrintGCApplicationStoppedTime
>> -Xloggc:/solr/logs/solr_gc.log -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
>> -Dcom.sun.management.jmxremote 
>> -Dcom.sun.management.jmxremote.local.only=false
>> -Dcom.sun.management.jmxremote.ssl=false 
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.port=18983 
>> -Dcom.sun.management.jmxremote.rmi.port=18983
>> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com
>> -DzkClientTimeout=15000 -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,
>> zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.
>> cheggnet.com:2181/solr-cloud -Dsolr.log.level=WARN
>> -Dsolr.log.dir=/solr/logs -Djetty.port=8983 -DSTOP.PORT=7983
>> -DSTOP.KEY=solrrocks -Dhost=new-solr-c01.test3.cloud.cheggnet.com
>> -Duser.timezone=UTC -Djetty.home=/apps/solr6/server
>> -Dsolr.solr.home=/apps/solr6/server/solr -Dsolr.install.dir=/apps/solr6
>> -Dgraphite.prefix=solr-cloud.new-solr-c01 -Dgraphite.host=influx.test.
>> cheggnet.com -javaagent:/apps/solr6/newrelic/newrelic.jar
>> -Dnewrelic.environment=test3 -Dsolr.log.muteconsole -Xss256k
>> -Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh
>> 8983 /solr/logs -jar start.jar --module=http
>> 
>> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0.
>> Our load benchmarks use prod logs. We added suggesters, but those use
>> analyzing infix, so they are search indexes, not in-memory.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Nov 21, 2017, at 5:46 AM, Shawn Heisey  wrote:
>>> 
>>> On 11/20/2017 6:17 PM, Walter Underwood wrote:
>>>> When I ran load benchmarks with 6.3.0, an overloaded cluster would get
>>>> super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start
>>>> getting OOMs. That is really bad, because it means we need to reboot every
>>>> node in the cluster.
>>>>
>>>> Also, the JVM OOM hook isn’t running the process killer (JVM
>>>> 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings in
>>>> an 8G heap.
>>>>
>>>> This is not good behavior in prod. The process goes to the bad place,
>>>> then we need to wait until someone is paged and kills it manually.
>>>> Luckily, it usually drops out of the live nodes for each collection and
>>>> doesn’t take user traffic.
>>> 
>>> There was a bug, fixed long before 6.3.0, where the OOM killer script
>>> wasn't working because the arguments enabling it were in the wrong place.
>>> It was fixed in 5.5.1 and 6.0.
>>>
>>> https://issues.apache.org/jira/browse/SOLR-8145
>>>
>>> If the scripts that you are using to get Solr started originated with a
>>> much older version of Solr than you are currently running, maybe you've
>>> got the arguments in the wrong order.
>>>
>>> Do you see the commandline arguments for the OOM killer (only available
>>> on *NIX systems, not Windows) on the admin UI dashboard?  If they are
>>> properly placed, you will see them on the dashboard, but if they aren't
>>> properly placed, then you won't see them.  This is what the argument
>>> looks like for one of my Solr installs:
>>>
>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
>>>
>>> Something which you probably already know:  If you're hitting OOM, you
>>> need a larger heap, or you need to adjust the config so it uses less
>>> memory.  There are no other ways to "fix" OOM problems.
>>>
>>> Thanks,
>>> Shawn
>> 
>> 



Performance questions

2017-11-28 Thread Nicolas Bélisle
Hi,


We use Solr like a search engine / document store / database. We are
currently optimizing a test environment and would welcome any relevant
suggestions.

I've taken a lot of time researching this mailing list and found a lot of
relevant information.


Here's our current setup :


SolrCloud 6.5 cluster: 5-6 nodes (8 CPUs, 16 GB RAM: 8 GB for Solr and 8 GB
for the OS) with 3-4 Solr cores each. Shards: 9-10 with replicationFactor = 2.

Current sharding: we have 4-5 cores per 8-CPU server.

Documents: 10M+, based on Wikipedia.

We use dynamic fields, with hundreds of different field names (400?), but
individual documents have around 50 fields. We store most fields.

We do not commit immediately; we commit once every few seconds.
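
(A sketch of the solrconfig.xml shape that "commit once every few seconds"
usually corresponds to; the times below are illustrative, not necessarily
this setup's:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>3000</maxTime>
  </autoSoftCommit>

Soft commits control visibility and mainly cost cache warming; hard commits
with openSearcher=false are comparatively cheap.)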

We use fast hybrid storage (1 GB/s write)

Caches: generally a 0.4-0.66 hit rate, with default options.

Tried changing max indexing threads and merge factor. No significant gain.

We use an application cache for simple queries (get by id)

Current performance: under a load of 1100 concurrent users, we average 3-4
seconds per query / update.

We use a couple of negative filter queries. Example: *:* AND -type_id:A
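
(A minimal sketch of how such a filter is commonly expressed; type_id is this
setup's field, everything else is illustrative:

  q=laptop
  fq=*:* -type_id:A

Putting the negative clause in fq rather than q lets the filter cache absorb
it, and Solr also accepts a purely negative top-level filter such as
fq=-type_id:A, implicitly matching all docs first.)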

Couple of questions:

-  Suggestions for new optimisations?

-  Anyone seen performance gains from 6.5 to Solr 7.1?

-  Do negative filter queries severely impact performance?

-  Heap is not an issue, but could docValues enhance search
performance? Or just memory usage?

Here’s an excellent guide we used for some of our work:
http://events.linuxfoundation.org/sites/events/files/slides/HighPerformanceSolr.pdf


Nicolas


RE: Solr Spellcheck

2017-11-28 Thread alessandro.benedetti
Your spellcheck configuration is quite extensive!
In particular you specified:

  <float name="thresholdTokenFrequency">0.01</float>

This means that if a term appears in less than 1% of total docs it will be
considered misspelled.
Is "cholera" occurring in more than 1% of the total docs in your corpus?
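
(For concreteness, a minimal sketch of where that knob lives, with component
and field names illustrative and a DirectSolrSpellChecker setup assumed:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell_field</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- a term must appear in at least this fraction of docs
           to be treated as correctly spelled -->
      <float name="thresholdTokenFrequency">.0001</float>
    </lst>
  </searchComponent>

Lowering thresholdTokenFrequency lets rarer terms count as correctly spelled,
so fewer of them trigger suggestions.)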



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Huge Query execution time for multiple ORs

2017-11-28 Thread Faraz Fallahi
Hi

Thanks for all the replies.
I think tagging them is probably the best solution in any case.

Best regards

Am 28.11.2017 15:39 schrieb "Toke Eskildsen" :

> On Tue, 2017-11-28 at 11:07 +0100, Faraz Fallahi wrote:
> > I have a question regarding Solr queries.
> > My query basically contains thousands of OR conditions for authors
> > (author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
> > The execution time on my index is huge (around 15 sec). When I tag
> > all the associated documents with a custom field and value like
> > authorlist:1 and then I change my query to just search for
> > authorlist:1, it executes in 78 ms. How come there is such a big
> > difference in exec-time?
>
> Due to the nature of inverted indexes (which lie at the heart of
> Solr), your thousands of OR-queries mean thousands of lookups, whereas
> your authorlist means a single lookup. On top of this, the results for
> each author need to be merged with the other authors' results; for
> authorlist the results are there directly.
>
> If your author lists are static, indexing them as you did in your test
> is the best solution.
>
> If they are not static, using a filter query will ensure that they are
> at least cached subsequently, so that only the first call will be
> slow.
>
> If they are semi-static and there are not too many of them, you could
> do warm-up filter queries for all the different groups so that users
> do not pay the first-call penalty. This requires your filter cache to
> be large enough to hold all the author lists.
>
> - Toke Eskildsen, Royal Danish Library
>
>


Re: Does the schema api support xml requests?

2017-11-28 Thread Steve Rowe
Hi,

The Schema API does not support XML requests, and there are currently no
plans I’m aware of to add support.
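
(For reference, a minimal example of the JSON form the Schema API does
accept; the collection and field names are illustrative:

  curl -X POST -H 'Content-type:application/json' \
    --data-binary '{"add-field":{"name":"my_new_field","type":"string","stored":true}}' \
    http://localhost:8983/solr/mycollection/schema
)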

--
Steve
www.lucidworks.com

> On Nov 28, 2017, at 8:23 AM, startrekfan  wrote:
> 
> Hey
> 
> Does the schema API support XML requests? I tried to post an XML-formatted
> "add-field" but got a parser exception. Is there no XML support? Is it
> planned to add XML support within the next few months?
> 
> Thanks



Re: Huge Query execution time for multiple ORs

2017-11-28 Thread Toke Eskildsen
On Tue, 2017-11-28 at 11:07 +0100, Faraz Fallahi wrote:
> I have a question regarding Solr queries.
> My query basically contains thousands of OR conditions for authors
> (author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
> The execution time on my index is huge (around 15 sec). When I tag
> all the associated documents with a custom field and value like
> authorlist:1 and then I change my query to just search for
> authorlist:1, it executes in 78 ms. How come there is such a big
> difference in exec-time?

Due to the nature of inverted indexes (which lie at the heart of
Solr), your thousands of OR-queries mean thousands of lookups, whereas
your authorlist means a single lookup. On top of this, the results for
each author need to be merged with the other authors' results; for
authorlist the results are there directly.

If your author lists are static, indexing them as you did in your test
is the best solution.

If they are not static, using a filter query will ensure that they are
at least cached subsequently, so that only the first call will be
slow.

If they are semi-static and there are not too many of them, you could
do warm-up filter queries for all the different groups so that users
do not pay the first-call penalty. This requires your filter cache to
be large enough to hold all the author lists.
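
(A sketch of what such warm-up filter queries can look like in
solrconfig.xml; the field and values are illustrative:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="fq">authorlist:1</str></lst>
      <lst><str name="q">*:*</str><str name="fq">authorlist:2</str></lst>
    </arr>
  </listener>

Each listed query runs against every new searcher, pre-populating the filter
cache before users hit it.)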

- Toke Eskildsen, Royal Danish Library



Re: Solr mm is field Level in case sow is false

2017-11-28 Thread Steve Rowe
Hi Aman,

From the last bullet in the “Caveats and remaining issues” section of my
query-time multi-word synonyms blog, in part:

> sow=false changes the queries edismax produces over multiple fields when
> any of the fields’ query-time analysis differs from the other fields’ [...]
> This can change results in general, but quite significantly when combined
> with the mm (min-should-match) request parameter: since min-should-match
> applies per field instead of per term, missing terms in one field’s analysis
> won’t disqualify docs from matching.

One effective way of addressing this issue is to make all queried fields use 
the same analysis, e.g. by copy-fielding the subset of fields that are 
different into ones that are the same, and then querying against the target 
fields instead.
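
(A sketch of that copy-field approach, with field and type names made up for
illustration:

  <field name="nameSearch" type="text_name" indexed="true" stored="false"/>
  <field name="brandSearch" type="text_brand" indexed="true" stored="false"/>
  <!-- one destination field with a single shared analysis chain -->
  <field name="allSearch" type="text_general" indexed="true" stored="false"
         multiValued="true"/>
  <copyField source="nameSearch" dest="allSearch"/>
  <copyField source="brandSearch" dest="allSearch"/>

and then qf=allSearch instead of qf=nameSearch^7 brandSearch, noting that
this gives up the per-field boost.)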

--
Steve
www.lucidworks.com

> On Nov 28, 2017, at 5:25 AM, Aman Deep singh  
> wrote:
> 
> Hi,
> When sow is set to false, the Solr query is generated a little differently
> compared to sow=true
> 
> Solr version -6.6.1
> 
> User query -Asus ZenFone Go ZB5 Smartphone
> mm is set to 100%
> qf=nameSearch^7 brandSearch
> 
> field definition
> 
> 1. nameSearch—
> <fieldType [...] autoGeneratePhraseQueries="false" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter [...] replacement="and"/>
>     [...]
>     <filter [...] generateNumberParts="1" splitOnCaseChange="1"
>             generateWordParts="1" preserveOriginal="1" catenateAll="1"
>             catenateWords="1"/>
>     [...]
>   </analyzer>
>   <analyzer type="query">
>     <charFilter [...] replacement="and"/>
>     [...]
>     <filter [...] synonyms="synonyms.txt"/>
>     <filter [...] generateNumberParts="1" splitOnCaseChange="1"
>             generateWordParts="1" splitOnNumerics="1" preserveOriginal="0"
>             catenateAll="0" catenateWords="0"/>
>     [...]
>   </analyzer>
> </fieldType>
> ([...] marks parts stripped by the mail archiver)
> 
> 2. brandSearch
> <fieldType [...] autoGeneratePhraseQueries="true" positionIncrementGap="100">
>   [analyzer definitions stripped by the mail archiver]
> </fieldType>
> 
> 
> with sow=false
> "parsedquery":"(+DisjunctionMaxQuerybrandSearch:asus brandSearch:zenfone 
> brandSearch:go brandSearch:zb5 brandSearch:smartphone)~5) | ((nameSearch:asus 
> nameSearch:zen nameSearch:fone nameSearch:go nameSearch:zb nameSearch:5 
> nameSearch:smartphone)~7)^7.0)))/no_coord",
> 
> with sow=true
> "parsedquery":"(+(DisjunctionMaxQuery((brandSearch:asus | 
> (nameSearch:asus)^7.0)) DisjunctionMaxQuery((brandSearch:zenfone | 
> ((nameSearch:zen nameSearch:fone)~2)^7.0)) 
> DisjunctionMaxQuery((brandSearch:go | (nameSearch:go)^7.0)) 
> DisjunctionMaxQuery((brandSearch:zb5 | ((nameSearch:zb nameSearch:5)~2)^7.0)) 
> DisjunctionMaxQuery((brandSearch:smartphone | 
> (nameSearch:smartphone)^7.0)))~5)/no_coord",
> 
> 
> 
> If you see the difference in the parsed queries: in the sow=false case mm is
> working at field level, while in the sow=true case mm is working across the
> fields.
>
> We need to use sow=false as it is the only way to use multiword synonyms.
> Any idea why it is behaving in this manner, and any way to fix it so that mm
> will work across the fields in qf?
> 
> Thanks,
> Aman Deep Singh



Does the schema api support xml requests?

2017-11-28 Thread startrekfan
Hey

Does the schema API support XML requests? I tried to post an XML-formatted
"add-field" but got a parser exception. Is there no XML support? Is it
planned to add XML support within the next few months?

Thanks


Solr mm is field Level in case sow is false

2017-11-28 Thread Aman Deep singh
Hi,
When sow is set to false, the Solr query is generated a little differently
compared to sow=true

Solr version -6.6.1

User query -Asus ZenFone Go ZB5 Smartphone
mm is set to 100%
qf=nameSearch^7 brandSearch

field definition

1. nameSearch—
[field type definition stripped by the mail archiver; a partial copy survives
in Steve Rowe's reply earlier in this digest]

2. brandSearch
[field type definition stripped by the mail archiver]

with sow=false
"parsedquery":"(+DisjunctionMaxQuerybrandSearch:asus brandSearch:zenfone 
brandSearch:go brandSearch:zb5 brandSearch:smartphone)~5) | ((nameSearch:asus 
nameSearch:zen nameSearch:fone nameSearch:go nameSearch:zb nameSearch:5 
nameSearch:smartphone)~7)^7.0)))/no_coord",

with sow=true
"parsedquery":"(+(DisjunctionMaxQuery((brandSearch:asus | 
(nameSearch:asus)^7.0)) DisjunctionMaxQuery((brandSearch:zenfone | 
((nameSearch:zen nameSearch:fone)~2)^7.0)) DisjunctionMaxQuery((brandSearch:go 
| (nameSearch:go)^7.0)) DisjunctionMaxQuery((brandSearch:zb5 | ((nameSearch:zb 
nameSearch:5)~2)^7.0)) DisjunctionMaxQuery((brandSearch:smartphone | 
(nameSearch:smartphone)^7.0)))~5)/no_coord",



If you see the difference in the parsed queries: in the sow=false case mm is
working at field level, while in the sow=true case mm is working across the
fields.

We need to use sow=false as it is the only way to use multiword synonyms.
Any idea why it is behaving in this manner, and any way to fix it so that mm
will work across the fields in qf?

Thanks,
Aman Deep Singh

Re: Huge Query execution time for multiple ORs

2017-11-28 Thread Mikhail Khludnev
Long queries hurt Solr on many layers. You can experiment with
https://lucene.apache.org/solr/guide/7_1/other-parsers.html#terms-query-parser
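
(A sketch of that parser applied to this case; the field and values come from
the question below:

  q={!terms f=author}name1,name2,name3,name4

The terms parser takes a comma-separated list and builds a single set-lookup
query, avoiding the per-clause overhead of thousands of ORs.)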


On Tue, Nov 28, 2017 at 1:07 PM, Faraz Fallahi  wrote:

> Hi
>
> I have a question regarding Solr queries.
> My query basically contains thousands of OR conditions for authors
> (author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
> The execution time on my index is huge (around 15 sec). When I tag all the
> associated documents with a custom field and value like authorlist:1 and
> then change my query to just search for authorlist:1, it executes in 78
> ms. How come there is such a big difference in exec-time?
> Can somebody please explain why there is such a difference (maybe the query
> parser?) and if there is a way to speed this up?
>
> Thanks for the help
>



-- 
Sincerely yours
Mikhail Khludnev


Huge Query execution time for multiple ORs

2017-11-28 Thread Faraz Fallahi
Hi

I have a question regarding Solr queries.
My query basically contains thousands of OR conditions for authors
(author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
The execution time on my index is huge (around 15 sec). When I tag all the
associated documents with a custom field and value like authorlist:1 and
then change my query to just search for authorlist:1, it executes in 78
ms. How come there is such a big difference in exec-time?
Can somebody please explain why there is such a difference (maybe the query
parser?) and if there is a way to speed this up?

Thanks for the help


Re: Sort with subquery

2017-11-28 Thread Mikhail Khludnev
No way: [subquery] processes a cropped page of docs that has already been
sorted. Instead of the subquery, you can use {!join score=max ...}; see
http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
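
(A sketch of that scoring-join approach, assuming the two collections are
co-located so a cross-collection join is possible; the collection name and
parameter name are illustrative:

  q={!join from=object_id to=id fromIndex=status_collection score=max v=$jq}
  jq={!func}cnt
  sort=score desc

Here {!func}cnt makes each status doc score equal to its cnt value, and
score=max lifts the largest cnt onto the matching parent doc, so sorting by
score effectively sorts the parents by cnt.)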


On Tue, Nov 28, 2017 at 12:39 AM, Jinyi Lu  wrote:

> Hi all,
>
> I have a question about how to sort results based on the fields in the
> subquery. It’s exactly the same as this question posted on Stack Overflow,
> https://stackoverflow.com/questions/47127478/solr-how-to-sort-based-on-subquery,
> but there is no answer yet.
>
> Basically, I have two collections:
>
>   1.  Static data like the information about the objects.
> {
>   "id": "a",
>   "type": "type1"
> }
>
>   2.  Status about the objects in the previous collection, which will be
> frequently updated.
> {
>   "object_id": "a",
>   "cnt": 1
> }
>
> By using queries like q=id:*&fl=*,status:[subquery]&status.q={!term
> f=object_id v=$row.id}, I am able to combine the two collections together
> and the response is something like:
> [{
>   "id": "a",
>   "type": "type1"
>   "status":{"numFound":1, "start":0, "docs":[
> {
>   "object_id": "a",
>   "cnt": 1
> }]
>   }
> },
> …]
>
> But is there a way to sort the results based on the fields in the
> subquery, like "cnt" in this case? Any ideas are appreciated!
>
> Thanks!
> Jinyi
>



-- 
Sincerely yours
Mikhail Khludnev