Increase in response time in case of collapse queries.

2021-03-03 Thread Parshant Kumar
Hi all,

We have implemented collapse queries in place of grouped queries on our
production Solr. As mentioned in the Solr documentation, collapse queries are
recommended over grouped queries for performance reasons. But after
switching from grouped queries to collapse queries, the response time of our
queries has increased. This is unexpected behaviour; the response time
should have improved, but the result is the opposite.
Could someone please help explain why response time has increased for collapse queries?

Thanks
Parshant Kumar


Re: Tweaking Shards and Replicas for high volume queries and updates

2021-02-01 Thread Dominique Bejean
Hi,

Some suggestions.

* 64GB JVM Heap
Are you sure you really need this heap size? Did you check your GC logs
(with gceasy.io)?
A best practice is to keep the heap as small as possible, and never more
than 31 GB (so the JVM can keep using compressed object pointers).

* OS Caching
Did you set vm.swappiness to 1?
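For reference, on Linux you can check and set it like this (persist the value in
/etc/sysctl.conf so it survives reboots):

  sysctl vm.swappiness          # show the current value
  sysctl -w vm.swappiness=1     # apply it until the next reboot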

* Put two instances of Solr on each node
You need to check resource usage in order to evaluate whether this would be
worthwhile (CPU usage, CPU load average, CPU iowait, heap usage, disk I/O
reads and writes, MMAP caching, ...).
A high load average combined with low CPU usage suggests that disk I/O is the
bottleneck. I would consider increasing the number of physical servers with
less CPU, RAM and disk space on each (but globally the same total amount
of CPU, RAM and disk space). This will increase the disk I/O capacity.

* Collection 4 is the trouble collection
Try to have smaller cores (more shards if you increase the number of Solr
instances).
Investigate time routed or category routed aliases to see whether they match
your update strategy and/or your query profiles.
Work again on the schema (see the sketch below):
- For docValues=true fields, check if you really need indexed=true and
stored=true (there are a lot of considerations to take into account), ...
- Are you over-indexing with copyField?
Work on the queries: facets, group, collapse, fl=, rows=, ...
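As a sketch of the schema point above (field name and type are only examples), a
facet/sort-only field can often drop indexed and stored entirely:

  <field name="category" type="string" indexed="false" stored="false" docValues="true"/>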

Regards

Dominique


On Wed, Jan 27, 2021 at 14:53, Hollowell,Skip wrote:

> 30 Dedicated physical Nodes in the Solr Cloud Cluster, all of identical
> configuration
> Server01   RHEL 7.x
> 256GB RAM
> 10 2TB Spinning Disk in a RAID 10 Configuration (Leaving us 9.8TB usable
> per node)
> 64GB JVM Heap, tried as high as 100GB, but it appeared that 64GB was
> faster.  If we set a higher heap, do we starve the OS for caching?
> Huge Pages is off on the system, and thus UseLargePages is off on Solr
> Startup
> G1GC, Java 11  (ZGC with Java 15 and HugePages turned on was a disaster.
> We suspect it was due to the Huge Pages configuration)
> At one time we discussed putting two instances of Solr on each node,
> giving us a cloud of 60 instances instead of 30.  Load Average is high on
> these nodes during certain types of queries or updates, but CPU Load is
> relatively low and should be able to accommodate a second instance, but all
> the data would still be on the same RAID10 group of disks.
> Collection 4 is the trouble collection.  It has nearly a billion
> documents, and there are between 200 and 400 million updates every day.
> How do we get that kind of update performance, and still serve 10 million
> queries a day?  Schemas have been reviewed and re-reviewed to ensure we are
> only indexing and storing what is absolutely necessary.  What are we
> missing?  Do we need to revisit our replica policy?  Number of replicas or
> types of replicas (to ensure some are only used for reading, etc?)
> [Grabbed from the Admin UI]
> 755.6Gb Index Size according to Solr Cloud UI
> Total #docs: 371.8mn
> Avg size/doc: 2.1Kb
> 90 Shards, 2 NRT Replicas per Shard, 1,750,612,476 documents, avg
> size/doc: 1.7Kb, uses nested documents
> collection-1_s69r317   31.1Gb
> collection-1_s49r96 30.7Gb
> collection-1_s78r154   30.2Gb
> collection-1_s40r259   30.1Gb
> collection-1_s9r197 29.1Gb
> collection-1_s18r34 28.9Gb
> 120 Shards, 2 TLOG Replicas per Shard, 2,230,207,046 documents, avg
> size/doc: 1.3Kb
> collection-2_s78r154   22.8Gb
> collection-2_s49r96 22.8Gb
> collection-2_s46r331   22.8Gb
> collection-2_s18r34 22.7Gb
> collection-2_s109r216   22.7Gb
> collection-2_s104r447   22.7Gb
> collection-2_s15r269   22.7Gb
> collection-2_s73r385   22.7Gb
> 120 Shards, 2 TLOG Replicas per Shard, 733,588,503 documents, avg
> size/doc: 1.9Kb
> collection-3_s19r277   10.6Gb
> collection-3_s108r214   10.6Gb
> collection-3_s48r94 10.6Gb
> collection-3_s109r457   10.6Gb
> collection-3_s47r333   10.5Gb
> collection-3_s78r154   10.5Gb
> collection-3_s18r34 10.5Gb
> collection-3_s77r393   10.5Gb
>
> 120 Shards, 2 TLOG Replicas per Shard, 864,372,654 documents, avg
> size/doc: 5.6Kb
> collection-4_s109r216   38.7Gb
> collection-4_s100r439   38.7Gb
> collection-4_s49r96 38.7Gb
> collection-4_s35r309   38.6Gb
> collection-4_s18r34 38.6Gb
> collection-4_s78r154   38.6Gb
> collection-4_s7r253 38.6Gb
> collection-4_s69r377   38.6Gb
>


Re: Is there a way to autowarm a new searcher using recently run queries

2021-01-28 Thread Chris Hostetter


: I am wondering if there is a way to warmup new searcher on commit by
: rerunning queries processed by the last searcher. May be it happens by
: default but then I can't understand why we see high query times if those
: searchers are being warmed.

It only happens by default if you have 'autowarmCount' enabled for each 
cache...

https://lucene.apache.org/solr/guide/8_7/query-settings-in-solrconfig.html#caches
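For example, a filterCache with autowarming enabled looks something like this in
solrconfig.xml (class and sizes are just illustrative):

  <filterCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>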

But note that this warms the caches *individually* -- it doesn't re-simulate 
a "full request", so some things (like stored fields) may still be "cold" 
on disk.

This typically isn't a problem -- except for people relying on FieldCache 
-- which is a query-time "un-inversion" of fields for sorting/faceting -- 
and has no explicit Solr configuration or warming.

For that you have to use something like Joel described -- static 
'newSearcher' QuerySenderListener queries that will sort/facet on those 
fields

https://lucene.apache.org/solr/guide/8_7/query-settings-in-solrconfig.html#query-related-listeners

...but a better solution is to make sure you use DocValues on these fields 
instead.




-Hoss
http://www.lucidworks.com/


Re: Is there a way to autowarm a new searcher using recently run queries

2021-01-27 Thread Joel Bernstein
Typically what you would do is add static warming queries to warm all the
caches. These queries are hardcoded into solrconfig.xml. You'll want to
run the facets you're using in the warming queries, particularly facets on
string fields.

Once you add these, it will take longer to warm the new searcher, so you may
need to change the auto-commit intervals.
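A rough sketch of what that can look like in solrconfig.xml (the facet field and
sort are only placeholders):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.field">category</str>
        <str name="sort">price asc</str>
      </lst>
    </arr>
  </listener>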




Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 27, 2021 at 5:30 PM Pushkar Raste 
wrote:

> Hi,
>
> A rookie question. We have a Solr cluster that doesn't get too much
> traffic. We see that our queries take long time unless we run a script to
> send more traffic to Solr.
>
> We are indexing data all the time and use autoCommit.
>
> I am wondering if there is a way to warmup new searcher on commit by
> rerunning queries processed by the last searcher. May be it happens by
> default but then I can't understand why we see high query times if those
> searchers are being warmed.
>


Is there a way to autowarm a new searcher using recently run queries

2021-01-27 Thread Pushkar Raste
Hi,

A rookie question. We have a Solr cluster that doesn't get too much
traffic. We see that our queries take a long time unless we run a script to
send more traffic to Solr.

We are indexing data all the time and use autoCommit.

I am wondering if there is a way to warm up the new searcher on commit by
rerunning the queries processed by the last searcher. Maybe it happens by
default, but then I can't understand why we see high query times if those
searchers are being warmed.


Tweaking Shards and Replicas for high volume queries and updates

2021-01-27 Thread Hollowell,Skip
30 Dedicated physical Nodes in the Solr Cloud Cluster, all of identical 
configuration
Server01   RHEL 7.x
256GB RAM
10 2TB Spinning Disk in a RAID 10 Configuration (Leaving us 9.8TB usable per 
node)
64GB JVM Heap, tried as high as 100GB, but it appeared that 64GB was faster.  
If we set a higher heap, do we starve the OS for caching?
Huge Pages is off on the system, and thus UseLargePages is off on Solr Startup
G1GC, Java 11  (ZGC with Java 15 and HugePages turned on was a disaster.  We 
suspect it was due to the Huge Pages configuration)
At one time we discussed putting two instances of Solr on each node, giving us 
a cloud of 60 instances instead of 30.  Load Average is high on these nodes 
during certain types of queries or updates, but CPU Load is relatively low and 
should be able to accommodate a second instance, but all the data would still 
be on the same RAID10 group of disks.
Collection 4 is the trouble collection.  It has nearly a billion documents, and 
there are between 200 and 400 million updates every day.  How do we get that 
kind of update performance, and still serve 10 million queries a day?  Schemas 
have been reviewed and re-reviewed to ensure we are only indexing and storing 
what is absolutely necessary.  What are we missing?  Do we need to revisit our 
replica policy?  Number of replicas or types of replicas (to ensure some are 
only used for reading, etc?)
[Grabbed from the Admin UI]
755.6Gb Index Size according to Solr Cloud UI
Total #docs: 371.8mn
Avg size/doc: 2.1Kb
90 Shards, 2 NRT Replicas per Shard, 1,750,612,476 documents, avg size/doc: 
1.7Kb, uses nested documents
collection-1_s69r317   31.1Gb
collection-1_s49r96 30.7Gb
collection-1_s78r154   30.2Gb
collection-1_s40r259   30.1Gb
collection-1_s9r197 29.1Gb
collection-1_s18r34 28.9Gb
120 Shards, 2 TLOG Replicas per Shard, 2,230,207,046 documents, avg size/doc: 
1.3Kb
collection-2_s78r154   22.8Gb
collection-2_s49r96 22.8Gb
collection-2_s46r331   22.8Gb
collection-2_s18r34 22.7Gb
collection-2_s109r216   22.7Gb
collection-2_s104r447   22.7Gb
collection-2_s15r269   22.7Gb
collection-2_s73r385   22.7Gb
120 Shards, 2 TLOG Replicas per Shard, 733,588,503 documents, avg size/doc: 
1.9Kb
collection-3_s19r277   10.6Gb
collection-3_s108r214   10.6Gb
collection-3_s48r94 10.6Gb
collection-3_s109r457   10.6Gb
collection-3_s47r333   10.5Gb
collection-3_s78r154   10.5Gb
collection-3_s18r34 10.5Gb
collection-3_s77r393   10.5Gb

120 Shards, 2 TLOG Replicas per Shard, 864,372,654 documents, avg size/doc: 
5.6Kb
collection-4_s109r216   38.7Gb
collection-4_s100r439   38.7Gb
collection-4_s49r96 38.7Gb
collection-4_s35r309   38.6Gb
collection-4_s18r34 38.6Gb
collection-4_s78r154   38.6Gb
collection-4_s7r253 38.6Gb
collection-4_s69r377   38.6Gb


Re: Queries Regarding Cold searcher

2021-01-22 Thread Shawn Heisey

On 1/21/2021 3:42 AM, Parshant Kumar wrote:

Do value(true or false) of cold searcher play any role during the
completion of replication on slave server.If not please tell in which
process in solr its applied?


The setting to use a cold searcher applies whenever a new searcher is 
opened.  It determines what happens while the new searcher is warming. 
If it's false, queries will be answered by the old searcher until all of 
the warming work is complete on the new searcher, at which time Solr 
will switch to the new one and work on dismantling the old one.  If it's 
true, then the new searcher will be used immediately, before warming is 
finished.
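
For reference, the setting lives in the <query> section of solrconfig.xml:

  <query>
    ...
    <useColdSearcher>false</useColdSearcher>
  </query>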


In order for Solr to do queries on an index that has changed for any 
reason, including replication, a new searcher is required.  If Solr 
doesn't open a new searcher, it will still be querying the index that 
existed before the change.


Thanks,
Shawn


Re: Queries Regarding Cold searcher

2021-01-21 Thread Parshant Kumar
Adding one more query:

Does the value (true or false) of the cold searcher setting play any role
during the completion of replication on the slave server? If not, please
tell me where in Solr it is applied.

On Thu, Jan 21, 2021 at 3:11 PM Parshant Kumar 
wrote:

> Hi all,
>
> Please help me in below queries:
>
> 1) what is the impact of making cold searcher false,true?
> 2)After full replication completion of data on slave server, new searcher
> is opened or not?
> 3)If opensearcher is false in autocommit and cold searcher is true , what
> does this conclude , Is their any interconnection between both of them?
>
> Thanks
> Parshant
>
>


Queries Regarding Cold searcher

2021-01-21 Thread Parshant Kumar
Hi all,

Please help me with the queries below:

1) What is the impact of setting the cold searcher to false or true?
2) After a full replication of data onto the slave server, is a new searcher
opened or not?
3) If openSearcher is false in autoCommit and the cold searcher is true, what
does that mean? Is there any interconnection between the two?

Thanks
Parshant


SocketTimeoutException in long running streaming queries

2021-01-13 Thread ufuk yılmaz
When I perform a long-running streaming expression, sometimes I get:

{
  "error": {
    "metadata": [
      "error-class",
      "org.apache.solr.common.SolrException",
      "root-error-class",
      "java.net.SocketTimeoutException"
    ],
    "msg": "Error trying to proxy request for url:

Is this related to SOLR-13457, or is this a completely different thing?

Regards

Sent from Mail for Windows 10



Re: Monitoring Solr for currently running queries

2020-12-29 Thread Markus Jelsma
Hello Ufuk,

You can log slow queries [1].

If you want to see currently running queries you would have to extend
SearchHandler and build the custom logic yourself. Watch out for SolrCloud,
because the main query as well as the per-shard queries can pass through
that same SearchHandler. You can distinguish between them by checking the
isShard=true parameter on the per-shard requests.

Regards,
Markus

[1] https://lucene.apache.org/solr/guide/6_6/configuring-logging.html
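
If I remember correctly, the slow query threshold is configured in the <query>
section of solrconfig.xml, and requests slower than it are logged separately,
something like:

  <query>
    ...
    <slowQueryThresholdMillis>1000</slowQueryThresholdMillis>
  </query>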

On Tue, Dec 29, 2020 at 16:49, ufuk yılmaz wrote:

> Hello All,
>
> Is there a way to see currently executing queries in a SolrCloud? Or a
> general strategy to detect a query using absurd amount or resources?
>
> We are using Solr for not only simple querying, but running complex
> streaming expressions, facets with large data etc. Sometimes, randomly, CPU
> usage gets so high that it starts to respond very slowly to even simple
> queries, or don’t respond at all. I’m trying to determine if it’s a result
> of simple overloading of the system by many “normal” queries, or someone
> sends Solr an unreasonably compute-heavy request.
>
> A few days ago when this occured, I stopped every service that can send
> Solr a query. After that, for about an hour, nodes were reading from the
> disk at 1GB/s which is the maximum of our disks. Then everything went back
> to the normal as I started the other services.
>
> One (bad) idea I had is to build a proxy service which proxies every
> request to our SolrCloud and monitors current running requests, but scaling
> this to the size of SolrCloud may be reinventing the wheel.
>
> For now all I can detect is that Solr is struggling, but I have no idea
> what causes that and when.
>
> -Chees and happy new year
>


Monitoring Solr for currently running queries

2020-12-29 Thread ufuk yılmaz
Hello All,

Is there a way to see currently executing queries in a SolrCloud? Or a general 
strategy to detect a query using an absurd amount of resources?

We are using Solr not only for simple querying, but for running complex streaming 
expressions, facets over large data, etc. Sometimes, randomly, CPU usage gets so 
high that it starts to respond very slowly to even simple queries, or doesn't 
respond at all. I'm trying to determine whether it's a result of simple overloading 
of the system by many "normal" queries, or of someone sending Solr an unreasonably 
compute-heavy request.

A few days ago when this occurred, I stopped every service that can send Solr a 
query. After that, for about an hour, nodes were reading from the disk at 1GB/s, 
which is the maximum of our disks. Then everything went back to normal once I 
restarted the other services.

One (bad) idea I had is to build a proxy service which proxies every request to 
our SolrCloud and monitors currently running requests, but scaling this to the 
size of the SolrCloud may be reinventing the wheel.

For now all I can detect is that Solr is struggling, but I have no idea what 
causes that and when.

-Cheers and happy new year


Facet count issues with multi sharded collections, same issue with multi collection queries?

2020-12-16 Thread ufuk yılmaz
Hi everyone,

The other day I was comparing term+range facet counts from two different collections 
having exactly the same data and schema. The only difference is that one collection has 2 
shards and the other 1. After searching about this I came upon an article: 
medium.com

My results were like this:
Counts from the collection with two shards: 120, 60, 30 ...
One shard: 120, 90, 60, 30...

After a bit of fiddling with the overrequest, overrefine, etc. parameters, the two results 
started to match. It seems I need to work on this.
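
For reference, these are JSON Facet API parameters; a rough example of the kind of
request I was tuning (field names and values are made up):

  json.facet={
    categories: {
      type: terms,
      field: cat,
      limit: 10,
      refine: true,
      overrequest: 100,
      overrefine: 100
    }
  }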

My real question is: I don't really need 2 shards, but I use multi-collection 
querying a lot, like:

http://solr:8983/collection1,collection2,collection3/select ...

Or with an alias pointing to multiple collections:
http://solr:8983/myAliasHaving10Collections/select ...


Does this have the same issue as multiple shards? My guess is that it does, since 
it's a logical problem with distributed systems rather than an implementation issue, 
but I wanted to ask since I'm new to this.

Have a nice week

Sent from Mail for Windows 10



Re: Solr collapse & expand queries.

2020-11-30 Thread Joel Bernstein
Both collapse and grouping are used quite often, so I'm not sure I would
agree with the blanket preference for collapse. There is a very specific use case
where collapse performs better, and in those scenarios collapse might be the
only option that works.

The use case where collapse works better is:

1) High cardinality grouping field, like product id.
2) Larger result sets
3) The need to know the full number of groups that match the result set. In
grouping this is group.ngroups.

At a certain point grouping will become too slow under the scenario
described above. It all depends on the scale of #1 and #2 above. If you
remove group.ngroups, grouping will usually be just as fast as or faster than
collapse.

So in your testing, make sure you're testing the full data set with
representative queries, and decide if group.ngroups is needed.
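
For reference, the two forms being compared look roughly like this (field and
values are only illustrative):

  Grouping (with the expensive group.ngroups):
  q=shirt&group=true&group.field=product_id&group.limit=1&group.ngroups=true

  Collapse plus expand:
  q=shirt&fq={!collapse field=product_id}&expand=true&expand.rows=1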







Joel Bernstein
http://joelsolr.blogspot.com/


On Sat, Nov 28, 2020 at 3:42 AM Parshant Kumar
 wrote:

> Hi community,
>
> I want to implement collapse queries instead of group queries . In solr
> documentation it is stated that we should prefer collapse & expand queries
> instead of group queries.Please explain how the collapse & expand queries
> is better than grouped queries ? How can I implement it ? Do i need to add
> anything in *solrconfig.xml file* as well or just need to make changes in
> solr queries like below:
>
>
> *fq={!collapse field=*field*}&expand.rows=n&expand=true  instead of
> group.field=*field*&group=true&group.limit=n*
>
> I have done performance testing by making above changes in solr queries and
> found that query times are almost the same for both collapse queries and
> group queries.
>
> Please help me how to implement it and its advantage over grouped queries.
>
> Thanks,
> Parshant Kumar.
>
> --
>
>


Solr collapse & expand queries.

2020-11-28 Thread Parshant Kumar
Hi community,

I want to implement collapse queries instead of group queries. In the Solr
documentation it is stated that we should prefer collapse & expand queries
over group queries. Please explain how collapse & expand queries
are better than grouped queries. How can I implement them? Do I need to add
anything in the *solrconfig.xml file* as well, or just make changes to the
Solr queries like below:


*fq={!collapse field=*field*}&expand.rows=n&expand=true  instead of
group.field=*field*&group=true&group.limit=n*

I have done performance testing by making the above changes in the Solr queries and
found that query times are almost the same for both collapse queries and
group queries.

Please help me understand how to implement it and its advantages over grouped queries.

Thanks,
Parshant Kumar.


Re: Possible to add a default "appends" fq except for queries in the admin GUI?

2020-10-22 Thread Batanun B
Well, we are not making the HTTP requests to Solr ourselves; we are using a 3rd party 
component for that, whose configuration basically only takes a base URL (ie 
domain, port etc, without path) and the name of the collection. So it is not 
possible to define the request handler there. We would have to specify it as a 
qt parameter when performing the search, but that is more or less the same as 
having to define the fq parameter to filter out unwanted documents. We would 
like the fundamental, basic "no frills" default search to include this fq, while 
still not being hindered by it when using the Solr admin GUI. But, if I 
interpret you correctly, that is impossible, right?

From: Alexandre Rafalovitch 
Sent: Thursday, October 22, 2020 8:50 PM
To: solr-user 
Subject: Re: Possible to add a default "appends" fq except for queries in the 
admin GUI?

Why not have a custom handler endpoint for your online queries? You
will be modifying them anyway to remove fq.

Or even create individual endpoints for every significant use-case.
You can share the configuration between them with initParams or
useParams, but have more flexibility going forward.

Admin UI allows you to change /select, but - like you said - manually
and every time.

Regards,
  Alex.

On Thu, 22 Oct 2020 at 14:18, Batanun B  wrote:
>
> Hi,
>
> We have multiple components that uses the Solr search feature on our 
> websites. But we have some documents in the index that we never want to 
> display in the search results (nothing secret or anything, just uninteresting 
> for the user to see). So far, we have added a fq to all our queries, that 
> filters out these documents. But we would like to not have to do this, since 
> there is always a risk of us forgetting to add that fq parameter.
>
> So, today i tried adding this fq in a "appends" list in the standard 
> requestHandler. I needed to add it to the standard one, since that's the one 
> that all the search components use (ie, no qt parameter defined, and i would 
> prefer not to have to change that). That worked fine. Until I needed to do a 
> query in the solr admin GUI, and realized that this filter query was used 
> there too, effectively hiding a bunch of documents that I as an administrator 
> need to see.
>
> Is there a way to avoid this problem? Can I somehow configure Solr to not use 
> this filter query when in the admin GUI? If I define a separate request 
> handler in solrconfig, can i make the admin GUI always use this by default? I 
> don't want to have to manually change the request handler in the admin GUI 
> every time.
>
> What I tried so far:
>
> * Adding the fq in the "appends" in the standard request handler, as 
> mentioned above. Causing the filter to always be in effect, even in admin GUI
> * Keeping the configuration as above, but also adding a request handler with 
> name="/select", that doesn't have this fq defined. Then the filter was never 
> applied, not in admin GUI and not on any website search


Re: Possible to add a default "appends" fq except for queries in the admin GUI?

2020-10-22 Thread Alexandre Rafalovitch
Why not have a custom handler endpoint for your online queries? You
will be modifying them anyway to remove fq.

Or even create individual endpoints for every significant use-case.
You can share the configuration between them with initParams or
useParams, but have more flexibility going forward.

Admin UI allows you to change /select, but - like you said - manually
and every time.
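
As a rough sketch (the handler name and field are made up), a dedicated endpoint
can carry the always-on filter while /select stays untouched for the Admin UI:

  <requestHandler name="/website-select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <lst name="appends">
      <str name="fq">-doctype:internal</str>
    </lst>
  </requestHandler>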

Regards,
  Alex.

On Thu, 22 Oct 2020 at 14:18, Batanun B  wrote:
>
> Hi,
>
> We have multiple components that uses the Solr search feature on our 
> websites. But we have some documents in the index that we never want to 
> display in the search results (nothing secret or anything, just uninteresting 
> for the user to see). So far, we have added a fq to all our queries, that 
> filters out these documents. But we would like to not have to do this, since 
> there is always a risk of us forgetting to add that fq parameter.
>
> So, today i tried adding this fq in a "appends" list in the standard 
> requestHandler. I needed to add it to the standard one, since that's the one 
> that all the search components use (ie, no qt parameter defined, and i would 
> prefer not to have to change that). That worked fine. Until I needed to do a 
> query in the solr admin GUI, and realized that this filter query was used 
> there too, effectively hiding a bunch of documents that I as an administrator 
> need to see.
>
> Is there a way to avoid this problem? Can I somehow configure Solr to not use 
> this filter query when in the admin GUI? If I define a separate request 
> handler in solrconfig, can i make the admin GUI always use this by default? I 
> don't want to have to manually change the request handler in the admin GUI 
> every time.
>
> What I tried so far:
>
> * Adding the fq in the "appends" in the standard request handler, as 
> mentioned above. Causing the filter to always be in effect, even in admin GUI
> * Keeping the configuration as above, but also adding a request handler with 
> name="/select", that doesn't have this fq defined. Then the filter was never 
> applied, not in admin GUI and not on any website search


Possible to add a default "appends" fq except for queries in the admin GUI?

2020-10-22 Thread Batanun B
Hi,

We have multiple components that use the Solr search feature on our websites. 
But we have some documents in the index that we never want to display in the 
search results (nothing secret or anything, just uninteresting for the user to 
see). So far, we have added an fq to all our queries that filters out these 
documents. But we would like not to have to do this, since there is always a 
risk of us forgetting to add that fq parameter.

So, today I tried adding this fq in an "appends" list in the standard 
requestHandler. I needed to add it to the standard one, since that's the one 
that all the search components use (ie, no qt parameter defined, and I would 
prefer not to have to change that). That worked fine, until I needed to do a 
query in the Solr admin GUI and realized that this filter query was used there 
too, effectively hiding a bunch of documents that I as an administrator need to 
see.

Is there a way to avoid this problem? Can I somehow configure Solr to not use 
this filter query when in the admin GUI? If I define a separate request handler 
in solrconfig, can I make the admin GUI always use this by default? I don't 
want to have to manually change the request handler in the admin GUI every time.

What I tried so far:

* Adding the fq in the "appends" list in the standard request handler, as mentioned 
above. This causes the filter to always be in effect, even in the admin GUI
* Keeping the configuration as above, but also adding a request handler with 
name="/select", that doesn't have this fq defined. Then the filter was never 
applied, not in admin GUI and not on any website search


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Shawn,

According to the log4j bug description (
https://bz.apache.org/bugzilla/show_bug.cgi?id=57714), the issue is related
to a lock taken during the appender collection process.

In addition to the CONSOLE and file appenders in the default log4j.properties,
my customer added 2 extra FileAppenders dedicated to all requests and slow
requests. I suggested removing these two extra appenders.

Regards

Dominique



On Mon, Oct 19, 2020 at 15:48, Dominique Bejean wrote:

> Hi Shawn,
>
> Thank you for your response.
>
> You are confirming my diagnosis.
>
> This is in fact a 8 nodes cluster with one single collection with 4 shards
> and 1 replica (8 cores).
>
> 4 Gb heap and 90 Gb Ram
>
>
> When no issue occurs nearly 50% of the heap is used.
>
> Num Docs in collection : 10.000.000
>
> Num Docs per core is more or less 2.500.000
>
> Max Doc per core is more or less 3.000.000
>
> Core Data size is more or less 70 Gb
>
> Here are the JVM settings
>
> -DSTOP.KEY=solrrocks
>
> -DSTOP.PORT=7983
>
> -Dcom.sun.management.jmxremote
>
> -Dcom.sun.management.jmxremote.authenticate=false
>
> -Dcom.sun.management.jmxremote.local.only=false
>
> -Dcom.sun.management.jmxremote.port=18983
>
> -Dcom.sun.management.jmxremote.rmi.port=18983
>
> -Dcom.sun.management.jmxremote.ssl=false
>
> -Dhost=
>
> -Djava.rmi.server.hostname=XXX
>
> -Djetty.home=/x/server
>
> -Djetty.port=8983
>
> -Dlog4j.configuration=file:/xx/log4j.properties
>
> -Dsolr.install.dir=/xx/solr
>
> -Dsolr.jetty.request.header.size=32768
>
> -Dsolr.log.dir=/xxx/Logs
>
> -Dsolr.log.muteconsole
>
> -Dsolr.solr.home=//data
>
> -Duser.timezone=Europe/Paris
>
> -DzkClientTimeout=3
>
> -DzkHost=xxx
>
> -XX:+CMSParallelRemarkEnabled
>
> -XX:+CMSScavengeBeforeRemark
>
> -XX:+ParallelRefProcEnabled
>
> -XX:+PrintGCApplicationStoppedTime
>
> -XX:+PrintGCDateStamps
>
> -XX:+PrintGCDetails
>
> -XX:+PrintGCTimeStamps
>
> -XX:+PrintHeapAtGC
>
> -XX:+PrintTenuringDistribution
>
> -XX:+UseCMSInitiatingOccupancyOnly
>
> -XX:+UseConcMarkSweepGC
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseParNewGC
>
> -XX:-OmitStackTraceInFastThrow
>
> -XX:CMSInitiatingOccupancyFraction=50
>
> -XX:CMSMaxAbortablePrecleanTime=6000
>
> -XX:ConcGCThreads=4
>
> -XX:GCLogFileSize=20M
>
> -XX:MaxTenuringThreshold=8
>
> -XX:NewRatio=3
>
> -XX:NumberOfGCLogFiles=9
>
> -XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh
>
> 8983
>
> /xx/Logs
>
> -XX:ParallelGCThreads=4
>
> -XX:PretenureSizeThreshold=64m
>
> -XX:SurvivorRatio=4
>
> -XX:TargetSurvivorRatio=90
>
> -Xloggc:/xx/solr_gc.log
>
> -Xloggc:/xx/solr_gc.log
>
> -Xms4g
>
> -Xmx4g
>
> -Xss256k
>
> -verbose:gc
>
>
>
> Here is one screenshot of top command for the node that failed last week.
>
> [image: 2020-10-19 15_48_06-Photos.png]
>
> Regards
>
> Dominique
>
>
>
> On Sun, Oct 18, 2020 at 22:03, Shawn Heisey wrote:
>
>> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
>> > A few months ago, I reported an issue with Solr nodes crashing due to
>> the
>> > old generation heap growing suddenly and generating OOM. This problem
>> > occurred again this week. I have threads dumps for each minute during
>> the 3
>> > minutes the problem occured. I am using fastthread.io in order to
>> analyse
>> > these dumps.
>>
>> 
>>
>> > * The Log4j issue starts (
>> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>>
>> If the log4j bug is the root cause here, then the only way you can fix
>> this is to upgrade to at least Solr 7.4.  That is the Solr version where
>> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
>> in Solr 6.6.2 without changing Solr code.  The code changes required
>> were extensive.  Note that I did not do anything to confirm whether the
>> log4j bug is responsible here.  You seem pretty confident that this is
>> the case.
>>
>> Note that if you upgrade to 8.x, you will need to reindex from scratch.
>> Upgrading an existing index is possible with one major version bump, but
>> if your index has ever been touched by a release that's two major
>> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
>> even try to read an old index touched by 6.x or earlier.
>>
>> In the following wiki page, I provided instructions for getting a
>> screenshot of the process listing.
>>
>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>>
>> In addition to that screenshot, I would like to know the on-disk size of
>> all the cores running on the problem node, along with a document count
>> from those cores.  It might be possible to work around the OOM just by
>> increasing the size of the heap.  That won't do anything about problems
>> with log4j.
>>
>> Thanks,
>> Shawn
>>
>


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Hi Shawn,

Thank you for your response.

You are confirming my diagnosis.

This is in fact an 8-node cluster with one single collection with 4 shards
and 1 replica (8 cores).

4 Gb heap and 90 Gb Ram


When no issue occurs nearly 50% of the heap is used.

Num Docs in collection : 10.000.000

Num Docs per core is more or less 2.500.000

Max Doc per core is more or less 3.000.000

Core Data size is more or less 70 Gb

Here are the JVM settings

-DSTOP.KEY=solrrocks

-DSTOP.PORT=7983

-Dcom.sun.management.jmxremote

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.local.only=false

-Dcom.sun.management.jmxremote.port=18983

-Dcom.sun.management.jmxremote.rmi.port=18983

-Dcom.sun.management.jmxremote.ssl=false

-Dhost=

-Djava.rmi.server.hostname=XXX

-Djetty.home=/x/server

-Djetty.port=8983

-Dlog4j.configuration=file:/xx/log4j.properties

-Dsolr.install.dir=/xx/solr

-Dsolr.jetty.request.header.size=32768

-Dsolr.log.dir=/xxx/Logs

-Dsolr.log.muteconsole

-Dsolr.solr.home=//data

-Duser.timezone=Europe/Paris

-DzkClientTimeout=3

-DzkHost=xxx

-XX:+CMSParallelRemarkEnabled

-XX:+CMSScavengeBeforeRemark

-XX:+ParallelRefProcEnabled

-XX:+PrintGCApplicationStoppedTime

-XX:+PrintGCDateStamps

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-XX:+PrintHeapAtGC

-XX:+PrintTenuringDistribution

-XX:+UseCMSInitiatingOccupancyOnly

-XX:+UseConcMarkSweepGC

-XX:+UseGCLogFileRotation

-XX:+UseGCLogFileRotation

-XX:+UseParNewGC

-XX:-OmitStackTraceInFastThrow

-XX:CMSInitiatingOccupancyFraction=50

-XX:CMSMaxAbortablePrecleanTime=6000

-XX:ConcGCThreads=4

-XX:GCLogFileSize=20M

-XX:MaxTenuringThreshold=8

-XX:NewRatio=3

-XX:NumberOfGCLogFiles=9

-XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh

8983

/xx/Logs

-XX:ParallelGCThreads=4

-XX:PretenureSizeThreshold=64m

-XX:SurvivorRatio=4

-XX:TargetSurvivorRatio=90

-Xloggc:/xx/solr_gc.log

-Xloggc:/xx/solr_gc.log

-Xms4g

-Xmx4g

-Xss256k

-verbose:gc



Here is one screenshot of top command for the node that failed last week.

[image: 2020-10-19 15_48_06-Photos.png]

Regards

Dominique



On Sun, Oct 18, 2020 at 22:03, Shawn Heisey wrote:

> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
> > A few months ago, I reported an issue with Solr nodes crashing due to the
> > old generation heap growing suddenly and generating OOM. This problem
> > occurred again this week. I have threads dumps for each minute during
> the 3
> > minutes the problem occured. I am using fastthread.io in order to
> analyse
> > these dumps.
>
> 
>
> > * The Log4j issue starts (
> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>
> If the log4j bug is the root cause here, then the only way you can fix
> this is to upgrade to at least Solr 7.4.  That is the Solr version where
> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
> in Solr 6.6.2 without changing Solr code.  The code changes required
> were extensive.  Note that I did not do anything to confirm whether the
> log4j bug is responsible here.  You seem pretty confident that this is
> the case.
>
> Note that if you upgrade to 8.x, you will need to reindex from scratch.
> Upgrading an existing index is possible with one major version bump, but
> if your index has ever been touched by a release that's two major
> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
> even try to read an old index touched by 6.x or earlier.
>
> In the following wiki page, I provided instructions for getting a
> screenshot of the process listing.
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>
> In addition to that screenshot, I would like to know the on-disk size of
> all the cores running on the problem node, along with a document count
> from those cores.  It might be possible to work around the OOM just by
> increasing the size of the heap.  That won't do anything about problems
> with log4j.
>
> Thanks,
> Shawn
>


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-18 Thread Shawn Heisey

On 10/18/2020 3:22 AM, Dominique Bejean wrote:

A few months ago, I reported an issue with Solr nodes crashing due to the
old generation heap growing suddenly and generating OOM. This problem
occurred again this week. I have threads dumps for each minute during the 3
minutes the problem occured. I am using fastthread.io in order to analyse
these dumps.





* The Log4j issue starts (
https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)


If the log4j bug is the root cause here, then the only way you can fix 
this is to upgrade to at least Solr 7.4.  That is the Solr version where 
we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j 
in Solr 6.6.2 without changing Solr code.  The code changes required 
were extensive.  Note that I did not do anything to confirm whether the 
log4j bug is responsible here.  You seem pretty confident that this is 
the case.


Note that if you upgrade to 8.x, you will need to reindex from scratch. 
Upgrading an existing index is possible with one major version bump, but 
if your index has ever been touched by a release that's two major 
versions back, it won't work.  In 8.x, that is enforced -- 8.x will not 
even try to read an old index touched by 6.x or earlier.


In the following wiki page, I provided instructions for getting a 
screenshot of the process listing.


https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

In addition to that screenshot, I would like to know the on-disk size of 
all the cores running on the problem node, along with a document count 
from those cores.  It might be possible to work around the OOM just by 
increasing the size of the heap.  That won't do anything about problems 
with log4j.


Thanks,
Shawn


SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-18 Thread Dominique Bejean
ionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)



=== 15h57 -> fastthreads reports issue
Old gen heap full : from 3Gb (3Gb max)
43 threads TIMED_WAITING
250 threads RUNNABLE
110 threads WAITING
112 threads BLOCKED


18 runnable threads are still stuck  (same stack trace) waiting for
response from some other nodes

22 threads are in BLOCKED state (same stack trace)
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x0007005a4900> (a org.apache.log4j.Logger)

90 threads are in BLOCKED state (same stack trace)
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:816)
- waiting to lock <0x00070087b910> (a
org.apache.solr.util.stats.InstrumentedHttpClient)


=== 15h58 -> Young Gen heap full -> OOM




The global scenario is (6 solr nodes)

1/
The problem starts with very slow queries (30 seconds to 2 minutes) on all
nodes.

2/
On the failing node :
* Some threads are stuck at
java.net.SocketInputStream.socketRead0(Native Method) waiting for responses
from other nodes
* The Log4j issue starts (
https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
* The Old generation heap grows in few seconds
* More and more threads handling incoming request are blocked
* The Young generation heap is full -> OOM


Ok, the slow queries have to be fixed, but even if some nodes are not
responding to another node, this node shouldn't ultimately crash because of this
Log4j issue.

Until the slow queries are fixed, any suggestions for avoiding this crash?

Regards.

Dominique


Re: Solr queries slow down over time

2020-09-25 Thread Goutham Tholpadi
Hi Mark, Thanks for confirming Dwane's advice from your own experience. I
will shift to a streaming expressions implementation.

Best
Goutham

On Fri, Sep 25, 2020 at 7:03 PM Mark H. Wood  wrote:

> On Fri, Sep 25, 2020 at 11:49:22AM +0530, Goutham Tholpadi wrote:
> > I have around 30M documents in Solr, and I am doing repeated *:* queries
> > with rows=1, and changing start to 0, 1, 2, and so on, in a
> > loop in my script (using pysolr).
> >
> > At the start of the iteration, the calls to Solr were taking less than 1
> > sec each. After running for a few hours (with start at around 27M) I
> found
> > that each call was taking around 30-60 secs.
> >
> > Any pointers on why the same fetch of 1 records takes much longer
> now?
> > Does Solr need to load all the 27M before getting the last 1 records?
>
> I and many others have run into the same issue.  Yes, each windowed
> query starts fresh, having to find at least enough records to satisfy
> the query, walking the list to discard the first 'start' worth of
> them, and then returning the next 'rows' worth.  So as 'start' increases,
> the work required of Solr increases and the response time lengthens.
>
> > Is there a better way to do this operation using Solr?
>
> Another answer in this thread gives links to resources for addressing
> the problem, and I can't improve on those.
>
> I can say that when I switched from start= windowing to cursormark, I
> got a very nice improvement in overall speed and did not see the
> progressive slowing anymore.  A query loop that ran for *days* now
> completes in under five minutes.  In some way that I haven't quite
> figured out, a cursormark tells Solr where in the overall document
> sequence to start working.
>
> So yes, there *is* a better way.
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>


Re: Solr queries slow down over time

2020-09-25 Thread Goutham Tholpadi
Thanks a ton, Dwane. I went through the article and the documentation link.
This corresponds exactly to my use case.

Best
Goutham

On Fri, Sep 25, 2020 at 2:59 PM Dwane Hall  wrote:

> Goutham I suggest you read Hossman's excellent article on deep paging and
> why returning rows=(some large number) is a bad idea. It provides an
> thorough overview of the concept and will explain it better than I ever
> could (
> https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18).
> In short if you want to extract that many documents out of your corpus use
> cursor mark, streaming expressions, or Solr's parallel SQL interface (that
> uses streaming expressions under the hood)
> https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html.
>
> Thanks,
>
> Dwane
> --
> *From:* Goutham Tholpadi 
> *Sent:* Friday, 25 September 2020 4:19 PM
> *To:* solr-user@lucene.apache.org 
> *Subject:* Solr queries slow down over time
>
> Hi,
>
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=1, and changing start to 0, 1, 2, and so on, in a
> loop in my script (using pysolr).
>
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
>
> Any pointers on why the same fetch of 1 records takes much longer now?
> Does Solr need to load all the 27M before getting the last 1 records?
> Is there a better way to do this operation using Solr?
>
> Thanks!
> Goutham
>


Re: Solr queries slow down over time

2020-09-25 Thread Mark H. Wood
On Fri, Sep 25, 2020 at 11:49:22AM +0530, Goutham Tholpadi wrote:
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=1, and changing start to 0, 1, 2, and so on, in a
> loop in my script (using pysolr).
> 
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
> 
> Any pointers on why the same fetch of 1 records takes much longer now?
> Does Solr need to load all the 27M before getting the last 1 records?

I and many others have run into the same issue.  Yes, each windowed
query starts fresh, having to find at least enough records to satisfy
the query, walking the list to discard the first 'start' worth of
them, and then returning the next 'rows' worth.  So as 'start' increases,
the work required of Solr increases and the response time lengthens.

> Is there a better way to do this operation using Solr?

Another answer in this thread gives links to resources for addressing
the problem, and I can't improve on those.

I can say that when I switched from start= windowing to cursormark, I
got a very nice improvement in overall speed and did not see the
progressive slowing anymore.  A query loop that ran for *days* now
completes in under five minutes.  In some way that I haven't quite
figured out, a cursormark tells Solr where in the overall document
sequence to start working.

So yes, there *is* a better way.
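
Roughly, the request flow looks like this (the sort must include the uniqueKey
field; the parameters are only an example):

  First request:   q=*:*&sort=id asc&rows=1000&cursorMark=*
  Next requests:   pass the nextCursorMark value from each response as the new
                   cursorMark parameter, and stop when nextCursorMark no longer
                   changes.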

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Solr queries slow down over time

2020-09-25 Thread Dwane Hall
Goutham, I suggest you read Hossman's excellent article on deep paging and why 
returning rows=(some large number) is a bad idea. It provides a thorough 
overview of the concept and will explain it better than I ever could 
(https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#update_2013_12_18).
In short, if you want to extract that many documents out of your corpus, use 
cursorMark, streaming expressions, or Solr's parallel SQL interface (which uses 
streaming expressions under the hood):
https://lucene.apache.org/solr/guide/8_6/streaming-expressions.html.

Thanks,

Dwane

From: Goutham Tholpadi 
Sent: Friday, 25 September 2020 4:19 PM
To: solr-user@lucene.apache.org 
Subject: Solr queries slow down over time

Hi,

I have around 30M documents in Solr, and I am doing repeated *:* queries
with rows=1, and changing start to 0, 1, 2, and so on, in a
loop in my script (using pysolr).

At the start of the iteration, the calls to Solr were taking less than 1
sec each. After running for a few hours (with start at around 27M) I found
that each call was taking around 30-60 secs.

Any pointers on why the same fetch of 1 records takes much longer now?
Does Solr need to load all the 27M before getting the last 1 records?
Is there a better way to do this operation using Solr?

Thanks!
Goutham


Solr queries slow down over time

2020-09-25 Thread Goutham Tholpadi
Hi,

I have around 30M documents in Solr, and I am doing repeated *:* queries
with rows=1, and changing start to 0, 1, 2, and so on, in a
loop in my script (using pysolr).

At the start of the iteration, the calls to Solr were taking less than 1
sec each. After running for a few hours (with start at around 27M) I found
that each call was taking around 30-60 secs.

Any pointers on why the same fetch of 1 records takes much longer now?
Does Solr need to load all the 27M before getting the last 1 records?
Is there a better way to do this operation using Solr?

Thanks!
Goutham


Re: What is the Best way to block certain types of queries/ query patterns in Solr?

2020-09-08 Thread Mark Robinson
Makes sense.
Thanks much David!

Mark

On Fri, Sep 4, 2020 at 12:13 AM David Smiley  wrote:

> The general assumption in deploying a search platform is that you are going
> to front it with a service you write that has the search features you care
> about, and only those.  Only this service or other administrative functions
> should reach Solr.  Be wary of making your service so flexible to support
> arbitrary parameters you pass to Solr as-is that you don't know about in
> advance (i.e. use an allow-list).
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 31, 2020 at 10:57 AM Mark Robinson 
> wrote:
>
> > Hi,
> > I had come across a mail (Oct, 2019 one) which suggested the best way is
> to
> > handle it before it reaches Solr. I was curious whether:-
> >1. Jetty query filter can be used (came across something like
> > that,, need to check)
> > 2. Any new features in Solr itself (like in a request handler...or
> > solrconfig, schema etc..)
> >
> > Thanks!
> > Mark
> >
>


Re: What is the Best way to block certain types of queries/ query patterns in Solr?

2020-09-03 Thread David Smiley
The general assumption in deploying a search platform is that you are going
to front it with a service you write that has the search features you care
about, and only those.  Only this service or other administrative functions
should reach Solr.  Be wary of making your service so flexible to support
arbitrary parameters you pass to Solr as-is that you don't know about in
advance (i.e. use an allow-list).

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Aug 31, 2020 at 10:57 AM Mark Robinson 
wrote:

> Hi,
> I had come across a mail (Oct, 2019 one) which suggested the best way is to
> handle it before it reaches Solr. I was curious whether:-
>1. Jetty query filter can be used (came across something like
> that,, need to check)
> 2. Any new features in Solr itself (like in a request handler...or
> solrconfig, schema etc..)
>
> Thanks!
> Mark
>


What is the Best way to block certain types of queries/ query patterns in Solr?

2020-08-31 Thread Mark Robinson
Hi,
I had come across a mail (an Oct 2019 one) which suggested the best way is to
handle it before it reaches Solr. I was curious whether:
1. A Jetty query filter can be used (came across something like
that, need to check)
2. There are any new features in Solr itself that help (like in a request handler... or
solrconfig, schema, etc.)

Thanks!
Mark


Re: Understanding Negative Filter Queries

2020-07-14 Thread Erick Erickson
There’s another possibility if the person I _should_ shoot who
wrote the query can’t change it; add cost=101 and turn it
into a post-filter. It’s not clear to me how much difference
that’d make, but it might be worth a shot, see: 

https://yonik.com/advanced-filter-caching-in-solr-2/
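
In other words, something along these lines (whether it actually executes as a
post-filter depends on the query type supporting it, as the article explains):

  fq={!cache=false cost=101}(tag:* -tag:email)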

Best,
Erick

> On Jul 14, 2020, at 8:33 AM, Chris Dempsey  wrote:
> 
>> 
>> Well, they’ll be exactly the same if (and only if) every document has a
>> tag. Otherwise, the
>> first one will exclude a doc that has no tag and the second one will
>> include it.
> 
> 
> That's a good point/catch.
> 
> How slow is “very slow”?
>> 
> 
> Well, in the case I was looking at it was about 10x slower but with the
> following caveats that there were 15 or so of these negative fq all some
> version of `fq={!cache=false}(tag:* -tag:)` (*don't shoot me I
> didn't write it lol*) over 15 million documents. Which to me means that
> each fq was doing each step that you described below:
> 
> The second form only has to index into the terms dictionary for the tag
>> field
>> value “email”, then zip down the posting list for all the docs that have
>> it. The
>> first form has to first identify all the docs that have a tag, accumulate
>> that list,
>> _then_ find the “email” value and zip down the postings list.
>> 
> 
> Thanks yet again Erick. That solidified in my mind how this works. Much
> appreciated!
> 
> 
> 
> 
> 
> On Tue, Jul 14, 2020 at 7:22 AM Erick Erickson 
> wrote:
> 
>> Yeah, there are optimizations there. BTW, these two queries are subtly
>> different.
>> 
>> Well, they’ll be exactly the same if (and only if) every document has a
>> tag. Otherwise, the
>> first one will exclude a doc that has no tag and the second one will
>> include it.
>> 
>> How slow is “very slow”?
>> 
>> The second form only has to index into the terms dictionary for the tag
>> field
>> value “email”, then zip down the posting list for all the docs that have
>> it. The
>> first form has to first identify all the docs that have a tag, accumulate
>> that list,
>> _then_ find the “email” value and zip down the postings list.
>> 
>> You could get around this if you require the first form functionality by,
>> say,
>> including a boolean field “has_tags”, then the first one would be
>> 
>> fq=has_tags:true -tags:email
>> 
>> Best,
>> Erick
>> 
>>> On Jul 14, 2020, at 8:05 AM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>>> 
>>> Hi Chris,
>>> tag:* is a wildcard query while *:* is match all query. I believe that
>> adjusting pure negative is turned on by default so you can safely just use
>> -tag:email and it’ll be translated to *:* -tag:email.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 14 Jul 2020, at 14:00, Chris Dempsey  wrote:
>>>> 
>>>> I'm trying to understand the difference between something like
>>>> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
>>>> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
>>>> 
>>>> I believe in the case of `tag:*` Solr spends some effort to gather all
>> of
>>>> the documents that have a value for `tag` and then removes those with
>>>> `-tag:email` while in the `*:*` Solr simply uses the document set as-is
>>>> and  then remove those with `-tag:email` (*and I believe Erick mentioned
>>>> there were special optimizations for `*:*`*)?
>>> 
>> 
>> 



Re: Understanding Negative Filter Queries

2020-07-14 Thread Chris Dempsey
>
> Well, they’ll be exactly the same if (and only if) every document has a
> tag. Otherwise, the
> first one will exclude a doc that has no tag and the second one will
> include it.


That's a good point/catch.

How slow is “very slow”?
>

Well, in the case I was looking at it was about 10x slower, but with the
caveat that there were 15 or so of these negative fqs, all some
version of `fq={!cache=false}(tag:* -tag:)` (*don't shoot me I
didn't write it lol*), over 15 million documents. Which to me means that
each fq was doing each step that you described below:

The second form only has to index into the terms dictionary for the tag
> field
> value “email”, then zip down the posting list for all the docs that have
> it. The
> first form has to first identify all the docs that have a tag, accumulate
> that list,
> _then_ find the “email” value and zip down the postings list.
>

Thanks yet again Erick. That solidified in my mind how this works. Much
appreciated!





On Tue, Jul 14, 2020 at 7:22 AM Erick Erickson 
wrote:

> Yeah, there are optimizations there. BTW, these two queries are subtly
> different.
>
> Well, they’ll be exactly the same if (and only if) every document has a
> tag. Otherwise, the
> first one will exclude a doc that has no tag and the second one will
> include it.
>
> How slow is “very slow”?
>
> The second form only has to index into the terms dictionary for the tag
> field
> value “email”, then zip down the posting list for all the docs that have
> it. The
> first form has to first identify all the docs that have a tag, accumulate
> that list,
> _then_ find the “email” value and zip down the postings list.
>
> You could get around this if you require the first form functionality by,
> say,
> including a boolean field “has_tags”, then the first one would be
>
> fq=has_tags:true -tag:email
>
> Best,
> Erick
>
> > On Jul 14, 2020, at 8:05 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> >
> > Hi Chris,
> > tag:* is a wildcard query while *:* is match all query. I believe that
> adjusting pure negative is turned on by default so you can safely just use
> -tag:email and it’ll be translated to *:* -tag:email.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 14 Jul 2020, at 14:00, Chris Dempsey  wrote:
> >>
> >> I'm trying to understand the difference between something like
> >> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
> >> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
> >>
> >> I believe in the case of `tag:*` Solr spends some effort to gather all
> of
> >> the documents that have a value for `tag` and then removes those with
> >> `-tag:email` while in the `*:*` Solr simply uses the document set as-is
> >> and  then remove those with `-tag:email` (*and I believe Erick mentioned
> >> there were special optimizations for `*:*`*)?
> >
>
>


Re: Understanding Negative Filter Queries

2020-07-14 Thread Erick Erickson
Yeah, there are optimizations there. BTW, these two queries are subtly 
different.

Well, they’ll be exactly the same if (and only if) every document has a tag. 
Otherwise, the
first one will exclude a doc that has no tag and the second one will include it.

How slow is “very slow”?

The second form only has to index into the terms dictionary for the tag field
value “email”, then zip down the posting list for all the docs that have it. The
first form has to first identify all the docs that have a tag, accumulate that 
list,
_then_ find the “email” value and zip down the postings list. 

You could get around this if you require the first form functionality by, say, 
including a boolean field “has_tags”, then the first one would be 

fq=has_tags:true -tag:email

Best,
Erick

> On Jul 14, 2020, at 8:05 AM, Emir Arnautović  
> wrote:
> 
> Hi Chris,
> tag:* is a wildcard query while *:* is match all query. I believe that 
> adjusting pure negative is turned on by default so you can safely just use 
> -tag:email and it’ll be translated to *:* -tag:email.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 14 Jul 2020, at 14:00, Chris Dempsey  wrote:
>> 
>> I'm trying to understand the difference between something like
>> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
>> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
>> 
>> I believe in the case of `tag:*` Solr spends some effort to gather all of
>> the documents that have a value for `tag` and then removes those with
>> `-tag:email` while in the `*:*` Solr simply uses the document set as-is
>> and  then remove those with `-tag:email` (*and I believe Erick mentioned
>> there were special optimizations for `*:*`*)?
> 



Re: Understanding Negative Filter Queries

2020-07-14 Thread Emir Arnautović
Hi Chris,
tag:* is a wildcard query while *:* is match all query. I believe that 
adjusting pure negative is turned on by default so you can safely just use 
-tag:email and it’ll be translated to *:* -tag:email.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 14 Jul 2020, at 14:00, Chris Dempsey  wrote:
> 
> I'm trying to understand the difference between something like
> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
> 
> I believe in the case of `tag:*` Solr spends some effort to gather all of
> the documents that have a value for `tag` and then removes those with
> `-tag:email` while in the `*:*` Solr simply uses the document set as-is
> and  then remove those with `-tag:email` (*and I believe Erick mentioned
> there were special optimizations for `*:*`*)?



Understanding Negative Filter Queries

2020-07-14 Thread Chris Dempsey
I'm trying to understand the difference between something like
fq={!cache=false}(tag:* -tag:email) which is very slow compared to
fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.

I believe in the case of `tag:*` Solr spends some effort to gather all of
the documents that have a value for `tag` and then removes those with
`-tag:email` while in the `*:*` Solr simply uses the document set as-is
and  then remove those with `-tag:email` (*and I believe Erick mentioned
there were special optimizations for `*:*`*)?
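
To make the comparison concrete, here is a minimal sketch of the three variants discussed in
this thread, assuming the multi-valued field is called "tag" and a boolean "has_tags" flag is
set to true at index time for every document that has at least one tag (the flag and its
default value are assumptions, not part of the original schema):

  Schema addition (sketch):
    <field name="has_tags" type="boolean" indexed="true" stored="false" default="false"/>

  Slow form: walks the terms of "tag" to build the set of documents that have any tag,
  then subtracts those tagged "email":
    fq={!cache=false}(tag:* -tag:email)

  Same result, but the set of tagged documents comes from a single cheap term lookup:
    fq={!cache=false}(has_tags:true -tag:email)

  Broader variant: also matches documents that have no tags at all:
    fq={!cache=false}(*:* -tag:email)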


Re: Slow queries until core is reindexed

2020-06-13 Thread dbourassa
I'm pretty sure we found the problem.
It's related to memory.
Sometimes Windows seems to unmap index files from memory because other
processes need the memory.
To force Windows to map index files again, we need to rebuild the index.
We can clearly see this behaviour with tools like RAMMap. 

With servers 100% dedicated to Solr (no other softwares on the server), this
problem does not occur.




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Log slow queries to SQL Database using Log4j2 (JDBC)

2020-05-26 Thread Krönert Florian
Hi Walter,

thanks for your response.
That sounds like a feasible approach, although I would like to keep the stack 
as small as possible.

But the direction that you pointed out seems promising; the JDBC route with 
log4j2 doesn't seem to lead anywhere.

Kind Regards,
Florian

-Original Message-
From: Walter Underwood 
Sent: Dienstag, 26. Mai 2020 02:06
To: solr-user@lucene.apache.org
Subject: Re: Log slow queries to SQL Database using Log4j2 (JDBC)

I would back up and do this a different way, with off-the-shelf parts.

Send the logs to syslog or your favorite log aggregator. From there, configure 
something that puts them into an ELK stack (Elasticsearch, Logstash, Kibana). A 
commercial version of this is logz.io <http://logz.io/>.

Traditional relational databases are not designed for time-series data like 
logs.

Also, do you want your search service to be slow when the SQL database gets 
slow? That is guaranteed to happen. Writing to logs should be very, very low 
overhead. Do all of the processing after Solr writes the log line.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 25, 2020, at 3:53 AM, Krönert Florian  
> wrote:
>
> Hi everyone,
>
> For our Solr instance I have the requirement that all queries should be 
> logged, so that we can later on analyze, which search texts were queried most 
> often.
> We're using Solr 8.3.1 with the official Docker image, hosted on Azure.
> My approach for implementing this, was now to configure a Slow Request rule 
> of 0ms, so that in fact every request is logged to the slow requests file.
> This part works without any issues.
>
> However now I need to process these logs.
> It would be convenient, if I had those logs already inside a SQL database.
> I saw that log4j2 is able to log to a JDBC database, so for me it seemed the 
> easiest way to just add a new appender for JDBC, which also logs the slow 
> requests.
> Unfortunately I can’t seem to get the JDBC driver loaded properly. I have the 
> correct jar and the correct driver namespace. I’m sure because I use the same 
> setup for the dataimport handler and it works flawlessly.
>
> My log4j2.xml looks like this (removed non-relevant appenders):
>
> 
>   
> 
>connectionString="${jvmrunargs:dataimporter.datasource.url}"
> driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> username="${jvmrunargs:dataimporter.datasource.user}"
> password="${jvmrunargs:dataimporter.datasource.password}"
>   />
>   
>   
>   
>   
>   
>   
> 
>   
>   
>  level="info" additivity="false">
>   
> 
>   
> 
>
> I have loaded the JDBC driver per core in solrconfig.xml and also “globally” 
> inside solr.xml by adding its containing folder as libDir.
>
> Unfortunately log4j2 still can’t find the JDBC driver, I always receive these 
> issues inside the sysout log:
>
>
> 2020-05-25T10:49:08.616562330Z DEBUG StatusLogger Acquiring JDBC
> connection from jdbc:sqlserver://--redacted--
>  2020-05-25T10:49:08.618023875Z DEBUG
> StatusLogger Loading driver class
> com.microsoft.sqlserver.jdbc.SQLServerDriver
>
> 2020-05-25T10:49:08.623000529Z DEBUG StatusLogger Cannot reestablish
> JDBC connection to FactoryData
> [connectionSource=jdbc:sqlserver://--redacted--
> , tableName=dbo.solr_requestlog,
> columnConfigs=[{ name=date, layout=null, literal=null, timestamp=true
> }, { name=id, layout=%u{RANDOM}, literal=null, timestamp=false }, {
> name=logLevel, layout=%level, literal=null, timestamp=false }, {
> name=logger, layout=%logger, literal=null, timestamp=false }, {
> name=message, layout=%message, literal=null, timestamp=false }, {
> name=exception, layout=%ex{full}, literal=null, timestamp=false }],
> columnMappings=[], immediateFail=false, retry=true,
> reconnectIntervalMillis=5000, truncateStrings=true]: The
> DriverManagerConnectionSource could not load the JDBC driver
> com.microsoft.sqlserver.jdbc.SQLServerDriver:
> java.lang.ClassNotFoundException:
> com.microsoft.sqlserver.jdbc.SQLServerDriver
>
> 2020-05-25T10:49:08.627598771Z  java.sql.SQLException: The
> DriverManagerConnectionSource could not load the JDBC driver
> com.microsoft.sqlserver.jdbc.SQLServerDriver:
> java.lang.ClassNotFoundException:
> com.microsoft.sqlserver.jdbc.SQLServerDriver
>
> 2020-05-25T10:49:08.628239791Z  at 
> org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.loadDriver(AbstractDriverManagerConnectionSource.java:203)
>
> 2020-05-25T10:49:08.628442097Z  at 
> org.apache.logging.log4j.core.appender.

Re: Log slow queries to SQL Database using Log4j2 (JDBC)

2020-05-25 Thread Walter Underwood
I would back up and do this a different way, with off-the-shelf parts. 

Send the logs to syslog or your favorite log aggregator. From there, configure 
something that puts them into an ELK stack (Elasticsearch, Logstash, Kibana). A 
commercial version of this is logz.io <http://logz.io/>.

Traditional relational databases are not designed for time-series data like 
logs.

Also, do you want your search service to be slow when the SQL database gets 
slow? That is guaranteed to happen. Writing to logs should be very, very low 
overhead. Do all of the processing after Solr writes the log line.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 25, 2020, at 3:53 AM, Krönert Florian  
> wrote:
> 
> Hi everyone,
>  
> For our Solr instance I have the requirement that all queries should be 
> logged, so that we can later on analyze, which search texts were queried most 
> often.
> We're using Solr 8.3.1 with the official Docker image, hosted on Azure.
> My approach for implementing this, was now to configure a Slow Request rule 
> of 0ms, so that in fact every request is logged to the slow requests file.
> This part works without any issues.
>  
> However now I need to process these logs.
> It would be convenient, if I had those logs already inside a SQL database.
> I saw that log4j2 is able to log to a JDBC database, so for me it seemed the 
> easiest way to just add a new appender for JDBC, which also logs the slow 
> requests.
> Unfortunately I can’t seem to get the JDBC driver loaded properly. I have the 
> correct jar and the correct driver namespace. I’m sure because I use the same 
> setup for the dataimport handler and it works flawlessly.
>  
> My log4j2.xml looks like this (removed non-relevant appenders):
> 
> 
>   
> 
>connectionString="${jvmrunargs:dataimporter.datasource.url}"
> driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> username="${jvmrunargs:dataimporter.datasource.user}"
> password="${jvmrunargs:dataimporter.datasource.password}"
>   />
>   
>   
>   
>   
>   
>   
> 
>   
>   
>  level="info" additivity="false">
>   
> 
>   
> 
>  
> I have loaded the JDBC driver per core in solrconfig.xml and also “globally” 
> inside solr.xml by adding its containing folder as libDir.
> 
> Unfortunately log4j2 still can't find the JDBC driver, I always receive these 
> issues inside the sysout log:
> 
> 
> 2020-05-25T10:49:08.616562330Z DEBUG StatusLogger Acquiring JDBC connection 
> from jdbc:sqlserver://--redacted-- 
> 2020-05-25T10:49:08.618023875Z DEBUG StatusLogger Loading driver class 
> com.microsoft.sqlserver.jdbc.SQLServerDriver
> 
> 2020-05-25T10:49:08.623000529Z DEBUG StatusLogger Cannot reestablish JDBC 
> connection to FactoryData [connectionSource=jdbc:sqlserver://--redacted-- 
> , tableName=dbo.solr_requestlog, columnConfigs=[{ 
> name=date, layout=null, literal=null, timestamp=true }, { name=id, 
> layout=%u{RANDOM}, literal=null, timestamp=false }, { name=logLevel, 
> layout=%level, literal=null, timestamp=false }, { name=logger, 
> layout=%logger, literal=null, timestamp=false }, { name=message, 
> layout=%message, literal=null, timestamp=false }, { name=exception, 
> layout=%ex{full}, literal=null, timestamp=false }], columnMappings=[], 
> immediateFail=false, retry=true, reconnectIntervalMillis=5000, 
> truncateStrings=true]: The DriverManagerConnectionSource could not load the 
> JDBC driver com.microsoft.sqlserver.jdbc.SQLServerDriver: 
> java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
> 
> 2020-05-25T10:49:08.627598771Z  java.sql.SQLException: The 
> DriverManagerConnectionSource could not load the JDBC driver 
> com.microsoft.sqlserver.jdbc.SQLServerDriver: 
> java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
> 
> 2020-05-25T10:49:08.628239791Z  at 
> org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.loadDriver(AbstractDriverManagerConnectionSource.java:203)
> 
> 2020-05-25T10:49:08.628442097Z  at 
> org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.loadDriver(AbstractDriverManagerConnectionSource.java:185)
> 
> 2020-05-25T10:49:08.628637603Z  at 
> org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.getConnection(AbstractDriverManagerConnectionSource.java:147)
> 
> 2020-05-25T10:49:08.628921112Z  at 
> org.apache.logging.log4j.core.appender.db.jdbc.JdbcDatabaseManager.connectAndPrepare(JdbcDatabaseManag

Re: Default Values and Missing Field Queries

2020-05-25 Thread Chris Dempsey
Thanks for the clarification and pointers Erick! Much appreciated!

On Mon, May 25, 2020 at 11:18 AM Erick Erickson 
wrote:

> Try q=*:* -boolfield:false
>
> And it's not as costly as you might think, there's special handling for *:*
> queries. And if you put that in an fq clause instead, the result set will
> be put into the filter cache and be reused assuming you want to do this
> repeatedly.
>
> BTW, Solr doesn't use strict Boolean logic, which may be a bit confusing.
> Google for Chris Hostetter's (Hossman) blog at Lucidworks for a great
> explanation.
>
> And yes, your understanding of adding a new field is correct
>
> Best,
> Erick
> On Mon, May 25, 2020, 11:39 Chris Dempsey  wrote:
>
> > I'm new to Solr and made an honest stab to finding this in info the docs.
> >
> > I'm working on an update to an existing large collection in Solr 7.7 to
> add
> > a BoolField to mark it as "soft deleted" or not. My understanding is that
> > updating the schema will mean the new field will only exist and have a
> > value (or the default value) for documents indexed after the change,
> > correct? If that's the case, is it possible to query for all documents
> that
> > have that field set to `true` or if that field is completely missing? If
> is
> > a Bad Idea(tm) from a performance or resource usage standpoint to use a
> > "where field X doesn't exist" query (i.e. am I going to end up running a
> > "table scan" if I do)?
> >
> > Thanks in advance!
> >
>


Re: Default Values and Missing Field Queries

2020-05-25 Thread Erick Erickson
Try q=*:* -boolfield:false

And it's not as costly as you might think, there's special handling for *:*
queries. And if you put that in an fq clause instead, the result set will
be put into the filter cache and be reused assuming you want to do this
repeatedly.

BTW, Solr doesn't use strict Boolean logic, which may be a bit confusing.
Google for Chris Hostetter's (Hossman) blog at Lucidworks for a great
explanation.

And yes, your understanding of adding a new field is correct

Best,
Erick
On Mon, May 25, 2020, 11:39 Chris Dempsey  wrote:

> I'm new to Solr and made an honest stab to finding this in info the docs.
>
> I'm working on an update to an existing large collection in Solr 7.7 to add
> a BoolField to mark it as "soft deleted" or not. My understanding is that
> updating the schema will mean the new field will only exist and have a
> value (or the default value) for documents indexed after the change,
> correct? If that's the case, is it possible to query for all documents that
> have that field set to `true` or if that field is completely missing? If is
> a Bad Idea(tm) from a performance or resource usage standpoint to use a
> "where field X doesn't exist" query (i.e. am I going to end up running a
> "table scan" if I do)?
>
> Thanks in advance!
>
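
A small illustration of the suggestion above, with a hypothetical boolean field named
is_deleted (the name and the absence of a default value are assumptions):

  Documents where is_deleted is true or where the field is missing entirely:
    q=*:* -is_deleted:false

  The same clause as a cached, reusable filter:
    q=*:*&fq=-is_deleted:false

  Only documents where the field is missing:
    q=*:*&fq=-is_deleted:*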


Default Values and Missing Field Queries

2020-05-25 Thread Chris Dempsey
I'm new to Solr and made an honest stab at finding this info in the docs.

I'm working on an update to an existing large collection in Solr 7.7 to add
a BoolField to mark it as "soft deleted" or not. My understanding is that
updating the schema will mean the new field will only exist and have a
value (or the default value) for documents indexed after the change,
correct? If that's the case, is it possible to query for all documents that
have that field set to `true` or if that field is completely missing? Is it
a Bad Idea(tm) from a performance or resource usage standpoint to use a
"where field X doesn't exist" query (i.e. am I going to end up running a
"table scan" if I do)?

Thanks in advance!


Log slow queries to SQL Database using Log4j2 (JDBC)

2020-05-25 Thread Krönert Florian
Hi everyone,



For our Solr instance I have the requirement that all queries should be logged, 
so that we can later analyze which search texts were queried most often.

We're using Solr 8.3.1 with the official Docker image, hosted on Azure.

My approach for implementing this was to configure a slow request threshold of 
0 ms, so that in fact every request is logged to the slow requests file.

This part works without any issues.



However now I need to process these logs.

It would be convenient, if I had those logs already inside a SQL database.

I saw that log4j2 is able to log to a JDBC database, so for me it seemed the 
easiest way to just add a new appender for JDBC, which also logs the slow 
requests.

Unfortunately I can't seem to get the JDBC driver loaded properly. I have the 
correct jar and the correct driver namespace. I'm sure because I use the same 
setup for the dataimport handler and it works flawlessly.



My log4j2.xml looks like this (removed non-relevant appenders):

<Configuration>
  <Appenders>
    <JDBC name="SlowRequestDatabaseAppender" tableName="dbo.solr_requestlog">
      <DriverManager
        connectionString="${jvmrunargs:dataimporter.datasource.url}"
        driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"
        username="${jvmrunargs:dataimporter.datasource.user}"
        password="${jvmrunargs:dataimporter.datasource.password}"
      />
      <Column name="date" isEventTimestamp="true" />
      <Column name="id" pattern="%u{RANDOM}" />
      <Column name="logLevel" pattern="%level" />
      <Column name="logger" pattern="%logger" />
      <Column name="message" pattern="%message" />
      <Column name="exception" pattern="%ex{full}" />
    </JDBC>
  </Appenders>
  <Loggers>
    <Logger name="org.apache.solr.core.SolrCore.SlowRequest" level="info" additivity="false">
      <AppenderRef ref="SlowRequestDatabaseAppender" />
    </Logger>
  </Loggers>
</Configuration>

I have loaded the JDBC driver per core in solrconfig.xml and also "globally" 
inside solr.xml by adding its containing folder as libDir.

Unfortunately log4j2 still can't find the JDBC driver, I always receive these 
issues inside the sysout log:



2020-05-25T10:49:08.616562330Z DEBUG StatusLogger Acquiring JDBC connection 
from jdbc:sqlserver://--redacted--

2020-05-25T10:49:08.618023875Z DEBUG StatusLogger Loading driver class 
com.microsoft.sqlserver.jdbc.SQLServerDriver

2020-05-25T10:49:08.623000529Z DEBUG StatusLogger Cannot reestablish JDBC 
connection to FactoryData [connectionSource=jdbc:sqlserver://--redacted--, 
tableName=dbo.solr_requestlog, columnConfigs=[{ name=date, layout=null, 
literal=null, timestamp=true }, { name=id, layout=%u{RANDOM}, literal=null, 
timestamp=false }, { name=logLevel, layout=%level, literal=null, 
timestamp=false }, { name=logger, layout=%logger, literal=null, timestamp=false 
}, { name=message, layout=%message, literal=null, timestamp=false }, { 
name=exception, layout=%ex{full}, literal=null, timestamp=false }], 
columnMappings=[], immediateFail=false, retry=true, 
reconnectIntervalMillis=5000, truncateStrings=true]: The 
DriverManagerConnectionSource could not load the JDBC driver 
com.microsoft.sqlserver.jdbc.SQLServerDriver: java.lang.ClassNotFoundException: 
com.microsoft.sqlserver.jdbc.SQLServerDriver

2020-05-25T10:49:08.627598771Z  java.sql.SQLException: The 
DriverManagerConnectionSource could not load the JDBC driver 
com.microsoft.sqlserver.jdbc.SQLServerDriver: java.lang.ClassNotFoundException: 
com.microsoft.sqlserver.jdbc.SQLServerDriver

2020-05-25T10:49:08.628239791Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.loadDriver(AbstractDriverManagerConnectionSource.java:203)

2020-05-25T10:49:08.628442097Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.loadDriver(AbstractDriverManagerConnectionSource.java:185)

2020-05-25T10:49:08.628637603Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.getConnection(AbstractDriverManagerConnectionSource.java:147)

2020-05-25T10:49:08.628921112Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.JdbcDatabaseManager.connectAndPrepare(JdbcDatabaseManager.java:567)

2020-05-25T10:49:08.629127918Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.JdbcDatabaseManager.access$800(JdbcDatabaseManager.java:62)

2020-05-25T10:49:08.629311024Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.JdbcDatabaseManager$Reconnector.reconnect(JdbcDatabaseManager.java:174)

2020-05-25T10:49:08.629620834Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.JdbcDatabaseManager$Reconnector.run(JdbcDatabaseManager.java:185)

2020-05-25T10:49:08.629944544Z Caused by: java.lang.ClassNotFoundException: 
com.microsoft.sqlserver.jdbc.SQLServerDriver

2020-05-25T10:49:08.630181051Z  at 
java.base/java.net.URLClassLoader.findClass(Unknown Source)

2020-05-25T10:49:08.630374057Z  at 
java.base/java.lang.ClassLoader.loadClass(Unknown Source)

2020-05-25T10:49:08.630666266Z  at 
java.base/java.lang.ClassLoader.loadClass(Unknown Source)

2020-05-25T10:49:08.630893073Z  at 
java.base/java.lang.Class.forName0(Native Method)

2020-05-25T10:49:08.631085879Z  at 
java.base/java.lang.Class.forName(Unknown Source)

2020-05-25T10:49:08.631652596Z  at 
org.apache.logging.log4j.core.appender.db.jdbc.AbstractDriverManagerConnectionSource.loadDriver(AbstractDriverManagerConnectionSource.java:201)

2020-05-25T10:49:08.631852802Z  ... 6 more



Is any of you able to point me in the right direction on how to load 
the JDBC driver so that log4j2 picks it up properly?



Thanks a lot in advance.



Kind Regards
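
For reference, the "slow request threshold of 0 ms" mentioned above corresponds to the
slow-query threshold in the <query> section of solrconfig.xml; a sketch with only the
relevant element shown:

  <query>
    <slowQueryThresholdMillis>0</slowQueryThresholdMillis>
  </query>

With that threshold, effectively every request ends up in the logger that log4j2 routes to
the slow-requests file, as described above.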

Re: Is it possible to direct queries to replicas in SolrCloud

2020-05-21 Thread Erick Erickson
https://lucene.apache.org/solr/guide/7_7/distributed-requests.html

> On May 21, 2020, at 5:40 PM, Pushkar Raste  wrote:
> 
> Hi,
> In master/slave we can send queries to slaves only, now that we have tlog
> and pull replicas can we send queries to those replicas to achieve similar
> scaling like master/slave for large search volumes?
> 
> 
> -- 
> — Pushkar Raste



Is it possible to direct queries to replicas in SolrCloud

2020-05-21 Thread Pushkar Raste
Hi,
In master/slave we can send queries to slaves only. Now that we have tlog
and pull replicas, can we send queries to those replicas to achieve scaling
similar to master/slave for large search volumes?


-- 
— Pushkar Raste
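
A sketch of the mechanism the linked page covers: since Solr 7.4 the shards.preference
request parameter can steer queries toward particular replica types, for example

  /solr/mycollection/select?q=*:*&shards.preference=replica.type:PULL,replica.type:TLOG

which prefers PULL replicas, then TLOG replicas, and only falls back to NRT replicas if
neither is available (the collection name here is made up).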


Re: Optimal size for queries?

2020-04-15 Thread Mark H. Wood
On Wed, Apr 15, 2020 at 10:09:59AM +0100, Colvin Cowie wrote:
> Hi, I can't answer the question as to what the optimal size of rows per
> request is. I would expect it to depend on the number of stored fields
> being marshaled, and their type, and your hardware.

It was a somewhat naive question, but I wasn't sure how to ask a
better one.  Having thought a bit more, I expect that the eventual
solution to my problem will include a number of different changes,
including larger pages, tuning several caches, providing a progress
indicator to the user, and (as you point out below) re-thinking how I
ask Solr for so many documents.

> But using start + rows is a *bad thing* for deep paging. You need to use
> cursorMark, which looks like it was added in 4.7 originally
> https://issues.apache.org/jira/browse/SOLR-5463
> There's a description on the newer reference guide
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
> and in the 4.10 PDF on page 305
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf
> 
> http://yonik.com/solr/paging-and-deep-paging/

Thank you for the links.  I think these will be very helpful.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Re: Optimal size for queries?

2020-04-15 Thread Colvin Cowie
Hi, I can't answer the question as to what the optimal size of rows per
request is. I would expect it to depend on the number of stored fields
being marshaled, and their type, and your hardware.

But using start + rows is a *bad thing* for deep paging. You need to use
cursorMark, which looks like it was added in 4.7 originally
https://issues.apache.org/jira/browse/SOLR-5463
There's a description on the newer reference guide
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
and in the 4.10 PDF on page 305
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf

http://yonik.com/solr/paging-and-deep-paging/


On Fri, 10 Apr 2020 at 19:05, Mark H. Wood  wrote:

> I need to pull a *lot* of records out of a core, to be statistically
> analyzed and the stat.s presented to the user, who is sitting at a
> browser waiting.  So far I haven't seen a way to calculate the stat.s
> I need in Solr itself.  It's difficult to know the size of the total
> result, so I'm running the query repeatedly and windowing the results
> with 'start' and 'rows'.  I just guessed that a window of 1000
> documents would be reasonable.  We currently have about 48GB in the
> core.
>
> The product uses Solr 4.10.  Yes, I know that's very old.
>
> What I got is that every three seconds or so I get another 1000
> documents, totalling around 500KB per response.  For a user request
> for a large range, this is taking way longer than the user's browser
> is willing to wait.  The single CPU on my test box is at 99%
> continuously, and Solr's memory use is around 90% of 8GB.  The test
> hardware is a VMWare guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
> 2.70GHz'.
>
> A sample query:
>
> 0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET
> /solr/statistics/select?q=*%3A*&rows=1000&fq=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview&fq=%2BisBot%3Afalse&fq=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D&sort=time+asc&start=867000&wt=javabin&version=2
> HTTP/1.1" 200 497475 "-"
> "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"
>
> As you can see, my test was getting close to 1000 windows.  It's still
> going.  I don't know how far along that is.
>
> So I'm wondering:
>
> o  how can I do better than guessing that 1000 is a good window size?
>How big a response is too big?
>
> o  what else should I be thinking about?
>
> o  given that my test on a full-sized copy of the live data has been
>running for an hour and is still going, is it totally impractical
>to expect that I can improve the process enough to give a response
>to an ad-hoc query while-you-wait?
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>
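
A bare-bones sketch of the cursorMark flow described above, assuming the uniqueKey field is
named "id" (the sort must end with the uniqueKey as a tie-breaker and must stay identical
across requests):

  First request:
    q=*:*&rows=1000&sort=time asc,id asc&cursorMark=*
  Follow-up requests reuse the nextCursorMark returned by the previous response:
    q=*:*&rows=1000&sort=time asc,id asc&cursorMark=<nextCursorMark from last response>
  Paging is finished when a response's nextCursorMark equals the cursorMark that was sent.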


Learning To Rank with Group Queries

2020-04-14 Thread Webster Homer
Hi,
My company is looking at using the Learning to Rank. However, our main searches 
do grouping. There is an old Jira from 2016 about how these don't work together.
https://issues.apache.org/jira/browse/SOLR-8776
It doesn't look like this has moved much since then. When will we be able to 
re-rank group queries? From the Jira it seems that it is mostly patched. We use 
Solrcloud and group on a field.

Did these changes ever fix the pagination issues mentioned in the Jira?

We are currently using Solr 7.7.2 but expect to move to 8.* in the next few 
months.

Thanks,
Webster



This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith.



Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Queries on adding headers to solrj Request

2020-04-13 Thread dinesh naik
Hi all,
We are planning to add security to Solr using a proxy. For this we are adding some
information in the headers of each SolrJ request. These requests will be
intercepted by an application (proxy) on the Solr VM and then routed to
Solr (considering the Solr port as 8983).
Could you please answer below queries:
 1. Are there any API ( Path ) that Solr Client cannot access and only Solr
uses for Intra node communication?
 2. As the SolrJ client will add headers, Intra communication from Solr
also needs to add these headers ( like ping request from Solr1 Node to
Solr2 Node ). Could Solr add custom headers for intra node communication?
 3. Apart from 8983 node, are there any other ports Solr is using for intra
node communication?
 4. how to add headers to CloudSolrClient ?

-- 
Best Regards,
Dinesh Naik
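
On question 4: one commonly used hook with the Apache HttpClient based SolrJ clients is a
request interceptor registered before the client is built. A rough sketch, under that
assumption (header name, token value and ZooKeeper address are invented):

  import java.util.Collections;
  import java.util.Optional;

  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.impl.HttpClientUtil;

  public class HeaderAddingClientFactory {

      public static CloudSolrClient build() {
          // Registered globally: every request created through SolrJ's
          // HttpClient machinery will carry this extra header.
          HttpClientUtil.addRequestInterceptor((request, context) ->
                  request.addHeader("X-Example-Auth", "example-token"));

          return new CloudSolrClient.Builder(
                  Collections.singletonList("zkhost1:2181"), Optional.empty())
                  .build();
      }
  }

This only affects outgoing requests from the application; it does not change Solr's own
node-to-node requests (question 2).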


Optimal size for queries?

2020-04-10 Thread Mark H. Wood
I need to pull a *lot* of records out of a core, to be statistically
analyzed and the stat.s presented to the user, who is sitting at a
browser waiting.  So far I haven't seen a way to calculate the stat.s
I need in Solr itself.  It's difficult to know the size of the total
result, so I'm running the query repeatedly and windowing the results
with 'start' and 'rows'.  I just guessed that a window of 1000
documents would be reasonable.  We currently have about 48GB in the
core.

The product uses Solr 4.10.  Yes, I know that's very old.

What I got is that every three seconds or so I get another 1000
documents, totalling around 500KB per response.  For a user request
for a large range, this is taking way longer than the user's browser
is willing to wait.  The single CPU on my test box is at 99%
continuously, and Solr's memory use is around 90% of 8GB.  The test
hardware is a VMWare guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
2.70GHz'.

A sample query:

0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET 
/solr/statistics/select?q=*%3A*=1000=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview=%2BisBot%3Afalse=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D=time+asc=867000=javabin=2
 HTTP/1.1" 200 497475 "-" 
"Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"

As you can see, my test was getting close to 1000 windows.  It's still
going.  I don't know how far along that is.

So I'm wondering:

o  how can I do better than guessing that 1000 is a good window size?
   How big a response is too big?

o  what else should I be thinking about?

o  given that my test on a full-sized copy of the live data has been
   running for an hour and is still going, is it totally impractical
   to expect that I can improve the process enough to give a response
   to an ad-hoc query while-you-wait?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


signature.asc
Description: PGP signature


Performance of range queries in Point vs. Trie fields

2020-03-29 Thread Michael Cooper
I think my original post didn't go through because I wasn't subscribed, so 
apologies if this is a duplicate.

For both Solr 7 and Solr 8, we have found that attempts to do range queries on 
DatePointField when there are a large number of points performs poorly (queries 
were taking over 30 seconds on a 50G core). We also tried switching to 
IntPointField to see if it made a difference and it didn't. Just for 
comparison, we switched to using the deprecated TrieDateField and found the 
performance was significantly better, almost 5x better on average. We even 
tried different precision steps and although there was slight variation between 
various values, all were significantly faster than the DatePointField. So we 
are now running in production with the deprecated fields instead. Wanted to 
know if this is a common observation, because blogs I've read led me to believe 
that the Point fields are supposed to be fast. Not sure what the testing 
environment was for that but that has not been our experience. I hope that 
these Trie fields are going to stay in the product for Solr 9, I know they were 
supposed to be removed in Solr 8 but there must have been a reason they were 
not.

Thanks for your help!
Michael Cooper
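
For readers comparing the two setups, the contrast boils down to field type declarations
roughly like the following (attribute values are illustrative; precisionStep="6" is just one
of the steps that could be tried):

  <fieldType name="tdate" class="solr.TrieDateField" docValues="true" precisionStep="6"/>
  <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>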



Performance of range queries in Point vs. Trie fields

2020-03-25 Thread Michael Cooper
For both Solr 7 and Solr 8, we have found that attempts to do range queries on 
DatePointField when there are a large number of points performs poorly (queries 
were taking over 30 seconds on a 50G core). We also tried switching to 
IntPointField to see if it made a difference and it didn't. Just for 
comparison, we switched to using the deprecated TrieDateField and found the 
performance was significantly better, almost 5x better on average. We even 
tried different precision steps and although there was slight variation between 
various values, all were significantly faster than the DatePointField. So we 
are now running in production with the deprecated fields instead. Wanted to 
know if this is a common observation, because blogs I've read led me to believe 
that the Point fields are supposed to be fast. Not sure what the testing 
environment was for that but that has not been our experience. I hope that 
these Trie fields are going to stay in the product for Solr 9, I know they were 
supposed to be removed in Solr 8 but there must have been a reason they were 
not.

Michael Cooper



Re: Solr range search queries

2020-03-11 Thread Paras Lehana
Hi Niharika,

Range queries for string fields would work lexicographically and not
numerically, I think. Read:
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-RangeSearches
.

If this is the case, [2 TO 5] will include 200 and [2 TO 20] will not
include 19.

On Tue, 10 Mar 2020 at 15:09, Niharika  wrote:

> hello,
>
> I have two fields declared in schema.xml as
>
> <field name='latitude' type='string' required='false' multiValued='true'/>
> <field name='longitude' type='string' required='false' multiValued='true'/>
>
> I want to generate a query to find all the result in specific range for
> longitude and latitude
>
> My query looks like
>
> *latitude:[47.010225655683485 TO 52.40241887397332] AND
> longitude:[-2.021484375004 TO 14.63378906252]*
>
> the problem here is:- i am not getting all the result i except, can anyone
> suggest me what I can do here and why it is wrong?
>
> PS:- I have already tried rounding of the decimals, I cannot change the
> type
> from string in schema.xml.
>
> Thanks & Regards
> Niharika
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*
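
To make the point above concrete: numeric range semantics would need numeric field types,
something like the sketch below (assuming a pdouble/DoublePointField type exists in the
schema and that the type could eventually be changed):

  <field name="latitude" type="pdouble" indexed="true" stored="true" required="false" multiValued="true"/>
  <field name="longitude" type="pdouble" indexed="true" stored="true" required="false" multiValued="true"/>

With string fields the comparison stays character by character, and the negative longitudes
in the example make even fixed-width zero padding awkward, so a numeric type is the cleaner
fix.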



Solr range search queries

2020-03-10 Thread Niharika
hello, 

I have two fields declared in schema.xml as

<field name='latitude' type='string' required='false' multiValued='true'/>
<field name='longitude' type='string' required='false' multiValued='true'/>

I want to generate a query to find all the result in specific range for
longitude and latitude

My query looks like 

*latitude:[47.010225655683485 TO 52.40241887397332] AND
longitude:[-2.021484375004 TO 14.63378906252]*

the problem here is: I am not getting all the results I expect. Can anyone
suggest what I can do here and why it is wrong?

PS:- I have already tried rounding of the decimals, I cannot change the type
from string in schema.xml. 

Thanks & Regards
Niharika 




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Slow queries until core is reindexed

2020-03-05 Thread dbourassa
Hi all,

We have a solr 8.4.1 server running on Windows. (Very simple setup.)
16GB RAM / JVM-Mem set to 4GB
Solr hosts 4 cores (2GB + 1GB + 75MB + 75MB).
Full data import every night. No delta import.
This server is used for tests by 2 people. (very low request rate)

We have an issue we don't understand: 
The average response time for search queries is < 10ms.
Sometimes the response time slow down considerably (>1000ms) for all queries
but just for 1 core.
All queries continue to be slow until we reindex the core with a full data
import.
After that, response time go back under 10ms for this core but another core
begins to slow down.
We cannot operate all the 4 cores with the expected response time <10ms at
the same time.

What can be the cause of this issue?

Thanks,
Dany




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Blocking certain queries

2020-02-03 Thread John Davis
Hello,

Is there a way to block certain queries in Solr? For example, a delete for *:*, or
a known query that causes problems; can these be blocked at the Solr server layer?
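
One place a guard against the delete-by-query case can live is a custom update request
processor in the update chain; a rough sketch (class name invented, solrconfig.xml
registration not shown):

  import java.io.IOException;

  import org.apache.solr.common.SolrException;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.DeleteUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class BlockDeleteAllUpdateProcessorFactory extends UpdateRequestProcessorFactory {

      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                                SolrQueryResponse rsp,
                                                UpdateRequestProcessor next) {
          return new BlockDeleteAllProcessor(next);
      }

      private static class BlockDeleteAllProcessor extends UpdateRequestProcessor {

          BlockDeleteAllProcessor(UpdateRequestProcessor next) {
              super(next);
          }

          @Override
          public void processDelete(DeleteUpdateCommand cmd) throws IOException {
              // Reject a delete-by-query that would wipe the whole index.
              if (cmd.query != null && "*:*".equals(cmd.query.trim())) {
                  throw new SolrException(SolrException.ErrorCode.FORBIDDEN,
                          "delete-by-query *:* is blocked on this core");
              }
              super.processDelete(cmd);
          }
      }
  }

Problematic search queries are usually easier to stop in front of Solr (a reverse proxy, or
the rule-based authorization plugin) than inside it.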


Re: Edismax ignoring queries containing booleans

2020-01-10 Thread Edward Ribeiro
Cool, glad to help, Paras and Claire. :)

Cheers,
Edward


Em sex, 10 de jan de 2020 06:31, Paras Lehana 
escreveu:

> Hi Edward, the way you have explained mm and fq's relation with parser has
> cleared all my potential queries. I didn't know fq supports other parsers.
> :)
>
> On Fri, 10 Jan 2020 at 10:46, Edward Ribeiro 
> wrote:
>
> > The fq is not affected by mm parameter because it uses Solr's default
> query
> > parser (LuceneQueryParser) that doesn't support it. But you can change
> the
> > parser used by fq this way: fq={!edismax}recordID:(10 20) or fq={!edismax
> > mm=1}recordID:(10 20) , for example (even though that is not the case
> > here).
> >
> > Please, let me know if any of the suggestions, or any other you come up
> > with, solve the issue and don't forget to test those approaches so that
> you
> > can avoid any performance degradation.
> >
> > Best,
> > Edward
> >
> > On Fri, Jan 10, 2020 at 1:41 AM Edward Ribeiro  >
> > wrote:
> >
> > > Hi Claire,
> > >
> > > > The only visual difference I think is the ~2 which came after the
> > > initial part of the parsed query:
> > > > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > > (recordID:[20 TO 20]))~2
> > > > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > > (recordID:[20 TO 20]))
> > >
> > > The mm (minimum match) parameter alter the behaviour of the OR clauses.
> > > See here:
> > >
> >
> https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
> > > For example, if there is a query like `text:(toys OR children OR
> sales)`,
> > > but your mm=3, then at least three terms are required to match. The
> query
> > > is now equivalent to `text:(toys AND children AND sales)`
> > >
> > > In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO
> > > 20]))~2" query the "))~2" part means that at least two matches are
> > required
> > > of the three optional terms: 18, 19, and 20. But recordID will only
> match
> > > at most one term. Therefore, it will return no documents because it
> will
> > > never satisfy the condition setup by mm (match 18 AND 19 AND 20). If
> mm=1
> > > the query would work as intended in this example.
> > >
> > > The mm parameter you use is: 0<1 2<-1 5<-2 6<90% that can roughly be
> > > translated as:
> > >
> > > * 0<1 : If there is one term then minimum match 1??? Didn't get this
> one.
> > >
> > > * 2<-1 5<-2 6<90% : If there are one or two terms then mininum match
> all.
> > > Between 3 and 5 (inclusive) terms match all but one (in your example
> > there
> > > are 3 numbers so it will require to match at least 2, that’s the reason
> > of
> > > the ~2). If there are 6 terms then match 4 (6 - 2), and above 6 terms
> > then
> > > matches 90% of the terms (e.g., if there are 10 clauses then it is
> > required
> > > to match at least 9).
> > >
> > > > There shouldn't be a problem using mm with edismax right? Or does the
> > > problem lie with the structure of my qf/pf and then adding mm?
> > >
> > > Nope. There’s no problem using mm with edismax nor the problem lies on
> > > qf/pf. As you dig
> > >
> > > > I can see this is a change to default behaviour, but does it mean I
> > > should be passing mm in the query now rather than just at config level?
> > >
> > > I see a couple of approaches to solve this issue:
> > >
> > > 1) Removing the mm parameter from solrconfig. But it probably was setup
> > > for a reason so you should check before hand. In this case, you could
> > issue
> > > mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.
> > >
> > > 2) Adding a mm=1 as a query parameter whenever you search for recordID.
> > > Issuing the parameter in the query will overwrite the mm parameter that
> > was
> > > setup in solrconfig for that particular query.
> > >
> > > 3) Doing a match all query (q=*:*) and moving the recordID query to a
> > > filter query: fq=recordID:(18 OR 19 OR 20)  The fq is not affected by
> mm
> > > parameter or so it seems. No need to change mm in solrconfig nor adding
> > mm
> > > as a query parameter.
> > >
> > > Particularly, I would go with either 2) or 3).
> > >
> > > Best,

RE: Edismax ignoring queries containing booleans

2020-01-10 Thread Claire Pollard
Hi Edward,

Thank you so much for your reply. Your explanations have really helped me 
understand the impact of mm on our queries.

I'm going to try what you suggest but I agree, it seems like 2 or 3 is the best 
option for us. We still would like the behaviour of mm on certain queries, so 
removing it from the solrconfig isn't possible.

I'll let you know how I get on, might be a little while until I get some 
results, but thank you again!

Cheers,
Claire.

-Original Message-
From: Edward Ribeiro  
Sent: 10 January 2020 05:16
To: solr-user@lucene.apache.org
Subject: Re: Edismax ignoring queries containing booleans

The fq is not affected by mm parameter because it uses Solr's default query 
parser (LuceneQueryParser) that doesn't support it. But you can change the 
parser used by fq this way: fq={!edismax}recordID:(10 20) or fq={!edismax
mm=1}recordID:(10 20) , for example (even though that is not the case here).

Please, let me know if any of the suggestions, or any other you come up with, 
solve the issue and don't forget to test those approaches so that you can avoid 
any performance degradation.

Best,
Edward

On Fri, Jan 10, 2020 at 1:41 AM Edward Ribeiro 
wrote:

> Hi Claire,
>
> > The only visual difference I think is the ~2 which came after the
> initial part of the parsed query:
> > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2
> > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))
>
> The mm (minimum match) parameter alter the behaviour of the OR clauses.
> See here:
> https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#
> mm-minimum-should-match-parameter For example, if there is a query 
> like `text:(toys OR children OR sales)`, but your mm=3, then at least 
> three terms are required to match. The query is now equivalent to 
> `text:(toys AND children AND sales)`
>
> In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 
> 20]))~2" query the "))~2" part means that at least two matches are 
> required of the three optional terms: 18, 19, and 20. But recordID 
> will only match at most one term. Therefore, it will return no 
> documents because it will never satisfy the condition setup by mm 
> (match 18 AND 19 AND 20). If mm=1 the query would work as intended in this 
> example.
>
> The mm parameter you use is: 0<1 2<-1 5<-2 6<90% that can roughly be 
> translated as:
>
> * 0<1 : If there is one term then minimum match 1??? Didn't get this one.
>
> * 2<-1 5<-2 6<90% : If there are one or two terms then mininum match all.
> Between 3 and 5 (inclusive) terms match all but one (in your example 
> there are 3 numbers so it will require to match at least 2, that’s the 
> reason of the ~2). If there are 6 terms then match 4 (6 - 2), and 
> above 6 terms then matches 90% of the terms (e.g., if there are 10 
> clauses then it is required to match at least 9).
>
> > There shouldn't be a problem using mm with edismax right? Or does 
> > the
> problem lie with the structure of my qf/pf and then adding mm?
>
> Nope. There’s no problem using mm with edismax nor the problem lies on 
> qf/pf. As you dig
>
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
>
> I see a couple of approaches to solve this issue:
>
> 1) Removing the mm parameter from solrconfig. But it probably was 
> setup for a reason so you should check before hand. In this case, you 
> could issue
> mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.
>
> 2) Adding a mm=1 as a query parameter whenever you search for recordID.
> Issuing the parameter in the query will overwrite the mm parameter 
> that was setup in solrconfig for that particular query.
>
> 3) Doing a match all query (q=*:*) and moving the recordID query to a 
> filter query: fq=recordID:(18 OR 19 OR 20)  The fq is not affected by 
> mm parameter or so it seems. No need to change mm in solrconfig nor 
> adding mm as a query parameter.
>
> Particularly, I would go with either 2) or 3).
>
> Best,
> Edward
>
> On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard 
> 
> wrote:
> >
> > Also, I've found this bug from previous which highlights the issue 
> > with
> ))~2
> >
> > https://issues.apache.org/jira/browse/SOLR-8812
> >
> > mm is set at config, but not explicitly in the query...
> >
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
> >
> > -Original Message-
> > From: Claire Pollard 
> 

Re: Edismax ignoring queries containing booleans

2020-01-10 Thread Paras Lehana
Hi Edward, the way you have explained mm and fq's relation with parser has
cleared all my potential queries. I didn't know fq supports other parsers.
:)

On Fri, 10 Jan 2020 at 10:46, Edward Ribeiro 
wrote:

> The fq is not affected by mm parameter because it uses Solr's default query
> parser (LuceneQueryParser) that doesn't support it. But you can change the
> parser used by fq this way: fq={!edismax}recordID:(10 20) or fq={!edismax
> mm=1}recordID:(10 20) , for example (even though that is not the case
> here).
>
> Please, let me know if any of the suggestions, or any other you come up
> with, solve the issue and don't forget to test those approaches so that you
> can avoid any performance degradation.
>
> Best,
> Edward
>
> On Fri, Jan 10, 2020 at 1:41 AM Edward Ribeiro 
> wrote:
>
> > Hi Claire,
> >
> > > The only visual difference I think is the ~2 which came after the
> > initial part of the parsed query:
> > > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > (recordID:[20 TO 20]))~2
> > > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > (recordID:[20 TO 20]))
> >
> > The mm (minimum match) parameter alter the behaviour of the OR clauses.
> > See here:
> >
> https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
> > For example, if there is a query like `text:(toys OR children OR sales)`,
> > but your mm=3, then at least three terms are required to match. The query
> > is now equivalent to `text:(toys AND children AND sales)`
> >
> > In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO
> > 20]))~2" query the "))~2" part means that at least two matches are
> required
> > of the three optional terms: 18, 19, and 20. But recordID will only match
> > at most one term. Therefore, it will return no documents because it will
> > never satisfy the condition setup by mm (match 18 AND 19 AND 20). If mm=1
> > the query would work as intended in this example.
> >
> > The mm parameter you use is: 0<1 2<-1 5<-2 6<90% that can roughly be
> > translated as:
> >
> > * 0<1 : If there is one term then minimum match 1??? Didn't get this one.
> >
> > * 2<-1 5<-2 6<90% : If there are one or two terms then mininum match all.
> > Between 3 and 5 (inclusive) terms match all but one (in your example
> there
> > are 3 numbers so it will require to match at least 2, that’s the reason
> of
> > the ~2). If there are 6 terms then match 4 (6 - 2), and above 6 terms
> then
> > matches 90% of the terms (e.g., if there are 10 clauses then it is
> required
> > to match at least 9).
> >
> > > There shouldn't be a problem using mm with edismax right? Or does the
> > problem lie with the structure of my qf/pf and then adding mm?
> >
> > Nope. There’s no problem using mm with edismax nor the problem lies on
> > qf/pf. As you dig
> >
> > > I can see this is a change to default behaviour, but does it mean I
> > should be passing mm in the query now rather than just at config level?
> >
> > I see a couple of approaches to solve this issue:
> >
> > 1) Removing the mm parameter from solrconfig. But it probably was setup
> > for a reason so you should check before hand. In this case, you could
> issue
> > mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.
> >
> > 2) Adding a mm=1 as a query parameter whenever you search for recordID.
> > Issuing the parameter in the query will overwrite the mm parameter that
> was
> > setup in solrconfig for that particular query.
> >
> > 3) Doing a match all query (q=*:*) and moving the recordID query to a
> > filter query: fq=recordID:(18 OR 19 OR 20)  The fq is not affected by mm
> > parameter or so it seems. No need to change mm in solrconfig nor adding
> mm
> > as a query parameter.
> >
> > Particularly, I would go with either 2) or 3).
> >
> > Best,
> > Edward
> >
> > On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard 
> > wrote:
> > >
> > > Also, I've found this bug from previous which highlights the issue with
> > ))~2
> > >
> > > https://issues.apache.org/jira/browse/SOLR-8812
> > >
> > > mm is set at config, but not explicitly in the query...
> > >
> > > I can see this is a change to default behaviour, but does it mean I
> > should be passing mm in the query now rather than just at config level?
> > >
> > > -Original Message-
> > > From: Cla

Re: Edismax ignoring queries containing booleans

2020-01-09 Thread Edward Ribeiro
The fq is not affected by mm parameter because it uses Solr's default query
parser (LuceneQueryParser) that doesn't support it. But you can change the
parser used by fq this way: fq={!edismax}recordID:(10 20) or fq={!edismax
mm=1}recordID:(10 20) , for example (even though that is not the case here).

Please, let me know if any of the suggestions, or any other you come up
with, solve the issue and don't forget to test those approaches so that you
can avoid any performance degradation.

Best,
Edward

On Fri, Jan 10, 2020 at 1:41 AM Edward Ribeiro 
wrote:

> Hi Claire,
>
> > The only visual difference I think is the ~2 which came after the
> initial part of the parsed query:
> > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2
> > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))
>
> The mm (minimum match) parameter alter the behaviour of the OR clauses.
> See here:
> https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
> For example, if there is a query like `text:(toys OR children OR sales)`,
> but your mm=3, then at least three terms are required to match. The query
> is now equivalent to `text:(toys AND children AND sales)`
>
> In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO
> 20]))~2" query the "))~2" part means that at least two matches are required
> of the three optional terms: 18, 19, and 20. But recordID will only match
> at most one term. Therefore, it will return no documents because it will
> never satisfy the condition setup by mm (match 18 AND 19 AND 20). If mm=1
> the query would work as intended in this example.
>
> The mm parameter you use is: 0<1 2<-1 5<-2 6<90% that can roughly be
> translated as:
>
> * 0<1 : If there is one term then minimum match 1??? Didn't get this one.
>
> * 2<-1 5<-2 6<90% : If there are one or two terms then mininum match all.
> Between 3 and 5 (inclusive) terms match all but one (in your example there
> are 3 numbers so it will require to match at least 2, that’s the reason of
> the ~2). If there are 6 terms then match 4 (6 - 2), and above 6 terms then
> matches 90% of the terms (e.g., if there are 10 clauses then it is required
> to match at least 9).
>
> > There shouldn't be a problem using mm with edismax right? Or does the
> problem lie with the structure of my qf/pf and then adding mm?
>
> Nope. There’s no problem using mm with edismax nor the problem lies on
> qf/pf. As you dig
>
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
>
> I see a couple of approaches to solve this issue:
>
> 1) Removing the mm parameter from solrconfig. But it probably was setup
> for a reason so you should check before hand. In this case, you could issue
> mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.
>
> 2) Adding a mm=1 as a query parameter whenever you search for recordID.
> Issuing the parameter in the query will overwrite the mm parameter that was
> setup in solrconfig for that particular query.
>
> 3) Doing a match all query (q=*:*) and moving the recordID query to a
> filter query: fq=recordID:(18 OR 19 OR 20)  The fq is not affected by mm
> parameter or so it seems. No need to change mm in solrconfig nor adding mm
> as a query parameter.
>
> Particularly, I would go with either 2) or 3).
>
> Best,
> Edward
>
> On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard 
> wrote:
> >
> > Also, I've found this bug from previous which highlights the issue with
> ))~2
> >
> > https://issues.apache.org/jira/browse/SOLR-8812
> >
> > mm is set at config, but not explicitly in the query...
> >
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
> >
> > -Original Message-
> > From: Claire Pollard 
> > Sent: 09 January 2020 10:23
> > To: solr-user@lucene.apache.org
> > Subject: RE: Edismax ignoring queries containing booleans
> >
> > Hey Edward,
> >
> > Thanks for the tips.
> >
> > I've cleaned up my solrconfig, removed the duplicate df, tabs and
> newlines, and tried commenting out the bits you've suggested and adding
> them back in bit by bit, and it seems mm was the thing which is breaking
> the query for me.
> >
> > Without it, the query returns 2 documents as expected.
> >
> > "debug":{
> > "rawquerystring":"recordID:(18 OR 19 OR 20)",
> > "querystring":&

Re: Edismax ignoring queries containing booleans

2020-01-09 Thread Edward Ribeiro
Hi Claire,

> The only visual difference I think is the ~2 which came after the initial
part of the parsed query:
> Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
(recordID:[20 TO 20]))~2
> New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
(recordID:[20 TO 20]))

The mm (minimum match) parameter alters the behaviour of the OR clauses. See
here:
https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
For example, if there is a query like `text:(toys OR children OR sales)`,
but your mm=3, then at least three terms are required to match. The query
is now equivalent to `text:(toys AND children AND sales)`

In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO
20]))~2" query the "))~2" part means that at least two matches are required
of the three optional terms: 18, 19, and 20. But recordID will only match
at most one term. Therefore, it will return no documents because it will
never satisfy the condition setup by mm (match 18 AND 19 AND 20). If mm=1
the query would work as intended in this example.

The mm parameter you use is: 0<1 2<-1 5<-2 6<90% that can roughly be
translated as:

* 0<1 : If there is one term then minimum match 1??? Didn't get this one.

* 2<-1 5<-2 6<90% : If there are one or two terms then minimum match all.
Between 3 and 5 (inclusive) terms match all but one (in your example there
are 3 numbers so it will require to match at least 2, that’s the reason of
the ~2). If there are 6 terms then match 4 (6 - 2), and above 6 terms then
matches 90% of the terms (e.g., if there are 10 clauses then it is required
to match at least 9).

> There shouldn't be a problem using mm with edismax right? Or does the
problem lie with the structure of my qf/pf and then adding mm?

Nope. There’s no problem using mm with edismax, nor does the problem lie in
qf/pf. As you dig

> I can see this is a change to default behaviour, but does it mean I
should be passing mm in the query now rather than just at config level?

I see a couple of approaches to solve this issue:

1) Removing the mm parameter from solrconfig. But it probably was set up for
a reason, so you should check beforehand. In this case, you could issue
mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.

2) Adding a mm=1 as a query parameter whenever you search for recordID.
Issuing the parameter in the query will override the mm parameter that was
set up in solrconfig for that particular query.

3) Doing a match all query (q=*:*) and moving the recordID query to a
filter query: fq=recordID:(18 OR 19 OR 20)  The fq is not affected by mm
parameter or so it seems. No need to change mm in solrconfig nor adding mm
as a query parameter.

Particularly, I would go with either 2) or 3).
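
For illustration only (not part of the original reply), a minimal SolrJ sketch
of options 2) and 3) above; the Solr URL and collection name are assumptions:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MmWorkaroundExample {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();

        // Option 2: keep the recordID clause in q, but override the configured mm
        // (0<1 2<-1 5<-2 6<90% in solrconfig.xml) just for this request
        SolrQuery byMm = new SolrQuery("recordID:(18 OR 19 OR 20)");
        byMm.set("defType", "edismax");
        byMm.set("mm", "1");

        // Option 3: match-all q, recordID moved to a filter query
        // (as noted above, fq does not appear to be affected by mm)
        SolrQuery byFq = new SolrQuery("*:*");
        byFq.addFilterQuery("recordID:(18 OR 19 OR 20)");

        QueryResponse rsp = client.query("mycollection", byMm);
        System.out.println(rsp.getResults().getNumFound());
        client.close();
    }
}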

Best,
Edward

On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard 
wrote:
>
> Also, I've found this bug from previous which highlights the issue with
))~2
>
> https://issues.apache.org/jira/browse/SOLR-8812
>
> mm is set at config, but not explicitly in the query...
>
> I can see this is a change to default behaviour, but does it mean I
should be passing mm in the query now rather than just at config level?
>
> -Original Message-
> From: Claire Pollard 
> Sent: 09 January 2020 10:23
> To: solr-user@lucene.apache.org
> Subject: RE: Edismax ignoring queries containing booleans
>
> Hey Edward,
>
> Thanks for the tips.
>
> I've cleaned up my solrconfig, removed the duplicate df, tabs and
newlines, and tried commenting out the bits you've suggested and adding
them back in bit by bit, and it seems mm was the thing which is breaking
the query for me.
>
> Without it, the query returns 2 documents as expected.
>
> "debug":{
> "rawquerystring":"recordID:(18 OR 19 OR 20)",
> "querystring":"recordID:(18 OR 19 OR 20)",
> "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
(recordID:[20 TO 20])) DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 |
(annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
(Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
20\"~100)^1.1))",
> "parsedquery_toString":"+(recordID:[18 TO 18] recordID:[19 TO 19]
recordID:[20 TO 20]) ((text:\"19 20\"~100)^0.2 | (annotations:\"19
20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
(Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
20\"~100)^1.1)",
> "explain":{
>   "2CBF8A49-

RE: Edismax ignoring queries containing booleans

2020-01-09 Thread Claire Pollard
Also, I've found this bug from previous which highlights the issue with ))~2 

https://issues.apache.org/jira/browse/SOLR-8812

mm is set at config, but not explicitly in the query... 

I can see this is a change to default behaviour, but does it mean I should be 
passing mm in the query now rather than just at config level?

-Original Message-
From: Claire Pollard  
Sent: 09 January 2020 10:23
To: solr-user@lucene.apache.org
Subject: RE: Edismax ignoring queries containing booleans

Hey Edward,

Thanks for the tips. 

I've cleaned up my solrconfig, removed the duplicate df, tabs and newlines, and 
tried commenting out the bits you've suggested and adding them back in bit by 
bit, and it seems mm was the thing which is breaking the query for me.

Without it, the query returns 2 documents as expected.

"debug":{
"rawquerystring":"recordID:(18 OR 19 OR 20)",
"querystring":"recordID:(18 OR 19 OR 20)",
"parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 
TO 20])) DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 | (annotations:\"19 
20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | 
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 
20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1))",
"parsedquery_toString":"+(recordID:[18 TO 18] recordID:[19 TO 19] 
recordID:[20 TO 20]) ((text:\"19 20\"~100)^0.2 | (annotations:\"19 
20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | 
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 
20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1)",
"explain":{
  "2CBF8A49-CA2D-4e42-88F2-3790922EF415":"\n1.0 = sum of:\n  1.0 = sum 
of:\n1.0 = recordID:[19 TO 19]\n",
  "F73CFBC7-2CD2-4aab-B8C1-9D19D427EAFB":"\n1.0 = sum of:\n  1.0 = sum 
of:\n1.0 = recordID:[20 TO 20]\n"},

The only visual difference I think is the ~2 which came after the initial part 
of the parsed query:

Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 
20]))~2 New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) 
(recordID:[20 TO 20]))

There shouldn't be a problem using mm with edismax right? Or does the problem 
lie with the structure of my qf/pf and then adding mm?

Cheers,
Claire.

-Original Message-
From: Edward Ribeiro 
Sent: 09 January 2020 02:28
To: solr-user@lucene.apache.org
Subject: Re: Edismax ignoring queries containing booleans

Hi Claire,

Unfortunately I didn't see anything in the debug explain that could potentially 
be the source of the problem. As Saurabh, I tested on a core and it worked for 
me.

I suggest that you simplify the solrconfig (commenting out qf, mm, spellchecker 
config and pf, for example) and reload the core. If the query works then you  
reinsert the config one by one, reloading the core and see if the query works.

A few remarks based on a snippet of the solrconfig you posted on a previous
e-mail:

* Your solrconfig.xml defines df two times (the debug shows "df":["text", 
"text"]);

* There are a couple codes like 
 and  It would be nice to remove It;

Please, let us know if you find why. :)

Best,
Edward


Em qua, 8 de jan de 2020 13:00, Claire Pollard 
escreveu:

> It would be lovely to be able to use range to complete my searches, 
> but sadly documents aren't necessarily sequential so I might want say 
> 18, 24 or
> 30 in future.
>
> I've re-run the query with debug on. Is there anything here that looks 
> unusual? Thanks.
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":75,
> "params":{
>   "mm":"\r\n   0<1 2<-1 5<-2 6<90%\r\n  ",
>   "spellcheck.collateExtendedResults":"true",
>   "df":["text",
> "text"],
>   "q.alt":"*:*",
>   "ps":"100",
>   "spellcheck.dictionary":["default",
> "wordbreak"],
>   "bf":"",
>   "echoParams":"all",
>   "fl":"*,score",
>   "spellcheck.maxCollations":"5",
>   "rows":"10",
>   "spellcheck.alternativeTermCount":"5",
>   "spellcheck.extendedResults":"true",
>   "q":"recordID:(18 OR 19 OR 20)",
>   "defType":"edismax",
>

RE: Edismax ignoring queries containing booleans

2020-01-09 Thread Claire Pollard
Hey Edward,

Thanks for the tips. 

I've cleaned up my solrconfig, removed the duplicate df, tabs and newlines, and 
tried commenting out the bits you've suggested and adding them back in bit by 
bit, and it seems mm was the thing which is breaking the query for me.

Without it, the query returns 2 documents as expected.

"debug":{
"rawquerystring":"recordID:(18 OR 19 OR 20)",
"querystring":"recordID:(18 OR 19 OR 20)",
"parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 
TO 20])) DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 | (annotations:\"19 
20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | 
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 
20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1))",
"parsedquery_toString":"+(recordID:[18 TO 18] recordID:[19 TO 19] 
recordID:[20 TO 20]) ((text:\"19 20\"~100)^0.2 | (annotations:\"19 
20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | 
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 
20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1)",
"explain":{
  "2CBF8A49-CA2D-4e42-88F2-3790922EF415":"\n1.0 = sum of:\n  1.0 = sum 
of:\n1.0 = recordID:[19 TO 19]\n",
  "F73CFBC7-2CD2-4aab-B8C1-9D19D427EAFB":"\n1.0 = sum of:\n  1.0 = sum 
of:\n1.0 = recordID:[20 TO 20]\n"},

The only visual difference I think is the ~2 which came after the initial part 
of the parsed query:

Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 
20]))~2
New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 
20]))

There shouldn't be a problem using mm with edismax right? Or does the problem 
lie with the structure of my qf/pf and then adding mm?

Cheers,
Claire.

-Original Message-
From: Edward Ribeiro  
Sent: 09 January 2020 02:28
To: solr-user@lucene.apache.org
Subject: Re: Edismax ignoring queries containing booleans

Hi Claire,

Unfortunately I didn't see anything in the debug explain that could potentially 
be the source of the problem. As Saurabh, I tested on a core and it worked for 
me.

I suggest that you simplify the solrconfig (commenting out qf, mm, spellchecker 
config and pf, for example) and reload the core. If the query works then you  
reinsert the config one by one, reloading the core and see if the query works.

A few remarks based on a snippet of the solrconfig you posted on a previous
e-mail:

* Your solrconfig.xml defines df two times (the debug shows "df":["text", 
"text"]);

* There are a couple codes like 
 and  It would be nice to remove It;

Please, let us know if you find why. :)

Best,
Edward


Em qua, 8 de jan de 2020 13:00, Claire Pollard 
escreveu:

> It would be lovely to be able to use range to complete my searches, 
> but sadly documents aren't necessarily sequential so I might want say 
> 18, 24 or
> 30 in future.
>
> I've re-run the query with debug on. Is there anything here that looks 
> unusual? Thanks.
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":75,
> "params":{
>   "mm":"\r\n   0<1 2<-1 5<-2 6<90%\r\n  ",
>   "spellcheck.collateExtendedResults":"true",
>   "df":["text",
> "text"],
>   "q.alt":"*:*",
>   "ps":"100",
>   "spellcheck.dictionary":["default",
> "wordbreak"],
>   "bf":"",
>   "echoParams":"all",
>   "fl":"*,score",
>   "spellcheck.maxCollations":"5",
>   "rows":"10",
>   "spellcheck.alternativeTermCount":"5",
>   "spellcheck.extendedResults":"true",
>   "q":"recordID:(18 OR 19 OR 20)",
>   "defType":"edismax",
>   "spellcheck.maxResultsForSuggest":"5",
>   "qf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.4 recordID^10.0
> annotations^0.5 collectionTitle^1.9 collectionDescription^0.9 
> title^2.0
> Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0 
> french2^1.0\r\n\n\t\t\t\t\n\t\t\t",
>   "spellcheck":"on",
>   "pf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.2 recordID^10.0
> annotations^0.6 collectionTitle^

Re: Edismax ignoring queries containing booleans

2020-01-08 Thread Edward Ribeiro
Hi Claire,

Unfortunately I didn't see anything in the debug explain that could
potentially be the source of the problem. As Saurabh, I tested on a core
and it worked for me.

I suggest that you simplify the solrconfig (commenting out qf, mm,
spellchecker config and pf, for example) and reload the core. If the query
works then you  reinsert the config one by one, reloading the core and see
if the query works.

A few remarks based on a snippet of the solrconfig you posted on a previous
e-mail:

* Your solrconfig.xml defines df two times (the debug shows "df":["text",
"text"]);

* There are a couple codes like 
 and  It would be nice to remove It;

Please, let us know if you find why. :)

Best,
Edward


Em qua, 8 de jan de 2020 13:00, Claire Pollard 
escreveu:

> It would be lovely to be able to use range to complete my searches, but
> sadly documents aren't necessarily sequential so I might want say 18, 24 or
> 30 in future.
>
> I've re-run the query with debug on. Is there anything here that looks
> unusual? Thanks.
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":75,
> "params":{
>   "mm":"\r\n   0<1 2<-1 5<-2 6<90%\r\n  ",
>   "spellcheck.collateExtendedResults":"true",
>   "df":["text",
> "text"],
>   "q.alt":"*:*",
>   "ps":"100",
>   "spellcheck.dictionary":["default",
> "wordbreak"],
>   "bf":"",
>   "echoParams":"all",
>   "fl":"*,score",
>   "spellcheck.maxCollations":"5",
>   "rows":"10",
>   "spellcheck.alternativeTermCount":"5",
>   "spellcheck.extendedResults":"true",
>   "q":"recordID:(18 OR 19 OR 20)",
>   "defType":"edismax",
>   "spellcheck.maxResultsForSuggest":"5",
>   "qf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.4 recordID^10.0
> annotations^0.5 collectionTitle^1.9 collectionDescription^0.9 title^2.0
> Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0
> french2^1.0\r\n\n\t\t\t\t\n\t\t\t",
>   "spellcheck":"on",
>   "pf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.2 recordID^10.0
> annotations^0.6 collectionTitle^2.0 collectionDescription^1.0 title^2.1
> Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1
> french2^1.1\r\n\n\t\t\t\t\n\t\t\t",
>   "spellcheck.count":"10",
>   "debugQuery":"on",
>   "_":"1578499092576",
>   "spellcheck.collate":"true"}},
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>   },
>   "spellcheck":{
> "suggestions":[],
> "correctlySpelled":false,
> "collations":[]},
>   "debug":{
> "rawquerystring":"recordID:(18 OR 19 OR 20)",
> "querystring":"recordID:(18 OR 19 OR 20)",
> "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2 DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 |
> (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
> 20\"~100)^1.1))",
> "parsedquery_toString":"+((recordID:[18 TO 18] recordID:[19 TO 19]
> recordID:[20 TO 20])~2) ((text:\"19 20\"~100)^0.2 | (annotations:\"19
> 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
> 20\"~100)^1.1)",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> "altquerystring":null,
> "boost_queries":null,
> "parsed_boost_queries":[],
> "boostfuncs":[""],
> "timing":{
>   "time":75.0,
>   "prepare":{
> "time":35.0,
> "query":{
>   "time":35.0},
> "facet":{
&

RE: Edismax ignoring queries containing booleans

2020-01-08 Thread Claire Pollard
It would be lovely to be able to use range to complete my searches, but sadly 
documents aren't necessarily sequential so I might want say 18, 24 or 30 in 
future.

I've re-run the query with debug on. Is there anything here that looks unusual? 
Thanks.

{
  "responseHeader":{
"status":0,
"QTime":75,
"params":{
  "mm":"\r\n   0<1 2<-1 5<-2 6<90%\r\n  ",
  "spellcheck.collateExtendedResults":"true",
  "df":["text",
"text"],
  "q.alt":"*:*",
  "ps":"100",
  "spellcheck.dictionary":["default",
"wordbreak"],
  "bf":"",
  "echoParams":"all",
  "fl":"*,score",
  "spellcheck.maxCollations":"5",
  "rows":"10",
  "spellcheck.alternativeTermCount":"5",
  "spellcheck.extendedResults":"true",
  "q":"recordID:(18 OR 19 OR 20)",
  "defType":"edismax",
  "spellcheck.maxResultsForSuggest":"5",
  "qf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.4 recordID^10.0 
annotations^0.5 collectionTitle^1.9 collectionDescription^0.9 title^2.0 
Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0 
french2^1.0\r\n\n\t\t\t\t\n\t\t\t",
  "spellcheck":"on",
  "pf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.2 recordID^10.0 
annotations^0.6 collectionTitle^2.0 collectionDescription^1.0 title^2.1 
Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1 
french2^1.1\r\n\n\t\t\t\t\n\t\t\t",
  "spellcheck.count":"10",
  "debugQuery":"on",
  "_":"1578499092576",
  "spellcheck.collate":"true"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
"suggestions":[],
"correctlySpelled":false,
"collations":[]},
  "debug":{
"rawquerystring":"recordID:(18 OR 19 OR 20)",
"querystring":"recordID:(18 OR 19 OR 20)",
"parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 
TO 20]))~2 DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 | (annotations:\"19 
20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | 
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 
20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1))",
"parsedquery_toString":"+((recordID:[18 TO 18] recordID:[19 TO 19] 
recordID:[20 TO 20])~2) ((text:\"19 20\"~100)^0.2 | (annotations:\"19 
20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | 
collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 
20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1)",
"explain":{},
"QParser":"ExtendedDismaxQParser",
"altquerystring":null,
"boost_queries":null,
"parsed_boost_queries":[],
"boostfuncs":[""],
"timing":{
  "time":75.0,
  "prepare":{
"time":35.0,
"query":{
  "time":35.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"spellcheck":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":38.0,
"query":{
  "time":29.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"spellcheck":{
  "time":6.0},
&quo

Extending SOLR default/eDisMax query parser with Span queries functionalities

2020-01-07 Thread Kaminski, Adi
Hi,
We would like to extend SOLR default (named 'lucene' per: 
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html)
or eDisMax query parser with additional Lucene Span query functionality, in
order to allow position searches (SpanFirst, etc.) to be executed via the standard
parsers through a more trivial interface (for example an 'sq=' clause).

Is there any guideline/HowTo regarding required areas to focus on/implement, 
important notes/checklist, etc. ?
(the idea I guess is to inherit the default/eDisMax relevant classes and expand 
functionality, without harming the existing ones)

We've found the attempt below to do something similar, but it was from 2012 and on a
very old Solr version (4.x), and I assume the default SOLR/eDisMax
parsers have changed since then (we are on Solr 8.3 right now).
https://issues.apache.org/jira/browse/SOLR-3925
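
For what it's worth, rather than modifying the stock parsers, one common route
is a small standalone QParserPlugin that can be combined with them through
local params. A rough, untested sketch with made-up class and parameter names;
it skips analysis of the term and any error handling:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

// Hypothetical usage: q={!spanfirst f=title end=3}sometext
// after registering in solrconfig.xml:
//   <queryParser name="spanfirst" class="com.example.SpanFirstQParserPlugin"/>
public class SpanFirstQParserPlugin extends QParserPlugin {

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws SyntaxError {
                // assumes the parser is invoked with local params, e.g. {!spanfirst ...}
                String field = localParams.get("f");     // field to search
                int end = localParams.getInt("end", 1);  // match within the first N positions
                SpanTermQuery term = new SpanTermQuery(new Term(field, qstr));
                return new SpanFirstQuery(term, end);
            }
        };
    }
}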

Thanks a lot in advance,
Adi



This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.


Re: Edismax ignoring queries containing booleans

2020-01-06 Thread Edward Ribeiro
Hi Claire,

You can add the following parameter `debug=all` on the URL to bring back
debugging info and share with us (if you are using the Solr admin UI you
should check the `debugQuery` checkbox).

Also, if you are searching a sequence of values you could perform a range
query: recordID:[18 TO 20]
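
A third option, in case the IDs stay non-sequential: the terms query parser
builds the same OR-of-IDs query directly and, used in fq, it is not subject to
the mm handling at all, e.g. fq={!terms f=recordID}18,19,20 (assuming the
default registration of the terms parser).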

Best,
Edward

On Mon, Jan 6, 2020 at 10:46 AM Claire Pollard 
wrote:
>
> Ok... It doesn't work for me. I'm fairly new to Solr so any help would be
appreciated!
>
> My managed-schema field and field type look like this:
>
> 
> 
>
> And my solrconfig.xml select/query handlers look like this:
>
> 
> 
> all
> 
> edismax
> 
> text^0.4 recordID^10.0
annotations^0.5 collectionTitle^1.9 collectionDescription^0.9 title^2.0
Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0
french2^1.0
> 
> text
> *:*
> 10
> *,score
> 
> text^0.2 recordID^10.0
annotations^0.6 collectionTitle^2.0 collectionDescription^1.0 title^2.1
Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1
french2^1.1
> 
>    0<1 2<-1 5<-2 6<90%
> 100
> 
> text
> 
> default
> wordbreak
> on
> true
> 10
> 5
> 5
> true
> true
> 5
> 
> 
> spellcheck
> 
> 
> 
>
> 
> 
> explicit
> json
> true
> text
> 
> 
>
> Is there anything else that might be useful in helping diagnose what's
going wrong for me?
>
> Cheers,
> Claire.
>
> -Original Message-
> From: Saurabh Sharma 
> Sent: 06 January 2020 11:20
> To: solr-user@lucene.apache.org
> Subject: Re: Edismax ignoring queries containing booleans
>
> It should work well. I have just tested the same with 8.3.0.
>
> Thanks
> Saurabh Sharma
>
> On Mon, Jan 6, 2020, 4:31 PM Claire Pollard 
> wrote:
>
> > I'm using:
> >
> > recordID:(18 OR 19 OR 20)
> >
> > Which should return 2 records (as 18 doesn't exist), but it returns
none.
> > recordID is a LongPointField (sorry I said Int in my previous message).
> >
> > -Original Message-
> > From: Saurabh Sharma 
> > Sent: 06 January 2020 10:35
> > To: solr-user@lucene.apache.org
> > Subject: Re: Edismax ignoring queries containing booleans
> >
> > Please share the query which you are creating.
> >
> > On Mon, Jan 6, 2020, 3:52 PM Claire Pollard 
> > wrote:
> >
> > > In Solr 8.3.0 I've got an edismax query parser in my search handler,
> > > and it seems to be ignoring Boolean operators such as AND and OR
> > > when searching using an IntPointField.
> > >
> > > I was hoping to use a query to this field to return a batch of
> > > documents with non-sequential IDs, so a range would be inappropriate.
> > >
> > > We had a previous 4.10.2 instance of Solr which uses the now
> > > deprecated Trie fields, and these seem to search without issue using
> > boolean operators.
> > >
> > > Is there something extra I need to do with my setup for PointFields
> > > to use booleans or should they work as default.
> > >
> > > Cheers,
> > > Claire.
> > >
> >


RE: Edismax ignoring queries containing booleans

2020-01-06 Thread Claire Pollard
Ok... It doesn't work for me. I'm fairly new to Solr so any help would be 
appreciated!

My managed-schema field and field type look like this:




And my solrconfig.xml select/query handlers look like this:



all

edismax

text^0.4 recordID^10.0 annotations^0.5 
collectionTitle^1.9 collectionDescription^0.9 title^2.0 Test_FR^1.0 Test_DE^1.0 
Test_AR^1.0 genre^1.0 genre_fr^1.0 french2^1.0

text
*:*
10
*,score

text^0.2 recordID^10.0 annotations^0.6 
collectionTitle^2.0 collectionDescription^1.0 title^2.1 Test_FR^1.1 Test_DE^1.1 
Test_AR^1.1 genre^1.1 genre_fr^1.1 french2^1.1

   0<1 2<-1 5<-2 6<90%
100

text

default
wordbreak
on
true
10
5
5
true
true
5


spellcheck






explicit
json
true
text



Is there anything else that might be useful in helping diagnose what's going 
wrong for me?

Cheers,
Claire.

-Original Message-
From: Saurabh Sharma  
Sent: 06 January 2020 11:20
To: solr-user@lucene.apache.org
Subject: Re: Edismax ignoring queries containing booleans

It should work well. I have just tested the same with 8.3.0.

Thanks
Saurabh Sharma

On Mon, Jan 6, 2020, 4:31 PM Claire Pollard 
wrote:

> I'm using:
>
> recordID:(18 OR 19 OR 20)
>
> Which should return 2 records (as 18 doesn't exist), but it returns none.
> recordID is a LongPointField (sorry I said Int in my previous message).
>
> -Original Message-
> From: Saurabh Sharma 
> Sent: 06 January 2020 10:35
> To: solr-user@lucene.apache.org
> Subject: Re: Edismax ignoring queries containing booleans
>
> Please share the query which you are creating.
>
> On Mon, Jan 6, 2020, 3:52 PM Claire Pollard 
> wrote:
>
> > In Solr 8.3.0 I've got an edismax query parser in my search handler, 
> > and it seems to be ignoring Boolean operators such as AND and OR 
> > when searching using an IntPointField.
> >
> > I was hoping to use a query to this field to return a batch of 
> > documents with non-sequential IDs, so a range would be inappropriate.
> >
> > We had a previous 4.10.2 instance of Solr which uses the now 
> > deprecated Trie fields, and these seem to search without issue using
> boolean operators.
> >
> > Is there something extra I need to do with my setup for PointFields 
> > to use booleans or should they work as default.
> >
> > Cheers,
> > Claire.
> >
>


Re: Edismax ignoring queries containing booleans

2020-01-06 Thread Saurabh Sharma
It should work well. I have just tested the same with 8.3.0.

Thanks
Saurabh Sharma

On Mon, Jan 6, 2020, 4:31 PM Claire Pollard 
wrote:

> I'm using:
>
> recordID:(18 OR 19 OR 20)
>
> Which should return 2 records (as 18 doesn't exist), but it returns none.
> recordID is a LongPointField (sorry I said Int in my previous message).
>
> -Original Message-
> From: Saurabh Sharma 
> Sent: 06 January 2020 10:35
> To: solr-user@lucene.apache.org
> Subject: Re: Edismax ignoring queries containing booleans
>
> Please share the query which you are creating.
>
> On Mon, Jan 6, 2020, 3:52 PM Claire Pollard 
> wrote:
>
> > In Solr 8.3.0 I've got an edismax query parser in my search handler,
> > and it seems to be ignoring Boolean operators such as AND and OR when
> > searching using an IntPointField.
> >
> > I was hoping to use a query to this field to return a batch of
> > documents with non-sequential IDs, so a range would be inappropriate.
> >
> > We had a previous 4.10.2 instance of Solr which uses the now
> > deprecated Trie fields, and these seem to search without issue using
> boolean operators.
> >
> > Is there something extra I need to do with my setup for PointFields to
> > use booleans or should they work as default.
> >
> > Cheers,
> > Claire.
> >
>


RE: Edismax ignoring queries containing booleans

2020-01-06 Thread Claire Pollard
I'm using:

recordID:(18 OR 19 OR 20)

Which should return 2 records (as 18 doesn't exist), but it returns none. 
recordID is a LongPointField (sorry I said Int in my previous message).

-Original Message-
From: Saurabh Sharma  
Sent: 06 January 2020 10:35
To: solr-user@lucene.apache.org
Subject: Re: Edismax ignoring queries containing booleans

Please share the query which you are creating.

On Mon, Jan 6, 2020, 3:52 PM Claire Pollard 
wrote:

> In Solr 8.3.0 I've got an edismax query parser in my search handler, 
> and it seems to be ignoring Boolean operators such as AND and OR when 
> searching using an IntPointField.
>
> I was hoping to use a query to this field to return a batch of 
> documents with non-sequential IDs, so a range would be inappropriate.
>
> We had a previous 4.10.2 instance of Solr which uses the now 
> deprecated Trie fields, and these seem to search without issue using boolean 
> operators.
>
> Is there something extra I need to do with my setup for PointFields to 
> use booleans or should they work as default.
>
> Cheers,
> Claire.
>


Re: Edismax ignoring queries containing booleans

2020-01-06 Thread Saurabh Sharma
Please share the query which you are creating.

On Mon, Jan 6, 2020, 3:52 PM Claire Pollard 
wrote:

> In Solr 8.3.0 I've got an edismax query parser in my search handler, and
> it seems to be ignoring Boolean operators such as AND and OR when searching
> using an IntPointField.
>
> I was hoping to use a query to this field to return a batch of documents
> with non-sequential IDs, so a range would be inappropriate.
>
> We had a previous 4.10.2 instance of Solr which uses the now deprecated
> Trie fields, and these seem to search without issue using boolean operators.
>
> Is there something extra I need to do with my setup for PointFields to use
> booleans or should they work as default.
>
> Cheers,
> Claire.
>


Edismax ignoring queries containing booleans

2020-01-06 Thread Claire Pollard
In Solr 8.3.0 I've got an edismax query parser in my search handler, and it 
seems to be ignoring Boolean operators such as AND and OR when searching using 
an IntPointField.

I was hoping to use a query to this field to return a batch of documents with 
non-sequential IDs, so a range would be inappropriate.

We had a previous 4.10.2 instance of Solr which uses the now deprecated Trie 
fields, and these seem to search without issue using boolean operators.

Is there something extra I need to do with my setup for PointFields to use 
booleans or should they work as default.

Cheers,
Claire.


Re: shard.preference for single shard queries

2019-12-05 Thread Tomás Fernández Löbbe
Look at SOLR-12217, it explains the limitation and has a patch for SolrJ
cases. Should be merged soon.

Note that the combination of replica types you are describing is not
recommended. See
https://lucene.apache.org/solr/guide/8_1/shards-and-indexing-data-in-solrcloud.html#combining-replica-types-in-a-cluster
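
For reference, the parameter itself is passed like any other request
parameter; the limitation above is about when it is honored, not about how it
is set. A SolrJ sketch (values illustrative):

    SolrQuery q = new SolrQuery("*:*");
    q.set("shards.preference", "replica.type:PULL");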


On Thu, Dec 5, 2019 at 5:58 AM spanchal 
wrote:

> Hi all, Thanks to  SOLR-11982
> <https://issues.apache.org/jira/browse/SOLR-11982>   we can now give solr
> parameter to sort replicas while giving results but ONLY for distributed
> queries as per documentation. May I know why this limitation?
>
> As my setup, I have 3 replicas(2 NRT, 1 PULL) of a single shard on 3
> different machines. Since NRT replicas might be busy with indexing, I would
> like my queries to land on PULL replica as a preferred option. And
> shard.preference=replica.type:PULL is not working in my case.
> Please help, thanks.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


shard.preference for single shard queries

2019-12-05 Thread spanchal
Hi all, Thanks to SOLR-11982
<https://issues.apache.org/jira/browse/SOLR-11982> we can now give Solr a
parameter to sort replicas while serving results, but ONLY for distributed
queries as per the documentation. May I know why this limitation?

As for my setup, I have 3 replicas (2 NRT, 1 PULL) of a single shard on 3
different machines. Since NRT replicas might be busy with indexing, I would
like my queries to land on PULL replica as a preferred option. And
shard.preference=replica.type:PULL is not working in my case. 
Please help, thanks.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread yeikel valdes
Is there any builder for the XMLQueryParser so that we don't need to build it as a
String?


And what query DSL are you referring to?


 On Mon, 02 Dec 2019 08:00:57 -1100 m...@apache.org wrote 


and Query DSL as well. Although, I didn't get the point of the topic
starter.

On Mon, Dec 2, 2019 at 9:16 PM Alexandre Rafalovitch 
wrote:

> What about XMLQueryParser:
>
> https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser
>
> Regards,
> Alex.
>
> On Wed, 27 Nov 2019 at 22:43,  wrote:
> >
> > I am trying to simulate the following query(Lucene query builder) using
> Solr
> >
> >
> >
> >
> > BooleanQuery.Builder main = new BooleanQuery.Builder();
> >
> > Term t1 = new Term("f1","term");
> > Term t2 = new Term("f1","second");
> > Term t3 = new Term("f1","another");
> >
> > BooleanQuery.Builder q1 = new BooleanQuery.Builder();
> > q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
> > q1.setMinimumNumberShouldMatch(2);
> >
> > Term t4 = new Term("f1","anothert");
> > Term t5 = new Term("f1","anothert2");
> > Term t6 = new Term("f1","anothert3");
> >
> > BooleanQuery.Builder q2 = new BooleanQuery.Builder();
> > q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
> > q2.setMinimumNumberShouldMatch(2);
> >
> >
> > main.add(q1.build(),BooleanClause.Occur.SHOULD);
> > main.add(q2.build(),BooleanClause.Occur.SHOULD);
> > main.setMinimumNumberShouldMatch(1);
> >
> > System.out.println(main.build()); // (((f1:term~2 f1:second~2
> > f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1
> -->
> > Invalid Solr Query
> >
> >
> >
> >
> >
> > In a few words : ( q1 OR q2 )
> >
> >
> >
> > Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
> > search but I also need a minimum of terms to match.
> >
> >
> >
> > The best I was able to create was something like this :
> >
> >
> >
> > SolrQuery query = new SolrQuery();
> > query.set("fl", "term");
> > query.set("q", "term~1 term2~2 term3~2");
> > query.set("mm",2);
> >
> > System.out.println(query);
> >
> >
> >
> > And I was unable to find any example that would allow me to do the type
> of
> > query that I am trying to build with only one solr query.
> >
> >
> >
> > Is it possible to use the Lucene Query builder with Solr? Is there any
> way
> > to create Boolean queries with Solr? Do I need to build the query as a
> > String? If so , how do I set the mm parameter in a String query?
> >
> >
> >
> > Thank you
> >
>


--
Sincerely yours
Mikhail Khludnev


Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread Mikhail Khludnev
and Query DSL as well. Although, I didn't get the point of the topic
starter.

On Mon, Dec 2, 2019 at 9:16 PM Alexandre Rafalovitch 
wrote:

> What about XMLQueryParser:
>
> https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser
>
> Regards,
>Alex.
>
> On Wed, 27 Nov 2019 at 22:43,  wrote:
> >
> > I am trying to simulate the following query(Lucene query builder) using
> Solr
> >
> >
> >
> >
> > BooleanQuery.Builder main = new BooleanQuery.Builder();
> >
> > Term t1 = new Term("f1","term");
> > Term t2 = new Term("f1","second");
> > Term t3 = new Term("f1","another");
> >
> > BooleanQuery.Builder q1 = new BooleanQuery.Builder();
> > q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
> > q1.setMinimumNumberShouldMatch(2);
> >
> > Term t4 = new Term("f1","anothert");
> > Term t5 = new Term("f1","anothert2");
> > Term t6 = new Term("f1","anothert3");
> >
> > BooleanQuery.Builder q2 = new BooleanQuery.Builder();
> > q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
> > q2.setMinimumNumberShouldMatch(2);
> >
> >
> > main.add(q1.build(),BooleanClause.Occur.SHOULD);
> > main.add(q2.build(),BooleanClause.Occur.SHOULD);
> > main.setMinimumNumberShouldMatch(1);
> >
> > System.out.println(main.build()); // (((f1:term~2 f1:second~2
> > f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1
>  -->
> > Invalid Solr Query
> >
> >
> >
> >
> >
> > In a few words :  ( q1 OR q2 )
> >
> >
> >
> > Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
> > search but I also need a minimum of terms to match.
> >
> >
> >
> > The best I was able to create was something like this  :
> >
> >
> >
> > SolrQuery query = new SolrQuery();
> > query.set("fl", "term");
> > query.set("q", "term~1 term2~2 term3~2");
> > query.set("mm",2);
> >
> > System.out.println(query);
> >
> >
> >
> > And I was unable to find any example that would allow me to do the type
> of
> > query that I am trying to build with only one solr query.
> >
> >
> >
> > Is it possible to use the Lucene Query builder with Solr? Is there any
> way
> > to create Boolean queries with Solr? Do I need to build the query as a
> > String? If so , how do I set the mm parameter in a String query?
> >
> >
> >
> > Thank you
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread Alexandre Rafalovitch
What about XMLQueryParser:
https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser

Regards,
   Alex.

On Wed, 27 Nov 2019 at 22:43,  wrote:
>
> I am trying to simulate the following query(Lucene query builder) using Solr
>
>
>
>
> BooleanQuery.Builder main = new BooleanQuery.Builder();
>
> Term t1 = new Term("f1","term");
> Term t2 = new Term("f1","second");
> Term t3 = new Term("f1","another");
>
> BooleanQuery.Builder q1 = new BooleanQuery.Builder();
> q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
> q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
> q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
> q1.setMinimumNumberShouldMatch(2);
>
> Term t4 = new Term("f1","anothert");
> Term t5 = new Term("f1","anothert2");
> Term t6 = new Term("f1","anothert3");
>
> BooleanQuery.Builder q2 = new BooleanQuery.Builder();
> q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
> q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
> q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
> q2.setMinimumNumberShouldMatch(2);
>
>
> main.add(q1.build(),BooleanClause.Occur.SHOULD);
> main.add(q2.build(),BooleanClause.Occur.SHOULD);
> main.setMinimumNumberShouldMatch(1);
>
> System.out.println(main.build()); // (((f1:term~2 f1:second~2
> f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1   -->
> Invalid Solr Query
>
>
>
>
>
> In a few words :  ( q1 OR q2 )
>
>
>
> Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
> search but I also need a minimum of terms to match.
>
>
>
> The best I was able to create was something like this  :
>
>
>
> SolrQuery query = new SolrQuery();
> query.set("fl", "term");
> query.set("q", "term~1 term2~2 term3~2");
> query.set("mm",2);
>
> System.out.println(query);
>
>
>
> And I was unable to find any example that would allow me to do the type of
> query that I am trying to build with only one solr query.
>
>
>
> Is it possible to use the Lucene Query builder with Solr? Is there any way
> to create Boolean queries with Solr? Do I need to build the query as a
> String? If so , how do I set the mm parameter in a String query?
>
>
>
> Thank you
>


Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-11-27 Thread email
I am trying to simulate the following query(Lucene query builder) using Solr


 

BooleanQuery.Builder main = new BooleanQuery.Builder();

Term t1 = new Term("f1","term");
Term t2 = new Term("f1","second");
Term t3 = new Term("f1","another");

BooleanQuery.Builder q1 = new BooleanQuery.Builder();
q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
q1.setMinimumNumberShouldMatch(2);

Term t4 = new Term("f1","anothert");
Term t5 = new Term("f1","anothert2");
Term t6 = new Term("f1","anothert3");

BooleanQuery.Builder q2 = new BooleanQuery.Builder();
q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
q2.setMinimumNumberShouldMatch(2);


main.add(q1.build(),BooleanClause.Occur.SHOULD);
main.add(q2.build(),BooleanClause.Occur.SHOULD);
main.setMinimumNumberShouldMatch(1);

System.out.println(main.build()); // (((f1:term~2 f1:second~2
f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1   -->
Invalid Solr Query

 

 

In a few words :  ( q1 OR q2 )

 

Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
search but I also need a minimum of terms to match. 

 

The best I was able to create was something like this  : 

 

SolrQuery query = new SolrQuery();
query.set("fl", "term");
query.set("q", "term~1 term2~2 term3~2");
query.set("mm",2);

System.out.println(query);

 

And I was unable to find any example that would allow me to do the type of
query that I am trying to build with only one solr query. 

 

Is it possible to use the Lucene Query builder with Solr? Is there any way
to create Boolean queries with Solr? Do I need to build the query as a
String? If so , how do I set the mm parameter in a String query? 

 

Thank you
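
For reference, one further approach not mentioned in the replies: nest two
edismax queries via the _query_ hook and local params, each with its own mm,
and OR them at the top level. A rough SolrJ sketch reusing the field and terms
from the example above:

    SolrQuery query = new SolrQuery();
    query.set("q",
        "_query_:\"{!edismax qf=f1 mm=2 v='term~2 second~2 another~2'}\" OR "
      + "_query_:\"{!edismax qf=f1 mm=2 v='anothert~2 anothert2~2 anothert3~2'}\"");

With the default q.op=OR, at least one of the two nested clauses has to match,
which mirrors the outer setMinimumNumberShouldMatch(1) of the Lucene builder.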



Problems with TokenFilter, but only in wildcard queries

2019-10-16 Thread Björn Keil
Hello,

I am having a problem with a primitive self-written TokenFilter, namely the
GermanUmlautFilter in the example below. It's being used for both queries
and indexing.
It works perfectly most of the time: it replaces ä with ae, ö with oe and so
forth, before ICUFoldingFilter replaces the remaining non-ASCII symbols.

However, it does cause odd behaviour in Wildcard Queries. e.g.:
The query title:todesmä* matches todesmarsch, which it should not, because
an ä is supposed to be replaced with an ae, however, it also matches
todesmärchen, as it should.
The query title:todesmär still matches todesmarsch, but not todesmärchen.

That is odd: it is as though the replacement did not take place while
performing a wildcard query, even though it did work during indexing. In
other circumstances it works, however. E.g.:
The query title:härte does correctly not match harte, but it does match
härte.
The query title:haerte is equivalent to the query title:härte.
The query title:harte does correctly not match haerte, but it does match
harte.

While debugging the GermanUmlautFilter, I did not find any obvious mistake.
The only thing that is a bit strange is that the CharTermAttribute's
(implemented by PackedTokenAttributeImpl) endOffset attribute does not appear
to change. However, if it is supposed to indicate the last character's
offset in bytes, that would be the expected result: it replaces a single
two-byte character with two one-byte characters in the examples above.

Does anybody have an idea what's going on here? What's so different about
wildcard queries?
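
One thing worth checking, though it is only a guess from the symptoms:
wildcard and prefix terms are not run through the full query analyzer. They
use the field type's multiterm analyzer, and when none is declared Solr builds
one from only the multi-term-aware factories in the chain, so a custom filter
is silently dropped for todesmä* while ICUFoldingFilter still folds the ä to a
plain a. That would also explain why title:todesmä* behaves like todesma* and
matches todesmarsch. Declaring the multiterm analyzer explicitly should make
the umlaut replacement apply to wildcard terms too; a sketch, assuming a
hypothetical GermanUmlautFilterFactory that wraps the filter below:

    <analyzer type="multiterm">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="de.example.analysis.GermanUmlautFilterFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
    </analyzer>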

From the schema.xml:


  














GermanUmlautFilter code:

package de.example.analysis;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * This TokenFilter replaces German umlauts and the character ß with a
normalized form in ASCII characters.
 *
 * ü => ue
 * ß => ss
 * etc.
 *
 * This enables a sort order according to DIN 5007, variant 2, the so
called "phone book" sort order.
 *
 * @see org.apache.lucene.analysis.TokenStream
 *
 */
public class GermanUmaultFilter extends TokenFilter {

private final CharTermAttribute termAtt =
addAttribute(CharTermAttribute.class);

/**
 * @see org.apache.lucene.analysis.TokenFilter#TokenFilter()
 * @param input TokenStream with the tokens to filter
 */
public GermanUmaultFilter(TokenStream input) {
super(input);
}

/**
 * Performs the actual filtering upon request by the consumer.
 *
 * @see org.apache.lucene.analysis.TokenStream#incrementToken()
 * @return true on success, false on failure
 */
public boolean incrementToken() throws IOException {
if (input.incrementToken()) {
int countReplacements = 0;
char[] origBuffer = termAtt.buffer();
int origLength = termAtt.length();
// Figure out how many replacements we need to get the 
size of the new buffer
for (int i = 0; i < origLength; i++) {
if (origBuffer[i] == 'ü'
|| origBuffer[i] == 'ä'
|| origBuffer[i] == 'ö'
|| origBuffer[i] == 'ß'
|| origBuffer[i] == 'Ä'
|| origBuffer[i] == 'Ö'
|| origBuffer[i] == 'Ü'
) {
countReplacements++;
}
}

// If there is a replacement create a new buffer of the 
appropriate length...
if (countReplacements != 0) {
int newLength = origLength + countReplacements;
char[] target = new char[newLength];
int j = 0;
// ... perform the replacement ...
for (int i = 0; i < origLength; i++) {
switch (origBuffer[i]) {
case 'ä':
target[j++] = 'a';
target[j++] = 'e';
break;
case 'ö':
target[j++] = 'o';
target[j++] = 'e';
break;
  

Problems with Wildcard Queries / Own Filter

2019-10-15 Thread Björn Keil
Hello,

I am having a bit of a problem with Wildcard queries and I don't know how
to pin it down yet. I have a suspect, but I kind find an error in it, one
of the filters in the respective search field.

The problem is that when I do a wildcard query:
title:todesmä*
it does return a result, but it also returns results that would match
title:todesma* It is not supposed to do that because, due to the filter,
it's supposed to be equivalent to title:todesmae*

The real problem is that if I search for title:todesmär* it does not find
anything at all anymore. There are titles on the index that would match
"todesmärsche" and "todesmärchen".

I have looked at the Filter in a debugger, but I could not find anything wrong
with it. It's supposed to replace the "ä" with "ae", which it does, calls
termAtt.resizeBuffer() before it does and termAtt.length() afterwards. The
result seems perfectly alright. What it does not change is the endOffset
attribute of the CharTermAttribute object, that's probably because it's
counting Bytes, not characters; I replaced a single two-byte char with a
two one-byte chars, consequently the endOffset is the same.

Could anybody tell me whether there is anything wrong with the filter in
the attachment?
package de.example.analysis;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * This TokenFilter replaces German umlauts and the character ß with a normalized form in ASCII characters.
 * 
 * ü => ue
 * ß => ss
 * etc.
 * 
 * This enables a sort order according to DIN 5007, variant 2, the so called "phone book" sort order.
 * 
 * @see org.apache.lucene.analysis.TokenStream
 *
 */
public class GermanUmaultFilter extends TokenFilter {
	
	private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

	/**
	 * @see org.apache.lucene.analysis.TokenFilter#TokenFilter()
	 * @param input TokenStream with the tokens to filter
	 */
	public GermanUmaultFilter(TokenStream input) {
		super(input);
	}

	/**
	 * Performs the actual filtering upon request by the consumer.
	 * 
	 * @see org.apache.lucene.analysis.TokenStream#incrementToken()
	 * @return true on success, false on failure
	 */
	public boolean incrementToken() throws IOException {
		if (input.incrementToken()) {
			int countReplacements = 0;
			char[] origBuffer = termAtt.buffer();
			int origLength = termAtt.length();
			// Figure out how many replacements we need to get the size of the new buffer
			for (int i = 0; i < origLength; i++) {
if (origBuffer[i] == 'ü'
	|| origBuffer[i] == 'ä'
	|| origBuffer[i] == 'ö'
	|| origBuffer[i] == 'ß'
	|| origBuffer[i] == 'Ä'
	|| origBuffer[i] == 'Ö'
	|| origBuffer[i] == 'Ü'
) {
	countReplacements++;
}
			}
			
			// If there is a replacement create a new buffer of the appropriate length...
			if (countReplacements != 0) {
int newLength = origLength + countReplacements;
char[] target = new char[newLength];
int j = 0;
// ... perform the replacement ...
for (int i = 0; i < origLength; i++) {
	switch (origBuffer[i]) {
	case 'ä':
		target[j++] = 'a';
		target[j++] = 'e';
		break;
	case 'ö':
		target[j++] = 'o';
		target[j++] = 'e';
		break;
	case 'ü':
		target[j++] = 'u';
		target[j++] = 'e';
		break;
	case 'Ä':
		target[j++] = 'A';
		target[j++] = 'E';
		break;
	case 'Ö':
		target[j++] = 'O';
		target[j++] = 'E';
		break;
	case 'Ü':
		target[j++] = 'U';
		target[j++] = 'E';
		break;
	case 'ß':
		target[j++] = 's';
		target[j++] = 's';
		break;
	default:
		target[j++] = origBuffer[i];
	}
}
// ... make sure the attribute's buffer is large enough, copy the new buffer
// and set the length ...
termAtt.resizeBuffer(newLength);
termAtt.copyBuffer(target, 0, newLength);
termAtt.setLength(newLength);
			}
			return true;
		} else {
			return false;
		}
	}

}


Re: How to block expensive solr queries

2019-10-10 Thread Wei
On Wed, Oct 9, 2019 at 9:59 AM Wei  wrote:

> Thanks all. I debugged a bit and saw that timeAllowed does not limit the stats
> call. Also I think it would be useful for solr to support a whitelist or
> blacklist of operations as Toke suggested. Will create a jira for it.
> Currently it seems the only option to explore is adding a filter to solr's
> embedded jetty.  Does anyone have experience doing that? Do I also need to
> change SolrDispatchFilter?
>
> On Tue, Oct 8, 2019 at 3:50 AM Toke Eskildsen  wrote:
>
>> On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
>> > /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
>> ...
>> > Is there a way to block certain solr queries based on url pattern?
>> > i.e. ignore the stats.calcdistinct request in this case.
>>
>> It sounds like it is possible for users to issue arbitrary queries
>> against your Solr installation. As you have noticed, it makes it easy
>> to perform a Denial Of Service (intentional or not). Filtering out
>> stats.calcdistinct won't help with the next request for
>> group.ngroups=true, facet.field=unique_id=1,
>> rows=1 or something fifth.
>>
>> I recommend you flip your logic and only allow specific types of
>> requests and put limits on those. To my knowledge that is not a built-in
>> feature of Solr.
>>
>> - Toke Eskildsen, Royal Danish Library
>>
>>
>>


Re: How to block expensive solr queries

2019-10-08 Thread Toke Eskildsen
On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
...
> Is there a way to block certain solr queries based on url pattern?
> i.e. ignore the stats.calcdistinct request in this case.

It sounds like it is possible for users to issue arbitrary queries
against your Solr installation. As you have noticed, it makes it easy
to perform a Denial Of Service (intentional or not). Filtering out
stats.calcdistinct won't help with the next request for
group.ngroups=true, facet.field=unique_id=1,
rows=1 or something fifth.

I recommend you flip your logic and only allow specific types of
requests and put limits on those. To my knowledge that is not a built-in
feature of Solr.

- Toke Eskildsen, Royal Danish Library
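
To illustrate the jetty-level option asked about in this thread (a rough
sketch only; a blocklist like this is easier to bypass than the allow-list
approach described above), a servlet filter registered in Solr's web.xml ahead
of SolrDispatchFilter could look roughly like this. The class name and the
blocked parameter are assumptions:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Rejects requests whose query string contains a disallowed parameter. */
public class ExpensiveQueryBlockingFilter implements Filter {

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String qs = ((HttpServletRequest) req).getQueryString();
        if (qs != null && qs.contains("stats.calcdistinct=true")) {
            ((HttpServletResponse) res).sendError(403, "stats.calcdistinct is not allowed");
            return;
        }
        chain.doFilter(req, res);
    }
}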




Re: How to block expensive solr queries

2019-10-08 Thread Mikhail Khludnev
It's worth raising an issue for supporting timeAllowed for stats. Until
it's done, something like a jetty filter is the only option,

On Tue, Oct 8, 2019 at 12:34 AM Wei  wrote:

> Hi Mikhail,
>
> Yes I have the timeAllowed parameter configured, still is this case it
> doesn't seem to prevent the stats request from blocking other normal
> queries.  Is it possible to drop the request before solr executes it? maybe
> at the jetty request filter?
>
> Thanks,
> Wei
>
> On Mon, Oct 7, 2019 at 1:39 PM Mikhail Khludnev  wrote:
>
> > Hello, Wei.
> >
> > Have you tried to abandon heavy queries with
> >
> >
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
> >  ?
> > It may or may not be able to stop stats.
> >
> >
> https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
> > can clarify it.
> >
> > On Mon, Oct 7, 2019 at 8:19 PM Wei  wrote:
> >
> > > Hi,
> > >
> > > Recently we encountered a problem when solr cloud query latency
> suddenly
> > > increase, many simple queries that has small recall gets time out.
> After
> > > digging a bit I found that the root cause is some stats queries happen
> at
> > > the same time, such as
> > >
> > >
> > >
> >
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
> > >
> > >
> > >
> > > I see unique_ids is a high cardinality field so this query is quite
> > > expensive. But why a small volume of such query blocks other queries
> and
> > > make simple queries time out?  I checked the solr thread pool and see
> > there
> > > are plenty of idle threads available.  We are using solr 7.6.2 with a
> 10
> > > shard cloud set up.
> > >
> > > Is there a way to block certain solr queries based on url pattern? i.e.
> > > ignore the stats.calcdistinct request in this case.
> > >
> > >
> > > Thanks,
> > >
> > > Wei
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How to block expensive solr queries

2019-10-07 Thread Wei
Hi Mikhail,

Yes I have the timeAllowed parameter configured; still, in this case it
doesn't seem to prevent the stats request from blocking other normal
queries.  Is it possible to drop the request before solr executes it? Maybe
at the jetty request filter?

Thanks,
Wei

On Mon, Oct 7, 2019 at 1:39 PM Mikhail Khludnev  wrote:

> Hello, Wei.
>
> Have you tried to abandon heavy queries with
>
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
>  ?
> It may or may not be able to stop stats.
>
> https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
> can clarify it.
>
> On Mon, Oct 7, 2019 at 8:19 PM Wei  wrote:
>
> > Hi,
> >
> > Recently we encountered a problem when solr cloud query latency suddenly
> > increase, many simple queries that has small recall gets time out. After
> > digging a bit I found that the root cause is some stats queries happen at
> > the same time, such as
> >
> >
> >
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
> >
> >
> >
> > I see unique_ids is a high cardinality field so this query is quite
> > expensive. But why a small volume of such query blocks other queries and
> > make simple queries time out?  I checked the solr thread pool and see
> there
> > are plenty of idle threads available.  We are using solr 7.6.2 with a 10
> > shard cloud set up.
> >
> > Is there a way to block certain solr queries based on url pattern? i.e.
> > ignore the stats.calcdistinct request in this case.
> >
> >
> > Thanks,
> >
> > Wei
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: How to block expensive solr queries

2019-10-07 Thread Mikhail Khludnev
Hello, Wei.

Have you tried to abandon heavy queries with
https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
 ?
It may or may not be able to stop stats.
https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
can clarify it.
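
For reference, timeAllowed is just a per-request parameter in milliseconds,
e.g. (collection name illustrative):

    /solr/mycollection/select?q=*:*&rows=10&timeAllowed=2000

When the limit is hit mid-search, Solr returns whatever it has collected so
far and flags it with partialResults=true in the response header.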

On Mon, Oct 7, 2019 at 8:19 PM Wei  wrote:

> Hi,
>
> Recently we encountered a problem when solr cloud query latency suddenly
> increase, many simple queries that has small recall gets time out. After
> digging a bit I found that the root cause is some stats queries happen at
> the same time, such as
>
>
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
>
>
>
> I see unique_ids is a high cardinality field so this query is quite
> expensive. But why a small volume of such query blocks other queries and
> make simple queries time out?  I checked the solr thread pool and see there
> are plenty of idle threads available.  We are using solr 7.6.2 with a 10
> shard cloud set up.
>
> Is there a way to block certain solr queries based on url pattern? i.e.
> ignore the stats.calcdistinct request in this case.
>
>
> Thanks,
>
> Wei
>


-- 
Sincerely yours
Mikhail Khludnev


How to block expensive solr queries

2019-10-07 Thread Wei
Hi,

Recently we encountered a problem where SolrCloud query latency suddenly
increased and many simple queries with small recall timed out. After
digging a bit I found that the root cause is some stats queries running at
the same time, such as:

/solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true



I see unique_ids is a high-cardinality field, so this query is quite
expensive. But why does a small volume of such queries block other queries and
make simple queries time out?  I checked the Solr thread pool and see there
are plenty of idle threads available.  We are using Solr 7.6.2 with a 10-shard
cloud setup.

Is there a way to block certain Solr queries based on URL pattern, i.e.
ignore the stats.calcdistinct request in this case?


Thanks,

Wei


Re: OR and AND queries case sensitive in q param?

2019-09-13 Thread Paras Lehana
Hey Shawn,

Love your Solr articles! Just joined here.

The edismax query parser supports lowercase operators, if the
> lowercaseOperators parameter is set to true. I believe it defaults to
> false.


 To add - Yes, *lowercaseOperators* defaults to *false* as per the Solr Ref
Guide 8.1 and the GitHub source code. However, the Ref Guide 6.6 doesn't
specify the default value.

Arnold, additionally, I suggest you confirm your *mm* (minimum
match) parameter and your stop-word filter (which might have had 'or' as a stop
word), though I assume that you are using exactly the same query/schema.
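
For illustration, a small sketch of the difference (the field and terms are made up; everything else is left at its defaults):

/select?defType=edismax&df=author&q=rick or morty
    -> 'or' is treated as a plain search term
/select?defType=edismax&df=author&lowercaseOperators=true&q=rick or morty
    -> 'or' is treated as the OR operator

With the default lucene parser there is no lowercaseOperators switch at all, so only the uppercase form 'rick OR morty' acts as a boolean operator.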

On Fri, 13 Sep 2019 at 06:11, Shawn Heisey  wrote:

> On 9/12/2019 5:50 PM, Arnold Bronley wrote:
> > in Solr 6.3, I was able to use OR and AND operators in case insensitive
> > manner.
>
> The edismax query parser supports lowercase operators, if the
> lowercaseOperators parameter is set to true.  I believe it defaults to
> false.
>
> > Then if I pass 'rick OR morty' to q param then I would get both documents
> > back. I would get both documents back even if I pass 'rick or morty'.
> >
> > In Solr 8.2, I am not able to 'rick or morty' does not give any results
> > back. 'rick OR morty' gives both results back.
>
> The default (lucene) query parser does NOT support lowercase operators.
> It never has.
>
> > Is this intentional change?
>
> The change you may be experiencing is that I THINK at some point (no
> idea when) the default for lowercaseOperators (which only works with
> edismax) changed from true to false.
>
> Thanks,
> Shawn
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: OR and AND queries case sensitive in q param?

2019-09-12 Thread Shawn Heisey

On 9/12/2019 5:50 PM, Arnold Bronley wrote:

in Solr 6.3, I was able to use OR and AND operators in case insensitive
manner.


The edismax query parser supports lowercase operators, if the 
lowercaseOperators parameter is set to true.  I believe it defaults to 
false.



Then if I pass 'rick OR morty' to q param then I would get both documents
back. I would get both documents back even if I pass 'rick or morty'.

In Solr 8.2, I am not able to 'rick or morty' does not give any results
back. 'rick OR morty' gives both results back.


The default (lucene) query parser does NOT support lowercase operators. 
It never has.



Is this intentional change?


The change you may be experiencing is that I THINK at some point (no 
idea when) the default for lowercaseOperators (which only works with 
edismax) changed from true to false.


Thanks,
Shawn


OR and AND queries case sensitive in q param?

2019-09-12 Thread Arnold Bronley
Hi,

In Solr 6.3, I was able to use the OR and AND operators in a case-insensitive
manner.

E.g.
If I have two documents like following in my corpus:
document 1:
{
id:1
author:rick
}

document 2:
{
id:2
author:morty
}

Then if I pass 'rick OR morty' to q param then I would get both documents
back. I would get both documents back even if I pass 'rick or morty'.

In Solr 8.2, I am no longer able to do this: 'rick or morty' does not give any results
back, while 'rick OR morty' gives both results back.

Is this an intentional change?


Re: Block Join Queries parsed incorrectly

2019-09-09 Thread MUNENDRA S N
This change was done in Solr 7.2. SOLR-11501
<https://issues.apache.org/jira/browse/SOLR-11501> has the details about
the change.

Regards,
Munendra S N
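
As a hedged illustration of one workaround (not necessarily the fix discussed in SOLR-11501): since the change, edismax no longer switches to another parser based on local params embedded in q, so the block join can be routed through a parameter that names its parser explicitly, assuming defType is only a default in the handler and not an invariant. Field names here (content_type, comment) are placeholders:

/select?defType=lucene&q={!parent which="content_type:parent"}comment:SolrCloud

or, keeping defType=edismax for the user query and moving the block join into a filter:

/select?defType=edismax&q=some user query&fq={!parent which="content_type:parent"}comment:SolrCloud

Note that the fq variant only filters; it does not contribute to scoring.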



On Mon, Sep 9, 2019 at 1:16 PM Enna Raerinne (TAU) 
wrote:

> Hi!
>
> I've been using block join queries with Solr version 7.1 and with request
> handler where defType is edismax and everything has worked fine. I recently
> updated my Solr to 8.1 and updated the luceneMatchVersion also to 8.1 in
> solrconfig.xml. However, now my block join queries don't work anymore and
> when I debugged the issue it seems that my block join queries are not
> parsed correctly when edismax is used as default parser and if
> luceneMatchVersion is 8.1. I searched everywhere I could think of, but
> didn't find out why the parsing has been changed. Is it intentional or a
> bug or am I using Solr the wrong way?
>
> Thanks,
> Enna Raerinne
>


Block Join Queries parsed incorrectly

2019-09-09 Thread Enna Raerinne (TAU)
Hi!

I've been using block join queries with Solr version 7.1 and with request 
handler where defType is edismax and everything has worked fine. I recently 
updated my Solr to 8.1 and updated the luceneMatchVersion also to 8.1 in 
solrconfig.xml. However, now my block join queries don't work anymore and when 
I debugged the issue it seems that my block join queries are not parsed 
correctly when edismax is used as default parser and if luceneMatchVersion is 
8.1. I searched everywhere I could think of, but didn't find out why the 
parsing has been changed. Is it intentional or a bug or am I using Solr the 
wrong way?

Thanks,
Enna Raerinne


Re: Solr restricting time-consuming/heavy processing queries

2019-08-13 Thread Mark Robinson
Thank you Jan for the reply.
I will try it out.

Best,
Mark.

On Mon, Aug 12, 2019 at 6:29 PM Jan Høydahl  wrote:

> I have never used such settings, but you could check out
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#segmentterminateearly-parameter
> which will allow you to pre-sort the index so that any early termination
> will actually return the most relevant docs. This will probably be easier
> to setup once https://issues.apache.org/jira/browse/SOLR-13681 is done.
>
> According to that same page you will not be able to abort long-running
> faceting using timeAllowed, but there are other ways to optimize faceting,
> such as using jsonFacet, threaded execution etc.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 12. aug. 2019 kl. 23:10 skrev Mark Robinson :
>
> Hi Jan,
>
> Thanks for the reply.
> Our normal search times is within 650 ms.
> We were analyzing some queries and found that few of them were like 14675
> ms, 13767 ms etc...
> So was curious to see whether we have some way to restrict the query to
> not run beyond say 5s or some ideal timing  in SOLR even if it returns only
> partial results.
>
> That is how I came across the "timeAllowed" and wanted to check on it.
> Also was curious to know whether  "shardHandler"  could be used to work
> in those lines or it is meant for a totally different functionality.
>
> Thanks!
> Best,
> Mark
>
>
> On Sun, Aug 11, 2019 at 8:17 AM Jan Høydahl  wrote:
>
>> What is the root use case you are trying to solve? What kind of solr
>> install is this and do you not have control over the clients or what is the
>> reason that users overload your servers?
>>
>> Normally you would scale the cluster to handle normal expected load
>> instead of trying to give users timeout exceptions. What kind of query
>> times do you experience that are above 1s and are these not important
>> enough to invest extra HW? Trying to understand the real reason behind your
>> questions.
>>
>> Jan Høydahl
>>
>> > 11. aug. 2019 kl. 11:43 skrev Mark Robinson :
>> >
>> > Hello,
>> > Could someone share their thoughts please or point to some link that
>> helps
>> > understand my above queries?
>> > In the Solr documentation I came across a few lines on timeAllowed and
>> > shardHandler, but if there was an example scenario for both it would
>> help
>> > understand them more thoroughly.
>> > Also curious to know different ways if any n SOLR to restrict/ limit a
>> time
>> > consuming query from processing for a long time.
>> >
>> > Thanks!
>> > Mark
>> >
>> > On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
>> > wrote:
>> >
>> >>
>> >> Hello,
>> >> I have the following questions please:-
>> >>
>> >> In solrconfig.xml I created a new "/selecttimeout" handler copying
>> >> "/select" handler and added the following to my new "/selecttimeout":-
>> >>  <shardHandler>
>> >>    <socketTimeout>10</socketTimeout>
>> >>    <connTimeOut>20</connTimeOut>
>> >>  </shardHandler>
>> >>
>> >> 1.
>> >> Does the above mean that if I dont get a request once in 10ms on the
>> >> socket handling the /selecttimeout handler, that socket will be closed?
>> >>
>> >> 2.
>> >> Same with  connTimeOut? ie the connection  object remains live only if
>> at
>> >> least a connection request comes once in every 20 mS; if not the object
>> >> gets closed?
>> >>
>> >> Suppose a time consumeing query (say with lots of facets etc...), is
>> fired
>> >> against SOLR. How can I prevent Solr processing it for not more than
>> 1s?
>> >>
>> >> 3.
>> >> Is this achieved by setting timeAllowed=1000?  Or are there any other
>> ways
>> >> to do this in Solr?
>> >>
>> >> 4
>> >> For the same purpose to prevent heavy queries overloading SOLR, does
>> the
>> >>  above help in anyway or is it that shardHandler has
>> nothing
>> >> to restrict a query once fired against Solr?
>> >>
>> >>
>> >> Could someone pls share your views?
>> >>
>> >> Thanks!
>> >> Mark
>> >>
>>
>
>


Re: Solr restricting time-consuming/heavy processing queries

2019-08-12 Thread Mark Robinson
Hi Jan,

Thanks for the reply.
Our normal search times are within 650 ms.
We were analyzing some queries and found that a few of them took around 14675
ms, 13767 ms, etc.
So I was curious to see whether we have some way in Solr to restrict a query to not
run beyond, say, 5s or some other sensible limit, even if it returns only
partial results.

That is how I came across "timeAllowed" and wanted to check on it.
I was also curious to know whether "shardHandler" could be used along
those lines or whether it is meant for totally different functionality.

Thanks!
Best,
Mark


On Sun, Aug 11, 2019 at 8:17 AM Jan Høydahl  wrote:

> What is the root use case you are trying to solve? What kind of solr
> install is this and do you not have control over the clients or what is the
> reason that users overload your servers?
>
> Normally you would scale the cluster to handle normal expected load
> instead of trying to give users timeout exceptions. What kind of query
> times do you experience that are above 1s and are these not important
> enough to invest extra HW? Trying to understand the real reason behind your
> questions.
>
> Jan Høydahl
>
> > 11. aug. 2019 kl. 11:43 skrev Mark Robinson :
> >
> > Hello,
> > Could someone share their thoughts please or point to some link that
> helps
> > understand my above queries?
> > In the Solr documentation I came across a few lines on timeAllowed and
> > shardHandler, but if there was an example scenario for both it would help
> > understand them more thoroughly.
> > Also curious to know different ways if any n SOLR to restrict/ limit a
> time
> > consuming query from processing for a long time.
> >
> > Thanks!
> > Mark
> >
> > On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
> > wrote:
> >
> >>
> >> Hello,
> >> I have the following questions please:-
> >>
> >> In solrconfig.xml I created a new "/selecttimeout" handler copying
> >> "/select" handler and added the following to my new "/selecttimeout":-
> >>  <shardHandler>
> >>    <socketTimeout>10</socketTimeout>
> >>    <connTimeOut>20</connTimeOut>
> >>  </shardHandler>
> >>
> >> 1.
> >> Does the above mean that if I dont get a request once in 10ms on the
> >> socket handling the /selecttimeout handler, that socket will be closed?
> >>
> >> 2.
> >> Same with  connTimeOut? ie the connection  object remains live only if
> at
> >> least a connection request comes once in every 20 mS; if not the object
> >> gets closed?
> >>
> >> Suppose a time consumeing query (say with lots of facets etc...), is
> fired
> >> against SOLR. How can I prevent Solr processing it for not more than 1s?
> >>
> >> 3.
> >> Is this achieved by setting timeAllowed=1000?  Or are there any other
> ways
> >> to do this in Solr?
> >>
> >> 4
> >> For the same purpose to prevent heavy queries overloading SOLR, does the
> >> <shardHandler> above help in anyway or is it that shardHandler has
> nothing
> >> to restrict a query once fired against Solr?
> >>
> >>
> >> Could someone pls share your views?
> >>
> >> Thanks!
> >> Mark
> >>
>


Re: Solr restricting time-consuming/heavy processing queries

2019-08-11 Thread Jan Høydahl
What is the root use case you are trying to solve? What kind of solr install is 
this and do you not have control over the clients or what is the reason that 
users overload your servers?

Normally you would scale the cluster to handle normal expected load instead of 
trying to give users timeout exceptions. What kind of query times do you 
experience that are above 1s and are these not important enough to invest extra 
HW? Trying to understand the real reason behind your questions.

Jan Høydahl

> 11. aug. 2019 kl. 11:43 skrev Mark Robinson :
> 
> Hello,
> Could someone share their thoughts please or point to some link that helps
> understand my above queries?
> In the Solr documentation I came across a few lines on timeAllowed and
> shardHandler, but if there was an example scenario for both it would help
> understand them more thoroughly.
> Also curious to know different ways if any n SOLR to restrict/ limit a time
> consuming query from processing for a long time.
> 
> Thanks!
> Mark
> 
> On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
> wrote:
> 
>> 
>> Hello,
>> I have the following questions please:-
>> 
>> In solrconfig.xml I created a new "/selecttimeout" handler copying
>> "/select" handler and added the following to my new "/selecttimeout":-
>>  <shardHandler>
>>    <socketTimeout>10</socketTimeout>
>>    <connTimeOut>20</connTimeOut>
>>  </shardHandler>
>> 
>> 1.
>> Does the above mean that if I dont get a request once in 10ms on the
>> socket handling the /selecttimeout handler, that socket will be closed?
>> 
>> 2.
>> Same with  connTimeOut? ie the connection  object remains live only if at
>> least a connection request comes once in every 20 mS; if not the object
>> gets closed?
>> 
>> Suppose a time consumeing query (say with lots of facets etc...), is fired
>> against SOLR. How can I prevent Solr processing it for not more than 1s?
>> 
>> 3.
>> Is this achieved by setting timeAllowed=1000?  Or are there any other ways
>> to do this in Solr?
>> 
>> 4
>> For the same purpose to prevent heavy queries overloading SOLR, does the
>> <shardHandler> above help in anyway or is it that shardHandler has nothing
>> to restrict a query once fired against Solr?
>> 
>> 
>> Could someone pls share your views?
>> 
>> Thanks!
>> Mark
>> 


Re: Solr restricting time-consuming/heavy processing queries

2019-08-11 Thread Mark Robinson
Hello,
Could someone share their thoughts please or point to some link that helps
understand my above queries?
In the Solr documentation I came across a few lines on timeAllowed and
shardHandler, but if there was an example scenario for both it would help
understand them more thoroughly.
Also curious to know about different ways, if any, in Solr to restrict/limit a
time-consuming query from processing for a long time.

Thanks!
Mark

On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
wrote:

>
> Hello,
> I have the following questions please:-
>
> In solrconfig.xml I created a new "/selecttimeout" handler copying
> "/select" handler and added the following to my new "/selecttimeout":-
>   <shardHandler>
>     <socketTimeout>10</socketTimeout>
>     <connTimeOut>20</connTimeOut>
>   </shardHandler>
>
> 1.
> Does the above mean that if I dont get a request once in 10ms on the
> socket handling the /selecttimeout handler, that socket will be closed?
>
> 2.
> Same with  connTimeOut? ie the connection  object remains live only if at
> least a connection request comes once in every 20 mS; if not the object
> gets closed?
>
> Suppose a time consumeing query (say with lots of facets etc...), is fired
> against SOLR. How can I prevent Solr processing it for not more than 1s?
>
> 3.
> Is this achieved by setting timeAllowed=1000?  Or are there any other ways
> to do this in Solr?
>
> 4
> For the same purpose to prevent heavy queries overloading SOLR, does the
> <shardHandler> above help in anyway or is it that shardHandler has nothing
> to restrict a query once fired against Solr?
>
>
> Could someone pls share your views?
>
> Thanks!
> Mark
>


Solr restricting time-consuming/heavy processing queries

2019-08-09 Thread Mark Robinson
Hello,
I have the following questions please:-

In solrconfig.xml I created a new "/selecttimeout" handler copying
"/select" handler and added the following to my new "/selecttimeout":-
  
10
20
  

1.
Does the above mean that if I don't get a request within 10 ms on the socket
handling the /selecttimeout handler, that socket will be closed?

2.
Same with connTimeOut? i.e. the connection object remains live only if at
least one connection request comes in every 20 ms; if not, the object
gets closed?

Suppose a time-consuming query (say with lots of facets, etc.) is fired
against Solr. How can I prevent Solr from processing it for more than 1s?

3.
Is this achieved by setting timeAllowed=1000?  Or are there any other ways
to do this in Solr?

4
For the same purpose of preventing heavy queries from overloading Solr, does the
<shardHandler> config above help in any way, or does shardHandler have nothing
to restrict a query once it is fired against Solr?


Could someone pls share your views?

Thanks!
Mark
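
For question 3, a sketch of what a handler-level default could look like, assuming the copied /selecttimeout handler is a normal solr.SearchHandler (the 1000 ms figure is just the value from the question, and the documented shardHandlerFactory form is used for the timeouts):

<requestHandler name="/selecttimeout" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- stop collecting results after ~1s and return partial results -->
    <int name="timeAllowed">1000</int>
  </lst>
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <!-- these govern the HTTP connections Solr opens to other shards for
         distributed requests, not how long a query is allowed to execute -->
    <int name="socketTimeout">10</int>
    <int name="connTimeout">20</int>
  </shardHandlerFactory>
</requestHandler>

Note that timeAllowed bounds the main search phases but does not necessarily cut off every stage (some faceting work can still run past the limit).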


Re: Question regarding negated block join queries

2019-06-17 Thread Erick Erickson
Bram:

Here’s a fuller explanation that you might be interested in:

https://lucidworks.com/2011/12/28/why-not-and-or-and-not/

Best,
Erick

> On Jun 17, 2019, at 11:32 AM, Bram Biesbrouck 
>  wrote:
> 
> On Mon, Jun 17, 2019 at 7:11 PM Shawn Heisey  wrote:
> 
>> On 6/17/2019 4:46 AM, Bram Biesbrouck wrote:
>>> q={!parent which=-(parentUri:*)}*:*
>> 
>> Pure negative queries do not work in Lucene.  Sometimes, when you do a
>> single-clause negative query, Solr is able to detect the problem and
>> automatically make an adjustment so the query works.  This happens
>> transparently so you never notice.
>> 
>> In essence, what your negative query tells Lucene is "start with
>> nothing, and then subtract docs that match this query."  Since you
>> started with nothing and then subtracted, you get nothing.
>> 
>> Also, that's a wilcard query.  Which could be very slow if the possible
>> number of values in parentUri is more than a few.  If that field can
>> only contain a very small number of values, then a wildcard query might
>> be fast.
>> 
>> The following query solves both problems -- starting with all docs and
>> then subtracting things that match the query clause after that:
>> 
>> *:* -parentUri:[* TO *]
>> 
>> This will return all documents that do not have the parentUri field
>> defined.  The [* TO *] syntax is an all-inclusive range query.
>> 
> 
> Hi Shawn,
> 
> Awesome elaborate explanation, thank you. Also thanks for the optimization
> hint. I found both approaches online, but didn't realize there was a
> performance difference .
> Digging deeper, I've found this SO post, basically explaining why it worked
> some of the time, but not in all cases:
> https://stackoverflow.com/questions/10651548/negation-in-solr-query
> 
> best,
> 
> b.



Re: Question regarding negated block join queries

2019-06-17 Thread Bram Biesbrouck
On Mon, Jun 17, 2019 at 7:11 PM Shawn Heisey  wrote:

> On 6/17/2019 4:46 AM, Bram Biesbrouck wrote:
> > q={!parent which=-(parentUri:*)}*:*
>
> Pure negative queries do not work in Lucene.  Sometimes, when you do a
> single-clause negative query, Solr is able to detect the problem and
> automatically make an adjustment so the query works.  This happens
> transparently so you never notice.
>
> In essence, what your negative query tells Lucene is "start with
> nothing, and then subtract docs that match this query."  Since you
> started with nothing and then subtracted, you get nothing.
>
> Also, that's a wilcard query.  Which could be very slow if the possible
> number of values in parentUri is more than a few.  If that field can
> only contain a very small number of values, then a wildcard query might
> be fast.
>
> The following query solves both problems -- starting with all docs and
> then subtracting things that match the query clause after that:
>
> *:* -parentUri:[* TO *]
>
> This will return all documents that do not have the parentUri field
> defined.  The [* TO *] syntax is an all-inclusive range query.
>

Hi Shawn,

Awesome elaborate explanation, thank you. Also thanks for the optimization
hint. I found both approaches online, but didn't realize there was a
performance difference.
Digging deeper, I've found this SO post, basically explaining why it worked
some of the time, but not in all cases:
https://stackoverflow.com/questions/10651548/negation-in-solr-query

best,

b.


Re: Question regarding negated block join queries

2019-06-17 Thread Shawn Heisey

On 6/17/2019 4:46 AM, Bram Biesbrouck wrote:

q={!parent which=-(parentUri:*)}*:*


Pure negative queries do not work in Lucene.  Sometimes, when you do a 
single-clause negative query, Solr is able to detect the problem and 
automatically make an adjustment so the query works.  This happens 
transparently so you never notice.


In essence, what your negative query tells Lucene is "start with 
nothing, and then subtract docs that match this query."  Since you 
started with nothing and then subtracted, you get nothing.


Also, that's a wildcard query, which could be very slow if the possible 
number of values in parentUri is more than a few.  If that field can 
only contain a very small number of values, then a wildcard query might 
be fast.


The following query solves both problems -- starting with all docs and 
then subtracting things that match the query clause after that:


*:* -parentUri:[* TO *]

This will return all documents that do not have the parentUri field 
defined.  The [* TO *] syntax is an all-inclusive range query.
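
Applied to the block join filter from this thread, a sketch (childfield:value is only a placeholder for a query that matches child documents):

q={!parent which="*:* -parentUri:[* TO *]"}childfield:value

Here the which clause starts from all documents and subtracts those that have a parentUri, so it matches exactly the parent documents, whereas which=-(parentUri:*) starts from nothing and therefore matches no parents at all.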


Thanks,
Shawn


Question regarding negated block join queries

2019-06-17 Thread Bram Biesbrouck
Dear all,

I'm new to this list, so let me introduce myself. I'm Bram, author of a
linked data framework called Stralo. We're working toward version 1.0, in
which we're integrating Solr indexing and querying of RDF triples (
https://github.com/republic-of-reinvention/com.stralo.framework/milestone/3)

I'm running into inconsistent results regarding block join queries and I
wondered if any of you could help me out. We're indexing our parent-child
relationships using a field called "parentUri". The field contains the URI
(the id of the document) of the parent document, and is simply omitted when the
document itself is a parent.

Here's an example of a child document:

{
"language":"en",
"resource":"/resource/1130494009577889453",
"parentUri":"/en/blah",
"uri":"/resource/1130494009577889453",
"label":"Label of the object",
"description":"Example of some sub text",
"typeOf":"ror:Page",
"rdf:type":["ror:Page"],
"rdfs:label":["Label of the object"],
"ror:text":["Example of some sub text"],
"ror:testNumber":[4],
"ror:testDate":["2019-05-10T00:00:00Z"],
"_version_":1636582287436939264
}

(Please ignore the CURIE syntax we're using as field names. We know it's
slightly illegal in Solr, but it works just fine and it makes our lives
indexing triples so much more convenient.)

Here's it's parent document:

{
"language":"en",
"resource":"/resource/1106177060466942658",
"uri":"/en/blah",
"label":"rdfs label test 3",
"description":"Hi, we are the Republic \nwe do video
technology",
"typeOf":"ror:BlogPost",
"rdf:type":["ror:BlogPost"],
"rdfs:label":["rdfs label test 3"],
"meta:created":["2019-04-04T09:08:35.736Z"],
"meta:creator":["/users/2"],
"meta:modified":["2019-06-17T10:14:54.134Z"],
"meta:contributor":["/users/2",
  "/users/1"],
"ror:testEditor":["Blah, dit is inhoud van test editor"],
"ror:testEnum":["af"],
"ror:testDate":["2019-05-31T00:00:00Z"],
"ror:testResource":["/resource/Page/800895161299715471"],
"ror:testObject":["/resource/1130494009577889453"],
"ror:text":["Hi, we are the Republic we do video technology"],
"_version_":1636582287436939264
}

As said, we're struggling with block joins, because we don't have a clear
field that contains "this" for parent documents and "that" for child
documents. Instead, it's omitted for parent documents. So, to fire a block
join child query, we use this approach (just an example):

q={!parent which=-(parentUri:*)}*:*

What we expect is that the allParents filter selects all those documents
where the "parentUri" field doesn't exist, using a negated wildcard query
(which works just fine when used alone). The someParents filter just
selects everything since this is an example. Alas, this doesn't yield any
results.

Since the docs say:
When subordinate clause (<someParents>) is omitted, it's parsed as a
segmented and cached filter for children documents. More precisely,
q={!child of=<allParents>} is equivalent to q=*:* -<allParents>.

I tried to run this query (assuming a double negation becomes a plus):

*:* +(parentUri:*)

And this yields correct results, so I'm assuming it's possible, but I'm
overlooking something in my block join children query syntax.

Could anyone point me in the right direction for using block join queries with
non-existent or existent fields?

all the best,

b.


How to migrate the queries having core-across join and json.facet to SolrCloud

2019-05-27 Thread Yasufumi Mizoguchi
Hi, community.

We are trying to migrate from a single Solr instance to SolrCloud with Solr
7.4.0 due to the increase in documents.
We have some join queries running on the current Solr, and need to migrate these
because join queries have some restrictions when running on SolrCloud.
(We cannot use custom document routing and do not plan to combine the cores.)

Almost all join queries have been migrated to Streaming Expressions, but
I am struggling to migrate queries that have both a cross-core join and lots of
json.facet parameters...

So, I want to know how to migrate queries having both a cross-core join and
multiple json.facet params to something that can run on SolrCloud.

query example)
http://localhost:8983/solr/books/select?q=*:*&fq={!join from=id
to=author_id fromIndex=authors
v=name:Cheng}&json.facet={"category":{"sort":"index","facet_count":"unique(author_id)"},"field":"category","limit":10,"offset":0,"mincount":1,"type":"terms",
(Similar json.facet params)
}

2 cores(books & authors) are both split into 3 shards.

Any insight would be greatly appreciated.

Thanks,
Yasufumi.
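
For what it is worth, a rough sketch of how the join half of that query could be expressed as a Streaming Expression (collection, field, and sort choices follow the example above; qt="/export" assumes the exported fields have docValues, and both sides must be sorted on their join keys):

innerJoin(
  search(books, q="*:*", fl="id,author_id,category", sort="author_id asc", qt="/export"),
  search(authors, q="name:Cheng", fl="id", sort="id asc", qt="/export"),
  on="author_id=id"
)

This only reproduces the fq={!join ...} part; the json.facet aggregations would still have to be layered on top (for example with rollup() over the joined stream), and whether it can match the single-core performance depends heavily on result sizes, so treat it as a starting point rather than a drop-in replacement.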

