Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
Thanks Yonik. A JIRA bug is opened: https://issues.apache.org/jira/browse/SOLR-8251

Wei

On Fri, Nov 6, 2015 at 7:10 PM, Yonik Seeley wrote:

> On Fri, Nov 6, 2015 at 9:56 PM, wei wrote:
> > Good point! I tried that, on solr5 the query time is around 100-110ms, and
> > on solr4 it is around 60-63ms(very consistent). Solr5 is slower.
>
> When it's something easy, there comes a point when it makes sense to
> stop asking more questions and just try it yourself...
> I just did this, and can confirm what you're seeing. For me, 5.3.1
> is about 5x slower than 4.10 for this particular query.
> Thanks for your persistence / patience in reporting this. Could you
> open a JIRA issue for it?
>
> -Yonik
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
On Fri, Nov 6, 2015 at 9:56 PM, wei wrote:
> Good point! I tried that, on solr5 the query time is around 100-110ms, and
> on solr4 it is around 60-63ms(very consistent). Solr5 is slower.

When it's something easy, there comes a point when it makes sense to
stop asking more questions and just try it yourself...
I just did this, and can confirm what you're seeing. For me, 5.3.1
is about 5x slower than 4.10 for this particular query.

Thanks for your persistence / patience in reporting this. Could you
open a JIRA issue for it?

-Yonik
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
Good point! I tried that, on solr5 the query time is around 100-110ms, and on solr4 it is around 60-63ms (very consistent). Solr5 is slower.

Thanks,
Wei

On Fri, Nov 6, 2015 at 6:46 PM, Yonik Seeley wrote:

> On Fri, Nov 6, 2015 at 9:30 PM, wei wrote:
> > in solr 5.3.1, there is actually a boost, and the score is product of boost
> > & queryNorm.
>
> Hmmm, well, it's worth putting on the list of stuff to investigate.
> Boosting was also changed in lucene.
>
> What happens if you try this multiple times in a row?
>
> &rows=2&fl=id&q={!cache=false}*:*&fq=categoryIdsPath:1001
>
> (basically just add {!cache=false} as a prefix to the main query.)
>
> This would allow hotspot time to compile methods, and ensure that the
> filter query was cached, and do a better job of isolating the
> "filtered match-all-docs" part of the execution.
>
> -Yonik
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
On Fri, Nov 6, 2015 at 9:30 PM, wei wrote:
> in solr 5.3.1, there is actually a boost, and the score is product of boost
> & queryNorm.

Hmmm, well, it's worth putting on the list of stuff to investigate.
Boosting was also changed in lucene.

What happens if you try this multiple times in a row?

&rows=2&fl=id&q={!cache=false}*:*&fq=categoryIdsPath:1001

(basically just add {!cache=false} as a prefix to the main query.)

This would allow hotspot time to compile methods, and ensure that the
filter query was cached, and do a better job of isolating the
"filtered match-all-docs" part of the execution.

-Yonik
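For anyone who wants to repeat this measurement, here is a minimal Python sketch of the "try it multiple times in a row" idea. It is only an illustration: the helper names (build_select_url, time_repeated), the base URL and the run counts are assumptions, not anything from the thread. The request parameters are the ones quoted above.

```python
import statistics
import time
from urllib.parse import urlencode


def build_select_url(base_url, fq, rows=2):
    """Build a /select URL for an uncached match-all query with one filter.

    base_url is a hypothetical core URL such as
    http://localhost:8983/solr/collection1 (an assumption for illustration).
    """
    params = [
        ("q", "{!cache=false}*:*"),  # ask Solr not to cache the main query's result
        ("fq", fq),                   # the filter itself stays cacheable
        ("fl", "id"),
        ("rows", str(rows)),
    ]
    return base_url + "/select?" + urlencode(params)


def time_repeated(fetch, url, runs=20, discard=5):
    """Call fetch(url) `runs` times and return the median wall time in ms.

    The first `discard` samples are dropped so that JIT compilation and
    filter-cache population do not dominate the result.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples[discard:])
```

The fetch argument can be any callable that issues the request, for example lambda u: urllib.request.urlopen(u).read(). Running the same function against both Solr versions gives two comparable medians. Wall time measured at the client includes network overhead, so the QTime reported in each response is still worth recording alongside it.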
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
Hi Shawn,

I took care of the warm-up problem during the test. I set up a jmeter project, got a query log from our production (>10 queries), and ran the same query log through jmeter to hit the solr instances with the same qps (about 40). I removed the warm-up queries in both solr setups, and also set the autowarming of the caches to 0 in the solrconfig. I ran the test for 1 hour. These two instances are not serving other query traffic, but they both get update traffic. I disabled soft commit in solr5 and set the hard commit to 2 minutes. The solr4 instance is a slave node replicating from a solr4 master instance; the master also has a 2 minute commit cycle, and the solr4 instance under test replicates the index every 2 minutes.

Solr5 is slower than solr4. After some investigation I realized that the queries containing q=*:* seem to be causing the problem. I split the query log into two log files, one with q=*:* and another without (almost all our queries have filter queries). When I run the test, solr5 is faster when running the queries with a query keyword, but is much slower when running the "q=*:*" query log. There is no other query traffic to either instance (there is index traffic). When I got the query debug log in my first email, I made sure there was no filter cache (verified through the solr console; after a hard commit, the filterCache is cleaned).

I hope this email addresses your concern about how I did the test. What is obvious to me is that solr5 is faster in one test (with query keyword) and slower in the other test (without query keyword).

Thanks,
Wei

On Fri, Nov 6, 2015 at 1:41 PM, Shawn Heisey wrote:

> On 11/6/2015 1:01 PM, wei wrote:
> > Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if
> > the slowness of MatchAllDocsQuery is also caused by the removal of
> > fieldcache. Can someone please explain a little bit?
>
> I only glanced at your full output in the message at the start of this
> thread. I thought I saw facet output in it, but it turns out that the
> only mention of facets was the timing information from the debug, so
> that very likely rules out the FieldCache change as a culprit.
>
> I am suspecting that the 4.7 index is warmed better, and may have the
> specific filter query (categoryIdsPath:1001) already sitting in the
> filterCache.
>
> Try running that query a few times on both versions, then restart
> Solr on both versions so they both start clean, and run the query *once*
> on each system, and see whether there's still a large discrepancy.
>
> If one of the systems is receiving queries from active clients and the
> other is not, then the comparison will be unfair, and biased towards the
> one that is getting additional queries. Query activity, even if it
> seems unrelated to the query you are testing, has a tendency to reduce
> overall qtime values.
>
> Thanks,
> Shawn
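The log-splitting step described in this message can be sketched roughly as follows. This is a hypothetical Python reconstruction, not Wei's actual tooling: it assumes each log line is a URL query string (for example q=*:*&fq=categoryIdsPath:1001), and the function names are made up for illustration.

```python
from urllib.parse import parse_qs


def is_match_all(query_string):
    """Return True when the request's main query (q) is exactly *:*."""
    # parse_qs also decodes percent-encoded values such as q=%2A%3A%2A.
    params = parse_qs(query_string.lstrip("?"), keep_blank_values=True)
    return params.get("q", [""])[0].strip() == "*:*"


def split_query_log(lines):
    """Partition logged query strings into (match_all, keyword) lists."""
    match_all, keyword = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the log
        (match_all if is_match_all(line) else keyword).append(line)
    return match_all, keyword
```

Replaying the two resulting lists separately, at the same qps, is what separates the "q=*:* plus filter" behaviour from the keyword-query behaviour.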
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
The explain parts are different in solr 4.7 and solr 5.3.1. In solr 4.7, there is only one line:

1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

In solr 5.3.1, there is actually a boost, and the score is the product of boost & queryNorm. Can that cause the problem, if solr5 needs to calculate the product for all the hits? I am not sure where the boost comes from, and why it is different from solr 4.7.

1.0 = *:*, product of:
  1.0 = boost
  1.0 = queryNorm
1.0 = *:*, product of:
  1.0 = boost
  1.0 = queryNorm
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
Hi Jack,

I also ran the test with queries that have query terms (with filters too). Solr5 is faster compared to solr4 in that test. I got the query set from our production log, and almost all of our queries have a filter. So that suggests to me that it is not the filter query that is slow.

I copied the fq query to the q field (I did not remove the fq though); solr5 is slightly faster than solr4 for that query.

solr4:

status: 0
QTime: 64
params: id, 0, +categoryIdsPath:1001, true, +categoryIdsPath:1001, 2
docs (id): 36652255, 36651884
rawquerystring: +categoryIdsPath:1001
querystring: +categoryIdsPath:1001
parsedquery: +categoryIdsPath:1001
parsedquery_toString: +categoryIdsPath:1001
explain:
  20.451632 = (MATCH) weight(categoryIdsPath:1001 in 19) [], result of:
    20.451632 = score(doc=19,freq=1.0 = termFreq=1.0 ), product of:
      4.522348 = queryWeight, product of:
        4.522348 = idf(docFreq=610392, maxDocs=20670250)
        1.0 = queryNorm
      4.522348 = fieldWeight in 19, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        4.522348 = idf(docFreq=610392, maxDocs=20670250)
        1.0 = fieldNorm(doc=19)
  20.451632 = (MATCH) weight(categoryIdsPath:1001 in 44) [], result of:
    20.451632 = score(doc=44,freq=1.0 = termFreq=1.0 ), product of:
      4.522348 = queryWeight, product of:
        4.522348 = idf(docFreq=610392, maxDocs=20670250)
        1.0 = queryNorm
      4.522348 = fieldWeight in 44, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        4.522348 = idf(docFreq=610392, maxDocs=20670250)
        1.0 = fieldNorm(doc=44)
QParser: LuceneQParser
filter_queries: +categoryIdsPath:1001
parsed_filter_queries: +categoryIdsPath:1001
timing: 63.0
  prepare: 3.0 (components: 3.0 0.0 0.0 0.0 0.0 0.0)
  process: 60.0 (components: 57.0 0.0 0.0 0.0 0.0 3.0)

solr5:

status: 0
QTime: 51
params: id, 0, +categoryIdsPath:1001, true, +categoryIdsPath:1001, 2
docs (id): 36652255, 36651884
rawquerystring: +categoryIdsPath:1001
querystring: +categoryIdsPath:1001
parsedquery: +categoryIdsPath:1001
parsedquery_toString: +categoryIdsPath:1001
explain:
  20.420362 = weight(categoryIdsPath:1001 in 20) [], result of:
    20.420362 = score(doc=20,freq=1.0), product of:
      4.5188894 = queryWeight, product of:
        4.5188894 = idf(docFreq=602005, maxDocs=20315855)
        1.0 = queryNorm
      4.5188894 = fieldWeight in 20, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        4.5188894 = idf(docFreq=602005, maxDocs=20315855)
        1.0 = fieldNorm(doc=20)
  20.420362 = weight(categoryIdsPath:1001 in 49) [], result of:
    20.420362 = score(doc=49,freq=1.0), product of:
      4.5188894 = queryWeight, product of:
        4.5188894 = idf(docFreq=602005, maxDocs=20315855)
        1.0 = queryNorm
      4.5188894 = fieldWeight in 49, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        4.5188894 = idf(docFreq=602005, maxDocs=20315855)
        1.0 = fieldNorm(doc=49)
QParser: LuceneQParser
filter_queries: +categoryIdsPath:1001
parsed_filter_queries: +categoryIdsPath:1001
timing: 51.0
  prepare: 1.0 (components: 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0)
  process: 50.0 (components: 48.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0)

On Fri, Nov 6, 2015 at 12:12 PM, Jack Krupansky wrote:

> Just to be clear, I was suggesting that the filter query (fq) was slow, not
> the MatchAllDocsQuery, which should be just as speedy as before. You can
> test for yourself whether the MADQ by itself is any slower.
>
> You could also test using the fq as the main query (q) - with no fq
> parameter, and see if that is a
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
On Fri, Nov 6, 2015 at 3:12 PM, Jack Krupansky wrote:
> Just to be clear, I was suggesting that the filter query (fq) was slow

That's a possibility. Filters were actually removed in Lucene, so it's a very different code path now.

In 4.10, filters were first class, and SolrIndexSearcher used methods like:

    search(query, pf.filter, collector);

And BitSet based filters were pushed down to the leaves of a query (which the filter generated from MatchAllDocsQuery would have been).

At some point, those were changed to use FilteredQuery instead. But I think at some point prior Lucene converted a Filter to a FilteredQuery, so that change in Solr may not have mattered at that point.

Then in LUCENE-6583, Filters were removed and the code in SolrIndexSearcher was changed to use a BooleanQuery:

    if (pf.filter != null) {
      Query query = new BooleanQuery.Builder()
          .add(main, Occur.MUST)
          .add(pf.filter, Occur.FILTER)
          .build();
      search(query, collector);

So... lots of changes over time, no idea which (if any) is the cause.

-Yonik
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
On 11/6/2015 1:01 PM, wei wrote:
> Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if
> the slowness of MatchAllDocsQuery is also caused by the removal of
> fieldcache. Can someone please explain a little bit?

I only glanced at your full output in the message at the start of this thread. I thought I saw facet output in it, but it turns out that the only mention of facets was the timing information from the debug, so that very likely rules out the FieldCache change as a culprit.

I am suspecting that the 4.7 index is warmed better, and may have the specific filter query (categoryIdsPath:1001) already sitting in the filterCache.

Try running that query a few times on both versions, then restart Solr on both versions so they both start clean, and run the query *once* on each system, and see whether there's still a large discrepancy.

If one of the systems is receiving queries from active clients and the other is not, then the comparison will be unfair, and biased towards the one that is getting additional queries. Query activity, even if it seems unrelated to the query you are testing, has a tendency to reduce overall qtime values.

Thanks,
Shawn
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
Just to be clear, I was suggesting that the filter query (fq) was slow, not the MatchAllDocsQuery, which should be just as speedy as before. You can test for yourself whether the MADQ by itself is any slower.

You could also test using the fq as the main query (q) - with no fq parameter, and see if that is a lot faster, both with old and new Solr.

-- Jack Krupansky

On Fri, Nov 6, 2015 at 3:01 PM, wei wrote:

> Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if
> the slowness of MatchAllDocsQuery is also caused by the removal of
> fieldcache. Can someone please explain a little bit?
>
> Thanks,
> Wei
>
> On Fri, Nov 6, 2015 at 7:15 AM, Shawn Heisey wrote:
>
> > On 11/5/2015 10:25 PM, Jack Krupansky wrote:
> > > I vaguely recall some discussion concerning removal of the field cache in
> > > Lucene.
> >
> > The FieldCache wasn't exactly *removed* ... it's more like it was
> > renamed, improved, and sort of hidden in a miscellaneous package. Some
> > things still require this functionality, so they use the hidden class
> > instead, which was changed to use the DocValues API.
> >
> > https://issues.apache.org/jira/browse/LUCENE-5666
> >
> > I am not qualified to discuss LUCENE-5666 beyond what I wrote in the
> > paragraph above, and it's possible that some of what I said is wrong
> > because I do not really understand the APIs involved.
> >
> > The change has caused problems for Solr. End result from Solr's
> > perspective: Certain things which used to work perfectly fine (mostly
> > facets and grouping) in Solr 4.x have one of two problems in 5.x:
> > Either they don't work at all, or performance has gone way down. Some
> > of these problems are documented in Jira. These are the issues I know
> > about:
> >
> > https://issues.apache.org/jira/browse/SOLR-8088
> > https://issues.apache.org/jira/browse/SOLR-7495
> > https://issues.apache.org/jira/browse/SOLR-8096
> >
> > For fields where adding docValues is a viable option (most field types
> > other than solr.TextField), adding docValues and reindexing is very
> > likely to solve those problems.
> >
> > Sometimes adding docValues won't work, either because the field type
> > doesn't allow it, or because it's the indexed terms that are needed, not
> > the original field value. For those situations, there is currently no
> > solution.
> >
> > Thanks,
> > Shawn
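The isolation tests suggested at the top of this message (MatchAllDocsQuery by itself, and the fq used as the main query with no fq parameter) can be written down as three request variants. The following Python sketch only builds the query strings. The function names and the fl/rows defaults are assumptions for illustration, and the filter value is the one used elsewhere in the thread.

```python
from urllib.parse import urlencode


def isolation_variants(fq, rows=2):
    """Return labelled parameter lists that isolate match-all from the filter."""
    common = [("fl", "id"), ("rows", str(rows))]
    return {
        # The production-style request: match-all restricted by a filter.
        "match_all_with_fq": [("q", "*:*"), ("fq", fq)] + common,
        # MatchAllDocsQuery by itself, no filter at all.
        "match_all_only": [("q", "*:*")] + common,
        # The filter expression promoted to the main query, no fq parameter.
        "fq_as_main_query": [("q", fq)] + common,
    }


def to_query_strings(variants):
    """Encode each variant as a URL query string."""
    return {name: urlencode(params) for name, params in variants.items()}
```

Timing each variant on both Solr versions shows whether the slowdown follows the match-all query, the filter, or only the combination of the two.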
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if the slowness of MatchAllDocsQuery is also caused by the removal of fieldcache. Can someone please explain a little bit?

Thanks,
Wei

On Fri, Nov 6, 2015 at 7:15 AM, Shawn Heisey wrote:

> On 11/5/2015 10:25 PM, Jack Krupansky wrote:
> > I vaguely recall some discussion concerning removal of the field cache in
> > Lucene.
>
> The FieldCache wasn't exactly *removed* ... it's more like it was
> renamed, improved, and sort of hidden in a miscellaneous package. Some
> things still require this functionality, so they use the hidden class
> instead, which was changed to use the DocValues API.
>
> https://issues.apache.org/jira/browse/LUCENE-5666
>
> I am not qualified to discuss LUCENE-5666 beyond what I wrote in the
> paragraph above, and it's possible that some of what I said is wrong
> because I do not really understand the APIs involved.
>
> The change has caused problems for Solr. End result from Solr's
> perspective: Certain things which used to work perfectly fine (mostly
> facets and grouping) in Solr 4.x have one of two problems in 5.x:
> Either they don't work at all, or performance has gone way down. Some
> of these problems are documented in Jira. These are the issues I know
> about:
>
> https://issues.apache.org/jira/browse/SOLR-8088
> https://issues.apache.org/jira/browse/SOLR-7495
> https://issues.apache.org/jira/browse/SOLR-8096
>
> For fields where adding docValues is a viable option (most field types
> other than solr.TextField), adding docValues and reindexing is very
> likely to solve those problems.
>
> Sometimes adding docValues won't work, either because the field type
> doesn't allow it, or because it's the indexed terms that are needed, not
> the original field value. For those situations, there is currently no
> solution.
>
> Thanks,
> Shawn
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
On 11/5/2015 10:25 PM, Jack Krupansky wrote:
> I vaguely recall some discussion concerning removal of the field cache in
> Lucene.

The FieldCache wasn't exactly *removed* ... it's more like it was renamed, improved, and sort of hidden in a miscellaneous package. Some things still require this functionality, so they use the hidden class instead, which was changed to use the DocValues API.

https://issues.apache.org/jira/browse/LUCENE-5666

I am not qualified to discuss LUCENE-5666 beyond what I wrote in the paragraph above, and it's possible that some of what I said is wrong because I do not really understand the APIs involved.

The change has caused problems for Solr. End result from Solr's perspective: Certain things which used to work perfectly fine (mostly facets and grouping) in Solr 4.x have one of two problems in 5.x: Either they don't work at all, or performance has gone way down. Some of these problems are documented in Jira. These are the issues I know about:

https://issues.apache.org/jira/browse/SOLR-8088
https://issues.apache.org/jira/browse/SOLR-7495
https://issues.apache.org/jira/browse/SOLR-8096

For fields where adding docValues is a viable option (most field types other than solr.TextField), adding docValues and reindexing is very likely to solve those problems.

Sometimes adding docValues won't work, either because the field type doesn't allow it, or because it's the indexed terms that are needed, not the original field value. For those situations, there is currently no solution.

Thanks,
Shawn
Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7
I vaguely recall some discussion concerning removal of the field cache in Lucene.

-- Jack Krupansky

On Thu, Nov 5, 2015 at 10:38 PM, wei wrote:

> We are running our search on solr4.7 and I am evaluating whether to upgrade
> to solr5.3.1. I found MatchAllDocsQuery is much slower in solr5.3.1. Anyone
> know why?
>
> We have a lot of queries without any query keyword, but we apply filters on
> the query. Load testing shows those queries are much slower in solr5.3.1
> compare to 4.7. If we load test with queries with search keywords, we can
> see the queries are much faster in solr5.3.1 compare solr4.7.
> here is sample debug info:
>
> (in solr 4.7)
>
> status: 0
> QTime: 86
> params: id, 0, *:*, true, +categoryIdsPath:1001, 2
> docs (id): 36652255, 36651884
> rawquerystring: *:*
> querystring: *:*
> parsedquery: MatchAllDocsQuery(*:*)
> parsedquery_toString: *:*
> explain:
>   1.0 = (MATCH) MatchAllDocsQuery, product of:
>     1.0 = queryNorm
>   1.0 = (MATCH) MatchAllDocsQuery, product of:
>     1.0 = queryNorm
> QParser: LuceneQParser
> filter_queries: +categoryIdsPath:1001
> parsed_filter_queries: +categoryIdsPath:1001
> timing: 86.0
>   prepare: 0.0 (components: 0.0 0.0 0.0 0.0 0.0 0.0)
>   process: 86.0 (components: 85.0 0.0 0.0 0.0 0.0 1.0)
>
> (in solr 5.3.1)
>
> status: 0
> QTime: 313
> params: id, 0, *:*, true, +categoryIdsPath:1001, 2
> docs (id): 36652255, 36651884
> rawquerystring: *:*
> querystring: *:*
> parsedquery: MatchAllDocsQuery(*:*)
> parsedquery_toString: *:*
> explain:
>   1.0 = *:*, product of:
>     1.0 = boost
>     1.0 = queryNorm
>   1.0 = *:*, product of:
>     1.0 = boost
>     1.0 = queryNorm
> QParser: LuceneQParser
> filter_queries: +categoryIdsPath:1001
> parsed_filter_queries: +categoryIdsPath:1001
> timing: 313.0
>   prepare: 0.0 (components: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0)
>   process: 311.0 (components: 311.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0)
>
> Thanks,
> Wei