Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread wei
Thanks, Yonik.

I have opened a JIRA issue:
https://issues.apache.org/jira/browse/SOLR-8251

Wei



Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Yonik Seeley
On Fri, Nov 6, 2015 at 9:56 PM, wei  wrote:
> Good point! I tried that, on solr5 the query time is around 100-110ms, and
> on solr4 it is around 60-63ms(very consistent). Solr5 is slower.

When it's something easy, there comes a point when it makes sense to
stop asking more questions and just try it yourself...
I just did this, and can confirm what you're seeing.   For me, 5.3.1
is about 5x slower than 4.10 for this particular query.
Thanks for your persistence / patience in reporting this.  Could you
open a JIRA issue for it?

-Yonik


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread wei
Good point! I tried that: on Solr 5 the query time is around 100-110 ms, and
on Solr 4 it is around 60-63 ms (very consistent). Solr 5 is slower.
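
For a rough sense of scale, the two QTime ranges above can be turned into a slowdown factor. This is only a minimal sketch; the helper name `slowdown_range` is made up for illustration, and the inputs are just the ranges quoted in this message.

```python
def slowdown_range(slow_ms, fast_ms):
    """Return (lowest, highest) slowdown factor for two (min, max) QTime ranges."""
    slow_lo, slow_hi = slow_ms
    fast_lo, fast_hi = fast_ms
    # Lowest factor: fastest slow run vs. slowest fast run; highest: the reverse.
    return slow_lo / fast_hi, slow_hi / fast_lo


# Ranges reported above: Solr 5 at 100-110 ms, Solr 4 at 60-63 ms.
lo, hi = slowdown_range(slow_ms=(100, 110), fast_ms=(60, 63))
print(f"Solr 5 is roughly {lo:.2f}x to {hi:.2f}x slower")
```

With these numbers the factor comes out at about 1.6x to 1.8x for this warmed, uncached variant of the query.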

Thanks,
Wei



Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Yonik Seeley
On Fri, Nov 6, 2015 at 9:30 PM, wei  wrote:
> in solr 5.3.1, there is actually a boost, and the score is product of boost
> & queryNorm.

Hmmm, well, it's worth putting on the list of stuff to investigate.
Boosting was also changed in lucene.

What happens if you try this multiple times in a row?

&rows=2&fl=id&q={!cache=false}*:*&fq=categoryIdsPath:1001

(basically just add {!cache=false} as a prefix to the main query.)

This would allow hotspot time to compile methods, and ensure that the
filter query was cached, and do a better job of isolating the
"filtered match-all-docs" part of the execution.

-Yonik
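
The repeated-run test suggested in this message could be scripted roughly as follows. This is a sketch under assumptions, not a tested benchmark: the host, port and core name in the usage note are hypothetical, the helper names are made up, and it relies only on Solr's JSON response writer (`wt=json`) and the `QTime` field of the response header.

```python
import json
import statistics
import urllib.parse
import urllib.request


def build_query_url(base_url, fq, rows=2, fl="id"):
    """Build a select URL whose main query is an uncached match-all."""
    params = {
        "q": "{!cache=false}*:*",  # local-params prefix from the message above
        "fq": fq,
        "rows": rows,
        "fl": fl,
        "wt": "json",
    }
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_qtime(url):
    """Run one query and return the QTime (ms) from the response header."""
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return body["responseHeader"]["QTime"]


def summarize(qtimes, warmup=3):
    """Drop the first `warmup` runs, then return (min, median, max) of the rest."""
    steady = qtimes[warmup:]
    return min(steady), statistics.median(steady), max(steady)


def run_benchmark(url, runs=20, warmup=3):
    """Issue the same query `runs` times and summarize the steady-state QTimes."""
    return summarize([fetch_qtime(url) for _ in range(runs)], warmup)
```

Calling `run_benchmark(build_query_url("http://localhost:8983/solr/collection1/select", fq="categoryIdsPath:1001"))` against each Solr version would give comparable numbers; discarding the first few runs leaves room for JIT compilation and for the filter query to land in the filterCache.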


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread wei
Hi Shawn,

I took care of the warm-up problem during the test. I set up a JMeter project,
took a query log from our production system (>10 queries), and ran the same
query log through JMeter against both Solr instances at the same QPS (about
40). I removed the warm-up queries from both Solr setups and also set the
cache autowarm count to 0 in solrconfig. I ran the test for 1 hour. These two
instances are not serving other query traffic, but they both receive update
traffic. I disabled soft commits in Solr 5 and set the hard commit interval to
2 minutes. The Solr 4 instance is a slave node replicating from a Solr 4
master, which also has a 2-minute commit cycle, and the Solr 4 test instance
replicates the index every 2 minutes.

Solr 5 is slower than Solr 4. After some investigation I realized that the
queries containing q=*:* seem to be causing the problem. I split the query
log into two files, one with q=*:* and one without (almost all of our
queries have filter queries). When I run the test, Solr 5 is faster on the
log with query keywords, but much slower on the "q=*:*" log.
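
The log split described above could look roughly like this. It is a minimal sketch that assumes each log line is a URL-style query string such as `q=*:*&fq=...&rows=2`; the real log format is not shown in this thread, and the function names are made up for illustration.

```python
from urllib.parse import parse_qs


def is_match_all(query_string):
    """True if the request's main query (q) is exactly *:*."""
    params = parse_qs(query_string.lstrip("?"))
    return params.get("q", [""])[0].strip() == "*:*"


def split_query_log(lines):
    """Partition logged query strings into (match_all, keyword) lists."""
    match_all, keyword = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        (match_all if is_match_all(line) else keyword).append(line)
    return match_all, keyword
```

Replaying the two resulting lists separately keeps the match-all traffic from being averaged together with the keyword queries.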

There is no other query traffic to either instance (there is indexing
traffic). When I collected the query debug output in my first email, I made
sure nothing was in the filter cache (verified through the Solr console;
after a hard commit, the filterCache is cleared).

I hope this addresses your concern about how I ran the test. What is obvious
to me is that Solr 5 is faster in one test (with query keywords) and slower in
the other (without query keywords).

Thanks,
Wei



Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread wei
The explain part is different in Solr 4.7 and Solr 5.3.1. In Solr 4.7,
there is only one line under the score:

 1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm
 1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

In Solr 5.3.1, there is actually a boost, and the score is the product of
boost & queryNorm.

Can that cause the problem, if Solr 5 needs to calculate the product for all
the hits? I am not sure where the boost comes from, or why it is different
from Solr 4.7.

 1.0 = *:*, product of:
  1.0 = boost
  1.0 = queryNorm
 1.0 = *:*, product of:
  1.0 = boost
  1.0 = queryNorm


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread wei
Hi Jack,

I also ran the test with queries that have query terms (with filters too).
Solr 5 is faster than Solr 4 in that test. I took the query set from our
production log, and almost all of our queries have filters. That suggests to
me that it is not the filter query that is slow.

I copied the fq query into the q parameter (I did not remove fq, though);
Solr 5 is slightly faster than Solr 4 for that query.

solr4:

  status: 0
  QTime: 64
  params: fl=id, start=0, q=+categoryIdsPath:1001, debugQuery=true,
          fq=+categoryIdsPath:1001, rows=2
  docs (id): 36652255, 36651884
  rawquerystring: +categoryIdsPath:1001
  querystring: +categoryIdsPath:1001
  parsedquery: +categoryIdsPath:1001
  parsedquery_toString: +categoryIdsPath:1001
  explain:
    20.451632 = (MATCH) weight(categoryIdsPath:1001 in 19) [], result of:
      20.451632 = score(doc=19,freq=1.0 = termFreq=1.0), product of:
        4.522348 = queryWeight, product of:
          4.522348 = idf(docFreq=610392, maxDocs=20670250)
          1.0 = queryNorm
        4.522348 = fieldWeight in 19, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          4.522348 = idf(docFreq=610392, maxDocs=20670250)
          1.0 = fieldNorm(doc=19)
    20.451632 = (MATCH) weight(categoryIdsPath:1001 in 44) [], result of:
      20.451632 = score(doc=44,freq=1.0 = termFreq=1.0), product of:
        4.522348 = queryWeight, product of:
          4.522348 = idf(docFreq=610392, maxDocs=20670250)
          1.0 = queryNorm
        4.522348 = fieldWeight in 44, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          4.522348 = idf(docFreq=610392, maxDocs=20670250)
          1.0 = fieldNorm(doc=44)
  QParser: LuceneQParser
  filter_queries: +categoryIdsPath:1001
  parsed_filter_queries: +categoryIdsPath:1001
  timing (ms):
    time: 63.0
    prepare: 3.0 (query 3.0, all other components 0.0)
    process: 60.0 (query 57.0, debug 3.0, all other components 0.0)


solr5:

  status: 0
  QTime: 51
  params: fl=id, start=0, q=+categoryIdsPath:1001, debugQuery=true,
          fq=+categoryIdsPath:1001, rows=2
  docs (id): 36652255, 36651884
  rawquerystring: +categoryIdsPath:1001
  querystring: +categoryIdsPath:1001
  parsedquery: +categoryIdsPath:1001
  parsedquery_toString: +categoryIdsPath:1001
  explain:
    20.420362 = weight(categoryIdsPath:1001 in 20) [], result of:
      20.420362 = score(doc=20,freq=1.0), product of:
        4.5188894 = queryWeight, product of:
          4.5188894 = idf(docFreq=602005, maxDocs=20315855)
          1.0 = queryNorm
        4.5188894 = fieldWeight in 20, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          4.5188894 = idf(docFreq=602005, maxDocs=20315855)
          1.0 = fieldNorm(doc=20)
    20.420362 = weight(categoryIdsPath:1001 in 49) [], result of:
      20.420362 = score(doc=49,freq=1.0), product of:
        4.5188894 = queryWeight, product of:
          4.5188894 = idf(docFreq=602005, maxDocs=20315855)
          1.0 = queryNorm
        4.5188894 = fieldWeight in 49, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          4.5188894 = idf(docFreq=602005, maxDocs=20315855)
          1.0 = fieldNorm(doc=49)
  QParser: LuceneQParser
  filter_queries: +categoryIdsPath:1001
  parsed_filter_queries: +categoryIdsPath:1001
  timing (ms):
    time: 51.0
    prepare: 1.0 (query 1.0, all other components 0.0)
    process: 50.0 (query 48.0, debug 2.0, all other components 0.0)




Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Yonik Seeley
On Fri, Nov 6, 2015 at 3:12 PM, Jack Krupansky  wrote:
> Just to be clear, I was suggesting that the filter query (fq) was slow

That's a possibility.  Filters were actually removed in Lucene, so
it's a very different code path now.

In 4.10, filters were first class, and SolrIndexSearcher used methods like:
search(query, pf.filter, collector);
And BitSet based filters were pushed down to the leaves of a query
(which the filter generated from MatchAllDocsQuery would have been).

At some point, those were changed to use FilteredQuery instead.  But I
think at some point prior Lucene converted a Filter to a
FilteredQuery, so that change in Solr may not have mattered at that
point.

Then in LUCENE-6583, Filters were removed and the code in
SolrIndexSearcher was changed to use a BooleanQuery:
    if (pf.filter != null) {
      Query query = new BooleanQuery.Builder()
          .add(main, Occur.MUST)
          .add(pf.filter, Occur.FILTER)
          .build();
      search(query, collector);
    }
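
The semantics of a BooleanQuery with one MUST clause and one FILTER clause can be illustrated with a toy model in plain Python. This is not Lucene code and says nothing about performance; the function names and the dict/set representation are made up for illustration. It only shows that a document is a hit when both clauses match, and that the score comes from the MUST clause alone.

```python
def match_all(max_doc, score=1.0):
    """Toy MatchAllDocsQuery: every doc id below max_doc matches with a constant score."""
    return {doc: score for doc in range(max_doc)}


def boolean_must_filter(main_scores, filter_docs):
    """Toy BooleanQuery with one MUST and one FILTER clause.

    main_scores: dict of doc id -> score produced by the MUST clause.
    filter_docs: set of doc ids accepted by the FILTER clause.
    A doc is a hit only if both clauses match; only MUST contributes to the score.
    """
    return {doc: score for doc, score in main_scores.items() if doc in filter_docs}


# A match-all main query over 10 docs, filtered down to three of them.
hits = boolean_must_filter(match_all(10), {2, 5, 7})
print(hits)
```

Every hit keeps the constant score of 1.0 from the match-all side, which matches the `1.0 = queryNorm` explain lines reported elsewhere in this thread.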

So... lots of changes over time, no idea which (if any) is the cause.

-Yonik


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Shawn Heisey
On 11/6/2015 1:01 PM, wei wrote:
> Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if
> the slowness of MatchAllDocsQuery is also caused by the removal of
> fieldcache. Can someone please explain a little bit?

I only glanced at your full output in the message at the start of this
thread.  I thought I saw facet output in it, but it turns out that the
only mention of facets was the timing information from the debug, so
that very likely rules out the FieldCache change as a culprit.

I suspect that the 4.7 index is warmed better, and may have the
specific filter query (categoryIdsPath:1001) already sitting in the
filterCache.

Try running that query a few times on both versions, then restart
Solr on both versions so they both start clean, and run the query *once*
on each system, and see whether there's still a large discrepancy.

If one of the systems is receiving queries from active clients and the
other is not, then the comparison will be unfair, and biased towards the
one that is getting additional queries.  Query activity, even if it
seems unrelated to the query you are testing, has a tendency to reduce
overall qtime values.

Thanks,
Shawn



Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Jack Krupansky
Just to be clear, I was suggesting that the filter query (fq) was slow, not
the MatchAllDocsQuery, which should be just as speedy as before. You can
test for yourself whether the MADQ by itself is any slower.

You could also test using the fq as the main query (q) - with no fq
parameter, and see if that is a lot faster, both with old and new Solr.

-- Jack Krupansky



Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread wei
Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if
the slowness of MatchAllDocsQuery is also caused by the removal of
fieldcache. Can someone please explain a little bit?

Thanks,
Wei



Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Shawn Heisey
On 11/5/2015 10:25 PM, Jack Krupansky wrote:
> I vaguely recall some discussion concerning removal of the field cache in
> Lucene.

The FieldCache wasn't exactly *removed* ... it's more like it was
renamed, improved, and sort of hidden in a miscellaneous package.  Some
things still require this functionality, so they use the hidden class
instead, which was changed to use the DocValues API.

https://issues.apache.org/jira/browse/LUCENE-5666

I am not qualified to discuss LUCENE-5666 beyond what I wrote in the
paragraph above, and it's possible that some of what I said is wrong
because I do not really understand the APIs involved.

The change has caused problems for Solr.  End result from Solr's
perspective: Certain things which used to work perfectly fine (mostly
facets and grouping) in Solr 4.x have one of two problems in 5.x:
Either they don't work at all, or performance has gone way down.  Some
of these problems are documented in Jira.  These are the issues I know
about:

https://issues.apache.org/jira/browse/SOLR-8088
https://issues.apache.org/jira/browse/SOLR-7495
https://issues.apache.org/jira/browse/SOLR-8096

For fields where adding docValues is a viable option (most field types
other than solr.TextField), adding docValues and reindexing is very
likely to solve those problems.

Sometimes adding docValues won't work, either because the field type
doesn't allow it, or because it's the indexed terms that are needed, not
the original field value.  For those situations, there is currently no
solution.

Thanks,
Shawn



Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-05 Thread Jack Krupansky
I vaguely recall some discussion concerning removal of the field cache in
Lucene.

-- Jack Krupansky



MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-05 Thread wei
We are running our search on Solr 4.7, and I am evaluating whether to upgrade
to Solr 5.3.1. I found that MatchAllDocsQuery is much slower in Solr 5.3.1.
Does anyone know why?

We have a lot of queries without any query keyword, but we apply filters to
those queries. Load testing shows they are much slower in Solr 5.3.1
compared to 4.7. If we load test with queries that have search keywords, the
queries are much faster in Solr 5.3.1 compared to Solr 4.7.
Here is sample debug info:
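(in solr 4.7)

  status: 0
  QTime: 86
  params: fl=id, start=0, q=*:*, debugQuery=true,
          fq=+categoryIdsPath:1001, rows=2
  docs (id): 36652255, 36651884
  rawquerystring: *:*
  querystring: *:*
  parsedquery: MatchAllDocsQuery(*:*)
  parsedquery_toString: *:*
  explain:
    1.0 = (MATCH) MatchAllDocsQuery, product of:
      1.0 = queryNorm
    1.0 = (MATCH) MatchAllDocsQuery, product of:
      1.0 = queryNorm
  QParser: LuceneQParser
  filter_queries: +categoryIdsPath:1001
  parsed_filter_queries: +categoryIdsPath:1001
  timing (ms):
    time: 86.0
    prepare: 0.0 (all components 0.0)
    process: 86.0 (query 85.0, debug 1.0, all other components 0.0)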


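(in solr 5.3.1)

  status: 0
  QTime: 313
  params: fl=id, start=0, q=*:*, debugQuery=true,
          fq=+categoryIdsPath:1001, rows=2
  docs (id): 36652255, 36651884
  rawquerystring: *:*
  querystring: *:*
  parsedquery: MatchAllDocsQuery(*:*)
  parsedquery_toString: *:*
  explain:
    1.0 = *:*, product of:
      1.0 = boost
      1.0 = queryNorm
    1.0 = *:*, product of:
      1.0 = boost
      1.0 = queryNorm
  QParser: LuceneQParser
  filter_queries: +categoryIdsPath:1001
  parsed_filter_queries: +categoryIdsPath:1001
  timing (ms):
    time: 313.0
    prepare: 0.0 (all components 0.0)
    process: 311.0 (query 311.0, all other components 0.0)
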

Thanks,
Wei