Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Turns out it was a case of an oversight.  My warming queries weren't setting
the sort order and, as a result, didn't successfully complete.  After setting
the sort order, things appear to be responding quickly.

Thanks for the help.
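
For anyone following along, a warming query that also sets the sort might look
like the sketch below.  It assumes the stock QuerySenderListener in
solrconfig.xml; the `*:*` query and the `asc` direction are placeholders, and
only the field name field4 comes from this thread.

```xml
<!-- Sketch only: q and the sort direction are assumptions, not the poster's config -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <!-- sorting in the warming query is meant to load the sort data
           for field4 before user queries reach the new searcher -->
      <str name="sort">field4 asc</str>
    </lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">field4 asc</str>
    </lst>
  </arr>
</listener>
```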

On Mon, Jun 17, 2013 at 9:45 AM, Shane Perry  wrote:

> Using 4.3.1-SNAPSHOT I have identified where the issue is occurring.  For
> a query in the format (it returns one document, sorted by field4)
>
> +(field0:UUID0) -field1:string0 +field2:string1 +field3:text0
> +field4:"text1"
>
>
> with the field types
>
> 
>
>  omitNorms="true"/>
>
> 
>   
> 
> 
> 
> 
> 
>   
> 
>
>
> the method FieldCacheImpl$SortedDocValuesCache#createValue, the reader
> reports 2640449 terms.  As a result, the loop on line 1198 is
> executed 2640449 and the inner loop is executed a total of 658310778.  My
> index contains 56180128 documents.
>
> My configuration file sets the  for the newSearcher and
> firstSearcher listeners to the value
>
> 
>static firstSearcher warming in solrconfig.xml
>field4
> 
>
>
> which does not appear to affect the speed.  I'm not sure how replication
> plays into the equation outside the fact that we are relatively aggressive
> on the replication (every 60 seconds).  I fear I may be at the end of my
> knowledge without really getting into the code so any help at this point
> would be greatly appreciated.
>
> Shane
>
> On Thu, Jun 13, 2013 at 4:11 PM, Shane Perry  wrote:
>
>> I've dug through the code and have narrowed the delay down
>> to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
>> the point where the comparator's setNextReader() method is called (line 98
>> in the lucene_solr_4_3 branch).  That line is actually two method calls so
>> I'm not yet certain which path is the cause.  I'll continue to dig through
>> the code but am on thin ice so input would be great.
>>
>> Shane
>>
>>
>> On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:
>>
>>> Erick,
>>>
>>> We do have soft commits turned.  Initially, autoCommit was set at 15000
>>> and autoSoftCommit at 1000.  We did up those to 120 and 60
>>> respectively.  However, since the core in question is a slave, we don't
>>> actually do writes to the core but rely on replication only to populate the
>>> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
>>> no-ops?  I thought I had pulled out all hard commits but a double check
>>> shows one instance where it still occurs.
>>>
>>> Thanks for your time.
>>>
>>> Shane
>>>
>>> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson wrote:
>>>
>>>> Shane:
>>>>
>>>> You've covered all the config stuff that I can think of. There's one
>>>> other possibility. Do you have the soft commits turned on and are
>>>> they very short? Although soft commits shouldn't invalidate any
>>>> segment-level caches (but I'm not sure whether the sorting buffers
>>>> are low-level or not).
>>>>
>>>> About the only other thing I can think of is that you're somehow
>>>> doing hard commits from, say, the client but that's really
>>>> stretching.
>>>>
>>>> All I can really say at this point is that this isn't a problem I've seen
>>>> before, so it's _likely_ some innocent-seeming config has changed.
>>>> I'm sure it'll be obvious once you find it ...
>>>>
>>>> Erick
>>>>
>>>> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry wrote:
>>>> > Erick,
>>>> >
>>>> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml from
>>>> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
>>>> > didn't need.  There is the possibility I removed something I shouldn't have
>>>> > though I don't know what it would be.  Minus removing the dynamic fields, a
>>>> > custom tokenizer class, and changing all my fields to be stored, the
>>>> > schema.xml file should be the same as well.  I'm not currently in the
>>>> > position to do so, but I'll double check those two files.  Finally, the
>>>> > data was re-indexed when I moved to 4.3.
>>>> >
>>>> > My statement about field values wasn't stated very well.  What I meant is
>>>> > that the 'text' field has more unique terms than some of my other fields.
>>>> >
>>>> > As for this being an edge case, I'm not sure why it would manifest itself
>>>> > in 4.3 but not in 3.6 (short of me having a screwy configuration setting).
>>>> > If I get a chance, I'll see if I can duplicate the behavior with a small
>>>> > document count in a sandboxed environment.
>>>> >
>>>> > Shane
>>>> >
>>>> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> >
>>>> >> This doesn't make much sense, particularly the fact
>>>> >> that you added first/new searchers. I'm assuming that
>>>> >> these are sorting on the same field as your slow query.
>>>> >>
>>>> >> But sorting on a text field for which
>>>> >> "Overall, the values of the field are unique"
>>>> >> is a red-flag. Solr doesn't sort on fields th

Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Using 4.3.1-SNAPSHOT I have identified where the issue is occurring.  For a
query in the format (it returns one document, sorted by field4)

+(field0:UUID0) -field1:string0 +field2:string1 +field3:text0
+field4:"text1"


with the field types

[field type XML stripped by the list archive]

in the method FieldCacheImpl$SortedDocValuesCache#createValue, the reader
reports 2640449 terms.  As a result, the loop on line 1198 is
executed 2640449 times and the inner loop is executed a total of 658310778
times.  My index contains 56180128 documents.
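
As a rough sanity check on those counts (plain arithmetic on the numbers
above; what each loop iterates over is an assumption here, not something
confirmed in the thread), the inner loop runs about 249 times per term and
about 11.7 times per document on average:

```latex
\[
\frac{658310778}{2640449} \approx 249.3
\qquad
\frac{658310778}{56180128} \approx 11.7
\]
```

If the inner loop visits one (term, document) pair per iteration, the second
ratio would mean each document contributes roughly a dozen terms to the
field, which would fit a tokenized text field rather than a single-valued
string.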

My configuration file sets the warming query for the newSearcher and
firstSearcher listeners to the value


   static firstSearcher warming in solrconfig.xml
   field4



which does not appear to affect the speed.  I'm not sure how replication
plays into the equation outside the fact that we are relatively aggressive
on the replication (every 60 seconds).  I fear I may be at the end of my
knowledge without really getting into the code so any help at this point
would be greatly appreciated.

Shane

On Thu, Jun 13, 2013 at 4:11 PM, Shane Perry  wrote:

> I've dug through the code and have narrowed the delay down
> to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
> the point where the comparator's setNextReader() method is called (line 98
> in the lucene_solr_4_3 branch).  That line is actually two method calls so
> I'm not yet certain which path is the cause.  I'll continue to dig through
> the code but am on thin ice so input would be great.
>
> Shane
>
>
> On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:
>
>> Erick,
>>
>> We do have soft commits turned.  Initially, autoCommit was set at 15000
>> and autoSoftCommit at 1000.  We did up those to 120 and 60
>> respectively.  However, since the core in question is a slave, we don't
>> actually do writes to the core but rely on replication only to populate the
>> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
>> no-ops?  I thought I had pulled out all hard commits but a double check
>> shows one instance where it still occurs.
>>
>> Thanks for your time.
>>
>> Shane
>>
>> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson 
>> wrote:
>>
>>> Shane:
>>>
>>> You've covered all the config stuff that I can think of. There's one
>>> other possibility. Do you have the soft commits turned on and are
>>> they very short? Although soft commits shouldn't invalidate any
>>> segment-level caches (but I'm not sure whether the sorting buffers
>>> are low-level or not).
>>>
>>> About the only other thing I can think of is that you're somehow
>>> doing hard commits from, say, the client but that's really
>>> stretching.
>>>
>>> All I can really say at this point is that this isn't a problem I've seen
>>> before, so it's _likely_ some innocent-seeming config has changed.
>>> I'm sure it'll be obvious once you find it ...
>>>
>>> Erick
>>>
>>> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
>>> > Erick,
>>> >
>>> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
>>> from
>>> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
>>> > didn't need.  There is the possibility I removed something I shouldn't
>>> have
>>> > though I don't know what it would be.  Minus removing the dynamic
>>> fields, a
>>> > custom tokenizer class, and changing all my fields to be stored, the
>>> > schema.xml file should be the same as well.  I'm not currently in the
>>> > position to do so, but I'll double check those two files.  Finally, the
>>> > data was re-indexed when I moved to 4.3.
>>> >
>>> > My statement about field values wasn't stated very well.  What I meant
>>> is
>>> > that the 'text' field has more unique terms than some of my other
>>> fields.
>>> >
>>> > As for this being an edge case, I'm not sure why it would manifest
>>> itself
>>> > in 4.3 but not in 3.6 (short of me having a screwy configuration
>>> setting).
>>> >  If I get a chance, I'll see if I can duplicate the behavior with a
>>> small
>>> > document count in a sandboxed environment.
>>> >
>>> > Shane
>>> >
>>> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson <
>>> erickerick...@gmail.com>wrote:
>>> >
>>> >> This doesn't make much sense, particularly the fact
>>> >> that you added first/new searchers. I'm assuming that
>>> >> these are sorting on the same field as your slow query.
>>> >>
>>> >> But sorting on a text field for which
>>> >> "Overall, the values of the field are unique"
>>> >> is a red-flag. Solr doesn't sort on fields that have
>>> >> more than one term, so you might as well use a
>>> >> string field and be done with it, it's possible you're
>>> >> hitting some edge case.
>>> >>
>>> >> Did you just copy your 3.6 schema and configs to
>>> >> 4.3? Did you re-index?
>>> >>
>>> >> Best
>>> >> Erick
>>> >>
>>> >> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry 
>>> wrote:
>>> >> > Thanks for the responses.
>>> >> >
>>> >> > Setting first/newSearcher had no noticeable effect.  I'm sorting on
>>> a
>>> >> > sto

Re: Sorting by field is slow

2013-06-13 Thread Shane Perry
I've dug through the code and have narrowed the delay down
to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
the point where the comparator's setNextReader() method is called (line 98
in the lucene_solr_4_3 branch).  That line is actually two method calls so
I'm not yet certain which path is the cause.  I'll continue to dig through
the code but am on thin ice so input would be great.

Shane

On Thu, Jun 13, 2013 at 7:56 AM, Shane Perry  wrote:

> Erick,
>
> We do have soft commits turned.  Initially, autoCommit was set at 15000
> and autoSoftCommit at 1000.  We did up those to 120 and 60
> respectively.  However, since the core in question is a slave, we don't
> actually do writes to the core but rely on replication only to populate the
> index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
> no-ops?  I thought I had pulled out all hard commits but a double check
> shows one instance where it still occurs.
>
> Thanks for your time.
>
> Shane
>
> On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson 
> wrote:
>
>> Shane:
>>
>> You've covered all the config stuff that I can think of. There's one
>> other possibility. Do you have the soft commits turned on and are
>> they very short? Although soft commits shouldn't invalidate any
>> segment-level caches (but I'm not sure whether the sorting buffers
>> are low-level or not).
>>
>> About the only other thing I can think of is that you're somehow
>> doing hard commits from, say, the client but that's really
>> stretching.
>>
>> All I can really say at this point is that this isn't a problem I've seen
>> before, so it's _likely_ some innocent-seeming config has changed.
>> I'm sure it'll be obvious once you find it ...
>>
>> Erick
>>
>> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
>> > Erick,
>> >
>> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
>> from
>> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
>> > didn't need.  There is the possibility I removed something I shouldn't
>> have
>> > though I don't know what it would be.  Minus removing the dynamic
>> fields, a
>> > custom tokenizer class, and changing all my fields to be stored, the
>> > schema.xml file should be the same as well.  I'm not currently in the
>> > position to do so, but I'll double check those two files.  Finally, the
>> > data was re-indexed when I moved to 4.3.
>> >
>> > My statement about field values wasn't stated very well.  What I meant
>> is
>> > that the 'text' field has more unique terms than some of my other
>> fields.
>> >
>> > As for this being an edge case, I'm not sure why it would manifest
>> itself
>> > in 4.3 but not in 3.6 (short of me having a screwy configuration
>> setting).
>> >  If I get a chance, I'll see if I can duplicate the behavior with a
>> small
>> > document count in a sandboxed environment.
>> >
>> > Shane
>> >
>> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson <
>> erickerick...@gmail.com>wrote:
>> >
>> >> This doesn't make much sense, particularly the fact
>> >> that you added first/new searchers. I'm assuming that
>> >> these are sorting on the same field as your slow query.
>> >>
>> >> But sorting on a text field for which
>> >> "Overall, the values of the field are unique"
>> >> is a red-flag. Solr doesn't sort on fields that have
>> >> more than one term, so you might as well use a
>> >> string field and be done with it, it's possible you're
>> >> hitting some edge case.
>> >>
>> >> Did you just copy your 3.6 schema and configs to
>> >> 4.3? Did you re-index?
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry 
>> wrote:
>> >> > Thanks for the responses.
>> >> >
>> >> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
>> >> > stored/indexed field named 'text' who's fieldType is solr.TextField.
>> >> >  Overall, the values of the field are unique. The JVM is only using
>> about
>> >> > 2G of the available 12G, so no OOM/GC issue (at least on the
>> surface).
>> >>  The
>> >> > server is question is a slave with approximately 56 million
>> documents.
>> >> >  Additionally, sorting on a field of the same type but with
>> significantly
>> >> > less uniqueness results quick response times.
>> >> >
>> >> > The following is a sample of *debugQuery=true* for a query which
>> returns
>> >> 1
>> >> > document:
>> >> >
>> >> > 
>> >> >   61458.0
>> >> >   
>> >> > 61452.0
>> >> >   
>> >> >   
>> >> > 0.0
>> >> >   
>> >> >   
>> >> > 0.0
>> >> >   
>> >> >   
>> >> > 0.0
>> >> >   
>> >> >   
>> >> > 0.0
>> >> >   
>> >> >   
>> >> > 6.0
>> >> >   
>> >> > 
>> >> >
>> >> >
>> >> > -- Update --
>> >> >
>> >> > Out of desperation, I turned off replication by commenting out the
>> *> >> > name="slave">* element in the replication requestHandler block.
>>  After
>> >> > restarting tomcat I was surprised to find that the replication admin
>> UI
>> >> > still reported the core as rep

Re: Sorting by field is slow

2013-06-13 Thread Shane Perry
Erick,

We do have soft commits turned on.  Initially, autoCommit was set at 15000 and
autoSoftCommit at 1000.  We did raise those to 120 and 60
respectively.  However, since the core in question is a slave, we don't
actually do writes to the core but rely on replication only to populate the
index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
no-ops?  I thought I had pulled out all hard commits but a double check
shows one instance where it still occurs.
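
For reference, these two settings live under the update handler in
solrconfig.xml.  A minimal sketch with the initial values quoted above,
assuming they are maxTime intervals in milliseconds (the message doesn't say
whether they are maxTime or maxDocs):

```xml
<!-- Sketch only: assumes the 15000 / 1000 figures are maxTime values in ms -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```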

Thanks for your time.

Shane

On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson wrote:

> Shane:
>
> You've covered all the config stuff that I can think of. There's one
> other possibility. Do you have the soft commits turned on and are
> they very short? Although soft commits shouldn't invalidate any
> segment-level caches (but I'm not sure whether the sorting buffers
> are low-level or not).
>
> About the only other thing I can think of is that you're somehow
> doing hard commits from, say, the client but that's really
> stretching.
>
> All I can really say at this point is that this isn't a problem I've seen
> before, so it's _likely_ some innocent-seeming config has changed.
> I'm sure it'll be obvious once you find it ...
>
> Erick
>
> On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
> > Erick,
> >
> > I agree, it doesn't make sense.  I manually merged the solrconfig.xml
> from
> > the distribution example with my 3.6 solrconfig.xml, pulling out what I
> > didn't need.  There is the possibility I removed something I shouldn't
> have
> > though I don't know what it would be.  Minus removing the dynamic
> fields, a
> > custom tokenizer class, and changing all my fields to be stored, the
> > schema.xml file should be the same as well.  I'm not currently in the
> > position to do so, but I'll double check those two files.  Finally, the
> > data was re-indexed when I moved to 4.3.
> >
> > My statement about field values wasn't stated very well.  What I meant is
> > that the 'text' field has more unique terms than some of my other fields.
> >
> > As for this being an edge case, I'm not sure why it would manifest itself
> > in 4.3 but not in 3.6 (short of me having a screwy configuration
> setting).
> >  If I get a chance, I'll see if I can duplicate the behavior with a small
> > document count in a sandboxed environment.
> >
> > Shane
> >
> > On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson  >wrote:
> >
> >> This doesn't make much sense, particularly the fact
> >> that you added first/new searchers. I'm assuming that
> >> these are sorting on the same field as your slow query.
> >>
> >> But sorting on a text field for which
> >> "Overall, the values of the field are unique"
> >> is a red-flag. Solr doesn't sort on fields that have
> >> more than one term, so you might as well use a
> >> string field and be done with it, it's possible you're
> >> hitting some edge case.
> >>
> >> Did you just copy your 3.6 schema and configs to
> >> 4.3? Did you re-index?
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
> >> > Thanks for the responses.
> >> >
> >> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
> >> > stored/indexed field named 'text' who's fieldType is solr.TextField.
> >> >  Overall, the values of the field are unique. The JVM is only using
> about
> >> > 2G of the available 12G, so no OOM/GC issue (at least on the surface).
> >>  The
> >> > server is question is a slave with approximately 56 million documents.
> >> >  Additionally, sorting on a field of the same type but with
> significantly
> >> > less uniqueness results quick response times.
> >> >
> >> > The following is a sample of *debugQuery=true* for a query which
> returns
> >> 1
> >> > document:
> >> >
> >> > 
> >> >   61458.0
> >> >   
> >> > 61452.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 0.0
> >> >   
> >> >   
> >> > 6.0
> >> >   
> >> > 
> >> >
> >> >
> >> > -- Update --
> >> >
> >> > Out of desperation, I turned off replication by commenting out the
> * >> > name="slave">* element in the replication requestHandler block.  After
> >> > restarting tomcat I was surprised to find that the replication admin
> UI
> >> > still reported the core as replicating.  Search queries were still
> slow.
> >>  I
> >> > then disabled replication via the UI and the display updated to report
> >> the
> >> > core was no longer replicating.  Queries are now fast so it appears
> that
> >> > the sorting may be a red-herring.
> >> >
> >> > It's may be of note to also mention that the slow queries don't
> appear to
> >> > be getting cached.
> >> >
> >> > Thanks again for the feed back.
> >> >
> >> > On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky <
> j...@basetechnology.com
> >> >wrote:
> >> >
> >> >> Rerun the sorted query with &debugQuery=true and look at the module
> >> >> timings. See what stands out
> >> >>
> >> >> Ar

Re: Sorting by field is slow

2013-06-13 Thread Erick Erickson
Shane:

You've covered all the config stuff that I can think of. There's one
other possibility. Do you have the soft commits turned on and are
they very short? Soft commits shouldn't invalidate any
segment-level caches, though I'm not sure whether the sorting buffers
are low-level or not.

About the only other thing I can think of is that you're somehow
doing hard commits from, say, the client but that's really
stretching.

All I can really say at this point is that this isn't a problem I've seen
before, so it's _likely_ some innocent-seeming config has changed.
I'm sure it'll be obvious once you find it ...

Erick

On Wed, Jun 12, 2013 at 11:51 PM, Shane Perry  wrote:
> Erick,
>
> I agree, it doesn't make sense.  I manually merged the solrconfig.xml from
> the distribution example with my 3.6 solrconfig.xml, pulling out what I
> didn't need.  There is the possibility I removed something I shouldn't have
> though I don't know what it would be.  Minus removing the dynamic fields, a
> custom tokenizer class, and changing all my fields to be stored, the
> schema.xml file should be the same as well.  I'm not currently in the
> position to do so, but I'll double check those two files.  Finally, the
> data was re-indexed when I moved to 4.3.
>
> My statement about field values wasn't stated very well.  What I meant is
> that the 'text' field has more unique terms than some of my other fields.
>
> As for this being an edge case, I'm not sure why it would manifest itself
> in 4.3 but not in 3.6 (short of me having a screwy configuration setting).
>  If I get a chance, I'll see if I can duplicate the behavior with a small
> document count in a sandboxed environment.
>
> Shane
>
> On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson 
> wrote:
>
>> This doesn't make much sense, particularly the fact
>> that you added first/new searchers. I'm assuming that
>> these are sorting on the same field as your slow query.
>>
>> But sorting on a text field for which
>> "Overall, the values of the field are unique"
>> is a red-flag. Solr doesn't sort on fields that have
>> more than one term, so you might as well use a
>> string field and be done with it, it's possible you're
>> hitting some edge case.
>>
>> Did you just copy your 3.6 schema and configs to
>> 4.3? Did you re-index?
>>
>> Best
>> Erick
>>
>> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
>> > Thanks for the responses.
>> >
>> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
>> > stored/indexed field named 'text' who's fieldType is solr.TextField.
>> >  Overall, the values of the field are unique. The JVM is only using about
>> > 2G of the available 12G, so no OOM/GC issue (at least on the surface).
>>  The
>> > server is question is a slave with approximately 56 million documents.
>> >  Additionally, sorting on a field of the same type but with significantly
>> > less uniqueness results quick response times.
>> >
>> > The following is a sample of *debugQuery=true* for a query which returns
>> 1
>> > document:
>> >
>> > 
>> >   61458.0
>> >   
>> > 61452.0
>> >   
>> >   
>> > 0.0
>> >   
>> >   
>> > 0.0
>> >   
>> >   
>> > 0.0
>> >   
>> >   
>> > 0.0
>> >   
>> >   
>> > 6.0
>> >   
>> > 
>> >
>> >
>> > -- Update --
>> >
>> > Out of desperation, I turned off replication by commenting out the *> > name="slave">* element in the replication requestHandler block.  After
>> > restarting tomcat I was surprised to find that the replication admin UI
>> > still reported the core as replicating.  Search queries were still slow.
>>  I
>> > then disabled replication via the UI and the display updated to report
>> the
>> > core was no longer replicating.  Queries are now fast so it appears that
>> > the sorting may be a red-herring.
>> >
>> > It's may be of note to also mention that the slow queries don't appear to
>> > be getting cached.
>> >
>> > Thanks again for the feed back.
>> >
>> > On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky > >wrote:
>> >
>> >> Rerun the sorted query with &debugQuery=true and look at the module
>> >> timings. See what stands out
>> >>
>> >> Are you actually sorting on a "text" field, as opposed to a "string"
>> field?
>> >>
>> >> Of course, it's always possible that maybe you're hitting some odd
>> OOM/GC
>> >> condition as a result of Solr growing  between releases.
>> >>
>> >> -- Jack Krupansky
>> >>
>> >> -Original Message- From: Shane Perry
>> >> Sent: Wednesday, June 12, 2013 3:00 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Sorting by field is slow
>> >>
>> >>
>> >> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
>> >> increased exponentially.  After testing in 4.3.0 it appears the same
>> query
>> >> (with 1 matching document) returns after 100 ms without sorting but
>> takes 1
>> >> minute when sorting by a text field.  I've looked around but haven't yet
>> >> found a reason for the degradation.  Can someone give me some insight or
>> >> point

Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Erick,

I agree, it doesn't make sense.  I manually merged the solrconfig.xml from
the distribution example with my 3.6 solrconfig.xml, pulling out what I
didn't need.  There is the possibility I removed something I shouldn't have
though I don't know what it would be.  Minus removing the dynamic fields, a
custom tokenizer class, and changing all my fields to be stored, the
schema.xml file should be the same as well.  I'm not currently in the
position to do so, but I'll double check those two files.  Finally, the
data was re-indexed when I moved to 4.3.

My statement about field values wasn't stated very well.  What I meant is
that the 'text' field has more unique terms than some of my other fields.

As for this being an edge case, I'm not sure why it would manifest itself
in 4.3 but not in 3.6 (short of me having a screwy configuration setting).
 If I get a chance, I'll see if I can duplicate the behavior with a small
document count in a sandboxed environment.

Shane

On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson wrote:

> This doesn't make much sense, particularly the fact
> that you added first/new searchers. I'm assuming that
> these are sorting on the same field as your slow query.
>
> But sorting on a text field for which
> "Overall, the values of the field are unique"
> is a red-flag. Solr doesn't sort on fields that have
> more than one term, so you might as well use a
> string field and be done with it, it's possible you're
> hitting some edge case.
>
> Did you just copy your 3.6 schema and configs to
> 4.3? Did you re-index?
>
> Best
> Erick
>
> On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
> > Thanks for the responses.
> >
> > Setting first/newSearcher had no noticeable effect.  I'm sorting on a
> > stored/indexed field named 'text' who's fieldType is solr.TextField.
> >  Overall, the values of the field are unique. The JVM is only using about
> > 2G of the available 12G, so no OOM/GC issue (at least on the surface).
>  The
> > server is question is a slave with approximately 56 million documents.
> >  Additionally, sorting on a field of the same type but with significantly
> > less uniqueness results quick response times.
> >
> > The following is a sample of *debugQuery=true* for a query which returns
> 1
> > document:
> >
> > 
> >   61458.0
> >   
> > 61452.0
> >   
> >   
> > 0.0
> >   
> >   
> > 0.0
> >   
> >   
> > 0.0
> >   
> >   
> > 0.0
> >   
> >   
> > 6.0
> >   
> > 
> >
> >
> > -- Update --
> >
> > Out of desperation, I turned off replication by commenting out the * > name="slave">* element in the replication requestHandler block.  After
> > restarting tomcat I was surprised to find that the replication admin UI
> > still reported the core as replicating.  Search queries were still slow.
>  I
> > then disabled replication via the UI and the display updated to report
> the
> > core was no longer replicating.  Queries are now fast so it appears that
> > the sorting may be a red-herring.
> >
> > It's may be of note to also mention that the slow queries don't appear to
> > be getting cached.
> >
> > Thanks again for the feed back.
> >
> > On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky  >wrote:
> >
> >> Rerun the sorted query with &debugQuery=true and look at the module
> >> timings. See what stands out
> >>
> >> Are you actually sorting on a "text" field, as opposed to a "string"
> field?
> >>
> >> Of course, it's always possible that maybe you're hitting some odd
> OOM/GC
> >> condition as a result of Solr growing  between releases.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Shane Perry
> >> Sent: Wednesday, June 12, 2013 3:00 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Sorting by field is slow
> >>
> >>
> >> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
> >> increased exponentially.  After testing in 4.3.0 it appears the same
> query
> >> (with 1 matching document) returns after 100 ms without sorting but
> takes 1
> >> minute when sorting by a text field.  I've looked around but haven't yet
> >> found a reason for the degradation.  Can someone give me some insight or
> >> point me in the right direction for resolving this?  In most cases, I
> can
> >> change my code to do client-side sorting but I do have a couple of
> >> situations where pagination prevents client-side sorting.  Any help
> would
> >> be greatly appreciated.
> >>
> >> Thanks,
> >>
> >> Shane
> >>
>


Re: Sorting by field is slow

2013-06-12 Thread Erick Erickson
This doesn't make much sense, particularly the fact
that you added first/new searchers. I'm assuming that
these are sorting on the same field as your slow query.

But sorting on a text field for which
"Overall, the values of the field are unique"
is a red flag. Solr doesn't sort on fields that have
more than one term, so you might as well use a
string field and be done with it. It's possible you're
hitting some edge case.
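
A sketch of what that could look like in schema.xml, assuming the sorted
field is the one named 'text' mentioned elsewhere in the thread.  The
`text_sort` field name and the `text_general` type name are made up for
illustration; `solr.StrField` and `copyField` are the standard Solr pieces.

```xml
<!-- Sketch only: text_sort and text_general are hypothetical names -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- tokenized field, kept for searching -->
<field name="text" type="text_general" indexed="true" stored="true"/>
<!-- untokenized copy: a single term per document, used only for sorting -->
<field name="text_sort" type="string" indexed="true" stored="false"/>

<copyField source="text" dest="text_sort"/>
```

Queries would then sort with `sort=text_sort asc` while still searching
against `text`.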

Did you just copy your 3.6 schema and configs to
4.3? Did you re-index?

Best
Erick

On Wed, Jun 12, 2013 at 5:11 PM, Shane Perry  wrote:
> Thanks for the responses.
>
> Setting first/newSearcher had no noticeable effect.  I'm sorting on a
> stored/indexed field named 'text' who's fieldType is solr.TextField.
>  Overall, the values of the field are unique. The JVM is only using about
> 2G of the available 12G, so no OOM/GC issue (at least on the surface).  The
> server is question is a slave with approximately 56 million documents.
>  Additionally, sorting on a field of the same type but with significantly
> less uniqueness results quick response times.
>
> The following is a sample of *debugQuery=true* for a query which returns 1
> document:
>
> 
>   61458.0
>   
> 61452.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 6.0
>   
> 
>
>
> -- Update --
>
> Out of desperation, I turned off replication by commenting out the * name="slave">* element in the replication requestHandler block.  After
> restarting tomcat I was surprised to find that the replication admin UI
> still reported the core as replicating.  Search queries were still slow.  I
> then disabled replication via the UI and the display updated to report the
> core was no longer replicating.  Queries are now fast so it appears that
> the sorting may be a red-herring.
>
> It's may be of note to also mention that the slow queries don't appear to
> be getting cached.
>
> Thanks again for the feed back.
>
> On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky 
> wrote:
>
>> Rerun the sorted query with &debugQuery=true and look at the module
>> timings. See what stands out
>>
>> Are you actually sorting on a "text" field, as opposed to a "string" field?
>>
>> Of course, it's always possible that maybe you're hitting some odd OOM/GC
>> condition as a result of Solr growing  between releases.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Shane Perry
>> Sent: Wednesday, June 12, 2013 3:00 PM
>> To: solr-user@lucene.apache.org
>> Subject: Sorting by field is slow
>>
>>
>> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
>> increased dramatically.  After testing in 4.3.0 it appears the same query
>> (with 1 matching document) returns after 100 ms without sorting but takes 1
>> minute when sorting by a text field.  I've looked around but haven't yet
>> found a reason for the degradation.  Can someone give me some insight or
>> point me in the right direction for resolving this?  In most cases, I can
>> change my code to do client-side sorting but I do have a couple of
>> situations where pagination prevents client-side sorting.  Any help would
>> be greatly appreciated.
>>
>> Thanks,
>>
>> Shane
>>


Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Thanks for the responses.

Setting first/newSearcher had no noticeable effect.  I'm sorting on a
stored/indexed field named 'text' whose fieldType is solr.TextField.
Overall, the values of the field are unique. The JVM is only using about
2G of the available 12G, so no OOM/GC issue (at least on the surface).  The
server in question is a slave with approximately 56 million documents.
Additionally, sorting on a field of the same type but with significantly
fewer unique values results in quick response times.

The following is a sample of *debugQuery=true* for a query which returns 1
document:


  61458.0
  
61452.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
6.0
  



-- Update --

Out of desperation, I turned off replication by commenting out the
*<lst name="slave">* element in the replication requestHandler block.  After
restarting Tomcat I was surprised to find that the replication admin UI
still reported the core as replicating.  Search queries were still slow.  I
then disabled replication via the UI and the display updated to report the
core was no longer replicating.  Queries are now fast so it appears that
the sorting may be a red herring.
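
For reference, a minimal sketch of where that element sits in solrconfig.xml.
The masterUrl value is a hypothetical placeholder, and the 60-second
pollInterval is an assumption rather than a copy of the actual configuration:

```xml
<!-- Sketch only: host, port and core name below are placeholders. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- This is the element that was commented out to stop slave polling. -->
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core-name</str>
    <!-- Poll interval in HH:mm:ss; 60 seconds is assumed here. -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Each successful poll that pulls new index files opens a new searcher on the
slave, so any per-searcher cache has to be rebuilt afterwards.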

It may also be worth noting that the slow queries don't appear to
be getting cached.

Thanks again for the feedback.

On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky wrote:

> Rerun the sorted query with &debugQuery=true and look at the module
> timings. See what stands out.
>
> Are you actually sorting on a "text" field, as opposed to a "string" field?
>
> Of course, it's always possible that maybe you're hitting some odd OOM/GC
> condition as a result of Solr growing between releases.
>
> -- Jack Krupansky
>
> -Original Message- From: Shane Perry
> Sent: Wednesday, June 12, 2013 3:00 PM
> To: solr-user@lucene.apache.org
> Subject: Sorting by field is slow
>
>
> In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
> increased dramatically.  After testing in 4.3.0 it appears the same query
> (with 1 matching document) returns after 100 ms without sorting but takes 1
> minute when sorting by a text field.  I've looked around but haven't yet
> found a reason for the degradation.  Can someone give me some insight or
> point me in the right direction for resolving this?  In most cases, I can
> change my code to do client-side sorting but I do have a couple of
> situations where pagination prevents client-side sorting.  Any help would
> be greatly appreciated.
>
> Thanks,
>
> Shane
>


Re: Sorting by field is slow

2013-06-12 Thread Jack Krupansky
Rerun the sorted query with &debugQuery=true and look at the module timings.
See what stands out.


Are you actually sorting on a "text" field, as opposed to a "string" field?
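
To illustrate the distinction: a tokenized solr.TextField can produce many
terms per document, while solr.StrField indexes each value as a single term,
which is what sorting expects. A rough schema.xml sketch, assuming a
hypothetical text_sort field and the stock text_general type (neither is
taken from the poster's schema):

```xml
<!-- Sketch only: field and type names are illustrative assumptions. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Tokenized field used for searching. -->
<field name="text" type="text_general" indexed="true" stored="true"/>

<!-- Untokenized copy used only for sorting (sort=text_sort asc). -->
<field name="text_sort" type="string" indexed="true" stored="false"/>
<copyField source="text" dest="text_sort"/>
```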

Of course, it's always possible that maybe you're hitting some odd OOM/GC
condition as a result of Solr growing between releases.


-- Jack Krupansky

-Original Message- 
From: Shane Perry

Sent: Wednesday, June 12, 2013 3:00 PM
To: solr-user@lucene.apache.org
Subject: Sorting by field is slow

In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
increased dramatically.  After testing in 4.3.0 it appears the same query
(with 1 matching document) returns after 100 ms without sorting but takes 1
minute when sorting by a text field.  I've looked around but haven't yet
found a reason for the degradation.  Can someone give me some insight or
point me in the right direction for resolving this?  In most cases, I can
change my code to do client-side sorting but I do have a couple of
situations where pagination prevents client-side sorting.  Any help would
be greatly appreciated.

Thanks,

Shane 



Re: Sorting by field is slow

2013-06-12 Thread bbarani
http://wiki.apache.org/solr/SolrPerformanceFactors

If you do a lot of field-based sorting, it is advantageous to add explicit
warming queries to the "newSearcher" and "firstSearcher" event listeners in
your solrconfig that sort on those fields, so the FieldCache is populated
prior to any queries being executed by your users.
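
A minimal sketch of such listeners in solrconfig.xml, assuming a hypothetical
sort field named field4 and a match-all warming query. The important part is
that each warming query carries the same sort parameter as the live queries;
a warming query without it does not load the sort field into the FieldCache:

```xml
<!-- Sketch only: field4 and the match-all query are placeholders. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <!-- Sorting here loads the FieldCache entry for field4 during warming. -->
      <str name="sort">field4 asc</str>
      <str name="rows">1</str>
    </lst>
  </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">field4 asc</str>
      <str name="rows">1</str>
    </lst>
  </arr>
</listener>
```

Both listeners belong inside the query section of solrconfig.xml.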


