Re: Sort on docValue field is slow.

2019-05-20 Thread Erick Erickson
Shawn’s right. You have a mixed index: some segments have docValues and some 
don’t. So yes, you do need to reindex everything before drawing conclusions. To 
make matters worse, as you index more documents, new segments with 
docValues will eventually be merged with segments that don’t have docValues, 
leading to significant inconsistencies.

As with all sorting, you can tell nothing from one test. The first time a field 
is accessed for sorting, it must be read from disk in either case (docValues 
true or false). The difference is that with docValues=false, the “uninverted” 
structure must be built from the indexed values on the Java heap. In the 
docValues=true case, it’s just un-serialized from disk into the OS memory.
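
For anyone reading along, here is a toy Python model of that distinction. It is 
only a sketch, not Lucene code, and the function names and sample values are 
invented for illustration. With docValues=false, the doc-to-value column has to 
be rebuilt from the inverted index on first use (the "uninverting" step); with 
docValues=true, an equivalent column is already stored on disk.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each value (term) to the list of doc ids that contain it."""
    inverted = defaultdict(list)
    for doc_id, value in enumerate(docs):
        if value is not None:  # documents without the field have no postings
            inverted[value].append(doc_id)
    return dict(inverted)


def uninvert(inverted, num_docs):
    """Rebuild a doc id -> value column by walking every term's postings."""
    column = [None] * num_docs
    for value, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = value
    return column


# Hypothetical featuredAt values for five documents; None means "not set".
docs = ["2019-05-01", None, "2019-05-20", None, "2019-05-01"]
inverted = build_inverted_index(docs)
column = uninvert(inverted, len(docs))
```

Sorting needs the `column` shape (value per document), which is why an index 
that only has the `inverted` shape pays an extra one-time cost.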

Point is that after you’ve completely re-indexed everything (and I would, 
indeed, use a new collection) the first time you use the field it’ll take extra 
time. You can’t draw any valid conclusions until you average over quite a 
number of queries or throw out the first few times.
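
A minimal sketch of that measurement approach, assuming you wrap your own 
query call in a function (the `run_query` callable, the warm-up count of 3, and 
the sample timings below are all made up for illustration):

```python
import statistics
import time


def time_calls(run_query, repeats):
    """Call run_query `repeats` times; return each wall-clock duration in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings_ms, warmup=3):
    """Drop the first `warmup` samples, then report mean and median."""
    steady = timings_ms[warmup:]
    if not steady:
        raise ValueError("need more samples than warm-up runs")
    return {
        "mean": statistics.mean(steady),
        "median": statistics.median(steady),
    }


# Made-up timings: the first requests pay the one-time load cost.
sample = [4800.0, 900.0, 130.0, 120.0, 110.0, 130.0]
summary = summarize(sample, warmup=3)
```

Comparing `summary` for the sorted and unsorted request says much more than 
comparing a single call of each.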

Best,
Erick

> On May 20, 2019, at 8:30 AM, Shawn Heisey wrote:
> 
> On 5/20/2019 8:59 AM, Ashwin Ramesh wrote:
>> Hi Shawn,
>> Thanks for the prompt response.
>> 1. date type def - > positionIncrementGap="0" />
>> 2. The field is brand new. I added it to schema.xml, uploaded to ZK &
>> reloaded the collection. After that we started indexing the few thousand.
>> Did we still need to do a full reindex to a fresh collection?
>> 3. It is the only difference. I am testing the raw URL call timing
>> difference with and without the extra sort.
> 
> As I understand it, the docValues data will not be correct for the existing 
> documents if they are not all reindexed.  If I am wrong, I am sure somebody 
> will correct me.  Although I would not expect that to make things slow, the 
> internal Lucene details are not something I have a lot of insight into.
> 
> Thanks,
> Shawn



Re: Sort on docValue field is slow.

2019-05-20 Thread Shawn Heisey

On 5/20/2019 8:59 AM, Ashwin Ramesh wrote:

> Hi Shawn,
>
> Thanks for the prompt response.
>
> 1. date type def -
>
> 2. The field is brand new. I added it to schema.xml, uploaded to ZK &
> reloaded the collection. After that we started indexing the few thousand.
> Did we still need to do a full reindex to a fresh collection?
>
> 3. It is the only difference. I am testing the raw URL call timing
> difference with and without the extra sort.


As I understand it, the docValues data will not be correct for the 
existing documents if they are not all reindexed.  If I am wrong, I am 
sure somebody will correct me.  Although I would not expect that to make 
things slow, the internal Lucene details are not something I have a lot 
of insight into.


Thanks,
Shawn


Re: Sort on docValue field is slow.

2019-05-20 Thread Ashwin Ramesh
Hi Shawn,

Thanks for the prompt response.

1. date type def - 

2. The field is brand new. I added it to schema.xml, uploaded to ZK &
reloaded the collection. After that we started indexing the few thousand.
Did we still need to do a full reindex to a fresh collection?

3. It is the only difference. I am testing the raw URL call timing
difference with and without the extra sort.

Hope this helps,

Regards,

Ash



On Mon, May 20, 2019 at 11:17 PM Shawn Heisey wrote:

> On 5/20/2019 6:25 AM, Ashwin Ramesh wrote:
> > Hoping to get advice on a specific issue - We have a collection of 50M
> > documents. We recently added a featuredAt field defined as such -
> >
> >  > required="false"
> > multiValued="false" docValues="true"/>
>
> What is the fieldType definition for "date"?  We cannot assume that you
> have left this the same as Solr's sample configs.
>
> > This field is sparsely populated such that only a small subset (3-5
> > thousand currently) have been tagged with that field.
>
> Did you completely reindex, or just index those few thousand records?
> When changing fields related to docValues, you must completely delete
> the old index and reindex.  That's just how docValues works.
>
> > We have a business case where we want to order this content by most
> > recently featured -> least recently featured -> the rest of the content
> > in any order. However adding the `sort=featuredAt desc` param results in
> > qTime > 5000 (our hard timeout is 5000).
>
> Is the definition of the sort parameter the ONLY difference?  Are you
> querying on the new field?  Can you share the entire query URL, or the
> code that produced it if you're using a Solr client?  What is the before
> QTime?
>
> Thanks,
> Shawn
>

-- 
P.S. We've launched a new blog to share the latest ideas and case studies 
from our team. Check it out here: product.canva.com
Empowering the world to design
Also, we're hiring. Apply here!

Re: Sort on docValue field is slow.

2019-05-20 Thread Shawn Heisey

On 5/20/2019 6:25 AM, Ashwin Ramesh wrote:

> Hoping to get advice on a specific issue - We have a collection of 50M
> documents. We recently added a featuredAt field defined as such -




What is the fieldType definition for "date"?  We cannot assume that you 
have left this the same as Solr's sample configs.



> This field is sparsely populated such that only a small subset (3-5 thousand
> currently) have been tagged with that field.


Did you completely reindex, or just index those few thousand records? 
When changing fields related to docValues, you must completely delete 
the old index and reindex.  That's just how docValues works.



> We have a business case where we want to order this content by most
> recently featured -> least recently featured -> the rest of the content in
> any order. However adding the `sort=featuredAt desc` param results in
> qTime > 5000 (our hard timeout is 5000).


Is the definition of the sort parameter the ONLY difference?  Are you 
querying on the new field?  Can you share the entire query URL, or the 
code that produced it if you're using a Solr client?  What is the before 
QTime?
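
For reference, the kind of before/after comparison being asked about could be 
set up roughly like this. This is only a sketch: the host, the collection name, 
and the `q`, `rows` and `fl` values are assumptions, not taken from the actual 
setup. The only thing that differs between the two URLs is the `sort` 
parameter.

```python
from urllib.parse import urlencode


def select_url(base, params):
    """Build a /select URL with properly encoded query parameters."""
    return f"{base}/select?{urlencode(params)}"


# Hypothetical host and collection name.
BASE = "http://localhost:8983/solr/mycollection"

# Assumed baseline parameters; substitute whatever the real request sends.
common = {"q": "*:*", "rows": 10, "fl": "id"}

without_sort = select_url(BASE, common)
with_sort = select_url(BASE, {**common, "sort": "featuredAt desc"})
```

Requesting both URLs repeatedly and comparing the QTime in the responses 
isolates the cost of the sort from everything else in the request.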


Thanks,
Shawn


Sort on docValue field is slow.

2019-05-20 Thread Ashwin Ramesh
Hello everybody,

Hoping to get advice on a specific issue - We have a collection of 50M
documents. We recently added a featuredAt field defined as such -



This field is sparsely populated such that only a small subset (3-5 thousand
currently) have been tagged with that field.

We have a business case where we want to order this content by most
recently featured -> least recently featured -> the rest of the content in
any order. However adding the `sort=featuredAt desc` param results in
qTime > 5000 (our hard timeout is 5000).

The request handler processing this request is defined as follows:

  
*
  
  
id
edismax
10
id
  
  
elevator
  


We hydrate content with a separate store.

Any advice on how to improve the performance of this request handler +
sorting would be appreciated.

System/Architecture Specs:
Solr 7.4
8 Shards
TLOG / PULLs

Thank you & Regards,

Ash
