[ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:
------------------------------------
    Attachment:     (was: SOLR-13337.patch)

> TermsComponent sharded and terms.sort=index performance
> -------------------------------------------------------
>
>                 Key: SOLR-13337
>                 URL: https://issues.apache.org/jira/browse/SOLR-13337
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SearchComponents - other
>    Affects Versions: 7.7
>         Environment: Linux 64bit debian
> 20-node cluster
>            Reporter: Morten Bøgeskov
>            Priority: Minor
>         Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> screenshot-1.png
>
>
> When the TermsComponent distributes a request across shards, all terms 
> (terms.limit=-1) are requested from every shard.
> This should not be necessary when using terms.sort=index.
> With terms.lower=a on a small test base (400k entries), a request for 
> /terms?terms.fl=register&terms.sort=index&terms.lower=a took 8.5-11.5 s. 
> I did not try it on production data (10x larger).
> I understand why all terms are needed when sorting by count. When sorting 
> by index, however, no more than terms.limit rows are required from any 
> shard; most likely some will be discarded because they are present in more 
> than one shard. This holds only when no terms.mincount/terms.maxcount is 
> given (which definitely throws a spanner in the works).
>  
> I've attached what I think would do the trick.
> I haven't actually tested the patch (it compiles, but some other files in 
> my checkout don't: ant compile, javac: "error: cannot find symbol").
>  
> This might be somewhat related to SOLR-2908, though I didn't quite follow 
> the more subtle details in it.
>  
>  
> Tested by
>  * applying patch to 7.7.1 (the one we use in production)
>  * start up on a spare server (during off hours, on the test system)
>  * add a replica from a collection (so that it'll serve requests)
>  * request /terms?terms.fl=phrase.title&terms.sort=index&terms.lower=a from 
> the instance ~30 ms
>  * request the same from another unpatched instance ~17k ms
>  * both returned same result
>  * added terms.mincount=2 to the quick request; it failed with out of memory
>  * restarted server with more memory (.5g -> 8g)
>  * request completed in ~18k ms
>  
> I don't see how I'm supposed to unit test the functionality, given that it 
> requires a cloud instance and enough data to show a measurable difference 
> with or without the extra request arguments.
>  
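
The reasoning in the description can be illustrated with a small, standalone sketch. It is not the attached patch and does not use Solr's classes; the class name ShardTermsSketch and the helpers shardLimit and merge are hypothetical. It assumes that terms.mincount defaults to 1, that a negative terms.maxcount or terms.limit means "unlimited", and that each shard returns its terms in index order. The point it shows: with terms.sort=index and no count filtering, any term among the first terms.limit globally is also among the first terms.limit on every shard that contains it, so asking each shard for only terms.limit terms still gives complete counts after merging.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Solr code and not the attached patch.
final class ShardTermsSketch {

    /**
     * Number of terms to request from each shard; -1 means "all terms".
     * Assumes mincount <= 1 and a negative maxcount mean "no count filtering".
     */
    static int shardLimit(boolean sortByIndex, int limit, int minCount, int maxCount) {
        boolean countsUnfiltered = minCount <= 1 && maxCount < 0;
        // Count filters apply to the summed counts, so they need every term.
        return (sortByIndex && countsUnfiltered) ? limit : -1;
    }

    /**
     * Merges per-shard term counts and keeps the first `limit` terms.
     * TreeMap orders by String comparison, which stands in for index order
     * here (the two agree for ASCII terms).
     */
    static Map<String, Long> merge(List<Map<String, Long>> shardResponses, int limit) {
        TreeMap<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> shard : shardResponses) {
            // A term present in several shards collapses into one summed entry.
            shard.forEach((term, count) -> merged.merge(term, count, Long::sum));
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : merged.entrySet()) {
            if (limit >= 0 && result.size() >= limit) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        int limit = 2;
        // Each shard has already truncated its response to `limit` terms.
        Map<String, Long> shardA = Map.of("apple", 2L, "banana", 1L);
        Map<String, Long> shardB = Map.of("avocado", 1L, "banana", 3L);
        System.out.println(shardLimit(true, limit, 1, -1));
        System.out.println(merge(List.of(shardA, shardB), limit));
    }
}
```

With terms.mincount=2 in the request, shardLimit falls back to -1 in this sketch, which matches the slow (~18k ms) request in the test list above.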



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
