[ https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Morten Bøgeskov updated SOLR-13337:
-----------------------------------
    Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> -------------------------------------------------------
>
>                 Key: SOLR-13337
>                 URL: https://issues.apache.org/jira/browse/SOLR-13337
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SearchComponents - other
>    Affects Versions: 7.7
>         Environment: Linux 64bit debian
> 20-node cluster
>            Reporter: Morten Bøgeskov
>            Priority: Minor
>         Attachments: 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch
>
>
> When the TermsComponent distributes a request across all shards, all terms
> (terms.limit=-1) are returned from each shard.
> This ought not to be needed when using terms.sort=index.
> With terms.lower=a on a small test database (400k entries), a request for
> /terms?terms.fl=register&terms.sort=index&terms.lower=a
> took 8.5-11.5s. I did not try it on production data (10x the size).
> I understand why all terms are needed when sorting by count. When sorting by
> index, however, no more than terms.limit rows are required from any shard;
> most likely some will be discarded because they are present in more than one
> shard. This holds as long as no terms.mincount/terms.maxcount is given (which
> definitely throws a spanner in the works).
>
> I've attached what I think would do the trick.
> I haven't actually tested the patch (it compiles, however some other files in
> the checkout I have don't: ant compile, javac: "error: cannot find symbol").
>
> SOLR-2908 might be a somewhat related issue. I didn't quite follow the more
> subtle points in it.
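
The index-order argument in the description can be sketched outside Solr. The following is a minimal Python illustration, not the TermsComponent code and not the attached patch; the function names (shard_terms, merge_index_sorted) and the sample shard data are hypothetical. It assumes each shard returns its terms in index order together with a per-shard count, and that no terms.mincount/terms.maxcount is in play. Under those assumptions, merging only the first terms.limit terms of every shard gives the same result as merging everything.

```python
import heapq

# Hypothetical per-shard term dictionaries: term -> count on that shard.
SHARDS = [
    {"apple": 3, "banana": 1, "cherry": 2, "date": 5},
    {"avocado": 4, "banana": 2, "elderberry": 1},
    {"apple": 1, "blueberry": 7, "cherry": 1},
]


def shard_terms(index, lower, limit=None):
    """Return (term, count) pairs with term >= lower, in index order.

    limit=None mimics terms.limit=-1 (everything); otherwise only the
    first `limit` terms are returned.
    """
    terms = sorted(t for t in index if t >= lower)
    return [(t, index[t]) for t in terms[:limit]]


def merge_index_sorted(shard_responses, limit):
    """Merge index-sorted shard responses, summing counts of duplicates.

    Stops as soon as `limit` distinct terms are collected and a new term
    shows up, so later terms are never inspected.
    """
    merged = []
    for term, count in heapq.merge(*shard_responses):
        if merged and merged[-1][0] == term:
            # Same term seen on another shard: add up the counts.
            merged[-1] = (term, merged[-1][1] + count)
        elif len(merged) == limit:
            break
        else:
            merged.append((term, count))
    return merged


LIMIT = 3
full = merge_index_sorted([shard_terms(s, "a") for s in SHARDS], LIMIT)
truncated = merge_index_sorted([shard_terms(s, "a", LIMIT) for s in SHARDS], LIMIT)
print(full == truncated, truncated)
```

The reason this works: a term among the first terms.limit terms overall is also among the first terms.limit terms of every shard that contains it, so its summed count is complete. A count filter changes that, because a term's total count is only known after summing across shards, so terms beyond each shard's first terms.limit may be needed to fill the result.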
>
>
> Tested by
> * applying the patch to 7.7.1 (the version we use in production)
> * starting it up on a spare server (during off hours on the test system)
> * adding a replica of a collection (so that it serves requests)
> * requesting /terms?terms.fl=phrase.title&terms.sort=index&terms.lower=a
>   from the patched instance: ~30 ms
> * requesting the same from another, unpatched instance: ~17k ms
> * both returned the same result
> * adding terms.mincount=2 to the quick request: it failed with out of memory
> * restarting the server with more memory (.5g -> 8g)
> * the request then completed in ~18k ms
>
> I don't see how I'm supposed to unit test the functionality, given that it
> requires a cloud instance and enough data to give a measurable difference
> with or without the extra request arguments.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org