[ https://issues.apache.org/jira/browse/LUCENE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Russell updated LUCENE-5637:
----------------------------------
    Attachment: Lucene-5637.patch

Patch for Solr 4.8, with one unit test failure I haven't figured out yet.

> Scaling scale function
> ----------------------
>
>                 Key: LUCENE-5637
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5637
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: patch, performance
>             Fix For: 4.8
>
>         Attachments: Lucene-5637.patch
>
>
> The existing scale() function examines the scores of all documents in the
> index in order to calculate its scale constant. This does not perform well
> in Solr on very large indexes or with costly scoring mechanisms such as geo
> distance.
> I have developed a patch that allows the scale function to only score
> documents that match the given filters, thus improving performance of the
> scale function.
> For test queries involving two scale operations, where one was scaling the
> result of keyword scoring and the other was scaling the result of geo
> distance scoring on an index with ~2 million documents, query time
> improved from ~400 ms with vanilla scale to ~190 ms with the new scale. A
> similar query using no scaling ran in ~90 ms. (Each enhanced scale function
> added to the query appeared to add about 50 ms of processing.)
> e.g. scaled query - q = scale(keywords, 0, 90) and scale(geo, 0, 10)
> e.g. unscaled query - q = keywords and geo
> In both cases fq includes keywords and geo.
> In order to accomplish this goal I had to introduce a couple of changes:
> 1) In the IndexSearcher.search method, where scorers are created and then
> used to score on a per-AtomicReaderContext basis, I had to make it so that
> all scorers would be created before any scoring was done. This was so that
> the scale function would have an opportunity to observe the entire index
> before being asked to score something.
> 2) Introduced a new property to the Bits interface that indicates whether or
> not the bits provide constant-time access. Why? Read on.
> 3) FilterSet used to return null when asked for its bits because it did not
> have any; it had an iterator. This was an issue when trying to make it so
> that scale would only score documents matching the filter. Thus a new Bits
> implementation was added (LazyIteratorBackedBits) that can expose an
> iterator as a Bits implementation. It advances the iterator on demand when
> asked about a document and uses an OpenBitSet to keep track of what it has
> advanced beyond. Thus, once the iterator is exhausted, it provides
> constant-time answers like any other Bits.
> 4) Introduced a function on the ValueSource interface to allow a Bits to be
> passed in for filtering purposes.
> This was originally developed against Solr 4.2 but I have ported it to Solr
> 4.8. There is one failing unit test related to code that has been added in
> the interim, AnalyzingInfixSuggesterTest.testRandomNRT. I have not been able
> to figure out why this test fails. All other tests pass.
> In relation to implementation detail 1) above, the introduction of
> LeafCollectors in trunk has caused somewhat of an issue. It seems to no
> longer be possible to create multiple scorers without immediately scoring on
> that LeafCollector. This may be related to the encapsulation of the
> Collector.setNextReader() method, which was very useful for this purpose.
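
For readers who want to picture changes 2) and 3), here is a minimal sketch of the lazy, iterator-backed bits idea. It is not the code from the attached patch. To keep it self-contained it assumes a plain java.util.PrimitiveIterator.OfInt of ascending doc IDs in place of Lucene's DocIdSetIterator, and java.util.BitSet in place of OpenBitSet. The isConstantTime() method name is hypothetical and only stands in for the new Bits property described above.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

/** Illustrative stand-in for the LazyIteratorBackedBits described in the issue. */
final class LazyIteratorBackedBits {
    private final PrimitiveIterator.OfInt docIds; // matching doc IDs, assumed ascending
    private final BitSet seen = new BitSet();     // matches consumed from the iterator so far
    private final int length;                     // number of documents covered
    private int advancedTo = -1;                  // highest doc ID consumed so far
    private boolean exhausted = false;

    LazyIteratorBackedBits(PrimitiveIterator.OfInt ascendingDocIds, int length) {
        this.docIds = ascendingDocIds;
        this.length = length;
    }

    /** Returns whether doc matches, advancing the iterator only as far as needed. */
    boolean get(int doc) {
        if (doc < 0 || doc >= length) {
            throw new IndexOutOfBoundsException("doc=" + doc + ", length=" + length);
        }
        while (!exhausted && advancedTo < doc) {
            if (docIds.hasNext()) {
                advancedTo = docIds.nextInt();
                seen.set(advancedTo);             // remember every match we pass
            } else {
                exhausted = true;                 // nothing left to advance over
            }
        }
        return seen.get(doc);                     // plain bit lookup from here on
    }

    int length() {
        return length;
    }

    /** Hypothetical name: true once every get() is answered from the bit set alone. */
    boolean isConstantTime() {
        return exhausted;
    }

    public static void main(String[] args) {
        LazyIteratorBackedBits bits =
            new LazyIteratorBackedBits(IntStream.of(2, 5, 9).iterator(), 12);
        System.out.println(bits.get(5));          // consumes 2 and 5 from the iterator
        System.out.println(bits.get(3));          // already passed, answered from the bit set
        System.out.println(bits.get(11));         // drains the iterator
        System.out.println(bits.isConstantTime());
    }
}
```

A caller such as a filtered scale function could check the constant-time flag to decide whether random-access lookups are cheap yet, or whether it is still paying to advance the underlying iterator.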