[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648869#action_12648869 ]
Tim Sturge commented on LUCENE-1461:
------------------------------------

Here's some benchmark data to demonstrate the utility.

Results on a 45M document index:

First, without an age constraint, as a baseline:

Query "+name:tim"
startup: 0
Hits: 15089
first query: 1004
100 queries: 132 (1.32 msec per query)

Now with a cached filter. This is ideal from a speed standpoint, but as with most range-based queries there are too many possible start/end combinations to cache all the filters.

Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on cached RangeFilter)
startup: 3
Hits: 11156
first query: 1830
100 queries: 287 (2.87 msec per query)

Now with an uncached filter. This is awful.

Query "+name:tim age:[18 TO 35]" (uncached ConstantScoreRangeQuery)
startup: 3
Hits: 11156
first query: 1665
100 queries: 51862 (yes, 518 msec per query, roughly 180x slower than the cached filter)

A RangeQuery is slightly better but still bad (and has a different result set).

Query "+name:tim age:[18 TO 35]" (uncached RangeQuery)
startup: 0
Hits: 10147
first query: 1517
100 queries: 27157 (271 msec per query, nearly 100x slower than the cached filter)

Now with the prebuilt column stride filter:

Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on prebuilt column stride filter)
startup: 2811
Hits: 11156
first query: 1395
100 queries: 441 (back down to 4.41 msec per query)

This is less than 2x slower than the dedicated bitset and more than 50x faster than the uncached RangeQuery.

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a
> single term.
> They do this by building an integer array of term numbers
> (storing the term->number mapping in a TreeMap) and then implementing a fast
> integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also
> be used to do other date filtering or in any application where there need to
> be multiple filters based on the same single term field. I have an untested
> implementation of single term filtering and have considered but not yet
> implemented term set filtering (useful for location based searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and
> hashCode() methods etc. I'm posting it here to discover if there is other
> interest in this feature; I don't mind fixing it up but would hate to go to
> the effort if it's not going to make it into Lucene.
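
For readers without the attachments, the idea in the issue description (an integer array of term numbers per document, a TreeMap from term to number, and a range check done by integer comparison) might be sketched roughly as below. This is not the attached RangeMultiFilter.java and it uses no Lucene classes; the class name TermNumberRangeFilter, the NO_TERM sentinel, and the matchingDocs method are all hypothetical, and the in-memory String[] input stands in for whatever the real code reads from the index. Terms are compared as strings, so it assumes values such as ages are indexed with a fixed width.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.IntStream;

/** Hypothetical sketch of a term-number range filter; not the attached code. */
final class TermNumberRangeFilter {
    /** Sentinel for documents that have no term in the field (assumption). */
    private static final int NO_TERM = -1;

    /** Sorted mapping from each distinct term to its ordinal number. */
    private final TreeMap<String, Integer> termToNumber = new TreeMap<>();

    /** One entry per document: the number of that document's single term. */
    private final int[] docTermNumbers;

    /** docTerms[i] is the single term of document i, or null if it has none. */
    TermNumberRangeFilter(String[] docTerms) {
        // Collect the distinct terms; the TreeMap keeps them in sorted order.
        for (String term : docTerms) {
            if (term != null) {
                termToNumber.put(term, 0);
            }
        }
        // Number the terms in sorted order, so term order equals integer order.
        int next = 0;
        for (Map.Entry<String, Integer> entry : termToNumber.entrySet()) {
            entry.setValue(next++);
        }
        // Build the per-document array of term numbers (the "column").
        docTermNumbers = new int[docTerms.length];
        for (int doc = 0; doc < docTerms.length; doc++) {
            String term = docTerms[doc];
            docTermNumbers[doc] = (term == null) ? NO_TERM : termToNumber.get(term);
        }
    }

    /** Returns the ids of documents whose term lies in [lower, upper], inclusive. */
    int[] matchingDocs(String lower, String upper) {
        // Translate the term bounds into term-number bounds once per query.
        Map.Entry<String, Integer> lo = termToNumber.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToNumber.floorEntry(upper);
        if (lo == null || hi == null || lo.getValue() > hi.getValue()) {
            return new int[0];
        }
        int min = lo.getValue();
        int max = hi.getValue();
        // Per document, the test is two integer comparisons; NO_TERM never matches.
        return IntStream.range(0, docTermNumbers.length)
                .filter(doc -> docTermNumbers[doc] >= min && docTermNumbers[doc] <= max)
                .toArray();
    }
}
```

Building the array and the map is the one-time cost, which is consistent with the larger "startup" figure in the benchmark above; after that, every start/end combination reuses the same array, so no per-range filter needs to be cached.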