[ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443409#comment-13443409
 ] 

Greg Bowyer commented on SOLR-3763:
-----------------------------------

I guess my next step is to get caching working. I am not quite sure how to take 
baby steps with this beyond getting to feature parity.
                
> Make solr use lucene filters directly
> -------------------------------------
>
>                 Key: SOLR-3763
>                 URL: https://issues.apache.org/jira/browse/SOLR-3763
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: Greg Bowyer
>            Assignee: Greg Bowyer
>         Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch
>
>
> Presently Solr uses bitsets, queries and collectors to implement the concept 
> of filters. This has proven to be very powerful, but it comes at the cost of 
> introducing a large body of code into Solr, making it harder to optimise and 
> maintain.
> Another issue here is that filters currently cache sub-optimally given the 
> changes in lucene towards atomic readers.
> Rather than patch these issues, this is an attempt to rework the filters in 
> solr to leverage the Filter subsystem from lucene as much as possible.
> In good time the aim is to get this to do the following:
> ∘ Handle setting up filter implementations that are able to correctly cache 
> with reference to the AtomicReader that they are caching for rather than for 
> the entire index at large
> ∘ Get the post filters working; I am thinking that this can be done via 
> Lucene's chained filter, with the “expensive” filters being put towards the 
> end of the chain. This has different semantics internally to the original 
> implementation but IMHO should have the same result for end users
> ∘ Learn how to create filters that are potentially more efficient, at present 
> solr basically runs a simple query that gathers a DocSet that relates to the 
> documents that we want filtered; it would be interesting to make use of 
> filter implementations that are in theory faster than query filters (for 
> instance there are filters that are able to query the FieldCache)
> ∘ Learn how to decompose filters so that a complex filter query can be cached 
> (potentially) as its constituent parts; for example the filter below 
> currently needs love, care and feeding to ensure that the filter cache is not 
> unduly stressed
> {code}
>   'category:(100) OR category:(200) OR category:(300)'
> {code}
> Really there is no reason not to express this in a cached form as 
> {code}
> BooleanFilter(
>     FilterClause(CachedFilter(TermFilter(Term("category", 100))), SHOULD),
>     FilterClause(CachedFilter(TermFilter(Term("category", 200))), SHOULD),
>     FilterClause(CachedFilter(TermFilter(Term("category", 300))), SHOULD)
>   )
> {code}
> This would yield better cache usage, I think, as we can reuse docsets across 
> multiple queries as well as avoid issues when filters are presented in 
> differing orders
> ∘ Instead of end users providing costing we might (and this is a big might, 
> FWIW) be able to create a sort of execution plan of filters, leveraging a 
> combination of what the index is able to tell us as well as sampling and 
> “educated guesswork”; in essence this is what some DBMS software does, for 
> example PostgreSQL: it has a genetic algorithm that treats query planning 
> like the travelling salesman problem, to great effect
> ∘ I am sure I will probably come up with other ambitious ideas to plug in 
> here ..... :S 
> Patches obviously forthcoming but the bulk of the work can be followed here 
> https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters
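
To make the decomposition idea in the description concrete, here is a minimal
sketch of caching a disjunctive filter clause by clause. It is a toy model, not
code from the patch and not the Lucene or Solr API: the class name
DecomposedFilterCache, the int[] standing in for an index, and the misses
counter are all assumptions made for illustration. java.util.BitSet plays the
role of a docset.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of caching an OR filter one clause at a time (hypothetical, not Lucene/Solr code). */
class DecomposedFilterCache {
    /** Stand-in for an index: the category value of each document id. */
    private final int[] categoryByDoc;
    /** One cached doc set per single-term clause, keyed by category value. */
    private final Map<Integer, BitSet> cache = new HashMap<>();
    /** How many clauses had to be evaluated against the "index". */
    int misses = 0;

    DecomposedFilterCache(int[] categoryByDoc) {
        this.categoryByDoc = categoryByDoc.clone();
    }

    /** Returns the doc set for one term clause, computing and caching it on first use. */
    private BitSet docsFor(int category) {
        BitSet cached = cache.get(category);
        if (cached == null) {
            misses++;
            cached = new BitSet(categoryByDoc.length);
            for (int doc = 0; doc < categoryByDoc.length; doc++) {
                if (categoryByDoc[doc] == category) {
                    cached.set(doc);
                }
            }
            cache.put(category, cached);
        }
        return cached;
    }

    /** ORs the per-clause doc sets together, like SHOULD clauses in a boolean filter. */
    BitSet anyOf(List<Integer> categories) {
        BitSet result = new BitSet(categoryByDoc.length);
        for (int category : categories) {
            result.or(docsFor(category)); // the cached sets themselves are never mutated
        }
        return result;
    }

    public static void main(String[] args) {
        DecomposedFilterCache filters =
                new DecomposedFilterCache(new int[] {100, 200, 300, 400, 100, 300});
        System.out.println(filters.anyOf(Arrays.asList(100, 200, 300)));
        System.out.println(filters.anyOf(Arrays.asList(300, 100)));
        System.out.println("clause evaluations: " + filters.misses);
    }
}
```

Because the cache key is the individual clause rather than the whole filter
string, the second call in main reuses the sets built by the first, and the
order in which the clauses are presented does not matter.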

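The post-filter point in the description (put the “expensive” filters towards
the end of the chain) can be sketched in the same toy style. Again this is an
illustration under stated assumptions, not the patch and not Lucene's chained
filter: CostOrderedChain, CostedFilter, the integer cost hint and the calls
counter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model of a filter chain that runs cheap filters first (hypothetical, not Lucene/Solr code). */
class CostOrderedChain {
    /** A per-document filter with an illustrative cost hint. */
    static final class CostedFilter {
        final int cost;
        final IntPredicate accepts;
        /** How many documents this filter was asked about. */
        int calls = 0;

        CostedFilter(int cost, IntPredicate accepts) {
            this.cost = cost;
            this.accepts = accepts;
        }

        boolean test(int doc) {
            calls++;
            return accepts.test(doc);
        }
    }

    private final List<CostedFilter> chain;

    /** Copies the filters and sorts them so the most expensive ones end up last. */
    CostOrderedChain(List<CostedFilter> filters) {
        chain = new ArrayList<>(filters);
        chain.sort(Comparator.comparingInt(f -> f.cost));
    }

    /** Returns the doc ids in [0, maxDoc) accepted by every filter, stopping at the first rejection. */
    List<Integer> matching(int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            boolean accepted = true;
            for (CostedFilter filter : chain) {
                if (!filter.test(doc)) {
                    accepted = false;
                    break; // later, more expensive filters never see this doc
                }
            }
            if (accepted) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With a cheap filter that keeps even doc ids and an expensive one that keeps
multiples of three, the expensive filter is only consulted for the documents
the cheap one let through, whatever order the two were supplied in.
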