[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189469#comment-13189469
 ] 

Martijn van Groningen commented on LUCENE-3602:
-----------------------------------------------

I'm not sure how you plan to sort by DTI ords. The terms collected in the first 
phase come from many segments, and the ords from a DTI are only valid inside a 
single segment. You can create a top-level DTI, but that is expensive... 
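
To illustrate why per-segment ords cannot be compared across segments, here is a 
minimal sketch in plain Python. It does not use the Lucene API; the function 
names and the sample terms are made up for illustration, and an "ord" is simply 
modelled as a term's position in a segment's sorted term list.

```python
# Illustration only: plain Python stand-ins, not the Lucene API.

def build_segment_ords(terms):
    """Per-segment ordinals: each term's position in that segment's sorted terms."""
    return {term: ord_ for ord_, term in enumerate(sorted(set(terms)))}


def build_top_level_ords(*segments):
    """Top-level ordinals: requires merging the term sets of ALL segments."""
    all_terms = set()
    for segment in segments:
        all_terms.update(segment)  # iterating a dict yields its keys (the terms)
    return {term: ord_ for ord_, term in enumerate(sorted(all_terms))}


# Two hypothetical segments with overlapping but different term sets.
segment_a = build_segment_ords(["apple", "cherry", "plum"])
segment_b = build_segment_ords(["banana", "blueberry", "cherry"])

# The same term has a different ord in each segment, and the same ord
# refers to different terms, so sorting by raw ords across segments is wrong.
print(segment_a["cherry"], segment_b["cherry"])

# A top-level mapping gives one consistent ord per term, but building it
# means visiting every term of every segment.
top_level = build_top_level_ords(segment_a, segment_b)
print(top_level["cherry"])
```

The top-level mapping has to be rebuilt whenever the set of segments changes, 
which is where the cost mentioned above comes from in this simplified model.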

Currently caching is minimal and can be improved at the cost of more RAM. The 
TermsCollector caches the from terms via DocTerms in the FC (per segment). 
Caching can be improved in the second phase as you described (by saving a 
bitset per fromTerm?), but I think we first need to tackle how bitsets are 
cached in general. Solr caches (which the Solr JoinQuery uses) are top level 
(one commit and you lose it all). I'm unsure how to cache the posting list file 
pointers with the current Lucene API... I think caching for joins should be 
addressed in a new issue.
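
As a rough sketch of the difference between top-level and per-segment caching, 
the plain-Python example below keys cached doc-id sets by (segment, fromTerm), 
so that a reopen only drops entries for segments that no longer exist. The 
class name, method names, and segment ids are all hypothetical; this is not 
Lucene or Solr code, only one possible shape for such a cache.

```python
# Illustration only: hypothetical names, not Lucene or Solr code.

class PerSegmentBitsetCache:
    """Caches a set of doc ids per (segment_id, from_term) pair."""

    def __init__(self):
        self._cache = {}  # (segment_id, from_term) -> frozenset of doc ids

    def get(self, segment_id, from_term, compute):
        """Return the cached doc ids, computing them on a cache miss."""
        key = (segment_id, from_term)
        if key not in self._cache:
            self._cache[key] = frozenset(compute(segment_id, from_term))
        return self._cache[key]

    def on_reopen(self, live_segment_ids):
        """Keep entries for segments that survived; drop the rest."""
        live = set(live_segment_ids)
        self._cache = {k: v for k, v in self._cache.items() if k[0] in live}

    def __len__(self):
        return len(self._cache)


calls = []

def compute(segment_id, from_term):
    """Stand-in for the expensive lookup; records each invocation."""
    calls.append((segment_id, from_term))
    return {0, 2}  # placeholder doc ids


cache = PerSegmentBitsetCache()
cache.get("seg_0", "p1", compute)    # miss
cache.get("seg_1", "p1", compute)    # miss
cache.on_reopen(["seg_0", "seg_2"])  # seg_1 went away, seg_2 is new
cache.get("seg_0", "p1", compute)    # hit: seg_0 survived the reopen
cache.get("seg_2", "p1", compute)    # miss: new segment
```

A top-level cache would correspond to clearing `_cache` entirely in 
`on_reopen`, which matches the "one commit and you lose it all" behaviour 
described above.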

Performance of the JoinUtil actually looks quite good from what I have measured. 
I have a test data set containing 100000 products and 100 offers per product 
(10100000 docs in total). The JoinUtil is 2 to 3 times faster than Solr's 
JoinQuery with this data set on my dev machine.
                
> Add join query to Lucene
> ------------------------
>
>                 Key: LUCENE-3602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3602
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/join
>            Reporter: Martijn van Groningen
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also 
> be available in Lucene.  
