[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052194#comment-13052194 ]
Michael McCandless commented on LUCENE-2454:
--------------------------------------------

bq. Would modules/grouping meanwhile be a better place for this than lucene/contrib/queries?

I think modules/join is the right place? When we factor out Solr's generic join impl it can go there too...

I have some concerns about the current approach here (this is why I opened LUCENE-3171):

* prevSetBit is called for each child doc, which is an O(N^2) cost (N = number of child docs for one parent), I think? Admittedly, "typically" N is probably small...
* It uses 2 passes if you also want to collect child docs per parent.
* PerParentLimitedQuery also has O(N^2) cost, both on insert of a new child and on popping the child docs per group: I think it should use a PQ to find the lowest child to evict per parent doc?
* I think "typically" an app will want to collect the top N groups (parent docs and their children), so isn't it more efficient to gather those top N and only at the end sort each set of children per parent? (This is similar to how the 2nd pass grouping collector works.)
* PerParentLimitedQuery only supports relevance sort within each parent.
* You don't get the parent/child structure back from PerParentLimitedQuery (but now we have TopGroups, which is a great match for representing each parent and its children).

If you only ever use PerParentLimitedQuery on the top parents from the first pass, e.g. you AND/filter it against those parent docs, then the O(N^2) cost is less severe since it'll have a small constant in front. But since it's a Query, I imagine users will use it without that filter, which is bad...

I think using a TopN Collector is a better match here.
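A rough illustration of the single-pass, PQ-based alternative suggested above, as a minimal sketch in plain Java rather than Lucene API. It assumes each parent doc is indexed immediately after its children, and uses `java.util.BitSet` as a stand-in for the parent-doc bits; the names `ParentChildGrouping`, `group`, and `ChildHit` are hypothetical. Because child hits arrive in increasing doc order, `nextSetBit` is called once per parent block instead of `prevSetBit` once per child, so the total scanning is linear. Each parent keeps at most `perParentLimit` children in a bounded min-heap, so evicting the weakest child costs O(log k), and only the surviving children are sorted when the block is flushed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene API. Assumes each parent doc is indexed
// immediately after its children, so the parent of child c is the next set
// bit at or after c in parentBits.
public class ParentChildGrouping {

    // A scored child hit (hypothetical helper type).
    static final class ChildHit {
        final int doc;
        final float score;

        ChildHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // childDocs must be in increasing doc order, as a scorer would deliver them.
    // Returns parent doc -> up to perParentLimit child docs, best score first.
    public static Map<Integer, List<Integer>> group(
            BitSet parentBits, int[] childDocs, float[] scores, int perParentLimit) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        // Min-heap on score: the head is always the weakest retained child.
        PriorityQueue<ChildHit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        int currentParent = -1;
        for (int i = 0; i < childDocs.length; i++) {
            int child = childDocs[i];
            if (child > currentParent) {
                // Entered a new block: flush the previous parent, then look up
                // the new parent once per block instead of once per child.
                flush(result, currentParent, heap);
                currentParent = parentBits.nextSetBit(child);
                if (currentParent < 0) {
                    throw new IllegalArgumentException("child " + child + " has no parent");
                }
            }
            ChildHit hit = new ChildHit(child, scores[i]);
            if (heap.size() < perParentLimit) {
                heap.add(hit);
            } else if (perParentLimit > 0 && hit.score > heap.peek().score) {
                heap.poll(); // evict the weakest child in O(log k)
                heap.add(hit);
            }
        }
        flush(result, currentParent, heap);
        return result;
    }

    // Sorts only the surviving children (best first) and records them.
    private static void flush(
            Map<Integer, List<Integer>> result, int parent, PriorityQueue<ChildHit> heap) {
        if (parent < 0 || heap.isEmpty()) {
            return;
        }
        List<ChildHit> kept = new ArrayList<>(heap);
        heap.clear();
        kept.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (ChildHit h : kept) {
            docs.add(h.doc);
        }
        result.put(parent, docs);
    }

    public static void main(String[] args) {
        // Docs 0-2 are children of parent 3; docs 4-5 are children of parent 6.
        BitSet parentBits = new BitSet();
        parentBits.set(3);
        parentBits.set(6);
        int[] childDocs = {0, 1, 2, 4, 5};
        float[] scores = {0.5f, 0.9f, 0.7f, 0.2f, 0.8f};
        System.out.println(group(parentBits, childDocs, scores, 2));
        // prints {3=[1, 2], 6=[5, 4]}
    }
}
```

This sketch only bounds children within each parent. It does not rank the parents themselves, so selecting the top N groups would need a second, parent-level queue, with the per-parent child sort deferred until those N winners are known.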
> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene