[
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052194#comment-13052194
]
Michael McCandless commented on LUCENE-2454:
--------------------------------------------
bq. Would modules/grouping meanwhile be a better place for this than
lucene/contrib/queries?
I think modules/join is the right place? When we factor out Solr's
generic join impl it can go there too...
I have some concerns about the current approach here (this is why I
opened LUCENE-3171):
* prevSetBit is called for each child doc, which is an O(N^2) cost
(N = number of child docs for one parent) I think? Admittedly,
"typically" N is probably small...
* It uses 2 passes if you also want to collect child docs per
parent
* PerParentLimitedQuery is also O(N^2) cost, both on insert of a new
child and on popping the child docs per group: I think it should
use a PQ to find the lowest child to evict per parent doc?
* I think "typically" an app will want to collect the top N groups
(parent docs and their children), so it's more efficient to gather
those top N and only at the end sort each set of children
per-parent? (This is similar to how 2nd pass grouping collector
works).
* PerParentLimitedQuery only supports relevance sort w/in each
parent.
* You don't get the parent/child structure back, from
PerParentLimitedQuery (but now we have TopGroups which is a great
match for representing each parent and its children).
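To make the PQ suggestion concrete, here is a minimal sketch of a bounded
min-heap that keeps the best children for one parent. It is not code from
any patch on this issue: BoundedChildQueue, ScoredChild and the limit
parameter are hypothetical names, it uses java.util.PriorityQueue rather
than Lucene's own PriorityQueue, and it assumes relevance (score) ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps the `limit` best-scoring children of one parent. */
class BoundedChildQueue {
    /** A child doc id together with its score. */
    static final class ScoredChild {
        final int doc;
        final float score;
        ScoredChild(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int limit;
    // Min-heap: the head is always the weakest child kept so far.
    private final PriorityQueue<ScoredChild> heap;

    BoundedChildQueue(int limit) {
        if (limit < 1) throw new IllegalArgumentException("limit must be >= 1");
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, (a, b) -> {
            int c = Float.compare(a.score, b.score);
            // On equal scores the higher doc id counts as weaker and is evicted first.
            return c != 0 ? c : Integer.compare(b.doc, a.doc);
        });
    }

    /** O(log limit) per call, instead of scanning all kept children. */
    void insert(int doc, float score) {
        if (heap.size() < limit) {
            heap.add(new ScoredChild(doc, score));
            return;
        }
        // Only replace the weakest kept child if the new one beats it.
        if (score > heap.peek().score) {
            heap.poll();
            heap.add(new ScoredChild(doc, score));
        }
    }

    /** Drains the queue and returns the kept children, best first. */
    List<ScoredChild> popBestFirst() {
        List<ScoredChild> out = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) out.add(heap.poll()); // weakest first
        Collections.reverse(out);
        return out;
    }
}
```

With a heap, finding the child to evict is a peek at the head, so both
insert and pop cost O(log limit) rather than a scan over the kept children.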
If you always only use PerParentLimitedQuery on the top parents from
the first pass, eg you AND/filter it against those parent docs, then
the O(N^2) cost is less severe since it'll have a small constant in
front, but since it's a Query I imagine users will use it w/o that
filter, which is bad... I think using a TopN Collector is a better match
here.
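A collector that sees hits in increasing doc order could also avoid the
per-child prevSetBit call from the first concern by resolving the parent
once per block. The sketch below is only an illustration under stated
assumptions: ParentBlockTracker is a hypothetical name, it uses
java.util.BitSet (previousSetBit/nextSetBit) in place of Lucene's bit set,
and it assumes each parent doc is indexed immediately before its children.

```java
import java.util.BitSet;

/** Hypothetical sketch: resolves the parent of each child doc during an in-order pass. */
class ParentBlockTracker {
    private final BitSet parents;   // one bit set per parent doc id
    private int currentParent = -1; // parent of the block currently being visited
    private int nextParent = -1;    // first parent after the current block
    int backwardScans = 0;          // number of previousSetBit calls, for illustration only

    ParentBlockTracker(BitSet parents) {
        this.parents = parents;
    }

    /**
     * Doc ids must be passed in increasing order.
     * Returns -1 if no parent precedes the doc.
     */
    int parentOf(int doc) {
        // Look the parent up again only when the doc has left the cached block.
        if (currentParent == -1 || doc >= nextParent) {
            currentParent = parents.previousSetBit(doc);
            backwardScans++;
            int next = parents.nextSetBit(doc + 1);
            nextParent = (next == -1) ? Integer.MAX_VALUE : next;
        }
        return currentParent;
    }
}
```

For parents at docs 0, 4 and 9, visiting children 1, 2, 3, 5, 6 and 10 in
order triggers one backward scan per block rather than one per child.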
> Nested Document query support
> -----------------------------
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 3.0.2
> Reporter: Mark Harwood
> Assignee: Mark Harwood
> Priority: Minor
> Attachments: LUCENE-2454.patch, LUCENE-2454.patch,
> LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene