[ https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034854#comment-13034854 ]
Robert Muir commented on LUCENE-3112:
-------------------------------------

{quote}
I suppose we could consider changing the index format today to record which docs are subs... but I think we don't need to. Maybe I should strengthen the @experimental to explain the risk that a future reindexing could be required?
{quote}

I think this would be perfect. I certainly don't want to hold up this improvement, but I also don't want us to end up in a situation where we say 'well, if only we had recorded this information; now it's not possible to do XYZ because someone COULD have used add/updateDocuments() for some arbitrary reason and we will 'split' their grouped ids'.

We could also include in the note that various existing IndexSorters/Splitters are unaware of this, so use with caution :)

> Add IW.add/updateDocuments to support nested documents
> ------------------------------------------------------
>
>                 Key: LUCENE-3112
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3112
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene. It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you want to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier". I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable<Document>), updateDocuments(Term
> delTerm, Iterable<Document>)) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.
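
For illustration only, the guarantee described above (a block of documents gets sequential docIDs inside a single segment, with no flush in between) can be sketched with a toy model. This is not Lucene code: BlockWriterSketch, DocAddress and flush() are hypothetical names invented for this sketch, and documents are plain strings. The only idea it borrows from the issue is that adding the whole block under one lock keeps a flush from landing in the middle of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (NOT Lucene code) of a writer that adds document blocks atomically. */
public class BlockWriterSketch {

    /** Hypothetical address of a document: segment ordinal plus docID within it. */
    public static final class DocAddress {
        public final int segment;
        public final int docId;

        DocAddress(int segment, int docId) {
            this.segment = segment;
            this.docId = docId;
        }

        @Override
        public String toString() {
            return "seg" + segment + ":doc" + docId;
        }
    }

    private int currentSegment = 0; // segment currently being written
    private int nextDocId = 0;      // next docID to hand out in that segment

    /** Adds one document; a flush may run between any two calls to this method. */
    public synchronized DocAddress addDocument(String doc) {
        return new DocAddress(currentSegment, nextDocId++);
    }

    /** Adds a whole block while holding the lock, so flush() cannot split it. */
    public synchronized List<DocAddress> addDocuments(Iterable<String> docs) {
        List<DocAddress> addresses = new ArrayList<>();
        for (String doc : docs) {
            addresses.add(addDocument(doc)); // reentrant: same lock is already held
        }
        return addresses;
    }

    /** Seals the current segment, standing in for a commit or getReader() flush. */
    public synchronized void flush() {
        currentSegment++;
        nextDocId = 0;
    }

    public static void main(String[] args) {
        BlockWriterSketch writer = new BlockWriterSketch();
        writer.addDocument("standalone");
        List<DocAddress> block =
            writer.addDocuments(Arrays.asList("child1", "child2", "parent"));
        System.out.println(block); // all three share a segment, with consecutive docIDs
    }
}
```

With single addDocument calls, another thread could call flush() between two of them and the block would straddle two segments. Taking the lock once for the whole Iterable is what rules that out in this sketch; how the actual patch achieves the same effect is not shown here.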