[jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents

Michael McCandless (JIRA) Tue, 17 May 2011 06:15:34 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034750#comment-13034750
 ]


Michael McCandless commented on LUCENE-3112:
--------------------------------------------

bq. Yet, I think you should push the document iteration etc into DWPT to 
actually apply the delterm only once to make it really atomic.

Ahh good point -- it's wrong just passing that delTerm down N times, too.  I'll 
fix.
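
To make the problem concrete, here is a toy model (not Lucene code; Doc, ToyBuffer and the groupId field are hypothetical names). It assumes every sub-document carries the value the delete term matches, and that a delete applies to everything buffered before it. Under those assumptions, passing the same delTerm once per document wipes out the earlier siblings of the same block, while applying it once up front keeps the block intact:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only, not Lucene code. Doc and ToyBuffer are hypothetical names.
final class Doc {
    final String groupId; // the value the delete term is assumed to match
    final String name;

    Doc(String groupId, String name) {
        this.groupId = groupId;
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}

final class ToyBuffer {
    private final List<Doc> docs = new ArrayList<>();

    // Delete everything currently matching delTerm, then add one document.
    void updateDocument(String delTerm, Doc doc) {
        docs.removeIf(d -> d.groupId.equals(delTerm));
        docs.add(doc);
    }

    // Apply the delete exactly once, then add the whole block.
    void updateDocuments(String delTerm, Iterable<Doc> block) {
        docs.removeIf(d -> d.groupId.equals(delTerm));
        for (Doc doc : block) {
            docs.add(doc);
        }
    }

    List<Doc> docs() {
        return docs;
    }

    public static void main(String[] args) {
        List<Doc> block = Arrays.asList(
            new Doc("g1", "child1"), new Doc("g1", "child2"), new Doc("g1", "parent"));

        // delTerm passed down once per document: each call deletes the siblings added before it.
        ToyBuffer perDoc = new ToyBuffer();
        for (Doc d : block) {
            perDoc.updateDocument("g1", d);
        }
        System.out.println(perDoc.docs()); // [parent]

        // delTerm applied once for the whole block.
        ToyBuffer once = new ToyBuffer();
        once.updateDocuments("g1", block);
        System.out.println(once.docs()); // [child1, child2, parent]
    }
}
```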

bq. I also wonder if we should allow multiple delTerm e.g. Tuple<DelTerm, 
Document> otherwise you would be bound to one delterm per "collection" but what 
if you want to remove only one of the "sub-documents"?

So, this won't work today w/ nested querying, if I understand it right.  Ie, if 
you only update one of the subs, now your subdocs are no longer sequential (nor 
in one segment).  So I think "design for today" here...?

Someday, when we implement incremental field updates correctly, so that updates 
are written as stacked segments against the original segment containing the 
document, at that point I think we can add an API that lets you update multiple 
docs atomically?

> Add IW.add/updateDocuments to support nested documents
> ------------------------------------------------------
>
>                 Key: LUCENE-3112
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3112
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you want to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable<Document>), updateDocuments(Term
> delTerm, Iterable<Document>)) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.
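
The invariant described above can be sketched with a toy model. This is not Lucene's IndexWriter; ToyWriter and its methods are hypothetical stand-ins, with a segment modeled as a plain list whose order stands for docID order. It only illustrates why a flush between per-document adds can split a block across segments, whereas a single call that buffers the whole block cannot be interrupted that way:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Lucene's IndexWriter. All names are hypothetical.
final class ToyWriter {
    // Each inner list is one "segment"; entries are document labels in docID order.
    private final List<List<String>> segments = new ArrayList<>();
    private List<String> buffer = new ArrayList<>();

    // Buffers a single document.
    synchronized void addDocument(String doc) {
        buffer.add(doc);
    }

    // Buffers a whole block while holding the lock, so no flush can land inside it.
    synchronized void addDocuments(Iterable<String> docs) {
        for (String doc : docs) {
            buffer.add(doc);
        }
    }

    // Seals the buffered documents into a new segment, as a commit or getReader() might.
    synchronized void flush() {
        if (!buffer.isEmpty()) {
            segments.add(buffer);
            buffer = new ArrayList<>();
        }
    }

    synchronized List<List<String>> segments() {
        return segments;
    }

    public static void main(String[] args) {
        // Per-document adds: a flush between the calls splits the block.
        ToyWriter split = new ToyWriter();
        split.addDocument("child1");
        split.flush(); // stands in for a flush triggered elsewhere
        split.addDocument("parent");
        split.flush();
        System.out.println(split.segments()); // [[child1], [parent]]

        // Block add: all documents land in one segment, in iterator order.
        ToyWriter atomic = new ToyWriter();
        atomic.addDocuments(Arrays.asList("child1", "child2", "parent"));
        atomic.flush();
        System.out.println(atomic.segments()); // [[child1, child2, parent]]
    }
}
```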

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
