[jira] [Commented] (LUCENE-4258) Incremental Field Updates through Stacked Segments

Michael McCandless (JIRA) Mon, 30 Jul 2012 04:54:37 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424803#comment-13424803
 ]


Michael McCandless commented on LUCENE-4258:
--------------------------------------------

bq. BTW, since the new method is to handle multiple fields (as the name 
suggests), the operation descriptions should also be in plural: UPDATE_FIELDS 
and REPLACE_FIELDS.

+1

I think this design sounds good!  REPLACE_FIELDS should easily be able
to update norms correctly, right?  Because the full per-field stats
are recomputed from scratch.  So then scores should be identical to
those of a freshly indexed document: that should be a nice, simple
test case to create :)

I don't see how UPDATE_FIELDS can update norms correctly unless we
somehow save the raw stats (FieldInvertState) in the index.  It seems
like UPDATE_FIELDS should forever be limited to DOCS_ONLY, with no
norms updating?  Positions also seem hard to update, and if the only
reason to do so is for payloads... it seems like the app should be
using doc values instead, and we should (eventually) make doc values
updatable?

I do think this is a common use case (ACLs, filters, social
tags)... though I'm not sure how bad it'd really be in practice for
the app to simply REPLACE_FIELDS with the full set of tags.  I guess
if we build REPLACE_FIELDS first we can test that.

The implementation should be able to piggy-back on all the
buffering/tracking we currently do for buffered deletes.

I think this change should live entirely above Codec?  I.e., Codec just
thinks it's writing a segment, not knowing whether that segment is the
base segment or one of the stacked ones.  If the +postings and
-postings are simply 2 terms then the Codec need not know...
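
To illustrate the "2 terms" idea, here is a minimal sketch. It assumes a
hypothetical encoding in which an added posting is recorded under the
term prefixed with "+" and a removed posting under the term prefixed
with "-". The class, the method names and the prefixes are made up for
illustration and are not Lucene APIs; the TreeMap only stands in for the
sorted terms a Codec would be handed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy stand-in for one stacked segment's terms; not a Lucene class. */
final class StackedTermEncoding {
    // Hypothetical marker prefixes. A real design would need an encoding
    // that cannot collide with ordinary terms starting with '+' or '-'.
    static final String ADD = "+";
    static final String REMOVE = "-";

    /** Sorted encoded term -> docIDs, i.e. what the writer would see. */
    final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    /** Record that docID gains this term in the stacked segment. */
    void addPosting(String term, int docID) {
        record(ADD + term, docID);
    }

    /** Record that docID loses this term in the stacked segment. */
    void removePosting(String term, int docID) {
        record(REMOVE + term, docID);
    }

    // Both kinds of update end up as ordinary term -> docID postings,
    // so nothing below this layer has to know about stacking.
    private void record(String encodedTerm, int docID) {
        postings.computeIfAbsent(encodedTerm, k -> new ArrayList<>()).add(docID);
    }
}
```

With this sketch, adding "acl:bob" to doc 7 and removing "acl:alice"
from doc 7 simply yields two plain terms, "+acl:bob" and "-acl:alice",
each with an ordinary postings list.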

Seems like only SegmentInfos needs to track how segments stack up, and
then I guess we'd need a new StackedSegmentReader that is atomic,
holds N SegmentReaders, and presents the merged codec APIs by merging
down the stack on the fly?  I suspect this (having to use a priority
queue (PQ) to merge the docIDs in the postings) will be a huge search
performance hit...
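
For reference, the PQ merge could look roughly like the following. This
is a sketch only: it uses plain int arrays instead of real postings
enums, Cursor, layer and merge are hypothetical names, and the rule that
the newest layer mentioning a docID decides whether the doc matches is
an assumption about how the stack would resolve.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy merge of one term's postings across a stack; not Lucene code. */
final class StackedPostingsMerge {

    /** Cursor over one strictly increasing docID list from one layer. */
    static final class Cursor {
        final int[] docs;
        final int layer;        // 0 = base segment, higher = newer
        final boolean negative; // true for -postings
        int pos = 0;

        Cursor(int[] docs, int layer, boolean negative) {
            this.docs = docs;
            this.layer = layer;
            this.negative = negative;
        }

        int doc() {
            return docs[pos];
        }
    }

    /** Returns the docIDs that still match after merging down the stack. */
    static List<Integer> merge(List<Cursor> cursors) {
        // Order by docID, then newest layer first, so the first cursor
        // polled for a docID is the one that decides its fate.
        Comparator<Cursor> order = (a, b) -> {
            int c = Integer.compare(a.doc(), b.doc());
            return c != 0 ? c : Integer.compare(b.layer, a.layer);
        };
        PriorityQueue<Cursor> pq = new PriorityQueue<>(order);
        for (Cursor c : cursors) {
            if (c.docs.length > 0) {
                pq.add(c);
            }
        }
        List<Integer> live = new ArrayList<>();
        while (!pq.isEmpty()) {
            Cursor top = pq.poll();
            int doc = top.doc();
            boolean matches = !top.negative;
            advance(top, pq);
            // Skip older layers' entries for the same docID.
            while (!pq.isEmpty() && pq.peek().doc() == doc) {
                advance(pq.poll(), pq);
            }
            if (matches) {
                live.add(doc);
            }
        }
        return live;
    }

    // Cursors are only mutated while outside the queue, then re-added.
    private static void advance(Cursor c, PriorityQueue<Cursor> pq) {
        c.pos++;
        if (c.pos < c.docs.length) {
            pq.add(c);
        }
    }
}
```

For example, with base postings {1, 3, 5, 7}, a first stacked layer
holding -{3} and +{4}, and a second holding +{3} and -{7}, the merge
returns [1, 3, 4, 5]. Every docID costs at least one poll and one add on
the queue, which is where the per-hit overhead would come from.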

I think UnionDocsAndPositionsEnum (in MultiPhraseQuery.java) is
already doing what we want?  (Except it doesn't handle negative
postings).

What about merging?  Seems like the merge policy should know about
stacking and should sometimes (aggressively?) merge a stack down?

                
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).
