[jira] [Commented] (LUCENE-4272) another idea for updatable fields

Robert Muir (JIRA) Mon, 30 Jul 2012 13:39:38 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425208#comment-13425208 ]


Robert Muir commented on LUCENE-4272:
-------------------------------------

{quote}
We'd also need to open up the TV APIs so we can get TVs for a doc in the 
current segment, for the case where app adds a doc and later (before flush), 
replaces some fields.
{quote}

Realistically, I'd like to support that anyway for the norms case, so that
codecs can index term impacts (LUCENE-4198), since this is going to involve
length normalization in addition to TF. But currently the postings writer has
no way to "see" this.
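Here is a minimal, hypothetical sketch of why the postings writer would need the norm. It assumes an impact is simply a (freq, norm) pair per document, and that a smaller norm (shorter field) never scores worse than a larger one at the same freq. Under that assumption, a codec only needs to keep the pairs that no other pair dominates. `Impact` and `competitiveImpacts` are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ImpactsSketch {

    // Hypothetical per-document impact: term frequency plus the field's length norm.
    // Assumption: a smaller norm means a shorter field, which scores at least as well.
    static final class Impact {
        final int freq;
        final long norm;

        Impact(int freq, long norm) {
            this.freq = freq;
            this.norm = norm;
        }

        @Override
        public String toString() {
            return "(freq=" + freq + ", norm=" + norm + ")";
        }
    }

    // Keeps only impacts not dominated by another one. Impact a dominates b when
    // a.freq >= b.freq and a.norm <= b.norm, so b can never produce a higher score.
    static List<Impact> competitiveImpacts(List<Impact> impacts) {
        List<Impact> sorted = new ArrayList<>(impacts);
        // Highest freq first; among equal freqs, smallest norm first.
        sorted.sort(Comparator.comparingInt((Impact i) -> i.freq).reversed()
                .thenComparingLong(i -> i.norm));
        List<Impact> result = new ArrayList<>();
        long bestNorm = Long.MAX_VALUE;
        for (Impact imp : sorted) {
            // Every impact kept so far has freq >= imp.freq, so imp survives
            // only if its norm is strictly smaller than all of theirs.
            if (imp.norm < bestNorm) {
                result.add(imp);
                bestNorm = imp.norm;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Impact> docs = List.of(
                new Impact(3, 10), new Impact(1, 2), new Impact(2, 12), new Impact(3, 8));
        System.out.println(competitiveImpacts(docs)); // [(freq=3, norm=8), (freq=1, norm=2)]
    }
}
```

With the sample input, (3, 10) and (2, 12) are dropped because (3, 8) has at least the same freq and a smaller norm. That leaves (3, 8) and (1, 2). A writer that sees only TF could not make this decision, which is the gap described above.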

So it would be nice if we could solve that too; then we wouldn't need
norms/DVs in the vectors (they are already per-doc).
This would make for a faster way of updating docvalues fields. For that
specific case I think more can be done, but it would be an improvement and
fit well.

                
> another idea for updatable fields
> ---------------------------------
>
>                 Key: LUCENE-4272
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4272
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, by basically reindexing 
> from stored fields, I think they solve a lot of the problem: users don't have 
> to "rebuild" their document from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the 
> unaffected fields.
> The basic idea is that we would require term vectors for this approach (as
> they already store a serialized, indexed version of the doc), so we could
> just take the other pieces from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we don't
> recompute it) and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big today;
> this seems like something we could fix.
> Personally, I like the idea of not slowing down search performance to solve
> the problem. I think we should really start from that angle and work towards
> making the indexing side more efficient, not vice versa.
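Here is a minimal sketch of the core step of the proposal, built on several assumptions:

* Per-field term vectors are modeled as a map of term to positions plus a norm. This is the extension suggested above; payloads are omitted.
* A lowercase/whitespace split stands in for a real Analyzer.
* The norm is taken to be the token count.

`FieldVector`, `invert` and `update` are hypothetical names, not Lucene API. Only the replaced fields are re-analyzed. Every other field's vector is reused as-is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UpdateFromVectorsSketch {

    // Hypothetical per-field slice of a document's term vectors, extended with
    // the norm as the proposal suggests. Payloads are omitted for brevity.
    static final class FieldVector {
        final Map<String, List<Integer>> positions; // term -> positions
        final int norm;                              // assumption: norm = token count

        FieldVector(Map<String, List<Integer>> positions, int norm) {
            this.positions = positions;
            this.norm = norm;
        }
    }

    // Stand-in for real analysis: lowercase, then split on whitespace.
    static FieldVector invert(String text) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        int pos = 0;
        for (String tok : text.toLowerCase().trim().split("\\s+")) {
            if (tok.isEmpty()) continue; // split of an empty string yields one ""
            positions.computeIfAbsent(tok, k -> new ArrayList<>()).add(pos++);
        }
        return new FieldVector(positions, pos);
    }

    // Builds the updated inverted doc: changed fields are re-analyzed, every
    // other field's vector is reused by reference without re-analysis.
    static Map<String, FieldVector> update(Map<String, FieldVector> existing,
                                           Map<String, String> changedFields) {
        Map<String, FieldVector> updated = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> e : changedFields.entrySet()) {
            updated.put(e.getKey(), invert(e.getValue()));
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, FieldVector> doc = new LinkedHashMap<>();
        doc.put("title", invert("Updatable fields"));
        doc.put("body", invert("a long body we do not want to re-analyze"));

        Map<String, FieldVector> newDoc =
                update(doc, Map.of("title", "another idea for updatable fields"));
        System.out.println(newDoc.get("title").norm);              // 5
        System.out.println(newDoc.get("body") == doc.get("body")); // true
    }
}
```

Feeding the merged result back into the postings and norms writers is not shown. The same lookup would also have to work for a doc still buffered in the current segment, which is why the TV APIs would need to be opened up.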
