[ https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425159#comment-13425159 ]
Michael McCandless commented on LUCENE-4272:
--------------------------------------------

This is an interesting idea! And it makes sense to factor this down from ElasticSearch/Solr.

So we have the codec approach (LUCENE-3837), the stacked-segments approach (LUCENE-4258), and this new approach (copy over already-inverted fields).

We could quite efficiently add the already-inverted doc (term vectors) to the in-memory postings. And then there'd be zero impact to search performance, and no (well, small) index format changes.

The only downside is the use case of replacing tiny fields on otherwise massive docs: in this case the other approaches would be faster at indexing (but still slower at searching). I agree not slowing down search is a big plus for this approach.

We'd also need to open up the TV APIs so we can get TVs for a doc in the current segment, for the case where the app adds a doc and later (before flush) replaces some fields.

And we need to pool readers in IW so the updates can resolve the Term to docIDs on demand. Hmm, and we'd need to be able to do so for the in-memory segment (I think we should not support replaceFields by Query for starters).

> another idea for updatable fields
> ---------------------------------
>
> Key: LUCENE-4272
> URL: https://issues.apache.org/jira/browse/LUCENE-4272
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Robert Muir
>
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, by basically reindexing
> from stored fields, I think they solve a lot of the problem: users don't have
> to "rebuild" their document from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the
> unaffected fields.
> The basic idea is that we would require term vectors for this approach (as
> they already store a serialized indexed version of the doc), and so we could
> just take the other pieces from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we don't
> recompute that), and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big today,
> this seems like something we could fix.
> Personally I like the idea of not slowing down search performance to solve
> the problem, I think we should really start from that angle and work towards
> making the indexing side more efficient, not vice-versa.
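
A rough illustration of the "copy over already-inverted fields" idea discussed above, written as a toy Python model rather than Lucene code. Everything here (ToyIndex, invert, replace_fields, the inverted_fields counter) is hypothetical and only meant to show the shape of the approach: the replaced field is re-analyzed, while the untouched fields are taken from the stored term vectors and added straight to the in-memory postings. Norms, payloads, flushing, and resolving a Term to docIDs are all left out.

```python
from collections import defaultdict


def invert(text):
    """Toy analyzer (assumption): lowercase whitespace tokens -> {term: [positions]}."""
    vector = defaultdict(list)
    for pos, term in enumerate(text.lower().split()):
        vector[term].append(pos)
    return dict(vector)


class ToyIndex:
    """Hypothetical in-memory index; not a model of Lucene's real data structures."""

    def __init__(self):
        # field -> term -> {doc_id: positions}
        self.postings = defaultdict(lambda: defaultdict(dict))
        # doc_id -> {field: {term: positions}}, i.e. the "term vectors"
        self.term_vectors = {}
        self.deleted = set()
        self.next_doc_id = 0
        self.inverted_fields = 0  # how many times the analyzer actually ran

    def _add_inverted(self, vectors):
        """Add an already-inverted doc to the postings without re-analysis."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        self.term_vectors[doc_id] = vectors
        for field, vector in vectors.items():
            for term, positions in vector.items():
                self.postings[field][term][doc_id] = positions
        return doc_id

    def add_document(self, fields):
        """Index a new doc: every field is analyzed."""
        vectors = {}
        for field, text in fields.items():
            vectors[field] = invert(text)
            self.inverted_fields += 1
        return self._add_inverted(vectors)

    def replace_fields(self, doc_id, new_fields):
        """Re-analyze only the replaced fields; reuse stored vectors for the rest."""
        vectors = dict(self.term_vectors[doc_id])  # untouched fields are reused
        for field, text in new_fields.items():
            vectors[field] = invert(text)
            self.inverted_fields += 1
        self.deleted.add(doc_id)  # the old doc is marked deleted, as in a normal update
        return self._add_inverted(vectors)

    def search(self, field, term):
        """Return live doc IDs containing the term in the given field."""
        hits = self.postings.get(field, {}).get(term, {})
        return sorted(d for d in hits if d not in self.deleted)


index = ToyIndex()
old_id = index.add_document({"body": "a very large body of text", "status": "draft"})
new_id = index.replace_fields(old_id, {"status": "published"})
print(index.search("body", "large"), index.inverted_fields)
```

In this toy, search reads one ordinary postings structure, which is the "zero impact to search performance" property; the cost moves to the update, which has to touch every term of the large unchanged field even though it is not re-analyzed.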