[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972288#action_12972288 ]
Michael Busch commented on LUCENE-2814:
---------------------------------------

bq. I think taking things one step at a time would be good here?

Probably still a smaller change than flex indexing ;)

But yeah, in general I agree that we should do things more incrementally. I think that's a mistake I've made with the RT branch so far. In this particular case it's just a bit sad to redo all this work now, because I think I got the removal of doc stores right in RT and got all related tests to pass.

> stop writing shared doc stores across segments
> ----------------------------------------------
>
>                 Key: LUCENE-2814
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2814
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-2814.patch, LUCENE-2814.patch
>
>
> Shared doc stores enable the files for stored fields and term vectors to be
> shared across multiple segments. We've had this optimization since 2.1 I
> think.
> It works best against a new index, where you open an IW, add lots of docs,
> and then close it. In that case all of the written segments will reference
> slices of a single shared doc store segment.
> This was a good optimization because it means we never need to merge these
> files. But, when you open another IW on that index, it writes a new set of
> doc stores, and then whenever merges take place across doc stores, they must
> now be merged.
> However, since we switched to shared doc stores, there have been two
> optimizations for merging the stores. First, we now bulk-copy the bytes in
> these files if the field name/number assignment is "congruent". Second, we
> now force congruent field name/number mapping in IndexWriter. This means
> this optimization is much less potent than it used to be.
> Furthermore, the optimization adds *a lot* of hair to
> IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over
> time, and causes odd behavior like a merge possibly forcing a flush when it
> starts. Finally, with DWPT (LUCENE-2324), which gets us truly concurrent
> flushing, we can no longer share doc stores.
> So, I think we should turn off the write-side of shared doc stores to pave
> the path for DWPT to land on trunk and simplify IW/DW. We still must support
> reading them (until 5.0), but the read side is far less hairy.