I had a quick look and couldn't find anything to prevent what you called “franken-segments” in the Lucene test?
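
As a rough illustration of why such segments would matter (a toy model only, not code from the branch and not the Lucene API): the sketch below follows the decision order described in the quoted message, i.e. marker first, then the FieldInfo claim, otherwise wrap. The Segment class, the "advmp.converted" marker key and the rule that the per-segment flag is set as soon as any doc carries a value are all assumptions made for this example. Under those assumptions, a segment whose schema changed before it was flushed is passed to merging as-is even though some of its docs have no doc value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these types are Lucene or Solr classes. */
final class FrankenSegmentSketch {

    /** What the merge policy does with a segment before merging it. */
    enum Action { PASS_MARKED, PASS_AS_IS, WRAP }

    /** Hypothetical marker key; the real key used by ADVMP is not given in the thread. */
    static final String MARKER = "advmp.converted";

    /** Hypothetical stand-in for a segment: one flag per doc plus a diagnostics map. */
    static final class Segment {
        final boolean[] docHasDocValue; // true if that doc carries a doc value for the field
        final Map<String, String> diagnostics = new HashMap<>();

        Segment(boolean... docHasDocValue) {
            this.docHasDocValue = docHasDocValue;
        }

        /** ASSUMPTION: the per-segment flag is set as soon as any doc has a value. */
        boolean fieldInfoClaimsDocValues() {
            for (boolean b : docHasDocValue) {
                if (b) return true;
            }
            return false;
        }

        /** Number of docs in this segment without a doc value for the field. */
        int docsMissingDocValue() {
            int missing = 0;
            for (boolean b : docHasDocValue) {
                if (!b) missing++;
            }
            return missing;
        }
    }

    /** Decision order as described in the quoted message: marker, then FieldInfo, else wrap. */
    static Action decide(Segment s) {
        if (s.diagnostics.containsKey(MARKER)) return Action.PASS_MARKED;
        if (s.fieldInfoClaimsDocValues()) return Action.PASS_AS_IS;
        return Action.WRAP;
    }

    /** A segment that is passed as-is although some of its docs have no doc value. */
    static boolean isFrankenSegment(Segment s) {
        return decide(s) == Action.PASS_AS_IS && s.docsMissingDocValue() > 0;
    }

    public static void main(String[] args) {
        Segment old = new Segment(false, false, false);  // written before the schema change
        Segment fresh = new Segment(true, true, true);   // written after the schema change
        Segment mixed = new Segment(false, false, true); // schema changed before the flush
        for (Segment s : Arrays.asList(old, fresh, mixed)) {
            System.out.println(decide(s) + " franken=" + isFrankenSegment(s)
                    + " missing=" + s.docsMissingDocValue());
        }
    }
}
```

If the real per-segment flag behaves like this, a per-document count of missing values would catch the mixed case, whereas a check on the segment-level flag alone would not.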

On Tue, Dec 18, 2018 at 5:59 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> A couple of additions:
>
> AddDVMPLuceneTest2 does not use Solr constructs at all, so is the test
> we think is most interesting at this point, it won't lead anyone down
> the path of "what's all this Solr stuff and is it right" kinds of
> questions (believe me, we've spent some time on that path!). Please
> feel free to look at all the rest of it of course, but the place we're
> stuck is why this test fails.
>
> AddDvStress is intended as an integration-level test, it requires some
> special setup (in particular providing a particular configset), we put
> that together to reliably make the problem visible. We thought the new
> code was the issue at first and needed something to narrow down the
> possibilities...
>
> The reason we're obsessing about this is that it calls into question
> how segments are merged when "things change". We don't understand why
> this is happening at the Lucene level so don't know how to ensure that
> things like the schema API in Solr aren't affected.
>
> Andrzej isn't the only one running out of ideas ;).
>
> On Tue, Dec 18, 2018 at 4:46 AM Andrzej Białecki <a...@getopt.org> wrote:
> >
> > Hi,
> >
> > I'm working on a use case where an existing Solr setup needs to migrate to
> > a schema that uses docValues for faceting, instead of uninversion. This
> > case fits into a broader subject of SOLR-12259 (Robustly upgrade indexes).
> > However, in this case there are two major requirements for this migration
> > process:
> >
> > * data cannot be reindexed from scratch - I need to work with the already
> > indexed documents (which do contain the values needed for faceting, but
> > these values are simply indexed and not stored as doc values)
> >
> > * indexing can’t be stopped while the schema is being changed (the
> > conversion process needs to work on-the-fly while the collection is online,
> > both for searching and for updates).
> > Collection reloads / reopening is ok
> > but it’s not ok to take the collection offline for several minutes (or
> > hours).
> >
> > Together with Erick Erickson we implemented a solution that uses
> > MergePolicy (actually MergePolicyFactory in Solr) to enforce re-writing of
> > segments that no longer match the schema, i.e. don’t contain docValues in a
> > field where the new schema requires it. This merge policy determines what
> > segments need this conversion and then forces the “merging” (actually
> > re-writing) of these segments by first wrapping them into UninvertingReader
> > to supply docValues where they are required by the new schema but actually
> > are missing in the segment’s data. This “AddDocValuesMergePolicy” (ADVMP
> > for short) is supposed to deal with the following types of segments:
> >
> > * old segments created before the schema change - these don’t contain any
> > docValues in the target fields and so they are wrapped in UninvertingReader
> > for merging (and for searching) according to the new schema.
> >
> > * new segments created after the schema change - if FieldInfo-s for these
> > fields claim that the segment already contains docValues where it should
> > then the segment is passed as-is to merging, otherwise it’s wrapped again.
> > An optimisation was also put here to “mark” the already converted segments
> > using a marker in SegmentInfo diagnostics map so that we can avoid
> > re-checking and re-converting already converted data.
> >
> > So, long story short, this process works very well when there’s no
> > concurrent indexing activity - all old segments are properly wrapped and
> > re-written and merging with new segments works as intended. However, in a
> > situation with concurrent indexing it works well but only for a short
> > while.
> > At some point this conversion process seems to lose a large percentage
> > of the docValues, even though it seems that at all points the source
> > segments are properly wrapped - the ADVMP merge policy adds a lot of
> > debugging information to track the source and type of segments across many
> > levels of merging and whether they were wrapped or not.
> >
> > My working theory is that somehow this schema change produces
> > “franken-segments” (while they still haven’t been flushed) where only some
> > of the most recent docs have the docValues and earlier ones don’t. As I
> > understand it, this should not happen in Solr because a schema change
> > results in a core reload. The tracking information from ADVMP seems to
> > indicate that all generations of segments, both those that were flushed and
> > merged earlier, have been properly wrapped.
> >
> > My alternate theory is that there’s some bug in the doc values merging
> > process when UninvertingReader is involved, because this problem occurs
> > also when we modify ADVMP to always force the wrapping of all segments in
> > UninvertingReader-s. The percentage of lost doc values is sometimes quite
> > large, up to 50%, perhaps it’s a bug somewhere where the code accounts for
> > the presence of doc values in FieldCacheImpl?
> >
> > Together with Erick we implemented a bunch of tests that illustrate this
> > issue - both the tests and the code can be found on branch
> > "jira/solr-12259":
> >
> > * code.tests.AddDVMPLuceneTest2 - this is a pure Lucene test that shows how
> > doc values are lost after several rounds of merging while concurrent
> > indexing is going on. This failure is reproducible 100%.
> >
> > * code.tests.AddDvStress - this is a Solr test that repeatedly creates a
> > collection without doc values, starts the indexing, changes the config to
> > use ADVMP, makes the schema change to turn doc values on, and verifies the
> > number of facets on the target field.
> > This test also fails after a while
> > with the same symptoms as the Lucene one, so I think that solving the
> > Lucene test failure should solve this failure too.
> >
> > Any suggestions or insights are very much appreciated - I'm running out of
> > ideas to try...
> >
> > --
> >
> > Andrzej Białecki
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

--
Adrien