SOLR-12259: online index schema modification - adding docValues to existing indexed data?

2018-12-18 Thread Andrzej Białecki
Hi,

I'm working on a use case where an existing Solr setup needs to migrate to a 
schema that uses docValues for faceting, instead of uninversion. This case fits 
into a broader subject of SOLR-12259 (Robustly upgrade indexes). However, in 
this case there are two major requirements for this migration process:

* data cannot be reindexed from scratch - I need to work with the already 
indexed documents (which do contain the values needed for faceting, but these 
values are simply indexed and not stored as doc values)

* indexing can’t be stopped while the schema is being changed (the conversion 
process needs to work on-the-fly while the collection is online, both for 
searching and for updates). Collection reloads / reopenings are ok, but it’s not 
ok to take the collection offline for several minutes (or hours).

Together with Erick Erickson we implemented a solution that uses MergePolicy 
(actually MergePolicyFactory in Solr) to enforce re-writing of segments that no 
longer match the schema, i.e. don’t contain docValues in a field where the new 
schema requires it. This merge policy determines which segments need this 
conversion and then forces the “merging” (actually re-writing) of these 
segments by first wrapping them in UninvertingReader to supply docValues 
where the new schema requires them but the segment’s data lacks them. This 
“AddDocValuesMergePolicy” (ADVMP for short) is supposed to deal with the 
following types of segments (a rough sketch of the mechanism follows the list):

* old segments created before the schema change - these don’t contain any 
docValues in the target fields and so they are wrapped in UninvertingReader for 
merging (and for searching) according to the new schema.

* new segments created after the schema change - if the FieldInfo-s for these 
fields claim that the segment already contains docValues where it should, the 
segment is passed to merging as-is; otherwise it’s wrapped again. An 
optimisation was also added here to “mark” already converted segments with a 
marker in the SegmentInfo diagnostics map, so that we can avoid re-checking and 
re-converting already converted data.
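
To make the mechanism concrete, here is a minimal sketch of the per-segment 
wrapping at the Lucene level. This is not the actual branch code - the class 
name, the field-checking logic, and the exact UninvertingReader.wrap signature 
are assumptions for illustration (using the Solr-side 
org.apache.solr.uninverting copy of the class):

  import java.io.IOException;
  import java.util.List;
  import java.util.function.Function;

  import org.apache.lucene.index.CodecReader;
  import org.apache.lucene.index.DocValuesType;
  import org.apache.lucene.index.FieldInfo;
  import org.apache.lucene.index.MergePolicy;
  import org.apache.lucene.index.SegmentCommitInfo;
  import org.apache.lucene.index.SlowCodecReaderWrapper;
  import org.apache.solr.uninverting.UninvertingReader;

  // Illustrative only - not the AddDocValuesMergePolicy from the branch.
  public class AddDvOneMerge extends MergePolicy.OneMerge {

    private final Function<String, UninvertingReader.Type> mapping;

    public AddDvOneMerge(List<SegmentCommitInfo> segments,
                         Function<String, UninvertingReader.Type> mapping) {
      super(segments);
      this.mapping = mapping;
    }

    @Override
    public CodecReader wrapForMerge(CodecReader reader) throws IOException {
      // Does any field that the new schema maps to docValues still report
      // DocValuesType.NONE in this segment?
      boolean needsWrapping = false;
      for (FieldInfo fi : reader.getFieldInfos()) {
        if (mapping.apply(fi.name) != null
            && fi.getDocValuesType() == DocValuesType.NONE) {
          needsWrapping = true;
          break;
        }
      }
      if (!needsWrapping) {
        return reader; // segment already claims the docValues - pass through
      }
      // UninvertingReader synthesizes docValues from the inverted index;
      // SlowCodecReaderWrapper turns the result back into a CodecReader so
      // the merge consumes it as if the docValues had always been there.
      return SlowCodecReaderWrapper.wrap(UninvertingReader.wrap(reader, mapping));
    }
  }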

So, long story short, this process works very well when there’s no concurrent 
indexing activity - all old segments are properly wrapped and re-written, and 
merging with new segments works as intended. However, with concurrent indexing 
it works only for a short while. At some point this conversion process seems to 
lose a large percentage of the docValues, even though the source segments 
appear to be properly wrapped at all points - the ADVMP merge policy adds a lot 
of debugging information to track the source and type of segments across many 
levels of merging, and whether they were wrapped or not.

My working theory is that somehow this schema change produces 
“franken-segments” (while they still haven’t been flushed) where only some of 
the most recent docs have the docValues and earlier ones don’t. As I understand 
it, this should not happen in Solr, because a schema change results in a core 
reload. The tracking information from ADVMP seems to indicate that all 
generations of segments, both those that were flushed and those merged earlier, 
have been properly wrapped. (A toy demonstration of the franken-segment state 
follows below.)
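
Here is a hedged, self-contained toy demonstration (assumed names, not taken 
from the branch) of how a single flushed segment can end up claiming docValues 
in its FieldInfos while only the later docs actually carry values - exactly 
the state in which a wrapper that trusts FieldInfos would skip uninverting. 
Note this relies on Lucene (as of 7.x/master at the time) allowing a field’s 
docValues type to go from NONE to a concrete type within one writer session; 
later Lucene versions enforce a consistent per-field schema and reject this:

  import java.nio.file.Files;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.SortedDocValuesField;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.FieldInfo;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.LeafReader;
  import org.apache.lucene.index.SortedDocValues;
  import org.apache.lucene.search.DocIdSetIterator;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.BytesRef;

  // Toy "franken-segment": docs 0..9 are indexed under the "old schema"
  // (no docValues), docs 10..19 under the "new schema", all before the
  // first flush, so they land in the same segment.
  public class FrankenSegmentDemo {
    public static void main(String[] args) throws Exception {
      Directory dir = FSDirectory.open(Files.createTempDirectory("franken"));
      try (IndexWriter w = new IndexWriter(dir,
          new IndexWriterConfig(new StandardAnalyzer()))) {
        for (int i = 0; i < 10; i++) {        // old schema: indexed only
          Document doc = new Document();
          doc.add(new StringField("facet_field", "v" + i, Field.Store.NO));
          w.addDocument(doc);
        }
        for (int i = 10; i < 20; i++) {       // new schema: indexed + DV
          Document doc = new Document();
          doc.add(new StringField("facet_field", "v" + i, Field.Store.NO));
          doc.add(new SortedDocValuesField("facet_field", new BytesRef("v" + i)));
          w.addDocument(doc);
        }
        w.commit();                           // one flushed segment
      }
      try (DirectoryReader r = DirectoryReader.open(dir)) {
        LeafReader leaf = r.leaves().get(0).reader();
        FieldInfo fi = leaf.getFieldInfos().fieldInfo("facet_field");
        // FieldInfos claim SORTED for the whole segment...
        System.out.println("docValuesType=" + fi.getDocValuesType());
        // ...but only docs 10..19 actually have a value:
        SortedDocValues dv = leaf.getSortedDocValues("facet_field");
        int count = 0;
        while (dv.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
          count++;
        }
        System.out.println("docs with values: " + count + " of " + leaf.maxDoc());
      }
    }
  }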

My alternate theory is that there’s some bug in the doc values merging process 
when UninvertingReader is involved, because this problem also occurs when we 
modify ADVMP to always force the wrapping of all segments in 
UninvertingReader-s. The percentage of lost doc values is sometimes quite 
large, up to 50% - perhaps it’s a bug somewhere in the code that accounts for 
the presence of doc values in FieldCacheImpl?

Together with Erick we implemented a bunch of tests that illustrate this issue 
- both the tests and the code can be found on branch "jira/solr-12259":

* code.tests.AddDVMPLuceneTest2 - this is a pure Lucene test that shows how doc 
values are lost after several rounds of merging while concurrent indexing is 
going on. This failure is 100% reproducible.

* code.tests.AddDvStress - this is a Solr test that repeatedly creates a 
collection without doc values, starts indexing, changes the config to use 
ADVMP, makes the schema change to turn doc values on, and verifies the number 
of facets on the target field (a sketch of that schema change follows the 
list). This test also fails after a while with the same symptoms as the Lucene 
one, so I think that solving the Lucene test failure should solve this failure 
too.
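
For context, the schema change the test makes is an ordinary Schema API field 
replacement. A hedged SolrJ sketch of that step (the collection and field 
names here are made up, and this is not the test’s actual code):

  import java.util.LinkedHashMap;
  import java.util.Map;

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.schema.SchemaRequest;

  public class FlipDocValuesOn {
    public static void main(String[] args) throws Exception {
      try (SolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("name", "facet_field");   // hypothetical target field
        attrs.put("type", "string");
        attrs.put("indexed", true);
        attrs.put("stored", false);
        attrs.put("docValues", true);       // the change under test
        // replace-field triggers a core reload; from then on the schema
        // expects docValues on this field and ADVMP has segments to rewrite.
        new SchemaRequest.ReplaceField(attrs).process(client, "test_collection");
      }
    }
  }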

Any suggestions or insights are very much appreciated - I'm running out of 
ideas to try...

—

Andrzej Białecki



Re: SOLR-12259: online index schema modification - adding docValues to existing indexed data?

2018-12-18 Thread Erick Erickson
A couple of additions:

AddDVMPLuceneTest2 does not use Solr constructs at all, so it is the test
we think is most interesting at this point; it won't lead anyone down
the path of "what's all this Solr stuff and is it right?" kinds of
questions (believe me, we've spent some time on that path!). Please
feel free to look at all the rest of it, of course, but the place we're
stuck is why this test fails.

AddDvStress is intended as an integration-level test; it requires some
special setup (in particular, providing a particular configset). We put
it together to reliably make the problem visible - we thought the new
code was the issue at first and needed something to narrow down the
possibilities...

The reason we're obsessing about this is that it calls into question
how segments are merged when "things change". We don't understand why
this is happening at the Lucene level, so we don't know how to ensure
that things like the schema API in Solr aren't affected.

Andrzej isn't the only one running out of ideas ;).


Re: SOLR-12259: online index schema modification - adding docValues to existing indexed data?

2018-12-18 Thread Adrien Grand
I had a quick look and couldn't find anything to prevent what you
called “franken-segments” in the Lucene test?


Re: SOLR-12259: online index schema modification - adding docValues to existing indexed data?

2018-12-18 Thread Andrzej Białecki
The unexpected part is that I would have expected the code to handle 
franken-segments as well, because at some point we finally resorted to always 
forcing the wrapping, even for segments that don’t need it (i.e. they claim the 
field contains DVs), but the test is still failing.


Re: SOLR-12259: online index schema modification - adding docValues to existing indexed data?

2018-12-18 Thread Adrien Grand
UninvertingReader#wrap seems to skip uninverting if the field to
uninvert already has doc values of the expected type?
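
To spell out the condition being referred to, here is a hedged, self-contained 
paraphrase (not the library source; it assumes the Solr-side 
org.apache.solr.uninverting package) of when wrap() actually synthesizes 
docValues for a field. A segment whose FieldInfos already claim docValues for 
the field is passed through untouched - which is why a partially-converted 
"franken-segment" would slip by:

  import java.util.function.Function;

  import org.apache.lucene.index.DocValuesType;
  import org.apache.lucene.index.FieldInfo;
  import org.apache.lucene.index.LeafReader;
  import org.apache.solr.uninverting.UninvertingReader;

  final class WrapSkipCheck {
    // Roughly: uninversion happens only if the mapping asks for it AND the
    // segment's FieldInfos report no docValues at all for the field.
    static boolean wouldUninvert(LeafReader reader, String field,
        Function<String, UninvertingReader.Type> mapping) {
      FieldInfo fi = reader.getFieldInfos().fieldInfo(field);
      return fi != null
          && mapping.apply(field) != null
          && fi.getDocValuesType() == DocValuesType.NONE;
    }
  }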


Re: SOLR-12259: online index schema modification - adding docValues to existing indexed data?

2018-12-18 Thread Andrzej Białecki
Right - the code on this branch is a port from another branch with other 
changes, too - among others, a modified UninvertingReader that discards 
docValues even when FieldInfo claims the field has them (i.e. it always wraps 
when the mapping says so). Even with that change the results are the same.
