OK I ran the test and saw the failure, thank you!
I think I understand why you are seeing what you are seeing.
First off, you are not actually using an NRT reader when
hardReopenBeforeDVUpdate is false, because in readerReopenIfChanged,
when oldReader == null, you must do:

    return DirectoryReader.open(writer2, true);

so that your initial reader is in fact NRT. All subsequent reopens
from then on will then be NRT.
When I make that change to your test, it seems to pass (or at least
run for much longer than it did before...).
However, if I remove the writer.commit() before the reopen, the test
fails. The reason is that neither an IW commit nor an NRT reader reopen
reflects merges "just kicked off" by that flush, even when using SMS
(SerialMergeScheduler). So, there will always be this "off by 1", in
that you'll get a reader with 10 segments (pre-merge), not 1 segment
(post-merge).
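
To make the "off by 1" concrete, here is a small toy model in plain Java.
It is not Lucene code and does not use any Lucene API: ToyWriter, addDocs
and reopen are hypothetical names, and the only assumption it encodes is
the one described above, namely that the view handed to the reader is
captured before the merge triggered by that same flush is published.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (NOT Lucene): the reader's view is snapshotted before the
// merge kicked off by the same flush becomes visible.
class ToyWriter {
    private final int mergeFactor;
    private List<String> segments = new ArrayList<>();
    private int nextId = 0;
    private boolean dirty = false;

    ToyWriter(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    // Buffer some documents; they become a new segment on the next reopen.
    void addDocs() {
        dirty = true;
    }

    // Flush buffered docs, snapshot the segment list, and only then merge.
    List<String> reopen() {
        if (dirty) {
            segments.add("_" + nextId++);
            dirty = false;
        }
        // This snapshot is what the reopened reader gets to see.
        List<String> visible = new ArrayList<>(segments);
        if (segments.size() >= mergeFactor) {
            // The merge result replaces the segments after the snapshot.
            List<String> merged = new ArrayList<>();
            merged.add("_" + nextId++);
            segments = merged;
        }
        return visible;
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter(10);
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            writer.addDocs();
            seen = writer.reopen();
        }
        System.out.println("after 10th flush: " + seen.size() + " segments");
        System.out.println("after 2nd reopen: " + writer.reopen().size() + " segments");
    }
}
```

In this model the reopen right after the 10th flush still returns the 10
pre-merge segments, and a second reopen with no new documents returns the
single merged segment.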
One possible workaround here w/o having to call crazy-expensive commit
would be to call your reopen method twice in a row (after fixing it to
properly handle a null return from openIfChanged); when I tried that in
your test, it also seemed to run forever...
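
Roughly, the shape of the reopen method I have in mind is sketched below.
This is a library-free sketch, not the code from your test: ReopenSketch,
reopen and reopenTwice are hypothetical names, openNrt stands in for
DirectoryReader.open(writer2, true), and openIfChanged stands in for
DirectoryReader.openIfChanged, which returns null when nothing changed.

```java
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Sketch only: generic stand-ins instead of the real Lucene reader types.
class ReopenSketch {

    // Returns a reader to use: a fresh NRT one on the first call, the
    // refreshed one if something changed, otherwise the old one.
    static <R> R reopen(R oldReader, Supplier<R> openNrt, UnaryOperator<R> openIfChanged) {
        if (oldReader == null) {
            // First open: must come from the writer so the reader is NRT.
            return openNrt.get();
        }
        R newReader = openIfChanged.apply(oldReader);
        // A null return means "no changes": keep using the old reader.
        // (With real readers, the old one would also need to be closed
        // whenever a different reader is returned.)
        return newReader != null ? newReader : oldReader;
    }

    // Workaround: reopen twice so the second pass can pick up a merge
    // that the first pass did not yet reflect.
    static <R> R reopenTwice(R oldReader, Supplier<R> openNrt, UnaryOperator<R> openIfChanged) {
        R first = reopen(oldReader, openNrt, openIfChanged);
        return reopen(first, openNrt, openIfChanged);
    }
}
```

The generic type parameter R is only there so the sketch compiles without
Lucene on the classpath; in the test it would be DirectoryReader.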
Mike McCandless
http://blog.mikemccandless.com
On Fri, Sep 26, 2014 at 2:44 PM, Mikhail Khludnev
<[email protected]> wrote:
>
>
> On Fri, Sep 26, 2014 at 7:07 PM, Michael McCandless
> <[email protected]> wrote:
>>
>> Sorry I can't make heads or tails of what you are saying here ... can
>> you maybe make a small test case that fails with "ant test"? Boil it
>> down as much as possible...
>
>
> Sure. I'm really sorry for being so confusing.
> I changed constant
> https://github.com/m-khl/lucene-merge-visibility/commit/a4a01c2c91d9c30850602b8dddf23de5363c4851#diff-86ebfbf440fe69ee36a52705cb92b824R44
> to make it fail.
> the branch reader-vs-merge at
> https://github.com/m-khl/lucene-merge-visibility/tree/reader-vs-merge
> in lucene/core there is a failed test
> $> ant test -Dtestcase=TestNumDValUpdVsReaderVisibility
>
> it's verbose, because it uses sysout as infostream.
> [junit4] FAILURE 2.40s | TestNumDValUpdVsReaderVisibility.testSimple <<<
> [junit4] > Throwable #1: java.lang.AssertionError: failed on id:doc-18
> expected:<17> but was:<18>
> [junit4] > at
> __randomizedtesting.SeedInfo.seed([73A18231908F4ADC:4B12A6CFB77C9E0D]:0)
> [junit4] > at
> org.apache.lucene.index.TestNumDValUpdVsReaderVisibility.testSimple(TestNumDValUpdVsReaderVisibility.java:134)
>
>
>>
>>
>> The gist seems to be if you use an NRT reader something fails, but if
>> you instead open a new reader, that something passes?
>
> I don't use NRT, and perhaps it's a solution. I just don't know how to do
> that.
> Note: closing the writer and opening a new reader works (but I suppose it's
> slow); just committing and reopening the reader fails.
>>
>> But what
>> exactly is failing?
>
> - say I have merge factor 10 and SerialMergeScheduler.
> - I did 9 commits already and have 9 segments in the index
> - I add a few docs and commit
> - the 10th commit triggers a merge synchronously, and it completes.
> - now if I reopen the reader, it sees 10 unmerged segments (the merged
> single-segment index isn't visible to the reopen) /*test FAILS*/
> - but if I fully close the writer & reader and open a new reader, I get
> the single-segment merged index. /*test PASS */
>
> - usually such behavior causes no problems; it's reasonable and fine.
> - but I do a mad thing
> - I use that reader (with 10 segments) to get a docnum and write it as a
> docvalue;
> - after I commit only the docvalues update (no docs update) and reopen the
> reader, I get the single-segment index, which was already written by the
> merge at the previous commit.
> - and here is the problem: a docnum obtained from the 10-segment index
> doesn't match the docnum in the single-segment index (there was a deletion)
>
>>
>> And what is a "solid" segment here?
>
> I meant an index that consists of a single segment, in contrast to an index
> that consists of many.
>
> Thank you!
>>
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Sep 25, 2014 at 6:00 PM, Mikhail Khludnev
>> <[email protected]> wrote:
>> > Hello Mike!
>> >
>> > Thanks for your attention.
>> > I pushed the mad case at
>> >
>> > https://github.com/m-khl/lucene-merge-visibility/commit/fa2d60be5b13eb57e0527c843119cf62cfa83a7d#diff-86ebfbf440fe69ee36a52705cb92b824R120
>> >
>> > it does the following
>> >
>> > - writes a pair of doc
>> > - commit
>> > - reopen reader, searches for one of them
>> > - update this doc with its docnum (I know it's weird, but it should work
>> > if the reopened reader sees that update)
>> > - commit this DV update
>> > - search that doc and check the written doc val.
>> > it passes if hardReopenBeforeDVUpdate=true and fails otherwise
>> >
>> > I know that changing docnums is natural, but I expect them not to change
>> > while I update docvals.
>> > Here is how it flips:
>> > at the commit after the doc update we have many segments
>> >
>> > now checkpoint "_0(6.0.0):C2/1:delGen=1:fieldInfosGen=1:dvGen=1
>> > _1(6.0.0):C2:fieldInfosGen=1:dvGen=1 _2(6.0.0):C2:
>> > commit: wrote segments file "segments_j"
>> >
>> > but there is also a solid segment, which is merged but hasn't been
>> > committed/published
>> > after commitMerge: _a(6.0.0):c19
>> >
>> > and after DV update commit we have that solid segment visible
>> >
>> > now checkpoint "_a(6.0.0):c19:fieldInfosGen=1:dvGen=1" [1 segments ;
>> > isCommit = true]
>> > IFD 0 [Thu Sep 25 23:56:22 SAST 2014;
>> >
>> > TEST-TestNumDValUpdVsReaderVisibility.testSimple-seed#[6131CF35B3A45FC3]]:
>> > deleteCommits: now decRef commit "segments_j"
>> > ...
>> > wrote segments file "segments_k"
>> >
>> > I'm using SerialMergeScheduler, and expect to see a single solid segment
>> > after I commit document updates and that triggers the merge.
>> > How can I reopen a reader which sees it?
>> > Thanks
>> >
>> >
>> > On Wed, Sep 24, 2014 at 10:07 PM, Michael McCandless
>> > <[email protected]> wrote:
>> >>
>> >> I don't understand what's actually happening / going wrong here.
>> >>
>> >> Maybe you can make a test case / give more details?
>> >>
>> >> What assertions are broken? Why is it bad if SMS does a merge before
>> >> you reopen? Why are you using SMS :)
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Mon, Sep 22, 2014 at 6:00 PM, Mikhail Khludnev
>> >> <[email protected]> wrote:
>> >> > Hello!
>> >> > I'm in trouble with Lucene IndexWriter. I'm benchmarking an algorithm
>> >> > which might seem like an NRT case, but I'm not sure that I need NRT
>> >> > in particular. The overall problem is writing a "join index" (a column
>> >> > that holds docnums) by updating binary docvalues after commit, i.e.:
>> >> > - update docs
>> >> > - commit
>> >> > - read docs (openIfChanged() before )
>> >> > - updateDocVals
>> >> > - commit
>> >> >
>> >> > It's clunky but it works, until guess what happens... merge. Oh my.
>> >> >
>> >> > At some point I have segments
>> >> > segments_ec:2090 _7c(5.0):C117/8:delGen=8:....
>> >> > _7j(5.0):C1:fieldInfosGen=1:dvGen=1 _7k(5.0):C1)
>> >> >
>> >> > I apply one update and trigger commit, as a result I have:
>> >> > segments_ee:2102 _7c(5.0):C117/9:delGen=9:..
>> >> > _7k(5.0):C1:fieldInfosGen=1:dvGen=1 _7l(5.0):C1)
>> >> >
>> >> > however, somewhere inside this commit call, the
>> >> > SerialMergeScheduler bakes the single solid segment
>> >> > _7m(5.0):C117
>> >> > however, it hasn't been exposed via any segments file so far.
>> >> >
>> >> > And now I get into trouble:
>> >> > if I call DR.openIfChanged(segments_ec) (even after IW.waitMerges()),
>> >> > I get segments_ee, which is fairly reasonable, to keep it incremental
>> >> > and fast.
>> >> > But if I use that IndexWriter, it applies new updates on top of that
>> >> > merged one (_7m(5.0):C117), not on segments_ee. And it broke my
>> >> > assertions. I rather need to open a reader on that merged
>> >> > _7m(5.0):C117, which IW keeps somewhere internally, and it's better
>> >> > to do it fancy & incremental. If you can point me to how NRT can
>> >> > solve this, I'd be happy to switch to it.
>> >> >
>> >> > Incredibly thank you for your time!!!
>> >> >
>> >> > --
>> >> > Sincerely yours
>> >> > Mikhail Khludnev
>> >> > Principal Engineer,
>> >> > Grid Dynamics
>> >> >
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> > Principal Engineer,
>> > Grid Dynamics
>> >
>> >
>>
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>