It's awesome! Thank you so much!

On Sun, Sep 28, 2014 at 12:46 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> OK, I ran the test and saw the failure, thank you!
>
> I think I understand why you are seeing what you are seeing.
>
> First off, you are not actually using an NRT reader when
> hardReopenBeforeDVUpdate is false, because in readerReopenIfChanged,
> when oldReader == null, you must do:
>
>     return DirectoryReader.open(writer2, true);
>
> so that your initial reader is in fact NRT. All subsequent reopens
> from then on will then be NRT.
>
> When I make that change to your test, it seems to pass (or at least
> run for much longer than it did before...).
>
> However, if I remove the writer.commit() before the reopen, the test
> fails. The reason is that IW commit and NRT reader reopen do not
> reflect merges "just kicked off" by that flush, even when using SMS.
> So, there will always be this "off by 1", in that you'll get a reader
> with 10 segments (pre-merge), not 1 segment (post-merge).
>
> One possible workaround here w/o having to call crazy-expensive commit
> would be to call reopenIfChanged twice in a row (and fix your reopen
> method to properly handle a null return from openIfChanged); when I
> tried that in your test, it also seemed to run forever...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Sep 26, 2014 at 2:44 PM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
> >
> > On Fri, Sep 26, 2014 at 7:07 PM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> Sorry, I can't make heads or tails of what you are saying here ... can
> >> you maybe make a small test case that fails with "ant test"? Boil it
> >> down as much as possible...
> >
> > Sure. I'm really sorry for being so confusing.
> > I changed a constant
> > https://github.com/m-khl/lucene-merge-visibility/commit/a4a01c2c91d9c30850602b8dddf23de5363c4851#diff-86ebfbf440fe69ee36a52705cb92b824R44
> > to make it fail.
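A minimal sketch of the reopen logic Mike describes above, assuming the Lucene 4.x/5.x-era NRT API used in this thread (DirectoryReader.open(IndexWriter, boolean) and DirectoryReader.openIfChanged(DirectoryReader, IndexWriter, boolean)); signatures may differ in later Lucene versions. The class and method names (NrtReopen, reopenIfChanged, reopenTwice) are hypothetical, not the test's actual code. It shows the three points from his reply: open the first reader from the writer so it is NRT, treat a null from openIfChanged as "unchanged" rather than "no reader", and reopen twice so a merge kicked off by the first reopen's flush becomes visible.

```java
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

/** Hypothetical helper sketching the reopen fix suggested in the thread. */
final class NrtReopen {

    private NrtReopen() {}

    /**
     * Returns an up-to-date NRT reader. If a new reader is opened, the old
     * one is closed, so callers must switch to the returned reference.
     */
    static DirectoryReader reopenIfChanged(DirectoryReader oldReader, IndexWriter writer)
            throws IOException {
        if (oldReader == null) {
            // First open: take the reader from the writer (applyAllDeletes = true)
            // so it is NRT; later openIfChanged(reader, writer, ...) calls stay NRT.
            return DirectoryReader.open(writer, true);
        }
        DirectoryReader newReader = DirectoryReader.openIfChanged(oldReader, writer, true);
        if (newReader == null) {
            // null means "nothing changed", not "no reader": keep the old one.
            return oldReader;
        }
        oldReader.close();
        return newReader;
    }

    /**
     * Workaround from the thread: a reopen does not yet reflect a merge that
     * its own flush just kicked off, so reopen a second time to pick it up.
     */
    static DirectoryReader reopenTwice(DirectoryReader oldReader, IndexWriter writer)
            throws IOException {
        DirectoryReader first = reopenIfChanged(oldReader, writer);
        return reopenIfChanged(first, writer);
    }
}
```

Per Mike's report this workaround made the test keep passing, but it is a workaround rather than a guarantee, and it requires Lucene on the classpath.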
> >
> > The branch reader-vs-merge at
> > https://github.com/m-khl/lucene-merge-visibility/tree/reader-vs-merge
> > has a failing test in lucene/core:
> >
> > $> ant test -Dtestcase=TestNumDValUpdVsReaderVisibility
> >
> > It's verbose, because it uses System.out as the infoStream.
> >
> > [junit4] FAILURE 2.40s | TestNumDValUpdVsReaderVisibility.testSimple <<<
> > [junit4] > Throwable #1: java.lang.AssertionError: failed on id:doc-18 expected:<17> but was:<18>
> > [junit4] > at __randomizedtesting.SeedInfo.seed([73A18231908F4ADC:4B12A6CFB77C9E0D]:0)
> > [junit4] > at org.apache.lucene.index.TestNumDValUpdVsReaderVisibility.testSimple(TestNumDValUpdVsReaderVisibility.java:134)
> >
> >>
> >> The gist seems to be if you use an NRT reader something fails, but if
> >> you instead open a new reader, that something passes?
> >
> > I don't use NRT, and perhaps that's the solution. I just don't know how to
> > do that.
> > Note: closing the writer and opening a reader works (but I suppose it's
> > slow); just committing and reopening the reader fails.
> >>
> >> But what exactly is failing?
> >
> > - Let's say I have merge factor 10 and SerialMergeScheduler.
> > - I did 9 commits already and have 9 segments in the index.
> > - I add a few docs and commit.
> > - The 10th commit triggers a merge synchronously; it's done.
> > - Now if I reopen the reader, it sees 10 unmerged segments (the merged
> >   single-segment index isn't visible to the reopen). /* test FAILS */
> > - But if I fully close the writer and reader and open a new reader, I get
> >   the merged single-segment index. /* test PASSES */
> >
> > - Usually this behavior causes no problems; it's reasonable and fine.
> > - But I do a mad thing:
> > - I use that reader (with 10 segments) to get a docnum and write it as a
> >   docvalue;
> > - after I commit only the docvalues update (no document updates) and
> >   reopen the reader, I get the single-segment index, which the merge had
> >   already written at the previous commit.
> > - And here is the problem: a docnum obtained from the 10-segment index
> >   doesn't match the docnum in the single-segment index (there was a
> >   deletion).
> >
> >>
> >> And what is a "solid" segment here?
> >
> > I meant an index consisting of a single segment, in contrast to an index
> > consisting of many.
> >
> > Thank you!
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Thu, Sep 25, 2014 at 6:00 PM, Mikhail Khludnev
> >> <mkhlud...@griddynamics.com> wrote:
> >> > Hello Mike!
> >> >
> >> > Thanks for your attention.
> >> > I pushed the mad case at
> >> > https://github.com/m-khl/lucene-merge-visibility/commit/fa2d60be5b13eb57e0527c843119cf62cfa83a7d#diff-86ebfbf440fe69ee36a52705cb92b824R120
> >> >
> >> > It does the following:
> >> >
> >> > - writes a pair of docs
> >> > - commits
> >> > - reopens the reader and searches for one of them
> >> > - updates this doc with its docnum (I know it's weird, but it should
> >> >   work if the reopened reader sees that update)
> >> > - commits this DV update
> >> > - searches for that doc and checks the written doc value.
> >> >
> >> > It passes if hardReopenBeforeDVUpdate=true and fails otherwise.
> >> >
> >> > I know that docnums changing is natural, but I expect them not to change
> >> > while I update docvalues.
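To make the docnum mismatch described above concrete, here is a toy model in plain Java (not Lucene code; the class and method names are made up for illustration). It assumes the usual rule that a reader's global docID is the segment's docBase (the sum of maxDoc over earlier segments, deleted slots included) plus the local docID, and that a merge drops deleted documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of how a global docID depends on the segment list a reader sees.
 * A segment is a list of ids in docID order; a null entry stands for a
 * deleted document that still occupies a docID slot until a merge.
 */
final class DocIdShiftDemo {

    private DocIdShiftDemo() {}

    /** Global docID of a live doc: segment docBase + local docID, or -1 if absent. */
    static int globalDocId(List<List<String>> segments, String id) {
        int docBase = 0;
        for (List<String> segment : segments) {
            int local = segment.indexOf(id);
            if (local >= 0) {
                return docBase + local;
            }
            // maxDoc counts deleted slots too, so they still push docBase up.
            docBase += segment.size();
        }
        return -1;
    }

    /** Merges all segments into one, dropping deleted slots as a real merge does. */
    static List<String> merge(List<List<String>> segments) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String id : segment) {
                if (id != null) {
                    merged.add(id);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three small segments; the second doc of the first segment is deleted.
        List<List<String>> before = List.of(
                Arrays.asList("doc-0", null),
                List.of("doc-2", "doc-3"),
                List.of("doc-4", "doc-5"));
        List<List<String>> after = List.of(merge(before));

        System.out.println("before merge: " + globalDocId(before, "doc-4")); // 4
        System.out.println("after merge:  " + globalDocId(after, "doc-4"));  // 3
    }
}
```

With one deleted slot in the first segment, doc-4 has docID 4 in the three-segment view and 3 in the merged view, so a value computed from one view and checked against the other is off by one, the same kind of mismatch as the failing assertion above.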
> >> >
> >> > Here is how it flips:
> >> > at the commit after the doc update we have many segments:
> >> >
> >> > now checkpoint "_0(6.0.0):C2/1:delGen=1:fieldInfosGen=1:dvGen=1
> >> > _1(6.0.0):C2:fieldInfosGen=1:dvGen=1 _2(6.0.0):C2:
> >> > commit: wrote segments file "segments_j"
> >> >
> >> > but there is also a solid segment, which is merged but hasn't been
> >> > committed/published:
> >> >
> >> > after commitMerge: _a(6.0.0):c19
> >> >
> >> > and after the DV update commit, that solid segment becomes visible:
> >> >
> >> > now checkpoint "_a(6.0.0):c19:fieldInfosGen=1:dvGen=1" [1 segments ;
> >> > isCommit = true]
> >> > IFD 0 [Thu Sep 25 23:56:22 SAST 2014;
> >> > TEST-TestNumDValUpdVsReaderVisibility.testSimple-seed#[6131CF35B3A45FC3]]:
> >> > deleteCommits: now decRef commit "segments_j"
> >> > ...
> >> > wrote segments file "segments_k"
> >> >
> >> > I'm using SerialMergeScheduler, and I expect to see a single solid
> >> > segment after I commit the document updates and that commit triggers
> >> > the merge. How can I reopen a reader that sees it?
> >> > Thanks
> >> >
> >> > On Wed, Sep 24, 2014 at 10:07 PM, Michael McCandless
> >> > <luc...@mikemccandless.com> wrote:
> >> >>
> >> >> I don't understand what's actually happening / going wrong here.
> >> >>
> >> >> Maybe you can make a test case / give more details?
> >> >>
> >> >> What assertions are broken? Why is it bad if SMS does a merge before
> >> >> you reopen? Why are you using SMS :)
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >> On Mon, Sep 22, 2014 at 6:00 PM, Mikhail Khludnev
> >> >> <mkhlud...@griddynamics.com> wrote:
> >> >> > Hello!
> >> >> > I'm in trouble with Lucene's IndexWriter. I'm benchmarking an
> >> >> > algorithm that might look like an NRT case, but I'm not sure I need
> >> >> > NRT specifically. The overall problem is writing a "join index" (a
> >> >> > column holding docnums) by updating binary docvalues after commit.
> >> >> > That is:
> >> >> > - update docs
> >> >> > - commit
> >> >> > - read docs (calling openIfChanged() first)
> >> >> > - updateDocVals
> >> >> > - commit
> >> >> >
> >> >> > It's clunky, but it works, until, guess what happens... a merge. Oh my.
> >> >> >
> >> >> > At one point I have these segments:
> >> >> > segments_ec:2090 _7c(5.0):C117/8:delGen=8:....
> >> >> > _7j(5.0):C1:fieldInfosGen=1:dvGen=1 _7k(5.0):C1)
> >> >> >
> >> >> > I apply one update and trigger a commit; as a result I have:
> >> >> > segments_ee:2102 _7c(5.0):C117/9:delGen=9:..
> >> >> > _7k(5.0):C1:fieldInfosGen=1:dvGen=1 _7l(5.0):C1)
> >> >> >
> >> >> > However, somewhere inside this commit call, the SerialMergeScheduler
> >> >> > bakes the single solid segment
> >> >> > _7m(5.0):C117
> >> >> > but it isn't exposed via any segments file yet.
> >> >> >
> >> >> > And now I get into trouble:
> >> >> > if I call DR.openIfChanged(segments_ec) (even after
> >> >> > IW.waitForMerges()), I get segments_ee, which is fairly reasonable,
> >> >> > to keep it incremental and fast.
> >> >> > But if I use that IndexWriter, it applies new updates on top of the
> >> >> > merged one (_7m(5.0):C117), not on segments_ee. And that breaks my
> >> >> > assertions. I'd rather open a reader on that merged _7m(5.0):C117,
> >> >> > which IW keeps somewhere internally, ideally in a fancy & incremental
> >> >> > way. If you can point me to how NRT can solve this, I'd be happy to
> >> >> > switch to it.
> >> >> >
> >> >> > Thank you so much for your time!!!
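For reference, one iteration of the "read docs, then updateDocVals" step above might look roughly like this. It is a hypothetical helper, not code from the linked branch: the field names "id" and "docnum" are assumptions, it uses a numeric doc-values update (as the test name TestNumDValUpdVsReaderVisibility suggests) instead of the binary one mentioned above, and it assumes "docnum" was already indexed as a NumericDocValuesField, since as far as I know doc-values updates only apply to existing doc-values fields. The caller commits afterwards, as in the steps above.

```java
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

/** Hypothetical sketch of one "join index" step: store a doc's docID in a DV field. */
final class JoinIndexStep {

    private JoinIndexStep() {}

    /** Looks up the doc with the given (assumed) "id" and writes its docID; false if not found. */
    static boolean writeDocnum(DirectoryReader reader, IndexWriter writer, String id)
            throws IOException {
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
        if (hits.scoreDocs.length == 0) {
            return false;
        }
        // This docID is only valid for `reader`'s point-in-time segment list.
        int docId = hits.scoreDocs[0].doc;
        // Update the assumed numeric DV field "docnum" on the doc matching the id term.
        writer.updateNumericDocValue(new Term("id", id), "docnum", docId);
        return true;
    }
}
```

The stored value is only meaningful for the reader it came from: if the writer has already merged those segments and dropped a deleted document, as in the segments_ec/segments_ee log above, the value written here is stale.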
> >> >> >
> >> >> > --
> >> >> > Sincerely yours
> >> >> > Mikhail Khludnev
> >> >> > Principal Engineer,
> >> >> > Grid Dynamics
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >> >
> >> > --
> >> > Sincerely yours
> >> > Mikhail Khludnev
> >> > Principal Engineer,
> >> > Grid Dynamics
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>