That's awesome! Thank you so much!
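
For the archive, here is a minimal sketch of the reopen pattern Mike describes below, as I understand it: take the first reader from the writer, treat a null result from openIfChanged as "nothing changed", and reopen twice so that a merge kicked off by the first flush becomes visible. To keep it runnable without Lucene on the classpath, the reader source is modelled by a hypothetical ReaderSource interface; ReopenHelper, ReaderSource, reopen and reopenTwice are names I made up for illustration, not Lucene API. In the real test the two interface methods would presumably map to DirectoryReader.open(writer, true) and DirectoryReader.openIfChanged(oldReader, writer, true).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ReopenHelper {

    /** Hypothetical stand-in for the two Lucene calls; this is not a Lucene type. */
    interface ReaderSource<R> {
        // In the real test: DirectoryReader.open(writer, true)
        R openInitial();

        // In the real test: DirectoryReader.openIfChanged(old, writer, true),
        // which returns null when nothing has changed.
        R openIfChanged(R old);
    }

    /** One reopen step that tolerates both a null old reader and a null result. */
    static <R> R reopen(ReaderSource<R> source, R oldReader) {
        if (oldReader == null) {
            // The first reader comes from the writer, so it is an NRT reader.
            return source.openInitial();
        }
        R newer = source.openIfChanged(oldReader);
        // Real code would also close oldReader when a new reader is returned.
        return newer != null ? newer : oldReader;
    }

    /** Reopen twice so that a merge kicked off by the first flush becomes visible. */
    static <R> R reopenTwice(ReaderSource<R> source, R oldReader) {
        return reopen(source, reopen(source, oldReader));
    }

    public static void main(String[] args) {
        // Simulated index views: pre-merge first, post-merge second, then no change.
        final Deque<String> views =
            new ArrayDeque<String>(Arrays.asList("10 segments", "1 segment"));
        ReaderSource<String> source = new ReaderSource<String>() {
            @Override public String openInitial() { return "initial"; }
            @Override public String openIfChanged(String old) { return views.poll(); }
        };

        String reader = reopen(source, null);   // initial reader
        reader = reopenTwice(source, reader);   // skips past the pre-merge view
        System.out.println(reader);
        reader = reopenTwice(source, reader);   // nothing changed, reader is kept
        System.out.println(reader);
    }
}
```

A single reopen in this model would stop at the "10 segments" view, which is the "off by 1" Mike mentions; the second call is what picks up the merged view.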

On Sun, Sep 28, 2014 at 12:46 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> OK I ran the test and saw the failure, thank you!
>
> I think I understand why you are seeing what you are seeing.
>
> First off, you are not actually using an NRT reader when
> hardReopenBeforeDVUpdate is false, because in readerReopenIfChanged,
> when oldReader == null, you must do:
>
>     return DirectoryReader.open(writer2, true);
>
> so that your initial reader is in fact NRT.  All subsequent reopens
> from then on will then be NRT.
>
> When I make that change to your test, it seems to pass (or at least
> run for much longer than it did before...).
>
> However, if I remove the writer.commit() before the reopen, the test
> fails.  The reason is that IW commit and NRT reader reopen do not
> reflect merges "just kicked off" by that flush, even when using SMS.
> So, there will always be this "off by 1", in that you'll get a reader
> with 10 segments (pre-merge) not 1 segment (post-merge).
>
> One possible workaround here w/o having to call crazy-expensive commit
> would be to call reopenIfChanged twice in a row (and fix your reopen
> method to properly handle null return from openIfChanged); when I
> tried that in your test, it also seemed to run forever...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Sep 26, 2014 at 2:44 PM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
> >
> >
> > On Fri, Sep 26, 2014 at 7:07 PM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> Sorry I can't make heads or tails of what you are saying here ... can
> >> you maybe make a small test case that fails with "ant test"?  Boil it
> >> down as much as possible...
> >
> >
> > Sure. I'm really sorry for being so confusing.
> > I changed the constant at
> > https://github.com/m-khl/lucene-merge-visibility/commit/a4a01c2c91d9c30850602b8dddf23de5363c4851#diff-86ebfbf440fe69ee36a52705cb92b824R44
> > to make it fail.
> > The branch is reader-vs-merge, at
> > https://github.com/m-khl/lucene-merge-visibility/tree/reader-vs-merge
> > In lucene/core there is a failing test:
> > $> ant test -Dtestcase=TestNumDValUpdVsReaderVisibility
> >
> > it's verbose, because it uses sysout as infostream.
> >    [junit4] FAILURE 2.40s | TestNumDValUpdVsReaderVisibility.testSimple <<<
> >    [junit4]    > Throwable #1: java.lang.AssertionError: failed on id:doc-18 expected:<17> but was:<18>
> >    [junit4]    >     at __randomizedtesting.SeedInfo.seed([73A18231908F4ADC:4B12A6CFB77C9E0D]:0)
> >    [junit4]    >     at org.apache.lucene.index.TestNumDValUpdVsReaderVisibility.testSimple(TestNumDValUpdVsReaderVisibility.java:134)
> >
> >
> >>
> >>
> >> The gist seems to be if you use an NRT reader something fails, but if
> >> you instead open a new reader, that something passes?
> >
> > I don't use NRT, and perhaps that's the solution. I just don't know how to do
> > that.
> > Note: closing the writer and opening a new reader works (but I suppose it's
> > slow); just committing and reopening the reader fails.
> >>
> >> But what
> >> exactly is failing?
> >
> > - Say I have merge factor 10 and SerialMergeScheduler.
> > - I have already done 9 commits and have 9 segments in the index.
> > - I add a few docs and commit.
> > - The 10th commit triggers a merge synchronously, and the merge completes.
> > - Now if I reopen the reader, it sees 10 unmerged segments (the merged
> > single-segment index isn't visible to the reopen). /*test FAILS*/
> > - But if I fully close the writer and reader and open a new reader, I get the
> > single-segment merged index. /*test PASSES*/
> >
> > - Usually such behavior causes no problems; it's reasonable and fine.
> > - But I do a mad thing:
> > - I use that reader (with 10 segments) to get a docnum and write it as a
> > docvalue;
> > - then I commit only the docvalues update (no doc updates) and reopen the
> > reader, and I get the single-segment index, which was already written by the
> > merge at the previous commit.
> > - And here is the problem: a docnum obtained from the 10-segment index
> > doesn't match the docnum in the single-segment index (there was a deletion).
> >
> >>
> >> And what is a "solid" segment here?
> >
> > I meant an index consisting of a single segment, in contrast to an index
> > consisting of many segments.
> >
> > Thank you!
> >>
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Thu, Sep 25, 2014 at 6:00 PM, Mikhail Khludnev
> >> <mkhlud...@griddynamics.com> wrote:
> >> > Hello Mike!
> >> >
> >> > Thanks for your attention.
> >> > I pushed the mad case at
> >> > https://github.com/m-khl/lucene-merge-visibility/commit/fa2d60be5b13eb57e0527c843119cf62cfa83a7d#diff-86ebfbf440fe69ee36a52705cb92b824R120
> >> >
> >> > It does the following:
> >> >
> >> > - writes a pair of docs
> >> > - commits
> >> > - reopens the reader and searches for one of them
> >> > - updates this doc with its docnum (I know it's weird, but it should work
> >> > if the reopened reader sees that update)
> >> > - commits this DV update
> >> > - searches for that doc and checks the written doc value.
> >> > It passes if hardReopenBeforeDVUpdate=true and fails otherwise.
> >> >
> >> > I know that docnums changing is natural, but I expect them not to change
> >> > while I update docvals.
> >> > Here is how it flips:
> >> > at the commit after the doc update we have many segments
> >> >
> >> >  now checkpoint "_0(6.0.0):C2/1:delGen=1:fieldInfosGen=1:dvGen=1
> >> > _1(6.0.0):C2:fieldInfosGen=1:dvGen=1 _2(6.0.0):C2:
> >> > commit: wrote segments file "segments_j"
> >> >
> >> > but there is also a solid segment, which has been merged but hasn't been
> >> > committed/published:
> >> > after commitMerge: _a(6.0.0):c19
> >> >
> >> > and after the DV update commit that solid segment becomes visible
> >> >
> >> > now checkpoint "_a(6.0.0):c19:fieldInfosGen=1:dvGen=1" [1 segments ;
> >> > isCommit = true]
> >> > IFD 0 [Thu Sep 25 23:56:22 SAST 2014; TEST-TestNumDValUpdVsReaderVisibility.testSimple-seed#[6131CF35B3A45FC3]]:
> >> > deleteCommits: now decRef commit "segments_j"
> >> > ...
> >> > wrote segments file "segments_k"
> >> >
> >> > I'm using SerialMergeScheduler, and I expect to see a single solid segment
> >> > after I commit document updates and that commit triggers the merge.
> >> > How can I reopen a reader that sees it?
> >> > Thanks
> >> >
> >> >
> >> > On Wed, Sep 24, 2014 at 10:07 PM, Michael McCandless
> >> > <luc...@mikemccandless.com> wrote:
> >> >>
> >> >> I don't understand what's actually happening / going wrong here.
> >> >>
> >> >> Maybe you can make a test case / give more details?
> >> >>
> >> >> What assertions are broken?  Why is it bad if SMS does a merge before
> >> >> you reopen?  Why are you using SMS :)
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >> On Mon, Sep 22, 2014 at 6:00 PM, Mikhail Khludnev
> >> >> <mkhlud...@griddynamics.com> wrote:
> >> >> > Hello!
> >> >> > I'm in trouble with Lucene IndexWriter. I'm benchmarking an algorithm
> >> >> > that might look like an NRT case, but I'm not sure I need NRT in
> >> >> > particular. The overall problem is writing a "join index" (a column
> >> >> > that holds docnums) by updating binary docvalues after a commit, i.e.:
> >> >> >  - update docs
> >> >> >  - commit
> >> >> >  - read docs (openIfChanged() before )
> >> >> >  - updateDocVals
> >> >> >  - commit
> >> >> >
> >> >> > It's clunky but it works, until, guess what, a merge happens. Oh my.
> >> >> >
> >> >> > At one point I have these segments:
> >> >> > segments_ec:2090 _7c(5.0):C117/8:delGen=8:....
> >> >> > _7j(5.0):C1:fieldInfosGen=1:dvGen=1 _7k(5.0):C1)
> >> >> >
> >> >> > I apply one update and trigger a commit; as a result I have:
> >> >> > segments_ee:2102 _7c(5.0):C117/9:delGen=9:..
> >> >> > _7k(5.0):C1:fieldInfosGen=1:dvGen=1 _7l(5.0):C1)
> >> >> >
> >> >> > However, somewhere inside this commit call, SerialMergeScheduler
> >> >> > bakes the single solid segment
> >> >> > _7m(5.0):C117
> >> >> > which hasn't been exposed via any segments file so far.
> >> >> >
> >> >> > And now I get into trouble:
> >> >> > if I call DR.openIfChanged(segments_ec) (even after IW.waitMerges()), I
> >> >> > get segments_ee, which is fairly reasonable, to keep it incremental and
> >> >> > fast. But if I use that IndexWriter, it applies new updates on top of
> >> >> > the merged segment (_7m(5.0):C117), not on segments_ee, and that breaks
> >> >> > my assertions. I would rather open a reader on that merged
> >> >> > _7m(5.0):C117, which IW keeps somewhere internally, and preferably in a
> >> >> > fancy, incremental way. If you can point me to how NRT can solve this,
> >> >> > I'd be happy to switch to it.
> >> >> >
> >> >> > Thank you very much for your time!
> >> >> >
> >> >> > --
> >> >> > Sincerely yours
> >> >> > Mikhail Khludnev
> >> >> > Principal Engineer,
> >> >> > Grid Dynamics
> >> >> >
> >> >> >
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >> >>
>
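
To make the underlying problem from the start of the thread concrete, here is a toy model (plain Java, no Lucene) of why a docnum read from the pre-merge view stops matching after the merge: the merge drops deleted documents and renumbers the survivors. The class and method names (DocNumShift, unmergedDocNum, mergedDocNum) and the segment contents are made up for illustration, and the model assumes the merged segment keeps the documents in their original order; it only mimics the behaviour described above and is not how Lucene implements merging.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DocNumShift {

    /** Docnum in the multi-segment view: deleted docs still occupy a slot. */
    static int unmergedDocNum(List<List<String>> segments, String id) {
        int docNum = 0;
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (doc.equals(id)) {
                    return docNum;
                }
                docNum++;
            }
        }
        return -1; // id not present
    }

    /** Docnum after a merge: deleted docs are dropped, survivors are renumbered. */
    static int mergedDocNum(List<List<String>> segments, Set<String> deleted, String id) {
        int docNum = 0;
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (deleted.contains(doc)) {
                    continue; // a deleted doc does not survive the merge
                }
                if (doc.equals(id)) {
                    return docNum;
                }
                docNum++;
            }
        }
        return -1; // id not present or deleted
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("doc-0", "doc-1"),
            Arrays.asList("doc-2", "doc-3"));
        Set<String> deleted = new HashSet<String>(Arrays.asList("doc-1"));

        // Docnum seen through the reader that still has the unmerged segments.
        System.out.println(unmergedDocNum(segments, "doc-3"));        // prints 3
        // Docnum of the same document once the merged segment is visible.
        System.out.println(mergedDocNum(segments, deleted, "doc-3")); // prints 2
    }
}
```

Writing the first number as a docvalue and then checking it against a reader on the merged segment gives the same kind of off-by-one mismatch as in the failing assertion.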


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
