On Mon, Oct 2, 2017 at 2:25 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:

> I think the delayed deletes might have to do w/ segment warming?
>
> I'll have to digest the scenario you described tomorrow. I didn't hit
> any exceptions when running those modified code snippets (which I'd be
> very grateful to see -- they'd provide an immediate proof something is
> wrong...).


Yeah, it's disappointing the test didn't fail when you removed it.  If my
theory is right (and I'm not sure it is!), removing that code would cause
much higher NRT latency after a big merge finishes, because the refresh
thread, rather than the bg merge thread, would pay the price of going off
and building the parallel index for the newly merged segment.

> I am glad you're finding a use for this crazy class!
>
> It's super-useful for people who wish to low-level tweak the index
> format. I dreaded this for a long time, but for us it'd provide many
> benefits. We have a scenario where documents can be indexed once (and
> stay in the primary index) and certain derived indexes (features
> indexed on top of those documents) can be placed in the secondary
> index. The benefit here is that our data used to index features can
> change from time to time (as new documents emerge); then we can simply
> drop those existing secondary indexes and provide up-to-date ones.
> This saves disk I/O and is still fairly transparent to the rest of the
> application (because fields never clash between the primary and the
> secondary index and documents are always aligned).
>

Great!  That's exactly what it should work well for!


> Your 'demo' class is a great example of how this can be done. The
> class is surely advanced. Read: it crams way too many aspects into one
> class :) Each of these could be a separate demo:
>

Sorry :)  This is why it's a test class.

If you have ideas to make it easier to use, please refactor away!  I think
it can open up all sorts of unexpected use cases for Lucene, letting you
change your mind / experiment later about how exactly to index your raw
content.


> - splitting indexes into parallel ones (primary/secondary), with
> automatic secondary index creation on merges and startup.
> - folding back secondary index data into the primary index on merges
> (we don't need it, but I imagine there exists a scenario for this),
> - keeping multiple versions of the secondary index (those "generations").
>

I agree these are separate concerns if we can tease them out.


> And probably lots more. It's a very interesting advanced use case.
>
> > And how did you find this test :)
>
> I've been looking at ParallelCompositeReader for some time; as I was
> scanning it internally for its use cases within the code I somehow
> came across that "demo" class which leveraged its lower-level
> internals. It did take me some time to go through the class's internal
> workings because of confusingly named variables (I ended up renaming
> them to 'primary' and 'secondary' index instead of the original
> 'parallel'). But hey, I don't complain -- it's still an awesome piece
> of code!


Thanks :)  Keep up the renaming/refactoring!

I am still unsure why I tracked ref counts at the leaf reader level; did
this somehow enable re-using the parallel leaf readers on each refresh vs.
opening all leaves on each reopen?
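
To make the question concrete, here is a toy sketch of how per-leaf ref counts
could allow that kind of reuse.  It is not the actual test code and uses no
Lucene classes; LeafRefCountSketch, ParallelLeaf, refresh and release are all
hypothetical names, and segments are modeled as plain strings.  The assumption
is that each refresh keeps the already-open leaf for any segment that
survives, and a leaf closes only once both the cache and every view that used
it have released it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: no real Lucene readers are involved.
public class LeafRefCountSketch {

    // Hypothetical stand-in for an opened parallel leaf reader.
    static final class ParallelLeaf {
        final String segment;
        int refCount = 1;        // the cache's own reference
        boolean closed = false;

        ParallelLeaf(String segment) { this.segment = segment; }

        void incRef() { refCount++; }

        void decRef() {
            if (--refCount == 0) {
                closed = true;   // last reference gone: release resources
            }
        }
    }

    // Leaves that are currently open, keyed by segment name.
    private final Map<String, ParallelLeaf> open = new HashMap<>();

    // Number of leaves actually opened, to show how much work reuse saves.
    int opens = 0;

    // Builds the view for a refresh, reusing leaves for surviving segments.
    Map<String, ParallelLeaf> refresh(Set<String> segments) {
        Map<String, ParallelLeaf> view = new HashMap<>();
        for (String seg : segments) {
            ParallelLeaf leaf = open.get(seg);
            if (leaf == null) {
                leaf = new ParallelLeaf(seg);   // only new segments are opened
                open.put(seg, leaf);
                opens++;
            }
            leaf.incRef();                      // the new view holds a reference
            view.put(seg, leaf);
        }
        // Drop the cache's reference for segments that are gone (e.g. merged away).
        open.entrySet().removeIf(e -> {
            if (!segments.contains(e.getKey())) {
                e.getValue().decRef();
                return true;
            }
            return false;
        });
        return view;
    }

    // Releases every leaf reference held by a view that is no longer needed.
    static void release(Map<String, ParallelLeaf> view) {
        for (ParallelLeaf leaf : view.values()) {
            leaf.decRef();
        }
    }
}
```

In this model a second refresh over segments {_1, _2} after a first over
{_0, _1} opens only _2 and shares the _1 leaf; _0 stays open until the older
view is released.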

Mike McCandless

http://blog.mikemccandless.com
