Thanks everyone! Responses below:

On Thu, Oct 19, 2023 at 11:17 AM Robert Muir <rcm...@gmail.com> wrote:
> what will happen on windows?
>
> sorry, could not resist.

LOL, yeah, sigh.

On Thu, Oct 19, 2023 at 10:36 PM Dawid Weiss <dawid.we...@gmail.com> wrote:

> I think there is a certain beauty (of tape-backed storage flavor...) in
> existing abstractions and I wouldn't change them unless absolutely
> necessary (FST construction isn't the dominant cost in indexing). Also,
> random seeks all over the place may be really problematic in certain
> scenarios (as is opening a written-to file for reading, as Robert
> mentioned).

I do agree. I love how minimal the IO semantics that Lucene actually
requires are.

>> Failing that, our plan B is to wastefully duplicate the byte[] slices
>> from the already-written bytes into our own private (heap-resident, boo)
>> copy, which would use quite a bit more RAM while building the FST, and
>> make less minimal FSTs for a given RAM budget.
>
> Well, this node cache doesn't have to be on heap... It can be a plain
> temporary file (with full random access). It's a scratch-only structure
> which you can delete after the FST is written. It does add I/O overhead
> but doesn't interfere with the rest of the code in Lucene. Perhaps,
> instead of changing IndexInput and IndexOutput, one could start with a
> plain temp file (NIO API)?

That's an interesting option. I had ruled out "bypassing the Directory
abstraction and going straight to JDK IO APIs", but maybe it's OK to do so
for a scratch-only file. I like this option, Dawid!

> I also think that the tradeoffs presented in graphs on the fst-node-cache
> issue are not so bad at all. Yes, the FST is not minimal, but the
> construction-space vs output-size tradeoff is quite all right to me.

Well, the tradeoffs I posted in this PR
<https://github.com/apache/lucene/pull/12633> (now merged to main, and
eventually to 9.x) hold only if we still buffer the whole FST in RAM, so
that we can use it as our random-access cache of past FST nodes. If we
succeed in changing FST writing to be fully off-heap (appending bytes
directly to disk), then we need that random-access storage somewhere else
(maybe a direct NIO file, maybe just duplicated byte[] copies in the
NodeHash). So net/net those curves will get worse: more RAM required to
achieve the same minimality. I haven't tested just how much worse yet ...
I wanted to probe this possibility (random read access on a
still-appending write file) first, to avoid wastefully duplicating these
bytes in RAM.

On Sat, Oct 21, 2023 at 1:09 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi, the biggest problem is with some IndexInputs that work on FS cache
> (mmapdir). The file size changes while you are writing, therefore it
> could cause strange issues. Especially the mapping of mmap may not see
> the changes you have already written, as there is no happens-before
> relationship.

Hmm, I didn't realize Panama's mmap implementation had this limitation. Or
maybe you are saying this is an OS-level limitation? Because when you map
a region of a file, you must give a bounded range (0 .. file-length), and
then if the file grows, you would have to re-map, or make a 2nd, 3rd, ...
map? Yeah, OK, this seems problematic indeed.

> So as said by the others, if you need stuff already written, keep it in
> memory (like nodes). We should really not change our IO model for this
> singleton. A 1% slowdown while writing due to some caching or buffering
> does not matter; risking corrupt indexes or errors while reading does.

Yeah, OK, I'm convinced :) Let's leave Lucene's IO "write once, read many"
(WORM) semantics intact, and either use direct NIO for the suffix hash
(NodeHash), or burn the RAM duplicating the FST nodes (and measure the
impact on RAM vs minimality).
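Here is a rough, untested sketch of what that direct-NIO scratch file
might look like: NodeHash appends each frozen node's bytes and reads them
back at random offsets, then deletes the file once the FST is done. This
is only an illustration of the IO pattern, not actual Lucene code; the
class and method names (ScratchNodeStore, writeNode, readNode) are made up:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical scratch store: append frozen FST nodes, read them back
 *  at random offsets, delete the file when construction completes.
 *  Bypasses Directory entirely (plain NIO), so Lucene's write-once
 *  semantics stay untouched. */
final class ScratchNodeStore implements AutoCloseable {

  private final Path path;
  private final FileChannel channel;
  private long nextOffset;

  ScratchNodeStore(Path tempDir) throws IOException {
    path = Files.createTempFile(tempDir, "fst-nodes", ".tmp");
    channel = FileChannel.open(path,
        StandardOpenOption.READ, StandardOpenOption.WRITE);
  }

  /** Append a frozen node's bytes; return its offset for later lookup. */
  long writeNode(byte[] bytes, int off, int len) throws IOException {
    long start = nextOffset;
    ByteBuffer buf = ByteBuffer.wrap(bytes, off, len);
    while (buf.hasRemaining()) {
      nextOffset += channel.write(buf, nextOffset);  // positional write
    }
    return start;
  }

  /** Random-access read of a previously written node, e.g. so NodeHash
   *  can compare an incoming suffix against a cached one. */
  void readNode(long offset, byte[] dest, int len) throws IOException {
    ByteBuffer buf = ByteBuffer.wrap(dest, 0, len);
    long pos = offset;
    while (buf.hasRemaining()) {
      int n = channel.read(buf, pos);  // positional read, no seek state
      if (n < 0) {
        throw new IOException("unexpected EOF in FST scratch file");
      }
      pos += n;
    }
  }

  /** Scratch-only: delete the file once the FST is fully written. */
  @Override
  public void close() throws IOException {
    channel.close();
    Files.deleteIfExists(path);
  }
}

Real code would of course want buffering on the append path, and probably
a small on-heap cache of hot nodes, but the IO pattern is the point:
append-only writes plus random reads, confined to a private temp file.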
Thanks everyone,

Mike McCandless

http://blog.mikemccandless.com