Re: Could we allow an IndexInput to read from a still writing IndexOutput?

Uwe Schindler Sat, 21 Oct 2023 01:08:15 -0700

Hi, the biggest problem is with some IndexInputs that work on the FS cache (mmapdir). The file size changes while you are writing, therefore it could cause strange issues. Especially the mapping of mmap may not see the changes you have already written, as there is no happens-before relationship.
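
To make the file-size half of this concrete, here is a minimal sketch in plain Java NIO (not Lucene code; the class and method names are made up for illustration). It maps a file, appends to it afterwards, and reports that the mapping still covers only the length it had at map time. It does not demonstrate the visibility problem, which is not deterministic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Lucene.
public class MappedLengthDemo {

  /** Returns {capacity of the mapping, file size after the append}. */
  public static long[] mapThenAppend() {
    try {
      Path file = Files.createTempFile("mapped-length-demo", ".bin");
      file.toFile().deleteOnExit();
      try (FileChannel ch =
          FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        // Write the first 4 bytes, then map exactly what exists so far.
        ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}), 0);
        MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

        // Append 4 more bytes after the mapping was created.
        ch.write(ByteBuffer.wrap(new byte[] {5, 6, 7, 8}), ch.size());

        // The mapping's length was fixed when it was created.
        return new long[] {mapped.capacity(), ch.size()};
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] r = mapThenAppend();
    System.out.println("mapped bytes=" + r[0] + ", file bytes=" + r[1]);
  }
}
```

A reader holding the old mapping would have to remap to reach the appended bytes, and even then nothing here guarantees when the writer's changes become visible through a mapping.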

Basically the IO model of Lucene is WORM (write once, read many). So something that's visible to readers must never change anymore.

So as said by the others, if you need stuff already written, keep it in memory (like nodes). We should really not change our IO model for this single use case. A 1% slowdown while writing due to some caching or buffering does not matter, and it is not worth the risk of corrupting indexes or running into errors while reading.
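
A rough sketch of the "keep it in memory" idea, in plain Java rather than the Lucene API: an append-only writer that also keeps the most recently written bytes in a bounded on-heap window, so look-backs never touch the file that is still being written. The class name, the ring-buffer approach, and the window size are all assumptions for illustration; this is not how the FST compiler or NodeHash actually stores its copies.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene.
public class LookbackAppendWriter {
  private final OutputStream out; // append-only sink, never read back
  private final byte[] window;    // ring buffer holding the newest bytes
  private long written;           // total bytes appended so far

  public LookbackAppendWriter(OutputStream out, int windowSize) {
    if (windowSize <= 0) {
      throw new IllegalArgumentException("windowSize must be positive");
    }
    this.out = out;
    this.window = new byte[windowSize];
  }

  /** Appends to the sink and mirrors the bytes into the in-memory window. */
  public void append(byte[] bytes) {
    try {
      out.write(bytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    for (byte b : bytes) {
      window[(int) (written % window.length)] = b;
      written++;
    }
  }

  /** Absolute position of the next byte to be written. */
  public long position() {
    return written;
  }

  /** True if the byte at this absolute position is still held in memory. */
  public boolean isBuffered(long pos) {
    return pos >= 0 && pos < written && pos >= written - window.length;
  }

  /** Reads a previously written byte from memory, never from the sink. */
  public byte readBack(long pos) {
    if (!isBuffered(pos)) {
      throw new IllegalArgumentException("position " + pos + " is not buffered");
    }
    return window[(int) (pos % window.length)];
  }
}
```

The trade-off is the one discussed in this thread: anything older than the window cannot be looked at again, so a smaller window saves heap but gives up some suffix sharing.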


Uwe

On 19.10.2023 at 15:47, Michael McCandless wrote:

Hi Team,
Today, Lucene's Directory abstraction does not allow opening an IndexInput on a file until the file is fully written and closed via IndexOutput. We enforce this in tests, and some of our core Directory implementations demand this (e.g. caching the file's length on opening an IndexInput).
Yet, most filesystems will easily allow simultaneous read/append of a single file. We just don't expose these IO semantics to Lucene, but could we allow random-access reads with append-only writes on one file? Is there a strong reason that we don't allow this?
Quick TL;DR context: we are trying to enable FST compilation to write off-heap (directly to disk), enabling creating arbitrarily large FSTs with bounded heap, matching how FSTs can now be read off-heap, and it would be much, much more RAM efficient if we could read/append the same file at once.
Full gory details context: inspired by how Tantivy <https://github.com/quickwit-oss/tantivy> (awesome and fast Rust search engine!) writes its FSTs <https://blog.burntsushi.net/transducers/>, over in this issue <https://github.com/apache/lucene/issues/12543> and PR <https://github.com/dungba88/lucene/commit/882f5a5b1f60d4321d2e09986335063368c08e9b>, we (thank you Dzung Bui / @dungba88!) are trying to fix Lucene's FST building to immediately stream the FST to disk, instead of buffering the whole thing in RAM and then writing to disk.
This would allow building arbitrarily large FSTs without using up heap, and symmetrically matches how we can now read FSTs off-heap, plus FST building is already (mostly) append-only. This would also allow removing some of the crazy abstractions we have for writing FST bytes into RAM (FSTStore, BytesStore). It would enable interesting things like a Codec whose term dictionary is stored entirely in an FST <https://github.com/apache/lucene/pull/12688> (also inspired by Tantivy).
The wrinkle is that, while the FST is building, it sometimes looks back and reads previously written bytes, to share suffixes and create a minimal (or near minimal) FST. So if IndexInput could read those bytes, even as the FST is still appending to IndexOutput, it would "just work".
Failing that, our plan B is to wastefully duplicate the byte[] slices from the already written bytes into our own private (heap resident, boo) copy, which would use quite a bit more RAM while building the FST, and make less minimal FSTs for a given RAM budget. I haven't measured the added wasted RAM if we have to go this route, but I fear it is sizable in practice, i.e. it strongly negates the whole idea of writing an FST off-heap, since it's effectively storing a possibly large portion of the FST in many duplicated byte[] fragments (in the NodeHash).
So ... could we somehow relax Lucene's Directory semantics to allow opening an IndexInput on a still appending IndexOutput, since most filesystems are fine with this?
Mike McCandless

http://blog.mikemccandless.com


--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de
