Hi, the biggest problem is with some IndexInputs that work on FS Cache
(mmapdir). The file size changes while you are writing therefore it
could cause strange issues. Especially the mapping of mmap may not see
the changes you have already written as there is no happens-before
relationship.
Basically the IO model of Lucene is WORM. So something thats visible to
readers must never change anymore.
So as said by the others, if you need stuff already written, keep it in
memory (like nodes). We should really not change our IO model for this
singleton. 1% slowdown while writing due to some caching of buffering
does not matter and risk us corrupting indexes or run into errors while
reading.
Uwe
Am 19.10.2023 um 15:47 schrieb Michael McCandless:
Hi Team,
Today, Lucene's Directory abstraction does not allow opening an
IndexInput on a file until the file is fully written and closed via
IndexOutput. We enforce this in tests, and some of our core Directory
implementations demand this (e.g. caching the file's length on opening
an IndexInput).
Yet, most filesystems will easily allow simultaneous read/append of a
single file. We just don't expose this IO semantics to Lucene, but
could we allow random-access reads with append-only writes on one
file? Is there a strong reason that we don't allow this?
Quick TL/DR context: we are trying to enable FST compilation to write
off-heap (directly to disk), enabling creating arbitrarily large FSTs
with bounded heap, matching how FSTs can now be read off-heap, and it
would be much much more RAM efficient if we could read/append the same
file at once.
Full gory details context: inspired by how Tantivy
<https://github.com/quickwit-oss/tantivy> (awesome and fast Rust
search engine!) writes its FSTs
<https://blog.burntsushi.net/transducers/>, over in this issue
<https://github.com/apache/lucene/issues/12543> and PR
<https://github.com/dungba88/lucene/commit/882f5a5b1f60d4321d2e09986335063368c08e9b>,
we (thank you Dzung Bui / @dungba88!) are trying to fix Lucene's FST
building to immediately stream the FST to disk, instead of buffering
the whole thing in RAM and then writing to disk.
This would allow building arbitrarily large FSTs without using up
heap, and symmetrically matches how we can now read FSTs off-heap,
plus FST building is already (mostly) append-only. This would also
allow removing some of the crazy abstractions we have for writing FST
bytes into RAM (FSTStore, BytesStore). It would enable interesting
things like a Codec whose term dictionary is stored entirely in an FST
<https://github.com/apache/lucene/pull/12688> (also inspired by Tantivy).
The wrinkle is that, while the FST is building, it sometimes looks
back and reads previously written bytes, to share suffixes and create
a minimal (or near minimal) FST. So if IndexInput could read those
bytes, even as the FST is still appending to IndexOutput, it would
"just work".
Failing that, our plan B is to wastefully duplicate the byte[] slices
from the already written bytes into our own private (heap resident,
boo) copy, which would use quite a bit more RAM while building the
FST, and make less minimal FSTs for a given RAM budget. I haven't
measured the added wasted RAM if we have to go this route but I fear
it is sizable in practice, i.e. it strongly negates the whole idea of
writing an FST off-heap since its effectively storing a possibly large
portion of the FST in many duplicated byte[] fragments (in the NodeHash).
So ... could we somehow relax Lucene's Directory semantics to allow
opening an IndexInput on a still appending IndexOutput, since most
filesystems are fine with this?
Mike McCandless
http://blog.mikemccandless.com
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de