mikemccand commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1351949356
##########
lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java:
##########
@@ -21,19 +21,18 @@
import java.util.List;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.DataOutput;
-import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.RamUsageEstimator;
// TODO: merge with PagedBytes, except PagedBytes doesn't
// let you read while writing which FST needs
-class BytesStore extends DataOutput implements Accountable {
+class BytesStore extends FSTWriter {
Review Comment:
> Besides `reverse` most of the methods here also needs to write/update
arbitrarily previously written bytes (within the current frontier input), such
as writeBytes/writeByte/copyBytes/truncate.
I think these "modify prior bytes" are only called from `fst.addNode`, where
that (complex!!) method is basically using the tail of the `BytesStore` as a
scratch area? I.e. it writes some bytes first, and might go back and shuffle
the bytes around, depending on whether it's "direct addressing" or "binary
search"? Once `FST.addNode` completes, those written bytes are never altered?
> Moreover, there is also methods which requires reading of previously
written bytes that can't simply fit in the DataOutput (getReverseBytesReader
e.g).
Hmm `getReverseBytesReader` is indeed a problem. I wonder how the [Tantivy
FST impl](https://blog.burntsushi.net/transducers/) deals with this?
If we take the `FSTWriter` approach, are you thinking that we could make an
impl of this class based e.g. on `FileChannel` directly (supports absolute
positional reads, reading from a still-appending file), bypassing Lucene's
`Directory` abstraction entirely? That is not a great solution (we try to have
all IO go through `Directory`), but, perhaps as an intermediate state, for
users directly creating massive FSTs, it's acceptable. But that'd mean we
could not fix Lucene by default to do all of its FST compilation off-heap...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]