mikemccand commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1418746164
##########
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##########
@@ -419,6 +417,8 @@ public FST(FSTMetadata<T> metadata, DataInput in, FSTStore
fstStore) throws IOEx
/** Create the FST with a metadata object and a FSTReader. */
FST(FSTMetadata<T> metadata, FSTReader fstReader) {
+ assert metadata != null;
Review Comment:
Can we make these real `if`? User might hit this if they pass `null` right?
We try to use `assert` when only a weird bug in our code might cause
something. When it's a real thing that could happen due to user actions, we
try to use real `if`. You could just do `this.metadata =
Objects.requireNonNull(metadata);`?
##########
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##########
@@ -248,12 +327,16 @@ public Builder<T>
directAddressingMaxOversizingFactor(float factor) {
/** Creates a new {@link FSTCompiler}. */
public FSTCompiler<T> build() throws IOException {
+ // create a default DataOutput if not specified
+ if (dataOutput == null) {
Review Comment:
Thanks :)
##########
lucene/core/src/test/org/apache/lucene/util/fst/Test2BFSTOffHeap.java:
##########
@@ -0,0 +1,342 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util.fst;
+
+import com.carrotsearch.randomizedtesting.annotations.TimeoutSuite;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.store.MMapDirectory;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.LuceneTestCase.SuppressSysoutChecks;
+import org.apache.lucene.tests.util.TimeUnits;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IntsRef;
+import org.junit.Ignore;
+
+// Similar to Test2BFST but will build and read the FST off-heap and can be
run with small heap
+
+// Run something like this:
+// ./gradlew test --tests Test2BFSTOffHeap -Dtests.verbose=true
--max-workers=1
Review Comment:
Yay, no massive heap needed!! So sweet.
##########
lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java:
##########
@@ -255,9 +255,16 @@ private T randomAcceptedWord(FST<T> fst, IntsRefBuilder
in) throws IOException {
public FST<T> doTest() throws IOException {
+ IndexOutput indexOutput = null;
+ boolean useOffHeap = random.nextBoolean();
+ if (useOffHeap) {
+ indexOutput = dir.createOutput("fstOffHeap.bin", IOContext.DEFAULT);
+ }
+
final FSTCompiler<T> fstCompiler =
new FSTCompiler.Builder<>(
inputMode == 0 ? FST.INPUT_TYPE.BYTE1 : FST.INPUT_TYPE.BYTE4,
outputs)
+ .dataOutput(indexOutput)
Review Comment:
Hmm -- I don't think this Builder method (`.dataOutput`) should allow
`null`? And then we should conditionalize this call to `if (useOffHeap) ...`?
##########
lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java:
##########
@@ -274,8 +281,16 @@ public FST<T> doTest() throws IOException {
}
}
FST<T> fst = fstCompiler.compile();
+ ;
- if (random.nextBoolean() && fst != null) {
+ if (useOffHeap) {
+ indexOutput.close();
+ try (IndexInput in = dir.openInput("fstOffHeap.bin", IOContext.DEFAULT))
{
+ fst = new FST<>(fst.getMetadata(), in);
+ } finally {
+ dir.deleteFile("fstOffHeap.bin");
Review Comment:
Whoa, nice usage of "delete on final close" semantics!
##########
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##########
@@ -153,6 +192,41 @@ private FSTCompiler(
}
}
+ // Get the respective FSTReader of the DataOutput. If the DataOutput is also
a FSTReader then we
+ // will use it, otherwise we will return a NullFSTReader. Attempting to read
from a FST with
+ // NullFSTReader
+ // will throw UnsupportedOperationException
Review Comment:
Merge the above two lines? Annoying that spotless does not actually tidy up
on itself, rather than brutally chopping up text to fit within the styling
requirements!
##########
lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java:
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util.fst;
+
+import java.io.IOException;
+import org.apache.lucene.store.ByteBuffersDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataOutput;
+
+/**
+ * An adapter class to use {@link ByteBuffersDataOutput} as a {@link
FSTReader}. It allows the FST
+ * to be readable immediately after writing
+ */
+final class ReadWriteDataOutput extends DataOutput implements FSTReader {
+
+ private final ByteBuffersDataOutput dataOutput;
+ // the DataInput to read from once we finish writing
+ private ByteBuffersDataInput dataInput;
+ // whether this DataOutput is already frozen
+ private boolean frozen;
+
+ public ReadWriteDataOutput(ByteBuffersDataOutput dataOutput) {
+ this.dataOutput = dataOutput;
+ }
+
+ @Override
+ public void writeByte(byte b) {
+ if (frozen) {
+ throw new IllegalStateException("Already frozen");
+ }
+ dataOutput.writeByte(b);
+ }
+
+ @Override
+ public void writeBytes(byte[] b, int offset, int length) {
+ if (frozen) {
Review Comment:
I think these could be `assert` instead? User should not be able to do
anything to trigger this exception? Only a bug in our code could...?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]