mikemccand commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1418746164


##########
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##########
@@ -419,6 +417,8 @@ public FST(FSTMetadata<T> metadata, DataInput in, FSTStore 
fstStore) throws IOEx
 
   /** Create the FST with a metadata object and a FSTReader. */
   FST(FSTMetadata<T> metadata, FSTReader fstReader) {
+    assert metadata != null;

Review Comment:
   Can we make these real `if`?  User might hit this if they pass `null` right? 
 We try to use `assert` when only a weird bug in our code might cause 
something.  When it's a real thing that could happen due to user actions, we 
try to use real `if`.  You could just do `this.metadata = 
Objects.requireNonNull(metadata);`?



##########
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##########
@@ -248,12 +327,16 @@ public Builder<T> 
directAddressingMaxOversizingFactor(float factor) {
 
     /** Creates a new {@link FSTCompiler}. */
     public FSTCompiler<T> build() throws IOException {
+      // create a default DataOutput if not specified
+      if (dataOutput == null) {

Review Comment:
   Thanks :)



##########
lucene/core/src/test/org/apache/lucene/util/fst/Test2BFSTOffHeap.java:
##########
@@ -0,0 +1,342 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util.fst;
+
+import com.carrotsearch.randomizedtesting.annotations.TimeoutSuite;
+import java.util.Arrays;
+import java.util.Random;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.store.MMapDirectory;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.LuceneTestCase.SuppressSysoutChecks;
+import org.apache.lucene.tests.util.TimeUnits;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IntsRef;
+import org.junit.Ignore;
+
+// Similar to Test2BFST but will build and read the FST off-heap and can be 
run with small heap
+
+// Run something like this:
+//    ./gradlew test --tests Test2BFSTOffHeap -Dtests.verbose=true 
--max-workers=1

Review Comment:
   Yay, no massive heap needed!!  So sweet.



##########
lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java:
##########
@@ -255,9 +255,16 @@ private T randomAcceptedWord(FST<T> fst, IntsRefBuilder 
in) throws IOException {
 
   public FST<T> doTest() throws IOException {
 
+    IndexOutput indexOutput = null;
+    boolean useOffHeap = random.nextBoolean();
+    if (useOffHeap) {
+      indexOutput = dir.createOutput("fstOffHeap.bin", IOContext.DEFAULT);
+    }
+
     final FSTCompiler<T> fstCompiler =
         new FSTCompiler.Builder<>(
                 inputMode == 0 ? FST.INPUT_TYPE.BYTE1 : FST.INPUT_TYPE.BYTE4, 
outputs)
+            .dataOutput(indexOutput)

Review Comment:
   Hmm -- I don't think this Builder method (`.dataOutput`) should allow 
`null`?  And then we should conditionalize this call to `if (useOffHeap) ...`?



##########
lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java:
##########
@@ -274,8 +281,16 @@ public FST<T> doTest() throws IOException {
       }
     }
     FST<T> fst = fstCompiler.compile();
+    ;
 
-    if (random.nextBoolean() && fst != null) {
+    if (useOffHeap) {
+      indexOutput.close();
+      try (IndexInput in = dir.openInput("fstOffHeap.bin", IOContext.DEFAULT)) 
{
+        fst = new FST<>(fst.getMetadata(), in);
+      } finally {
+        dir.deleteFile("fstOffHeap.bin");

Review Comment:
   Whoa, nice usage of "delete on final close" semantics!



##########
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##########
@@ -153,6 +192,41 @@ private FSTCompiler(
     }
   }
 
+  // Get the respective FSTReader of the DataOutput. If the DataOutput is also 
a FSTReader then we
+  // will use it, otherwise we will return a NullFSTReader. Attempting to read 
from a FST with
+  // NullFSTReader
+  // will throw UnsupportedOperationException

Review Comment:
   Merge the above two lines?  Annoying that spotless does not actually tidy up 
on itself, rather than brutally chopping up text to fit within the styling 
requirements!



##########
lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java:
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util.fst;
+
+import java.io.IOException;
+import org.apache.lucene.store.ByteBuffersDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataOutput;
+
+/**
+ * An adapter class to use {@link ByteBuffersDataOutput} as a {@link 
FSTReader}. It allows the FST
+ * to be readable immediately after writing
+ */
+final class ReadWriteDataOutput extends DataOutput implements FSTReader {
+
+  private final ByteBuffersDataOutput dataOutput;
+  // the DataInput to read from once we finish writing
+  private ByteBuffersDataInput dataInput;
+  // whether this DataOutput is already frozen
+  private boolean frozen;
+
+  public ReadWriteDataOutput(ByteBuffersDataOutput dataOutput) {
+    this.dataOutput = dataOutput;
+  }
+
+  @Override
+  public void writeByte(byte b) {
+    if (frozen) {
+      throw new IllegalStateException("Already frozen");
+    }
+    dataOutput.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) {
+    if (frozen) {

Review Comment:
   I think these could be `assert` instead?  User should not be able to do 
anything to trigger this exception?  Only a bug in our code could...?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to