Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

via GitHub Tue, 26 Mar 2024 07:41:19 -0700


uschindler commented on code in PR #13219:
URL: https://github.com/apache/lucene/pull/13219#discussion_r1539364622



##########
lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java:
##########
@@ -137,17 +136,11 @@ public void madvise(MemorySegment segment, IOContext context) throws IOException
   }
 
   private Integer mapIOContext(IOContext ctx) {
-    // Merging always wins and implies sequential access, because kernel is advised to free pages
-    // after use:
-    if (ctx.context() == Context.MERGE) {
-      return POSIX_MADV_SEQUENTIAL;
-    }
-    if (ctx.randomAccess()) {
-      return POSIX_MADV_RANDOM;
-    }
-    if (ctx.readOnce()) {
-      return POSIX_MADV_SEQUENTIAL;
-    }
-    return null;
+    return switch (ctx.readAdvice()) {

Review Comment:
   I think we can remove the context from the signature and change it to
`madvise(MemorySegment, ReadAdvice)`. `MemorySegmentIndexInputProvider` would
then just pass `context.readAdvice()` to `madvise()`.
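
   A minimal sketch of the suggested shape, under stated assumptions: `MadviseSketch`, `Segment`, the trimmed-down `IOContext` record, and `adviseFromProvider` are hypothetical stand-ins (the real code uses `MemorySegment` and a native call), the constant values are the Linux ones, and mapping `NORMAL` and `LOAD` to "no advice" is an assumption, since the new switch body is not shown in the hunk above.

```java
/** Illustrative sketch only; every name here is a stand-in, not Lucene's actual code. */
final class MadviseSketch {
  /** Mirrors the enum constants shown in the diff. */
  enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, LOAD }

  /** Hypothetical stand-in for java.lang.foreign.MemorySegment. */
  record Segment(long address, long byteSize) {}

  /** Hypothetical, trimmed-down IOContext that only carries the read advice. */
  record IOContext(ReadAdvice readAdvice) {}

  // Assumed values, matching Linux <sys/mman.h>; other platforms may differ.
  static final int POSIX_MADV_RANDOM = 1;
  static final int POSIX_MADV_SEQUENTIAL = 2;

  /** Returns the madvise flag for the advice, or null when no call should be made. */
  static Integer mapReadAdvice(ReadAdvice readAdvice) {
    return switch (readAdvice) {
      case RANDOM -> POSIX_MADV_RANDOM;
      case SEQUENTIAL -> POSIX_MADV_SEQUENTIAL;
      case NORMAL, LOAD -> null; // assumption: keep the kernel's default behavior
    };
  }

  /** Proposed shape: takes the advice directly. Returns true if a native call would be issued. */
  static boolean madvise(Segment segment, ReadAdvice readAdvice) {
    Integer advice = mapReadAdvice(readAdvice);
    if (advice == null || segment.byteSize() == 0) {
      return false;
    }
    // A real implementation would invoke posix_madvise(address, byteSize, advice) here.
    return true;
  }

  /** Caller side: the provider unwraps the context and forwards only the advice. */
  static boolean adviseFromProvider(Segment segment, IOContext context) {
    return madvise(segment, context.readAdvice());
  }
}
```

   With this shape the native-access layer no longer needs to know about `IOContext` at all; only the caller does.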



##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
     DEFAULT
   };
 
-  public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+  /** Advice regarding the read access pattern. */
+  public enum ReadAdvice {
+    /**
+     * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to
+     * cache the hottest pages.
+     */
+    NORMAL,
+    /**
+     * Data is expected to be read in a random-access fashion, either by {@link
+     * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once,
+     * or by reading data through the {@link RandomAccessInput} abstraction in random order.
+     */
+    RANDOM,
+    /** Data is expected to be read sequentially with very little seeking at most. */
+    SEQUENTIAL,
+    /**
+     * Data is treated as random-access memory in practice. {@link Directory} implementations may
+     * explicitly load the content of the file in memory, or provide hints to the system so that it
+     * loads the content of the file into the page cache at open time. This should only be used on
+     * very small files that can be expected to fit in RAM with very high confidence.
+     */
+    LOAD
+  }
+
+  public static final IOContext DEFAULT =
+      new IOContext(Context.DEFAULT, null, null, ReadAdvice.NORMAL);
 
-  public static final IOContext READONCE = new IOContext(true, false, false);
+  public static final IOContext READONCE = new IOContext(ReadAdvice.SEQUENTIAL);
 
-  public static final IOContext READ = new IOContext(false, false, false);
+  public static final IOContext READ = new IOContext(ReadAdvice.NORMAL);
 
-  public static final IOContext LOAD = new IOContext(false, true, true);
+  public static final IOContext LOAD = new IOContext(ReadAdvice.LOAD);
 
-  public static final IOContext RANDOM = new IOContext(false, false, true);
+  public static final IOContext RANDOM = new IOContext(ReadAdvice.RANDOM);
 
   @SuppressWarnings("incomplete-switch")
   public IOContext {
+    Objects.requireNonNull(context, "context must not be null");
+    Objects.requireNonNull(readAdvice, "readAdvice must not be null");
     switch (context) {
       case MERGE -> Objects.requireNonNull(
           mergeInfo, "mergeInfo must not be null if context is MERGE");
       case FLUSH -> Objects.requireNonNull(
           flushInfo, "flushInfo must not be null if context is FLUSH");
     }
-    if (load && readOnce) {
-      throw new IllegalArgumentException("load and readOnce are mutually exclusive");
-    }
-    if (readOnce && randomAccess) {
-      throw new IllegalArgumentException("readOnce and randomAccess are mutually exclusive");
+    if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) {

Review Comment:
   This is a really good idea! It makes the code much simpler, and the merge
case needs no special handling in MMapDirectory.
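
   A minimal sketch of the invariant, assuming the truncated `if` above ends by throwing an `IllegalArgumentException`; the hunk cuts off before the body, so the exception type and message below are hypothetical, as are the wrapper class `IOContextSketch`, the `Object`-typed info fields, and the reduced `Context` enum.

```java
import java.util.Objects;

/** Illustrative sketch only; a stand-in for the record in the diff, not the actual Lucene class. */
final class IOContextSketch {
  /** Only the constants visible in the diff; the real enum may have more. */
  enum Context { MERGE, FLUSH, DEFAULT }

  enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, LOAD }

  /** mergeInfo and flushInfo are typed as Object here purely to keep the sketch self-contained. */
  record IOContext(Context context, Object mergeInfo, Object flushInfo, ReadAdvice readAdvice) {
    IOContext {
      Objects.requireNonNull(context, "context must not be null");
      Objects.requireNonNull(readAdvice, "readAdvice must not be null");
      if (context == Context.MERGE) {
        Objects.requireNonNull(mergeInfo, "mergeInfo must not be null if context is MERGE");
      }
      if (context == Context.FLUSH) {
        Objects.requireNonNull(flushInfo, "flushInfo must not be null if context is FLUSH");
      }
      // Assumed body of the truncated check: merges must always carry SEQUENTIAL advice.
      if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) {
        throw new IllegalArgumentException(
            "readAdvice must be SEQUENTIAL if context is MERGE"); // hypothetical message
      }
    }
  }
}
```

   Because an invalid combination can never be constructed, code that maps the advice to a kernel hint can switch on `readAdvice()` alone and never inspect `context()`.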



##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -27,22 +27,11 @@
  * @param context An object of a enumerator Context type
  * @param mergeInfo must be given when {@code context == MERGE}
  * @param flushInfo must be given when {@code context == FLUSH}
- * @param readOnce This flag indicates that the file will be opened, then fully read sequentially
- *     then closed.
- * @param load This flag is used for files that are a small fraction of the total index size and are
- *     expected to be heavily accessed in random-access fashion. Some {@link Directory}
- *     implementations may choose to load such files into physical memory (e.g. Java heap) as a way
- *     to provide stronger guarantees on query latency.
- * @param randomAccess This flag indicates that the file will be accessed randomly. If this flag is
- *     set, then readOnce will be false.
+ * @param readAdvice Advice regarding the read access pattern. Write operations should disregard

Review Comment:
   Writing in our case is always sequential (OutputStream). If we find a
solution for fadvise when writing files, we can add another enum.



##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
     DEFAULT
   };
 
-  public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+  /** Advice regarding the read access pattern. */
+  public enum ReadAdvice {
+    /**
+     * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to
+     * cache the hottest pages.
+     */
+    NORMAL,
+    /**
+     * Data is expected to be read in a random-access fashion, either by {@link
+     * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once,
+     * or by reading data through the {@link RandomAccessInput} abstraction in random order.
+     */
+    RANDOM,
+    /** Data is expected to be read sequentially with very little seeking at most. */

Review Comment:
   The madvise flags also say: "Expect page references in sequential order.
(Hence, pages in the given range can be aggressively read ahead, and may be
freed soon after they are accessed.)"
   
   The second sentence is important, as this is exactly our use case.
   
   This is also the reason why we don't use sequential for preloaded files:
sequential is a "read once"-style hint.



##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
     DEFAULT
   };
 
-  public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+  /** Advice regarding the read access pattern. */
+  public enum ReadAdvice {

Review Comment:
   Maybe make this a top-level class? I am torn between the two variants.
   
   Could we maybe rename the inner `Context`, as the name `IOContext` is so
similar?



##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
     DEFAULT
   };
 
-  public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+  /** Advice regarding the read access pattern. */
+  public enum ReadAdvice {
+    /**
+     * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to
+     * cache the hottest pages.
+     */
+    NORMAL,
+    /**
+     * Data is expected to be read in a random-access fashion, either by {@link
+     * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once,
+     * or by reading data through the {@link RandomAccessInput} abstraction in random order.
+     */
+    RANDOM,
+    /** Data is expected to be read sequentially with very little seeking at most. */
+    SEQUENTIAL,
+    /**
+     * Data is treated as random-access memory in practice. {@link Directory} implementations may
+     * explicitly load the content of the file in memory, or provide hints to the system so that it
+     * loads the content of the file into the page cache at open time. This should only be used on
+     * very small files that can be expected to fit in RAM with very high confidence.
+     */
+    LOAD

Review Comment:
   I don't like `LOAD`; it should say "preload". Maybe `RANDOM_PRELOAD`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

