uschindler commented on code in PR #13219: URL: https://github.com/apache/lucene/pull/13219#discussion_r1539364622
########## lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java: ########## @@ -137,17 +136,11 @@ public void madvise(MemorySegment segment, IOContext context) throws IOException } private Integer mapIOContext(IOContext ctx) { - // Merging always wins and implies sequential access, because kernel is advised to free pages - // after use: - if (ctx.context() == Context.MERGE) { - return POSIX_MADV_SEQUENTIAL; - } - if (ctx.randomAccess()) { - return POSIX_MADV_RANDOM; - } - if (ctx.readOnce()) { - return POSIX_MADV_SEQUENTIAL; - } - return null; + return switch (ctx.readAdvice()) { Review Comment: I think we can remove the context from the signature and change it to `madvise(MemorySegment, ReadAdvice)`. `MemorySegmentIndexInputProvider` would just pass `context.readAdvice()` to `madvice()` then. ########## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ########## @@ -54,58 +43,74 @@ public enum Context { DEFAULT }; - public static final IOContext DEFAULT = new IOContext(Context.DEFAULT); + /** Advice regarding the read access pattern. */ + public enum ReadAdvice { + /** + * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to + * cache the hottest pages. + */ + NORMAL, + /** + * Data is expected to be read in a random-access fashion, either by {@link + * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once, + * or by reading data through the {@link RandomAccessInput} abstraction in random order. + */ + RANDOM, + /** Data is expected to be read sequentially with very little seeking at most. */ + SEQUENTIAL, + /** + * Data is treated as random-access memory in practice. {@link Directory} implementations may + * explicitly load the content of the file in memory, or provide hints to the system so that it + * loads the content of the file into the page cache at open time. This should only be used on + * very small files that can be expected to fit in RAM with very high confidence. + */ + LOAD + } + + public static final IOContext DEFAULT = + new IOContext(Context.DEFAULT, null, null, ReadAdvice.NORMAL); - public static final IOContext READONCE = new IOContext(true, false, false); + public static final IOContext READONCE = new IOContext(ReadAdvice.SEQUENTIAL); - public static final IOContext READ = new IOContext(false, false, false); + public static final IOContext READ = new IOContext(ReadAdvice.NORMAL); - public static final IOContext LOAD = new IOContext(false, true, true); + public static final IOContext LOAD = new IOContext(ReadAdvice.LOAD); - public static final IOContext RANDOM = new IOContext(false, false, true); + public static final IOContext RANDOM = new IOContext(ReadAdvice.RANDOM); @SuppressWarnings("incomplete-switch") public IOContext { + Objects.requireNonNull(context, "context must not be null"); + Objects.requireNonNull(readAdvice, "readAdvice must not be null"); switch (context) { case MERGE -> Objects.requireNonNull( mergeInfo, "mergeInfo must not be null if context is MERGE"); case FLUSH -> Objects.requireNonNull( flushInfo, "flushInfo must not be null if context is FLUSH"); } - if (load && readOnce) { - throw new IllegalArgumentException("load and readOnce are mutually exclusive"); - } - if (readOnce && randomAccess) { - throw new IllegalArgumentException("readOnce and randomAccess are mutually exclusive"); + if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) { Review Comment: This is really a good idea! It makes code much easier and the merge case needs no special handling in MMapDir. ########## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ########## @@ -27,22 +27,11 @@ * @param context An object of a enumerator Context type * @param mergeInfo must be given when {@code context == MERGE} * @param flushInfo must be given when {@code context == FLUSH} - * @param readOnce This flag indicates that the file will be opened, then fully read sequentially - * then closed. - * @param load This flag is used for files that are a small fraction of the total index size and are - * expected to be heavily accessed in random-access fashion. Some {@link Directory} - * implementations may choose to load such files into physical memory (e.g. Java heap) as a way - * to provide stronger guarantees on query latency. - * @param randomAccess This flag indicates that the file will be accessed randomly. If this flag is - * set, then readOnce will be false. + * @param readAdvice Advice regarding the read access pattern. Write operations should disregard Review Comment: Writing in our case is always sequential (OutputStream). If we have a solutions for fadvise when writing files we can add another enum. ########## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ########## @@ -54,58 +43,74 @@ public enum Context { DEFAULT }; - public static final IOContext DEFAULT = new IOContext(Context.DEFAULT); + /** Advice regarding the read access pattern. */ + public enum ReadAdvice { + /** + * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to + * cache the hottest pages. + */ + NORMAL, + /** + * Data is expected to be read in a random-access fashion, either by {@link + * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once, + * or by reading data through the {@link RandomAccessInput} abstraction in random order. + */ + RANDOM, + /** Data is expected to be read sequentially with very little seeking at most. */ Review Comment: The madvise flags also say "Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)" The second sentence is important as this is exactly our use case This is also the reason why we don't use sequential for preloaded files, as it's a "read once" like approach. ########## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ########## @@ -54,58 +43,74 @@ public enum Context { DEFAULT }; - public static final IOContext DEFAULT = new IOContext(Context.DEFAULT); + /** Advice regarding the read access pattern. */ + public enum ReadAdvice { Review Comment: Maybe make this toplevel class!? I am tempting between both variants. Could we maybe rename the inner `Context` as the name `IOContext` is so similar. ########## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ########## @@ -54,58 +43,74 @@ public enum Context { DEFAULT }; - public static final IOContext DEFAULT = new IOContext(Context.DEFAULT); + /** Advice regarding the read access pattern. */ + public enum ReadAdvice { + /** + * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to + * cache the hottest pages. + */ + NORMAL, + /** + * Data is expected to be read in a random-access fashion, either by {@link + * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once, + * or by reading data through the {@link RandomAccessInput} abstraction in random order. + */ + RANDOM, + /** Data is expected to be read sequentially with very little seeking at most. */ + SEQUENTIAL, + /** + * Data is treated as random-access memory in practice. {@link Directory} implementations may + * explicitly load the content of the file in memory, or provide hints to the system so that it + * loads the content of the file into the page cache at open time. This should only be used on + * very small files that can be expected to fit in RAM with very high confidence. + */ + LOAD Review Comment: I don't like load, should be preload, maybe `RANDOM_PRELOAD`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org