[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-12-14 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r542469384



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -154,14 +146,43 @@ public IndexOutput createOutput(String name, IOContext 
context) throws IOExcepti
 if (context.context != Context.MERGE || 
context.mergeInfo.estimatedMergeBytes < minBytesDirect) {
   return delegate.createOutput(name, context);
 } else {
-  return new NativeUnixIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
+  return new DirectIOIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
 }
   }
 
-  @SuppressForbidden(reason = "java.io.File: native API requires old-style 
FileDescriptor")
-  private final static class NativeUnixIndexOutput extends IndexOutput {
+  @Override
+  public void deleteFile(String name) throws IOException {
+delegate.deleteFile(name);
+  }
+
+  @Override
+  public Set getPendingDeletions() throws IOException {

Review comment:
   Great, thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-12-14 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r542469154



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -154,14 +146,43 @@ public IndexOutput createOutput(String name, IOContext 
context) throws IOExcepti
 if (context.context != Context.MERGE || 
context.mergeInfo.estimatedMergeBytes < minBytesDirect) {
   return delegate.createOutput(name, context);
 } else {
-  return new NativeUnixIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
+  return new DirectIOIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
 }
   }
 
-  @SuppressForbidden(reason = "java.io.File: native API requires old-style 
FileDescriptor")
-  private final static class NativeUnixIndexOutput extends IndexOutput {
+  @Override
+  public void deleteFile(String name) throws IOException {
+delegate.deleteFile(name);
+  }
+
+  @Override
+  public Set getPendingDeletions() throws IOException {
+return delegate.getPendingDeletions();
+  }
+
+  @Override
+  public String[] listAll() throws IOException {
+return delegate.listAll();
+  }
+
+  @Override
+  public long fileLength(String name) throws IOException {
+return delegate.fileLength(name);
+  }
+
+  @Override
+  public void rename(String source, String dest) throws IOException {
+delegate.rename(source, dest);
+  }
+
+  @Override
+  public void close() throws IOException {
+delegate.close();

Review comment:
   Oh yeah hmm, that is frustrating :)  I was hoping we could pass two 
things to `IOUtils.close` ... maybe pass `FilterDirectory.this`?  Though I 
suspect that would indeed just call this existing `close()` method, leading to 
infinite recursion.  Hrmph, this is beyond my Java knowledge but sure seems 
like it ought to be possible.  Maybe @uschindler knows :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-12-07 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r537569680



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -154,14 +146,43 @@ public IndexOutput createOutput(String name, IOContext 
context) throws IOExcepti
 if (context.context != Context.MERGE || 
context.mergeInfo.estimatedMergeBytes < minBytesDirect) {
   return delegate.createOutput(name, context);
 } else {
-  return new NativeUnixIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
+  return new DirectIOIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
 }
   }
 
-  @SuppressForbidden(reason = "java.io.File: native API requires old-style 
FileDescriptor")
-  private final static class NativeUnixIndexOutput extends IndexOutput {
+  @Override
+  public void deleteFile(String name) throws IOException {
+delegate.deleteFile(name);
+  }
+
+  @Override
+  public Set getPendingDeletions() throws IOException {
+return delegate.getPendingDeletions();
+  }
+
+  @Override
+  public String[] listAll() throws IOException {
+return delegate.listAll();
+  }
+
+  @Override
+  public long fileLength(String name) throws IOException {
+return delegate.fileLength(name);
+  }
+
+  @Override
+  public void rename(String source, String dest) throws IOException {
+delegate.rename(source, dest);
+  }
+
+  @Override
+  public void close() throws IOException {
+delegate.close();

Review comment:
   Maybe use `IOUtils.close(...)` here instead?  It will try to close both 
even if one of them throws exception.

##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -154,14 +146,43 @@ public IndexOutput createOutput(String name, IOContext 
context) throws IOExcepti
 if (context.context != Context.MERGE || 
context.mergeInfo.estimatedMergeBytes < minBytesDirect) {
   return delegate.createOutput(name, context);
 } else {
-  return new NativeUnixIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
+  return new DirectIOIndexOutput(getDirectory().resolve(name), name, 
mergeBufferSize);
 }
   }
 
-  @SuppressForbidden(reason = "java.io.File: native API requires old-style 
FileDescriptor")
-  private final static class NativeUnixIndexOutput extends IndexOutput {
+  @Override
+  public void deleteFile(String name) throws IOException {
+delegate.deleteFile(name);
+  }
+
+  @Override
+  public Set getPendingDeletions() throws IOException {

Review comment:
   I wonder if we should simply extend `FilterDirectory` here?  Let that 
class (designed for such delegation) handle delegating all existing `Directory` 
methods?  It would lower the risk that a new `Directory` method fails to get 
delegated here ...
   
   But, we can do this as followon issue.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-12-07 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r537561523



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -51,16 +50,9 @@
  * to the provided Directory instance.
  *
  * See Overview
+ * href="{@docRoot}/overview-summary.html#DirectIODirectory">Overview
  * for more details.
  *
- * To use this you must compile
- * NativePosixUtil.cpp (exposes Linux-specific APIs through
- * JNI) for your platform, by running ./gradlew build, and then 
putting the resulting
- * libLuceneNativeIO.so or libLuceneNativeIO.dylib
- * (from lucene/misc/native/build/lib/release/platform/) onto 
your dynamic
- * linker search path.
- *
  * WARNING: this code is very new and quite easily
  * could contain horrible bugs.  For example, here's one
  * known issue: if you use seek in IndexOutput, and then

Review comment:
   Let's leave the warning for now?  Because this was JNI code, bringing it 
risks like SIGSEGV / memory leaks / etc. (even though in this instances those 
risks were very small), and requiring C++ compilation to build per-platform, 
very few users tried it out.
   
   Now that it will be pure java (thank you @zacharymorn!!), it should see more 
usage.
   
   Especially for search applications that do concurrent indexing and searching 
on a single box (e.g. Elasticsearch!), this Directory should be a huge win to 
bring down interference of a large merge running on a box and impacting queries 
as hot pages are swapped out.
   
   So, let's wait and see if we can remove this warning at a later date, if 
users really do use this and have good results?
   
   Also, this directory requires some tuning (the buffer size).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-12-03 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r535354667



##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -51,16 +50,9 @@
  * to the provided Directory instance.
  *
  * See Overview
+ * href="{@docRoot}/overview-summary.html#DirectIODirectory">Overview
  * for more details.
  *
- * To use this you must compile
- * NativePosixUtil.cpp (exposes Linux-specific APIs through
- * JNI) for your platform, by running ./gradlew build, and then 
putting the resulting
- * libLuceneNativeIO.so or libLuceneNativeIO.dylib
- * (from lucene/misc/native/build/lib/release/platform/) onto 
your dynamic
- * linker search path.
- *
  * WARNING: this code is very new and quite easily
  * could contain horrible bugs.  For example, here's one
  * known issue: if you use seek in IndexOutput, and then

Review comment:
   We can remove this `For example` -- this is ancient history and no 
longer can occur since we removed `IndexOutput.seek` entirely.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-20 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r52826



##
File path: lucene/misc/native/src/main/posix/NativePosixUtil.cpp
##
@@ -102,6 +102,7 @@ JNIEXPORT jint JNICALL 
Java_org_apache_lucene_misc_store_NativePosixUtil_posix_1
 }
 #endif
 
+// TODO: To be removed together?

Review comment:
   Awesome!
   
   Yeah, `// nocommit` comment is really helpful in patches/PRs to note 
something that you know you need to fix before pushing.  Plus, our build 
tooling will catch us if we accidentally push code with `// nocommit` still.

##
File path: 
lucene/test-framework/src/java/org/apache/lucene/store/MockDirectoryWrapper.java
##
@@ -745,7 +745,7 @@ public synchronized IndexInput openInput(String name, 
IOContext context) throws
   maybeThrowDeterministicException();
 }
 if (!LuceneTestCase.slowFileExists(in, name)) {
-  throw randomState.nextBoolean() ? new FileNotFoundException(name + " in 
dir=" + in) : new NoSuchFileException(name + " in dir=" + in);
+  throw new NoSuchFileException(name + " in dir=" + in);

Review comment:
   Hmm why did we remove the randomness about which (confusingly) different 
exception to throw here?  This randomness was (is?) useful to help test that 
Lucene indeed catches `FNFE` and `NSFE` interchangeably.

##
File path: 
lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##
@@ -72,14 +64,17 @@
  * and OS X; other Unixes should work but have not been
  * tested!  Use at your own risk.
  *
+ * Throws UnsupportedOperationException if the operating system or

Review comment:
   Can we move this to `@throws` javadoc tag?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-17 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r525257315



##
File path: lucene/misc/native/src/main/posix/NativePosixUtil.cpp
##
@@ -102,6 +102,7 @@ JNIEXPORT jint JNICALL 
Java_org_apache_lucene_misc_store_NativePosixUtil_posix_1
 }
 #endif
 
+// TODO: To be removed together?

Review comment:
   This should really be a `// nocommit` right?  I.e. we want to solve this 
question before committing, and we don't plan to commit this `// TODO`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-09 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r519868079



##
File path: lucene/misc/src/java/org/apache/lucene/store/DirectIODirectory.java
##
@@ -66,45 +66,32 @@
  *
  * @lucene.experimental
  */
-public class NativeUnixDirectory extends FSDirectory {
+public class DirectIODirectory extends FSDirectory {
 
   // TODO: this is OS dependent, but likely 512 is the LCD
   private final static long ALIGN = 512;
   private final static long ALIGN_NOT_MASK = ~(ALIGN-1);
-  
-  /** Default buffer size before writing to disk (256 KB);
-   *  larger means less IO load but more RAM and direct
-   *  buffer storage space consumed during merging. */
-
-  public final static int DEFAULT_MERGE_BUFFER_SIZE = 262144;
 
   /** Default min expected merge size before direct IO is
*  used (10 MB): */
   public final static long DEFAULT_MIN_BYTES_DIRECT = 10*1024*1024;
 
-  private final int mergeBufferSize;
   private final long minBytesDirect;
   private final Directory delegate;
 
   /** Create a new NIOFSDirectory for the named location.
* 
* @param path the path of the directory
-   * @param lockFactory to use
-   * @param mergeBufferSize Size of buffer to use for
-   *merging.  See {@link #DEFAULT_MERGE_BUFFER_SIZE}.
* @param minBytesDirect Merges, or files to be opened for
*   reading, smaller than this will
*   not use direct IO.  See {@link
*   #DEFAULT_MIN_BYTES_DIRECT}
+   * @param lockFactory to use
* @param delegate fallback Directory for non-merges
* @throws IOException If there is a low-level I/O error
*/
-  public NativeUnixDirectory(Path path, int mergeBufferSize, long 
minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException 
{
+  public DirectIODirectory(Path path, long minBytesDirect, LockFactory 
lockFactory, Directory delegate) throws IOException {
 super(path, lockFactory);
-if ((mergeBufferSize & ALIGN) != 0) {
-  throw new IllegalArgumentException("mergeBufferSize must be 0 mod " + 
ALIGN + " (got: " + mergeBufferSize + ")");
-}
-this.mergeBufferSize = mergeBufferSize;

Review comment:
   Hmm but previously it was a 256 KB buffer, by default, and caller could 
change that if they wanted.
   
   But with this change, it's now hardwired to something much smaller (512 
bytes, or 1 or 4 KB; I'm not sure what "typical" filesystem block sizes are 
now?).
   
   This buffering, and its size, is really important when using direct IO 
because every write will go straight to the device, so a larger buffer 
amortizes the cost of such writes.  I think we need to keep the option for 
caller to set this buffer size, and leave it at the 256 KB default?  Or at 
least, let's not try to change that behavior here, and leave this change 100% 
focused on moving to pure java implementation?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2020-11-06 Thread GitBox


mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r518846067



##
File path: lucene/misc/src/java/org/apache/lucene/store/DirectIODirectory.java
##
@@ -164,15 +149,16 @@ public IndexOutput createOutput(String name, IOContext 
context) throws IOExcepti
 private long fileLength;
 private boolean isOpen;
 
-public NativeUnixIndexOutput(Path path, String name, int bufferSize) 
throws IOException {
-  super("NativeUnixIndexOutput(path=\"" + path.toString() + "\")", name);
-  //this.path = path;
-  final FileDescriptor fd = NativePosixUtil.open_direct(path.toString(), 
false);
-  fos = new FileOutputStream(fd);
-  //fos = new FileOutputStream(path);
-  channel = fos.getChannel();
-  buffer = ByteBuffer.allocateDirect(bufferSize);
-  this.bufferSize = bufferSize;
+  @SuppressForbidden(reason = "com.sun.nio.file.ExtendedOpenOption: Direct I/O 
with FileChannel requires the use of internal proprietary API 
ExtendedOpenOption.DIRECT")
+  public DirectIOIndexOutput(Path path, String name) throws IOException {
+  super("DirectIOIndexOutput(path=\"" + path.toString() + "\")", name);
+
+  int blockSize = Math.toIntExact(Files.getFileStore(path).getBlockSize());
+  bufferSize = Math.addExact(blockSize, blockSize - 1);
+  buffer = ByteBuffer.allocateDirect(bufferSize).alignedSlice(blockSize);

Review comment:
   Hmm I wonder why we got away without using `.alignedSlice` before?

##
File path: lucene/CHANGES.txt
##
@@ -156,6 +156,9 @@ Improvements
 * LUCENE-9531: Consolidated CharStream and FastCharStream classes: these have 
been moved
   from each query parser package to org.apache.lucene.queryparser.charstream 
(Dawid Weiss).
 
+* LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO 
flag,

Review comment:
   Will this be Lucene 9.0 only?
   
   The `com.sun.nio.file.ExtendedOpenOption.DIRECT` was added as of JDK 10 
right?
   
   Since Lucene 8.x still allows Java 8.x (and Lucene 9.x will require Java 
11.x minimum), I think this improvement must be Lucene 9.x only?

##
File path: lucene/misc/src/java/org/apache/lucene/store/DirectIODirectory.java
##
@@ -66,45 +66,32 @@
  *
  * @lucene.experimental
  */
-public class NativeUnixDirectory extends FSDirectory {
+public class DirectIODirectory extends FSDirectory {
 
   // TODO: this is OS dependent, but likely 512 is the LCD
   private final static long ALIGN = 512;
   private final static long ALIGN_NOT_MASK = ~(ALIGN-1);
-  
-  /** Default buffer size before writing to disk (256 KB);
-   *  larger means less IO load but more RAM and direct
-   *  buffer storage space consumed during merging. */
-
-  public final static int DEFAULT_MERGE_BUFFER_SIZE = 262144;
 
   /** Default min expected merge size before direct IO is
*  used (10 MB): */
   public final static long DEFAULT_MIN_BYTES_DIRECT = 10*1024*1024;
 
-  private final int mergeBufferSize;
   private final long minBytesDirect;
   private final Directory delegate;
 
   /** Create a new NIOFSDirectory for the named location.
* 
* @param path the path of the directory
-   * @param lockFactory to use
-   * @param mergeBufferSize Size of buffer to use for
-   *merging.  See {@link #DEFAULT_MERGE_BUFFER_SIZE}.
* @param minBytesDirect Merges, or files to be opened for
*   reading, smaller than this will
*   not use direct IO.  See {@link
*   #DEFAULT_MIN_BYTES_DIRECT}
+   * @param lockFactory to use
* @param delegate fallback Directory for non-merges
* @throws IOException If there is a low-level I/O error
*/
-  public NativeUnixDirectory(Path path, int mergeBufferSize, long 
minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException 
{
+  public DirectIODirectory(Path path, long minBytesDirect, LockFactory 
lockFactory, Directory delegate) throws IOException {
 super(path, lockFactory);
-if ((mergeBufferSize & ALIGN) != 0) {
-  throw new IllegalArgumentException("mergeBufferSize must be 0 mod " + 
ALIGN + " (got: " + mergeBufferSize + ")");
-}
-this.mergeBufferSize = mergeBufferSize;

Review comment:
   Hmm does this mean we are losing the byte buffering during merging?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org