mikemccand commented on pull request #2052: URL: https://github.com/apache/lucene-solr/pull/2052#issuecomment-731234052
> > Second, it is extremely experimental and not clear when it provides benefits / what risks there are / etc. We need to learn much more about it, in diverse usage, to help here. I'd love to hear from Elasticsearch or Solr users if this helps, since those applications do simultaneous indexing (merging) and searching on the same box.
>
> I am sure, you at amazon will test it extensively. But I agree: I would not make this any default, I am still in favour of using plain MMAPDirectory. The risks of making it worse by using direct io is too heavy.

LOL, actually, no, we at Amazon are not really planning on testing this extensively!

Amazon (well, specifically our customer facing product search built directly on Lucene) uses [Lucene's fast segment replication feature](http://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html), which is much more efficient than Elasticsearch/Solr document replication when you need deep replicas because you have high peak QPS.

So, at Amazon, at least for product search, we never index and search on the same JVM/hardware. Instead we have a few dedicated boxes for pure indexing, then replicate segments via S3 out to many boxes dedicated to searching. Lucene's segment replication feature allows us to use much less hardware to simultaneously handle high indexing throughput and high query throughput.

But, since Elasticsearch/Solr do concurrent indexing (merging) and searching on a single box, by design, I think this Directory would be very interesting to test. It is likely a massive improvement in long-pole query latencies when heavy merges are running, since the merges would now bypass the OS's buffer (IO) cache entirely, using direct IO.

> > Third, users are able to choose to use this when they instantiate the Directory implementation for their search application, so it is straightforward to adopt and play with, even if Lucene's core does not do so by default.
>
> +1
>
> Elasticsearch may play with it and may also improve the parts where it is actually used. We do not know yet if it is a good idea to use it when you merge stuff that needs heavy random access to index (like you have a FilterCodecReader during merging, transform an index, resort it,...). Also it depends on codecs and how they are implemented. Unless we know that it works well for merging all parts of Lucene's core codecs, we may do a recommendation.
>
> If we decide to make it part of Lucene core, we can just move it. It will compile and work out of box with current Java versions and most file systems.

Yeah, this is an awesome improvement thanks to this PR -- it becomes pure Java, yay! But we need more data of actual usage to decide if this is worth moving to core, let alone somehow defaulting to.

Thanks @zacharymorn!