[ https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579338#comment-14579338 ]
Mark Miller commented on LUCENE-6536:
-------------------------------------

bq. is it the directory or the blockcache that is the source of most of the corruptions

There are two issues:

* The write side of the block cache is buggy and can corrupt indexes. I don't think it provides any value anyway, so it should just be cut out. It is currently turned off.
* The HDFS directory doesn't do a classic fsync. To get that kind of behavior you have to write files to HDFS in some really slow mode, I believe. HDFS doesn't have an API compatible with how Lucene fsyncs.

All in all, the block cache performance is good enough for a ton of use cases, but the overall approach and management of it is not great. The Apache Blur project has made a better version that suits even more use cases, but it requires Unsafe usage for direct memory access.

> Migrate HDFSDirectory from solr to lucene-hadoop
> ------------------------------------------------
>
>                 Key: LUCENE-6536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6536
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Greg Bowyer
>              Labels: hadoop, hdfs, lucene, solr
>         Attachments: LUCENE-6536.patch
>
>
> I am currently working on a search engine that is throughput orientated and
> works entirely in apache-spark.
> As part of this, I need a directory implementation that can operate on HDFS
> directly. This got me thinking: can I take the one that was worked on so hard
> for solr hadoop?
> As such I migrated the HDFS and blockcache directories out to a lucene-hadoop
> module.
> Having done this work, I am not sure if it is actually a good change. It
> feels a bit messy, and I don't like how the Metrics class gets extended and
> abused.
> Thoughts, anyone?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
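
For readers unfamiliar with the fsync point in the comment above, here is a minimal sketch of what a "classic fsync" looks like on a local filesystem using only the JDK. It is an illustration under stated assumptions, not Lucene's or Solr's actual code: the class name `ClassicFsyncSketch` and the method `fsyncFile` are hypothetical. The pattern it shows is that a file is written and closed first, then reopened by name later and forced to stable storage. As far as I understand it, HDFS durability calls such as `hsync()` are instead made on a still-open output stream, which would be one way to read the "not API compatible" remark.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClassicFsyncSketch {

    // Hypothetical helper: reopen an already-written, already-closed file
    // by name and ask the OS to flush its data and metadata to the device.
    static void fsyncFile(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.force(true); // true = also force file metadata
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fsync-sketch");
        Path file = dir.resolve("example.bin"); // illustrative file name

        // Step 1: write the file and close it (Files.write closes the stream).
        Files.write(file, "example".getBytes(StandardCharsets.UTF_8));

        // Step 2: some time later, sync the closed file by name.
        fsyncFile(file);

        System.out.println("synced " + file.getFileName());
    }
}
```

The design point is the separation of the two steps: the sync happens after the writer is gone, and it is addressed by file name. A storage API that only offers durability on an open stream has no direct counterpart for step 2.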