[ 
https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579338#comment-14579338
 ] 

Mark Miller commented on LUCENE-6536:
-------------------------------------

bq. Is it the directory or the block cache that is the source of most of the corruptions?

There are two issues:

* The write side of the block cache is buggy and can corrupt indexes. I don't
think it provides any value anyway, so it should just be cut out; currently
it's turned off.
* The HDFS directory doesn't do a classic fsync. To get that kind of behavior,
I believe you have to write files to HDFS in a much slower mode, and HDFS
doesn't have an API compatible with how Lucene fsyncs.
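For contrast, a "classic" fsync on a local filesystem means forcing a written file's bytes to stable storage before a commit is treated as durable. The sketch below shows that step using only the JDK's `FileChannel.force(true)`, assuming a local filesystem. It does not use Lucene or HDFS APIs, and the class name `FsyncSketch` and method `writeAndFsync` are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of a local-filesystem fsync; not Lucene or HDFS code.
public class FsyncSketch {

    // Writes the given bytes to a file, then forces content and metadata to stable storage.
    static void writeAndFsync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // A single write() may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) asks the OS to flush both file content and metadata (fsync).
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin");
        writeAndFsync(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(tmp)); // prints 5
        Files.delete(tmp);
    }
}
```

The comment's point is that HDFS offers no equivalent of this call that matches how Lucene's commit path expects to use it.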

All in all, the block cache performance is good enough for a ton of use cases,
but its overall approach and management are not great. The Apache Blur project
has built a better version that suits even more use cases, but it requires
Unsafe usage for direct memory access.

> Migrate HDFSDirectory from solr to lucene-hadoop
> ------------------------------------------------
>
>                 Key: LUCENE-6536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6536
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Greg Bowyer
>              Labels: hadoop, hdfs, lucene, solr
>         Attachments: LUCENE-6536.patch
>
>
> I am currently working on a search engine that is throughput-oriented and
> works entirely in Apache Spark.
> As part of this, I need a directory implementation that can operate on HDFS
> directly. This got me thinking: can I take the one that was worked on so hard
> for Solr on Hadoop?
> As such, I migrated the HDFS and blockcache directories out to a lucene-hadoop
> module.
> Having done this work, I am not sure whether it is actually a good change; it
> feels a bit messy, and I don't like how the Metrics class gets extended and
> abused.
> Thoughts, anyone?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
