[
https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579338#comment-14579338
]
Mark Miller commented on LUCENE-6536:
-------------------------------------
bq. is it the directory or the blockcache that is the source of most of the
corruptions
There are two issues:
* The write side of the block cache is buggy and can corrupt indexes. I don't
think it provides any value anyway, so it should just be cut out; currently
it's turned off.
* The HDFS directory doesn't do a classic fsync. To get that kind of behavior
I believe you have to write files to HDFS in a really slow mode; it doesn't
have an API compatible with how Lucene fsyncs (see the sketch after this list).
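To make the mismatch concrete, here is a minimal sketch, not the actual
HdfsDirectory code: Lucene's Directory.sync(Collection<String>) is called on
files that have already been written and closed, while HDFS only exposes
durability on an open stream via FSDataOutputStream.hflush()/hsync(). The
writeDurably helper below is made up purely for illustration.

{code:java}
// Illustration only: the closest an HDFS-backed Directory can get to fsync is
// to hsync() the stream before it is closed, because once the stream is closed
// there is no way to force a file to disk by name.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsSyncSketch {
  // Hypothetical helper: write a file and force it to the datanodes' disks
  // before close(). Doing this per write is the "really slow mode" above.
  static void writeDurably(FileSystem fs, Path path, byte[] data) throws IOException {
    try (FSDataOutputStream out = fs.create(path)) {
      out.write(data);
      out.hsync(); // durable now; nothing equivalent is possible after close()
    }
  }
}
{code}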
All in all, the block cache performance is good enough for a ton of use cases,
but the overall approach and management of it is not great. The Apache Blur
project has made a better version that covers even more use cases, but it
requires Unsafe usage for direct memory access.
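For anyone who hasn't seen that pattern, here is a generic sketch of off-heap
block storage through sun.misc.Unsafe; it only illustrates the technique, it is
not Blur's cache.

{code:java}
// Generic illustration of direct (off-heap) memory access with sun.misc.Unsafe.
// Blocks allocated this way live outside the Java heap, so they don't add GC
// pressure, but they must be freed manually.
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class OffHeapBlock {
  private static final Unsafe UNSAFE = loadUnsafe();
  private final long address;

  OffHeapBlock(int size) {
    address = UNSAFE.allocateMemory(size); // raw native allocation, invisible to the GC
  }

  void put(int offset, byte b) {
    UNSAFE.putByte(address + offset, b);
  }

  byte get(int offset) {
    return UNSAFE.getByte(address + offset);
  }

  void free() {
    UNSAFE.freeMemory(address); // manual lifetime management is the price of going off-heap
  }

  // Unsafe isn't public API; the usual trick is to pull it out via reflection.
  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }
}
{code}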
> Migrate HDFSDirectory from solr to lucene-hadoop
> ------------------------------------------------
>
> Key: LUCENE-6536
> URL: https://issues.apache.org/jira/browse/LUCENE-6536
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Greg Bowyer
> Labels: hadoop, hdfs, lucene, solr
> Attachments: LUCENE-6536.patch
>
>
> I am currently working on a search engine that is throughput oriented and
> works entirely in Apache Spark.
> As part of this, I need a directory implementation that can operate on HDFS
> directly. This got me thinking: could I take the one that was worked on so
> hard for Solr on Hadoop?
> As such I migrated the HDFS and blockcache directories out to a lucene-hadoop
> module.
> Having done this work, I am not sure it is actually a good change; it feels
> a bit messy, and I don't like how the Metrics class gets extended and
> abused.
> Thoughts, anyone?