[ https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579210#comment-14579210 ]
Greg Bowyer commented on LUCENE-6536:
-------------------------------------

bq. Questions:
bq. What will be done to deal with the bugginess of this thing? I see many reports of user corruption issues. By committing it, we take responsibility for this and it becomes "our problem". I don't want to see the code committed to lucene just for this reason.

Fix its bugs ;). Joking aside, is it the directory or the blockcache that is the source of most of the corruptions?

bq. What will be done about the performance? I am not really sure the entire technique is viable.

My use case is a bit odd: I have many small (2*HDFS block) indexes that get run over map jobs in Hadoop. The performance I got last time I did this (with a dirty hack Directory that copied the files in and out of HDFS :S) was pretty good. It's a throughput-oriented usage; I think if you tried to use this to back an online searcher you would see poor performance.

bq. Personally, I think if someone wants to do this, a better integration point is to make it a java 7 filesystem provider. That is really how such a filesystem should work anyway.

That is awesome; I didn't know such an SPI existed in Java. I have found a few people who are trying to make a provider for Hadoop. I also don't have the greatest love for this path: the more test manipulations I did, the less it felt like a simple feature that should be in Lucene. I might try to either strip the block-cache out of this patch or use an HDFS filesystem SPI in Java 7.

> Migrate HDFSDirectory from solr to lucene-hadoop
> ------------------------------------------------
>
>         Key: LUCENE-6536
>         URL: https://issues.apache.org/jira/browse/LUCENE-6536
>     Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Greg Bowyer
>      Labels: hadoop, hdfs, lucene, solr
> Attachments: LUCENE-6536.patch
>
> I am currently working on a search engine that is throughput-oriented and
> works entirely in apache-spark.
> As part of this, I need a directory implementation that can operate on HDFS
> directly. This got me thinking: can I take the one that was worked on so hard
> for solr-hadoop?
> As such, I migrated the HDFS and blockcache directories out to a lucene-hadoop
> module.
> Having done this work, I am not sure it is actually a good change; it
> feels a bit messy, and I don't like how the Metrics class gets extended and
> abused.
> Thoughts, anyone?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
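The "java 7 filesystem provider" suggested in the comment is the java.nio.file SPI: a FileSystemProvider registered on the classpath lets ordinary Path operations run against a custom backend, which is how an HDFS provider would plug in (and a Lucene FSDirectory can be opened on such a Path). As a minimal, runnable sketch of the mechanism — using the JDK's built-in ZIP provider as a stand-in, since no official HDFS provider exists; the class and file names here are my own, not from the patch:

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class FsProviderDemo {
    // Writes and reads a file through a Path backed by the JDK's ZIP
    // FileSystemProvider. An HDFS provider would use the same SPI, just
    // with an "hdfs:" URI scheme instead of "jar:".
    static String roundTrip() throws Exception {
        Path zip = Files.createTempFile("demo", ".zip");
        Files.delete(zip); // the ZIP provider creates the archive itself when "create" is set
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            Path inside = fs.getPath("/segments.txt"); // hypothetical file name
            Files.writeString(inside, "provider-backed path");
            return Files.readString(inside);
        } finally {
            Files.deleteIfExists(zip);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());
    }
}
```

The appeal of this route for the issue above is that Lucene itself would need no HDFS-specific Directory at all: the provider hides the storage behind standard Path operations.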