[
https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579210#comment-14579210
]
Greg Bowyer commented on LUCENE-6536:
-------------------------------------
bq. Questions:
bq. What will be done to deal with the bugginess of this thing? I see many
reports of user corruption issues. By committing it, we take responsibility for
this and it becomes "our problem". I don't want to see the code committed to
lucene just for this reason.
Fix its bugs ;). Joking aside, is it the directory or the block cache that is the
source of most of the corruptions?
bq. What will be done about the performance? I am not really sure the entire
technique is viable.
My use case is a bit odd: I have many small (2*HDFS block) indexes that get run
over by map jobs in Hadoop. The performance I got last time I did this (with a
dirty-hack Directory that copied the files in and out of HDFS :S) was pretty
good.
It's a throughput-oriented usage; I think if you tried to use this to back an
online searcher you would see poor performance.
bq. Personally, I think if someone wants to do this, a better integration point
is to make it a java 7 filesystem provider. That is really how such a
filesystem should work anyway.
That is awesome; I didn't know such an SPI existed in Java. I have found a few
people who are trying to make a provider for Hadoop.
I also don't have the greatest love for this path; the more test manipulations I
did, the less it felt like a simple feature that belongs in Lucene. I might try
either to strip the block cache out of this patch, or to use an HDFS filesystem
SPI in Java 7.
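For reference, the Java 7 (NIO.2) filesystem SPI mentioned above works by registering a `java.nio.file.spi.FileSystemProvider` via `ServiceLoader`; any URI whose scheme matches an installed provider then resolves to a `Path` through that provider. A minimal sketch of the discovery mechanism, using only the JDK's built-in `file` provider (an actual `hdfs://` provider would be a hypothetical third-party registration, not something the JDK ships):

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.spi.FileSystemProvider;

public class ProviderDemo {
    public static void main(String[] args) {
        // NIO.2 discovers providers via ServiceLoader; "file" and "jar"
        // ship with the JDK, so they are always installed.
        for (FileSystemProvider p : FileSystemProvider.installedProviders()) {
            System.out.println("installed scheme: " + p.getScheme());
        }

        // Resolving a URI routes through whichever provider claims its
        // scheme. A hypothetical HDFS provider would let URIs such as
        // hdfs://namenode/index resolve the same way; here we use the
        // built-in "file" scheme.
        Path path = Paths.get(URI.create("file:///tmp"));
        System.out.println("resolved: " + path);
    }
}
```

Since Lucene's FSDirectory API is built on `java.nio.file.Path`, a working HDFS provider could in principle back a Directory without a custom Directory implementation at all, which is the appeal of this integration point.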
> Migrate HDFSDirectory from solr to lucene-hadoop
> ------------------------------------------------
>
> Key: LUCENE-6536
> URL: https://issues.apache.org/jira/browse/LUCENE-6536
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Greg Bowyer
> Labels: hadoop, hdfs, lucene, solr
> Attachments: LUCENE-6536.patch
>
>
> I am currently working on a search engine that is throughput-oriented and
> works entirely in Apache Spark.
> As part of this, I need a Directory implementation that can operate on HDFS
> directly. This got me thinking: can I take the one that was worked on so hard
> for Solr's Hadoop integration?
> As such, I migrated the HDFS and block-cache directories out to a lucene-hadoop
> module.
> Having done this work, I am not sure it is actually a good change; it
> feels a bit messy, and I don't like how the Metrics class gets extended and
> abused.
> Thoughts, anyone?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]