[ 
https://issues.apache.org/jira/browse/HAMA-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255003#comment-13255003
 ] 

Keith Turner commented on HAMA-544:
-----------------------------------

I am an Accumulo developer and I was looking at ACCUMULO-532.  I was thinking 
about the problem of where this code should be put.   It seems like there are 
three options.

 # Put hama input/output formats in Hama
 # Put hama input/output formats in cassandra, accumulo, hbase, ..., etc
 # Put hama input/output formats on something like github

I am not sure what the best option is; we are still trying to figure that out.  
Below are some thoughts about this issue.

Option 3 may not be a blessed apache way of doing business.  

One nice thing about option 3 is that if accumulo-hama has a serious bug, it 
can release a fix immediately w/o waiting.  For options 1 and 2, either hama or 
accumulo must release to fix a serious hama-accumulo bug.  Option 3 may also 
make it easier to use a newer version of hama w/ an older version of accumulo.  
If accumulo ships with hama, and you have an older version of accumulo, it 
probably depends on an older version of hama.  This may not be an issue if the 
hama API is really stable.

With option 3, Accumulo could include a hama jar in contrib w/ a link to 
github.  

From the perspective of Accumulo, we have to work this same issue out for lots 
of projects.  For example accumulo could ship with connectors for pig, hive, 
gora, hama, etc.  This increases the # of dependencies that accumulo has that 
users may not need.  Currently the accumulo pig adapter is on github.

Apache Gora seems to be doing option 1.  They have gora-core, gora-accumulo, 
gora-hbase, etc.  Each one of these is a maven sub project of Gora.  One nice 
thing about this for the gora case is that all of the gora stores can share 
test code. For example gora-accumulo extends a test class from gora-core for 
testing.

I suppose option 1 is bad because hama is a subsystem like map reduce.  For 
example gora and pig depend on map reduce, and it's probably ok to make them 
depend on hbase or accumulo.  However, you would not want map reduce or hama to 
depend on a certain version of accumulo or hbase.  If accumulo-1.3 jars were in 
the map reduce system lib dir, I do not think user accumulo-1.4 jars can 
override those. I suspect the same is true for hama, which is why option 1 is 
bad?
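
To illustrate the concern above, here is a toy model of parent-first class 
loading.  It is only a sketch: it assumes the framework's system lib dir is 
consulted before user-supplied jars, which is the suspicion stated above, not 
confirmed behavior of map reduce or hama.  The Loader class and the version 
strings are hypothetical and are not real Hadoop, Hama, or Accumulo APIs.

```python
# Toy model of parent-first class loading.
# Assumption: the framework's system lib dir is consulted before user jars.
# Loader is a hypothetical class, not a real Hadoop/Hama/Accumulo API.

class Loader:
    """Resolves a library name to the version string visible to a job."""

    def __init__(self, jars, parent=None):
        self.jars = dict(jars)   # library name -> version
        self.parent = parent     # loader consulted first, if any

    def resolve(self, name):
        # Parent-first delegation: ask the parent before looking locally.
        if self.parent is not None:
            found = self.parent.resolve(name)
            if found is not None:
                return found
        return self.jars.get(name)


# Hypothetical setup: the system lib dir ships accumulo 1.3,
# while the user job brings accumulo 1.4.
system_lib = Loader({"accumulo": "1.3"})
user_job = Loader({"accumulo": "1.4"}, parent=system_lib)

# Under this model the system copy shadows the user's jar.
print(user_job.resolve("accumulo"))
```

Under this model the user's 1.4 jars only become visible when the system lib 
dir carries no accumulo jars at all, which is the situation options 2 and 3 
would leave hama in.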

I wrote the gora-accumulo backend and then I wrote goraci 
(https://github.com/keith-turner/goraci).  To make goraci easy to run I wrote a 
slightly complex script and pom.  Maybe I was being a bit OCD, but when I ran 
goraci against accumulo I only wanted the jars that were needed on the 
classpath.  For example, I did not want hbase jars when running goraci against 
accumulo and vice versa.  So what steps would the user have to go through to 
have hama read from accumulo w/ options 2 vs 3? It seems like the main diff is 
adding one extra jar to the classpath.  Is there any other burden?  It's nice 
to make things as easy as possible for the user.
                
> Create InputFormats/OutputFormats for HBase
> -------------------------------------------
>
>                 Key: HAMA-544
>                 URL: https://issues.apache.org/jira/browse/HAMA-544
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.5.0
>            Reporter: praveen sripati
>            Assignee: praveen sripati
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> Create InputFormats/OutputFormats for HBase

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
