[
https://issues.apache.org/jira/browse/HAMA-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255003#comment-13255003
]
Keith Turner commented on HAMA-544:
-----------------------------------
I am an Accumulo developer and I was looking at ACCUMULO-532. I was thinking
about the problem of where this code should be put. It seems like there are
three options.
# Put hama input/output formats in Hama
# Put hama input/output formats in cassandra, accumulo, hbase, ..., etc
# Put hama input/output formats on something like github
I am not sure what the best option is; we are still trying to figure that out.
Below are some thoughts about this issue.
Option 3 may not be a blessed apache way of doing business.
One nice thing about option 3 is that if accumulo-hama has a serious bug, it
can release a fix immediately w/o waiting. For options 1 and 2, either hama or
accumulo must release to fix a serious hama-accumulo bug. Option 3 may also
make it easier to use a newer version of hama w/ an older version of accumulo.
If accumulo ships with hama, and you have an older version of accumulo, it
probably depends on an older version of hama. This may not be an issue if the
hama API is really stable.
With option 3, Accumulo could include a hama jar in contrib w/ a link to
github.
From the perspective of Accumulo, we have to work this same issue out for lots
of projects. For example accumulo could ship with connectors for pig, hive,
gora, hama, etc. This increases the # of dependencies that accumulo has that
users may not need. Currently the accumulo pig adapter is on github.
Apache Gora seems to be doing option 1. They have gora-core, gora-accumulo,
gora-hbase, etc. Each of these is a maven subproject of Gora. One nice
thing about this for the gora case is that all of the gora stores can share
test code. For example, gora-accumulo extends a test class from gora-core for
testing.
I suppose option 1 is bad because hama is a subsystem like map reduce. For
example, gora and pig depend on map reduce, and it's probably ok to make them
depend on hbase or accumulo. However, you would not want map reduce or hama to
depend on a certain version of accumulo or hbase. If accumulo-1.3 jars were in
the map reduce system lib dir, I do not think a user's accumulo-1.4 jars could
override those. I suspect the same is true for hama, which is why option 1 is
bad?
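The override concern can be illustrated with a toy model of classpath lookup.
This is only a sketch: it assumes the system lib dir is searched before user
jars and that the first match wins, which may not be how map reduce or hama
actually build their classpath. The jar names and the class name are
hypothetical.

```python
def resolve(class_name, classpath):
    """Return the name of the first jar on the classpath providing class_name.

    classpath is an ordered list of (jar_name, set_of_class_names) pairs.
    The first match wins; later entries are never consulted.
    """
    for jar_name, classes in classpath:
        if class_name in classes:
            return jar_name
    return None


# Hypothetical jars: both versions provide the same (made-up) class name.
system_lib = [("accumulo-core-1.3.jar", {"org.example.Client"})]
user_jars = [("accumulo-core-1.4.jar", {"org.example.Client"})]

# Assumed ordering: system lib dir first, then the user's jars.
winner = resolve("org.example.Client", system_lib + user_jars)
print(winner)
```

Under that assumed ordering the 1.3 jar shadows the 1.4 jar, which is the
situation described above.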
I wrote the gora-accumulo backend and then I wrote goraci
(https://github.com/keith-turner/goraci). To make goraci easy to run I wrote a
slightly complex script and pom. Maybe I was being a bit OCD, but when I ran
goraci against accumulo I only wanted the jars that were needed on the
classpath. For example, I did not want hbase jars when running goraci against
accumulo and vice versa. So what steps would the user have to go through to
have hama read from accumulo w/ options 2 vs 3? Seems like the main diff is
adding one extra jar to the classpath? Is there any other burden? It's nice to
make things as easy as possible for the user.
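As a rough illustration of putting only the needed jars on the classpath, here
is a minimal sketch. It is not the goraci script; the jar names, the backend
list, and the name-matching rule are all assumptions made for the example.

```python
import os


def select_jars(jars, backend, backends=("accumulo", "hbase", "cassandra")):
    """Keep backend-neutral jars plus jars for the chosen backend.

    A jar is dropped if its file name mentions any backend other than the
    one requested (a naive, hypothetical matching rule).
    """
    others = [b for b in backends if b != backend]
    return [
        jar for jar in jars
        if not any(other in os.path.basename(jar) for other in others)
    ]


# Hypothetical lib directory contents.
all_jars = [
    "lib/gora-core.jar",
    "lib/gora-accumulo.jar",
    "lib/gora-hbase.jar",
    "lib/accumulo-core.jar",
    "lib/hbase.jar",
]

accumulo_only = select_jars(all_jars, "accumulo")
# Join with the platform's path separator to form a classpath string.
print(os.pathsep.join(accumulo_only))
```

With option 3, the only change to a setup like this would be one more jar in
the list.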
> Create InputFormats/OutputFormats for HBase
> -------------------------------------------
>
> Key: HAMA-544
> URL: https://issues.apache.org/jira/browse/HAMA-544
> Project: Hama
> Issue Type: Sub-task
> Components: bsp
> Affects Versions: 0.5.0
> Reporter: praveen sripati
> Assignee: praveen sripati
> Priority: Minor
> Fix For: 0.6.0
>
>
> Create InputFormats/OutputFormats for HBase