[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902696#action_12902696 ]

HBase Review Board commented on HIVE-1434:
------------------------------------------

Message from: "John Sichi"

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/721/
-----------------------------------------------------------

Review request for Hive Developers.

Summary
-------

review by JVS

This addresses bug HIVE-1434.
    http://issues.apache.org/jira/browse/HIVE-1434

Diffs
-----

  http://svn.apache.org/repos/asf/hadoop/hive/trunk/build-common.xml 981263
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/build.xml 981263
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/build.xml PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/ivy.xml PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/antlr-3.1.3.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/apache-cassandra-0.6.3.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/avro-1.2.0-dev.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/clhm-production.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-cli-1.1.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-codec-1.2.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-collections-3.2.1.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-lang-2.4.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/google-collections-1.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/high-scale-lib.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/ivy-2.1.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jackson-core-asl-1.4.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jackson-mapper-asl-1.4.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jline-0.9.94.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/json-simple-1.1.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/libthrift-r917130.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/log4j-1.2.14.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/slf4j-api-1.5.8.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/slf4j-log4j12-1.5.8.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/storage-conf.xml PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/CassandraSerDe.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/CassandraStorageHandler.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/CassandraRowResult.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/CassandraSplit.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/HiveCassandraTableInputFormat.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/HiveIColumn.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/LazyCassandraCellMap.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/LazyCassandraRow.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/CassandraColumn.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/CassandraPut.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/HiveCassandraOutputFormat.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/udf/GetCassandraCol
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902688#action_12902688 ]

John Sichi commented on HIVE-1434:
----------------------------------

@Ed: to clarify about the tarball; we would just use a standard Cassandra distribution, e.g.

http://apache.opensourceresources.org/cassandra/0.6.4/apache-cassandra-0.6.4-bin.tar.gz

> Cassandra Storage Handler
> -------------------------
>
>                 Key: HIVE-1434
>                 URL: https://issues.apache.org/jira/browse/HIVE-1434
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>             Fix For: 0.7.0
>
>         Attachments: cas-handle.tar.gz, hive-1434-1.txt, hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt
>
>
> Add a cassandra storage handler.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902654#action_12902654 ]

Basab Maulik commented on HIVE-1434:
------------------------------------

Re:
{quote}
Should we attempt to factor out the HBase commonality immediately, or commit the overlapping code and then do refactoring as a followup? I'm fine either way; I can give suggestions on how to create the reusable abstract bases and where to package+name them.
{quote}
and Re:
{quote}
For the refactor, let's do it in a followup and also talk with the Hypertable folks to plan it out, since I think they had to copy a lot of code also. I think it will be possible to do it in a way that is useful and understandable since we now have three instances to work from.
{quote}

Let us refactor as a follow up. It will be good for these pieces to stabilize independently initially.
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902621#action_12902621 ]

John Sichi commented on HIVE-1434:
----------------------------------

Regarding the dependencies: if we use the same mechanism as Hadoop, then we don't need a Maven repo. We just point ivy at the tarball location. See target ivy-retrieve-hadoop-source in build-common.xml, and the various ivy.xml files in subdirs. If you can get this working against a standard Apache mirror download, I can start working on getting the files hosted on mirror.facebook.net, which has had better availability in the past.

For the refactor, let's do it in a followup and also talk with the Hypertable folks to plan it out, since I think they had to copy a lot of code also. I think it will be possible to do it in a way that is useful and understandable since we now have three instances to work from.
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902616#action_12902616 ]

Edward Capriolo commented on HIVE-1434:
---------------------------------------

Maven, I am on the fence about it. We actually do not need all the libs I included. Having them in a tarball sounds good, but making a maven repo for only this purpose seems to be a lot of work.

{quote}
Should we attempt to factor out the HBase commonality immediately, or commit the overlapping code and then do refactoring as a followup? I'm fine either way; I can give suggestions on how to create the reusable abstract bases and where to package+name them.
{quote}

If you can specify specific instances then sure. The code may be 99% the same, but that one nuance is going to make the abstractions confusing and useless.

I await further review.
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902611#action_12902611 ]

John Sichi commented on HIVE-1434:
----------------------------------

Some points to be resolved.

* I'd like to avoid checking all of the dependency jars into cassandra-handler/lib. From googling around, it sounds like an official Cassandra maven repo is not going to happen any time soon, and I'm not sure if we can use the unofficial ones. Would it make sense to just do what we've been doing with the Hadoop dependencies, i.e. fetch the tarball via ivy and then unpack it? If so, I can get it added to mirror.facebook.net/facebook/hive-deps.

* Should we attempt to factor out the HBase commonality immediately, or commit the overlapping code and then do refactoring as a followup? I'm fine either way; I can give suggestions on how to create the reusable abstract bases and where to package+name them.

* Need a checkstyle run to bring the code into conformance there.

* The tests are very skimpy currently; it would be good to add some joins, unions, etc.

* There are some minor code cleanups needed; I'll create a review board entry and post them there.
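As a rough illustration of what a "reusable abstract base" might look like, here is a minimal Java sketch using the template-method pattern. Every name in it (AbstractColumnMappingParser, FamilyQualifierParser, LowercasingParser) is hypothetical and does not come from the patch, from the HBase handler, or from Hive; the comma-separated mapping string is likewise only an assumed example of logic the handlers might share. The shared loop lives in the base class, and the one store-specific nuance is isolated in a single overridable hook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical shared base: the common parsing loop lives here. */
abstract class AbstractColumnMappingParser {
    /** Splits a comma-separated mapping string and normalizes each entry. */
    final List<String> parse(String spec) {
        List<String> out = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas and surrounding whitespace
            }
            out.add(normalize(trimmed));
        }
        return out;
    }

    /** Store-specific nuance: each subclass decides what a valid entry is. */
    protected abstract String normalize(String entry);
}

/** Hypothetical subclass for a store whose entries must look like "family:qualifier". */
final class FamilyQualifierParser extends AbstractColumnMappingParser {
    @Override
    protected String normalize(String entry) {
        if (!entry.contains(":")) {
            throw new IllegalArgumentException("expected family:qualifier, got " + entry);
        }
        return entry;
    }
}

/** Hypothetical subclass for a store that only wants lower-cased column names. */
final class LowercasingParser extends AbstractColumnMappingParser {
    @Override
    protected String normalize(String entry) {
        return entry.toLowerCase(Locale.ROOT);
    }
}
```

Whether the real handlers differ in only one such hook is exactly the open question in this thread; if the nuances turn out to be spread across many methods, a base class like this would grow hooks quickly and lose its value.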
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899129#action_12899129 ]

John Sichi commented on HIVE-1434:
----------------------------------

I'll start taking a closer look at this one...may take me a few days.
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884397#action_12884397 ]

John Sichi commented on HIVE-1434:
----------------------------------

Hey Ed,

If you take a look at HIVE-1229, Basab has been helping us clean up the API dependencies, and we have been successful in moving some stuff over to mapreduce from mapred. (I had done some of that already in HiveHFileOutputFormat in order to get it to work, e.g. by making up my own TaskAttemptContext instance wrapping a Progressable.) I think you may be able to do the same.

As a whole, we can't drop the pre-0.20 dependencies from Hive yet, but for the HBase Handler, we made the restriction that it only builds with Hadoop 0.20 and later, so you can do the same for Cassandra.
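The general idea of bridging the two API generations can be sketched as an adapter. The Java below is only an illustration under stated assumptions: NewStyleReader and OldStyleReader are hypothetical stand-ins that loosely mimic the shape of Hadoop's mapreduce and mapred record readers (pull-then-get versus fill-the-caller's-objects), and they are not Hadoop's actual classes. The code is not taken from the patch or from HiveHFileOutputFormat.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a new-style (mapreduce-like) reader; not Hadoop's class. */
interface NewStyleReader {
    boolean nextKeyValue();
    String getCurrentKey();
    String getCurrentValue();
}

/** Hypothetical stand-in for an old-style (mapred-like) reader that fills caller-owned objects. */
interface OldStyleReader {
    boolean next(StringBuilder key, StringBuilder value);
}

/** Adapter: exposes a new-style reader through the old-style contract. */
final class OldStyleAdapter implements OldStyleReader {
    private final NewStyleReader delegate;

    OldStyleAdapter(NewStyleReader delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean next(StringBuilder key, StringBuilder value) {
        if (!delegate.nextKeyValue()) {
            return false; // input exhausted; caller's objects are left untouched
        }
        // Copy the delegate's current record into the reusable caller-owned holders.
        key.setLength(0);
        key.append(delegate.getCurrentKey());
        value.setLength(0);
        value.append(delegate.getCurrentValue());
        return true;
    }
}

/** Toy new-style reader over in-memory {key, value} rows, for demonstration only. */
final class ListReader implements NewStyleReader {
    private final Iterator<String[]> rows;
    private String[] current;

    ListReader(List<String[]> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean nextKeyValue() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    @Override
    public String getCurrentKey() {
        return current[0];
    }

    @Override
    public String getCurrentValue() {
        return current[1];
    }
}
```

A real bridge would also have to adapt splits and supply some context object to the new-style classes, which is presumably where the hand-made TaskAttemptContext mentioned above comes in; this sketch covers only the record-reading loop.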
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884384#action_12884384 ]

Edward Capriolo commented on HIVE-1434:
---------------------------------------

I actually got pretty far with this by simply duplicating the logic in the HBase storage handler. Unfortunately I hit a snafu: Cassandra is not using the deprecated mapred.* API; its input format uses mapreduce.*. I have seen a few tickets for this, and as far as I know Hive is 100% mapred. So to get this done we either have to wait until Hive is converted to mapreduce, or I have to make an "old school" mapred-based input format for Cassandra.

@John, am I wrong? Is there a way to work with mapreduce input formats that I am not understanding?
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883743#action_12883743 ]

Jeremy Hanna commented on HIVE-1434:
------------------------------------

I guess this is the Hive version of CASSANDRA-913. I saw hammer in the hall at the Hadoop Summit and he said there was a Hive ticket on this now.