[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902696#action_12902696
 ] 

HBase Review Board commented on HIVE-1434:
--

Message from: "John Sichi" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/721/
---

Review request for Hive Developers.


Summary
---

review by JVS


This addresses bug HIVE-1434.
http://issues.apache.org/jira/browse/HIVE-1434


Diffs
-

  http://svn.apache.org/repos/asf/hadoop/hive/trunk/build-common.xml 981263 
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/build.xml 981263 
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/build.xml PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/ivy.xml PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/antlr-3.1.3.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/apache-cassandra-0.6.3.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/avro-1.2.0-dev.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/clhm-production.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-cli-1.1.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-codec-1.2.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-collections-3.2.1.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-lang-2.4.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/google-collections-1.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/high-scale-lib.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/ivy-2.1.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jackson-core-asl-1.4.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jackson-mapper-asl-1.4.0.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jline-0.9.94.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/json-simple-1.1.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/libthrift-r917130.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/log4j-1.2.14.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/slf4j-api-1.5.8.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/slf4j-log4j12-1.5.8.jar UNKNOWN
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/storage-conf.xml PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/CassandraSerDe.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/CassandraStorageHandler.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/CassandraRowResult.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/CassandraSplit.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/HiveCassandraTableInputFormat.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/HiveIColumn.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/LazyCassandraCellMap.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/LazyCassandraRow.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/CassandraColumn.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/CassandraPut.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/HiveCassandraOutputFormat.java PRE-CREATION
  http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/udf/GetCassandraCol

[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902688#action_12902688
 ] 

John Sichi commented on HIVE-1434:
--

@Ed: to clarify about the tarball, we would just use a standard Cassandra 
distribution, e.g.

http://apache.opensourceresources.org/cassandra/0.6.4/apache-cassandra-0.6.4-bin.tar.gz


> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread Basab Maulik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902654#action_12902654
 ] 

Basab Maulik commented on HIVE-1434:


Re: Should we attempt to factor out the HBase commonality immediately, or 
commit the overlapping code and then do refactoring as a followup? I'm fine 
either way; I can give suggestions on how to create the reusable abstract bases 
and where to package+name them.

and Re: For the refactor, let's do it in a followup and also talk with the 
Hypertable folks to plan it out, since I think they had to copy a lot of code 
also. I think it will be possible to do it in a way that is useful and 
understandable since we now have three instances to work from.

Let us refactor as a follow-up. It will be good for these pieces to stabilize 
independently at first.



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902621#action_12902621
 ] 

John Sichi commented on HIVE-1434:
--

Regarding the dependencies:  if we use the same mechanism as Hadoop, then we 
don't need a Maven repo.  We just point ivy at the tarball location.  See 
target ivy-retrieve-hadoop-source in build-common.xml, and the various ivy.xml 
files in subdirs.  If you can get this working against a standard Apache mirror 
download, I can start working on getting the files hosted on 
mirror.facebook.net, which has had better availability in the past.

For the refactor, let's do it in a followup and also talk with the Hypertable 
folks to plan it out, since I think they had to copy a lot of code also.  I 
think it will be possible to do it in a way that is useful and understandable 
since we now have three instances to work from.




[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902616#action_12902616
 ] 

Edward Capriolo commented on HIVE-1434:
---

I am on the fence about Maven. We do not actually need all the libs I 
included. Having them in a tarball sounds good, but setting up a Maven repo 
just for this purpose seems like a lot of work.

{quote}
Should we attempt to factor out the HBase commonality immediately, or commit 
the overlapping code and then do refactoring as a followup? I'm fine either 
way; I can give suggestions on how to create the reusable abstract bases and 
where to package+name them.{quote}
If you can point to specific instances, then sure. The code may be 99% the 
same, but that one nuance is going to make the abstractions confusing and 
useless. 

I await further review.



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902611#action_12902611
 ] 

John Sichi commented on HIVE-1434:
--

Some points to be resolved.

* I'd like to avoid checking all of the dependency jars into 
cassandra-handler/lib.  From googling around, it sounds like an official 
Cassandra maven repo is not going to happen any time soon, and I'm not sure if 
we can use the unofficial ones.  Would it make sense to just do what we've been 
doing with the Hadoop dependencies, i.e. fetch the tarball via ivy and then 
unpack it?  If so, I can get it added to mirror.facebook.net/facebook/hive-deps.

* Should we attempt to factor out the HBase commonality immediately, or commit 
the overlapping code and then do refactoring as a followup?  I'm fine either 
way; I can give suggestions on how to create the reusable abstract bases and 
where to package+name them.

* Need a checkstyle run to bring the code into conformance there.

* The tests are very skimpy currently; it would be good to add some joins, 
unions, etc.

* There are some minor code cleanups needed; I'll create a review board entry 
and post them there.




[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899129#action_12899129
 ] 

John Sichi commented on HIVE-1434:
--

I'll start taking a closer look at this one...may take me a few days.


> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt
>
>
> Add a cassandra storage handler.




[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-15 Thread Amr Awadallah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898786#action_12898786
 ] 

Amr Awadallah commented on HIVE-1434:
-

I am out of office on vacation and will be slower than usual in
responding to emails. If this is urgent then please call my cell phone
(or send an sms), otherwise I will reply to your email when I get
back.

Thanks for your patience,

-- amr




[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-07-01 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884397#action_12884397
 ] 

John Sichi commented on HIVE-1434:
--

Hey Ed,

If you take a look at HIVE-1229, Basab has been helping us clean up the API 
dependencies, and we have been successful in moving some stuff over to 
mapreduce from mapred.  (I had done some of that already in 
HiveHFileOutputFormat in order to get it to work, e.g. by making up my own 
TaskAttemptContext instance wrapping a Progressable.)  I think you may be able 
to do the same.

As a whole, we can't drop the pre-0.20 dependencies from Hive yet, but for the 
HBase Handler, we made the restriction that it only builds with Hadoop 0.20 and 
later, so you can do the same for Cassandra.
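
One way to picture the bridging described above, as a minimal sketch rather than the actual Hive or Cassandra code: the interfaces below are simplified stand-ins (assumptions for illustration only) that mirror the calling style of the older org.apache.hadoop.mapred.RecordReader (next(key, value)) and the newer org.apache.hadoop.mapreduce.RecordReader (nextKeyValue(), getCurrentKey(), getCurrentValue()). Every class name in the sketch is hypothetical, and a real adapter would also have to supply a TaskAttemptContext, as mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapredAdapterSketch {

    /** Hypothetical stand-in for a reader in the newer mapreduce style. */
    public interface NewStyleReader<K, V> {
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
    }

    /** Mutable slot that models the caller-supplied key/value objects of the old API. */
    public static final class Holder<T> {
        public T value;
    }

    /** Hypothetical stand-in for a reader in the older mapred style. */
    public interface OldStyleReader<K, V> {
        boolean next(Holder<K> key, Holder<V> value);
    }

    /** Exposes a new-style reader through the old-style calling convention. */
    public static final class OldStyleAdapter<K, V> implements OldStyleReader<K, V> {
        private final NewStyleReader<K, V> delegate;

        public OldStyleAdapter(NewStyleReader<K, V> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean next(Holder<K> key, Holder<V> value) {
            if (!delegate.nextKeyValue()) {
                return false; // underlying reader is exhausted
            }
            // Copy the current pair into the caller's holders.
            key.value = delegate.getCurrentKey();
            value.value = delegate.getCurrentValue();
            return true;
        }
    }

    /** Toy new-style reader over in-memory {key, value} rows, used only for the demo. */
    public static final class ListReader implements NewStyleReader<String, String> {
        private final Iterator<String[]> rows;
        private String[] current;

        public ListReader(List<String[]> rows) {
            this.rows = rows.iterator();
        }

        @Override
        public boolean nextKeyValue() {
            if (!rows.hasNext()) {
                return false;
            }
            current = rows.next();
            return true;
        }

        @Override
        public String getCurrentKey() {
            return current[0];
        }

        @Override
        public String getCurrentValue() {
            return current[1];
        }
    }

    /** Reads every pair through the old-style interface and formats it as "key=value". */
    public static List<String> drain(OldStyleReader<String, String> reader) {
        List<String> out = new ArrayList<>();
        Holder<String> key = new Holder<>();
        Holder<String> value = new Holder<>();
        while (reader.next(key, value)) {
            out.add(key.value + "=" + value.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"k1", "v1"},
                new String[] {"k2", "v2"});
        for (String line : drain(new OldStyleAdapter<>(new ListReader(rows)))) {
            System.out.println(line);
        }
    }
}
```

The adapter holds no state beyond the delegate, so code written against the old-style interface never needs to know which style the underlying reader uses.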


> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: hive-1434-1.txt
>
>
> Add a cassandra storage handler.




[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-07-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884384#action_12884384
 ] 

Edward Capriolo commented on HIVE-1434:
---

I actually got pretty far with this by simply duplicating the logic in the 
HBase storage handler. Unfortunately, I hit a snafu. Cassandra does not use 
the deprecated mapred.* API; its input format uses mapreduce.*. I have seen a 
few tickets about this, and as far as I know Hive is 100% mapred. So to get 
this done, we either have to wait until Hive is converted to mapreduce, or I 
have to write an "old school" mapred-based input format for Cassandra. 

@John, am I wrong? Is there a way to work with mapreduce input formats that I 
am not understanding?





[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-06-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883743#action_12883743
 ] 

Jeremy Hanna commented on HIVE-1434:


I guess this is the Hive version of CASSANDRA-913.  I saw hammer in the hall at 
the Hadoop Summit and he said there was a Hive ticket on this now.

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>
> Add a cassandra storage handler.
