[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844584#action_12844584
 ] 

Namit Jain commented on HIVE-705:
-

+1

will commit if the tests pass

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Samuel Guo
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, 
> HIVE-705.3.patch, HIVE-705.4.patch, HIVE-705.5.patch, HIVE-705.6.patch, 
> HIVE-705.7.patch, HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch, zookeeper-3.2.2.jar
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844293#action_12844293
 ] 

John Sichi commented on HIVE-705:
-

Latest patch hits a test failure with latest trunk.  I'll upload a new patch 
soon to fix it.


> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Samuel Guo
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, 
> HIVE-705.3.patch, HIVE-705.4.patch, HIVE-705.5.patch, HIVE-705.6.patch, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch, zookeeper-3.2.2.jar
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-10 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843840#action_12843840
 ] 

John Sichi commented on HIVE-705:
-

Use HIVE-705.6.patch.





[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843826#action_12843826
 ] 

Namit Jain commented on HIVE-705:
-

I am getting the following errors when I compile:

[ivy:retrieve] :: problems summary ::
[ivy:retrieve]  WARNINGS
[ivy:retrieve]  module not found: hadoop#hbase;${hbase.version}
[ivy:retrieve]   hadoop-source: tried
[ivy:retrieve]    -- artifact hadoop#hbase;${hbase.version}!hbase.tar.gz(source):
[ivy:retrieve]    http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hbase-${hbase.version}/hbase-${hbase.version}.tar.gz
[ivy:retrieve]   apache-snapshot: tried
[ivy:retrieve]    https://repository.apache.org/content/repositories/snapshots/hadoop/hbase/${hbase.version}/hbase-${hbase.version}.pom
[ivy:retrieve]    -- artifact hadoop#hbase;${hbase.version}!hbase.tar.gz(source):
[ivy:retrieve]    https://repository.apache.org/content/repositories/snapshots/hadoop/hbase/${hbase.version}/hbase-${hbase.version}.tar.gz
[ivy:retrieve]   maven2: tried
[ivy:retrieve]    http://repo1.maven.org/maven2/hadoop/hbase/${hbase.version}/hbase-${hbase.version}.pom
[ivy:retrieve]    -- artifact hadoop#hbase;${hbase.version}!hbase.tar.gz(source):
[ivy:retrieve]    http://repo1.maven.org/maven2/hadoop/hbase/${hbase.version}/hbase-${hbase.version}.tar.gz
[ivy:retrieve]  ::
[ivy:retrieve]  ::  UNRESOLVED DEPENDENCIES ::
[ivy:retrieve]  ::
[ivy:retrieve]  :: hadoop#hbase;${hbase.version}: not found
[ivy:retrieve]  ::
[ivy:retrieve] 
[ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Samuel Guo
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, 
> HIVE-705.3.patch, HIVE-705.4.patch, HIVE-705.5.patch, HIVE-705_draft.patch, 
> HIVE-705_revision806905.patch, HIVE-705_revision883033.patch, 
> zookeeper-3.2.2.jar
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-10 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843773#action_12843773
 ] 

John Sichi commented on HIVE-705:
-

Followup JIRA issues have been logged and linked to this one as related.





[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-10 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843702#action_12843702
 ] 

John Sichi commented on HIVE-705:
-

@Jonathan:  I haven't seen any patch uploaded for HIVE-806.  The comments 
indicate that they have a way to customize the serialization per column in 
HBase, which could be interesting, but it's non-essential.  Once HIVE-705 gets 
committed, I'll post a comment on HIVE-806 and ask whether they want to keep it 
open or abandon it.

@Namit:  will do.





[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843680#action_12843680
 ] 

Namit Jain commented on HIVE-705:
-

John, can you file the follow-up jiras ?




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843594#action_12843594
 ] 

Jonathan Ellis commented on HIVE-705:
-

Thanks John, I read your wiki notes and it does look like this will work fine 
for Cassandra at least at the conceptual level.

Is HIVE-806 redundant w/ your latest patchset now?




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-03 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840993#action_12840993
 ] 

John Sichi commented on HIVE-705:
-

While testing, found a few bugs in HBaseSerDe.serialize for the case where a 
Hive map is being converted into an HBase column family; I'll fix these 
together with whatever comes out of review.
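
As a rough illustration of the conversion mentioned above, the sketch below flattens a Hive map value into one cell per map key under a single column family. The helper name `map_to_family_cells`, the `family:qualifier` string keys, and the choice to skip null values are all assumptions made for illustration; this is not the actual HBaseSerDe.serialize code.

```python
def map_to_family_cells(family, hive_map):
    """Flatten a Hive map into HBase-style cells, one per map key.

    Hypothetical helper: each map key becomes a column qualifier under
    `family`, and each value is encoded as UTF-8 bytes.
    """
    cells = {}
    for qualifier, value in hive_map.items():
        if value is None:
            # Assumption: a null map value produces no cell at all.
            continue
        cells[f"{family}:{qualifier}"] = str(value).encode("utf-8")
    return cells


cells = map_to_family_cells("cf1", {"c1": 10, "c2": "abc", "c3": None})
print(sorted(cells))
```

Edge cases such as null values and non-string keys are the kind of detail where bugs in this path tend to hide, which is why the sketch makes its null handling explicit.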


> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Samuel Guo
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch, zookeeper-3.2.2.jar
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-03 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840848#action_12840848
 ] 

John Sichi commented on HIVE-705:
-

Prasad, the MetaHook interface is defined that way so that if a handler wants 
to, it can carry out the operation in a stateful fashion (e.g. if its 
underlying catalog supports transactions), but there is no requirement for it 
to keep state, and in fact the HBaseStorageHandler implementation is itself 
stateless (and has a NOP for three of its method implementations).

Alter table:  yes, I'm planning to create a followup task for this.  The 
original patch had alter table support in the meta hook interface too, but I 
trimmed it down for now to limit the scope of the first commit.
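
To make the stateless case concrete, here is a minimal sketch of a hook with pre/commit/rollback methods that records nothing about in-flight operations. The class and method names are hypothetical Python stand-ins (not the Java interface from the patch), the in-memory catalog stands in for the external store, and which three methods are no-ops here is a guess rather than something confirmed by this thread.

```python
class InMemoryCatalog:
    """Stand-in for an external store such as HBase (illustration only)."""

    def __init__(self):
        self.tables = set()


class StatelessMetaHook:
    """Hypothetical hook: remembers nothing between calls."""

    def __init__(self, catalog):
        self.catalog = catalog

    def pre_create_table(self, name):
        # Create the external table before Hive records its own metadata.
        if name in self.catalog.tables:
            raise ValueError(f"table {name!r} already exists")
        self.catalog.tables.add(name)

    def rollback_create_table(self, name):
        # Undo the create by dropping; no remembered state is required.
        self.catalog.tables.discard(name)

    def commit_create_table(self, name):
        pass  # NOP: nothing left to do once Hive has committed

    def pre_drop_table(self, name):
        pass  # NOP: the external drop is deferred until commit

    def rollback_drop_table(self, name):
        pass  # NOP: nothing was changed, so nothing to undo

    def commit_drop_table(self, name):
        self.catalog.tables.discard(name)


catalog = InMemoryCatalog()
hook = StatelessMetaHook(catalog)
hook.pre_create_table("t")
hook.rollback_create_table("t")  # simulate a failed Hive-side create
print(catalog.tables)
```

A handler backed by a transactional catalog could instead open a transaction in the pre step and close it in commit or rollback, which is the stateful option the interface leaves open.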






[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-03 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840832#action_12840832
 ] 

Prasad Chakka commented on HIVE-705:


John, why are the pre, commit, and rollback functions needed in MetaHook? Isn't it 
enough just to drop the table as a rollback for create, and do the drop table after 
the hive drop table? With the current definition the MetaHook implementation needs 
to keep state around, which Hive itself doesn't do.

Also, alter table on external tables should be allowed, since the underlying storage 
format for external tables is not managed by Hive itself. In such cases alter 
table just changes metadata inside Hive.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-02-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836941#action_12836941
 ] 

John Sichi commented on HIVE-705:
-

BTW, the new STORED BY 'storage-handler-class' should make it easy to plug in 
Cassandra.


> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>Assignee: John Sichi
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705.1.patch, HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-02-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836828#action_12836828
 ] 

John Sichi commented on HIVE-705:
-

Jonathan, thanks for the input.  I think we should be able to come up with a 
mapping feature which encompasses what you've proposed plus what's in HIVE-806 
so that it will be up to the user to decide how to map a particular set of 
HBase tables into Hive.

We can do this by allowing the HBase table name to be specified as part of 
mapping it into Hive.  That way, you can have

Hive t1(c1, c2) -> HBase t.cf1(c1, c2)
Hive t2(c3, c4) -> HBase t.cf2(c3, c4)

or

Hive t(c1,c2,c3,c4) -> HBase t(cf1(c1, c2), cf2(c3, c4))

or

Hive t(cf1map, cf2map) -> HBase t(cf1, cf2)

or variations.  I'm going to write up a proposal in the Hive wiki and solicit 
feedback.
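
The three mapping variants above could be represented with one small data structure, sketched below. The `family:qualifier` spec strings (with an empty qualifier meaning "the whole family as a map"), the `HBaseTarget` tuple, and `build_mapping` are hypothetical notation chosen for this illustration, not the syntax of the eventual proposal.

```python
from collections import namedtuple

# Where a Hive column lives in HBase; qualifier is None for a whole-family map.
HBaseTarget = namedtuple("HBaseTarget", "table family qualifier")


def build_mapping(hbase_table, columns):
    """Map Hive column names to HBase targets.

    `columns` maps a Hive column name to a "family:qualifier" spec;
    "family:" (empty qualifier) maps the column to the entire family.
    """
    mapping = {}
    for hive_col, spec in columns.items():
        family, _, qualifier = spec.partition(":")
        mapping[hive_col] = HBaseTarget(hbase_table, family, qualifier or None)
    return mapping


# Variant 1: one Hive table per column family of HBase table "t".
t1 = build_mapping("t", {"c1": "cf1:c1", "c2": "cf1:c2"})
t2 = build_mapping("t", {"c3": "cf2:c3", "c4": "cf2:c4"})

# Variant 2: one Hive table spanning both families.
t_wide = build_mapping(
    "t", {"c1": "cf1:c1", "c2": "cf1:c2", "c3": "cf2:c3", "c4": "cf2:c4"}
)

# Variant 3: each family exposed as a single Hive map column.
t_maps = build_mapping("t", {"cf1map": "cf1:", "cf2map": "cf2:"})

print(t1["c1"], t_maps["cf1map"])
```

Because the HBase table name is part of each target, several Hive tables can point at the same HBase table, which is what lets the user choose between the variants.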





[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-02-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836323#action_12836323
 ] 

Jonathan Ellis commented on HIVE-705:
-

ISTM that merging the HBase columnfamilies into a single Hive table is the 
wrong approach and could lead to poor performance; rather, each HBase CF should 
be its own Hive table, which may of course be joined with others as necessary.  
(I think using the word "table" for HBase's "collection of CFs" is unfortunate 
in the first place since they are different animals; fundamentally, the basic 
unit of data access in HBase is the CF.)

I'm interested because Cassandra is also looking at adding Hive support, and we 
also implement a ColumnFamily data model.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-01-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806193#action_12806193
 ] 

John Sichi commented on HIVE-705:
-

I'm going to start working on getting this ready for submission against latest 
trunk.

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>Assignee: John Sichi
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-09-07 Thread stephen xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752340#action_12752340
 ] 

stephen xie commented on HIVE-705:
--

Thanks very much for Samuel's help.
The issue above has been resolved.
In the distributed test environment, the hive command must be run with the 
parameter --auxpath hive_contrib.jar,hbase.jar.

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>Assignee: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-09-07 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752325#action_12752325
 ] 

Samuel Guo commented on HIVE-705:
-

@stephen

I have run the patch on my notebook. But I did not encounter the 
NullPointerException mentioned in your comment. 
Can you send me the hive log and the userlogs of the mr job 'FROM src INSERT 
OVERWRITE TABLE hbase_table_1 SELECT *;' ?

Thanks.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-09-06 Thread stephen xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751898#action_12751898
 ] 

stephen xie commented on HIVE-705:
--

Hi, Samuel

Before testing, I set the configuration parameter 
"hive.othermetadata.handlers" as you said.
Thanks.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-09-06 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751876#action_12751876
 ] 

Samuel Guo commented on HIVE-705:
-

@stephen:

Did you set the configuration parameter ' "hive.othermetadata.handlers" : 
"org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler"
 '?
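
For readers unfamiliar with this setting, the sketch below shows how a value of this shape could be split into a map from table format class to metadata handler class. It assumes the format given in the patch notes for this issue (pairs joined by ":" and multiple pairs separated by ","); `parse_handlers` is a hypothetical helper, not the metastore's actual parsing code.

```python
def parse_handlers(value):
    """Parse "format_class:handler_class,format_class:handler_class,...".

    Returns a dict mapping each table format class name to its metadata
    handler class name. Raises ValueError on a malformed entry.
    """
    handlers = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or an empty value
        table_format, sep, handler = entry.partition(":")
        if not sep or not table_format or not handler:
            raise ValueError(f"malformed handler entry: {entry!r}")
        handlers[table_format] = handler
    return handlers


value = (
    "org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:"
    "org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler"
)
handlers = parse_handlers(value)
print(handlers)
```

A missing or mistyped entry would leave the table format without a handler, so checking this value is a reasonable first step when debugging.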

I am sorry that I have other things to handle these days. I will fix the bug 
as soon as I have time.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-09-03 Thread stephen xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751283#action_12751283
 ] 

stephen xie commented on HIVE-705:
--

Hi, Samuel

Thanks very much for your new patch.
I ran into a problem when I used it as follows:

1. create table src(key int, value string);
ok
2. LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE src;
ok
3. CREATE TABLE hbase_table_1(key int, value string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.hbase.HBaseSerDe' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:string") 
STORED AS HBASETABLE;
ok
4. FROM src INSERT OVERWRITE TABLE hbase_table_1 SELECT *;
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

I found the following error in the m/r map task:

java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:110)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:165)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:308)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:345)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:330)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:58)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:308)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:345)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:330)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:316)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:308)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:289)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:308)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:82)
    ... 7 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:88)
    ... 19 more

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




Re: [jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-24 Thread Raghu Murthy
This can be easily achieved via a lookup UDF in hive. See
https://issues.apache.org/jira/browse/HIVE-758 on how hive and hbase can
interact without having to write a serde.


On 8/23/09 7:43 PM, "Matt Pestritto"  wrote:

> Hi All.  I see a lot of good work being done on HBase/Hive integration
> especially around how to express hbase metadata in hive and how to load data
> from/to hbase/hive.
> 
> Has any thought been put into how to use HBase data as lookup data in a
> query and not load all of the data as a normal hive query ?
> 
> My use case is as follows:  I have a table < users > with 50m users.  I have
> a 5gb daily clickstream file that only touches 150k of those users on a daily
> basis.  It would be much more efficient if I didn't have to load all of the
> data in HBase to a hive table and write a traditional hive query but just do
> 150k lookups in the map ( or reduce ) phase of the MR job.  If the hbase
> lookups were done in realtime it would be much faster than sourcing the
> original user table with 50m rows.
> 
> Thoughts ?
> 
> Thanks
> -Matt
> 
> 
> On Sun, Aug 23, 2009 at 8:20 AM, Samuel Guo (JIRA)  wrote:
> 
>> 
>>[
>> https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746592#action_12746592]
>> 
>> Samuel Guo commented on HIVE-705:
>> -
>> 
>> Attach a new patch.
>> 
>> 1) move the related hbase code to the contrib package, as hbase is just an
>> optional storage for hive, not a necessary one.
>> I have tried to avoid modifying the original hive code and just add a hbase
>> serde to connect hive with hbase. But the hbase storage model is quite
>> different from the file storage model. For example, a loadwork is used to
>> rename/copy files from the temp dir to the target table's dir if a query's
>> target is a hive table. But in a hbased hive table, we can't rename a table
>> now. So it's hard to make a hbased hive table follow the logic of a normal
>> file-based hive table.  So I added some code (HiveFormatUtils) to distinguish a
>> file-based table from a not-file-based table.
>> 
>> 2) fix some bugs in the draft patch, such as "select *" returning nothing.
>> 
>> 
>> -
>> -
>> 
>> How to use the hbase as hive's storage?
>> 
>> 1) remember to add the contrib jar and the hbase jar to hive's auxpath,
>> so m/r can distribute the necessary hbase-related jars to the whole hadoop
>> m/r cluster.
>> 
>>> $HIVE_HOME/bin/hive --auxpath ${contrib_jar},${hbase_jar}
>> 
>> 2) modify the configuration to add the following configuration parameters.
>> 
>> "hbase.master" : pointer to the hbase's master.
>> "hive.othermetadata.handlers" :
>> "org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler"
>> 
>> "hive.othermetadata.handlers" collects the metadata handlers that handle the
>> other metadata operations in the not-file-based hive tables. Take hbase as
>> an example. HBaseMetadataHandler will create the necessary hbase table and
>> its column families when we create a hbased hive table from hive's client. It
>> also drops the hbase table when we drop the hive table.
>> 
>> The metastore reads the registered handlers map from the configuration file
>> during initialization. The registered handlers map is formatted as
>> "table_format_classname:table_metadata_handler_classname,table_format_classname:table_metadata_handler_classname,...".
>> 
>> 3) enjoy "hive over hbase"!
>> 
>> 
>> 
>> Other problems.
>> 
>> 1) Altering a hbased hive table is not supported now. :(
>> Renaming a table in hbase is not supported now, so I just do not support the
>> rename operation. (Maybe if we rename a hive table, we do not need to
>> rename the underlying hbase table.)
>> 
>> Adding/replacing columns.
>> Now we need to specify the schema mapping in the SerDe properties
>> explicitly. If we want to add columns, we need to call 'alter' twice:
>> once to change the serde properties and once to change the hive columns.
>> Changing either the serde properties first or the hive columns first will fail
>> now, because we validate the schema mapping during SerDe initialization. One
>> of the hbase serde validations is to check the counts of hive columns and
>> hbase mapping columns. If we first change the hive columns, the number of
>> hive columns will be more than the hbase mapping columns, and the HBase SerDe
>> initialization will fail this alter operation. (Maybe we need to remove the
>> validation code from HBaseSerDe initialization and do it in another place?)
>> 
>> 2) more flexible schema mapping?
>> As Schubert mentioned before, more flexible schema mapping will be useful
>> for users. This feature will be added later.
>> 
>> 
>> welcome for comments~
>> 
>> 
>> 
>> 
>>> Let Hive can analy

Re: [jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-23 Thread Matt Pestritto
Hi All.  I see a lot of good work being done on HBase/Hive integration,
especially around how to express hbase metadata in hive and how to load data
from/to hbase/hive.

Has any thought been put into how to use HBase data as lookup data in a
query, rather than loading all of the data as in a normal hive query?

My use case is as follows:  I have a table < users > with 50m users.  I have
a 5gb daily clickstream file that only touches 150k of those users on a daily
basis.  It would be much more efficient if I didn't have to load all of the
data in HBase into a hive table and write a traditional hive query, but could
just do 150k lookups in the map ( or reduce ) phase of the MR job.  If the
hbase lookups were done in realtime it would be much faster than sourcing the
original user table with 50m rows.
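
A minimal Python sketch of the lookup idea above, under stated assumptions: a
plain dict stands in for the HBase users table, and enrich_clicks and
lookup_user are hypothetical names, not Hive or HBase APIs. It only shows that
the number of lookups follows the distinct users in the clickstream, not the
size of the users table.

```python
def enrich_clicks(clicks, lookup_user):
    """Attach user data to each click, looking each distinct user up once."""
    cache = {}  # user_id -> user record, filled lazily
    enriched = []
    for user_id, url in clicks:
        if user_id not in cache:
            # One point lookup per distinct user (an HBase get in the real case).
            cache[user_id] = lookup_user(user_id)
        enriched.append((user_id, url, cache[user_id]))
    return enriched, len(cache)


# Stand-in for the large users table; only the touched keys are ever read.
users = {i: "user%d" % i for i in range(1000)}
clicks = [(3, "/a"), (7, "/b"), (3, "/c")]
enriched, lookups = enrich_clicks(clicks, users.get)
```

Here three clicks touch two distinct users, so only two lookups are issued.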

Thoughts ?

Thanks
-Matt



[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-23 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746592#action_12746592
 ] 

Samuel Guo commented on HIVE-705:
-

Attached a new patch.

1) Moved the hbase-related code to the contrib package, since hbase is just an
optional storage for hive, not a necessary one.
I have tried to avoid modifying the original hive code and just add a hbase
serde to connect hive with hbase. But the hbase storage model is quite
different from the file storage model. For example, a loadwork is used to
rename/copy files from the temp dir to the target table's dir if a query's
target is a hive table. But for a hbase-based hive table, we can't rename a
table now. So it's hard to make a hbase-based hive table follow the logic of a
normal file-based hive table.  So I added some code (HiveFormatUtils) to
distinguish a file-based table from a non-file-based table.

2) Fixed some bugs in the draft patch, such as "select *" returning nothing.

--

How to use the hbase as hive's storage?

1) Remember to add the contrib jar and the hbase jar to hive's auxPath, so
that m/r can distribute the necessary hbase-related jars to the whole hadoop
m/r cluster.

> $HIVE_HOME/bin/hive -auxPath ${contrib_jar},${hbase_jar}

2) Add the following parameters to the configuration.

"hbase.master" : pointer to the hbase master.
"hive.othermetadata.handlers" : 
"org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler"

"hive.othermetadata.handlers" lists the metadata handlers that take care of
the extra metadata operations for non-file-based hive tables. Take hbase as an
example: HBaseMetadataHandler creates the necessary hbase table and its column
families when we create a hbase-based hive table from hive's client. It also
drops the hbase table when we drop the hive table.

The metastore reads the registered handlers map from the configuration file
during initialization. The registered handlers map is formatted as
"table_format_classname:table_metadata_handler_classname,table_format_classname:table_metadata_handler_classname,...".
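
The handlers map format above can be sketched in Python as follows. This is an
illustration only, not the metastore code from the patch: parse_handlers is a
hypothetical helper, and skipping empty entries is an assumption of the sketch.

```python
def parse_handlers(value):
    """Parse 'format_class:handler_class,format_class:handler_class,...'."""
    handlers = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # assumption: tolerate empty values and trailing commas
        table_format, sep, handler = entry.partition(":")
        if not sep or not table_format.strip() or not handler.strip():
            raise ValueError("malformed handler entry: %r" % entry)
        handlers[table_format.strip()] = handler.strip()
    return handlers


# The value of "hive.othermetadata.handlers" quoted in this message.
conf_value = (
    "org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:"
    "org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler"
)
handlers = parse_handlers(conf_value)
```

Keying the map by the table format class name mirrors the description above:
given a table's format, the matching metadata handler can be found directly.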

3) enjoy "hive over hbase"!



Other problems.

1) Altering a hbase-based hive table is not supported now. :(
Renaming a table in hbase is not supported now, so I just do not support the
rename operation. (Maybe if we rename a hive table, we do not need to rename
the underlying hbase table.)

Adding/replacing columns.
Now we need to specify the schema mapping in the SerDe properties explicitly.
To add columns, we need to call 'alter' twice: once to change the serde
properties and once to change the hive columns.  Whichever of the two we
change first, the alter will fail now, because we validate the schema mapping
during SerDe initialization. One of the hbase serde validations checks that
the number of hive columns matches the number of hbase mapping columns. If we
change the hive columns first, there will be more hive columns than hbase
mapping columns, and the HBase SerDe initialization will fail this alter
operation.  (Maybe we need to remove the validation code from HBaseSerDe
initialization and do it in some other place?)
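
To make the failure mode concrete, here is a small Python sketch of a
count-based validation. It is not the HBaseSerDe code: validate_mapping is a
hypothetical helper, and it assumes a plain one-to-one count check that
ignores any implicit key column.

```python
def validate_mapping(hive_columns, columns_mapping):
    """Pair hive columns with hbase columns; fail if the counts differ."""
    hbase_columns = [c.strip() for c in columns_mapping.split(",") if c.strip()]
    if len(hive_columns) != len(hbase_columns):
        raise ValueError(
            "%d hive columns but %d hbase mapping columns"
            % (len(hive_columns), len(hbase_columns))
        )
    return list(zip(hive_columns, hbase_columns))


hive_columns = ["value1", "value2"]
mapping = "cf:value1,cf:value2"
pairs = validate_mapping(hive_columns, mapping)  # consistent, passes

# First half of a two-step alter: the hive columns change before the mapping.
try:
    validate_mapping(hive_columns + ["value3"], mapping)
    alter_failed = False
except ValueError:
    alter_failed = True
```

Because the check runs on every initialization, the intermediate state between
the two 'alter' calls is always rejected, whichever call comes first.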

2) More flexible schema mapping?
As Schubert mentioned before, a more flexible schema mapping will be useful
for users. This feature will be added later.


Comments are welcome~




> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-12 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742720#action_12742720
 ] 

Samuel Guo commented on HIVE-705:
-

@kula @stephen
Thank you all for your comments.

1) As stephen mentioned, the NullPointerException is thrown because the
COLUMN_LIST is set in the wrong job configuration.
I will fix it in the new patch.

2) It seems that the "select *" statement is buggy now. I will find the
problem and fix it.

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-12 Thread stephen xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742704#action_12742704
 ] 

stephen xie commented on HIVE-705:
--

Hi Samuel, 

  I found the same problem as Kula.
  I changed one line in the method HiveInputFormat::getSplits,

-  newjob.set(TableInputFormat.COLUMN_LIST, hbaseColumns);
+  job.set(TableInputFormat.COLUMN_LIST, hbaseColumns);

  Then the above java exception disappeared, and the select is ok.

  But when I tested a table with more than 2 columns, the query returned
  nothing.

CREATE TABLE hbase_table_2(key int, value1 string, value2 int) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.hbase.HBaseSerDe' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:value1, cf:value2"
) STORED AS HBASETABLE;
FROM src2 INSERT OVERWRITE TABLE hbase_table_2 SELECT *;

The following 2 queries both returned nothing.
select * from hbase_table_2 where value > '0';
select * from hbase_table2;


> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-12 Thread Kula Liao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742697#action_12742697
 ] 

Kula Liao commented on HIVE-705:


Hi Samuel,

Thanks for your great work.
I found some errors when testing your patch.

The sql statements are from the file
"ql/src/test/queries/clienthbase/hbase_queries.q".
I created a table named "hbase_table_1" using the following statement:

CREATE TABLE hbase_table_1(key int, value string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.hbase.HBaseSerDe' 
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:string"
) STORED AS HBASETABLE;

OK. Then I inserted data into "hbase_table_1". 

hive> FROM src INSERT OVERWRITE TABLE hbase_table_1 SELECT *;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_200908131113_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_200908131113_0002
Kill Command = /home/stephen/hadoop-0.19.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_200908131113_0002
2009-08-13 11:17:07,162 map = 0%,  reduce =0%
2009-08-13 11:17:14,200 map = 50%,  reduce =0%
2009-08-13 11:17:15,215 map = 100%,  reduce =0%
Ended Job = job_200908131113_0002
500 Rows loaded to hbase_table_1
OK

When I tried to run some queries, I got the following error message:

hive> select * from hbase_table_1 where value > '0'; 
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_200908131113_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_200908131113_0003
Kill Command = /home/stephen/hadoop-0.19.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_200908131113_0003
2009-08-13 11:18:24,019 map = 0%,  reduce =0%
2009-08-13 11:18:42,146 map = 100%,  reduce =100%
Ended Job = job_200908131113_0003 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

The following message is found in the mapreduce log:

java.lang.NullPointerException
    at org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:52)
    at org.apache.hadoop.hive.ql.io.HiveHBaseTableInputFormat.configure(HiveHBaseTableInputFormat.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:184)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:211)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

Another query returned nothing:
hive> select * from hbase_table_1;
OK
Time taken: 2.952 seconds

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-11 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742191#action_12742191
 ] 

Samuel Guo commented on HIVE-705:
-

@Ashish

Thank you for your comment. 
It is difficult to infer the column list from an hbase table with sparse
columns: we do not know exactly how many columns a given hbase table has, we
only know all of its column families. 
Also, the data in hbase are all raw bytes. If we do not explicitly state the
schema mapping, we lose the information about how to serialize/deserialize the
data to/from raw bytes.
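
A short Python illustration of why the raw bytes alone are not enough. The
encodings used here (a 4-byte big-endian integer and UTF-8 text) are
assumptions for the example, not the encodings used by the patch.

```python
import struct

# The same logical value, 1234, stored under two different encodings.
raw_int = struct.pack(">i", 1234)    # 4 bytes, big-endian integer
raw_text = "1234".encode("utf-8")    # 4 bytes, the characters '1' '2' '3' '4'

# With the declared type, each cell decodes back to what was written.
as_int = struct.unpack(">i", raw_int)[0]
as_text = raw_text.decode("utf-8")

# Without it, nothing stops a reader from decoding text bytes as an integer:
# the call succeeds but yields an unrelated number.
misread = struct.unpack(">i", raw_text)[0]
```

Both cells are four bytes long, so the bytes themselves give no hint of the
intended type; only an explicit schema mapping can supply it.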

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-11 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742106#action_12742106
 ] 

Ashish Thusoo commented on HIVE-705:


The data model mapping works. I have one suggestion though. Can we infer the
column list of the hive table from the hbase table instead of explicitly
stating it in the create command? My concern is that adding a column family
in hbase will require an alter table on hive, and if we can avoid that it
would be great.

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-10 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741672#action_12741672
 ] 

Samuel Guo commented on HIVE-705:
-

@schubert,

Thank you for your comment.

>> In your patch, we found many java files are modified; it is really a big 
>> effort. I don't know if there is any way to avoid such a big modification.
An HBase table is quite different from a file in HDFS. The original Hive code
is based on files. For example, when outputting the reduce results to the
target table, Hive uses a FileSinkOperator to write the results to a temp
file in HDFS, and uses a MoveTask to rename the temp files in HDFS into the
target table dir. But when the target table is based on an HBase table, we do
not need these file operations, and just write to the target HBase table.

The original java files are modified to tell hive to handle an hbase table in
a different way. 

I will look into the code and try to find a way to avoid the modification.

>> 2. The performance is not good when we mapped SQL columns to HBase columns 
>> in our past experience. For example, if we have a table with 20 columns, 
>> then each read or write of a row will comprise 20 key-value operations. It 
>> is inefficient.

A good point. The schema mapping does not affect performance when creating a
hive table. Performance is affected if we fetch all the mapped columns from
the hbase table in an actual query. Some code will be added to do column
pruning during the hbase table scan.

For example, an hbase table (cf1:(col1, col2, col3), cf2:(col4,col5,col6), ... , 
cfn:(colk,colj,coll)) is mapped to a hive table (column1, column2, column3, 
column4, ... ,column n).
If a query "select column3, column4 from hbasedhivetable" is invoked, we should
not let hbase scan out all the columns. We know all the hive columns used in
the query, so we map them back to the hbase columns and get the scanning list
"cf1:col3 cf2:col4". We set this scanning list in the HBaseInputFormat so that
HBase scans out only the useful columns.
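
The mapping-back step can be sketched in a few lines of Python. This is only
an illustration of the idea, not the code planned for the patch; scan_list and
the hive_to_hbase dict are hypothetical names built from the example above.

```python
def scan_list(selected_hive_columns, hive_to_hbase):
    """Map the hive columns used by a query back to a space-separated
    list of hbase columns, keeping query order and dropping duplicates."""
    needed = []
    for hive_column in selected_hive_columns:
        hbase_column = hive_to_hbase[hive_column]
        if hbase_column not in needed:
            needed.append(hbase_column)
    return " ".join(needed)


# Mapping for the first four columns of the example table.
hive_to_hbase = {
    "column1": "cf1:col1",
    "column2": "cf1:col2",
    "column3": "cf1:col3",
    "column4": "cf2:col4",
}

# "select column3, column4 from hbasedhivetable"
pruned = scan_list(["column3", "column4"], hive_to_hbase)
```

For the example query this yields the scanning list "cf1:col3 cf2:col4", so
the other mapped columns never need to be read from hbase.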

The code will be added in the new patch.

>> cf2: => {(col3, col5, col6), Default SerDe} 

Cool. This lets different SerDes work on different hbase columns.  I will try
it in the new patch.

>> Looking forward to more communication with you in Chinese, at your 
>> convenience.
My Gtalk is : sijie0...@gmail.com


> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-10 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741655#action_12741655
 ] 

He Yongqiang commented on HIVE-705:
---

Samuel, I am now in Shanghai attending a meeting. I will talk with you on the
phone asap when I get back. Thanks for the quick fix. 

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-10 Thread schubert zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741431#action_12741431
 ] 

schubert zhang commented on HIVE-705:
-

Hi Samuel,

Thanks for your great work.
In your patch, we found many java files are modified; it is really a big
effort. I don't know if there is any way to avoid such a big modification.

Regarding the schema mapping between an HBase table and a Hive SQL table, I
have the following considerations.
1. We just want to use HBase as a scalable structured data store, or key-value
store.
2. The performance is not good when we mapped SQL columns to HBase columns in
our past experience. For example, if we have a table with 20 columns, then
each read or write of a row will comprise 20 key-value operations. It is
inefficient.

How about considering a more flexible schema mapping:
1. One HBase column can map to multiple hive-SQL columns with a SerDe. e.g.  
cf1:q1 => {(col1, col2, col3), Default SerDe} 
2. One HBase column family can map to multiple hive-SQL columns with a SerDe. 
e.g. cf2: => {(col3, col5, col6), Default SerDe} 
3. Your MAP column (in the Hive table) for a sparse column family. [Optional]
Since Hive is a structured data analysis front-end, we can omit this feature
at the beginning.

For example:

CREATE EXTERNAL TABLE hive_table (pkey STRING,  col1 STRING, col2 INT, 
col3 INT, col4 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MyHBaseSerDe'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf1:(col1,col2,col3) with DefaultSerDe, cf2:c1 (col4) with DefaultSerDe"
)
STORED AS HBASETABLE
LOCATION ''

Usually, we want a more advanced data store backend than HDFS, to achieve more
flexible data placement and indexing. HBase's data model meets this
requirement very well, but we may not need the full features of HBase here.

--
Looking forward to more communication with you in Chinese, at your 
convenience.

Schubert

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-07-31 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737453#action_12737453
 ] 

Samuel Guo commented on HIVE-705:
-

The key problem in letting hive analyse hbase's tables is how to map hbase's
data model to hive's sql data model.

As we know, hbase's data is accessed by row key, column and timestamp, so a
metadata mapping should be recorded in hive's metadata, as below:

---
hbase's tablename -> hive's tablename
hbase's columns   -> hive's columns
hbase's key   -> hive's first column
hbase's timestamp -> hive's second column
---

The key and timestamp of hbase table will be mapped to *first two default 
columns* in hive's table automatically. So the hbased-hive table may be like 
<.key, .timestamp, ..., other columns defined by users>.

For example, take an hbase table 'webpages' with 2 column families, "contents"
and "anchors". The content of a page is stored in column
'contents:page_content', so that data is dense. The anchors of a page vary
from page to page, so the data in 'anchors:' is sparse. 
The columns of the hbase table are mapped manually by programmers: we can map
a full column (such as 'contents:page_content') in hbase to a *primitive_type*
column in hive, and a whole column family (such as 'anchors:') in hbase to a
*map_type* column in hive. So the hive schema of the hbase table 'webpages' is
(.key, .timestamp, page_content, anchors).
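
As a rough Python sketch of this mapping (an illustration, not the proposed
SerDe): to_hive_row is a hypothetical helper, the cell values are plain
strings rather than raw bytes, and the sample cells are made up.

```python
def to_hive_row(key, timestamp, cells, mapping):
    """Build a hive row (.key, .timestamp, mapped columns...) from the
    cells of one hbase row, given as {"family:qualifier": value}."""
    row = [key, timestamp]
    for spec in mapping:
        family, _, qualifier = spec.partition(":")
        if qualifier:
            # Full column -> one primitive hive column.
            row.append(cells.get(spec))
        else:
            # Whole column family -> one hive map column keyed by qualifier.
            prefix = family + ":"
            row.append({
                name[len(prefix):]: value
                for name, value in cells.items()
                if name.startswith(prefix)
            })
    return row


cells = {
    "contents:page_content": "<html>...</html>",
    "anchors:apache.org": "Apache",
    "anchors:example.com": "Example",
}
mapping = ["contents:page_content", "anchors:"]
row = to_hive_row("www.apache.org", 1249000000, cells, mapping)
```

The dense column becomes a single string value, while the sparse family
collapses into one map whose size can differ from row to row.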

To set up the schema mapping between an hbase table and a hive table, we need
to consider how to record the schema mapping, how to serialize a hive object
to the hbase table, and how to deserialize hbase's data to a hive object.

The proposal is to add a new HBaseSerDe that records the schema mapping in its
SerDe properties. The SerDe can then use the schema mapping to serialize a
hive object to hbase's table and deserialize hbase's data to a hive object.

The properties in HBaseSerDe will be:
1)  "hbase.key.type" : the type of the .key column in the hive table, defining
how to deserialize the .key field from hbase's key (the hbase key is a byte
array).
2)  "hbase.columns.mapping" : a comma-separated string defining the schema
mapping. The columns are mapped one by one, in order.

These properties should be provided when creating a hbased-hive table. If
"hbase.key.type" is not defined, we treat it as a string. But if
"hbase.columns.mapping" is not defined, we should fail the table creation
because we do not know how to deserialize a hive object from hbase raw bytes.
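
The default-and-fail rule above might look like this in Python. It is a sketch
only: resolve_serde_properties is a hypothetical helper, and it uses the
property name from the CREATE examples in this message
("hbase.columns.mapping").

```python
def resolve_serde_properties(props):
    """Apply the default key type and require a column mapping."""
    mapping = props.get("hbase.columns.mapping", "").strip()
    if not mapping:
        # Without a mapping there is no way to interpret the raw bytes.
        raise ValueError("hbase.columns.mapping must be defined")
    return {
        "hbase.key.type": props.get("hbase.key.type", "string"),
        "hbase.columns.mapping": [
            column.strip() for column in mapping.split(",") if column.strip()
        ],
    }


resolved = resolve_serde_properties(
    {"hbase.columns.mapping": "contents:page_content,anchors:"}
)
```

Here the key type falls back to "string" because it was not given, while a
missing mapping would abort the table creation.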

The operations on a hbased-hive table are shown below:

*1.  Using an existing hbase table as an external table in hive*

The 'create' command will be as below:

-

CREATE EXTERNAL TABLE webpages(page_content STRING, anchors MAP)
COMMENT 'This is the pages table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.HBaseSerDe'
WITH SERDEPROPERTIES (
"hbase.key.type" = "string",
"hbase.columns.mapping" = "contents:page_content,anchors:"
)
STORED AS HBASETABLE
LOCATION ''

-
Here the hbase_table_location will identify the location of hbase and the hbase 
table name, such as "hbase:/hbase_master:port/hbase_tablename".

After creating an external table over an existing hbase table, we can analyse
the table like a normal hive table.

A. Get all the urls and their pages that were added after a specified time t1.

SELECT .key, page_content FROM webpages WHERE .timestamp > t1;

B. Get the revisions of a specified url (www.apache.org in the query below)
from a specified time t1 to a specified time t2.

SELECT page_content FROM webpages WHERE .timestamp > t1 AND .timestamp < t2 AND 
.key = 'www.apache.org';

*2. Creating a new hbase table as a hive table.*

The 'create' command will be as below:

-

CREATE TABLE webpages(page_content STRING, anchors MAP)
COMMENT 'This is the pages table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.HBaseSerDe'
WITH SERDEPROPERTIES (
"hbase.key.type" = "string",
"hbase.columns.mapping" = "contents:page_content,anchors:"
)
STORED AS HBASETABLE
LOCATION ''

-

After invoking the 'create' command, the hive client will also create an hbase
table in the specified hbase cluster. The created hbase table will have the
two column families defined in the HBaseSerDe properties, "contents:" and
"anchors:".

*3. Loading data into tables.*

As we have two default hidden columns (.key, .timestamp) in a hbased-hive
table, we must count these two columns in when inserting data. 
We can load data into a hbased-hive table either by inserting data from other
tables or by loading data from the local filesystem. 

*A. Inserting data from other tables.*

for example, we have a 'crawled_pages' table collecting all the pages crawled 
from the internet. the 'crawled_pages' is simple: . 

I. If we want to l

[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-07-29 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736728#action_12736728
 ] 

Ashish Thusoo commented on HIVE-705:


Also would be great if you could comment on how you plan to map the hbase data 
model to the sql data model (i.e. tables, columns etc.)

This will be a cool contribution

SerDe would be the right way to go...

Thanks,
Ashish

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-07-28 Thread Samuel Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736477#action_12736477
 ] 

Samuel Guo commented on HIVE-705:
-

I will add more detail about this issue later.

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.




[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-07-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736467#action_12736467
 ] 

He Yongqiang commented on HIVE-705:
---

Do we need to add a new serde for this? Can you add more to the description?

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Samuel Guo
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.
