Re: hive to hbase mapping

2013-06-18 Thread Sanjay Subramanian
How about having two streams from your data generation source: one to HBase 
and one to Hive?

Moving data out of HBase may not be trivial, especially if the data sizes are 
large...


From: Mario Casola <mario.cas...@gmail.com>
Reply-To: user@hive.apache.org
Date: Friday, June 14, 2013 9:54 AM
To: user@hive.apache.org
Subject: Re: hive to hbase mapping

Hi Sanjay,

thanks for the response.

I need HBase because it is perfect for aggregating data through counters, and 
its write performance is great.
Now the problem is: what is the best way to periodically load HBase data (every 
hour, for example) into a Hive table?

Mario



2013/6/14 Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
Six months back I was tasked with building a data platform for logs, and I 
benchmarked two options:
- HBase + Hive (queries were 8x slower)
- Hive only

So I decided on the Hive-only option and am deploying that solution to 
production.

A couple of things you can consider in your design if you really want to go 
with HBase + Hive (also look at http://hadoopstack.com/hive-on-hbase-part-1/):
- Query only today's data through the Hive + HBase architecture
- For data older than one day, query Hive only
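
The split above can be sketched as a small routing helper. This is only an illustration in Python, not something from the thread: the table names `events_hbase` (an external table backed by HBase) and `events_hive` (a native Hive table) are hypothetical, and "today" is passed in explicitly so the rule is easy to see.

```python
from datetime import date


def table_for(day: date, today: date,
              hbase_backed_table: str = "events_hbase",
              hive_table: str = "events_hive") -> str:
    """Pick the table to query for a given day.

    Today's data is read through the HBase-backed external table;
    anything older is read from the native Hive table.
    Both table names are hypothetical placeholders.
    """
    return hbase_backed_table if day >= today else hive_table


if __name__ == "__main__":
    today = date(2013, 6, 14)
    print(table_for(date(2013, 6, 14), today))
    print(table_for(date(2013, 6, 13), today))
```

The point of the split is that the slower HBase-backed path is only paid for the small, most recent slice of data.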

Hope I am not diverting from your question and problem

sanjay

From: Mario Casola <mario.cas...@gmail.com>
Reply-To: user@hive.apache.org
Date: Friday, June 14, 2013 8:54 AM
To: user@hive.apache.org
Subject: hive to hbase mapping

Hi,

I have a performance issue when I query HBase from Hive.
My idea is to build the scenario below:
1. Collect data in HBase for aggregation purposes
2. Create an external table that maps Hive to HBase
3. Create a real Hive table
4. Periodically transfer data from HBase to Hive through INSERT INTO real 
hive table SELECT * FROM external table WHERE time = 201305212909
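
Step 4 above could be driven by a small script that renders the statement for one time bucket at a time. The following is a minimal Python sketch under stated assumptions, not the poster's actual job: the table names `events_hive` and `events_hbase` are hypothetical, and the `time` value is simply the one quoted in the message, passed through unchanged because its exact format is not stated.

```python
def build_hourly_load(hive_table: str, hbase_backed_table: str,
                      time_bucket: int) -> str:
    """Return the HiveQL that copies one time bucket from the HBase-backed
    external table into the native Hive table."""
    for name in (hive_table, hbase_backed_table):
        # Table names are interpolated into the statement, so accept only
        # plain identifiers (letters, digits, underscores).
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected table name: {name!r}")
    return (
        f"INSERT INTO TABLE {hive_table} "
        f"SELECT * FROM {hbase_backed_table} "
        f"WHERE time = {int(time_bucket)}"
    )


if __name__ == "__main__":
    # Hypothetical table names; the time value is the one from the message.
    print(build_hourly_load("events_hive", "events_hbase", 201305212909))
```

A scheduler (cron, for example) would call this once per period and hand the resulting statement to Hive.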

Currently I'm running a test on an HBase table that has 70,000,000 rows, and 
I'm trying to query this table with a single column value filter, like the 
query above.
If I run this type of query directly in HBase, the response time is around 80 
seconds.
If I run the query in the Hive shell, after 30 minutes all the tasks (9 in my 
case) are still 0.00% complete.

What could be the problem?

thanks
Mario

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.




Re: hive to hbase mapping

2013-06-17 Thread Mario Casola
Hi,

For the first question, the answer is yes: with a 500,000-row HBase table,
the job completes successfully.
Second, the jobs are in the running state. Attached you can see a syslog from
one of the jobs.
Third, I've tried setting the hbase.zookeeper.quorum property, but nothing
changed.

Let me know if I can check other configurations.

thanks
Mario



2013/6/15 kulkarni.swar...@gmail.com

 Since your jobs are at 0%, it might actually be a problem with your Hadoop
 cluster rather than Hive. A couple of things to check:

 1. Does a simple M/R job complete successfully?
 2. Do the logs for the jobs say anything? Are the jobs in the running state
 or the pending state?
 3. It is possible that the job submitted from Hive is unable to find the
 ZooKeeper quorum. To rule that out, set the hbase.zookeeper.quorum
 property in your hive-site.xml to point to your ZooKeeper quorum.

 Hope this helps.
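
For a quick test of point 3 without editing hive-site.xml, the same property can be passed for a single invocation through the Hive CLI's `--hiveconf` option. The Python sketch below only builds the command line; it is an alternative to the hive-site.xml change described above, not a replacement for it, and the ZooKeeper host names and the query are hypothetical.

```python
from typing import List, Sequence


def hive_command(query: str, zookeeper_quorum: Sequence[str]) -> List[str]:
    """Build a `hive` invocation that sets hbase.zookeeper.quorum for one run.

    The quorum is passed as a comma-separated host list via --hiveconf,
    and the query is passed via -e.
    """
    if not zookeeper_quorum:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(zookeeper_quorum)
    return ["hive", "--hiveconf", f"hbase.zookeeper.quorum={quorum}",
            "-e", query]


if __name__ == "__main__":
    # Hypothetical hosts and table name.
    cmd = hive_command("SELECT COUNT(*) FROM events_hbase",
                       ["zk1.example.com", "zk2.example.com"])
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

If the query starts making progress with the property set this way, the permanent fix is the hive-site.xml entry.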


syslog
Description: Binary data


Re: hive to hbase mapping

2013-06-14 Thread Sanjay Subramanian
Six months back I was tasked with building a data platform for logs, and I 
benchmarked two options:
- HBase + Hive (queries were 8x slower)
- Hive only

So I decided on the Hive-only option and am deploying that solution to 
production.

A couple of things you can consider in your design if you really want to go 
with HBase + Hive (also look at http://hadoopstack.com/hive-on-hbase-part-1/):
- Query only today's data through the Hive + HBase architecture
- For data older than one day, query Hive only

Hope I am not diverting from your question and problem

sanjay



Re: hive to hbase mapping

2013-06-14 Thread Mario Casola
Hi Sanjay,

thanks for the response.

I need HBase because it is perfect for aggregating data through counters,
and its write performance is great.
Now the problem is: what is the best way to periodically load HBase data
(every hour, for example) into a Hive table?

Mario


