Re: question about partition table in hive

2013-09-13 Thread Jagat Singh
Adding to Sanjay's reply:

Once Flume has written the partition directories, the only thing left is
to tell the Hive metastore about them, which you can do with the
ALTER TABLE ... ADD PARTITION command.

After that you can read the data through Hive straight away.
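
As a rough illustration of that command, here is a minimal Python sketch
that builds the statement for one directory. The table name, partition
columns, and path are made up for the example, and it assumes the partition
columns are strings; adjust to your own layout.

```python
def add_partition_statement(table, partition, location):
    """Build a HiveQL statement that registers one existing HDFS directory
    as a partition of `table`."""
    # Render the partition spec, e.g. year='2013', month='09'.
    spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical table name, partition columns, and directory.
    print(add_partition_statement(
        "weblogs",
        {"year": "2013", "month": "09", "day": "13", "hour": "18"},
        "/flume/weblogs/2013/09/13/18",
    ))
```

The printed statement can be run from the Hive CLI. IF NOT EXISTS keeps the
command safe to repeat for a directory that is already registered.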


On Sat, Sep 14, 2013 at 10:00 AM, Sanjay Subramanian <
sanjay.subraman...@wizecommerce.com> wrote:

> A couple of days back, at the Hadoop Hands On Lab at the Cloudera
> Sessions, Eric Sammer demonstrated dynamic partitioning with Flume: Flume
> creates the partitioned directories on HDFS, and Hive can then use them
> readily.
>
> Going by the two lines of your mail below, I would configure Flume to do
> dynamic partitioning (YEAR, MONTH, DAY, HOUR) so that it creates those
> directories in HDFS, then create Hive tables with those partitions and
> run the queries.
>
> As Stephen said earlier, experiment like crazy, and please share what you
> find. It will make all of us better as well!
>
>
>  Thanks
>
>  sanjay
>
>   From: ch huang 
> Reply-To: "user@hive.apache.org" 
> Date: Thursday, September 12, 2013 6:55 PM
> To: "user@hive.apache.org" 
> Subject: question about partition table in hive
>
> Hi all,
> I use Flume to collect log data and put it in HDFS. I want to use Hive to
> do some calculations and queries based on a time range, so I want to use
> a partitioned table. But the data in HDFS is one big file. How can I load
> it into a partitioned table in Hive?
>
> CONFIDENTIALITY NOTICE
> ==
> This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
>


Re: question about partition table in hive

2013-09-13 Thread Sanjay Subramanian
A couple of days back, at the Hadoop Hands On Lab at the Cloudera Sessions,
Eric Sammer demonstrated dynamic partitioning with Flume: Flume creates the
partitioned directories on HDFS, and Hive can then use them readily.

Going by the two lines of your mail below, I would configure Flume to do
dynamic partitioning (YEAR, MONTH, DAY, HOUR) so that it creates those
directories in HDFS, then create Hive tables with those partitions and run
the queries.
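
To make the directory layout concrete, here is a small Python sketch of how
an event timestamp could map to a YEAR/MONTH/DAY/HOUR directory. The base
path is hypothetical, and the key=value directory names are just one
possible choice (they mirror the naming Hive uses for its own partition
directories). As far as I know, Flume's HDFS sink accepts similar time
escapes such as %Y, %m, %d and %H in its hdfs.path setting, but check the
Flume documentation for your version.

```python
from datetime import datetime

# Hypothetical base directory; the key=value segments are an assumption
# chosen to mirror Hive's own partition directory naming.
PATH_TEMPLATE = "/flume/weblogs/year=%Y/month=%m/day=%d/hour=%H"


def partition_dir(event_time):
    """Return the HDFS directory an event with this timestamp would land in."""
    return event_time.strftime(PATH_TEMPLATE)


if __name__ == "__main__":
    print(partition_dir(datetime(2013, 9, 13, 18, 55)))
```

Every event from the same hour ends up in the same directory, so each
directory corresponds to exactly one Hive partition.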

As Stephen said earlier, experiment like crazy, and please share what you
find. It will make all of us better as well!


Thanks

sanjay

From: ch huang <justlo...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, September 12, 2013 6:55 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: question about partition table in hive

Hi all,
I use Flume to collect log data and put it in HDFS. I want to use Hive to do
some calculations and queries based on a time range, so I want to use a
partitioned table. But the data in HDFS is one big file. How can I load it
into a partitioned table in Hive?



Re: question about partition table in hive

2013-09-13 Thread Dean Wampler
Flume might be able to invoke Hive to do this as the data is ingested, but
I don't know anything about Flume.

Brent has a nice blog post describing many of the details of partitioning.

http://www.brentozar.com/archive/2013/03/introduction-to-hive-partitioning/

We also cover them in our book. The key steps to taking the file(s) you
created and transforming them into partitioned data are the following:

1. Create an "external" table whose location is the directory where you
wrote that big HDFS file (or files).
2. Create the final target table with the partitioning, as described in
Brent's blog post.
3. Run a query against the first table to populate the second. Again, Brent
covers the details.
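
As a rough sketch of what those three steps could look like, the Python
below just assembles the HiveQL as strings. The table names, the two
columns, the tab delimiter, and the assumption that ts is a
"yyyy-MM-dd HH:mm:ss" string are all invented for illustration. Step 3 uses
a dynamic-partition insert, which is one way to populate the second table,
not necessarily the one the blog post uses.

```python
def staging_to_partitioned_statements(raw_dir):
    """Return HiveQL for the three steps, in order.

    All table and column names are hypothetical; `ts` is assumed to be a
    'yyyy-MM-dd HH:mm:ss' string so that to_date(ts) yields the day.
    """
    # Step 1: external table over the directory holding the big file(s).
    create_staging = (
        "CREATE EXTERNAL TABLE IF NOT EXISTS logs_raw (ts STRING, message STRING)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{raw_dir}'"
    )
    # Step 2: the final target table, partitioned by day.
    create_target = (
        "CREATE TABLE IF NOT EXISTS logs_by_day (ts STRING, message STRING)\n"
        "PARTITIONED BY (dt STRING)"
    )
    # Step 3: populate the target; the partition column comes last in SELECT.
    populate = (
        "SET hive.exec.dynamic.partition=true;\n"
        "SET hive.exec.dynamic.partition.mode=nonstrict;\n"
        "INSERT OVERWRITE TABLE logs_by_day PARTITION (dt)\n"
        "SELECT ts, message, to_date(ts) AS dt FROM logs_raw"
    )
    return [create_staging, create_target, populate]


if __name__ == "__main__":
    for statement in staging_to_partitioned_statements("/flume/weblogs"):
        print(statement + ";\n")
```

Because the first table is external, dropping it later removes only the
metadata and leaves the original file(s) in HDFS untouched.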

See the Hive wiki for additional details on external tables, etc.

Dean



On Thu, Sep 12, 2013 at 7:55 PM, ch huang  wrote:

> Hi all,
> I use Flume to collect log data and put it in HDFS. I want to use Hive to
> do some calculations and queries based on a time range, so I want to use
> a partitioned table. But the data in HDFS is one big file. How can I load
> it into a partitioned table in Hive?
>



-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com


Re: question about partition table in hive

2013-09-13 Thread Stephen Sprague
And have you done any analysis on this yet using the Hive documentation
that's publicly available?

If you show some initiative yourself, you're more likely to get others to
join your cause. :)

So what have you tried before asking us for help?


On Thu, Sep 12, 2013 at 6:55 PM, ch huang  wrote:

> Hi all,
> I use Flume to collect log data and put it in HDFS. I want to use Hive to
> do some calculations and queries based on a time range, so I want to use
> a partitioned table. But the data in HDFS is one big file. How can I load
> it into a partitioned table in Hive?
>


Re: question about partition table in hive

2013-09-13 Thread Nitin Pawar
You will need to define a partition column, something like date or hour.
Then configure Flume to roll over files/directories based on your partition
column.
You will also need some kind of cron job that checks for new data arriving
in a directory or file and then adds it to the table as a partition.

(Looks easy, but it is fairly complex.)
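
A minimal sketch of the core of such a cron job, in Python: compare the
directories that exist with the partitions already registered, and emit an
ADD PARTITION statement for each one that is missing. The table name, the
dt partition column, and the one-directory-per-day layout are assumptions
for the example. In a real job the two lists might come from
"hadoop fs -ls" and "SHOW PARTITIONS"; here they are passed in as plain
lists to keep the sketch self-contained.

```python
def missing_partitions(hdfs_dirs, registered):
    """Partition values that exist as directories but are not yet in the
    metastore, in sorted order."""
    return sorted(set(hdfs_dirs) - set(registered))


def statements_for(table, base_dir, hdfs_dirs, registered):
    """One ALTER TABLE ... ADD PARTITION statement per missing partition."""
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{dt}') "
        f"LOCATION '{base_dir}/{dt}';"
        for dt in missing_partitions(hdfs_dirs, registered)
    ]


if __name__ == "__main__":
    # Hypothetical state: two day directories exist, one is registered.
    for statement in statements_for(
        "weblogs", "/flume/weblogs",
        ["2013-09-12", "2013-09-13"], ["2013-09-12"],
    ):
        print(statement)
```

Part of the complexity mentioned above is deciding when a directory is
complete: registering it while Flume is still writing to it means queries
can see partial data.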

The other approach: write everything into a single file behind one table.
Then create another, partitioned table and select from the base table into
the new table with dynamic partitions enabled. (This is a little worse,
because you either have to reprocess all the data every time, or limit the
data with a WHERE clause and add to one particular partition only.)
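
For the "limit with a WHERE clause" variant, a sketch of the statement for
loading a single day into its own partition might look like the Python
below. The table names, the ts and message columns, and the assumption that
ts is a "yyyy-MM-dd HH:mm:ss" string are all hypothetical.

```python
def load_one_partition(source, target, dt):
    """Build HiveQL that rewrites only the `dt` partition of `target` from
    the matching rows of `source` (static partition insert)."""
    return (
        f"INSERT OVERWRITE TABLE {target} PARTITION (dt='{dt}')\n"
        f"SELECT ts, message FROM {source}\n"
        f"WHERE to_date(ts) = '{dt}';"
    )


if __name__ == "__main__":
    print(load_one_partition("logs_raw", "logs_by_day", "2013-09-13"))
```

The query still scans the whole unpartitioned base table to find the
matching rows, which is the cost this approach carries.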


On Fri, Sep 13, 2013 at 7:25 AM, ch huang  wrote:

> Hi all,
> I use Flume to collect log data and put it in HDFS. I want to use Hive to
> do some calculations and queries based on a time range, so I want to use
> a partitioned table. But the data in HDFS is one big file. How can I load
> it into a partitioned table in Hive?
>



-- 
Nitin Pawar


question about partition table in hive

2013-09-12 Thread ch huang
Hi all,
I use Flume to collect log data and put it in HDFS. I want to use Hive to do
some calculations and queries based on a time range, so I want to use a
partitioned table. But the data in HDFS is one big file. How can I load it
into a partitioned table in Hive?