Re: question about partition table in hive
Adding to Sanjay's reply: once Flume has written the data into the partition directories, the only thing left is to tell the Hive metastore about the new partitions, which you can do with the ADD PARTITION command (ALTER TABLE ADD PARTITION). After that you can read the data through Hive straight away.

On Sat, Sep 14, 2013 at 10:00 AM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
> Understanding what I can from the two lines of your mail below, I would
> configure Flume to do dynamic partitioning (YEAR, MONTH, DAY, HOUR) and
> create those directories in HDFS and then create Hive tables with those
> partitions and run the queries
Re: question about partition table in hive
A couple of days back, Erik Sammer, at the Hadoop Hands On Lab at the Cloudera Sessions, demonstrated how to achieve dynamic partitioning with Flume: Flume created the partitioned directories on HDFS, which Hive could then use directly.

As far as I understand the two lines of your mail below, I would configure Flume to partition the data dynamically (YEAR, MONTH, DAY, HOUR) and create those directories in HDFS, then create Hive tables with matching partitions and run the queries.

As Stephen said earlier, experiment like crazy, and please share what you find: it will make all of us better as well!

Thanks

sanjay

From: ch huang <justlo...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, September 12, 2013 6:55 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: question about partition table in hive

hi, all:
I use Flume to collect log data and put it in HDFS. I want to use Hive to do some calculations and queries based on a time range, so I want to use a partitioned table. But the data file in HDFS is one big file; how can I put it into a partitioned table in Hive?
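To make the YEAR/MONTH/DAY/HOUR idea concrete, here is a small simulation (not Flume code) of what Flume's HDFS sink does when its hdfs.path contains time escape sequences such as %Y, %m, %d and %H: each event is routed to the directory derived from its timestamp. The base path /flume/logs and the helper names hour_dir and bucket_events are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hour_dir(base_dir: str, ts: datetime) -> str:
    # Mirrors a sink path like <base_dir>/%Y/%m/%d/%H (hypothetical layout).
    return f"{base_dir}/{ts:%Y/%m/%d/%H}"


def bucket_events(base_dir, events):
    """Group (timestamp, line) events by the hourly directory they land in."""
    buckets = defaultdict(list)
    for ts, line in events:
        buckets[hour_dir(base_dir, ts)].append(line)
    return dict(buckets)


if __name__ == "__main__":
    events = [
        (datetime(2013, 9, 14, 10, 5), "GET /a"),
        (datetime(2013, 9, 14, 10, 59), "GET /b"),
        (datetime(2013, 9, 14, 11, 0), "GET /c"),
    ]
    for path, lines in bucket_events("/flume/logs", events).items():
        print(path, lines)
```

Each resulting directory corresponds to one Hive partition, so instead of one big file you get many small per-hour directories that a partitioned table can point at.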
Re: question about partition table in hive
Flume might be able to invoke Hive to do this as the data is ingested, but I don't know anything about Flume.

Brent has a nice blog post describing many of the details of partitioning:
http://www.brentozar.com/archive/2013/03/introduction-to-hive-partitioning/

We also cover them in our book. The key steps for taking the file(s) you created and transforming them into partitioned data are:

1. Create an "external" table whose location is the directory where you wrote that big HDFS file (or files).
2. Create the final target table with the partitioning, as described in Brent's blog post.
3. Run a query against the first table to populate the second.

Again, Brent covers the details. See the Hive wiki for more on external tables, etc.

Dean

On Thu, Sep 12, 2013 at 7:55 PM, ch huang wrote:
> hi, all:
> I use Flume to collect log data and put it in HDFS. I want to use Hive to do
> some calculations and queries based on a time range, so I want to use a
> partitioned table. But the data file in HDFS is one big file; how can I put
> it into a partitioned table in Hive?

--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
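The three steps above could look roughly like the HiveQL below, generated here from Python so the statements can be written to a script and run with "hive -f". Everything specific is an assumption for illustration: the table names raw_logs and logs, the tab-delimited (ts, msg) schema with ts formatted as "yyyy-MM-dd HH:mm:ss", and partitioning by day via to_date(ts). The two SET lines enable Hive's dynamic partitioning, which lets one INSERT fill every day's partition at once.

```python
from typing import List


def partition_pipeline(raw_dir: str) -> List[str]:
    """HiveQL for: external table -> partitioned table -> populate it.

    Hypothetical schema: tab-separated lines of (ts, msg), where ts looks
    like '2013-09-12 18:55:00'. Table names are illustrative only.
    """
    return [
        # Step 1: external table over the directory holding the big file(s).
        "CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (ts STRING, msg STRING) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        f"LOCATION '{raw_dir}'",
        # Step 2: the final target table, partitioned by day.
        "CREATE TABLE IF NOT EXISTS logs (ts STRING, msg STRING) "
        "PARTITIONED BY (dt STRING)",
        # Step 3: populate it; dynamic partitioning must be switched on first.
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        # The dynamic partition column (dt) must come last in the SELECT list.
        "INSERT OVERWRITE TABLE logs PARTITION (dt) "
        "SELECT ts, msg, to_date(ts) AS dt FROM raw_logs",
    ]


if __name__ == "__main__":
    print(";\n".join(partition_pipeline("/flume/logs/raw")) + ";")
```

Note that the INSERT rereads the whole raw table every time it runs; adding a WHERE clause on ts is one way to limit it to new data.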
Re: question about partition table in hive
And have you done any analysis on this yet using the publicly available Hive documentation? If you show some initiative yourself, you're more likely to get others to join your cause. :) So what have you tried before asking us for help?

On Thu, Sep 12, 2013 at 6:55 PM, ch huang wrote:
> hi, all:
> I use Flume to collect log data and put it in HDFS. I want to use Hive to do
> some calculations and queries based on a time range, so I want to use a
> partitioned table. But the data file in HDFS is one big file; how can I put
> it into a partitioned table in Hive?
Re: question about partition table in hive
You will need to define a partition column, such as date or hour. Then configure Flume to roll over files/directories based on your partition column. You will also need some kind of cron job that checks for new data arriving in a directory or file and then adds it as a partition to the table. (This looks easy but is fairly complex.)

The other approach: write everything into the single file of a base table. Then create another, partitioned table, and select from the base table with dynamic partitioning enabled to write into the new table. (This is a little worse, because you will always need to reprocess all the data, or else limit the data with a WHERE clause and add to a particular partition only.)

On Fri, Sep 13, 2013 at 7:25 AM, ch huang wrote:
> hi, all:
> I use Flume to collect log data and put it in HDFS. I want to use Hive to do
> some calculations and queries based on a time range, so I want to use a
> partitioned table. But the data file in HDFS is one big file; how can I put
> it into a partitioned table in Hive?

--
Nitin Pawar
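The cron job from the first approach might contain logic like the sketch below. It assumes hypothetical daily directories named /flume/logs/YYYY-MM-DD, a table "logs" partitioned by dt, and that the caller supplies the directory listing (for example parsed from "hadoop fs -ls") and the already-registered days (for example parsed from "SHOW PARTITIONS logs"). The function name is made up. Skipping the current day, which Flume may still be writing, is one example of why this is "fairly complex"; whether you want that depends on your setup.

```python
import re
from typing import Iterable, List, Set

# Matches a trailing /YYYY-MM-DD path component (hypothetical layout).
DAY_DIR = re.compile(r"/(\d{4}-\d{2}-\d{2})$")


def new_partition_statements(table: str, hdfs_dirs: Iterable[str],
                             known_days: Set[str], current_day: str) -> List[str]:
    """Return ADD PARTITION statements for day directories not yet registered."""
    stmts = []
    for path in sorted(hdfs_dirs):
        m = DAY_DIR.search(path)
        if not m:
            continue  # not a day directory, e.g. a temp dir
        day = m.group(1)
        # ISO dates compare correctly as strings; skip registered days and
        # the day Flume may still be writing (today or later).
        if day in known_days or day >= current_day:
            continue
        stmts.append(f"ALTER TABLE {table} ADD IF NOT EXISTS "
                     f"PARTITION (dt='{day}') LOCATION '{path}';")
    return stmts


if __name__ == "__main__":
    dirs = ["/flume/logs/2013-09-11", "/flume/logs/2013-09-12",
            "/flume/logs/2013-09-13", "/flume/logs/tmp"]
    for s in new_partition_statements("logs", dirs, {"2013-09-11"}, "2013-09-13"):
        print(s)
```

Because every statement uses IF NOT EXISTS, a run that overlaps with an earlier one does no harm.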
question about partition table in hive
hi, all:
I use Flume to collect log data and put it in HDFS. I want to use Hive to do some calculations and queries based on a time range, so I want to use a partitioned table. But the data file in HDFS is one big file; how can I put it into a partitioned table in Hive?