External table with partitions
Is it possible to instruct Hive to read only specific files from a partitioned external table? For example, I have the following directory structure:

data/dd=2012-12-31/a1.txt
data/dd=2012-12-31/a2.txt
data/dd=2012-12-31/a3.txt
data/dd=2012-12-31/a4.txt
data/dd=2012-12-31/b1.txt
data/dd=2012-12-31/b2.txt
data/dd=2012-12-31/b2.txt

Is it possible to add 2012-12-31 as a partition and tell Hive to load only the a* files into the table? Thanks,
Re: External table with partitions
If all the files are in the same partition, then they share the same value for the partition column, so Hive cannot tell them apart. You cannot do this with Hive itself, but you can stage the required files using a glob pattern and build an intermediate table on top of them.

---
Sent from Mobile, short and crisp.
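The staging approach suggested here can be sketched roughly as follows; every table name and path in this sketch is an illustrative assumption, not something from the thread, and the shell steps need a working HDFS:

```sql
-- Step 1 (run outside Hive): copy only the a* files into a staging
-- directory, using a shell glob:
--   hadoop fs -mkdir -p /user/hive/staging/a_only/dd=2012-12-31
--   hadoop fs -cp '/data/dd=2012-12-31/a*.txt' /user/hive/staging/a_only/dd=2012-12-31/

-- Step 2: build an external table over the staging directory
-- (assumed single-column schema, just for illustration).
CREATE EXTERNAL TABLE a_only (line STRING)
PARTITIONED BY (dd STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hive/staging/a_only';

-- Step 3: register the copied sub-directory as a partition.
ALTER TABLE a_only ADD PARTITION (dd='2012-12-31');
```

Queries against this staging table then see only the a* files, since those are the only files under its location.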
Re: External table with partitions
Hi Oded

If you created the directories manually, they become visible to the Hive table only once the partitions (sub-directories) are added to the metadata using 'ALTER TABLE ... ADD PARTITION'. Partitions are not picked up implicitly into a Hive table even if you have a proper sub-directory structure. Similarly, if you no longer need a particular partition on your table, you can always drop it using the ALTER TABLE command. And if you only intend to use a particular partition in a query, there is no need to alter the partitions at all; just append a WHERE clause that restricts the query to the required partitions.

Hope this helps.

Regards
Bejoy KS

Sent from remote device, Please excuse typos
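The three operations described above look like this in HiveQL; the table name `my_table` is an assumption for illustration, while the `dd` partition column and date come from the thread:

```sql
-- Register a manually created sub-directory as a partition:
ALTER TABLE my_table ADD PARTITION (dd='2012-12-31');

-- Drop the partition again if it is no longer needed:
ALTER TABLE my_table DROP PARTITION (dd='2012-12-31');

-- To read a single partition, no ALTER is needed; a WHERE clause
-- on the partition column prunes the scan to that partition:
SELECT * FROM my_table WHERE dd = '2012-12-31';
```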
Re: External table with partitions
Sorry, I didn't understand your question on first look-through. Like Jagat said, you may need to go with a temp table for this. Do a hadoop fs -cp ../../a.* <destn dir>, then create an external table with '<destn dir>' as its location:

CREATE EXTERNAL TABLE <tmp table name> LIKE <src table name> LOCATION '<destn dir>';

NB: I just gave the syntax from memory; please check it in the Hive user guide.

Regards
Bejoy KS

Sent from remote device, Please excuse typos
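Written out with concrete values, that suggestion might look like the sketch below. The table names and staging path are placeholders I've invented, and the CREATE EXTERNAL TABLE ... LIKE ... LOCATION form should be confirmed against the Hive version in use, as the reply itself cautions:

```sql
-- Stage only the wanted files first (run outside Hive):
--   hadoop fs -cp '/data/dd=2012-12-31/a*.txt' /tmp/hive_staging/a_only/

-- Clone the source table's schema onto the staging directory;
-- the new table inherits columns and serde settings from src_table:
CREATE EXTERNAL TABLE tmp_a_only LIKE src_table
LOCATION '/tmp/hive_staging/a_only';
```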
query external table with partitions not return any data
Met a problem where data in an external table doesn't get read by Hive. Here's how the table was created and the data loaded.

- Created an external table with a partition, pointing to an existing location in HDFS, as follows:

create external table order_external (item string, quantity int)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/usera/data/hivetables/order';

- Copied data from the local file system to HDFS. There are 2 data files in the local file system: order.2011-03-01.01, which contains 2 entries, and order.2011-03-01.02, which contains 1 entry. After cd'ing to the data file folder:

hadoop fs -copyFromLocal order.* /user/usera/data/hivetables/order/dt=2011-03-01

Verify the data is there:

hadoop fs -cat /user/usera/data/hivetables/order/dt=2011-03-01/*

returns 3 entries:

android	2
iphone	3
ipad	1

- Now, query all items in partition dt='2011-03-01':

select * from order_external o where o.dt='2011-03-01';

This does not show any entry, and neither did select * from order_external.

I also experimented with an external table created similarly to the above, with the same location (without the 'dt=...' folder, of course), the same data, the same schema and table name, etc., except that this external table was created without a partition. Querying that table shows correct results. I didn't have this problem with an internal table that has partitions. So, what is wrong or missing? Any idea? Thanks!

--mj
Re: query external table with partitions not return any data
Solved. I had thought that Hive would by default look into the table folder in HDFS and match sub-folders against partition column names to recognize partitions automatically. But I realized partition addition has to be done explicitly by giving the partition name and location. So doing

alter table ... add partition (column=value)

solved this (no need to give a location, since column=value is a sub-folder under the table folder in HDFS) ...
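Applied to the table from this thread, the fix is:

```sql
-- dt=2011-03-01 is already a sub-directory of the table's
-- location, so no explicit LOCATION clause is needed:
ALTER TABLE order_external ADD PARTITION (dt='2011-03-01');

-- The partition-pruned query should now return the three
-- entries loaded earlier:
SELECT * FROM order_external o WHERE o.dt='2011-03-01';
```

As an aside, Hive also offers MSCK REPAIR TABLE order_external; to discover such partition directories in bulk, though its availability and behavior depend on the Hive version in use.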