External table with partitions

2013-01-06 Thread Oded Poncz
Is it possible to instruct hive to get only specific files from a
partitioned external table?
For example I have the following directory structure

data/dd=2012-12-31/a1.txt
data/dd=2012-12-31/a2.txt
data/dd=2012-12-31/a3.txt
data/dd=2012-12-31/a4.txt

data/dd=2012-12-31/b1.txt
data/dd=2012-12-31/b2.txt
data/dd=2012-12-31/b2.txt

Is it possible to add 2012-12-31 as a partition and tell hive to load only
the a* files to the table?
Thanks,


Re: External table with partitions

2013-01-06 Thread Jagat Singh
If all files are in same partition then they satisfy condition of same
value as partion column .

You cannot do with hive but can have one intermediate table and then to
move required files using glob pattern

---
Sent from Mobile , short and crisp.
On 07-Jan-2013 1:07 AM, Oded Poncz o...@ubimo.com wrote:

 Is it possible to instruct hive to get only specific files from a
 partitioned external table?
 For example I have the following directory structure

 data/dd=2012-12-31/a1.txt
 data/dd=2012-12-31/a2.txt
 data/dd=2012-12-31/a3.txt
 data/dd=2012-12-31/a4.txt

 data/dd=2012-12-31/b1.txt
 data/dd=2012-12-31/b2.txt
 data/dd=2012-12-31/b2.txt

 Is it possible to add 2012-12-31 as a partition and tell hive to load only
 the a* files to the table?
 Thanks,



Re: External table with partitions

2013-01-06 Thread bejoy_ks
Hi Oded

If you have created the directories manually that would come visible to the 
hive table only if the partitions/ sub dirs are added to the meta data using
'ALTER TABLE ... ADD PARTITION' . 
Partitions are not retrieved implicitly into hive tabe even if you have a 
proper sub dir structure.

Similarly if you don't need a particular partition on your table permanently 
you can always delete them using the alter table command.

If you are intending to use a particular partition alone in your query no need 
to alter the partitions. Just append a where clause to the query that has scope 
only on the required partitions.

Hope this helps.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Oded Poncz o...@ubimo.com
Date: Sun, 6 Jan 2013 16:07:26 
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: External table with partitions

Is it possible to instruct hive to get only specific files from a
partitioned external table?
For example I have the following directory structure

data/dd=2012-12-31/a1.txt
data/dd=2012-12-31/a2.txt
data/dd=2012-12-31/a3.txt
data/dd=2012-12-31/a4.txt

data/dd=2012-12-31/b1.txt
data/dd=2012-12-31/b2.txt
data/dd=2012-12-31/b2.txt

Is it possible to add 2012-12-31 as a partition and tell hive to load only
the a* files to the table?
Thanks,



Re: External table with partitions

2013-01-06 Thread bejoy_ks
Sorry, I din understand your query on first look through.

Like Jagat said, you may need to go with a temp table for this.

Do a hadoop fs -cp ../../a.* destn dir

Create a external table with location as 'destn dir'.

CREATE EXERNAL TABLE tmp tble name LIKE src table name LOCATION '' ;

NB: I just gave the syntax from memory. please check the syntax in hive user 
guide.
Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: bejoy...@yahoo.com
Date: Sun, 6 Jan 2013 14:39:45 
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Re: External table with partitions

Hi Oded

If you have created the directories manually that would come visible to the 
hive table only if the partitions/ sub dirs are added to the meta data using
'ALTER TABLE ... ADD PARTITION' . 
Partitions are not retrieved implicitly into hive tabe even if you have a 
proper sub dir structure.

Similarly if you don't need a particular partition on your table permanently 
you can always delete them using the alter table command.

If you are intending to use a particular partition alone in your query no need 
to alter the partitions. Just append a where clause to the query that has scope 
only on the required partitions.

Hope this helps.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Oded Poncz o...@ubimo.com
Date: Sun, 6 Jan 2013 16:07:26 
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: External table with partitions

Is it possible to instruct hive to get only specific files from a
partitioned external table?
For example I have the following directory structure

data/dd=2012-12-31/a1.txt
data/dd=2012-12-31/a2.txt
data/dd=2012-12-31/a3.txt
data/dd=2012-12-31/a4.txt

data/dd=2012-12-31/b1.txt
data/dd=2012-12-31/b2.txt
data/dd=2012-12-31/b2.txt

Is it possible to add 2012-12-31 as a partition and tell hive to load only
the a* files to the table?
Thanks,



query external table with partitions not return any data

2011-03-23 Thread Michael Jiang
Met a problem that data in an external table didn't get read by hive.

Here's how the table was created and data loaded.

- Created an external table w/ a partition, pointing to an existing location
in hdfs as follows :

create external table order_external (item string, quantity int) partitioned
by (dt string) row format delimited fields terminated by '\t' stored as
textfile location '/user/usera/data/hivetables/order';

- Data from a local file system copied to hdfs

Have 2 data files in local file system

order.2011-03-01.01, which contains 2 entries
order.2011-03-01.02, which contains 1 entry

cd to data file folder
hadoop fs -copyFromLocal order.*
/user/usera/data/hivetables/order/dt=2011-03-01

verify data is there
hadoop fs -cat /user/usera/data/hivetables/order/dt=2011-03-01/*
returns 3 entries =
android2
iphone3
ipad1

- Now, query all items in partition dt='2011-03-01'

select * from order_external o where o.dt='2011-03-01';

this does not show any entry nor did select * from order_external.

I also played with an external table created similar to above, the same
location (w/o 'dt=...' folder ofcourse) and data used, the same schema and
table name, etc., except that the only difference is this external table is
created without a partition. Querying the table shows correct results.
Didn't have this problem w/ internal table that has partitions.

So, what is wrong or missing? Any idea?

Thanks!
--mj


Re: query external table with partitions not return any data

2011-03-23 Thread Michael Jiang
solved.

uh, thought that hive will by default look into the table folder in hdfs and
match sub-folders with partition column names to recognize partitions
automatically. But realized partition addition has to be done explicitly by
giving partition name and location. So, by doing alter table add partition
(column=value) solved this (no need to give location since column=value
is a subfolder under table folder in hdfs ;) ...

On Wed, Mar 23, 2011 at 12:41 PM, Michael Jiang it.mjji...@gmail.comwrote:

 Met a problem that data in an external table didn't get read by hive.

 Here's how the table was created and data loaded.

 - Created an external table w/ a partition, pointing to an existing
 location in hdfs as follows :

 create external table order_external (item string, quantity int)
 partitioned by (dt string) row format delimited fields terminated by '\t'
 stored as textfile location '/user/usera/data/hivetables/order';

 - Data from a local file system copied to hdfs

 Have 2 data files in local file system

 order.2011-03-01.01, which contains 2 entries
 order.2011-03-01.02, which contains 1 entry

 cd to data file folder
 hadoop fs -copyFromLocal order.*
 /user/usera/data/hivetables/order/dt=2011-03-01

 verify data is there
 hadoop fs -cat /user/usera/data/hivetables/order/dt=2011-03-01/*
 returns 3 entries =
 android2
 iphone3
 ipad1

 - Now, query all items in partition dt='2011-03-01'

 select * from order_external o where o.dt='2011-03-01';

 this does not show any entry nor did select * from order_external.

 I also played with an external table created similar to above, the same
 location (w/o 'dt=...' folder ofcourse) and data used, the same schema and
 table name, etc., except that the only difference is this external table is
 created without a partition. Querying the table shows correct results.
 Didn't have this problem w/ internal table that has partitions.

 So, what is wrong or missing? Any idea?

 Thanks!
 --mj