Re: Virtual Columns error

2010-09-20 Thread Thiruvel Thirumoolan
It should be INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE.
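
As a minimal sketch, the query from the original message would look like this with the double-underscore names. It assumes a Hive build in which virtual columns are actually available; person1 is the table from the question.

```sql
-- Virtual column names use two underscores between words.
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE
FROM person1;
```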

On Sep 20, 2010, at 3:15 PM, lei liu wrote:

 I am using Hive 0.6. When I execute 'select INPUT_FILE_NAME,  
 BLOCK_OFFSET_INSIDE_FILE from person1', Hive throws the error below:
 FAILED: Error in semantic analysis: line 1:7 Invalid Table Alias or Column 
 Reference INPUT_FILE_NAME error.
 
 Doesn't Hive 0.6 support virtual columns?
 
 



Re: Virtual Columns error

2010-09-20 Thread Thiruvel Thirumoolan
I don't think HIVE-417 (https://issues.apache.org/jira/browse/HIVE-417), which 
added virtual columns, was committed to 0.6.

On Sep 20, 2010, at 3:47 PM, Thiruvel Thirumoolan wrote:

It should be INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE.

On Sep 20, 2010, at 3:15 PM, lei liu wrote:

I am using Hive 0.6. When I execute 'select INPUT_FILE_NAME,  
BLOCK_OFFSET_INSIDE_FILE from person1', Hive throws the error below:
FAILED: Error in semantic analysis: line 1:7 Invalid Table Alias or Column 
Reference INPUT_FILE_NAME error.

Doesn't Hive 0.6 support virtual columns?






Re: add partition

2010-09-19 Thread Thiruvel Thirumoolan
You can specify the location of a partition when adding it. The location 
should be a directory, though.

ALTER TABLE test ADD PARTITION (pt='01') LOCATION '/user/hive/warehouse/user';
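
Put together with the table definition from the question, the whole sequence might look like the sketch below. It assumes /user/hive/warehouse/user is a directory that holds the data files (in the question it is described as a file, so the data may first need to sit inside a directory).

```sql
-- Table definition taken from the original question.
CREATE EXTERNAL TABLE test (userid BIGINT, name STRING, age INT)
PARTITIONED BY (pt STRING);

-- Point the partition at the existing directory instead of running LOAD DATA,
-- which moved the file under the table's directory in the question.
ALTER TABLE test ADD PARTITION (pt='01')
LOCATION '/user/hive/warehouse/user';
```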

On Sep 19, 2010, at 1:33 PM, lei liu wrote:

I used the statements below to create a table and add a partition:
create external table test(userid bigint, name string, age int) partitioned 
by(pt string);
alter table test add partition(pt='01');


There is a file in HDFS at /user/hive/warehouse/user. I used a load statement 
to load it into the partition: load data inpath 
'/user/hive/warehouse/user' into table test partition(pt='01'). The file path 
then changed from /user/hive/warehouse/user to 
/user/hive/warehouse/test/pt=01. I don't want the file path to change. How can 
I do that?



Re: Multi Table Inserts produces multiple jobs

2010-08-24 Thread Thiruvel Thirumoolan
Hi Cristi,

The source_table is scanned only once in a multi-insert scenario, whereas if 
you have 2 queries it will be scanned twice.

If you run 'explain extended' on the query, you can see the flow of data.
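
A rough sketch of that, using the table and field names from the question. The COUNT(1) columns are an assumption standing in for the select lists that were elided in the original query, and the target tables are assumed to have matching schemas.

```sql
-- Prints the stage plan for the multi-insert without running it.
EXPLAIN EXTENDED
FROM source_table
INSERT OVERWRITE TABLE tablename1
  SELECT field1, field2, COUNT(1)   -- COUNT(1) is a placeholder aggregate
  GROUP BY field1, field2
INSERT OVERWRITE TABLE tablename2
  SELECT field1, field3, COUNT(1)   -- COUNT(1) is a placeholder aggregate
  GROUP BY field1, field3;
```

The stage list in the output shows how many map-reduce jobs are planned and which stages read source_table.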

You can find related info at 
http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook - Slides 
51-53.

-Thiruvel

On Aug 24, 2010, at 9:18 PM, Cristi Cioriia wrote:

 Hi guys,
 
 I would like to use the Multi Insert feature of HIVE so that I could
 have fewer map-reduce jobs than running separate queries.
 
 I have some HIVE queries that use the Multi Insert feature as below:
 
 FROM source_table
 INSERT OVERWRITE TABLE tablename1 
 SELECT field1, field2 ...fieldN 
 GROUP BY field1, field2 
 INSERT OVERWRITE TABLE tablename2
 SELECT field1,  field3 ... fieldK
 GROUP BY field1, field3
 
 I was hoping that this feature would create only 1 Map-Reduce job, but
 when running the query I found that 2 jobs are created, just as if I had
 run 2 separate queries:
 
 FROM source_table
 INSERT OVERWRITE TABLE tablename1 
 SELECT field1, field2 ...fieldN 
 GROUP BY field1, field2
 
 FROM source_table
 INSERT OVERWRITE TABLE tablename2
 SELECT field1,  field3 ... fieldK
 GROUP BY field1, field3
 
 Is there any way that I can get only 1 MR job with the multi insert
 syntax?
 
 Thanks,
 Cristi
 
 
 
 
 



Re: I modify the HiveInputFormat.java class, but the content modified don't take effect.

2010-08-23 Thread Thiruvel Thirumoolan
Did you change hive-log4j.properties? By default the logging threshold is WARN. 
You have to change it to INFO.


On Aug 22, 2010, at 8:03 AM, lei liu wrote:

 I added one line of code to the HiveInputFormat.java class, for example 
 LOG.info(1), then packaged the code into hive_exec.jar and put the jar 
 in $HIVE_HOME/lib, but the code I added doesn't take effect. I want to 
 know why it doesn't take effect, and whether Hadoop caches hive_exec.jar.
 
 By the way, I use hive-0.4.1 and hadoop-0.19.2.
 
 
 Thanks,
 
 LiuLei



Should partition order be the same in create and insert?

2010-08-06 Thread Thiruvel Thirumoolan
Hello,

When the order of partitioning columns differs between create table and insert, 
I am not able to query any data. However, if the order is the same, it's 
possible to see data.

Should partitioning order be maintained through all inserts? As you see below, 
kv2.txt is still on HDFS and 2 different partition orders are created. I am 
running off Hive trunk.

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING, country STRING);
hive> LOAD DATA LOCAL INPATH '/tmp/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (country='india', ds='2008-08-15');
hive> select * from invites;
hive>

[thiru...@hive]$ hadoop fs -lsr /user/hive/warehouse/invites
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/country=india
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/country=india/ds=2008-08-15
-rw-r--r--   1 thiruvel supergroup   5791 2010-08-06 12:52 /user/hive/warehouse/invites/country=india/ds=2008-08-15/kv2.txt
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/ds=2008-08-15
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/ds=2008-08-15/country=india
[thiruvel@ hive]$ 
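
For comparison, a sketch of the variant in which the partition spec follows the declared order (ds, then country), which is the case reported above as returning data. Table name, file path and partition values are the ones from the session above.

```sql
-- Partition columns listed in the same order as in PARTITIONED BY (ds, country).
LOAD DATA LOCAL INPATH '/tmp/kv2.txt' OVERWRITE INTO TABLE invites
PARTITION (ds='2008-08-15', country='india');

SELECT * FROM invites;
```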

Thanks,
Thiruvel