Should partition order be same in create and insert?

2010-08-06 Thread Thiruvel Thirumoolan
Hello,

When the order of partitioning columns in the insert differs from the order in
create table, I am not able to query any data. However, if the order is the
same, it's possible to see the data.

Should partitioning order be maintained through all inserts? As you can see
below, kv2.txt is still on HDFS and two different partition directory orders
are created. I am running off Hive trunk.

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING,
country STRING);
hive> LOAD DATA LOCAL INPATH '/tmp/kv2.txt' OVERWRITE INTO TABLE invites
PARTITION (country='india', ds='2008-08-15');
hive> select * from invites;
hive>

[thiru...@hive]$ hadoop fs -lsr /user/hive/warehouse/invites
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/country=india
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/country=india/ds=2008-08-15
-rw-r--r--   1 thiruvel supergroup   5791 2010-08-06 12:52 /user/hive/warehouse/invites/country=india/ds=2008-08-15/kv2.txt
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/ds=2008-08-15
drwxr-xr-x   - thiruvel supergroup  0 2010-08-06 12:52 /user/hive/warehouse/invites/ds=2008-08-15/country=india
[thiruvel@ hive]$ 

Thanks,
Thiruvel

how to debug code in org.apache.hadoop.hive.ql.exec package

2010-08-06 Thread lei liu
How can I debug code in the org.apache.hadoop.hive.ql.exec package?


RE: Should partition order be same in create and insert?

2010-08-06 Thread Namit Jain
The order should be the same. 

Can you file a JIRA for this issue? We should throw an error.


Thanks,
-namit
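
For illustration, a minimal sketch of the same load with the partition columns
listed in the order used in PARTITIONED BY (ds first, then country). The table,
file path and values are taken from the original message. The expectation that
the data is then queryable rests on the original poster's observation that it
works when the order is the same; the WHERE clause is only an illustrative way
to read back that one partition.

```sql
-- Table definition from the original message: ds is declared before country.
CREATE TABLE invites (foo INT, bar STRING)
PARTITIONED BY (ds STRING, country STRING);

-- List the partition columns in the declared order (ds, then country).
LOAD DATA LOCAL INPATH '/tmp/kv2.txt' OVERWRITE INTO TABLE invites
PARTITION (ds='2008-08-15', country='india');

-- Read back the partition that was just loaded.
SELECT * FROM invites WHERE ds='2008-08-15' AND country='india';
```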


From: Thiruvel Thirumoolan [thiru...@yahoo-inc.com]
Sent: Friday, August 06, 2010 12:29 AM
To: hive-user@hadoop.apache.org
Subject: Should partition order be same in create and insert?


RE: How to merge small files

2010-08-06 Thread Namit Jain
HIVEMERGEMAPFILES(hive.merge.mapfiles, true),
HIVEMERGEMAPREDFILES(hive.merge.mapredfiles, false),


Set the above parameters to true before your query.
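
A minimal sketch of how this might look in a Hive CLI session. The property
names come from the two entries above; the INSERT is the placeholder statement
from the question, left commented out because it is not a runnable query.

```sql
-- Merge small output files at the end of map-only jobs.
SET hive.merge.mapfiles=true;
-- Merge small output files at the end of map-reduce jobs.
SET hive.merge.mapredfiles=true;

-- Then run the query in the same session (placeholder from the question):
-- INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement;
```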




From: lei liu [liulei...@gmail.com]
Sent: Thursday, August 05, 2010 8:47 PM
To: hive-user@hadoop.apache.org
Subject: How to merge small files

When I run the following SQL: INSERT OVERWRITE TABLE tablename1 select_statement1
FROM from_statement, many files of size zero are written to Hadoop.

How can I merge these small files?

Thanks,



LiuLei



Syntax and Semantics for Continuous queries in Hive

2010-08-06 Thread Shyam Sarkar
Hello,

I am curious whether large streaming data can be queried inside Hive using the
syntax and semantics of continuous queries, as defined in stream databases
(e.g. StreamBase, Coral8, etc.). Continuous queries are essential for real-time
enterprise, log data, and many other applications.

Thanks,
Shyam Sarkar



  



Re: How HIVE manages a join

2010-08-06 Thread Jeff Hammerbacher
Yongqiang mentioned he was going to update the wiki with this information in
the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.

Yongqiang, have you gotten a chance to complete the sort merge bucket map
join and the other skew join you mention in the above thread?

Thanks,
Jeff

On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada 
bhara...@students.iiit.ac.in wrote:

 Roberto,

 You may find these links useful:

 http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551 -
 Simple joins and optimizations.

 http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team -
 New kinds of joins and features of Hive.

 Thanks

 Bharath.V
 4th year Undergraduate..
 IIIT Hyderabad


 On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto 
 roberto.ca...@guest.telecomitalia.it wrote:

 Hi,

 I cannot find any documentation about which algorithm HIVE uses to
 translate JOIN clauses into Map-Reduce tasks.

 In particular, suppose I have two tables A and B, each stored in a
 separate file, and each file is split across Hadoop nodes. When I perform a
 JOIN with A.column = B.column, the framework has to compare all the data from
 the first file with all the data from the second file. How can Hadoop perform
 a full scan of all possible combinations of values? If each node contains
 only a portion of each file, a complete comparison does not seem possible.
 Is one of the two files entirely replicated on each node? Or does HIVE use
 another kind of strategy or optimization?

 Thanks.
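
As a rough illustration of the two strategies the question hints at (a sketch,
not an answer given in this thread): as far as I understand, Hive's default
join is a reduce-side join, in which rows from both tables are shuffled by the
join key so that matching keys meet at the same reducer, while a map-side join
loads the smaller table into memory on each mapper. The column names id and
val below are hypothetical.

```sql
-- Default (reduce-side) join: rows of A and B are partitioned by the join
-- key during the shuffle, so each reducer only compares rows sharing a key.
SELECT a.id, a.val, b.val
FROM A a JOIN B b ON (a.id = b.id);

-- Map-side join hint: asks Hive to load the smaller table (b here) into
-- memory on every mapper, so the rows of A do not need to be shuffled.
SELECT /*+ MAPJOIN(b) */ a.id, a.val, b.val
FROM A a JOIN B b ON (a.id = b.id);
```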