Shud partition order be same in create and insert?
Hello,

When the order of the partitioning columns differs between CREATE TABLE and the insert, I am not able to query any data. However, if the order is the same, it's possible to see the data. Should the partitioning order be maintained across all inserts? As you can see below, kv2.txt is still on HDFS, and two directory trees with different partition orders have been created. I am running off Hive trunk.

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING, country STRING);
hive> LOAD DATA LOCAL INPATH '/tmp/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (country='india', ds='2008-08-15');
hive> select * from invites;
hive>
[thiru...@hive]$ hadoop fs -lsr /user/hive/warehouse/invites
drwxr-xr-x - thiruvel supergroup 0 2010-08-06 12:52 /user/hive/warehouse/invites/country=india
drwxr-xr-x - thiruvel supergroup 0 2010-08-06 12:52 /user/hive/warehouse/invites/country=india/ds=2008-08-15
-rw-r--r-- 1 thiruvel supergroup 5791 2010-08-06 12:52 /user/hive/warehouse/invites/country=india/ds=2008-08-15/kv2.txt
drwxr-xr-x - thiruvel supergroup 0 2010-08-06 12:52 /user/hive/warehouse/invites/ds=2008-08-15
drwxr-xr-x - thiruvel supergroup 0 2010-08-06 12:52 /user/hive/warehouse/invites/ds=2008-08-15/country=india
[thiruvel@ hive]$

Thanks,
Thiruvel
how to debug code in org.apache.hadoop.hive.ql.exec package
How can I debug code in the org.apache.hadoop.hive.ql.exec package?
RE: Shud partition order be same in create and insert?
The order should be the same. Can you file a JIRA for this issue? We should throw an error.

Thanks,
-namit

From: Thiruvel Thirumoolan [thiru...@yahoo-inc.com]
Sent: Friday, August 06, 2010 12:29 AM
To: hive-user@hadoop.apache.org
Subject: Shud partition order be same in create and insert?
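The two directory trees in Thiruvel's listing can be reproduced with a small Python sketch. It assumes, for illustration only, that a partition path is built by joining key=value pairs in whatever order they are supplied; naive_partition_path and table_order_partition_path are hypothetical helpers, not Hive code. The second helper rejects a spec whose column order differs from PARTITIONED BY, which is one way to implement the error suggested in the reply.

```python
from typing import Dict, List

TABLE_DIR = "/user/hive/warehouse/invites"
PARTITION_COLS = ["ds", "country"]  # order declared in PARTITIONED BY


def naive_partition_path(table_dir: str, spec: Dict[str, str]) -> str:
    # Hypothetical helper: joins key=value pairs in the order the
    # PARTITION clause listed them (dicts keep insertion order).
    return "/".join([table_dir] + [f"{k}={v}" for k, v in spec.items()])


def table_order_partition_path(table_dir: str, spec: Dict[str, str],
                               partition_cols: List[str]) -> str:
    # Hypothetical helper: refuses a spec whose columns are missing, extra,
    # or in a different order than the table definition, then builds the
    # path in table order.
    if list(spec) != partition_cols:
        raise ValueError(
            f"partition spec order {list(spec)} does not match "
            f"table partition columns {partition_cols}")
    return "/".join([table_dir] + [f"{c}={spec[c]}" for c in partition_cols])


if __name__ == "__main__":
    spec = {"country": "india", "ds": "2008-08-15"}  # order used in LOAD DATA
    print(naive_partition_path(TABLE_DIR, spec))
    reordered = {c: spec[c] for c in PARTITION_COLS}
    print(table_order_partition_path(TABLE_DIR, reordered, PARTITION_COLS))
    try:
        table_order_partition_path(TABLE_DIR, spec, PARTITION_COLS)
    except ValueError as err:
        print("rejected:", err)
```

The first path matches the directory that holds kv2.txt, and the second matches the empty ds=2008-08-15/country=india tree, which is why a query that looks in table order finds nothing. Silently reordering the spec into table order would be an alternative design; the sketch raises instead because that is what the reply proposes.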
RE: How to merge small files
HIVEMERGEMAPFILES(hive.merge.mapfiles, true),
HIVEMERGEMAPREDFILES(hive.merge.mapredfiles, false),

Set both of the above parameters to true before running your query.

From: lei liu [liulei...@gmail.com]
Sent: Thursday, August 05, 2010 8:47 PM
To: hive-user@hadoop.apache.org
Subject: How to merge small files

When I run the following SQL:

INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement

many zero-size files are stored in Hadoop. How can I merge these small files?

Thanks,
LiuLei
Syntax and Semantics for Continuous queries in Hive
Hello,

I am curious whether large streaming data can be queried inside Hive using the syntax and semantics of continuous queries, as defined in stream databases (e.g., StreamBase, Coral8). Continuous queries are essential for real-time enterprise systems, log data, and many other applications.

Thanks,
Shyam Sarkar
Re: How HIVE manages a join
Yongqiang mentioned he was going to update the wiki with this information in the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw. Yongqiang, have you had a chance to complete the sort merge bucket map join and the other skew join you mentioned in that thread?

Thanks,
Jeff

On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada bhara...@students.iiit.ac.in wrote:

Roberto,

You may find these links useful:

http://www.slideshare.net/ragho/hive-icde-2010?src=related_normalrel=2374551 - simple joins and optimizations
http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team - new kinds of joins / features of Hive

Thanks,
Bharath.V
4th year Undergraduate, IIIT Hyderabad

On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto roberto.ca...@guest.telecomitalia.it wrote:

Hi,

I cannot find any documentation about the algorithm Hive uses to translate JOIN clauses into MapReduce tasks. In particular, suppose I have two tables A and B; each table is written to a separate file, and each file is split across Hadoop nodes. When I perform a JOIN on A.column = B.column, the framework has to compare the full data of the first file with the full data of the second file. How can Hadoop scan all possible combinations of values? If each node contains only a portion of each file, a complete comparison seems impossible. Is one of the two files entirely replicated on each node? Or does Hive use another kind of strategy/optimization?

Thanks.
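Roberto's question (how rows of two files spread across nodes can be compared) is usually answered by a reduce-side, or repartition, join, which is broadly how Hive's ordinary join is described. The Python sketch below is a single-process simulation of that generic technique, plus the map-side (broadcast) variant used when one table is small enough to copy to every mapper. It is not Hive's code; the function names, the "A"/"B" tags, and the sample rows are hypothetical. The point it shows: the shuffle sends every row with the same join key to the same reducer, so no node needs a full copy of either file.

```python
from collections import defaultdict
from itertools import chain, product
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def combine(a: Row, b: Row) -> Row:
    # Prefix column names with the table alias, as in SELECT a.*, b.*.
    out = {f"a.{c}": v for c, v in a.items()}
    out.update({f"b.{c}": v for c, v in b.items()})
    return out


def map_phase(tag: str, split: Iterable[Row],
              key: str) -> Iterator[Tuple[object, Tuple[str, Row]]]:
    # A mapper sees only its own split of one table. It emits the join key
    # and the row tagged with the table it came from.
    for row in split:
        yield row[key], (tag, row)


def shuffle(pairs) -> Dict[object, List[Tuple[str, Row]]]:
    # Stands in for the MapReduce shuffle: all values with the same key are
    # delivered to the same reducer, whichever node emitted them.
    groups: Dict[object, List[Tuple[str, Row]]] = defaultdict(list)
    for k, tagged in pairs:
        groups[k].append(tagged)
    return groups


def reduce_phase(groups) -> Iterator[Row]:
    # One reduce call per key: separate A rows from B rows and emit their
    # cross product. Rows with different keys are never compared.
    for k in sorted(groups):
        a_rows = [r for t, r in groups[k] if t == "A"]
        b_rows = [r for t, r in groups[k] if t == "B"]
        for a, b in product(a_rows, b_rows):
            yield combine(a, b)


def map_join(small: Iterable[Row], big: Iterable[Row],
             key: str) -> Iterator[Row]:
    # Map-side variant: the small table (playing the role of A) is built
    # into a hash table on every mapper, and each mapper probes it while
    # streaming its split of the big table. No shuffle is needed.
    table: Dict[object, List[Row]] = defaultdict(list)
    for s in small:
        table[s[key]].append(s)
    for b in big:
        for s in table.get(b[key], []):
            yield combine(s, b)


A_ROWS = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
B_ROWS = [{"id": 1, "v": 10}, {"id": 1, "v": 11}, {"id": 3, "v": 30}]

if __name__ == "__main__":
    # Pretend each table is stored as two splits on different nodes.
    splits = [("A", A_ROWS[:1]), ("A", A_ROWS[1:]),
              ("B", B_ROWS[:2]), ("B", B_ROWS[2:])]
    pairs = chain.from_iterable(map_phase(t, s, "id") for t, s in splits)
    for row in reduce_phase(shuffle(pairs)):
        print(row)
```

Both strategies return the same two joined rows for id 1; ids 2 and 3 have no partner and produce nothing. The reduce-side version works for tables of any size at the cost of a shuffle, while the map-side version avoids the shuffle but only fits when one table can be held in memory on each mapper, which is the "entirely replicated" case Roberto asks about.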