Inserting into a table from a map/reduce transform results in no data
---------------------------------------------------------------------
Key: HIVE-233
URL: https://issues.apache.org/jira/browse/HIVE-233
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Josh Ferguson
When attempting to run a query of the form
INSERT OVERWRITE TABLE table_name PARTITION ( ... ) SELECT TRANSFORM ...
the table 'table_name' ends up empty, even though the SELECT statement returns
valid results that can be loaded by hand.
*My target table*
CREATE TABLE percentiles
(actor_id STRING, percentile INT, count INT)
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT,
span INT)
CLUSTERED BY (actor_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;
*The attempted query*
INSERT OVERWRITE TABLE percentiles
PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test',
            dataset='test', hour=341976, span=168 )
SELECT TRANSFORM(actor_id)
USING '/Users/Josh/cluster/bin/percentiles.rb'
AS (actor_id, percentile, count)
FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;
*The result in HDFS (0 bytes)*
$ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
-rw-r--r-- 1 Josh supergroup 0 2009-01-14 11:19
/user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0009_r_000000_0
*Inserting into a temporary directory first*
hive> INSERT OVERWRITE DIRECTORY 'hdfs://localhost:9000/tmp/hdfs_out'
SELECT TRANSFORM(actor_id)
USING '/Users/Josh/cluster/bin/percentiles.rb'
AS (actor_id, percentile, count)
FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors;
*The results in HDFS (8600 bytes)*
$ hadoop fs -ls /tmp/hdfs_out
Found 1 items
-rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27
/tmp/hdfs_out/attempt_200901131908_0010_r_000000_0
*Loading from the temporary directory into percentiles*
hive> LOAD DATA INPATH 'hdfs://localhost:9000/tmp/hdfs_out'
INTO TABLE percentiles
PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test',
            dataset='test', hour=341976, span=168 );
Loading data to table percentiles partition {account=cUU5T7y6DmdzMJFcFt3JDe,
application=test, dataset=test, hour=341976, span=168}
OK
*The results in HDFS (8600 bytes)*
$ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/*
-rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27
/user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0010_r_000000_0
So the transform output loads fine via a LOAD DATA statement from a temporary
directory, but not when inserted directly with an INSERT OVERWRITE TABLE ...
SELECT TRANSFORM query.