Jim Krehl created HIVE-3587:
-------------------------------
Summary: Lost data during INSERT query
Key: HIVE-3587
URL: https://issues.apache.org/jira/browse/HIVE-3587
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0
Environment: Ubuntu 10.04
Hadoop MapReduce 0.20.2
Cloudera 4.1.0
3 data/task nodes
Reporter: Jim Krehl
Priority: Critical
I'm trying to load a table using an INSERT query [1], but not all of the data
makes it from the original table into the new table. The query generates two
jobs. The first job takes about 45 minutes and runs with mapred.mapper.class =
org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileMergeMapper; the second takes
about 10 seconds and runs with mapred.mapper.class =
org.apache.hadoop.hive.ql.exec.ExecMapper. Toward the end of the first job
(within its last 2 minutes) a number of IOExceptions are raised [2]. The
exceptions are raised only in the last mapper task to complete; the other
mapper tasks complete successfully. The exceptions indicate that an expected
temporary file is missing. The second job completes successfully. According
to the task tracker web interface, the two jobs run sequentially with no
overlap. However, the second job spawns a number of tasks that rename the very
temporary files whose absence causes the failures in the first job [3].
[1]
https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
[2] Example: ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file
/tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
File does not exist. Holder DFSClient_NONMAPREDUCE_-672101740_1 does not have
any open files.
[3] Example: 2012-10-16 15:36:57,605 INFO RCFileMergeMapper: renamed path
hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_task_tmp.-ext-10000/month=2012-01/_tmp.000011_0
to
hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_tmp.-ext-10000/month=2012-01/000011_0
. File size is 3482
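To line up the failing files in [2] with the renamed files in [3], each log
line can be split into scratch directory, -ext- directory, partition and task
file. The following is only a minimal sketch and not part of Hive: the regular
expression and the name parse_task_tmp are assumptions based solely on the two
example paths above.

```python
import re

# Hypothetical pattern, inferred only from the two example paths in this
# report; other Hive versions or configurations may lay out paths differently.
TASK_TMP = re.compile(
    r"(?P<scratch>/tmp/hive-hive/hive_[^/]+)"  # per-query scratch directory
    r"/_task_tmp\.(?P<ext>-ext-\d+)"           # per-stage -ext- directory
    r"/(?P<partition>[^/]+)"                   # partition directory
    r"/_tmp\.(?P<task>\d+_\d+)"                # task id and attempt number
)


def parse_task_tmp(line):
    """Return the path components found in a log line, or None if absent."""
    match = TASK_TMP.search(line)
    return match.groupdict() if match else None


failed = parse_task_tmp(
    "No lease on /tmp/hive-hive/hive_2012-10-15_13-45-21_245_"
    "1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1"
)
renamed = parse_task_tmp(
    "renamed path hdfs://analysis-hadoop-master/tmp/hive-hive/"
    "hive_2012-10-16_14-48-47_633_7033175453889409541/"
    "_task_tmp.-ext-10000/month=2012-01/_tmp.000011_0"
)

# Two log lines refer to the same temporary file only if every part matches.
same_file = failed == renamed
```

Applied to the two examples above, the parts differ in scratch directory,
-ext- directory and task file, so the examples come from different query runs;
confirming the overlap would need both kinds of log line from a single run.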