We are planning to enable ad-hoc querying on our Hive warehouse. While
testing some concurrent queries, we found the following issue:

Query 1 - doing 'insert overwrite table yyy .... partition (dateint = xxx)
select ...  from yyy where dateint = xxx'. This is done to merge small files
within a partition in table yyy.
Query 2 - doing some select on the same table joining another table.

What we found is that query 2 would fail with the following exception in
multiple reducers:
java.io.FileNotFoundException: File does not exist: hdfs://ip-10-251-98-80.ec2.internal:9000/user/hive/dataeng/warehouse/nccp_session_facts/dateint=20090908/hour=9/sessionsFacts_P20090909T021823L20090908T09-r-00006
 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:457)
 at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:671)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
 at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
 at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:236)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)

Is this expected? If so, is there a JIRA for it, or is it planned to be
addressed? We are trying to think of a workaround, but haven't come up with
a good one, as the swapping of files would ideally be handled inside Hive.

Please let us know your feedback.

Thanks,
Eva.
