If a certain input file/dir does not exist, then the job can't be submitted. 
Since only a few reducers are failing, the problem could be something else.
Eva, does the same job succeed on a second try? I.e., is the file/dir available 
eventually? What is the replication factor?
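
(For reference, both can be checked from the Hive CLI with a dfs pass-through, assuming your build supports it; the second column of the -ls listing is the per-file replication factor. The path is the one from the stack trace below.)

  dfs -ls /user/hive/dataeng/warehouse/nccp_session_facts/dateint=20090908/hour=9/;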

Prasad


________________________________
From: Yongqiang He <heyongqi...@software.ict.ac.cn>
Reply-To: <hive-user@hadoop.apache.org>
Date: Wed, 9 Sep 2009 04:07:31 -0700
To: <hive-user@hadoop.apache.org>
Subject: Re: Files does not exist error: concurrency control on hive queries...

Hi Eva,
   After a closer look at the code, I think this is not a bug. We need to find out 
how to avoid it.

Thanks,
Yongqiang
On 09-9-9 1:31 PM, "He Yongqiang" <heyongqi...@software.ict.ac.cn> wrote:

Hi Eva,
    Can you open a new JIRA for this? Then let's discuss and resolve the issue there.
I guess this is because the partition metadata is added before the data is 
available.

Thanks
Yongqiang
On 09-9-9 1:18 PM, "Eva Tse" <e...@netflix.com> wrote:


We are planning to start enabling ad-hoc querying on our Hive warehouse. We 
tested some concurrent queries and found the following issue:

Query 1 – runs 'insert overwrite table yyy .... partition (dateint = xxx) 
select ... from yyy where dateint = xxx'. This is done to merge small files 
within a partition of table yyy.
Query 2 – runs some select on the same table, joining another table.
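
Concretely, the pair looks roughly like this (a sketch; session_facts, dim_title, and the column names are hypothetical stand-ins for yyy/xxx):

  -- Query 1: compact a partition's small files by rewriting it in place
  INSERT OVERWRITE TABLE session_facts PARTITION (dateint = 20090908)
  SELECT session_id, title_id, duration
  FROM session_facts
  WHERE dateint = 20090908;

  -- Query 2: a concurrent read of the same partition, joined to another table
  SELECT f.session_id, d.title
  FROM session_facts f
  JOIN dim_title d ON (f.title_id = d.title_id)
  WHERE f.dateint = 20090908;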

What we found is that Query 2 would fail with the following exception in 
multiple reducers:
java.io.FileNotFoundException: File does not exist: hdfs://ip-10-251-98-80.ec2.internal:9000/user/hive/dataeng/warehouse/nccp_session_facts/dateint=20090908/hour=9/sessionsFacts_P20090909T021823L20090908T09-r-00006
 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:457)
 at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:671)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
 at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
 at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:236)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)

Is this expected? If so, is there a JIRA, or is it planned to be addressed? We 
are trying to think of a workaround, but haven't thought of a good one, as the 
swapping of files would ideally be handled inside Hive.
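
To illustrate the kind of workaround we mean (a sketch only, reusing the hypothetical names above; it assumes the final overwrite can be scheduled in a window with no concurrent readers, so it only shrinks the race window rather than eliminating it):

  -- Stage the compacted data in a separate table, so the long-running
  -- rewrite never touches the partition that readers are scanning.
  CREATE TABLE session_facts_staging (session_id BIGINT, title_id BIGINT, duration INT)
  PARTITIONED BY (dateint INT);

  INSERT OVERWRITE TABLE session_facts_staging PARTITION (dateint = 20090908)
  SELECT session_id, title_id, duration
  FROM session_facts
  WHERE dateint = 20090908;

  -- Then do one short overwrite of the live partition during a quiet window.
  INSERT OVERWRITE TABLE session_facts PARTITION (dateint = 20090908)
  SELECT session_id, title_id, duration
  FROM session_facts_staging
  WHERE dateint = 20090908;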

Please let us know your feedback.

Thanks,
Eva.

