Hello All,
I'm trying to run the Pig e2e tests in parallel, and many of them fail
like this in local mode:
WARN org.apache.hadoop.mapred.Task - Could not find output size
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
output/file.out in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
at org.apache.hadoop.mapred.Task.done(Task.java:875)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)
The problem seems to be concurrent access to the JobTracker's
temporary directory (file.out is a temporary JobTracker file). It is
clearly visible that different test processes open files in the same
directory:
$ lsof | grep output
java 20719 ikatsov 13r REG 8,1 3486
17039996
/tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java 20719 ikatsov 16r REG 8,1 349196
17039986
/tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out
$ lsof | grep output
java 25410 ikatsov 13w REG 8,1 8145
17039997
/tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out
$ lsof | grep output
java 2223 ikatsov 13r REG 8,1 289196
16384629
/tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0003/attempt_local_0003_r_000000_0/output/map_0.out
$ lsof | grep output
java 12187 ikatsov 14r REG 8,1 349196
17039996
/tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java 12187 ikatsov 17r REG 8,1 349196
17039999
/tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out
I wonder, is there a way to specify the Hadoop temporary directory
(mapreduce.cluster.local.dir) when launching Pig in local mode, so
that each test run gets its own directory?
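For context, this is the kind of workaround I have in mind (a sketch
only; I'm not certain Pig forwards these properties in local mode, and
myscript.pig is just a placeholder for a test script):

```shell
# Sketch: give each parallel test run its own Hadoop local directory.
# Assumption: Pig passes JVM system properties from PIG_OPTS through to
# the Hadoop configuration. The property is mapreduce.cluster.local.dir
# in newer Hadoop releases (mapred.local.dir in older ones), so both
# are set here to be safe.
LOCAL_DIR="/tmp/pig-e2e-$$"   # $$ = shell PID, unique per test process
mkdir -p "$LOCAL_DIR"
export PIG_OPTS="-Dmapreduce.cluster.local.dir=$LOCAL_DIR -Dmapred.local.dir=$LOCAL_DIR"
pig -x local myscript.pig
```

If that worked, the concurrent runs would no longer collide on
/tmp/hadoop-ikatsov/mapred/local.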
Thank you in advance,
Ilya