[
https://issues.apache.org/jira/browse/HADOOP-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550793
]
Christian Kunz commented on HADOOP-308:
---------------------------------------
This still happens. I identified a taskTracker that has 4 disks, of which one
is read-only, and it seemingly places every task on that read-only disk (no
job has been successfully submitted since Nov 30), although mapred.local.dir
in hadoop-site.xml specifies local directories on all 4 disks. The node has 3
good disks and still accepts tasks, but cannot execute any of them: it is de
facto an unusable node without being detected as such.
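For reference, the configuration looks roughly like this (the mount points
are hypothetical, one entry per physical disk):

<property>
  <name>mapred.local.dir</name>
  <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local,/disk4/mapred/local</value>
</property>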
The reason seems to be that the localizeJob method in TaskTracker places the
path localJarFile on the same disk every time, because the hash is based only
on 'taskTracker/jobcache/', independent of the job id; and when the disk has
become read-only after that directory was created, the check in getLocalPath
in Configuration.java does not identify the disk as read-only.
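To illustrate, a minimal sketch, assuming getLocalPath picks a directory by
hashing the relative path (this is not the actual Hadoop code):

import java.io.File;

public class LocalDirPicker {
  private final String[] localDirs; // entries from mapred.local.dir

  public LocalDirPicker(String[] localDirs) {
    this.localDirs = localDirs;
  }

  // Hash-based selection in the spirit of Configuration.getLocalPath:
  // a constant relative path always maps to the same local dir.
  public File getLocalPath(String relPath) {
    int idx = Math.abs(relPath.hashCode() % localDirs.length);
    return new File(localDirs[idx], relPath);
  }

  public static void main(String[] args) {
    LocalDirPicker p = new LocalDirPicker(new String[] {
        "/disk1/mapred/local", "/disk2/mapred/local",
        "/disk3/mapred/local", "/disk4/mapred/local"});
    // The constant prefix pins every job to one disk:
    System.out.println(p.getLocalPath("taskTracker/jobcache/"));
    // Hashing a path that includes the job id would spread jobs out:
    System.out.println(p.getLocalPath("taskTracker/jobcache/job_200712090222_0017"));
  }
}

If the hashed path included the job id, a read-only disk would break only a
fraction of the jobs instead of all of them.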
Exception(s) look like:
Error initializing task_200712090222_0017_m_000870_0:
java.io.IOException: Mkdirs failed to create <localDir on disk 2>taskTracker/jobcache/job_200712090222_0017
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:345)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:353)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:260)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:139)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:853)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:834)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:585)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1143)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:807)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1179)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1880)
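Checking for existence of the directory is not enough here; an actual write
probe would catch a disk that went read-only after its directories were
created. A minimal sketch (names are illustrative, not the actual Hadoop
API):

import java.io.File;
import java.io.IOException;

public class WritableDirCheck {
  // True only if we can actually create a file in dir,
  // not merely if dir exists.
  public static boolean isWritable(File dir) {
    if (!dir.isDirectory()) {
      return dir.mkdirs(); // fails on a read-only filesystem
    }
    try {
      File probe = File.createTempFile("probe", null, dir);
      probe.delete();
      return true;
    } catch (IOException e) { // thrown on a read-only filesystem
      return false;
    }
  }
}

getLocalPath could skip directories that fail such a probe when picking
where to localize a job.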
> Task Tracker does not handle the case of read-only local dir correctly
> -----------------------------------------------------------------------
>
> Key: HADOOP-308
> URL: https://issues.apache.org/jira/browse/HADOOP-308
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.3.2
> Environment: all
> Reporter: Runping Qi
> Assignee: Owen O'Malley
>
> When the local dir on a node is not writable, the tasks on that node fail
> as expected, with an exception like:
> (Read-only file system)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:723)
>         at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:241)
>         at org.apache.hadoop.dfs.DistributedFileSystem.createRaw(DistributedFileSystem.java:96)
>         at org.apache.hadoop.fs.FSDataOutputStream$Summer.<init>(FSDataOutputStream.java:44)
>         at org.apache.hadoop.fs.FSDataOutputStream.<init>(FSDataOutputStream.java:134)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:224)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:176)
>         ....
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:265)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:847)
> However, the task tracker will continue to accept new tasks, and they will
> continue to fail.
> The run loop of the TaskTracker should detect such a problem and exit.
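A sketch of what such a check in the run loop might look like (class and
method names are hypothetical, reusing the write probe sketched above):

import java.io.File;
import java.io.IOException;

public class LocalDirGuard {
  // Throws if no configured local dir accepts writes; the TaskTracker
  // run loop would treat this as fatal and exit instead of accepting
  // tasks that are guaranteed to fail.
  public static void checkLocalDirs(String[] localDirs) throws IOException {
    for (String d : localDirs) {
      if (WritableDirCheck.isWritable(new File(d))) {
        return; // at least one dir is usable
      }
    }
    throw new IOException("no writable directory among mapred.local.dir entries");
  }
}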