[ https://issues.apache.org/jira/browse/HADOOP-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550332 ]
Arun C Murthy commented on HADOOP-2391:
---------------------------------------

bq. I will say that this only happens because hadoop keeps the output of tasks in a task_id directory directly beneath the output directory.

Agreed. The reason this is done is that once we have file permissions and/or quotas in HDFS this will continue to work seamlessly... the alternative of putting the temporary outputs elsewhere becomes brittle as soon as we do permissions/quotas, unless we ask the user to specify yet another temp dir in his job configuration; IMHO a needless complication.

----

Taking a step back, I'd argue the simplest solution is for your input format to just ignore input directories beginning with an underscore, so that all ${mapred.output.dir}/_${taskid} dirs are silently skipped. In fact {{FileInputFormat}} does exactly that: it has a {{hiddenFileFilter}} which ignores files starting with either "_" or ".". This bug then becomes a question of figuring out why that filter isn't being applied...

> Speculative Execution race condition with output paths
> -------------------------------------------------------
>
>          Key: HADOOP-2391
>          URL: https://issues.apache.org/jira/browse/HADOOP-2391
>      Project: Hadoop
>   Issue Type: Bug
>  Environment: all
>     Reporter: Dennis Kubes
>     Assignee: Dennis Kubes
>
> I am tracking a problem where, when speculative execution is enabled, there is
> a race condition when trying to read output paths from a previously completed
> job. More specifically, when reduce tasks run, their output is put into a
> working directory under the task name until the task is completed. The
> directory name is something like workdir/_taskid. Upon completion the output
> gets moved into workdir. Regular tasks are checked for this move and are not
> considered completed until the move has been made. I have not verified it, but
> all indications point to speculative tasks NOT having this same check for
> completion and, more importantly, not having their output removed when killed.
> So what we end up with, when trying to read the output of previous tasks with
> speculative execution enabled, is the possibility that a leftover
> workdir/_taskid will be present when the output directory is read by a
> chained job.
> Here is an error which supports my theory:
>
> Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /u01/hadoop/mapred/temp/generate-temp-1197104928603/_task_200712080949_0005_r_000014_1
>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:234)
>         at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)
>         at org.apache.hadoop.ipc.Client.call(Client.java:507)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:839)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:831)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:263)
>         at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
>         at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:563)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:526)
>
> I will continue to research this and post as I make progress on tracking down this bug.
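To make the filter suggestion above concrete, here is a minimal sketch of the "_"/"." rule that {{hiddenFileFilter}} applies, used to screen a previous job's output directory before handing paths to {{SequenceFile}} readers. This is written against the generic {{FileSystem}}/{{PathFilter}} API; the class and helper names below are purely illustrative and are not the actual Hadoop or Nutch code.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Illustrative sketch only (names are hypothetical, not Hadoop/Nutch source).
public class HiddenPathFilterSketch {

  // Same rule as FileInputFormat's hiddenFileFilter: skip any path whose
  // name starts with "_" or ".", e.g. _task_200712080949_0005_r_000014_1.
  public static final PathFilter HIDDEN_FILTER = new PathFilter() {
    public boolean accept(Path p) {
      String name = p.getName();
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

  // Illustrative helper: list only the "real" outputs of a previous job,
  // so leftover speculative-task directories are never opened by a
  // chained job.
  public static Path[] listJobOutputs(Configuration conf, Path outputDir)
      throws IOException {
    FileSystem fs = outputDir.getFileSystem(conf);
    FileStatus[] stats = fs.listStatus(outputDir, HIDDEN_FILTER);
    List<Path> paths = new ArrayList<Path>(stats.length);
    for (FileStatus stat : stats) {
      paths.add(stat.getPath());
    }
    return paths.toArray(new Path[paths.size()]);
  }
}
{code}

A filter of this shape could be applied either in the input format (as {{hiddenFileFilter}} already is) or by whatever code enumerates the previous job's output directory, such as the {{getReaders}} call visible in the stack trace above.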