Unhandled failures starting jobs with S3 as backing store
---------------------------------------------------------
Key: HADOOP-4637
URL: https://issues.apache.org/jira/browse/HADOOP-4637
Project: Hadoop Core
Issue Type: Bug
Components: fs/s3
Affects Versions: 0.18.1
Reporter: Robert
I run Hadoop 0.18.1 on Amazon EC2, with S3 as the backing store.
When starting jobs, I sometimes get the following failure, which causes the job
to be abandoned:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveBlock(Jets3tFileSystemStore.java:222)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy4.retrieveBlock(Unknown Source)
    at org.apache.hadoop.fs.s3.S3InputStream.blockSeekTo(S3InputStream.java:160)
    at org.apache.hadoop.fs.s3.S3InputStream.read(S3InputStream.java:119)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:214)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:150)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1212)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1193)
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:177)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
    at org.apache.hadoop.ipc.Client.call(Client.java:715)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.mapred.$Proxy5.submitJob(Unknown Source)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
The stack trace suggests that copying the job file fails because Hadoop's S3 block
filesystem cannot find all of the expected block objects at the moment it needs them.
Since S3 is only eventually consistent and does not always provide an up-to-date view
of the stored data, this execution path should probably be hardened: at a minimum,
retry the failed operation, or wait for the expected block object to become visible
if it has not shown up yet.
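Since the failing call already goes through org.apache.hadoop.io.retry.RetryInvocationHandler (visible in the trace), one option may be to widen the retry policy so this failure mode is retried instead of surfacing as a NullPointerException; another is to treat a missing block object as a transient condition and poll for it. Below is a minimal sketch of the latter, not Hadoop's actual code: fetchBlock, the attempt count, and the sleep interval are all hypothetical placeholders.

{code:java}
import java.io.File;
import java.io.IOException;

public class EventuallyConsistentRead {

    // Hypothetical tuning knobs; real values would likely come from the job conf.
    private static final int MAX_ATTEMPTS = 5;
    private static final long SLEEP_MILLIS = 2000;

    /**
     * Hypothetical low-level fetch standing in for the S3 store call.
     * Returns null when S3 has not yet made the block object visible.
     */
    static File fetchBlock(String key) throws IOException {
        // ... call into the S3 store here ...
        return null;
    }

    /**
     * Retry wrapper: treat a null result as "not visible yet" and wait,
     * rather than dereferencing it and triggering a NullPointerException.
     */
    static File retrieveBlockWithRetry(String key) throws IOException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            File block = fetchBlock(key);
            if (block != null) {
                return block;
            }
            try {
                Thread.sleep(SLEEP_MILLIS); // give S3 time to converge
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for block " + key);
            }
        }
        throw new IOException("Block " + key + " not found after "
                + MAX_ATTEMPTS + " attempts");
    }
}
{code}

The key design point is that a missing object is reported as an IOException after the retries are exhausted, which the existing retry and job-submission machinery already knows how to handle, instead of an unchecked NullPointerException that abandons the job outright.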