I faced the org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException exception once. What I was doing was overriding FileOutputFormat in one of my classes, and in it I had opened a file stream. I did this because I needed only a single file as output. It worked fine while I had only one reducer, but when I increased the number of reducers, every reducer tried to create/use a file with the same name, and therefore I got AlreadyBeingCreatedException.
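To illustrate the clash (a minimal plain-Java sketch with hypothetical paths, not my actual job code): a fixed output name makes every reducer open the same HDFS path, while deriving the name from the task attempt ID gives each reducer its own file.

```java
// Sketch of the naming collision described above. Paths are hypothetical;
// in a real job the attempt ID comes from the framework (e.g. the
// "mapred.task.id" property in the JobConf, if I remember right).
public class OutputNameSketch {

    // What my override effectively did: one fixed path for every reducer,
    // so the second reducer's create() collides with the first one's lease.
    static String fixedPath() {
        return "/out/result.dat";
    }

    // The fix: embed the reduce partition from the attempt ID, e.g.
    // attempt_200909231347_0694_r_000002_0 -> "/out/result-r-000002.dat".
    static String perTaskPath(String taskAttemptId) {
        String[] parts = taskAttemptId.split("_");
        // parts: [attempt, jobTimestamp, jobId, taskType, partition, attemptNo]
        return "/out/result-" + parts[3] + "-" + parts[4] + ".dat";
    }

    public static void main(String[] args) {
        String r0 = perTaskPath("attempt_200909231347_0694_r_000000_0");
        String r2 = perTaskPath("attempt_200909231347_0694_r_000002_0");
        System.out.println(fixedPath()); // same for all reducers -> clash
        System.out.println(r0);          // unique per reducer
        System.out.println(r2);
    }
}
```

With one reducer the fixed name is harmless, which is why my job only broke when I raised the reducer count.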
Your case may be different, but I thought I'd share mine.

On Tue, Sep 29, 2009 at 11:03 AM, Jason Venner <[email protected]> wrote:

> How long does it take you to create a file on one of your datanodes, in
> the dfs block storage area, while your job is running? It could simply be
> that the OS-level file creation is taking longer than the RPC timeout.
>
> On Mon, Sep 28, 2009 at 5:30 PM, dave bayer <[email protected]> wrote:
>
> > On a cluster running 0.19.2:
> >
> > We have some production jobs that perform ETL tasks and open files in
> > hdfs during the reduce task (with speculative execution in the reduce
> > stage programmatically turned off). Since upgrading the cluster from
> > 0.19.1, we've been seeing some odd behavior: timeouts during block/file
> > creation, long enough that the reduce attempt gets killed. Subsequent
> > reduce attempts then fail because the first, killed attempt is still
> > noted (by the namenode, I assume) as creating the block/file, according
> > to the exception that bubbles up. I didn't see anything like this in
> > JIRA, and I'm trying to grab a few jstacks from the namenode when these
> > errors pop up (usually correlated with a somewhat busy cluster) in an
> > effort to get some idea of what is going on here.
> >
> > Currently the cluster is small, with about 5 data nodes and 10s of TBs,
> > with 2x the namespace files easily fitting in memory. I don't see any
> > process eating more than a couple percent of cpu on the namenode box
> > (which also hosts the secondary nn). iostat shows 100-200 blocks
> > read/written every other second on this host, leaving plenty of headroom
> > there. The cluster is scheduled to grow in the near future, which may
> > worsen this hang/blocking if it's due to a bottleneck.
> >
> > Before I start tracing through the code, I thought I might ask whether
> > anyone has seen anything like the excerpts from the jobtracker logs
> > below? Is there a way to guarantee that all in-progress tasks for a
> > given reduce task will be terminated (and any associated network
> > connections sent a reset or something) before a new reduce task is
> > started?
> >
> > On a kind of side thought - is the task attempt name in the jobconf
> > that is handed to the reduce in configure(), and if so, what might the
> > setting name be to get at it? Or does one need to go through a more
> > circuitous route to obtain the TaskAttemptID associated with the
> > attempt?
> >
> > Back to the point at hand, from the jobtracker logs:
> >
> > Failing initial reduce:
> > ----------------------------
> > 2009-09-27 22:24:25,056 INFO org.apache.hadoop.mapred.TaskInProgress:
> > Error from attempt_200909231347_0694_r_000002_0:
> > java.net.SocketTimeoutException: 69000 millis timeout while waiting
> > for channel to be ready for read.
> > ch : java.nio.channels.SocketChannel[connected local=/X.X.X.2:47440
> > remote=/X.X.X.2:50010]
> >   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
> >   at java.io.DataInputStream.readByte(DataInputStream.java:248)
> >   at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
> >   at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
> >   at org.apache.hadoop.io.Text.readString(Text.java:400)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2787)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2712)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
> >
> > Failing second reduce:
> > -------------------------------
> > 2009-09-27 22:53:22,048 INFO org.apache.hadoop.mapred.TaskInProgress:
> > Error from attempt_200909231347_0694_r_000002_3:
> > org.apache.hadoop.ipc.RemoteException:
> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
> > create file >blah< for DFSClient_attempt_200909231347_0694_r_000002_3
> > on client X.X.X.7, because this file is already being created by
> > DFSClient_attempt_200909231347_0694_r_000002_0 on X.X.X.2
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1085)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
> >   at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
> >   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >   at java.lang.reflect.Method.invoke(Method.java:597)
> >   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> >
> >   at org.apache.hadoop.ipc.Client.call(Client.java:697)
> >   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> >   at $Proxy1.create(Unknown Source)
> >   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >   at java.lang.reflect.Method.invoke(Method.java:597)
> >   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >   at $Proxy1.create(Unknown Source)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2594)
> >   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:188)
> >   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
> >   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:468)
> >   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:375)
> >
> > Many thanks...
> >
> > dave bayer
> >
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals

--
Thanks & Regards,
Chandra Prakash Bhagtani,
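P.S. Two follow-ups on the quoted thread. On dave's side question: if I remember right, the attempt name is available in the JobConf handed to configure() under the key "mapred.task.id" (a string like attempt_200909231347_0694_r_000002_0, which TaskAttemptID.forName() can parse) - worth double-checking against the 0.19 source. And on Jason's timeout point: the 69000 ms in the trace looks like the 60 s default client read timeout plus a small per-datanode extension, and that base timeout is configurable. A hadoop-site.xml fragment - property names as I recall them for 0.19, values purely illustrative, so please verify against your release before relying on this:

```
<!-- hadoop-site.xml fragment (sketch, not verified on 0.19.2). -->
<property>
  <name>dfs.socket.timeout</name>
  <!-- client-side read timeout in ms; default is 60000 -->
  <value>120000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- write-side socket timeout in ms -->
  <value>600000</value>
</property>
```

Raising these only papers over slow datanode file creation, of course; if the namenode still holds the dead attempt's lease, the later attempts will keep hitting AlreadyBeingCreatedException until the lease expires.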
