I faced the org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException exception once. What I was doing was overriding FileOutputFormat in one of my classes, and in it I had opened a file stream. I did this because I needed only a single file as output. It worked fine while I had only one reducer, but when I increased the number of reducers, every reducer tried to create/use a file with the same name, and therefore I got AlreadyBeingCreatedException.
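To illustrate the clash (a minimal plain-Java sketch with hypothetical paths, not my actual job code): a fixed output name makes every reducer open the same HDFS path, while deriving the name from the task attempt ID gives each reducer its own file.

```java
// Sketch of the naming collision described above. Paths are hypothetical;
// in a real job the attempt ID comes from the framework (e.g. the
// "mapred.task.id" property in the JobConf, if I remember right).
public class OutputNameSketch {

    // What my override effectively did: one fixed path for every reducer,
    // so the second reducer's create() collides with the first one's lease.
    static String fixedPath() {
        return "/out/result.dat";
    }

    // The fix: embed the reduce partition from the attempt ID, e.g.
    // attempt_200909231347_0694_r_000002_0 -> "/out/result-r-000002.dat".
    static String perTaskPath(String taskAttemptId) {
        String[] parts = taskAttemptId.split("_");
        // parts: [attempt, jobTimestamp, jobId, taskType, partition, attemptNo]
        return "/out/result-" + parts[3] + "-" + parts[4] + ".dat";
    }

    public static void main(String[] args) {
        String r0 = perTaskPath("attempt_200909231347_0694_r_000000_0");
        String r2 = perTaskPath("attempt_200909231347_0694_r_000002_0");
        System.out.println(fixedPath()); // same for all reducers -> clash
        System.out.println(r0);          // unique per reducer
        System.out.println(r2);
    }
}
```

With one reducer the fixed name is harmless, which is why my job only broke when I raised the reducer count.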
Your case may be different, but I thought I'd share mine.

On Tue, Sep 29, 2009 at 11:03 AM, Jason Venner <[email protected]> wrote:

> How long does it take you to create a file on one of your datanodes, in
> the dfs block storage area, while your job is running? It could simply be
> that the OS-level file creation is taking longer than the RPC timeout.
>
> On Mon, Sep 28, 2009 at 5:30 PM, dave bayer <[email protected]> wrote:
>
> > On a cluster running 0.19.2:
> >
> > We have some production jobs that perform ETL tasks and open files in
> > hdfs during the reduce task (with speculative execution in the reduce
> > stage programmatically turned off). Since upgrading the cluster from
> > 0.19.1, we've been seeing some odd behavior: timeouts during block/file
> > creation, long enough that the reduce attempt gets killed. Subsequent
> > reduce attempts then fail because the first, killed attempt is still
> > noted (by the namenode, I assume) as creating the block/file, according
> > to the exception that bubbles up. I didn't see anything like this in
> > JIRA, and I'm trying to grab a few jstacks from the namenode when these
> > errors pop up (usually correlated with a somewhat busy cluster) in an
> > effort to get some idea of what is going on here.
> >
> > Currently the cluster is small, with about 5 data nodes and 10s of TBs,
> > with 2x the namespace files easily fitting in memory. I don't see any
> > process eating more than a couple percent of cpu on the namenode box
> > (which also hosts the secondary nn). iostat shows 100-200 blocks
> > read/written every other second on this host, leaving plenty of headroom
> > there. The cluster is scheduled to grow in the near future, which may
> > worsen this hang/blocking if it's due to a bottleneck.
> >
> > Before I start tracing through the code, I thought I might ask whether
> > anyone has seen anything like the excerpts from the jobtracker logs
> > below? Is there a way to guarantee that all in-progress tasks for a
> > given reduce task will be terminated (and any associated network
> > connections sent a reset or something) before a new reduce task is
> > started?
> >
> > On a kind of side thought - is the task attempt name in the jobconf
> > that is handed to the reduce in configure(), and if so, what might the
> > setting name be to get at it? Or does one need to go through a more
> > circuitous route to obtain the TaskAttemptID associated with the
> > attempt?
> >
> > Back to the point at hand, from the jobtracker logs:
> >
> > Failing initial reduce:
> > ----------------------------
> > 2009-09-27 22:24:25,056 INFO org.apache.hadoop.mapred.TaskInProgress:
> > Error from attempt_200909231347_0694_r_000002_0:
> > java.net.SocketTimeoutException: 69000 millis timeout while waiting
> > for channel to be ready for read.
> > ch : java.nio.channels.SocketChannel[connected local=/X.X.X.2:47440
> > remote=/X.X.X.2:50010]
> >   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
> >   at java.io.DataInputStream.readByte(DataInputStream.java:248)
> >   at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
> >   at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
> >   at org.apache.hadoop.io.Text.readString(Text.java:400)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2787)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2712)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
> >
> > Failing second reduce:
> > -------------------------------
> > 2009-09-27 22:53:22,048 INFO org.apache.hadoop.mapred.TaskInProgress:
> > Error from attempt_200909231347_0694_r_000002_3:
> > org.apache.hadoop.ipc.RemoteException:
> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
> > create file >blah< for DFSClient_attempt_200909231347_0694_r_000002_3
> > on client X.X.X.7, because this file is already being created by
> > DFSClient_attempt_200909231347_0694_r_000002_0 on X.X.X.2
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1085)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
> >   at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
> >   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >   at java.lang.reflect.Method.invoke(Method.java:597)
> >   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> >
> >   at org.apache.hadoop.ipc.Client.call(Client.java:697)
> >   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> >   at $Proxy1.create(Unknown Source)
> >   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >   at java.lang.reflect.Method.invoke(Method.java:597)
> >   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >   at $Proxy1.create(Unknown Source)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2594)
> >   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:188)
> >   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
> >   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:468)
> >   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:375)
> >
> > Many thanks...
> >
> > dave bayer
> >
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals

--
Thanks & Regards,
Chandra Prakash Bhagtani,
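P.S. Two follow-ups on the quoted thread. On dave's side question: if I remember right, the attempt name is available in the JobConf handed to configure() under the key "mapred.task.id" (a string like attempt_200909231347_0694_r_000002_0, which TaskAttemptID.forName() can parse) - worth double-checking against the 0.19 source. And on Jason's timeout point: the 69000 ms in the trace looks like the 60 s default client read timeout plus a small per-datanode extension, and that base timeout is configurable. A hadoop-site.xml fragment - property names as I recall them for 0.19, values purely illustrative, so please verify against your release before relying on this:

```
<!-- hadoop-site.xml fragment (sketch, not verified on 0.19.2). -->
<property>
  <name>dfs.socket.timeout</name>
  <!-- client-side read timeout in ms; default is 60000 -->
  <value>120000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- write-side socket timeout in ms -->
  <value>600000</value>
</property>
```

Raising these only papers over slow datanode file creation, of course; if the namenode still holds the dead attempt's lease, the later attempts will keep hitting AlreadyBeingCreatedException until the lease expires.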
