I had a problem like that with a custom record writer (solr-1301).

On Mon, Sep 28, 2009 at 11:18 PM, Chandraprakash Bhagtani wrote:
> I faced the org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException exception once. What I was doing was overriding FileOutputFormat in some class, and in that class I had opened a file stream. I did this because I needed only a single file as output. It worked fine when I had only one reducer, but when I increased the number of reducers, every reducer was trying to create/use a file with the same name, so I got AlreadyBeingCreatedException.
>
> Your case may be different, but I thought I'd share mine.
>
> On Tue, Sep 29, 2009 at 11:03 AM, Jason Venner wrote:
>
> > How long does it take you to create a file on one of your datanodes, in the dfs block storage area, while your job is running? It could simply be that the OS-level file creation is taking longer than the RPC timeout.
> >
> > On Mon, Sep 28, 2009 at 5:30 PM, dave bayer wrote:
> >
> > > On a cluster running 0.19.2:
> > >
> > > We have some production jobs that perform ETL tasks that open files in HDFS during the reduce task (with speculative execution in the reduce stage programmatically turned off). Since upgrading the cluster from 0.19.1, we've been seeing some odd behavior: timeouts during block/file creation that are long enough that the reduce attempt gets killed. Subsequent reduce attempts then fail because, according to the exception that bubbles up, the first (killed) attempt is still registered (by the namenode, I assume) as creating the block/file. I didn't see anything like this in JIRA, and I'm trying to grab a few jstacks from the namenode when these errors pop up (usually correlated with a somewhat busy cluster) to get some idea of what is going on here.
> > >
> > > Currently the cluster is small, with about 5 datanodes and tens of TBs, with 2x the namespace files easily fitting in memory. I don't see any process eating more than a couple percent of CPU on the namenode box (which also hosts the secondary namenode). iostat shows 100-200 blocks read/written every other second on this host, leaving plenty of headroom there. The cluster is scheduled to grow in the near future, which may worsen this hang/blocking if it's due to a bottleneck.
> > >
> > > Before I start tracing through the code, I thought I might ask whether anyone has seen anything like the excerpts from the jobtracker logs below. Is there a way to guarantee that all in-progress attempts for a given reduce task are terminated (and any associated network connections sent a reset or something) before a new attempt is started?
> > >
> > > On a kind of side note: is the task attempt name in the JobConf that is handed to the reducer in configure(), and if so, what might the setting name be to get at it? Or does one need to go through a more circuitous route to obtain the TaskAttemptID associated with the attempt?
> > >
> > > Back to the point at hand, from the jobtracker logs:
> > >
> > > Failing initial reduce:
> > > ----------------------------
> > > 2009-09-27 22:24:25,056 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909231347_0694_r_000002_0: java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/X.X.X.2:47440 remote=/X.X.X.2:50010]
> > >     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
> > >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
> > >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
> > >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
> > >     at java.io.DataInputStream.readByte(DataInputStream.java:248)
> > >     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
> > >     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
> > >     at org.apache.hadoop.io.Text.readString(Text.java:400)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2787)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2712)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
> > >
> > > Failing second reduce:
> > > -------------------------------
> > > 2009-09-27 22:53:22,048 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909231347_0694_r_000002_3: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file >blah< for DFSClient_attempt_200909231347_0694_r_000002_3 on client X.X.X.7, because this file is already being created by DFSClient_attempt_200909231347_0694_r_000002_0 on X.X.X.2
> > >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1085)
> > >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:998)
> > >     at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:301)
> > >     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> > >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> > >
> > >     at org.apache.hadoop.ipc.Client.call(Client.java:697)
> > >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> > >     at $Proxy1.create(Unknown Source)
> > >     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> > >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> > >     at $Proxy1.create(Unknown Source)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2594)
> > >     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
> > >     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:188)
> > >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
> > >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:468)
> > >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:375)
> > >
> > > Many thanks...
> > >
> > > dave bayer
> >
> > --
> > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > http://www.amazon.com/dp/1430219424?tag=jewlerymall
> > www.prohadoopbook.com a community for Hadoop Professionals
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani,
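Chandra's failure mode can be illustrated with a toy model. This is not HDFS code: `LeaseTable` is a hypothetical stand-in for the namenode's rule that only one client may be creating a given path at a time, and the two naming functions are illustrative. With one fixed output name, every reducer after the first is refused, which is the situation that surfaces as `AlreadyBeingCreatedException`; giving each reduce partition its own name (here in the `part-00002` style) avoids the collision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Toy model, not HDFS code: remembers which client is currently creating each
// path and refuses a second, different creator of the same path.
class LeaseTable {
    private final Map<String, String> creators = new HashMap<>();

    /** Returns true if `client` may create `path`, false if another client already holds it. */
    boolean create(String path, String client) {
        String holder = creators.putIfAbsent(path, client);
        return holder == null || holder.equals(client);
    }
}

class SingleFileCollisionDemo {
    // Hypothetical naming schemes for a custom output format.
    static String fixedName(int partition) {
        return "/out/result"; // same name for every reducer
    }

    static String perPartitionName(int partition) {
        return String.format("/out/part-%05d", partition); // one name per reduce partition
    }

    /** Simulates `reducers` reducers that each keep their output file open; counts refused creates. */
    static int refusedCreates(int reducers, IntFunction<String> namer) {
        LeaseTable namenode = new LeaseTable();
        int refused = 0;
        for (int r = 0; r < reducers; r++) {
            if (!namenode.create(namer.apply(r), "DFSClient_reduce_" + r)) {
                refused++; // in HDFS this would surface as AlreadyBeingCreatedException
            }
        }
        return refused;
    }

    public static void main(String[] args) {
        System.out.println(refusedCreates(1, SingleFileCollisionDemo::fixedName));        // 0
        System.out.println(refusedCreates(4, SingleFileCollisionDemo::fixedName));        // 3
        System.out.println(refusedCreates(4, SingleFileCollisionDemo::perPartitionName)); // 0
    }
}
```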
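Jason's question can be answered empirically with a crude probe run on a datanode while the job is active. This is a sketch under assumptions: the directory argument is a placeholder you would point at a scratch directory on the same disk as the datanode's block storage (the `dfs.data.dir` setting), and it only times plain file creation through `java.nio`, not the datanode's own code path. If the slowest create approaches the client-side timeout seen in the logs, that would lend weight to Jason's explanation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Crude latency probe: how long does creating an empty file take in a given
// directory right now? The directory is supplied by the caller (an assumption).
class CreateLatencyProbe {
    /** Creates and deletes `count` empty files in `dir`; returns the slowest create in milliseconds. */
    static double worstCreateMillis(Path dir, int count) throws IOException {
        long worstNanos = 0;
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            Path f = Files.createTempFile(dir, "probe-", ".tmp");
            worstNanos = Math.max(worstNanos, System.nanoTime() - start);
            Files.delete(f); // clean up so the probe leaves nothing behind
        }
        return worstNanos / 1_000_000.0;
    }

    public static void main(String[] args) throws IOException {
        // Pass a scratch directory on the datanode's data disk; defaults to a temp dir.
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("probe");
        System.out.printf("slowest of 100 creates: %.3f ms%n", worstCreateMillis(dir, 100));
    }
}
```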
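On dave's side question: we believe the old `mapred` API places the attempt ID in the JobConf handed to `configure(JobConf job)` under the key `mapred.task.id`, and that `org.apache.hadoop.mapred.TaskAttemptID.forName(String)` parses it; treat both as assumptions to check against 0.19.2. The sketch below is a hypothetical, dependency-free parser for the ID format visible in the jobtracker log. It shows that the two failing attempts are the same reduce task (`000002`) with different attempt numbers (`_0` and `_3`), which is what an attempt-unique file name would need to include.

```java
// Hypothetical, dependency-free parser for old-style attempt ids as they appear
// in the jobtracker log. Hadoop's TaskAttemptID.forName(String) is the real equivalent.
final class AttemptIdParts {
    final String jobTrackerId; // e.g. "200909231347"
    final int jobNumber;       // e.g. 694
    final boolean isReduce;    // 'r' for reduce, 'm' for map
    final int taskNumber;      // e.g. 2: which reduce task
    final int attemptNumber;   // 0 for the first attempt, 3 for the fourth

    private AttemptIdParts(String jt, int job, boolean reduce, int task, int attempt) {
        this.jobTrackerId = jt;
        this.jobNumber = job;
        this.isReduce = reduce;
        this.taskNumber = task;
        this.attemptNumber = attempt;
    }

    /** Parses "attempt_<jt>_<job>_<m|r>_<task>_<attempt>"; rejects anything else. */
    static AttemptIdParts parse(String id) {
        String[] p = id.split("_");
        if (p.length != 6 || !p[0].equals("attempt")
                || !(p[3].equals("m") || p[3].equals("r"))) {
            throw new IllegalArgumentException("not an attempt id: " + id);
        }
        return new AttemptIdParts(p[1], Integer.parseInt(p[2]), p[3].equals("r"),
                Integer.parseInt(p[4]), Integer.parseInt(p[5]));
    }

    /** A file name unique to this attempt (the naming scheme is illustrative). */
    String attemptScopedName(String base) {
        return String.format("%s-%s-%05d-%d", base, isReduce ? "r" : "m", taskNumber, attemptNumber);
    }

    public static void main(String[] args) {
        AttemptIdParts first = parse("attempt_200909231347_0694_r_000002_0");
        AttemptIdParts retry = parse("attempt_200909231347_0694_r_000002_3");
        System.out.println(first.attemptScopedName("blah")); // blah-r-00002-0
        System.out.println(retry.attemptScopedName("blah")); // blah-r-00002-3
    }
}
```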
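The 69000 ms in the first failure can be decomposed under an assumption we have not checked against the 0.19.2 source: that `DFSClient`, while setting up a block write pipeline, waits for the first datanode's reply with a read timeout of `dfs.socket.timeout` (believed to default to 60000 ms) plus 3000 ms per datanode in the pipeline. Under that assumption the observed value matches a three-datanode pipeline. The trace itself shows the client blocked in `createBlockOutputStream` reading a reply from the datanode at X.X.X.2:50010, the same host as the client.

```java
// Worked arithmetic under stated assumptions (verify against the 0.19.2
// DFSClient source): ack read timeout = dfs.socket.timeout + 3000 ms per datanode.
class PipelineTimeout {
    static final int ASSUMED_SOCKET_TIMEOUT_MS = 60_000;    // assumed dfs.socket.timeout default
    static final int ASSUMED_PER_DATANODE_EXTRA_MS = 3_000; // assumed per-node extension

    static int ackTimeoutMillis(int socketTimeoutMs, int pipelineLength) {
        return socketTimeoutMs + ASSUMED_PER_DATANODE_EXTRA_MS * pipelineLength;
    }

    /** Inverts the formula: the pipeline length that would yield an observed timeout. */
    static int impliedPipelineLength(int observedMs, int socketTimeoutMs) {
        return (observedMs - socketTimeoutMs) / ASSUMED_PER_DATANODE_EXTRA_MS;
    }

    public static void main(String[] args) {
        System.out.println(ackTimeoutMillis(ASSUMED_SOCKET_TIMEOUT_MS, 3));           // 69000
        System.out.println(impliedPipelineLength(69_000, ASSUMED_SOCKET_TIMEOUT_MS)); // 3
    }
}
```

If `dfs.socket.timeout` had been changed on that cluster, the same formula would shift accordingly.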
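In the second failure, the killed attempt `_0` and the retry `_3` both tried to create the same file name, so the retry ran into whatever the namenode still recorded for `_0`. A common way around this is to have each attempt write to a path private to that attempt and move the file to its final name only once the attempt succeeds. Hadoop's output committer works in this spirit (the old API offers `FileOutputFormat.getWorkOutputPath(JobConf)` for such side-effect files). The sketch below shows the pattern on the local filesystem with `java.nio.file`; the `_temporary/<attempt id>` layout is an illustrative assumption, not Hadoop's exact directory scheme.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem sketch of "write to an attempt-private path, promote on success".
// The _temporary/<attemptId> layout is illustrative, not Hadoop's exact scheme.
class AttemptScopedWriter {
    /** Writes `data` under a directory private to `attemptId`; returns the attempt-private path. */
    static Path writeAttemptFile(Path outDir, String name, String attemptId, String data)
            throws IOException {
        Path attemptFile = outDir.resolve("_temporary").resolve(attemptId).resolve(name);
        Files.createDirectories(attemptFile.getParent());
        Files.write(attemptFile, data.getBytes(StandardCharsets.UTF_8));
        return attemptFile;
    }

    /** Called only by an attempt that finished: moves its file to the final name. */
    static Path commit(Path outDir, Path attemptFile, String name) throws IOException {
        // ATOMIC_MOVE: readers see either no file or the complete one.
        return Files.move(attemptFile, outDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        // A killed first attempt leaves a partial file behind but never commits...
        writeAttemptFile(out, "blah", "attempt_200909231347_0694_r_000002_0", "partial");
        // ...and the retry writes to its own path, so the two never collide.
        Path retry = writeAttemptFile(out, "blah", "attempt_200909231347_0694_r_000002_3", "complete");
        Path committed = commit(out, retry, "blah");
        System.out.println(new String(Files.readAllBytes(committed), StandardCharsets.UTF_8));
    }
}
```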
