Thanks, Raghu. Here is where it gets stuck:
"DataStreamer for file /hbasetrunk2/.logs/aa0-000-13.u.powerset.com_1241988169615_60021/hlog.dat.1242020985471 block blk_-1659539029802462400_12649" daemon prio=10 tid=0x00007f10a0000c00 nid=0x660 in Object.wait() [0x0000000043a33000..0x0000000043a33c80]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2322)
        - locked <0x00007f10e2b0c588> (a java.util.LinkedList)

That is the wait below, in the middle of DataStreamer.run:

      // Is this block full?
      if (one.lastPacketInBlock) {
        synchronized (ackQueue) {
          while (!hasError && ackQueue.size() != 0 && clientRunning) {
            try {
              ackQueue.wait();   // wait for acks to arrive from datanodes
            } catch (InterruptedException e) {
            }
          }
        }
      }

Sounds like, if we set the replication down from 3 to 2, it should write a little faster.

Regarding increasing the size of the ack queue, are you thinking of maxPackets? Currently it is hardcoded at 80 -- a queue of 5MB (packets are 64k). Do you think I should experiment with that? I suppose that won't help much with getting my writes onto the datanode. Maybe I should be digging on the datanode side to figure out why it is slow getting back to the client?

Thanks,
St.Ack

On Sun, May 10, 2009 at 7:49 PM, Raghu Angadi <rang...@yahoo-inc.com> wrote:
>
> It should not be waiting unnecessarily. But the client has to, if any of
> the datanodes in the pipeline is not able to receive the data as fast as the
> client is writing. IOW, writing goes as fast as the slowest of the nodes
> involved in the pipeline (1 client and 3 datanodes).
>
> But based on what your case is, you probably could benefit by increasing
> the buffer (number of unacked packets). It would depend on where the
> datastream thread is blocked.
>
> Raghu.
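[Editor's note: the block-boundary wait discussed above is a standard two-queue handoff: the streamer parks on the ack queue until a responder thread has drained it, and each ack does a notifyAll. The following is a minimal standalone sketch of that pattern, not the real DFSClient; the class and method names (AckDrain, waitForAllAcks, etc.) are invented for illustration, and interrupts are swallowed in a loop just as in the quoted HDFS code.]

```java
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical stand-in for the DataStreamer/responder handoff:
// the streamer blocks on the ack queue until every packet is acked.
public class AckDrain {
    private final Queue<Integer> ackQueue = new LinkedList<>();

    // Record a packet as sent but not yet acknowledged.
    void send(int seqno) {
        synchronized (ackQueue) { ackQueue.add(seqno); }
    }

    // Mirrors the DataStreamer loop: wait until all acks have arrived.
    void waitForAllAcks() {
        synchronized (ackQueue) {
            while (!ackQueue.isEmpty()) {
                try {
                    ackQueue.wait();   // woken by ackOne()
                } catch (InterruptedException e) {
                }
            }
        }
    }

    // Responder side: one ack arrives; wake the waiting streamer.
    void ackOne() {
        synchronized (ackQueue) {
            ackQueue.poll();
            ackQueue.notifyAll();
        }
    }

    // Number of packets still awaiting acknowledgment.
    int pending() {
        synchronized (ackQueue) { return ackQueue.size(); }
    }

    public static void main(String[] args) throws InterruptedException {
        AckDrain d = new AckDrain();
        for (int i = 0; i < 3; i++) d.send(i);
        Thread responder = new Thread(() -> {
            for (int i = 0; i < 3; i++) d.ackOne();
        });
        responder.start();
        d.waitForAllAcks();   // returns only once the responder drains the queue
        responder.join();
        System.out.println("pending=" + d.pending());   // prints "pending=0"
    }
}
```

The guarded while-loop around wait() matters: a slow responder (e.g. a slow datanode in the pipeline) shows up directly as time parked in this loop, which is exactly what the thread dump above captures.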
>
>
> stack wrote:
>
>> Writing a file, our application spends a load of time here:
>>
>>         at java.lang.Object.wait(Native Method)
>>         at java.lang.Object.wait(Object.java:485)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2964)
>>         - locked <0x00007f11054c2b68> (a java.util.LinkedList)
>>         - locked <0x00007f11054c24c0> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>>         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>>         - locked <0x00007f11054c24c0> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>>         - locked <0x00007f11054c24c0> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>>         at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>>         at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>>         - locked <0x00007f11054c24c0> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>         - locked <0x00007f1105694f28> (a org.apache.hadoop.fs.FSDataOutputStream)
>>         at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1020)
>>         - locked <0x00007f1105694e98> (a org.apache.hadoop.io.SequenceFile$Writer)
>>         at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:984)
>>
>> Here is the code from around line 2964 in writeChunk:
>>
>>       // If queue is full, then wait till we can create enough space
>>       while (!closed && dataQueue.size() + ackQueue.size() > maxPackets) {
>>         try {
>>           dataQueue.wait();
>>         } catch (InterruptedException e) {
>>         }
>>       }
>>
>> The queue of packets is full and we're waiting for it to be cleared.
>>
>> Any suggestions for how I might get the DataStreamer to act more promptly
>> in clearing the packet queue?
>>
>> This is the hadoop 0.20 branch. It's a small cluster, but relatively lightly
>> loaded (so says ganglia).
>>
>> Thanks,
>> St.Ack
>>
>>
>
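[Editor's note: the writeChunk loop quoted above is classic bounded-buffer backpressure: the writer sleeps whenever dataQueue + ackQueue exceed maxPackets, and a packet in flight still counts against the bound until a datanode ack removes it. The sketch below shows that bound in isolation; it is not the real DFSClient. The class name BoundedPacketQueue and the single-lock simplification (one monitor guards both queues) are mine.]

```java
import java.util.LinkedList;
import java.util.Queue;

// Illustrative bounded packet queue mirroring the writeChunk wait:
// the writer blocks while (dataQueue + ackQueue) exceeds maxPackets.
public class BoundedPacketQueue {
    private final Queue<byte[]> dataQueue = new LinkedList<>();
    private final Queue<byte[]> ackQueue = new LinkedList<>();
    private final int maxPackets;

    BoundedPacketQueue(int maxPackets) { this.maxPackets = maxPackets; }

    // Writer side: same shape as the loop around DFSClient.java:2964.
    void enqueue(byte[] packet) {
        synchronized (dataQueue) {
            while (dataQueue.size() + ackQueue.size() > maxPackets) {
                try {
                    dataQueue.wait();   // backpressure: wait for an ack to free a slot
                } catch (InterruptedException e) {
                }
            }
            dataQueue.add(packet);
        }
    }

    // Streamer side: a packet in flight moves from dataQueue to ackQueue,
    // so the total counted against maxPackets does not change yet.
    byte[] take() {
        synchronized (dataQueue) {
            byte[] p = dataQueue.poll();
            if (p != null) ackQueue.add(p);
            return p;
        }
    }

    // Responder side: an ack frees a slot, so wake any blocked writer.
    void ack() {
        synchronized (dataQueue) {
            ackQueue.poll();
            dataQueue.notifyAll();
        }
    }

    // Total packets counted against the bound (queued + unacked).
    int inFlight() {
        synchronized (dataQueue) { return dataQueue.size() + ackQueue.size(); }
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedPacketQueue q = new BoundedPacketQueue(2);
        q.enqueue(new byte[64]);
        q.enqueue(new byte[64]);
        q.enqueue(new byte[64]);   // last add before the bound kicks in
        Thread pipeline = new Thread(() -> {
            q.take();   // packet goes in flight to the "datanode"
            q.ack();    // the ack is what frees a writer slot
        });
        pipeline.start();
        q.enqueue(new byte[64]);   // blocks until the ack above arrives
        pipeline.join();
        System.out.println("inFlight=" + q.inFlight());   // prints "inFlight=3"
    }
}
```

This is why slow acks stall the writer even when the datanode has received the bytes: take() alone never frees space, only ack() does, so digging on the datanode/responder side, as suggested in the thread, is where a stall like this usually originates.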