[ https://issues.apache.org/jira/browse/HDFS-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaobing Zhou updated HDFS-11608: --------------------------------- Description: We've seen HDFS write crashes in the case of huge block size. For example, writing a 3G file using 3G block size, HDFS client throws out of memory exception. DataNode gives out IOException. After changing heap size limit, DFSOutputStream ResponseProcessor exception is seen followed by Broken pipe and pipeline recovery. Give below: Client out-of-mem exception, {noformat} 17/03/30 07:13:50 WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1245) at java.lang.Thread.join(Thread.java:1319) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:624) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:592) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hdfs.util.ByteArrayManager$NewByteArrayWithoutLimit.newByteArray(ByteArrayManager.java:308) at org.apache.hadoop.hdfs.DFSOutputStream.createPacket(DFSOutputStream.java:197) at org.apache.hadoop.hdfs.DFSOutputStream.writeChunkImpl(DFSOutputStream.java:1906) at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1884) at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206) at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163) at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144) at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2321) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2303) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244) at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:261) at HdfsWriterOutputStream.run(HdfsWriterOutputStream.java:57) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at HdfsWriterOutputStream.main(HdfsWriterOutputStream.java:77) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} Client ResponseProcessor exception, {noformat} 17/03/30 18:20:12 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1828245847-192.168.64.101-1490851685890:blk_1073741859_1040 java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2293) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:748) 17/03/30 18:22:32 WARN hdfs.DFSClient: DataStreamer Exception java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) at java.io.DataOutputStream.write(DataOutputStream.java:107) at org.apache.hadoop.hdfs.DFSPacket.writeTo(DFSPacket.java:176) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:522) {noformat} DN exception, {noformat} 2017-03-30 16:34:33,828 ERROR datanode.DataNode (DataXceiver.java:run(278)) - c6401.ambari.apache.org:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.64.101:47167 dst: /192.168.64.101:50010 java.io.IOException: Incorrect value for packet payload size: 2147483128 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:502) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:898) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:806) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251) at java.lang.Thread.run(Thread.java:745) {noformat} > HDFS write crashed in the case of huge block size > ------------------------------------------------- > > Key: HDFS-11608 > URL: https://issues.apache.org/jira/browse/HDFS-11608 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client > Affects Versions: 2.8.0 > Reporter: Xiaobing Zhou > Assignee: Xiaobing Zhou > Priority: Critical > > We've seen HDFS write crashes in the case of huge block size. For example, > writing a 3G file using 3G block size, HDFS client throws out of memory > exception. DataNode gives out IOException. After changing heap size limit, > DFSOutputStream ResponseProcessor exception is seen followed by Broken pipe > and pipeline recovery. > Give below: > Client out-of-mem exception, > {noformat} > 17/03/30 07:13:50 WARN hdfs.DFSClient: Caught exception > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1245) > at java.lang.Thread.join(Thread.java:1319) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:624) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:592) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space > at > org.apache.hadoop.hdfs.util.ByteArrayManager$NewByteArrayWithoutLimit.newByteArray(ByteArrayManager.java:308) > at > org.apache.hadoop.hdfs.DFSOutputStream.createPacket(DFSOutputStream.java:197) > at > org.apache.hadoop.hdfs.DFSOutputStream.writeChunkImpl(DFSOutputStream.java:1906) > at > org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1884) > at > org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206) > at > org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163) > at > org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2321) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2303) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) > at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244) > at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:261) > at HdfsWriterOutputStream.run(HdfsWriterOutputStream.java:57) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at HdfsWriterOutputStream.main(HdfsWriterOutputStream.java:77) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Client ResponseProcessor exception, > {noformat} > 17/03/30 18:20:12 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor > exception for block > BP-1828245847-192.168.64.101-1490851685890:blk_1073741859_1040 > java.io.EOFException: Premature EOF: no length prefix available > at > org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2293) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:748) > 17/03/30 18:22:32 WARN hdfs.DFSClient: DataStreamer Exception > java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > at > org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) > at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at org.apache.hadoop.hdfs.DFSPacket.writeTo(DFSPacket.java:176) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:522) > {noformat} > DN exception, > {noformat} > 2017-03-30 16:34:33,828 ERROR datanode.DataNode (DataXceiver.java:run(278)) - > c6401.ambari.apache.org:50010:DataXceiver error processing WRITE_BLOCK > operation src: /192.168.64.101:47167 dst: /192.168.64.101:50010 > java.io.IOException: Incorrect value for packet payload size: 2147483128 > at > org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:502) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:898) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:806) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org