[ https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189037#comment-14189037 ]
Tsz Wo Nicholas Sze commented on HDFS-7276:
-------------------------------------------

> ... However, it is unfortunate that our full packet size is 64k + header
> length, which will round up to 128k.

I was wrong about the full packet size.  In DFSOutputStream.computePacketChunkSize(..),
{code}
  private void computePacketChunkSize(int psize, int csize) {
    final int chunkSize = csize + getChecksumSize();
    chunksPerPacket = Math.max(psize/chunkSize, 1);
    packetSize = chunkSize*chunksPerPacket;
    if (DFSClient.LOG.isDebugEnabled()) {
      ...
    }
  }
{code}
So we have the following
|| variables || usual values ||
| psize | dfsClient.getConf().writePacketSize = 64kB |
| csize | bytesPerChecksum = 512B |
| getChecksumSize(), i.e. CRC size | 32B |
| chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
| psize/chunkSize | 120.47 |
| chunksPerPacket = max(psize/chunkSize, 1) | 120 |
| packetSize = chunkSize*chunksPerPacket (not including header) | 65280 |
| PacketHeader.PKT_MAX_HEADER_LEN | 33B |
| actual packet size | 65280 + 33 = *65313* < 65536 = 64k |

Fortunately, the usual actual packet size is 65313 < 64k, although the calculation above does not guarantee it (e.g. if PKT_MAX_HEADER_LEN=257, then the actual packet size would be 65280 + 257 = 65537 > 64k).  I will fix the computation so that the actual packet size is guaranteed to be < 64k.

> Limit the number of byte arrays used by DFSOutputStream
> -------------------------------------------------------
>
>                 Key: HDFS-7276
>                 URL: https://issues.apache.org/jira/browse/HDFS-7276
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h7276_20141021.patch, h7276_20141022.patch,
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch,
> h7276_20141027b.patch, h7276_20141028.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of
> outstanding packets could be large.  The byte arrays created by those packets
> could occupy a lot of memory.
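
For reference, the packet size arithmetic from the comment above can be reproduced with a small standalone sketch. The class PacketSizeSketch and its methods are hypothetical names used only for illustration; they are not part of DFSOutputStream. The boundedBodySize variant is just one possible way to keep the total within psize (reserve the header length before dividing into chunks) and is an assumption, not necessarily what the patch does.

```java
// Standalone sketch of the arithmetic discussed in the comment.
// All names here are hypothetical; this is not the DFSOutputStream code.
final class PacketSizeSketch {

    /** Body size as in the quoted computePacketChunkSize: the header is not considered. */
    static int bodySize(int psize, int csize, int checksumSize) {
        final int chunkSize = csize + checksumSize;
        final int chunksPerPacket = Math.max(psize / chunkSize, 1);
        return chunkSize * chunksPerPacket;
    }

    /**
     * Assumed variant: subtract the maximum header length before dividing,
     * so that body + header does not exceed psize whenever at least one
     * chunk fits into psize - maxHeaderLen.
     */
    static int boundedBodySize(int psize, int csize, int checksumSize, int maxHeaderLen) {
        final int chunkSize = csize + checksumSize;
        final int chunksPerPacket = Math.max((psize - maxHeaderLen) / chunkSize, 1);
        return chunkSize * chunksPerPacket;
    }

    public static void main(String[] args) {
        // Values taken from the table in the comment.
        final int psize = 64 * 1024;
        final int csize = 512;
        final int checksumSize = 32;

        final int body = bodySize(psize, csize, checksumSize);
        System.out.println(body);        // 65280
        System.out.println(body + 33);   // 65313, below 65536
        System.out.println(body + 257);  // 65537, above 65536

        // With the header reserved up front, the total stays within psize.
        final int bounded = boundedBodySize(psize, csize, checksumSize, 257);
        System.out.println(bounded + 257); // 64993
    }
}
```

With the values from the table, both variants yield the same 120 chunks per packet for a 33B header; they differ only when the header is large enough to push the total past psize.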