[ https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189037#comment-14189037 ]

Tsz Wo Nicholas Sze commented on HDFS-7276:
-------------------------------------------

> ... However, it is unfortunate that our full packet size is 64k + header
> length, which will round up to 128k.

I was wrong about the full packet size. In DFSOutputStream.computePacketChunkSize(..):
{code}
  private void computePacketChunkSize(int psize, int csize) {
    final int chunkSize = csize + getChecksumSize();
    chunksPerPacket = Math.max(psize/chunkSize, 1);
    packetSize = chunkSize*chunksPerPacket;
    if (DFSClient.LOG.isDebugEnabled()) {
      // ... (debug logging elided)
    }
  }
{code}
So we have the following
|| variables || usual values ||
| psize | dfsClient.getConf().writePacketSize = 64kB |
| csize | bytesPerChecksum = 512B |
| getChecksumSize(), i.e. CRC size | 4B (a CRC32 checksum is 32 bits) |
| chunkSize = csize + getChecksumSize() | 516B (not a power of two) |
| psize/chunkSize | 127.01 |
| chunksPerPacket = max(psize/chunkSize, 1) | 127 |
| packetSize = chunkSize*chunksPerPacket (not including header) | 65532 |
| PacketHeader.PKT_MAX_HEADER_LEN | 33B |
| actual packet size | 65532 + 33 = *65565* > 65536 = 64k |
Although packetSize = 65532 < 64k, the actual packet size 65565 still exceeds 64k, since the calculation above does not take the header into account (any PKT_MAX_HEADER_LEN > 4 would push it over).  I will fix the computation in order to guarantee that the actual packet size does not exceed 64k.
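A minimal, self-contained Java sketch of the packet size arithmetic, for readers who want to rerun the numbers. The class PacketSizeDemo and its methods are hypothetical: packetSizeOriginal mirrors the formula in computePacketChunkSize(..) above, while packetSizeHeaderAware is one assumed way to get the guarantee (reserve PacketHeader.PKT_MAX_HEADER_LEN = 33 bytes before packing chunks), not necessarily what the eventual patch does. roundUpToPowerOfTwo is also an assumption, modelling an allocator that rounds buffers up to a power of two, as the quoted "round up to 128k" remark implies. main passes a 4-byte checksum, the size of a CRC32 or CRC32C checksum.

{code}
/**
 * Hypothetical sketch of the packet size arithmetic discussed above.
 * packetSizeOriginal follows the formula in computePacketChunkSize(..);
 * packetSizeHeaderAware is an assumed fix that reserves room for the header.
 */
class PacketSizeDemo {
  // Value of PacketHeader.PKT_MAX_HEADER_LEN quoted in the comment.
  static final int PKT_MAX_HEADER_LEN = 33;

  /** Packet body size when chunks are packed into psize, ignoring the header. */
  static int packetSizeOriginal(int psize, int csize, int checksumSize) {
    final int chunkSize = csize + checksumSize;
    final int chunksPerPacket = Math.max(psize / chunkSize, 1);
    return chunkSize * chunksPerPacket;
  }

  /** Packet body size when chunks are packed into psize minus the maximum header length. */
  static int packetSizeHeaderAware(int psize, int csize, int checksumSize) {
    final int bodySize = psize - PKT_MAX_HEADER_LEN;
    final int chunkSize = csize + checksumSize;
    final int chunksPerPacket = Math.max(bodySize / chunkSize, 1);
    return chunkSize * chunksPerPacket;
  }

  /** Smallest power of two >= n, for 0 < n <= 2^30 (models a power-of-two allocator). */
  static int roundUpToPowerOfTwo(int n) {
    final int highest = Integer.highestOneBit(n);
    return highest == n ? n : highest << 1;
  }

  /** Prints the body size, the size including the header, and the rounded-up buffer size. */
  static void report(String label, int packetSize) {
    final int actual = packetSize + PKT_MAX_HEADER_LEN;
    System.out.println(label + ": packetSize=" + packetSize
        + ", actual=" + actual + ", roundedUp=" + roundUpToPowerOfTwo(actual));
  }

  public static void main(String[] args) {
    final int psize = 64 * 1024; // dfsClient.getConf().writePacketSize
    final int csize = 512;       // bytesPerChecksum
    final int checksumSize = 4;  // a CRC32/CRC32C checksum is 32 bits = 4 bytes

    // prints "original: packetSize=65532, actual=65565, roundedUp=131072"
    report("original", packetSizeOriginal(psize, csize, checksumSize));
    // prints "header-aware: packetSize=65016, actual=65049, roundedUp=65536"
    report("header-aware", packetSizeHeaderAware(psize, csize, checksumSize));
  }
}
{code}

Note that because of Math.max(..., 1), both variants still emit one chunk when a single chunk does not fit in the available space, so the bound only holds when bytesPerChecksum plus the checksum size is well below 64k.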

> Limit the number of byte arrays used by DFSOutputStream
> -------------------------------------------------------
>
>                 Key: HDFS-7276
>                 URL: https://issues.apache.org/jira/browse/HDFS-7276
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch, h7276_20141028.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of
> outstanding packets could be large.  The byte arrays created by those packets
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
