The default dfs.client-write-packet-size value is 64 KB, at least in my
Hadoop 2 env.
I did a benchmark on it via YCSB, loading 2 million records (3 × 200 bytes each):
1) dfs.client-write-packet-size=64k: YGC count 399, YGCT 4.208 s
2) dfs.client-write-packet-size=8k: YGC count 163, YGCT 2.644 s
As you can see, that's roughly a 40% reduction in young GC time :)
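For reference, the setting is a client-side HDFS property, so it can go in the client's hdfs-site.xml (or the Configuration passed to the DFS client). A sketch of the 8k test setting used above:

```xml
<property>
  <name>dfs.client-write-packet-size</name>
  <value>8192</value>
</property>
```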
The reason: in the DFSOutputStream.Packet class, each "create a new packet"
operation calls "buf = new byte[PacketHeader.PKT_MAX_HEADER_LEN + pktSize];",
where "pktSize" comes from the dfs.client-write-packet-size setting. In the
HBase write scenario we sync the WAL as soon as possible, so all the new
packets are very small (in my YCSB test, most of them were only hundreds of
bytes, or a few kilobytes) and rarely reached 64 KB, so always allocating a
64 KB array is just a waste.
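To make the waste concrete, here is a minimal standalone sketch (not the actual DFSOutputStream code; the class name and the 33-byte header constant are illustrative assumptions) showing that the buffer allocation depends only on the configured packet size, not on the payload actually written:

```java
// Illustrative sketch of the allocation pattern described above.
// PKT_MAX_HEADER_LEN's exact value is an assumption for this demo.
public class PacketAllocDemo {
    static final int PKT_MAX_HEADER_LEN = 33; // stand-in for PacketHeader.PKT_MAX_HEADER_LEN

    // Mimics "buf = new byte[PacketHeader.PKT_MAX_HEADER_LEN + pktSize]":
    // the array size is fixed by pktSize, regardless of payload size.
    static byte[] newPacketBuf(int pktSize) {
        return new byte[PKT_MAX_HEADER_LEN + pktSize];
    }

    public static void main(String[] args) {
        int payload = 600;   // a typical small WAL sync: hundreds of bytes
        int packets = 1000;  // hypothetical number of syncs

        long alloc64k = (long) packets * newPacketBuf(64 * 1024).length;
        long alloc8k  = (long) packets * newPacketBuf(8 * 1024).length;
        long carried  = (long) packets * payload;

        System.out.println("bytes allocated with 64k packets: " + alloc64k);
        System.out.println("bytes allocated with 8k packets:  " + alloc8k);
        System.out.println("payload actually carried:         " + carried);
    }
}
```

With frequent WAL syncs, each of those mostly empty 64 KB arrays becomes short-lived garbage, which is why the young GC counts above drop when the packet size is reduced.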
It would be good to add a note about this to the refguide :)

PS: 8k was just a test setting; it should be tuned according to the real KV size pattern.

Thanks,
