[ https://issues.apache.org/jira/browse/HBASE-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400174#comment-16400174 ]
Anoop Sam John commented on HBASE-20197: ---------------------------------------- bq.if (len > TEMP_BUF_LENGTH) { Hmm that is bad.. We need the check to be changed against the current buf length. Why the ByteBufferUtils.copyFromBufferToArray is changed? We use the Unsafe way of copy throughout code base. The buf should be lazily created. This is with intent been done. write(ByteBuffer b, int off, int len) will get called iff the cell is backed by DBB. This is the not the default case any way. So if we create lazily, we can avoid the buf of 4KB/8 KB creation with every WAL codec. When it is DBB, we will call the write with a buf of size = cell's size. (key+ value). Ideally we might not have this large sized cells. Like 8 KB. We thought 4 KB is good enough. The chunking way of write looks nice. Generally speaking the 8 KB a good one. But here I guess we can continue with 4 KB but have the chunking way Comments 1. We need the lazy initialization 2. 4 KB to 8 KB change is not needed 3. Go with chunking writes but make use of the BBUtils API > Review of ByteBufferWriterOutputStream.java > ------------------------------------------- > > Key: HBASE-20197 > URL: https://issues.apache.org/jira/browse/HBASE-20197 > Project: HBase > Issue Type: Improvement > Components: hbase > Affects Versions: 2.0.0 > Reporter: BELUGA BEHR > Assignee: BELUGA BEHR > Priority: Minor > Attachments: HBASE-20197.1.patch, HBASE-20197.2.patch, > HBASE-20197.3.patch > > > In looking at this class, two things caught my eye. > # Default buffer size of 4K > # Re-sizing of buffer on demand > > Java's {{BufferedOutputStream}} uses an internal buffer size of 8K on modern > JVMs. This is due to various bench-marking that showed optimal performance > at this level. > The Re-sizing buffer looks a bit "unsafe": > > {code:java} > public void write(ByteBuffer b, int off, int len) throws IOException { > byte[] buf = null; > if (len > TEMP_BUF_LENGTH) { > buf = new byte[len]; > } else { > if (this.tempBuf == null) { > this.tempBuf = new byte[TEMP_BUF_LENGTH]; > } > buf = this.tempBuf; > } > ... > } > {code} > If this method gets one call with a 'len' of 4000, then 4001, then 4002, then > 4003, etc. then the 'tempBuf' will be re-created many times. Also, it seems > unsafe to create a buffer as large as the 'len' input. This could > theoretically lead to an internal buffer of 2GB for each instance of this > class. > I propose: > # Increase the default buffer size to 8K > # Create the buffer once and chunk the output instead of loading data into a > single array and writing it to the output stream. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)