[ 
https://issues.apache.org/jira/browse/HBASE-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398978#comment-16398978
 ] 

BELUGA BEHR commented on HBASE-20197:
-------------------------------------

For me, the interesting result of the micro-benchmarking is a nice speed-up for a "large" 
chunk.  I don't know why this is, but my first theory would be that the JVM is 
faster at allocating buffers that fall on even boundaries of 1K, 2K, 4K, etc. than 
it is at allocating an "arbitrary" size that does not fall nicely on a boundary.
{code}
BenchmarkByteBufferOutputStream.testWriteByteBufferChunkingLarge  thrpt  200  5526553.389 ± 31657.753  ops/s
BenchmarkByteBufferOutputStream.testWriteByteBufferLarge          thrpt  200  1793877.918 ±  6022.621  ops/s
{code}

> Review of ByteBufferWriterOutputStream.java
> -------------------------------------------
>
>                 Key: HBASE-20197
>                 URL: https://issues.apache.org/jira/browse/HBASE-20197
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbase
>    Affects Versions: 2.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HBASE-20197.1.patch, HBASE-20197.2.patch
>
>
> In looking at this class, two things caught my eye.
>  # Default buffer size of 4K
>  # Re-sizing of buffer on demand
>  
> Java's {{BufferedOutputStream}} uses an internal buffer size of 8K on modern 
> JVMs.  This is due to various benchmarking that showed optimal performance 
> at this level.
>  The re-sizing of the buffer looks a bit "unsafe":
>  
> {code:java}
> public void write(ByteBuffer b, int off, int len) throws IOException {
>   byte[] buf = null;
>   if (len > TEMP_BUF_LENGTH) {
>     buf = new byte[len];
>   } else {
>     if (this.tempBuf == null) {
>       this.tempBuf = new byte[TEMP_BUF_LENGTH];
>     }
>     buf = this.tempBuf;
>   }
> ...
> }
> {code}
> If this method gets one call with a 'len' of 4000, then 4001, then 4002, then 
> 4003, etc. then the 'tempBuf' will be re-created many times.  Also, it seems 
> unsafe to create a buffer as large as the 'len' input.  This could 
> theoretically lead to an internal buffer of 2GB for each instance of this 
> class.
> I propose:
>  # Increase the default buffer size to 8K
>  # Create the buffer once and chunk the output instead of loading data into a 
> single array and writing it to the output stream.
>  
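
The second proposal above (allocate the buffer once and chunk the output) can be sketched roughly as follows. This is a minimal illustration only, not the code from the attached patches: the class name {{ChunkedByteBufferWriter}}, its constructors, and the use of {{ByteBuffer.duplicate()}} are assumptions made for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

/** Hypothetical sketch of a chunking writer; not the actual HBase class. */
public class ChunkedByteBufferWriter {
  // 8K default, matching the size proposed in the issue description.
  private static final int DEFAULT_BUF_LENGTH = 8 * 1024;

  private final OutputStream out;
  // Allocated exactly once; its size never depends on the 'len' of a write.
  private final byte[] buf;

  public ChunkedByteBufferWriter(OutputStream out) {
    this(out, DEFAULT_BUF_LENGTH);
  }

  public ChunkedByteBufferWriter(OutputStream out, int bufLength) {
    if (bufLength <= 0) {
      throw new IllegalArgumentException("bufLength must be positive");
    }
    this.out = out;
    this.buf = new byte[bufLength];
  }

  /**
   * Writes len bytes of b, starting at absolute offset off, to the
   * underlying stream in chunks of at most buf.length bytes.
   * The position and limit of b itself are left untouched.
   */
  public void write(ByteBuffer b, int off, int len) throws IOException {
    // Work on a duplicate so the caller's position/limit are not modified.
    ByteBuffer view = b.duplicate();
    view.limit(off + len);
    view.position(off);
    while (view.hasRemaining()) {
      int chunk = Math.min(view.remaining(), buf.length);
      view.get(buf, 0, chunk);   // copy the next chunk into the fixed buffer
      out.write(buf, 0, chunk);  // and hand it to the output stream
    }
  }
}
```

With this shape, a write of any 'len' reuses the same fixed-size array, so memory per instance stays bounded by the configured buffer length rather than by the largest write seen.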



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
