elek opened a new pull request #1336:
URL: https://github.com/apache/hadoop-ozone/pull/1336


   ## What changes were proposed in this pull request?
   
   Teragen was reported to be slow with a low number of mappers compared to HDFS.
   
   In my test (one pipeline, 3 YARN nodes), a 10 GB teragen run took ~3 minutes 
with HDFS but 6 minutes with Ozone. The gap can be closed by using more mappers, 
but when I investigated the execution I found a few problems in the 
BufferPool management.
   
    1. IncrementalChunkBuffer is slow, and it might not be required because 
BufferPool itself is incremental.
    2. bufferPool.allocateBufferIfNeeded is called for each write operation, 
which can be slow (positions have to be calculated).
    3. There is no explicit support for write(byte) operations.
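
To illustrate points 2 and 3, here is a minimal sketch of the general idea, not the code in this patch. The class `LazyBufferPoolSketch` and all of its methods are hypothetical names invented for the illustration and are not part of the Ozone client API. It assumes a pool that allocates buffers incrementally and caches the current buffer, so that the pool bookkeeping runs only when a buffer fills up, and it adds a dedicated single-byte write path.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not the Ozone BufferPool implementation. */
final class LazyBufferPoolSketch {
  private final int bufferSize;
  private final int capacity;
  private final List<ByteBuffer> buffers = new ArrayList<>();
  // Cached reference: the hot path checks this buffer instead of asking the pool.
  private ByteBuffer current;

  LazyBufferPoolSketch(int bufferSize, int capacity) {
    this.bufferSize = bufferSize;
    this.capacity = capacity;
  }

  /** Slow path: runs only when there is no current buffer or it is full. */
  private ByteBuffer nextBuffer() {
    if (buffers.size() == capacity) {
      // Assumption: a real client would flush and reuse buffers at this point.
      throw new IllegalStateException("buffer pool exhausted");
    }
    ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
    buffers.add(buffer);
    return buffer;
  }

  /** Explicit single-byte write: one capacity check and one put per call. */
  void write(int b) {
    if (current == null || !current.hasRemaining()) {
      current = nextBuffer();
    }
    current.put((byte) b);
  }

  /** Bulk write: copies as much as fits, then moves on to the next buffer. */
  void write(byte[] data, int off, int len) {
    while (len > 0) {
      if (current == null || !current.hasRemaining()) {
        current = nextBuffer();
      }
      int n = Math.min(len, current.remaining());
      current.put(data, off, n);
      off += n;
      len -= n;
    }
  }

  /** Number of buffers allocated so far. */
  int allocatedBuffers() {
    return buffers.size();
  }

  /** Total bytes written, computed from the buffer positions. */
  long bytesWritten() {
    long total = 0;
    for (ByteBuffer buffer : buffers) {
      total += buffer.position();
    }
    return total;
  }
}
```

With a buffer size of 4 and a capacity of 3, writing one byte followed by a 5-byte array allocates only two buffers, and the allocation path is entered twice rather than once per write call.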
   
   In the 
[flamegraphs](https://github.com/elek/ozone-notes/tree/master/profiles) it's 
clearly visible that with a low number of mappers the client is busy with buffer 
operations. After the patch, the RPC call and the checksum calculation account 
for the majority of the time.
   
   Overall write performance improves by at least 30% when a minimal number 
of threads/mappers is used. 
   
   ## Thanks
   
   Special thanks to @lokeshj1703, who helped me find the small mistakes in the 
original version of the patch.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4119
   
   ## How was this patch tested?
   
   Teragen with 10 GB / 100 GB of data and 2 / 30 mappers.
   
   (https://github.com/elek/ozone-perf-env/tree/master/teragen-hdfs)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



