[ https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-1081:
------------------------------

    Attachment: HDFS-1081-trunk.patch

Patch for trunk. Essentially the same as the 20S patch, with modifications for 
the new BlockManager.  We've been running the 20 patch in production and it has 
worked well.

This optimization was benchmarked on a 5 DN cluster using the (to-be-attached) 
script, which measures call time on trunk with and without the patch.  
Results:
Round-trip times (in milliseconds) for the getBlockLocations call on files with 
the specified number of blocks, measured over 100 calls per block count.
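
As a rough illustration of the measurement described above, here is a minimal 
sketch of a timing harness. It is not the attached script: `time_calls`, 
`summarize`, and the `call` parameter are hypothetical names, `call` stands in 
for whatever issues the getBlockLocations request, and the use of the sample 
standard deviation is an assumption.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(call: Callable[[], object], repetitions: int = 100) -> List[float]:
    """Invoke `call` repeatedly and return each round-trip time in milliseconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        call()  # hypothetical stand-in for one getBlockLocations request
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Reduce one batch of timings to the three statistics reported per column."""
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        # Assumption: sample standard deviation; the original script may differ.
        "std dev": statistics.stdev(samples_ms),
    }
```

Running `summarize(time_calls(some_call))` once per block count would yield one 
column of a table like the ones that follow.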

*Without patch*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|2.20|2.07|2.06|2.05|2.05|2.01|2.01|2.01|2.03|2.07|42.23|4.05|7.02|16.08|30.47|50.57|
|median|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|42.00|4.00|7.00|16.00|28.00|49.50|
|std dev|1.54|0.38|0.28|0.36|0.26|0.10|0.10|0.10|0.17|0.70|1.24|0.33|0.32|1.04|27.00|5.04|

*With patch*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|1.15|1.01|1.02|1.09|1.00|1.01|1.00|1.01|1.00|1.01|40.76|2.00|3.97|11.61|25.61|88.02|
|median|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|41.00|2.00|4.00|10.00|24.00|71.00|
|std dev|1.22|0.10|0.14|0.90|0.00|0.10|0.00|0.10|0.00|0.10|0.67|0.00|1.33|8.16|6.38|115.07|

*Raw difference: patched time minus unpatched time, in milliseconds (negative numbers are better)*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|-1.05|-1.06|-1.04|-0.96|-1.05|-1.00|-1.01|-1.00|-1.03|-1.06|-1.47|-2.05|-3.05|-4.47|-4.86|37.45|
|median|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-2.00|-3.00|-6.00|-4.00|21.50|
|std dev|-0.33|-0.28|-0.14|0.54|-0.26|0.00|-0.10|0.00|-0.17|-0.60|-0.57|-0.33|1.01|7.12|-20.62|110.03|

*Ratio: patched time divided by unpatched time (values below 1.00 are better)*
|| ||1 blocks||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|0.52|0.49|0.50|0.53|0.49|0.50|0.50|0.50|0.49|0.49|0.97|0.49|0.57|0.72|0.84|1.74|
|median|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.98|0.50|0.57|0.63|0.86|1.43|
|std dev|0.79|0.26|0.51|2.51|0.00|1.00|0.00|1.00|0.00|0.14|0.54|0.00|4.19|7.83|0.24|22.84|
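
The mean rows of the last two tables can be reproduced from the mean rows of 
the first two. The sketch below is only an illustration of that arithmetic, 
not the script used for the benchmark; `compare` is a hypothetical helper, and 
the inputs are the rounded values from the tables, so other rows computed this 
way may differ in the last digit.

```python
from typing import Dict, List, Tuple

BLOCK_COUNTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 250, 500, 1000]

# Mean round-trip times in milliseconds, copied from the tables above.
WITHOUT_MEAN = [2.20, 2.07, 2.06, 2.05, 2.05, 2.01, 2.01, 2.01,
                2.03, 2.07, 42.23, 4.05, 7.02, 16.08, 30.47, 50.57]
WITH_MEAN = [1.15, 1.01, 1.02, 1.09, 1.00, 1.01, 1.00, 1.01,
             1.00, 1.01, 40.76, 2.00, 3.97, 11.61, 25.61, 88.02]


def compare(blocks: List[int], without: List[float],
            with_patch: List[float]) -> Dict[int, Tuple[float, float]]:
    """Map each block count to (patched - unpatched, patched / unpatched)."""
    result = {}
    for n, unpatched, patched in zip(blocks, without, with_patch):
        difference = round(patched - unpatched, 2)  # negative means faster
        ratio = round(patched / unpatched, 2)       # below 1.00 means faster
        result[n] = (difference, ratio)
    return result


rows = compare(BLOCK_COUNTS, WITHOUT_MEAN, WITH_MEAN)
# Block counts at which the patched call was slower on average.
slower = [n for n, (_, ratio) in rows.items() if ratio > 1.0]
```

With these inputs, `slower` contains only the 1000-block case, which is the 
one column where the patched mean exceeds the unpatched mean.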

For files with 1-100 blocks we cut the time roughly in half (mean ratios of 
0.49-0.57), with the exception of the 20-block case.  

At 20 blocks I see a big spike in processing time (a mean of about 42 ms 
without the patch and 41 ms with it), so it occurs in both the patched and 
unpatched versions.  I'm not sure what's causing this; it warrants looking into.  

This patch saves a lot of NN CPU time by doing the big calculation only once, 
but it could currently make better use of the network.  This starts to show up 
at 250+ blocks, where we're sending more and more data: the mean ratio rises to 
0.72 at 250 blocks and 0.84 at 500, and at 1000 blocks the network cost 
overwhelms the CPU savings (mean of 88.02 ms with the patch vs. 50.57 ms 
without).  Files with 250+ blocks are exceedingly rare in HDFS, and this case 
can be improved as well; I'll open another JIRA to optimize it.

I think the data support this particular optimization.  Patch is ready for 
review.

> Performance regression in DistributedFileSystem::getFileBlockLocations in 
> secure systems
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-1081
>                 URL: https://issues.apache.org/jira/browse/HDFS-1081
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: security
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: HADOOP-1081-Y20-1.patch, HADOOP-1081-Y20-2.patch, 
> HDFS-1081-trunk.patch
>
>
> We've seen a significant decrease in the performance of 
> DistributedFileSystem::getFileBlockLocations() with security turned on in Y20. 
> This JIRA is for correcting and tracking it on both Y20 and trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
