[ https://issues.apache.org/jira/browse/HDFS-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan updated HDFS-1081:
------------------------------

    Attachment: HDFS-1081-trunk.patch

Patch for trunk. Basically the same as the 20S patch, but with modifications for the new BlockManager. We've been running the 20 patch in production and it's good.

This optimization was benchmarked on a 5 DN cluster, using the (to-be-attached) script to measure call times with and without the patch on trunk.

Results: round-trip times for the getBlockLocations call (in milliseconds) for files with the specified number of blocks, across 100 calls for each number of blocks.

*without patch*
|| ||1 block||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|2.20|2.07|2.06|2.05|2.05|2.01|2.01|2.01|2.03|2.07|42.23|4.05|7.02|16.08|30.47|50.57|
|median|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|2.00|42.00|4.00|7.00|16.00|28.00|49.50|
|std dev|1.54|0.38|0.28|0.36|0.26|0.10|0.10|0.10|0.17|0.70|1.24|0.33|0.32|1.04|27.00|5.04|

*with patch*
|| ||1 block||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|1.15|1.01|1.02|1.09|1.00|1.01|1.00|1.01|1.00|1.01|40.76|2.00|3.97|11.61|25.61|88.02|
|median|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|41.00|2.00|4.00|10.00|24.00|71.00|
|std dev|1.22|0.10|0.14|0.90|0.00|0.10|0.00|0.10|0.00|0.10|0.67|0.00|1.33|8.16|6.38|115.07|

*raw difference: patched time minus unpatched time, in milliseconds (negative numbers are better)*
|| ||1 block||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|-1.05|-1.06|-1.04|-0.96|-1.05|-1.00|-1.01|-1.00|-1.03|-1.06|-1.47|-2.05|-3.05|-4.47|-4.86|37.45|
|median|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-1.00|-2.00|-3.00|-6.00|-4.00|21.50|
|std dev|-0.33|-0.28|-0.14|0.54|-0.26|0.00|-0.10|0.00|-0.17|-0.60|-0.57|-0.33|1.01|7.12|-20.62|110.03|

*% difference: time the patched call took as a fraction of the unpatched time (below 1.00 is better)*
|| ||1 block||2 blocks||3 blocks||4 blocks||5 blocks||6 blocks||7 blocks||8 blocks||9 blocks||10 blocks||20 blocks||50 blocks||100 blocks||250 blocks||500 blocks||1000 blocks||
|mean|0.52|0.49|0.50|0.53|0.49|0.50|0.50|0.50|0.49|0.49|0.97|0.49|0.57|0.72|0.84|1.74|
|median|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.50|0.98|0.50|0.57|0.63|0.86|1.43|
|std dev|0.79|0.26|0.51|2.51|0.00|1.00|0.00|1.00|0.00|0.14|0.54|0.00|4.19|7.83|0.24|22.84|

For files with 1-100 blocks we roughly cut the time in half (the ratio rises to 0.57 at 100 blocks). At 20 blocks I see a big spike in processing time (about 41-42 ms, versus 1-4 ms at the neighboring sizes), but it appears in both the patched and unpatched versions, so the ratio there is 0.97. I'm not sure what's causing this; it warrants looking into.

This patch saves a lot of time on the NN CPU by doing the big calculation only once, but it could currently do better on network usage. This starts to show up at 250+ blocks, where we're sending more and more data, and it eventually overwhelms the CPU savings: at 1000 blocks the patched call is slower (mean 88.02 ms versus 50.57 ms). Files with 250+ blocks are exceedingly rare in HDFS, and this case can also be improved; I'll open another JIRA to optimize it.

I think the data support this particular optimization. Patch is ready for review.

> Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-1081
>                 URL: https://issues.apache.org/jira/browse/HDFS-1081
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: security
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: HADOOP-1081-Y20-1.patch, HADOOP-1081-Y20-2.patch, HDFS-1081-trunk.patch
>
>
> We've seen a significant decrease in the performance of
> DistributedFileSystem::getFileBlockLocations() with security turned on Y20.
> This JIRA is for correcting and tracking it both on Y20 and trunk.
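
For readers who want to reproduce the summary rows in the benchmark tables in the comment above, here is a minimal Python sketch of the arithmetic. It is not the benchmark script referred to in the comment (that script is not shown here); the function names `summarize` and `compare` are hypothetical, and the use of the sample standard deviation is an assumption, since the comment does not say which estimator was used.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize one batch of round-trip timings (in milliseconds)."""
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        # Assumption: sample standard deviation; the original script may
        # have used the population form instead.
        "std dev": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
    }


def compare(unpatched: Dict[str, float],
            patched: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Build the 'raw difference' and ratio rows from two summaries."""
    # Raw difference: patched minus unpatched, so negative is better.
    raw = {k: patched[k] - unpatched[k] for k in unpatched}
    # Ratio: patched time as a fraction of unpatched time (below 1 is better).
    ratio = {
        k: (patched[k] / unpatched[k]) if unpatched[k] else float("nan")
        for k in unpatched
    }
    return {"raw difference": raw, "ratio": ratio}


if __name__ == "__main__":
    # Summary values copied from the 50-block column of the tables above.
    unpatched_50 = {"mean": 4.05, "median": 4.00, "std dev": 0.33}
    patched_50 = {"mean": 2.00, "median": 2.00, "std dev": 0.00}
    result = compare(unpatched_50, patched_50)
    for row, values in result.items():
        for stat, value in values.items():
            print(f"{row:>14} {stat:>8}: {value:.2f}")
```

Run on the 50-block column, this gives the same raw-difference and ratio values as the tables. Some other columns can differ in the last digit, presumably because the published figures were computed from unrounded timings.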