[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Attachment: HDFS-1353-y20-2.patch Uploading updated patch. We changed the fix a bit to not bump the RPC protocol version since it's a minor fix. Not for commit to Apache. Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.22.0 Attachments: Benchmarking results.xlsx, HDFS-1353-optmized-wire-not-to-be-committed.patch, HDFS-1353-y20-2.patch, HDFS-1353-y20.patch, HDFS-1353.patch This description is not valid. See comment. HDFS-1081 optimized the number of block access tokens (BATs) created in a single call to getBlockLocations, as this is an expensive operation. However, that JIRA put off another optimization which was then made possible, which is to just send a single block access token across the wire (and maintain a single BAT on the client side). This JIRA is for implementing that optimization. Since a single BAT is generated for all the blocks, we just write that single BAT to the wire, rather than writing n BATs for n blocks, as is currently done. This turns out to be a useful optimization for files with very large numbers of blocks, as the new lone BAT is much larger than was a BAT previously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
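To make the wire-format idea in the description concrete, here is a minimal sketch comparing the two layouts it contrasts: the same token written once per block versus written once for the whole response. The class name, method names, and byte layout are assumptions made up for illustration; they are not Hadoop's actual classes or RPC format, and the issue's own comment later explains that this optimization was not committed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative only: hypothetical layout, not Hadoop's real wire format. */
final class BlockTokenWireSketch {

    /** Size in bytes when the token is repeated alongside every block id. */
    static int perBlockTokenBytes(long[] blockIds, byte[] token) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(blockIds.length);      // block count
            for (long id : blockIds) {
                out.writeLong(id);              // block id
                out.writeInt(token.length);     // token length
                out.write(token);               // token bytes, once per block
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for an in-memory stream
        }
        return buf.size();
    }

    /** Size in bytes when the shared token is written once, then the block ids. */
    static int singleTokenBytes(long[] blockIds, byte[] token) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(token.length);         // token length
            out.write(token);                   // token bytes, once in total
            out.writeInt(blockIds.length);      // block count
            for (long id : blockIds) {
                out.writeLong(id);              // block id
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        long[] ids = new long[1000];            // 1,000 blocks (ids left as zero)
        byte[] token = new byte[200];           // assumed 200-byte combined token
        System.out.println("per-block layout:    " + perBlockTokenBytes(ids, token));
        System.out.println("single-token layout: " + singleTokenBytes(ids, token));
    }
}
```

With the assumed 1,000 blocks and 200-byte token, the per-block layout comes to 212,004 bytes and the single-token layout to 8,208 bytes. The saving grows with the block count, which matches the description's point about files with very large numbers of blocks.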
[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Summary: Remove most of getBlockLocation optimization (was: Optimize number of block access tokens returned by getBlockLocations) Description: This description is not valid. See comment. HDFS-1081 optimized the number of block access tokens (BATs) created in a single call to getBlockLocations, as this is an expensive operation. However, that JIRA put off another optimization which was then made possible, which is to just send a single block access token across the wire (and maintain a single BAT on the client side). This JIRA is for implementing that optimization. Since a single BAT is generated for all the blocks, we just write that single BAT to the wire, rather than writing n BATs for n blocks, as is currently done. This turns out to be a useful optimization for files with very large numbers of blocks, as the new lone BAT is much larger than was a BAT previously. (was: HDFS-1081 optimized the number of block access tokens (BATs) created in a single call to getBlockLocations, as this is an expensive operation. However, that JIRA put off another optimization which was then made possible, which is to just send a single block access token across the wire (and maintain a single BAT on the client side). This JIRA is for implementing that optimization. Since a single BAT is generated for all the blocks, we just write that single BAT to the wire, rather than writing n BATs for n blocks, as is currently done. This turns out to be a useful optimization for files with very large numbers of blocks, as the new lone BAT is much larger than was a BAT previously.)
While benchmarking this new patch, originally an addendum to HDFS-1081, we determined that 1081's original benchmarks were in error. getBlockLocations was not the culprit in the performance degradation. 1081 didn't do any damage to speed, and with this addendum, actually does give some benefit for files with moderate numbers of blocks (see to-be-attached benchmarks). However, since getBL isn't really a slow method, these gains aren't worth the extra complexity they introduce. I'll upload the on-the-wire optimization patch, in case it becomes useful at some point, but I'm going to use this JIRA to roll back most of 1081, excluding some byte-array allocating that we can easily cache. ...sigh.
Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.21.1 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch
[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Attachment: Benchmarking results.xlsx Benchmarks of original patch, which optimized the on-the-wire combined block tokens. Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.21.1 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch
[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Attachment: HDFS-1353-y20.patch Patch for y20. Not for commit here. Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.21.1 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch
[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Attachment: HDFS-1353.patch Patch for trunk and 21. Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.21.1, 0.22.0 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch, HDFS-1353.patch
[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Status: Patch Available (was: Open) Fix Version/s: 0.22.0 Submitting patch. Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.21.1, 0.22.0 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch, HDFS-1353.patch
[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Attachment: HDFS-1353-optmized-wire-not-to-be-committed.patch For completeness' sake, here's the planned optimizations referenced in the spreadsheet... Not to be committed. Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.21.1, 0.22.0 Attachments: Benchmarking results.xlsx, HDFS-1353-optmized-wire-not-to-be-committed.patch, HDFS-1353-y20.patch, HDFS-1353.patch
[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization
[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1353: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Fix Version/s: (was: 0.21.1) Resolution: Fixed I've committed this to trunk. Resolving as fixed. Remove most of getBlockLocation optimization Key: HDFS-1353 URL: https://issues.apache.org/jira/browse/HDFS-1353 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.21.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.22.0 Attachments: Benchmarking results.xlsx, HDFS-1353-optmized-wire-not-to-be-committed.patch, HDFS-1353-y20.patch, HDFS-1353.patch