[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-10-06 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1353:
--

Attachment: HDFS-1353-y20-2.patch

Uploading updated patch.  We changed the fix a bit to not bump the RPC protocol 
version since it's a minor fix.  Not for commit to Apache.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: Benchmarking results.xlsx, 
 HDFS-1353-optmized-wire-not-to-be-committed.patch, HDFS-1353-y20-2.patch, 
 HDFS-1353-y20.patch, HDFS-1353.patch


 This description is not valid. See comment.
 HDFS-1081 optimized the number of block access tokens (BATs) created in a 
 single call to getBlockLocations, as this is an expensive operation.  
 However, that JIRA put off another optimization which was then made possible, 
 which is to just send a single block access token across the wire (and 
 maintain a single BAT on the client side).  This JIRA is for implementing 
 that optimization.  Since a single BAT is generated for all the blocks, we 
 just write that single BAT to the wire, rather than writing n BATs for n 
 blocks, as is currently done.  This turns out to be a useful optimization for 
 files with very large numbers of blocks, as the new lone BAT is much larger 
 than was a BAT previously.
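
For illustration only, the wire-size argument in the description above can be sketched with a toy model. Everything here is an assumption and not the HDFS wire format or API: the token layout (expiry, block count, block IDs, HMAC-SHA1), the `SECRET` key, and the function names are all hypothetical. The sketch only shows why one combined token, although larger than any individual token, can take fewer bytes than n separate tokens.

```python
import hashlib
import hmac
import struct

SECRET = b"demo-secret"  # hypothetical shared key, illustration only


def make_token(block_ids, expiry=0):
    """Build a toy access token covering the given block IDs.

    Assumed layout (not HDFS's real format):
    8-byte expiry, 4-byte block count, 8 bytes per block ID, 20-byte HMAC-SHA1.
    """
    body = struct.pack(">qi", expiry, len(block_ids))
    body += b"".join(struct.pack(">q", b) for b in block_ids)
    mac = hmac.new(SECRET, body, hashlib.sha1).digest()
    return body + mac


def per_block_wire_size(block_ids):
    """Bytes written when every block carries its own token."""
    return sum(len(make_token([b])) for b in block_ids)


def single_token_wire_size(block_ids):
    """Bytes written when one token covers all blocks."""
    return len(make_token(block_ids))


if __name__ == "__main__":
    for n in (1, 10, 1000):
        ids = list(range(n))
        print(n, per_block_wire_size(ids), single_token_wire_size(ids))
```

In this model the fixed fields and the MAC are paid once instead of n times, so the saving grows with the block count. Whether that matters in practice depends on how expensive the call is to begin with.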

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-09-03 Thread Jakob Homan (JIRA)

Jakob Homan updated HDFS-1353:
--

Summary: Remove most of getBlockLocation optimization  (was: Optimize 
number of block access tokens returned by getBlockLocations)
Description: 
This description is not valid. See comment.
HDFS-1081 optimized the number of block access tokens (BATs) created in a 
single call to getBlockLocations, as this is an expensive operation.  However, 
that JIRA put off another optimization which was then made possible, which is 
to just send a single block access token across the wire (and maintain a single 
BAT on the client side).  This JIRA is for implementing that optimization.  
Since a single BAT is generated for all the blocks, we just write that single 
BAT to the wire, rather than writing n BATs for n blocks, as is currently done. 
 This turns out to be a useful optimization for files with very large numbers 
of blocks, as the new lone BAT is much larger than was a BAT previously.

  was:HDFS-1081 optimized the number of block access tokens (BATs) created in a 
single call to getBlockLocations, as this is an expensive operation.  However, 
that JIRA put off another optimization which was then made possible, which is 
to just send a single block access token across the wire (and maintain a single 
BAT on the client side).  This JIRA is for implementing that optimization.  
Since a single BAT is generated for all the blocks, we just write that single 
BAT to the wire, rather than writing n BATs for n blocks, as is currently done. 
 This turns out to be a useful optimization for files with very large numbers 
of blocks, as the new lone BAT is much larger than was a BAT previously.


While benchmarking this new patch, originally an addendum to HDFS-1081, we 
determined that HDFS-1081's original benchmarks were in error.  
getBlockLocations was not the culprit in the performance degradation.  
HDFS-1081 didn't do any damage to speed and, with this addendum, actually does 
give some benefit for files with moderate numbers of blocks (see 
to-be-attached benchmarks).  However, since getBlockLocations isn't really a 
slow method, these gains aren't worth the extra complexity they introduce.  
I'll upload the on-the-wire optimization patch in case it becomes useful at 
some point, but I'm going to use this JIRA to roll back most of HDFS-1081, 
excluding some byte-array allocation that we can easily cache.  ...sigh.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.21.1

 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch





[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-09-03 Thread Jakob Homan (JIRA)

Jakob Homan updated HDFS-1353:
--

Attachment: Benchmarking results.xlsx

Benchmarks of original patch, which optimized the on-the-wire combined block 
tokens.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.21.1

 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch





[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-09-03 Thread Jakob Homan (JIRA)

Jakob Homan updated HDFS-1353:
--

Attachment: HDFS-1353-y20.patch

Patch for y20.  Not for commit here.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.21.1

 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch





[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-09-03 Thread Jakob Homan (JIRA)

Jakob Homan updated HDFS-1353:
--

Attachment: HDFS-1353.patch

Patch for trunk and 21.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.21.1, 0.22.0

 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch, 
 HDFS-1353.patch





[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-09-03 Thread Jakob Homan (JIRA)

Jakob Homan updated HDFS-1353:
--

   Status: Patch Available  (was: Open)
Fix Version/s: 0.22.0

Submitting patch.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.21.1, 0.22.0

 Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch, 
 HDFS-1353.patch





[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-09-03 Thread Jakob Homan (JIRA)

Jakob Homan updated HDFS-1353:
--

Attachment: HDFS-1353-optmized-wire-not-to-be-committed.patch

For completeness' sake, here are the planned optimizations referenced in the 
spreadsheet... Not to be committed.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.21.1, 0.22.0

 Attachments: Benchmarking results.xlsx, 
 HDFS-1353-optmized-wire-not-to-be-committed.patch, HDFS-1353-y20.patch, 
 HDFS-1353.patch





[jira] Updated: (HDFS-1353) Remove most of getBlockLocation optimization

2010-09-03 Thread Jakob Homan (JIRA)

Jakob Homan updated HDFS-1353:
--

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: (was: 0.21.1)
   Resolution: Fixed

I've committed this to trunk.  Resolving as fixed.

 Remove most of getBlockLocation optimization
 

 Key: HDFS-1353
 URL: https://issues.apache.org/jira/browse/HDFS-1353
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.22.0

 Attachments: Benchmarking results.xlsx, 
 HDFS-1353-optmized-wire-not-to-be-committed.patch, HDFS-1353-y20.patch, 
 HDFS-1353.patch

