[ https://issues.apache.org/jira/browse/HADOOP-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HADOOP-7339:
--------------------------------
    Fix Version/s:     (was: 0.22.0)
                   0.23.0

Bumping to 0.23 since this is an optimization.

> Introduce a buffered checksum for avoiding frequently calls on Checksum.update()
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-7339
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7339
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>            Reporter: Min Zhou
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7339-v1.diff, HADOOP-7339-v2.diff
>
>
> We found that PureJavaCRC32/CRC32.update() is the top CPU-consuming method
> on the map side, and it costs a lot of CPU on the reduce side as well.
> IFileOutputStream calls Checksum.update() frequently while writing records:
> it is very common for an MR key/value to be smaller than 512 bytes, and
> Checksum.update() is called every time a key/value is written.
> Test case: terasort 100MB.
> With the patch, Checksum.update() calls have been reduced from 4030348 to
> 28069. This method is not a hotspot anymore.
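
The idea described in the issue can be sketched roughly as follows. This is an illustrative sketch only, not the code in the attached diffs: the class name BufferedChecksum, its constructor, and the buffer size are assumptions made for the example. It wraps any java.util.zip.Checksum and accumulates small writes in a byte array, so the underlying update() is invoked once per full buffer rather than once per key/value.

```java
import java.util.zip.Checksum;

// Hypothetical wrapper (not from the HADOOP-7339 patch): batches small
// update() calls into one larger call on the wrapped Checksum.
final class BufferedChecksum implements Checksum {
    private final Checksum delegate;
    private final byte[] buffer;
    private int count; // number of buffered bytes not yet checksummed

    BufferedChecksum(Checksum delegate, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.delegate = delegate;
        this.buffer = new byte[bufferSize];
    }

    @Override
    public void update(int b) {
        if (count == buffer.length) {
            flush();
        }
        buffer[count++] = (byte) b;
    }

    @Override
    public void update(byte[] b, int off, int len) {
        if (len >= buffer.length) {
            // Large write: buffering would only add a copy, so pass it through.
            flush();
            delegate.update(b, off, len);
            return;
        }
        if (len > buffer.length - count) {
            flush(); // not enough room left for this write
        }
        System.arraycopy(b, off, buffer, count, len);
        count += len;
    }

    @Override
    public long getValue() {
        flush(); // pending bytes must be included in the result
        return delegate.getValue();
    }

    @Override
    public void reset() {
        count = 0;
        delegate.reset();
    }

    // Push buffered bytes to the wrapped checksum in a single call.
    private void flush() {
        if (count > 0) {
            delegate.update(buffer, 0, count);
            count = 0;
        }
    }
}
```

A caller would construct it as, for example, new BufferedChecksum(new CRC32(), 4096) and use it wherever a Checksum is expected. Because the bytes reach the wrapped checksum in the same order, the resulting value is the same as without buffering; only the number of update() calls on the delegate changes.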