[jira] [Updated] (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-1248:
---
    Affects Version/s: 0.22.0
    Assignee: Ruibang He

Redundant memory copying in StreamKeyValUtil
Key: MAPREDUCE-1248
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/streaming
Affects Versions: 0.22.0
Reporter: Ruibang He
Assignee: Ruibang He
Priority: Minor
Fix For: 0.22.0
Attachments: MAPREDUCE-1248-v1.0.patch

I found that when MROutputThread collects the output of the Reducer, it calls StreamKeyValUtil.splitKeyVal(), which allocates two local byte arrays for each line of output. These two arrays are then passed to the variables key and val. The data is therefore copied twice: once by System.arraycopy() and again inside key.set() / val.set(). This doubles the memory copying for the whole output (which may raise CPU consumption) and causes frequent allocation of temporary objects.

-- 
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
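The double copy described in this report can be illustrated with a minimal Java sketch. It assumes a hypothetical `SimpleText` class standing in for Hadoop's `Text`, whose `set(byte[], int, int)` likewise copies the given range into its own buffer. The method names `splitWithTemporaries` and `splitInPlace` are illustrative only, not the real `StreamKeyValUtil` API, and the single-copy variant is one possible way to avoid the temporaries, not necessarily what MAPREDUCE-1248-v1.0.patch does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.hadoop.io.Text: set() copies the
// requested range of the source array into an internal, reusable buffer.
final class SimpleText {
    static long bytesCopied = 0; // illustration only: total bytes copied by any step

    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(byte[] src, int start, int len) {
        if (bytes.length < len) {
            bytes = new byte[len]; // grow only when needed; reused otherwise
        }
        System.arraycopy(src, start, bytes, 0, len);
        length = len;
        bytesCopied += len;
    }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

final class SplitSketch {
    // Pattern described in the issue: allocate two temporary arrays per line,
    // copy key and value into them, then copy again inside set().
    static void splitWithTemporaries(byte[] line, int splitPos, int sepLen,
                                     SimpleText key, SimpleText val) {
        byte[] keyBytes = new byte[splitPos];
        byte[] valBytes = new byte[line.length - splitPos - sepLen];
        System.arraycopy(line, 0, keyBytes, 0, keyBytes.length);
        System.arraycopy(line, splitPos + sepLen, valBytes, 0, valBytes.length);
        SimpleText.bytesCopied += keyBytes.length + valBytes.length; // first copy
        key.set(keyBytes, 0, keyBytes.length);                       // second copy
        val.set(valBytes, 0, valBytes.length);
    }

    // Single-copy alternative: pass offsets into the original line straight
    // to set(), so no per-line temporary arrays are allocated.
    static void splitInPlace(byte[] line, int splitPos, int sepLen,
                             SimpleText key, SimpleText val) {
        key.set(line, 0, splitPos);
        val.set(line, splitPos + sepLen, line.length - splitPos - sepLen);
    }
}
```

For the line `word\t42` (split at offset 4 with a one-byte tab separator), `splitWithTemporaries` copies 12 bytes and allocates two temporary arrays, while `splitInPlace` copies 6 bytes and allocates no per-line temporaries.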
[jira] Updated: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1248:
---
    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Fix Version/s: 0.22.0
    Resolution: Fixed

I just committed this. Thanks, Ruibang!
[jira] Updated: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1248:
---
    Status: Patch Available (was: Open)

Patch looks good. Submitting for Hudson.
[jira] Updated: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruibang He updated MAPREDUCE-1248:
--
    Attachment: MAPREDUCE-1248-v1.0.patch

An early solution