[ https://issues.apache.org/jira/browse/TEZ-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Wohlstadter updated TEZ-3936:
----------------------------------
    Target Version/s: 0.9.2, 0.10.0  (was: 0.9.2)

> Reduce TezEvent messaging overhead
> ----------------------------------
>
>                 Key: TEZ-3936
>                 URL: https://issues.apache.org/jira/browse/TEZ-3936
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3936.001.patch, TEZ-3936.002.patch
>
>
> Revisiting TEZ-3145, I found that in addition to improving the way empty
> partitions are sent from Maps to the AM and from the AM to Reducers, message
> serialization can be improved to reduce network traffic.
> For example, consider a job with 42000 Maps and 7500 Reducers where 95% of
> the partition data produced is empty. The volume of Tez DME events sent from
> the AM to the Reducers is num(Maps) * num(Reducers) * size(Wrapped DME). With
> 95% empty partitions, the message size is 450 bytes, of which 260 bytes are
> needed for sending empty partitions and 190 bytes for messaging. Total
> messaging is 132 GB: 76 GB for empty partition data and 56 GB for non-empty
> partition messaging.
> This jira aims to reduce the non-empty partition messaging.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
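
The totals quoted in the issue description can be checked with a few lines of arithmetic. The sketch below is only an illustration and is not part of the attached patches. It uses the map count, reducer count, and per-event byte sizes from the description, and it assumes that "GB" there means 2^30 bytes, which is the reading under which the quoted totals come out. The constant and function names are made up for this example.

```python
# Figures taken from the issue description.
NUM_MAPS = 42_000
NUM_REDUCERS = 7_500
EMPTY_PARTITION_BYTES = 260  # per-event bytes spent describing empty partitions
MESSAGING_BYTES = 190        # per-event bytes for the remaining messaging
WRAPPED_DME_BYTES = EMPTY_PARTITION_BYTES + MESSAGING_BYTES  # 450 bytes

# Assumption: "GB" in the description means 2**30 bytes.
BYTES_PER_GB = 1024 ** 3


def total_gb(bytes_per_event: int) -> float:
    """Traffic from the AM to the reducers, in GB, for a given per-event size."""
    # One wrapped DME is sent per (map, reducer) pair.
    events = NUM_MAPS * NUM_REDUCERS
    return events * bytes_per_event / BYTES_PER_GB


if __name__ == "__main__":
    rows = [
        ("total", WRAPPED_DME_BYTES),
        ("empty partition data", EMPTY_PARTITION_BYTES),
        ("messaging", MESSAGING_BYTES),
    ]
    for label, size in rows:
        print(f"{label}: {total_gb(size):.1f} GB")
```

Rounded to whole numbers, the three results match the 132, 76, and 56 GB figures in the description.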