[ https://issues.apache.org/jira/browse/TEZ-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001082#comment-16001082 ]
Zhiyuan Yang commented on TEZ-3708:
-----------------------------------

Submitting the patch for a Jenkins run now that TEZ-3697 is fixed.

> Improve parallelism and auto grouping of unpartitioned cartesian product
> ------------------------------------------------------------------------
>
> Key: TEZ-3708
> URL: https://issues.apache.org/jira/browse/TEZ-3708
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: TEZ-3708.1.patch, TEZ-3708.2.patch
>
>
> The current unpartitioned cartesian product has a few limitations:
> 1. Parallelism can be too low when splits are large and the number of
> source tasks is small.
> 2. Parallelism can be too high when the number of source tasks is large.
> 3. Workload is not ideally distributed across workers. Even with auto
> grouping, grouping by size may not be accurate because the same size can
> mean a different number of records and a different number of cartesian
> product operations.
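
For illustration only, the three limitations listed in the issue description can be sketched with some simple arithmetic. This is not Tez code. It assumes, hypothetically, that without grouping the cartesian product vertex runs one task per combination of source tasks, so its parallelism is the product of the source task counts. The function names and the numbers are made up for the example.

```python
from math import prod


def cartesian_product_parallelism(source_task_counts):
    """Downstream task count, ASSUMING one task per combination of source tasks."""
    return prod(source_task_counts)


def cartesian_product_ops(record_counts):
    """Number of record pairings a task performs, given records per input."""
    return prod(record_counts)


# Limitation 1: two sources with few tasks (large splits) give little parallelism.
few_tasks = cartesian_product_parallelism([2, 3])

# Limitation 2: two sources with many tasks give a very large task count.
many_tasks = cartesian_product_parallelism([1000, 1000])

# Limitation 3: two groups of equal byte size can hold very different record
# counts, so the amount of cartesian product work differs even though a
# size-based grouping would treat them as equal.
ops_wide_records = cartesian_product_ops([10, 10])          # few, large records
ops_narrow_records = cartesian_product_ops([10_000, 10_000])  # many, small records

print(few_tasks, many_tasks, ops_wide_records, ops_narrow_records)
```

Under these assumptions the task count swings from 6 to 1,000,000 depending only on the source task counts, which is why the issue asks for parallelism that does not depend solely on them.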