[ https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao Sun updated HIVE-15796:
----------------------------
    Attachment: HIVE-15796.2.patch

> HoS: poor reducer parallelism when operator stats are not accurate
> ------------------------------------------------------------------
>
>                 Key: HIVE-15796
>                 URL: https://issues.apache.org/jira/browse/HIVE-15796
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-15796.1.patch, HIVE-15796.2.patch,
> HIVE-15796.wip.1.patch, HIVE-15796.wip.2.patch, HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism.
> However, operator stats are often inaccurate, especially when column stats
> are not available. This sometimes produces extremely poor reducer
> parallelism and can cause a HoS query to run seemingly forever.
> This JIRA offers an alternative way to compute reducer parallelism,
> similar to how MR does it. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use the stats associated
> with its TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum*
> parallelism from all its parents.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
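The two rules in the description can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the code in the attached patches: `WorkNode`, `estimateFromBytes`, and `parallelism` are hypothetical names, and the per-MapWork formula (TableScan input bytes divided by a bytes-per-task setting, rounded up and clamped to a maximum) is an assumed MR-style estimate, loosely modeled on Hive's MR-side settings `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelismSketch {

    // Hypothetical stand-in for a HoS work unit (MapWork or ReduceWork); not Hive's actual class.
    static final class WorkNode {
        final String name;
        final boolean isMapWork;
        // For a MapWork: input size in bytes as reported by its TableScan operator stats.
        final long tableScanBytes;
        final List<WorkNode> parents = new ArrayList<>();

        WorkNode(String name, boolean isMapWork, long tableScanBytes, WorkNode... parents) {
            this.name = name;
            this.isMapWork = isMapWork;
            this.tableScanBytes = tableScanBytes;
            this.parents.addAll(Arrays.asList(parents));
        }

        static WorkNode map(String name, long tableScanBytes) {
            return new WorkNode(name, true, tableScanBytes);
        }

        static WorkNode reduce(String name, WorkNode... parents) {
            return new WorkNode(name, false, 0L, parents);
        }
    }

    // Assumed MR-style estimate: one task per bytesPerReducer of input,
    // rounded up and clamped to the range [1, maxReducers].
    static int estimateFromBytes(long bytes, long bytesPerReducer, int maxReducers) {
        long n = (bytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division for bytes >= 0
        n = Math.max(1L, n);
        return (int) Math.min(maxReducers, n);
    }

    // Rule 1: a MapWork derives its parallelism from its TableScan stats.
    // Rule 2: a ReduceWork takes the maximum parallelism of all its parents.
    static int parallelism(WorkNode w, long bytesPerReducer, int maxReducers) {
        if (w.isMapWork) {
            return estimateFromBytes(w.tableScanBytes, bytesPerReducer, maxReducers);
        }
        int max = 1;
        for (WorkNode p : w.parents) {
            max = Math.max(max, parallelism(p, bytesPerReducer, maxReducers));
        }
        return max;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 256L * 1024 * 1024; // 256 MB per task (assumed setting)
        int maxReducers = 1009;                     // upper bound (assumed setting)

        WorkNode map1 = WorkNode.map("Map 1", 1024L * 1024 * 1024); // 1 GB scanned
        WorkNode map2 = WorkNode.map("Map 2", 100L * 1024 * 1024);  // 100 MB scanned
        WorkNode reducer2 = WorkNode.reduce("Reducer 2", map1, map2);

        for (WorkNode w : Arrays.asList(map1, map2, reducer2)) {
            System.out.println(w.name + ": " + parallelism(w, bytesPerReducer, maxReducers));
        }
    }
}
```

Running `main` prints `Map 1: 4`, `Map 2: 1`, and `Reducer 2: 4`: the small second input does not drag the reducer down, because the ReduceWork takes the maximum over its parents instead of relying on possibly inaccurate operator-level estimates. The recursion does not cache results, so over a large DAG an implementation would likely memoize per-work parallelism.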