[ https://issues.apache.org/jira/browse/IMPALA-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Becker updated IMPALA-9654: ---------------------------------- Target Version: Impala 4.3.0 (was: Impala 4.2.0) > Intra-node execution skew increase with mt_dop > ---------------------------------------------- > > Key: IMPALA-9654 > URL: https://issues.apache.org/jira/browse/IMPALA-9654 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Reporter: Tim Armstrong > Assignee: Bikramjeet Vig > Priority: Major > Labels: multithreading, performance > > We've seen significant amounts of execution skew (big gap between avg and max > execution time for a scan node) with multithreading enabled on TPC-DS > queries. We balance bytes well, but bytes of input files are often not > correlated with the amount of work in the scan, or above the scan. Some > causes are: > * Dynamic partition pruning leading to different instance with variable > numbers of input splits > * Different amounts of rows being filtered out by predicates and row filters, > leading to skew in rows returned from the plan. > * Different amounts of compressibility > * Files being written in different ways, e.g. different schema, different > writer. > More dynamic load balancing can address all of this if scans pick up the next > range when its pipeline has finished processing the rows from the previous > range. I.e. with the threading model we can deal with time skew anywhere in > the pipeline by balancing in the scan. > I *think* we can solve this for HDFS scans by lifting the ReaderContext up to > the FragmentState (one per plan node) and making corresponding changes to the > scan implementation. We would need to add a bit more machinery to support > Kudu and HBase scans but I think a similar approach would work conceptually. > A more invasive (and probably expensive) solution is to do a local exchange > above the scan node, e.g. a multi-producer multi-consumer queue. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org