[ https://issues.apache.org/jira/browse/IMPALA-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716824#comment-17716824 ]
Yida Wu commented on IMPALA-8081: --------------------------------- [~rizaon] is the task done? > Avoid over-parallelizing queries when there are small input splits > ------------------------------------------------------------------ > > Key: IMPALA-8081 > URL: https://issues.apache.org/jira/browse/IMPALA-8081 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Reporter: Janaki Lahorani > Assignee: Yida Wu > Priority: Major > Labels: multithreading > > Currently we maximise parallelism given the number of input splits available. > This is often a good decision, unless there are very many small input splits, > particularly small files. We could avoid this pathological behaviour by > having a minimum threshold of input bytes per instance (this is still pretty > crude, since file input bytes only correlates loosely with the amount of work > required). > An example: > {noformat} > [localhost.EXAMPLE.COM:21050] default> show files in functional.alltypes; > Query: show files in functional.alltypes > +-------------------------------------------------------------------------------+---------+--------------------+ > | Path > | Size | Partition | > +-------------------------------------------------------------------------------+---------+--------------------+ > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=1/090101.txt > | 19.95KB | year=2009/month=1 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=2/090201.txt > | 18.12KB | year=2009/month=2 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=3/090301.txt > | 20.06KB | year=2009/month=3 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=4/090401.txt > | 19.61KB | year=2009/month=4 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=5/090501.txt > | 20.36KB | year=2009/month=5 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=6/090601.txt > | 19.71KB | year=2009/month=6 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=7/090701.txt > | 20.36KB | year=2009/month=7 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=8/090801.txt > | 20.36KB | year=2009/month=8 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=9/090901.txt > | 19.71KB | year=2009/month=9 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=10/091001.txt > | 20.36KB | year=2009/month=10 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt > | 19.71KB | year=2009/month=11 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=12/091201.txt > | 20.36KB | year=2009/month=12 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=1/100101.txt > | 20.36KB | year=2010/month=1 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=2/100201.txt > | 18.39KB | year=2010/month=2 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=3/100301.txt > | 20.36KB | year=2010/month=3 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=4/100401.txt > | 19.71KB | year=2010/month=4 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=5/100501.txt > | 20.36KB | year=2010/month=5 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=6/100601.txt > | 19.71KB | year=2010/month=6 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=7/100701.txt > | 20.36KB | year=2010/month=7 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=8/100801.txt > | 20.36KB | year=2010/month=8 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=9/100901.txt > | 19.71KB | year=2010/month=9 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=10/101001.txt > | 20.36KB | year=2010/month=10 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=11/101101.txt > | 19.71KB | year=2010/month=11 | > | > hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=12/101201.txt > | 20.36KB | year=2010/month=12 | > +-------------------------------------------------------------------------------+---------+--------------------+ > [localhost:21000] default> set mt_dop=8; select count(*) from > functional.alltypes; summary; > MT_DOP set to 8 > Query: select count(*) from functional.alltypes > Query submitted at: 2020-10-30 17:47:26 (Coordinator: > http://tarmstrong-box:25000) > Query progress can be monitored at: > http://tarmstrong-box:25000/query_plan?query_id=d242927a09d39968:3ae0ecec00000000 > +----------+ > | count(*) | > +----------+ > | 7300 | > +----------+ > Fetched 1 row(s) in 0.19s > +---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------+ > | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | > Peak Mem | Est. Peak Mem | Detail | > +---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------+ > | F01:ROOT | 1 | 242.85us | 242.85us | | | 0 > B | 0 B | | > | 03:AGGREGATE | 1 | 2.51ms | 2.51ms | 1 | 1 | > 16.00 KB | 10.00 MB | FINALIZE | > | 02:EXCHANGE | 1 | 616.31us | 616.31us | 24 | 1 | > 240.00 KB | 16.00 KB | UNPARTITIONED | > | F00:EXCHANGE SENDER | 24 | 1.34ms | 2.66ms | | | > 16.00 KB | 0 B | | > | 01:AGGREGATE | 24 | 1.52ms | 2.13ms | 24 | 1 | > 16.00 KB | 10.00 MB | | > | 00:SCAN HDFS | 24 | 32.76ms | 35.75ms | 7.30K | 7.30K | > 32.00 KB | 16.00 MB | functional.alltypes | > +---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------+ > {noformat} > In this example, we create 8 instances per impala daemon to scan a tiny > amount of data each. We would be better off, typically, in creating fewer > instances to avoid the overhead. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org