Riza Suminto has uploaded this change for review. (
http://gerrit.cloudera.org:8080/19880 )


Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................

IMPALA-12120: Limit output writer parallelism based on write volume

The new processing cost-based planner changes (IMPALA-11604,
IMPALA-12091) affect output writer parallelism for insert queries. If
the planner schedules too many writer fragments, an insert can produce
many small files, which further exacerbates a problem introduced by
MT_DOP (see IMPALA-8125).

The MAX_FS_WRITERS query option can help mitigate this. But even when
MAX_FS_WRITERS is not set, the planner should not create excessive
writer parallelism by default, for both partitioned and unpartitioned
inserts.

This patch implements such a limit when using the cost-based planner. It
limits the number of writer fragments such that each writer fragment
writes at least 128MB of rows. The resulting plan always includes an
exchange node, so a scanner and a table writer are never collocated in a
single fragment, which simplifies the estimation. This patch also allows
CTAS (a kind of DDL query) to be eligible for auto-scaling.
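
The 128MB rule above can be illustrated with a small sketch. This is not
the code in DistributedPlanner.java or HdfsTableSink.java. The function
name, its parameters, and the way it combines the cap with
MAX_FS_WRITERS (treating 0 as "no limit") are assumptions made only for
illustration.

```python
# Hypothetical sketch of a volume-based writer cap; not Impala's actual code.
MIN_BYTES_PER_WRITER = 128 * 1024 * 1024  # 128MB, the figure quoted in this change


def limit_writer_fragments(estimated_write_bytes, planned_writers, max_fs_writers=0):
    """Cap the writer count so each writer gets at least 128MB of estimated output.

    estimated_write_bytes: planner's estimate of total bytes to be written.
    planned_writers: writer parallelism the planner would otherwise choose.
    max_fs_writers: assumed to mirror MAX_FS_WRITERS, with 0 meaning "not set".
    """
    if planned_writers < 1:
        raise ValueError("planned_writers must be at least 1")
    # Floor division guarantees every writer is assigned >= 128MB on average;
    # a small insert still needs one writer.
    by_volume = max(1, estimated_write_bytes // MIN_BYTES_PER_WRITER)
    limit = min(planned_writers, by_volume)
    if max_fs_writers > 0:
        limit = min(limit, max_fs_writers)
    return limit
```

Under these assumptions, an insert estimated at 1GB with 16 planned
writers would be capped at 8 writers, and an insert estimated at 100MB
would use a single writer.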

Testing:
- Add test cases in test_query_cpu_count_divisor_default
- Add test_processing_cost_writer_limit in test_insert.py
- Pass test_insert.py::TestInsertHdfsWriterLimit
- Pass test_executor_groups.py

Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
---
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
5 files changed, 137 insertions(+), 31 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/19880/1
--
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
