[Impala-ASF-CR] IMPALA-7234: Improve memory estimates produced by the Planner

Tim Armstrong (Code Review) Mon, 30 Jul 2018 12:19:34 -0700

Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11001 )


Change subject: IMPALA-7234: Improve memory estimates produced by the Planner
......................................................................


Patch Set 6:

(4 comments)

Just a few nits; otherwise this looks good and will be a valuable cleanup.

http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1364
PS6, Line 1364:     int perHostScanRanges;
              :     if (fileFormats_.contains(HdfsFileFormat.PARQUET)
              :         || fileFormats_.contains(HdfsFileFormat.ORC)) {
I think it would be clearer if we iterated over the formats, computed the 
perHostScanRanges for each and took the max - this would match the intent 
described in the comment more obviously.

I don't think perHostScanRanges for non-columnar formats is guaranteed to be 
lower anyway since the formula is non-trivial.
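
Roughly, the per-format approach could look like the sketch below. This is 
only an illustration: FileFormat, estimateScanRanges() and the two placeholder 
formulas are hypothetical stand-ins, not the actual HdfsScanNode code or its 
real estimate.

```java
import java.util.EnumSet;
import java.util.Set;

public class PerHostScanRangeSketch {
  // Hypothetical stand-in for HdfsFileFormat; only what the sketch needs.
  enum FileFormat {
    TEXT, PARQUET, ORC;

    boolean isColumnar() { return this == PARQUET || this == ORC; }
  }

  // Placeholder per-format estimate. The real formula is non-trivial and is
  // not reproduced here; both branches are assumptions for illustration.
  static int estimateScanRanges(FileFormat format, int numColumns,
      int maxScannerThreads) {
    if (format.isColumnar()) {
      // Assumption: columnar formats scale with the number of columns read.
      return Math.max(1, numColumns);
    }
    // Assumption: other formats are bounded by the scanner thread count.
    return Math.max(1, maxScannerThreads);
  }

  // Compute the estimate for every format present and keep the maximum, so
  // no assumption is made about which format yields the larger value.
  static int maxPerHostScanRanges(Set<FileFormat> formats, int numColumns,
      int maxScannerThreads) {
    int perHostScanRanges = 0;
    for (FileFormat format : formats) {
      perHostScanRanges = Math.max(perHostScanRanges,
          estimateScanRanges(format, numColumns, maxScannerThreads));
    }
    return perHostScanRanges;
  }

  public static void main(String[] args) {
    Set<FileFormat> formats = EnumSet.of(FileFormat.TEXT, FileFormat.PARQUET);
    // With 3 columns and 8 scanner threads, the non-columnar estimate wins.
    System.out.println(maxPerHostScanRanges(formats, 3, 8)); // prints 8
  }
}
```

With these placeholder formulas the text estimate (8) exceeds the Parquet 
estimate (3), which is the case a columnar-only branch would miss.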


http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1370
PS6, Line 1370: the scan ranges should be allocated based
              :       // on column reservations.
This is a bit misleading since this calculation is purely an estimate and 
doesn't affect the behaviour of the query at all. I would just say something 
like "From the resource management purview, we want to conservatively estimate 
memory consumption based on the partition with the highest memory requirements."


http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@130
PS6, Line 130: we still need to reserve
             :       // 1GB of buffer for insertion.
We're not really reserving anything based on this estimate for now - maybe just 
something like "even if there are non-Parquet partitions, we want to be 
conservative and make a high memory estimate.".


http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@137
PS6, Line 137:     return 100L * 1024L;
Yeah these estimates are pretty bogus :). We will revisit them at some point.



--
To view, visit http://gerrit.cloudera.org:8080/11001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b
Gerrit-Change-Number: 11001
Gerrit-PatchSet: 6
Gerrit-Owner: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Jul 2018 19:19:03 +0000
Gerrit-HasComments: Yes
