Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )
Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 49:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213: overlapping between fragment execution and blocking operators. We
> The upper bound for each fragment should be the number of threads or someth
min_processing_per_thread=10M seems to be a good upper bound on my local machine.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@140
PS47, Line 140: a subtree of PlanNodes/DataSink in the fragment with a DataSink or
Added a SegmentCost class for the segment abstraction. Also added TPCDS-Q49 to tpcds-processing-cost.test to test against a union fragment.


http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@325
PS48, Line 325: sing_per_th
> As the comments in https://github.com/apache/impala/blob/master/fe/src/main
My thought as well. Should we revert the default back to True?

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@346
PS48, Line 346:
> Could you attach the bench mark which show effective parallelism improvemen
Our single-node benchmark script mainly measures query latency. I don't expect lower query latency with this patch, since the default combination of all the new query options and backend flags will actually reduce parallelism in some fragments rather than increase it. As long as latency does not regress severely compared to regular MT_DOP mode, I consider that a good outcome. The improvement is probably best expressed as a reduction in memory usage and thread count.
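The thread above discusses bounding per-fragment parallelism with min_processing_per_thread. A minimal sketch, assuming the instance count is the fragment's total processing cost divided by min_processing_per_thread (rounded up), floored at 1 and capped by the available thread count as the reviewer suggests. The class and method names (ParallelismSketch, computeParallelism) are hypothetical and are not Impala's actual planner API.

```java
// Hypothetical sketch, not Impala code: derive a fragment's instance count
// from its total processing cost. Assumes each instance should handle at
// least minProcessingPerThread cost units, and that the count is capped by
// maxThreads (the reviewer's suggested upper bound).
public final class ParallelismSketch {
  static int computeParallelism(long totalCost, long minProcessingPerThread,
      int maxThreads) {
    if (minProcessingPerThread <= 0 || maxThreads <= 0) {
      throw new IllegalArgumentException("bounds must be positive");
    }
    // Ceiling division written without (a + b - 1) to avoid long overflow.
    long wanted = totalCost / minProcessingPerThread
        + (totalCost % minProcessingPerThread == 0 ? 0 : 1);
    // Always at least one instance, never more than maxThreads.
    return (int) Math.max(1L, Math.min(wanted, (long) maxThreads));
  }

  public static void main(String[] args) {
    long minProcessingPerThread = 10_000_000L; // the 10M value from the review
    System.out.println(computeParallelism(35_000_000L, minProcessingPerThread, 12));  // 4
    System.out.println(computeParallelism(500_000_000L, minProcessingPerThread, 12)); // 12
    System.out.println(computeParallelism(1_000L, minProcessingPerThread, 12));       // 1
  }
}
```

With a 10M threshold, a 35M-cost fragment gets 4 instances, while a very expensive fragment is clamped at the thread cap. A larger threshold lowers parallelism, which is consistent with the memory and thread-count reduction described above.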
http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210:
> Would rather keep this as cost as cost of a row is a highly variable metric
Changed to min_processing_per_thread in PS49.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263: // Returns the total estimated size (in bytes) of the row batch queues by
> This assume the total cost for a row batch is 1. Is it right estimation?
Changed in PS49 to model the cost as 1 per 1KB of average serialized row size. That seems good enough to increase the DataStreamSink and ExchangeNode costs.


http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/Planner.java@470
PS48, Line 470: ot = postOrderFra
> nit: this result seems not used now. Add "TODO" comment
Done


http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@20
PS49, Line 20: com.google.cloud.hadoop.repackaged.gcs.com.google.common.math.LongMath
This does not look like the right class to import: it is the copy repackaged inside the GCS connector. Should this be Guava's com.google.common.math.LongMath instead?
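A minimal sketch of the "1 cost unit per 1KB of average row size" model described for ExchangeNode (and, per the ScanNode comment, for ScanNode). Assumptions: 1KB means 1024 bytes, the per-row cost is fractional with no lower bound, and the node total is rounded up to a whole unit. RowCostSketch, costPerRow, and totalCost are hypothetical names, not Impala's actual implementation.

```java
// Hypothetical sketch, not Impala code: per-row cost proportional to the
// average (serialized) row size, at 1 cost unit per 1024 bytes.
public final class RowCostSketch {
  // Assumption: "1KB" is taken as 1024 bytes.
  static final double BYTES_PER_COST_UNIT = 1024.0;

  // Cost contributed by one row of the given average size; negative sizes
  // (e.g. unknown stats) are treated as zero.
  static double costPerRow(double avgRowSizeBytes) {
    return Math.max(0.0, avgRowSizeBytes) / BYTES_PER_COST_UNIT;
  }

  // Total cost for a node processing `cardinality` rows, rounded up.
  static long totalCost(long cardinality, double avgRowSizeBytes) {
    return (long) Math.ceil(Math.max(0L, cardinality) * costPerRow(avgRowSizeBytes));
  }

  public static void main(String[] args) {
    System.out.println(costPerRow(2048.0));           // 2.0
    System.out.println(totalCost(1_000_000L, 512.0)); // 500000
  }
}
```

Under this model, wide rows push the cost of an exchange or scan up linearly, which is the effect the reply aims for when it says the change increases DataStreamSink and ExchangeNode costs.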
http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/ScanNode.java@359
PS48, Line 359:
> In ExchangeNode.estimateProcessingCostPerRow(), the cost per row is calcula
Changed in PS49 to model the cost as 1 per 1KB of average row size.



--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 49
Gerrit-Owner: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Feb 2023 00:32:47 +0000
Gerrit-HasComments: Yes