Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15370 )
Change subject: IMPALA-6636: Use async IO in ORC scanner ...................................................................... Patch Set 26: (10 comments) http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.h@393 PS25, Line 393: is within the remaining > clarification idea: "within the remaining part of the footer stream"? Done http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.cc@109 PS25, Line 109: Status status; > The same logic is also implemented here: Done http://gerrit.cloudera.org:8080/#/c/15370/25/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/15370/25/common/thrift/Query.thrift@572 PS25, Line 572: 141: optional bool orc_async_read = true; > I disagree with turning off async read by default, as this means that it is Flipped this to true. http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2106 PS25, Line 2106: hould compute total reservation > Should we check the orc_async_read flag here? I think this causes the estim Done http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2143 PS25, Line 2143: : * : * If there are nested co > nit: Could you update this? Done. Add more clarifications with examples. http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2161 PS25, Line 2161: ble > nit: long Done http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2181 PS25, Line 2181: > nit: long Done http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2199 PS25, Line 2199: : } else { > typo, I mean "not doing it in async manner". Patch set 26 retain this, but change the comment for better clarity. http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2274 PS25, Line 2274: scan range. Returns t > not used? Done http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2287 PS25, Line 2287: Preconditions.checkNotNull(column); : long reservationBytes = DEFAULT_COLUMN_SCAN_RANGE_RESERVATION; : ColumnStats stats = column.getStats(); : if (stats.hasAvgSize() && maxScanRangeNumRows_ != -1) { : // Estimate the column's uncompressed data size based on row count and average : // size. : // TODO: Size of strings seems to be underestimated, as avg size returns the : // average length of the strings and does not include the 4 byte length : // field used in Parquet plain encoding. (IMPALA-8431) : reservationBytes = > We probably need to review this for ORC. We can create a follow up JIRA if Filed IMPALA-11104 for this. -- To view, visit http://gerrit.cloudera.org:8080/15370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074 Gerrit-Change-Number: 15370 Gerrit-PatchSet: 26 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Comment-Date: Wed, 02 Feb 2022 22:13:54 +0000 Gerrit-HasComments: Yes