Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: IMPALA-6636: Use async IO in ORC scanner
......................................................................


Patch Set 26:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.h@393
PS25, Line 393: is within the remaining
> clarification idea: "within the remaining part of the footer stream"?
Done


http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/25/be/src/exec/hdfs-orc-scanner.cc@109
PS25, Line 109:   Status status;
> The same logic is also implemented here:
Done


http://gerrit.cloudera.org:8080/#/c/15370/25/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/15370/25/common/thrift/Query.thrift@572
PS25, Line 572:   141: optional bool orc_async_read = true;
> I disagree with turning off async read by default, as this means that it is
Flipped this to true.


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2106
PS25, Line 2106: hould compute total reservation
> Should we check the orc_async_read flag here? I think this causes the estim
Done


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2143
PS25, Line 2143:
               :    *
               :    * If there are nested co
> nit: Could you update this?
Done. Add more clarifications with examples.


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2161
PS25, Line 2161: ble
> nit: long
Done


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2181
PS25, Line 2181:
> nit: long
Done


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2199
PS25, Line 2199:
               :         } else {
> typo, I mean "not doing it in async manner".
Patch set 26 retain this, but change the comment for better clarity.


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2274
PS25, Line 2274:  scan range. Returns t
> not used?
Done


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2287
PS25, Line 2287:     Preconditions.checkNotNull(column);
               :     long reservationBytes = 
DEFAULT_COLUMN_SCAN_RANGE_RESERVATION;
               :     ColumnStats stats = column.getStats();
               :     if (stats.hasAvgSize() && maxScanRangeNumRows_ != -1) {
               :       // Estimate the column's uncompressed data size based on 
row count and average
               :       // size.
               :       // TODO: Size of strings seems to be underestimated, as 
avg size returns the
               :       //       average length of the strings and does not 
include the 4 byte length
               :       //       field used in Parquet plain encoding. 
(IMPALA-8431)
               :       reservationBytes =
> We probably need to review this for ORC. We can create a follow up JIRA if
Filed IMPALA-11104 for this.



--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 26
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Feb 2022 22:13:54 +0000
Gerrit-HasComments: Yes

Reply via email to