Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: IMPALA-6636: Use async IO in ORC scanner
......................................................................


Patch Set 25:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15370/25//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15370/25//COMMIT_MSG@24
PS25, Line 24: relies on the backend to divide them as
             : needed.
> I think we can do this in the orc-scanner as well. There are some APIs like
Filed IMPALA-11099 for this.


http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15370/25/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2199
PS25, Line 2199: the current ORC scanner does not have the select
               :         // count(*) optimization yet like in Parquet.
> Isn't this the optimization for count(*)? https://github.com/apache/impala/
A select count(*) over a nested column still needs to read an ORC column, for 
example:

select count(*) from complextypes_partitioned.int_array

For this kind of query, that code region is not evaluated because 
IsZeroSlotTableScan() == false (materialized_slots is empty, but 
tuple_desc()->tuple_path() is not empty).
Therefore, we still need to allocate memory to read the int_array column in 
this example.
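
To make the condition concrete, here is a minimal sketch of how I understand 
the check. ScanSketch and its fields are hypothetical, simplified stand-ins, 
not the real scan node classes, and the predicate body is my assumption based 
on the behaviour described above:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the scan node state (not Impala's real classes).
struct ScanSketch {
  std::vector<std::string> materialized_slots;  // slots the scan has to materialize
  std::vector<int> tuple_path;                  // non-empty when the tuple is rooted in a nested column
};

// Assumed condition: a scan counts as "zero slot" only when nothing is
// materialized AND the tuple is not rooted in a nested collection.
bool IsZeroSlotTableScan(const ScanSketch& scan) {
  return scan.materialized_slots.empty() && scan.tuple_path.empty();
}
```

With empty materialized_slots and a non-empty tuple_path, as in the int_array 
query, this returns false, so the zero-slot code path is skipped.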

I checked Parquet, and its select count optimization is not turned on in this 
example. The returned columnByteSizes is also empty. I suppose it still reads 
the column, but not doing it in a sync manner. We can do the same if we want: 
skip async IO if materialized_slots is empty.



--
To view, visit http://gerrit.cloudera.org:8080/15370

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 25
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Tue, 01 Feb 2022 04:40:36 +0000
Gerrit-HasComments: Yes
