Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test:

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@241
PS4, Line 241:  limit 1
> This makes me feel we should skip those files with 0 rows during pruning. I
In HdfsScanNode.computeScanRangeLocation(), we skip computing the scan range if 
the file is empty:
  Line 912 on master:
          // Skips files that have no associated blocks.
          if (fileDesc.getNumFileBlocks() == 0) continue;
However, we populate the totalFilesPerFs_ treemap earlier, on line 891, and 
that's the one that gets used to display the EXPLAIN string. So, yeah, there's 
some inconsistency in the display (although it is possible that it is 
intentional to show all files, including empty ones, in the EXPLAIN output).
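
A minimal sketch of that ordering, not the actual HdfsScanNode code: FileDesc 
and countFiles() below are hypothetical stand-ins. Every file is counted toward 
the displayed total before the empty-file check runs, so the displayed count 
can exceed the number of files that actually get scan ranges.

```java
import java.util.List;

class ExplainCountSketch {
  // Hypothetical stand-in for a file descriptor; not Impala's real class.
  static final class FileDesc {
    final int numFileBlocks;
    FileDesc(int numFileBlocks) { this.numFileBlocks = numFileBlocks; }
  }

  // Returns {filesCountedForExplain, filesWithScanRanges}.
  static int[] countFiles(List<FileDesc> files) {
    // Counted up front, so empty files are included in the displayed total.
    int counted = files.size();
    int scanned = 0;
    for (FileDesc fd : files) {
      // Skips files that have no associated blocks.
      if (fd.numFileBlocks == 0) continue;
      ++scanned;
    }
    return new int[] {counted, scanned};
  }
}
```

For three files with 0, 2 and 0 blocks, this counts 3 files for display but 
only 1 file with scan ranges.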

For my patch, the pruning happens in 2 steps: (1) in HdfsPartitionPruner, where 
I limit the number of partitions based only on the number of file descriptors 
per partition, i.e., without examining each file descriptor, since that would 
add overhead, and (2) in HdfsScanNode, where I limit the number of files, since 
that code already iterates over the file descriptors. I guess I could skip 
empty files in step 2, even though it would mess up the calculation that was 
done in step 1.
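
To illustrate why skipping empty files in step 2 interferes with step 1, here 
is a rough sketch. It is not the patch itself: the class, the method names and 
the exact stopping rule in step 1 are assumptions made for illustration. Each 
partition is modeled as an int[] holding the block count of each of its files.

```java
import java.util.ArrayList;
import java.util.List;

class TwoStepPruningSketch {
  // Step 1 (assumed rule): keep partitions until their file counts add up to
  // at least maxFiles. Only the per-partition count is read, not each file.
  static List<int[]> prunePartitions(List<int[]> partitions, int maxFiles) {
    List<int[]> kept = new ArrayList<>();
    int total = 0;
    for (int[] fileBlockCounts : partitions) {
      if (total >= maxFiles) break;
      kept.add(fileBlockCounts);
      total += fileBlockCounts.length;
    }
    return kept;
  }

  // Step 2: iterate over the files of the kept partitions and select at most
  // maxFiles of them, optionally skipping files with no blocks.
  // Returns the number of files selected.
  static int selectFiles(List<int[]> kept, int maxFiles, boolean skipEmpty) {
    int selected = 0;
    for (int[] fileBlockCounts : kept) {
      for (int numBlocks : fileBlockCounts) {
        if (selected >= maxFiles) return selected;
        if (skipEmpty && numBlocks == 0) continue;
        ++selected;
      }
    }
    return selected;
  }
}
```

With partitions holding {0, 0}, {3} and {1, 2} blocks per file and a limit of 2 
files, step 1 keeps only the first partition. Step 2 then selects 2 files 
without the empty-file check and none with it, because step 1 already dropped 
the partitions that hold the non-empty files.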



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 23:14:47 +0000
Gerrit-HasComments: Yes
