[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable

2020-11-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint 
where applicable
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@210
PS1, Line 210: if (estimatedTotalRows > 0 && limitValue > 0 && 
limitValue <= estimatedTotalRows) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@211
PS1, Line 211:   long percent = Math.max(1, (long) ((double) limitValue 
/ estimatedTotalRows * 100));
line too long (94 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 1
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 30 Nov 2020 01:43:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable

2020-11-29 Thread Aman Sinha (Code Review)
Aman Sinha has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16792


Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint 
where applicable
..

IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable

As a follow-up to IMPALA-10314, it is sometimes useful to consider
a simple limit as a way to sample from a table if a relevant hint
has been provided. Doing a sample instead of pure limit serves
dual purposes: (a) it still helps with reducing the planning time
since the scan ranges need be computed only for the sample files,
(b) it allows sufficient number of files/rows to be read from
the table such that after applying filter conditions or joins with
another table, the query may still produce the N rows needed for
limit.

This fuctionality is especially useful if the query is against a
view (note that TABLESAMPLE clause cannot be applied to a view).

In this patch, a new table level hint, 'convert_limit_to_sample'
is added. If this hint is attached to a table either in the main
query block or within a view/subquery and simple limit optimization
conditions are satisfied (according to IMPALA-10314), the limit
is converted to a table sample. For example:

 set optimize_simple_limit = true;
 CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample]
WHERE [always_true] ;
 SELECT * FROM v1 LIMIT 10;

In this case, the limit 10 is converted to a sample of T and the
sampling percent is the greater of 1% or ratio (in percent) of
limit to the estimated row count of the table.

Testing:
 - Added a alltypes_date_partition_2 table where the date and
   timestamp values match (this helps with setting the
   'always_true' hint).
 - Added views with 'convert_limit_to_sample' and 'always_true'
   hints.
 - Added new tests against the views. Modified a few existing
   tests to reference the new table variant.

Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
---
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
9 files changed, 222 insertions(+), 35 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/1
--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 1
Gerrit-Owner: Aman Sinha