[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7685/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 01 Dec 2021 06:30:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18043

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 01 Dec 2021 06:30:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7684/


--
To view, visit http://gerrit.cloudera.org:8080/18043

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 01 Dec 2021 06:01:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] WiP: IMPALA-10798 : Prototype for JSON reader

2021-11-30 Thread Anonymous Coward (Code Review)
shikha.asran...@gmail.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt
File be/src/exec/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt@71
PS9, Line 71:
> nit: redundant whitespace
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@38
PS9, Line 38: #include 
: #include 
: #include 
: #include 
: #include 
: #include 
: #include "exec/hdfs-scan-node.h"
> nit: use <> for external includes
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@132
PS9, Line 132: row_read
> nit: rows_read_
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@133
PS9, Line 133: num_rows
> nit: num_rows_
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@136
PS9, Line 136:   std::shared_ptr reader_ = nullptr;
> nit: could you move this above to be together with the methods?
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@19
PS9, Line 19: #include "common/names.h"
: #include "runtime/collection-value-builder.h"
: #include "runtime/datetime-simple-date-format-parser.h"
: #include "runtime/io/request-context.h"
: #include "runtime/mem-tracker.h"
: #include "runtime/row-batch.h"
: #include "runtime/runtime-filter.inline.h"
: #include "runtime/timestamp-value.h"
: #include "runtime/timestamp-value.inline.h"
: #include "runtime/tuple-row.h"
: #include "util/decompress.h"
:
: using namespace impala;
: using namespace impala::io;
:
: Status HdfsJsonScanner::IssueIni
> nit: could you please remove headers that are included in hdfs-json-scanner
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@296
PS9, Line 296: *(reinterpret_cast
> nit: for (const auto& column : table_->columns())
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@305
PS9, Line 305: memcpy(blob_, blob->data(), blob->size());
 : char* src_ptr;
> nit: could you rename these variables? I'm confused in what "ar" means.. BT
Done


http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake
File cmake_modules/FindArrow.cmake:

http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake@18
PS9, Line 18: # - Find Arrow (headers and libarrow.a) with ARROW_ROOT hinting a location
: # This module defines
: #  ARROW_INCLUDE_DIR, directory containing headers
: #  ARROW_STATIC_LIB, path to libarrow.a
: #  ARROW_FOU
> nit: please update these comments
Done



--
To view, visit http://gerrit.cloudera.org:8080/17771

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:30:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@154
PS1, Line 154:
> Tried to move the new field to the last spot, but did not work. Backend c++
Changed code to set the new field with setter function.



--
To view, visit http://gerrit.cloudera.org:8080/13645

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:10:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9857/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13645

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:06:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9856/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13645

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:03:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#7) to the change originally created 
by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..

WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in a performance regression
since there is virtually no benefit to a rewrite if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to grow non-linearly as the number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result, rewrites for VALUES() are a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.
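
A minimal sketch of the planner-side decision described above, written in
Python for brevity rather than the Java/C++ of the actual patch. The
ExprNode and UnionNode classes and the init_union() function are
hypothetical stand-ins; only the 'is_codegen_disabled' field name comes
from this commit message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExprNode:
    """Hypothetical stand-in for the thrift struct TExprNode."""
    sql: str
    is_const: bool
    # Field name taken from the commit message; default is "codegen allowed".
    is_codegen_disabled: bool = False


@dataclass
class UnionNode:
    """Hypothetical stand-in for the planner's UnionNode."""
    const_exprs: List[ExprNode] = field(default_factory=list)
    in_subplan: bool = False


def init_union(node: UnionNode) -> None:
    """Disable codegen for const exprs unless the UNION is inside a subplan."""
    if node.in_subplan:
        return
    for expr in node.const_exprs:
        if expr.is_const:
            expr.is_codegen_disabled = True
```

The flag is set once at planning time and travels with each expression, so
in this sketch a backend would only need to read it.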

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.
- TODO: add frontend tests for the expression rewrite.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
   - Analysis finished: 14.533ms (13.881ms)
   - Analysis finished: 36.736ms (35.478ms)
   - Analysis finished: 112.932ms (108.913ms)
   - Analysis finished: 357.739ms (352.843ms)
   - Analysis finished: 1s242ms (1s234ms)
   - Analysis finished: 5s832ms (5s815ms)
   - Analysis finished: 28s994ms (28s960ms)
   - Analysis finished: 2m28s (2m28s)

Test:
   - Analysis finished: 2.107ms (1.380ms)
   - Analysis finished: 6.176ms (4.887ms)
   - Analysis finished: 20.043ms (17.569ms)
   - Analysis finished: 58.013ms (53.620ms)
   - Analysis finished: 241.455ms (232.775ms)
   - Analysis finished: 1s084ms (1s067ms)
   - Analysis finished: 5s718ms (5s674ms)
   - Analysis finished: 45s177ms (45s107ms)
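
To compare the two runs, the first duration on each "Analysis finished"
line can be converted to milliseconds and divided pairwise. The sketch
below is an illustration only: to_ms() and the two lists are hypothetical
helpers, and the duration grammar (h, m, s, ms units) is an assumption
inferred from the values shown above.

```python
import re

# Matches one "<number><unit>" component; "ms" must be tried before "m" and "s".
_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_TO_MS = {"ms": 1.0, "s": 1000.0, "m": 60_000.0, "h": 3_600_000.0}


def to_ms(duration: str) -> float:
    """Convert a profile duration such as '1s242ms' or '2m28s' to milliseconds."""
    parts = _COMPONENT.findall(duration)
    if not parts:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return sum(float(value) * _UNIT_TO_MS[unit] for value, unit in parts)


# First duration from each line above, for 256 .. 32768 columns.
BASE = ["14.533ms", "36.736ms", "112.932ms", "357.739ms",
        "1s242ms", "5s832ms", "28s994ms", "2m28s"]
TEST = ["2.107ms", "6.176ms", "20.043ms", "58.013ms",
        "241.455ms", "1s084ms", "5s718ms", "45s177ms"]

speedups = [to_ms(b) / to_ms(t) for b, t in zip(BASE, TEST)]

if __name__ == "__main__":
    for columns, ratio in zip((256 << i for i in range(8)), speedups):
        print(f"{columns:>6} columns: {ratio:.1f}x faster")
```

With the numbers above, the ratio stays between roughly 3x and 7x across
all eight column counts.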

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
15 files changed, 173 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/7
--
To view, visit http://gerrit.cloudera.org:8080/13645

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py@101
PS6, Line 101:
> flake8: W391 blank line at end of file
fixed



--
To view, visit http://gerrit.cloudera.org:8080/13645

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 01 Dec 2021 00:43:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py@101
PS6, Line 101:
flake8: W391 blank line at end of file



--
To view, visit http://gerrit.cloudera.org:8080/13645

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 01 Dec 2021 00:40:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#6) to the change originally created 
by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..

WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in a performance regression
since there is virtually no benefit to a rewrite if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to grow non-linearly as the number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result, rewrites for VALUES() are a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.
- TODO: add frontend tests for the expression rewrite.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done
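
For readers who prefer to generate the statement outside the shell, the
text that the loop above pipes into impala-shell.sh can be sketched in
Python. build_values_statement() is a hypothetical helper, not part of the
patch; it only mirrors the echo commands in the loop.

```python
def build_values_statement(num_columns: int) -> str:
    """Build the VALUES statement that the shell loop emits for one width."""
    lines = ["VALUES ("]
    # seq $i counts from 1 to i inclusive.
    lines += [f"cast({x} as string)," for x in range(1, num_columns + 1)]
    # A trailing NULL closes the list, followed by the shell's profile command.
    lines.append("NULL); profile;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_values_statement(3), end="")
```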

Base:
   - Analysis finished: 14.533ms (13.881ms)
   - Analysis finished: 36.736ms (35.478ms)
   - Analysis finished: 112.932ms (108.913ms)
   - Analysis finished: 357.739ms (352.843ms)
   - Analysis finished: 1s242ms (1s234ms)
   - Analysis finished: 5s832ms (5s815ms)
   - Analysis finished: 28s994ms (28s960ms)
   - Analysis finished: 2m28s (2m28s)

Test:
   - Analysis finished: 2.107ms (1.380ms)
   - Analysis finished: 6.176ms (4.887ms)
   - Analysis finished: 20.043ms (17.569ms)
   - Analysis finished: 58.013ms (53.620ms)
   - Analysis finished: 241.455ms (232.775ms)
   - Analysis finished: 1s084ms (1s067ms)
   - Analysis finished: 5s718ms (5s674ms)
   - Analysis finished: 45s177ms (45s107ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
15 files changed, 174 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/6
--
To view, visit http://gerrit.cloudera.org:8080/13645

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[native-toolchain-CR] IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

2021-11-30 Thread Quanlong Huang (Code Review)
Quanlong Huang has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18058 )

Change subject: IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020
..

IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

ORC-1020 improves read performance of the ORC library in scanning random
integers. Columns that are encoded into integers, e.g. dictionary-encoded
strings, will also benefit from this. This patch bumps the ORC version
from 1.7.0-p3 to 1.7.0-p4 to pick up the improvement.

Test:
 - Build ORC locally.

Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
Reviewed-on: http://gerrit.cloudera.org:8080/18058
Reviewed-by: Quanlong Huang 
Tested-by: Quanlong Huang 
---
M buildall.sh
A source/orc/orc-1.7.0-patches/0004-ORC-1020-C-Optimize-RleDecoderV2-nextDirect-base-on-.patch
2 files changed, 962 insertions(+), 3 deletions(-)

Approvals:
  Quanlong Huang: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18058

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
Gerrit-Change-Number: 18058
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[native-toolchain-CR] IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

2021-11-30 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18058 )

Change subject: IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18058

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
Gerrit-Change-Number: 18058
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Dec 2021 00:38:44 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

2021-11-30 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18058 )

Change subject: IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020
..


Patch Set 2: Code-Review+2

(1 comment)

Carrying over Csaba's +2.

http://gerrit.cloudera.org:8080/#/c/18058/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18058/1//COMMIT_MSG@11
PS1, Line 11: bump
> nit: bumps
Done



--
To view, visit http://gerrit.cloudera.org:8080/18058

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
Gerrit-Change-Number: 18058
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Dec 2021 00:37:53 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

2021-11-30 Thread Quanlong Huang (Code Review)
Hello Zoltan Borok-Nagy, Csaba Ringhofer,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18058

to look at the new patch set (#2).

Change subject: IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020
..

IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

ORC-1020 improves read performance of the ORC library in scanning random
integers. Columns that are encoded into integers, e.g. dictionary-encoded
strings, will also benefit from this. This patch bumps the ORC version
from 1.7.0-p3 to 1.7.0-p4 to pick up the improvement.

Test:
 - Build ORC locally.

Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
---
M buildall.sh
A source/orc/orc-1.7.0-patches/0004-ORC-1020-C-Optimize-RleDecoderV2-nextDirect-base-on-.patch
2 files changed, 962 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/58/18058/2
--
To view, visit http://gerrit.cloudera.org:8080/18058

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
Gerrit-Change-Number: 18058
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18050 )

Change subject: [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9855/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18050

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
Gerrit-Change-Number: 18050
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:57:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5:

One test failed the "Rows Processed" check in Dockerised-test, but the 
failure looks similar to https://issues.apache.org/jira/browse/IMPALA-6004 
and seems unrelated to this patch.

The other failures in "ubuntu-16.04-from-scratch" did not occur in a previous 
build, so they might be flaky. A previous run of the same patch passed at: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/15371/.


--
To view, visit http://gerrit.cloudera.org:8080/18043

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:47:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7684/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18043

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:38:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

2021-11-30 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18050 )

Change subject: [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2
..


Patch Set 3:

Resolve issue p1:

Section DISTRIBUTEDPLAN of query:
with v as ( 
  select row_number() over (order by 'a') as rn from functional.alltypes
) 
select count(distinct rn) from v


Actual does not match expected result:
PLAN-ROOT SINK
|
03:AGGREGATE [FINALIZE]
|  output: count(rn)
|  row-size=8B cardinality=1
|
02:AGGREGATE [FINALIZE]  <=== FINALIZE should not appear (resolved)


--
To view, visit http://gerrit.cloudera.org:8080/18050

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
Gerrit-Change-Number: 18050
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:36:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

2021-11-30 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18050 )

Change subject: [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2
..

[WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

This patch provides replan support in the planner when a set of executor
groups is present.

In the patch, a distributed plan is generated for each executor
group and returned only if its estimated memory (per host) is no
more than the query size threshold for the group. The search is
efficient as the executor groups are sorted in increasing order
of the query size threshold.

A new query option 'enable_replan', defaulting to true, is added. It
can be set to false to restore the previous behavior.
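
The group search described above can be sketched as follows. This is an
illustration in Python, not the Java planner code: ExecutorGroup,
choose_group() and the estimate callback are hypothetical names, and
returning None when no group fits is an assumption, since the commit
message does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ExecutorGroup:
    """Hypothetical executor group with its query size threshold in bytes."""
    name: str
    query_size_threshold_bytes: int


def choose_group(
    groups: Iterable[ExecutorGroup],
    estimate_per_host_bytes: Callable[[ExecutorGroup], int],
) -> Optional[ExecutorGroup]:
    """Return the first group, by increasing threshold, whose plan fits.

    estimate_per_host_bytes stands in for "generate a distributed plan for
    this group and return its estimated per-host memory".
    """
    for group in sorted(groups, key=lambda g: g.query_size_threshold_bytes):
        if estimate_per_host_bytes(group) <= group.query_size_threshold_bytes:
            return group
    return None  # Assumption: the caller decides what to do when nothing fits.
```

Because the groups are visited in increasing threshold order, the loop
stops at the smallest group that can hold the plan and never plans for
larger groups.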

Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/common/IdGenerator.java
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/util/ClassUtil.java
M fe/src/main/java/org/apache/impala/util/ExecutorMembershipSnapshot.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
14 files changed, 329 insertions(+), 138 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/18050/3
--
To view, visit http://gerrit.cloudera.org:8080/18050

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
Gerrit-Change-Number: 18050
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7683/


--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:30:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-6636: Use async IO in ORC scanner

2021-11-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15370 )

Change subject: WIP IMPALA-6636: Use async IO in ORC scanner
..


Patch Set 13:

(4 comments)

Hi All,
David has run several benchmark runs. Perf numbers seem to improve with this 
async IO prototype.
I will proceed with cleaning up the code and adding a proper commit message. 
Here are some items that I plan to address next.

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.h@123
PS13, Line 123:  // ExecEnv::GetInstance()->disk_io_mgr()->max_buffer_size();
Can be removed?


http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-orc-scanner.cc@300
PS13, Line 300: // stream_->ReleaseCompletedResources(true);
  : stream_->ReleaseCompletedResources(false);
Calling 'ReleaseCompletedResources(true)' seems to be OK here?


http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/15370/13/be/src/exec/hdfs-scan-node-base.cc@821
PS13, Line 821:   // DCHECK_LE(offset + len, 
GetFileDesc(metadata->partition_id, file)->file_length)
  :   //<< "Scan range beyond end of file (offset=" << offset 
<< ", len=" << len << ")";
Can be removed?


http://gerrit.cloudera.org:8080/#/c/15370/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15370/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2134
PS13, Line 2134: for (SlotDescriptor slot: desc_.getSlots()) {
Just for our notes, we found a corner case here for "select count(*)" kinds of 
queries over ORC.
Somehow, desc_.getSlots() is empty in this corner case, but 
HdfsOrcScanner::StartColumnReading actually sees a couple of streams that are 
eligible for async read.

Patch set 12 already adds a workaround within 
HdfsOrcScanner::StartColumnReading that calls TryIncreaseReservation for 8KB 
(min_buffer_size) for each eligible stream. If it can't increase, then the rest 
of the stream will be read synchronously. I will file a follow-up JIRA to 
document this situation.
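
To make the workaround easier to follow, here is a rough Python sketch of the
idea. It is not the scanner code: the Reservation class and
plan_stream_reads() are hypothetical names, and the sketch assumes that once
one reservation increase fails, all remaining eligible streams fall back to
synchronous reads.

```python
MIN_BUFFER_SIZE = 8 * 1024  # 8KB, the min_buffer_size mentioned above


class Reservation:
    """Hypothetical stand-in for a buffer reservation with a fixed limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def try_increase(self, nbytes):
        """Grow the reservation if the limit allows it and report success."""
        if self.used_bytes + nbytes > self.limit_bytes:
            return False
        self.used_bytes += nbytes
        return True


def plan_stream_reads(streams, reservation):
    """Map each eligible stream to "async" or "sync"."""
    modes = {}
    can_reserve = True
    for stream in streams:
        # Assumption: after the first failed increase, stop asking and
        # read every remaining stream synchronously.
        if can_reserve and reservation.try_increase(MIN_BUFFER_SIZE):
            modes[stream] = "async"
        else:
            can_reserve = False
            modes[stream] = "sync"
    return modes
```

With a 20KB limit and three eligible streams, the first two get an 8KB
buffer each and the third is read synchronously.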



--
To view, visit http://gerrit.cloudera.org:8080/15370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I348ad9e55f0cae7dff0d74d941b026dcbf5e4074
Gerrit-Change-Number: 15370
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:17:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG@38
PS1, Line 38:   Results below.
> I think we should have some frontend tests for the expression rewrite part
Will add new frontend tests


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exec/union-node.h
File be/src/exec/union-node.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exec/union-node.h@70
PS1, Line 70:   bool is_codegen_status_added_ = false;
> Can you mention that this is only used for observability? Or something alon
Done


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exprs/scalar-expr.h@402
PS1, Line 402:   /// it the ScalarExprEvaluator and TupleRow. These are 
cross-compiled and used by
> Could be const since it always initialised in the constructor.
Done


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/runtime/runtime-state.h@155
PS1, Line 155:
> Please use an int64_t - the style guide says to stick to signed integers (w
Done


http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@153
PS1, Line 153:   3: required i32 num_children
> I was thinking about whether it would be better to invert the meaning of th
ok


http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@154
PS1, Line 154:
> I would avoid renumbering the fields. you could just make this 22.
Tried to move the new field to the last spot, but it did not work. The backend 
C++ code always gets the value as false even when the frontend sets it to true.


http://gerrit.cloudera.org:8080/#/c/13645/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/13645/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@400
PS1, Line 400:   private boolean isConstant_;
> Non-standard formatting. If we use braces, we generally just put the statem
done



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 30 Nov 2021 21:50:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

2021-11-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#5) to the change originally created 
by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() 
statements
..

WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to increase non-linearly as the number of
columns increases. Similarly, there is no value in doing codegen for
such const expressions.

The rewriteExprs() method of the ValuesStmt class was overridden with an
empty function body. As a result, rewrites for VALUES() are a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to the backend. The optimizer should
make the decisions it can, so it seemed like the right place to
disable/enable codegen. The infrastructure is generic and could be
extended in the future to selectively disable codegen for any given
expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.
- TODO: add frontend tests for the expression rewrite.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
   - Analysis finished: 14.533ms (13.881ms)
   - Analysis finished: 36.736ms (35.478ms)
   - Analysis finished: 112.932ms (108.913ms)
   - Analysis finished: 357.739ms (352.843ms)
   - Analysis finished: 1s242ms (1s234ms)
   - Analysis finished: 5s832ms (5s815ms)
   - Analysis finished: 28s994ms (28s960ms)
   - Analysis finished: 2m28s (2m28s)

Test:
   - Analysis finished: 2.107ms (1.380ms)
   - Analysis finished: 6.176ms (4.887ms)
   - Analysis finished: 20.043ms (17.569ms)
   - Analysis finished: 58.013ms (53.620ms)
   - Analysis finished: 241.455ms (232.775ms)
   - Analysis finished: 1s084ms (1s067ms)
   - Analysis finished: 5s718ms (5s674ms)
   - Analysis finished: 45s177ms (45s107ms)
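
For readers who want the ratios, the following Python sketch computes the
speedup per run from the numbers above. It assumes the result lines appear in
the same order as the column counts in the shell loop, and it uses the first
duration on each line. The helper names (to_ms, speedups) are made up for
this illustration.

```python
import re

# Column counts from the shell loop and the first duration on each
# "Analysis finished" line above, assumed to be in matching order.
COLUMNS = [256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
BASE = ["14.533ms", "36.736ms", "112.932ms", "357.739ms",
        "1s242ms", "5s832ms", "28s994ms", "2m28s"]
TEST = ["2.107ms", "6.176ms", "20.043ms", "58.013ms",
        "241.455ms", "1s084ms", "5s718ms", "45s177ms"]

_UNIT_MS = {"m": 60_000.0, "s": 1_000.0, "ms": 1.0}


def to_ms(duration):
    """Convert a profile duration such as '1s242ms' to milliseconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|m|s)", duration)
    return sum(float(value) * _UNIT_MS[unit] for value, unit in parts)


def speedups():
    """Base time divided by test time for each column count."""
    return [to_ms(b) / to_ms(t) for b, t in zip(BASE, TEST)]


if __name__ == "__main__":
    for columns, ratio in zip(COLUMNS, speedups()):
        print(f"{columns:>6} columns: {ratio:.1f}x faster analysis")
```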

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
15 files changed, 192 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/5
--
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10970: Fix criterion for classifying coordinator only query

2021-11-30 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17937 )

Change subject: IMPALA-10970: Fix criterion for classifying coordinator only 
query
..

IMPALA-10970: Fix criterion for classifying coordinator only query

This patch fixes a bug in the criterion which decided whether a query
can be considered as a coordinator only query. It did not consider
the possibility of parallel plans and ended up mis-classifying some
queries as coordinator only queries.

This classification was used during scheduling when dedicated
coordinators and executor groups are used and allowed coordinator
queries to be scheduled only on the coordinator even in the absence
of healthy executor groups.

As a result of this bug, queries classified wrongly ended up with
error code: NO_REGISTERED_BACKENDS.

Testing:
- Added a new mt_dop test case for functional_query; it passes
- Ran and passed custom_cluster/test_coordinators, test_executor_groups

Change-Id: Icaaf1f1ba7a976122b4d37bd675e6d8181dc8700
Reviewed-on: http://gerrit.cloudera.org:8080/17937
Tested-by: Impala Public Jenkins 
Reviewed-by: Bikramjeet Vig 
---
M be/src/scheduling/scheduler.cc
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/mt-dop.test
3 files changed, 129 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Bikramjeet Vig: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17937
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icaaf1f1ba7a976122b4d37bd675e6d8181dc8700
Gerrit-Change-Number: 17937
Gerrit-PatchSet: 5
Gerrit-Owner: guojingfeng 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: guojingfeng 


[Impala-ASF-CR] IMPALA-10970: Fix criterion for classifying coordinator only query

2021-11-30 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17937 )

Change subject: IMPALA-10970: Fix criterion for classifying coordinator only 
query
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17937
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icaaf1f1ba7a976122b4d37bd675e6d8181dc8700
Gerrit-Change-Number: 17937
Gerrit-PatchSet: 4
Gerrit-Owner: guojingfeng 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: guojingfeng 
Gerrit-Comment-Date: Tue, 30 Nov 2021 20:12:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18050 )

Change subject: [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9854/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
Gerrit-Change-Number: 18050
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Tue, 30 Nov 2021 19:35:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

2021-11-30 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18050 )

Change subject: [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2
..


Patch Set 2:

The major work is to back up the following info to facilitate a proper replan:

1. The structure of the query tree, as the child of a node can be altered;
2. The next node Id before the create*Fragment() call;
3. The fragment Id (reset to 0).

The above info is restored when the current distributed plan is not good 
enough. In addition, AggregationNode.needsFinalize_ is restored to true 
unconditionally.
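
A rough Python sketch of this backup/restore idea follows. It is not the
planner code: PlanNode, PlannerState, backup() and restore() are hypothetical
names, and deep-copying the tree is just one simple way to preserve its
structure.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node; only the fields needed for the sketch."""
    name: str
    children: list = field(default_factory=list)
    is_aggregation: bool = False
    needs_finalize: bool = True


@dataclass
class PlannerState:
    """Hypothetical holder for the state listed above."""
    root: PlanNode
    next_node_id: int = 0
    next_fragment_id: int = 0


def walk(node):
    """Yield the node and all of its descendants, parents first."""
    yield node
    for child in node.children:
        yield from walk(child)


def backup(state):
    """Save the tree structure and the next node id before fragmenting."""
    return copy.deepcopy(state.root), state.next_node_id


def restore(state, saved):
    """Undo a rejected distributed plan so planning can start over."""
    saved_root, saved_next_node_id = saved
    state.root = copy.deepcopy(saved_root)
    state.next_node_id = saved_next_node_id
    state.next_fragment_id = 0  # fragment ids restart at 0
    for node in walk(state.root):
        if node.is_aggregation:
            node.needs_finalize = True  # restored unconditionally
```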


--
To view, visit http://gerrit.cloudera.org:8080/18050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
Gerrit-Change-Number: 18050
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Tue, 30 Nov 2021 19:18:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

2021-11-30 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/18050 )

Change subject: [WIP] IMPALA-10992 Planner changes for estimate peak memory - v2
..

[WIP] IMPALA-10992 Planner changes for estimate peak memory - v2

This patch provides replan support in the planner when a set of
executor groups is present.

In the patch, a distributed plan is generated for each executor
group and returned only if its estimated memory (per host) is no
more than the query size threshold for the group. The search is
efficient as the executor groups are sorted in increasing order
of the query size threshold.

A new query option 'enable_replan', defaulting to true, is added. It
can be set to false to restore the previous behavior.

Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/common/IdGenerator.java
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/util/ClassUtil.java
M fe/src/main/java/org/apache/impala/util/ExecutorMembershipSnapshot.java
13 files changed, 337 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/18050/2
--
To view, visit http://gerrit.cloudera.org:8080/18050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If8a31a574b364f39b049a4bae33a8b98c5fc20bd
Gerrit-Change-Number: 18050
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5:

Looks like the previous test failures were due to the change related to the 
flag. Once the gerrit job comes back with a +1 I can merge the patch.


--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 18:26:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 18:25:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7683/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 16:31:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7682/


--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 11:13:04 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

2021-11-30 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18058 )

Change subject: IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18058/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18058/1//COMMIT_MSG@11
PS1, Line 11: bump
nit: bumps



--
To view, visit http://gerrit.cloudera.org:8080/18058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
Gerrit-Change-Number: 18058
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 30 Nov 2021 10:32:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7942: Add query hints for cardinalities and selectivities

2021-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9853/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 30 Nov 2021 09:32:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7942: Add query hints for cardinalities and selectivities

2021-11-30 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities
..


Patch Set 2:

(6 comments)

Thanks for the CR. I've already modified the code. Sorry for the late reply; it 
was due to a bug that has already been fixed in IMPALA-11021.

http://gerrit.cloudera.org:8080/#/c/18023/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18023/1//COMMIT_MSG@21
PS1, Line 21: hint value only valid when table does not have stats or stats is
> nit: line should have 72 or fewer characters
Done


http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/analysis/Predicate.java
File fe/src/main/java/org/apache/impala/analysis/Predicate.java:

http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/analysis/Predicate.java@32
PS1, Line 32:   // The allowed values is [0,1], 1 means all records are 
eligible, 0 mean all records
> nit: Documenting the allowed values and what would 0 and 1 mean would help
Done


http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java@526
PS1, Line 526:   return;
> nit: Warning here should tell the correct format of specifying the HDFS_NUM
Done


http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1556
PS1, Line 1556:
> Shouldn't user provided hint be given higher preference ?
In my opinion, the table row count is precise unless the table has no stats or 
has corrupt stats. This is different from selectivity. For selectivity, Impala 
uses a very simple computation, and this may lead to a worse plan even when the 
table has precise stats, so the selectivity hint has higher priority, but 
num_rows has lower priority. What do you think?


http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@4971
PS1, Line 4971:   public void testPredicateHint() {
> What happens when we provide SELECTIVITY hint for Join predicates ?
Cool! I didn't consider this situation before, and have added a check for 
single column predicates. If a 'Predicate' is not a single column predicate, 
Impala will print a warning message, and the hint is invalid for this predicate.


http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@4974
PS1, Line 4974: AnalyzesOk("select * from tpch.lineitem where /* 
+ALWAYS_TRUE */ " +
> Can we include tests where SELECTIVITY is applied on expressions having sca
Done



--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 30 Nov 2021 09:12:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7942: Add query hints for cardinalities and selectivities

2021-11-30 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities
..

IMPALA-7942: Add query hints for cardinalities and selectivities

Currently, Impala only uses simple estimation to compute selectivity
for some predicates, and this may lead to a worse query plan from the
CBO. Hence, we add new hints to set these stats manually in a query
to help the CBO produce a better plan. Maybe in the future, we can
use histograms to get a more precise query plan.

This patch adds two query hints: 'HDFS_NUM_ROWS' and 'SELECTIVITY'.
We can add 'HDFS_NUM_ROWS' after an HDFS table in a query like this:

  select col from t /* +HDFS_NUM_ROWS(1000) */;

If set, Impala will use this value as the table's scanned row count.
This hint value is only valid when the table does not have stats or
its stats are corrupt. Otherwise, Impala will use the table's original
stats.
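
The precedence rule above can be illustrated with a small Python sketch. The
function name and parameters are hypothetical, and how Impala actually
detects corrupt stats is not described here, so it is modeled as a plain
boolean flag.

```python
def effective_num_rows(stats_num_rows, stats_corrupt, hint_num_rows):
    """Pick the row count a scan would use under the rule described above.

    stats_num_rows: row count from table stats, or None if there are none.
    stats_corrupt:  True if the stats are considered corrupt (assumed flag).
    hint_num_rows:  value from HDFS_NUM_ROWS, or None if no hint was given.
    """
    if stats_num_rows is not None and not stats_corrupt:
        # Usable stats take priority; the hint is ignored.
        return stats_num_rows
    # No usable stats: fall back to the hint (None if there is no hint).
    return hint_num_rows
```

For instance, a table with valid stats of 5000 rows keeps 5000 even when the
query says HDFS_NUM_ROWS(1000), while a table without stats uses 1000.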

The 'SELECTIVITY' hint can be used with these predicates:
* BinaryPredicate
* InPredicate
* IsNullPredicate
* LikePredicate, including 'not like' syntax
* BetweenPredicate, including 'not between and' syntax
The format looks like this:

  select col from t where a=1 /* +SELECTIVITY(0.5) */;

This value replaces the original selectivity computation. These
formats are not allowed:

  select col from t where (a=1) /* +SELECTIVITY(0.5) */;
  select col from t where (a=1 and b<2) /* +SELECTIVITY(0.5) */;
  select col from t1 where exists (...) /* +SELECTIVITY(0.5) */;

Note that if you set the selectivity hint like this:

  select col from t where (a=1 /* +SELECTIVITY(0.5) */ and b>2);

Impala will set 0.5 for the first binary predicate, while the second
is -1 (unknown), so Impala cannot compute the selectivity of the whole
compound predicate, which is still unavailable. Hence, for a compound
predicate, we need to ensure that each child's selectivity is either
set by a hint or computable. Otherwise, the hint may not take effect
as you expected.
Also, for a 'BetweenPredicate', Impala will transform the predicate
into a 'CompoundPredicate' with two 'BinaryPredicate' children. If a
hint is set for a 'BetweenPredicate' in a query, we will split the
hint value between the two 'BinaryPredicate' children.
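
The compound predicate case can be illustrated with a small Python sketch.
It is not the Impala code: the function name is hypothetical, and multiplying
the child selectivities (treating the children as independent) is an
assumption made for the illustration. The point is only that one unknown
child (-1) makes the whole AND predicate unknown.

```python
UNKNOWN = -1.0  # marker for a selectivity that could not be computed


def and_selectivity(child_selectivities):
    """Combine child selectivities of an AND compound predicate.

    Returns UNKNOWN as soon as any child is unknown. Otherwise returns the
    product of the children, which assumes they are independent.
    """
    if any(s < 0 for s in child_selectivities):
        return UNKNOWN
    result = 1.0
    for s in child_selectivities:
        result *= s
    return result
```

In the example above, the children are 0.5 (from the hint) and -1 for b>2, so
the combined value stays -1. If b>2 also had a hint or a computable
selectivity, the compound predicate would get a usable value.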

Testing:
- Added new fe tests in 'PlannerTest'
- Added new fe tests in 'AnalyzeStmtsTest' for negative cases

Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/hdfs-cardinality-hint.test
A testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test
13 files changed, 1,475 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/18023/2
--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[native-toolchain-CR] IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

2021-11-30 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18058 )


Change subject: IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020
..

IMPALA-11037: Bump ORC to 1.7.0-p4 to contain ORC-1020

ORC-1020 improves read performance of the ORC library in scanning random
integers. Columns that are encoded into integers, e.g. dictionary encoded
strings, will also benefit from this. This patch bumps the ORC version
from 1.7.0-p3 to 1.7.0-p4 to have the improvement.

Test:
 - Build ORC locally.

Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
---
M buildall.sh
A source/orc/orc-1.7.0-patches/0004-ORC-1020-C-Optimize-RleDecoderV2-nextDirect-base-on-.patch
2 files changed, 962 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/58/18058/1
--
To view, visit http://gerrit.cloudera.org:8080/18058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id0d3cc3b357e2f03f2eca7c289ec376d12d58061
Gerrit-Change-Number: 18058
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang