[Impala-ASF-CR] IMPALA-11049: Added expr analyzed check in 'SimplifyCastExprRule.java'

2022-02-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18099 )

Change subject: IMPALA-11049: Added expr analyzed check in 
'SimplifyCastExprRule.java'
..


Patch Set 8: Code-Review+2

Thanks for fixing this, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/18099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2780e04a6d5a32e224cd0470cf6f166a832363ec
Gerrit-Change-Number: 18099
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 14 Feb 2022 10:03:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-14 Thread Code Review
Gergely Fürnstáhl has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18191 )


Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..

IMPALA-9433: Improved caching of HdfsFileHandles

Separated the LRU caching functionality into a templated LruMultiCache
class.

Replaced std::multimap with std::unordered_map plus std::list for O(1)
lookups and less memory overhead, as each key is stored only once. Added
boost::intrusive::list to track LRU relations with less overhead.
Added an O(1) release method with minimal memory overhead, replacing the
O(n) one. Implemented an RAII Accessor so that users are no longer
responsible for releasing the objects.

Wrapped the cache accessor and the related DiskIOManager metrics into a
FileHandleCache::Accessor. Removed the Release*() call trees from
FileHandleCache and DiskIOManager, and removed the scoped exit from
HdfsFileReader, as releases are now handled automatically.
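
As a rough illustration of the design described in this commit message, the
following Python sketch keeps a hash map of per-key idle entries plus a single
LRU ordering, and uses a context manager as a stand-in for the RAII Accessor.
It is not the C++ implementation from the patch: the class layout, the get()
signature, the factory callback and the eviction policy (only idle entries are
evicted, so capacity can be overshot while every entry is in use) are all
assumptions made for this sketch.

```python
from collections import OrderedDict
from contextlib import contextmanager
import itertools


class LruMultiCache:
    """Illustrative multi-value LRU cache; several objects may share one key."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._idle = {}             # key -> OrderedDict(entry_id -> value)
        self._lru = OrderedDict()   # entry_id -> key, least recently used first
        self._ids = itertools.count()
        self._in_use = 0

    def size(self):
        """Number of cached objects, idle or currently handed out."""
        return len(self._lru) + self._in_use

    @contextmanager
    def get(self, key, factory):
        """Hand out an object for key; it is released when the block exits."""
        bucket = self._idle.get(key)
        if bucket:
            # Reuse the most recently released object for this key: O(1).
            entry_id, value = bucket.popitem()
            del self._lru[entry_id]
            if not bucket:
                del self._idle[key]
        else:
            entry_id, value = next(self._ids), factory(key)
        self._in_use += 1
        self._evict()
        try:
            yield value
        finally:
            # Automatic release, playing the role of the Accessor destructor.
            self._in_use -= 1
            self._idle.setdefault(key, OrderedDict())[entry_id] = value
            self._lru[entry_id] = key
            self._evict()

    def _evict(self):
        """Drop least recently used idle objects while over capacity."""
        while self.size() > self._capacity and self._lru:
            entry_id, key = self._lru.popitem(last=False)
            bucket = self._idle[key]
            del bucket[entry_id]
            if not bucket:
                del self._idle[key]
```

A caller would write `with cache.get(path, open_handle) as handle: ...`, where
`open_handle` is a hypothetical function that opens a new handle on a miss; no
explicit release call is needed.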

Testing:

Implemented extensive unit testing of the class, including forced
rehashes, collisions, capacity overshoot, explicit/automatic release
and destroy.

Ran tests/custom_cluster/test_hdfs_fd_caching.py to verify
FileHandleCache::Accessor behaviour through metrics.

Ran bin/single_node_perf_run.py with TPCH and TPC-DS on parquet tables,
no visible change in performance:
TPCH   scale=10 iterations=100: Delta(Avg)=-0.67% Delta(GeoMean)=-0.49%
TPC-DS scale=10 iterations= 50: Delta(Avg)=-0.02% Delta(GeoMean)= 0.00%

Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/CMakeLists.txt
A be/src/util/lru-multi-cache-test.cc
A be/src/util/lru-multi-cache.h
A be/src/util/lru-multi-cache.inline.h
9 files changed, 1,188 insertions(+), 274 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/18191/18
--
To view, visit http://gerrit.cloudera.org:8080/18191

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 18
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-14 Thread Code Review
Gergely Fürnstáhl has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..

IMPALA-9433: Improved caching of HdfsFileHandles

Separated the LRU caching functionality into a templated LruMultiCache
class.

Replaced std::multimap with std::unordered_map plus std::list for O(1)
lookups and less memory overhead, as each key is stored only once. Added
boost::intrusive::list to track LRU relations with less overhead.
Added an O(1) release method with minimal memory overhead, replacing the
O(n) one. Implemented an RAII Accessor so that users are no longer
responsible for releasing the objects.

Wrapped the cache accessor and the related DiskIOManager metrics into a
FileHandleCache::Accessor. Removed the Release*() call trees from
FileHandleCache and DiskIOManager, and removed the scoped exit from
HdfsFileReader, as releases are now handled automatically.

Testing:

Implemented extensive unit testing of the class, including forced
rehashes, collisions, capacity overshoot, explicit/automatic release
and destroy.

Ran tests/custom_cluster/test_hdfs_fd_caching.py to verify
FileHandleCache::Accessor behaviour through metrics.

Ran bin/single_node_perf_run.py with TPCH and TPC-DS on parquet tables,
no visible change in performance:
TPCH   scale=10 iterations=100: Delta(Avg)=-0.67% Delta(GeoMean)=-0.49%
TPC-DS scale=10 iterations= 50: Delta(Avg)=-0.02% Delta(GeoMean)= 0.00%

Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/CMakeLists.txt
A be/src/util/lru-multi-cache-test.cc
A be/src/util/lru-multi-cache.h
A be/src/util/lru-multi-cache.inline.h
9 files changed, 1,188 insertions(+), 274 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/18191/19
--
To view, visit http://gerrit.cloudera.org:8080/18191

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 19
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..


Patch Set 18:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/10149/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/18191

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 18
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 11:39:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..


Patch Set 19:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/10150/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/18191

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 19
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 11:47:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS

2022-02-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18215 )

Change subject: IMPALA-11113 and IMPALA-11114: fixed 
single_node_perf_run.py for TPCDS
..


Patch Set 2:

Thanks for working on this! However, I still hit the following error at the end 
of the run:

Traceback (most recent call last):
  File "bin/single_node_perf_run.py", line 359, in <module>
    main()
  File "bin/single_node_perf_run.py", line 349, in main
    perf_ab_test(options, args)
  File "bin/single_node_perf_run.py", line 267, in perf_ab_test
    compare(temp_dir, hash_a, hash_b)
  File "bin/single_node_perf_run.py", line 178, in compare
    generate_profile_file(file_a, hash_a, base_dir)
  File "bin/single_node_perf_run.py", line 194, in generate_profile_file
    data = json.load(fid)
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position 1: invalid continuation byte

Did I miss something?


--
To view, visit http://gerrit.cloudera.org:8080/18215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
Gerrit-Change-Number: 18215
Gerrit-PatchSet: 2
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 12:21:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS

2022-02-14 Thread Code Review
Gergely Fürnstáhl has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/18215 )

Change subject: IMPALA-11113 and IMPALA-11114: fixed 
single_node_perf_run.py for TPCDS
..

IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS

Fixed the UTF-8 UnicodeDecodeError that was thrown while dumping and
loading the JSON file. The script now ignores non-decodable characters.
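
A minimal sketch of the "ignore non-decodable characters" approach, written
for Python 3 (the reported traceback comes from Python 2.7). The helper name
load_json_lenient and the sample payload are hypothetical, and this is not the
code from the patch.

```python
import json


def load_json_lenient(raw_bytes):
    """Parse JSON from bytes, silently dropping bytes that are not valid UTF-8."""
    text = raw_bytes.decode("utf-8", errors="ignore")
    return json.loads(text)


# 0xd4 opens a two-byte UTF-8 sequence but is not followed by a continuation
# byte here, which is the same kind of error as in the reported traceback.
sample = b'{"query": "q1\xd4", "time_ms": 12}'
result = load_json_lenient(sample)
```

Dropping bytes loses information, so this only suits fields such as free-form
profile text where an exact round trip is not required.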

Fixed the ZeroDivisionError raised by the t-test when the standard
deviations were 0. "(N/A) Invalid t-test type" is shown for significant
changes, and a hint is printed at the end if any invalid t-test was
detected.
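
The division-by-zero case can be sketched as follows. The commit message does
not say which t-test variant the script uses, so Welch's t statistic is an
assumption here, as are the function name and its parameters.

```python
import math


def welch_t(mean_a, stddev_a, n_a, mean_b, stddev_b, n_b):
    """Return Welch's t statistic, or None when it is undefined."""
    variance_of_diff = stddev_a ** 2 / n_a + stddev_b ** 2 / n_b
    if variance_of_diff == 0:
        # Both standard deviations are 0, so dividing would raise
        # ZeroDivisionError; report "no valid t-test" instead.
        return None
    return (mean_a - mean_b) / math.sqrt(variance_of_diff)
```

A caller can then print a placeholder for None results and emit a single hint
at the end of the report if any were seen.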

Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
---
M bin/run-workload.py
M bin/single_node_perf_run.py
M tests/benchmark/report_benchmark_results.py
3 files changed, 18 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/18215/4
--
To view, visit http://gerrit.cloudera.org:8080/18215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
Gerrit-Change-Number: 18215
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS

2022-02-14 Thread Code Review
Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18215 )

Change subject: IMPALA-11113 and IMPALA-11114: fixed 
single_node_perf_run.py for TPCDS
..


Patch Set 4:

> This patch is in 'Draft' mode. Could you please publish it?
 >
 > BTW, when updating the patch, we can push to 'refs/for/master'
 > instead of 'refs/drafts/master'.

Thanks, published


--
To view, visit http://gerrit.cloudera.org:8080/18215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
Gerrit-Change-Number: 18215
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 12:32:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-14 Thread Code Review
Gergely Fürnstáhl has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..

IMPALA-9433: Improved caching of HdfsFileHandles

Separated the LRU caching functionality into a templated LruMultiCache
class.

Replaced std::multimap with std::unordered_map plus std::list for O(1)
lookups and less memory overhead, as each key is stored only once. Added
boost::intrusive::list to track LRU relations with less overhead.
Added an O(1) release method with minimal memory overhead, replacing the
O(n) one. Implemented an RAII Accessor so that users are no longer
responsible for releasing the objects.

Wrapped the cache accessor and the related DiskIOManager metrics into a
FileHandleCache::Accessor. Removed the Release*() call trees from
FileHandleCache and DiskIOManager, and removed the scoped exit from
HdfsFileReader, as releases are now handled automatically.

Testing:

Implemented extensive unit testing of the class, including forced
rehashes, collisions, capacity overshoot, explicit/automatic release
and destroy.

Ran tests/custom_cluster/test_hdfs_fd_caching.py to verify
FileHandleCache::Accessor behaviour through metrics.

Ran bin/single_node_perf_run.py with TPCH and TPC-DS on parquet tables,
no visible change in performance:
TPCH   scale=10 iterations=100: Delta(Avg)=-0.67% Delta(GeoMean)=-0.49%
TPC-DS scale=10 iterations= 50: Delta(Avg)=-0.02% Delta(GeoMean)= 0.00%

Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/CMakeLists.txt
A be/src/util/lru-multi-cache-test.cc
A be/src/util/lru-multi-cache.h
A be/src/util/lru-multi-cache.inline.h
9 files changed, 1,183 insertions(+), 274 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/18191/20
--
To view, visit http://gerrit.cloudera.org:8080/18191

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 20
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS

2022-02-14 Thread Code Review
Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18215 )

Change subject: IMPALA-11113 and IMPALA-11114: fixed 
single_node_perf_run.py for TPCDS
..


Patch Set 4:

> Thanks for working on this! However, I still hit the following
 > error at the end of the run:
 >
 > Traceback (most recent call last):
 > File "bin/single_node_perf_run.py", line 359, in <module>
 > main()
 > File "bin/single_node_perf_run.py", line 349, in main
 > perf_ab_test(options, args)
 > File "bin/single_node_perf_run.py", line 267, in perf_ab_test
 > compare(temp_dir, hash_a, hash_b)
 > File "bin/single_node_perf_run.py", line 178, in compare
 > generate_profile_file(file_a, hash_a, base_dir)
 > File "bin/single_node_perf_run.py", line 194, in generate_profile_file
 > data = json.load(fid)
 > File 
 > "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/__init__.py",
 > line 291, in load
 > **kw)
 > File 
 > "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/__init__.py",
 > line 339, in loads
 > return _default_decoder.decode(s)
 > File 
 > "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/decoder.py",
 > line 364, in decode
 > obj, end = self.raw_decode(s, idx=_w(s, 0).end())
 > File 
 > "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/json/decoder.py",
 > line 380, in raw_decode
 > obj, end = self.scan_once(s, idx)
 > UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position
 > 1: invalid continuation byte
 >
 > Did I miss something?

Fixed it in patch set 4


--
To view, visit http://gerrit.cloudera.org:8080/18215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
Gerrit-Change-Number: 18215
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 12:38:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11113 and IMPALA-11114: fixed single_node_perf_run.py for TPCDS

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18215 )

Change subject: IMPALA-11113 and IMPALA-11114: fixed 
single_node_perf_run.py for TPCDS
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10151/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I094763188a1f3ddf40b7140c65acf95918a6597f
Gerrit-Change-Number: 18215
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 12:55:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9433: Improved caching of HdfsFileHandles

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10152/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18191

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 20
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 13:03:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2019(part-4): Add UTF-8 support for case conversion functions

2022-02-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17785 )

Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion 
functions
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc@379
PS10, Line 379:   uint8_t* result_ptr = result.ptr;
This will be null if the allocation fails in the constructor. We should handle 
it similarly to the failure of resize()


http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc@383
PS10, Line 383: mbstate_t
My understanding is that we should use the same mbstate_t during the processing 
of a string, as its goal is to allow a conversion function to depend on the 
previous characters. This probably doesn't matter in UTF-8 though.


http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc@422
PS10, Line 422: context->has_error()
We shouldn't need this, Resize() should return false on error.



--
To view, visit http://gerrit.cloudera.org:8080/17785

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Gerrit-Change-Number: 17785
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 14 Feb 2022 14:45:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10838: Error when struct returned from WITH()

2022-02-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
..


Patch Set 15:

(6 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/17847/14/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java:

http://gerrit.cloudera.org:8080/#/c/17847/14/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@239
PS14, Line 239: // Struct children are allowed to be non-materialised because 
the query may only
  :   // concern a subset of the fields of the struct.
> It is not the child but the struct that we keep.
I see. Thanks for the explanation.

So the purpose of the IF test below at line 242 is to retain any structs which 
may have a reference in the query. But would it be possible that some of the 
structs do not have any references in the query? If so, we still can remove 
them.


http://gerrit.cloudera.org:8080/#/c/17847/14/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/17847/14/fe/src/main/java/org/apache/impala/analysis/Path.java@480
PS14, Line 480: prefixPath.size()
> List.subList would throw an exception but we check it on L479 so it is guar
Done


http://gerrit.cloudera.org:8080/#/c/17847/14/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17847/14/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@186
PS14, Line 186: if (tupleDescs != null)
> In the WHILE loop we check 'tupleDesc' of type TupleDescriptor, here it is
Aha, my bad.

Maybe rename tupleDesc to parentTupleDesc to avoid confusion?


http://gerrit.cloudera.org:8080/#/c/17847/14/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@188
PS14, Line 188: parentStructSl
> Renamed it to parentStructSlotDesc as I think it is important that it shoul
Done


http://gerrit.cloudera.org:8080/#/c/17847/15/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/17847/15/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@187
PS15, Line 187: , sub.id
I wonder if sub.id can be removed from the ORDER BY clause to fit the test 
description better.


http://gerrit.cloudera.org:8080/#/c/17847/15/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@198
PS15, Line 198: # WITH clause creates an inline view containing a nested 
struct; filter by a struct field
  : # from the inline view.
  : with sub as (
  : select id, outer_struct from 
functional_orc_def.complextypes_nested_structs)
  : select sub.id, sub.outer_struct.str
  : from sub
  : where length(sub.outer_struct.str) < 4;
We may also add tests for the following cases.

1. The main query does not select from the inline view at all;
2. The inline view returns two structs and the main query refers to only one 
of them.



--
To view, visit http://gerrit.cloudera.org:8080/17847

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Mon, 14 Feb 2022 15:23:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18184

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 15:34:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 6:

(6 comments)

Thanks!

http://gerrit.cloudera.org:8080/#/c/18141/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18141/4//COMMIT_MSG@30
PS4, Line 30:
> change to "with"? It means the IN-list has 4 items.
Okay.


http://gerrit.cloudera.org:8080/#/c/18141/4//COMMIT_MSG@34
PS4, Line 34:  ps_partkey and l_suppkey = ps_suppkey;
:
> You are right but not sure we have misunderstanding here. There are two kin
Good to know! Thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc@1221
PS4, Line 1221: f (in_list_filter->AlwaysTrue()) continue;
> Yeah, the check is done by FE: https://github.com/apache/impala/blob/6c845e
I was originally thinking that when the target of an IN-list filter is a 
partition column, the target can be removed in FE.

Doing the test here means such targets are retained in the plan but do not 
contribute.

Personally, I feel we should allow the target to be a partition column in this 
patch to pick up a good performance gain, especially for large tables with 
hundreds of partitions. The code that deals with partition columns is here: 
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-scan-node-base.cc#L922.
It seems your code will work out of the box in this situation if line 1221 is 
removed.


http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc@1271
PS4, Line 1271:
> PrepareSearchArguments() will be called multiple times after this patch. Th
Okay.

Calling PrepareSearchArguments() for each ORC stripe may be overkill. My 
understanding is that there is a consolidation step to merge the filters from 
different partitions (for PARTITIONED HJ). Only the merged filter can arrive at 
the scan node. For BROADCAST HJ, such a merge step is not needed.


http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@394
PS4, Line 394: r
> I think it's assumed that both sides are casted to the same type. EQUALS pr
It also depends on how ORC layer handles the types.

From https://orc.apache.org/api/orc-core/org/apache/orc/Reader.Options.html, 
https://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.html?is-external=true
and 
https://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.html,
it seems the literal list can only take one of the four primitive typed 
objects: Integer, Long, Double, or String. Denote such a type T. Then 
technically, it is sufficient that both the inner and the outer, after 
optional casting, are of type T. Note also that we need to verify the 
surviving column values because of IN-list predicates being mapped to ORC 
bloom filters.

The casting rules could be as follows, in order of priority.

1. If either the inner or the outer is smallint/tinyint, cast both to int;
2. If either is less than or equal to int, cast both to int;
3. If either is less than or equal to bigint, cast both to bigint;
4. If either is less than or equal to double, cast both to double;
5. If either is a SQL character type, cast both to string.


I think it is a good idea to verify the types here to make it possible to 
detect type mismatch early.


http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@742
PS4, Line 742:   public int compare(RuntimeFilter a, RuntimeFilter b) {
> I think it's very likely that partitioned HJs will exceed the threshold. Bu
Sounds like a good idea to handle partitioned HJs in another JIRA.

We can borrow BE code from min/max filters to handle both 1) and 2).



--
To view, visit http://gerrit.cloudera.org:8080/18141

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 14 Feb 2022 17:04:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-14 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..

IMPALA-10992 Planner changes for estimate peak memory

This patch provides replan support for a set of executor groups. Each
executor group in the set is associated with a distinct number of nodes
and a threshold for estimated memory per host in bytes that can be
denoted as [<name>:<#nodes>, <threshold>].

In the patch, a query of type EXPLAIN, QUERY or DML can be compiled
more than once. In each attempt, per host memory is estimated and
compared with the threshold of an executor group. If the estimated
memory is no more than the threshold, the iteration process terminates
and the final plan is determined. The executor group with the threshold
is selected to run the query.
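
The iteration described above can be sketched roughly as follows. This is a minimal Python illustration, not the Frontend.java code: ExecutorGroup, select_executor_group, the estimator callback and the node count of 3 are hypothetical, and the sketch assumes the groups are tried in ascending threshold order and that None is returned when no group fits (the patch description does not cover that case).

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple

MB = 1024 ** 2
PB = 1024 ** 5


class ExecutorGroup(NamedTuple):
    name: str
    num_nodes: int
    mem_threshold_bytes: int  # threshold for estimated memory per host


def select_executor_group(
    groups: Sequence[ExecutorGroup],
    estimate_per_host_mem: Callable[[ExecutorGroup], int],
) -> Tuple[Optional[ExecutorGroup], int]:
    """Compile once per group until the estimate fits the group's threshold.

    Returns the chosen group (or None) and the number of compilations done.
    """
    attempts = 0
    for group in groups:
        attempts += 1
        estimate = estimate_per_host_mem(group)  # stands in for one compilation
        if estimate <= group.mem_threshold_bytes:
            return group, attempts
    return None, attempts


# Thresholds mirror the artificial test setup of 64MB and 8PB.
groups = [
    ExecutorGroup("regular", 3, 64 * MB),
    ExecutorGroup("large", 3, 8 * PB),
]
```

With these two groups, a query estimated at 408MB per host needs two compilations and lands in the large group, while one estimated at 32MB stops after the first.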

A new query option 'enable_replan', which defaults to 1 (enabled), is
added. It can be set to 0 to disable this patch and to generate the
distributed plan for the default executor group.

To avoid long compilation time, the following enhancements are enabled.
Note that 1) and 2) can be disabled when a relevant meta-data change is
detected.

 1. Authorization is performed only for the 1st compilation;
 2. The needed meta-data is fetched into a StmtTableCache in 1st
compilation and reused in subsequent compilations;
 3. openTransaction() is called for transactional queries in 1st
compilation and the saved transactional info is used in
subsequent compilations. Similar logic is applied to Kudu
transactional queries.

To facilitate testing, the patch imposes an artificial two executor
group setup in FE as follows.

 1. [regular:<#nodes>, 64MB]
 2. [large:<#nodes>, 8PB]

This setup is enabled when a new query option 'test_replan' is set
to 1 in backend tests, or RuntimeEnv.INSTANCE.isTestEnv() is true as
in most frontend tests. This query option is set to 0 by default.

Compilation time increases when a query is compiled in several
iterations, as shown below for several TPC-DS queries. The increase
is mostly due to redundant work in either single node plan creation
or the phase that recomputes the value transfer graph. For small
queries, the increase can be avoided if they can be compiled in a
single iteration by properly setting the smallest threshold among all
executor group sets. For example, for the set of queries listed below,
the smallest threshold can be set to 320MB to catch both q15 and q21
in one compilation.

                             Compilation time (ms)
Queries  Estimated Memory  2-iterations  1-iteration  Percentage of increase
q1       408MB                    18.32        13.01                  40.81%
q11      1.37GB                  186.17        86.28                 115.77%
q10a     519MB                   108.27        53.58                 102.07%
q13      339MB                   118.03        82.43                  43.19%
q14a     3.56GB                  628.27       307.24                 104.49%
q14b     2.20GB                  518.79       239.05                 117.02%
q15      314MB                    13.12         4.51                 190.91%
q21      275MB                    11.04         6.34                  74.13%
q23a     1.34GB                   458.7       227.62                 101.52%
q23b     1.50GB                  471.29       224.75                 109.70%
q4       2.60GB                  206.34        98.64                 109.18%
q67      5.16GB                  691.45       336.31                 105.60%
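
The last column appears to be the relative increase of the 2-iteration time over the 1-iteration time. A small Python check of that reading, with percentage_increase as an illustrative helper name rather than anything from the patch:

```python
def percentage_increase(two_iterations_ms: float, one_iteration_ms: float) -> float:
    """Relative increase of the 2-iteration compile time over the 1-iteration time."""
    return (two_iterations_ms - one_iteration_ms) / one_iteration_ms * 100.0


# Values taken from the q1 and q15 rows of the table.
q1_increase = round(percentage_increase(18.32, 13.01), 2)
q15_increase = round(percentage_increase(13.12, 4.51), 2)
```

Under this reading the q1 and q15 rows reproduce the reported 40.81% and 190.91%.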

Testing:
 1. Almost all FE and BE tests are now run in the artificial two
executor setup except a few where a specific cluster configuration
is desirable;
 2. Ran core tests successfully;
 3. Added a new observability test.

Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/ClassUtil.java
M fe/src/main/java/org/apache/impala/util/ExecutorMembershipSnapshot.java
M fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/ClusterSizeTest.java
M tests/common/test_dimensions.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_observability.py
21 files changed, 523 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78

[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18178/11/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/18178/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1776
PS11, Line 1776: embershipSnapshot cluster = Exe
> If queryOptions.isEnable_replan() is false, but RuntimeEnv.INSTANCE.isTestE
Good point. Move the RuntimeEnv.INSTANCE.isTestEnv() into the supplier side of 
test_replan argument as

queryOptions.enable_replan && (RuntimeEnv.INSTANCE.isTestEnv()  || 
queryOptions.test_replan())


http://gerrit.cloudera.org:8080/#/c/18178/11/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/18178/11/tests/query_test/test_observability.py@775
PS11, Line 775: results = self.execute_query(query, query_opts)
  : assert results.success
  : runtime_profile = results.runtime_profile
> Why do we need to call execute_query twice?
Done



--
To view, visit http://gerrit.cloudera.org:8080/18178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
Gerrit-Change-Number: 18178
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 14 Feb 2022 17:04:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2019(part-4): Add UTF-8 support for case conversion functions

2022-02-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17785 )

Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion 
functions
..


Patch Set 10: Code-Review+1

Thanks a lot for taking care of the comments.


--
To view, visit http://gerrit.cloudera.org:8080/17785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Gerrit-Change-Number: 17785
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 14 Feb 2022 17:06:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10153/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
Gerrit-Change-Number: 18178
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 14 Feb 2022 17:27:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7840/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 17:38:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10931 (part2): Fixed rebased Kudu source to compile

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18155 )

Change subject: IMPALA-10931 (part2): Fixed rebased Kudu source to compile
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7841/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9919c06e60d132d997093abb7b14825847e07c7
Gerrit-Change-Number: 18155
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 14 Feb 2022 18:14:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-14 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..

IMPALA-10992 Planner changes for estimate peak memory

This patch provides replan support for a set of executor groups. Each
executor group in the set is associated with a distinct number of nodes
and a threshold for estimated memory per host in bytes that can be
denoted as [:<#nodes>, ].

In the patch, a query of type EXPLAIN, QUERY or DML can be compiled
more than once. In each attempt, per host memory is estimated and
compared with the threshold of an executor group. If the estimated
memory is no more than the threshold, the iteration process terminates
and the final plan is determined. The executor group with the threshold
is selected to run the query.

A new query option 'enable_replan', which defaults to 1 (enabled), is
added. It can be set to 0 to disable this patch and to generate the
distributed plan for the default executor group.

To avoid long compilation time, the following enhancements are enabled.
Note that 1) and 2) can be disabled when a relevant meta-data change is
detected.

 1. Authorization is performed only for the 1st compilation;
 2. The needed meta-data is fetched into a StmtTableCache in 1st
compilation and reused in subsequent compilations;
 3. openTransaction() is called for transactional queries in 1st
compilation and the saved transactional info is used in
subsequent compilations. Similar logic is applied to Kudu
transactional queries.

To facilitate testing, the patch imposes an artificial two executor
group setup in FE as follows.

 1. [regular:<#nodes>, 64MB]
 2. [large:<#nodes>, 8PB]

This setup is enabled when a new query option 'test_replan' is set
to 1 in backend tests, or RuntimeEnv.INSTANCE.isTestEnv() is true as
in most frontend tests. This query option is set to 0 by default.

Compilation time increases when a query is compiled in several
iterations, as shown below for several TPC-DS queries. The increase
is mostly due to redundant work in either single node plan creation
or the phase that recomputes the value transfer graph. For small
queries, the increase can be avoided if they can be compiled in a
single iteration by properly setting the smallest threshold among all
executor group sets. For example, for the set of queries listed below,
the smallest threshold can be set to 320MB to catch both q15 and q21
in one compilation.

                             Compilation time (ms)
Queries  Estimated Memory  2-iterations  1-iteration  Percentage of increase
q1       408MB                    18.32        13.01                  40.81%
q11      1.37GB                  186.17        86.28                 115.77%
q10a     519MB                   108.27        53.58                 102.07%
q13      339MB                   118.03        82.43                  43.19%
q14a     3.56GB                  628.27       307.24                 104.49%
q14b     2.20GB                  518.79       239.05                 117.02%
q15      314MB                    13.12         4.51                 190.91%
q21      275MB                    11.04         6.34                  74.13%
q23a     1.34GB                   458.7       227.62                 101.52%
q23b     1.50GB                  471.29       224.75                 109.70%
q4       2.60GB                  206.34        98.64                 109.18%
q67      5.16GB                  691.45       336.31                 105.60%

Testing:
 1. Almost all FE and BE tests are now run in the artificial two
executor setup except a few where a specific cluster configuration
is desirable;
 2. Ran core tests successfully;
 3. Added a new observability test.

Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/ClassUtil.java
M fe/src/main/java/org/apache/impala/util/ExecutorMembershipSnapshot.java
M fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/ClusterSizeTest.java
M tests/common/test_dimensions.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_observability.py
21 files changed, 523 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78

[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-14 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..


Patch Set 13:

Rebase and make the observability test with auto-scaling more robust.


--
To view, visit http://gerrit.cloudera.org:8080/18178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
Gerrit-Change-Number: 18178
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 14 Feb 2022 18:19:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10154/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
Gerrit-Change-Number: 18178
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 14 Feb 2022 18:42:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Feb 2022 19:06:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10992 Planner changes for estimate peak memory

2022-02-14 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18178 )

Change subject: IMPALA-10992 Planner changes for estimate peak memory
..


Patch Set 13: Code-Review+1

Thank you for taking care of the comments.


--
To view, visit http://gerrit.cloudera.org:8080/18178
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I75cf17290be2c64fd4b732a5505bdac31869712a
Gerrit-Change-Number: 18178
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 14 Feb 2022 19:19:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Quanlong Huang (Code Review)
Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18141

to look at the new patch set (#7).

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..

WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

ORC files have optional bloom filter indexes for each column. Since
ORC-1.7.0, the C++ reader supports pushing down predicates to skip
unrelated RowGroups. The pushed down predicates will be evaluated on
file indexes (i.e. statistics and bloom filter indexes). Note that only
EQUALS and IN-list predicates can leverage bloom filter indexes.

Currently Impala has two kinds of runtime filters: bloom filter and
min-max filter. Unfortunately they can't be converted into EQUALS or
IN-list predicates. So they can't leverage the file level bloom filter
indexes.

This patch adds runtime IN-list filters for this purpose. Currently they
are generated only for a small build side (e.g. #rows <= 1024) of a
broadcast join. They will only be applied on ORC tables and be pushed
down to the ORC reader (i.e. ORC lib). To avoid exploding the IN-list,
if #rows of the build side exceeds the threshold (1024), we set the
filter to ALWAYS_TRUE. The threshold can be configured by a new query
option, runtime_in_list_filter_entry_limit.
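
The threshold behaviour can be illustrated with a small sketch. This is a simplified Python stand-in, not the C++ InListFilter from the patch: the class layout and method names are assumptions, and it counts distinct inserted values as an approximation of the build-side row count.

```python
class InListFilter:
    """Toy IN-list filter that degrades to ALWAYS_TRUE once it grows too large."""

    def __init__(self, entry_limit: int = 1024):
        self.entry_limit = entry_limit  # plays the role of runtime_in_list_filter_entry_limit
        self.values = set()
        self.always_true = False

    def insert(self, value) -> None:
        if self.always_true:
            return  # already degraded, nothing more to track
        self.values.add(value)
        if len(self.values) > self.entry_limit:
            # Too many entries: stop filtering rather than explode the IN-list.
            self.always_true = True
            self.values.clear()

    def find(self, value) -> bool:
        """True if a probe-side value may have a match on the build side."""
        return self.always_true or value in self.values
```

Once the limit is exceeded the filter accepts every value, so it can no longer reject rows but also never produces a wrong result.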

Example query that will benefit from this patch:
  use tpch_orc_def;
  select count(*) from lineitem_bf join (
    select * from partsupp, part
    where ps_partkey = p_partkey and p_size = 15
      and p_type like '%BRASS' and ps_availqty < 10) v
  on l_partkey = ps_partkey and l_suppkey = ps_suppkey;

The inline-view populates a runtime IN-list filter with 4 items. Note that
we need to re-generate the lineitem table with bloom filter indexes enabled
(e.g. setting orc.bloom.filter.columns to
"l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity" in
tblproperties before inserting the data), so the runtime IN-list filter
can have a better filter rate.

Evaluating runtime IN-list filters is much slower than evaluating
runtime bloom filters due to the current simple implementation (i.e.
std::unordered_set). So we disable it at row level.

TODO: Codegen InListFilter::Insert() and InListFilter::Find().

For visibility, this patch adds two counters in the HdfsScanNode:
 - NumPushedDownPredicates
 - NumPushedDownRuntimeFilters
They reflect the predicates and runtime filters that are pushed down to
the ORC reader.

Tests:
 - Many planner tests have changes in the runtime filter ids.
 - TODO: Test IN-list filter with NULLs
 - TODO: Test IN-list filter on complex exprs targets
 - TODO: Test IN-list filter on all types including DATE

Change-Id: I25080628233799aa0b6be18d5a832f1385414501
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter-test.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/data-stream-service.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/CMakeLists.txt
A be/src/util/in-list-filter-ir.cc
A be/src/util/in-list-filter.cc
A be/src/util/in-list-filter.h
M common/protobuf/data_stream_service.proto
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-multi-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-t

[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18141/7/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18141/7/be/src/exec/hdfs-orc-scanner.cc@160
PS7, Line 160:   ADD_COUNTER(scan_node_->runtime_profile(), 
"NumPushedDownRuntimeFilters", TUnit::UNIT);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/18141/7/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/18141/7/tests/query_test/test_runtime_filters.py@70
PS7, Line 70: [
flake8: E131 continuation line unaligned for hanging indent



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 14 Feb 2022 23:18:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Quanlong Huang (Code Review)
Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18141

to look at the new patch set (#8).

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..

WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

ORC files have optional bloom filter indexes for each column. Since
ORC-1.7.0, the C++ reader supports pushing down predicates to skip
unrelated RowGroups. The pushed down predicates will be evaluated on
file indexes (i.e. statistics and bloom filter indexes). Note that only
EQUALS and IN-list predicates can leverage bloom filter indexes.

Currently Impala has two kinds of runtime filters: bloom filter and
min-max filter. Unfortunately they can't be converted into EQUALS or
IN-list predicates. So they can't leverage the file level bloom filter
indexes.

This patch adds runtime IN-list filters for this purpose. Currently they
are generated only for a small build side (e.g. #rows <= 1024) of a
broadcast join. They will only be applied on ORC tables and be pushed
down to the ORC reader (i.e. ORC lib). To avoid exploding the IN-list,
if #rows of the build side exceeds the threshold (1024), we set the
filter to ALWAYS_TRUE. The threshold can be configured by a new query
option, runtime_in_list_filter_entry_limit.

Example query that will benefit from this patch:
  use tpch_orc_def;
  select count(*) from lineitem_bf join (
    select * from partsupp, part
    where ps_partkey = p_partkey and p_size = 15
      and p_type like '%BRASS' and ps_availqty < 10) v
  on l_partkey = ps_partkey and l_suppkey = ps_suppkey;

The inline-view populates a runtime IN-list filter with 4 items. Note that
we need to re-generate the lineitem table with bloom filter indexes enabled
(e.g. setting orc.bloom.filter.columns to
"l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity" in
tblproperties before inserting the data), so the runtime IN-list filter
can have a better filter rate.

Evaluating runtime IN-list filters is much slower than evaluating
runtime bloom filters due to the current simple implementation (i.e.
std::unordered_set). So we disable it at row level.

TODO: Codegen InListFilter::Insert() and InListFilter::Find().

For visibility, this patch adds two counters in the HdfsScanNode:
 - NumPushedDownPredicates
 - NumPushedDownRuntimeFilters
They reflect the predicates and runtime filters that are pushed down to
the ORC reader.

Tests:
 - Many planner tests have changes in the runtime filter ids.
 - TODO: Test IN-list filter with NULLs
 - TODO: Test IN-list filter on complex exprs targets
 - TODO: Test IN-list filter on all types including DATE

Change-Id: I25080628233799aa0b6be18d5a832f1385414501
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter-test.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/data-stream-service.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/CMakeLists.txt
A be/src/util/in-list-filter-ir.cc
A be/src/util/in-list-filter.cc
A be/src/util/in-list-filter.h
M common/protobuf/data_stream_service.proto
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-multi-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-outer-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-t

[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10155/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 14 Feb 2022 23:42:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18141/8/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18141/8/be/src/exec/hdfs-orc-scanner.cc@318
PS8, Line 318:   ADD_COUNTER(scan_node_->runtime_profile(), 
"NumPushedDownRuntimeFilters", TUnit::UNIT);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/18141/8/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/18141/8/tests/query_test/test_runtime_filters.py@70
PS8, Line 70: [
flake8: E131 continuation line unaligned for hanging indent



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 14 Feb 2022 23:41:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 8:

> Patch Set 6:
>
> (6 comments)
>
> Thanks!

Thanks, Qifan! I'll address your comments in the next patch set.

Patch set 7 fixes the failed tests and adds two profile counters.
Patch set 8 is a rebase to fix the merge conflicts.


--
To view, visit http://gerrit.cloudera.org:8080/18141

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 14 Feb 2022 23:43:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10898: Add runtime IN-list filters for ORC tables

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: WIP IMPALA-10898: Add runtime IN-list filters for ORC tables
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10156/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18141

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 15 Feb 2022 00:03:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7840/


--
To view, visit http://gerrit.cloudera.org:8080/18184

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Feb 2022 00:12:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10931 (part2): Fixed rebased Kudu source to compile

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18155 )

Change subject: IMPALA-10931 (part2): Fixed rebased Kudu source to compile
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18155

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9919c06e60d132d997093abb7b14825847e07c7
Gerrit-Change-Number: 18155
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 15 Feb 2022 00:52:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2019(part-4): Add UTF-8 support for case conversion functions

2022-02-14 Thread Quanlong Huang (Code Review)
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17785

to look at the new patch set (#11).

Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion 
functions
..

IMPALA-2019(part-4): Add UTF-8 support for case conversion functions

There are 3 builtin case conversion string functions: upper(), lower(),
and initcap(). Previously they only converted English alphabetic
characters. This patch adds support for Unicode characters.

There are many corner cases in case conversion depending on the locale
and context. E.g.
1) Case conversion is locale-sensitive.
Turkish has 4 letter "I"s. English has only two, a lowercase dotted i
and an uppercase dotless I. Turkish has lowercase and uppercase forms of
both dotted and dotless I. So simply converting "i" to "I" for upper
case is wrong in Turkish:
+-------+--------+---------+
|       | Dotted | Dotless |
+-------+--------+---------+
| Upper | İ      | I       |
+-------+--------+---------+
| Lower | i      | ı       |
+-------+--------+---------+

2) Case conversion may change a string's length.
The German word "grüßen" should be converted to "GRÜSSEN" in upper case:
the letter "ß" should be converted to "SS".

3) Case conversion is context-sensitive.
The Greek word "ὈΔΥΣΣΕΎΣ" should be converted to "ὀδυσσεύς", where the
Greek letter "Σ" is converted to "σ" or to "ς", depending on its
position in the word.

The above cases will be the focus of follow-up JIRAs. This patch adds the
initial implementation of UTF-8 aware case conversion functions.
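
To make corner case 3 concrete, here is a minimal sketch of a final-sigma
rule. It is not part of this patch and not how Impala handles it. The names
IsBasicGreekLetter and LowerGreekSketch are hypothetical, the input is
assumed to be already decoded to code points, and only unaccented Greek
letters are recognized, so polytonic words such as "ὈΔΥΣΣΕΎΣ" would need
real Unicode tables.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for unaccented Greek capitals (U+0391..U+03A9,
// skipping the unassigned U+03A2) and small letters (U+03B1..U+03C9).
static bool IsBasicGreekLetter(char32_t c) {
  return (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) ||
         (c >= 0x03B1 && c <= 0x03C9);
}

// Simplified lower-casing of basic Greek capitals. A capital sigma becomes
// the final form U+03C2 only when a letter precedes it and no letter
// follows it; otherwise it becomes the regular small sigma U+03C3.
std::u32string LowerGreekSketch(const std::u32string& in) {
  std::u32string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    const char32_t c = in[i];
    if (c == 0x03A3) {
      const bool letter_before = i > 0 && IsBasicGreekLetter(in[i - 1]);
      const bool letter_after =
          i + 1 < in.size() && IsBasicGreekLetter(in[i + 1]);
      out.push_back(static_cast<char32_t>(
          letter_before && !letter_after ? 0x03C2 : 0x03C3));
    } else if (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) {
      // Basic capitals sit exactly 0x20 below their small forms.
      out.push_back(static_cast<char32_t>(c + 0x20));
    } else {
      out.push_back(c);  // Everything else is passed through unchanged.
    }
  }
  return out;
}
```

A per-character mapping such as std::towlower cannot express this rule
because it never sees the neighbouring characters.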


Implementation:
In UTF-8 mode (turned on by setting UTF8_MODE=true) of these functions,
the bytes in strings are converted to wide characters using
std::mbrtowc(). Each wide character (wchar_t) is then converted using
std::towupper or std::towlower correspondingly. We then convert them back
to multibyte sequences using std::wcrtomb().
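
The round trip described above can be sketched roughly as follows. This is
an illustration only, not the code in string-functions-ir.cc: the name
Utf8UpperSketch is hypothetical, it works on std::string rather than
StringVal, and it assumes the caller has already selected a UTF-8 aware
locale and that wchar_t can hold any code point (true on Linux). Invalid
bytes are copied through unchanged, which is a choice made for this sketch.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>
#include <cwctype>
#include <string>

// Upper-cases a multibyte string one wide character at a time, using the
// current C locale for both decoding and case mapping.
std::string Utf8UpperSketch(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  std::mbstate_t in_state{};   // One decode state for the whole string.
  std::mbstate_t out_state{};  // One encode state for the whole string.
  const char* p = in.data();
  const char* end = p + in.size();
  while (p < end) {
    wchar_t wc;
    std::size_t n =
        std::mbrtowc(&wc, p, static_cast<std::size_t>(end - p), &in_state);
    if (n == static_cast<std::size_t>(-1) ||
        n == static_cast<std::size_t>(-2)) {
      // Invalid or truncated sequence: keep the byte and start afresh.
      out.push_back(*p++);
      in_state = std::mbstate_t{};
      continue;
    }
    if (n == 0) n = 1;  // An embedded NUL byte decodes to L'\0'.
    char buf[MB_LEN_MAX];
    std::size_t m = std::wcrtomb(buf, std::towupper(wc), &out_state);
    if (m == static_cast<std::size_t>(-1)) {
      out.append(p, n);  // Not encodable: keep the original bytes.
      out_state = std::mbstate_t{};
    } else {
      out.append(buf, m);
    }
    p += n;
  }
  return out;
}
```

Because the mapping is one wide character in, one wide character out, a
sketch like this cannot turn "ß" into "SS", which matches corner case 2
above.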

Note that these builtins are locale aware. If impalad is launched
without a UTF-8 aware locale, e.g. LC_ALL="C", these builtins can't
recognize non-ASCII characters and will return unexpected results.
Thus we modify our docker images to set LC_ALL="C.UTF-8" instead of "C".
This patch also logs the current locale when launching impala daemons
for better debugging. We will support customized locale in IMPALA-11080.
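
As a rough illustration of the locale check, the sketch below queries the
current locale name and whether its character encoding is UTF-8. It is not
the logging code added to init.cc: the function names are hypothetical, and
nl_langinfo() is a POSIX call, so this assumes a POSIX platform.

```cpp
#include <clocale>
#include <cstring>
#include <string>

#include <langinfo.h>  // POSIX: nl_langinfo, CODESET

// Returns the name of the locale currently in effect for all categories.
// Passing nullptr only queries the locale, it does not change it.
std::string CurrentLocaleName() {
  const char* name = std::setlocale(LC_ALL, nullptr);
  return name != nullptr ? name : "";
}

// Returns true if the current LC_CTYPE character encoding is UTF-8.
bool CurrentCodesetIsUtf8() {
  return std::strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

A daemon could call std::setlocale(LC_ALL, "") at startup to adopt the
environment's locale, then log CurrentLocaleName() and warn when
CurrentCodesetIsUtf8() is false.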

Test:
 - Add BE unit tests and e2e tests.

Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
---
M be/src/common/init.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M docker/daemon_entrypoint.sh
M docker/test-with-docker.py
M testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
8 files changed, 249 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/17785/11
--
To view, visit http://gerrit.cloudera.org:8080/17785

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Gerrit-Change-Number: 17785
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-2019(part-4): Add UTF-8 support for case conversion functions

2022-02-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17785 )

Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion 
functions
..


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc@379
PS10, Line 379:   uint8_t* result_ptr = result.ptr;
> This will be null if the allocation fails in the constructor. We should han
Thanks for catching this!


http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc@383
PS10, Line 383: byte sequ
> My understanding is that we should use the same mbstate_t during the proces
Done


http://gerrit.cloudera.org:8080/#/c/17785/10/be/src/exprs/string-functions-ir.cc@422
PS10, Line 422: eturn StringVal::nul
> We shouldn't need this, Resize() should return false on error.
You are right. We probably need to change L599 as well.



--
To view, visit http://gerrit.cloudera.org:8080/17785

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Gerrit-Change-Number: 17785
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 15 Feb 2022 07:07:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2019(part-4): Add UTF-8 support for case conversion functions

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17785 )

Change subject: IMPALA-2019(part-4): Add UTF-8 support for case conversion 
functions
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10157/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17785

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd
Gerrit-Change-Number: 17785
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 15 Feb 2022 07:29:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10961: Implementing adaptive 3-way quicksort in sorter

2022-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18184 )

Change subject: IMPALA-10961: Implementing adaptive 3-way quicksort in sorter
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7842/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18184

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e7b36a04a43de3b83e6aeee49ca0943f0bf202
Gerrit-Change-Number: 18184
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Feb 2022 07:52:18 +
Gerrit-HasComments: No