[Impala-ASF-CR] IMPALA-10791 Add batching reading for remote temporary files

2021-10-28 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batching reading for remote temporary files
..


Patch Set 1:

(37 comments)

http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG@38
PS1, Line 38: now the system
> You mean Impala or OS?. Is it a guarantee?
changed, should be the impala process memory. The mem_limit should be 80% by 
default.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@47
PS1, Line 47: DONE
> nit. WRITTEN probably is better.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@55
PS1, Line 55: DISABLED
> nit. Could be UNINIT too?
I am a little worried about the memory leak.
So the final state of the memory block needs to be MemBlockStatus::DISABLED, 
and every block needs to be called Delete() before destroyed to prevent any 
memory not released.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@62
PS1, Line 62: poool
> nit.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@70
PS1, Line 70:  bool IsDisabled() {
: std::unique_lock l(lock_);
: return IsDisabledLocked(l);
:   }
:
:   bool IsDisabledLocked(const std::unique_lock& lock) {
: DCHECK(lock.mutex() == _ && lock.owns_lock());
: return status_ == MemBlockStatus::DISABLED;
:   }
> Can this pair of methods to be simplified as follows?
I would like to keep the caller away from passing the enum, therefore design it 
this way, the benefit is that the caller doesn't need to pass the enum.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@193
PS1, Line 193: lock_
> A little bit confusing when read code which applies lock_. since there are
Changed the name to memory_block_lock_.

I think the benefit of using the memory block is that it may reduce potential 
contention. For example, if using disk file lock, which is a spinlock, only one 
block out of four (or more) can be written at a time, and also the status 
changing.

The rule now is the disk lock is to protect the members actual_file_size_ and 
read_buffer_ctrl_ (exclude the memory blocks). The members of memory blocks are 
protected by their own locks.

If the caller worries the memory block to be destructed while using, the caller 
can first hold the physical_file_lock_, which is a shared lock, to protect the 
while file from destruction.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@292
PS1, Line 292:   std::lock_guard l(lock_);
 : return actual_file_size_;
> actual_file_size_ is AtomicInt64. Do we need to lock with lock_ here first?
Changed the actual_file_size_ to ordinary int64_t. Just think maybe it is 
better to be managed by the lock_ like others.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@379
PS1, Line 379: num_of_read_buffer
> nit. num_of_read_buffers.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@381
PS1, Line 381: equals
> nit equal
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@391
PS1, Line 391: DCHECK_LT(buffer_idx, num_of_read_buffer());
 : DCHECK_GE(buffer_idx, 0);
> Should call DCheckReadBufferIdx().
Thanks, done.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@393
PS1, Line 393:  std::lock_guard lock(lock_);
> should we lock it first before check?
It should be okay without the lock, because the check is to use the index to 
compare with constant variables.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@415
PS1, Line 415:  DCheckReadBufferIdx(buffer_idx);
 : std::lock_guard lock(lock_);
> Lock before check?
same as above.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@509
PS1, Line 509: lock_
> Suggest to rename as disk_file_lock_.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc
File be/src/runtime/io/disk-file.cc:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc@123
PS1, Line 123:  *allocated = true;
> If the memory is deleted, the state in its control block should become rese
Added some comments for MemBlock in disk-file.h.
The MemBlock must be in MemBlockStatus::DISABLED before destruction, and the 
Delete() would be needed to call before destruction and it will set the state 
to MemBlockStatus::DISABLED for all cases.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:


[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17948 )

Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9694/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17948
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
Gerrit-Change-Number: 17948
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 29 Oct 2021 01:30:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

2021-10-28 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17948 )

Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
..

WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

There are some API changes in the new version of protobuf library.
This patch makes necessary changes for Impala code to pass compiling.
It also adds a work around for the static initializer issue, which
is caused by the conflicts between two copies of protobuf library
in Impala's binary, one came from Kudu client shared library.

Testing:
 - Passed an core DEBUG build.
 - TODO pass exhaustive DEBUG build and a core ASAN build

Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
---
M be/src/kudu/rpc/exactly_once_rpc-test.cc
M be/src/kudu/rpc/inbound_call.cc
M be/src/kudu/rpc/outbound_call.cc
M be/src/kudu/rpc/result_tracker.h
M be/src/kudu/rpc/serialization.cc
M be/src/kudu/rpc/serialization.h
M be/src/kudu/util/pb_util.cc
M be/src/kudu/util/protobuf_util.h
M be/src/rpc/rpc-mgr.cc
M bin/impala-config.sh
10 files changed, 113 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17948/3
--
To view, visit http://gerrit.cloudera.org:8080/17948
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
Gerrit-Change-Number: 17948
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

2021-10-28 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17948 )

Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc
File be/src/rpc/rpc-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@100
PS2, Line 100: // Since Kudu client shared library has its own protobuf, there 
are two copies of protobuf
 : // library in Impala's binary. Each copy has independent 
descriptor pool, but they
 : // interact each other at static-initialization time, which 
cause AddDescriptors() is not
 : // called for .proto files for Impala's protobuf.
 : // This function load descriptors for 
google/protobuf/descriptor.proto, which are used by
 : // rpc_header.proto.
> Just trying to wrap my head around the root cause here. Am I correct in und
Your understanding is correct.


http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@108
PS2, Line 108: static bool CheckProtobufDescriptorProto() {
> Given the issue we're fixing is quite nuanced, perhaps it's worth adding li
Added the link in the comments.



--
To view, visit http://gerrit.cloudera.org:8080/17948
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
Gerrit-Change-Number: 17948
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 29 Oct 2021 00:54:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

2021-10-28 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17948 )

Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc
File be/src/rpc/rpc-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@100
PS2, Line 100: // Since Kudu client shared library has its own protobuf, there 
are two copies of protobuf
 : // library in Impala's binary. Each copy has independent 
descriptor pool, but they
 : // interact each other at static-initialization time, which 
cause AddDescriptors() is not
 : // called for .proto files for Impala's protobuf.
 : // This function load descriptors for 
google/protobuf/descriptor.proto, which are used by
 : // rpc_header.proto.
Just trying to wrap my head around the root cause here. Am I correct in 
understand that:
- As stated, each protobuf library has its own independent descriptor pool.
- The Kudu library calls AddDescriptors()
- The Impala library is initialized later, and doesn't call AddDescriptors() 
because descriptor_table_google_2fprotobuf_2fdescriptor_2eproto.is_initialized 
is already true
- Hence, the Impala descriptor pool is missing its descriptors

?


http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@108
PS2, Line 108: static bool CheckProtobufDescriptorProto() {
Given the issue we're fixing is quite nuanced, perhaps it's worth adding links 
that have a bit more context, like 
https://stackoverflow.com/questions/29960871/protobuf-message-object-creation-by-name?



--
To view, visit http://gerrit.cloudera.org:8080/17948
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
Gerrit-Change-Number: 17948
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Fri, 29 Oct 2021 00:29:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10791 Add batching reading for remote temporary files

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batching reading for remote temporary files
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9693/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 29 Oct 2021 00:28:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17948 )

Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9692/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17948
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
Gerrit-Change-Number: 17948
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Fri, 29 Oct 2021 00:14:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10791 Add batching reading for remote temporary files

2021-10-28 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batching reading for remote temporary files
..

IMPALA-10791 Add batching reading for remote temporary files

The patch adds a feature to batching read from a remote temporary
file in order to improve the reading performance for the spilled
remote data.

Originally, the design is to use the local disk file as the buffer
for batching reading from the remote file. But in practice, it
doesn't help to improve the performance. Therefore, the design
is changed to use the memory as the read buffer.

Currently, each TmpFileRemote has two DiskFile, one is for the
remote, and one is for the local buffer. The patch adds MemBlocks
to the local buffer file. Each local buffer file is divided into
several MemBlocks evenly, but in order to guarantee a page not
being cut into two parts in different blocks, the block size
could be a little different to each other in practice. The default
block size is the minimum value between 1/4 default file size and
MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_MB, which is 16MB.

When pinning a page, the system will detect if there is enough
memory for the block that holds the page, if not, we will go
reading the page directly and disable this block, because it may
be good to avoid duplicated reads from the remote fs for the same
content. If the system decides to fetch a block, the block will be
stored in the memory until all of the pages in the block are read
or the query ends.

One challenge of using the memory for the buffer is that, when the
system is lacking of memory when it needs to spill the data. So we
make a restriction to limit the percentage of the memory for the
read buffer to 5% of the total, because right now the impala
process will reserve 20% memory as unused memory by default, using
5% for the emergency case like spilling is reasonable.

Two start options have been added for the new feature.

1. remote_batching_read. Default is false. If set true, the batching
read is enabled.
2. remote_read_memory_buffer_size. Default is 1G. The maximum memory
that can be used by the read buffer. The number also restricted by
the total system memory, which can not exceed 5% of the total memory.

The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB
from 256 to 512.

Tests:
Ran core and exhaustive tests.
Added and ran TmpFileMgrTest::TestBatchingReadFromRemote.

Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
---
M be/src/runtime/io/disk-file.cc
M be/src/runtime/io/disk-file.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M common/thrift/metrics.json
11 files changed, 1,098 insertions(+), 133 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/3
--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

2021-10-28 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17948


Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
..

WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala

There are some API changes in the new version of protobuf library.
This patch makes necessary changes for Impala code to pass compiling.
It also adds a work around for the static initializer issue, which
is caused by the conflicts between two copies of protobuf library
in Impala's binary, one came from Kudu client shared library.

Testing:
 - Passed an core DEBUG build.
 - TODO pass exhaustive DEBUG build and a core ASAN build

Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
---
M be/src/kudu/rpc/exactly_once_rpc-test.cc
M be/src/kudu/rpc/inbound_call.cc
M be/src/kudu/rpc/outbound_call.cc
M be/src/kudu/rpc/result_tracker.h
M be/src/kudu/rpc/serialization.cc
M be/src/kudu/rpc/serialization.h
M be/src/kudu/util/pb_util.cc
M be/src/kudu/util/protobuf_util.h
M be/src/rpc/rpc-mgr.cc
M bin/impala-config.sh
10 files changed, 111 insertions(+), 49 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17948/2
--
To view, visit http://gerrit.cloudera.org:8080/17948
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43
Gerrit-Change-Number: 17948
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

2021-10-28 Thread Qifan Chen (Code Review)
Qifan Chen has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out 
rows in Parquet table.
..

IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet 
table.

Currently, entire row is materialized before filtering during scan.
Instead of paying the cost of materializing upfront, for columnar
formats we can avoid doing it for rows that are filtered out.
Columns that are required for filtering are the only ones that are
needed to be materialized before filtering. For rest of the columns,
materialization can be delayed and be done only for rows that survive.
This patch implements this technique for Parquet format only.

New configuration 'parquet_materialization_threshold' is introduced,
which is minimum number of consecutive rows that are filtered out
to avoid materialization. If set to less than 0, it disables the
late materialization.

Performance:
Peformance measured for single daemon, single threaded impalad
upon TPCH scale 42 lineitem table with 252 million rows,
unsorted data. Upto 2.5x improvement for non-page indexed and
upto 4x improvement in page index seen. Queries for page index
borrowed from blog:
https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/
More details:
https://docs.google.com/spreadsheets/d/17s5OLaFOPo-64kimAPP6n3kJA42vM-iVT24OvsQgfuA/edit?usp=sharing

Testing:
 1. Ran existing tests
 2. Added UT for 'ScratchTupleBatch::GetMicroBatch'
 3. Added end-to-end test for late materialization.
Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Reviewed-on: http://gerrit.cloudera.org:8080/17860
Tested-by: Impala Public Jenkins 
Reviewed-by: Qifan Chen 
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-collection-column-reader.h
M be/src/exec/parquet/parquet-column-chunk-reader.cc
M be/src/exec/parquet/parquet-column-chunk-reader.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
A be/src/exec/scratch-tuple-batch-test.cc
M be/src/exec/scratch-tuple-batch.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/tuple-row-compare.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test
A tests/query_test/test_parquet_late_materialization.py
22 files changed, 1,070 insertions(+), 52 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Qifan Chen: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 19
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

2021-10-28 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out 
rows in Parquet table.
..


Patch Set 18: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 18
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 20:09:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9691/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 19:46:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49
PS2, Line 49: 3.6 33.6 33.8 0.888X 0.884X  0.88X
:   from_unixtime(0,'-MM-dd')
> Some unoptimal change in patch set 2. Might be the reinterpret_cast thing f
Patch set 3 shows even faster result after we use FastInt32ToBufferLeft and 
local char array.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47
PS1, Line 47:
> Right, this shouldn't be named ISO at all. I think I will change it to DEFA
This is removed in patch set 3.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81
PS1, Line 81: ,
> The difference is in the cast direction. All previous tokenizer have PARSE
I get what you're saying now. The CastDirection does not differentiate the 
resulting DateTimeFormatContext. So we can actually reuse the existing 
DateTimeFormatContext, even if they're tokenized with PARSE direction. In this 
case, this should be the same as DEFAULT_SHORT_DATE_TIME_CTX.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86
PS1, Line 86: case 5:   // hh:mm
> Difference is in the cast direction as well. This one has FORMAT direction.
This is removed in patch set 3. We reuse DEFAULT_DATE_TIME_CTX[9] instead.


http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@79
PS3, Line 79:   /// max_length -- the maximum length of characters that 'dst' 
can hold. Only used for
:   ///   assertion in debug build.
I need to update this comment in next patch set, since we're enforcing the 
max_length in patch set 3.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341
PS2, Line 341:
> Good point. Having on-stack char array should also drop the necessity of tm
Done


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351:   DCHECK(!d.is_special());
> Will do.
Done.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc@208
PS1, Line 208:   } else {
> This still copy on return in patch set 2, but only for usages outside of ex
It looks like returning string by value is actually OK, since std::string has 
move constructor.
https://www.delftstack.com/howto/cpp/return-string-from-function-cpp/



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 19:35:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting

2021-10-28 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17980

to look at the new patch set (#3).

Change subject: IMPALA-10984: Improve TimestampValue to String casting
..

IMPALA-10984: Improve TimestampValue to String casting

TimestampValue::ToString was implemented by concatenating
boost::gregorian::to_iso_extended_string and
boost::posix_time::to_simple_string using stringstream. This involves
multiple string allocations, copying, and might hit lock within
tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that
touches this function can be inefficient if the expression is being
evaluated for millions of rows.

This patch reimplement TimestampValue::ToString by supplying default
DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss"
will be picked as the default format if the time_ component does not
have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will
be picked as the default format. The chosen DateTimeFormatContext then
passed to TimestampParser::Format along with date_ and time_ to be
formatted into the string representation. Int to string parsing method
is replaced with FastInt32ToBufferLeft in TimestampParser::Format.

This patch gives > 10X performance improvement for CAST timestamp to
string and FROM_UNIXTIME without a date-time pattern, as shown by the
following benchmark (modified from expr-benchmark), before and after the
patch.

Before the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   37.4 38.6 39.1
 1X 1X 1X
  cast(now() as string)2.2 2.28 2.31
0.0589X0.0591X0.0591X
from_unixtime(0,'-MM-dd HH:mm:ss')   5.94 6.18 6.23 
0.159X  0.16X 0.159X
  from_unixtime(0,'-MM-dd')   11.5 11.8 11.9 
0.308X 0.305X 0.304X
   from_unixtime(0)   2.26 2.35 2.39
0.0606X 0.061X0.0612X

After the patch:
Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
FromUnixCodegen:   Function  iters/ms   10%ile   50%ile   90%ile 
10%ile 50%ile 90%ile
 
(relative) (relative) (relative)
-
literal   37.9   38 38.4
 1X 1X 1X
  cast(now() as string)   29.3 29.3 29.4 
0.773X  0.77X 0.767X
from_unixtime(0,'-MM-dd HH:mm:ss')   33.6 33.6 33.8 
0.888X 0.884X  0.88X
  from_unixtime(0,'-MM-dd')   49.9 49.9 50.3  
1.32X  1.31X  1.31X
   from_unixtime(0)   33.1 33.2 33.5 
0.875X 0.872X 0.873X

The literal expression used as the baseline in this benchmark is
"cast('2012-01-01 09:10:11.123456789' as timestamp)".

This patch also updates numbers in expr-benchmark for
BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear
its MemPool in between benchmark iteration so that it does not run out
of memory.

Testing:
- Pass core tests.

Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.cc
M be/src/runtime/datetime-iso-sql-format-tokenizer.h
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/datetime-simple-date-format-parser.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/runtime/timestamp-parse-util.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
13 files changed, 209 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/3
--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public 

[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17982 )

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..

IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as 
expected

Until this patch buildall.sh -release_and_debug -codecoverage didn't
do a coverage build at all. It only did a release build, then a debug
build.

With this patch the above command creates a release+coverage build,
then a debug+coverage build. After each build it saves the .gcno
files to a directory in $IMPALA_HOME (gcov_release and gcov_debug).

These .gcno files are needed to generate code coverage reports later.

Testing:
 * manually tested by invoking buildall.sh -release_and_debug \
   -codecoverage
   and
   buildall.sh -release -codecoverage

Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Reviewed-on: http://gerrit.cloudera.org:8080/17982
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M buildall.sh
1 file changed, 33 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17982 )

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 19:00:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out 
rows in Parquet table.
..


Patch Set 18: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 18
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 18:04:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17933 )

Change subject: IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast 
expr in some situations
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954
Gerrit-Change-Number: 17933
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 28 Oct 2021 16:56:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-9498: Allow returning arrays in select list

2021-10-28 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17811 )

Change subject: WIP IMPALA-9498: Allow returning arrays in select list
..


Patch Set 15:

(19 comments)

I went through the frontend code only, here are a few thoughts.

http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1296
PS15, Line 1296: /**
   :* Returns an existing or new SlotDescriptor for the given 
path. Always returns
   :* a new empty SlotDescriptor for paths with a 
collection-typed destination.
   :*/
I think this comment refers to the 1-argument overload on L1293.
Also, the argument 'duplicateCollections' should be explained.


http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1315
PS15, Line 1315: SlotDescriptor result = null;
   : if (slotPath.destType().isArrayType()) {
   :   Path currentPath = slotPath;
   :   List rawPath = currentPath.getRawPath();
   :   boolean first_level = true;
   :   while (currentPath.destType().isArrayType()) {
   : TableRef tblRef = new TableRef(rawPath, null);
   :
   : CollectionTableRef collTblRef = new 
CollectionTableRef(tblRef, currentPath, true);
   : collTblRef.analyze(this);
   :
   : 
Preconditions.checkState(collTblRef.getCollectionExpr() != null);
   : 
Preconditions.checkState(collTblRef.getCollectionExpr() instanceof SlotRef);
   : SlotDescriptor desc = ((SlotRef) 
collTblRef.getCollectionExpr()).getDesc();
   : desc.setIsMaterializedWithParent(true);
   : if (first_level) {
   :   result = desc;
   :   first_level = false;
   : }
   :
   : // Resolve path
   : List rawPathToItem = new ArrayList();
   : rawPathToItem.add("item");
   :
   : Path resolvedPathToItem = new 
Path(collTblRef.getDesc(), rawPathToItem);
   : boolean isResolved = resolvedPathToItem.resolve();
   : Preconditions.checkState(isResolved);
   :
   : rawPath = rawPathToItem;
   : currentPath = resolvedPathToItem;
   :   }
   :   if (currentPath.destType().isStructType()) {
   : throw new AnalysisException(
   :   "STRUCT type inside collection types is not 
supported.");
   :   }
   :   SlotDescriptor slotDesc = 
this.registerSlotRef(currentPath);
   :   slotDesc.setStats(new ColumnStats(slotDesc.getType()));
   :   slotDesc.setIsMaterialized(true);
   : } else {
Optional: the handling of arrays could be extracted to a function.


http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java
File fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java:

http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@25
PS15, Line 25: implies its
 :  * flattening during execution
Is it still true? The TODO on L104 indicates it is not.


http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@52
PS15, Line 52: /**
 :* Create a CollectionTableRef from the original unresolved 
table ref as well as
 :* its resolved path. Sets table aliases and join-related 
attributes.
 :*/
This comment should be for the 2 argument version on L48. The boolean argument 
could also be explained (though admittedly its meaning is pretty clear).


http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@91
PS15, Line 91: i
Typo: y


http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@92
PS15, Line 92: i
Typo: y


http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java:

http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@344
PS15, Line 344: hasOnlyUnionAllOps
Do we only have to do this check if there are only UNION ALL operations? What 
if we 

[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47
PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
> I am not sure what ISO means here exactly, but in the other constants it le
Right, this shouldn't be named ISO at all. I think I will change it to 
DEFAULT_SHORT_DATE_TIME_FRACTIONAL_FORMATTER_CTX.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81
PS1, Line 81:
> Is this different than DEFAULT_DATE_TIME_CTX[0]?
The difference is in the cast direction. All previous tokenizer have PARSE 
direction. This one has FORMAT direction.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86
PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
> Is this different than DEFAULT_DATE_TIME_CTX[9]?
Difference is in the cast direction as well. This one has FORMAT direction.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 15:43:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

(4 comments)

Thank you, Csaba, for the comments! Will address them in next patch set.

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49
PS2, Line 49:  4.91 4.91 0.128X 0.128X 0.128X
: from_unixtime(0,'-MM-dd HH:mm:ss')
> Wonder why this became slow during the last patch.
Some unoptimal change in patch set 2. Might be the reinterpret_cast thing from 
uint8_t* to char* in TimestampValue::ToStringVal. Will take a look at it during 
next iteration.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341
PS2, Line 341: tmp_buff
> I think that we can easily avoid using the tmp_buff here.
Good point. Having on-stack char array should also drop the necessity of 
tmp_string_ in FunctionContextImpl. Will try it in next patch set.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344
PS2, Line 344: for (int i = written_length; i < tok.len; i++) {
 :   *(dst + pos) = '0';
 :   pos++;
 : }
> If I understand this correctly, then we are writing to the end of the buffe
That is correct.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351: DCHECK_LE(pos, max_length) << "Maximum buffer length 
exceeded!";
> I am a bit concerned that at the time we hit this DCHECK, we have already w
Will do.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 15:38:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WiP: IMPALA-10920: Unnest function for zipping unnest arrays

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17983 )

Change subject: WiP: IMPALA-10920: Unnest function for zipping unnest arrays
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9690/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7
Gerrit-Change-Number: 17983
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Oct 2021 13:38:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] WiP: IMPALA-10920: Unnest function for zipping unnest arrays

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17983 )

Change subject: WiP: IMPALA-10920: Unnest function for zipping unnest arrays
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17983/1/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/17983/1/tests/query_test/test_nested_types.py@196
PS1, Line 196: class TestZippingUnnest(ImpalaTestSuite):
flake8: E302 expected 2 blank lines, found 1



--
To view, visit http://gerrit.cloudera.org:8080/17983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7
Gerrit-Change-Number: 17983
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Oct 2021 13:15:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WiP: IMPALA-10920: Unnest function for zipping unnest arrays

2021-10-28 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17983


Change subject: WiP: IMPALA-10920: Unnest function for zipping unnest arrays
..

WiP: IMPALA-10920: Unnest function for zipping unnest arrays

The SQL standard compliant syntax is kind of ready.

TODO: The Postgres kind of syntax where UNNEST() is in the select
list.

Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7
---
M be/src/exec/unnest-node.cc
M be/src/exec/unnest-node.h
M common/thrift/PlanNodes.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/ComplexTypesTbl/arrays.orc
A testdata/ComplexTypesTbl/arrays.parq
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test
M tests/query_test/test_nested_types.py
22 files changed, 847 insertions(+), 119 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/17983/1
--
To view, visit http://gerrit.cloudera.org:8080/17983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7
Gerrit-Change-Number: 17983
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString

2021-10-28 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17980 )

Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49
PS2, Line 49:  4.91 4.91 0.128X 0.128X 0.128X
: from_unixtime(0,'-MM-dd HH:mm:ss')
Wonder why this became slow during the last patch.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc
File be/src/runtime/datetime-simple-date-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47
PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
I am not sure what ISO means here exactly, but in the other constants it leads 
to a "T" between the date and time part, while in your new format it is just 
related to the subseconds part.


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81
PS1, Line 81:
Is this different than DEFAULT_DATE_TIME_CTX[0]?


http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86
PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX
Is this different than DEFAULT_DATE_TIME_CTX[9]?


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341
PS2, Line 341: tmp_buff
I think that we can easily avoid using the tmp_buff here.
As num_val is a non-negative int32, we could simply use an on-stack char array 
of size 10, and copy from there.

There is also an alternative to snprintf that we use in int->string casts, 
FastInt32ToBufferLeft() from gutil:
https://github.com/apache/impala/blob/master/be/src/gutil/strings/numbers.h#L171
We use it here: 
https://github.com/apache/impala/blob/master/be/src/exprs/cast-functions-ir.cc#L156

We could use FastInt32ToBufferLeft() to print to an array on stack, and then 
calculate the number of '0's to prepend from the result, and then copy it to 
dst.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344
PS2, Line 344: for (int i = written_length; i < tok.len; i++) {
 :   *(dst + pos) = '0';
 :   pos++;
 : }
If I understand this correctly, then we are writing to the end of the buffer 
here, while we used to write to the beginning of the tmp_str.


http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351
PS2, Line 351: DCHECK_LE(pos, max_length) << "Maximum buffer length 
exceeded!";
I am a bit concerned that at the time we hit this DCHECK, we have already 
written outside the buffer in the AppendToBuffer functions. It would be safer 
and probably not too slow to pass max_length as an argument to 
AppendToBuffer(s) and skip the copy there if it has overflown.



--
To view, visit http://gerrit.cloudera.org:8080/17980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61
Gerrit-Change-Number: 17980
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17982 )

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17982 )

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7578/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out 
rows in Parquet table.
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9689/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 18
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 11:58:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out 
rows in Parquet table.
..


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7577/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 18
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 11:41:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

2021-10-28 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out 
rows in Parquet table.
..

IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet 
table.

Currently, entire row is materialized before filtering during scan.
Instead of paying the cost of materializing upfront, for columnar
formats we can avoid doing it for rows that are filtered out.
Columns that are required for filtering are the only ones that are
needed to be materialized before filtering. For rest of the columns,
materialization can be delayed and be done only for rows that survive.
This patch implements this technique for Parquet format only.

New configuration 'parquet_materialization_threshold' is introduced,
which is minimum number of consecutive rows that are filtered out
to avoid materialization. If set to less than 0, it disables the
late materialization.

Performance:
Peformance measured for single daemon, single threaded impalad
upon TPCH scale 42 lineitem table with 252 million rows,
unsorted data. Upto 2.5x improvement for non-page indexed and
upto 4x improvement in page index seen. Queries for page index
borrowed from blog:
https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/
More details:
https://docs.google.com/spreadsheets/d/17s5OLaFOPo-64kimAPP6n3kJA42vM-iVT24OvsQgfuA/edit?usp=sharing

Testing:
 1. Ran existing tests
 2. Added UT for 'ScratchTupleBatch::GetMicroBatch'
 3. Added end-to-end test for late materialization.
Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-collection-column-reader.h
M be/src/exec/parquet/parquet-column-chunk-reader.cc
M be/src/exec/parquet/parquet-column-chunk-reader.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
A be/src/exec/scratch-tuple-batch-test.cc
M be/src/exec/scratch-tuple-batch.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/tuple-row-compare.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test
A tests/query_test/test_parquet_late_materialization.py
22 files changed, 1,070 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17860/18
--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 18
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17982 )

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..


Patch Set 3: Code-Review+2

LGTM


--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 10:57:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17982 )

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9688/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 10:57:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17933 )

Change subject: IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast 
expr in some situations
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7576/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954
Gerrit-Change-Number: 17933
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 28 Oct 2021 10:45:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17982 )

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17982/2/buildall.sh
File buildall.sh:

http://gerrit.cloudera.org:8080/#/c/17982/2/buildall.sh@464
PS2, Line 464: local build_type=$1
 :   local gcov_
> Both of these variables should be declared "local" (see L489). The name "bu
Thanks for catching this. Done.



--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Oct 2021 10:36:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected

2021-10-28 Thread Zoltan Borok-Nagy (Code Review)
Hello Laszlo Gaal, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17982

to look at the new patch set (#3).

Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage 
doesn't work as expected
..

IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as 
expected

Until this patch buildall.sh -release_and_debug -codecoverage didn't
do a coverage build at all. It only did a release build, then a debug
build.

With this patch the above command creates a release+coverage build,
then a debug+coverage build. After each build it saves the .gcno
files to a directory in $IMPALA_HOME (gcov_release and gcov_debug).

These .gcno files are needed to generate code coverage reports later.

Testing:
 * manually tested by invoking buildall.sh -release_and_debug \
   -codecoverage
   and
   buildall.sh -release -codecoverage

Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
---
M buildall.sh
1 file changed, 33 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/17982/3
--
To view, visit http://gerrit.cloudera.org:8080/17982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Gerrit-Change-Number: 17982
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 23:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9687/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 23
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Thu, 28 Oct 2021 10:25:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2402
PS23, Line 2402:   batchEvents = eventFactory.createBatchEvents(mockEvents, 
eventsProcessor_.getMetrics());
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@85
PS23, Line 85: private static boolean flagEnableCatalogCache 
,flagInvalidateCache, flagSyncToLatestEventId;
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 23
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Thu, 28 Oct 2021 10:05:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints

2021-10-28 Thread Sourabh Goyal (Code Review)
Hello Vihang Karajgaonkar, kis...@cloudera.com, Yu-Wen Lai, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17859

to look at the new patch set (#23).

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
..

IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when 
performing
DDL operations via catalog HMS endpoints

Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java
A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java
A 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M tests/custom_cluster/test_metastore_service.py
26 files changed, 3,362 insertions(+), 282 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/23
--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 23
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations

2021-10-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17933 )

Change subject: IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast 
expr in some situations
..


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9686/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954
Gerrit-Change-Number: 17933
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 28 Oct 2021 06:46:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations

2021-10-28 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17933 )

Change subject: IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast 
expr in some situations
..


Patch Set 9:

Modify text plan content in 'insert-sort-by.test' due to 
'PlannerTest.testInsertSortBy' failed. Here is the new Jenkins url: 
https://jenkins.impala.io/job/pre-review-test/1189/


--
To view, visit http://gerrit.cloudera.org:8080/17933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954
Gerrit-Change-Number: 17933
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 28 Oct 2021 06:27:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations

2021-10-28 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17933 )

Change subject: IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast 
expr in some situations
..

IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some 
situations

This patch adds a new expr rewrite rule to simplify some cast expr when
cast target data type is the same as inner expr data type. We will
remove unnecessary cast expr if any rules are matched. This kind of
rewrite will improve query performance when casting a non-partition
column, especially when scanning lots of data. Besides, cast expr in
where clause can not pushdown to Kudu server, if we can remove
unnecessary cast expr, Impala will pushdown this predicate to Kudu
server, and this will save lots of time and IO/memmory.

Testing:
- Added unit test cases in `ExprRewriteRulesTest`

Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/rewrite/SimplifyCastExprRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
4 files changed, 146 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/17933/9
--
To view, visit http://gerrit.cloudera.org:8080/17933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954
Gerrit-Change-Number: 17933
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng