[Impala-ASF-CR] IMPALA-10934: Enable table definition over a single file
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17878 ) Change subject: IMPALA-10934: Enable table definition over a single file .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17878/2/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17878/2/be/src/runtime/io/disk-io-mgr.cc@142 PS2, Line 142: // The maximum number of SFS I/O threads. : DEFINE_int32(num_sfs_io_threads, 16, "Number of SFS I/O threads"); > SFS maps down to some other storage type, and it should be using the logic As a second thought on this, if we are ok with turning off file handle caching, then this approach could work. We are losing some understanding of the underlying storage and some performance, but this is only one file. We should verify the data cache works, and that would generally handle most use cases. -- To view, visit http://gerrit.cloudera.org:8080/17878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32be936243aa4c8320f5d06d2b7fbf98822f82e7 Gerrit-Change-Number: 17878 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Fri, 29 Oct 2021 06:05:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batching reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batching reading for remote temporary files .. Patch Set 1: (37 comments) http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG@38 PS1, Line 38: now the system > You mean Impala or OS?. Is it a guarantee? changed, should be the impala process memory. The mem_limit should be 80% by default. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@47 PS1, Line 47: DONE > nit. WRITTEN probably is better. Done http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@55 PS1, Line 55: DISABLED > nit. Could be UNINIT too? I am a little worried about the memory leak. So the final state of the memory block needs to be MemBlockStatus::DISABLED, and every block needs to be called Delete() before destroyed to prevent any memory not released. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@62 PS1, Line 62: poool > nit. Done http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@70 PS1, Line 70: bool IsDisabled() { : std::unique_lock l(lock_); : return IsDisabledLocked(l); : } : : bool IsDisabledLocked(const std::unique_lock& lock) { : DCHECK(lock.mutex() == &lock_ && lock.owns_lock()); : return status_ == MemBlockStatus::DISABLED; : } > Can this pair of methods to be simplified as follows? I would like to keep the caller away from passing the enum, therefore design it this way, the benefit is that the caller doesn't need to pass the enum. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@193 PS1, Line 193: lock_ > A little bit confusing when read code which applies lock_. since there are Changed the name to memory_block_lock_. I think the benefit of using the memory block is that it may reduce potential contention. For example, if using disk file lock, which is a spinlock, only one block out of four (or more) can be written at a time, and also the status changing. The rule now is the disk lock is to protect the members actual_file_size_ and read_buffer_ctrl_ (exclude the memory blocks). The members of memory blocks are protected by their own locks. If the caller worries the memory block to be destructed while using, the caller can first hold the physical_file_lock_, which is a shared lock, to protect the while file from destruction. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@292 PS1, Line 292: std::lock_guard l(lock_); : return actual_file_size_; > actual_file_size_ is AtomicInt64. Do we need to lock with lock_ here first? Changed the actual_file_size_ to ordinary int64_t. Just think maybe it is better to be managed by the lock_ like others. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@379 PS1, Line 379: num_of_read_buffer > nit. num_of_read_buffers. Done http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@381 PS1, Line 381: equals > nit equal Done http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@391 PS1, Line 391: DCHECK_LT(buffer_idx, num_of_read_buffer()); : DCHECK_GE(buffer_idx, 0); > Should call DCheckReadBufferIdx(). Thanks, done. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@393 PS1, Line 393: std::lock_guard lock(lock_); > should we lock it first before check? It should be okay without the lock, because the check is to use the index to compare with constant variables. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@415 PS1, Line 415: DCheckReadBufferIdx(buffer_idx); : std::lock_guard lock(lock_); > Lock before check? same as above. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@509 PS1, Line 509: lock_ > Suggest to rename as disk_file_lock_. Done http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc File be/src/runtime/io/disk-file.cc: http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc@123 PS1, Line 123: *allocated = true; > If the memory is deleted, the state in its control block should become rese Added some comments for MemBlock in disk-file.h. The MemBlock must be in MemBlockStatus::DISABLED before destruction, and the Delete() would be needed to call before destruction and it will set the state to MemBlockStatus::DISABLED for all cases. http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.o
[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17948 ) Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9694/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17948 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 Gerrit-Change-Number: 17948 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 29 Oct 2021 01:30:19 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17948 ) Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala .. WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala There are some API changes in the new version of protobuf library. This patch makes necessary changes for Impala code to pass compiling. It also adds a work around for the static initializer issue, which is caused by the conflicts between two copies of protobuf library in Impala's binary, one came from Kudu client shared library. Testing: - Passed an core DEBUG build. - TODO pass exhaustive DEBUG build and a core ASAN build Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 --- M be/src/kudu/rpc/exactly_once_rpc-test.cc M be/src/kudu/rpc/inbound_call.cc M be/src/kudu/rpc/outbound_call.cc M be/src/kudu/rpc/result_tracker.h M be/src/kudu/rpc/serialization.cc M be/src/kudu/rpc/serialization.h M be/src/kudu/util/pb_util.cc M be/src/kudu/util/protobuf_util.h M be/src/rpc/rpc-mgr.cc M bin/impala-config.sh 10 files changed, 113 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17948/3 -- To view, visit http://gerrit.cloudera.org:8080/17948 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 Gerrit-Change-Number: 17948 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17948 ) Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc File be/src/rpc/rpc-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@100 PS2, Line 100: // Since Kudu client shared library has its own protobuf, there are two copies of protobuf : // library in Impala's binary. Each copy has independent descriptor pool, but they : // interact each other at static-initialization time, which cause AddDescriptors() is not : // called for .proto files for Impala's protobuf. : // This function load descriptors for google/protobuf/descriptor.proto, which are used by : // rpc_header.proto. > Just trying to wrap my head around the root cause here. Am I correct in und Your understanding is correct. http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@108 PS2, Line 108: static bool CheckProtobufDescriptorProto() { > Given the issue we're fixing is quite nuanced, perhaps it's worth adding li Added the link in the comments. -- To view, visit http://gerrit.cloudera.org:8080/17948 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 Gerrit-Change-Number: 17948 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 29 Oct 2021 00:54:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17948 ) Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc File be/src/rpc/rpc-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@100 PS2, Line 100: // Since Kudu client shared library has its own protobuf, there are two copies of protobuf : // library in Impala's binary. Each copy has independent descriptor pool, but they : // interact each other at static-initialization time, which cause AddDescriptors() is not : // called for .proto files for Impala's protobuf. : // This function load descriptors for google/protobuf/descriptor.proto, which are used by : // rpc_header.proto. Just trying to wrap my head around the root cause here. Am I correct in understand that: - As stated, each protobuf library has its own independent descriptor pool. - The Kudu library calls AddDescriptors() - The Impala library is initialized later, and doesn't call AddDescriptors() because descriptor_table_google_2fprotobuf_2fdescriptor_2eproto.is_initialized is already true - Hence, the Impala descriptor pool is missing its descriptors ? http://gerrit.cloudera.org:8080/#/c/17948/2/be/src/rpc/rpc-mgr.cc@108 PS2, Line 108: static bool CheckProtobufDescriptorProto() { Given the issue we're fixing is quite nuanced, perhaps it's worth adding links that have a bit more context, like https://stackoverflow.com/questions/29960871/protobuf-message-object-creation-by-name? -- To view, visit http://gerrit.cloudera.org:8080/17948 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 Gerrit-Change-Number: 17948 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Fri, 29 Oct 2021 00:29:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batching reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batching reading for remote temporary files .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9693/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Fri, 29 Oct 2021 00:28:42 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17948 ) Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/9692/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17948 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 Gerrit-Change-Number: 17948 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Fri, 29 Oct 2021 00:14:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batching reading for remote temporary files
Yida Wu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batching reading for remote temporary files .. IMPALA-10791 Add batching reading for remote temporary files The patch adds a feature to batching read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batching reading from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly, but in order to guarantee a page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between 1/4 default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_MB, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of using the memory for the buffer is that, when the system is lacking of memory when it needs to spill the data. So we make a restriction to limit the percentage of the memory for the read buffer to 5% of the total, because right now the impala process will reserve 20% memory as unused memory by default, using 5% for the emergency case like spilling is reasonable. Two start options have been added for the new feature. 1. remote_batching_read. Default is false. If set true, the batching read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number also restricted by the total system memory, which can not exceed 5% of the total memory. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchingReadFromRemote. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M common/thrift/metrics.json 11 files changed, 1,098 insertions(+), 133 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/3 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala
Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17948 Change subject: WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala .. WIP IMPALA-10931: Integrate protobuf 3.14.0 library with Impala There are some API changes in the new version of protobuf library. This patch makes necessary changes for Impala code to pass compiling. It also adds a work around for the static initializer issue, which is caused by the conflicts between two copies of protobuf library in Impala's binary, one came from Kudu client shared library. Testing: - Passed an core DEBUG build. - TODO pass exhaustive DEBUG build and a core ASAN build Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 --- M be/src/kudu/rpc/exactly_once_rpc-test.cc M be/src/kudu/rpc/inbound_call.cc M be/src/kudu/rpc/outbound_call.cc M be/src/kudu/rpc/result_tracker.h M be/src/kudu/rpc/serialization.cc M be/src/kudu/rpc/serialization.h M be/src/kudu/util/pb_util.cc M be/src/kudu/util/protobuf_util.h M be/src/rpc/rpc-mgr.cc M bin/impala-config.sh 10 files changed, 111 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17948/2 -- To view, visit http://gerrit.cloudera.org:8080/17948 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia1df4faceff9fda169c9d15fe8b1e69cfabe0d43 Gerrit-Change-Number: 17948 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Qifan Chen has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. Currently, entire row is materialized before filtering during scan. Instead of paying the cost of materializing upfront, for columnar formats we can avoid doing it for rows that are filtered out. Columns that are required for filtering are the only ones that are needed to be materialized before filtering. For rest of the columns, materialization can be delayed and be done only for rows that survive. This patch implements this technique for Parquet format only. New configuration 'parquet_materialization_threshold' is introduced, which is minimum number of consecutive rows that are filtered out to avoid materialization. If set to less than 0, it disables the late materialization. Performance: Peformance measured for single daemon, single threaded impalad upon TPCH scale 42 lineitem table with 252 million rows, unsorted data. Upto 2.5x improvement for non-page indexed and upto 4x improvement in page index seen. Queries for page index borrowed from blog: https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/ More details: https://docs.google.com/spreadsheets/d/17s5OLaFOPo-64kimAPP6n3kJA42vM-iVT24OvsQgfuA/edit?usp=sharing Testing: 1. Ran existing tests 2. Added UT for 'ScratchTupleBatch::GetMicroBatch' 3. Added end-to-end test for late materialization. Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Reviewed-on: http://gerrit.cloudera.org:8080/17860 Tested-by: Impala Public Jenkins Reviewed-by: Qifan Chen --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-columnar-scanner-ir.cc M be/src/exec/hdfs-columnar-scanner.cc M be/src/exec/hdfs-columnar-scanner.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-collection-column-reader.cc M be/src/exec/parquet/parquet-collection-column-reader.h M be/src/exec/parquet/parquet-column-chunk-reader.cc M be/src/exec/parquet/parquet-column-chunk-reader.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h A be/src/exec/scratch-tuple-batch-test.cc M be/src/exec/scratch-tuple-batch.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/tuple-row-compare.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test A testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test A tests/query_test/test_parquet_late_materialization.py 22 files changed, 1,070 insertions(+), 52 deletions(-) Approvals: Impala Public Jenkins: Verified Qifan Chen: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 19 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 18 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 20:09:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9691/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 28 Oct 2021 19:46:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve TimestampValue to String casting .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49 PS2, Line 49: 3.6 33.6 33.8 0.888X 0.884X 0.88X : from_unixtime(0,'-MM-dd') > Some unoptimal change in patch set 2. Might be the reinterpret_cast thing f Patch set 3 shows even faster result after we use FastInt32ToBufferLeft and local char array. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47 PS1, Line 47: > Right, this shouldn't be named ISO at all. I think I will change it to DEFA This is removed in patch set 3. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81 PS1, Line 81: , > The difference is in the cast direction. All previous tokenizer have PARSE I get what you're saying now. The CastDirection does not differentiate the resulting DateTimeFormatContext. So we can actually reuse the existing DateTimeFormatContext, even if they're tokenized with PARSE direction. In this case, this should be the same as DEFAULT_SHORT_DATE_TIME_CTX. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86 PS1, Line 86: case 5: // hh:mm > Difference is in the cast direction as well. This one has FORMAT direction. This is removed in patch set 3. We reuse DEFAULT_DATE_TIME_CTX[9] instead. http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/17980/3/be/src/runtime/timestamp-parse-util.h@79 PS3, Line 79: /// max_length -- the maximum length of characters that 'dst' can hold. Only used for : /// assertion in debug build. I need to update this comment in next patch set, since we're enforcing the max_length in patch set 3. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341 PS2, Line 341: > Good point. Having on-stack char array should also drop the necessity of tm Done http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351 PS2, Line 351: DCHECK(!d.is_special()); > Will do. Done. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/timestamp-value.cc@208 PS1, Line 208: } else { > This still copy on return in patch set 2, but only for usages outside of ex It looks like returning string by value is actually OK, since std::string has move constructor. https://www.delftstack.com/howto/cpp/return-string-from-function-cpp/ -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 28 Oct 2021 19:35:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve TimestampValue to String casting
Hello Kurt Deschler, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17980 to look at the new patch set (#3). Change subject: IMPALA-10984: Improve TimestampValue to String casting .. IMPALA-10984: Improve TimestampValue to String casting TimestampValue::ToString was implemented by concatenating boost::gregorian::to_iso_extended_string and boost::posix_time::to_simple_string using stringstream. This involves multiple string allocations, copying, and might hit lock within tcmalloc::CentralFreeList. FROM_UNIXTIME and CAST expression that touches this function can be inefficient if the expression is being evaluated for millions of rows. This patch reimplement TimestampValue::ToString by supplying default DateTimeFormatContext if no pattern was specified. "-MM-dd HH:mm:ss" will be picked as the default format if the time_ component does not have fractional seconds. Otherwise, "-MM-dd HH:mm:ss.S" will be picked as the default format. The chosen DateTimeFormatContext then passed to TimestampParser::Format along with date_ and time_ to be formatted into the string representation. Int to string parsing method is replaced with FastInt32ToBufferLeft in TimestampParser::Format. This patch gives > 10X performance improvement for CAST timestamp to string and FROM_UNIXTIME without a date-time pattern, as shown by the following benchmark (modified from expr-benchmark), before and after the patch. Before the patch: Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz FromUnixCodegen: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) - literal 37.4 38.6 39.1 1X 1X 1X cast(now() as string)2.2 2.28 2.31 0.0589X0.0591X0.0591X from_unixtime(0,'-MM-dd HH:mm:ss') 5.94 6.18 6.23 0.159X 0.16X 0.159X from_unixtime(0,'-MM-dd') 11.5 11.8 11.9 0.308X 0.305X 0.304X from_unixtime(0) 2.26 2.35 2.39 0.0606X 0.061X0.0612X After the patch: Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz FromUnixCodegen: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile (relative) (relative) (relative) - literal 37.9 38 38.4 1X 1X 1X cast(now() as string) 29.3 29.3 29.4 0.773X 0.77X 0.767X from_unixtime(0,'-MM-dd HH:mm:ss') 33.6 33.6 33.8 0.888X 0.884X 0.88X from_unixtime(0,'-MM-dd') 49.9 49.9 50.3 1.32X 1.31X 1.31X from_unixtime(0) 33.1 33.2 33.5 0.875X 0.872X 0.873X The literal expression used as the baseline in this benchmark is "cast('2012-01-01 09:10:11.123456789' as timestamp)". This patch also updates numbers in expr-benchmark for BenchmarkTimestampFunctions and tidy up expr-benchmark a bit to clear its MemPool in between benchmark iteration so that it does not run out of memory. Testing: - Pass core tests. Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.cc M be/src/runtime/datetime-iso-sql-format-tokenizer.h M be/src/runtime/datetime-parser-common.h M be/src/runtime/datetime-simple-date-format-parser.cc M be/src/runtime/datetime-simple-date-format-parser.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-parse-util.h M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h 13 files changed, 209 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17980/3 -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkin
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17982 ) Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected Until this patch buildall.sh -release_and_debug -codecoverage didn't do a coverage build at all. It only did a release build, then a debug build. With this patch the above command creates a release+coverage build, then a debug+coverage build. After each build it saves the .gcno files to a directory in $IMPALA_HOME (gcov_release and gcov_debug). These .gcno files are needed to generate code coverage reports later. Testing: * manually tested by invoking buildall.sh -release_and_debug \ -codecoverage and buildall.sh -release -codecoverage Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Reviewed-on: http://gerrit.cloudera.org:8080/17982 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M buildall.sh 1 file changed, 33 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17982 ) Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 19:00:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. Patch Set 18: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 18 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 18:04:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17933 ) Change subject: IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17933 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954 Gerrit-Change-Number: 17933 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xianqing He Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 28 Oct 2021 16:56:18 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-9498: Allow returning arrays in select list
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/17811 ) Change subject: WIP IMPALA-9498: Allow returning arrays in select list .. Patch Set 15: (19 comments) I went through the frontend code only, here are a few thoughts. http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1296 PS15, Line 1296: /** :* Returns an existing or new SlotDescriptor for the given path. Always returns :* a new empty SlotDescriptor for paths with a collection-typed destination. :*/ I think this comment refers to the 1-argument overload on L1293. Also, the argument 'duplicateCollections' should be explained. http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1315 PS15, Line 1315: SlotDescriptor result = null; : if (slotPath.destType().isArrayType()) { : Path currentPath = slotPath; : List rawPath = currentPath.getRawPath(); : boolean first_level = true; : while (currentPath.destType().isArrayType()) { : TableRef tblRef = new TableRef(rawPath, null); : : CollectionTableRef collTblRef = new CollectionTableRef(tblRef, currentPath, true); : collTblRef.analyze(this); : : Preconditions.checkState(collTblRef.getCollectionExpr() != null); : Preconditions.checkState(collTblRef.getCollectionExpr() instanceof SlotRef); : SlotDescriptor desc = ((SlotRef) collTblRef.getCollectionExpr()).getDesc(); : desc.setIsMaterializedWithParent(true); : if (first_level) { : result = desc; : first_level = false; : } : : // Resolve path : List rawPathToItem = new ArrayList(); : rawPathToItem.add("item"); : : Path resolvedPathToItem = new Path(collTblRef.getDesc(), rawPathToItem); : boolean isResolved = resolvedPathToItem.resolve(); : Preconditions.checkState(isResolved); : : rawPath = rawPathToItem; : currentPath = resolvedPathToItem; : } : if (currentPath.destType().isStructType()) { : throw new AnalysisException( : "STRUCT type inside collection types is not supported."); : } : SlotDescriptor slotDesc = this.registerSlotRef(currentPath); : slotDesc.setStats(new ColumnStats(slotDesc.getType())); : slotDesc.setIsMaterialized(true); : } else { Optional: the handling of arrays could be extracted to a function. http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java File fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java: http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@25 PS15, Line 25: implies its : * flattening during execution Is it still true? The TODO on L104 indicates it is not. http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@52 PS15, Line 52: /** :* Create a CollectionTableRef from the original unresolved table ref as well as :* its resolved path. Sets table aliases and join-related attributes. :*/ This comment should be for the 2 argument version on L48. The boolean argument could also be explained (though admittedly its meaning is pretty clear). http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@91 PS15, Line 91: i Typo: y http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java@92 PS15, Line 92: i Typo: y http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java: http://gerrit.cloudera.org:8080/#/c/17811/15/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@344 PS15, Line 344: hasOnlyUnionAllOps Do we only have to do this check if there are only UNION ALL operations? What if we ha
[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47 PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX > I am not sure what ISO means here exactly, but in the other constants it le Right, this shouldn't be named ISO at all. I think I will change it to DEFAULT_SHORT_DATE_TIME_FRACTIONAL_FORMATTER_CTX. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81 PS1, Line 81: > Is this different than DEFAULT_DATE_TIME_CTX[0]? The difference is in the cast direction. All previous tokenizer have PARSE direction. This one has FORMAT direction. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86 PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX > Is this different than DEFAULT_DATE_TIME_CTX[9]? Difference is in the cast direction as well. This one has FORMAT direction. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 28 Oct 2021 15:43:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString .. Patch Set 2: (4 comments) Thank you, Csaba, for the comments! Will address them in next patch set. http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49 PS2, Line 49: 4.91 4.91 0.128X 0.128X 0.128X : from_unixtime(0,'-MM-dd HH:mm:ss') > Wonder why this became slow during the last patch. Some unoptimal change in patch set 2. Might be the reinterpret_cast thing from uint8_t* to char* in TimestampValue::ToStringVal. Will take a look at it during next iteration. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341 PS2, Line 341: tmp_buff > I think that we can easily avoid using the tmp_buff here. Good point. Having on-stack char array should also drop the necessity of tmp_string_ in FunctionContextImpl. Will try it in next patch set. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344 PS2, Line 344: for (int i = written_length; i < tok.len; i++) { : *(dst + pos) = '0'; : pos++; : } > If I understand this correctly, then we are writing to the end of the buffe That is correct. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351 PS2, Line 351: DCHECK_LE(pos, max_length) << "Maximum buffer length exceeded!"; > I am a bit concerned that at the time we hit this DCHECK, we have already w Will do. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 28 Oct 2021 15:38:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WiP: IMPALA-10920: Unnest function for zipping unnest arrays
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17983 ) Change subject: WiP: IMPALA-10920: Unnest function for zipping unnest arrays .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9690/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7 Gerrit-Change-Number: 17983 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 28 Oct 2021 13:38:14 + Gerrit-HasComments: No
[Impala-ASF-CR] WiP: IMPALA-10920: Unnest function for zipping unnest arrays
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17983 ) Change subject: WiP: IMPALA-10920: Unnest function for zipping unnest arrays .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17983/1/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: http://gerrit.cloudera.org:8080/#/c/17983/1/tests/query_test/test_nested_types.py@196 PS1, Line 196: class TestZippingUnnest(ImpalaTestSuite): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/17983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7 Gerrit-Change-Number: 17983 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 28 Oct 2021 13:15:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WiP: IMPALA-10920: Unnest function for zipping unnest arrays
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17983 Change subject: WiP: IMPALA-10920: Unnest function for zipping unnest arrays .. WiP: IMPALA-10920: Unnest function for zipping unnest arrays The SQL standard compliant syntax is kind of ready. TODO: The Postgres kind of syntax where UNNEST() is in the select list. Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7 --- M be/src/exec/unnest-node.cc M be/src/exec/unnest-node.h M common/thrift/PlanNodes.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnnestNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java A testdata/ComplexTypesTbl/arrays.orc A testdata/ComplexTypesTbl/arrays.parq M testdata/data/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test M tests/query_test/test_nested_types.py 22 files changed, 847 insertions(+), 119 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/17983/1 -- To view, visit http://gerrit.cloudera.org:8080/17983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7 Gerrit-Change-Number: 17983 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-10984: Improve performance of TimestampValue::ToString
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17980 ) Change subject: IMPALA-10984: Improve performance of TimestampValue::ToString .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17980/2//COMMIT_MSG@49 PS2, Line 49: 4.91 4.91 0.128X 0.128X 0.128X : from_unixtime(0,'-MM-dd HH:mm:ss') Wonder why this became slow during the last patch. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc File be/src/runtime/datetime-simple-date-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@47 PS1, Line 47: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX I am not sure what ISO means here exactly, but in the other constants it leads to a "T" between the date and time part, while in your new format it is just related to the subseconds part. http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@81 PS1, Line 81: Is this different than DEFAULT_DATE_TIME_CTX[0]? http://gerrit.cloudera.org:8080/#/c/17980/1/be/src/runtime/datetime-simple-date-format-parser.cc@86 PS1, Line 86: DEFAULT_ISO_DATE_TIME_FORMATTER_CTX Is this different than DEFAULT_DATE_TIME_CTX[9]? http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@341 PS2, Line 341: tmp_buff I think that we can easily avoid using the tmp_buff here. As num_val is a non-negative int32, we could simply use an on-stack char array of size 10, and copy from there. There is also an alternative to snprintf that we use in int->string casts, FastInt32ToBufferLeft() from gutil: https://github.com/apache/impala/blob/master/be/src/gutil/strings/numbers.h#L171 We use it here: https://github.com/apache/impala/blob/master/be/src/exprs/cast-functions-ir.cc#L156 We could use FastInt32ToBufferLeft() to print to an array on stack, and then calculate the number of '0's to prepend from the result, and then copy it to dst. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@344 PS2, Line 344: for (int i = written_length; i < tok.len; i++) { : *(dst + pos) = '0'; : pos++; : } If I understand this correctly, then we are writing to the end of the buffer here, while we used to write to the beginning of the tmp_str. http://gerrit.cloudera.org:8080/#/c/17980/2/be/src/runtime/timestamp-parse-util.cc@351 PS2, Line 351: DCHECK_LE(pos, max_length) << "Maximum buffer length exceeded!"; I am a bit concerned that at the time we hit this DCHECK, we have already written outside the buffer in the AppendToBuffer functions. It would be safer and probably not too slow to pass max_length as an argument to AppendToBuffer(s) and skip the copy there if it has overflown. -- To view, visit http://gerrit.cloudera.org:8080/17980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fcb4545d9c9a3fdb38c4db58bb4b1321a429d61 Gerrit-Change-Number: 17980 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17982 ) Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17982 ) Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7578/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 12:45:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9689/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 18 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 11:58:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. Patch Set 18: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7577/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 18 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 11:41:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
Amogh Margoor has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17860 ) Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. .. IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table. Currently, entire row is materialized before filtering during scan. Instead of paying the cost of materializing upfront, for columnar formats we can avoid doing it for rows that are filtered out. Columns that are required for filtering are the only ones that are needed to be materialized before filtering. For rest of the columns, materialization can be delayed and be done only for rows that survive. This patch implements this technique for Parquet format only. New configuration 'parquet_materialization_threshold' is introduced, which is minimum number of consecutive rows that are filtered out to avoid materialization. If set to less than 0, it disables the late materialization. Performance: Peformance measured for single daemon, single threaded impalad upon TPCH scale 42 lineitem table with 252 million rows, unsorted data. Upto 2.5x improvement for non-page indexed and upto 4x improvement in page index seen. Queries for page index borrowed from blog: https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/ More details: https://docs.google.com/spreadsheets/d/17s5OLaFOPo-64kimAPP6n3kJA42vM-iVT24OvsQgfuA/edit?usp=sharing Testing: 1. Ran existing tests 2. Added UT for 'ScratchTupleBatch::GetMicroBatch' 3. Added end-to-end test for late materialization. Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-columnar-scanner-ir.cc M be/src/exec/hdfs-columnar-scanner.cc M be/src/exec/hdfs-columnar-scanner.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-collection-column-reader.cc M be/src/exec/parquet/parquet-collection-column-reader.h M be/src/exec/parquet/parquet-column-chunk-reader.cc M be/src/exec/parquet/parquet-column-chunk-reader.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h A be/src/exec/scratch-tuple-batch-test.cc M be/src/exec/scratch-tuple-batch.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/tuple-row-compare.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test A testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test A tests/query_test/test_parquet_late_materialization.py 22 files changed, 1,070 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17860/18 -- To view, visit http://gerrit.cloudera.org:8080/17860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60 Gerrit-Change-Number: 17860 Gerrit-PatchSet: 18 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17982 ) Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. Patch Set 3: Code-Review+2 LGTM -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 10:57:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17982 ) Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9688/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 10:57:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17933 ) Change subject: IMPALA-10836: Add 'SimplifyCastExprRule' rule to rewrite cast expr in some situations .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7576/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17933 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8fac7100060d4e139a8b24d4795c6f279c55954 Gerrit-Change-Number: 17933 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xianqing He Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 28 Oct 2021 10:45:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17982 ) Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17982/2/buildall.sh File buildall.sh: http://gerrit.cloudera.org:8080/#/c/17982/2/buildall.sh@464 PS2, Line 464: local build_type=$1 : local gcov_ > Both of these variables should be declared "local" (see L489). The name "bu Thanks for catching this. Done. -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Oct 2021 10:36:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10899: buildall.sh -release and debug -codecoverage doesn't work as expected
Hello Laszlo Gaal, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17982 to look at the new patch set (#3). Change subject: IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected .. IMPALA-10899: buildall.sh -release_and_debug -codecoverage doesn't work as expected Until this patch buildall.sh -release_and_debug -codecoverage didn't do a coverage build at all. It only did a release build, then a debug build. With this patch the above command creates a release+coverage build, then a debug+coverage build. After each build it saves the .gcno files to a directory in $IMPALA_HOME (gcov_release and gcov_debug). These .gcno files are needed to generate code coverage reports later. Testing: * manually tested by invoking buildall.sh -release_and_debug \ -codecoverage and buildall.sh -release -codecoverage Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c --- M buildall.sh 1 file changed, 33 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/17982/3 -- To view, visit http://gerrit.cloudera.org:8080/17982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c Gerrit-Change-Number: 17982 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 23: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/9687/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 23 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Thu, 28 Oct 2021 10:25:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17859 ) Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. Patch Set 23: (2 comments) http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2402 PS23, Line 2402: batchEvents = eventFactory.createBatchEvents(mockEvents, eventsProcessor_.getMetrics()); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java File fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java: http://gerrit.cloudera.org:8080/#/c/17859/23/fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java@85 PS23, Line 85: private static boolean flagEnableCatalogCache ,flagInvalidateCache, flagSyncToLatestEventId; line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 23 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai Gerrit-Comment-Date: Thu, 28 Oct 2021 10:05:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints
Hello Vihang Karajgaonkar, kis...@cloudera.com, Yu-Wen Lai, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17859 to look at the new patch set (#23). Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints .. IMPALA-10926: Sync db/table in catalog cache to latest HMS event id when performing DDL operations via catalog HMS endpoints Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java M fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/catalog/metastore/HmsApiNameEnum.java M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java A fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java M fe/src/test/java/org/apache/impala/catalog/metastore/AbstractCatalogMetastoreTest.java A fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M tests/custom_cluster/test_metastore_service.py 26 files changed, 3,362 insertions(+), 282 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/17859/23 -- To view, visit http://gerrit.cloudera.org:8080/17859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I36364e401911352c474eb98c8d61bbaae9b9 Gerrit-Change-Number: 17859 Gerrit-PatchSet: 23 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yu-Wen Lai