[jira] [Updated] (IMPALA-12711) DDL/DML errors are not shown in impalad logs
[ https://issues.apache.org/jira/browse/IMPALA-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12711: Affects Version/s: Impala 4.3.0 Impala 4.1.2 Impala 4.1.1 Impala 4.2.0 Impala 4.1.0 > DDL/DML errors are not shown in impalad logs > > > Key: IMPALA-12711 > URL: https://issues.apache.org/jira/browse/IMPALA-12711 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, > Impala 4.3.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > > Since IMPALA-10811, DDLs are executed in an async thread by default. Since > that change, the errors are no longer logged. For instance, after running > "INVALIDATE METADATA a.b" on a nonexistent table, the error shown in the > client is "ERROR: TableNotFoundException: Table not found: a.b". However, in > the impalad logs, it looks like the statement succeeds. > {noformat} > I0115 13:47:43.256397 23443 Frontend.java:2072] > dc497affd5678498:365a4600] Analyzing query: INVALIDATE METADATA a.b > db: default > I0115 13:47:43.256489 23443 Frontend.java:2084] > dc497affd5678498:365a4600] The original executor group sets from > executor membership snapshot: [TExecutorGroupSet(curr_num_executors:3, > expected_num_executors:20, exec_group_name_prefix:)] > I0115 13:47:43.256561 23443 RequestPoolService.java:200] > dc497affd5678498:365a4600] Default pool only, scheduler allocation is > not specified. 
> I0115 13:47:43.256652 23443 Frontend.java:2104] > dc497affd5678498:365a4600] A total of 1 executor group sets to be > considered for auto-scaling: [TExecutorGroupSet(curr_num_executors:3, > expected_num_executors:20, exec_group_name_prefix:, > max_mem_limit:9223372036854775807, num_cores_per_executor:2147483647)] > I0115 13:47:43.256775 23443 Frontend.java:2138] > dc497affd5678498:365a4600] Consider executor group set: > TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, > exec_group_name_prefix:, max_mem_limit:9223372036854775807, > num_cores_per_executor:2147483647) with assumption of 0 cores per node. > I0115 13:47:43.263244 23443 AnalysisContext.java:508] > dc497affd5678498:365a4600] Analysis took 4 ms > I0115 13:47:43.264606 23443 BaseAuthorizationChecker.java:114] > dc497affd5678498:365a4600] Authorization check took 1 ms > I0115 13:47:43.264681 23443 Frontend.java:2400] > dc497affd5678498:365a4600] Analysis and authorization finished. > I0115 13:47:43.301832 23443 Frontend.java:2319] > dc497affd5678498:365a4600] Selected executor group: > TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, > exec_group_name_prefix:, max_mem_limit:9223372036854775807, > num_cores_per_executor:2147483647), reason: query is not auto-scalable > I0115 13:47:43.305258 23443 client-request-state.cc:785] > dc497affd5678498:365a4600] DDL exec mode=asynchronous > I0115 13:47:43.306367 23443 impala-hs2-server.cc:573] ExecuteStatement(): > return_val=TExecuteStatementResp { > 01: status (struct) = TStatus { > 01: statusCode (i32) = 0, > }, > 02: operationHandle (struct) = TOperationHandle { > 01: operationId (struct) = THandleIdentifier { > 01: guid (string) = "\x98\x84g\xd5\xffzI\xdc\x00\x00\x00\x00\x00FZ6", > 02: secret (string) = "", > }, > 02: operationType (i32) = 0, > 03: hasResultSet (bool) = false, > }, > } > I0115 13:47:43.509263 23443 impala-hs2-server.cc:887] CloseOperation(): > query_id=dc497affd5678498:365a4600 > I0115 13:47:43.509281 23443 
impala-server.cc:1554] UnregisterQuery(): > query_id=dc497affd5678498:365a4600 > I0115 13:47:43.509642 23298 impala-server.cc:1586] Query successfully > unregistered: query_id=dc497affd5678498:365a4600{noformat} > If the DDL is executed with "set enable_async_ddl_execution=false", the error > is shown in the logs: > {noformat} > I0115 13:48:31.054708 23794 Frontend.java:2072] > 8a48ab2ae184395d:dd53cfc1] Analyzing query: INVALIDATE METADATA a.b > db: default > I0115 13:48:31.054780 23794 Frontend.java:2084] > 8a48ab2ae184395d:dd53cfc1] The original executor group sets from > executor membership snapshot: [TExecutorGroupSet(curr_num_executors:3, > expected_num_executors:20, exec_group_name_prefix:)] > I0115 13:48:31.054841 23794 RequestPoolService.java:200] > 8a48ab2ae184395d:dd53cfc1] Default pool only, scheduler allocation is > not specified. > I0115 13:48:31.054934 23794 Frontend.java:2104] > 8a48ab2ae184395d:dd53cfc1] A total of 1 executor group sets to be > considered for auto-scaling: [TExecutor
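The contrast between the two logs suggests the async path simply drops the error from the impalad log while still returning it to the client. Below is a toy Python sketch of that pattern and the obvious remedy; all names are hypothetical and this is not Impala's real code (which is C++/Java), just an illustration of logging the failure on the async thread before recording it for the client:

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("impalad")

class TableNotFoundException(Exception):
    pass

def exec_ddl(stmt):
    # Stand-in for catalogd rejecting INVALIDATE METADATA on a missing table.
    raise TableNotFoundException("Table not found: a.b")

def exec_ddl_async(stmt, results):
    # The buggy pattern: the error is captured for the client but never
    # logged, so the server log looks like the statement succeeded.
    try:
        exec_ddl(stmt)
        results[stmt] = "OK"
    except Exception as e:
        # Remedy: log the failure on the async thread before recording it.
        log.error("DDL failed: %s: %s", stmt, e)
        results[stmt] = f"ERROR: {type(e).__name__}: {e}"

results = {}
with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(exec_ddl_async, "INVALIDATE METADATA a.b", results).result()
```

With the synchronous path (`enable_async_ddl_execution=false`) the exception surfaces on the request thread and is logged by existing handlers; the async path needs the explicit log call shown above.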
[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing
[ https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806610#comment-17806610 ] Maxwell Guo edited comment on IMPALA-12709 at 1/15/24 3:52 AM: --- I may have a different point of view. Is it possible to divide the db into buckets according to the original operation time and parallelize each bucket? Each time, 1000 events are taken from HMS, divided into buckets, and then processed in parallel. After all events are processed, the next batch is processed. was (Author: maxwellguo): I may have a different point of view. Is it possible to divide the db into buckets according to the original operation time and parallelize each bucket? > Hierarchical metastore event processing > --- > > Key: IMPALA-12709 > URL: https://issues.apache.org/jira/browse/IMPALA-12709 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Venugopal Reddy K >Assignee: Venugopal Reddy K >Priority: Major > Attachments: Hierarchical metastore event processing.docx > > > *Current Issue:* > At present, the metastore event processor is single-threaded. Notification > events are processed sequentially with a maximum limit of 1000 events fetched > and processed in a single batch. Multiple locks are used to address the > concurrency issues that may arise when catalog DDL operation processing and > metastore event processing try to access/update the catalog objects > concurrently. Waiting for a lock or file metadata loading of a table can slow > the event processing and can affect the processing of other events following > it. Those events may not be dependent on the previous event. Altogether it > takes a very long time to synchronize all the HMS events. > *Proposal:* > Existing metastore event processing can be turned into multi-level event > processing. 
The idea is to segregate the events based on their dependency, > maintain the order of events as they occur within the dependency, and process > them independently as much as possible: > # All the events of a table are processed in the same order they have > actually occurred. > # Events of different tables are processed in parallel. > # When a database is altered, all the events relating to the database (i.e., > for all its tables) occurring after the alter db event are processed only > after the alter database event is processed, preserving the order. > An initial proposal design document is attached: > https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
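The three ordering rules in the proposal amount to a per-table queueing model: strict order within a table's queue, parallelism across queues. The following is an illustrative Python toy (the event tuples and function names are invented here, not Impala's API), omitting the alter-database barrier from rule 3, which a real implementation would add:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# A fetched batch of HMS events (hypothetical shape): (event_id, table, kind).
events = [
    (1, "db1.t1", "ALTER"), (2, "db1.t2", "INSERT"),
    (3, "db1.t1", "INSERT"), (4, "db2.t3", "DROP"),
    (5, "db1.t2", "ALTER"),
]

def process_batch(batch):
    """Group events per table; each table's queue is drained in arrival
    order while distinct tables proceed in parallel."""
    queues = defaultdict(list)
    for ev in batch:
        queues[ev[1]].append(ev)  # preserves per-table arrival order

    processed = defaultdict(list)

    def drain(table):
        for ev in queues[table]:  # strict ordering within one table
            processed[table].append(ev[0])

    with ThreadPoolExecutor(max_workers=4) as pool:
        # One task per table; rule 3 would insert a barrier here so that
        # events after an ALTER DATABASE wait for it to finish.
        list(pool.map(drain, queues))
    return processed

processed = process_batch(events)
```

The bucketing idea from the comment above is a coarser variant of the same scheme: partition the 1000-event batch, process partitions in parallel, then fetch the next batch.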
[jira] [Created] (IMPALA-12710) dockerized-impala-bootstrap-and-test.sh ignores existing value of IMPALA_TOOLCHAIN_HOST
Laszlo Gaal created IMPALA-12710: Summary: dockerized-impala-bootstrap-and-test.sh ignores existing value of IMPALA_TOOLCHAIN_HOST Key: IMPALA-12710 URL: https://issues.apache.org/jira/browse/IMPALA-12710 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 4.4.0 Reporter: Laszlo Gaal Assignee: Laszlo Gaal Impala build driver scripts allow changing the source location for toolchain downloads by supplying an initial value for IMPALA_TOOLCHAIN_HOST, which is evaluated in bin/impala-config.sh. bin/jenkins/dockerized-impala-bootstrap-and-test.sh uses a slightly different, two-phase mechanism for initializing the build environment, which does not preserve the initial value of this environment variable, making the override ineffective.
[jira] [Commented] (IMPALA-12665) Adjust complete_micro_batch_ length to new scratch_batch_->capacity after ScratchTupleBatch::Reset
[ https://issues.apache.org/jira/browse/IMPALA-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806463#comment-17806463 ] ASF subversion and git services commented on IMPALA-12665: -- Commit 6ddd69c605d4c594e33fdd39a2ca888538b4b8d7 in impala's branch refs/heads/master from Zinway Liu [ https://gitbox.apache.org/repos/asf?p=impala.git;h=6ddd69c60 ] IMPALA-12665: Adjust complete_micro_batch_ length to new scratch_batch_->capacity after ScratchTupleBatch::Reset **IMPALA-12665 Description:** The issue occurs when scanning Parquet tables with a row size > 4096 bytes and a row batch size > 1024. A heap-buffer-overflow was detected by AddressSanitizer, indicating a write operation beyond the allocated buffer space. **Root Cause Analysis:** The error log by AddressSanitizer points to a heap-buffer-overflow, where memory is accessed beyond the allocated region. This occurs in the `HdfsParquetScanner` and `ScratchTupleBatch` classes when handling large rows > 4096 bytes. **Fault Reproduction:** The issue can be reproduced by creating a Parquet table with many columns, inserting data using Hive, then querying with Impala. Bash and Hive client scripts in IMPALA-12665 create a table and populate it, triggering the bug. **Technical Analysis:** `ScratchTupleBatch::Reset` recalculates `capacity` based on tuple size and fixed memory limits. When row size > 4096 bytes, `capacity` is set < 1024. `HdfsParquetScanner` incorrectly assumes `complete_micro_batch_` length of 1024, leading to overflow. **Proposed Solution:** Ensure `complete_micro_batch_` length is updated after `ScratchTupleBatch::Reset`. This prevents accessing memory outside allocated buffer, avoiding heap-buffer-overflow. 
Change-Id: I966ff10ba734ed8b1b61325486de0dfcc7b58e4d Reviewed-on: http://gerrit.cloudera.org:8080/20834 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Adjust complete_micro_batch_ length to new scratch_batch_->capacity after > ScratchTupleBatch::Reset > -- > > Key: IMPALA-12665 > URL: https://issues.apache.org/jira/browse/IMPALA-12665 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.3.0 >Reporter: Zinway >Assignee: Zinway >Priority: Major > > {panel} > *Happens when scanning a Parquet table where row_size > 4096 bytes and row > batch size > 1024.* > {panel} > h3. Log with AddressSanitizer > > {code:java} > ==557405==ERROR: AddressSanitizer: heap-buffer-overflow on address > 0x7fa162333408 at pc 0x0413a68c bp 0x7fa162f2fc10 sp 0x7fa162f2fc08 > WRITE of size 4 at 0x7fa162333408 thread T559 > #0 0x413a68b (/usr/lib/impala/sbin/impalad+0x413a68b)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/parquet-common.h:570 > #1 0x419b76f (/usr/lib/impala/sbin/impalad+0x419b76f)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/parquet-common.h:616 > #2 0x4199769 (/usr/lib/impala/sbin/impalad+0x4199769)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/parquet-column-readers.cc:864 > #3 0x4195e74 (/usr/lib/impala/sbin/impalad+0x4195e74)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/parquet-column-readers.cc:663 > #4 0x419f719 (/usr/lib/impala/sbin/impalad+0x419f719)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/parquet-column-readers.cc:496 > #5 0x38876d4 (/usr/lib/impala/sbin/impalad+0x38876d4)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:? 
> #6 0x388ef4f (/usr/lib/impala/sbin/impalad+0x388ef4f)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:2370 > #7 0x386db0d (/usr/lib/impala/sbin/impalad+0x386db0d)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:532 > #8 0x386b7d1 (/usr/lib/impala/sbin/impalad+0x386b7d1)# addr2line => > apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:416 > #9 0x3742adf (/usr/lib/impala/sbin/impalad+0x3742adf)# addr2line => > apache-impala-4.3.0/be/src/exec/hdfs-scan-node.cc:495 > #10 0x37418b8 (/usr/lib/impala/sbin/impalad+0x37418b8)# addr2line => > apache-impala-4.3.0/be/src/exec/hdfs-scan-node.cc:413 > #11 0x28720f6 (/usr/lib/impala/sbin/impalad+0x28720f6) > #12 0x33db1ef (/usr/lib/impala/sbin/impalad+0x33db1ef) > #13 0x33e74f8 (/usr/lib/impala/sbin/impalad+0x33e74f8) > #14 0x33e734b (/usr/lib/impala/sbin/impalad+0x33e734b) > #15 0x4b016f6 (/usr/lib/impala/sbin/impalad+0x4b016f6) > #16 0x7fa5a4d1cdd4 (/lib64/libpthread.so.0+0x7dd4) > #17 0x7fa5a1d0102c (/lib64/libc.so.6+0xfe02c) > 0x7fa162333408 is located 8 bytes to the right of 4193280-byte region > [0
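The root-cause analysis above reduces to a simple arithmetic invariant: the micro-batch length must track the capacity recomputed by `ScratchTupleBatch::Reset`, not the default of 1024 rows. A toy Python model of that invariant follows; the 4 MiB budget is an assumed stand-in for the fixed memory limit, not Impala's actual constant:

```python
DEFAULT_BATCH_SIZE = 1024
FIXED_MEM_BUDGET = 4 * 1024 * 1024  # assumed 4 MiB scratch budget

def reset_capacity(tuple_size):
    # Reset derives capacity from the memory budget: large tuples shrink
    # the capacity below the default batch size.
    return min(DEFAULT_BATCH_SIZE, FIXED_MEM_BUDGET // tuple_size)

def micro_batch_len(tuple_size):
    # The buggy scanner assumed DEFAULT_BATCH_SIZE unconditionally; the
    # fix re-reads the capacity after every Reset, as sketched here.
    return reset_capacity(tuple_size)

small = micro_batch_len(512)   # rows fit: capacity stays at 1024
large = micro_batch_len(8192)  # row_size > 4096: capacity drops below 1024
```

Writing `DEFAULT_BATCH_SIZE` rows into a buffer sized for the smaller capacity is exactly the out-of-bounds write AddressSanitizer reports above.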
[jira] [Commented] (IMPALA-12706) Failing DCHECK when querying STRUCT inside a STRUCT for Iceberg metadata table
[ https://issues.apache.org/jira/browse/IMPALA-12706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806462#comment-17806462 ] ASF subversion and git services commented on IMPALA-12706: -- Commit 74617537b5c805327349ef0ac5c79b84dc1e786d in impala's branch refs/heads/master from Tamas Mate [ https://gitbox.apache.org/repos/asf?p=impala.git;h=74617537b ] IMPALA-12706: Fix nested struct querying for Iceberg metadata tables This commit fixes a DCHECK failure when querying a struct inside a struct. The previous field accessor creation logic was trying to find the ColumnDescriptor for a struct inside a struct and hit a DCHECK because there are no ColumnDescriptors for struct fields. The logic has been reworked to only use ColumnDescriptors for top level columns. Testing: - Added E2E test to cover this case Change-Id: Iadd029a4edc500bd8d8fca3f958903c2dbe09e8e Reviewed-on: http://gerrit.cloudera.org:8080/20883 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Failing DCHECK when querying STRUCT inside a STRUCT for Iceberg metadata table > -- > > Key: IMPALA-12706 > URL: https://issues.apache.org/jira/browse/IMPALA-12706 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Tamas Mate >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > > When querying a STRUCT type inside a STRUCT type there is a failing DCHECK. > {code:none} > F0111 09:01:35.626691 15777 descriptors.h:366] > 83474e353d7baccd:d966f47c] Check failed: slot_desc->col_path().size() > == 1 (2 vs. 
1) > {code} > While the following works: > {code:none} > select readable_metrics from > functional_parquet.iceberg_query_metadata.data_files; > {code} > this fails: > {code:none} > select readable_metrics.i from > functional_parquet.iceberg_query_metadata.data_files; > {code}
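The commit message describes the fix as resolving nested fields from the top-level column's descriptor rather than expecting a descriptor (and a `col_path` of length 1) for every struct field. A toy Python sketch of that lookup; the schema shape and function names are hypothetical, not Impala's internals:

```python
# Only top-level columns have descriptors; nested struct fields do not,
# which is why the old DCHECK on col_path().size() == 1 fired for
# readable_metrics.i (a col_path of length 2).
schema = {
    "readable_metrics": {         # top-level column: has a descriptor
        "i": {"type": "BIGINT"},  # nested field: no descriptor of its own
    },
}

def make_accessor(col_path):
    # Start from the top-level column's descriptor (col_path[0]) and walk
    # the remaining path inside the struct by field name.
    node = schema[col_path[0]]
    for part in col_path[1:]:
        node = node[part]
    return node

inner = make_accessor(["readable_metrics", "i"])
```

The working query selects the whole struct (path length 1, a descriptor exists); the failing one selects an inner field (path length 2), which the reworked logic now reaches by walking from the top-level descriptor.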