[jira] [Updated] (IMPALA-12711) DDL/DML errors are not shown in impalad logs

2024-01-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12711:

Affects Version/s: Impala 4.3.0
   Impala 4.1.2
   Impala 4.1.1
   Impala 4.2.0
   Impala 4.1.0

> DDL/DML errors are not shown in impalad logs
> 
>
> Key: IMPALA-12711
> URL: https://issues.apache.org/jira/browse/IMPALA-12711
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, 
> Impala 4.3.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Since IMPALA-10811, DDLs are executed in an async thread by default, and 
> since then their errors are no longer logged. For instance, running 
> "INVALIDATE METADATA a.b" on a nonexistent table shows "ERROR: 
> TableNotFoundException: Table not found: a.b" in the client, yet in the 
> impalad logs the statement appears to succeed.
> {noformat}
> I0115 13:47:43.256397 23443 Frontend.java:2072] 
> dc497affd5678498:365a4600] Analyzing query: INVALIDATE METADATA a.b 
> db: default
> I0115 13:47:43.256489 23443 Frontend.java:2084] 
> dc497affd5678498:365a4600] The original executor group sets from 
> executor membership snapshot: [TExecutorGroupSet(curr_num_executors:3, 
> expected_num_executors:20, exec_group_name_prefix:)]
> I0115 13:47:43.256561 23443 RequestPoolService.java:200] 
> dc497affd5678498:365a4600] Default pool only, scheduler allocation is 
> not specified.
> I0115 13:47:43.256652 23443 Frontend.java:2104] 
> dc497affd5678498:365a4600] A total of 1 executor group sets to be 
> considered for auto-scaling: [TExecutorGroupSet(curr_num_executors:3, 
> expected_num_executors:20, exec_group_name_prefix:, 
> max_mem_limit:9223372036854775807, num_cores_per_executor:2147483647)]
> I0115 13:47:43.256775 23443 Frontend.java:2138] 
> dc497affd5678498:365a4600] Consider executor group set: 
> TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, 
> exec_group_name_prefix:, max_mem_limit:9223372036854775807, 
> num_cores_per_executor:2147483647) with assumption of 0 cores per node.
> I0115 13:47:43.263244 23443 AnalysisContext.java:508] 
> dc497affd5678498:365a4600] Analysis took 4 ms
> I0115 13:47:43.264606 23443 BaseAuthorizationChecker.java:114] 
> dc497affd5678498:365a4600] Authorization check took 1 ms
> I0115 13:47:43.264681 23443 Frontend.java:2400] 
> dc497affd5678498:365a4600] Analysis and authorization finished.
> I0115 13:47:43.301832 23443 Frontend.java:2319] 
> dc497affd5678498:365a4600] Selected executor group: 
> TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, 
> exec_group_name_prefix:, max_mem_limit:9223372036854775807, 
> num_cores_per_executor:2147483647), reason: query is not auto-scalable
> I0115 13:47:43.305258 23443 client-request-state.cc:785] 
> dc497affd5678498:365a4600] DDL exec mode=asynchronous
> I0115 13:47:43.306367 23443 impala-hs2-server.cc:573] ExecuteStatement(): 
> return_val=TExecuteStatementResp {
>   01: status (struct) = TStatus {
> 01: statusCode (i32) = 0,
>   },
>   02: operationHandle (struct) = TOperationHandle {
> 01: operationId (struct) = THandleIdentifier {
>   01: guid (string) = "\x98\x84g\xd5\xffzI\xdc\x00\x00\x00\x00\x00FZ6",
>   02: secret (string) = "",
> }, 
> 02: operationType (i32) = 0,
> 03: hasResultSet (bool) = false,
>   },
> }
> I0115 13:47:43.509263 23443 impala-hs2-server.cc:887] CloseOperation(): 
> query_id=dc497affd5678498:365a4600
> I0115 13:47:43.509281 23443 impala-server.cc:1554] UnregisterQuery(): 
> query_id=dc497affd5678498:365a4600
> I0115 13:47:43.509642 23298 impala-server.cc:1586] Query successfully 
> unregistered: query_id=dc497affd5678498:365a4600{noformat}
> If the DDL is executed with "set enable_async_ddl_execution=false", the error 
> is shown in the logs:
> {noformat}
> I0115 13:48:31.054708 23794 Frontend.java:2072] 
> 8a48ab2ae184395d:dd53cfc1] Analyzing query: INVALIDATE METADATA a.b 
> db: default
> I0115 13:48:31.054780 23794 Frontend.java:2084] 
> 8a48ab2ae184395d:dd53cfc1] The original executor group sets from 
> executor membership snapshot: [TExecutorGroupSet(curr_num_executors:3, 
> expected_num_executors:20, exec_group_name_prefix:)]
> I0115 13:48:31.054841 23794 RequestPoolService.java:200] 
> 8a48ab2ae184395d:dd53cfc1] Default pool only, scheduler allocation is 
> not specified.
> I0115 13:48:31.054934 23794 Frontend.java:2104] 
> 8a48ab2ae184395d:dd53cfc1] A total of 1 executor group sets to be 
> considered for auto-scaling: [TExecutor

[jira] [Created] (IMPALA-12711) DDL/DML errors are not shown in impalad logs

2024-01-14 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12711:
---

 Summary: DDL/DML errors are not shown in impalad logs
 Key: IMPALA-12711
 URL: https://issues.apache.org/jira/browse/IMPALA-12711
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Quanlong Huang
Assignee: Quanlong Huang


Since IMPALA-10811, DDLs are executed in an async thread by default, and since 
then their errors are no longer logged. For instance, running "INVALIDATE 
METADATA a.b" on a nonexistent table shows "ERROR: TableNotFoundException: 
Table not found: a.b" in the client, yet in the impalad logs the statement 
appears to succeed.
{noformat}
I0115 13:47:43.256397 23443 Frontend.java:2072] 
dc497affd5678498:365a4600] Analyzing query: INVALIDATE METADATA a.b db: 
default
I0115 13:47:43.256489 23443 Frontend.java:2084] 
dc497affd5678498:365a4600] The original executor group sets from 
executor membership snapshot: [TExecutorGroupSet(curr_num_executors:3, 
expected_num_executors:20, exec_group_name_prefix:)]
I0115 13:47:43.256561 23443 RequestPoolService.java:200] 
dc497affd5678498:365a4600] Default pool only, scheduler allocation is 
not specified.
I0115 13:47:43.256652 23443 Frontend.java:2104] 
dc497affd5678498:365a4600] A total of 1 executor group sets to be 
considered for auto-scaling: [TExecutorGroupSet(curr_num_executors:3, 
expected_num_executors:20, exec_group_name_prefix:, 
max_mem_limit:9223372036854775807, num_cores_per_executor:2147483647)]
I0115 13:47:43.256775 23443 Frontend.java:2138] 
dc497affd5678498:365a4600] Consider executor group set: 
TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, 
exec_group_name_prefix:, max_mem_limit:9223372036854775807, 
num_cores_per_executor:2147483647) with assumption of 0 cores per node.
I0115 13:47:43.263244 23443 AnalysisContext.java:508] 
dc497affd5678498:365a4600] Analysis took 4 ms
I0115 13:47:43.264606 23443 BaseAuthorizationChecker.java:114] 
dc497affd5678498:365a4600] Authorization check took 1 ms
I0115 13:47:43.264681 23443 Frontend.java:2400] 
dc497affd5678498:365a4600] Analysis and authorization finished.
I0115 13:47:43.301832 23443 Frontend.java:2319] 
dc497affd5678498:365a4600] Selected executor group: 
TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, 
exec_group_name_prefix:, max_mem_limit:9223372036854775807, 
num_cores_per_executor:2147483647), reason: query is not auto-scalable
I0115 13:47:43.305258 23443 client-request-state.cc:785] 
dc497affd5678498:365a4600] DDL exec mode=asynchronous
I0115 13:47:43.306367 23443 impala-hs2-server.cc:573] ExecuteStatement(): 
return_val=TExecuteStatementResp {
  01: status (struct) = TStatus {
01: statusCode (i32) = 0,
  },
  02: operationHandle (struct) = TOperationHandle {
01: operationId (struct) = THandleIdentifier {
  01: guid (string) = "\x98\x84g\xd5\xffzI\xdc\x00\x00\x00\x00\x00FZ6",
  02: secret (string) = "",
}, 
02: operationType (i32) = 0,
03: hasResultSet (bool) = false,
  },
}
I0115 13:47:43.509263 23443 impala-hs2-server.cc:887] CloseOperation(): 
query_id=dc497affd5678498:365a4600
I0115 13:47:43.509281 23443 impala-server.cc:1554] UnregisterQuery(): 
query_id=dc497affd5678498:365a4600
I0115 13:47:43.509642 23298 impala-server.cc:1586] Query successfully 
unregistered: query_id=dc497affd5678498:365a4600{noformat}
If the DDL is executed with "set enable_async_ddl_execution=false", the error 
is shown in the logs:
{noformat}
I0115 13:48:31.054708 23794 Frontend.java:2072] 
8a48ab2ae184395d:dd53cfc1] Analyzing query: INVALIDATE METADATA a.b db: 
default
I0115 13:48:31.054780 23794 Frontend.java:2084] 
8a48ab2ae184395d:dd53cfc1] The original executor group sets from 
executor membership snapshot: [TExecutorGroupSet(curr_num_executors:3, 
expected_num_executors:20, exec_group_name_prefix:)]
I0115 13:48:31.054841 23794 RequestPoolService.java:200] 
8a48ab2ae184395d:dd53cfc1] Default pool only, scheduler allocation is 
not specified.
I0115 13:48:31.054934 23794 Frontend.java:2104] 
8a48ab2ae184395d:dd53cfc1] A total of 1 executor group sets to be 
considered for auto-scaling: [TExecutorGroupSet(curr_num_executors:3, 
expected_num_executors:20, exec_group_name_prefix:, 
max_mem_limit:9223372036854775807, num_cores_per_executor:2147483647)]
I0115 13:48:31.055053 23794 Frontend.java:2138] 
8a48ab2ae184395d:dd53cfc1] Consider executor group set: 
TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, 
exec_group_name_prefix:, max_mem_limit:9223372036854775807, 
num_cores_per_executor:2147483647) with assumption of 0 cores per node.
I0115 13:48:31.055748 23794 AnalysisContext.java:508] 
8a48ab2ae184395d:dd53cfc1] Analysis took 0 ms
I0115 13:48:31.055960 23794 BaseAuthor

[jira] [Comment Edited] (IMPALA-12709) Hierarchical metastore event processing

2024-01-14 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806610#comment-17806610
 ] 

Maxwell Guo edited comment on IMPALA-12709 at 1/15/24 3:52 AM:
---

I may have a different point of view. Is it possible to divide the databases 
into buckets, keeping the original operation-time order within each bucket, 
and process the buckets in parallel? Each round, 1000 events are taken from 
HMS, divided into buckets, and processed in parallel. After all events of the 
batch are processed, the next batch is fetched.
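
A minimal Java sketch of this bucketing idea (hypothetical Event type and 
thread-pool size; not the actual catalog code): events are grouped by database 
while preserving their arrival order, the buckets are drained in parallel, and 
the next batch is only fetched after the whole round completes.
{code:java}
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of the bucketing idea; not the actual catalog code.
class BucketedEventProcessor {
  interface Event { String dbName(); void apply(); }

  private final ExecutorService pool = Executors.newFixedThreadPool(8);

  // Process one batch of up to 1000 HMS events: group them by database
  // (preserving per-database order), drain the buckets in parallel, and
  // return only when every bucket is done, so batches never overlap.
  void processBatch(List<Event> batch) throws InterruptedException {
    Map<String, List<Event>> buckets = new LinkedHashMap<>();
    for (Event e : batch) {
      buckets.computeIfAbsent(e.dbName(), k -> new ArrayList<>()).add(e);
    }
    List<Callable<Void>> tasks = new ArrayList<>();
    for (List<Event> bucket : buckets.values()) {
      tasks.add(() -> { for (Event e : bucket) e.apply(); return null; });
    }
    pool.invokeAll(tasks);  // blocks until all buckets finish
  }
}
{code}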


was (Author: maxwellguo):
I may have a different point of view. Is it possible to divide the databases 
into buckets, keeping the original operation-time order within each bucket, 
and process the buckets in parallel?

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification 
> events are processed sequentially, with a maximum of 1000 events fetched and 
> processed in a single batch. Multiple locks are used to address the 
> concurrency issues that may arise when catalog DDL operation processing and 
> metastore event processing try to access or update catalog objects 
> concurrently. Waiting for a lock, or loading a table's file metadata, can 
> slow event processing and delay the events that follow, even though those 
> events may not depend on the one being processed. Altogether it can take a 
> very long time to synchronize all the HMS events.
> *Proposal:*
> The existing metastore event processing can be turned into multi-level event 
> processing. The idea is to segregate events based on their dependencies, 
> maintain the order of events as they occur within each dependency group, and 
> process them independently as much as possible:
>  # All the events of a table are processed in the same order they actually 
> occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all the events relating to that database 
> (i.e., for all its tables) occurring after the ALTER DATABASE event are 
> processed only after the ALTER DATABASE event itself, ensuring the order.
> An initial design document is attached:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b
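
To make the per-table ordering concrete, a minimal Java sketch (hypothetical 
types and method names; a sketch of the design above, not the proposed 
implementation): each key owns a chain of futures, so one table's events 
replay in order while different tables' chains advance in parallel.
{code:java}
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical sketch of the hierarchical idea; not the proposed
// implementation. Each key ("db" or "db.table") owns a FIFO chain of
// futures, so events for one table run in order while different
// tables proceed in parallel on the shared pool.
class HierarchicalEventProcessor {
  private final ExecutorService pool = Executors.newFixedThreadPool(8);
  private final Map<String, CompletableFuture<Void>> tails =
      new ConcurrentHashMap<>();

  // Append an event to its key's chain; compute() is atomic per key.
  void submit(String key, Runnable event) {
    tails.compute(key, (k, tail) ->
        tail == null ? CompletableFuture.runAsync(event, pool)
                     : tail.thenRunAsync(event, pool));
  }

  // An ALTER DATABASE event would act as a barrier: chain it after the
  // tails of all of that database's tables, then route later events for
  // those tables behind the barrier (omitted for brevity).
}
{code}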






[jira] [Commented] (IMPALA-12709) Hierarchical metastore event processing

2024-01-14 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806610#comment-17806610
 ] 

Maxwell Guo commented on IMPALA-12709:
--

I may have a different point of view. Is it possible to divide the databases 
into buckets, keeping the original operation-time order within each bucket, 
and process the buckets in parallel?

> Hierarchical metastore event processing
> ---
>
> Key: IMPALA-12709
> URL: https://issues.apache.org/jira/browse/IMPALA-12709
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
> Attachments: Hierarchical metastore event processing.docx
>
>
> *Current Issue:*
> At present, the metastore event processor is single-threaded. Notification 
> events are processed sequentially, with a maximum of 1000 events fetched and 
> processed in a single batch. Multiple locks are used to address the 
> concurrency issues that may arise when catalog DDL operation processing and 
> metastore event processing try to access or update catalog objects 
> concurrently. Waiting for a lock, or loading a table's file metadata, can 
> slow event processing and delay the events that follow, even though those 
> events may not depend on the one being processed. Altogether it can take a 
> very long time to synchronize all the HMS events.
> *Proposal:*
> The existing metastore event processing can be turned into multi-level event 
> processing. The idea is to segregate events based on their dependencies, 
> maintain the order of events as they occur within each dependency group, and 
> process them independently as much as possible:
>  # All the events of a table are processed in the same order they actually 
> occurred.
>  # Events of different tables are processed in parallel.
>  # When a database is altered, all the events relating to that database 
> (i.e., for all its tables) occurring after the ALTER DATABASE event are 
> processed only after the ALTER DATABASE event itself, ensuring the order.
> An initial design document is attached:
> https://docs.google.com/document/d/1KZ-ANko-qn5CYmY13m4OVJXAYjLaS1VP-c64Pumipq8/edit?pli=1#heading=h.qyk8qz8ez37b






[jira] [Created] (IMPALA-12710) dockerized-impala-bootstrap-and-test.sh ignores existing value of IMPALA_TOOLCHAIN_HOST

2024-01-14 Thread Laszlo Gaal (Jira)
Laszlo Gaal created IMPALA-12710:


 Summary: dockerized-impala-bootstrap-and-test.sh ignores existing 
value of IMPALA_TOOLCHAIN_HOST
 Key: IMPALA-12710
 URL: https://issues.apache.org/jira/browse/IMPALA-12710
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.4.0
Reporter: Laszlo Gaal
Assignee: Laszlo Gaal


Impala build driver scripts allow changing the source location for toolchain 
downloads by supplying an initial value for IMPALA_TOOLCHAIN_HOST, which is 
evaluated in bin/impala-config.sh. 
bin/jenkins/dockerized-impala-bootstrap-and-test.sh uses a slightly different, 
two-phase mechanism to initialize the build environment; that mechanism does 
not preserve the initial value of this environment variable, so the override 
is ineffective.






[jira] [Commented] (IMPALA-12665) Adjust complete_micro_batch_ length to new scratch_batch_->capacity after ScratchTupleBatch::Reset

2024-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806463#comment-17806463
 ] 

ASF subversion and git services commented on IMPALA-12665:
--

Commit 6ddd69c605d4c594e33fdd39a2ca888538b4b8d7 in impala's branch 
refs/heads/master from Zinway Liu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6ddd69c60 ]

IMPALA-12665: Adjust complete_micro_batch_ length to new 
scratch_batch_->capacity after ScratchTupleBatch::Reset

**IMPALA-12665 Description:**
The issue occurs when scanning Parquet tables with a row size
greater than 4096 bytes and a row batch size greater than 1024. A
heap-buffer-overflow was detected by AddressSanitizer, indicating a
write operation beyond the allocated buffer space.

**Root Cause Analysis:**
The AddressSanitizer error log points to a heap-buffer-overflow,
where memory is accessed beyond the allocated region. This occurs
in the `HdfsParquetScanner` and `ScratchTupleBatch` classes when
handling rows larger than 4096 bytes.

**Fault Reproduction:**
The issue can be reproduced by creating a Parquet table with many
columns, inserting data using Hive, then querying with Impala.
Bash and Hive client scripts in IMPALA-12665 create a table and
populate it, triggering the bug.

**Technical Analysis:**
`ScratchTupleBatch::Reset` recalculates `capacity` based on tuple
size and a fixed memory limit. When the row size exceeds 4096 bytes,
`capacity` is set below 1024. `HdfsParquetScanner` incorrectly
assumes a `complete_micro_batch_` length of 1024, leading to the
overflow.

**Proposed Solution:**
Ensure the `complete_micro_batch_` length is updated after
`ScratchTupleBatch::Reset`. This prevents writes outside the
allocated buffer, avoiding the heap-buffer-overflow.

Change-Id: I966ff10ba734ed8b1b61325486de0dfcc7b58e4d
Reviewed-on: http://gerrit.cloudera.org:8080/20834
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
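
To make the overflow condition concrete, here is a minimal Java sketch of the 
arithmetic (the names and constants are assumptions for illustration; the 
actual code is C++ in ScratchTupleBatch / HdfsParquetScanner):
{code:java}
// Hypothetical illustration of the mismatch; the real code is C++ in
// ScratchTupleBatch / HdfsParquetScanner, and these names/constants
// are assumptions for the sketch.
class MicroBatchSketch {
  static final int DEFAULT_BATCH_SIZE = 1024;      // assumed batch size
  static final long SCRATCH_MEM_LIMIT = 4L << 20;  // assumed fixed limit (4 MiB)

  public static void main(String[] args) {
    int tupleSize = 5000;  // a row wider than 4096 bytes
    // Reset() recomputes capacity from the fixed memory limit, so wide
    // rows yield a capacity below the default batch size:
    int capacity =
        (int) Math.min(DEFAULT_BATCH_SIZE, SCRATCH_MEM_LIMIT / tupleSize);
    // Bug: the scanner kept using a micro-batch length of 1024, writing
    // 1024 * 5000 bytes into a ~4 MiB buffer. Fix: clamp the length to
    // the recomputed capacity after every Reset().
    int microBatchLen = Math.min(DEFAULT_BATCH_SIZE, capacity);
    System.out.printf("capacity=%d, microBatchLen=%d%n", capacity, microBatchLen);
  }
}
{code}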


> Adjust complete_micro_batch_ length to new scratch_batch_->capacity after 
> ScratchTupleBatch::Reset
> --
>
> Key: IMPALA-12665
> URL: https://issues.apache.org/jira/browse/IMPALA-12665
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.3.0
>Reporter: Zinway
>Assignee: Zinway
>Priority: Major
>
> {panel}
> *Happens when scanning a Parquet table where row_size > 4096 bytes and the 
> row batch size > 1024.*
> {panel}
> h3. Log with AddressSanitizer
>  
> {code:java}
> ==557405==ERROR: AddressSanitizer: heap-buffer-overflow on address 
> 0x7fa162333408 at pc 0x0413a68c bp 0x7fa162f2fc10 sp 0x7fa162f2fc08
> WRITE of size 4 at 0x7fa162333408 thread T559
> #0  0x413a68b  (/usr/lib/impala/sbin/impalad+0x413a68b)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/parquet-common.h:570
> #1  0x419b76f  (/usr/lib/impala/sbin/impalad+0x419b76f)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/parquet-common.h:616
> #2  0x4199769  (/usr/lib/impala/sbin/impalad+0x4199769)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/parquet-column-readers.cc:864
> #3  0x4195e74  (/usr/lib/impala/sbin/impalad+0x4195e74)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/parquet-column-readers.cc:663
> #4  0x419f719  (/usr/lib/impala/sbin/impalad+0x419f719)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/parquet-column-readers.cc:496
> #5  0x38876d4  (/usr/lib/impala/sbin/impalad+0x38876d4)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:?
> #6  0x388ef4f  (/usr/lib/impala/sbin/impalad+0x388ef4f)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:2370
> #7  0x386db0d  (/usr/lib/impala/sbin/impalad+0x386db0d)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:532
> #8  0x386b7d1  (/usr/lib/impala/sbin/impalad+0x386b7d1)# addr2line => 
> apache-impala-4.3.0/be/src/exec/parquet/hdfs-parquet-scanner.cc:416
> #9  0x3742adf  (/usr/lib/impala/sbin/impalad+0x3742adf)# addr2line => 
> apache-impala-4.3.0/be/src/exec/hdfs-scan-node.cc:495
> #10 0x37418b8  (/usr/lib/impala/sbin/impalad+0x37418b8)# addr2line => 
> apache-impala-4.3.0/be/src/exec/hdfs-scan-node.cc:413
> #11 0x28720f6  (/usr/lib/impala/sbin/impalad+0x28720f6)
> #12 0x33db1ef  (/usr/lib/impala/sbin/impalad+0x33db1ef)
> #13 0x33e74f8  (/usr/lib/impala/sbin/impalad+0x33e74f8)
> #14 0x33e734b  (/usr/lib/impala/sbin/impalad+0x33e734b)
> #15 0x4b016f6  (/usr/lib/impala/sbin/impalad+0x4b016f6)
> #16 0x7fa5a4d1cdd4  (/lib64/libpthread.so.0+0x7dd4)
> #17 0x7fa5a1d0102c  (/lib64/libc.so.6+0xfe02c)
> 0x7fa162333408 is located 8 bytes to the right of 4193280-byte region 
> [0

[jira] [Commented] (IMPALA-12706) Failing DCHECK when querying STRUCT inside a STRUCT for Iceberg metadata table

2024-01-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806462#comment-17806462
 ] 

ASF subversion and git services commented on IMPALA-12706:
--

Commit 74617537b5c805327349ef0ac5c79b84dc1e786d in impala's branch 
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=74617537b ]

IMPALA-12706: Fix nested struct querying for Iceberg metadata tables

This commit fixes a DCHECK failure when querying a struct inside a
struct. The previous field accessor creation logic tried to find
the ColumnDescriptor for a struct nested inside a struct and hit a
DCHECK, because there are no ColumnDescriptors for struct fields. The
logic has been reworked to use ColumnDescriptors only for top-level
columns.

Testing:
 - Added E2E test to cover this case

Change-Id: Iadd029a4edc500bd8d8fca3f958903c2dbe09e8e
Reviewed-on: http://gerrit.cloudera.org:8080/20883
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
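
A minimal Java sketch of the reworked lookup described above (hypothetical 
types; not the actual Impala code): only the first element of a slot's column 
path resolves to a top-level ColumnDescriptor, and the remaining elements 
descend through struct fields by position.
{code:java}
import java.util.List;

// Hypothetical sketch of the reworked lookup; not the actual Impala
// code. Only colPath[0] maps to a top-level column descriptor; deeper
// path elements are struct fields resolved positionally, so no
// per-field descriptor lookup (the source of the DCHECK) is needed.
class FieldAccessorSketch {
  interface Accessor { Accessor childField(int idx); }
  interface ColumnDesc { Accessor accessor(); }

  static Accessor resolve(List<Integer> colPath, List<ColumnDesc> topLevelCols) {
    Accessor acc = topLevelCols.get(colPath.get(0)).accessor();
    for (int i = 1; i < colPath.size(); i++) {
      acc = acc.childField(colPath.get(i));  // descend into nested structs
    }
    return acc;
  }
}
{code}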


> Failing DCHECK when querying STRUCT inside a STRUCT for Iceberg metadata table
> --
>
> Key: IMPALA-12706
> URL: https://issues.apache.org/jira/browse/IMPALA-12706
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Tamas Mate
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
>
> When querying a STRUCT type nested inside another STRUCT type, a DCHECK fails.
> {code:none}
> F0111 09:01:35.626691 15777 descriptors.h:366] 
> 83474e353d7baccd:d966f47c] Check failed: slot_desc->col_path().size() 
> == 1 (2 vs. 1)
> {code}
> While the following query works:
> {code:none}
> select readable_metrics from 
> functional_parquet.iceberg_query_metadata.data_files;
> {code}
> this fails:
> {code:none}
> select readable_metrics.i from 
> functional_parquet.iceberg_query_metadata.data_files;
> {code}


