[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2024-11-08 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-12554.
--
 Fix Version/s: Impala 4.5.0
Target Version: Impala 4.5.0
Resolution: Fixed

> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Currently, Impala creates a separate Ranger policy for each column specified in 
> a GRANT statement. For instance, after the following query, 3 Ranger policies 
> are created on the Ranger server. This can produce a large number of policies 
> when many columns are specified, and Impala's Ranger plug-in may then take a 
> long time to download the policies from the Ranger server. It would be better 
> if Impala created only a single policy for all columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.67s
> {code}
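One possible consolidation can be sketched as follows. This is a minimal illustration, not Impala's actual internal representation: it assumes grant requests arrive as (database, table, column) triples and groups them so a single Ranger policy is created per table instead of one per column.

```python
from collections import defaultdict

def group_grants(grants):
    """Group per-column grant requests into one policy request per
    (db, table) pair, collecting all columns into a single list."""
    policies = defaultdict(list)
    for db, table, column in grants:
        policies[(db, table)].append(column)
    return dict(policies)

# The GRANT above touches three columns of functional.alltypes; grouping
# them yields a single policy covering all three columns.
grants = [("functional", "alltypes", c)
          for c in ("id", "bool_col", "tinyint_col")]
print(group_grants(grants))
# {('functional', 'alltypes'): ['id', 'bool_col', 'tinyint_col']}
```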



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13482) Calcite planner: Bug fixes for an analytics.test

2024-11-08 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13482.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Calcite planner: Bug fixes for an analytics.test
> 
>
> Key: IMPALA-13482
> URL: https://issues.apache.org/jira/browse/IMPALA-13482
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Specifically, 
> select lag(coalesce(505, 1 + NULL), 1) over (order by int_col desc)
> from alltypestiny
> had a couple of issues that needed fixing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13539) TestCalcitePlanner.test_calcite_frontend fails on non-HDFS test jobs

2024-11-08 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13539:
--

 Summary: TestCalcitePlanner.test_calcite_frontend fails on 
non-HDFS test jobs
 Key: IMPALA-13539
 URL: https://issues.apache.org/jira/browse/IMPALA-13539
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


On S3 and other non-HDFS jobs, TestCalcitePlanner.test_calcite_frontend fails 
with this:
{noformat}
custom_cluster/test_calcite_planner.py:40: in test_calcite_frontend
    self.run_test_case('QueryTest/calcite', vector, use_db=unique_database)
common/impala_test_suite.py:849: in run_test_case
    self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:656: in __verify_results_and_errors
    replace_filenames_with_placeholder)
common/test_result_verifier.py:520: in verify_raw_results
    VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:290: in verify_query_result_is_subset
    unicode(expected_row), unicode(actual_results))
E   AssertionError: Could not find expected row row_regex:.*00:SCAN HDFS.* in 
actual rows:
E   '   S3 partitions=4/4 files=4 size=460B'
E   '   row-size=89B cardinality=8'
E   '00:SCAN S3 [functional.alltypestiny]'
E   '01:EXCHANGE [UNPARTITIONED]'
E   'PLAN-ROOT SINK'
E   '|'
E   '|'{noformat}
The test looks for "SCAN HDFS", but non-HDFS filesystems produce a different 
scan-node label. The expected regex should be made filesystem-agnostic.
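A filesystem-agnostic expectation can be sketched as below; the pattern is illustrative, not necessarily the exact regex the .test file should adopt.

```python
import re

# Match the scan node regardless of the storage layer (HDFS, S3,
# OZONE, ...) instead of hard-coding "SCAN HDFS".
SCAN_RE = re.compile(r"\d+:SCAN \w+ \[[\w.]+\]")

# Both plan lines from the failure above now match.
assert SCAN_RE.search("00:SCAN HDFS [functional.alltypestiny]")
assert SCAN_RE.search("00:SCAN S3 [functional.alltypestiny]")
# Non-scan plan lines still do not match.
assert SCAN_RE.search("01:EXCHANGE [UNPARTITIONED]") is None
```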



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13538) Support query plan for imported query profiles

2024-11-08 Thread Surya Hebbar (Jira)
Surya Hebbar created IMPALA-13538:
-

 Summary: Support query plan for imported query profiles
 Key: IMPALA-13538
 URL: https://issues.apache.org/jira/browse/IMPALA-13538
 Project: IMPALA
  Issue Type: Improvement
Reporter: Surya Hebbar
Assignee: Surya Hebbar


Currently, only text plan rendering is supported for imported query profiles. 
It would be useful to also support the enhanced SVG rendering of the query plan 
for imported profiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13537) TestPartitionDeletion.test_local_catalog_with_event_processing fails in some builds

2024-11-08 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-13537:
--

 Summary: 
TestPartitionDeletion.test_local_catalog_with_event_processing fails in some 
builds
 Key: IMPALA-13537
 URL: https://issues.apache.org/jira/browse/IMPALA-13537
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Quanlong Huang


test_partition.TestPartitionDeletion.test_local_catalog_with_event_processing 
fails in some of our builds:

{code:java}
custom_cluster/test_partition.py:127: in 
test_local_catalog_with_event_processing
self._test_partition_deletion(unique_database)
custom_cluster/test_partition.py:162: in _test_partition_deletion
self.assert_catalogd_log_contains("INFO", deletion_log_regex.format(tbl, i))
common/impala_test_suite.py:1341: in assert_catalogd_log_contains
daemon, level, line_regex, expected_count, timeout_s, dry_run)
common/impala_test_suite.py:1380: in assert_log_contains
(expected_count, log_file_path, line_regex, found, line)
E   AssertionError: Expected 1 lines in file 
/data0/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.impala-ec2-centos79-m6i-4xlarge-xldisk-1323.vpc.cloudera.com.jenkins.log.INFO.20241103-192936.15236
 matching regex 'Collected . partition 
deletion.*HDFS_PARTITION:test_local_catalog_with_event_processing_4fbd8416.part_tbl:.*p=3',
 but found 0 lines. Last line was: 
E   I1103 19:29:55.873072 16894 TableLoader.java:177] Loaded metadata for: 
test_local_catalog_with_event_processing_4fbd8416.part_tbl (68ms)
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13536) Tests failing in TestWorkloadManagementInitWait

2024-11-08 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-13536:
--

 Summary: Tests failing in TestWorkloadManagementInitWait
 Key: IMPALA-13536
 URL: https://issues.apache.org/jira/browse/IMPALA-13536
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Michael Smith


Some of our tests failed in the test class 
test_workload_mgmt_init.py::TestWorkloadManagementInitWait.

test_upgrade_1_0_0_to_1_1_0():
{code:java}
custom_cluster/test_workload_mgmt_init.py:222: in test_upgrade_1_0_0_to_1_1_0
self.check_schema_version("1.1.0")
custom_cluster/test_workload_mgmt_init.py:129: in check_schema_version
self.assert_table_prop(tbl_name, "wm_schema_version", schema_version)
custom_cluster/test_workload_mgmt_init.py:104: in assert_table_prop
assert found, "did not find expected table prop '{}' with value '{}' on 
table " \
E   AssertionError: did not find expected table prop 'wm_schema_version' with 
value '1.1.0' on table 'sys.impala_query_log'
E   assert False
{code}

test_invalid_wm_schema_version_live_table_prop():

{code:java}
custom_cluster/test_workload_mgmt_init.py:375: in 
test_invalid_wm_schema_version_live_table_prop
self._run_invalid_table_prop_test(self.QUERY_TBL_LIVE, "wm_schema_version")
custom_cluster/test_workload_mgmt_init.py:325: in _run_invalid_table_prop_test
"found on the '{}' property of table '{}'".format(prop_name, table))
common/impala_test_suite.py:1351: in assert_catalogd_log_contains
daemon, level, line_regex, expected_count, timeout_s, dry_run)
common/impala_test_suite.py:1397: in assert_log_contains
(expected_count, log_file_path, line_regex, found, line)
E   AssertionError: Expected 1 lines in file 
/data0/jenkins/workspace/impala-asf-master-core-ozone-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.impala-ec2-centos79-m6i-4xlarge-xldisk-126a.vpc.cloudera.com.jenkins.log.FATAL.20241107-042724.11427
 matching regex 'could not parse version string '' found on the 
'wm_schema_version' property of table 'sys.impala_query_live'', but found 0 
lines. Last line was: 
E   . Impalad exiting.
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13535) Add script to restore stats on PlannerTest files

2024-11-07 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13535:
-

 Summary: Add script to restore stats on PlannerTest files
 Key: IMPALA-13535
 URL: https://issues.apache.org/jira/browse/IMPALA-13535
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend, Test
Reporter: Riza Suminto
Assignee: Riza Suminto


We have several PlannerTests that validate against the EXTENDED profile and 
verify cardinality. At the EXTENDED level, the profile displays stored table 
stats from HMS, such as 'numRows' and 'totalSize', which can vary between data 
loads. These values are not validated by PlannerTest and will not fail the 
test, but frequent changes to these lines are mostly noise and disturb the code 
review process.

We need a script to ease restoring the stored table stats information in those 
.test files.
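One possible approach, sketched below: normalize the volatile stats tokens in a .test file with a regex substitution so diffs stay quiet across data loads. The token names are taken from the description above; the placeholder convention is illustrative.

```python
import re

# 'numRows' and 'totalSize' come from HMS and vary between data loads
# (numRows can be -1 when stats are missing).
STATS_RE = re.compile(r"(numRows=|totalSize=)-?\d+")

def normalize_stats(text, placeholder="?"):
    """Replace stored-stats values with a stable placeholder."""
    return STATS_RE.sub(lambda m: m.group(1) + placeholder, text)

line = "   table: numRows=7300 totalSize=478045"
print(normalize_stats(line))  # "   table: numRows=? totalSize=?"
```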



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13534) Support CTE in distributed plan

2024-11-07 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13534:
--

 Summary: Support CTE in distributed plan
 Key: IMPALA-13534
 URL: https://issues.apache.org/jira/browse/IMPALA-13534
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend, Frontend
Reporter: Michael Smith
Assignee: Michael Smith


Add planner support for generating CTE producer fragments, removing the 
sequence node from the distributed plan. The CTE buffer must remain active 
until all consumers have finished reading.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13533) Execute single-node CTE plans

2024-11-07 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13533:
--

 Summary: Execute single-node CTE plans
 Key: IMPALA-13533
 URL: https://issues.apache.org/jira/browse/IMPALA-13533
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Michael Smith
Assignee: Michael Smith


Add backend support for CTE plans.

* Add PlanNode/ExecNode for CTE producer, consumer, and sequence nodes.
* Implement backend node-local dataflow from CTE producers to consumers. Use 
BufferedTupleStream to buffer CTE producer results until each consumer is ready 
to process them; initially this will be a batch operation, where the CTE 
producer puts all results into the BufferedTupleStream, after which consumers 
can start reading them.
* Should be able to execute all TPC-DS queries.
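The batch dataflow described in the second bullet can be sketched as follows. This is a toy stand-in for BufferedTupleStream under the stated initial restriction: the producer finishes writing before any consumer reads.

```python
class CteBuffer:
    """Batch sketch: the producer writes all row batches first, then any
    number of consumers independently read the full result."""

    def __init__(self):
        self._batches = []
        self._closed = False

    def add_batch(self, batch):
        assert not self._closed, "producer already finished"
        self._batches.append(batch)

    def finish(self):
        self._closed = True

    def read(self):
        # In this initial batch mode, consumers may only start after the
        # producer has written everything.
        assert self._closed, "consumers start after the producer finishes"
        return list(self._batches)

buf = CteBuffer()
buf.add_batch([1, 2])
buf.add_batch([3])
buf.finish()
# Two independent consumers each see the complete result.
assert buf.read() == buf.read() == [[1, 2], [3]]
```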



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13532) Add concurrent unpinned reader support to BufferedTupleStream

2024-11-07 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13532:
--

 Summary: Add concurrent unpinned reader support to 
BufferedTupleStream
 Key: IMPALA-13532
 URL: https://issues.apache.org/jira/browse/IMPALA-13532
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Michael Smith


Modify BufferedTupleStream to support multiple concurrent unpinned readers. 
Currently it requires the buffer to be pinned to support concurrent readers.

This primarily involves additional locking around pinning/unpinning to ensure 
concurrent readers don't cause conflicts.
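The locking idea can be sketched in miniature: a shared lock serializes pin-count transitions so concurrent unpinned readers do not race. This is an illustration of the synchronization pattern only, not BufferedTupleStream's real page management.

```python
import threading

class UnpinnedStream:
    """Each reader pins a page before use and unpins it afterwards; a
    lock guards the pin counts so concurrent readers cannot conflict."""

    def __init__(self, pages):
        self._pages = pages
        self._pin_counts = [0] * len(pages)
        self._lock = threading.Lock()

    def read_page(self, idx):
        with self._lock:              # serialize the pin transition
            self._pin_counts[idx] += 1
        try:
            return self._pages[idx]
        finally:
            with self._lock:          # and the matching unpin
                self._pin_counts[idx] -= 1

stream = UnpinnedStream([b"page0", b"page1"])
results = []
threads = [threading.Thread(target=lambda i=i: results.append(stream.read_page(i % 2)))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert sorted(results) == [b"page0", b"page0", b"page1", b"page1"]
assert stream._pin_counts == [0, 0]  # every pin was matched by an unpin
```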



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13531) Implement CTE identification in Impala Calcite planner

2024-11-07 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13531:
--

 Summary: Implement CTE identification in Impala Calcite planner
 Key: IMPALA-13531
 URL: https://issues.apache.org/jira/browse/IMPALA-13531
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Reporter: Michael Smith


Port the HIVE-28259 CTE identification to the Impala Calcite planner. That 
should be sufficient to produce CTE PlanNodes in a single-node plan, which we 
can verify by reviewing the plan produced. The plan is not expected to be 
executable.

Add query options to enable/disable CTEs.
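The core of CTE identification can be sketched as fingerprinting plan subtrees and flagging repeats; repeated subtrees are candidates to become CTE producers. This toy version (nodes are `(op, children...)` tuples) only illustrates the idea, not HIVE-28259's actual algorithm.

```python
def find_repeated_subtrees(root):
    """Return structural fingerprints of non-leaf subtrees that occur
    more than once in the plan tree."""
    counts = {}

    def fingerprint(node):
        op, *children = node
        key = (op, tuple(fingerprint(c) for c in children))
        counts[key] = counts.get(key, 0) + 1
        return key

    fingerprint(root)
    return {k for k, n in counts.items() if n > 1 and k[1]}

# The same filter+scan subtree feeds both join inputs, so it is
# identified as a CTE candidate.
scan = ("filter", ("scan",))
plan = ("join", ("agg", scan), ("agg", scan))
repeated = find_repeated_subtrees(plan)
assert ("filter", (("scan", ()),)) in repeated
```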



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13530) Calcite planner: support decimal_v1 query option

2024-11-07 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13530:
-

 Summary: Calcite planner: support decimal_v1 query option
 Key: IMPALA-13530
 URL: https://issues.apache.org/jira/browse/IMPALA-13530
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13529) Calcite planner: need to support query option appx_count_distinct

2024-11-07 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13529:
-

 Summary: Calcite planner: need to support query option 
appx_count_distinct
 Key: IMPALA-13529
 URL: https://issues.apache.org/jira/browse/IMPALA-13529
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13528) Calcite planner: run unsupported query options through original planner

2024-11-07 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13528:
-

 Summary: Calcite planner: run unsupported query options through 
original planner
 Key: IMPALA-13528
 URL: https://issues.apache.org/jira/browse/IMPALA-13528
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13494) Calcite planner: group_concat failing with distinct

2024-11-07 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13494.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Calcite planner: group_concat failing with distinct
> ---
>
> Key: IMPALA-13494
> URL: https://issues.apache.org/jira/browse/IMPALA-13494
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The following query is failing in distinct.test
>  
>     select sum(len_orderkey), sum(len_comment)
>     from (
>       select
>         length(group_concat(distinct cast(l_orderkey as string))) 
> len_orderkey,
>         length(group_concat(distinct(l_comment))) len_comment
>         from tpch.lineitem
>         group by l_comment
>       ) v



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13510) Unset the environment variable for tuple cache tests

2024-11-07 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13510.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Unset the environment variable for tuple cache tests
> 
>
> Key: IMPALA-13510
> URL: https://issues.apache.org/jira/browse/IMPALA-13510
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yida Wu
>Assignee: Yida Wu
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The test_cache_disabled test case fails in the tuple cache build because the 
> build enables the tuple cache via environment variables, while the test case 
> requires the tuple cache to remain disabled. Unsetting the related 
> environment variables is a way to resolve the failure.
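The fix can be sketched as a small helper that drops the relevant variables before starting the test cluster. The variable names below are illustrative placeholders, not the actual names the build uses.

```python
import os

def clear_tuple_cache_env(env=os.environ):
    """Remove tuple-cache-related variables (hypothetical names) so the
    cluster under test starts with the cache disabled."""
    for var in ("TUPLE_CACHE_DIR", "TUPLE_CACHE_CAPACITY"):
        env.pop(var, None)   # pop with default: no error if unset

env = {"TUPLE_CACHE_DIR": "/tmp/cache", "PATH": "/usr/bin"}
clear_tuple_cache_env(env)
assert env == {"PATH": "/usr/bin"}
```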



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13512) Print .test file name if PlannerTest fail

2024-11-07 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13512.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Print .test file name if PlannerTest fail
> -
>
> Key: IMPALA-13512
> URL: https://issues.apache.org/jira/browse/IMPALA-13512
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Test
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Minor
> Fix For: Impala 4.5.0
>
>
> If a PlannerTest fails, the error message hints at which test case failed by 
> printing the section and line number, like this:
> {code:java}
> Error Message
> Section PLAN of query at line 239: {code}
> This can be improved by also printing the path to the .test file that failed, 
> like this:
> {code:java}
> Error Message
> Section PLAN of query at 
> functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:239:
>  {code}
> PlannerTest should also skip printing the VERBOSE plan if 
> PlannerTestOption.EXTENDED_EXPLAIN is specified, since the EXTENDED level 
> already contains sufficient detail, including tuples, sizes, and cardinality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13505) NullPointerException in Analyzer.resolveActualPath with Calcite planner

2024-11-07 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13505.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> NullPointerException in Analyzer.resolveActualPath with Calcite planner
> ---
>
> Key: IMPALA-13505
> URL: https://issues.apache.org/jira/browse/IMPALA-13505
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Michael Smith
>Assignee: Jason Fehr
>Priority: Major
>  Labels: calcite
> Fix For: Impala 4.5.0
>
>
> Encountered a NullPointerException when running some TPC-DS queries (such as 
> q8) with the Calcite planner:
> {code:java}
> Stack Trace:java.lang.NullPointerException
> at 
> org.apache.impala.analysis.Analyzer.lambda$resolveActualPath$18(Analyzer.java:4699)
> at java.util.Collections$2.tryAdvance(Collections.java:4719)
> at java.util.Collections$2.forEachRemaining(Collections.java:4727)
> at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at 
> org.apache.impala.analysis.Analyzer.resolveActualPath(Analyzer.java:4690)
> at 
> org.apache.impala.analysis.Analyzer.lambda$addColumnsTo$17(Analyzer.java:4655)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
> at 
> org.apache.impala.analysis.Analyzer.addColumnsTo(Analyzer.java:4655)
> at 
> org.apache.impala.analysis.Analyzer.addJoinColumns(Analyzer.java:4732)
> at org.apache.impala.planner.JoinNode.init(JoinNode.java:293)
> at org.apache.impala.planner.HashJoinNode.init(HashJoinNode.java:82)
> at 
> org.apache.impala.calcite.rel.phys.ImpalaHashJoinNode.(ImpalaHashJoinNode.java:46)
> ...
> {code}
> {{SlotRef.getResolvedPath}} returns null at line 4699. Looking at the 
> SlotRef, I don't see any way to determine an origin, so this may be part of 
> the incomplete implementation of the Calcite planner integration.
> To reproduce:
> {code:java}
> $ start-impala-cluster.py -s 1 --use_calcite_planner=true
> $ impala-py.test 
> tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8
> {code}
> *Analysis:*
> The root issue with this particular query is that it contains a very lengthy 
> [list of zip 
> codes|https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test#L12-L62]
>  that are used in a where clause. The Calcite planner is producing this join 
> node for that where clause:
> {noformat}
> 05:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: substring(tpcds.customer_address.ca_zip, 1, 5) = EXPR$0
> |  fk/pk conjuncts: assumed fk/pk
> |  runtime filters: RF002[bloom] <- EXPR$0
> {noformat}
> Since EXPR$0 is not a named column, it has a null resolved path and can be 
> skipped.
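The skip described in the analysis can be sketched as a null filter applied before the columns are collected. The dictionary shape is illustrative; the real code works on SlotRef objects and resolved Path instances.

```python
def join_column_paths(slot_refs):
    """Collect resolved paths for join columns, ignoring slots whose
    resolved path is None (e.g. Calcite-generated EXPR$0) instead of
    dereferencing them and hitting a NullPointerException."""
    return [s["path"] for s in slot_refs if s.get("path") is not None]

slots = [{"name": "ca_zip", "path": "customer_address.ca_zip"},
         {"name": "EXPR$0", "path": None}]  # unnamed expression, no path
assert join_column_paths(slots) == ["customer_address.ca_zip"]
```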



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13527) Duplicate S3 Paths for Workload Management Iceberg Table

2024-11-07 Thread Jason Fehr (Jira)
Jason Fehr created IMPALA-13527:
---

 Summary: Duplicate S3 Paths for Workload Management Iceberg Table
 Key: IMPALA-13527
 URL: https://issues.apache.org/jira/browse/IMPALA-13527
 Project: IMPALA
  Issue Type: Bug
Reporter: Jason Fehr
Assignee: Jason Fehr


Error writing completed queries to table sys.impala_query_log: 
{noformat}
User: impala
State: EXCEPTION

Status: IcebergTableLoadingException: Error loading metadata for Iceberg table 
s3a://datalake-name/warehouse/tablespace/external/hive/sys.db/impala_query_log

CAUSED BY: TableLoadingException: Failed to load metadata for table: 
sys.impala_query_log

CAUSED BY: IllegalArgumentException: Multiple entries with same key: 
data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq=FileDescriptor{RelativePath=data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq,
 Length=4967583, Compression=NONE, ModificationTime=1, Blocks=} and 
data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq=FileDescriptor{RelativePath=data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq,
 Length=4967583, Compression=NONE, ModificationTime=1, Blocks=}.

To index multiple values under a key, use Multimaps.index.
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13463) Impala should ignore case of Iceberg schema elements

2024-11-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-13463.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Impala should ignore case of Iceberg schema elements
> 
>
> Key: IMPALA-13463
> URL: https://issues.apache.org/jira/browse/IMPALA-13463
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> Schema is case insensitive in Impala.
> Via Spark it's possible to create schema elements with upper/lower case 
> letters and store them in the metadata JSON files of Iceberg, e.g.:
> {noformat}
>"schemas" : [ {
>  "type" : "struct",
>  "schema-id" : 0,
>  "fields" : [ {
>"id" : 1,
>"name" : "ID",
>"required" : false,
>"type" : "string"
>  }, {
>"id" : 2,
>"name" : "OWNERID",
>"required" : false,
>"type" : "string"
>  } ]
>} ],
> {noformat}
> This can cause problems in Impala during predicate pushdown: we can get a 
> ValidationException from the Iceberg library, because Impala pushes down 
> predicates with lower-case column names while Iceberg sees the upper-case 
> names.
> We should invoke Scan.caseSensitive(boolean caseSensitive) on the TableScan 
> object to make the scan case insensitive.
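The resolution behavior being requested can be illustrated with a small sketch: match the pushed-down lower-case predicate column against the stored field names ignoring case, which is what asking Iceberg for a case-insensitive scan achieves. This is an illustration of the matching rule, not Iceberg's implementation.

```python
def resolve_column(schema_fields, name):
    """Case-insensitive column resolution: return the stored field name
    that matches `name` ignoring case, or raise if none matches."""
    for field in schema_fields:
        if field.lower() == name.lower():
            return field
    raise KeyError(name)

# Fields as stored by Spark in the metadata JSON above.
fields = ["ID", "OWNERID"]
# Impala pushes down lower-case names; resolution still succeeds.
assert resolve_column(fields, "ownerid") == "OWNERID"
assert resolve_column(fields, "id") == "ID"
```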



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13526) Inconsistent Agg node stats recomputation.

2024-11-06 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13526:
-

 Summary: Inconsistent Agg node stats recomputation.
 Key: IMPALA-13526
 URL: https://issues.apache.org/jira/browse/IMPALA-13526
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto


Within DistributedPlanner.java, there are several places where the planner 
needs to insert an extra merge aggregation node. This requires transferring 
HAVING conjuncts from the preaggregation node to the merge aggregation node, 
unsetting the limit, and recomputing the stats of the preaggregation node. 
However, the stats recompute is not done consistently, and inefficient 
recomputes may happen.

Example of an inefficient recompute:
https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L1074-L1077

Example of a missing recompute for phase2AggNode:
https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L1143-L1168



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13525) Calcite planner: handle escaped characters in string literal

2024-11-06 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13525:
-

 Summary: Calcite planner: handle escaped characters in string 
literal
 Key: IMPALA-13525
 URL: https://issues.apache.org/jira/browse/IMPALA-13525
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13193) RuntimeFilter on parquet dictionary should evaluate null values

2024-11-06 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-13193.
-
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

Resolving this. Thank [~tangzhi]!

> RuntimeFilter on parquet dictionary should evaluate null values
> ---
>
> Key: IMPALA-13193
> URL: https://issues.apache.org/jira/browse/IMPALA-13193
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, 
> Impala 4.3.0, Impala 4.4.0
>Reporter: Quanlong Huang
>Assignee: Zhi Tang
>Priority: Critical
>  Labels: correctness
> Fix For: Impala 4.5.0
>
>
> IMPALA-10910 and IMPALA-5509 introduced an optimization to evaluate runtime 
> filters on parquet dictionary values. If none of the values passes the check, 
> the whole row group is skipped. However, NULL values are not included in the 
> parquet dictionary, so runtime filters that accept NULL values might 
> incorrectly reject the row group when none of the dictionary values passes 
> the check.
> Here are steps to reproduce the bug:
> {code:sql}
> create table parq_tbl (id bigint, name string) stored as parquet;
> insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
> create table dim_tbl (name string);
> insert into dim_tbl values (NULL);
> select * from parq_tbl p join dim_tbl d
>   on COALESCE(p.name, '') = COALESCE(d.name, '');{code}
> The SELECT query should return 2 rows but now it returns 0 rows.
> A workaround is to disable this optimization:
> {code:sql}
> set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code}
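The corrected skip condition can be sketched as follows. This models the logic described above in plain Python, not Impala's backend code: a row group is safe to skip only if no dictionary value passes the filter and NULLs cannot pass either, since NULLs never appear in the parquet dictionary.

```python
def may_skip_row_group(dictionary, column_has_nulls, filter_accepts):
    """Decide whether dictionary-based runtime filtering may skip a row
    group without losing rows."""
    if any(filter_accepts(v) for v in dictionary):
        return False  # some stored value matches
    if column_has_nulls and filter_accepts(None):
        return False  # NULL rows match; the buggy version skipped here
    return True

# Filter from the repro: COALESCE(name, '') = '' accepts NULL but no
# dictionary value, so a row group containing NULLs must be kept.
accepts = lambda v: (v or "") == ""
assert may_skip_row_group(["abc"], column_has_nulls=True,
                          filter_accepts=accepts) is False
assert may_skip_row_group(["abc"], column_has_nulls=False,
                          filter_accepts=accepts) is True
```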



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13524) Calcite planner: support for functions in exprs.test

2024-11-06 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13524:
-

 Summary: Calcite planner: support for functions in exprs.test
 Key: IMPALA-13524
 URL: https://issues.apache.org/jira/browse/IMPALA-13524
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13523) Calcite planner: Derive default decimal precision for functions

2024-11-06 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13523:
-

 Summary: Calcite planner: Derive default decimal precision for 
functions
 Key: IMPALA-13523
 URL: https://issues.apache.org/jira/browse/IMPALA-13523
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13522) Calcite planner: "real" type should be treated as double

2024-11-06 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13522:
-

 Summary: Calcite planner: "real" type should be treated as double
 Key: IMPALA-13522
 URL: https://issues.apache.org/jira/browse/IMPALA-13522
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13521) Calcite Planner: Handle functions taking literal types from a lower level

2024-11-06 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13521:
-

 Summary: Calcite Planner: Handle functions taking literal types 
from a lower level
 Key: IMPALA-13521
 URL: https://issues.apache.org/jira/browse/IMPALA-13521
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13520) Calcite planner: support in clause coercing

2024-11-06 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13520:
-

 Summary: Calcite planner: support in clause coercing
 Key: IMPALA-13520
 URL: https://issues.apache.org/jira/browse/IMPALA-13520
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13519) Calcite Planner Does Not Set Workload Management Data

2024-11-06 Thread Jason Fehr (Jira)
Jason Fehr created IMPALA-13519:
---

 Summary: Calcite Planner Does Not Set Workload Management Data
 Key: IMPALA-13519
 URL: https://issues.apache.org/jira/browse/IMPALA-13519
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Jason Fehr
Assignee: Steve Carlin


Workload management writes the tables queried and the select, join, where, 
aggregate, order by columns to the sys.impala_query_log table.  This data is 
determined by the frontend planner and returned to the backend coordinator via 
the [TExecRequest 
object|https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/common/thrift/Frontend.thrift#L702-L718].

The Calcite planner needs to provide this information as well on the 
TExecRequest object it creates.

The 
[test_workload_mgmt_sql_details.py|https://github.com/apache/impala/blob/master/tests/custom_cluster/test_workload_mgmt_sql_details.py]
 custom cluster test is the best test to determine if Calcite is generating the 
correct values for each piece of data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-12758) Event Processor is ignoring the prev_id while reloading the existing partitions

2024-11-06 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala resolved IMPALA-12758.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Event Processor is ignoring the prev_id while reloading the existing 
> partitions
> ---
>
> Key: IMPALA-12758
> URL: https://issues.apache.org/jira/browse/IMPALA-12758
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: catalog-2024
> Fix For: Impala 4.5.0
>
>
> Insert events on partitioned tables consumed by the event processor reload the 
> partitions to update file metadata. Currently, while reloading the file 
> metadata, the 'prev_id' of the old partitions is ignored in the partition 
> builder:
> {code:java}
> HdfsPartition oldPartition = entry.getValue();
> HdfsPartition.Builder partBuilder = createPartitionBuilder(
>    hmsPartition.getSd(), hmsPartition, permissionCache); {code}
> As a result, the 'prev_id' of the partBuilder will always be -1. When the 
> catalogDelta is sent from the statestore to the Impala daemons, the invalid 
> prev_id means impalads will not know whether to invalidate the current 
> partition and then request the new partition information.
> This might lead to data correctness issues.
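The shape of the fix can be illustrated with a small Python model (the real code is Java in Impala's catalog; all class and helper names below are hypothetical): seed the builder with the old partition's id so the rebuilt partition carries a valid prev_id instead of -1.

```python
# Illustrative sketch only: models carrying the old partition's id forward
# as prev_id when rebuilding it. Not Impala's actual Java API.

class Partition:
    def __init__(self, part_id, files):
        self.id = part_id
        self.files = files
        self.prev_id = -1  # -1 means "no previous version known"

class PartitionBuilder:
    def __init__(self, files, prev_id=-1):
        self.files = files
        self.prev_id = prev_id

    def build(self, new_id):
        p = Partition(new_id, self.files)
        p.prev_id = self.prev_id
        return p

def rebuild_partition(old, new_files):
    # The fix: seed the builder with the old partition's id so consumers of
    # the catalog delta can match the old and new partition versions.
    return PartitionBuilder(new_files, prev_id=old.id).build(new_id=old.id + 1)

old = Partition(7, ["f1"])
rebuilt = rebuild_partition(old, ["f1", "f2"])
assert rebuilt.prev_id == 7  # valid, so impalads can invalidate the old version
```

Without passing `prev_id=old.id`, the builder's default of -1 would propagate, which is the bug described above.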





[jira] [Resolved] (IMPALA-13148) Show the number of in-progress Catalog operations

2024-11-06 Thread Saurabh Katiyal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Katiyal resolved IMPALA-13148.
--
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Show the number of in-progress Catalog operations
> -
>
> Key: IMPALA-13148
> URL: https://issues.apache.org/jira/browse/IMPALA-13148
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Quanlong Huang
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: newbie, ramp-up
> Fix For: Impala 4.5.0
>
> Attachments: Selection_122.png, Selection_123.png
>
>
> In the /operations page of the catalogd WebUI, the list of in-progress Catalog 
> Operations is shown. It'd be helpful to also show the number of such 
> operations, like the /queries page of the coordinator WebUI, which shows "100 
> queries in flight".





[jira] [Created] (IMPALA-13518) Show target name of COMMIT_TXN events in logs

2024-11-05 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13518:
---

 Summary: Show target name of COMMIT_TXN events in logs
 Key: IMPALA-13518
 URL: https://issues.apache.org/jira/browse/IMPALA-13518
 Project: IMPALA
  Issue Type: Task
Reporter: Quanlong Huang
Assignee: Quanlong Huang


IMPALA-12460 adds logs about Top-10 expensive events and Top-10 expensive 
targets. However, for COMMIT_TXN events, the target is just "CLUSTER_WIDE":
{noformat}
Top 9 targets in event processing: (target=CLUSTER_WIDE, duration_ms=955792) 
{noformat}
It'd be helpful to show the name of the tables involved in the transaction.





[jira] [Resolved] (IMPALA-13502) Constructor cleanup

2024-11-05 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13502.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Constructor cleanup
> ---
>
> Key: IMPALA-13502
> URL: https://issues.apache.org/jira/browse/IMPALA-13502
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Minor
> Fix For: Impala 4.5.0
>
>
> Various cleanup ideas around constructors identified in IMPALA-12390.
> - Replace {{const shared_ptr<>&}} with raw pointers to make the API simpler 
> and more general.
> - LLVM CreateBinaryPhiNode, CodegenNullPhiNode, and CodegenIsNullPhiNode 
> should all make {{name}} mandatory. Remove empty name handling from 
> CreateBinaryPhiNode.
> - Several TSaslServerTransport constructors may be unused.





[jira] [Resolved] (IMPALA-13396) Unify temporary dir management in CustomClusterTestSuite

2024-11-05 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13396.
---
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Unify temporary dir management in CustomClusterTestSuite
> 
>
> Key: IMPALA-13396
> URL: https://issues.apache.org/jira/browse/IMPALA-13396
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Test
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> There are many custom cluster tests that require creating a temporary directory. 
> The temporary directory typically lives within the scope of a test method and is 
> cleaned up afterwards. However, some tests create the directory directly and 
> forget to clean it up afterwards, leaving junk dirs under /tmp/ or $LOG_DIR.
> We can unify the temporary directory management inside 
> CustomClusterTestSuite. Some arguments of CustomClusterTestSuite.with_args(), 
> such as 'impalad_args', 'catalogd_args', and 'impala_log_dir', should accept a 
> formatting pattern that is replaced by a temporary dir path.
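The substitution idea can be sketched as follows (the `{tmp_dir}` placeholder and the helper name are invented for illustration; the real fixture would also register the directory for cleanup):

```python
import tempfile

def expand_tmp_dir(arg_string, placeholder="{tmp_dir}"):
    """Replace a placeholder in a daemon-args string with a fresh temp dir.

    Returns the expanded string and the created directory (or None) so the
    test fixture can remove the directory after the test finishes.
    """
    if placeholder not in arg_string:
        return arg_string, None
    tmp = tempfile.mkdtemp(prefix="impala_test_")
    return arg_string.replace(placeholder, tmp), tmp

# Example: an impalad_args value using the hypothetical placeholder.
args, tmp = expand_tmp_dir("--scratch_dirs={tmp_dir}")
assert tmp is not None and args == "--scratch_dirs=" + tmp
```

Centralizing creation and cleanup in the suite means individual tests never touch /tmp/ directly.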





[jira] [Created] (IMPALA-13517) Calcite Planner: fix || operator (both concat and or)

2024-11-05 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13517:
-

 Summary: Calcite Planner: fix || operator (both concat and or)
 Key: IMPALA-13517
 URL: https://issues.apache.org/jira/browse/IMPALA-13517
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin








[jira] [Created] (IMPALA-13516) Calcite Planner: Fix explicit cast issues

2024-11-05 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13516:
-

 Summary: Calcite Planner: Fix explicit cast issues
 Key: IMPALA-13516
 URL: https://issues.apache.org/jira/browse/IMPALA-13516
 Project: IMPALA
  Issue Type: Improvement
Reporter: Steve Carlin








[jira] [Created] (IMPALA-13515) ORC tables hit IllegalStateException due to "row__id" column

2024-11-05 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13515:
--

 Summary: ORC tables hit IllegalStateException due to "row__id" 
column
 Key: IMPALA-13515
 URL: https://issues.apache.org/jira/browse/IMPALA-13515
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Reporter: Joe McDonnell








[jira] [Resolved] (IMPALA-13334) test_sort.py hit DCHECK when max_sort_run_size>0

2024-11-05 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13334.
---
 Fix Version/s: Impala 4.5.0
Target Version: Impala 4.5.0
Resolution: Fixed

> test_sort.py hit DCHECK when max_sort_run_size>0
> 
>
> Key: IMPALA-13334
> URL: https://issues.apache.org/jira/browse/IMPALA-13334
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Test
>Reporter: Riza Suminto
>Assignee: Noemi Pap-Takacs
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> test_sort.py declares the 'max_sort_run_size' query option, but has silently not 
> been exercising it. Fixing the query option declaration using the helper function 
> add_exec_option_dimension() reveals a DCHECK failure in sorter.cc:
> {code:java}
> F0827 16:45:38.425906 2405388 sorter.cc:1183] 
> 054e9b0e1fecdaaf:298af369] Check failed: !*allocation_failed && 
> unsorted_run_->run_size() == inmem_run_max_pages_{code}
>  





[jira] [Created] (IMPALA-13514) Consider a mode where the Status from a Java exception includes the stack trace

2024-11-05 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13514:
--

 Summary: Consider a mode where the Status from a Java exception 
includes the stack trace
 Key: IMPALA-13514
 URL: https://issues.apache.org/jira/browse/IMPALA-13514
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


JniUtil::GetJniExceptionMsg() takes the message from a Java exception and turns 
it into an error Status. It currently has a mode where it writes the 
exception's stack trace to the log file. It might be nice to have a mode where 
it includes the exception stack trace in the actual error Status message, which 
will go all the way to the client.

When a user hits some ambiguous error (e.g. an InternalStateException with no 
message), the stack trace is useful for tracking down the code location. If 
there is a mode where the stack trace is in the error message, it eliminates 
the need to search through logs (which can be enormous).

This is also useful for cases where tests are running in parallel.
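The concept can be sketched in Python for brevity (the actual change would be in JniUtil's C++/JNI code; names here are illustrative): fold the formatted stack trace into the returned error-status message itself so it reaches the client.

```python
import traceback

def status_from_exception(exc, include_stack=False):
    """Build an error-status message from an exception, optionally
    embedding the stack trace so it travels with the message."""
    msg = "%s: %s" % (type(exc).__name__, exc)
    if include_stack:
        stack = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
        msg += "\n" + stack
    return msg

try:
    raise ValueError("ambiguous error")
except ValueError as e:
    short = status_from_exception(e)
    full = status_from_exception(e, include_stack=True)

assert "Traceback" not in short  # current behavior: message only
assert "Traceback" in full       # proposed mode: trace embedded in the status
```

The trade-off is message size versus not having to grep enormous log files for the matching trace.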





[jira] [Created] (IMPALA-13513) Calcite planner: support decode function

2024-11-05 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13513:
-

 Summary: Calcite planner: support decode function
 Key: IMPALA-13513
 URL: https://issues.apache.org/jira/browse/IMPALA-13513
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin








[jira] [Created] (IMPALA-13512) Print .test file name if PlannerTest fail

2024-11-05 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13512:
-

 Summary: Print .test file name if PlannerTest fail
 Key: IMPALA-13512
 URL: https://issues.apache.org/jira/browse/IMPALA-13512
 Project: IMPALA
  Issue Type: Improvement
  Components: Test
Reporter: Riza Suminto
Assignee: Riza Suminto


If a PlannerTest fails, the error message hints at which test case failed by 
printing the section and line number, like this:
{code:java}
Error Message
Section PLAN of query at line 239: {code}
This can be improved by also printing the path to the .test file that failed, 
like this:
{code:java}
Error Message
Section PLAN of query at 
functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:239:
 {code}
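A sketch of the proposed message construction (hypothetical helper, not the actual PlannerTest code):

```python
def format_failure(section, test_file, line, include_path=True):
    """Build the failure location string for a PlannerTest error message.

    Old style prints only the line number; the improvement prepends the
    path to the .test file so the failing file is obvious.
    """
    if include_path:
        location = "%s:%d" % (test_file, line)
    else:
        location = "line %d" % line
    return "Section %s of query at %s:" % (section, location)

# Old message vs. improved message:
assert format_failure("PLAN", "x.test", 239, include_path=False) == \
    "Section PLAN of query at line 239:"
assert format_failure("PLAN", "x.test", 239) == \
    "Section PLAN of query at x.test:239:"
```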

PlannerTest should also skip printing the VERBOSE plan if 
PlannerTestOption.EXTENDED_EXPLAIN is specified, since the EXTENDED level already 
contains sufficient detail, including tuples, sizes, and cardinality.





[jira] [Resolved] (IMPALA-13507) Add param to disable glog buffering in with_args fixture

2024-11-05 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13507.
---
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Add param to disable glog buffering in with_args fixture
> 
>
> Key: IMPALA-13507
> URL: https://issues.apache.org/jira/browse/IMPALA-13507
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Test
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> We have plenty of custom_cluster tests that assert against the content of Impala 
> daemon log files while the process is still running, using 
> assert_log_contains() and its wrappers. The method specifically mentions 
> disabling glog buffering ('-logbuflevel=-1'), but not all 
> custom_cluster tests do that.
> {code:java}
>   def assert_log_contains(self, daemon, level, line_regex, expected_count=1, 
> timeout_s=6,
>       dry_run=False):
>     """
>     Assert that the daemon log with specified level (e.g. ERROR, WARNING, 
> INFO) contains
>     expected_count lines with a substring matching the regex. When 
> expected_count is -1,
>     at least one match is expected.
>     Retries until 'timeout_s' has expired. The default timeout is the default 
> minicluster
>     log buffering time (5 seconds) with a one second buffer.
>     When using this method to check log files of running processes, the 
> caller should
>     make sure that log buffering has been disabled, for example by adding
>     '-logbuflevel=-1' to the daemon startup options or set timeout_s to a 
> value higher
>     than the log flush interval.    Returns the result of the very last call 
> to line_regex.search or None if
>     expected_count is 0 or the line_regex did not match any lines.
>     """ {code}
> This often results in flaky tests that are hard to triage and often neglected if 
> they do not frequently run in core exploration.
> We can improve this by adding a boolean param to 
> CustomClusterTestSuite.with_args, say 'disable_log_buffering', for a test to 
> declare its intention to inspect log files in a live minicluster. If it is True, 
> start the minicluster with '-logbuflevel=-1' for all daemons. If it is False, log 
> a WARNING on any call to assert_log_contains().





[jira] [Created] (IMPALA-13511) Calcite planner: support sub-millisecond datetime parts

2024-11-05 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13511:
-

 Summary: Calcite planner: support sub-millisecond datetime parts
 Key: IMPALA-13511
 URL: https://issues.apache.org/jira/browse/IMPALA-13511
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin








[jira] [Resolved] (IMPALA-13468) Calcite planner: fix aggregation.test queries

2024-11-05 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin resolved IMPALA-13468.
---
Resolution: Fixed

> Calcite planner: fix aggregation.test queries
> -
>
> Key: IMPALA-13468
> URL: https://issues.apache.org/jira/browse/IMPALA-13468
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>






[jira] [Resolved] (IMPALA-13455) Calcite planner: convert expressions to normal form for performance

2024-11-05 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin resolved IMPALA-13455.
---
Resolution: Fixed

> Calcite planner: convert expressions to normal form for performance
> ---
>
> Key: IMPALA-13455
> URL: https://issues.apache.org/jira/browse/IMPALA-13455
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>
> Enables q13, q48 in tpcds





[jira] [Resolved] (IMPALA-13461) Calcite planner: Need some translation rules to get tpcds queries to work

2024-11-05 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin resolved IMPALA-13461.
---
Resolution: Fixed

> Calcite planner: Need some translation rules to get tpcds queries to work
> -
>
> Key: IMPALA-13461
> URL: https://issues.apache.org/jira/browse/IMPALA-13461
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>






[jira] [Closed] (IMPALA-13221) Calcite: fix and enable tpcds and tpch tests

2024-11-05 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin closed IMPALA-13221.
-
Resolution: Won't Fix

Originally this Jira was set up as a checkpoint for Calcite, but this will come 
later with other tests.

> Calcite: fix and enable tpcds and tpch tests
> 
>
> Key: IMPALA-13221
> URL: https://issues.apache.org/jira/browse/IMPALA-13221
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>
> As a minor milestone, making the tpcds and tpch queries work will show that 
> use case queries work in the Calcite framework.





[jira] [Resolved] (IMPALA-13096) Cleanup Parser.jj for Calcite planner to only use supported syntax

2024-11-05 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin resolved IMPALA-13096.
---
Resolution: Fixed

> Cleanup Parser.jj for Calcite planner to only use supported syntax
> --
>
> Key: IMPALA-13096
> URL: https://issues.apache.org/jira/browse/IMPALA-13096
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>






[jira] [Closed] (IMPALA-9170) close idle connections without an associated session

2024-11-04 Thread YUBI LEE (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YUBI LEE closed IMPALA-9170.

Resolution: Duplicate

> close idle connections without an associated session
> 
>
> Key: IMPALA-9170
> URL: https://issues.apache.org/jira/browse/IMPALA-9170
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Xiaomin Zhang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> With the fix of IMPALA-7802, Impala can now close an idle connection after a 
> configured interval. But it still leaves open some connections which have no 
> associated sessions:
> [https://github.com/cloudera/Impala/blob/cdh6.2.1/be/src/service/impala-server.cc#L2078]
> if (it == connection_to_sessions_map_.end()) return false;
> Some clients like HUE could use different connections to check the query 
> status or fetch results. In these cases, those connections have no associated 
> sessions and are not added to the connection_to_sessions_map. This caused 
> issues when we use Radware to load balance Impala, because Radware does not 
> send FIN to close an idle connection, but requires the backend to close idle 
> connections.





[jira] [Created] (IMPALA-13510) Unset the environment variable for tuple cache tests

2024-11-04 Thread Yida Wu (Jira)
Yida Wu created IMPALA-13510:


 Summary: Unset the environment variable for tuple cache tests
 Key: IMPALA-13510
 URL: https://issues.apache.org/jira/browse/IMPALA-13510
 Project: IMPALA
  Issue Type: Bug
Reporter: Yida Wu
Assignee: Yida Wu


The test_cache_disabled test case would fail in the tuple cache build because 
the build enables the tuple cache using environment variables, while the 
test case requires the tuple cache to remain disabled. Unsetting the related 
environment variables resolves the failure.
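For example, the test could clear the relevant variables before starting the cluster; the variable prefix below is illustrative, not necessarily the actual tuple-cache variable name, and the helper operates on a plain dict so it is easy to test:

```python
import os

def unset_env_vars(prefixes, env=None):
    """Remove environment variables whose names start with any given prefix.

    Accepts a dict so the logic can be tested without mutating os.environ;
    pass nothing to operate on the real environment.
    """
    env = os.environ if env is None else env
    for name in [n for n in env if any(n.startswith(p) for p in prefixes)]:
        del env[name]
    return env

# Simulated environment (names are hypothetical):
fake_env = {"TUPLE_CACHE_DIR": "/tmp/cache", "PATH": "/usr/bin"}
unset_env_vars(["TUPLE_CACHE"], env=fake_env)
assert "TUPLE_CACHE_DIR" not in fake_env and "PATH" in fake_env
```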





[jira] [Created] (IMPALA-13509) Avoid duplicate deepcopy during hash partitioning in KrpcDataStreamSender

2024-11-04 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13509:


 Summary: Avoid duplicate deepcopy during hash partitioning in 
KrpcDataStreamSender
 Key: IMPALA-13509
 URL: https://issues.apache.org/jira/browse/IMPALA-13509
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Csaba Ringhofer


Currently all rows are deep copied twice:
1. to the RowBatch of the given channel
2. to an OutboundRowBatch when the collector RowBatch is at capacity

Copying directly to an OutboundRowBatch could avoid some CPU work.
This would also allow easier implementation of the following improvements:
- deduplicate tuples similarly to the broadcast/unpartitioned exchange 
(IMPALA-13225).
- keep the outbound row batch size below data_stream_sender_buffer_size even for 
variable-length data







[jira] [Resolved] (IMPALA-12433) KrpcDataStreamSender could share some buffers between channels

2024-11-04 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-12433.
--
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> KrpcDataStreamSender could share some buffers between channels
> --
>
> Key: IMPALA-12433
> URL: https://issues.apache.org/jira/browse/IMPALA-12433
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: memory-saving, performance
> Fix For: Impala 4.4.0
>
>
> Currently each channel has two outbound row batches, and each of those has 2 
> buffers, one for serialization and another for compression.
> https://github.com/apache/impala/blob/0f55e551bc98843c79a9ec82582ddca237aa4fe9/be/src/runtime/row-batch.h#L100
> https://github.com/apache/impala/blob/0f55e551bc98843c79a9ec82582ddca237aa4fe9/be/src/runtime/krpc-data-stream-sender.cc#L236
> https://github.com/apache/impala/blob/0f55e551bc98843c79a9ec82582ddca237aa4fe9/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java#L81
> As serialization + compression is always done from the fragment instance 
> thread, only one compression is done at a time, so a single compression buffer 
> could be shared between channels. If this buffer is sent via KRPC, it 
> could be swapped with the per-channel buffer.
> As far as I understand, at least one buffer per channel is needed because 
> async KRPC calls can use it from another thread (this is done to avoid an 
> extra copy of the buffer before RPCs). We can only reuse that buffer after 
> getting a callback from KRPC.





[jira] [Resolved] (IMPALA-13296) Hive to Iceberg table-migration: pre-check column compatibility

2024-11-04 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-13296.
--
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Hive to Iceberg table-migration: pre-check column compatibility
> ---
>
> Key: IMPALA-13296
> URL: https://issues.apache.org/jira/browse/IMPALA-13296
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> The table migration from a Hive table to an Iceberg table is a multi-step 
> process that has a middle step to rename the original table to a temp name. 
> If a later step fails, the user gets an error but the table remains renamed 
> (the user also gets a hint on how to restore the original name).
> When the failure is column incompatibility (e.g. Iceberg doesn't support 
> Hive's smallint, tinyint, varchar( n ) column types) we can do better, because 
> this incompatibility could also be found during query analysis. That way the 
> error could be sent before we rename the table. This results in a cleaner 
> user experience.
> {code:java}
> Query: alter table hive_tbl convert to iceberg ERROR: 
> IllegalArgumentException: Unsupported Hive type: VARCHAR, use string instead 
> Your table might have been renamed. To reset the name try running: ALTER 
> TABLE default.hive_tbl_tmp_8fb36dff RENAME TO default.hive_tbl;
> {code}
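The pre-check amounts to scanning the column types during analysis, before any rename happens. A hedged sketch (the type list follows the error above; the authoritative list lives in Impala's analyzer, and the helper name is invented):

```python
# Types the report above mentions as unsupported for Iceberg migration.
UNSUPPORTED_PREFIXES = ("smallint", "tinyint", "varchar")

def check_columns_convertible(columns):
    """Return names of columns whose types would fail the Iceberg migration,
    so the error can be raised before the table is renamed."""
    bad = []
    for name, col_type in columns:
        if col_type.lower().startswith(UNSUPPORTED_PREFIXES):
            bad.append(name)
    return bad

cols = [("id", "INT"), ("name", "VARCHAR(10)"), ("flag", "BOOLEAN")]
assert check_columns_convertible(cols) == ["name"]
```

Running this check during analysis means the user sees one clear error and the table is never left under a temporary name.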





[jira] [Created] (IMPALA-13508) Create SetMetric listing active subsystem in an Impala Daemon

2024-11-01 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13508:
-

 Summary: Create SetMetric listing active subsystem in an Impala 
Daemon
 Key: IMPALA-13508
 URL: https://issues.apache.org/jira/browse/IMPALA-13508
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Riza Suminto


Impala has several subsystems that can be enabled/disabled through backend flags, 
such as Admission Control, Workload Management, and HMS Event Processing. It 
would be great to have a SetMetric listing all active subsystems in an Impala 
daemon.





[jira] [Created] (IMPALA-13507) Add param to disable glog buffering in with_args fixture

2024-11-01 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13507:
-

 Summary: Add param to disable glog buffering in with_args fixture
 Key: IMPALA-13507
 URL: https://issues.apache.org/jira/browse/IMPALA-13507
 Project: IMPALA
  Issue Type: Improvement
  Components: Test
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto


We have plenty of custom_cluster tests that assert against the content of Impala 
daemon log files while the process is still running, using assert_log_contains() 
and its wrappers. The method specifically mentions disabling glog 
buffering ('-logbuflevel=-1'), but not all custom_cluster tests do that.
{code:java}
  def assert_log_contains(self, daemon, level, line_regex, expected_count=1, 
timeout_s=6,
      dry_run=False):
    """
    Assert that the daemon log with specified level (e.g. ERROR, WARNING, INFO) 
contains
    expected_count lines with a substring matching the regex. When 
expected_count is -1,
    at least one match is expected.
    Retries until 'timeout_s' has expired. The default timeout is the default 
minicluster
    log buffering time (5 seconds) with a one second buffer.
    When using this method to check log files of running processes, the caller 
should
    make sure that log buffering has been disabled, for example by adding
    '-logbuflevel=-1' to the daemon startup options or set timeout_s to a value 
higher
    than the log flush interval.    Returns the result of the very last call to 
line_regex.search or None if
    expected_count is 0 or the line_regex did not match any lines.
    """ {code}
This often results in flaky tests that are hard to triage and often neglected if 
they do not frequently run in core exploration.

We can improve this by adding a boolean param to 
CustomClusterTestSuite.with_args, say 'disable_log_buffering', for a test to 
declare its intention to inspect log files in a live minicluster. If it is True, 
start the minicluster with '-logbuflevel=-1' for all daemons. If it is False, log 
a WARNING on any call to assert_log_contains().
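How the new parameter might affect daemon startup flags can be sketched like this (simplified; the real with_args decorator and cluster startup are more involved):

```python
def build_daemon_args(base_args, disable_log_buffering=False):
    """Assemble daemon startup flags, forcing glog to flush immediately
    when the test declares it will inspect live log files."""
    args = list(base_args)
    if disable_log_buffering:
        args.append("-logbuflevel=-1")
    return args

# A test that sets disable_log_buffering=True gets unbuffered logging:
assert "-logbuflevel=-1" in build_daemon_args(
    ["--enable_feature=true"], disable_log_buffering=True)
# Otherwise the flag is absent, and assert_log_contains() should warn:
assert "-logbuflevel=-1" not in build_daemon_args(["--enable_feature=true"])
```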





[jira] [Resolved] (IMPALA-12390) Enable performance related clang-tidy checks

2024-10-31 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12390.

Resolution: Fixed

> Enable performance related clang-tidy checks
> 
>
> Key: IMPALA-12390
> URL: https://issues.apache.org/jira/browse/IMPALA-12390
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> clang-tidy has several performance-related checks that seem like they would 
> be useful to enforce. Here are some examples:
> {noformat}
> /home/joemcdonnell/upstream/Impala/be/src/runtime/types.h:313:25: warning: 
> loop variable is copied but only used as const reference; consider making it 
> a const reference [performance-for-range-copy]
>         for (ColumnType child_type : col_type.children) {
>              ~~ ^
>              const &
> /home/joemcdonnell/upstream/Impala/be/src/catalog/catalog-util.cc:168:34: 
> warning: 'find' called with a string literal consisting of a single 
> character; consider using the more effective overload accepting a character 
> [performance-faster-string-find]
>       int pos = object_name.find(".");
>                                  ^~~~
>                                  '.'
> /home/joemcdonnell/upstream/Impala/be/src/util/decimal-util.h:55:53: warning: 
> the parameter 'b' is copied for each invocation but only used as a const 
> reference; consider making it a const reference 
> [performance-unnecessary-value-param]
>   static int256_t SafeMultiply(int256_t a, int256_t b, bool may_overflow) {
>                                             ^
>                                            const &
> /home/joemcdonnell/upstream/Impala/be/src/codegen/llvm-codegen.cc:847:5: 
> warning: 'push_back' is called inside a loop; consider pre-allocating the 
> vector capacity before the loop [performance-inefficient-vector-operation]
>     arguments.push_back(args_[i].type);
>     ^{noformat}
> Overall, these checks flag things that developers wouldn't ordinarily notice, 
> and they don't seem to produce many false positives. We should look into 
> enabling them.





[jira] [Created] (IMPALA-13505) NullPointerException in Analyzer.resolveActualPath with Calcite planner

2024-10-31 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13505:
--

 Summary: NullPointerException in Analyzer.resolveActualPath with 
Calcite planner
 Key: IMPALA-13505
 URL: https://issues.apache.org/jira/browse/IMPALA-13505
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Michael Smith


Encountered a NullPointerException when running some TPC-DS queries (such as 
q8) with the Calcite planner:
{code}
Stack Trace:java.lang.NullPointerException
at 
org.apache.impala.analysis.Analyzer.lambda$resolveActualPath$18(Analyzer.java:4699)
at java.util.Collections$2.tryAdvance(Collections.java:4719)
at java.util.Collections$2.forEachRemaining(Collections.java:4727)
at 
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
at 
org.apache.impala.analysis.Analyzer.resolveActualPath(Analyzer.java:4690)
at 
org.apache.impala.analysis.Analyzer.lambda$addColumnsTo$17(Analyzer.java:4655)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.apache.impala.analysis.Analyzer.addColumnsTo(Analyzer.java:4655)
at 
org.apache.impala.analysis.Analyzer.addJoinColumns(Analyzer.java:4732)
at org.apache.impala.planner.JoinNode.init(JoinNode.java:293)
at org.apache.impala.planner.HashJoinNode.init(HashJoinNode.java:82)
at 
org.apache.impala.calcite.rel.phys.ImpalaHashJoinNode.(ImpalaHashJoinNode.java:46)
...
{code}

{{SlotRef.getResolvedPath}} returns null at line 4699. Looking at the SlotRef, 
I don't see any way to determine an origin, so this may be part of an incomplete 
implementation of the Calcite planner integration.

To reproduce
{code}
$ start-impala-cluster.py -s 1 --use_calcite_planner=true
$ impala-py.test 
tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8
{code}





[jira] [Created] (IMPALA-13506) Crash in RawValue::PrintValue() when running query_test/test_chars.py

2024-10-31 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13506:
--

 Summary: Crash in RawValue::PrintValue() when running 
query_test/test_chars.py
 Key: IMPALA-13506
 URL: https://issues.apache.org/jira/browse/IMPALA-13506
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


I ran into a crash with the Calcite planner when running test_chars.py. Here 
are the reproducing conditions:

 
{noformat}
# Start Impala cluster with --use_calcite_planner=true
# Connect to Impala using Beeswax
bin/impala-shell.sh --protocol=beeswax

# Run these statements
use functional;
WITH numbered AS (
  SELECT *, row_number() over (order by cs) as rn
  FROM chars_tiny)
SELECT *
FROM (
  SELECT CASE WHEN rn % 2 = 0 THEN cs END cs,
    CASE WHEN rn % 2 = 1 THEN cl END cl,
    CASE WHEN rn % 3 = 0 THEN vc END vc
  FROM numbered
  UNION ALL
  SELECT CASE WHEN rn % 2 = 1 THEN cs END cs,
    CASE WHEN rn % 2 = 0 THEN cl END cl,
    CASE WHEN rn % 3 = 1 THEN vc END vc
  FROM numbered) v{noformat}
It hits this DCHECK with this stacktrace:

 
{noformat}
F1031 14:45:41.711074 2288125 raw-value.cc:471] 
65447b8728b9f39a:cdb466c3] Check failed: string_val->Len() <= type.len

 6  impalad!google::LogMessageFatal::~LogMessageFatal() [logging.cc : 2048 + 
0x5]
 7  impalad!impala::RawValue::PrintValue(void const*, impala::ColumnType 
const&, int, std::__cxx11::basic_stringstream<char, std::char_traits<char>, 
std::allocator<char>>*, bool) [raw-value.cc : 471 + 0x16]
 8  
impalad!impala::AsciiQueryResultSet::AddRows(std::vector > const&, impala::RowBatch*, int, 
int) [query-result-set.cc : 222 + 0x1b]
 9  impalad!impala::BufferedPlanRootSink::GetNext(impala::RuntimeState*, 
impala::QueryResultSet*, int, bool*, long) [buffered-plan-root-sink.cc : 239 + 
0x1b]
10  impalad!impala::Coordinator::GetNext(impala::QueryResultSet*, int, bool*, 
long) [coordinator.cc : 1051 + 0x23]
11  impalad!impala::ClientRequestState::FetchRowsInternal(int, 
impala::QueryResultSet*, long) [client-request-state.cc : 1425 + 0x1f]
12  impalad!impala::ClientRequestState::FetchRows(int, impala::QueryResultSet*, 
long) [client-request-state.cc : 1272 + 0x18]
13  impalad!impala::ImpalaServer::FetchInternal(impala::TUniqueId, bool, int, 
beeswax::Results*) [impala-beeswax-server.cc : 688 + 0x20]
14  impalad!impala::ImpalaServer::fetch(beeswax::Results&, beeswax::QueryHandle 
const&, bool, int) [impala-beeswax-server.cc : 205 + 0x35]{noformat}
 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13504) AI functions should be included in FunctionCallExpr::isNondeterministicBuiltinFn

2024-10-31 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13504:
--

 Summary: AI functions should be included in 
FunctionCallExpr::isNondeterministicBuiltinFn
 Key: IMPALA-13504
 URL: https://issues.apache.org/jira/browse/IMPALA-13504
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


FunctionCallExpr::isNondeterministicBuiltinFn() is used to determine whether a 
function can produce different results for different invocations with the same 
arguments in a single query. Currently, it applies to rand/random/uuid. It can 
influence sorting (see IMPALA-4728). It also controls whether constant folding 
can be used (when all the arguments are constants). It would be uncommon for an 
AI function to be used on a constant, but it is theoretically possible.

It seems like ai_generate_text / ai_generate_text_default should be on that 
list, because they aren't deterministic.
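The rule above can be sketched as follows. This is an illustrative Python model of the check, not Impala's Java code; the function names and the set membership test are hypothetical stand-ins for isNondeterministicBuiltinFn().

```python
# Hypothetical sketch of the classification described above; the real check
# lives in Impala's FunctionCallExpr.isNondeterministicBuiltinFn() (Java).
NONDETERMINISTIC_BUILTINS = {
    "rand", "random", "uuid",
    # Proposed additions: AI functions can return different text for
    # identical arguments, so constant folding must not cache their results.
    "ai_generate_text", "ai_generate_text_default",
}

def is_nondeterministic_builtin(fn_name: str) -> bool:
    """True if the builtin may differ between invocations with equal args."""
    return fn_name.lower() in NONDETERMINISTIC_BUILTINS

def can_constant_fold(fn_name: str, all_args_constant: bool) -> bool:
    """Constant folding requires constant args AND a deterministic function."""
    return all_args_constant and not is_nondeterministic_builtin(fn_name)
```

With this change, a call like ai_generate_text('prompt') on a literal argument would no longer be folded to a single cached value at plan time.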



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13503) Support CustomClusterTestSuite with single cluster for a class

2024-10-31 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13503:
--

 Summary: Support CustomClusterTestSuite with single cluster for a 
class
 Key: IMPALA-13503
 URL: https://issues.apache.org/jira/browse/IMPALA-13503
 Project: IMPALA
  Issue Type: Task
Reporter: Michael Smith


Support creating custom cluster test classes where a single cluster is 
configured for the whole class, and re-used for individual test cases. This can 
significantly speed up certain types of custom cluster tests, such as tuple 
cache tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13502) Constructor cleanup

2024-10-31 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13502:
--

 Summary: Constructor cleanup
 Key: IMPALA-13502
 URL: https://issues.apache.org/jira/browse/IMPALA-13502
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Reporter: Michael Smith


Various cleanup ideas around constructors identified in IMPALA-12390.
- Replace {{const shared_ptr<>&}} with raw pointers to make the API simpler and 
more general.
- PrintIdSet in debug-util.h may be replaceable with PrintIdsInMultiLine for 
all existing use cases.
- LLVM CreateBinaryPhiNode, CodegenNullPhiNode, and CodegenIsNullPhiNode should 
all make {{name}} mandatory. Remove empty name handling from 
CreateBinaryPhiNode.
- Several TSaslServerTransport constructors may be unused.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13325) Use RowBatch::CopyRows in IcebergDeleteNode

2024-10-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-13325.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Use RowBatch::CopyRows in IcebergDeleteNode
> ---
>
> Key: IMPALA-13325
> URL: https://issues.apache.org/jira/browse/IMPALA-13325
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> Typically there are many more data records than delete records in a healthy 
> Iceberg table. This means it is suboptimal to copy probe rows one by one in 
> the IcebergDeleteNode. We should use the new RowBatch::CopyRows method to 
> copy tuple rows in batches.
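The batching idea can be illustrated as follows. This is a Python sketch of copying surviving rows in contiguous runs, not the actual C++ RowBatch::CopyRows implementation; the function name is hypothetical.

```python
# Illustrative sketch: copy the rows that survive a delete filter in
# contiguous runs (one batched copy per run), instead of appending row by
# row. Analogous in spirit to RowBatch::CopyRows in the Impala backend.
def copy_surviving_rows(rows, deleted_positions):
    """Copy rows whose index is not deleted, one contiguous run at a time."""
    deleted = sorted(set(deleted_positions))
    out, start = [], 0
    for d in deleted + [len(rows)]:      # sentinel closes the final run
        if d > start:
            out.extend(rows[start:d])    # one batched copy per run
        start = d + 1
    return out
```

When deletes are rare, as in a healthy table, almost all rows are copied in a few large runs rather than via one call per row.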



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-12769) test_query_cancel_exception failed in ASAN build

2024-10-31 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12769.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> test_query_cancel_exception failed in ASAN build
> 
>
> Key: IMPALA-12769
> URL: https://issues.apache.org/jira/browse/IMPALA-12769
> Project: IMPALA
>  Issue Type: Bug
>Reporter: David Rorke
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> IMPALA-12493 added test_query_cancel_exception. It is failing in ASAN build 
> with following error:
> {noformat}
> Error Message
> assert 1 == 0
> Stacktrace
> webserver/test_web_pages.py:1044: in test_query_cancel_exception assert 
> response_json['num_in_flight_queries'] == 0 E assert 1 == 0
> {noformat}
> This appears to be the same failure reported in IMPALA-12542 but for a 
> different test case. It's possible that the underlying cause is the same 
> (timing issue caused by slowness of ASAN build) and we just need to apply the 
> same fix from IMPALA-12542 to this test case.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13340) COPY TESTCASE in LocalCatalog mode doesn't dump the partition and file metadata

2024-10-30 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-13340.
-
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

Resolving this. Thanks [~MikaelSmith] and [~jasonmfehr] for the review!

> COPY TESTCASE in LocalCatalog mode doesn't dump the partition and file 
> metadata
> ---
>
> Key: IMPALA-13340
> URL: https://issues.apache.org/jira/browse/IMPALA-13340
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> IMPALA-11901 fixed the failures of using COPY TESTCASE statements in 
> LocalCatalog mode. However, only the table metadata is dumped, e.g. table 
> schema and column stats. The partition and file metadata are missing.
> To reproduce the issue locally, start the Impala cluster in LocalCatalog mode.
> {code:bash}
> bin/start-impala-cluster.py --catalogd_args=--catalog_topic_mode=minimal 
> --impalad_args=--use_local_catalog{code}
> Dump the metadata of a query on a partitioned table:
> {noformat}
> copy testcase to '/tmp' select * from functional_parquet.alltypes;
> +--+
> | Test case data output path  
>  |
> +--+
> | 
> hdfs://localhost:20500/tmp/impala-testcase-data-c8316356-6448-4458-acad-c2f72f43c3e1
>  |
> +--+
>  {noformat}
> Check the metadata from the source cluster
> {noformat}
> show partitions functional_parquet.alltypes
> +---+---+---++--+--+---+-+---+---+---+
> | year  | month | #Rows | #Files | Size | Bytes Cached | Cache 
> Replication | Format  | Incremental stats | Location  
> | EC Policy |
> +---+---+---++--+--+---+-+---+---+---+
> | 2009  | 1 | -1| 1  | 8.60KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=1  | 
> NONE  |
> | 2009  | 2 | -1| 1  | 8.09KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=2  | 
> NONE  |
> | 2009  | 3 | -1| 1  | 8.60KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=3  | 
> NONE  |
> | 2009  | 4 | -1| 1  | 8.20KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=4  | 
> NONE  |
> | 2009  | 5 | -1| 1  | 8.55KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=5  | 
> NONE  |
> | 2009  | 6 | -1| 1  | 8.23KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=6  | 
> NONE  |
> | 2009  | 7 | -1| 1  | 8.25KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=7  | 
> NONE  |
> | 2009  | 8 | -1| 1  | 8.60KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=8  | 
> NONE  |
> | 2009  | 9 | -1| 1  | 8.41KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=9  | 
> NONE  |
> | 2009  | 10| -1| 1  | 8.60KB   | NOT CACHED   | NOT CACHED   
>  | PARQUET | false | 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=10 | 
> NONE  |
> | 2009  | 11|

[jira] [Resolved] (IMPALA-13497) Add profile counters for bytes written / read from the tuple cache

2024-10-30 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13497.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Add profile counters for bytes written / read from the tuple cache
> --
>
> Key: IMPALA-13497
> URL: https://issues.apache.org/jira/browse/IMPALA-13497
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The size of the tuple cache entry written / read is useful information for 
> understanding the performance of the cache. Having this information in the 
> profile will help us tune the placement policy for the tuple cache nodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13500) test_invalidate_stale_partition_on_reload is flaky

2024-10-30 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13500:
-

 Summary: test_invalidate_stale_partition_on_reload is flaky
 Key: IMPALA-13500
 URL: https://issues.apache.org/jira/browse/IMPALA-13500
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog, Frontend
Reporter: Riza Suminto
Assignee: Sai Hemanth Gantasala


TestEventProcessingCustomConfigs.test_invalidate_stale_partition_on_reload is 
flaky in the ARM build because it does not find the log lines it is looking for.
{code:java}
Error Message

AssertionError: Expected 1 lines in file 
/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-1e5d.vpc.cloudera.com.jenkins.log.INFO.20241030-041545.3408110
 matching regex 'Invalidated objects in cache: \[partition 
test_invalidate_stale_partition_on_reload_382624f9.test_invalidate_table:p=\d 
\(id=0\)\]', but found 0 lines. Last line was:  I1030 04:16:03.199949 3408505 
query-exec-mgr.cc:219] ReleaseQueryState(): deleted 
query_id=7745d0a812d0b474:7883f4f8


Stacktrace

custom_cluster/test_events_custom_configs.py:1328: in 
test_invalidate_stale_partition_on_reload
self.assert_impalad_log_contains('INFO', log_regex % 0)
common/impala_test_suite.py:1311: in assert_impalad_log_contains
"impalad", level, line_regex, expected_count, timeout_s, dry_run)
common/impala_test_suite.py:1364: in assert_log_contains
(expected_count, log_file_path, line_regex, found, line)
E   AssertionError: Expected 1 lines in file 
/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-1e5d.vpc.cloudera.com.jenkins.log.INFO.20241030-041545.3408110
 matching regex 'Invalidated objects in cache: \[partition 
test_invalidate_stale_partition_on_reload_382624f9.test_invalidate_table:p=\d 
\(id=0\)\]', but found 0 lines. Last line was: 
E   I1030 04:16:03.199949 3408505 query-exec-mgr.cc:219] ReleaseQueryState(): 
deleted query_id=7745d0a812d0b474:7883f4f8
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13501) Conflicting commits to Iceberg tables leave uncommitted orphan files

2024-10-30 Thread Noemi Pap-Takacs (Jira)
Noemi Pap-Takacs created IMPALA-13501:
-

 Summary: Conflicting commits to Iceberg tables leave uncommitted 
orphan files
 Key: IMPALA-13501
 URL: https://issues.apache.org/jira/browse/IMPALA-13501
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Noemi Pap-Takacs


Iceberg supports multiple writers with optimistic concurrency. Each writer can 
write new files which are then added to the table after a validation check to 
ensure that the commit does not conflict with other modifications made during 
the execution.

When there is a conflicting change and the newly written files cannot be 
committed, there are two ways to proceed: the commit can be retried and rebased 
on top of the latest snapshot, or, if this cannot resolve the conflict, the 
change is abandoned and the files become orphan files in the file system.

It would be nice to remove the remaining files from an unsuccessful commit in 
one step. Deleting orphan files later as a table maintenance step is also a 
possible resolution.
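The retry-then-cleanup flow described above can be sketched as follows. All names are illustrative, not Iceberg's API; the callbacks stand in for the commit, rebase, and file-deletion steps.

```python
# Conceptual sketch of optimistic-concurrency commit handling: retry a
# conflicting commit after rebasing on the latest snapshot, and delete the
# newly written files if the commit ultimately fails, so they do not
# remain as orphan files. Names are hypothetical.
def commit_with_retry(try_commit, rebase, cleanup_files, new_files,
                      max_retries=3):
    for _ in range(max_retries + 1):
        if try_commit(new_files):
            return True       # commit accepted by the table
        rebase()              # re-validate against the latest snapshot
    cleanup_files(new_files)  # remove uncommitted files in one step
    return False
```

The alternative mentioned above, deleting orphan files later as a table maintenance step, corresponds to deferring the cleanup_files call to a separate job.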



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13499) REFRESH on Iceberg tables can lead to data loss

2024-10-30 Thread Saulius Valatka (Jira)
Saulius Valatka created IMPALA-13499:


 Summary: REFRESH on Iceberg tables can lead to data loss
 Key: IMPALA-13499
 URL: https://issues.apache.org/jira/browse/IMPALA-13499
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 4.4.1
Reporter: Saulius Valatka


When running a REFRESH statement on an Iceberg table the catalog loads it from 
the Hive metastore and later performs an {{alter_table}} 
[here|https://github.com/apache/impala/blob/bdce7778b239f6fbf8ea89ea32b91a83c8017828/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L445].
 It does so without taking a Hive lock, meaning that if any external process 
commits to the table between load and alter, the newly committed 
"metadata_location" property will be overwritten with the previous value, 
effectively resulting in data loss.

It should either take a Hive lock when doing this, or, if 
{{iceberg.engine.hive.lock-enabled = false}}, use 
{{alter_table_with_environmentContext}} and set 
{{expected_parameter_key}} / {{expected_parameter_value}} to metadata_location / 
.
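The compare-and-swap idea can be sketched as follows. alter_table_if_unchanged is a hypothetical stand-in that emulates what alter_table_with_environmentContext with an expected parameter would provide; it is not the HMS API.

```python
# Sketch of the lost-update protection described above: only overwrite
# metadata_location if it still holds the value the catalog loaded, so a
# concurrent external commit is not silently undone. Hypothetical helper.
def alter_table_if_unchanged(table_params, expected_location, new_params):
    """Compare-and-swap style alter: fail if metadata_location moved."""
    if table_params.get("metadata_location") != expected_location:
        raise RuntimeError(
            "metadata_location changed concurrently; reload and retry")
    table_params.update(new_params)
    return table_params
```

A REFRESH that raced with an external commit would then fail fast and reload, rather than overwriting the newer metadata_location.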



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13498) TestQueryLive.test_executor_groups is flaky

2024-10-29 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13498:
--

 Summary: TestQueryLive.test_executor_groups is flaky
 Key: IMPALA-13498
 URL: https://issues.apache.org/jira/browse/IMPALA-13498
 Project: IMPALA
  Issue Type: Task
  Components: Test
Affects Versions: Impala 4.4.1
Reporter: Michael Smith


{code}
custom_cluster/test_query_live.py:299: in test_executor_groups
self.assert_only_coordinators(result.runtime_profile, coords=[0, 1], 
execs=[2, 3])
custom_cluster/test_query_live.py:63: in assert_only_coordinators
self.assert_impalads(profile, coords, execs)
custom_cluster/test_query_live.py:60: in assert_impalads
assert ":" + str(DEFAULT_KRPC_PORT + port_idx) not in profile
E   assert ':27003' not in 'Query (id=1d439574eef2... TotalTime: 15.635us\n'
E ':27003' is contained here:
E   Query (id=1d439574eef2a2d7:27003d50):
E ?   ++
{code}

The assertion here is not specific enough, so unique identifiers can 
accidentally match it.
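A sketch of how the assertion could be tightened: the bare substring ":27003" also matches inside a query id such as "1d439574eef2a2d7:27003d50", while a regex anchored on a full host:port token does not. The regex and helper name here are illustrative, not the actual test code.

```python
import re

# Illustrative fix for the flaky assertion: require a full host:port token
# with a word boundary after the port, so hex query ids whose second half
# happens to start with the port digits no longer match.
def profile_mentions_port(profile, port):
    return re.search(r'\b[\w.-]+:%d\b' % port, profile) is not None
```

In "1d439574eef2a2d7:27003d50" the port digits are followed by more hex characters, so the trailing word boundary fails and the false match disappears.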



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13497) Add profile counters for bytes written / read from the tuple cache

2024-10-29 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13497:
--

 Summary: Add profile counters for bytes written / read from the 
tuple cache
 Key: IMPALA-13497
 URL: https://issues.apache.org/jira/browse/IMPALA-13497
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


The size of the tuple cache entry written / read is useful information for 
understanding the performance of the cache. Having this information in the 
profile will help us tune the placement policy for the tuple cache nodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13496) JniFrontend should use JniUtil.serializeToThrift() for serialization

2024-10-29 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13496:
--

 Summary: JniFrontend should use JniUtil.serializeToThrift() for 
serialization
 Key: IMPALA-13496
 URL: https://issues.apache.org/jira/browse/IMPALA-13496
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


JniFrontend.java has many locations like this:
{code:java}
    try {
      TSerializer serializer = new TSerializer(protocolFactory_);
      return serializer.serialize(result);
    } catch (TException e) {
      throw new InternalException(e.getMessage());
    } {code}
This is the same as JniUtil.serializeToThrift(). We should standardize on 
JniUtil.serializeToThrift() to avoid code duplication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13495) Calcite planner: Make exceptions easier to classify

2024-10-29 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13495:
--

 Summary: Calcite planner: Make exceptions easier to classify
 Key: IMPALA-13495
 URL: https://issues.apache.org/jira/browse/IMPALA-13495
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


To make it easier to diagnose what is going on, it would be useful for the 
Calcite planner to produce categorized exceptions like ParseException, 
AnalysisException, etc. It would also be useful to produce a 
UnsupportedFeatureException for things that are not expected to work (e.g. 
HBase, Kudu, views, etc). 

Right now, the Calcite planner converts all exceptions from CalciteJniFrontend 
to InternalException, which makes it harder to classify them after the fact.
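The proposed classification might look like the following sketch. The exception names match those mentioned above, but the string-matching logic is purely illustrative; the real implementation would classify based on where in the planner the error originated.

```python
# Illustrative sketch of categorizing Calcite planner errors instead of
# collapsing everything into InternalException. The matching heuristics
# here are hypothetical.
class ParseException(Exception): pass
class AnalysisException(Exception): pass
class UnsupportedFeatureException(Exception): pass
class InternalException(Exception): pass

def classify_planner_error(message):
    msg = message.lower()
    if "syntax error" in msg or "parse" in msg:
        return ParseException(message)
    if "could not resolve" in msg or "analysis" in msg:
        return AnalysisException(message)
    if any(f in msg for f in ("hbase", "kudu", "view")):
        return UnsupportedFeatureException(message)
    # Only genuinely unexpected failures remain internal errors.
    return InternalException(message)
```

With categorized exceptions, test triage can distinguish "feature not yet supported by the Calcite planner" from real planner bugs.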



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13494) Calcite planner: group_concat failing with distinct

2024-10-28 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13494:
-

 Summary: Calcite planner: group_concat failing with distinct
 Key: IMPALA-13494
 URL: https://issues.apache.org/jira/browse/IMPALA-13494
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin


The following query is failing in distinct.test:

{code}
select sum(len_orderkey), sum(len_comment)
from (
  select
    length(group_concat(distinct cast(l_orderkey as string))) len_orderkey,
    length(group_concat(distinct(l_comment))) len_comment
  from tpch.lineitem
  group by l_comment
) v
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13493) Calcite planner: parsing error on bracket hint comment

2024-10-28 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13493:
-

 Summary: Calcite planner: parsing error on bracket hint comment
 Key: IMPALA-13493
 URL: https://issues.apache.org/jira/browse/IMPALA-13493
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin


The hint comment surrounded by square brackets is causing a parsing error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13492) TestIcebergTable.test_describe_history_params is flaky.

2024-10-28 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13492:
-

 Summary: TestIcebergTable.test_describe_history_params is flaky.
 Key: IMPALA-13492
 URL: https://issues.apache.org/jira/browse/IMPALA-13492
 Project: IMPALA
  Issue Type: Bug
  Components: Test
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Zoltán Borók-Nagy


TestIcebergTable.test_describe_history_params is flaky with following error 
message and stack trace on failed run:
{code:java}
Error Message:

query_test/test_iceberg.py:504: in test_describe_history_params 
self.expect_num_snapshots_from(impalad_client, tbl_name, now_budapest, 1) 
common/iceberg_test_suite.py:39: in expect_num_snapshots_from 
expected_result_size=expected_result_size) util/iceberg_util.py:102: in 
get_snapshots rows = impalad_client.execute(query, user) 
common/impala_connection.py:216: in execute 
fetch_profile_after_close=fetch_profile_after_close) 
beeswax/impala_beeswax.py:190: in execute handle = 
self.__execute_query(query_string.strip(), user=user) 
beeswax/impala_beeswax.py:381: in __execute_query handle = 
self.execute_query_async(query_string, user=user) 
beeswax/impala_beeswax.py:375: in execute_query_async handle = 
self.__do_rpc(lambda: self.imp_service.query(query,)) 
beeswax/impala_beeswax.py:553: in __do_rpc raise 
ImpalaBeeswaxException(self.__build_error_message(b), b) E   
ImpalaBeeswaxException: Query 884a5fa72d806f79:7c6e01ce failed: E   
AnalysisException: Invalid TIMESTAMP expression: UDF WARNING: Timestamp 
'2024-10-27 02:47:33.482594000' in timezone 'Europe/Budapest' could not be 
converted to UTC EEE   CAUSED BY: InternalException: UDF WARNING: 
Timestamp '2024-10-27 02:47:33.482594000' in timezone 'Europe/Budapest' could 
not be converted to UTC


Stacktrace:

query_test/test_iceberg.py:504: in test_describe_history_params
self.expect_num_snapshots_from(impalad_client, tbl_name, now_budapest, 1)
common/iceberg_test_suite.py:39: in expect_num_snapshots_from
expected_result_size=expected_result_size)
util/iceberg_util.py:102: in get_snapshots
rows = impalad_client.execute(query, user)
common/impala_connection.py:216: in execute
fetch_profile_after_close=fetch_profile_after_close)
beeswax/impala_beeswax.py:190: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:381: in __execute_query
handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:375: in execute_query_async
handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:553: in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: Query 884a5fa72d806f79:7c6e01ce failed:
E   AnalysisException: Invalid TIMESTAMP expression: UDF WARNING: Timestamp 
'2024-10-27 02:47:33.482594000' in timezone 'Europe/Budapest' could not be 
converted to UTC
E   
E   
E   CAUSED BY: InternalException: UDF WARNING: Timestamp '2024-10-27 
02:47:33.482594000' in timezone 'Europe/Budapest' could not be converted to UTC
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13477) CTAS query should set request_pool in QueryStateRecord

2024-10-28 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13477.
---
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> CTAS query should set request_pool in QueryStateRecord
> --
>
> Key: IMPALA-13477
> URL: https://issues.apache.org/jira/browse/IMPALA-13477
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Minor
> Fix For: Impala 4.5.0
>
>
> Resource Pool information for CTAS query is missing from /queries page of 
> WebUI. This is because CTAS query has TExecRequest.stmt_type = DDL. However, 
> CTAS also has TQueryExecRequest.stmt_type = DML and subject to 
> AdmissionControl. Therefore, its request pool must be recorded into 
> QueryStateRecord and displayed at /queries page of WebUI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13491) Add config on catalogd for controlling the number of concurrent loading/refresh commands

2024-10-27 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-13491:
--

 Summary: Add config on catalogd for controlling the number of 
concurrent loading/refresh commands
 Key: IMPALA-13491
 URL: https://issues.apache.org/jira/browse/IMPALA-13491
 Project: IMPALA
  Issue Type: Improvement
Reporter: Manish Maheshwari


When running table loading or refresh commands, catalogd requires working 
memory in proportion to the number of tables being refreshed. While we have a 
table-level lock, we don't have a config to control concurrent load/refresh 
operations.

In the case of customers that run refresh in parallel in multiple threads, the 
number of load/refresh commands can cause OOM on catalogd due to running out 
of working memory.
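A sketch of the proposed limit (the flag name and helper are hypothetical): a counting semaphore bounds how many load/refresh operations run at once, so extra requests queue instead of all consuming working memory simultaneously.

```python
import threading

# Illustrative sketch of bounding concurrent table load/refresh work in
# catalogd. MAX_CONCURRENT_TABLE_LOADS stands in for a (hypothetical)
# catalogd startup flag.
MAX_CONCURRENT_TABLE_LOADS = 4
_load_slots = threading.Semaphore(MAX_CONCURRENT_TABLE_LOADS)

def load_or_refresh_table(table_name, do_load):
    # Blocks until a slot frees up instead of admitting unbounded work,
    # keeping peak working memory proportional to the slot count.
    with _load_slots:
        return do_load(table_name)
```

A refresh storm of N parallel threads then peaks at MAX_CONCURRENT_TABLE_LOADS in-flight loads rather than N.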



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-13469) test_query_cpu_count_on_insert seems to be flaky

2024-10-25 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-13469.
---
Target Version: Impala 4.5.0
Resolution: Fixed

> test_query_cpu_count_on_insert seems to be flaky
> 
>
> Key: IMPALA-13469
> URL: https://issues.apache.org/jira/browse/IMPALA-13469
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Riza Suminto
>Priority: Major
>  Labels: broken-build
> Fix For: Impala 4.5.0
>
>
> We found that the test test_query_cpu_count_on_insert() that was recently 
> added in IMPALA-13445 seems to be flaky and could fail with the following error.
> +*Error Message*+
> {code:java}
> ImpalaBeeswaxException: Query 554a332e9f9b499a:da216f59 failed: 
> IllegalStateException: null
> {code}
> +*Stacktrace*+
> {code:java}
> custom_cluster/test_executor_groups.py:1375: in test_query_cpu_count_on_insert
> "Verdict: Match", "CpuAsk: 9", "CpuAskBounded: 9", "|  
> partitions=unavailable"])
> custom_cluster/test_executor_groups.py:946: in _run_query_and_verify_profile
> result = self.execute_query_expect_success(self.client, query)
> common/impala_test_suite.py:891: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:901: in execute_query_expect_success
> result = cls.__execute_query(impalad_client, query, query_options, user)
> common/impala_test_suite.py:1045: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:216: in execute
> fetch_profile_after_close=fetch_profile_after_close)
> beeswax/impala_beeswax.py:190: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:381: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:375: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:553: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: Query 554a332e9f9b499a:da216f59 failed:
> E   IllegalStateException: null
> {code}
>  
> The stack trace from the coordinator is given as follows too.
> {code}
> I1021 09:42:04.707075 18064 jni-util.cc:321] 
> 554a332e9f9b499a:da216f59] java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:496)
> at 
> org.apache.impala.planner.DistributedPlanner.createDmlFragment(DistributedPlanner.java:308)
> at 
> org.apache.impala.planner.Planner.createPlanFragments(Planner.java:173)
> at org.apache.impala.planner.Planner.createPlans(Planner.java:310)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1969)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2968)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2730)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2269)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2030)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> I1021 09:42:04.707089 18064 status.cc:129] 554a332e9f9b499a:da216f59] 
> IllegalStateException: null
> @  0x10c3dc7  impala::Status::Status()
> @  0x19b3668  impala::JniUtil::GetJniExceptionMsg()
> @  0x16b39ee  impala::JniCall::Call<>()
> @  0x1684d0f  impala::Frontend::GetExecRequest()
> @  0x23acec3  impala::QueryDriver::DoFrontendPlanning()
> @  0x23ad0b3  impala::QueryDriver::RunFrontendPlanner()
> @  0x17124cb  impala::ImpalaServer::ExecuteInternal()
> @  0x17131ba  impala::ImpalaServer::Execute()
> @  0x1885fd1  impala::ImpalaServer::query()
> @  0x175c4bc  beeswax::BeeswaxServiceProcessorT<>::process_query()
> @  0x17e0545  beeswax::BeeswaxServiceProcessorT<>::dispatchCall()
> @  0x17e0aea  impala::ImpalaServiceProcessorT<>::dispatchCall()
> @   0xf6c5d3  apache::thrift::TDispatchProcessor::process()
> @  0x13ea8b6  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x13d727b  impala::ThriftThread::RunRunnable()
> @  0x13d8e

[jira] [Created] (IMPALA-13490) TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail after IMPALA-13469

2024-10-25 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13490:


 Summary: TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail 
after IMPALA-13469
 Key: IMPALA-13490
 URL: https://issues.apache.org/jira/browse/IMPALA-13490
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Fang-Yu Rao
Assignee: Riza Suminto


We found that testNonTpcdsDdl() in 
[TpcdsCpuCostPlannerTest.java|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java]
 could fail after IMPALA-13469 with the following error.

 

It looks like the expected value of 'segment-costs' does not match the actual 
one in the single node plan.

+*Error Message*+
{code:java}
Section PLAN of query at line 651:
create table t partitioned by (c_nationkey) sort by (c_custkey) as
select c_custkey, max(o_totalprice) as maxprice, c_nationkey
  from tpch.orders join tpch.customer on c_custkey = o_custkey
 where c_nationkey < 10
 group by c_custkey, c_nationkey

Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
Per-Host Resource Estimates: Memory=35MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB 
thread-reservation=1 runtime-filters-memory=1.00MB
|  max-parallelism=1 segment-costs=[8689789, 272154, 4822204]
^
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, 
PARTITION-KEYS=(c_nationkey)]
|  partitions=25
|  output exprs: c_custkey, max(o_totalprice), c_nationkey
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
|
04:SORT
|  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
|  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
thread-reservation=0
|  tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
|  in pipelines: 04(GETNEXT), 03(OPEN)
|
03:AGGREGATE [FINALIZE]
|  output: max(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB 
thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
|  in pipelines: 03(GETNEXT), 00(OPEN)
|
02:HASH JOIN [INNER JOIN]
|  hash predicates: o_custkey = c_custkey
|  fk/pk conjuncts: o_custkey = c_custkey
|  runtime filters: RF000[bloom] <- c_custkey
|  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
thread-reservation=0
|  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
|  in pipelines: 00(GETNEXT), 01(OPEN)
|
|--01:SCAN HDFS [tpch.customer]
| HDFS partitions=1/1 files=1 size=23.08MB
| predicates: c_nationkey < CAST(10 AS SMALLINT)
| stored statistics:
|   table: rows=150.00K size=23.08MB
|   columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders]
   HDFS partitions=1/1 files=1 size=162.56MB
   runtime filters: RF000[bloom] -> o_custkey
   stored statistics:
 table: rows=1.50M size=162.56MB
 columns: all
   extrapolated-rows=disabled max-scan-range-rows=1.18M
   mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
   tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
   in pipelines: 00(GETNEXT)

Expected:
Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
Per-Host Resource Estimates: Memory=35MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB 
thread-reservation=1 runtime-filters-memory=1.00MB
|  max-parallelism=1 segment-costs=[8689789, 17851, 3700630]
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, 
PARTITION-KEYS=(c_nationkey)]
|  partitions=25
|  output exprs: c_custkey, max(o_totalprice), c_nationkey
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
|
04:SORT
|  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
|  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
thread-reservation=0
|  tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
|  in pipelines: 04(GETNEXT), 03(OPEN)
|
03:AGGREGATE [FINALIZE]
|  output: max(o_totalprice)
|  group by: c_custkey, c_nationkey
|  mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB 
thread-reservation=0
|  tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
|  in pipelines: 03(GETNEXT), 00(OPEN)
|
02:HASH JOIN [INNER JOIN]
|  hash predicates: o_custkey = c_custkey
|  fk/pk conjuncts: o_custkey = c_custkey
|  runtime filters: RF000[bloom] <- c_custkey
|  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
thread-reservation=0
|  t

[jira] [Created] (IMPALA-13489) Planner should estimate BATCH_SIZE automatically

2024-10-25 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-13489:
--

 Summary: Planner should estimate BATCH_SIZE automatically
 Key: IMPALA-13489
 URL: https://issues.apache.org/jira/browse/IMPALA-13489
 Project: IMPALA
  Issue Type: Improvement
Reporter: Manish Maheshwari


The planner should estimate BATCH_SIZE automatically according to the 
materialized row size, to avoid the scanner thread becoming memory bound.
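A rough sketch of the idea, with a hypothetical per-batch byte budget and clamp bounds that are not from the Impala source: pick the batch size so one batch of materialized rows stays near the budget.

```python
DEFAULT_BATCH_SIZE = 1024             # current fixed default
TARGET_BATCH_BYTES = 8 * 1024 * 1024  # hypothetical per-batch memory budget


def estimate_batch_size(row_size_bytes: int) -> int:
    """Size a batch so the materialized rows stay near the byte budget."""
    if row_size_bytes <= 0:
        return DEFAULT_BATCH_SIZE  # no row-size estimate available
    rows = TARGET_BATCH_BYTES // row_size_bytes
    # Clamp so tiny or huge rows don't produce extreme batch sizes.
    return max(64, min(rows, 65536))
```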



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13488) Add query option for DEFAULT_INITIAL_TUPLE_CAPACITY

2024-10-25 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-13488:
--

 Summary: Add query option for DEFAULT_INITIAL_TUPLE_CAPACITY 
 Key: IMPALA-13488
 URL: https://issues.apache.org/jira/browse/IMPALA-13488
 Project: IMPALA
  Issue Type: Improvement
Reporter: Manish Maheshwari


DEFAULT_INITIAL_TUPLE_CAPACITY is hardcoded to a default of 4. We need to 
convert this into a query option.
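A minimal sketch of the proposed change, with a hypothetical option name and lookup shape (not the actual Impala query-option machinery): the hardcoded constant becomes only the fallback.

```python
DEFAULT_INITIAL_TUPLE_CAPACITY = 4  # today's hardcoded value


def initial_tuple_capacity(query_options: dict) -> int:
    """Prefer the per-query setting; fall back to the old hardcoded default."""
    val = query_options.get("default_initial_tuple_capacity",
                            DEFAULT_INITIAL_TUPLE_CAPACITY)
    # Reject nonsensical settings rather than fail later.
    return val if val > 0 else DEFAULT_INITIAL_TUPLE_CAPACITY
```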



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13487) Capture metrics for memory allocation in execution profile

2024-10-25 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-13487:
--

 Summary: Capture metrics for memory allocation in execution profile
 Key: IMPALA-13487
 URL: https://issues.apache.org/jira/browse/IMPALA-13487
 Project: IMPALA
  Issue Type: Improvement
Reporter: Manish Maheshwari


Capture metrics for memory allocation in the execution profile to identify 
queries that are bound on memory allocation, and measure allocation wait times, 
especially when running many concurrent queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13486) Improve memory estimation and allocation for queries on complex types

2024-10-25 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-13486:
--

 Summary: Improve memory estimation and allocation for queries on 
complex types
 Key: IMPALA-13486
 URL: https://issues.apache.org/jira/browse/IMPALA-13486
 Project: IMPALA
  Issue Type: Improvement
Reporter: Manish Maheshwari


Improve memory estimation and allocation for queries on complex types, taking 
into account the number of values in the complex type and any functions, such 
as unnest, that consume a lot of memory.

Currently, underestimating memory causes a lot of tcmalloc calls, and the query 
gets blocked waiting on the tcmalloc central freelist.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13485) Support stats capture on complex types

2024-10-25 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-13485:
--

 Summary: Support stats capture on complex types
 Key: IMPALA-13485
 URL: https://issues.apache.org/jira/browse/IMPALA-13485
 Project: IMPALA
  Issue Type: Improvement
Reporter: Manish Maheshwari


In compute stats, we must capture the following:
 * length of complex types
 * min, max, avg for numeric item types
 * min, max, avg of length for string item types

For nested complex types, we need to evaluate how this would be captured. 
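As a sketch of what such a computation could look like for a single ARRAY column (illustrative only; a real implementation would live in Impala's compute-stats path):

```python
def complex_col_stats(arrays):
    """Stats for one ARRAY column: length stats plus item min/max/avg.

    For string item types, min/max/avg are taken over string lengths.
    """
    lengths = [len(a) for a in arrays]
    items = [x for a in arrays for x in a if x is not None]
    if items and isinstance(items[0], str):
        items = [len(s) for s in items]  # string items: measure lengths
    stats = {
        "min_len": min(lengths),
        "max_len": max(lengths),
        "avg_len": sum(lengths) / len(lengths),
    }
    if items:
        stats["min"] = min(items)
        stats["max"] = max(items)
        stats["avg"] = sum(items) / len(items)
    return stats
```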

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13484) Querying an Iceberg table with TIMESTAMP_LTZ can cause data loss

2024-10-25 Thread Gabor Kaszab (Jira)
Gabor Kaszab created IMPALA-13484:
-

 Summary: Querying an Iceberg table with TIMESTAMP_LTZ can cause 
data loss
 Key: IMPALA-13484
 URL: https://issues.apache.org/jira/browse/IMPALA-13484
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Gabor Kaszab


*+Repro steps:+*

1. Create a table with Hive that has a TS_LTZ column:
{code:java}
create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) stored 
by iceberg;
{code}
2. Insert some data using Hive:
{code:java}
insert into ice_hive_tbl values (1, current_timestamp());
{code}
3. Add a breakpoint in Impala to the table loading code right before Impala 
sends out an alter_table to HMS to change the column type from TS_LTZ to TS. 
[Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463],
 for instance.
4. Query the table from Impala. This triggers a table load. Impala will decide 
that it should change the TS_LTZ type of the column to TS. However, the 
breakpoint will stop it from doing so at this point.
5. Use Hive to add additional rows into the table:
{code:java}
insert into ice_hive_tbl values (2, current_timestamp());
insert into ice_hive_tbl values (3, current_timestamp());
{code}
6. Release the breakpoint and let Impala finish the SELECT query started in 
step 4.
7. Do another SELECT * from Hive and/or Impala and verify that the extra rows 
added in step 5 are not present in the table.

*+Root cause:+*

When Impala changes the TS_LTZ column to TS it does so by calling alter_table() 
on HMS directly. It gives a Metastore Table object to HMS as the desired state 
of the table. HMS then persists this table object.

The problem with this:
 - Impala doesn't use the Iceberg API to alter the table. As a result, no 
conflict detection is performed, and it won't be detected that other commits 
went into the table after Impala grabbed the table object from HMS.
 - The metadata.json path is part of the table properties, and when Impala 
calls alter_table(tbl_obj) HMS will also persist this metadata path to the 
table, even though there were other changes that already moved the metadata 
path forward.
 - Essentially this will revert the changes on the table back to the state when 
Impala loaded the table object from HMS.
 - In a high-frequency scenario this could cause problems when Hive (or even 
Spark) heavily writes the table and meanwhile Impala reads this table. Some 
snapshots could be unintentionally reverted by this behavior and as a result 
could cause data loss or any changes like deletes being reverted.
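The lost-update pattern described above can be sketched with a toy in-memory "HMS" (names and shapes are hypothetical; this is not the real HMS API):

```python
import copy


class FakeHMS:
    """Toy metastore: alter_table() blindly persists the given table object,
    like the direct alter_table() call described above (no conflict check)."""
    def __init__(self):
        self.table = {"metadata_json": "v1.metadata.json",
                      "cols": {"ts": "timestamptz"}}

    def get_table(self):
        return copy.deepcopy(self.table)

    def alter_table(self, tbl):
        self.table = tbl  # last writer wins


hms = FakeHMS()

# Impala loads the table and prepares the TS_LTZ -> TS change (breakpoint).
impala_copy = hms.get_table()
impala_copy["cols"]["ts"] = "timestamp"

# Meanwhile Hive commits, moving metadata.json forward.
hive_copy = hms.get_table()
hive_copy["metadata_json"] = "v3.metadata.json"
hms.alter_table(hive_copy)

# Impala's stale write now reverts the table to the old metadata location.
hms.alter_table(impala_copy)
assert hms.table["metadata_json"] == "v1.metadata.json"  # Hive's commits lost
```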

{+}Just a note{+}, FWIW, with the current approach Impala doesn't change the 
column types in the Iceberg metadata; it only changes the column types in HMS. 
So even with this, the Iceberg metadata would still show the column type as 
timestamptz.
{+}Note2{+}, I described this problem using timestamp with local time zone as 
an example, but it could also be triggered by other column types not entirely 
compatible with Impala. I haven't researched whether any other such type 
exists, though.
{+}Note3{+}, this issue seems to have been there forever. The code that 
triggers it was added by one of the first changes for the Iceberg integration, 
the "[Create Iceberg 
table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]"
 change.

*+Possible solutions:+*
1. Impala can do the alter table by calling the Iceberg API and not HMS 
directly.
There are things to be careful about:
 - With this approach, would the above repro steps make the table loading fail 
due to a conflict between commits on the table? Or could the schema change be 
merged automatically by the Iceberg library into the latest state even if there 
had been changes on the table? I think this would work as expected and wouldn't 
reject loading the table, but we should make sure of this when testing.
 - With this approach, Impala would set the TS_LTZ cols to TS properly, causing 
no snapshots to be lost. However, when a new write is performed by Hive/Spark, 
they'd set the col types back to TS_LTZ. And then when Impala reads the table 
again, it will set these cols to TS again. And so on. The question is, would a 
scenario like this flood the Iceberg metadata, e.g. metadata.json, with all 
these uncontrolled schema changes?
 - We are talking about schema changes here, but in fact what the code does now 
is much broader than that. It sends a whole table object to HMS to persist. We 
have to double-check whether the current approach persists only schema changes 
or could make other changes too. E.g., the code also sets the 
DO_NOT_UPDATE_STATS property and the last DDL time. Could it change anything 
else that we might miss with this approach?

2. Do not do any alter_tables after loading the Iceberg table
This ap

[jira] [Closed] (IMPALA-12652) Limit Length of Completed Queries Insert DML

2024-10-24 Thread Jason Fehr (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Fehr closed IMPALA-12652.
---
Resolution: Fixed

Implemented with commit 
https://github.com/apache/impala/commit/711a9f2bad84f92dc4af61d49ae115f0dc4239da

> Limit Length of Completed Queries Insert DML
> 
>
> Key: IMPALA-12652
> URL: https://issues.apache.org/jira/browse/IMPALA-12652
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Jason Fehr
>Assignee: Jason Fehr
>Priority: Major
>  Labels: backend, workload-management
>
> Implement a coordinator startup flag that limits the max length (number of 
> characters) of the insert DML statement that inserts records into the 
> impala_query_log table.
> The purpose of this flag is to ensure that workload management does not 
> generate an insert DML statement that exceeds Impala's max length for a sql 
> statement (approximately 16 megabytes or 16 million characters).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13483) Calcite Planner: some scalar subquery throws exception

2024-10-24 Thread weihua zhang (Jira)
weihua zhang created IMPALA-13483:
-

 Summary: Calcite Planner: some scalar subquery throws exception
 Key: IMPALA-13483
 URL: https://issues.apache.org/jira/browse/IMPALA-13483
 Project: IMPALA
  Issue Type: Sub-task
Reporter: weihua zhang


{code:java}
create table correlated_scalar_t1(c1 bigint, c2 bigint);
create table correlated_scalar_t2(c1 bigint, c2 bigint);
insert into correlated_scalar_t1 values (1,null),(null,1),(1,2), 
(null,2),(1,3), (2,4), (2,5), (3,3), (3,4), (20,2), (22,3), (24,4),(null,null);
insert into correlated_scalar_t2 values (1,null),(null,1),(1,4), (1,2), 
(null,3), (2,4), (3,7), (3,9),(null,null),(5,1);

select c1 from correlated_scalar_t1 where correlated_scalar_t1.c2 > (select c1 
from correlated_scalar_t2 where correlated_scalar_t1.c1 = 
correlated_scalar_t2.c1 and correlated_scalar_t2.c2 < 4) order by c1;{code}


I1023 19:56:24.310750 1989386 CalciteOptimizer.java:184] 
044892e8f77df486:abfd3cda] [Impala plan]
LogicalSort(sort0=[$0], dir0=[ASC]), id = 717
  LogicalProject(C1=[$0]), id = 716
LogicalJoin(condition=[AND(=($0, $2), >($1, $3))], joinType=[inner]), id = 
715
  LogicalTableScan(table=[[default, correlated_scalar_t1]]), id = 547
  LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)]), id = 714
LogicalProject(c11=[$0], C1=[$0]), id = 713
  LogicalFilter(condition=[AND(<($1, 4), IS NOT NULL($0))]), id = 712
LogicalTableScan(table=[[default, correlated_scalar_t2]]), id = 549


I1023 19:56:24.312273 1989386 CalciteJniFrontend.java:174] 
044892e8f77df486:abfd3cda] Optimized logical plan
I1023 19:56:24.312394 1989386 CalciteMetadataHandler.java:202] 
044892e8f77df486:abfd3cda] Loaded tables: correlated_scalar_t1, 
correlated_scalar_t2
I1023 19:56:24.312475 1989386 AuthorizationUtil.java:100] 
044892e8f77df486:abfd3cda] Authorization is 'DISABLED'.
I1023 19:56:24.79 1989386 CalciteJniFrontend.java:123] 
044892e8f77df486:abfd3cda] Calcite planner failed.
I1023 19:56:24.333417 1989386 CalciteJniFrontend.java:124] 
044892e8f77df486:abfd3cda] Exception: 
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
I1023 19:56:24.333540 1989386 CalciteJniFrontend.java:126] 
044892e8f77df486:abfd3cda] Stack 
Trace:java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at java.util.ArrayList.rangeCheck(ArrayList.java:659)
at java.util.ArrayList.get(ArrayList.java:435)
at 
org.apache.impala.calcite.rel.util.CreateExprVisitor.visitInputRef(CreateExprVisitor.java:51)
at 
org.apache.impala.calcite.rel.util.CreateExprVisitor.visitInputRef(CreateExprVisitor.java:33)
at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:125)
at 
org.apache.impala.calcite.rel.util.CreateExprVisitor.visitCall(CreateExprVisitor.java:58)
at 
org.apache.impala.calcite.rel.util.CreateExprVisitor.visitCall(CreateExprVisitor.java:33)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:189)
at 
org.apache.impala.calcite.rel.node.ImpalaJoinRel.getConditionConjuncts(ImpalaJoinRel.java:412)
at 
org.apache.impala.calcite.rel.node.ImpalaJoinRel.getPlanNode(ImpalaJoinRel.java:101)
at 
org.apache.impala.calcite.rel.node.ImpalaProjectRel.getChildPlanNode(ImpalaProjectRel.java:117)
at 
org.apache.impala.calcite.rel.node.ImpalaProjectRel.getPlanNode(ImpalaProjectRel.java:62)
at 
org.apache.impala.calcite.rel.node.ImpalaSortRel.getChildPlanNode(ImpalaSortRel.java:141)
at 
org.apache.impala.calcite.rel.node.ImpalaSortRel.getPlanNode(ImpalaSortRel.java:84)
at 
org.apache.impala.calcite.service.CalcitePhysPlanCreator.create(CalcitePhysPlanCreator.java:51)
at 
org.apache.impala.calcite.service.CalciteJniFrontend.createExecRequest(CalciteJniFrontend.java:108)
I1023 19:56:24.333645 1989386 jni-util.cc:288] 
044892e8f77df486:abfd3cda] org.apache.impala.common.InternalException: 
Index: 3, Size: 3
at 
org.apache.impala.calcite.service.CalciteJniFrontend.createExecRequest(CalciteJniFrontend.java:127)
I1023 19:56:24.333654 1989386 status.cc:129] 044892e8f77df486:abfd3cda] 
InternalException: Index: 3, Size: 3
@  0x11f6c5d  impala::Status::Status()
@  0x1b579e6  impala::JniUtil::GetJniExceptionMsg()
@  0x183b922  impala::JniCall::Call<>()
@  0x180e86a  impala::Frontend::GetExecRequest()
@  0x252d1d4  impala::QueryDriver::RunFrontendPlanner()
@  0x18a7fd5  impala::ImpalaServer::ExecuteInternal()
@  0x18a9459  impala::ImpalaServer::Execute()
@  0x1a58c54  impala::ImpalaServer::ExecuteStatementCommon()
@  0x1a5a4a2  impala::ImpalaServer::ExecuteStatement()
@  0x192e001  
apache::hive::service::cli::thrift::TCLIServiceProcessorT

[jira] [Created] (IMPALA-13479) Patch gperftools to allow max_total_thread_cache_bytes to exceed 1GB

2024-10-24 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13479:
--

 Summary: Patch gperftools to allow max_total_thread_cache_bytes to 
exceed 1GB
 Key: IMPALA-13479
 URL: https://issues.apache.org/jira/browse/IMPALA-13479
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


gperftools limits max_total_thread_cache_bytes to 1GB here:

[https://github.com/gperftools/gperftools/blob/gperftools-2.10/src/thread_cache.cc#L520-L523]
{noformat}
void ThreadCache::set_overall_thread_cache_size(size_t new_size) {
  // Clip the value to a reasonable range
  if (new_size < kMinThreadCacheSize) new_size = kMinThreadCacheSize;
  if (new_size > (1<<30)) new_size = (1<<30);     // Limit to 1GB{noformat}
I confirmed that setting --tcmalloc_max_total_thread_cache_bytes=2147483648 
still results in a 1GB limit.

Sometimes, we would want a higher limit for systems with a large amount of 
memory and CPUs. For example, some systems now have 1TB of memory and 96 CPUs. 
With high concurrency, there is high contention on tcmalloc locks on central 
data structures. Increasing the total thread cache size could avoid this, and a 
value higher than 1GB is still a small part of system memory.

We can patch our toolchain gperftools to allow a higher value (and notify 
gperftools community).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13482) Calcite planner: Bug fixes for an analytics.test

2024-10-24 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13482:
-

 Summary: Calcite planner: Bug fixes for an analytics.test
 Key: IMPALA-13482
 URL: https://issues.apache.org/jira/browse/IMPALA-13482
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin


Specifically, 

select lag(coalesce(505, 1 + NULL), 1) over (order by int_col desc)

from alltypestiny
had a couple of issues that needed fixing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13481) Calcite planner: Add support for various analytic and agg functions

2024-10-24 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13481:
-

 Summary: Calcite planner: Add support for various analytic and agg 
functions
 Key: IMPALA-13481
 URL: https://issues.apache.org/jira/browse/IMPALA-13481
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13478) Don't sync tuple cache file contents to disk immediately

2024-10-24 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13478:
--

 Summary: Don't sync tuple cache file contents to disk immediately
 Key: IMPALA-13478
 URL: https://issues.apache.org/jira/browse/IMPALA-13478
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell


Currently, the tuple cache file writer syncs the file contents to disk before 
closing the file. This slows down the write path considerably, especially if 
disks are slow. This should be moved off the fast path and done asynchronously. 
As a first step, we can remove the sync call and close the file without 
syncing. Other readers are still able to access it, and the kernel will flush 
the buffers as needed.
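A sketch of the asynchronous variant, using a background thread so the write path never waits on fsync (file layout and function names are made up; the real writer is C++ in the Impala backend):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# One worker stands in for an async-flush thread.
flusher = ThreadPoolExecutor(max_workers=1)


def write_cache_entry(path, payload):
    """Write without blocking on durability: fsync/close are handed to the
    background thread; readers can already see the data via the page cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, payload)
    return flusher.submit(lambda: (os.fsync(fd), os.close(fd)))


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "entry.bin")
    write_cache_entry(p, b"cached tuples").result()
    with open(p, "rb") as f:
        assert f.read() == b"cached tuples"
```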



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13480) PlannerTest.testAggregation should VALIDATE_CARDINALITY

2024-10-24 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13480:
-

 Summary: PlannerTest.testAggregation should VALIDATE_CARDINALITY
 Key: IMPALA-13480
 URL: https://issues.apache.org/jira/browse/IMPALA-13480
 Project: IMPALA
  Issue Type: Improvement
  Components: Test
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto


PlannerTest.testAggregation does not VALIDATE_CARDINALITY today. Validating 
cardinality will allow us to track our estimation quality and capture behavior 
changes like 
https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L74-L76



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13477) CTAS query should set request_pool in QueryStateRecord

2024-10-24 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13477:
-

 Summary: CTAS query should set request_pool in QueryStateRecord
 Key: IMPALA-13477
 URL: https://issues.apache.org/jira/browse/IMPALA-13477
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto


Resource pool information for CTAS queries is missing from the /queries page 
of the WebUI. This is because a CTAS query has TExecRequest.stmt_type = DDL. 
However, CTAS also has TQueryExecRequest.stmt_type = DML and is subject to 
AdmissionControl. Therefore, its request pool should be recorded into 
QueryStateRecord and displayed on the /queries page of the WebUI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13476) Calcite Planner: IMPALA-1519 query throws exception

2024-10-24 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-13476:
-

 Summary: Calcite Planner: IMPALA-1519 query throws exception
 Key: IMPALA-13476
 URL: https://issues.apache.org/jira/browse/IMPALA-13476
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin


The following query in analytics-fn.test does not work in CalcitePlanner:

# IMPALA-1519: Check that the first analytic sort of a select block
# materializes TupleIsNullPredicates to be substituted in ancestor nodes.
select
  sum(t1.id) over (partition by t1.bool_col),
  count(1) over (order by t1.int_col),
  avg(g) over (order by f),
  t2.a,
  t2.d
from alltypestiny t1
left outer join
  (select
     id as a,
     coalesce(id, 10) as b,
     int_col as c,
     coalesce(int_col, 20) as d,
     bigint_col e,
     coalesce(bigint_col, 30) as f,
     coalesce(id + bigint_col, 40) as g
   from alltypestiny) t2
on (t1.id = t2.a + 100)


--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13475) Consider byte size when enqueuing deferred RPCs in KrpcDataStreamRecvr

2024-10-24 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13475:


 Summary: Consider byte size when enqueuing deferred RPCs in 
KrpcDataStreamRecvr
 Key: IMPALA-13475
 URL: https://issues.apache.org/jira/browse/IMPALA-13475
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Csaba Ringhofer


KrpcDataStreamRecvr::SenderQueue::ProcessDeferredRpc() can fail to process the 
deferred RPC if batch_queue is not empty and the batch queue plus the currently 
processed batch would consume too much memory (see 
KrpcDataStreamRecvr::CanEnqueue for details). The deferred RPC is moved back to 
the queue in this case.

Meanwhile, KrpcDataStreamRecvr::SenderQueue::GetBatch() doesn't consider the 
memory requirement of the batches when initiating the deserialization of 
deferred RPCs (EnqueueDeserializeTask) and tries to deserialize as many batches 
in parallel as possible (FLAGS_datastream_service_num_deserialization_threads, 
https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/be/src/runtime/krpc-data-stream-recvr.cc#L281).

This means that several threads may start ProcessDeferredRpc() even if 
GetBatch() could have predicted that most will fail due to the memory limit. 
While ProcessDeferredRpc() fails early in this case and won't do much work, 
these extra failed attempts make lock contention worse in 
KrpcDataStreamRecvr::SenderQueue. In the worst case, when only 1 batch fits in 
memory, this can lead to 
O(FLAGS_datastream_service_num_deserialization_threads * num_batches) wasted 
ProcessDeferredRpc attempts.
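A toy model of the suggested fix (class and method names are illustrative, not the actual backend code): have the dispatch step consult the same byte-budget check as CanEnqueue before waking deserialization threads, instead of one wake-up per deferred RPC.

```python
from collections import deque


class SenderQueue:
    """Toy model: only dispatch a deferred batch when its byte size fits
    in the remaining memory budget."""
    def __init__(self, mem_limit: int):
        self.mem_limit = mem_limit
        self.queued_bytes = 0
        self.deferred = deque()  # byte sizes of deferred batches, in order

    def can_enqueue(self, batch_bytes: int) -> bool:
        # An empty queue always admits one batch so progress is guaranteed.
        return (self.queued_bytes == 0
                or self.queued_bytes + batch_bytes <= self.mem_limit)

    def dispatch_deferred(self):
        """Dispatch only batches that will pass the memory check, avoiding
        wasted ProcessDeferredRpc() attempts that would fail the check."""
        dispatched = []
        while self.deferred and self.can_enqueue(self.deferred[0]):
            size = self.deferred.popleft()
            self.queued_bytes += size
            dispatched.append(size)
        return dispatched


q = SenderQueue(mem_limit=100)
q.deferred.extend([60, 60, 30])
assert q.dispatch_deferred() == [60]   # second 60 would exceed the limit
q.queued_bytes = 0                     # consumer drained the queue
assert q.dispatch_deferred() == [60, 30]
```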



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13474) Support query timeline display for experimental profile

2024-10-23 Thread Surya Hebbar (Jira)
Surya Hebbar created IMPALA-13474:
-

 Summary: Support query timeline display for experimental profile
 Key: IMPALA-13474
 URL: https://issues.apache.org/jira/browse/IMPALA-13474
 Project: IMPALA
  Issue Type: Improvement
Reporter: Surya Hebbar
Assignee: Surya Hebbar


The webUI's query timeline only partially supports the experimental profile, 
i.e. when Impala is started with {{gen_experimental_profile=true}}.

With the inclusion of aggregate event sequence metrics in the following patch, 
it is now possible to support the query timeline display for both profile 
formats.
https://gerrit.cloudera.org/#/c/21683/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-11761) test_partition_dir_removed_inflight fails with "AssertionError: REFRESH should fail"

2024-10-23 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-11761.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> test_partition_dir_removed_inflight fails with "AssertionError: REFRESH 
> should fail"
> 
>
> Key: IMPALA-11761
> URL: https://issues.apache.org/jira/browse/IMPALA-11761
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> When running ozone tests, 
> TestRecursiveListing.test_partition_dir_removed_inflight fails:
> {code}
> metadata/test_recursive_listing.py:184: in test_partition_dir_removed_inflight
> refresh_should_fail=True)
> metadata/test_recursive_listing.py:217: in _test_listing_large_dir
> assert not refresh_should_fail, "REFRESH should fail"
> E   AssertionError: REFRESH should fail
> E   assert not True
> {code} 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13473) Add support for JS code analysis and linting using ESLint

2024-10-23 Thread Surya Hebbar (Jira)
Surya Hebbar created IMPALA-13473:
-

 Summary: Add support for JS code analysis and linting using ESLint
 Key: IMPALA-13473
 URL: https://issues.apache.org/jira/browse/IMPALA-13473
 Project: IMPALA
  Issue Type: New Feature
Reporter: Surya Hebbar
Assignee: Surya Hebbar


The Impala webUI's client-side JS codebase is steadily increasing in size.

It would be helpful to enforce code style and quality for all of the webUI's 
scripts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13472) Minidumps for UBSAN on ARM don't give the stack

2024-10-22 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13472:


 Summary: Minidumps for UBSAN on ARM don't give the stack
 Key: IMPALA-13472
 URL: https://issues.apache.org/jira/browse/IMPALA-13472
 Project: IMPALA
  Issue Type: Task
Affects Versions: Impala 4.5.0
Reporter: Fang-Yu Rao
 Attachments: 75d667d8-3a76-4240-d95a63bd-01806ba9.dmp_dumpedv2, 
ce1d2431-ec45-4b30-8115d3a8-1c9b5e9e.dmp_dumpedv2, 
e85b9f26-2fd0-4beb-7823778e-cbed9c7b.dmp_dumpedv2

Currently, minidumps for UBSAN on ARM don't give the stack, as shown in  
[^e85b9f26-2fd0-4beb-7823778e-cbed9c7b.dmp_dumpedv2],  
[^ce1d2431-ec45-4b30-8115d3a8-1c9b5e9e.dmp_dumpedv2], and  
[^75d667d8-3a76-4240-d95a63bd-01806ba9.dmp_dumpedv2]. It would be good if the 
functions in each thread could be resolved.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13471) test_enable_reading_puffin() seems to fail in the Ozone build

2024-10-22 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-13471:


 Summary: test_enable_reading_puffin() seems to fail in the Ozone 
build
 Key: IMPALA-13471
 URL: https://issues.apache.org/jira/browse/IMPALA-13471
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Daniel Becker


We found that the test 
[test_enable_reading_puffin()|https://github.com/apache/impala/blame/master/tests/custom_cluster/test_iceberg_with_puffin.py#L59]
 added in IMPALA-13247 seems to fail in the Ozone build.

+*Error Message*+
{code}
assert [-1, -1] == [2, 2]   At index 0 diff: -1 != 2   Full diff:   - [-1, -1]  
 + [2, 2]
{code}

+*Stacktrace*+
{code}
custom_cluster/test_iceberg_with_puffin.py:50: in test_enable_reading_puffin
self._read_ndv_stats_expect_result([2, 2])
custom_cluster/test_iceberg_with_puffin.py:59: in _read_ndv_stats_expect_result
assert ndvs == expected_ndv_stats
E   assert [-1, -1] == [2, 2]
E At index 0 diff: -1 != 2
E Full diff:
E - [-1, -1]
E + [2, 2]
{code}

According to the above, in the Ozone build, the result of "show column stats" 
was [-1, -1]. It looks like the NDV statistics are not available in the Ozone 
build.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

