[jira] [Resolved] (IMPALA-12554) Create only one Ranger policy for GRANT statement
[ https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao resolved IMPALA-12554.
----------------------------------
    Fix Version/s: Impala 4.5.0
   Target Version: Impala 4.5.0
       Resolution: Fixed

> Create only one Ranger policy for GRANT statement
> -------------------------------------------------
>
>                 Key: IMPALA-12554
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12554
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>             Fix For: Impala 4.5.0
>
> Currently Impala creates a separate Ranger policy for each column specified in a
> GRANT statement. For instance, after the following query, 3 Ranger policies
> would be created on the Ranger server. This can result in a large number of
> policies when many columns are specified, and it may cause Impala's Ranger
> plug-in to take a long time to download the policies from the Ranger server.
> It would be better if Impala created only a single policy for columns in the
> same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.67s
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
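The difference between the old and new behavior can be sketched as follows. This is illustrative only, not Impala's actual code: the helper names and the request dictionaries are hypothetical stand-ins for whatever Impala's Ranger plug-in sends; the point is simply one request covering all columns instead of one request per column.

```python
# Hypothetical sketch: consolidate the columns of a GRANT into a single
# policy request instead of one request per column.

def build_policy_requests(db, table, columns, grantee):
    """Fixed behavior: one request covering every column of the same table."""
    return [{
        "database": db,
        "table": table,
        "columns": list(columns),  # all columns in a single policy
        "grantee": grantee,
    }]

def build_policy_requests_old(db, table, columns, grantee):
    """Old behavior: one request (and thus one Ranger policy) per column."""
    return [{"database": db, "table": table, "columns": [c], "grantee": grantee}
            for c in columns]
```

For the GRANT in the issue, the old code produced three policies while the fix produces one, which keeps the policy count (and the plug-in's download time) proportional to tables rather than columns.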
[jira] [Resolved] (IMPALA-13482) Calcite planner: Bug fixes for an analytics.test
[ https://issues.apache.org/jira/browse/IMPALA-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Smith resolved IMPALA-13482.
------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Calcite planner: Bug fixes for an analytics.test
> ------------------------------------------------
>
>                 Key: IMPALA-13482
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13482
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Steve Carlin
>            Priority: Major
>             Fix For: Impala 4.5.0
>
> Specifically, the following query had a couple of issues that needed fixing:
> select lag(coalesce(505, 1 + NULL), 1) over (order by int_col desc)
> from alltypestiny
[jira] [Created] (IMPALA-13539) TestCalcitePlanner.test_calcite_frontend fails on non-HDFS test jobs
Joe McDonnell created IMPALA-13539:
--------------------------------------

             Summary: TestCalcitePlanner.test_calcite_frontend fails on non-HDFS test jobs
                 Key: IMPALA-13539
                 URL: https://issues.apache.org/jira/browse/IMPALA-13539
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.5.0
            Reporter: Joe McDonnell
            Assignee: Joe McDonnell

On S3 and other non-HDFS jobs, TestCalcitePlanner.test_calcite_frontend fails with this:
{noformat}
custom_cluster/test_calcite_planner.py:40: in test_calcite_frontend
    self.run_test_case('QueryTest/calcite', vector, use_db=unique_database)
common/impala_test_suite.py:849: in run_test_case
    self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:656: in __verify_results_and_errors
    replace_filenames_with_placeholder)
common/test_result_verifier.py:520: in verify_raw_results
    VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:290: in verify_query_result_is_subset
    unicode(expected_row), unicode(actual_results))
E   AssertionError: Could not find expected row row_regex:.*00:SCAN HDFS.* in actual rows:
E    '   S3 partitions=4/4 files=4 size=460B'
E    '   row-size=89B cardinality=8'
E    '00:SCAN S3 [functional.alltypestiny]'
E    '01:EXCHANGE [UNPARTITIONED]'
E    'PLAN-ROOT SINK'
E    '|'
E    '|'
{noformat}
It is looking for SCAN HDFS, but non-HDFS filesystems will have a different message. We should change what it expects.
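One plausible shape for the fix (a sketch, not the actual patch) is to loosen the expectation from `00:SCAN HDFS` to any scan target, so the same `.test` expectation holds on HDFS, S3, Ozone, and so on:

```python
import re

# Sketch: a filesystem-agnostic pattern for the plan's scan line.
# The failing expectation was '.*00:SCAN HDFS.*'; matching any uppercase
# scan target keeps the row_regex valid on every filesystem.
SCAN_RE = re.compile(r".*00:SCAN [A-Z0-9]+.*")

def matches_scan_line(line):
    return SCAN_RE.match(line) is not None
```

The same relaxed pattern would go into the `row_regex:` line of QueryTest/calcite.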
[jira] [Created] (IMPALA-13538) Support query plan for imported query profiles
Surya Hebbar created IMPALA-13538:
-------------------------------------

             Summary: Support query plan for imported query profiles
                 Key: IMPALA-13538
                 URL: https://issues.apache.org/jira/browse/IMPALA-13538
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Surya Hebbar
            Assignee: Surya Hebbar

Currently, only text plan rendering is supported for imported query profiles. It would be useful to also support the enhanced SVG rendering of the query plan for imported profiles.
[jira] [Created] (IMPALA-13537) TestPartitionDeletion.test_local_catalog_with_event_processing fails in some builds
Daniel Becker created IMPALA-13537:
--------------------------------------

             Summary: TestPartitionDeletion.test_local_catalog_with_event_processing fails in some builds
                 Key: IMPALA-13537
                 URL: https://issues.apache.org/jira/browse/IMPALA-13537
             Project: IMPALA
          Issue Type: Bug
            Reporter: Daniel Becker
            Assignee: Quanlong Huang

test_partition.TestPartitionDeletion.test_local_catalog_with_event_processing fails in some of our builds:
{code:java}
custom_cluster/test_partition.py:127: in test_local_catalog_with_event_processing
    self._test_partition_deletion(unique_database)
custom_cluster/test_partition.py:162: in _test_partition_deletion
    self.assert_catalogd_log_contains("INFO", deletion_log_regex.format(tbl, i))
common/impala_test_suite.py:1341: in assert_catalogd_log_contains
    daemon, level, line_regex, expected_count, timeout_s, dry_run)
common/impala_test_suite.py:1380: in assert_log_contains
    (expected_count, log_file_path, line_regex, found, line)
E   AssertionError: Expected 1 lines in file /data0/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.impala-ec2-centos79-m6i-4xlarge-xldisk-1323.vpc.cloudera.com.jenkins.log.INFO.20241103-192936.15236 matching regex 'Collected . partition deletion.*HDFS_PARTITION:test_local_catalog_with_event_processing_4fbd8416.part_tbl:.*p=3', but found 0 lines. Last line was:
E   I1103 19:29:55.873072 16894 TableLoader.java:177] Loaded metadata for: test_local_catalog_with_event_processing_4fbd8416.part_tbl (68ms)
{code}
[jira] [Created] (IMPALA-13536) Tests failing in TestWorkloadManagementInitWait
Daniel Becker created IMPALA-13536:
--------------------------------------

             Summary: Tests failing in TestWorkloadManagementInitWait
                 Key: IMPALA-13536
                 URL: https://issues.apache.org/jira/browse/IMPALA-13536
             Project: IMPALA
          Issue Type: Bug
            Reporter: Daniel Becker
            Assignee: Michael Smith

Some of our tests failed in the test class test_workload_mgmt_init.py::TestWorkloadManagementInitWait.

test_upgrade_1_0_0_to_1_1_0():
{code:java}
custom_cluster/test_workload_mgmt_init.py:222: in test_upgrade_1_0_0_to_1_1_0
    self.check_schema_version("1.1.0")
custom_cluster/test_workload_mgmt_init.py:129: in check_schema_version
    self.assert_table_prop(tbl_name, "wm_schema_version", schema_version)
custom_cluster/test_workload_mgmt_init.py:104: in assert_table_prop
    assert found, "did not find expected table prop '{}' with value '{}' on table " \
E   AssertionError: did not find expected table prop 'wm_schema_version' with value '1.1.0' on table 'sys.impala_query_log'
E   assert False
{code}
test_invalid_wm_schema_version_live_table_prop():
{code:java}
custom_cluster/test_workload_mgmt_init.py:375: in test_invalid_wm_schema_version_live_table_prop
    self._run_invalid_table_prop_test(self.QUERY_TBL_LIVE, "wm_schema_version")
custom_cluster/test_workload_mgmt_init.py:325: in _run_invalid_table_prop_test
    "found on the '{}' property of table '{}'".format(prop_name, table))
common/impala_test_suite.py:1351: in assert_catalogd_log_contains
    daemon, level, line_regex, expected_count, timeout_s, dry_run)
common/impala_test_suite.py:1397: in assert_log_contains
    (expected_count, log_file_path, line_regex, found, line)
E   AssertionError: Expected 1 lines in file /data0/jenkins/workspace/impala-asf-master-core-ozone-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.impala-ec2-centos79-m6i-4xlarge-xldisk-126a.vpc.cloudera.com.jenkins.log.FATAL.20241107-042724.11427 matching regex 'could not parse version string '' found on the 'wm_schema_version' property of table 'sys.impala_query_live'', but found 0 lines. Last line was:
E   . Impalad exiting.
{code}
[jira] [Created] (IMPALA-13535) Add script to restore stats on PlannerTest files
Riza Suminto created IMPALA-13535:
-------------------------------------

             Summary: Add script to restore stats on PlannerTest files
                 Key: IMPALA-13535
                 URL: https://issues.apache.org/jira/browse/IMPALA-13535
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend, Test
            Reporter: Riza Suminto
            Assignee: Riza Suminto

We have several PlannerTests that validate against the EXTENDED profile and validate cardinality. At the EXTENDED level, the profile displays stored table stats from HMS, like 'numRows' and 'totalSize', which can vary between data loads. These stats are not validated by PlannerTest and will not fail the test, but frequent changes to these lines can disturb the code review process because they are mostly noise. We need a script to help ease restoring the stored table stats information in those .test files.
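The core of such a script could look like the sketch below. It is purely illustrative (the stat names 'numRows' and 'totalSize' come from the issue description; the line-by-line comparison against a reference copy is an assumed approach): revert only lines whose sole difference from the reference is a stored stat, leaving real plan changes alone.

```python
import re

# Assumed stat fields from the issue description; extend as needed.
STATS_RE = re.compile(r"(numRows|totalSize)=\S+")

def restore_stats_lines(current_lines, reference_lines):
    """Restore noisy stored-stats values in a .test file to the reference
    values, keeping every line that differs in any other way.

    Simplification: assumes both versions have the same number of lines.
    """
    restored = []
    for cur, ref in zip(current_lines, reference_lines):
        # Revert only if masking the stat values makes the lines identical,
        # i.e. the stored stats are the *only* difference.
        if STATS_RE.search(cur) and STATS_RE.sub("X", cur) == STATS_RE.sub("X", ref):
            restored.append(ref)
        else:
            restored.append(cur)
    return restored
```

Run against `git show HEAD:<file>` output as the reference, this would strip the stats-only churn from a diff while preserving genuine plan changes for review.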
[jira] [Created] (IMPALA-13534) Support CTE in distributed plan
Michael Smith created IMPALA-13534:
--------------------------------------

             Summary: Support CTE in distributed plan
                 Key: IMPALA-13534
                 URL: https://issues.apache.org/jira/browse/IMPALA-13534
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend, Frontend
            Reporter: Michael Smith
            Assignee: Michael Smith

Add planner support for generating CTE producer fragments, removing the sequence in the distributed plan. The CTE buffer must remain active until all consumers have finished reading.
[jira] [Created] (IMPALA-13533) Execute single-node CTE plans
Michael Smith created IMPALA-13533:
--------------------------------------

             Summary: Execute single-node CTE plans
                 Key: IMPALA-13533
                 URL: https://issues.apache.org/jira/browse/IMPALA-13533
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
            Reporter: Michael Smith
            Assignee: Michael Smith

Add backend support for CTE plans.
* Add PlanNode/ExecNode for CTE producer, consumer, and sequence nodes.
* Implement backend node-local dataflow from CTE producers to consumers. Use BufferedTupleStream to buffer CTE producer results until each consumer is ready to process them; initially this will be a batch operation, where the CTE producer puts all results into the BufferedTupleStream, then consumers can start reading them.
* Should be able to execute all TPC-DS queries.
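The batch dataflow described in the second bullet can be sketched as follows. This is a toy model in Python purely to illustrate the ordering contract (the real implementation is C++ around BufferedTupleStream; the class and method names here are invented):

```python
# Toy model of the initial batch CTE dataflow: the producer materializes
# every row batch into a shared buffer first; consumers may only start
# reading after the producer has finished (no interleaving).
class CteBuffer:
    def __init__(self):
        self._batches = []
        self._closed = False

    def add_batch(self, batch):
        assert not self._closed, "producer already finished"
        self._batches.append(batch)

    def close(self):
        # Producer signals that all results are materialized.
        self._closed = True

    def read_all(self):
        # Each consumer reads the full buffered result independently.
        assert self._closed, "consumers must wait until the producer finishes"
        return list(self._batches)
```

Because every consumer re-reads the same buffered batches, the buffer (like the real CTE buffer in IMPALA-13534) has to stay alive until the last consumer is done.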
[jira] [Created] (IMPALA-13532) Add concurrent unpinned reader support to BufferedTupleStream
Michael Smith created IMPALA-13532:
--------------------------------------

             Summary: Add concurrent unpinned reader support to BufferedTupleStream
                 Key: IMPALA-13532
                 URL: https://issues.apache.org/jira/browse/IMPALA-13532
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
            Reporter: Michael Smith

Modify BufferedTupleStream to support multiple concurrent unpinned readers. Currently it requires the buffer to be pinned to support concurrent readers. This primarily involves additional locking around pinning/unpinning to ensure concurrent readers don't cause conflicts.
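The locking involved can be illustrated with a small reference-counted pin sketch. This is not BufferedTupleStream's actual design, just a minimal model of the invariant the extra locking has to maintain: a page may only be evicted once no reader still holds a pin on it.

```python
import threading

# Sketch: reference-counted pinning so multiple readers can safely
# pin/unpin a shared page. The lock makes increment/decrement and the
# eviction decision atomic with respect to concurrent readers.
class PageHandle:
    def __init__(self):
        self._lock = threading.Lock()
        self._pin_count = 0

    def pin(self):
        with self._lock:
            self._pin_count += 1

    def unpin(self):
        with self._lock:
            assert self._pin_count > 0
            self._pin_count -= 1
            return self._pin_count == 0  # True => page may now be evicted

    def pinned(self):
        with self._lock:
            return self._pin_count > 0
```

Without the lock, two readers unpinning concurrently could both observe a non-zero count, and the page would never be flagged as evictable (or be evicted twice).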
[jira] [Created] (IMPALA-13531) Implement CTE identification in Impala Calcite planner
Michael Smith created IMPALA-13531:
--------------------------------------

             Summary: Implement CTE identification in Impala Calcite planner
                 Key: IMPALA-13531
                 URL: https://issues.apache.org/jira/browse/IMPALA-13531
             Project: IMPALA
          Issue Type: Sub-task
          Components: Frontend
            Reporter: Michael Smith

Port HIVE-28259 CTE identification to the Impala Calcite planner. That should be sufficient to produce CTE PlanNodes in a single-node plan that we can verify by reviewing the plan produced. The plan is not expected to be executable.

Add query options to enable/disable CTEs.
[jira] [Created] (IMPALA-13530) Calcite planner: support decimal_v1 query option
Steve Carlin created IMPALA-13530:
-------------------------------------

             Summary: Calcite planner: support decimal_v1 query option
                 Key: IMPALA-13530
                 URL: https://issues.apache.org/jira/browse/IMPALA-13530
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Created] (IMPALA-13529) Calcite planner: need to support query option appx_count_distinct
Steve Carlin created IMPALA-13529:
-------------------------------------

             Summary: Calcite planner: need to support query option appx_count_distinct
                 Key: IMPALA-13529
                 URL: https://issues.apache.org/jira/browse/IMPALA-13529
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Created] (IMPALA-13528) Calcite planner: run unsupported query options through original planner
Steve Carlin created IMPALA-13528:
-------------------------------------

             Summary: Calcite planner: run unsupported query options through original planner
                 Key: IMPALA-13528
                 URL: https://issues.apache.org/jira/browse/IMPALA-13528
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Resolved] (IMPALA-13494) Calcite planner: group_concat failing with distinct
[ https://issues.apache.org/jira/browse/IMPALA-13494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Smith resolved IMPALA-13494.
------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Calcite planner: group_concat failing with distinct
> ---------------------------------------------------
>
>                 Key: IMPALA-13494
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13494
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Steve Carlin
>            Priority: Major
>             Fix For: Impala 4.5.0
>
> The following query is failing in distinct.test:
> select sum(len_orderkey), sum(len_comment)
> from (
>   select
>     length(group_concat(distinct cast(l_orderkey as string))) len_orderkey,
>     length(group_concat(distinct(l_comment))) len_comment
>   from tpch.lineitem
>   group by l_comment
> ) v
[jira] [Resolved] (IMPALA-13510) Unset the environment variable for tuple cache tests
[ https://issues.apache.org/jira/browse/IMPALA-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Smith resolved IMPALA-13510.
------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Unset the environment variable for tuple cache tests
> ----------------------------------------------------
>
>                 Key: IMPALA-13510
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13510
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Yida Wu
>            Assignee: Yida Wu
>            Priority: Major
>             Fix For: Impala 4.5.0
>
> The test_cache_disabled test case would fail in the tuple cache build because
> the build enables the tuple cache using environment variables, while the
> test case requires the tuple cache to remain disabled. Unsetting the related
> environment variables resolves the failure.
[jira] [Resolved] (IMPALA-13512) Print .test file name if PlannerTest fail
[ https://issues.apache.org/jira/browse/IMPALA-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Smith resolved IMPALA-13512.
------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Print .test file name if PlannerTest fail
> -----------------------------------------
>
>                 Key: IMPALA-13512
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13512
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Test
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Minor
>             Fix For: Impala 4.5.0
>
> If a PlannerTest fails, the error message hints at which test case failed by
> printing the section and line number, like this:
> {code:java}
> Error Message
> Section PLAN of query at line 239: {code}
> This can be improved by also printing the path of the .test file that failed,
> like this:
> {code:java}
> Error Message
> Section PLAN of query at
> functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:239: {code}
> PlannerTest should also skip printing the VERBOSE plan if
> PlannerTestOption.EXTENDED_EXPLAIN is specified, since the EXTENDED level already
> contains sufficient details, including tuples, sizes, and cardinality.
[jira] [Resolved] (IMPALA-13505) NullPointerException in Analyzer.resolveActualPath with Calcite planner
[ https://issues.apache.org/jira/browse/IMPALA-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Smith resolved IMPALA-13505.
------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> NullPointerException in Analyzer.resolveActualPath with Calcite planner
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-13505
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13505
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Michael Smith
>            Assignee: Jason Fehr
>            Priority: Major
>              Labels: calcite
>             Fix For: Impala 4.5.0
>
> Encountered a NullPointerException when running some TPC-DS queries (such as
> q8) with the Calcite planner:
> {code:java}
> Stack Trace: java.lang.NullPointerException
>     at org.apache.impala.analysis.Analyzer.lambda$resolveActualPath$18(Analyzer.java:4699)
>     at java.util.Collections$2.tryAdvance(Collections.java:4719)
>     at java.util.Collections$2.forEachRemaining(Collections.java:4727)
>     at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
>     at org.apache.impala.analysis.Analyzer.resolveActualPath(Analyzer.java:4690)
>     at org.apache.impala.analysis.Analyzer.lambda$addColumnsTo$17(Analyzer.java:4655)
>     at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>     at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>     at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>     at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
>     at org.apache.impala.analysis.Analyzer.addColumnsTo(Analyzer.java:4655)
>     at org.apache.impala.analysis.Analyzer.addJoinColumns(Analyzer.java:4732)
>     at org.apache.impala.planner.JoinNode.init(JoinNode.java:293)
>     at org.apache.impala.planner.HashJoinNode.init(HashJoinNode.java:82)
>     at org.apache.impala.calcite.rel.phys.ImpalaHashJoinNode.(ImpalaHashJoinNode.java:46)
>     ...
> {code}
> {{SlotRef.getResolvedPath}} returns null at line 4699. Looking at the
> SlotRef, I don't see any way to determine an origin, so this may be part of an
> incomplete implementation of the Calcite planner integration.
> To reproduce:
> {code:java}
> $ start-impala-cluster.py -s 1 --use_calcite_planner=true
> $ impala-py.test tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8
> {code}
> *Analysis:*
> The root issue with this particular query is that it contains a very lengthy
> [list of zip codes|https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test#L12-L62]
> that are used in a where clause. The Calcite planner is producing this join
> node for that where clause:
> {noformat}
> 05:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: substring(tpcds.customer_address.ca_zip, 1, 5) = EXPR$0
> |  fk/pk conjuncts: assumed fk/pk
> |  runtime filters: RF002[bloom] <- EXPR$0
> {noformat}
> Since EXPR$0 is not a named column, it has a null resolved path and can be
> skipped.
[jira] [Created] (IMPALA-13527) Duplicate S3 Paths for Workload Management Iceberg Table
Jason Fehr created IMPALA-13527:
-----------------------------------

             Summary: Duplicate S3 Paths for Workload Management Iceberg Table
                 Key: IMPALA-13527
                 URL: https://issues.apache.org/jira/browse/IMPALA-13527
             Project: IMPALA
          Issue Type: Bug
            Reporter: Jason Fehr
            Assignee: Jason Fehr

Error writing completed queries to table sys.impala_query_log:
{noformat}
User: impala
State: EXCEPTION
Status: IcebergTableLoadingException: Error loading metadata for Iceberg table s3a://datalake-name/warehouse/tablespace/external/hive/sys.db/impala_query_log
CAUSED BY: TableLoadingException: Failed to load metadata for table: sys.impala_query_log
CAUSED BY: IllegalArgumentException: Multiple entries with same key: data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq=FileDescriptor{RelativePath=data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq, Length=4967583, Compression=NONE, ModificationTime=1, Blocks=} and data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq=FileDescriptor{RelativePath=data/cluster_id=my-cluster/start_time_utc_hour=2024-10-02-09/723531a60b0fa6a7-a526e67b_3520834245_data.0.parq, Length=4967583, Compression=NONE, ModificationTime=1, Blocks=}. To index multiple values under a key, use Multimaps.index.
{noformat}
[jira] [Resolved] (IMPALA-13463) Impala should ignore case of Iceberg schema elements
[ https://issues.apache.org/jira/browse/IMPALA-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-13463.
----------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Impala should ignore case of Iceberg schema elements
> ----------------------------------------------------
>
>                 Key: IMPALA-13463
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13463
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.5.0
>
> Schema is case insensitive in Impala.
> Via Spark it's possible to create schema elements with upper/lower case
> letters and store them in the metadata JSON files of Iceberg, e.g.:
> {noformat}
>    "schemas" : [ {
>      "type" : "struct",
>      "schema-id" : 0,
>      "fields" : [ {
>        "id" : 1,
>        "name" : "ID",
>        "required" : false,
>        "type" : "string"
>      }, {
>        "id" : 2,
>        "name" : "OWNERID",
>        "required" : false,
>        "type" : "string"
>      } ]
>    } ],
> {noformat}
> This can cause problems in Impala during predicate pushdown, as we can get a
> ValidationException from the Iceberg library (as Impala pushes down
> predicates with lower-case column names, while Iceberg sees upper-case names).
> We should invoke Scan.caseSensitive(boolean caseSensitive) on the TableScan
> object to set case insensitivity.
[jira] [Created] (IMPALA-13526) Inconsistent Agg node stats recomputation.
Riza Suminto created IMPALA-13526:
-------------------------------------

             Summary: Inconsistent Agg node stats recomputation.
                 Key: IMPALA-13526
                 URL: https://issues.apache.org/jira/browse/IMPALA-13526
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.4.0
            Reporter: Riza Suminto
            Assignee: Riza Suminto

Within DistributedPlanner.java, there are several places where the planner needs to insert an extra merge aggregation node. This requires transferring HAVING conjuncts from the preaggregation node to the merge aggregation node, unsetting the limit, and recomputing the stats of the preaggregation node. However, the stats recomputation is not done consistently, and there may be inefficient recomputes happening.

Example of an inefficient recompute:
https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L1074-L1077

Example of a missing recompute for phase2AggNode:
https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L1143-L1168
[jira] [Created] (IMPALA-13525) Calcite planner: handle escaped characters in string literal
Steve Carlin created IMPALA-13525:
-------------------------------------

             Summary: Calcite planner: handle escaped characters in string literal
                 Key: IMPALA-13525
                 URL: https://issues.apache.org/jira/browse/IMPALA-13525
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Resolved] (IMPALA-13193) RuntimeFilter on parquet dictionary should evaluate null values
[ https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-13193.
-------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

Resolving this. Thanks [~tangzhi]!

> RuntimeFilter on parquet dictionary should evaluate null values
> ---------------------------------------------------------------
>
>                 Key: IMPALA-13193
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13193
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0, Impala 4.4.0
>            Reporter: Quanlong Huang
>            Assignee: Zhi Tang
>            Priority: Critical
>              Labels: correctness
>             Fix For: Impala 4.5.0
>
> IMPALA-10910 and IMPALA-5509 introduced an optimization to evaluate runtime
> filters on parquet dictionary values. If none of the values can pass the check,
> the whole row group is skipped. However, NULL values are not included in
> the parquet dictionary, so runtime filters that accept NULL values might
> incorrectly reject the row group if none of the dictionary values pass
> the check.
> Here are steps to reproduce the bug:
> {code:sql}
> create table parq_tbl (id bigint, name string) stored as parquet;
> insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
> create table dim_tbl (name string);
> insert into dim_tbl values (NULL);
> select * from parq_tbl p join dim_tbl d
> on COALESCE(p.name, '') = COALESCE(d.name, '');
> {code}
> The SELECT query should return 2 rows, but it currently returns 0 rows.
> A workaround is to disable this optimization:
> {code:sql}
> set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;
> {code}
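The skip decision can be modeled in a few lines. This is an illustrative Python sketch of the logic described above, not Impala's C++ code; the function and parameter names are invented. The key point is the second branch: if the column contains NULLs and the filter would accept a NULL row, the dictionary check alone must not reject the row group.

```python
# Sketch of the dictionary runtime-filter check. NULLs are never stored in
# a Parquet dictionary, so a filter that accepts NULL rows must not let the
# dictionary-only check skip the whole row group.
def can_skip_row_group(dictionary_values, filter_accepts, filter_accepts_null,
                       column_has_nulls):
    if any(filter_accepts(v) for v in dictionary_values):
        return False  # some dictionary value passes: must read the row group
    if column_has_nulls and filter_accepts_null:
        return False  # fixed behavior: NULL rows might pass the filter
    return True       # no value can possibly pass: safe to skip
```

In the repro above, the dictionary for `name` holds only "abc" while the filter (built from COALESCE(name, '') = '') accepts exactly the NULL rows; the buggy version omitted the second branch and skipped the group.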
[jira] [Created] (IMPALA-13524) Calcite planner: support for functions in exprs.test
Steve Carlin created IMPALA-13524:
-------------------------------------

             Summary: Calcite planner: support for functions in exprs.test
                 Key: IMPALA-13524
                 URL: https://issues.apache.org/jira/browse/IMPALA-13524
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Created] (IMPALA-13523) Calcite planner: Derive default decimal precision for functions
Steve Carlin created IMPALA-13523:
-------------------------------------

             Summary: Calcite planner: Derive default decimal precision for functions
                 Key: IMPALA-13523
                 URL: https://issues.apache.org/jira/browse/IMPALA-13523
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Created] (IMPALA-13522) Calcite planner: "real" type should be treated as double
Steve Carlin created IMPALA-13522:
-------------------------------------

             Summary: Calcite planner: "real" type should be treated as double
                 Key: IMPALA-13522
                 URL: https://issues.apache.org/jira/browse/IMPALA-13522
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Created] (IMPALA-13521) Calcite Planner: Handle functions taking literal types from a lower level
Steve Carlin created IMPALA-13521:
-------------------------------------

             Summary: Calcite Planner: Handle functions taking literal types from a lower level
                 Key: IMPALA-13521
                 URL: https://issues.apache.org/jira/browse/IMPALA-13521
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Created] (IMPALA-13520) Calcite planner: support in clause coercing
Steve Carlin created IMPALA-13520:
-------------------------------------

             Summary: Calcite planner: support in clause coercing
                 Key: IMPALA-13520
                 URL: https://issues.apache.org/jira/browse/IMPALA-13520
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Steve Carlin
[jira] [Created] (IMPALA-13519) Calcite Planner Does Not Set Workload Management Data
Jason Fehr created IMPALA-13519:
-----------------------------------

             Summary: Calcite Planner Does Not Set Workload Management Data
                 Key: IMPALA-13519
                 URL: https://issues.apache.org/jira/browse/IMPALA-13519
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Jason Fehr
            Assignee: Steve Carlin

Workload management writes the tables queried and the select, join, where, aggregate, and order by columns to the sys.impala_query_log table. This data is determined by the frontend planner and returned to the backend coordinator via the [TExecRequest object|https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/common/thrift/Frontend.thrift#L702-L718]. The Calcite planner needs to provide this information as well on the TExecRequest object it creates.

The [test_workload_mgmt_sql_details.py|https://github.com/apache/impala/blob/master/tests/custom_cluster/test_workload_mgmt_sql_details.py] custom cluster test is the best test to determine whether Calcite is generating the correct values for each piece of data.
[jira] [Resolved] (IMPALA-12758) Event Processor is ignoring the prev_id while reloading the existing partitions
[ https://issues.apache.org/jira/browse/IMPALA-12758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala resolved IMPALA-12758.
--------------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Event Processor is ignoring the prev_id while reloading the existing partitions
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-12758
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12758
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Sai Hemanth Gantasala
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: catalog-2024
>             Fix For: Impala 4.5.0
>
> Insert events on a partitioned table consumed by the event processor reload
> the partitions to update file metadata. Currently, while reloading the file
> metadata, the 'prev_id' of the old partitions is ignored in the partition
> builder:
> {code:java}
> HdfsPartition oldPartition = entry.getValue();
> HdfsPartition.Builder partBuilder = createPartitionBuilder(
>     hmsPartition.getSd(), hmsPartition, permissionCache);
> {code}
> As a result, the 'prev_id' of the partBuilder will always be -1, so when the
> catalog delta is sent from the statestore to the impala daemons, because
> prev_id is not valid, impalads will not know whether to invalidate the
> current partition and then request the new partition information.
> This might lead to data correctness issues.
[jira] [Resolved] (IMPALA-13148) Show the number of in-progress Catalog operations
[ https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saurabh Katiyal resolved IMPALA-13148.
--------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Show the number of in-progress Catalog operations
> -------------------------------------------------
>
>                 Key: IMPALA-13148
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13148
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Quanlong Huang
>            Assignee: Saurabh Katiyal
>            Priority: Major
>              Labels: newbie, ramp-up
>             Fix For: Impala 4.5.0
>
>         Attachments: Selection_122.png, Selection_123.png
>
> The /operations page of the catalogd WebUI shows the list of in-progress
> Catalog operations. It'd be helpful to also show the number of such
> operations, like the /queries page of the coordinator WebUI, which shows
> e.g. "100 queries in flight".
[jira] [Created] (IMPALA-13518) Show target name of COMMIT_TXN events in logs
Quanlong Huang created IMPALA-13518:
---------------------------------------

             Summary: Show target name of COMMIT_TXN events in logs
                 Key: IMPALA-13518
                 URL: https://issues.apache.org/jira/browse/IMPALA-13518
             Project: IMPALA
          Issue Type: Task
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang

IMPALA-12460 adds logs about the Top-10 expensive events and Top-10 expensive targets. However, for COMMIT_TXN events, the target is just "CLUSTER_WIDE":
{noformat}
Top 9 targets in event processing: (target=CLUSTER_WIDE, duration_ms=955792)
{noformat}
It'd be helpful to show the names of the tables involved in the transaction.
[jira] [Resolved] (IMPALA-13502) Constructor cleanup
[ https://issues.apache.org/jira/browse/IMPALA-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-13502. Fix Version/s: Impala 4.5.0 Resolution: Fixed > Constructor cleanup > --- > > Key: IMPALA-13502 > URL: https://issues.apache.org/jira/browse/IMPALA-13502 > Project: IMPALA > Issue Type: Task > Components: Backend >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Minor > Fix For: Impala 4.5.0 > > > Various cleanup ideas around constructors identified in IMPALA-12390. > - Replace {{const shared_ptr<>&}} with raw pointers to make the API simpler > and more general. > - LLVM CreateBinaryPhiNode, CodegenNullPhiNode, and CodegenIsNullPhiNode > should all make {{name}} mandatory. Remove empty name handling from > CreateBinaryPhiNode. > - Several TSaslServerTransport constructors may be unused. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13396) Unify temporary dir management in CustomClusterTestSuite
[ https://issues.apache.org/jira/browse/IMPALA-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto resolved IMPALA-13396. --- Fix Version/s: Impala 4.5.0 Resolution: Fixed > Unify temporary dir management in CustomClusterTestSuite > > > Key: IMPALA-13396 > URL: https://issues.apache.org/jira/browse/IMPALA-13396 > Project: IMPALA > Issue Type: Improvement > Components: Test >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.5.0 > > > There are many custom cluster tests that require creating a temporary directory. > The temporary directory typically lives within the scope of a test method and is > cleaned up afterwards. However, some tests create temporary directories directly and > forget to clean them up afterwards, leaving junk dirs under /tmp/ or $LOG_DIR. > We can unify the temporary directory management inside > CustomClusterTestSuite. Some arguments of CustomClusterTestSuite.with_args(), > such as 'impalad_args', 'catalogd_args', and 'impala_log_dir', should accept > a formatting pattern that is replaceable by a temporary dir path. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13517) Calcite Planner: fix || operator (both concat and or)
Steve Carlin created IMPALA-13517: - Summary: Calcite Planner: fix || operator (both concat and or) Key: IMPALA-13517 URL: https://issues.apache.org/jira/browse/IMPALA-13517 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13516) Calcite Planner: Fix explicit cast issues
Steve Carlin created IMPALA-13516: - Summary: Calcite Planner: Fix explicit cast issues Key: IMPALA-13516 URL: https://issues.apache.org/jira/browse/IMPALA-13516 Project: IMPALA Issue Type: Improvement Reporter: Steve Carlin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13515) ORC tables hit IllegalStateException due to "row__id" column
Joe McDonnell created IMPALA-13515: -- Summary: ORC tables hit IllegalStateException due to "row__id" column Key: IMPALA-13515 URL: https://issues.apache.org/jira/browse/IMPALA-13515 Project: IMPALA Issue Type: Sub-task Components: Frontend Reporter: Joe McDonnell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13334) test_sort.py hit DCHECK when max_sort_run_size>0
[ https://issues.apache.org/jira/browse/IMPALA-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto resolved IMPALA-13334. --- Fix Version/s: Impala 4.5.0 Target Version: Impala 4.5.0 Resolution: Fixed > test_sort.py hit DCHECK when max_sort_run_size>0 > > > Key: IMPALA-13334 > URL: https://issues.apache.org/jira/browse/IMPALA-13334 > Project: IMPALA > Issue Type: Bug > Components: Backend, Test >Reporter: Riza Suminto >Assignee: Noemi Pap-Takacs >Priority: Major > Fix For: Impala 4.5.0 > > > test_sort.py declares the 'max_sort_run_size' query option, but has silently not > been exercising it. Fixing the query option declaration using the helper function > add_exec_option_dimension() reveals a DCHECK failure in sorter.cc > {code:java} > F0827 16:45:38.425906 2405388 sorter.cc:1183] > 054e9b0e1fecdaaf:298af369] Check failed: !*allocation_failed && > unsorted_run_->run_size() == inmem_run_max_pages_{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13514) Consider a mode where the Status from a Java exception includes the stack trace
Joe McDonnell created IMPALA-13514: -- Summary: Consider a mode where the Status from a Java exception includes the stack trace Key: IMPALA-13514 URL: https://issues.apache.org/jira/browse/IMPALA-13514 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell JniUtil::GetJniExceptionMsg() takes the message from a Java exception and turns it into an error Status. It currently has a mode where it writes the exception's stack trace to the log file. It might be nice to have a mode where it includes the exception stack trace in the actual error Status message, which will go all the way to the client. When a user hits some ambiguous error (e.g. an InternalStateException with no message), the stack trace is useful for tracking down the code location. If there is a mode where the stack trace is in the error message, it eliminates the need to search through logs (which can be enormous). This is also useful for cases where tests are running in parallel. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13513) Calcite planner: support decode function
Steve Carlin created IMPALA-13513: - Summary: Calcite planner: support decode function Key: IMPALA-13513 URL: https://issues.apache.org/jira/browse/IMPALA-13513 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13512) Print .test file name if PlannerTest fails
Riza Suminto created IMPALA-13512: - Summary: Print .test file name if PlannerTest fails Key: IMPALA-13512 URL: https://issues.apache.org/jira/browse/IMPALA-13512 Project: IMPALA Issue Type: Improvement Components: Test Reporter: Riza Suminto Assignee: Riza Suminto If a PlannerTest fails, the error message shows a hint of which test case failed by printing the section and line number like this: {code:java} Error Message Section PLAN of query at line 239: {code} This can be improved by also printing the path to the .test file that failed, like this: {code:java} Error Message Section PLAN of query at functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:239: {code} PlannerTest should also skip printing the VERBOSE plan if PlannerTestOption.EXTENDED_EXPLAIN is specified, since the EXTENDED level already contains sufficient details, including tuples, sizes, and cardinality. -- This message was sent by Atlassian Jira (v8.20.10#820010)
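The improved message can be sketched with a tiny helper. The function name is hypothetical, and Impala's PlannerTest is Java; Python is used here only for illustration.

```python
def format_section_error(section, test_file, line_number):
    # Hypothetical sketch: include the .test file path, not just the line
    # number, so a failing test case can be located without guessing the file.
    return "Section %s of query at %s:%d" % (section, test_file, line_number)
```

For example, passing "PLAN", the runtime-filter-cardinality-reduction.test path, and 239 reproduces the improved message shown above.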
[jira] [Resolved] (IMPALA-13507) Add param to disable glog buffering in with_args fixture
[ https://issues.apache.org/jira/browse/IMPALA-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto resolved IMPALA-13507. --- Fix Version/s: Impala 4.5.0 Resolution: Fixed > Add param to disable glog buffering in with_args fixture > > > Key: IMPALA-13507 > URL: https://issues.apache.org/jira/browse/IMPALA-13507 > Project: IMPALA > Issue Type: Improvement > Components: Test >Affects Versions: Impala 4.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.5.0 > > > We have plenty of custom_cluster tests that assert against the content of Impala > daemon log files while the process is still running, using > assert_log_contains() and its wrappers. The method specifically mentions > disabling glog buffering ('-logbuflevel=-1'), but not all > custom_cluster tests do that. > {code:java} > def assert_log_contains(self, daemon, level, line_regex, expected_count=1, > timeout_s=6, > dry_run=False): > """ > Assert that the daemon log with specified level (e.g. ERROR, WARNING, > INFO) contains > expected_count lines with a substring matching the regex. When > expected_count is -1, > at least one match is expected. > Retries until 'timeout_s' has expired. The default timeout is the default > minicluster > log buffering time (5 seconds) with a one second buffer. > When using this method to check log files of running processes, the > caller should > make sure that log buffering has been disabled, for example by adding > '-logbuflevel=-1' to the daemon startup options or set timeout_s to a > value higher > than the log flush interval. Returns the result of the very last call > to line_regex.search or None if > expected_count is 0 or the line_regex did not match any lines. > """ {code} > This often results in flaky tests that are hard to triage and often neglected if they > do not frequently run in core exploration. 
> We can improve this by adding a boolean param to > CustomClusterTestSuite.with_args, say 'disable_log_buffering', for tests to > declare an intention to inspect log files in a live minicluster. If it is True, > start the minicluster with '-logbuflevel=-1' for all daemons. If it is False, log a > WARNING on any calls to assert_log_contains(). -- This message was sent by Atlassian Jira (v8.20.10#820010)
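The proposal above can be sketched as follows. This is a simplified stand-in, not the actual CustomClusterTestSuite code; the helper name is an assumption.

```python
def build_daemon_args(base_args, disable_log_buffering=False):
    """Hypothetical sketch: append '-logbuflevel=-1' when a test declares
    it will inspect logs of a live minicluster, so glog flushes each log
    line immediately instead of buffering it."""
    args = list(base_args)
    if disable_log_buffering and "-logbuflevel=-1" not in args:
        args.append("-logbuflevel=-1")
    return args
```

A with_args-style fixture could call this for every daemon's startup options when 'disable_log_buffering' is True.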
[jira] [Created] (IMPALA-13511) Calcite planner: support sub-millisecond datetime parts
Steve Carlin created IMPALA-13511: - Summary: Calcite planner: support sub-millisecond datetime parts Key: IMPALA-13511 URL: https://issues.apache.org/jira/browse/IMPALA-13511 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13468) Calcite planner: fix aggregation.test queries
[ https://issues.apache.org/jira/browse/IMPALA-13468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Carlin resolved IMPALA-13468. --- Resolution: Fixed > Calcite planner: fix aggregation.test queries > - > > Key: IMPALA-13468 > URL: https://issues.apache.org/jira/browse/IMPALA-13468 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13455) Calcite planner: convert expressions to normal form for performance
[ https://issues.apache.org/jira/browse/IMPALA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Carlin resolved IMPALA-13455. --- Resolution: Fixed > Calcite planner: convert expressions to normal form for performance > --- > > Key: IMPALA-13455 > URL: https://issues.apache.org/jira/browse/IMPALA-13455 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major > > Enables q13, q48 in tpcds -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13461) Calcite planner: Need some translation rules to get tpcds queries to work
[ https://issues.apache.org/jira/browse/IMPALA-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Carlin resolved IMPALA-13461. --- Resolution: Fixed > Calcite planner: Need some translation rules to get tpcds queries to work > - > > Key: IMPALA-13461 > URL: https://issues.apache.org/jira/browse/IMPALA-13461 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (IMPALA-13221) Calcite: fix and enable tpcds and tpch tests
[ https://issues.apache.org/jira/browse/IMPALA-13221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Carlin closed IMPALA-13221. - Resolution: Won't Fix Originally this Jira was set up as a checkpoint for Calcite, but this will come later with other tests. > Calcite: fix and enable tpcds and tpch tests > > > Key: IMPALA-13221 > URL: https://issues.apache.org/jira/browse/IMPALA-13221 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major > > As a minor milestone, making the tpcds and tpch queries work will show that > use case queries work in the Calcite framework. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13096) Cleanup Parser.jj for Calcite planner to only use supported syntax
[ https://issues.apache.org/jira/browse/IMPALA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Carlin resolved IMPALA-13096. --- Resolution: Fixed > Cleanup Parser.jj for Calcite planner to only use supported syntax > -- > > Key: IMPALA-13096 > URL: https://issues.apache.org/jira/browse/IMPALA-13096 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (IMPALA-9170) close idle connections without an associated session
[ https://issues.apache.org/jira/browse/IMPALA-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE closed IMPALA-9170. Resolution: Duplicate > close idle connections without an associated session > > > Key: IMPALA-9170 > URL: https://issues.apache.org/jira/browse/IMPALA-9170 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Xiaomin Zhang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > With the fix of IMPALA-7802, Impala can now close an idle connection after a > configured interval. But it still leaves open some connections which have no > associated sessions: > [https://github.com/cloudera/Impala/blob/cdh6.2.1/be/src/service/impala-server.cc#L2078] > if (it == connection_to_sessions_map_.end()) return false; > Some clients like HUE could use different connections to check the query > status or fetch results. In these cases, those connections have no associated > sessions and are not added to the connection_to_sessions_map. This caused > issues when we used Radware to load balance Impala, because Radware does not > send FIN to close an idle connection, but requires the backend to close idle > connections. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13510) Unset the environment variable for tuple cache tests
Yida Wu created IMPALA-13510: Summary: Unset the environment variable for tuple cache tests Key: IMPALA-13510 URL: https://issues.apache.org/jira/browse/IMPALA-13510 Project: IMPALA Issue Type: Bug Reporter: Yida Wu Assignee: Yida Wu The test_cache_disabled test case would fail in the tuple cache build because the build enables the tuple cache using environment variables, while the test case requires the tuple cache to remain disabled. Unsetting the related environment variables resolves the failure. -- This message was sent by Atlassian Jira (v8.20.10#820010)
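A minimal sketch of the unsetting step, assuming the relevant variables share a common prefix (the TUPLE_CACHE prefix is an assumption for illustration, not Impala's documented variable name):

```python
def without_env_vars(env, prefixes=("TUPLE_CACHE",)):
    """Hypothetical sketch: return a copy of 'env' with every variable
    matching one of the prefixes removed, leaving the original untouched."""
    return {key: value for key, value in env.items()
            if not any(key.startswith(p) for p in prefixes)}
```

A test fixture could apply this to os.environ before launching the cluster so the build's tuple-cache settings never leak into the test.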
[jira] [Created] (IMPALA-13509) Avoid duplicate deepcopy during hash partitioning in KrpcDataStreamSender
Csaba Ringhofer created IMPALA-13509: Summary: Avoid duplicate deepcopy during hash partitioning in KrpcDataStreamSender Key: IMPALA-13509 URL: https://issues.apache.org/jira/browse/IMPALA-13509 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer Currently all rows are deep copied twice: 1. to the RowBatch of the given channel 2. to an OutboundRowBatch when the collector RowBatch is at capacity Copying directly to an OutboundRowBatch could avoid some CPU work. This would also allow easier implementation of the following improvements: - deduplicate tuples similarly to broadcast/unpartitioned exchange (IMPALA-13225). - keep outbound row batch size below data_stream_sender_buffer_size even for var len data -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-12433) KrpcDataStreamSender could share some buffers between channels
[ https://issues.apache.org/jira/browse/IMPALA-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-12433. -- Fix Version/s: Impala 4.4.0 Resolution: Fixed > KrpcDataStreamSender could share some buffers between channels > -- > > Key: IMPALA-12433 > URL: https://issues.apache.org/jira/browse/IMPALA-12433 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Priority: Major > Labels: memory-saving, performance > Fix For: Impala 4.4.0 > > > Currently each channel has two outbound row batches and each of those have 2 > buffers, one for serialization and another for compression. > https://github.com/apache/impala/blob/0f55e551bc98843c79a9ec82582ddca237aa4fe9/be/src/runtime/row-batch.h#L100 > https://github.com/apache/impala/blob/0f55e551bc98843c79a9ec82582ddca237aa4fe9/be/src/runtime/krpc-data-stream-sender.cc#L236 > https://github.com/apache/impala/blob/0f55e551bc98843c79a9ec82582ddca237aa4fe9/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java#L81 > As serialization + compression is always done from the fragment instance > thread only one compression is done at a time, so a single compression buffer > could be shared between channels. If this buffer is sent via KRPC then it > could be swapped with the per channel buffer. > As far as I understand at least one buffer per channel is needed because > async KRPC calls can use it from another thread (this is done to avoid an > extra copy of the buffer before RPCs). We can only reuse that buffer after > getting a callback from KRPC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13296) Hive to Iceberg table-migration: pre-check column compatibility
[ https://issues.apache.org/jira/browse/IMPALA-13296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-13296. -- Fix Version/s: Impala 4.5.0 Resolution: Fixed > Hive to Iceberg table-migration: pre-check column compatibility > --- > > Key: IMPALA-13296 > URL: https://issues.apache.org/jira/browse/IMPALA-13296 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.5.0 > > > The table migration from a Hive table to an Iceberg table is a multi-step > process that has a middle step to rename the original table to a temp name. > If a later step fails, the user gets an error but the table remains renamed > (the user also gets a hint on how to set the original name back). > When the failure is column incompatibility (e.g. Iceberg doesn't support > Hive's smallint, tinyint, varchar( n ) column types) we can do better, because > this incompatibility could also be found during query analysis. That way the > error could be sent before we rename the table. This results in a cleaner > user experience. > {code:java} > Query: alter table hive_tbl convert to iceberg ERROR: > IllegalArgumentException: Unsupported Hive type: VARCHAR, use string instead > Your table might have been renamed. To reset the name try running: ALTER > TABLE default.hive_tbl_tmp_8fb36dff RENAME TO default.hive_tbl; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13508) Create SetMetric listing active subsystems in an Impala Daemon
Riza Suminto created IMPALA-13508: - Summary: Create SetMetric listing active subsystems in an Impala Daemon Key: IMPALA-13508 URL: https://issues.apache.org/jira/browse/IMPALA-13508 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Riza Suminto Impala has several subsystems that can be enabled/disabled through backend flags, such as Admission Control, Workload Management, HMS Event Processing, and so on. It would be great to have a SetMetric listing all active subsystems in an Impala daemon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13507) Add param to disable glog buffering in with_args fixture
Riza Suminto created IMPALA-13507: - Summary: Add param to disable glog buffering in with_args fixture Key: IMPALA-13507 URL: https://issues.apache.org/jira/browse/IMPALA-13507 Project: IMPALA Issue Type: Improvement Components: Test Affects Versions: Impala 4.4.0 Reporter: Riza Suminto Assignee: Riza Suminto We have plenty of custom_cluster tests that assert against the content of Impala daemon log files while the process is still running, using assert_log_contains() and its wrappers. The method specifically mentions disabling glog buffering ('-logbuflevel=-1'), but not all custom_cluster tests do that. {code:java} def assert_log_contains(self, daemon, level, line_regex, expected_count=1, timeout_s=6, dry_run=False): """ Assert that the daemon log with specified level (e.g. ERROR, WARNING, INFO) contains expected_count lines with a substring matching the regex. When expected_count is -1, at least one match is expected. Retries until 'timeout_s' has expired. The default timeout is the default minicluster log buffering time (5 seconds) with a one second buffer. When using this method to check log files of running processes, the caller should make sure that log buffering has been disabled, for example by adding '-logbuflevel=-1' to the daemon startup options or set timeout_s to a value higher than the log flush interval. Returns the result of the very last call to line_regex.search or None if expected_count is 0 or the line_regex did not match any lines. """ {code} This often results in flaky tests that are hard to triage and often neglected if they do not frequently run in core exploration. We can improve this by adding a boolean param to CustomClusterTestSuite.with_args, say 'disable_log_buffering', for tests to declare an intention to inspect log files in a live minicluster. If it is True, start the minicluster with '-logbuflevel=-1' for all daemons. If it is False, log a WARNING on any calls to assert_log_contains(). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-12390) Enable performance related clang-tidy checks
[ https://issues.apache.org/jira/browse/IMPALA-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-12390. Resolution: Fixed > Enable performance related clang-tidy checks > > > Key: IMPALA-12390 > URL: https://issues.apache.org/jira/browse/IMPALA-12390 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Fix For: Impala 4.5.0 > > > clang-tidy has several performance-related checks that seem like they would > be useful to enforce. Here are some examples: > {noformat} > /home/joemcdonnell/upstream/Impala/be/src/runtime/types.h:313:25: warning: > loop variable is copied but only used as const reference; consider making it > a const reference [performance-for-range-copy] > for (ColumnType child_type : col_type.children) { > ~~ ^ > const & > /home/joemcdonnell/upstream/Impala/be/src/catalog/catalog-util.cc:168:34: > warning: 'find' called with a string literal consisting of a single > character; consider using the more effective overload accepting a character > [performance-faster-string-find] > int pos = object_name.find("."); > ^~~~ > '.' > /home/joemcdonnell/upstream/Impala/be/src/util/decimal-util.h:55:53: warning: > the parameter 'b' is copied for each invocation but only used as a const > reference; consider making it a const reference > [performance-unnecessary-value-param] > static int256_t SafeMultiply(int256_t a, int256_t b, bool may_overflow) { > ^ > const & > /home/joemcdonnell/upstream/Impala/be/src/codegen/llvm-codegen.cc:847:5: > warning: 'push_back' is called inside a loop; consider pre-allocating the > vector capacity before the loop [performance-inefficient-vector-operation] > arguments.push_back(args_[i].type); > ^{noformat} > In all, they seem to flag things that developers wouldn't ordinarily notice, > and it doesn't seem to have too many false positives. 
We should look into > enabling these. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13505) NullPointerException in Analyzer.resolveActualPath with Calcite planner
Michael Smith created IMPALA-13505: -- Summary: NullPointerException in Analyzer.resolveActualPath with Calcite planner Key: IMPALA-13505 URL: https://issues.apache.org/jira/browse/IMPALA-13505 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Michael Smith Encountered a NullPointerException when running some TPC-DS queries (such as q8) with the Calcite planner: {code} Stack Trace:java.lang.NullPointerException at org.apache.impala.analysis.Analyzer.lambda$resolveActualPath$18(Analyzer.java:4699) at java.util.Collections$2.tryAdvance(Collections.java:4719) at java.util.Collections$2.forEachRemaining(Collections.java:4727) at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647) at org.apache.impala.analysis.Analyzer.resolveActualPath(Analyzer.java:4690) at org.apache.impala.analysis.Analyzer.lambda$addColumnsTo$17(Analyzer.java:4655) at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) at org.apache.impala.analysis.Analyzer.addColumnsTo(Analyzer.java:4655) at org.apache.impala.analysis.Analyzer.addJoinColumns(Analyzer.java:4732) at org.apache.impala.planner.JoinNode.init(JoinNode.java:293) at org.apache.impala.planner.HashJoinNode.init(HashJoinNode.java:82) at org.apache.impala.calcite.rel.phys.ImpalaHashJoinNode.(ImpalaHashJoinNode.java:46) ... 
{code} {{SlotRef.getResolvedPath}} returns null at line 4699. Looking at the SlotRef, I don't see any way to determine an origin, so this may be part of an incomplete implementation of the Calcite planner integration. To reproduce: {code} $ start-impala-cluster.py -s 1 --use_calcite_planner=true $ impala-py.test tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13506) Crash in RawValue::PrintValue() when running query_test/test_chars.py
Joe McDonnell created IMPALA-13506: -- Summary: Crash in RawValue::PrintValue() when running query_test/test_chars.py Key: IMPALA-13506 URL: https://issues.apache.org/jira/browse/IMPALA-13506 Project: IMPALA Issue Type: Sub-task Components: Frontend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell I ran into a crash with the Calcite planner when running test_chars.py. Here are reproducing conditions: {noformat} # Start Impala cluster with --use_calcite_planner=true # Connect to Impala using Beeswax bin/impala-shell.sh --protocol=beeswax # Run these statements use functional; WITH numbered AS ( SELECT *, row_number() over (order by cs) as rn FROM chars_tiny) SELECT * FROM ( SELECT CASE WHEN rn % 2 = 0 THEN cs END cs, CASE WHEN rn % 2 = 1 THEN cl END cl, CASE WHEN rn % 3 = 0 THEN vc END vc FROM numbered UNION ALL SELECT CASE WHEN rn % 2 = 1 THEN cs END cs, CASE WHEN rn % 2 = 0 THEN cl END cl, CASE WHEN rn % 3 = 1 THEN vc END vc FROM numbered) v{noformat} It hits this DCHECK with this stacktrace: {noformat} F1031 14:45:41.711074 2288125 raw-value.cc:471] 65447b8728b9f39a:cdb466c3] Check failed: string_val->Len() <= type.len 6 impalad!google::LogMessageFatal::~LogMessageFatal() [logging.cc : 2048 + 0x5] 7 impalad!impala::RawValue::PrintValue(void const*, impala::ColumnType const&, int, std::__cxx11::basic_stringstream, std::allocator >*, bool) [raw-value.cc : 471 + 0x16] 8 impalad!impala::AsciiQueryResultSet::AddRows(std::vector > const&, impala::RowBatch*, int, int) [query-result-set.cc : 222 + 0x1b] 9 impalad!impala::BufferedPlanRootSink::GetNext(impala::RuntimeState*, impala::QueryResultSet*, int, bool*, long) [buffered-plan-root-sink.cc : 239 + 0x1b] 10 impalad!impala::Coordinator::GetNext(impala::QueryResultSet*, int, bool*, long) [coordinator.cc : 1051 + 0x23] 11 impalad!impala::ClientRequestState::FetchRowsInternal(int, impala::QueryResultSet*, long) [client-request-state.cc : 1425 + 0x1f] 12 impalad!impala::ClientRequestState::FetchRows(int, 
impala::QueryResultSet*, long) [client-request-state.cc : 1272 + 0x18] 13 impalad!impala::ImpalaServer::FetchInternal(impala::TUniqueId, bool, int, beeswax::Results*) [impala-beeswax-server.cc : 688 + 0x20] 14 impalad!impala::ImpalaServer::fetch(beeswax::Results&, beeswax::QueryHandle const&, bool, int) [impala-beeswax-server.cc : 205 + 0x35]{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13504) AI functions should be included in FunctionCallExpr::isNondeterministicBuiltinFn
Joe McDonnell created IMPALA-13504: -- Summary: AI functions should be included in FunctionCallExpr::isNondeterministicBuiltinFn Key: IMPALA-13504 URL: https://issues.apache.org/jira/browse/IMPALA-13504 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell FunctionCallExpr::isNondeterministicBuiltinFn() is used to determine if a function can produce different results for different invocations with the same arguments in a single query. Currently, it applies to rand/random/uuid. It can influence sorting (see IMPALA-4728). It is also controls whether constant folding can be used (if all the arguments are constants). It would be uncommon for an AI function to be used on a constant, but it is theoretically possible. It seems like ai_generate_text / ai_generate_text_default should be on that list, because it isn't deterministic. -- This message was sent by Atlassian Jira (v8.20.10#820010)
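Why constant folding is unsafe for nondeterministic functions can be shown with a small sketch (generic Python, not Impala's actual evaluator): folding evaluates the call once and reuses the value, which silently changes per-row results.

```python
import random

def eval_per_row(rows, fn):
    # Correct: a nondeterministic function is re-evaluated for each row.
    return [fn() for _ in rows]

def eval_constant_folded(rows, fn):
    # Incorrect for rand()/uuid()-style functions: the "constant" is
    # computed once and every row sees the same value.
    folded = fn()
    return [folded for _ in rows]
```

This is why rand/random/uuid are excluded from folding, and the Jira argues ai_generate_text / ai_generate_text_default belong on the same list.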
[jira] [Created] (IMPALA-13503) Support CustomClusterTestSuite with single cluster for a class
Michael Smith created IMPALA-13503: -- Summary: Support CustomClusterTestSuite with single cluster for a class Key: IMPALA-13503 URL: https://issues.apache.org/jira/browse/IMPALA-13503 Project: IMPALA Issue Type: Task Reporter: Michael Smith Support creating custom cluster test classes where a single cluster is configured for the whole class, and re-used for individual test cases. This can significantly speed up certain types of custom cluster tests, such as tuple cache tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13502) Constructor cleanup
Michael Smith created IMPALA-13502: -- Summary: Constructor cleanup Key: IMPALA-13502 URL: https://issues.apache.org/jira/browse/IMPALA-13502 Project: IMPALA Issue Type: Task Components: Backend Reporter: Michael Smith Various cleanup ideas around constructors identified in IMPALA-12390. - Replace {{const shared_ptr<>&}} with raw pointers to make the API simpler and more general. - PrintIdSet in debug-util.h may be replaceable with PrintIdsInMultiLine for all existing use cases. - LLVM CreateBinaryPhiNode, CodegenNullPhiNode, and CodegenIsNullPhiNode should all make {{name}} mandatory. Remove empty name handling from CreateBinaryPhiNode. - Several TSaslServerTransport constructors may be unused. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13325) Use RowBatch::CopyRows in IcebergDeleteNode
[ https://issues.apache.org/jira/browse/IMPALA-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-13325. Fix Version/s: Impala 4.5.0 Resolution: Fixed > Use RowBatch::CopyRows in IcebergDeleteNode > --- > > Key: IMPALA-13325 > URL: https://issues.apache.org/jira/browse/IMPALA-13325 > Project: IMPALA > Issue Type: Improvement >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.5.0 > > > Typically there are much more data records than delete records in a healthy > Iceberg table. This means it is suboptimal to copy probe rows one by one in > the IcebergDeleteNode. We should use the new RowBatch::CopyRows method to > copy tuple rows in batches. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-12769) test_query_cancel_exception failed in ASAN build
[ https://issues.apache.org/jira/browse/IMPALA-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-12769. Fix Version/s: Impala 4.5.0 Resolution: Fixed > test_query_cancel_exception failed in ASAN build > > > Key: IMPALA-12769 > URL: https://issues.apache.org/jira/browse/IMPALA-12769 > Project: IMPALA > Issue Type: Bug >Reporter: David Rorke >Assignee: Michael Smith >Priority: Major > Fix For: Impala 4.5.0 > > > IMPALA-12493 added test_query_cancel_exception. It is failing in ASAN build > with following error: > {noformat} > Error Message > assert 1 == 0 > Stacktrace > webserver/test_web_pages.py:1044: in test_query_cancel_exception assert > response_json['num_in_flight_queries'] == 0 E assert 1 == 0 > {noformat} > This appears to be the same failure reported in IMPALA-12542 but for a > different test case. It's possible that the underlying cause is the same > (timing issue caused by slowness of ASAN build) and we just need to apply the > same fix from IMPALA-12542 to this test case. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13340) COPY TESTCASE in LocalCatalog mode doesn't dump the partition and file metadata
[ https://issues.apache.org/jira/browse/IMPALA-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-13340. - Fix Version/s: Impala 4.5.0 Resolution: Fixed Resolving this. Thank [~MikaelSmith] and [~jasonmfehr] for the review! > COPY TESTCASE in LocalCatalog mode doesn't dump the partition and file > metadata > --- > > Key: IMPALA-13340 > URL: https://issues.apache.org/jira/browse/IMPALA-13340 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.5.0 > > > IMPALA-11901 fixes the failures of using COPY TESTCASE statements in > LocalCatalog mode. However, only the table metadata is dumped, e.g. table > schema, column stats. The partition and file metadata are missing. > To reproduce the issue locally, start the Impala cluster in LocalCatalog mode. > {code:bash} > bin/start-impala-cluster.py --catalogd_args=--catalog_topic_mode=minimal > --impalad_args=--use_local_catalog{code} > Dump the metadata of a query on a partitioned table: > {noformat} > copy testcase to '/tmp' select * from functional_parquet.alltypes; > +--+ > | Test case data output path > | > +--+ > | > hdfs://localhost:20500/tmp/impala-testcase-data-c8316356-6448-4458-acad-c2f72f43c3e1 > | > +--+ > {noformat} > Check the metadata from the source cluster > {noformat} > show partitions functional_parquet.alltypes > +---+---+---++--+--+---+-+---+---+---+ > | year | month | #Rows | #Files | Size | Bytes Cached | Cache > Replication | Format | Incremental stats | Location > | EC Policy | > +---+---+---++--+--+---+-+---+---+---+ > | 2009 | 1 | -1| 1 | 8.60KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=1 | > NONE | > | 2009 | 2 | -1| 1 | 8.09KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=2 | > NONE | > | 2009 | 3 | 
-1| 1 | 8.60KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=3 | > NONE | > | 2009 | 4 | -1| 1 | 8.20KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=4 | > NONE | > | 2009 | 5 | -1| 1 | 8.55KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=5 | > NONE | > | 2009 | 6 | -1| 1 | 8.23KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=6 | > NONE | > | 2009 | 7 | -1| 1 | 8.25KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=7 | > NONE | > | 2009 | 8 | -1| 1 | 8.60KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=8 | > NONE | > | 2009 | 9 | -1| 1 | 8.41KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=9 | > NONE | > | 2009 | 10| -1| 1 | 8.60KB | NOT CACHED | NOT CACHED > | PARQUET | false | > hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=10 | > NONE | > | 2009 | 11|
[jira] [Resolved] (IMPALA-13497) Add profile counters for bytes written / read from the tuple cache
[ https://issues.apache.org/jira/browse/IMPALA-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-13497. Fix Version/s: Impala 4.5.0 Resolution: Fixed > Add profile counters for bytes written / read from the tuple cache > -- > > Key: IMPALA-13497 > URL: https://issues.apache.org/jira/browse/IMPALA-13497 > Project: IMPALA > Issue Type: Task > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Fix For: Impala 4.5.0 > > > The size of the tuple cache entry written / read is useful information for > understanding the performance of the cache. Having this information in the > profile will help us tune the placement policy for the tuple cache nodes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13500) test_invalidate_stale_partition_on_reload is flaky
Riza Suminto created IMPALA-13500: - Summary: test_invalidate_stale_partition_on_reload is flaky Key: IMPALA-13500 URL: https://issues.apache.org/jira/browse/IMPALA-13500 Project: IMPALA Issue Type: Bug Components: Catalog, Frontend Reporter: Riza Suminto Assignee: Sai Hemanth Gantasala TestEventProcessingCustomConfigs.test_invalidate_stale_partition_on_reload is flaky in ARM build for not finding log lines it is looking for. {code:java} Error Message AssertionError: Expected 1 lines in file /data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-1e5d.vpc.cloudera.com.jenkins.log.INFO.20241030-041545.3408110 matching regex 'Invalidated objects in cache: \[partition test_invalidate_stale_partition_on_reload_382624f9.test_invalidate_table:p=\d \(id=0\)\]', but found 0 lines. Last line was: I1030 04:16:03.199949 3408505 query-exec-mgr.cc:219] ReleaseQueryState(): deleted query_id=7745d0a812d0b474:7883f4f8 Stacktrace custom_cluster/test_events_custom_configs.py:1328: in test_invalidate_stale_partition_on_reload self.assert_impalad_log_contains('INFO', log_regex % 0) common/impala_test_suite.py:1311: in assert_impalad_log_contains "impalad", level, line_regex, expected_count, timeout_s, dry_run) common/impala_test_suite.py:1364: in assert_log_contains (expected_count, log_file_path, line_regex, found, line) E AssertionError: Expected 1 lines in file /data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-1e5d.vpc.cloudera.com.jenkins.log.INFO.20241030-041545.3408110 matching regex 'Invalidated objects in cache: \[partition test_invalidate_stale_partition_on_reload_382624f9.test_invalidate_table:p=\d \(id=0\)\]', but found 0 lines. 
Last line was: E I1030 04:16:03.199949 3408505 query-exec-mgr.cc:219] ReleaseQueryState(): deleted query_id=7745d0a812d0b474:7883f4f8 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13501) Conflicting commits to Iceberg tables leave uncommitted orphan files
Noemi Pap-Takacs created IMPALA-13501: - Summary: Conflicting commits to Iceberg tables leave uncommitted orphan files Key: IMPALA-13501 URL: https://issues.apache.org/jira/browse/IMPALA-13501 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Noemi Pap-Takacs Iceberg supports multiple writers with optimistic concurrency. Each writer can write new files, which are then added to the table after a validation check ensuring that the commit does not conflict with other modifications made during execution. When there is a conflicting change and the newly written files cannot be committed, the commit can be retried and rebased on top of the latest snapshot. If this does not resolve the conflict, the change cannot be committed and the files become orphan files in the file system. It would be nice to remove the remaining files from an unsuccessful commit in one step. Deleting orphan files later as a table maintenance step is also a possible resolution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13499) REFRESH on Iceberg tables can lead to data loss
Saulius Valatka created IMPALA-13499: Summary: REFRESH on Iceberg tables can lead to data loss Key: IMPALA-13499 URL: https://issues.apache.org/jira/browse/IMPALA-13499 Project: IMPALA Issue Type: Bug Components: Catalog Affects Versions: Impala 4.4.1 Reporter: Saulius Valatka When running a REFRESH statement on an Iceberg table, the catalog loads it from the Hive metastore and later performs an {{alter_table}} [here|https://github.com/apache/impala/blob/bdce7778b239f6fbf8ea89ea32b91a83c8017828/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L445]. It does so without taking a Hive lock, meaning that if any external process commits to the table between the load and the alter, the newly committed "metadata_location" property will be overwritten with the previous value, effectively resulting in data loss. It should either take a Hive lock when doing this, or, if "{{{}iceberg.engine.hive.lock-enabled = false{}}}", use "{{{}alter_table_with_environmentContext{}}}" and set {{expected_parameter_key}} / expected_parameter_value to metadata_location / . -- This message was sent by Atlassian Jira (v8.20.10#820010)
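The guarded update suggested above amounts to a compare-and-swap on the metadata_location property: persist the alter only if the property still holds the value that was loaded. A toy model of the idea (this is not the real HMS client API; the class and method names are illustrative only):

```python
import threading


class ToyMetastore:
    """Toy stand-in for an HMS table object with a guarded alter."""

    def __init__(self, metadata_location):
        self.params = {"metadata_location": metadata_location}
        self._lock = threading.Lock()

    def alter_guarded(self, expected, new_location):
        # Persist only if metadata_location still matches the value the
        # caller loaded earlier; otherwise signal a conflict instead of
        # silently rolling the table back.
        with self._lock:
            if self.params["metadata_location"] != expected:
                return False  # concurrent commit detected
            self.params["metadata_location"] = new_location
            return True


tbl = ToyMetastore("s3://bucket/meta/v1.json")
loaded = tbl.params["metadata_location"]               # REFRESH loads the table
tbl.alter_guarded(loaded, "s3://bucket/meta/v2.json")  # external commit lands
# A stale write based on the old load is rejected instead of clobbering v2:
ok = tbl.alter_guarded(loaded, "s3://bucket/meta/v1-stale.json")
```

An unguarded alter_table would instead overwrite v2 with the stale location, which is exactly the data-loss scenario the report describes.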
[jira] [Created] (IMPALA-13498) TestQueryLive.test_executor_groups is flaky
Michael Smith created IMPALA-13498: -- Summary: TestQueryLive.test_executor_groups is flaky Key: IMPALA-13498 URL: https://issues.apache.org/jira/browse/IMPALA-13498 Project: IMPALA Issue Type: Task Components: Test Affects Versions: Impala 4.4.1 Reporter: Michael Smith {code} custom_cluster/test_query_live.py:299: in test_executor_groups self.assert_only_coordinators(result.runtime_profile, coords=[0, 1], execs=[2, 3]) custom_cluster/test_query_live.py:63: in assert_only_coordinators self.assert_impalads(profile, coords, execs) custom_cluster/test_query_live.py:60: in assert_impalads assert ":" + str(DEFAULT_KRPC_PORT + port_idx) not in profile E assert ':27003' not in 'Query (id=1d439574eef2... TotalTime: 15.635us\n' E ':27003' is contained here: E Query (id=1d439574eef2a2d7:27003d50): E ? ++ {code} The assertion here is not specific enough, so unique identifiers can accidentally match it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
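The collision reported above is easy to reproduce in isolation: the hex query id happens to contain the bytes ":27003", so a bare substring check matches it. A stricter pattern avoids the false positive (a sketch of one possible fix, not necessarily the one that was committed):

```python
import re

# A profile line whose query id happens to contain ":27003".
profile = "Query (id=1d439574eef2a2d7:27003d50):"

# The flaky check: a bare substring match trips on the query id even
# though no impalad on port 27003 appears anywhere in the profile.
naive_hit = ":27003" in profile  # True: a false positive


def has_port(text, port):
    """Match ':27003' only where the digit run ends, as in 'host:27003)'
    but not inside the hex id '...:27003d50' (the \\b boundary fails
    between '3' and 'd' because both are word characters)."""
    return re.search(r":%d\b" % port, text) is not None


id_hit = has_port(profile, 27003)                 # False: id is skipped
addr_hit = has_port("impala-host:27003)", 27003)  # True: real address
```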
[jira] [Created] (IMPALA-13497) Add profile counters for bytes written / read from the tuple cache
Joe McDonnell created IMPALA-13497: -- Summary: Add profile counters for bytes written / read from the tuple cache Key: IMPALA-13497 URL: https://issues.apache.org/jira/browse/IMPALA-13497 Project: IMPALA Issue Type: Task Components: Backend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell Assignee: Joe McDonnell The size of the tuple cache entry written / read is useful information for understanding the performance of the cache. Having this information in the profile will help us tune the placement policy for the tuple cache nodes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13496) JniFrontend should use JniUtil.serializeToThrift() for serialization
Joe McDonnell created IMPALA-13496: -- Summary: JniFrontend should use JniUtil.serializeToThrift() for serialization Key: IMPALA-13496 URL: https://issues.apache.org/jira/browse/IMPALA-13496 Project: IMPALA Issue Type: Task Components: Frontend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell JniFrontend.java has many locations like this: {code:java} try { TSerializer serializer = new TSerializer(protocolFactory_); return serializer.serialize(result); } catch (TException e) { throw new InternalException(e.getMessage()); } {code} This is the same as JniUtil.serializeToThrift(). We should standardize on JniUtil.serializeToThrift() to avoid the code duplication. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13495) Calcite planner: Make exceptions easier to classify
Joe McDonnell created IMPALA-13495: -- Summary: Calcite planner: Make exceptions easier to classify Key: IMPALA-13495 URL: https://issues.apache.org/jira/browse/IMPALA-13495 Project: IMPALA Issue Type: Sub-task Components: Frontend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell Assignee: Joe McDonnell To make it easier to diagnose what is going on, it would be useful for the Calcite planner to produce categorized exceptions like ParseException, AnalysisException, etc. It would also be useful to produce a UnsupportedFeatureException for things that are not expected to work (e.g. HBase, Kudu, views, etc). Right now, the Calcite planner converts all exceptions from CalciteJniFrontend to InternalException, which makes it harder to classify them after the fact. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13494) Calcite planner: group_concat failing with distinct
Steve Carlin created IMPALA-13494: - Summary: Calcite planner: group_concat failing with distinct Key: IMPALA-13494 URL: https://issues.apache.org/jira/browse/IMPALA-13494 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin The following query is failing in distinct.test select sum(len_orderkey), sum(len_comment) from ( select length(group_concat(distinct cast(l_orderkey as string))) len_orderkey, length(group_concat(distinct(l_comment))) len_comment from tpch.lineitem group by l_comment ) v -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13493) Calcite planner: parsing error on bracket hint comment
Steve Carlin created IMPALA-13493: - Summary: Calcite planner: parsing error on bracket hint comment Key: IMPALA-13493 URL: https://issues.apache.org/jira/browse/IMPALA-13493 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin The hint comment surrounded by square brackets is causing a parsing error -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13492) TestIcebergTable.test_describe_history_params is flaky.
Riza Suminto created IMPALA-13492: - Summary: TestIcebergTable.test_describe_history_params is flaky. Key: IMPALA-13492 URL: https://issues.apache.org/jira/browse/IMPALA-13492 Project: IMPALA Issue Type: Bug Components: Test Affects Versions: Impala 4.4.0 Reporter: Riza Suminto Assignee: Zoltán Borók-Nagy TestIcebergTable.test_describe_history_params is flaky with following error message and stack trace on failed run: {code:java} Error Message: query_test/test_iceberg.py:504: in test_describe_history_params self.expect_num_snapshots_from(impalad_client, tbl_name, now_budapest, 1) common/iceberg_test_suite.py:39: in expect_num_snapshots_from expected_result_size=expected_result_size) util/iceberg_util.py:102: in get_snapshots rows = impalad_client.execute(query, user) common/impala_connection.py:216: in execute fetch_profile_after_close=fetch_profile_after_close) beeswax/impala_beeswax.py:190: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:381: in __execute_query handle = self.execute_query_async(query_string, user=user) beeswax/impala_beeswax.py:375: in execute_query_async handle = self.__do_rpc(lambda: self.imp_service.query(query,)) beeswax/impala_beeswax.py:553: in __do_rpc raise ImpalaBeeswaxException(self.__build_error_message(b), b) E ImpalaBeeswaxException: Query 884a5fa72d806f79:7c6e01ce failed: E AnalysisException: Invalid TIMESTAMP expression: UDF WARNING: Timestamp '2024-10-27 02:47:33.482594000' in timezone 'Europe/Budapest' could not be converted to UTC EEE CAUSED BY: InternalException: UDF WARNING: Timestamp '2024-10-27 02:47:33.482594000' in timezone 'Europe/Budapest' could not be converted to UTC Stacktrace: query_test/test_iceberg.py:504: in test_describe_history_params self.expect_num_snapshots_from(impalad_client, tbl_name, now_budapest, 1) common/iceberg_test_suite.py:39: in expect_num_snapshots_from expected_result_size=expected_result_size) util/iceberg_util.py:102: in get_snapshots rows = 
impalad_client.execute(query, user) common/impala_connection.py:216: in execute fetch_profile_after_close=fetch_profile_after_close) beeswax/impala_beeswax.py:190: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:381: in __execute_query handle = self.execute_query_async(query_string, user=user) beeswax/impala_beeswax.py:375: in execute_query_async handle = self.__do_rpc(lambda: self.imp_service.query(query,)) beeswax/impala_beeswax.py:553: in __do_rpc raise ImpalaBeeswaxException(self.__build_error_message(b), b) E ImpalaBeeswaxException: Query 884a5fa72d806f79:7c6e01ce failed: E AnalysisException: Invalid TIMESTAMP expression: UDF WARNING: Timestamp '2024-10-27 02:47:33.482594000' in timezone 'Europe/Budapest' could not be converted to UTC E E E CAUSED BY: InternalException: UDF WARNING: Timestamp '2024-10-27 02:47:33.482594000' in timezone 'Europe/Budapest' could not be converted to UTC {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
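The UDF warning quoted above stems from the wall-clock time itself: Europe/Budapest left daylight saving time on 2024-10-27, so the 02:00-03:00 hour occurred twice and 02:47:33 has two possible UTC values. Python's zoneinfo makes the ambiguity visible (an independent illustration of the timezone math, not Impala code):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("Europe/Budapest")
# Clocks fell back 03:00 -> 02:00 on 2024-10-27, so 02:47:33 happened
# twice: first at UTC+2 (CEST), then again at UTC+1 (CET).
first = datetime(2024, 10, 27, 2, 47, 33, tzinfo=tz)  # fold=0, earlier
second = datetime(2024, 10, 27, 2, 47, 33, fold=1, tzinfo=tz)  # later

print(first.utcoffset(), second.utcoffset())  # 2:00:00 1:00:00
```

Since the local-to-UTC conversion has no single answer at that instant, a test that captures "now" in a DST-sensitive zone and feeds it back as a TIMESTAMP expression can fail whenever the run lands in the repeated hour.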
[jira] [Resolved] (IMPALA-13477) CTAS query should set request_pool in QueryStateRecord
[ https://issues.apache.org/jira/browse/IMPALA-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto resolved IMPALA-13477. --- Fix Version/s: Impala 4.5.0 Resolution: Fixed > CTAS query should set request_pool in QueryStateRecord > -- > > Key: IMPALA-13477 > URL: https://issues.apache.org/jira/browse/IMPALA-13477 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Minor > Fix For: Impala 4.5.0 > > > Resource Pool information for CTAS query is missing from /queries page of > WebUI. This is because CTAS query has TExecRequest.stmt_type = DDL. However, > CTAS also has TQueryExecRequest.stmt_type = DML and subject to > AdmissionControl. Therefore, its request pool must be recorded into > QueryStateRecord and displayed at /queries page of WebUI. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13491) Add config on catalogd for controlling the number of concurrent loading/refresh commands
Manish Maheshwari created IMPALA-13491: -- Summary: Add config on catalogd for controlling the number of concurrent loading/refresh commands Key: IMPALA-13491 URL: https://issues.apache.org/jira/browse/IMPALA-13491 Project: IMPALA Issue Type: Improvement Reporter: Manish Maheshwari When running table loading or refresh commands, catalogd requires working memory in proportion to the number of tables being refreshed. While we have a table-level lock, we don't have a config to control concurrent load/refresh operations. For customers that run refreshes in parallel across multiple threads, the number of concurrent load/refresh commands can cause an OOM on catalogd due to running out of working memory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13469) test_query_cpu_count_on_insert seems to be flaky
[ https://issues.apache.org/jira/browse/IMPALA-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto resolved IMPALA-13469. --- Target Version: Impala 4.5.0 Resolution: Fixed > test_query_cpu_count_on_insert seems to be flaky > > > Key: IMPALA-13469 > URL: https://issues.apache.org/jira/browse/IMPALA-13469 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Riza Suminto >Priority: Major > Labels: broken-build > Fix For: Impala 4.5.0 > > > We found that the test test_query_cpu_count_on_insert() that was recently > added in IMPALA-13445 seems to be flaky and could fail with the following error. > +*Error Message*+ > {code:java} > ImpalaBeeswaxException: Query 554a332e9f9b499a:da216f59 failed: > IllegalStateException: null > {code} > +*Stacktrace*+ > {code:java} > custom_cluster/test_executor_groups.py:1375: in test_query_cpu_count_on_insert > "Verdict: Match", "CpuAsk: 9", "CpuAskBounded: 9", "| > partitions=unavailable"]) > custom_cluster/test_executor_groups.py:946: in _run_query_and_verify_profile > result = self.execute_query_expect_success(self.client, query) > common/impala_test_suite.py:891: in wrapper > return function(*args, **kwargs) > common/impala_test_suite.py:901: in execute_query_expect_success > result = cls.__execute_query(impalad_client, query, query_options, user) > common/impala_test_suite.py:1045: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:216: in execute > fetch_profile_after_close=fetch_profile_after_close) > beeswax/impala_beeswax.py:190: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:381: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:375: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:553: in __do_rpc > raise 
ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: Query 554a332e9f9b499a:da216f59 failed: > E IllegalStateException: null > {code} > > The stack trace from the coordinator is given as follows too. > {code} > I1021 09:42:04.707075 18064 jni-util.cc:321] > 554a332e9f9b499a:da216f59] java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:496) > at > org.apache.impala.planner.DistributedPlanner.createDmlFragment(DistributedPlanner.java:308) > at > org.apache.impala.planner.Planner.createPlanFragments(Planner.java:173) > at org.apache.impala.planner.Planner.createPlans(Planner.java:310) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1969) > at > org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2968) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2730) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2269) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2030) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175) > I1021 09:42:04.707089 18064 status.cc:129] 554a332e9f9b499a:da216f59] > IllegalStateException: null > @ 0x10c3dc7 impala::Status::Status() > @ 0x19b3668 impala::JniUtil::GetJniExceptionMsg() > @ 0x16b39ee impala::JniCall::Call<>() > @ 0x1684d0f impala::Frontend::GetExecRequest() > @ 0x23acec3 impala::QueryDriver::DoFrontendPlanning() > @ 0x23ad0b3 impala::QueryDriver::RunFrontendPlanner() > @ 0x17124cb impala::ImpalaServer::ExecuteInternal() > @ 0x17131ba impala::ImpalaServer::Execute() > @ 0x1885fd1 impala::ImpalaServer::query() > @ 0x175c4bc beeswax::BeeswaxServiceProcessorT<>::process_query() > @ 0x17e0545 beeswax::BeeswaxServiceProcessorT<>::dispatchCall() > @ 0x17e0aea impala::ImpalaServiceProcessorT<>::dispatchCall() > @ 0xf6c5d3 apache::thrift::TDispatchProcessor::process() > @ 0x13ea8b6 > 
apache::thrift::server::TAcceptQueueServer::Task::run() > @ 0x13d727b impala::ThriftThread::RunRunnable() > @ 0x13d8e
[jira] [Created] (IMPALA-13490) TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail after IMPALA-13469
Fang-Yu Rao created IMPALA-13490: Summary: TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail after IMPALA-13469 Key: IMPALA-13490 URL: https://issues.apache.org/jira/browse/IMPALA-13490 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 4.5.0 Reporter: Fang-Yu Rao Assignee: Riza Suminto We found that testNonTpcdsDdl() in [TpcdsCpuCostPlannerTest.java|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java] could fail after IMPALA-13469 with the following error. It looks like the expected value of 'segment-costs' does not match the actual one in the single node plan. +*Error Message*+ {code:java} Section PLAN of query at line 651: create table t partitioned by (c_nationkey) sort by (c_custkey) as select c_custkey, max(o_totalprice) as maxprice, c_nationkey from tpch.orders join tpch.customer on c_custkey = o_custkey where c_nationkey < 10 group by c_custkey, c_nationkey Actual does not match expected result: Max Per-Host Resource Reservation: Memory=19.44MB Threads=1 Per-Host Resource Estimates: Memory=35MB F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB thread-reservation=1 runtime-filters-memory=1.00MB | max-parallelism=1 segment-costs=[8689789, 272154, 4822204] ^ WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, PARTITION-KEYS=(c_nationkey)] | partitions=25 | output exprs: c_custkey, max(o_totalprice), c_nationkey | mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204 | 04:SORT | order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0 | tuple-ids=3 row-size=18B cardinality=228.68K cost=272154 | in pipelines: 04(GETNEXT), 03(OPEN) | 03:AGGREGATE [FINALIZE] | output: max(o_totalprice) | group by: c_custkey, c_nationkey | mem-estimate=10.00MB mem-reservation=8.50MB 
spill-buffer=512.00KB thread-reservation=0 | tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818 | in pipelines: 03(GETNEXT), 00(OPEN) | 02:HASH JOIN [INNER JOIN] | hash predicates: o_custkey = c_custkey | fk/pk conjuncts: o_custkey = c_custkey | runtime filters: RF000[bloom] <- c_custkey | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 | tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187 | in pipelines: 00(GETNEXT), 01(OPEN) | |--01:SCAN HDFS [tpch.customer] | HDFS partitions=1/1 files=1 size=23.08MB | predicates: c_nationkey < CAST(10 AS SMALLINT) | stored statistics: | table: rows=150.00K size=23.08MB | columns: all | extrapolated-rows=disabled max-scan-range-rows=150.00K | mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0 | tuple-ids=1 row-size=10B cardinality=15.00K cost=864778 | in pipelines: 01(GETNEXT) | 00:SCAN HDFS [tpch.orders] HDFS partitions=1/1 files=1 size=162.56MB runtime filters: RF000[bloom] -> o_custkey stored statistics: table: rows=1.50M size=162.56MB columns: all extrapolated-rows=disabled max-scan-range-rows=1.18M mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0 tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006 in pipelines: 00(GETNEXT) Expected: Max Per-Host Resource Reservation: Memory=19.44MB Threads=1 Per-Host Resource Estimates: Memory=35MB F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB thread-reservation=1 runtime-filters-memory=1.00MB | max-parallelism=1 segment-costs=[8689789, 17851, 3700630] WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, PARTITION-KEYS=(c_nationkey)] | partitions=25 | output exprs: c_custkey, max(o_totalprice), c_nationkey | mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630 | 04:SORT | order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST | mem-estimate=6.00MB mem-reservation=6.00MB 
spill-buffer=2.00MB thread-reservation=0 | tuple-ids=3 row-size=18B cardinality=15.00K cost=17851 | in pipelines: 04(GETNEXT), 03(OPEN) | 03:AGGREGATE [FINALIZE] | output: max(o_totalprice) | group by: c_custkey, c_nationkey | mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB thread-reservation=0 | tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818 | in pipelines: 03(GETNEXT), 00(OPEN) | 02:HASH JOIN [INNER JOIN] | hash predicates: o_custkey = c_custkey | fk/pk conjuncts: o_custkey = c_custkey | runtime filters: RF000[bloom] <- c_custkey | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 | t
[jira] [Created] (IMPALA-13489) Planner should estimate BATCH_SIZE automatically
Manish Maheshwari created IMPALA-13489: -- Summary: Planner should estimate BATCH_SIZE automatically Key: IMPALA-13489 URL: https://issues.apache.org/jira/browse/IMPALA-13489 Project: IMPALA Issue Type: Improvement Reporter: Manish Maheshwari Planner should estimate BATCH_SIZE automatically according to the materialized row size to avoid the scanner thread becoming memory bound -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13488) Add query option for DEFAULT_INITIAL_TUPLE_CAPACITY
Manish Maheshwari created IMPALA-13488: -- Summary: Add query option for DEFAULT_INITIAL_TUPLE_CAPACITY Key: IMPALA-13488 URL: https://issues.apache.org/jira/browse/IMPALA-13488 Project: IMPALA Issue Type: Improvement Reporter: Manish Maheshwari DEFAULT_INITIAL_TUPLE_CAPACITY defaults to 4 and is hardcoded. We need to convert it into a query option. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13487) Capture metrics for memory allocation in execution profile
Manish Maheshwari created IMPALA-13487: -- Summary: Capture metrics for memory allocation in execution profile Key: IMPALA-13487 URL: https://issues.apache.org/jira/browse/IMPALA-13487 Project: IMPALA Issue Type: Improvement Reporter: Manish Maheshwari Capture metrics for memory allocation in the execution profile to identify queries that are bound on memory allocation, and to measure allocation wait times, especially when running many concurrent queries. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13486) Improve memory estimation and allocation for queries on complex types
Manish Maheshwari created IMPALA-13486: -- Summary: Improve memory estimation and allocation for queries on complex types Key: IMPALA-13486 URL: https://issues.apache.org/jira/browse/IMPALA-13486 Project: IMPALA Issue Type: Improvement Reporter: Manish Maheshwari Improve memory estimation and allocation for queries on complex types, taking into account the number of values in the complex type and any functions, such as unnest, that consume a lot of memory. Currently, underestimating memory causes many tcmalloc calls, and the query gets blocked waiting on the tcmalloc central freelist. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13485) Support stats capture on complex types
Manish Maheshwari created IMPALA-13485: -- Summary: Support stats capture on complex types Key: IMPALA-13485 URL: https://issues.apache.org/jira/browse/IMPALA-13485 Project: IMPALA Issue Type: Improvement Reporter: Manish Maheshwari In compute stats, we must capture the following: * length of complex types * min, max, avg for numeric item types * min, max, avg of length for string item types For nested complex types, we need to evaluate how this would be captured. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13484) Querying an Iceberg table with TIMESTAMP_LTZ can cause data loss
Gabor Kaszab created IMPALA-13484: - Summary: Querying an Iceberg table with TIMESTAMP_LTZ can cause data loss Key: IMPALA-13484 URL: https://issues.apache.org/jira/browse/IMPALA-13484 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Gabor Kaszab *+Repro steps:+* 1. Create a table with Hive that has a TS_LTZ column: {code:java} create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) stored by iceberg; {code} 2. Insert some data using Hive: {code:java} insert into ice_hive_tbl values (1, current_timestamp()); {code} 3. Add a breakpoint in Impala to the table loading code right before Impala sends out an alter_table to HMS to change the column type from TS_LTZ to TS. [Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463], for instance. 4. Query the table from Impala. This triggers a table load. Impala will decide that it should change the TS_LTZ type of the column to TS; however, the breakpoint will hold it at this point. 5. Use Hive to add additional rows into the table: {code:java} insert into ice_hive_tbl values (2, current_timestamp()); insert into ice_hive_tbl values (3, current_timestamp()); {code} 6. Release the breakpoint and let Impala finish the SELECT query started at 4). 7. Do another SELECT * from Hive and/or Impala and verify that the extra rows added at 5) are not present in the table. *+Root cause:+* When Impala changes the TS_LTZ column to TS, it does so by calling alter_table() on HMS directly. It gives a Metastore Table object to HMS as the desired state of the table. HMS then persists this table object. The problem with this: - Impala doesn't use the Iceberg API to alter the table. As a result, no conflict detection is performed, and it won't be detected that other commits went into the table since Impala grabbed the table object from HMS. 
- The metadata.json path is part of the table properties, and when Impala calls alter_table(tbl_obj) HMS will also persist this metadata path to the table, even though other changes had already moved the metadata path forward. - Essentially this reverts the table back to the state it was in when Impala loaded the table object from HMS. - In a high-frequency scenario this could cause problems when Hive (or even Spark) heavily writes the table while Impala reads it. Some snapshots could be unintentionally reverted by this behavior, resulting in data loss or changes such as deletes being undone. {+}Just a note{+}, FWIW, with the current approach Impala doesn't change the column types in the Iceberg metadata; it only changes them in HMS. So even with this, the Iceberg metadata would show the column type as timestamptz. {+}Note2{+}, I described this problem using timestamp with local time zone as an example, but it could also be triggered by other column types not entirely compatible with Impala. I haven't researched whether there is any other such type, though. {+}Note3{+}, this issue seems to have existed from the beginning. I found the code that triggers this was added by one of the first changes wrt Iceberg integration, the "[Create Iceberg table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]" change. *+Possible solutions:+* 1. Impala can do the alter table by calling the Iceberg API and not HMS directly. There are things to be careful about: - With this approach, would the above repro steps make the table loading fail due to a conflict between commits on the table? Or could the schema change be merged automatically by the Iceberg library into the latest state even if there had been changes on the table? I think this would work as expected and won't reject loading the table, but we should verify this when testing. 
- With this approach Impala would set the TS_LTZ cols to TS properly, with no snapshots being lost. However, when a new write is performed by Hive/Spark, they'd set the col types back to TS_LTZ. And then when Impala reads the table again, it will set these cols to TS again. And so on. The question is: would a scenario like this flood the Iceberg metadata, e.g. metadata.json, with all these uncontrolled schema changes? - We have been talking about schema changes, but in fact what the code does now is much wider than that. It sends a table object to HMS to persist it. We have to double-check whether the current approach only persists schema changes or could make other changes too. E.g. the code also sets the DO_NOT_UPDATE_STATS property and the last DDL time. Could it change anything else as well that we might miss with this approach? 2. Do not do any alter_tables after loading the Iceberg table This ap
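The root cause above is a classic lost-update race. A minimal Python sketch (all names hypothetical, not Impala or Iceberg code) contrasts the two behaviors: a blind alter_table() overwrite based on a stale table object silently discards concurrent commits, while a compare-and-swap commit, as the Iceberg API performs against the metadata location, detects the conflict and forces a reload:

```python
# Illustrative sketch of the lost-update race; the Catalog class stands in
# for HMS, which stores one metadata_location pointer per Iceberg table.

class Catalog:
    def __init__(self, location):
        self.metadata_location = location

    def blind_alter(self, new_location):
        # What a direct alter_table(tbl_obj) does: persist whatever the
        # caller built from its earlier read, ignoring commits in between.
        self.metadata_location = new_location

    def cas_alter(self, expected_location, new_location):
        # Iceberg-style commit: succeeds only if nobody moved the pointer
        # since the caller loaded the table.
        if self.metadata_location != expected_location:
            raise RuntimeError("concurrent commit detected; reload and retry")
        self.metadata_location = new_location

cat = Catalog("v1.metadata.json")
impala_view = cat.metadata_location         # Impala loads the table (v1)
cat.metadata_location = "v3.metadata.json"  # Hive commits v2 and v3 meanwhile

# Blind overwrite: Impala persists a table object derived from stale v1,
# so Hive's commits vanish from the catalog.
cat.blind_alter("altered-from-" + impala_view)
lost = "v3" not in cat.metadata_location

# Compare-and-swap: the same stale attempt is rejected instead.
cat.metadata_location = "v3.metadata.json"
conflict_detected = False
try:
    cat.cas_alter(impala_view, "v4.metadata.json")
except RuntimeError:
    conflict_detected = True
```

The sketch also shows why solution 1 is attractive: the conflict path leaves the table at v3, so Hive's rows survive and Impala simply has to reload and reapply its schema change.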
[jira] [Closed] (IMPALA-12652) Limit Length of Completed Queries Insert DML
[ https://issues.apache.org/jira/browse/IMPALA-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Fehr closed IMPALA-12652. --- Resolution: Fixed Implemented with commit https://github.com/apache/impala/commit/711a9f2bad84f92dc4af61d49ae115f0dc4239da > Limit Length of Completed Queries Insert DML > > > Key: IMPALA-12652 > URL: https://issues.apache.org/jira/browse/IMPALA-12652 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Jason Fehr >Assignee: Jason Fehr >Priority: Major > Labels: backend, workload-management > > Implement a coordinator startup flag that limits the max length (number of > characters) of the insert DML statement that inserts records into the > impala_query_log table. > The purpose of this flag is to ensure that workload management does not > generate an insert DML statement that exceeds Impala's max length for a sql > statement (approximately 16 megabytes or 16 million characters). -- This message was sent by Atlassian Jira (v8.20.10#820010)
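The idea behind the flag can be sketched in a few lines of Python (illustrative only; names like `build_insert_dml` are hypothetical, not Impala's actual C++ implementation): cap the assembled insert DML at a byte budget and defer rows that would push it past the limit to the next statement.

```python
# Hedged sketch: keep the query-log INSERT under Impala's ~16 MB statement cap.
MAX_SQL_LENGTH = 16 * 1024 * 1024  # approximate Impala max statement length

def build_insert_dml(rows, max_length=MAX_SQL_LENGTH):
    """Build an INSERT for impala_query_log, stopping before the statement
    would exceed max_length; remaining rows go into a later DML."""
    prefix = "INSERT INTO impala_query_log VALUES "
    parts, size = [], len(prefix)
    for row in rows:
        value = "(" + ",".join("'" + v.replace("'", "''") + "'" for v in row) + ")"
        added = len(value) + (1 if parts else 0)  # +1 for the comma separator
        if size + added > max_length:
            break  # defer the rest instead of emitting an oversized statement
        parts.append(value)
        size += added
    return prefix + ",".join(parts)

dml = build_insert_dml([("q1", "select 1"), ("q2", "select 2")])
```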
[jira] [Created] (IMPALA-13483) Calcite Planner: some scalar subquery throws exception
weihua zhang created IMPALA-13483: - Summary: Calcite Planner: some scalar subquery throws exception Key: IMPALA-13483 URL: https://issues.apache.org/jira/browse/IMPALA-13483 Project: IMPALA Issue Type: Sub-task Reporter: weihua zhang {code:java} create table correlated_scalar_t1(c1 bigint, c2 bigint); create table correlated_scalar_t2(c1 bigint, c2 bigint); insert into correlated_scalar_t1 values (1,null),(null,1),(1,2), (null,2),(1,3), (2,4), (2,5), (3,3), (3,4), (20,2), (22,3), (24,4),(null,null); insert into correlated_scalar_t2 values (1,null),(null,1),(1,4), (1,2), (null,3), (2,4), (3,7), (3,9),(null,null),(5,1); select c1 from correlated_scalar_t1 where correlated_scalar_t1.c2 > (select c1 from correlated_scalar_t2 where correlated_scalar_t1.c1 = correlated_scalar_t2.c1 and correlated_scalar_t2.c2 < 4) order by c1;{code} I1023 19:56:24.310750 1989386 CalciteOptimizer.java:184] 044892e8f77df486:abfd3cda] [Impala plan] LogicalSort(sort0=[$0], dir0=[ASC]), id = 717 LogicalProject(C1=[$0]), id = 716 LogicalJoin(condition=[AND(=($0, $2), >($1, $3))], joinType=[inner]), id = 715 LogicalTableScan(table=[[default, correlated_scalar_t1]]), id = 547 LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)]), id = 714 LogicalProject(c11=[$0], C1=[$0]), id = 713 LogicalFilter(condition=[AND(<($1, 4), IS NOT NULL($0))]), id = 712 LogicalTableScan(table=[[default, correlated_scalar_t2]]), id = 549 I1023 19:56:24.312273 1989386 CalciteJniFrontend.java:174] 044892e8f77df486:abfd3cda] Optimized logical plan I1023 19:56:24.312394 1989386 CalciteMetadataHandler.java:202] 044892e8f77df486:abfd3cda] Loaded tables: correlated_scalar_t1, correlated_scalar_t2 I1023 19:56:24.312475 1989386 AuthorizationUtil.java:100] 044892e8f77df486:abfd3cda] Authorization is 'DISABLED'. I1023 19:56:24.79 1989386 CalciteJniFrontend.java:123] 044892e8f77df486:abfd3cda] Calcite planner failed. 
I1023 19:56:24.333417 1989386 CalciteJniFrontend.java:124] 044892e8f77df486:abfd3cda] Exception: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3 I1023 19:56:24.333540 1989386 CalciteJniFrontend.java:126] 044892e8f77df486:abfd3cda] Stack Trace:java.lang.IndexOutOfBoundsException: Index: 3, Size: 3 at java.util.ArrayList.rangeCheck(ArrayList.java:659) at java.util.ArrayList.get(ArrayList.java:435) at org.apache.impala.calcite.rel.util.CreateExprVisitor.visitInputRef(CreateExprVisitor.java:51) at org.apache.impala.calcite.rel.util.CreateExprVisitor.visitInputRef(CreateExprVisitor.java:33) at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:125) at org.apache.impala.calcite.rel.util.CreateExprVisitor.visitCall(CreateExprVisitor.java:58) at org.apache.impala.calcite.rel.util.CreateExprVisitor.visitCall(CreateExprVisitor.java:33) at org.apache.calcite.rex.RexCall.accept(RexCall.java:189) at org.apache.impala.calcite.rel.node.ImpalaJoinRel.getConditionConjuncts(ImpalaJoinRel.java:412) at org.apache.impala.calcite.rel.node.ImpalaJoinRel.getPlanNode(ImpalaJoinRel.java:101) at org.apache.impala.calcite.rel.node.ImpalaProjectRel.getChildPlanNode(ImpalaProjectRel.java:117) at org.apache.impala.calcite.rel.node.ImpalaProjectRel.getPlanNode(ImpalaProjectRel.java:62) at org.apache.impala.calcite.rel.node.ImpalaSortRel.getChildPlanNode(ImpalaSortRel.java:141) at org.apache.impala.calcite.rel.node.ImpalaSortRel.getPlanNode(ImpalaSortRel.java:84) at org.apache.impala.calcite.service.CalcitePhysPlanCreator.create(CalcitePhysPlanCreator.java:51) at org.apache.impala.calcite.service.CalciteJniFrontend.createExecRequest(CalciteJniFrontend.java:108) I1023 19:56:24.333645 1989386 jni-util.cc:288] 044892e8f77df486:abfd3cda] org.apache.impala.common.InternalException: Index: 3, Size: 3 at org.apache.impala.calcite.service.CalciteJniFrontend.createExecRequest(CalciteJniFrontend.java:127) I1023 19:56:24.333654 1989386 status.cc:129] 044892e8f77df486:abfd3cda] 
InternalException: Index: 3, Size: 3 @ 0x11f6c5d impala::Status::Status() @ 0x1b579e6 impala::JniUtil::GetJniExceptionMsg() @ 0x183b922 impala::JniCall::Call<>() @ 0x180e86a impala::Frontend::GetExecRequest() @ 0x252d1d4 impala::QueryDriver::RunFrontendPlanner() @ 0x18a7fd5 impala::ImpalaServer::ExecuteInternal() @ 0x18a9459 impala::ImpalaServer::Execute() @ 0x1a58c54 impala::ImpalaServer::ExecuteStatementCommon() @ 0x1a5a4a2 impala::ImpalaServer::ExecuteStatement() @ 0x192e001 apache::hive::service::cli::thrift::TCLIServiceProcessorT
[jira] [Created] (IMPALA-13479) Patch gperftools to allow max_total_thread_cache_bytes to exceed 1GB
Joe McDonnell created IMPALA-13479: -- Summary: Patch gperftools to allow max_total_thread_cache_bytes to exceed 1GB Key: IMPALA-13479 URL: https://issues.apache.org/jira/browse/IMPALA-13479 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell gperftools limits max_total_thread_cache_bytes to 1GB here: [https://github.com/gperftools/gperftools/blob/gperftools-2.10/src/thread_cache.cc#L520-L523] {noformat} void ThreadCache::set_overall_thread_cache_size(size_t new_size) { // Clip the value to a reasonable range if (new_size < kMinThreadCacheSize) new_size = kMinThreadCacheSize; if (new_size > (1<<30)) new_size = (1<<30); // Limit to 1GB{noformat} I confirmed that setting --tcmalloc_max_total_thread_cache_bytes=2147483648 still results in a 1GB limit. Sometimes, we would want a higher limit for systems with a large amount of memory and CPUs. For example, some systems now have 1TB of memory and 96 CPUs. With high concurrency, there is high contention on tcmalloc locks on central data structures. Increasing the total thread cache size could avoid this, and a value higher than 1GB is still a small part of system memory. We can patch our toolchain gperftools to allow a higher value (and notify gperftools community). -- This message was sent by Atlassian Jira (v8.20.10#820010)
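The clipping quoted above is easy to reproduce standalone. A small Python transcription of the logic (the 512 KiB minimum is assumed from gperftools' defaults; this is not gperftools source) shows that a 2 GiB request for the thread-cache total silently becomes 1 GiB:

```python
# Transcription of ThreadCache::set_overall_thread_cache_size's clipping.
K_MIN_THREAD_CACHE_SIZE = 512 * 1024  # assumed gperftools minimum
ONE_GIB = 1 << 30

def clip_thread_cache_size(new_size):
    """Clip the requested total thread cache size to [min, 1 GiB]."""
    if new_size < K_MIN_THREAD_CACHE_SIZE:
        new_size = K_MIN_THREAD_CACHE_SIZE
    if new_size > ONE_GIB:
        new_size = ONE_GIB  # the 1GB cap the proposed patch would raise
    return new_size

# --tcmalloc_max_total_thread_cache_bytes=2147483648 still yields 1 GiB:
assert clip_thread_cache_size(2147483648) == ONE_GIB
```

The patch would amount to raising (or removing) the `ONE_GIB` bound, since on a 1 TB machine a multi-GiB thread cache is still a tiny fraction of memory.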
[jira] [Created] (IMPALA-13482) Calcite planner: Bug fixes for an analytics.test
Steve Carlin created IMPALA-13482: - Summary: Calcite planner: Bug fixes for an analytics.test Key: IMPALA-13482 URL: https://issues.apache.org/jira/browse/IMPALA-13482 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin Specifically, select lag(coalesce(505, 1 + NULL), 1) over (order by int_col desc) from alltypestiny had a couple of issues that needed fixing -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13481) Calcite planner: Add support for various analytic and agg functions
Steve Carlin created IMPALA-13481: - Summary: Calcite planner: Add support for various analytic and agg functions Key: IMPALA-13481 URL: https://issues.apache.org/jira/browse/IMPALA-13481 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13478) Don't sync tuple cache file contents to disk immediately
Joe McDonnell created IMPALA-13478: -- Summary: Don't sync tuple cache file contents to disk immediately Key: IMPALA-13478 URL: https://issues.apache.org/jira/browse/IMPALA-13478 Project: IMPALA Issue Type: Task Components: Backend Affects Versions: Impala 4.5.0 Reporter: Joe McDonnell Currently, the tuple cache file writer syncs the file contents to disk before closing the file. This slows down the write path considerably, especially if disks are slow. This should be moved off the fast path and done asynchronously. As a first step, we can remove the sync call and close the file without syncing. Other readers are still able to access it, and the kernel will flush the buffers as needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
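The behavior described, that readers see the data through the page cache even when durability is deferred to the kernel, can be sketched in Python (illustrative only; `write_cache_entry` is a hypothetical name, and Impala's writer is C++):

```python
# Sketch: write a tuple-cache entry with sync as an opt-in, not the default.
import os
import tempfile

def write_cache_entry(path, payload, sync=False):
    with open(path, "wb") as f:
        f.write(payload)
        if sync:
            f.flush()
            os.fsync(f.fileno())  # the slow step being removed from the fast path

tmpdir = tempfile.mkdtemp()
entry = os.path.join(tmpdir, "tuple_cache_entry")
write_cache_entry(entry, b"cached tuples", sync=False)

# A reader can open the file immediately after close; the kernel flushes
# the dirty pages to disk in the background.
with open(entry, "rb") as f:
    data = f.read()
```

The trade-off is durability on power loss, which is acceptable for a cache whose entries can be recomputed.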
[jira] [Created] (IMPALA-13480) PlannerTest.testAggregation should VALIDATE_CARDINALITY
Riza Suminto created IMPALA-13480: - Summary: PlannerTest.testAggregation should VALIDATE_CARDINALITY Key: IMPALA-13480 URL: https://issues.apache.org/jira/browse/IMPALA-13480 Project: IMPALA Issue Type: Improvement Components: Test Affects Versions: Impala 4.4.0 Reporter: Riza Suminto Assignee: Riza Suminto PlannerTest.testAggregation does not VALIDATE_CARDINALITY today. Validating cardinality will allow us to track our estimation quality and capture behavior changes like https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L74-L76 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13477) CTAS query should set request_pool in QueryStateRecord
Riza Suminto created IMPALA-13477: - Summary: CTAS query should set request_pool in QueryStateRecord Key: IMPALA-13477 URL: https://issues.apache.org/jira/browse/IMPALA-13477 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 4.4.0 Reporter: Riza Suminto Assignee: Riza Suminto Resource Pool information for CTAS queries is missing from the /queries page of the WebUI. This is because a CTAS query has TExecRequest.stmt_type = DDL. However, CTAS also has TQueryExecRequest.stmt_type = DML and is subject to AdmissionControl. Therefore, its request pool must be recorded in QueryStateRecord and displayed on the /queries page of the WebUI. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13476) Calcite Planner: IMPALA-1519 query throws exception
Steve Carlin created IMPALA-13476: - Summary: Calcite Planner: IMPALA-1519 query throws exception Key: IMPALA-13476 URL: https://issues.apache.org/jira/browse/IMPALA-13476 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin The following query in analytics-fn.test does not work in CalcitePlanner: # IMPALA-1519: Check that the first analytic sort of a select block # materializes TupleIsNullPredicates to be substituted in ancestor nodes. select sum(t1.id) over (partition by t1.bool_col), count(1) over (order by t1.int_col), avg(g) over (order by f), t2.a, t2.d from alltypestiny t1 left outer join (select id as a, coalesce(id, 10) as b, int_col as c, coalesce(int_col, 20) as d, bigint_col e, coalesce(bigint_col, 30) as f, coalesce(id + bigint_col, 40) as g from alltypestiny) t2 on (t1.id = t2.a + 100) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13475) Consider byte size when enqueuing deferred RPCs in KrpcDataStreamRecvr
Csaba Ringhofer created IMPALA-13475: Summary: Consider byte size when enqueuing deferred RPCs in KrpcDataStreamRecvr Key: IMPALA-13475 URL: https://issues.apache.org/jira/browse/IMPALA-13475 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer KrpcDataStreamRecvr::SenderQueue::ProcessDeferredRpc() can fail to process the deferred RPC if batch_queue is not empty and the batch queue + the currently processed batch would consume too much memory (see KrpcDataStreamRecvr::CanEnqueue for details). The deferred RPC is moved back to the queue in this case. Meanwhile KrpcDataStreamRecvr::SenderQueue::GetBatch() doesn't consider the mem requirement of the batches when initiating the deserialization of deferred RPCs (EnqueueDeserializeTask) and tries to deserialize as many batches in parallel as possible (FLAGS_datastream_service_num_deserialization_threads, https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/be/src/runtime/krpc-data-stream-recvr.cc#L281). This means that several threads may start ProcessDeferredRpc() even if GetBatch() could have predicted that most will fail due to the memory limit. While ProcessDeferredRpc() will fail early in this case and won't do much work, these extra failed attempts make lock contention worse in KrpcDataStreamRecvr::SenderQueue. In the worst case, when only 1 batch fits in memory, this can lead to O(FLAGS_datastream_service_num_deserialization_threads * num_batches) wasted ProcessDeferredRpc attempts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
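The suggested improvement, checking the byte size before scheduling a deserialization task rather than letting every worker attempt and fail, can be sketched as follows (hypothetical Python names; the real code is C++ in KrpcDataStreamRecvr):

```python
# Hedged sketch: only schedule deserialization for deferred batches that
# would actually fit under the receiver's memory limit.
from collections import deque

class SenderQueue:
    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self.queued_bytes = 0
        self.deferred = deque()  # (batch_id, serialized_size) pairs

    def can_enqueue(self, batch_bytes):
        # Mirrors the CanEnqueue idea: current queue + new batch <= limit.
        return self.queued_bytes + batch_bytes <= self.mem_limit

    def schedule_deserialization_tasks(self):
        """Start tasks only for batches that can be enqueued now, avoiding
        wasted ProcessDeferredRpc attempts and the lock contention they cause."""
        scheduled = []
        while self.deferred and self.can_enqueue(self.deferred[0][1]):
            batch_id, size = self.deferred.popleft()
            self.queued_bytes += size
            scheduled.append(batch_id)
        return scheduled

q = SenderQueue(mem_limit=100)
q.deferred.extend([("b1", 60), ("b2", 60), ("b3", 30)])
# Only b1 fits right now; b2 stays deferred instead of failing in a worker.
started = q.schedule_deserialization_tasks()
```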
[jira] [Created] (IMPALA-13474) Support query timeline display for experimental profile
Surya Hebbar created IMPALA-13474: - Summary: Support query timeline display for experimental profile Key: IMPALA-13474 URL: https://issues.apache.org/jira/browse/IMPALA-13474 Project: IMPALA Issue Type: Improvement Reporter: Surya Hebbar Assignee: Surya Hebbar The webUI's query timeline only partially supports the experimental profile, i.e. when Impala is started with {{gen_experimental_profile=true}}. With the inclusion of aggregate event sequence metrics in the following patch, it is now possible to support the query timeline display for both profile formats. https://gerrit.cloudera.org/#/c/21683/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-11761) test_partition_dir_removed_inflight fails with "AssertionError: REFRESH should fail"
[ https://issues.apache.org/jira/browse/IMPALA-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-11761. Fix Version/s: Impala 4.5.0 Resolution: Fixed > test_partition_dir_removed_inflight fails with "AssertionError: REFRESH > should fail" > > > Key: IMPALA-11761 > URL: https://issues.apache.org/jira/browse/IMPALA-11761 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Andrew Sherman >Assignee: Michael Smith >Priority: Critical > Fix For: Impala 4.5.0 > > > When running ozone tests, > TestRecursiveListing.test_partition_dir_removed_inflight fails: > {code} > metadata/test_recursive_listing.py:184: in test_partition_dir_removed_inflight > refresh_should_fail=True) > metadata/test_recursive_listing.py:217: in _test_listing_large_dir > assert not refresh_should_fail, "REFRESH should fail" > E AssertionError: REFRESH should fail > E assert not True > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13473) Add support for JS code analysis and linting using ESLint
Surya Hebbar created IMPALA-13473: - Summary: Add support for JS code analysis and linting using ESLint Key: IMPALA-13473 URL: https://issues.apache.org/jira/browse/IMPALA-13473 Project: IMPALA Issue Type: New Feature Reporter: Surya Hebbar Assignee: Surya Hebbar The Impala webUI's client-side JS codebase is steadily growing in size. It would be helpful to enforce code style and quality for all of the webUI's scripts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
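A possible starting configuration might look like the following (illustrative only; the actual rule set and file layout would be decided during review), shown here in the classic `.eslintrc.json` format:

```json
{
  "env": { "browser": true, "es2020": true },
  "rules": {
    "no-unused-vars": "warn",
    "no-undef": "error",
    "eqeqeq": "error",
    "semi": ["error", "always"]
  }
}
```

Starting with a small set of correctness rules ("no-undef", "no-unused-vars") keeps the initial diff manageable; stylistic rules can be tightened incrementally.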
[jira] [Created] (IMPALA-13472) Minidumps for UBSAN on ARM don't give the stack
Fang-Yu Rao created IMPALA-13472: Summary: Minidumps for UBSAN on ARM don't give the stack Key: IMPALA-13472 URL: https://issues.apache.org/jira/browse/IMPALA-13472 Project: IMPALA Issue Type: Task Affects Versions: Impala 4.5.0 Reporter: Fang-Yu Rao Attachments: 75d667d8-3a76-4240-d95a63bd-01806ba9.dmp_dumpedv2, ce1d2431-ec45-4b30-8115d3a8-1c9b5e9e.dmp_dumpedv2, e85b9f26-2fd0-4beb-7823778e-cbed9c7b.dmp_dumpedv2 Currently Minidumps for UBSAN on ARM don't give the stack as shown in [^e85b9f26-2fd0-4beb-7823778e-cbed9c7b.dmp_dumpedv2], [^ce1d2431-ec45-4b30-8115d3a8-1c9b5e9e.dmp_dumpedv2], and [^75d667d8-3a76-4240-d95a63bd-01806ba9.dmp_dumpedv2]. It would be good if the functions in each thread could be resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13471) test_enable_reading_puffin() seems to fail in the Ozone build
Fang-Yu Rao created IMPALA-13471: Summary: test_enable_reading_puffin() seems to fail in the Ozone build Key: IMPALA-13471 URL: https://issues.apache.org/jira/browse/IMPALA-13471 Project: IMPALA Issue Type: Bug Reporter: Fang-Yu Rao Assignee: Daniel Becker We found that the test [test_enable_reading_puffin()|https://github.com/apache/impala/blame/master/tests/custom_cluster/test_iceberg_with_puffin.py#L59] added in IMPALA-13247 seems to fail in the Ozone build. +*Error Message*+ {code} assert [-1, -1] == [2, 2] At index 0 diff: -1 != 2 Full diff: - [-1, -1] + [2, 2] {code} +*Stacktrace*+ {code} custom_cluster/test_iceberg_with_puffin.py:50: in test_enable_reading_puffin self._read_ndv_stats_expect_result([2, 2]) custom_cluster/test_iceberg_with_puffin.py:59: in _read_ndv_stats_expect_result assert ndvs == expected_ndv_stats E assert [-1, -1] == [2, 2] E At index 0 diff: -1 != 2 E Full diff: E - [-1, -1] E + [2, 2] {code} According to the above, in the Ozone build, the result of "show column stats" was [-1, -1]. It looks like the NDV statistics are not available in the Ozone build. -- This message was sent by Atlassian Jira (v8.20.10#820010)