
[jira] [Work started] (IMPALA-11761) test_partition_dir_removed_inflight fails with "AssertionError: REFRESH should fail"

2024-10-07 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-11761 started by Michael Smith.
--
> test_partition_dir_removed_inflight fails with "AssertionError: REFRESH 
> should fail"
> 
>
> Key: IMPALA-11761
> URL: https://issues.apache.org/jira/browse/IMPALA-11761
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>
> When running ozone tests, 
> TestRecursiveListing.test_partition_dir_removed_inflight fails:
> {code}
> metadata/test_recursive_listing.py:184: in test_partition_dir_removed_inflight
> refresh_should_fail=True)
> metadata/test_recursive_listing.py:217: in _test_listing_large_dir
> assert not refresh_should_fail, "REFRESH should fail"
> E   AssertionError: REFRESH should fail
> E   assert not True
> {code} 
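
For readers unfamiliar with the assertion in the traceback: it presumably sits on the path where REFRESH completed without error although the test expected a failure. A minimal sketch of that pattern follows. The real test body is not shown in this message, so `check_refresh` and `run_refresh` are hypothetical names used only for illustration.

```python
def check_refresh(run_refresh, refresh_should_fail):
    """Run a REFRESH and compare the outcome with the expectation.

    `run_refresh` is a hypothetical callable standing in for the real
    REFRESH statement; it raises an exception when the refresh fails.
    """
    try:
        run_refresh()
    except Exception:
        # The refresh failed: acceptable only if a failure was expected.
        assert refresh_should_fail, "REFRESH should not fail"
        return "failed"
    # The refresh succeeded: acceptable only if success was expected.
    assert not refresh_should_fail, "REFRESH should fail"
    return "succeeded"
```

Under this reading, calling the helper with `refresh_should_fail=True` and a refresh that succeeds yields the same `AssertionError: REFRESH should fail` as reported above.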



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Assigned] (IMPALA-13426) Log Java debug sleeps in dev environment

2024-10-07 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13426:
--

Assignee: Michael Smith

> Log Java debug sleeps in dev environment
> 
>
> Key: IMPALA-13426
> URL: https://issues.apache.org/jira/browse/IMPALA-13426
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> The backend logs sleeps from debug actions at VLOG(1), which is included in 
> Impala's logs as part of the default dev configuration. That means they show 
> up in Jenkins run logs, where they help with diagnosing test issues.
> Frontend debug actions should do the same. They're currently logged at trace 
> level, which is omitted from Jenkins run logs. Debug actions are entirely 
> opt-in and primarily used for testing, so I don't see any harm in raising the 
> level.




[jira] [Created] (IMPALA-13426) Log Java debug sleeps in dev environment

2024-10-07 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13426:
--

 Summary: Log Java debug sleeps in dev environment
 Key: IMPALA-13426
 URL: https://issues.apache.org/jira/browse/IMPALA-13426
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Affects Versions: Impala 4.4.0
Reporter: Michael Smith


The backend logs sleeps from debug actions at VLOG(1), which is included in 
Impala's logs as part of the default dev configuration. That means they show up 
in Jenkins run logs, where they help with diagnosing test issues.

Frontend debug actions should do the same. They're currently logged at trace 
level, which is omitted from Jenkins run logs. Debug actions are entirely 
opt-in and primarily used for testing, so I don't see any harm in raising the 
level.
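
To illustrate why the log level matters here, the sketch below uses Python's standard `logging` module as an analogy. It is not Impala's Java frontend code: `log_debug_sleep` and `capture` are hypothetical helpers, and Python's DEBUG level stands in for trace. A logger whose threshold is INFO drops the lower-level message, so the sleep only becomes visible once it is logged at INFO.

```python
import io
import logging


def log_debug_sleep(logger, millis, level):
    """Record that a debug action is about to sleep for `millis` ms."""
    logger.log(level, "Debug action: sleeping for %d ms", millis)


def capture(threshold, level):
    """Return what a logger with the given threshold writes for one sleep."""
    stream = io.StringIO()
    # Use a distinct logger name per combination to avoid shared handlers.
    logger = logging.getLogger("debug_action_demo_%d_%d" % (threshold, level))
    logger.setLevel(threshold)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    log_debug_sleep(logger, 500, level)
    logger.removeHandler(handler)
    return stream.getvalue()


# With an INFO threshold, the DEBUG-level message is dropped ...
dropped = capture(logging.INFO, logging.DEBUG)
# ... while the same message at INFO is written to the log.
kept = capture(logging.INFO, logging.INFO)
```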




[jira] [Work started] (IMPALA-13426) Log Java debug sleeps in dev environment

2024-10-07 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13426 started by Michael Smith.
--
> Log Java debug sleeps in dev environment
> 
>
> Key: IMPALA-13426
> URL: https://issues.apache.org/jira/browse/IMPALA-13426
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> The backend logs sleeps from debug actions at VLOG(1), which is included in 
> Impala's logs as part of the default dev configuration. That means they show 
> up in Jenkins run logs, where they help with diagnosing test issues.
> Frontend debug actions should do the same. They're currently logged at trace 
> level, which is omitted from Jenkins run logs. Debug actions are entirely 
> opt-in and primarily used for testing, so I don't see any harm in raising the 
> level.




[jira] [Commented] (IMPALA-11761) test_partition_dir_removed_inflight fails with "AssertionError: REFRESH should fail"

2024-10-07 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887410#comment-17887410
 ] 

Michael Smith commented on IMPALA-11761:


The existing logging looks insufficient to identify why this is happening. I'll 
try to reproduce it locally, but may need to add extra logging to help debug it 
in future runs.

> test_partition_dir_removed_inflight fails with "AssertionError: REFRESH 
> should fail"
> 
>
> Key: IMPALA-11761
> URL: https://issues.apache.org/jira/browse/IMPALA-11761
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>
> When running ozone tests, 
> TestRecursiveListing.test_partition_dir_removed_inflight fails:
> {code}
> metadata/test_recursive_listing.py:184: in test_partition_dir_removed_inflight
> refresh_should_fail=True)
> metadata/test_recursive_listing.py:217: in _test_listing_large_dir
> assert not refresh_should_fail, "REFRESH should fail"
> E   AssertionError: REFRESH should fail
> E   assert not True
> {code} 




[jira] [Commented] (IMPALA-13422) TestIcebergTable.test_load failed when running concurrently

2024-10-07 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887377#comment-17887377
 ] 

Michael Smith commented on IMPALA-13422:


[~boroknagyz] might be fixing this with https://gerrit.cloudera.org/c/21882/.
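
For context, the quoted report below shows both test variants copying into the same `/test-warehouse/iceberg_mixed_file_format_test` directory, which is why `hdfs dfs -put` reports "File exists". One generic way to avoid such collisions is to derive the destination from the per-test unique database name. The sketch below shows only that idea. It is not the change in the Gerrit review above, the helper names are hypothetical, and whether this test data can be relocated at all is an assumption, since Iceberg metadata files may reference fixed paths.

```python
import posixpath


def per_test_hdfs_dir(warehouse_root, unique_database, table_name):
    """Build a destination directory that differs for every test database."""
    return posixpath.join(warehouse_root, unique_database, table_name)


def build_put_command(local_dir, hdfs_dir):
    """Return the same 'hdfs dfs -put -d' invocation seen in the stack trace."""
    return ["hdfs", "dfs", "-put", "-d", local_dir, hdfs_dir]


# The two database names are taken from the stderr output in the report.
dir_a = per_test_hdfs_dir("/test-warehouse", "test_load_a61184e9",
                          "iceberg_mixed_file_format_test")
dir_b = per_test_hdfs_dir("/test-warehouse", "test_load_53f920bb",
                          "iceberg_mixed_file_format_test")
```

With distinct destinations, two concurrent copies no longer write the same files.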

> TestIcebergTable.test_load failed when running concurrently
> ---
>
> Key: IMPALA-13422
> URL: https://issues.apache.org/jira/browse/IMPALA-13422
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> TestIcebergTable.test_load failed when two tests ran concurrently in an 
> exhaustive build:
> {code:java}
> 19:12:26 [gw2] FAILED 
> query_test/test_iceberg.py::TestIcebergTable::test_load[protocol: beeswax | 
> exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] 
> 19:12:26 [gw3] FAILED 
> query_test/test_iceberg.py::TestIcebergTable::test_load[protocol: beeswax | 
> exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]{code}
> The stack trace is the same for both tests:
> {code:python}
> query_test/test_iceberg.py:1300: in test_load
> "iceberg_mixed_file_format_test", "parquet")
> common/file_utils.py:62: in create_iceberg_table_from_directory
> check_call(['hdfs', 'dfs', '-put', '-d', local_dir, hdfs_dir])
> /data/jenkins/workspace/impala-asf-master-exhaustive-release-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190:
>  in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hdfs', 'dfs', '-put', '-d', 
> '/data/jenkins/workspace/impala-asf-master-exhaustive-release-arm/repos/Impala/testdata/data/iceberg_test/iceberg_mixed_file_format_test',
>  '/test-warehouse/iceberg_mixed_file_format_test']' returned non-zero exit 
> status 1{code}
> Stderr of the first test:
> {code}
> CREATE DATABASE `test_load_a61184e9`;
> -- 2024-10-03 19:12:05,971 INFO MainThread: Started query 
> 0e431e374cd58716:0013fb81
> -- 2024-10-03 19:12:06,022 INFO MainThread: Created database 
> "test_load_a61184e9" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_load[protocol: 
> beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> Picked up JAVA_TOOL_OPTIONS:  
> -javaagent:/data/jenkins/workspace/impala-asf-master-exhaustive-release-arm/repos/Impala/fe/target/dependency/jamm-0.4.0.jar
> Picked up JAVA_TOOL_OPTIONS:  
> -javaagent:/data/jenkins/workspace/impala-asf-master-exhaustive-release-arm/repos/Impala/fe/target/dependency/jamm-0.4.0.jar
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/data/0-0-data-gfurnstahl_20220906113255_8d49367d-e338-4996-ade5-ee500a19c1d1-job_16619542960420_0003-1-1.orc':
>  File exists
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/metadata/d43cc1ea-096f-4594-9583-b1b27f8f0230-m0.avro':
>  File exists
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/metadata/snap-5574591442446832859-1-055baf62-de6d-4583-bf21-f187f9482343.avro':
>  File exists
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/metadata/snap-660396137547572-1-871d1473-8566-46c0-a530-a2256b3f396f.avro':
>  File exists
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/metadata/v2.metadata.json': 
> File exists
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/metadata/v4.metadata.json': 
> File exists
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/metadata/v6.metadata.json': 
> File exists
> put: 
> `/test-warehouse/iceberg_mixed_file_format_test/metadata/v8.metadata.json': 
> File exists{code}
> Stderr of the second test:
> {code}
> CREATE DATABASE `test_load_53f920bb`;
> -- 2024-10-03 19:12:05,971 INFO MainThread: Started query 
> 68460e4230189328:9f6261bc
> -- 2024-10-03 19:12:06,022 INFO MainThread: Created database 
> "test_load_53f920bb" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_load[protocol: 
> beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> Picked up JAVA_TOOL_OPTIONS:  
> -javaagent:/data/jenkins/workspace/impala-asf-master-exhaustive-release-arm/

[jira] [Updated] (IMPALA-13406) Upgrade curl to latest version (8.10.1)

2024-10-05 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13406:
---
Affects Version/s: Impala 4.4.1
   (was: Impala 4.5.0)

> Upgrade curl to latest version (8.10.1)
> ---
>
> Key: IMPALA-13406
> URL: https://issues.apache.org/jira/browse/IMPALA-13406
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Infrastructure
>Affects Versions: Impala 4.4.1
>Reporter: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Impala currently uses curl 7.78.0, which has several low and medium security 
> vulnerabilities. We should upgrade to the latest curl 8.10.1. It would also 
> be good to change the build to skip functionality that we don't need (e.g. 
> TELNET, FTP, etc).




[jira] [Resolved] (IMPALA-13406) Upgrade curl to latest version (8.10.1)

2024-10-05 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13406.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Upgrade curl to latest version (8.10.1)
> ---
>
> Key: IMPALA-13406
> URL: https://issues.apache.org/jira/browse/IMPALA-13406
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Impala currently uses curl 7.78.0, which has several low and medium security 
> vulnerabilities. We should upgrade to the latest curl 8.10.1. It would also 
> be good to change the build to skip functionality that we don't need (e.g. 
> TELNET, FTP, etc).




[jira] [Resolved] (IMPALA-13185) Tuple cache keys need to incorporate runtime filter information

2024-10-04 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13185.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Tuple cache keys need to incorporate runtime filter information
> ---
>
> Key: IMPALA-13185
> URL: https://issues.apache.org/jira/browse/IMPALA-13185
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> If a runtime filter impacts the results of a fragment, then the tuple cache 
> key needs to incorporate information about the generation of that runtime 
> filter. This needs to include information about the base tables that impact 
> the runtime filter.
> For example, suppose there is a join. The build side of the join produces a 
> runtime filter that gets delivered to the probe side of the join. The tuple 
> cache key for the probe side of the join will need to include a 
> representation of the runtime filter. If the table on the build side of the 
> join changes, the tuple cache key for the probe side needs to change due to 
> the possible difference in the runtime filter.
> This can also impact eligibility. In theory, the build side of a join could 
> be constructed from a source with a limit specified, and this can result in 
> non-determinism. Since the build of the runtime filter is not deterministic, 
> the consumer of the runtime filter is not deterministic and can't participate 
> in tuple caching.
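
A minimal sketch of the idea described above, not Impala's actual implementation: the cache key for the probe side hashes the probe plan together with a description of every runtime filter it consumes, including the versions of the base tables behind that filter. All names (`RuntimeFilterSource`, `tuple_cache_key`, the version numbers) are hypothetical. A non-deterministic filter source makes the fragment ineligible, modeled here by returning `None`.

```python
import hashlib
from typing import Dict, List, NamedTuple, Optional


class RuntimeFilterSource(NamedTuple):
    build_plan: str                 # description of the build-side plan
    table_versions: Dict[str, int]  # base tables that feed the filter
    deterministic: bool             # False e.g. when the build side has a LIMIT


def tuple_cache_key(probe_plan: str,
                    filters: List[RuntimeFilterSource]) -> Optional[str]:
    """Return a cache key, or None if the fragment cannot be cached."""
    if any(not f.deterministic for f in filters):
        # A non-deterministic filter makes its consumer non-deterministic.
        return None
    digest = hashlib.sha256()
    digest.update(probe_plan.encode("utf-8"))
    for f in filters:
        digest.update(b"\x00filter\x00")
        digest.update(f.build_plan.encode("utf-8"))
        # Sort so the key does not depend on dictionary ordering.
        for table, version in sorted(f.table_versions.items()):
            digest.update(("\x00%s@%d" % (table, version)).encode("utf-8"))
    return digest.hexdigest()
```

In this model, bumping the version of a build-side table changes the probe-side key even though the probe plan itself is unchanged.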




[jira] [Resolved] (IMPALA-13402) Avoid starting cluster for skipped tests in test_tuple_cache

2024-10-01 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13402.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Avoid starting cluster for skipped tests in test_tuple_cache
> 
>
> Key: IMPALA-13402
> URL: https://issues.apache.org/jira/browse/IMPALA-13402
> Project: IMPALA
>  Issue Type: Task
>  Components: Test
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> TestTupleCacheRuntimeKeys tests are skipped when mt_dop > 1, but each skipped 
> test case still takes ~5s because it starts up a new test cluster.
> Clean up the test definition to avoid using skips in custom cluster tests.




[jira] [Assigned] (IMPALA-6064) Analysis and planning should happen asynchronously

2024-09-27 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-6064:
-

Assignee: Michael Smith

> Analysis and planning should happen asynchronously
> --
>
> Key: IMPALA-6064
> URL: https://issues.apache.org/jira/browse/IMPALA-6064
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients
>Reporter: Daniel Hecht
>Assignee: Michael Smith
>Priority: Major
>
> Today analysis and planning happen synchronously in the ExecuteStatement() RPC.  
> Instead, they should happen on the asynchronous path so that a query handle 
> can still be returned promptly when, e.g., metadata loading blocks.
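
The general shape of this request can be sketched as follows. This is only an illustration of returning a handle before planning finishes, using Python's `concurrent.futures`. `QueryServer`, `execute_statement` and `wait_for_plan` are hypothetical names, not Impala's RPC implementation.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryServer:
    """Toy server that plans queries on a background thread."""

    def __init__(self, plan_fn):
        self._plan_fn = plan_fn            # may block, e.g. on metadata loading
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._lock = threading.Lock()
        self._plans = {}

    def execute_statement(self, sql):
        """Return a query handle immediately; planning continues in background."""
        handle = uuid.uuid4().hex
        future = self._pool.submit(self._plan_fn, sql)
        with self._lock:
            self._plans[handle] = future
        return handle

    def is_planned(self, handle):
        with self._lock:
            return self._plans[handle].done()

    def wait_for_plan(self, handle, timeout=None):
        """Block until planning has finished and return its result."""
        with self._lock:
            future = self._plans[handle]
        return future.result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

The caller gets the handle back even while `plan_fn` is still blocked, and only pays the planning cost when it later waits on the handle.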




[jira] [Created] (IMPALA-13402) Avoid starting cluster for skipped tests in test_tuple_cache

2024-09-26 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13402:
--

 Summary: Avoid starting cluster for skipped tests in 
test_tuple_cache
 Key: IMPALA-13402
 URL: https://issues.apache.org/jira/browse/IMPALA-13402
 Project: IMPALA
  Issue Type: Task
  Components: Test
Reporter: Michael Smith


TestTupleCacheRuntimeKeys tests are skipped when mt_dop > 1, but each skipped 
test case still takes ~5s because it starts up a new test cluster.

Clean up the test definition to avoid using skips in custom cluster tests.
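
One way to read "avoid using skips" is to filter the parameter combinations before any test case is generated, so no cluster is started for a combination that would only be skipped. The sketch below shows that idea in plain Python. `build_test_matrix` and the dimension values are hypothetical and do not use Impala's test framework.

```python
import itertools


def build_test_matrix(dimensions, constraint):
    """Return only the parameter combinations that satisfy `constraint`."""
    names = sorted(dimensions)
    combos = (dict(zip(names, values))
              for values in itertools.product(*(dimensions[n] for n in names)))
    return [combo for combo in combos if constraint(combo)]


# Combinations with mt_dop > 1 never become test cases, so nothing is skipped.
matrix = build_test_matrix(
    {"mt_dop": [0, 1, 2, 4], "table_format": ["parquet"]},
    lambda combo: combo["mt_dop"] <= 1)
```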




[jira] [Resolved] (IMPALA-12939) Improve default for IMPALA_BUILD_THREADS

2024-09-26 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-12939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12939.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Improve default for IMPALA_BUILD_THREADS
> 
>
> Key: IMPALA-12939
> URL: https://issues.apache.org/jira/browse/IMPALA-12939
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Improve the default selection for IMPALA_BUILD_THREADS and other 
> parallelization config.
> Impala's build process needs 2GB of memory per CPU core with {{-notests}} or 
> 4GB of memory per CPU core when building unit test binaries (the link process 
> is especially memory-intensive). Exceeding these can lead to systems with 
> many cores running out of memory during the build. We currently do not 
> consider memory when selecting a default value for IMPALA_BUILD_THREADS.
> We currently default IMPALA_BUILD_THREADS to {{nproc}}, which also does 
> not reflect CPU slicing that may happen in containers. How to detect this 
> differs between cgroups v1 and v2 (v2 is becoming more common, present on 
> Ubuntu 22 and RedHat 9).
>  * v1:
> {code:java}
> echo $(($(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us) / 
> $(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us))){code}
>  * v2:
> {code:java}
> awk '{ cores = ($1 == "max" ? '$(nproc)' : $1 / $2); print cores==int(cores) 
> ? cores : int(cores)+1 }' /sys/fs/cgroup/cpu.max{code}
> Something like the following should handle all cases
> {code}
> if [[ -f /sys/fs/cgroup/cpu.max ]]; then
>   awk '{ cores = ($1 == "max" ? '$(nproc)' : $1 / $2); print 
> cores==int(cores) ? cores : int(cores)+1 }' /sys/fs/cgroup/cpu.max
> elif [[ -f /sys/fs/cgroup/cpu/cpu.cfs_quota_us ]]; then
>   echo $(($(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us) / 
> $(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)))
> else
>   nproc
> fi
> {code}
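
The description above combines two limits: the CPU share visible through cgroups and the memory needed per build thread. A rough Python sketch of how the two could be combined is shown below. It is not the change that resolved this issue, and all function names are hypothetical. It takes the file contents as arguments so it can be tried without a container. The v2 branch rounds up like the awk command, the v1 branch uses integer division like the shell arithmetic, and the handling of a v1 quota of -1 (no limit) is an addition not present in the snippets above.

```python
import math


def cores_from_cgroup_v2(cpu_max_text, nproc):
    """Parse the contents of /sys/fs/cgroup/cpu.max ("<quota> <period>")."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        return nproc
    # Round fractional cores up, as the awk command does.
    return math.ceil(int(quota) / int(period))


def cores_from_cgroup_v1(quota_us, period_us, nproc):
    """Combine cpu.cfs_quota_us and cpu.cfs_period_us from cgroups v1."""
    if quota_us <= 0:
        # A quota of -1 means no limit is configured.
        return nproc
    return max(1, quota_us // period_us)


def default_build_threads(cores, mem_gb, building_tests):
    """Cap the thread count by memory: 4GB per thread with tests, else 2GB."""
    gb_per_thread = 4 if building_tests else 2
    return max(1, min(cores, mem_gb // gb_per_thread))
```

For example, a 16-core machine with 16GB of memory would get 4 threads when building unit test binaries and 8 threads with `-notests` under these assumptions.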




[jira] [Resolved] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-09-24 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13302.

 Fix Version/s: Impala 4.5.0
Target Version: Impala 4.4.2
Resolution: Fixed

> Some ExprRewriteRule results are not analyzed, leading to unmaterialized 
> slots from reAnalyze
> -
>
> Key: IMPALA-13302
> URL: https://issues.apache.org/jira/browse/IMPALA-13302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> IMPALA-12164 skipped registering conjuncts that the analyzer expects to 
> remove because an earlier conjunct evaluates to constant False. However some 
> ExprRewriteRules don't analyze the predicates they produce, which can lead to 
> those conjuncts not actually being removed until a reAnalyze phase.
> reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting 
> Expr IDs from 0. That can lead to re-using the same Expr ID and marking it as 
> assigned. Then when a new Expr gets the same ID, it will skip materializing 
> slots, which can cause problems later (like if that Expr is part of a hash 
> join).
> Some example queries:
> 1. still a problem
> {code}
> WITH v AS (SELECT 1 FROM functional.alltypestiny t1
>   JOIN functional.alltypestiny t2 ON t1.id = t2.id)
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
> 2. already fixed via IMPALA-13203
> {code}
> WITH v as (SELECT 1 FROM functional.alltypes t1
>   JOIN functional.alltypes t2 ON t1.id = t2.id)
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
> UNION ALL
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND false
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
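
The hazard described above (an ID counter that restarts at 0 while per-ID state is still consulted) can be shown with a toy model. This is a deliberately simplified sketch and not Impala's analyzer: `ToyAnalyzer` is hypothetical, and it assumes the set of "assigned" IDs is visible to both analyzers while each analyzer has its own counter. The actual bookkeeping in Impala is more involved.

```python
import itertools


class ToyAnalyzer:
    """Each instance restarts Expr IDs at 0, like a fresh GlobalState."""

    def __init__(self, assigned_ids):
        self._ids = itertools.count(0)
        # Assumption for this sketch: assigned IDs outlive the ID counter.
        self.assigned_ids = assigned_ids

    def register(self, expr):
        """Hand out the next Expr ID for `expr`."""
        return next(self._ids)

    def mark_assigned(self, expr_id):
        self.assigned_ids.add(expr_id)

    def needs_materialization(self, expr_id):
        """Slots are only materialized for IDs not yet marked as assigned."""
        return expr_id not in self.assigned_ids


def demo():
    assigned = set()
    # During re-analysis, a conjunct that should be dropped gets ID 0.
    reanalyzer = ToyAnalyzer(assigned)
    dropped_id = reanalyzer.register("t1.id = 1 AND false")
    reanalyzer.mark_assigned(dropped_id)
    # A different analyzer hands out ID 0 again, this time to a join predicate.
    later = ToyAnalyzer(assigned)
    join_id = later.register("t1.id = t2.id")
    return dropped_id, join_id, later.needs_materialization(join_id)
```

In this model both expressions receive ID 0, so the join predicate is wrongly treated as already assigned and its slots are not materialized.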




[jira] [Updated] (IMPALA-13376) Slight documentation mistake for AGG_MEM_CORRELATION_FACTOR

2024-09-24 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13376:
---
Description: 
IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR. A higher value 
lowers the memory estimate, while a lower value raises it. The documentation in 
[ImpalaService.thrift|https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/common/thrift/ImpalaService.thrift#L850-L851],
 however, says the opposite.
{code:java}
  // Setting value 1.0 will result in an equal memory estimate as the default estimation
  // (no change). Defaults to 0.5. {code}

AGG_MEM_CORRELATION_FACTOR was also never documented.

  was:
IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR. A higher value 
lowers the memory estimate, while a lower value raises it. The documentation in 
[ImpalaService.thrift|https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/common/thrift/ImpalaService.thrift#L850-L851],
 however, says the opposite.
{code:java}
  // Setting value 1.0 will result in an equal memory estimate as the default estimation
  // (no change). Defaults to 0.5. {code}


> Slight documentation mistake for AGG_MEM_CORRELATION_FACTOR
> ---
>
> Key: IMPALA-13376
> URL: https://issues.apache.org/jira/browse/IMPALA-13376
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR. A higher value 
> lowers the memory estimate, while a lower value raises it. The documentation in 
> [ImpalaService.thrift|https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/common/thrift/ImpalaService.thrift#L850-L851],
>  however, says the opposite.
> {code:java}
>   // Setting value 1.0 will result in an equal memory estimate as the default estimation
>   // (no change). Defaults to 0.5. {code}
> AGG_MEM_CORRELATION_FACTOR was also never documented.




[jira] [Resolved] (IMPALA-13376) Slight documentation mistake for AGG_MEM_CORRELATION_FACTOR

2024-09-24 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13376.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Slight documentation mistake for AGG_MEM_CORRELATION_FACTOR
> ---
>
> Key: IMPALA-13376
> URL: https://issues.apache.org/jira/browse/IMPALA-13376
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR. A higher value 
> lowers the memory estimate, while a lower value raises it. The documentation in 
> [ImpalaService.thrift|https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/common/thrift/ImpalaService.thrift#L850-L851],
>  however, says the opposite.
> {code:java}
>   // Setting value 1.0 will result in an equal memory estimate as the default estimation
>   // (no change). Defaults to 0.5. {code}




[jira] [Reopened] (IMPALA-11761) test_partition_dir_removed_inflight fails on ozone with "AssertionError: REFRESH should fail"

2024-09-20 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reopened IMPALA-11761:


Also failed on an ARM exhaustive release run.

> test_partition_dir_removed_inflight fails on ozone with "AssertionError: 
> REFRESH should fail"
> -
>
> Key: IMPALA-11761
> URL: https://issues.apache.org/jira/browse/IMPALA-11761
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>
> When running ozone tests, 
> TestRecursiveListing.test_partition_dir_removed_inflight fails:
> {code}
> metadata/test_recursive_listing.py:184: in test_partition_dir_removed_inflight
> refresh_should_fail=True)
> metadata/test_recursive_listing.py:217: in _test_listing_large_dir
> assert not refresh_should_fail, "REFRESH should fail"
> E   AssertionError: REFRESH should fail
> E   assert not True
> {code} 




[jira] [Work started] (IMPALA-13393) Cleanup outdated maven config

2024-09-19 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13393 started by Michael Smith.
--
> Cleanup outdated maven config
> -
>
> Key: IMPALA-13393
> URL: https://issues.apache.org/jira/browse/IMPALA-13393
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> The config around javax.el is very outdated. It was added in 2018, when 
> Impala used Sentry. The comment about HBase depending on it is also no longer 
> accurate. There's no need to pin javax.el anymore, and an update to 
> CDP_BUILD_NUMBER will update cron-utils (via HMS) so javax.el isn't used.




[jira] [Created] (IMPALA-13393) Cleanup outdated maven config

2024-09-19 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13393:
--

 Summary: Cleanup outdated maven config
 Key: IMPALA-13393
 URL: https://issues.apache.org/jira/browse/IMPALA-13393
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Affects Versions: Impala 4.4.0
Reporter: Michael Smith


The config around javax.el is very outdated. It was added in 2018, when Impala 
used Sentry. The comment about HBase depending on it is also no longer 
accurate. There's no need to pin javax.el anymore, and an update to 
CDP_BUILD_NUMBER will update cron-utils (via HMS) so javax.el isn't used.




[jira] [Assigned] (IMPALA-13393) Cleanup outdated maven config

2024-09-19 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13393:
--

Assignee: Michael Smith

> Cleanup outdated maven config
> -
>
> Key: IMPALA-13393
> URL: https://issues.apache.org/jira/browse/IMPALA-13393
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> The config around javax.el is very outdated. It was added in 2018, when 
> Impala used Sentry. The comment about HBase depending on it is also no longer 
> accurate. There's no need to pin javax.el anymore, and an update to 
> CDP_BUILD_NUMBER will update cron-utils (via HMS) so javax.el isn't used.




[jira] [Updated] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-09-19 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13302:
---
Description: 
IMPALA-12164 skipped registering conjuncts that the analyzer expects to remove 
because an earlier conjunct evaluates to constant False. However some 
ExprRewriteRules don't analyze the predicates they produce, which can lead to 
those conjuncts not actually being removed until a reAnalyze phase.

reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting Expr 
IDs from 0. That can lead to re-using the same Expr ID and marking it as 
assigned. Then when a new Expr gets the same ID, it will skip materializing 
slots, which can cause problems later (like if that Expr is part of a hash 
join).

Some example queries:
1. still a problem
{code}
WITH v AS (SELECT 1 FROM functional.alltypestiny t1
  JOIN functional.alltypestiny t2 ON t1.id = t2.id)
SELECT 1
FROM functional.alltypestiny t1
WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
  AND t1.id = 1 AND t1.id = 1
UNION ALL
SELECT 1
FROM functional.alltypestiny t1
WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
  AND t1.id = 1 AND t1.id = 1
UNION ALL SELECT 1 FROM v
UNION ALL SELECT 1 FROM v;
{code}
2. already fixed via IMPALA-13203
{code}
WITH v as (SELECT 1 FROM functional.alltypes t1
  JOIN functional.alltypes t2 ON t1.id = t2.id)
SELECT 1 FROM functional.alltypes t1
  WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
UNION ALL
SELECT 1 FROM functional.alltypes t1
  WHERE t1.id = 1 AND false
UNION ALL SELECT 1 FROM v
UNION ALL SELECT 1 FROM v;
{code}


  was:
IMPALA-12164 skipped registering conjuncts that the analyzer expects to remove 
because an earlier conjunct evaluates to constant False. However some 
ExprRewriteRules don't analyze the predicates they produce, which can lead to 
those conjuncts not actually being removed until a reAnalyze phase.

reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting Expr 
IDs from 0. That can lead to re-using the same Expr ID and marking it as 
assigned. Then when a new Expr gets the same ID, it will skip materializing 
slots, which can cause problems later (like if that Expr is part of a hash 
join).

Some example queries:
{code}
WITH v AS (SELECT 1 FROM functional.alltypestiny t1
  JOIN functional.alltypestiny t2 ON t1.id = t2.id)
SELECT 1
FROM functional.alltypestiny t1
WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
  AND t1.id = 1 AND t1.id = 1
UNION ALL
SELECT 1
FROM functional.alltypestiny t1
WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
  AND t1.id = 1 AND t1.id = 1
UNION ALL SELECT 1 FROM v
UNION ALL SELECT 1 FROM v;
{code}
(already fixed via IMPALA-13203) and
{code}
WITH v as (SELECT 1 FROM functional.alltypes t1
  JOIN functional.alltypes t2 ON t1.id = t2.id)
SELECT 1 FROM functional.alltypes t1
  WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
UNION ALL
SELECT 1 FROM functional.alltypes t1
  WHERE t1.id = 1 AND false
UNION ALL SELECT 1 FROM v
UNION ALL SELECT 1 FROM v;
{code}
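
The Expr ID reuse described in this update can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyAnalyzer`, `register`, `mark_assigned` and `needs_materialization` are hypothetical names made up for illustration and are not Impala's Analyzer API. It shows how an ID carried over from a first analysis can collide with a fresh ID handed out by a new analyzer that restarts counting from 0.

```python
import itertools


class ToyAnalyzer:
    """Hypothetical stand-in for an analyzer whose Expr ID counter starts at 0."""

    def __init__(self):
        self._next_id = itertools.count(0)
        self.assigned_ids = set()

    def register(self, expr):
        # Hand out the next ID from this analyzer's own counter.
        expr["id"] = next(self._next_id)
        return expr["id"]

    def mark_assigned(self, expr):
        # Records whatever ID the expr currently carries, even a stale one.
        self.assigned_ids.add(expr["id"])

    def needs_materialization(self, expr):
        return expr["id"] not in self.assigned_ids


# First analysis: a conjunct receives ID 0.
first = ToyAnalyzer()
stale = {"name": "t1.id = 1"}
first.register(stale)

# Re-analysis: a new analyzer restarts at 0, but the old conjunct still
# carries its stale ID and gets marked as assigned.
second = ToyAnalyzer()
second.mark_assigned(stale)

# A newly registered predicate receives the same ID and looks "already assigned".
join_pred = {"name": "t1.id = t2.id"}
second.register(join_pred)
collision = not second.needs_materialization(join_pred)
```

In this toy model `collision` ends up true, which mirrors the "skip materializing slots" symptom described above; the real code paths are more involved.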



> Some ExprRewriteRule results are not analyzed, leading to unmaterialized 
> slots from reAnalyze
> -
>
> Key: IMPALA-13302
> URL: https://issues.apache.org/jira/browse/IMPALA-13302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> IMPALA-12164 skipped registering conjuncts that the analyzer expects to 
> remove because an earlier conjunct evaluates to constant False. However some 
> ExprRewriteRules don't analyze the predicates they produce, which can lead to 
> those conjuncts not actually being removed until a reAnalyze phase.
> reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting 
> Expr IDs from 0. That can lead to re-using the same Expr ID and marking it as 
> assigned. Then when a new Expr gets the same ID, it will skip materializing 
> slots, which can cause problems later (like if that Expr is part of a hash 
> join).
> Some example queries:
> 1. still a problem
> {code}
> WITH v AS (SELECT 1 FROM functional.alltypestiny t1
>   JOIN functional.alltypestiny t2 ON t1.id = t2.id)
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
> 2. already fixed via IMPALA-13203
> {code}
> WITH v as (SELECT 1 FROM functional.alltypes t1
>   JOIN functional.alltypes t2 ON t1.id = t2.id)
> SELECT 1 FROM functional.alltypes t1
>   WHERE t

[jira] [Commented] (IMPALA-11340) Support BINARY in persistent legacy Java UDFs

2024-09-16 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882159#comment-17882159
 ] 

Michael Smith commented on IMPALA-11340:


There's no definition for BINARY in 
https://github.com/apache/impala/blob/master/be/src/udf/udf.h.

> Support BINARY in persistent legacy Java UDFs
> -
>
> Key: IMPALA-11340
> URL: https://issues.apache.org/jira/browse/IMPALA-11340
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> The BINARY implementation in https://gerrit.cloudera.org/#/c/16066/ supports 
> Java UDFs when the arguments and results are set explicitly, but does not 
> create functions with BINARY types when only the class is given and Impala is 
> supposed to create a function for all overloads of evaluate().
> The reason is backwards compatibility - before BINARY support Impala mapped 
> BytesWritable and BytesArray types to STRING, while Hive maps these types to 
> BINARY. To avoid breaking existing queries in Impala, we cannot create 
> functions with BINARY instead.
> If BytesWritable/BytesArray is present as an argument, we could create functions 
> with both signatures (one where all these types are mapped to STRING, one 
> where all are mapped to BINARY), but this would not work if these types are 
> only present as the return type. 
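
A rough sketch of the dual-signature idea from the description, for illustration only. The function name `impala_signatures` and the string type names are hypothetical and not part of Impala. The sketch assumes overloads are told apart by their argument types, which is one reading of why a second signature would not work when the bytes type appears only as the return type.

```python
# Java types that Hive maps to BINARY and Impala historically mapped to STRING.
HIVE_BYTES_TYPES = {"BytesWritable", "BytesArray"}


def impala_signatures(arg_types, return_type):
    """Return candidate (args, return) signatures for one evaluate() overload."""

    def mapped(java_type, bytes_as):
        # Non-bytes types are left untouched in this sketch.
        return bytes_as if java_type in HIVE_BYTES_TYPES else java_type

    def signature(bytes_as):
        args = tuple(mapped(t, bytes_as) for t in arg_types)
        return (args, mapped(return_type, bytes_as))

    # The STRING mapping is always kept for backwards compatibility.
    signatures = [signature("STRING")]
    # A BINARY variant is only added when an argument distinguishes it.
    if any(t in HIVE_BYTES_TYPES for t in arg_types):
        signatures.append(signature("BINARY"))
    return signatures
```

For example, `impala_signatures(["IntWritable"], "BytesWritable")` yields only the STRING variant, because both variants would have identical argument lists.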




[jira] [Resolved] (IMPALA-13322) Cannot Alter Table sys.impala_query_live

2024-09-16 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13322.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Cannot Alter Table sys.impala_query_live
> 
>
> Key: IMPALA-13322
> URL: https://issues.apache.org/jira/browse/IMPALA-13322
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Jason Fehr
>Assignee: Michael Smith
>Priority: Critical
>  Labels: workload-management
> Fix For: Impala 4.5.0
>
>
> When running "alter table add column if not exists" ddls on 
> sys.impala_query_live, Impala returns "IllegalStateException: null".  No 
> stack trace can be found in any log files.
> Even though that error was returned, the column was still added, it just took 
> about 10 seconds to show up in the table.
> Once this issue is fixed, the disparate logic to handle upgrades on 
> sys.impala_query_log and sys.impala_query_live can be unified.




[jira] [Resolved] (IMPALA-13378) Impalad crash in RowDescriptor::InitTupleIdxMap()

2024-09-13 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13378.

 Fix Version/s: Impala 4.5.0
Target Version: Impala 4.4.2
Resolution: Fixed

> Impalad crash in RowDescriptor::InitTupleIdxMap()
> -
>
> Key: IMPALA-13378
> URL: https://issues.apache.org/jira/browse/IMPALA-13378
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
>  We saw a crash in RowDescriptor::InitTupleIdxMap() showing that the crash 
> address is 0x0
> {noformat}
>  0  impalad!impala::RowDescriptor::InitTupleIdxMap() [descriptors.cc : 432 + 
> 0x3]
>  1  impalad!impala::RowDescriptor::RowDescriptor(impala::DescriptorTbl 
> const&, std::vector > const&, std::vector std::allocator > const&) [descriptors.cc : 401 + 0x8]
>  2  impalad!impala::PlanNode::Init(impala::TPlanNode const&, 
> impala::FragmentState*) [exec-node.cc : 79 + 0x20]
>  3  impalad!impala::ScanPlanNode::Init(impala::TPlanNode const&, 
> impala::FragmentState*) [scan-node.cc : 102 + 0x8]
>  4  impalad!impala::HdfsScanPlanNode::Init(impala::TPlanNode const&, 
> impala::FragmentState*) [hdfs-scan-node-base.cc : 156 + 0xc]
>  5  impalad!impala::PlanNode::CreateTreeHelper(impala::FragmentState*, 
> std::vector > const&, 
> impala::PlanNode*, int*, impala::PlanNode**) [exec-node.cc : 145 + 0x10]
>  6  impalad!impala::PlanNode::CreateTree(impala::FragmentState*, 
> impala::TPlan const&, impala::PlanNode**) [exec-node.cc : 104 + 0x5]
>  7  impalad!impala::FragmentState::Init() [fragment-state.cc : 84 + 0x10]
>  8  
> impalad!impala::FragmentState::CreateFragmentStateMap(impala::TExecPlanFragmentInfo
>  const&, impala::ExecQueryFInstancesRequestPB const&, impala::QueryState*, 
> std::unordered_map, 
> std::equal_to, std::allocator impala::FragmentState*> > >&) [fragment-state.cc : 78 + 0xc]
>  9  impalad!impala::QueryState::StartFInstances() [query-state.cc : 820 + 
> 0x2e] {noformat}
> The code is
> {code:cpp}
> 428 void RowDescriptor::InitTupleIdxMap() {
> 429   // find max id
> 430   TupleId max_id = 0;
> 431   for (int i = 0; i < tuple_desc_map_.size(); ++i) {
> 432 max_id = max(tuple_desc_map_[i]->id(), max_id); // <-- Crash here
> 433   }{code}
> It seems 'tuple_desc_map_[i]' is null here. 'tuple_desc_map_' is initialized 
> in the parent frame:
> {code:cpp}
> 391 RowDescriptor::RowDescriptor(const DescriptorTbl& desc_tbl,
> 392  const vector<TTupleId>& row_tuples,
> 393  const vector<bool>& nullable_tuples)
> 394   : tuple_idx_nullable_map_(nullable_tuples) {
> 395   DCHECK_EQ(nullable_tuples.size(), row_tuples.size());
> 396   DCHECK_GT(row_tuples.size(), 0);
> 397   for (int i = 0; i < row_tuples.size(); ++i) {
> 398 tuple_desc_map_.push_back(desc_tbl.GetTupleDescriptor(row_tuples[i])); // <-- init here
> 399 DCHECK(tuple_desc_map_.back() != NULL);
> 400   }
> 401   InitTupleIdxMap();
> 402   InitHasVarlenSlots();
> 403 }{code}
> We have a DCHECK to make sure the TupleDescriptor pointer is not NULL, but 
> it is not executed in RELEASE builds.
> 'desc_tbl' comes from TQueryCtx and 'row_tuples' comes from 
> TExecPlanFragmentInfo. We should verify whether they are consistent before 
> starting all the fragment instances.
> [https://github.com/apache/impala/blob/3e1b10556bc83b0e697b7a2aac411ccad6094563/be/src/service/control-service.cc#L162-L163]
> Why they could be inconsistent still needs further investigation. We saw the 
> same query succeed sometimes and also saw other queries causing a similar 
> crash, i.e. a crash in RowDescriptor::InitTupleIdxMap() but coming from the 
> initialization of different PlanNodes.
> This might have the same cause as IMPALA-13107. CC [~wzhou], [~MikaelSmith], 
> [~prozsa]
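
The consistency check suggested in the description could look roughly like the following sketch. It assumes the descriptor table and the plan fragment can each be reduced to a list of tuple IDs. `find_missing_tuple_ids` and `check_fragment_consistent` are hypothetical helpers, not Impala functions. Unlike a DCHECK, this kind of check raises an error in every build type instead of being compiled out.

```python
def find_missing_tuple_ids(descriptor_tuple_ids, row_tuples):
    """Return the tuple IDs referenced by a plan node but absent from the descriptor table."""
    known = set(descriptor_tuple_ids)
    return [tid for tid in row_tuples if tid not in known]


def check_fragment_consistent(descriptor_tuple_ids, row_tuples):
    """Fail with an error, rather than dereferencing a missing descriptor later."""
    missing = find_missing_tuple_ids(descriptor_tuple_ids, row_tuples)
    if missing:
        raise ValueError("row tuples reference unknown tuple ids: %s" % missing)
```

Running such a check before starting fragment instances would turn the null dereference into a query failure with a readable message; where exactly it belongs in the backend is left open here.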




[jira] [Created] (IMPALA-13380) Deduplicate "Referenced Tables", "Tables Queried" in Profile v2

2024-09-12 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13380:
--

 Summary: Deduplicate "Referenced Tables", "Tables Queried" in 
Profile v2
 Key: IMPALA-13380
 URL: https://issues.apache.org/jira/browse/IMPALA-13380
 Project: IMPALA
  Issue Type: Improvement
Reporter: Michael Smith


IMPALA-12626 and IMPALA-3880 both added fields to the profile that list the 
tables referenced by the query. They use the same source, so these are purely 
duplicate entries.

In Profile v2, we should reduce this down to one entry again.




[jira] [Commented] (IMPALA-12626) Include List of Referenced Tables/Views in Query Log Table

2024-09-12 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881408#comment-17881408
 ] 

Michael Smith commented on IMPALA-12626:


A lot of the work was necessary: we needed a list of tables we could use in the 
backend when writing to live/log tables. But we definitely now duplicate this 
in the profile.

Maybe we should mark this as something to remove in the nextgen profile, as 
it's in an existing release so I don't think we want to pull it back out of the 
profile.

> Include List of Referenced Tables/Views in Query Log Table
> --
>
> Key: IMPALA-12626
> URL: https://issues.apache.org/jira/browse/IMPALA-12626
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Jason Fehr
>Assignee: Michael Smith
>Priority: Major
>  Labels: workload-management
> Fix For: Impala 4.4.0
>
>
> In the Impala query log table where completed queries are stored, add a list 
> of all tables and views that were referenced in the query.  The purpose 
> behind this functionality is for end users to be able to analyze how often 
> each table is used in queries.




[jira] [Work started] (IMPALA-13322) Cannot Alter Table sys.impala_query_live

2024-09-11 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13322 started by Michael Smith.
--
> Cannot Alter Table sys.impala_query_live
> 
>
> Key: IMPALA-13322
> URL: https://issues.apache.org/jira/browse/IMPALA-13322
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Jason Fehr
>Assignee: Michael Smith
>Priority: Critical
>  Labels: workload-management
>
> When running "alter table add column if not exists" ddls on 
> sys.impala_query_live, Impala returns "IllegalStateException: null".  No 
> stack trace can be found in any log files.
> Even though that error was returned, the column was still added, it just took 
> about 10 seconds to show up in the table.
> Once this issue is fixed, the disparate logic to handle upgrades on 
> sys.impala_query_log and sys.impala_query_live can be unified.




[jira] [Assigned] (IMPALA-13322) Cannot Alter Table sys.impala_query_live

2024-09-11 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13322:
--

Assignee: Michael Smith

> Cannot Alter Table sys.impala_query_live
> 
>
> Key: IMPALA-13322
> URL: https://issues.apache.org/jira/browse/IMPALA-13322
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Jason Fehr
>Assignee: Michael Smith
>Priority: Critical
>  Labels: workload-management
>
> When running "alter table add column if not exists" ddls on 
> sys.impala_query_live, Impala returns "IllegalStateException: null".  No 
> stack trace can be found in any log files.
> Even though that error was returned, the column was still added, it just took 
> about 10 seconds to show up in the table.
> Once this issue is fixed, the disparate logic to handle upgrades on 
> sys.impala_query_log and sys.impala_query_live can be unified.




[jira] [Commented] (IMPALA-13107) Invalid TExecPlanFragmentInfo received by executor with instance number as 0

2024-09-11 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881114#comment-17881114
 ] 

Michael Smith commented on IMPALA-13107:


I think I looked into DiscardTransfer further on the call and couldn't identify 
anywhere it would be used afterwards.

> Invalid TExecPlanFragmentInfo received by executor with instance number as 0
> 
>
> Key: IMPALA-13107
> URL: https://issues.apache.org/jira/browse/IMPALA-13107
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0, Impala 4.4.1
>
>
> In a customer-reported case, executors received a TExecPlanFragmentInfo with 
> an instance number equal to 0, which caused the Impala daemon to crash. Here 
> are log messages collected on the Impala executors:
> {code:java}
> impalad.executor.net.impala.log.INFO.20240522-160138.197583:I0523 
> 00:59:16.892853 199528 control-service.cc:148] 
> 624c47e9264ebb62:5aa89af3] ExecQueryFInstances(): 
> query_id=624c47e9264ebb62:5aa89af3 coord=coordinator.net:27000 
> #instances=0
> ..
> I0523 00:59:19.306522 199185 kMinidump in thread 
> [1890723]query-state-624c47e9264ebb62:5aa89af3 running query 
> 624c47e9264ebb62:5aa89af3, fragment instance 
> :
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/021b06ea-1627-4c69-9f27858a-f3cd9026.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x012ff9d9, pid=197583, tid=0x7eefc98a0700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_381) (build 1.8.0_381-b09)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.381-b09 mixed mode 
> linux-amd64 )
> # Problematic frame:
> # C  [impalad+0xeff9d9]  
> impala::FragmentState::FragmentState(impala::QueryState*, 
> impala::TPlanFragment const&, impala::PlanFragmentCtxPB const&)+0xf9
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> {code}
> From the collected profiles, there was no fragment with an instance number of 
> 0 in the corresponding query plan, so the coordinator should not have sent 
> fragments to an executor with 0 instances.  Executor log files showed that 
> there were lots of KRPC errors around the time the invalid 
> TExecPlanFragmentInfo was received. It seems KRPC messages were truncated due 
> to KRPC failures, but the truncation might not cause a Thrift deserialization 
> error. The invalid TExecPlanFragmentInfo caused the Impala daemon to crash 
> with the following stack trace when the query was started on the executor.
> {code:java}
> #0  SubstituteArg (value=..., this=0x7f86cec79d30) at 
> ../gutil/strings/substitute.h:79
> #1  impala::FragmentState::FragmentState (this=0x35c78f40, 
> query_state=0x7972db00, fragment=..., 
> fragment_ctx= 0x35c78f88>) at fragment-state.cc:143
> #2  0x013019aa in impala::FragmentState::CreateFragmentStateMap 
> (fragment_info=..., exec_request=..., 
> state=state@entry=0x7972db00, fragment_map=...) at fragment-state.cc:47
> #3  0x01292d71 in impala::QueryState::StartFInstances 
> (this=this@entry=0x7972db00) at query-state.cc:820
> #4  0x01284810 in impala::QueryExecMgr::ExecuteQueryHelper 
> (this=0x11943b00, qs=0x7972db00)
> at query-exec-mgr.cc:162
> #5  0x01752915 in operator() (this=0x7f86cec7ab40)
> at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #6  impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) (name=..., category=..., 
> functor=..., 
> parent_thread_info=, thread_started=0x7f87b7b9acb0) at 
> thread.cc:360
> #7  0x01753c9b in operator() std::__cxx11::basic_string&, const std::__cxx11::basic_string&, 
> boost::function, const impala::ThreadDebugInfo*, impala::Promise int>*), boost::_bi::list0> (
> a=, f=@0x1f66f3b8: , 
> this=0x1f66f3c0)
> at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
> #8  operator() (this=0x1f66f3b8)
> at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #9  boost::detail::thread_data (*)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), 
> boost::_bi::lis

[jira] [Comment Edited] (IMPALA-12467) Semantic Search In Impala

2024-09-11 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881084#comment-17881084
 ] 

Michael Smith edited comment on IMPALA-12467 at 9/11/24 5:48 PM:
-

There's a lot of potential breadth here:
* cosine distance is based on a vector, and implies the selection of an 
embedding model to turn text into a vector. These tend to include some semantic 
relationship between words, and would be necessary to satisfy something like 
"Europe" returning countries or regions in Europe. Handling the embedding 
calculation as part of the function API would be a lot more useful, but works 
best with access to GPUs.
* Jaro Winkler, Levenshtein, Hamming all operate directly on strings and don't 
require a separate representation. They also don't encode meaning, so primarily 
useful for handling typos or different forms of the same word (quick vs 
quickly).

I would address these as at least 3 separate problems:
* string similarity functions (like Levenshtein)
* vector distance functions (which can be used on vector embeddings)
* functions to generate vector embeddings from a string using a configured 
embedding model (and necessary infrastructure)


was (Author: JIRAUSER288956):
There's a lot of potential breadth here:
* cosine distance is based on a vector, and implies the selection of an 
embedding model to turn text into a vector. These tend to include some semantic 
relationship between words, and would be necessary to satisfy something like 
"Europe" returning countries or regions in Europe. Handling the embedding 
calculation as part of the function API would be a lot more useful, but works 
best with access to GPUs.
* Jaro Winkler, Levenshtein, Hamming all operate directly on strings and don't 
require a separate representation. They also don't encode meaning, so primarily 
useful for handling typos or different forms of the same word (quick vs 
quickly).

I would address those are at least 3 separate problems:
* string similarity functions (like Levenshtein)
* vector distance functions (which can be used on vector embeddings)
* functions to generate vector embeddings from a string using a configured 
embedding model (and necessary infrastructure)

> Semantic Search In Impala
> -
>
> Key: IMPALA-12467
> URL: https://issues.apache.org/jira/browse/IMPALA-12467
> Project: IMPALA
>  Issue Type: Wish
>Reporter: Sreenath
>Priority: Major
>
> _Semantic search is the tech that powers *vector databases*, and we can have 
> the same power in Impala._
> Semantic search is a way for computers to understand the meaning behind words 
> and phrases when you're searching for something. Instead of just looking for 
> exact matches of keywords, it tries to figure out what you're really asking 
> and provides results that are more relevant and meaningful to your question. 
> It's like having a search engine that can understand what you mean, not just 
> what you say, making it easier to find the information you're looking for. 
> This ticket is a wish to have semantic search in Impala.
> On the implementation side, semantic search uses an embedding model and any 
> of the similarity distance functions.
> My proposal is to implement functions for on-the-fly calculation of 
> similarity distance between two values. Once we have them we could easily do 
> semantic search as part of a where clause.
>  * Eg (using a cosine similarity function): “WHERE cos_dist(region, 'europe') 
> > 0.9“. And it could return records with regions like Scandinavia, Nordic, 
> Baltic etc…
>  * We could have functions that accept values as text or as vector 
> embeddings.
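
To make the distinction in the comment above concrete, here is a minimal sketch of the two function families it separates: a string similarity function (Levenshtein) and a vector distance function (cosine distance). The names `levenshtein` and `cosine_distance` are illustrative and are not existing Impala built-ins. Generating embeddings from text is out of scope, so the vectors are assumed to come from some embedding model.

```python
import math


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm
```

`levenshtein("quick", "quickly")` is small because the strings differ by a suffix, which is the typo and word-form case. Matching "Europe" to "Nordic" needs `cosine_distance` over embeddings, since the strings themselves share almost nothing.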




[jira] [Updated] (IMPALA-13373) Analytic query with struct from view fails

2024-09-11 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13373:
---
Priority: Critical  (was: Major)

> Analytic query with struct from view fails
> --
>
> Key: IMPALA-13373
> URL: https://issues.apache.org/jira/browse/IMPALA-13373
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Critical
>
> Analytic query with struct from view fails, while the same query from the 
> table behind the view succeeds.
> This is how to reproduce it:
>  
> {code:java}
> CREATE TABLE tbl (    
>   s STRUCT  
> ) STORED AS PARQUET;
> CREATE VIEW view_tbl AS SELECT s FROM tbl;
> SELECT  MAX(view_tbl.s.type) OVER (PARTITION BY view_tbl.s.id) AS type
> FROM view_tbl;
> {code}
>  




[jira] [Commented] (IMPALA-13322) Cannot Alter Table sys.impala_query_live

2024-09-10 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880846#comment-17880846
 ] 

Michael Smith commented on IMPALA-13322:


I also tried clearColumns() at the start of that load.

The problem I ran into with any changes was: after the alter, the new columns 
wouldn't show up until I ran {{invalidate metadata}}. It sounds like that 
should work if I got {{load}} right, so I'll take a closer look at what I might 
have been missing there.

> Cannot Alter Table sys.impala_query_live
> 
>
> Key: IMPALA-13322
> URL: https://issues.apache.org/jira/browse/IMPALA-13322
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Jason Fehr
>Priority: Critical
>  Labels: workload-management
>
> When running "alter table add column if not exists" ddls on 
> sys.impala_query_live, Impala returns "IllegalStateException: null".  No 
> stack trace can be found in any log files.
> Even though that error was returned, the column was still added, it just took 
> about 10 seconds to show up in the table.
> Once this issue is fixed, the disparate logic to handle upgrades on 
> sys.impala_query_log and sys.impala_query_live can be unified.




[jira] [Commented] (IMPALA-12626) Include List of Referenced Tables/Views in Query Log Table

2024-09-10 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880844#comment-17880844
 ] 

Michael Smith commented on IMPALA-12626:


Oops, I missed that somehow. We can revert the profile change, and should 
revisit that we didn't duplicate effort to generate the query log column.

> Include List of Referenced Tables/Views in Query Log Table
> --
>
> Key: IMPALA-12626
> URL: https://issues.apache.org/jira/browse/IMPALA-12626
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Jason Fehr
>Assignee: Michael Smith
>Priority: Major
>  Labels: workload-management
> Fix For: Impala 4.4.0
>
>
> In the Impala query log table where completed queries are stored, add a list 
> of all tables and views that were referenced in the query.  The purpose 
> behind this functionality is for end users to be able to analyze how often 
> each table is used in queries.




[jira] [Comment Edited] (IMPALA-12626) Include List of Referenced Tables/Views in Query Log Table

2024-09-10 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880844#comment-17880844
 ] 

Michael Smith edited comment on IMPALA-12626 at 9/11/24 3:51 AM:
-

Oops, I missed that somehow. We could consider reverting the profile change, 
and should revisit that we didn't duplicate effort to generate the query log 
column.


was (Author: JIRAUSER288956):
Oops, I missed that somehow. We can revert the profile change, and should 
revisit that we didn't duplicate effort to generate the query log column.

> Include List of Referenced Tables/Views in Query Log Table
> --
>
> Key: IMPALA-12626
> URL: https://issues.apache.org/jira/browse/IMPALA-12626
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Jason Fehr
>Assignee: Michael Smith
>Priority: Major
>  Labels: workload-management
> Fix For: Impala 4.4.0
>
>
> In the Impala query log table where completed queries are stored, add a list 
> of all tables and views that were referenced in the query.  The purpose 
> behind this functionality is for end users to be able to analyze how often 
> each table is used in queries.




[jira] [Assigned] (IMPALA-13318) test_local_catalog_no_event_processing failing for impala asf asan

2024-09-10 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13318:
--

Assignee: Csaba Ringhofer

> test_local_catalog_no_event_processing failing for impala asf asan
> --
>
> Key: IMPALA-13318
> URL: https://issues.apache.org/jira/browse/IMPALA-13318
> Project: IMPALA
>  Issue Type: Bug
>Reporter: gaurav singh
>Assignee: Csaba Ringhofer
>Priority: Critical
>
> Stack
> {code:java}
> custom_cluster/test_partition.py:120: in 
> test_local_catalog_no_event_processing 
> self._test_partition_deletion(unique_database) 
> custom_cluster/test_partition.py:162: in _test_partition_deletion 
> self.assert_catalogd_log_contains("INFO", deletion_log_regex.format(tbl, i)) 
> common/impala_test_suite.py:1296: in assert_catalogd_log_contains "catalogd", 
> level, line_regex, expected_count, timeout_s, dry_run) 
> common/impala_test_suite.py:1341: in assert_log_contains (expected_count, 
> log_file_path, line_regex, found, line){code}




[jira] [Commented] (IMPALA-13322) Cannot Alter Table sys.impala_query_live

2024-09-10 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880767#comment-17880767
 ] 

Michael Smith commented on IMPALA-13322:


[~stigahuang] is there a simple way to update the SystemTable.load 
implementation to fix this? Or is handling HMS loads more complicated?

> Cannot Alter Table sys.impala_query_live
> 
>
> Key: IMPALA-13322
> URL: https://issues.apache.org/jira/browse/IMPALA-13322
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Jason Fehr
>Priority: Critical
>  Labels: workload-management
>
> When running "alter table add column if not exists" ddls on 
> sys.impala_query_live, Impala returns "IllegalStateException: null".  No 
> stack trace can be found in any log files.
> Even though that error was returned, the column was still added, it just took 
> about 10 seconds to show up in the table.
> Once this issue is fixed, the disparate logic to handle upgrades on 
> sys.impala_query_log and sys.impala_query_live can be unified.




[jira] [Commented] (IMPALA-13322) Cannot Alter Table sys.impala_query_live

2024-09-10 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880765#comment-17880765
 ] 

Michael Smith commented on IMPALA-13322:


This fails on 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/SystemTable.java#L104.
 If I remove that precondition, I can alter the table, and the new column is 
recognized by subsequent alters:
{code}
> alter table sys.impala_query_live add if not exists columns(foo string);
Query: alter table sys.impala_query_live add if not exists columns(foo string)
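+---------------------------------------------+
| summary                                     |
+---------------------------------------------+
| New column(s) have been added to the table. |
+---------------------------------------------+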
Fetched 1 row(s) in 0.12s
> alter table sys.impala_query_live add if not exists columns(foo string);
Query: alter table sys.impala_query_live add if not exists columns(foo string)
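+------------------------------------------------+
| summary                                        |
+------------------------------------------------+
| No new column(s) have been added to the table. |
+------------------------------------------------+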
Fetched 1 row(s) in 0.11s
{code}
However, it's not visible in {{describe}}, and the column can't be dropped 
until I invalidate metadata:
{code}
> describe sys.impala_query_live;
Query: describe sys.impala_query_live
+----------------------------------+---------------+---------+
| name                             | type          | comment |
+----------------------------------+---------------+---------+
| cluster_id                       | string        |         |
| query_id                         | string        |         |
| session_id                       | string        |         |
| session_type                     | string        |         |
| hiveserver2_protocol_version     | string        |         |
| db_user                          | string        |         |
| db_user_connection               | string        |         |
| db_name                          | string        |         |
| impala_coordinator               | string        |         |
| query_status                     | string        |         |
| query_state                      | string        |         |
| impala_query_end_state           | string        |         |
| query_type                       | string        |         |
| network_address                  | string        |         |
| start_time_utc                   | timestamp     |         |
| total_time_ms                    | decimal(18,3) |         |
| query_opts_config                | string        |         |
| resource_pool                    | string        |         |
| per_host_mem_estimate            | bigint        |         |
| dedicated_coord_mem_estimate     | bigint        |         |
| per_host_fragment_instances      | string        |         |
| backends_count                   | int           |         |
| admission_result                 | string        |         |
| cluster_memory_admitted          | bigint        |         |
| executor_group                   | string        |         |
| executor_groups                  | string        |         |
| exec_summary                     | string        |         |
| num_rows_fetched                 | bigint        |         |
| row_materialization_rows_per_sec | bigint        |         |
| row_materialization_time_ms      | decimal(18,3) |         |
| compressed_bytes_spilled         | bigint        |         |
| event_planning_finished          | decimal(18,3) |         |
| event_submit_for_admission       | decimal(18,3) |         |
| event_completed_admission        | decimal(18,3) |         |
| event_all_backends_started       | decimal(18,3) |         |
| event_rows_available             | decimal(18,3) |         |
| event_first_row_fetched          | decimal(18,3) |         |
| event_last_row_fetched           | decimal(18,3) |         |
| event_unregister_query           | decimal(18,3) |         |
| read_io_wait_total_ms            | decimal(18,3) |         |
| read_io_wait_mean_ms             | decimal(18,3) |         |
| bytes_read_cache_total           | bigint        |         |
| bytes_read_total                 | bigint        |         |
| pernode_peak_mem_min             | bigint        |         |
| pernode_peak_mem_max             | bigint        |         |
| pernode_peak_mem_mean            | bigint        |         |
| sql                              | string        |         |
| plan                             | string        |         |
| tables_queried                   | string        |         |
+----------------------------------+---------------+---------+
Fetched 49 row(s) in 0.04s
> alter table sys.impala_query_live drop foo;
Query: alter table sys.impala_query_live drop

[jira] [Resolved] (IMPALA-13347) TSAN failure in backend tests after IMPALA-12737

2024-09-06 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13347.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> TSAN failure in backend tests after IMPALA-12737
> 
>
> Key: IMPALA-13347
> URL: https://issues.apache.org/jira/browse/IMPALA-13347
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Michael Smith
>Assignee: Jason Fehr
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Impala backend tests (expr-test, session-expiry-test, internal-server-test) 
> started detecting a thread leak in TSAN builds after IMPALA-12737 "Refactor 
> the Workload Management Initialization Process" was merged. Example:
> {code:java}
>  WARNING: ThreadSanitizer: thread leak (pid=475)
>   Thread T566 (tid=1135, finished) created by main thread at:
> #0 pthread_create  (unifiedbetests+0x23d7083)
> #1 boost::thread::start_thread_noexcept()  
> (unifiedbetests+0x521607d)
> #2 boost::thread::thread std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> std::__cxx11::basic_string, std::allocator 
> >, std::__cxx11::basic_string, 
> std::allocator >, boost::function, impala::ThreadDebugInfo*, 
> impala::Promise*>(void 
> (*)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), std::__cxx11::basic_string std::char_traits, std::allocator >, 
> std::__cxx11::basic_string, std::allocator 
> >, boost::function, impala::ThreadDebugInfo*, impala::Promise (impala::PromiseMode)0>*) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/thread/detail/thread.hpp:424:13
>  (unifiedbetests+0x4f8b214)
> #3 impala::Thread::StartThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function const&, std::unique_ptr std::default_delete >*, bool) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/util/thread.cc:317:13
>  (unifiedbetests+0x4f8761c)
> #4 impala::Status impala::Thread::Create boost::_mfi::mf0, 
> boost::_bi::list1 > > 
> >(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> boost::_bi::bind_t, 
> boost::_bi::list1 > > const&, 
> std::unique_ptr >*, bool) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/util/thread.h:74:12
>  (unifiedbetests+0x4c695f2)
> #5 impala::ImpalaServer::Start(int, int, int, int) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/service/impala-server.cc:3220:5
>  (unifiedbetests+0x4c63e4a)
> #6 impala::InProcessImpalaServer::StartWithClientServers(int, int, int) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/testutil/in-process-servers.cc:97:3
>  (unifiedbetests+0x501bd3b)
> #7 
> impala::InProcessImpalaServer::StartWithEphemeralPorts(std::__cxx11::basic_string  std::char_traits, std::allocator > const&, int, 
> impala::InProcessImpalaServer**) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/testutil/in-process-servers.cc:69:21
>  (unifiedbetests+0x501bc22)
> #8 impala::ExprTest::SetUpTestCase() 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/exprs/expr-test.cc:235:5
>  (unifiedbetests+0x26d42f9)
> #9 void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::TestSuite*, void (testing::TestSuite::*)(), char const*) 
>  (unifiedbetests+0x6876a5c)
> #10 main 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/service/unified-betest-main.cc:48:10
>  (unifiedbetests+0x24363e0)
> SUMMARY: ThreadSanitizer: thread leak 
> (/data0/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/build/debug/service/unifiedbetests+0x23d7083)
>  in __interceptor_pthread_create{code}




[jira] [Created] (IMPALA-13365) markConjunctAssigned called on conjuncts that aren't registered

2024-09-05 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13365:
--

 Summary: markConjunctAssigned called on conjuncts that aren't 
registered
 Key: IMPALA-13365
 URL: https://issues.apache.org/jira/browse/IMPALA-13365
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Michael Smith
Assignee: Michael Smith


While looking at IMPALA-13302, I've found several cases where conjuncts are 
passed to markConjunctAssigned without first being registered with 
registerConjunct. There seem to be several cases:
 * conjuncts that are never registered and ExprId is {{null}}; mostly seem to 
pop up in AnalyticPlanner.createSingleNodePlan -> addConjunctsToNode
 * conjuncts that were registered and have an ExprId, but then we clone them 
and somehow they get a different SlotRef ID for part of the expression (seen 
for functions like {{row_number()}}), so if we try to verify the conjunct was 
registered they fail both {{==}} and {{equals}}.

I identified these by modifying the Precondition added in IMPALA-13302 and 
running PlannerTest FE tests.
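
The two failure modes listed above can be illustrated with a toy model. This is only a sketch in Python, not Impala's Java code: `ConjunctRegistry`, its method names, and the dict-based conjuncts are hypothetical stand-ins, and the check shown is just one plausible shape for the modified Precondition.

```python
import copy


class ConjunctRegistry:
    """Toy registry (hypothetical); not Impala's Analyzer."""

    def __init__(self):
        self._next_id = 0
        self._registered = {}   # ExprId -> conjunct as it looked at registration
        self.assigned = set()

    def register_conjunct(self, conjunct):
        conjunct["expr_id"] = self._next_id
        self._registered[self._next_id] = copy.deepcopy(conjunct)
        self._next_id += 1

    def mark_conjunct_assigned(self, conjunct):
        expr_id = conjunct.get("expr_id")
        if expr_id is None:
            # Case 1: never registered, so there is no ExprId at all.
            raise ValueError("conjunct has no ExprId; it was never registered")
        if self._registered.get(expr_id) != conjunct:
            # Case 2: a clone carries the ExprId but no longer compares equal.
            raise ValueError("conjunct differs from the one registered under this ExprId")
        self.assigned.add(expr_id)


registry = ConjunctRegistry()
original = {"sql": "row_number() = 1", "slot_id": 7}
registry.register_conjunct(original)
registry.mark_conjunct_assigned(original)        # accepted

clone = dict(original, slot_id=8)                # same ExprId, different slot
unregistered = {"sql": "t1.id = 1", "slot_id": 3}


def is_rejected(conjunct):
    """Return True if the registry refuses to mark the conjunct assigned."""
    try:
        registry.mark_conjunct_assigned(conjunct)
    except ValueError:
        return True
    return False


print(is_rejected(unregistered), is_rejected(clone))
```

Both the unregistered conjunct and the clone are rejected in this model, which mirrors the two groups of cases described in the issue.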




[jira] [Resolved] (IMPALA-13344) Impala rewrites may be incomplete

2024-09-04 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13344.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Impala rewrites may be incomplete
> -
>
> Key: IMPALA-13344
> URL: https://issues.apache.org/jira/browse/IMPALA-13344
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Impala's ExprRewriteRules may not be completely applied for complex 
> expressions. Some implementations of ExprRewriteRule don't analyze new Expr 
> they produce, which can lead to other rewrite rules ignoring those Expr. 
> reAnalyze covers some of this by doing a 2nd analysis phase, but that only 
> works for two layers.
> We should analyze any new Expr produced by an ExprRewriteRule.
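
A toy model of the problem described above, written in Python rather than Impala's Java. Everything here is an assumption made for illustration: the `Expr` class, the two string-based rules, and the `analyze_new` flag are hypothetical and do not correspond to real ExprRewriteRule implementations. The only behavior taken from the issue is that rules skip Exprs that have not been analyzed.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    text: str
    analyzed: bool = False


def rule_double_negation(e):
    # "NOT NOT x" -> "x"; the result is a fresh, un-analyzed Expr.
    if e.analyzed and e.text.startswith("NOT NOT "):
        return Expr(e.text[len("NOT NOT "):])
    return e


def rule_true_and(e):
    # "TRUE AND x" -> "x"; like the rule above, it ignores un-analyzed input.
    if e.analyzed and e.text.startswith("TRUE AND "):
        return Expr(e.text[len("TRUE AND "):])
    return e


RULES = [rule_double_negation, rule_true_and]


def rewrite(expr, analyze_new):
    """Apply RULES until nothing changes; optionally analyze each new Expr."""
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            out = rule(expr)
            if out is not expr:
                if analyze_new:
                    out.analyzed = True
                expr, changed = out, True
    return expr


start = "NOT NOT TRUE AND x"
print(rewrite(Expr(start, analyzed=True), analyze_new=False).text)
print(rewrite(Expr(start, analyzed=True), analyze_new=True).text)
```

Without analysis of the intermediate Expr the second rule never fires and the rewrite stops at `TRUE AND x`; with it, the chain completes to `x`.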




[jira] [Comment Edited] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-09-04 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879287#comment-17879287
 ] 

Michael Smith edited comment on IMPALA-13302 at 9/4/24 8:50 PM:


Oh, I wonder why that was used in the first place. 
[https://github.com/apache/impala/commit/fd182201bcbe95c4e478ed081247fb748d37f773]
 changed that behavior, 
[https://github.com/apache/impala/commit/aed3bb8b3ef3408930d96cbb471d8e10f670b2db]
 doesn't have any explanation of the original choice.

I'm not opposed to also making that change.

My observation on why that wasn't always an issue was that it only caused 
problems if there were slots that needed to be materialized for a conjunct 
during re-analysis. Many conjuncts didn't have new slots to materialize.


was (Author: JIRAUSER288956):
Oh, I wonder why that was used in the first place. 
[https://github.com/apache/impala/commit/fd182201bcbe95c4e478ed081247fb748d37f773]
 changed that behavior, 
[https://github.com/apache/impala/commit/aed3bb8b3ef3408930d96cbb471d8e10f670b2db]
 doesn't have any explanation of the original choice.

I'm not opposed to also making that change.

> Some ExprRewriteRule results are not analyzed, leading to unmaterialized 
> slots from reAnalyze
> -
>
> Key: IMPALA-13302
> URL: https://issues.apache.org/jira/browse/IMPALA-13302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> IMPALA-12164 skipped registering conjuncts that the analyzer expects to 
> remove because an earlier conjunct evaluates to constant False. However some 
> ExprRewriteRules don't analyze the predicates they produce, which can lead to 
> those conjuncts not actually being removed until a reAnalyze phase.
> reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting 
> Expr IDs from 0. That can lead to re-using the same Expr ID and marking it as 
> assigned. Then when a new Expr gets the same ID, it will skip materializing 
> slots, which can cause problems later (like if that Expr is part of a hash 
> join).
> Some example queries:
> {code}
> WITH v AS (SELECT 1 FROM functional.alltypestiny t1
>   JOIN functional.alltypestiny t2 ON t1.id = t2.id)
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
> (already fixed via IMPALA-13203) and
> {code}
> WITH v as (SELECT 1 FROM functional.alltypes t1
>   JOIN functional.alltypes t2 ON t1.id = t2.id)
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
> UNION ALL
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND false
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
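
The ID collision described above can be sketched with a small Python model. It is an illustration under stated assumptions, not Impala's implementation: `ToyAnalyzer` is hypothetical, and the sketch assumes the set of assigned IDs outlives the analyzer whose counter produced them.

```python
import itertools


class ToyAnalyzer:
    """Hypothetical stand-in for an analyzer with its own global state."""

    def __init__(self):
        self._ids = itertools.count()   # every new analyzer restarts at 0

    def register(self, conjunct):
        return next(self._ids)


assigned_ids = set()   # assumption: bookkeeping that survives re-analysis

first = ToyAnalyzer()
id_a = first.register("t1.id = 1 AND false")
assigned_ids.add(id_a)                  # the conjunct is marked as assigned

second = ToyAnalyzer()                  # models reAnalyze using a new Analyzer
id_b = second.register("t1.id = t2.id")

# The unrelated join conjunct now looks "already assigned", so in this model
# its slots would be skipped during materialization.
skipped = id_b in assigned_ids
print(id_a, id_b, skipped)
```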




[jira] [Resolved] (IMPALA-12165) Add support for compiling C++ code with -gsplit-dwarf

2024-09-04 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12165.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Add support for compiling C++ code with -gsplit-dwarf
> -
>
> Key: IMPALA-12165
> URL: https://issues.apache.org/jira/browse/IMPALA-12165
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The split DWARF option for compiling puts the debug info into a separate .dwo 
> file for each .o file. This dramatically reduces the amount of debug info 
> that goes into the binary itself, because multiple binaries can all point to 
> the .dwo files. This can dramatically reduce the disk space and link time, 
> which can be very useful for the development iteration cycle.
> Here are some numbers:
> {noformat}
> Debug build without -gsplit-dwarf:
> Disk space (after make be-test):
> $ du -s be/build
> 33018616        be/build
> Link time:
> touch be/build/debug/runtime/libRuntime.a
> time make -j8 be-test
> real    0m49.091s
> user    4m28.908s
> sys     0m40.422s
> Debug build with -gsplit-dwarf:
> Disk space (after make be-test):
> $ du -s be/build
> 17052144        be/build
> Link time:
> touch be/build/debug/runtime/libRuntime.a
> time make -j8 be-test
> real    0m16.798s
> user    1m12.010s
> sys     0m20.397s
> {noformat}
> Disk space is about 1/2. Link time is about 1/3. This seems like it could be a 
> good improvement to the developer iteration cycle.
> This could be provided as an option alongside IMPALA_COMPRESSED_DEBUG_INFO / 
> IMPALA_MINIMAL_DEBUG_INFO.
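
As a quick check of the "about 1/2" and "about 1/3" figures, the ratios can be computed from the numbers quoted in the description. The variable names are illustrative; the values are copied from the {noformat} block.

```python
# Values copied from the description above.
disk_plain, disk_split = 33018616, 17052144    # `du -s be/build` output
link_plain_s, link_split_s = 49.091, 16.798    # "real" time of `make -j8 be-test`

disk_ratio = disk_split / disk_plain
link_ratio = link_split_s / link_plain_s

print(f"disk: {disk_ratio:.2f}, link: {link_ratio:.2f}")
```

This works out to roughly 0.52 for disk usage and 0.34 for link time, consistent with the summary in the description.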




[jira] [Assigned] (IMPALA-10418) buffered-tuple-stream-test SegFault

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-10418:
--

Assignee: Yida Wu

> buffered-tuple-stream-test SegFault
> ---
>
> Key: IMPALA-10418
> URL: https://issues.apache.org/jira/browse/IMPALA-10418
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yida Wu
>Priority: Critical
>
> In impala-asf-master-core-tsan job, buffered-tuple-stream-test failed with 
> SegFault:
> 21:09:21 Start  42: buffered-tuple-stream-test
> 21:10:03  42/125 Test  #42: buffered-tuple-stream-test ...***Exception: 
> SegFault 41.54 sec
> 21:41:08 The following tests FAILED:
> 21:41:08   42 - buffered-tuple-stream-test (SEGFAULT)
> 21:41:08 Errors while running CTest
> 21:41:08 Generated: 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/extra_junit_xml_logs/generate_junitxml.build.ab1561e9f9ff167819e634166aee7023.20210102_02_41_08.xml
> 21:41:08 make: *** [test] Error 8
> 21:41:08 ERROR in 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/bin/run-backend-tests.sh
>  at line 43: "${MAKE_CMD:-make}" test ARGS="${BE_TEST_ARGS}"
> 21:41:08 Generated: 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-backend-tests.20210102_02_41_08.xml




[jira] [Commented] (IMPALA-13350) Workload Management flush on interval test failed

2024-09-03 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879044#comment-17879044
 ] 

Michael Smith commented on IMPALA-13350:


Might be related to IMPALA-13347, linking it until proven otherwise.

> Workload Management flush on interval test failed
> -
>
> Key: IMPALA-13350
> URL: https://issues.apache.org/jira/browse/IMPALA-13350
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Assignee: Jason Fehr
>Priority: Critical
> Attachments: failure.tgz
>
>
> After IMPALA-12737 "Refactor the Workload Management Initialization Process", 
> custom_cluster/test_query_log.py::TestQueryLogTableHS2::test_flush_on_interval
>  failed with
> {code:java}
> -- 2024-09-03 11:50:35,147 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad=--shutdown_grace_period_s=0 
> --shutdown_deadline_s=15' '--impalad_args=--enable_workload_mgmt 
> --query_log_write_interval_s=15 ' '--state_store_args=None ' 
> '--catalogd_args=--enable_workload_mgmt ' 
> --impalad_args=--default_query_options=
> 11:50:35 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 11:50:35 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 11:50:35 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:38 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f9704d10>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:40 MainThread: Debug webpage did not become available in expected time.
> 11:50:40 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:41 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:41 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:41 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f97037d0>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:43 MainThread: Debug webpage did not become available in expected time.
> 11:50:43 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:44 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:44 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:44 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25001
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25002
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:46 MainThread: Impala Cluster Running with 3 nodes (3

[jira] [Updated] (IMPALA-13350) Workload Management flush on interval test failed

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13350:
---
Priority: Critical  (was: Major)

> Workload Management flush on interval test failed
> -
>
> Key: IMPALA-13350
> URL: https://issues.apache.org/jira/browse/IMPALA-13350
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Priority: Critical
> Attachments: failure.tgz
>
>
> After IMPALA-12737 "Refactor the Workload Management Initialization Process", 
> custom_cluster/test_query_log.py::TestQueryLogTableHS2::test_flush_on_interval
>  failed with
> {code:java}
> -- 2024-09-03 11:50:35,147 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad=--shutdown_grace_period_s=0 
> --shutdown_deadline_s=15' '--impalad_args=--enable_workload_mgmt 
> --query_log_write_interval_s=15 ' '--state_store_args=None ' 
> '--catalogd_args=--enable_workload_mgmt ' 
> --impalad_args=--default_query_options=
> 11:50:35 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 11:50:35 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 11:50:35 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:38 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f9704d10>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:40 MainThread: Debug webpage did not become available in expected time.
> 11:50:40 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:41 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:41 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:41 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f97037d0>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:43 MainThread: Debug webpage did not become available in expected time.
> 11:50:43 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:44 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:44 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:44 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25001
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25002
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:46 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2024-09-03 11:50:46,553 DEBUG MainThread: Found 3 impalad/1 
> statestored/1 catalogd

[jira] [Assigned] (IMPALA-13350) Workload Management flush on interval test failed

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13350:
--

Assignee: Jason Fehr

> Workload Management flush on interval test failed
> -
>
> Key: IMPALA-13350
> URL: https://issues.apache.org/jira/browse/IMPALA-13350
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Assignee: Jason Fehr
>Priority: Critical
> Attachments: failure.tgz
>
>
> After IMPALA-12737 "Refactor the Workload Management Initialization Process", 
> custom_cluster/test_query_log.py::TestQueryLogTableHS2::test_flush_on_interval
>  failed with
> {code:java}
> -- 2024-09-03 11:50:35,147 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad=--shutdown_grace_period_s=0 
> --shutdown_deadline_s=15' '--impalad_args=--enable_workload_mgmt 
> --query_log_write_interval_s=15 ' '--state_store_args=None ' 
> '--catalogd_args=--enable_workload_mgmt ' 
> --impalad_args=--default_query_options=
> 11:50:35 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 11:50:35 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 11:50:35 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:38 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f9704d10>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:40 MainThread: Debug webpage did not become available in expected time.
> 11:50:40 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:41 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:41 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:41 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f97037d0>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:43 MainThread: Debug webpage did not become available in expected time.
> 11:50:43 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:44 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:44 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:44 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25001
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25002
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:46 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2024-09-03 11:50:46,553 DEBUG MainThread: Found 3 impala

[jira] [Updated] (IMPALA-13350) Workload Management flush on interval test failed

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13350:
---
Attachment: failure.tgz

> Workload Management flush on interval test failed
> -
>
> Key: IMPALA-13350
> URL: https://issues.apache.org/jira/browse/IMPALA-13350
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Priority: Major
> Attachments: failure.tgz
>
>
> After IMPALA-12737 "Refactor the Workload Management Initialization Process", 
> custom_cluster/test_query_log.py::TestQueryLogTableHS2::test_flush_on_interval
>  failed with
> {code:java}
> -- 2024-09-03 11:50:35,147 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad=--shutdown_grace_period_s=0 
> --shutdown_deadline_s=15' '--impalad_args=--enable_workload_mgmt 
> --query_log_write_interval_s=15 ' '--state_store_args=None ' 
> '--catalogd_args=--enable_workload_mgmt ' 
> --impalad_args=--default_query_options=
> 11:50:35 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 11:50:35 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 11:50:35 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 11:50:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:38 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:38 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f9704d10>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:40 MainThread: Debug webpage did not become available in expected time.
> 11:50:40 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:41 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:41 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:41 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError(' 0x7f99f97037d0>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 11:50:43 MainThread: Debug webpage did not become available in expected time.
> 11:50:43 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:50:44 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:44 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
> 11:50:44 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25001
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:50:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25002
> 11:50:45 MainThread: num_known_live_backends has reached value: 3
> 11:50:46 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2024-09-03 11:50:46,553 DEBUG MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)

[jira] [Created] (IMPALA-13350) Workload Management flush on interval test failed

2024-09-03 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13350:
--

 Summary: Workload Management flush on interval test failed
 Key: IMPALA-13350
 URL: https://issues.apache.org/jira/browse/IMPALA-13350
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Michael Smith
 Attachments: failure.tgz

After IMPALA-12737 "Refactor the Workload Management Initialization Process", 
custom_cluster/test_query_log.py::TestQueryLogTableHS2::test_flush_on_interval 
failed with
{code:java}
-- 2024-09-03 11:50:35,147 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
--log_dir=/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests
 --log_level=1 '--impalad=--shutdown_grace_period_s=0 --shutdown_deadline_s=15' 
'--impalad_args=--enable_workload_mgmt --query_log_write_interval_s=15 ' 
'--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' 
--impalad_args=--default_query_options=
11:50:35 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
11:50:35 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/statestored.INFO
11:50:35 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
11:50:35 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad.INFO
11:50:35 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
11:50:35 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:50:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:50:38 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
11:50:38 MainThread: Debug webpage not yet available: 
HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
 port=25000): Max retries exceeded with url: /backends?json (Caused by 
NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection 
refused',))
11:50:40 MainThread: Debug webpage did not become available in expected time.
11:50:40 MainThread: Waiting for num_known_live_backends=3. Current value: None
11:50:41 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:50:41 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
11:50:41 MainThread: Debug webpage not yet available: 
HTTPConnectionPool(host='impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com',
 port=25000): Max retries exceeded with url: /backends?json (Caused by 
NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection 
refused',))
11:50:43 MainThread: Debug webpage did not become available in expected time.
11:50:43 MainThread: Waiting for num_known_live_backends=3. Current value: None
11:50:44 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:50:44 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25000
11:50:44 MainThread: num_known_live_backends has reached value: 3
11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:50:45 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25001
11:50:45 MainThread: num_known_live_backends has reached value: 3
11:50:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:50:45 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25002
11:50:45 MainThread: num_known_live_backends has reached value: 3
11:50:46 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
executors).
-- 2024-09-03 11:50:46,553 DEBUG MainThread: Found 3 impalad/1 statestored/1 
catalogd process(es)
-- 2024-09-03 11:50:46,553 INFO MainThread: Getting metric: 
statestore.live-backends from 
impala-ec2-centos79-m6i-4xlarge-xldisk-0f51.vpc.cloudera.com:25010
-- 2024-09-03 11:50:46,556 INFO MainThread: Metric 
'statestore.live-backends' has reached desired value: 4
-- 2024-09-03 11:50:46,556 DEBUG MainThread: Getting num_known_live_backends 
from impala-ec2-centos79-m6i-4xlarge-xld

[jira] [Assigned] (IMPALA-13348) statestore.active-status timeout

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13348:
--

Assignee: Wenzhe Zhou

> statestore.active-status timeout
> 
>
> Key: IMPALA-13348
> URL: https://issues.apache.org/jira/browse/IMPALA-13348
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Assignee: Wenzhe Zhou
>Priority: Major
> Attachments: failure.tgz
>
>
> custom_cluster/test_statestored_ha.py::TestStatestoredHA::test_statestored_manual_failover_with_failed_rpc
>  failed a UBSAN test run with
> {code:java}
> AssertionError: Metric statestore.active-status did not reach value True in 
> 120s. {code}
> This is likely an intermittent issue; no code or test changes related to this 
> test were added since the last failure. Logs attached.




[jira] [Updated] (IMPALA-13348) statestore.active-status timeout

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13348:
---
Labels: intermittent  (was: )

> statestore.active-status timeout
> 
>
> Key: IMPALA-13348
> URL: https://issues.apache.org/jira/browse/IMPALA-13348
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: intermittent
> Attachments: failure.tgz
>
>
> custom_cluster/test_statestored_ha.py::TestStatestoredHA::test_statestored_manual_failover_with_failed_rpc
>  failed a UBSAN test run with
> {code:java}
> AssertionError: Metric statestore.active-status did not reach value True in 
> 120s. {code}
> This is likely an intermittent issue; no code or test changes related to this 
> test were added since the last failure. Logs attached.




[jira] [Updated] (IMPALA-13348) statestore.active-status timeout

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13348:
---
Attachment: failure.tgz

> statestore.active-status timeout
> 
>
> Key: IMPALA-13348
> URL: https://issues.apache.org/jira/browse/IMPALA-13348
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Assignee: Wenzhe Zhou
>Priority: Major
> Attachments: failure.tgz
>
>
> custom_cluster/test_statestored_ha.py::TestStatestoredHA::test_statestored_manual_failover_with_failed_rpc
>  failed a UBSAN test run with
> {code:java}
> AssertionError: Metric statestore.active-status did not reach value True in 
> 120s. {code}
> This is likely an intermittent issue; no code or test changes related to this 
> test were added since the last failure. Logs attached.




[jira] [Created] (IMPALA-13348) statestore.active-status timeout

2024-09-03 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13348:
--

 Summary: statestore.active-status timeout
 Key: IMPALA-13348
 URL: https://issues.apache.org/jira/browse/IMPALA-13348
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Michael Smith


custom_cluster/test_statestored_ha.py::TestStatestoredHA::test_statestored_manual_failover_with_failed_rpc
 failed a UBSAN test run with
{code:java}
AssertionError: Metric statestore.active-status did not reach value True in 
120s. {code}
This is likely an intermittent issue; no code or test changes related to this 
test were added since the last failure. Logs attached.




[jira] [Assigned] (IMPALA-13347) TSAN failure in backend tests after IMPALA-12737

2024-09-03 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13347:
--

Assignee: Jason Fehr

> TSAN failure in backend tests after IMPALA-12737
> 
>
> Key: IMPALA-13347
> URL: https://issues.apache.org/jira/browse/IMPALA-13347
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Michael Smith
>Assignee: Jason Fehr
>Priority: Critical
>
> Impala backend tests (expr-test, session-expiry-test, internal-server-test) 
> started detecting a thread leak in TSAN builds after IMPALA-12737 "Refactor 
> the Workload Management Initialization Process" was merged. Example:
> {code:java}
>  WARNING: ThreadSanitizer: thread leak (pid=475)
>   Thread T566 (tid=1135, finished) created by main thread at:
> #0 pthread_create  (unifiedbetests+0x23d7083)
> #1 boost::thread::start_thread_noexcept()  
> (unifiedbetests+0x521607d)
> #2 boost::thread::thread std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> std::__cxx11::basic_string, std::allocator 
> >, std::__cxx11::basic_string, 
> std::allocator >, boost::function, impala::ThreadDebugInfo*, 
> impala::Promise*>(void 
> (*)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), std::__cxx11::basic_string std::char_traits, std::allocator >, 
> std::__cxx11::basic_string, std::allocator 
> >, boost::function, impala::ThreadDebugInfo*, impala::Promise (impala::PromiseMode)0>*) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/thread/detail/thread.hpp:424:13
>  (unifiedbetests+0x4f8b214)
> #3 impala::Thread::StartThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function const&, std::unique_ptr std::default_delete >*, bool) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/util/thread.cc:317:13
>  (unifiedbetests+0x4f8761c)
> #4 impala::Status impala::Thread::Create boost::_mfi::mf0, 
> boost::_bi::list1 > > 
> >(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> boost::_bi::bind_t, 
> boost::_bi::list1 > > const&, 
> std::unique_ptr >*, bool) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/util/thread.h:74:12
>  (unifiedbetests+0x4c695f2)
> #5 impala::ImpalaServer::Start(int, int, int, int) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/service/impala-server.cc:3220:5
>  (unifiedbetests+0x4c63e4a)
> #6 impala::InProcessImpalaServer::StartWithClientServers(int, int, int) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/testutil/in-process-servers.cc:97:3
>  (unifiedbetests+0x501bd3b)
> #7 
> impala::InProcessImpalaServer::StartWithEphemeralPorts(std::__cxx11::basic_string  std::char_traits, std::allocator > const&, int, 
> impala::InProcessImpalaServer**) 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/testutil/in-process-servers.cc:69:21
>  (unifiedbetests+0x501bc22)
> #8 impala::ExprTest::SetUpTestCase() 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/exprs/expr-test.cc:235:5
>  (unifiedbetests+0x26d42f9)
> #9 void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::TestSuite*, void (testing::TestSuite::*)(), char const*) 
>  (unifiedbetests+0x6876a5c)
> #10 main 
> /data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/service/unified-betest-main.cc:48:10
>  (unifiedbetests+0x24363e0)
> SUMMARY: ThreadSanitizer: thread leak 
> (/data0/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/build/debug/service/unifiedbetests+0x23d7083)
>  in __interceptor_pthread_create{code}




[jira] [Created] (IMPALA-13347) TSAN failure in backend tests after IMPALA-12737

2024-09-03 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13347:
--

 Summary: TSAN failure in backend tests after IMPALA-12737
 Key: IMPALA-13347
 URL: https://issues.apache.org/jira/browse/IMPALA-13347
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.5.0
Reporter: Michael Smith


Impala backend tests (expr-test, session-expiry-test, internal-server-test) 
started detecting a thread leak in TSAN builds after IMPALA-12737 "Refactor the 
Workload Management Initialization Process" was merged. Example:
{code:java}
 WARNING: ThreadSanitizer: thread leak (pid=475)
  Thread T566 (tid=1135, finished) created by main thread at:
#0 pthread_create  (unifiedbetests+0x23d7083)
#1 boost::thread::start_thread_noexcept()  (unifiedbetests+0x521607d)
#2 boost::thread::thread, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*), 
std::__cxx11::basic_string, std::allocator 
>, std::__cxx11::basic_string, 
std::allocator >, boost::function, impala::ThreadDebugInfo*, 
impala::Promise*>(void 
(*)(std::__cxx11::basic_string, 
std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), std::__cxx11::basic_string, std::allocator >, 
std::__cxx11::basic_string, std::allocator 
>, boost::function, impala::ThreadDebugInfo*, impala::Promise*) 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/thread/detail/thread.hpp:424:13
 (unifiedbetests+0x4f8b214)
#3 impala::Thread::StartThread(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function const&, std::unique_ptr >*, bool) 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/util/thread.cc:317:13
 (unifiedbetests+0x4f8761c)
#4 impala::Status impala::Thread::Create, 
boost::_bi::list1 > > 
>(std::__cxx11::basic_string, std::allocator 
> const&, std::__cxx11::basic_string, 
std::allocator > const&, boost::_bi::bind_t, 
boost::_bi::list1 > > const&, 
std::unique_ptr >*, bool) 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/util/thread.h:74:12
 (unifiedbetests+0x4c695f2)
#5 impala::ImpalaServer::Start(int, int, int, int) 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/service/impala-server.cc:3220:5
 (unifiedbetests+0x4c63e4a)
#6 impala::InProcessImpalaServer::StartWithClientServers(int, int, int) 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/testutil/in-process-servers.cc:97:3
 (unifiedbetests+0x501bd3b)
#7 
impala::InProcessImpalaServer::StartWithEphemeralPorts(std::__cxx11::basic_string, std::allocator > const&, int, 
impala::InProcessImpalaServer**) 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/testutil/in-process-servers.cc:69:21
 (unifiedbetests+0x501bc22)
#8 impala::ExprTest::SetUpTestCase() 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/exprs/expr-test.cc:235:5
 (unifiedbetests+0x26d42f9)
#9 void 
testing::internal::HandleExceptionsInMethodIfSupported(testing::TestSuite*, void (testing::TestSuite::*)(), char const*)  
(unifiedbetests+0x6876a5c)
#10 main 
/data/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/src/service/unified-betest-main.cc:48:10
 (unifiedbetests+0x24363e0)

SUMMARY: ThreadSanitizer: thread leak 
(/data0/jenkins/workspace/impala-cdw-master-staging-core-tsan/repos/Impala/be/build/debug/service/unifiedbetests+0x23d7083)
 in __interceptor_pthread_create{code}




[jira] [Work started] (IMPALA-13344) Impala rewrites may be incomplete

2024-08-29 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13344 started by Michael Smith.
--
> Impala rewrites may be incomplete
> -
>
> Key: IMPALA-13344
> URL: https://issues.apache.org/jira/browse/IMPALA-13344
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Impala's ExprRewriteRules may not be completely applied for complex 
> expressions. Some implementations of ExprRewriteRule don't analyze new Expr 
> they produce, which can lead to other rewrite rules ignoring those Expr. 
> reAnalyze covers some of this by doing a 2nd analysis phase, but that only 
> works for two layers.
> We should analyze any new Expr produced by an ExprRewriteRule.




[jira] [Created] (IMPALA-13344) Impala rewrites may be incomplete

2024-08-29 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13344:
--

 Summary: Impala rewrites may be incomplete
 Key: IMPALA-13344
 URL: https://issues.apache.org/jira/browse/IMPALA-13344
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.0.0
Reporter: Michael Smith


Impala's ExprRewriteRules may not be completely applied for complex 
expressions. Some implementations of ExprRewriteRule don't analyze new Expr 
they produce, which can lead to other rewrite rules ignoring those Expr. 
reAnalyze covers some of this by doing a 2nd analysis phase, but that only 
works for two layers.

We should analyze any new Expr produced by an ExprRewriteRule.
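
The failure mode described above can be sketched with a toy model. This is not Impala's frontend code: `Expr`, `extract_rule`, `simplify_rule`, and `rewrite` are hypothetical stand-ins, and the only assumption carried over from the description is that a rule skips any expression that has not been analyzed.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    """Toy expression: just its text and whether it has been analyzed."""
    text: str
    analyzed: bool = False


def extract_rule(expr):
    # Hypothetical rule: factors out a common conjunct and returns a NEW
    # Expr, which is un-analyzed by default.
    if expr.analyzed and expr.text == "(a AND false) OR (a AND false)":
        return Expr("a AND false")
    return expr


def simplify_rule(expr):
    # Hypothetical rule: folds a constant-false conjunct, but only for
    # expressions that have been analyzed.
    if expr.analyzed and expr.text == "a AND false":
        return Expr("false", analyzed=True)
    return expr


def rewrite(expr, analyze_new):
    """Apply each rule once; optionally analyze any Expr a rule produces."""
    for rule in (extract_rule, simplify_rule):
        out = rule(expr)
        if out is not expr and analyze_new:
            out.analyzed = True
        expr = out
    return expr


start = "(a AND false) OR (a AND false)"
partial = rewrite(Expr(start, analyzed=True), analyze_new=False)
complete = rewrite(Expr(start, analyzed=True), analyze_new=True)
```

In this sketch `partial` stops at `a AND false` because the second rule ignores the un-analyzed result of the first, while `complete` reaches `false` once new expressions are analyzed as they are produced.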




[jira] [Commented] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-08-29 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877896#comment-17877896
 ] 

Michael Smith commented on IMPALA-13302:


I think this is an interaction of two bugs:
 # registerConjuncts can now call markConjunctAssigned on Expr that weren't 
first registered via registerConjunct with this Analyzer. In most cases this 
isn't a problem because markConjunctAssigned essentially ignores a null ID. 
This is primarily a problem if the Expr did get an ID assigned in the 1st 
analysis, then reAnalyze encounters a constant false expression and skips 
registering them, but still assigns them with their old ID. That can lead to 
two Expr having the same ID.
 # It's pretty easy to construct an expression that is only partially rewritten 
during the first pass, because several ExprRewriteRules don't analyze new 
conjuncts they create (leading to other rules skipping rewrites). This seems to 
be the primary way we trigger bug (1), where a rewrite rule (examples: 
ExtractCommonConjunctRule, NormalizeExprsRule) produces a constant false 
conjunct but SimplifyConditionalsRule doesn't rewrite it because it wasn't 
analyzed.

I'm going to file a separate bug for (2) as it has other implications 
(incomplete rewrites).
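
To illustrate bug (1), here is a toy model of the ID collision. It is only a sketch: `ToyAnalyzer`, `Expr`, and `demo_id_collision` are hypothetical names, not Impala classes. It assumes what is described above: a fresh analyzer restarts IDs from 0, and marking a conjunct as assigned ignores a null ID but trusts any ID the Expr already carries.

```python
class Expr:
    def __init__(self, name):
        self.name = name
        self.id = None  # set only when registered with an analyzer


class ToyAnalyzer:
    """Hypothetical analyzer: hands out IDs from 0 and tracks assigned IDs."""

    def __init__(self):
        self.next_id = 0
        self.assigned = set()

    def register(self, expr):
        expr.id = self.next_id
        self.next_id += 1

    def mark_assigned(self, expr):
        # A null ID is ignored, so never-registered Exprs are harmless.
        if expr.id is not None:
            self.assigned.add(expr.id)

    def is_assigned(self, expr):
        return expr.id in self.assigned


def demo_id_collision():
    # First analysis: the conjunct gets ID 0.
    first = ToyAnalyzer()
    stale = Expr("t1.id = 1")
    first.register(stale)

    # Re-analysis with a new analyzer: registration is skipped (a constant
    # false conjunct was seen), but the Expr is still marked assigned
    # under its old ID.
    second = ToyAnalyzer()
    second.mark_assigned(stale)

    # A different Expr registered later receives the same ID.
    fresh = Expr("t1.id = t2.id")
    second.register(fresh)
    return stale.id, fresh.id, second.is_assigned(fresh)
```

Here `fresh` was never marked assigned, yet the analyzer reports it as assigned because it shares an ID with the stale Expr. That is the condition under which slot materialization would be skipped.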

> Some ExprRewriteRule results are not analyzed, leading to unmaterialized 
> slots from reAnalyze
> -
>
> Key: IMPALA-13302
> URL: https://issues.apache.org/jira/browse/IMPALA-13302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> IMPALA-12164 skipped registering conjuncts that the analyzer expects to 
> remove because an earlier conjunct evaluates to constant False. However some 
> ExprRewriteRules don't analyze the predicates they produce, which can lead to 
> those conjuncts not actually being removed until a reAnalyze phase.
> reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting 
> Expr IDs from 0. That can lead to re-using the same Expr ID and marking it as 
> assigned. Then when a new Expr gets the same ID, it will skip materializing 
> slots, which can cause problems later (like if that Expr is part of a hash 
> join).
> Some example queries:
> {code}
> WITH v AS (SELECT 1 FROM functional.alltypestiny t1
>   JOIN functional.alltypestiny t2 ON t1.id = t2.id)
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
> (already fixed via IMPALA-13203) and
> {code}
> WITH v as (SELECT 1 FROM functional.alltypes t1
>   JOIN functional.alltypes t2 ON t1.id = t2.id)
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
> UNION ALL
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND false
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}




[jira] [Work started] (IMPALA-13185) Tuple cache keys need to incorporate runtime filter information

2024-08-26 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13185 started by Michael Smith.
--
> Tuple cache keys need to incorporate runtime filter information
> ---
>
> Key: IMPALA-13185
> URL: https://issues.apache.org/jira/browse/IMPALA-13185
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Major
>
> If a runtime filter impacts the results of a fragment, then the tuple cache 
> key needs to incorporate information about the generation of that runtime 
> filter. This needs to include information about the base tables that impact 
> the runtime filter.
> For example, suppose there is a join. The build side of the join produces a 
> runtime filter that gets delivered to the probe side of the join. The tuple 
> cache key for the probe side of the join will need to include a 
> representation of the runtime filter. If the table on the build side of the 
> join changes, the tuple cache key for the probe side needs to change due to 
> the possible difference in the runtime filter.
> This can also impact eligibility. In theory, the build side of a join could 
> be constructed from a source with a limit specified, and this can result in 
> non-determinism. Since the build of the runtime filter is not deterministic, 
> the consumer of the runtime filter is not deterministic and can't participate 
> in tuple caching.




[jira] [Commented] (IMPALA-13185) Tuple cache keys need to incorporate runtime filter information

2024-08-26 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876856#comment-17876856
 ] 

Michael Smith commented on IMPALA-13185:


A simple example of this problem from TPC-DS:
{code:java}
with ssales as (
  select sum(ss_net_paid) netpaid, ss_store_sk, i_color from store_sales, item
where ss_item_sk = i_item_sk group by ss_store_sk, i_color)
select sum(netpaid) from ssales where i_color = 'peach' group by ss_store_sk 
order by ss_store_sk;

with ssales as (
  select sum(ss_net_paid) netpaid, ss_store_sk, i_color from store_sales, item
where ss_item_sk = i_item_sk group by ss_store_sk, i_color)
select sum(netpaid) from ssales where i_color = 'saddle' group by ss_store_sk 
order by ss_store_sk; {code}
The {{where}} conjunct differs between the two calls ("peach" vs "saddle"). In 
the first call, the resulting list of {{i_item_sk}} selected by 
{{i_color=peach}} is delivered as a runtime filter and only the rows matching 
it are cached. In the 2nd query, we re-use the scan result from the 1st query, 
but the rows are wrong because they all correspond to "peach", not "saddle"; so 
the query returns 0 rows.
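
The collision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: {{cache_key}}, the plan strings and the {{runtime_filter_source}} argument are hypothetical and do not reflect how Impala actually computes tuple cache keys.

```python
import hashlib

def cache_key(scan_plan, runtime_filter_source=None):
    """Hash a scan's plan text, optionally mixing in a description of the
    build side that produces the runtime filter applied to the scan.
    (Hypothetical helper, not Impala's key computation.)"""
    h = hashlib.sha256(scan_plan.encode("utf-8"))
    if runtime_filter_source is not None:
        h.update(b"|rf:")
        h.update(runtime_filter_source.encode("utf-8"))
    return h.hexdigest()

# Both queries scan store_sales identically; only the build side differs.
scan = "SCAN store_sales [ss_net_paid, ss_store_sk, ss_item_sk]"
peach = "SCAN item WHERE i_color = 'peach'"
saddle = "SCAN item WHERE i_color = 'saddle'"

# Ignoring the runtime filter, the two scans share one cache entry.
collides = cache_key(scan) == cache_key(scan)
# Including the filter's source gives each query its own entry.
distinct = cache_key(scan, peach) != cache_key(scan, saddle)
```

With the filter's source mixed in, the "saddle" query can no longer pick up rows cached for "peach".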


[jira] [Resolved] (IMPALA-13323) Remove redundant tests in test_join_queries.py

2024-08-26 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13323.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Remove redundant tests in test_join_queries.py
> --
>
> Key: IMPALA-13323
> URL: https://issues.apache.org/jira/browse/IMPALA-13323
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Test
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Minor
> Fix For: Impala 4.5.0
>
>
> test_join_queries.py is expensive to run in exhaustive exploration because it 
> runs many test dimension permutations but never actually exercises some of the 
> dimensions. The redundant tests are as follows:
>  * Have mt_dop dimension, but do not exercise it:
> test_outer_to_inner_joins
> test_single_node_nested_loop_joins
>  * Have batch_size dimension, but do not exercise it:
> test_outer_to_inner_joins
> test_single_node_nested_loop_joins
> test_single_node_nested_loop_joins_exhaustive
> test_semi_joins_exhaustive
>  * Have enable_outer_join_to_inner_transformation dimension, but do not 
> exercise it:
> All TestJoinQueries except test_outer_to_inner_joins
> test_miss_tuple_joins is also valid to run with far fewer test dimensions 
> because it mainly tests the correctness of predicate pushdown during planning.
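
To see why unexercised dimensions are costly, here is a small Python sketch of how the run count multiplies. The dimension values are made up for illustration (the real lists live in the test suite), and {{permutations}} is a hypothetical helper rather than part of Impala's test framework.

```python
from itertools import product

# Hypothetical dimension values, chosen only to show the arithmetic.
dimensions = {
    "batch_size": [0, 1],
    "mt_dop": [0, 4],
    "enable_outer_join_to_inner_transformation": [False, True],
}

def permutations(dims):
    """Every combination of dimension values a test would be run with."""
    names = sorted(dims)
    return [dict(zip(names, combo))
            for combo in product(*(dims[n] for n in names))]

all_runs = permutations(dimensions)
# A test that only exercises batch_size needs just that dimension.
needed = permutations({"batch_size": dimensions["batch_size"]})
```

Under these assumed values the test would run 8 times where 2 runs cover everything it exercises.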




[jira] [Commented] (IMPALA-12957) Support reading Inf and NaN from JSON

2024-08-22 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876171#comment-17876171
 ] 

Michael Smith commented on IMPALA-12957:


Sounds good.

> Support reading Inf and NaN from JSON
> -
>
> Key: IMPALA-12957
> URL: https://issues.apache.org/jira/browse/IMPALA-12957
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Major
>
> Despite the fact that special values such as Inf and NaN are not supported in 
> standard JSON (they are considered invalid values), rapidjson does support 
> them. However, it requires the parsing flag kParseNanAndInfFlag to be 
> enabled. This flag is not enabled in the current JsonParser::Parse(); we 
> could enable it to support reading Inf and NaN.
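
As a rough analogy only, the same lenient-versus-strict choice can be shown with Python's standard {{json}} module, which accepts these non-standard constants by default. The snippet does not use rapidjson or kParseNanAndInfFlag, and {{reject_constant}} is a hypothetical helper name.

```python
import json
import math

def reject_constant(name):
    # json.loads calls this hook for 'NaN', 'Infinity' and '-Infinity'.
    raise ValueError("non-standard JSON constant: " + name)

doc = '{"a": NaN, "b": Infinity, "c": -Infinity}'

# Lenient mode (Python's default): the special values parse as floats.
lenient = json.loads(doc)

# Strict mode: the same document is rejected.
try:
    json.loads(doc, parse_constant=reject_constant)
    strict_ok = True
except ValueError:
    strict_ok = False
```

The point carried over to the issue is that accepting Inf and NaN is an opt-in parser behaviour, not something standard JSON guarantees.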




[jira] [Updated] (IMPALA-13274) Crash in impala::RowDescriptor::TupleIsNullable(int)

2024-08-22 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13274:
---
Target Version: Impala 4.4.2  (was: Impala 4.4.1)

> Crash in impala::RowDescriptor::TupleIsNullable(int)
> 
>
> Key: IMPALA-13274
> URL: https://issues.apache.org/jira/browse/IMPALA-13274
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Zhi Tang
>Assignee: Zhi Tang
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: image-2024-08-05-17-48-01-975.png
>
>
> Log information:
>   #
>   # A fatal error has been detected by the Java Runtime Environment:
>   #
>   #  SIGSEGV (0xb) at pc=0x011dcad9, pid=63990, tid=0x7f4137874700
>   #
>   # JRE version: Java(TM) SE Runtime Environment (8.0_152-b16) (build 
> 1.8.0_152-b16)
>   # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.152-b16 mixed mode 
> linux-amd64 compressed oops)
>   # Problematic frame:
>   # C  [impalad+0xddcad9]  impala::RowDescriptor::TupleIsNullable(int) 
> const+0x19
>   #
>   # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
>   #
>   # An error report file with more information is saved as:
>   # //hs_err_pid63990.log
>   #
>   # If you would like to submit a bug report, please visit:
>   #   [http://bugreport.java.com/bugreport/crash.jsp]
>   #




[jira] [Resolved] (IMPALA-13274) Crash in impala::RowDescriptor::TupleIsNullable(int)

2024-08-22 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13274.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed


[jira] [Commented] (IMPALA-12957) Support reading Inf and NaN from JSON

2024-08-22 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876064#comment-17876064
 ] 

Michael Smith commented on IMPALA-12957:


Could you explain a rationale for Impala to support them, besides that it's 
easy to add?


[jira] [Commented] (IMPALA-13317) Test_tuple_cache_tpc_queries failing

2024-08-22 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876061#comment-17876061
 ] 

Michael Smith commented on IMPALA-13317:


Where is this file coming from?

> Test_tuple_cache_tpc_queries failing
> 
>
> Key: IMPALA-13317
> URL: https://issues.apache.org/jira/browse/IMPALA-13317
> Project: IMPALA
>  Issue Type: Bug
>Reporter: gaurav singh
>Assignee: Yida Wu
>Priority: Critical
>
> collection failure: ValueError: invalid literal for int() with base 10: '1a'
> Stack Trace:
> {noformat}
> query_test/test_tuple_cache_tpc_queries.py:60: in 
> class TestTupleCacheTpcdsQuery(ImpalaTestSuite):
> query_test/test_tuple_cache_tpc_queries.py:74: in TestTupleCacheTpcdsQuery
> @pytest.mark.parametrize("query", load_tpc_queries_name_sorted('tpcds'))
> util/test_file_parser.py:673: in load_tpc_queries_name_sorted
> queries = sorted(queries, key=tpc_sort_key)
> util/test_file_parser.py:659: in tpc_sort_key
> y = int(parts[2]) if len(parts) > 2 else 0
> E   ValueError: invalid literal for int() with base 10: '1a' {noformat}
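
The stack trace shows {{int()}} being applied to a token such as '1a'. One way to tolerate such tokens is sketched below in Python. {{variant_sort_key}} is a hypothetical helper and not the fix applied in Impala; how query names are split into parts is not shown in the trace, so the sketch only handles a single token.

```python
import re

def variant_sort_key(token):
    """Split a token such as '1a' into (1, 'a') so it sorts numerically
    first and alphabetically second; plain '2' becomes (2, '')."""
    match = re.match(r"(\d+)(.*)$", token)
    if match is None:
        # No leading digits: sort such tokens before numbered ones.
        return (-1, token)
    return (int(match.group(1)), match.group(2))

tokens = ["10", "1a", "2", "1", "1b"]
ordered = sorted(tokens, key=variant_sort_key)
```

Sorting on the (number, suffix) pair keeps '1a' and '1b' next to '1' while still placing '2' before '10'.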




[jira] [Updated] (IMPALA-13313) Potential deadlock in ImpalaServer::ExpireQueries()

2024-08-20 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13313:
---
Target Version: Impala 4.4.2

> Potential deadlock in ImpalaServer::ExpireQueries()
> ---
>
> Key: IMPALA-13313
> URL: https://issues.apache.org/jira/browse/IMPALA-13313
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Yida Wu
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> IMPALA-12602 introduces a way to unregister a query from a session when 
> idle_query_timeout is reached. However, it also includes logic in 
> ExpireQueries() that could cause a deadlock by trying to get the 
> SessionState::lock while also holding query_expiration_lock_. This violates 
> the lock order defined in 
> [impala-server.h|https://github.com/apache/impala/blob/9848cd84be6ed07fe542b82d2e2628e658690621/be/src/service/impala-server.h#L187]
>  and could potentially result in a deadlock.
> For example, it can deadlock with 
> [SetInFlight()|https://github.infra.cloudera.com/CDH/Impala/blob/1e4e196b53c2ba88c58d13dfb3709b849767a109/be/src/service/impala-server.cc#L1386],
>  which may try to get the query_expiration_lock_ while holding 
> SessionState::lock.
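
The lock-order rule described above can be illustrated with a minimal Python sketch. The two locks are hypothetical stand-ins for SessionState::lock and query_expiration_lock_, and releasing the expiration lock before taking the session lock is just one possible way to respect the order; it is not necessarily how the issue was fixed.

```python
import threading

# Hypothetical stand-ins for SessionState::lock and query_expiration_lock_.
session_lock = threading.Lock()
query_expiration_lock = threading.Lock()

completed = []

def set_in_flight():
    # Documented order: session lock first, then the expiration lock.
    with session_lock:
        with query_expiration_lock:
            completed.append("set_in_flight")

def expire_queries():
    # Collect work under the expiration lock, then release it before
    # touching the session lock, so the two are never held in reverse order.
    with query_expiration_lock:
        expired = ["query-1"]
    with session_lock:
        completed.append("expire_queries:" + ",".join(expired))

threads = [threading.Thread(target=fn)
           for fn in (set_in_flight, expire_queries)
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

Because no thread ever waits for the session lock while holding the expiration lock, the circular wait needed for a deadlock cannot form.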




[jira] [Resolved] (IMPALA-13313) Potential deadlock in ImpalaServer::ExpireQueries()

2024-08-20 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13313.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed


[jira] [Work started] (IMPALA-13313) Potential deadlock in ImpalaServer::ExpireQueries()

2024-08-20 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13313 started by Michael Smith.
--
> Potential deadlock in ImpalaServer::ExpireQueries()
> ---
>
> Key: IMPALA-13313
> URL: https://issues.apache.org/jira/browse/IMPALA-13313
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Yida Wu
>Assignee: Michael Smith
>Priority: Critical
>
> IMPALA-12602 introduces a way to unregister a query from a session when 
> idle_query_timeout is reached. However, it also includes logic in 
> ExpireQueries() that could cause a deadlock by trying to get the 
> SessionState::lock while also holding query_expiration_lock_. This violates 
> the lock order defined in 
> [impala-server.h|https://github.com/apache/impala/blob/9848cd84be6ed07fe542b82d2e2628e658690621/be/src/service/impala-server.h#L187]
>  and could potentially result in a deadlock.
> For example, it can deadlock with 
> [SetInFlight()|https://github.infra.cloudera.com/CDH/Impala/blob/1e4e196b53c2ba88c58d13dfb3709b849767a109/be/src/service/impala-server.cc#L1386],
>  which may try to get the query_expiration_lock_ while holding 
> SessionState::lock.




[jira] [Updated] (IMPALA-13313) Potential deadlock in ImpalaServer::ExpireQueries()

2024-08-20 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13313:
---
Priority: Critical  (was: Major)


[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-19 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875010#comment-17875010
 ] 

Michael Smith commented on IMPALA-13262:


Oh I see, row_number() is an analytic function added to the rows passed to it. 
So if we push the predicate eval below the analytic plan node, it numbers just 
the subset of rows it receives.
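
A small Python simulation of the two evaluation orders, using the three rows from the issue description, makes the difference concrete. {{add_row_number}} and {{same_dates}} are hypothetical helpers that only mimic the semantics; they are not Impala code.

```python
rows = [  # (dept_no, dept_rank, start_date, end_date)
    (1, 1, "2024-01-01", "2024-01-02"),
    (1, 2, "2024-01-02", "2024-01-03"),
    (1, 3, "2024-01-03", "2024-01-03"),
]

def add_row_number(rows):
    """Mimics row_number() over (partition by dept_no order by dept_rank)."""
    counters = {}
    numbered = []
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        counters[row[0]] = counters.get(row[0], 0) + 1
        numbered.append(row + (counters[row[0]],))
    return numbered

def same_dates(row):
    return row[2] == row[3]

# Correct plan: number all rows, then apply both predicates.
correct = [r for r in add_row_number(rows) if r[4] == 1 and same_dates(r)]

# Incorrect plan: start_date = end_date is evaluated in the scan, so the
# analytic function only sees (and numbers) the surviving row.
pushed_down = [r for r in add_row_number(filter(same_dates, rows))
               if r[4] == 1]
```

In the correct order no row satisfies both predicates, while the pushed-down order renumbers the third row as rn=1 and returns it.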

> Predicate pushdown causes incorrect results in join condition
> -
>
> Key: IMPALA-13262
> URL: https://issues.apache.org/jira/browse/IMPALA-13262
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>  Labels: correctness
>
> We found that in some scenarios Apache Impala 
> ([https://github.com/apache/impala/commit/c539874]) could incorrectly push 
> predicates to scan nodes, which in turn produces the wrong result. The 
> following is a concrete example to reproduce the issue.
> {code:sql}
> create database impala_13262;
> use impala_13262;
> create table department ( dept_no integer, dept_rank integer, start_date 
> timestamp,end_date timestamp);
> insert into department values(1,1,'2024-01-01','2024-01-02');
> insert into department values(1,2,'2024-01-02','2024-01-03');
> insert into department values(1,3,'2024-01-03','2024-01-03');
> create table employee (employee_no integer, depart_no integer);
> insert into employee values (1,1);
> -- The following query should return 0 rows. However, Apache Impala produces 
> -- one row.
> select * from employee t1
> inner join (
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where rn=1
> ) t2
> on t1.depart_no=t2.dept_no
> where t2.start_date=t2.end_date;
> set explain_level=2;
> -- In the output of the EXPLAIN statement, we found that the predicate 
> -- "start_date = end_date" was pushed down to the scan node, which is wrong.
> | 01:SCAN HDFS [impala_13262.department, RANDOM]                      |
> |    HDFS partitions=1/1 files=3 size=132B                            |
> |    predicates: start_date = end_date                                |
> |    stored statistics:                                               |
> |      table: rows=unavailable size=unavailable                       |
> |      columns: unavailable                                           |
> |    extrapolated-rows=disabled max-scan-range-rows=unavailable       |
> |    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1 |
> |    tuple-ids=1 row-size=40B cardinality=1                           |
> |    in pipelines: 01(GETNEXT)                                        |
> +---------------------------------------------------------------------+
> {code}
>  
> +*Edit:*+
> The following is a smaller case to reproduce the issue. The correct result 
> should be 0 rows but Impala returns 1 row as above.
> {code:java}
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where rn=1 and t2.start_date=t2.end_date;
> {code}
>  
> Recall that the contents of the inline view '{*}t2{*}' above are as follows.
> {code:java}
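> +---------+-----------+---------------------+---------------------+----+
> | dept_no | dept_rank | start_date          | end_date            | rn |
> +---------+-----------+---------------------+---------------------+----+
> | 1       | 1         | 2024-01-01 00:00:00 | 2024-01-02 00:00:00 | 1  |
> | 1       | 2         | 2024-01-02 00:00:00 | 2024-01-03 00:00:00 | 2  |
> | 1       | 3         | 2024-01-03 00:00:00 | 2024-01-03 00:00:00 | 3  |
> +---------+-----------+---------------------+---------------------+----+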
> {code}
>  
> On the other hand, the following query without the conjunct '{*}rn=1{*}' 
> returns the correct result, which is the row with '{*}rn{*}' equal to *3* 
> above. It almost looks like adding this '{*}rn=1{*}' predicate triggers the 
> incorrect pushdown of '{*}t2.start_date=t2.end_date{*}' to the scan node of 
> the table '{*}department{*}'.
> {code:java}
> select * from
> (
> select dept_no,dept_rank,start_date,end_date
> ,row_number() over(partition by dept_no order by dept_rank) rn
> from department
> ) t2
> where t2.start_date=t2.end_date;
> {code}




[jira] [Commented] (IMPALA-13262) Predicate pushdown causes incorrect results in join condition

2024-08-19 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875008#comment-17875008
 ] 

Michael Smith commented on IMPALA-13262:


I'm still a little unclear why pushing into the ScanNode causes this issue. 
Wouldn't that limit the ScanNode to return only the 3rd row?


[jira] [Closed] (IMPALA-13307) disable_codegen_rows_threshold should apply to individual fragments

2024-08-19 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith closed IMPALA-13307.
--
Resolution: Fixed

IMPALA-5443 isn't much more work, and supersedes any improvements here.

> disable_codegen_rows_threshold should apply to individual fragments
> ---
>
> Key: IMPALA-13307
> URL: https://issues.apache.org/jira/browse/IMPALA-13307
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Smith
>Priority: Major
>  Labels: codegen
>
> The {{disable_codegen_rows_threshold}} is currently applied query-wide. So if 
> a single scan deep in the query is expected to read a large number of rows, 
> but everything else expects to see hundreds of rows because the results will 
> be aggregated or heavily filtered, we still codegen the whole query.
> {{disable_codegen_rows_threshold}} should be evaluated for each fragment so 
> we can determine whether to codegen based on the particular workload they'll 
> be handling.
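
The difference between a query-wide and a per-fragment decision can be sketched in Python as follows. The {{Fragment}} class, the row estimates, the threshold value and the exact comparison are all assumptions made for illustration; they are not taken from Impala's planner.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    name: str
    estimated_rows: int  # assumed planner estimate of rows processed

def codegen_query_wide(fragments, threshold):
    """Behaviour as described in the issue: one decision for the whole
    query, driven by the largest estimate."""
    enable = max(f.estimated_rows for f in fragments) >= threshold
    return {f.name: enable for f in fragments}

def codegen_per_fragment(fragments, threshold):
    """Proposed behaviour: each fragment is compared to the threshold."""
    return {f.name: f.estimated_rows >= threshold for f in fragments}

plan = [
    Fragment("F00 scan", 10_000_000),
    Fragment("F01 agg", 300),
    Fragment("F02 root", 100),
]
threshold = 50_000  # illustrative value only

query_wide = codegen_query_wide(plan, threshold)
per_fragment = codegen_per_fragment(plan, threshold)
```

With these made-up numbers the query-wide rule compiles all three fragments, while the per-fragment rule compiles only the large scan.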




[jira] [Work started] (IMPALA-5443) Consider automatically disabling codegen per ExecNode based on planner estimates

2024-08-19 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-5443 started by Michael Smith.
-
> Consider automatically disabling codegen per ExecNode based on planner 
> estimates
> 
>
> Key: IMPALA-5443
> URL: https://issues.apache.org/jira/browse/IMPALA-5443
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Tim Armstrong
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen, performance
>
> We should consider automatically disabling codegen per plan node based on the 
> number of input/output rows. We have the plumbing already in place (it's used 
> to disable codegen for non-grouping merge aggregations) but choosing and 
> implementing a good policy is trickier.
> We should be conservative about doing this, because the cost of codegen is 
> O(1) but the runtime speedup is O(n).




[jira] [Commented] (IMPALA-13307) disable_codegen_rows_threshold should apply to individual fragments

2024-08-19 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874925#comment-17874925
 ] 

Michael Smith commented on IMPALA-13307:


Good point, I'd forgotten about that one. We've separately discussed whether it 
would be feasible to do whole-fragment codegen, but I don't think we're there 
yet. I'll look at how different the implementation would be.


[jira] [Created] (IMPALA-13307) disable_codegen_rows_threshold should apply to individual fragments

2024-08-16 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13307:
--

 Summary: disable_codegen_rows_threshold should apply to individual 
fragments
 Key: IMPALA-13307
 URL: https://issues.apache.org/jira/browse/IMPALA-13307
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Michael Smith


{{disable_codegen_rows_threshold}} is currently applied query-wide. So if a 
single scan deep in the query is expected to read a large number of rows, but 
everything else expects to see only hundreds of rows because the results will 
be aggregated or heavily filtered, we still codegen the whole query.

{{disable_codegen_rows_threshold}} should be evaluated for each fragment so we 
can determine whether to codegen based on the particular workload it will be 
handling.
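
The difference between the two behaviours can be sketched roughly as follows. This is an illustration only, not Impala's planner code: the Fragment class, its estimated_rows field, both function names and the threshold value 50_000 are hypothetical, and it assumes the query-wide decision is driven by the largest row estimate in the plan.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    estimated_rows: int  # hypothetical: planner's row estimate for this fragment


def codegen_query_wide(fragments, threshold):
    """One decision for the whole query (assumed to follow the largest estimate)."""
    enable = max(f.estimated_rows for f in fragments) >= threshold
    return {f.name: enable for f in fragments}


def codegen_per_fragment(fragments, threshold):
    """Proposed behaviour: each fragment is compared to the threshold on its own."""
    return {f.name: f.estimated_rows >= threshold for f in fragments}


# A large scan feeding a small aggregation.
fragments = [Fragment("scan", 10_000_000), Fragment("agg", 300)]
print(codegen_query_wide(fragments, 50_000))
print(codegen_per_fragment(fragments, 50_000))
```

With the query-wide rule the small aggregation fragment is still codegen'd because of the large scan; with the per-fragment rule only the scan is.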




[jira] [Commented] (IMPALA-13289) Skip CodeGen if expression is too large

2024-08-16 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874408#comment-17874408
 ] 

Michael Smith commented on IMPALA-13289:


We should also analyze why codegen is so slow for this case. 
[https://llvm.org/docs/Frontend/PerformanceTips.html] might be helpful.

> Skip CodeGen if expression is too large
> ---
>
> Key: IMPALA-13289
> URL: https://issues.apache.org/jira/browse/IMPALA-13289
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Priority: Major
>
> Running TestExprLimits::test_under_statement_expression_limit with codegen 
> enabled reveals high memory consumption and long codegen compilation time due 
> to the very large expression size. Ideally, the backend should recognize such 
> cases and skip codegen to speed up the query and cut memory consumption.




[jira] [Comment Edited] (IMPALA-8030) Remove duplicate in-clause values for selectivity calcs

2024-08-16 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874359#comment-17874359
 ] 

Michael Smith edited comment on IMPALA-8030 at 8/16/24 7:33 PM:


Inconsistent analysis of new predicates in rewrite rules causes later rules to 
not be applied. IMPALA-13302 addresses the simple case of
{code:java}
where c.c_custkey = 10 OR 10 = c.c_custkey
{code}
via NormalizeBinaryPredicatesRule -> ExtractCommonConjunctRule. But the 
existing rewrite rules aren't able to handle {{(10, 20, 30, 30, 10, 20)}}.


was (Author: JIRAUSER288956):
Inconsistency analyzing new predicates in rewrite rules leads to failing to 
apply later rules.

> Remove duplicate in-clause values for selectivity calcs
> ---
>
> Key: IMPALA-8030
> URL: https://issues.apache.org/jira/browse/IMPALA-8030
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> If an IN clause has duplicate values, they should be removed so that 
> selectivity estimates are based only on unique values.
> {noformat}
> select *
> from tpch.customer c
> where c.c_custkey in (10, 20, 30, 30, 10, 20)
>  PLAN
> PLAN-ROOT SINK
> |
> 00:SCAN HDFS [tpch.customer c]
>partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=6
>predicates: c.c_custkey IN (10, 20, 30, 30, 10, 20)
> {noformat}
> Expected:
> {noformat}
> 00:SCAN HDFS [tpch.customer c]
>partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=3
> {noformat}
> Notice that in the current version, we treat each value, duplicate or not, as 
> a match. In the expected result, we notice that duplicate values match only 
> once and we return matches for the unique values.
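
As a rough illustration of the requested behaviour (not Impala's frontend code), the IN-list can be de-duplicated before the estimate is computed. Both function names are hypothetical, and using the distinct count as the cardinality assumes each value matches at most one row, as with a unique key such as c_custkey.

```python
def dedupe_in_list(values):
    """Remove duplicate IN-list values, keeping first-seen order."""
    return list(dict.fromkeys(values))


def in_list_cardinality(values):
    """Estimate matches as the number of distinct values (unique-key assumption)."""
    return len(set(values))


in_list = [10, 20, 30, 30, 10, 20]
print(dedupe_in_list(in_list))
print(in_list_cardinality(in_list))
```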




[jira] [Commented] (IMPALA-8030) Remove duplicate in-clause values for selectivity calcs

2024-08-16 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874359#comment-17874359
 ] 

Michael Smith commented on IMPALA-8030:
---

Inconsistency analyzing new predicates in rewrite rules leads to failing to 
apply later rules.

> Remove duplicate in-clause values for selectivity calcs
> ---
>
> Key: IMPALA-8030
> URL: https://issues.apache.org/jira/browse/IMPALA-8030
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> If an IN clause has duplicate values, they should be removed so that 
> selectivity estimates are based only on unique values.
> {noformat}
> select *
> from tpch.customer c
> where c.c_custkey in (10, 20, 30, 30, 10, 20)
>  PLAN
> PLAN-ROOT SINK
> |
> 00:SCAN HDFS [tpch.customer c]
>partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=6
>predicates: c.c_custkey IN (10, 20, 30, 30, 10, 20)
> {noformat}
> Expected:
> {noformat}
> 00:SCAN HDFS [tpch.customer c]
>partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=3
> {noformat}
> Notice that in the current version, we treat each value, duplicate or not, as 
> a match. In the expected result, we notice that duplicate values match only 
> once and we return matches for the unique values.




[jira] [Resolved] (IMPALA-13301) Upgrade aircompressor

2024-08-16 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13301.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Upgrade aircompressor
> -
>
> Key: IMPALA-13301
> URL: https://issues.apache.org/jira/browse/IMPALA-13301
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Upgrade aircompressor to 0.27+ to address CVE-2024-36114.




[jira] [Resolved] (IMPALA-13298) TestRPCTimeout.test_miss_complete_cb: RPC Failed: Could Not Connect

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13298.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> TestRPCTimeout.test_miss_complete_cb: RPC Failed: Could Not Connect
> ---
>
> Key: IMPALA-13298
> URL: https://issues.apache.org/jira/browse/IMPALA-13298
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Jason Fehr
>Assignee: Jason Fehr
>Priority: Minor
>  Labels: broken-build, flaky-test
> Fix For: Impala 4.5.0
>
>
> Custom cluster test assert failure:
> Error:
> {noformat}
> assert 'Query aborted' in 'Query dc4af3c7143ae570:142a80e3 
> failed:\nExec() rpc failed: Remote error: Runtime error: Debug Action: 
> IMPALA_SERVICE_POOL:FAIL\n\n'  +  where 'Query 
> dc4af3c7143ae570:142a80e3 failed:\nExec() rpc failed: Remote error: 
> Runtime error: Debug Action: IMPALA_SERVICE_POOL:FAIL\n\n' = 
> str(ImpalaBeeswaxException())
> {noformat}
> Stack Trace:
> {noformat}
> custom_cluster/test_rpc_timeout.py:231: in test_miss_complete_cb
> assert "Query aborted" in str(ex)
> E   assert 'Query aborted' in 'Query dc4af3c7143ae570:142a80e3 
> failed:\nExec() rpc failed: Remote error: Runtime error: Debug Action: 
> IMPALA_SERVICE_POOL:FAIL\n\n'
> E+  where 'Query dc4af3c7143ae570:142a80e3 failed:\nExec() rpc 
> failed: Remote error: Runtime error: Debug Action: 
> IMPALA_SERVICE_POOL:FAIL\n\n' = str(ImpalaBeeswaxException())
> {noformat}




[jira] [Updated] (IMPALA-13298) TestRPCTimeout.test_miss_complete_cb: RPC Failed: Could Not Connect

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13298:
---
Affects Version/s: Impala 4.5.0

> TestRPCTimeout.test_miss_complete_cb: RPC Failed: Could Not Connect
> ---
>
> Key: IMPALA-13298
> URL: https://issues.apache.org/jira/browse/IMPALA-13298
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.5.0
>Reporter: Jason Fehr
>Assignee: Jason Fehr
>Priority: Minor
>  Labels: broken-build, flaky-test
> Fix For: Impala 4.5.0
>
>
> Custom cluster test assert failure:
> Error:
> {noformat}
> assert 'Query aborted' in 'Query dc4af3c7143ae570:142a80e3 
> failed:\nExec() rpc failed: Remote error: Runtime error: Debug Action: 
> IMPALA_SERVICE_POOL:FAIL\n\n'  +  where 'Query 
> dc4af3c7143ae570:142a80e3 failed:\nExec() rpc failed: Remote error: 
> Runtime error: Debug Action: IMPALA_SERVICE_POOL:FAIL\n\n' = 
> str(ImpalaBeeswaxException())
> {noformat}
> Stack Trace:
> {noformat}
> custom_cluster/test_rpc_timeout.py:231: in test_miss_complete_cb
> assert "Query aborted" in str(ex)
> E   assert 'Query aborted' in 'Query dc4af3c7143ae570:142a80e3 
> failed:\nExec() rpc failed: Remote error: Runtime error: Debug Action: 
> IMPALA_SERVICE_POOL:FAIL\n\n'
> E+  where 'Query dc4af3c7143ae570:142a80e3 failed:\nExec() rpc 
> failed: Remote error: Runtime error: Debug Action: 
> IMPALA_SERVICE_POOL:FAIL\n\n' = str(ImpalaBeeswaxException())
> {noformat}




[jira] [Resolved] (IMPALA-13115) Always add the query id in the error message to clients

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13115.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Always add the query id in the error message to clients
> ---
>
> Key: IMPALA-13115
> URL: https://issues.apache.org/jira/browse/IMPALA-13115
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Xuebin Su
>Priority: Critical
>  Labels: ramp-up
> Fix For: Impala 4.5.0
>
>
> We have some errors like "Failed due to unreachable impalad(s)". We should 
> improve them to mention the query id, e.g. "Query ${query_id} failed due to 
> unreachable impalad(s)". In a busy cluster, queries are flushed out of the 
> /queries page quickly. Coordinator logs are also flushed out quickly, so it's 
> hard to find the query id there.




[jira] [Reopened] (IMPALA-12554) Create only one Ranger policy for GRANT statement

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reopened IMPALA-12554:


> Create only one Ranger policy for GRANT statement
> -
>
> Key: IMPALA-12554
> URL: https://issues.apache.org/jira/browse/IMPALA-12554
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala creates a Ranger policy for each column specified in a 
> GRANT statement. For instance, the following query creates 3 Ranger policies 
> on the Ranger server. When many columns are specified, this produces many 
> policies, and Impala's Ranger plug-in may take a long time to download them 
> from the Ranger server. It would be better if Impala created a single policy 
> for columns in the same table.
> {code:java}
> [localhost:21050] default> grant select(id, bool_col, tinyint_col) on table 
> functional.alltypes to user non_owner;
> Query: grant select(id, bool_col, tinyint_col) on table functional.alltypes 
> to user non_owner
> Query submitted at: 2023-11-10 09:38:58 (Coordinator: http://fangyu:25000)
> Query progress can be monitored at: 
> http://fangyu:25000/query_plan?query_id=bc4fa1cdefe5881b:413d9a69
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.67s
> {code}
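
The requested change amounts to grouping the granted columns into one policy per table. A minimal sketch, assuming a simplified dict shape for a policy (this is not Ranger's actual policy schema, and both function names are hypothetical):

```python
def policies_per_column(table, columns, privilege, user):
    """Behaviour described in the issue: one policy for every granted column."""
    return [
        {"table": table, "columns": [col], "privilege": privilege, "user": user}
        for col in columns
    ]


def policy_per_table(table, columns, privilege, user):
    """Requested behaviour: a single policy listing all columns of the table."""
    return [
        {"table": table, "columns": list(columns), "privilege": privilege, "user": user}
    ]


cols = ["id", "bool_col", "tinyint_col"]
print(len(policies_per_column("functional.alltypes", cols, "select", "non_owner")))
print(len(policy_per_table("functional.alltypes", cols, "select", "non_owner")))
```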




[jira] [Commented] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-08-15 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874052#comment-17874052
 ] 

Michael Smith commented on IMPALA-13203:


This appears to be related to IMPALA-13302. I think we see both that the 
expression is not appropriately simplified and that we can run into "Illegal 
reference to non-materialized slot" on an unrelated expression.

>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0, Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Critical
> Fix For: Impala 4.5.0, Impala 4.4.1
>
> Attachments: query-reduced.sql, tables-reduced.sql
>
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = CAST(0 AS INT)
> {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.




[jira] [Commented] (IMPALA-12164) The query fails with "IllegalStateException: Illegal reference to non-materialized slot: tid=x sid=x"

2024-08-15 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874051#comment-17874051
 ] 

Michael Smith commented on IMPALA-12164:


I filed IMPALA-13302 for the regression.

> The query fails with "IllegalStateException: Illegal reference to 
> non-materialized slot: tid=x sid=x"
> -
>
> Key: IMPALA-12164
> URL: https://issues.apache.org/jira/browse/IMPALA-12164
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.2
>Reporter: Zhi Tang
>Assignee: Zhi Tang
>Priority: Major
> Fix For: Impala 4.3.0
>
> Attachments: image-2023-05-25-14-26-43-507.png
>
>
> The query failed in the compile execution plan phase. The stacktrace in logs:
> !image-2023-05-25-14-26-43-507.png|width=564,height=211!




[jira] [Work started] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13302 started by Michael Smith.
--
> Some ExprRewriteRule results are not analyzed, leading to unmaterialized 
> slots from reAnalyze
> -
>
> Key: IMPALA-13302
> URL: https://issues.apache.org/jira/browse/IMPALA-13302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> IMPALA-12164 skipped registering conjuncts that the analyzer expects to 
> remove because an earlier conjunct evaluates to constant False. However some 
> ExprRewriteRules don't analyze the predicates they produce, which can lead to 
> those conjuncts not actually being removed until a reAnalyze phase.
> reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting 
> Expr IDs from 0. That can lead to re-using the same Expr ID and marking it as 
> assigned. Then when a new Expr gets the same ID, it will skip materializing 
> slots, which can cause problems later (like if that Expr is part of a hash 
> join).
> Some example queries:
> {code}
> WITH v AS (SELECT 1 FROM functional.alltypestiny t1
>   JOIN functional.alltypestiny t2 ON t1.id = t2.id)
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
> (already fixed via IMPALA-13203) and
> {code}
> WITH v as (SELECT 1 FROM functional.alltypes t1
>   JOIN functional.alltypes t2 ON t1.id = t2.id)
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
> UNION ALL
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND false
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}




[jira] [Commented] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-08-15 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874040#comment-17874040
 ] 

Michael Smith commented on IMPALA-13302:


There are a couple of ways we could fix this. It would be pretty straightforward 
to remove the optimization from IMPALA-12164 that skips registering some 
conjuncts. However, I think the more thorough fix would be to ensure the results 
of all rewrite rules are analyzed, and add some additional testing around these 
cases.

Either way, I plan to add a Precondition to catch these cases more easily 
(without requiring quite as much serendipity in how we write the test cases).

> Some ExprRewriteRule results are not analyzed, leading to unmaterialized 
> slots from reAnalyze
> -
>
> Key: IMPALA-13302
> URL: https://issues.apache.org/jira/browse/IMPALA-13302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> IMPALA-12164 skipped registering conjuncts that the analyzer expects to 
> remove because an earlier conjunct evaluates to constant False. However some 
> ExprRewriteRules don't analyze the predicates they produce, which can lead to 
> those conjuncts not actually being removed until a reAnalyze phase.
> reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting 
> Expr IDs from 0. That can lead to re-using the same Expr ID and marking it as 
> assigned. Then when a new Expr gets the same ID, it will skip materializing 
> slots, which can cause problems later (like if that Expr is part of a hash 
> join).
> Some example queries:
> {code}
> WITH v AS (SELECT 1 FROM functional.alltypestiny t1
>   JOIN functional.alltypestiny t2 ON t1.id = t2.id)
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
> (already fixed via IMPALA-13203) and
> {code}
> WITH v as (SELECT 1 FROM functional.alltypes t1
>   JOIN functional.alltypes t2 ON t1.id = t2.id)
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
> UNION ALL
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND false
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}




[jira] [Created] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-08-15 Thread Michael Smith (Jira)

Michael Smith created IMPALA-13302:
--

 Summary: Some ExprRewriteRule results are not analyzed, leading to 
unmaterialized slots from reAnalyze
 Key: IMPALA-13302
 URL: https://issues.apache.org/jira/browse/IMPALA-13302
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.3.0
Reporter: Michael Smith


IMPALA-12164 skipped registering conjuncts that the analyzer expects to remove 
because an earlier conjunct evaluates to constant False. However some 
ExprRewriteRules don't analyze the predicates they produce, which can lead to 
those conjuncts not actually being removed until a reAnalyze phase.

reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting Expr 
IDs from 0. That can lead to re-using the same Expr ID and marking it as 
assigned. Then when a new Expr gets the same ID, it will skip materializing 
slots, which can cause problems later (like if that Expr is part of a hash 
join).

Some example queries:
{code}
WITH v AS (SELECT 1 FROM functional.alltypestiny t1
  JOIN functional.alltypestiny t2 ON t1.id = t2.id)
SELECT 1
FROM functional.alltypestiny t1
WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
  AND t1.id = 1 AND t1.id = 1
UNION ALL
SELECT 1
FROM functional.alltypestiny t1
WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
  AND t1.id = 1 AND t1.id = 1
UNION ALL SELECT 1 FROM v
UNION ALL SELECT 1 FROM v;
{code}
(already fixed via IMPALA-13203) and
{code}
WITH v as (SELECT 1 FROM functional.alltypes t1
  JOIN functional.alltypes t2 ON t1.id = t2.id)
SELECT 1 FROM functional.alltypes t1
  WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
UNION ALL
SELECT 1 FROM functional.alltypes t1
  WHERE t1.id = 1 AND false
UNION ALL SELECT 1 FROM v
UNION ALL SELECT 1 FROM v;
{code}
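
The ID collision described above can be modelled in a few lines. This is a toy model under stated assumptions, not Impala's Analyzer: ToyAnalyzer and its methods are hypothetical, and the "assigned" bookkeeping is reduced to a set of integer IDs.

```python
import itertools


class ToyAnalyzer:
    """Hands out Expr IDs starting from 0, as a fresh GlobalState would."""

    def __init__(self):
        self._ids = itertools.count()
        self.conjuncts = {}    # Expr ID -> predicate text
        self.assigned = set()  # Expr IDs treated as already handled

    def register(self, predicate):
        expr_id = next(self._ids)
        self.conjuncts[expr_id] = predicate
        return expr_id

    def mark_assigned(self, expr_id):
        self.assigned.add(expr_id)

    def needs_materialization(self, expr_id):
        return expr_id not in self.assigned


main = ToyAnalyzer()
reanalyzer = ToyAnalyzer()  # stands in for the new Analyzer used by reAnalyze

# A conjunct first seen during re-analysis gets ID 0 from the new counter...
stale_id = reanalyzer.register("t1.id = 1 AND false")
# ...and that ID is then marked assigned in the original analyzer's state.
main.mark_assigned(stale_id)

# A different Expr registered with the original analyzer also receives ID 0,
# so it looks assigned and its slots would not be materialized.
join_id = main.register("t1.id = t2.id")
print(join_id == stale_id, main.needs_materialization(join_id))
```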





[jira] [Assigned] (IMPALA-13302) Some ExprRewriteRule results are not analyzed, leading to unmaterialized slots from reAnalyze

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13302:
--

Assignee: Michael Smith

> Some ExprRewriteRule results are not analyzed, leading to unmaterialized 
> slots from reAnalyze
> -
>
> Key: IMPALA-13302
> URL: https://issues.apache.org/jira/browse/IMPALA-13302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> IMPALA-12164 skipped registering conjuncts that the analyzer expects to 
> remove because an earlier conjunct evaluates to constant False. However some 
> ExprRewriteRules don't analyze the predicates they produce, which can lead to 
> those conjuncts not actually being removed until a reAnalyze phase.
> reAnalyze uses a new Analyzer (with new GlobalState); it restarts counting 
> Expr IDs from 0. That can lead to re-using the same Expr ID and marking it as 
> assigned. Then when a new Expr gets the same ID, it will skip materializing 
> slots, which can cause problems later (like if that Expr is part of a hash 
> join).
> Some example queries:
> {code}
> WITH v AS (SELECT 1 FROM functional.alltypestiny t1
>   JOIN functional.alltypestiny t2 ON t1.id = t2.id)
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL
> SELECT 1
> FROM functional.alltypestiny t1
> WHERE ((t1.id = 1 and false) or (t1.id = 1 and false))
>   AND t1.id = 1 AND t1.id = 1
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}
> (already fixed via IMPALA-13203) and
> {code}
> WITH v as (SELECT 1 FROM functional.alltypes t1
>   JOIN functional.alltypes t2 ON t1.id = t2.id)
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND t1.id = 1 AND t1.id = 1 AND false
> UNION ALL
> SELECT 1 FROM functional.alltypes t1
>   WHERE t1.id = 1 AND false
> UNION ALL SELECT 1 FROM v
> UNION ALL SELECT 1 FROM v;
> {code}




[jira] [Updated] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13203:
---
Affects Version/s: Impala 4.3.0

>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0, Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Critical
> Fix For: Impala 4.5.0, Impala 4.4.1
>
> Attachments: query-reduced.sql, tables-reduced.sql
>
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = CAST(0 AS INT)
> {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.




[jira] [Commented] (IMPALA-12164) The query fails with "IllegalStateException: Illegal reference to non-materialized slot: tid=x sid=x"

2024-08-15 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874035#comment-17874035
 ] 

Michael Smith commented on IMPALA-12164:


Alright, analyzing all conjuncts, even those that are going to be removed, used 
to mask cases where ExprRewriteRules didn't analyze the new predicates they 
created. With this patch, any rule that modifies an expression without analyzing 
it, in a way that reduces the number of conjuncts on re-analysis (usually 
because the expression ends up evaluating to false), now leads to Expr IDs 
being re-used, so new expressions don't materialize their slots.

I can go through on a case-by-case basis and fix up the ExprRewriteRules, or 
modify registerConjuncts to continue analyzing all conjuncts before marking 
them assigned. I plan to add a Precondition to detect this earlier when we do 
run into possible conflicts (if we call markConjunctAssigned on a conjunct that 
wasn't registered with the current analyzer), but that will only detect them if 
we test the relevant query.
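
The kind of check meant here could look roughly like this. It is a sketch only: Conjunct, ToyAnalyzer and the method names are hypothetical Python stand-ins, and the real check would be a Precondition in the Java frontend.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare conjuncts by identity, not by field values
class Conjunct:
    expr_id: int
    text: str


class ToyAnalyzer:
    def __init__(self):
        self._next_id = 0
        self._registered = {}  # Expr ID -> Conjunct registered with this analyzer
        self._assigned = set()

    def register_conjunct(self, text):
        conjunct = Conjunct(self._next_id, text)
        self._registered[conjunct.expr_id] = conjunct
        self._next_id += 1
        return conjunct

    def mark_conjunct_assigned(self, conjunct):
        # Precondition: the conjunct must be the object this analyzer registered
        # under that ID; a matching ID from another analyzer is not enough.
        if self._registered.get(conjunct.expr_id) is not conjunct:
            raise ValueError(f"conjunct not registered with this analyzer: {conjunct.text}")
        self._assigned.add(conjunct.expr_id)

    def is_assigned(self, conjunct):
        return conjunct.expr_id in self._assigned


main = ToyAnalyzer()
other = ToyAnalyzer()
join = main.register_conjunct("t1.id = t2.id")
foreign = other.register_conjunct("t1.id = 1 AND false")  # same ID, other analyzer

main.mark_conjunct_assigned(join)  # accepted
try:
    main.mark_conjunct_assigned(foreign)
except ValueError as err:
    print("rejected:", err)
```

As noted, a check like this only fires when a test actually exercises a query that triggers the conflict.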

> The query fails with "IllegalStateException: Illegal reference to 
> non-materialized slot: tid=x sid=x"
> -
>
> Key: IMPALA-12164
> URL: https://issues.apache.org/jira/browse/IMPALA-12164
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.2
>Reporter: Zhi Tang
>Assignee: Zhi Tang
>Priority: Major
> Fix For: Impala 4.3.0
>
> Attachments: image-2023-05-25-14-26-43-507.png
>
>
> The query failed in the compile execution plan phase. The stacktrace in logs:
> !image-2023-05-25-14-26-43-507.png|width=564,height=211!




[jira] [Work started] (IMPALA-13301) Upgrade aircompressor

2024-08-15 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13301 started by Michael Smith.
--
> Upgrade aircompressor
> -
>
> Key: IMPALA-13301
> URL: https://issues.apache.org/jira/browse/IMPALA-13301
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Upgrade aircompressor to 0.27+ to address CVE-2024-36114.




[jira] [Resolved] (IMPALA-13207) The 'sys' database for system tables blacklisted by default

2024-08-14 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13207.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> The 'sys' database for system tables blacklisted by default
> ---
>
> Key: IMPALA-13207
> URL: https://issues.apache.org/jira/browse/IMPALA-13207
> Project: IMPALA
>  Issue Type: Bug
>Reporter: YifanZhang
>Assignee: Jason Fehr
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> When enabling `enable_workload_mgmt` in a cluster, the impalad failed to 
> start with a FATAL error:
> {code:java}
> F0710 17:46:51.211848 456937 workload-management.cc:346] AnalysisException: 
> Database does not exist: sys. Impalad exiting. {code}
> There was also a db creation error in the log before this error happened:
> {code:java}
> I0710 17:46:51.169672 456951 client-request-state.cc:1348] 
> 744c895a3184553c:0759f540] IllegalStateException: Can't create 
> blacklisted database: sys. --blacklisted_dbs may be inconsistent between 
> catalogd and coordinators {code}
> We must change the default value of the impalad/catalogd flag 
> 'blacklisted_dbs' to enable the query log table. Maybe we should add a 
> validator for 'enable_workload_mgmt' to avoid any conflicts.
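
A validator along the lines suggested might look like the sketch below. It assumes blacklisted_dbs is a comma-separated list of database names and that workload management needs the 'sys' database; the function name and return convention are hypothetical, and real flag validation would live in the C++ backend rather than in Python.

```python
def validate_workload_mgmt(enable_workload_mgmt, blacklisted_dbs, required_db="sys"):
    """Return an error message if the two flags conflict, otherwise None."""
    blacklisted = {
        db.strip().lower() for db in blacklisted_dbs.split(",") if db.strip()
    }
    if enable_workload_mgmt and required_db in blacklisted:
        return (
            f"enable_workload_mgmt requires database '{required_db}', "
            "but it is listed in blacklisted_dbs"
        )
    return None


print(validate_workload_mgmt(True, "sys,information_schema"))
print(validate_workload_mgmt(True, "information_schema"))
```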




[jira] [Commented] (IMPALA-13274) Crash in impala::RowDescriptor::TupleIsNullable(int)

2024-08-14 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873649#comment-17873649
 ] 

Michael Smith commented on IMPALA-13274:


Do we know if this is a regression, and if so what caused it?

> Crash in impala::RowDescriptor::TupleIsNullable(int)
> 
>
> Key: IMPALA-13274
> URL: https://issues.apache.org/jira/browse/IMPALA-13274
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Zhi Tang
>Priority: Critical
> Attachments: image-2024-08-05-17-48-01-975.png
>
>
> Log information:
>   #
>   # A fatal error has been detected by the Java Runtime Environment:
>   #
>   #  SIGSEGV (0xb) at pc=0x011dcad9, pid=63990, tid=0x7f4137874700
>   #
>   # JRE version: Java(TM) SE Runtime Environment (8.0_152-b16) (build 
> 1.8.0_152-b16)
>   # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.152-b16 mixed mode 
> linux-amd64 compressed oops)
>   # Problematic frame:
>   # C  [impalad+0xddcad9]  impala::RowDescriptor::TupleIsNullable(int) 
> const+0x19
>   #
>   # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
>   #
>   # An error report file with more information is saved as:
>   # //hs_err_pid63990.log
>   #
>   # If you would like to submit a bug report, please visit:
>   #   [http://bugreport.java.com/bugreport/crash.jsp]
>   #




[jira] [Commented] (IMPALA-12164) The query fails with "IllegalStateException: Illegal reference to non-materialized slot: tid=x sid=x"

2024-08-13 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873403#comment-17873403
 ] 

Michael Smith commented on IMPALA-12164:


This seems to have introduced a regression (reproduction at 
https://issues.apache.org/jira/browse/IMPALA-13203?focusedCommentId=17873302&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17873302)
{code}
ERROR: IllegalStateException: Illegal reference to non-materialized slot: 
tid=87 sid=680
{code}

IMPALA-13203 seems to address it. I haven't yet determined whether that's 
sufficient; the failure requires pretty specific queries to trigger.

> The query fails with "IllegalStateException: Illegal reference to 
> non-materialized slot: tid=x sid=x"
> -
>
> Key: IMPALA-12164
> URL: https://issues.apache.org/jira/browse/IMPALA-12164
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.2
>Reporter: Zhi Tang
>Assignee: Zhi Tang
>Priority: Major
> Fix For: Impala 4.3.0
>
> Attachments: image-2023-05-25-14-26-43-507.png
>
>
> The query failed while compiling the execution plan. The stacktrace in logs:
> !image-2023-05-25-14-26-43-507.png|width=564,height=211!




[jira] [Updated] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-08-13 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13203:
---
Attachment: tables-reduced.sql

>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: query-reduced.sql, tables-reduced.sql
>
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = CAST(0 AS INT)
> {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.
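
The behaviour described above can be illustrated with a toy model. The sketch below is Python, not Impala's Java frontend. The class names and the `analyzed` flag are simplified stand-ins, and the rule is only assumed to skip unanalyzed nodes in the way the description reports.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class BoolLiteral:
    value: bool
    analyzed: bool = True


@dataclass
class Comparison:
    text: str  # e.g. "id = 0"
    analyzed: bool = True


@dataclass
class Or:
    left: "Expr"
    right: "Expr"
    analyzed: bool = False  # a freshly constructed node starts unanalyzed


Expr = Union[BoolLiteral, Comparison, Or]


def simplify(expr: Expr) -> Expr:
    """Rewrite 'x OR false' and 'false OR x' to 'x', but only for analyzed nodes."""
    if not isinstance(expr, Or) or not expr.analyzed:
        return expr  # unanalyzed predicates are returned untouched
    left, right = simplify(expr.left), simplify(expr.right)
    if isinstance(left, BoolLiteral) and not left.value:
        return right
    if isinstance(right, BoolLiteral) and not right.value:
        return left
    return Or(left, right, analyzed=True)


# A predicate produced by an earlier rule but never analyzed is left as-is.
pred = Or(BoolLiteral(False), Comparison("id = 0"))
unchanged = simplify(pred)

# Once the node is marked analyzed, the same rule drops the FALSE branch.
pred.analyzed = True
rewritten = simplify(pred)
```

In this model, `unchanged` is still the full `FALSE OR id = 0` predicate, while `rewritten` is just the `id = 0` comparison. That mirrors the difference between the observed plan and the one the class comments promise.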




[jira] [Updated] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-08-13 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13203:
---
Attachment: query-reduced.sql

>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: query-reduced.sql, tables-reduced.sql
>
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = CAST(0 AS INT)
> {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.




[jira] [Updated] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-08-13 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13203:
---
Attachment: (was: query-reduced.sql)

>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = CAST(0 AS INT)
> {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.




[jira] [Commented] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-08-13 Thread Michael Smith (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873302#comment-17873302
 ] 

Michael Smith commented on IMPALA-13203:


Here's a very simplified version of the case I ran into. I'm not sure yet why 
this particular patch fixed it. I suspect an expression rewrite moves some 
other generated construct under an 'id = 0 AND false' clause and analyzing that 
subtree happens to fix it. Prepare with [^tables-reduced.sql], then query 
[^query-reduced.sql].

>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: query-reduced.sql, tables-reduced.sql
>
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = CAST(0 AS INT)
> {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.




[jira] [Updated] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-08-13 Thread Michael Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13203:
---
Attachment: (was: tables-reduced.sql)

>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = CAST(0 AS INT)
> {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.



