[jira] [Created] (IMPALA-12398) Ranger role does not exist when altering db/table/view owner to a role

2023-08-22 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12398:
---

 Summary: Ranger role does not exist when altering db/table/view owner 
to a role
 Key: IMPALA-12398
 URL: https://issues.apache.org/jira/browse/IMPALA-12398
 Project: IMPALA
  Issue Type: Bug
  Components: Security
Reporter: Quanlong Huang


To reproduce the issue, start an Impala cluster with Ranger authorization enabled:
{code:bash}
bin/start-impala-cluster.py --impalad_args="--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger" --catalogd_args="--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger"
{code}
Create a role "hql_test" and a temp table "tmp_tbl", then set its owner to the 
role:
{code:sql}
$ impala-shell.sh -u admin
default> create table tmp_tbl(id int);
default> create role hql_test;
default> alter table tmp_tbl set owner role hql_test;
Query: alter table tmp_tbl set owner role hql_test
ERROR: AnalysisException: Role 'hql_test' does not exist.
{code}
However, SHOW ROLES can show the role:
{code:sql}
default> show roles;
Query: show roles
+---+
| role_name |
+---+
| hql_test  |
+---+
Fetched 1 row(s) in 0.01s
{code}
Ranger roles are not loaded into Impala's catalog cache. We should either load 
them or use the RangerPlugin to check whether a role exists. Code snippet of 
the role check:
{code:java}
if (analyzer.isAuthzEnabled() && owner_.getOwnerType() == TOwnerType.ROLE
    && analyzer.getCatalog().getAuthPolicy().getRole(ownerName) == null) {
  throw new AnalysisException(String.format("Role '%s' does not exist.", ownerName));
}
{code}
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/analysis/AlterTableOrViewSetOwnerStmt.java#L56
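A minimal sketch of the plugin-based alternative, reusing the getRoles() accessor that RangerImpaladAuthorizationManager already uses (see IMPALA-12397 below); the helper class/method and the plugin handle are assumptions for illustration:
{code:java}
import java.util.Set;
import org.apache.ranger.plugin.model.RangerRole;
import org.apache.ranger.plugin.service.RangerBasePlugin;
import org.apache.ranger.plugin.util.RangerRoles;

public final class RangerRoleCheck {
  // Sketch only: ask Ranger directly whether a role exists instead of relying
  // on the catalog cache. The helper name and plugin handle are illustrative.
  public static boolean roleExistsInRanger(RangerBasePlugin plugin, String roleName) {
    RangerRoles roles = plugin.getRoles();
    // Ranger returns null when no roles have been downloaded yet.
    if (roles == null || roles.getRangerRoles() == null) return false;
    return roles.getRangerRoles().stream()
        .map(RangerRole::getName)
        .anyMatch(roleName::equals);
  }
}
{code}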

CC [~fangyurao]






[jira] [Created] (IMPALA-12397) NullPointerException in SHOW ROLES when there are no roles

2023-08-22 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12397:
---

 Summary: NullPointerException in SHOW ROLES when there are no roles
 Key: IMPALA-12397
 URL: https://issues.apache.org/jira/browse/IMPALA-12397
 Project: IMPALA
  Issue Type: Bug
  Components: Security
Reporter: Quanlong Huang


When there are no roles in Ranger, the SHOW ROLES statement hits a 
NullPointerException:
{noformat}
Query: show roles
ERROR: InternalException: Error executing SHOW ROLES. Ranger error message: null
{noformat}
The cause is that 'roles' here is null:
{code:java}
Set<RangerRole> roles = plugin_.get().getRoles().getRangerRoles();
roleNames = roles.stream().map(RangerRole::getName).collect(Collectors.toSet());
{code}
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L135-L136
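A minimal null-safe sketch of those two lines (the surrounding method is assumed; the fallback is simply an empty set when Ranger has no roles yet):
{code:java}
// Sketch only: getRangerRoles() returns null when Ranger has no roles, so
// guard it instead of dereferencing. plugin_.get().getRoles() may deserve the
// same guard. Requires java.util.Collections.
Set<RangerRole> roles = plugin_.get().getRoles().getRangerRoles();
roleNames = (roles == null)
    ? Collections.emptySet()
    : roles.stream().map(RangerRole::getName).collect(Collectors.toSet());
{code}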

To reproduce this, start an Impala cluster with Ranger authorization enabled:
{code:bash}
bin/start-impala-cluster.py --impalad_args="--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger" --catalogd_args="--server-name=server1 
--ranger_service_type=hive --ranger_app_id=impala 
--authorization_provider=ranger"
{code}
At the beginning, there are no roles in Ranger. Run "SHOW ROLES" in Impala to 
reproduce the error.






[jira] [Created] (IMPALA-12396) Inconsistent error messages between creating HDFS and Kudu/Iceberg tables when table exists in HMS

2023-08-22 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12396:
---

 Summary: Inconsistent error messages between creating HDFS and 
Kudu/Iceberg tables when table exists in HMS
 Key: IMPALA-12396
 URL: https://issues.apache.org/jira/browse/IMPALA-12396
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang


When creating a Kudu/Iceberg table, we check whether it already exists in HMS 
before invoking the createTable HMS RPC:
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3483
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3714

However, when creating an HDFS table, we just invoke the createTable RPC (when 
the table is not in the catalog cache):
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3563

This results in different error messages when the table exists in HMS but not 
in the catalog cache. E.g. if I create a table in Hive and then recreate it in 
an Impala cluster that has HMS event processing disabled, the error message is
{noformat}
Query: create table hive_tbl(id int)
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
Metastore: 
CAUSED BY: AlreadyExistsException: Table hive.default.hive_tbl already exists
{noformat}
Creating the same table in Kudu format gives a different error message:
{noformat}
Query: create table hive_tbl (id int, name string, primary key(id)) partition 
by hash(id) partitions 3 stored as kudu
+---+
| summary   |
+---+
| Table already exists. |
+---+
Fetched 1 row(s) in 1.63s
{noformat}
We can add the same check when creating HDFS tables and provide the same error 
message.
BTW, the message might need to mention the Metastore: "Table already exists in 
Metastore".
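A rough sketch of the pre-check on the HDFS path (the client-acquisition idiom mirrors the existing Kudu/Iceberg paths; the 'msClient'/'catalog_'/'params' names are assumptions):
{code:java}
// Sketch only: consult HMS before issuing the createTable RPC, as the
// Kudu/Iceberg paths already do. The surrounding wiring is illustrative;
// IMetaStoreClient.tableExists() is the standard HMS client call.
try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
  if (msClient.getHiveClient().tableExists(dbName, tblName)) {
    if (params.if_not_exists) return;  // CREATE TABLE IF NOT EXISTS is a no-op
    throw new ImpalaRuntimeException(
        String.format("Table %s.%s already exists in Metastore", dbName, tblName));
  }
}
{code}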






[jira] [Commented] (IMPALA-10173) Allow implicit casts between numeric and string types when inserting into table

2023-08-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757745#comment-17757745
 ] 

ASF subversion and git services commented on IMPALA-10173:
--

Commit 08501cef2df16991bbd99656c696b978f08aeebe in impala's branch 
refs/heads/master from Peter Rozsa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=08501cef2 ]

IMPALA-12384: Restore NullLiteral's uncheckedCastTo function signature

This change restores NullLiteral's uncheckedCastTo function's signature
to preserve the external compatibility of the method and make it conform
with changes regarding IMPALA-10173.

Change-Id: Id9c01129d3cdcaeb222ea910521704ce2305fd2e
Reviewed-on: http://gerrit.cloudera.org:8080/20376
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
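A hedged sketch of the compatibility pattern the commit message describes (the method and type names here are assumptions based on the message, not the actual diff): keep the pre-IMPALA-10173 single-argument signature and delegate to the newer overload so external callers keep compiling.
{code:java}
// Sketch only: restore the old public signature and forward to the overload
// added by IMPALA-10173. Names (TypeCompatibility, DEFAULT) are assumptions
// for illustration.
@Override
public Expr uncheckedCastTo(Type targetType) throws AnalysisException {
  return uncheckedCastTo(targetType, TypeCompatibility.DEFAULT);
}
{code}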


> Allow implicit casts between numeric and string types when inserting into 
> table
> ---
>
> Key: IMPALA-10173
> URL: https://issues.apache.org/jira/browse/IMPALA-10173
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Peter Rozsa
>Priority: Minor
>  Labels: 2023Q1, ramp-up, sql-language, supportability
>
> Impala is somewhat stricter than other engines such as Hive when it comes 
> to implicit casts. This avoids a lot of ambiguity and edge cases with 
> complex SQL, but we could consider loosening it for simple cases like 
> inserting into a table where the meaning/intent is pretty straightforward.
> Repro
> {code}
> CREATE TABLE iobt (   c0 FLOAT ) ;
> INSERT INTO iobt(c0) VALUES ('0'), (1562998803);
> {code}
> Error
> {code}
> AnalysisException: Incompatible return types 'STRING' and 'INT' of exprs 
> ''0'' and '1562998803'.
> {code}






[jira] [Commented] (IMPALA-12384) Restore NullLiteral's uncheckedCastTo function signature

2023-08-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757744#comment-17757744
 ] 

ASF subversion and git services commented on IMPALA-12384:
--

Commit 08501cef2df16991bbd99656c696b978f08aeebe in impala's branch 
refs/heads/master from Peter Rozsa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=08501cef2 ]

IMPALA-12384: Restore NullLiteral's uncheckedCastTo function signature

This change restores NullLiteral's uncheckedCastTo function's signature
to preserve the external compatibility of the method and make it conform
with changes regarding IMPALA-10173.

Change-Id: Id9c01129d3cdcaeb222ea910521704ce2305fd2e
Reviewed-on: http://gerrit.cloudera.org:8080/20376
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Restore NullLiteral's uncheckedCastTo function signature
> 
>
> Key: IMPALA-12384
> URL: https://issues.apache.org/jira/browse/IMPALA-12384
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.3.0
>Reporter: Peter Rozsa
>Assignee: Peter Rozsa
>Priority: Minor
>
> NullLiteral's uncheckedCastTo function should preserve its signature as it 
> was before IMPALA-10173 to maintain its external compatibility.






[jira] [Assigned] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization

2023-08-22 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto reassigned IMPALA-12395:
-

Assignee: Riza Suminto

> Planner overestimates scan cardinality for queries using count star 
> optimization
> 
>
> Key: IMPALA-12395
> URL: https://issues.apache.org/jira/browse/IMPALA-12395
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Reporter: David Rorke
>Assignee: Riza Suminto
>Priority: Major
>
> The scan cardinality estimate for count(*) queries doesn't account for the 
> fact that the count(*) optimization only scans metadata and not the actual 
> columns.
> Scan for a count(*) query on Parquet store_sales:
>  
> {noformat}
> Operator    #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
> ---------------------------------------------------------------------------------------------------
> 00:SCAN S3       6     72   8s131ms   8s496ms  2.71K       8.64B  128.00 KB       88.00 MB  tpcds_3000_string_parquet_managed.store_sales
> {noformat}
>  
> This is a problem with all file/table formats that implement count(*) 
> optimizations (Parquet and also probably ORC and Iceberg).
> This problem is more serious than it was in the past because with 
> IMPALA-12091 we now rely on scan cardinality estimates for executor group 
> assignments so count(*) queries are likely to get assigned to a larger 
> executor group than needed.






[jira] [Commented] (IMPALA-12386) NullExpr substitution failure with unsafe casts enabled

2023-08-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757738#comment-17757738
 ] 

ASF subversion and git services commented on IMPALA-12386:
--

Commit c5ecd8e666e6dbdeba4fd9d25acb222eceaa240a in impala's branch 
refs/heads/master from Peter Rozsa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c5ecd8e66 ]

IMPALA-12386: Fix clone constructor in CastExpr

This commit addresses an issue in the CastExpr class where the clone
constructor was not properly preserving compatibility settings. The
clone constructor assigned the default compatibility regardless of the
source expression, causing substitution errors for partitioned tables.

Example:
  'insert into unsafe_insert_partitioned(int_col, string_col)
   values("1", null), (null, "1")'
Throws:
  ERROR: IllegalStateException: Failed analysis after expr substitution.
  CAUSED BY: IllegalStateException: cast STRING to INT

Tests:
  - new test case added to insert-unsafe.test

Change-Id: Iff64ce02539651fcb3a90db678f74467f582648f
Reviewed-on: http://gerrit.cloudera.org:8080/20385
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> NullExpr substitution failure with unsafe casts enabled
> ---
>
> Key: IMPALA-12386
> URL: https://issues.apache.org/jira/browse/IMPALA-12386
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.3.0
>Reporter: Peter Rozsa
>Assignee: Peter Rozsa
>Priority: Major
>
> The query insert into t01(a, b) values(null, "23"), ("21", null) fails with 
> the following error:
> ERROR: IllegalStateException: Failed analysis after expr substitution. 
> CAUSED BY: IllegalStateException: cast STRING to INT
>  






[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option

2023-08-22 Thread Yida Wu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757727#comment-17757727
 ] 

Yida Wu commented on IMPALA-5081:
-

Yeah, agree with that.

> Expose IR optimization level via query option
> -
>
> Key: IMPALA-5081
> URL: https://issues.apache.org/jira/browse/IMPALA-5081
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
>
> Certain queries may spend a lot of time in IR optimization. Currently, 
> there is a start-up option to disable optimization in LLVM. However, it may 
> be inconvenient for users to have to restart the entire Impala cluster just 
> to use that option. This JIRA aims at exploring a query option that lets 
> users choose the optimization level for a given query (e.g. we could have a 
> level that only runs a dead code elimination pass, or no optimization at 
> all).






[jira] [Commented] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization

2023-08-22 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757695#comment-17757695
 ] 

Riza Suminto commented on IMPALA-12395:
---

Previously reported at https://issues.apache.org/jira/browse/IMPALA-5851 

> Planner overestimates scan cardinality for queries using count star 
> optimization
> 
>
> Key: IMPALA-12395
> URL: https://issues.apache.org/jira/browse/IMPALA-12395
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Reporter: David Rorke
>Priority: Major
>
> The scan cardinality estimate for count(*) queries doesn't account for the 
> fact that the count(*) optimization only scans metadata and not the actual 
> columns.
> Scan for a count(*) query on Parquet store_sales:
>  
> {noformat}
> Operator    #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
> ---------------------------------------------------------------------------------------------------
> 00:SCAN S3       6     72   8s131ms   8s496ms  2.71K       8.64B  128.00 KB       88.00 MB  tpcds_3000_string_parquet_managed.store_sales
> {noformat}
>  
> This is a problem with all file/table formats that implement count(*) 
> optimizations (Parquet and also probably ORC and Iceberg).
> This problem is more serious than it was in the past because with 
> IMPALA-12091 we now rely on scan cardinality estimates for executor group 
> assignments so count(*) queries are likely to get assigned to a larger 
> executor group than needed.






[jira] [Updated] (IMPALA-12377) Improve count star performance for external data source

2023-08-22 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-12377:
-
Description: The code that handles count(*) queries in the backend function 
DataSourceScanNode::GetNext() is not efficient. Even when no column data is 
returned from the external data source, it still tries to materialize rows and 
add them to the RowBatch one by one, up to the row count. It also calls 
GetNextInputBatch() multiple times (count / batch_size), and 
GetNextInputBatch() invokes a JNI function.  (was: The code to handle 'select 
count(*)' in backend function DataSourceScanNode::GetNext() are not efficient. 
Even there are no column data returned from external data source, it still try 
to materialize rows and add rows to RowBatch one by one up to the number of row 
count.  It also call GetNextInputBatch() multiple times (count / batch_size), 
while  GetNextInputBatch() invoke JNI function.  )

> Improve count star performance for external data source
> ---
>
> Key: IMPALA-12377
> URL: https://issues.apache.org/jira/browse/IMPALA-12377
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> The code that handles count(*) queries in the backend function 
> DataSourceScanNode::GetNext() is not efficient. Even when no column data is 
> returned from the external data source, it still tries to materialize rows 
> and add them to the RowBatch one by one, up to the row count. It also calls 
> GetNextInputBatch() multiple times (count / batch_size), and 
> GetNextInputBatch() invokes a JNI function.






[jira] [Updated] (IMPALA-12377) Improve count star performance for external data source

2023-08-22 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-12377:
-
Summary: Improve count star performance for external data source  (was: 
Improve 'select count(*)' performance for external data source)

> Improve count star performance for external data source
> ---
>
> Key: IMPALA-12377
> URL: https://issues.apache.org/jira/browse/IMPALA-12377
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> The code that handles 'select count(*)' in the backend function 
> DataSourceScanNode::GetNext() is not efficient. Even when no column data is 
> returned from the external data source, it still tries to materialize rows 
> and add them to the RowBatch one by one, up to the row count. It also calls 
> GetNextInputBatch() multiple times (count / batch_size), and 
> GetNextInputBatch() invokes a JNI function.






[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option

2023-08-22 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757689#comment-17757689
 ] 

Michael Smith commented on IMPALA-5081:
---

I'm tempted to do nothing for the moment, as it'll be an advanced option. If it 
looks promising after we've experimented for a bit, we can improve usability.

> Expose IR optimization level via query option
> -
>
> Key: IMPALA-5081
> URL: https://issues.apache.org/jira/browse/IMPALA-5081
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
>
> Certain queries may spend a lot of time in IR optimization. Currently, 
> there is a start-up option to disable optimization in LLVM. However, it may 
> be inconvenient for users to have to restart the entire Impala cluster just 
> to use that option. This JIRA aims at exploring a query option that lets 
> users choose the optimization level for a given query (e.g. we could have a 
> level that only runs a dead code elimination pass, or no optimization at 
> all).






[jira] [Created] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization

2023-08-22 Thread David Rorke (Jira)
David Rorke created IMPALA-12395:


 Summary: Planner overestimates scan cardinality for queries using 
count star optimization
 Key: IMPALA-12395
 URL: https://issues.apache.org/jira/browse/IMPALA-12395
 Project: IMPALA
  Issue Type: Bug
  Components: fe
Reporter: David Rorke


The scan cardinality estimate for count(*) queries doesn't account for the fact 
that the count(*) optimization only scans metadata and not the actual columns.

Scan for a count(*) query on Parquet store_sales:
{noformat}
Operator    #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------
00:SCAN S3       6     72   8s131ms   8s496ms  2.71K       8.64B  128.00 KB       88.00 MB  tpcds_3000_string_parquet_managed.store_sales
{noformat}

This is a problem with all file/table formats that implement count(*) 
optimizations (Parquet and also probably ORC and Iceberg).

This problem is more serious than it was in the past because with IMPALA-12091 
we now rely on scan cardinality estimates for executor group assignments so 
count(*) queries are likely to get assigned to a larger executor group than 
needed.






[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option

2023-08-22 Thread Yida Wu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757684#comment-17757684
 ] 

Yida Wu commented on IMPALA-5081:
-

[~MikaelSmith] Several solutions come to mind:
1. Include the optimization level in the cache key, so that multiple cache 
entries with different optimization levels of the same fragment can coexist. 
(This may hurt the hit rate by evicting other entries earlier and leaving some 
useless entries in the cache.)
2. Store the optimization level in the cache entry content, and replace the 
entry on a hit if the current optimization level is different and better (not 
sure how to define "better"; maybe just replace with the latest). (Would it be 
costly to update the entry?)
3. Do nothing in the code. The user tries different optimization levels with 
codegen caching disabled for the given query, finds the best one, and then 
restarts the server to refill the codegen cache with that optimization level.

> Expose IR optimization level via query option
> -
>
> Key: IMPALA-5081
> URL: https://issues.apache.org/jira/browse/IMPALA-5081
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
>
> Certain queries may spend a lot of time in IR optimization. Currently, 
> there is a start-up option to disable optimization in LLVM. However, it may 
> be inconvenient for users to have to restart the entire Impala cluster just 
> to use that option. This JIRA aims at exploring a query option that lets 
> users choose the optimization level for a given query (e.g. we could have a 
> level that only runs a dead code elimination pass, or no optimization at 
> all).






[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757678#comment-17757678
 ] 

Joe McDonnell commented on IMPALA-12393:


For normal execution, I think this statement that clears the Tuple means that 
the padding will be consistent. From be/src/runtime/tuple.h:
{noformat}
  void Init(int size) { memset(this, 0, size); }{noformat}
That would mean this is test-only and not a real issue.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757666#comment-17757666
 ] 

Joe McDonnell commented on IMPALA-12393:


I can't get this to impact inserting into a Parquet table. I'll downgrade this. 
I think there is also a question of performance of TimestampValue::Hash() vs 
doing a hash of the first 12 bytes.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Updated] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-12393:
---
Priority: Major  (was: Blocker)

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option

2023-08-22 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757658#comment-17757658
 ] 

Michael Smith commented on IMPALA-5081:
---

[~baggio000] any thoughts on how this should interact with the codegen cache? We 
could make it part of the key, so that a query run at a different optimization 
level won't get a cache hit (and run with a potentially slower-optimized version).

> Expose IR optimization level via query option
> -
>
> Key: IMPALA-5081
> URL: https://issues.apache.org/jira/browse/IMPALA-5081
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
>
> Certain queries may spend a lot of time in IR optimization. Currently, 
> there is a start-up option to disable optimization in LLVM. However, it may 
> be inconvenient for users to have to restart the entire Impala cluster just 
> to use that option. This JIRA aims at exploring a query option that lets 
> users choose the optimization level for a given query (e.g. we could have a 
> level that only runs a dead code elimination pass, or no optimization at 
> all).






[jira] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-12393 ]


Joe McDonnell deleted comment on IMPALA-12393:


was (Author: joemcdonnell):
I can't get the Parquet writing to produce the issue, so maybe the padding is 
always zero somehow.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757639#comment-17757639
 ] 

Joe McDonnell commented on IMPALA-12393:


I can't get the Parquet writing to produce the issue, so maybe the padding is 
always zero somehow.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757624#comment-17757624
 ] 

Joe McDonnell commented on IMPALA-12393:


I see this as a bug. I'm not sure if it is actually user-visible.

The bug would come up when inserting timestamps into a Parquet table. If the 
hash is inconsistent, then the dictionary encoding sometimes treats two 
equivalent values as different. This prevents the dictionary encoding from 
working properly, so you can end up with a dictionary with multiple copies of 
the same value using different integer representations. This would result in 
larger Parquet files. I haven't confirmed this Parquet encoding case, but it 
seems possible. It would happen unless we always clear the memory so the 
padding is zero. I don't think we guarantee that.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Created] (IMPALA-12394) Restore testing single-node planning for most tests

2023-08-22 Thread Michael Smith (Jira)
Michael Smith created IMPALA-12394:
--

 Summary: Restore testing single-node planning for most tests
 Key: IMPALA-12394
 URL: https://issues.apache.org/jira/browse/IMPALA-12394
 Project: IMPALA
  Issue Type: Task
Reporter: Michael Smith


ALL_CLUSTER_SIZES was redefined to be ALL_NODES_ONLY due to IMPALA-561. However, 
it wasn't restored after IMPALA-561 was fixed, and we've since added tons of 
tests that might only work with ALL_NODES_ONLY.

Re-evaluate new tests with ALL_CLUSTER_SIZES and restore testing with single 
and distributed planning by updating ImpalaTestSuite#add_test_dimensions and 
test_dimensions.create_exec_option_dimension to use ALL_CLUSTER_SIZES.






[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757605#comment-17757605
 ] 

Michael Smith commented on IMPALA-12393:


Is this a bug, or an improvement? Since you didn't change the iteration 
earlier, this seems like a (small) perf improvement.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Assigned] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned IMPALA-12393:
--

Assignee: Joe McDonnell

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries and the dictionary would have 
> multiple copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is being copied to a temporary value. 
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  






[jira] [Updated] (IMPALA-12375) DataSource objects are not persistent

2023-08-22 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated IMPALA-12375:
---
Summary: DataSource objects are not persistent  (was: DataSource ojects are 
not persistent)

> DataSource objects are not persistent
> -
>
> Key: IMPALA-12375
> URL: https://issues.apache.org/jira/browse/IMPALA-12375
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Catalog, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> DataSource objects which are created with "CREATE DATA SOURCE" statements are 
> not persistent.  The objects are not shown in "show data sources" after the 
> catalog server is restarted.  






[jira] [Work started] (IMPALA-5081) Expose IR optimization level via query option

2023-08-22 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-5081 started by Michael Smith.
-
> Expose IR optimization level via query option
> -
>
> Key: IMPALA-5081
> URL: https://issues.apache.org/jira/browse/IMPALA-5081
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Michael Smith
>Priority: Minor
>  Labels: codegen
>
> Certain queries may spend a lot of time in IR optimization. Currently, 
> there is a start-up option to disable optimization in LLVM. However, it may 
> be inconvenient for users to have to restart the entire Impala cluster just 
> to use that option. This JIRA aims at exploring a query option that lets 
> users choose the optimization level for a given query (e.g. we could have a 
> level that only runs a dead code elimination pass, or no optimization at 
> all).






[jira] [Commented] (IMPALA-10474) Untracked memory is huge like memory leak

2023-08-22 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757569#comment-17757569
 ] 

Michael Smith commented on IMPALA-10474:


This is primarily in the Thrift server. It would be worth checking whether newer 
Thrift releases have addressed any memory leaks.

> Untracked memory is huge like memory leak
> -
>
> Key: IMPALA-10474
> URL: https://issues.apache.org/jira/browse/IMPALA-10474
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
>Reporter: Xianqing He
>Priority: Major
> Attachments: image-2021-02-04-18-15-34-016.png, 
> image-2021-02-04-18-18-47-183.png
>
>
> In a production environment, the untracked memory is huge right after Impala 
> starts, even though no queries are running.
> !image-2021-02-04-18-15-34-016.png|width=476,height=237!
> In impalad.ERROR:
> !image-2021-02-04-18-18-47-183.png|width=1083,height=200!






[jira] [Created] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12393:
--

 Summary: DictEncoder uses inconsistent hash function for 
TimestampValue
 Key: IMPALA-12393
 URL: https://issues.apache.org/jira/browse/IMPALA-12393
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Joe McDonnell


DictEncoder currently uses this hash function for TimestampValue:
{noformat}
template<typename T>
inline uint32_t DictEncoder<T>::Hash(const T& value) const {
  return HashUtil::Hash(&value, sizeof(value), 0);
}{noformat}
TimestampValue has some padding, and nothing ensures that the padding is 
cleared. This means that identical TimestampValue objects can hash to different 
values.

This came up when fixing a Clang-Tidy performance check. This line in 
dict-test.cc changed from iterating over values to iterating over const 
references.
{noformat}
  DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
&track_encoder);
  encoder.UsedbyTest();
<<<<<<<
  for (InternalType i: values) encoder.Put(i);
=======
  for (const InternalType& i: values) encoder.Put(i);
>>>>>>>
  bytes_alloc = encoder.DictByteSize();
  EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
  EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
The test became flaky, with the encoder.num_entries() being larger than the 
values_set.size() for TimestampValue. This happened because the hash values 
didn't match even for identical entries and the dictionary would have multiple 
copies of the same value. When iterating over a plain non-reference 
TimestampValue, each TimestampValue is being copied to a temporary value. Maybe 
in this circumstance the padding stays the same between iterations.

It's possible this would come up when writing Parquet data files.

One fix would be to use TimestampValue's Hash function, which ignores the 
padding:
{noformat}
template<>
inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) 
const {
  return value.Hash();
}{noformat}
 






[jira] [Updated] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue

2023-08-22 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-12393:
---
Priority: Blocker  (was: Critical)

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is 
> cleared. This means that identical TimestampValue objects can hash to 
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in 
> dict-test.cc changed from iterating over values to iterating over const 
> references.
> {noformat}
>   DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, 
> &track_encoder);
>   encoder.UsedbyTest();
> <<<<<<<
>   for (InternalType i: values) encoder.Put(i);
> =======
>   for (const InternalType& i: values) encoder.Put(i);
> >>>>>>>
>   bytes_alloc = encoder.DictByteSize();
>   EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
>   EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky: encoder.num_entries() was sometimes larger than 
> values_set.size() for TimestampValue. This happened because the hash values 
> didn't match even for identical entries, so the dictionary stored multiple 
> copies of the same value. When iterating over a plain non-reference 
> TimestampValue, each TimestampValue is copied to a temporary; presumably, in 
> that circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the 
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12392) Fix describe statements once HIVE-24509 arrives as dependency

2023-08-22 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12392:


 Summary: Fix describe statements once HIVE-24509 arrives as 
dependency
 Key: IMPALA-12392
 URL: https://issues.apache.org/jira/browse/IMPALA-12392
 Project: IMPALA
  Issue Type: Task
  Components: Catalog
Reporter: Csaba Ringhofer


HIVE-24509 will break test_describe_materialized_view, as ShowUtils is not 
included in our shaded jar.

It would also be nice to switch to ShowUtils.TextMetaDataTable here:
https://github.com/apache/impala/blob/a34f7ce63299c72ef45a99b01bb4e80210befbff/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java#L88
AFAIK the old function is kept in Hive only because of Impala.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-12392) Fix describe statements once HIVE-24509 arrives as dependency

2023-08-22 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer reassigned IMPALA-12392:


Assignee: Csaba Ringhofer

> Fix describe statements once HIVE-24509 arrives as dependency
> -
>
> Key: IMPALA-12392
> URL: https://issues.apache.org/jira/browse/IMPALA-12392
> Project: IMPALA
>  Issue Type: Task
>  Components: Catalog
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>
> HIVE-24509 will break test_describe_materialized_view, as ShowUtils is not 
> included in our shaded jar.
> It would also be nice to switch to ShowUtils.TextMetaDataTable here:
> https://github.com/apache/impala/blob/a34f7ce63299c72ef45a99b01bb4e80210befbff/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java#L88
> AFAIK the old function is kept in Hive only because of Impala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2023-08-22 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-12266:
--
Labels: impala-iceberg  (was: )

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail::thread_data<>::run()
> @  0x23a03a7  thread_proxy
> @ 0x7faaad2a66ba  start_thread
> @ 0x7f2c151d  clone
> E0704 19:09:23.006968   833 catalog-server.cc:278] 
> 7145c21173f2c47b:2579db55] NullPointerException: null
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2023-08-22 Thread Gabor Kaszab (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757380#comment-17757380
 ] 

Gabor Kaszab commented on IMPALA-12266:
---

I managed to repro this by running the following SQL in a loop:
{code:sql}
create table tmp_conv_tbl (i int) stored as parquet;
insert into tmp_conv_tbl values (1), (2), (3);
alter table tmp_conv_tbl convert to iceberg;
alter table tmp_conv_tbl set tblproperties ('format-version'='2');
drop table tmp_conv_tbl; {code}
For me the DROP TABLE statement failed with a "Table does not exist" error. I 
guess it depends on which command is run on a different coordinator after the 
table conversion; a sketch of the suspected interleaving follows the repro 
steps below.

Note that this repro happened in local catalog mode; however, I wouldn't be 
surprised if it also reproduced in normal catalog mode.

This is how I enabled local catalog mode:
{code:bash}
bin/start-impala-cluster.py --impalad_args='--use_local_catalog=true' 
--catalogd_args='--catalog_topic_mode=minimal' {code}
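To make the race concrete, here is a single-threaded toy model (hypothetical 
names, not Impala's catalog code) of the check-then-act gap this suggests: one 
coordinator reads a table's version, another coordinator drops the table, and 
the first coordinator's replace-if-unchanged then finds no entry. A naive 
implementation that dereferences the looked-up entry without a null check 
would fail just like the NullPointerException in the stack trace above.
{code:cpp}
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Toy catalog mapping table name to a version-stamped entry.
struct Table { long version; };
using Catalog = std::map<std::string, std::shared_ptr<Table>>;

// Replace-if-unchanged: succeeds only if the table still exists and its
// version matches what the caller last saw. The null/existence check is the
// guard whose absence would turn this interleaving into a crash.
bool ReplaceIfUnchanged(Catalog& catalog, const std::string& name,
                        long expected_version, std::shared_ptr<Table> repl) {
  auto it = catalog.find(name);
  if (it == catalog.end() || it->second == nullptr) return false;
  if (it->second->version != expected_version) return false;
  it->second = std::move(repl);
  return true;
}

int main() {
  Catalog catalog;
  catalog["tmp_conv_tbl"] = std::make_shared<Table>(Table{1});

  // Coordinator A reads the current version, planning to reload the table.
  long seen = catalog["tmp_conv_tbl"]->version;

  // Meanwhile, coordinator B converts/drops the table.
  catalog.erase("tmp_conv_tbl");

  // Coordinator A's swap is now safely refused instead of crashing.
  std::cout << ReplaceIfUnchanged(catalog, "tmp_conv_tbl", seen,
                                  std::make_shared<Table>(Table{2}))
            << "\n";  // prints 0
}
{code}
In the real system the two steps run on different coordinators against the 
shared catalog, so which statement fails depends on the interleaving, which 
would match the sporadic behaviour.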

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Major
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail::thread_data<>::run()
> @  0x23a03a7  thread_proxy
> @ 0x7faaad2a66ba  start_thread
> @ 0x7f2c151d  clone
> E0704 19:09:23.006968   833 catalog-server.cc:278] 
> 7145c21173f2c47b:2579db55] NullPointerException: null
> {code}

[jira] [Updated] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2023-08-22 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-12266:
--
Summary: Sporadic failure after migrating a table to Iceberg  (was: Flaky 
TestIcebergTable.test_convert_table NPE)

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Major
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail::thread_data<>::run()
> @  0x23a03a7  thread_proxy
> @ 0x7faaad2a66ba  start_thread
> @ 0x7f2c151d  clone
> E0704 19:09:23.006968   833 catalog-server.cc:278] 
> 7145c21173f2c47b:2579db55] NullPointerException: null
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12266) Flaky TestIcebergTable.test_convert_table NPE

2023-08-22 Thread Gabor Kaszab (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757376#comment-17757376
 ] 

Gabor Kaszab commented on IMPALA-12266:
---

This is actually more than just a flaky test, as it comes up in various 
scenarios. I'll rename the ticket to reflect this.

> Flaky TestIcebergTable.test_convert_table NPE
> -
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Major
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail::thread_data<>::run()
> @  0x23a03a7  thread_proxy
> @ 0x7faaad2a66ba  start_thread
> @ 0x7f2c151d  clone
> E0704 19:09:23.006968   833 catalog-server.cc:278] 
> 7145c21173f2c47b:2579db55] NullPointerException: null
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12019) Support ORDER BY for collections of fixed length types in select list

2023-08-22 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-12019:
---
Summary: Support ORDER BY for collections of fixed length types in select 
list  (was: Support ORDER BY for arrays of fixed length types in select list)

> Support ORDER BY for collections of fixed length types in select list
> -
>
> Key: IMPALA-12019
> URL: https://issues.apache.org/jira/browse/IMPALA-12019
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> As a first stage of IMPALA-10939, we should implement support for ORDER BY 
> for arrays that only contain fixed length types. That way the implementation 
> could be almost the same as the existing handling of strings.
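For intuition, a minimal sketch (hypothetical, not Impala's actual sort 
comparator) of why fixed-length element types keep this close to the existing 
string handling: the collection is a contiguous (pointer, length) buffer, and 
comparison walks it in fixed-size element steps the same way string comparison 
walks bytes.
{code:cpp}
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Lexicographic comparison of two collections of a fixed-length element
// type: compare element by element, then break ties on length, exactly as
// strings compare char by char.
template <typename T>
int CompareCollections(const T* a, size_t a_len, const T* b, size_t b_len) {
  size_t n = a_len < b_len ? a_len : b_len;
  for (size_t i = 0; i < n; ++i) {
    if (a[i] < b[i]) return -1;
    if (b[i] < a[i]) return 1;
  }
  // Shared prefix is equal; the shorter collection sorts first.
  if (a_len < b_len) return -1;
  if (b_len < a_len) return 1;
  return 0;
}

int main() {
  std::vector<int32_t> x{1, 2, 3}, y{1, 2, 4}, z{1, 2};
  std::cout << CompareCollections(x.data(), x.size(), y.data(), y.size())
            << " "
            << CompareCollections(x.data(), x.size(), z.data(), z.size())
            << "\n";  // prints: -1 1
}
{code}
Because each element has a known size, the whole collection can also be 
treated as a single buffer of num_elements * sizeof(T) bytes, which is what 
makes reusing the string machinery plausible.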



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2761) Build and Run Impala on OS X

2023-08-22 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757304#comment-17757304
 ] 

Maxwell Guo commented on IMPALA-2761:
-

Any update here? Besides macOS on Intel, I think Apple M1/M2 support is needed too.

> Build and Run Impala on OS X
> 
>
> Key: IMPALA-2761
> URL: https://issues.apache.org/jira/browse/IMPALA-2761
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Impala 2.3.0
>Reporter: Martin Grund
>Priority: Minor
>  Labels: osx
>
> This is an Umbrella Ticket to support building and running Impala on Mac OS X. 
> Comments will be used to keep track of the status.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12386) NullExpr substitution failure with unsafe casts enabled

2023-08-22 Thread Peter Rozsa (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757216#comment-17757216
 ] 

Peter Rozsa commented on IMPALA-12386:
--

Yes, but it only surfaces with the ALLOW_UNSAFE_CASTS query option (added in 
IMPALA-10173); the original functionality is not affected.

> NullExpr substitution failure with unsafe casts enabled
> ---
>
> Key: IMPALA-12386
> URL: https://issues.apache.org/jira/browse/IMPALA-12386
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.3.0
>Reporter: Peter Rozsa
>Assignee: Peter Rozsa
>Priority: Major
>
> The query insert into t01(a, b) values(null, "23"), ("21", null) fails with 
> the following error:
> ERROR: IllegalStateException: Failed analysis after expr substitution. 
> CAUSED BY: IllegalStateException: cast STRING to INT
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org