[jira] [Created] (IMPALA-12398) Ranger role does not exist when altering db/table/view owner to a role
Quanlong Huang created IMPALA-12398: --- Summary: Ranger role does not exist when altering db/table/view owner to a role Key: IMPALA-12398 URL: https://issues.apache.org/jira/browse/IMPALA-12398 Project: IMPALA Issue Type: Bug Components: Security Reporter: Quanlong Huang

To reproduce the issue, start an Impala cluster with Ranger authorization enabled:

{code:bash}
bin/start-impala-cluster.py --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
{code}

Create a role "hql_test" and a temp table "tmp_tbl", then set the table's owner to the role:

{code:sql}
$ impala-shell.sh -u admin
default> create table tmp_tbl(id int);
default> create role hql_test;
default> alter table tmp_tbl set owner role hql_test;
Query: alter table tmp_tbl set owner role hql_test
ERROR: AnalysisException: Role 'hql_test' does not exist.
{code}

However, SHOW ROLES can show the role:

{code:sql}
default> show roles;
Query: show roles
+-----------+
| role_name |
+-----------+
| hql_test  |
+-----------+
Fetched 1 row(s) in 0.01s
{code}

Ranger roles are not loaded in Impala's catalog cache. We should either load them or use the RangerPlugin to check the existence of a role. Code snippet of the role check:

{code:java}
if (analyzer.isAuthzEnabled() && owner_.getOwnerType() == TOwnerType.ROLE
    && analyzer.getCatalog().getAuthPolicy().getRole(ownerName) == null) {
  throw new AnalysisException(String.format("Role '%s' does not exist.", ownerName));
}
{code}

https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/analysis/AlterTableOrViewSetOwnerStmt.java#L56

CC [~fangyurao]

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
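The check above only consults the catalog cache, which never loads Ranger roles. A minimal sketch of the fallback the report suggests (consult Ranger when the cache misses); all class and method names here are illustrative stand-ins, not Impala's actual API:

```java
import java.util.Set;

public class RoleCheck {
    // Hypothetical stand-in for the catalog cache's view of roles.
    interface AuthPolicy { boolean hasRole(String name); }
    // Hypothetical stand-in for a RangerPlugin-backed role listing.
    interface RangerRoleLookup { Set<String> listRoleNames(); }

    // Accept the owner role if either the local cache or Ranger knows it.
    static void checkOwnerRoleExists(String role, AuthPolicy cache, RangerRoleLookup ranger) {
        if (cache.hasRole(role)) return;              // fast path: role already cached
        Set<String> remote = ranger.listRoleNames();  // fall back to a Ranger lookup
        if (remote == null || !remote.contains(role)) {
            throw new IllegalArgumentException(
                String.format("Role '%s' does not exist.", role));
        }
    }
}
```

With this shape, ALTER TABLE ... SET OWNER ROLE succeeds as long as Ranger knows the role, even before the catalog cache learns about it.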
[jira] [Created] (IMPALA-12397) NullPointerException in SHOW ROLES when there are no roles
Quanlong Huang created IMPALA-12397: --- Summary: NullPointerException in SHOW ROLES when there are no roles Key: IMPALA-12397 URL: https://issues.apache.org/jira/browse/IMPALA-12397 Project: IMPALA Issue Type: Bug Components: Security Reporter: Quanlong Huang

When there are no roles in Ranger, the SHOW ROLES statement hits a NullPointerException:

{noformat}
Query: show roles
ERROR: InternalException: Error executing SHOW ROLES. Ranger error message: null
{noformat}

The cause is that 'roles' here is null:

{code:java}
Set<RangerRole> roles = plugin_.get().getRoles().getRangerRoles();
roleNames = roles.stream().map(RangerRole::getName).collect(Collectors.toSet());
{code}

https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java#L135-L136

To reproduce this, start an Impala cluster with Ranger authorization:

{code:bash}
bin/start-impala-cluster.py --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
{code}

At the beginning, there are no roles in Ranger. Run "SHOW ROLES" in Impala to reproduce the error.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
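A sketch of the null guard the stack trace calls for, simplified to plain strings rather than Impala's RangerRole objects (the real fix would sit in RangerImpaladAuthorizationManager):

```java
import java.util.Collections;
import java.util.Set;

public class ShowRolesGuard {
    // getRangerRoles() can return null when Ranger has no roles at all;
    // treat null as the empty set instead of dereferencing it.
    static Set<String> roleNames(Set<String> rangerRoles) {
        return rangerRoles == null ? Collections.emptySet() : rangerRoles;
    }
}
```

With the guard in place, SHOW ROLES on a fresh cluster returns zero rows instead of an InternalException.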
[jira] [Created] (IMPALA-12396) Inconsistent error messages between creating hdfs and kudu/iceberg tables when table exists in HMS
Quanlong Huang created IMPALA-12396: --- Summary: Inconsistent error messages between creating hdfs and kudu/iceberg tables when table exists in HMS Key: IMPALA-12396 URL: https://issues.apache.org/jira/browse/IMPALA-12396 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Quanlong Huang

When creating a Kudu/Iceberg table, we check whether it exists in HMS before invoking the createTable HMS RPC:
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3483
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3714

However, when creating an HDFS table, we just invoke the createTable RPC (when the table is not in the catalog cache):
https://github.com/apache/impala/blob/08501cef2df16991bbd99656c696b978f08aeebe/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3563

This results in different error messages when the table exists in HMS but not in the catalog cache. E.g. if I create a table in Hive and then recreate it in an Impala cluster that has HMS event processing disabled, the error message is:

{noformat}
Query: create table hive_tbl(id int)
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: AlreadyExistsException: Table hive.default.hive_tbl already exists
{noformat}

Creating the same table in Kudu format gives a different error message:

{noformat}
Query: create table hive_tbl (id int, name string, primary key(id)) partition by hash(id) partitions 3 stored as kudu
+------------------------+
| summary                |
+------------------------+
| Table already exists.  |
+------------------------+
Fetched 1 row(s) in 1.63s
{noformat}

We can add the same check when creating HDFS tables and provide the same error message. BTW, we might want to mention the Metastore: "Table already exists in Metastore".
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
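One way to unify the two paths, sketched with a hypothetical metastore interface (the real fix would live in CatalogOpExecutor and go through the HMS client):

```java
public class CreateTablePrecheck {
    // Hypothetical stand-in for the HMS client's existence check.
    interface MetastoreClient { boolean tableExists(String db, String tbl); }

    // Check HMS before issuing the createTable RPC, mirroring what the
    // Kudu/Iceberg paths already do, so HDFS tables report the same error.
    static void precheck(MetastoreClient hms, String db, String tbl, boolean ifNotExists) {
        if (hms.tableExists(db, tbl)) {
            if (ifNotExists) return;  // CREATE TABLE IF NOT EXISTS succeeds silently
            throw new IllegalStateException(
                String.format("Table %s.%s already exists in Metastore", db, tbl));
        }
    }
}
```

The precheck also gives IF NOT EXISTS a clean short-circuit instead of relying on the RPC's AlreadyExistsException.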
[jira] [Commented] (IMPALA-10173) Allow implicit casts between numeric and string types when inserting into table
[ https://issues.apache.org/jira/browse/IMPALA-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757745#comment-17757745 ] ASF subversion and git services commented on IMPALA-10173: --

Commit 08501cef2df16991bbd99656c696b978f08aeebe in impala's branch refs/heads/master from Peter Rozsa [ https://gitbox.apache.org/repos/asf?p=impala.git;h=08501cef2 ]

IMPALA-12384: Restore NullLiteral's uncheckedCastTo function signature

This change restores NullLiteral's uncheckedCastTo function's signature to preserve the external compatibility of the method and make it conform with changes regarding IMPALA-10173.

Change-Id: Id9c01129d3cdcaeb222ea910521704ce2305fd2e
Reviewed-on: http://gerrit.cloudera.org:8080/20376
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Allow implicit casts between numeric and string types when inserting into
> table
> ---
>
> Key: IMPALA-10173
> URL: https://issues.apache.org/jira/browse/IMPALA-10173
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Peter Rozsa
> Priority: Minor
> Labels: 2023Q1, ramp-up, sql-language, supportability
>
> Impala is somewhat stricter than other engines such as Hive when it comes
> to implicit casts. This avoids a lot of ambiguity and edge cases with
> complex SQL, but we could consider loosening it for simple cases like
> inserting into a table where the meaning/intent is pretty straightforward.
> Repro
> {code}
> CREATE TABLE iobt ( c0 FLOAT ) ;
> INSERT INTO iobt(c0) VALUES ('0'), (1562998803);
> {code}
> Error
> {code}
> AnalysisException: Incompatible return types 'STRING' and 'INT' of exprs
> ''0'' and '1562998803'.
> {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12384) Restore NullLiteral's uncheckedCastTo function signature
[ https://issues.apache.org/jira/browse/IMPALA-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757744#comment-17757744 ] ASF subversion and git services commented on IMPALA-12384: -- Commit 08501cef2df16991bbd99656c696b978f08aeebe in impala's branch refs/heads/master from Peter Rozsa [ https://gitbox.apache.org/repos/asf?p=impala.git;h=08501cef2 ] IMPALA-12384: Restore NullLiteral's uncheckedCastTo function signature This change restores NullLiteral's uncheckedCastTo function's signature to preserve the external compatibility of the method and make it conform with changes regarding IMPALA-10173. Change-Id: Id9c01129d3cdcaeb222ea910521704ce2305fd2e Reviewed-on: http://gerrit.cloudera.org:8080/20376 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Restore NullLiteral's uncheckedCastTo function signature > > > Key: IMPALA-12384 > URL: https://issues.apache.org/jira/browse/IMPALA-12384 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.3.0 >Reporter: Peter Rozsa >Assignee: Peter Rozsa >Priority: Minor > > NullLiteral's uncheckedCastTo function should preserve its signature as it > was before IMPALA-10173 to maintain its external compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization
[ https://issues.apache.org/jira/browse/IMPALA-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto reassigned IMPALA-12395: - Assignee: Riza Suminto > Planner overestimates scan cardinality for queries using count star > optimization > > > Key: IMPALA-12395 > URL: https://issues.apache.org/jira/browse/IMPALA-12395 > Project: IMPALA > Issue Type: Bug > Components: fe >Reporter: David Rorke >Assignee: Riza Suminto >Priority: Major > > The scan cardinality estimate for count(*) queries doesn't account for the > fact that the count(*) optimization only scans metadata and not the actual > columns. > Scan for a count(*) query on Parquet store_sales: > > {noformat} > Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak > Mem Detail > - > 00:SCAN S3 6 72 8s131ms 8s496ms 2.71K 8.64B 128.00 KB 88.00 MB > tpcds_3000_string_parquet_managed.store_sales > {noformat} > > This is a problem with all file/table formats that implement count(*) > optimizations (Parquet and also probably ORC and Iceberg). > This problem is more serious than it was in the past because with > IMPALA-12091 we now rely on scan cardinality estimates for executor group > assignments so count(*) queries are likely to get assigned to a larger > executor group than needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12386) NullExpr substitution failure with unsafe casts enabled
[ https://issues.apache.org/jira/browse/IMPALA-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757738#comment-17757738 ] ASF subversion and git services commented on IMPALA-12386: -- Commit c5ecd8e666e6dbdeba4fd9d25acb222eceaa240a in impala's branch refs/heads/master from Peter Rozsa [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c5ecd8e66 ] IMPALA-12386: Fix clone constructor in CastExpr This commit addresses an issue in the CastExpr class where the clone constructor was not properly preserving compatibility settings. The clone constructor assigned the default compatibility regardless of the source expression, causing substitution errors for partitioned tables. Example: 'insert into unsafe_insert_partitioned(int_col, string_col) values("1", null), (null, "1")' Throws: ERROR: IllegalStateException: Failed analysis after expr substitution. CAUSED BY: IllegalStateException: cast STRING to INT Tests: - new test case added to insert-unsafe.test Change-Id: Iff64ce02539651fcb3a90db678f74467f582648f Reviewed-on: http://gerrit.cloudera.org:8080/20385 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > NullExpr substitution failure with unsafe casts enabled > --- > > Key: IMPALA-12386 > URL: https://issues.apache.org/jira/browse/IMPALA-12386 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.3.0 >Reporter: Peter Rozsa >Assignee: Peter Rozsa >Priority: Major > > insert into t01(a, b) values(null, "23"), ("21", null) query fails with the > following error: > ERROR: IllegalStateException: Failed analysis after expr substitution. > CAUSED BY: IllegalStateException: cast STRING to INT > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option
[ https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757727#comment-17757727 ] Yida Wu commented on IMPALA-5081: - Yeah, agree with that. > Expose IR optimization level via query option > - > > Key: IMPALA-5081 > URL: https://issues.apache.org/jira/browse/IMPALA-5081 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Michael Ho >Assignee: Michael Smith >Priority: Minor > Labels: codegen > > Certain queries may spend a lot of time in the IR optimization. Currently, > there is a start-up option to disable optimization in LLVM. However, it may > be of inconvenience to users to have to restart the entire Impala cluster to > just use that option. This JIRA aims at exploring exposing a query option for > users to choose the optimization level for a given query (e.g. we can have a > level which just only have a dead code elimination pass or no optimization at > all). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization
[ https://issues.apache.org/jira/browse/IMPALA-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757695#comment-17757695 ] Riza Suminto commented on IMPALA-12395: --- Previously reported at https://issues.apache.org/jira/browse/IMPALA-5851 > Planner overestimates scan cardinality for queries using count star > optimization > > > Key: IMPALA-12395 > URL: https://issues.apache.org/jira/browse/IMPALA-12395 > Project: IMPALA > Issue Type: Bug > Components: fe >Reporter: David Rorke >Priority: Major > > The scan cardinality estimate for count(*) queries doesn't account for the > fact that the count(*) optimization only scans metadata and not the actual > columns. > Scan for a count(*) query on Parquet store_sales: > > {noformat} > Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak > Mem Detail > - > 00:SCAN S3 6 72 8s131ms 8s496ms 2.71K 8.64B 128.00 KB 88.00 MB > tpcds_3000_string_parquet_managed.store_sales > {noformat} > > This is a problem with all file/table formats that implement count(*) > optimizations (Parquet and also probably ORC and Iceberg). > This problem is more serious than it was in the past because with > IMPALA-12091 we now rely on scan cardinality estimates for executor group > assignments so count(*) queries are likely to get assigned to a larger > executor group than needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12377) Improve count star performance for external data source
[ https://issues.apache.org/jira/browse/IMPALA-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-12377: -

Description: The code that handles count(*) queries in the backend function DataSourceScanNode::GetNext() is not efficient. Even when there is no column data returned from the external data source, it still tries to materialize rows and add them to the RowBatch one by one, up to the row count. It also calls GetNextInputBatch() multiple times (count / batch_size), and each GetNextInputBatch() call invokes a JNI function.

(was: The code to handle 'select count(*)' in backend function DataSourceScanNode::GetNext() are not efficient. Even there are no column data returned from external data source, it still try to materialize rows and add rows to RowBatch one by one up to the number of row count. It also call GetNextInputBatch() multiple times (count / batch_size), while GetNextInputBatch() invoke JNI function. )

> Improve count star performance for external data source
> ---
>
> Key: IMPALA-12377
> URL: https://issues.apache.org/jira/browse/IMPALA-12377
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend, Frontend
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> The code that handles count(*) queries in the backend function
> DataSourceScanNode::GetNext() is not efficient. Even when there is no column
> data returned from the external data source, it still tries to materialize rows
> and add them to the RowBatch one by one, up to the row count. It also calls
> GetNextInputBatch() multiple times (count / batch_size), and each
> GetNextInputBatch() call invokes a JNI function.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12377) Improve count star performance for external data source
[ https://issues.apache.org/jira/browse/IMPALA-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-12377: - Summary: Improve count star performance for external data source (was: Improve 'select count(*)' performance for external data source) > Improve count star performance for external data source > --- > > Key: IMPALA-12377 > URL: https://issues.apache.org/jira/browse/IMPALA-12377 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Frontend >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > > The code to handle 'select count(*)' in backend function > DataSourceScanNode::GetNext() are not efficient. Even there are no column > data returned from external data source, it still try to materialize rows and > add rows to RowBatch one by one up to the number of row count. It also call > GetNextInputBatch() multiple times (count / batch_size), while > GetNextInputBatch() invoke JNI function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
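Back-of-envelope for the cost described above, assuming each GetNextInputBatch() call is one JNI round trip: materializing count rows in batches of batch_size costs ceil(count / batch_size) JNI calls, versus a single call if the count were returned directly.

```java
public class DataSourceCountStar {
    // JNI round trips needed to materialize `count` rows in batches of `batchSize`
    // (the current row-by-row approach), i.e. ceil(count / batchSize).
    static long jniCalls(long count, long batchSize) {
        return (count + batchSize - 1) / batchSize;
    }
}
```

For example, a count of 10 million rows fetched 1024 rows at a time costs 9766 JNI round trips, all to transfer no column data at all.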
[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option
[ https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757689#comment-17757689 ] Michael Smith commented on IMPALA-5081: --- I'm tempted to do nothing for the moment, as it'll be an advanced option. If it looks promising after we've experimented for a bit, we can improve usability. > Expose IR optimization level via query option > - > > Key: IMPALA-5081 > URL: https://issues.apache.org/jira/browse/IMPALA-5081 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Michael Ho >Assignee: Michael Smith >Priority: Minor > Labels: codegen > > Certain queries may spend a lot of time in the IR optimization. Currently, > there is a start-up option to disable optimization in LLVM. However, it may > be of inconvenience to users to have to restart the entire Impala cluster to > just use that option. This JIRA aims at exploring exposing a query option for > users to choose the optimization level for a given query (e.g. we can have a > level which just only have a dead code elimination pass or no optimization at > all). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization
David Rorke created IMPALA-12395: Summary: Planner overestimates scan cardinality for queries using count star optimization Key: IMPALA-12395 URL: https://issues.apache.org/jira/browse/IMPALA-12395 Project: IMPALA Issue Type: Bug Components: fe Reporter: David Rorke

The scan cardinality estimate for count(*) queries doesn't account for the fact that the count(*) optimization only scans metadata and not the actual columns. Scan for a count(*) query on Parquet store_sales:

{noformat}
Operator   #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem  Est. Peak Mem Detail
-
00:SCAN S3 6      72    8s131ms  8s496ms  2.71K 8.64B      128.00 KB 88.00 MB      tpcds_3000_string_parquet_managed.store_sales
{noformat}

This is a problem with all file/table formats that implement count(*) optimizations (Parquet and also probably ORC and Iceberg). This problem is more serious than it was in the past because with IMPALA-12091 we now rely on scan cardinality estimates for executor group assignments, so count(*) queries are likely to get assigned to a larger executor group than needed.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
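The gap in the profile above (2.71K actual rows vs 8.64B estimated) suggests the estimate should reflect what the optimized scan actually emits. An illustrative sketch, not Impala's planner code: when the count(*) rewrite applies, the scan returns roughly one row per row group (each carrying a metadata row count), so the estimate could be based on the row-group count instead of the table cardinality.

```java
public class CountStarEstimate {
    // Hypothetical estimator: with the count(*) optimization the scan emits about
    // one row per Parquet row group (a partial count read from footer metadata),
    // not one row per table row.
    static long scanCardinality(long tableRows, long numRowGroups, boolean countStarOptimized) {
        return countStarOptimized ? numRowGroups : tableRows;
    }
}
```

An estimate in the thousands rather than the billions would let IMPALA-12091's executor-group assignment route such queries to a small group.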
[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option
[ https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757684#comment-17757684 ] Yida Wu commented on IMPALA-5081: - [~MikaelSmith] Several solutions on my mind: 1. we have the optimization level in the key, so that multiple cache entries with different optimization levels of the fragment exist. (it may affect the hit rate by evicting other entries earlier and leaving some useless entries in the cache) 2. we have the optimization level in the cache entry content, replace when hits and if the current optimization level is different and better (not sure how to define better, or maybe replace with the latest). (would it be costly to update the entry?) 3. do nothing in the code. The user tries different optimization levels with codegen caching disabled for the given query, and find out the best solution. Then restart the server to refill the codegen caching with the best optimization level. > Expose IR optimization level via query option > - > > Key: IMPALA-5081 > URL: https://issues.apache.org/jira/browse/IMPALA-5081 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Michael Ho >Assignee: Michael Smith >Priority: Minor > Labels: codegen > > Certain queries may spend a lot of time in the IR optimization. Currently, > there is a start-up option to disable optimization in LLVM. However, it may > be of inconvenience to users to have to restart the entire Impala cluster to > just use that option. This JIRA aims at exploring exposing a query option for > users to choose the optimization level for a given query (e.g. we can have a > level which just only have a dead code elimination pass or no optimization at > all). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757678#comment-17757678 ] Joe McDonnell commented on IMPALA-12393:

For normal execution, I think this statement that clears the Tuple means that the padding will be consistent. From be/src/runtime/tuple.h:

{noformat}
void Init(int size) { memset(this, 0, size); }
{noformat}

That would mean this is test-only and not a real issue.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is
> cleared. This means that identical TimestampValue objects can hash to
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in
> dict-test.cc changed from iterating over values to iterating over const
> references.
> {noformat}
> DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, &track_encoder);
> encoder.UsedbyTest();
> <<
> for (InternalType i: values) encoder.Put(i);
> =
> for (const InternalType& i: values) encoder.Put(i);
> >>
> bytes_alloc = encoder.DictByteSize();
> EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
> EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the
> values_set.size() for TimestampValue. This happened because the hash values
> didn't match even for identical entries and the dictionary would have
> multiple copies of the same value. When iterating over a plain non-reference
> TimestampValue, each TimestampValue is being copied to a temporary value.
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
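The padding problem can be illustrated in Java with a byte-array analogue (Impala's code is C++, so this is only a model): pretend a value occupies 16 bytes of which only the first 12 are meaningful. Hashing all 16 bytes makes logically equal values hash differently depending on garbage padding, while hashing only the meaningful prefix is stable, mirroring the proposed fix of using TimestampValue's own Hash(), which ignores padding.

```java
import java.util.Arrays;

public class PaddingHashDemo {
    static final int MEANINGFUL_BYTES = 12;  // analogous to TimestampValue's real payload

    // Hash over the full struct-sized buffer, padding included (the buggy pattern).
    static int hashAllBytes(byte[] v) { return Arrays.hashCode(v); }

    // Hash over only the meaningful bytes (the behavior of the proposed fix).
    static int hashMeaningful(byte[] v) {
        return Arrays.hashCode(Arrays.copyOf(v, MEANINGFUL_BYTES));
    }

    // Build a 16-byte "timestamp" with explicit payload and padding bytes.
    static byte[] timestamp(byte payload, byte padding) {
        byte[] v = new byte[16];
        Arrays.fill(v, 0, MEANINGFUL_BYTES, payload);   // the logical value
        Arrays.fill(v, MEANINGFUL_BYTES, 16, padding);  // padding: not part of the value
        return v;
    }
}
```

Two buffers with identical payload but different padding agree under hashMeaningful and disagree under hashAllBytes, which is exactly how duplicate dictionary entries arise.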
[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757666#comment-17757666 ] Joe McDonnell commented on IMPALA-12393:

I can't get this to impact inserting into a Parquet table. I'll downgrade this. I think there is also a question of performance of TimestampValue::Hash() vs doing a hash of the first 12 bytes.

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is
> cleared. This means that identical TimestampValue objects can hash to
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in
> dict-test.cc changed from iterating over values to iterating over const
> references.
> {noformat}
> DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, &track_encoder);
> encoder.UsedbyTest();
> <<
> for (InternalType i: values) encoder.Put(i);
> =
> for (const InternalType& i: values) encoder.Put(i);
> >>
> bytes_alloc = encoder.DictByteSize();
> EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
> EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the
> values_set.size() for TimestampValue. This happened because the hash values
> didn't match even for identical entries and the dictionary would have
> multiple copies of the same value. When iterating over a plain non-reference
> TimestampValue, each TimestampValue is being copied to a temporary value.
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell updated IMPALA-12393: --- Priority: Major (was: Blocker)

> DictEncoder uses inconsistent hash function for TimestampValue
> --
>
> Key: IMPALA-12393
> URL: https://issues.apache.org/jira/browse/IMPALA-12393
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
>
> DictEncoder currently uses this hash function for TimestampValue:
> {noformat}
> template<typename T>
> inline uint32_t DictEncoder<T>::Hash(const T& value) const {
>   return HashUtil::Hash(&value, sizeof(value), 0);
> }{noformat}
> TimestampValue has some padding, and nothing ensures that the padding is
> cleared. This means that identical TimestampValue objects can hash to
> different values.
> This came up when fixing a Clang-Tidy performance check. This line in
> dict-test.cc changed from iterating over values to iterating over const
> references.
> {noformat}
> DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, &track_encoder);
> encoder.UsedbyTest();
> <<
> for (InternalType i: values) encoder.Put(i);
> =
> for (const InternalType& i: values) encoder.Put(i);
> >>
> bytes_alloc = encoder.DictByteSize();
> EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
> EXPECT_EQ(encoder.num_entries(), values_set.size()); <{noformat}
> The test became flaky, with the encoder.num_entries() being larger than the
> values_set.size() for TimestampValue. This happened because the hash values
> didn't match even for identical entries and the dictionary would have
> multiple copies of the same value. When iterating over a plain non-reference
> TimestampValue, each TimestampValue is being copied to a temporary value.
> Maybe in this circumstance the padding stays the same between iterations.
> It's possible this would come up when writing Parquet data files.
> One fix would be to use TimestampValue's Hash function, which ignores the
> padding:
> {noformat}
> template<>
> inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
>   return value.Hash();
> }{noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-5081) Expose IR optimization level via query option
[ https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757658#comment-17757658 ] Michael Smith commented on IMPALA-5081: --- [~baggio000] any thoughts on how this should interact with codegen cache? We could make it part of the key so that if you use a different optimization level it won't get a cache hit (and run with a potentially slower optimized version). > Expose IR optimization level via query option > - > > Key: IMPALA-5081 > URL: https://issues.apache.org/jira/browse/IMPALA-5081 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Michael Ho >Assignee: Michael Smith >Priority: Minor > Labels: codegen > > Certain queries may spend a lot of time in the IR optimization. Currently, > there is a start-up option to disable optimization in LLVM. However, it may > be of inconvenience to users to have to restart the entire Impala cluster to > just use that option. This JIRA aims at exploring exposing a query option for > users to choose the optimization level for a given query (e.g. we can have a > level which just only have a dead code elimination pass or no optimization at > all). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393 ] Joe McDonnell deleted comment on IMPALA-12393: was (Author: joemcdonnell): I can't get the Parquet writing to produce the issue, so maybe the padding is always zero somehow.
[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757639#comment-17757639 ] Joe McDonnell commented on IMPALA-12393: I can't get the Parquet writing to produce the issue, so maybe the padding is always zero somehow.
[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757624#comment-17757624 ] Joe McDonnell commented on IMPALA-12393: I see this as a bug, though I'm not sure it is actually user-visible. The bug would come up when inserting timestamps into a Parquet table. If the hash is inconsistent, the dictionary encoding sometimes treats two equivalent values as different. This prevents the dictionary encoding from working properly, so you can end up with a dictionary holding multiple copies of the same value under different integer representations. This would result in larger Parquet files. I haven't confirmed this Parquet encoding case, but it seems possible. It would happen unless we always clear the memory so the padding is zero, and I don't think we guarantee that.
[jira] [Created] (IMPALA-12394) Restore testing single-node planning for most tests
Michael Smith created IMPALA-12394: -- Summary: Restore testing single-node planning for most tests Key: IMPALA-12394 URL: https://issues.apache.org/jira/browse/IMPALA-12394 Project: IMPALA Issue Type: Task Reporter: Michael Smith ALL_CLUSTER_SIZES was redefined to be ALL_NODES_ONLY due to IMPALA-561. However, it wasn't restored after IMPALA-561 was fixed, and we have since added many tests that might only work with ALL_NODES_ONLY. Re-evaluate new tests with ALL_CLUSTER_SIZES and restore testing with single-node and distributed planning by updating ImpalaTestSuite#add_test_dimensions and test_dimensions.create_exec_option_dimension to use ALL_CLUSTER_SIZES.
[jira] [Commented] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757605#comment-17757605 ] Michael Smith commented on IMPALA-12393: Is this a bug, or an improvement? Since you didn't change the iteration earlier, this seems like a (small) perf improvement.
[jira] [Assigned] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell reassigned IMPALA-12393: -- Assignee: Joe McDonnell
[jira] [Updated] (IMPALA-12375) DataSource objects are not persistent
[ https://issues.apache.org/jira/browse/IMPALA-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Maheshwari updated IMPALA-12375: --- Summary: DataSource objects are not persistent (was: DataSource ojects are not persistent)

> DataSource objects are not persistent
> -------------------------------------
>
> Key: IMPALA-12375
> URL: https://issues.apache.org/jira/browse/IMPALA-12375
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend, Catalog, Frontend
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> DataSource objects created with "CREATE DATA SOURCE" statements are not persistent. The objects are not shown in "show data sources" after the catalog server is restarted.
[jira] [Work started] (IMPALA-5081) Expose IR optimization level via query option
[ https://issues.apache.org/jira/browse/IMPALA-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-5081 started by Michael Smith.
[jira] [Commented] (IMPALA-10474) Untracked memory is huge like memory leak
[ https://issues.apache.org/jira/browse/IMPALA-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757569#comment-17757569 ] Michael Smith commented on IMPALA-10474: --- This is primarily in the Thrift server. It would be worth checking whether newer releases of Thrift have addressed any memory leaks.

> Untracked memory is huge like memory leak
> -----------------------------------------
>
> Key: IMPALA-10474
> URL: https://issues.apache.org/jira/browse/IMPALA-10474
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
> Reporter: Xianqing He
> Priority: Major
> Attachments: image-2021-02-04-18-15-34-016.png, image-2021-02-04-18-18-47-183.png
>
> In a production environment, the untracked memory is huge when Impala has just started, even though no query is running.
> !image-2021-02-04-18-15-34-016.png|width=476,height=237!
> In impalad.ERROR:
> !image-2021-02-04-18-18-47-183.png|width=1083,height=200!
[jira] [Created] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
Joe McDonnell created IMPALA-12393: -- Summary: DictEncoder uses inconsistent hash function for TimestampValue Key: IMPALA-12393 URL: https://issues.apache.org/jira/browse/IMPALA-12393 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 4.3.0 Reporter: Joe McDonnell
[jira] [Updated] (IMPALA-12393) DictEncoder uses inconsistent hash function for TimestampValue
[ https://issues.apache.org/jira/browse/IMPALA-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell updated IMPALA-12393: --- Priority: Blocker (was: Critical)
[jira] [Created] (IMPALA-12392) Fix describe statements once HIVE-24509 arrives as dependency
Csaba Ringhofer created IMPALA-12392: Summary: Fix describe statements once HIVE-24509 arrives as dependency Key: IMPALA-12392 URL: https://issues.apache.org/jira/browse/IMPALA-12392 Project: IMPALA Issue Type: Task Components: Catalog Reporter: Csaba Ringhofer HIVE-24509 will break test_describe_materialized_view, as ShowUtils is not included in our shaded jar. It would also be nice to switch to ShowUtils.TextMetaDataTable here: https://github.com/apache/impala/blob/a34f7ce63299c72ef45a99b01bb4e80210befbff/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java#L88 AFAIK the old function is kept in Hive only because of Impala.
[jira] [Assigned] (IMPALA-12392) Fix describe statements once HIVE-24509 arrives as dependency
[ https://issues.apache.org/jira/browse/IMPALA-12392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer reassigned IMPALA-12392: Assignee: Csaba Ringhofer
[jira] [Updated] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-12266: -- Labels: impala-iceberg (was: ) > Sporadic failure after migrating a table to Iceberg > --- > > Key: IMPALA-12266 > URL: https://issues.apache.org/jira/browse/IMPALA-12266 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Gabor Kaszab >Priority: Major > Labels: impala-iceberg > Attachments: > catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, > impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1 > > > TestIcebergTable.test_convert_table test failed in a recent verify job's > dockerised tests: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629 > {code:none} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'parquet_nopartitioned' > E CAUSED BY: TableLoadingException: Could not load table > test_convert_table_cdba7383.parquet_nopartitioned from catalog > E CAUSED BY: TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > error_msgs:[NullPointerException: null]), lookup_status:OK) > {code} > {code:none} > E0704 19:09:22.980131 833 JniUtil.java:183] > 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of > TABLE:test_convert_table_cdba7383.parquet_nopartitioned. 
Time spent: 49ms > I0704 19:09:22.980309 833 jni-util.cc:288] > 7145c21173f2c47b:2579db55] java.lang.NullPointerException > at > org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357) > at > org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396) > I0704 19:09:22.980324 833 status.cc:129] 7145c21173f2c47b:2579db55] > NullPointerException: null > @ 0x1012f9f impala::Status::Status() > @ 0x187f964 impala::JniUtil::GetJniExceptionMsg() > @ 0xfee920 impala::JniCall::Call<>() > @ 0xfccd0f impala::Catalog::GetPartialCatalogObject() > @ 0xfb55a5 > impala::CatalogServiceThriftIf::GetPartialCatalogObject() > @ 0xf7a691 > impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject() > @ 0xf82151 impala::CatalogServiceProcessorT<>::dispatchCall() > @ 0xee330f apache::thrift::TDispatchProcessor::process() > @ 0x1329246 > apache::thrift::server::TAcceptQueueServer::Task::run() > @ 0x1315a89 
impala::ThriftThread::RunRunnable() > @ 0x131773d > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x195ba8c impala::Thread::SuperviseThread() > @ 0x195c895 boost::detail::thread_data<>::run() > @ 0x23a03a7 thread_proxy > @ 0x7faaad2a66ba start_thread > @ 0x7f2c151d clone > E0704 19:09:23.006968 833 catalog-server.cc:278] > 7145c21173f2c47b:2579db55] NullPointerException: null > {code}
[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757380#comment-17757380 ] Gabor Kaszab commented on IMPALA-12266: --- I managed to repro this with running the following SQL in a loop: {code:java} create table tmp_conv_tbl (i int) stored as parquet; insert into tmp_conv_tbl values (1), (2), (3); alter table tmp_conv_tbl convert to iceberg; alter table tmp_conv_tbl set tblproperties ('format-version'='2'); drop table tmp_conv_tbl; {code} For me the DROP TABLE statement failed with "Table doesn not exist" error. I guess it depends on which command is run on a different coordinator after the table conversion. Note, that this repro came in local catalog mode, however, I wouldn't be surprised if this repro-ed in normal catalog mode. This is how I enabled local catalog mode: {code:java} bin/start-impala-cluster.py --impalad_args='--use_local_catalog=true' --catalogd_args='--catalog_topic_mode=minimal' {code} > Sporadic failure after migrating a table to Iceberg > --- > > Key: IMPALA-12266 > URL: https://issues.apache.org/jira/browse/IMPALA-12266 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Gabor Kaszab >Priority: Major > Attachments: > catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, > impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1 > > > TestIcebergTable.test_convert_table test failed in a recent verify job's > dockerised tests: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629 > {code:none} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'parquet_nopartitioned' > E CAUSED BY: TableLoadingException: Could not load table > test_convert_table_cdba7383.parquet_nopartitioned from catalog > E CAUSED BY: TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > 
error_msgs:[NullPointerException: null]), lookup_status:OK) > {code} > {code:none} > E0704 19:09:22.980131 833 JniUtil.java:183] > 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of > TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms > I0704 19:09:22.980309 833 jni-util.cc:288] > 7145c21173f2c47b:2579db55] java.lang.NullPointerException > at > org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357) > at > org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396) > I0704 19:09:22.980324 833 status.cc:129] 7145c21173f2c47b:2579db55] > NullPointerException: null > @ 0x1012f9f impala::Status::Status() > @ 0x187f964 impala::JniUtil::GetJniExceptionMsg() > @ 0xfee920 impala::JniCall::Call<>() > @ 0xfccd0f impala::Catalog::GetPartialCatalogObject() > @ 0xfb55a5 > impala::CatalogServiceThriftIf::GetPartialCatalogObject() > @ 0xf7a691 > 
impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject() > @ 0xf82151 impala::CatalogServiceProcessorT<>::dispatchCall() > @ 0xee330f apache::thrift::TDispatchProcessor::process() > @ 0x1329246 > apache::thrift::server::TAcceptQueueServer::Task::run() > @ 0x1315a89 impala::ThriftThread::RunRunnable() > @ 0x131773d > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x195ba8c impala::Thread::SuperviseThread() > @ 0x195c895 boost::detail::thread_data<>::run() > @ 0x23a03a7 thread_proxy > @ 0x7faaad2a66ba start_thread > @ 0x7f2c151d clone > E0704 19:09:23.006968 833 catalog-server.cc:278] > 7145c21173f2c47b:2579db55] NullPointerException: null > {code}
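The repro loop from the comment above can be scripted; a minimal sketch, assuming `impala-shell.sh` is on the PATH and a dev cluster is running (the driver loop is left commented out so the snippet is safe to run standalone):

```shell
# Write the five-statement repro cycle to a file (statements taken verbatim
# from the comment above).
cat > /tmp/iceberg_conv_repro.sql <<'EOF'
create table tmp_conv_tbl (i int) stored as parquet;
insert into tmp_conv_tbl values (1), (2), (3);
alter table tmp_conv_tbl convert to iceberg;
alter table tmp_conv_tbl set tblproperties ('format-version'='2');
drop table tmp_conv_tbl;
EOF

# Hypothetical driver: replay the cycle until a statement fails (e.g. the
# sporadic "Table does not exist" on DROP TABLE). Uncomment against a
# running cluster; successive runs may land on different coordinators.
# for i in $(seq 1 50); do
#   impala-shell.sh -f /tmp/iceberg_conv_repro.sql || { echo "failed on iteration $i"; break; }
# done
echo "repro file has $(grep -c ';' /tmp/iceberg_conv_repro.sql) statements"
```

The iteration count of 50 is an arbitrary choice; since the failure is sporadic, more iterations may be needed before the DROP TABLE error appears.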
[jira] [Updated] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-12266: -- Summary: Sporadic failure after migrating a table to Iceberg (was: Flaky TestIcebergTable.test_convert_table NPE) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12266) Flaky TestIcebergTable.test_convert_table NPE
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757376#comment-17757376 ] Gabor Kaszab commented on IMPALA-12266: --- This is actually more than just a flaky test, as it occurs in various scenarios. I'll rename the ticket to reflect this.
[jira] [Updated] (IMPALA-12019) Support ORDER BY for collections of fixed length types in select list
[ https://issues.apache.org/jira/browse/IMPALA-12019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Becker updated IMPALA-12019: --- Summary: Support ORDER BY for collections of fixed length types in select list (was: Support ORDER BY for arrays of fixed length types in select list) > Support ORDER BY for collections of fixed length types in select list > - > > Key: IMPALA-12019 > URL: https://issues.apache.org/jira/browse/IMPALA-12019 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Frontend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Fix For: Impala 4.3.0 > > > As a first stage of IMPALA-10939, we should implement support for ORDER BY for arrays that only contain fixed-length types. That way the implementation can be almost the same as the existing handling of strings.
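For context, the kind of query this sub-task targets can be sketched as follows; the table and column names are assumptions modeled on Impala's functional test schema, not taken from the ticket:

```shell
# Hypothetical example of the feature: an array of a fixed-length type (INT)
# in the select list of a query that also sorts its rows.
cat > /tmp/impala12019_example.sql <<'EOF'
select id, int_array from complextypestbl order by id;
EOF
cat /tmp/impala12019_example.sql
```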
[jira] [Commented] (IMPALA-2761) Build and Run Impala on OS X
[ https://issues.apache.org/jira/browse/IMPALA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757304#comment-17757304 ] Maxwell Guo commented on IMPALA-2761: - Any update here? Besides macOS on Intel, I think Apple M1/M2 support is needed too. > Build and Run Impala on OS X > > > Key: IMPALA-2761 > URL: https://issues.apache.org/jira/browse/IMPALA-2761 > Project: IMPALA > Issue Type: New Feature > Components: Infrastructure >Affects Versions: Impala 2.3.0 >Reporter: Martin Grund >Priority: Minor > Labels: osx > > This is an umbrella ticket to support building and running Impala on Mac OS X. Comments will be used to keep track of the status.
[jira] [Commented] (IMPALA-12386) NullExpr substitution failure with unsafe casts enabled
[ https://issues.apache.org/jira/browse/IMPALA-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757216#comment-17757216 ] Peter Rozsa commented on IMPALA-12386: -- Yes, but it only surfaces with the ALLOW_UNSAFE_CASTS query option (added in IMPALA-10173); the original functionality is not affected. > NullExpr substitution failure with unsafe casts enabled > --- > > Key: IMPALA-12386 > URL: https://issues.apache.org/jira/browse/IMPALA-12386 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.3.0 >Reporter: Peter Rozsa >Assignee: Peter Rozsa >Priority: Major > > The query {{insert into t01(a, b) values(null, "23"), ("21", null)}} fails with the following error: > ERROR: IllegalStateException: Failed analysis after expr substitution. > CAUSED BY: IllegalStateException: cast STRING to INT
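The failing statement from the description can be replayed end-to-end with something like the sketch below. The column types of `t01` are an assumption (the ticket only shows the failing INSERT), and the driver invocation is left commented out so the snippet is safe standalone:

```shell
# Repro sketch for IMPALA-12386. Assumption: t01 has INT columns, so
# ALLOW_UNSAFE_CASTS permits the implicit STRING->INT casts; per the ticket,
# expr substitution then fails with "cast STRING to INT" when NULL literals
# are mixed into the VALUES clause.
cat > /tmp/impala12386_repro.sql <<'EOF'
set allow_unsafe_casts=true;
create table t01 (a int, b int);
insert into t01(a, b) values (null, "23"), ("21", null);
EOF

# Uncomment against a running cluster with impala-shell.sh on the PATH:
# impala-shell.sh -f /tmp/impala12386_repro.sql
```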