[jira] [Commented] (IMPALA-10756) Catalog failed to load metadata.

2022-06-09 Thread Maarten Wullink (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552571#comment-17552571
 ] 

Maarten Wullink commented on IMPALA-10756:
--

Has this not been fixed in the just-released 4.1.0 version?

 

> Catalog failed to load metadata.
> 
>
> Key: IMPALA-10756
> URL: https://issues.apache.org/jira/browse/IMPALA-10756
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
> Environment: System: CentOS Linux release 7.9.2009 (Core)
> impala version: 4.0
> hive version:  hive 3.1.2
>Reporter: zhi tang
>Priority: Major
>
> The Catalog throws an "Invalid method name: 'get_database_req'" exception 
> when it loads the metadata. Details of the exception:
> E0619 17:29:46.031193 301062 CatalogServiceCatalog.java:2614] Error executing 
> getDatabase() metastore call: default
>  Java exception follows:
>  org.apache.thrift.TApplicationException: Invalid method name: 
> 'get_database_req'
>  at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database_req(ThriftHiveMetastore.java:1337)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database_req(ThriftHiveMetastore.java:1324)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1940)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1924)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208)
>  at com.sun.proxy.$Proxy11.getDatabase(Unknown Source)
>  at 
> org.apache.impala.catalog.CatalogServiceCatalog.invalidateTable(CatalogServiceCatalog.java:2608)
>  at 
> org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:4558)
>  at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:187)
>  E0619 17:29:46.036005 301062 catalog-server.cc:159] TableNotFoundException: 
> Table not found: default.count_test
>  E0619 17:29:55.036509 301062 CatalogServiceCatalog.java:2614] Error 
> executing getDatabase() metastore call: default
>  Java exception follows:
>  org.apache.thrift.TApplicationException: Invalid method name: 
> 'get_database_req'
>  at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database_req(ThriftHiveMetastore.java:1337)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database_req(ThriftHiveMetastore.java:1324)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1940)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1924)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208)
>  at com.sun.proxy.$Proxy11.getDatabase(Unknown Source)
>  at 
> org.apache.impala.catalog.CatalogServiceCatalog.invalidateTable(CatalogServiceCatalog.java:2608)
>  at 
> org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:4558)
>  at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:187)
>  E0619 17:29:55.036792 301062 catalog-server.cc:159] TableNotFoundException: 
> Table not found: default.count_test
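
For illustration, a minimal Java sketch of the version mismatch behind this 
error. This is a hypothetical compatibility shim, not Impala's actual fix; it 
assumes the generated client exposes get_database_req(GetDatabaseRequest) and 
the classic get_database(String), as the stack trace above suggests. A Thrift 
server that does not implement a requested RPC answers with 
TApplicationException.UNKNOWN_METHOD, which is exactly the "Invalid method 
name" error above, so a client talking to an older HMS can fall back:
{code:java}
// Hypothetical compatibility shim, not Impala's actual fix for this issue.
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.metastore.api.GetDatabaseRequest;
import org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore;
import org.apache.thrift.TApplicationException;
import org.apache.thrift.TException;

public class HmsCompat {
  /** Try the newer get_database_req RPC; fall back to get_database on old HMS. */
  static Database getDatabaseCompat(ThriftHiveMetastore.Client client,
      String dbName) throws TException {
    try {
      GetDatabaseRequest req = new GetDatabaseRequest();
      req.setName(dbName);
      return client.get_database_req(req);  // RPC added in newer HMS versions
    } catch (TApplicationException e) {
      // An HMS that predates the RPC answers with UNKNOWN_METHOD, which the
      // client surfaces as "Invalid method name: 'get_database_req'".
      if (e.getType() != TApplicationException.UNKNOWN_METHOD) throw e;
      return client.get_database(dbName);   // older RPC, available everywhere
    }
  }
}
{code}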






[jira] [Commented] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs

2022-06-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552513#comment-17552513
 ] 

Quanlong Huang commented on IMPALA-11344:
-

[~tangzhi] Do you want to take this? As with your fix for IMPALA-11296, we 
just need to fix the code in OrcStructReader::TopLevelReadValueBatch().
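
A hedged sketch of the intended semantics, for orientation only: the real fix 
belongs in the C++ OrcStructReader::TopLevelReadValueBatch(), and every name 
below is made up. When all selected columns are missing from the file, the 
scan should emit the file's rows as all-NULL tuples instead of raising the 
"No columns found" error:
{code:java}
// Illustration only: the actual fix is in Impala's C++ ORC scanner.
import java.util.ArrayList;
import java.util.List;

public class MissingColumnScanSketch {
  /** Emit one all-NULL row per file row when no selected column exists. */
  static List<Object[]> readBatch(long numRowsInFile, List<String> selectedCols,
      List<String> colsInFile) {
    boolean anyPresent = selectedCols.stream().anyMatch(colsInFile::contains);
    if (!anyPresent) {
      // All selected columns were added after this file was written; the row
      // count is still known, so return NULL-filled rows, matching Hive.
      List<Object[]> rows = new ArrayList<>();
      for (long i = 0; i < numRowsInFile; i++) {
        rows.add(new Object[selectedCols.size()]);  // every element is null
      }
      return rows;
    }
    throw new UnsupportedOperationException("normal column read path elided");
  }
}
{code}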

> Selecting only the missing fields of ORC files should return NULLs
> --
>
> Key: IMPALA-11344
> URL: https://issues.apache.org/jira/browse/IMPALA-11344
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Critical
>  Labels: newbie, ramp-up
>
> While looking into the bug of IMPALA-11296, I found a bug in the same 
> scenario (scanning only the missing columns of ORC files) in the current 
> master branch.
> Creating an ORC table with missing fields in the underlying files:
> {code:sql}
> hive> create external table missing_field_orc (f0 int) stored as orc;
> hive> insert into table missing_field_orc select 1;
> hive> alter table missing_field_orc add columns (f1 int);
> hive> select f1 from missing_field_orc;
> +---+
> |  f1   |
> +---+
> | NULL  |
> +---+
> hive> select f0, f1 from missing_field_orc;
> +-+---+
> | f0  |  f1   |
> +-+---+
> | 1   | NULL  |
> +-+---+
> {code}
> Run the same queries in Impala:
> {code:sql}
> impala> VERSION;
> Shell version: impala shell build version not available
> Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 
> 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)
> impala> invalidate metadata missing_field_orc;
> impala> select f1 from missing_field_orc;
> ERROR: Parse error in possibly corrupt ORC file: 
> 'hdfs://localhost:20500/test-warehouse/missing_field_orc/00_0'. No 
> columns found for this scan.
> impala> select f0, f1 from missing_field_orc;
> ++--+
> | f0 | f1   |
> ++--+
> | 1  | NULL |
> ++--+
> {code}
> When selecting only the column 'f1', the query fails with an error. It 
> should return NULL instead.






[jira] [Updated] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs

2022-06-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11344:

Labels: ramp-up  (was: )

> Selecting only the missing fields of ORC files should return NULLs
> --
>
> Key: IMPALA-11344
> URL: https://issues.apache.org/jira/browse/IMPALA-11344
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Critical
>  Labels: ramp-up
>
> While looking into the bug of IMPALA-11296, I found a bug in the same 
> scenario (scanning only the missing columns of ORC files) in the current 
> master branch.
> Creating an ORC table with missing fields in the underlying files:
> {code:sql}
> hive> create external table missing_field_orc (f0 int) stored as orc;
> hive> insert into table missing_field_orc select 1;
> hive> alter table missing_field_orc add columns (f1 int);
> hive> select f1 from missing_field_orc;
> +---+
> |  f1   |
> +---+
> | NULL  |
> +---+
> hive> select f0, f1 from missing_field_orc;
> +-+---+
> | f0  |  f1   |
> +-+---+
> | 1   | NULL  |
> +-+---+
> {code}
> Run the same queries in Impala:
> {code:sql}
> impala> VERSION;
> Shell version: impala shell build version not available
> Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 
> 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)
> impala> invalidate metadata missing_field_orc;
> impala> select f1 from missing_field_orc;
> ERROR: Parse error in possibly corrupt ORC file: 
> 'hdfs://localhost:20500/test-warehouse/missing_field_orc/00_0'. No 
> columns found for this scan.
> impala> select f0, f1 from missing_field_orc;
> ++--+
> | f0 | f1   |
> ++--+
> | 1  | NULL |
> ++--+
> {code}
> When selecting only the column 'f1', the query fails with an error. It 
> should return NULL instead.






[jira] [Updated] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs

2022-06-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11344:

Labels: newbie ramp-up  (was: ramp-up)

> Selecting only the missing fields of ORC files should return NULLs
> --
>
> Key: IMPALA-11344
> URL: https://issues.apache.org/jira/browse/IMPALA-11344
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Critical
>  Labels: newbie, ramp-up
>
> While looking into the bug of IMPALA-11296, I found a bug in the same 
> scenario (scanning only the missing columns of ORC files) in the current 
> master branch.
> Creating an ORC table with missing fields in the underlying files:
> {code:sql}
> hive> create external table missing_field_orc (f0 int) stored as orc;
> hive> insert into table missing_field_orc select 1;
> hive> alter table missing_field_orc add columns (f1 int);
> hive> select f1 from missing_field_orc;
> +---+
> |  f1   |
> +---+
> | NULL  |
> +---+
> hive> select f0, f1 from missing_field_orc;
> +-+---+
> | f0  |  f1   |
> +-+---+
> | 1   | NULL  |
> +-+---+
> {code}
> Run the same queries in Impala:
> {code:sql}
> impala> VERSION;
> Shell version: impala shell build version not available
> Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 
> 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)
> impala> invalidate metadata missing_field_orc;
> impala> select f1 from missing_field_orc;
> ERROR: Parse error in possibly corrupt ORC file: 
> 'hdfs://localhost:20500/test-warehouse/missing_field_orc/00_0'. No 
> columns found for this scan.
> impala> select f0, f1 from missing_field_orc;
> ++--+
> | f0 | f1   |
> ++--+
> | 1  | NULL |
> ++--+
> {code}
> When selecting only the column 'f1', the query fails with an error. It 
> should return NULL instead.






[jira] [Commented] (IMPALA-11296) The executor has some resident threads that occupy CPU abnormally.

2022-06-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552506#comment-17552506
 ] 

Quanlong Huang commented on IMPALA-11296:
-

[~tangzhi] 's patch is under review: [https://gerrit.cloudera.org/c/18571/]

CC [~boroknagyz] 

> The executor has some resident threads that occupy CPU abnormally.
> --
>
> Key: IMPALA-11296
> URL: https://issues.apache.org/jira/browse/IMPALA-11296
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: zhi tang
>Assignee: zhi tang
>Priority: Major
> Attachments: image-2022-05-17-16-40-52-110.png, top_info.png
>
>
> The executor has some resident threads that occupy CPU abnormally. The 
> following is the call stack information of a thread:
> !image-2022-05-17-16-40-52-110.png!






[jira] [Commented] (IMPALA-11160) TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests

2022-06-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552504#comment-17552504
 ] 

Quanlong Huang commented on IMPALA-11160:
-

[~csringhofer] nice find!

It seems to be a bug in local catalog mode. The coordinator could get the 
partially updated partition metadata. As we can see, it has the correct 
#Rows, #Files, and Size, but not the incrementalness. I guess setting 
sync_ddl=1 slows down the process and so works around this.
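
To illustrate the suspected race (a hypothetical sketch, not Impala's actual 
catalog code): if partition stats are mutated field by field on a shared 
object, a concurrent reader can observe the new #Rows/#Files/Size together 
with a stale incremental-stats flag; publishing an immutable snapshot through 
a single reference avoids that:
{code:java}
// Hypothetical illustration of the suspected race, not Impala's catalog code.
import java.util.concurrent.atomic.AtomicReference;

public class PartitionMetaRace {
  /** Immutable snapshot: a reader sees all fields from the same update. */
  static final class PartStats {
    final long numRows, numFiles, sizeBytes;
    final boolean hasIncrementalStats;
    PartStats(long rows, long files, long size, boolean incr) {
      numRows = rows; numFiles = files; sizeBytes = size;
      hasIncrementalStats = incr;
    }
  }

  // Racy style would mutate the fields of a shared object in place, so a
  // reader between the writes sees new counts with the old incrementalness.
  // Safe style: build the snapshot fully, then publish it atomically.
  private final AtomicReference<PartStats> current =
      new AtomicReference<>(new PartStats(0, 0, 0, false));

  void publish(long rows, long files, long size, boolean incr) {
    current.set(new PartStats(rows, files, size, incr));  // one atomic swap
  }

  PartStats read() { return current.get(); }
}
{code}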

> TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
> 
>
> Key: IMPALA-11160
> URL: https://issues.apache.org/jira/browse/IMPALA-11160
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: broken-build
>
> h3. Error Message
> {noformat}
> query_test/test_acid.py:220: in test_acid_compute_stats 
> self.run_test_case('QueryTest/acid-compute-stats', vector, 
> use_db=unique_database) common/impala_test_suite.py:718: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:554: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:469: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E '1',1,1,'2B','NOT CACHED','NOT 
> CACHED',regex:.*,'true',regex:.* != '1',1,1,'2B','NOT CACHED','NOT 
> CACHED','TEXT','false','hdfs://192.168.124.1:20500/test-warehouse/managed/test_acid_compute_stats_69ccf940.db/pt/p=1'
>  E 'Total',1,1,'2B','0B','','','','' == 'Total',1,1,'2B','0B','','','',''
> {noformat}
> h3. Stacktrace
> {noformat}
> query_test/test_acid.py:220: in test_acid_compute_stats
> self.run_test_case('QueryTest/acid-compute-stats', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:718: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:554: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E '1',1,1,'2B','NOT CACHED','NOT CACHED',regex:.*,'true',regex:.* != 
> '1',1,1,'2B','NOT CACHED','NOT 
> CACHED','TEXT','false','hdfs://192.168.124.1:20500/test-warehouse/managed/test_acid_compute_stats_69ccf940.db/pt/p=1'
> E 'Total',1,1,'2B','0B','','','','' == 'Total',1,1,'2B','0B','','','',''
> {noformat}
> It happened in 
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5359/
> *Please click on "Don't keep this build forever" once this issue is resolved*






[jira] [Comment Edited] (IMPALA-11160) TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests

2022-06-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552501#comment-17552501
 ] 

Quanlong Huang edited comment on IMPALA-11160 at 6/10/22 2:15 AM:
--

Saw this again in 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5814]


was (Author: stiga-huang):
Saw this again in 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5814]

 

> TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
> 
>
> Key: IMPALA-11160
> URL: https://issues.apache.org/jira/browse/IMPALA-11160
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: broken-build
>
> h3. Error Message
> {noformat}
> query_test/test_acid.py:220: in test_acid_compute_stats 
> self.run_test_case('QueryTest/acid-compute-stats', vector, 
> use_db=unique_database) common/impala_test_suite.py:718: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:554: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:469: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E '1',1,1,'2B','NOT CACHED','NOT 
> CACHED',regex:.*,'true',regex:.* != '1',1,1,'2B','NOT CACHED','NOT 
> CACHED','TEXT','false','hdfs://192.168.124.1:20500/test-warehouse/managed/test_acid_compute_stats_69ccf940.db/pt/p=1'
>  E 'Total',1,1,'2B','0B','','','','' == 'Total',1,1,'2B','0B','','','',''
> {noformat}
> h3. Stacktrace
> {noformat}
> query_test/test_acid.py:220: in test_acid_compute_stats
> self.run_test_case('QueryTest/acid-compute-stats', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:718: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:554: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E '1',1,1,'2B','NOT CACHED','NOT CACHED',regex:.*,'true',regex:.* != 
> '1',1,1,'2B','NOT CACHED','NOT 
> CACHED','TEXT','false','hdfs://192.168.124.1:20500/test-warehouse/managed/test_acid_compute_stats_69ccf940.db/pt/p=1'
> E 'Total',1,1,'2B','0B','','','','' == 'Total',1,1,'2B','0B','','','',''
> {noformat}
> It happened in 
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5359/
> *Please click on "Don't keep this build forever" once this issue is resolved*






[jira] [Commented] (IMPALA-11160) TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests

2022-06-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552501#comment-17552501
 ] 

Quanlong Huang commented on IMPALA-11160:
-

Saw this again in 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5814]

 

> TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
> 
>
> Key: IMPALA-11160
> URL: https://issues.apache.org/jira/browse/IMPALA-11160
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: broken-build
>
> h3. Error Message
> {noformat}
> query_test/test_acid.py:220: in test_acid_compute_stats 
> self.run_test_case('QueryTest/acid-compute-stats', vector, 
> use_db=unique_database) common/impala_test_suite.py:718: in run_test_case 
> self.__verify_results_and_errors(vector, test_section, result, use_db) 
> common/impala_test_suite.py:554: in __verify_results_and_errors 
> replace_filenames_with_placeholder) common/test_result_verifier.py:469: in 
> verify_raw_results VERIFIER_MAP[verifier](expected, actual) 
> common/test_result_verifier.py:278: in verify_query_result_is_equal 
> assert expected_results == actual_results E   assert Comparing 
> QueryTestResults (expected vs actual): E '1',1,1,'2B','NOT CACHED','NOT 
> CACHED',regex:.*,'true',regex:.* != '1',1,1,'2B','NOT CACHED','NOT 
> CACHED','TEXT','false','hdfs://192.168.124.1:20500/test-warehouse/managed/test_acid_compute_stats_69ccf940.db/pt/p=1'
>  E 'Total',1,1,'2B','0B','','','','' == 'Total',1,1,'2B','0B','','','',''
> {noformat}
> h3. Stacktrace
> {noformat}
> query_test/test_acid.py:220: in test_acid_compute_stats
> self.run_test_case('QueryTest/acid-compute-stats', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:718: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:554: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E '1',1,1,'2B','NOT CACHED','NOT CACHED',regex:.*,'true',regex:.* != 
> '1',1,1,'2B','NOT CACHED','NOT 
> CACHED','TEXT','false','hdfs://192.168.124.1:20500/test-warehouse/managed/test_acid_compute_stats_69ccf940.db/pt/p=1'
> E 'Total',1,1,'2B','0B','','','','' == 'Total',1,1,'2B','0B','','','',''
> {noformat}
> It happened in 
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5359/
> *Please click on "Don't keep this build forever" once this issue is resolved*






[jira] [Updated] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+

2022-06-09 Thread Vincent Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Tran updated IMPALA-11260:
--
Attachment: (was: image-2022-06-09-16-17-39-445.png)

> Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
> -
>
> Key: IMPALA-11260
> URL: https://issues.apache.org/jira/browse/IMPALA-11260
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, 
> Impala 3.4.0, Impala 3.4.1
>Reporter: Quanlong Huang
>Priority: Critical
>
> When running local catalog mode on Java11, the Ehcache sizeof lib complains 
> that cache sizes may be underestimated:
> {code:java}
> W0421 20:50:44.238312  9819 ObjectGraphWalker.java:251] 
> 744e548159a57cb5:879ee74c] The JVM is preventing Ehcache from 
> accessing the subgraph beneath 'final jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be 
> underestimated as a result
> Java exception follows:
> java.lang.reflect.InaccessibleObjectException: Unable to make field final 
> jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp accessible: module 
> java.base does not "opens jdk.internal.loader" to unnamed module @6ba7383d
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
>         at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
>         at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
>         at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:1999)
>         at 
> com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2010)
>         at 
> com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2956)
>         at com.google.common.cache.LocalCache.replace(LocalCache.java:4258)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:540)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1056)
>         at 
> org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:87)
>         at 
> org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:107)
>         at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:127)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:310)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:165)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:141)
>         at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2014)
>         at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1926)
>         at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1750)
>         at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164){code}
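
(For context, a hedged sketch of the weighing mechanism involved; the wiring 
below is assumed, not CatalogdMetaProvider's exact code. The cache weigher 
calls Ehcache's reflective deep-sizeof walker, and on Java 9+ that walk 
cannot open jdk.internal.* fields unless the JVM is started with, e.g., 
--add-opens java.base/jdk.internal.loader=ALL-UNNAMED; any unreachable 
subgraph is simply not counted, so the computed weights come out too small.)
{code:java}
// Hedged sketch of a size-based Guava cache weigher backed by Ehcache's
// sizeof library; assumed setup, not the exact CatalogdMetaProvider code.
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.Weigher;
import org.ehcache.sizeof.SizeOf;

public class SizeOfWeigherSketch {
  private static final SizeOf SIZE_OF = SizeOf.newInstance();

  public static Cache<String, Object> build(long maxWeightBytes) {
    // deepSizeOf() walks the object graph reflectively; on Java 9+ it skips
    // (and therefore under-counts) fields the module system refuses to open.
    Weigher<String, Object> weigher =
        (key, value) -> (int) SIZE_OF.deepSizeOf(key, value);
    return CacheBuilder.newBuilder()
        .maximumWeight(maxWeightBytes)
        .weigher(weigher)
        .build();
  }
}
{code}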
> Similar errors on other classes:
> {code}
> The JVM is preventing Ehcache from accessing the subgraph beneath 'final 
> jdk.internal.loader.AbstractClassLoaderValue 
> jdk.internal.loader.AbstractClassLoaderValue$Sub.this$0' - cache sizes may be 
> underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'final 
> jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be 
> underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private 
> final java.lang.Object jdk.internal.loader.AbstractClassLoaderValue$Sub.key' 
> - cache sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private 
> final java.lang.String java.lang.module.Configuration.targetPlatform' - cache 
> sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private 
> final java.lang.String java.lang.module.ModuleDescriptor.mainClass' - cache 
> sizes may be underestimated as a result
> The JVM is preventing Ehcache from acc

[jira] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+

2022-06-09 Thread Vincent Tran (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-11260 ]


Vincent Tran deleted comment on IMPALA-11260:
---

was (Author: thundergun):
In the same way that this can hang *ImpalaServer::Start()*, this can also 
hang queries in *ImpalaServer::ExecuteInternal()*, since the call over JNI 
never returns because the Java threads in 
*org.apache.impala.service.JniFrontend.getCatalogMetrics()* are stuck here 
waiting for a *Future* that will never complete:
{noformat}
"Thread-17" #59 prio=5 os_prio=0 cpu=1404.89ms elapsed=5312.17s 
tid=0x0acee000 nid=0xae02 waiting on condition  [0x7f6443484000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method)
- parking to wait for  <0x7f6faf520b78> (a 
java.util.concurrent.CompletableFuture$Signaller)
at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194)
at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.14/CompletableFuture.java:1796)
at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.14/ForkJoinPool.java:3128)
at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.14/CompletableFuture.java:1823)
at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.14/CompletableFuture.java:1998)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:649)
at 
org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
at 
org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
at 
org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:779)
at 
org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)

   Locked ownable synchronizers:
- None
{noformat}

> Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
> -
>
> Key: IMPALA-11260
> URL: https://issues.apache.org/jira/browse/IMPALA-11260
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, 
> Impala 3.4.0, Impala 3.4.1
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: image-2022-06-09-16-17-39-445.png
>
>
> When running local catalog mode on Java11, the Ehcache sizeof lib complains 
> that cache sizes may be underestimated:
> {code:java}
> W0421 20:50:44.238312  9819 ObjectGraphWalker.java:251] 
> 744e548159a57cb5:879ee74c] The JVM is preventing Ehcache from 
> accessing the subgraph beneath 'final jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be 
> underestimated as a result
> Java exception follows:
> java.lang.reflect.InaccessibleObjectException: Unable to make field final 
> jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp accessible: module 
> java.base does not "opens jdk.internal.loader" to unnamed module @6ba7383d
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
>         at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
>         at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
>         at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:1999)
>         at 
> com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2010)
>         at 
> com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2956)
>         at com.google.common.cache.LocalCache.replace(LocalCache.java:4258)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:540)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1056)
>         at 
> org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:87)
>         at 
> org.apache.impala.catalog.l

[jira] [Comment Edited] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+

2022-06-09 Thread Vincent Tran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552417#comment-17552417
 ] 

Vincent Tran edited comment on IMPALA-11260 at 6/9/22 8:24 PM:
---

In the same way that this can hang *ImpalaServer::Start()*, this can also 
hang queries in *ImpalaServer::ExecuteInternal()*, since the call over JNI 
never returns because the Java threads in 
*org.apache.impala.service.JniFrontend.getCatalogMetrics()* are stuck here 
waiting for a *Future* that will never complete:
{noformat}
"Thread-17" #59 prio=5 os_prio=0 cpu=1404.89ms elapsed=5312.17s 
tid=0x0acee000 nid=0xae02 waiting on condition  [0x7f6443484000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method)
- parking to wait for  <0x7f6faf520b78> (a 
java.util.concurrent.CompletableFuture$Signaller)
at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194)
at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.14/CompletableFuture.java:1796)
at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.14/ForkJoinPool.java:3128)
at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.14/CompletableFuture.java:1823)
at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.14/CompletableFuture.java:1998)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:649)
at 
org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
at 
org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
at 
org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:779)
at 
org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)

   Locked ownable synchronizers:
- None
{noformat}


was (Author: thundergun):
!image-2022-06-09-16-17-39-445.png!
In the same way that this can hang *ImpalaServer::Start()*, this can also hang 
queries in *ImpalaServer::ExecuteInternal()*, since the call over JNI never 
returns because the Java threads in 
*org.apache.impala.service.JniFrontend.getCatalogMetrics()* are stuck here 
waiting for a *Future* that will never complete:

{noformat}
"Thread-17" #59 prio=5 os_prio=0 cpu=1404.89ms elapsed=5312.17s 
tid=0x0acee000 nid=0xae02 waiting on condition  [0x7f6443484000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method)
- parking to wait for  <0x7f6faf520b78> (a 
java.util.concurrent.CompletableFuture$Signaller)
at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194)
at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.14/CompletableFuture.java:1796)
at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.14/ForkJoinPool.java:3128)
at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.14/CompletableFuture.java:1823)
at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.14/CompletableFuture.java:1998)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:649)
at 
org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
at 
org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
at 
org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:779)
at 
org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)

   Locked ownable synchronizers:
- None
{noformat}



> Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
> -
>
> Key: IMPALA-11260
> URL: https://issues.apache.org/jira/browse/IMPALA-11260
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, 
> Impala 3.4.0, Impala 3.4.1
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: image-2022-06-09-16-17-39-445.png
>
>
> When running local catalog mode on Java11, the Ehcache sizeof lib complains 
> that cache sizes may be 

[jira] [Commented] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+

2022-06-09 Thread Vincent Tran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552417#comment-17552417
 ] 

Vincent Tran commented on IMPALA-11260:
---

!image-2022-06-09-16-17-39-445.png!
In the same way that this can hang *ImpalaServer::Start()*, this can also hang 
queries in *ImpalaServer::ExecuteInternal()*, since the call over JNI never 
returns because the Java threads in 
*org.apache.impala.service.JniFrontend.getCatalogMetrics()* are stuck here 
waiting for a *Future* that will never complete:

{noformat}
"Thread-17" #59 prio=5 os_prio=0 cpu=1404.89ms elapsed=5312.17s 
tid=0x0acee000 nid=0xae02 waiting on condition  [0x7f6443484000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method)
- parking to wait for  <0x7f6faf520b78> (a 
java.util.concurrent.CompletableFuture$Signaller)
at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194)
at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.14/CompletableFuture.java:1796)
at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.14/ForkJoinPool.java:3128)
at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.14/CompletableFuture.java:1823)
at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.14/CompletableFuture.java:1998)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:649)
at 
org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
at 
org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
at 
org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:779)
at 
org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)

   Locked ownable synchronizers:
- None
{noformat}
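
To make the failure mode concrete, a hedged stand-alone sketch of the pattern 
in the stack above (hypothetical code, not the Impala frontend itself): a 
thread parks forever in getUninterruptibly() on a future that is never 
completed, and a bounded wait would turn the hang into a diagnosable error:
{code:java}
// Hypothetical stand-alone model of the hang, not the Impala frontend code.
import com.google.common.util.concurrent.Uninterruptibles;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StuckFutureDemo {
  public static void main(String[] args) throws Exception {
    // A future whose completion path was lost: nothing will ever call
    // complete() or completeExceptionally() on it.
    CompletableFuture<String> never = new CompletableFuture<>();

    // The pattern from the thread dump parks forever inside get():
    // String v = Uninterruptibles.getUninterruptibly(never);  // hangs

    // A bounded wait lets the caller surface an error instead of hanging:
    try {
      String v = Uninterruptibles.getUninterruptibly(never, 5, TimeUnit.SECONDS);
      System.out.println("loaded: " + v);
    } catch (TimeoutException e) {
      System.err.println("metadata load did not complete within 5s");
    }
  }
}
{code}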



> Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
> -
>
> Key: IMPALA-11260
> URL: https://issues.apache.org/jira/browse/IMPALA-11260
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, 
> Impala 3.4.0, Impala 3.4.1
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: image-2022-06-09-16-17-39-445.png
>
>
> When running local catalog mode on Java11, the Ehcache sizeof lib complains 
> that cache sizes may be underestimated:
> {code:java}
> W0421 20:50:44.238312  9819 ObjectGraphWalker.java:251] 
> 744e548159a57cb5:879ee74c] The JVM is preventing Ehcache from 
> accessing the subgraph beneath 'final jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be 
> underestimated as a result
> Java exception follows:
> java.lang.reflect.InaccessibleObjectException: Unable to make field final 
> jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp accessible: module 
> java.base does not "opens jdk.internal.loader" to unnamed module @6ba7383d
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
>         at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
>         at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
>         at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:1999)
>         at 
> com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2010)
>         at 
> com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2956)
>         at com.google.common.cache.LocalCache.replace(LocalCache.java:4258)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:540)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1056)
>         at 
> org.apache.impala.catalog.local.LocalIcebergTa

[jira] [Updated] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+

2022-06-09 Thread Vincent Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Tran updated IMPALA-11260:
--
Attachment: image-2022-06-09-16-17-39-445.png

> Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
> -
>
> Key: IMPALA-11260
> URL: https://issues.apache.org/jira/browse/IMPALA-11260
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, 
> Impala 3.4.0, Impala 3.4.1
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: image-2022-06-09-16-17-39-445.png
>
>
> When running local catalog mode on Java11, the Ehcache sizeof lib complains 
> that cache sizes may be underestimated:
> {code:java}
> W0421 20:50:44.238312  9819 ObjectGraphWalker.java:251] 
> 744e548159a57cb5:879ee74c] The JVM is preventing Ehcache from 
> accessing the subgraph beneath 'final jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be 
> underestimated as a result
> Java exception follows:
> java.lang.reflect.InaccessibleObjectException: Unable to make field final 
> jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp accessible: module 
> java.base does not "opens jdk.internal.loader" to unnamed module @6ba7383d
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
>         at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
>         at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
>         at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
>         at 
> org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
>         at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:1999)
>         at 
> com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2010)
>         at 
> com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2956)
>         at com.google.common.cache.LocalCache.replace(LocalCache.java:4258)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:540)
>         at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1056)
>         at 
> org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:87)
>         at 
> org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:107)
>         at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:127)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:310)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:165)
>         at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:141)
>         at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2014)
>         at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1926)
>         at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1750)
>         at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164){code}
> Similar errors on other classes:
> {code}
> The JVM is preventing Ehcache from accessing the subgraph beneath 'final 
> jdk.internal.loader.AbstractClassLoaderValue 
> jdk.internal.loader.AbstractClassLoaderValue$Sub.this$0' - cache sizes may be 
> underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'final 
> jdk.internal.loader.URLClassPath 
> jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be 
> underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private 
> final java.lang.Object jdk.internal.loader.AbstractClassLoaderValue$Sub.key' 
> - cache sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private 
> final java.lang.String java.lang.module.Configuration.targetPlatform' - cache 
> sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private 
> final java.lang.String java.lang.module.ModuleDescriptor.mainClass' - cache 
> sizes may be underestimated as a

[jira] [Work started] (IMPALA-10453) Support file/partition pruning via runtime filters on Iceberg

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10453 started by Tamas Mate.
---
> Support file/partition pruning via runtime filters on Iceberg
> -
>
> Key: IMPALA-10453
> URL: https://issues.apache.org/jira/browse/IMPALA-10453
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tamas Mate
>Priority: Major
>  Labels: iceberg, impala-iceberg, performance
>
> This is a placeholder to figure out what we'd need to do to support dynamic 
> file-level pruning in Iceberg using runtime filters, i.e. to reach parity 
> with partition pruning.
> * If there is a single partition value per file, then applying bloom filters 
> to the row group stats would be effective at pruning files (see the sketch 
> after this list).
> * If there are partition transforms, e.g. hash-based, then I think we 
> probably need to track the partition that the file is associated with and 
> then have some custom logic in the parquet scanner to do partition pruning.
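
A hedged sketch of the first bullet (hypothetical types; Impala's runtime 
filters are C++, so Guava's BloomFilter only stands in for them here): with a 
single partition value per file, testing that value against a runtime bloom 
filter is enough to skip whole files, since bloom filters never produce false 
negatives:
{code:java}
// Hypothetical sketch: Guava's BloomFilter stands in for Impala's C++
// runtime bloom filters, and DataFile is a made-up shape for Iceberg files.
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.util.List;
import java.util.stream.Collectors;

public class IcebergFilePruningSketch {
  /** One Iceberg data file carrying a single partition value. */
  record DataFile(String path, long partitionValue) {}

  /** Keep only files whose partition value might match the runtime filter. */
  static List<DataFile> prune(List<DataFile> files, BloomFilter<Long> filter) {
    return files.stream()
        .filter(f -> filter.mightContain(f.partitionValue))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Keys observed on the build side of the join feed the filter.
    BloomFilter<Long> joinKeys = BloomFilter.create(Funnels.longFunnel(), 1000);
    joinKeys.put(7L);
    List<DataFile> files = List.of(
        new DataFile("f1.parq", 7L), new DataFile("f2.parq", 9L));
    System.out.println(prune(files, joinKeys));  // only f1.parq survives
  }
}
{code}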






[jira] [Assigned] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples

2022-06-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer reassigned IMPALA-10267:


Assignee: Csaba Ringhofer  (was: Qifan Chen)

> Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
> -
>
> Key: IMPALA-10267
> URL: https://issues.apache.org/jira/browse/IMPALA-10267
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.0.0
>
>
> An exhaustive job hit two Impalad crashes with the following stack:
> {noformat}
>  2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x05209129
> Found by: call frame info
>  3  impalad!impala::HdfsScanner::WriteTemplateTuples(impala::TupleRow*, int) 
> [hdfs-scanner.cc : 235 + 0xf]
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02802013
> Found by: call frame info
>  4  impalad!impala::HdfsAvroScanner::ProcessRange(impala::RowBatch*) 
> [hdfs-avro-scanner.cc : 553 + 0x19]
> rbx = 0x0400   rbp = 0x7f82f98adc60
> rsp = 0x7f82f98ad7b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0283880d
> Found by: call frame info
>  5  impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 189 + 0x2b]
> rbx = 0x   rbp = 0x7f82f98adf40
> rsp = 0x7f82f98adc70   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x029302b5
> Found by: call frame info
>  6  impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 143 + 0x39]
> rbx = 0x0292fbd4   rbp = 0x7f82f98ae000
> rsp = 0x7f82f98adf50   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x028011c9
> Found by: call frame info
>  7  
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, 
> std::allocator<impala::FilterContext> > const&, impala::MemPool*, 
> impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28]
> rbx = 0x8000   rbp = 0x7f82f98ae390
> rsp = 0x7f82f98ae010   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0297aa3d
> Found by: call frame info
>  8  impalad!impala::HdfsScanNode::ScannerThread(bool, long) 
> [hdfs-scan-node.cc : 418 + 0x27]
> rbx = 0x0001abc6a760   rbp = 0x7f82f98ae750
> rsp = 0x7f82f98ae3a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979dbe
> Found by: call frame info
>  9  
> impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()()
>  const + 0x30
> rbx = 0x0bbf   rbp = 0x7f82f98ae770
> rsp = 0x7f82f98ae760   r12 = 0x08e18f40
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979126
> Found by: call frame info{noformat}
> This seems to happen when running 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes on 
> Avro. Reading the code in HdfsAvroScanner::ProcessRange(), it seems 
> impossible for this value to be negative, so it's unclear what is happening.






[jira] [Commented] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples

2022-06-09 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552258#comment-17552258
 ] 

Csaba Ringhofer commented on IMPALA-10267:
--

My theory on what happens here is that there is an error during 
HdfsAvroScanner::ProcessRange(), but we actually still continue scanning:
https://github.com/apache/impala/blob/23d09638de35dcec6419a5e30df08fd5d8b27e7d/be/src/exec/base-sequence-scanner.cc#L190

For example, first we set num_records_in_block_ to a lower value than the last 
record_pos_, and then fail here:
https://github.com/apache/impala/blob/23d09638de35dcec6419a5e30df08fd5d8b27e7d/be/src/exec/hdfs-avro-scanner.cc#L512

The next time we call ProcessRange() we will assume that everything is OK, but 
num_records_in_block_ will be less than record_pos_.
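
A small Java model of that theory (illustration only; the real code is the 
C++ scanner, and the member names below just mirror it):
{code:java}
// Illustration only: a Java model of the C++ scanner-state theory above.
public class AvroBlockStateDemo {
  long numRecordsInBlock = 0;  // mirrors num_records_in_block_
  long recordPos = 0;          // mirrors record_pos_

  // First call: a new block header sets num_records_in_block_ to a value
  // smaller than the record_pos_ left over from the previous block, then
  // fails before the counters are reset.
  void failingHeaderRead() {
    recordPos = 100;         // progress carried over from the previous block
    numRecordsInBlock = 4;   // corrupt header: fewer records than recordPos
    throw new RuntimeException("invalid block");  // error gets swallowed
  }

  // Next call: the scan resumes as if everything were fine, and the batch
  // size comes out negative -- the DCHECK in WriteTemplateTuples() fires.
  void resumedProcessRange() {
    long numToCommit = numRecordsInBlock - recordPos;  // 4 - 100 = -96
    writeTemplateTuples((int) numToCommit);
  }

  void writeTemplateTuples(int numTuples) {
    assert numTuples >= 0 : "negative num_tuples, matching the crash above";
  }

  public static void main(String[] args) {
    AvroBlockStateDemo demo = new AvroBlockStateDemo();
    try {
      demo.failingHeaderRead();
    } catch (RuntimeException swallowed) {
      // the scan continues despite the error
    }
    demo.resumedProcessRange();  // trips the assertion when run with -ea
  }
}
{code}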

> Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
> -
>
> Key: IMPALA-10267
> URL: https://issues.apache.org/jira/browse/IMPALA-10267
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Assignee: Qifan Chen
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.0.0
>
>
> An exhaustive job hit two Impalad crashes with the following stack:
> {noformat}
>  2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x05209129
> Found by: call frame info
>  3  impalad!impala::HdfsScanner::WriteTemplateTuples(impala::TupleRow*, int) 
> [hdfs-scanner.cc : 235 + 0xf]
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02802013
> Found by: call frame info
>  4  impalad!impala::HdfsAvroScanner::ProcessRange(impala::RowBatch*) 
> [hdfs-avro-scanner.cc : 553 + 0x19]
> rbx = 0x0400   rbp = 0x7f82f98adc60
> rsp = 0x7f82f98ad7b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0283880d
> Found by: call frame info
>  5  impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 189 + 0x2b]
> rbx = 0x   rbp = 0x7f82f98adf40
> rsp = 0x7f82f98adc70   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x029302b5
> Found by: call frame info
>  6  impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 143 + 0x39]
> rbx = 0x0292fbd4   rbp = 0x7f82f98ae000
> rsp = 0x7f82f98adf50   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x028011c9
> Found by: call frame info
>  7  
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, 
> std::allocator<impala::FilterContext> > const&, impala::MemPool*, 
> impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28]
> rbx = 0x8000   rbp = 0x7f82f98ae390
> rsp = 0x7f82f98ae010   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0297aa3d
> Found by: call frame info
>  8  impalad!impala::HdfsScanNode::ScannerThread(bool, long) 
> [hdfs-scan-node.cc : 418 + 0x27]
> rbx = 0x0001abc6a760   rbp = 0x7f82f98ae750
> rsp = 0x7f82f98ae3a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979dbe
> Found by: call frame info
>  9  
> impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()()
>  const + 0x30
> rbx = 0x0bbf   rbp = 0x7f82f98ae770
> rsp = 0x7f82f98ae760   r12 = 0x08e18f40
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979126
> Found by: call frame info{noformat}
> This seems to happen when running 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes on 
> Avro. Reading the code in HdfsAvroScanner::ProcessRange(), it seems 
> impossible for this value to be negative, so it's unclear what is happening.




[jira] [Updated] (IMPALA-11280) Zipping unnest hits DCHECK when querying from a view that has an IN operator

2022-06-09 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-11280:
--
Description: 
*Repro steps:*

1) Create a view that returns arrays and has an IN operator in the WHERE clause:
{code:java}
drop view if exists unnest_bug_view;
create view unnest_bug_view as (
  select id, arr1, arr2
  from functional_parquet.complextypes_arrays
  where id % 2 = 1 and id in (select id from functional_parquet.alltypestiny)
); {code}
2) Unnest the arrays and filter by the unnested values in an outer SELECT:
{code:java}
select
  id,
  unnested_arr1,
  unnested_arr2
from
  (select
 id,
 unnest(arr1) as unnested_arr1,
 unnest(arr2) as unnested_arr2
   from unnest_bug_view) a
where a.unnested_arr1 < 5; {code}
This hits a DCHECK in RowDescriptor::GetTupleIdx():
{code:java}
descriptors.cc:467] 5643fd6cdd5cece3:77942ead] Check failed: id < 
tuple_idx_map_.size() (3 vs. 2) RowDescriptor: Tuple(id=0 size=29 
slots=[Slot(id=2 type=INT col_path=[0] offset=24 null=(offset=28 mask=4) 
slot_idx=2 field_idx=2), Slot(id=3 type=ARRAY col_path=[1] children_tuple_id=3 
offset=0 null=(offset=28 mask=1) slot_idx=0 field_idx=0), Slot(id=5 type=ARRAY 
col_path=[2] children_tuple_id=4 offset=12 null=(offset=28 mask=2) slot_idx=1 
field_idx=1)] tuple_path=[])
Tuple(id=1 size=5 slots=[Slot(id=0 type=INT col_path=[2] offset=0 
null=(offset=4 mask=1) slot_idx=0 field_idx=0)] tuple_path=[])
*** Check failure stack trace: ***
    @          0x36fe72c  google::LogMessage::Fail()
    @          0x36fffdc  google::LogMessage::SendToLog()
    @          0x36fe08a  google::LogMessage::Flush()
    @          0x3701c48  google::LogMessageFatal::~LogMessageFatal()
    @          0x12e47ab  impala::RowDescriptor::GetTupleIdx()
    @          0x1b378f5  impala::SlotRef::Init()
    @          0x1b25fea  impala::ScalarExpr::Init()
    @          0x1b665b2  impala::ScalarFnCall::Init()
    @          0x1b2c44e  impala::ScalarExpr::Create()
    @          0x1b2c5df  impala::ScalarExpr::Create()
    @          0x1b2c6a0  impala::ScalarExpr::Create()
    @          0x19ad286  impala::PartitionedHashJoinPlanNode::Init()
    @          0x18b5d8d  impala::PlanNode::CreateTreeHelper()
    @          0x18b5cd9  impala::PlanNode::CreateTreeHelper()
    @          0x18b5e48  impala::PlanNode::CreateTree()
    @          0x12f4ca7  impala::FragmentState::Init()
    @          0x12f839c  impala::FragmentState::CreateFragmentStateMap()
    @          0x126cedb  impala::QueryState::StartFInstances()
    @          0x125c4df  impala::QueryExecMgr::ExecuteQueryHelper()
{code}
 

 

Some notes about the repro:
 - The inner select (without filtering on the unnested value) is OK.
 - If I unnest only one array then it is OK (see the sketch after these notes).
 - If I remove the IN clause from the view’s DDL then the query runs fine.
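For instance, this single-array variant of the failing query runs fine (a sketch 
against the same view):
{code:java}
select id, unnested_arr1
from (select id, unnest(arr1) as unnested_arr1 from unnest_bug_view) a
where a.unnested_arr1 < 5;
{code}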

 

{*}Update{*}:

I managed to reproduce this without creating an actual view, which might reduce 
the tuple/slot ID complexity for the investigation.
{code:java}
select id, unnested_arr1, unnested_arr2 from (
select id, unnest(arr1) as unnested_arr1, unnest(arr2) as unnested_arr2
  from functional_parquet.complextypes_arrays
  where id in (select id from functional_parquet.alltypestiny)) a
where a.unnested_arr1 < 5 {code}

[jira] [Comment Edited] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables

2022-06-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552214#comment-17552214
 ] 

Zoltán Borók-Nagy edited comment on IMPALA-11053 at 6/9/22 2:00 PM:


-I was able to quickly fix it.- Opened IMPALA-11346 to track the bug.

UPDATE: the fix wasn't correct, still working on it.



> Impala should be able to read migrated partitioned Iceberg tables
> -
>
> Key: IMPALA-11053
> URL: https://issues.apache.org/jira/browse/IMPALA-11053
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> When Hive (and probably other engines as well) converts a legacy Hive table 
> to Iceberg it doesn't rewrite the data files.
> This means the data files have neither write ids nor the partition columns.
> Currently Impala expects the partition columns to be present in the data 
> files, so it won't be able to read converted partitioned tables.
> So we need to inject partition values from the Iceberg metadata, plus resolve 
> columns correctly (position-based resolution needs an offset).
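For context, such tables typically come from Hive's in-place migration, which 
switches the storage handler without rewriting the data files (a sketch; it 
assumes Hive with the Iceberg storage handler available, and legacy_part_tbl is 
a hypothetical table):
{code:java}
-- Run in Hive: migrate a legacy partitioned table to Iceberg in place.
-- The existing data files are kept as-is, so they lack the partition columns.
ALTER TABLE legacy_part_tbl
SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
{code}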






[jira] [Comment Edited] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables

2022-06-09 Thread LiPenglin (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552218#comment-17552218
 ] 

LiPenglin edited comment on IMPALA-11053 at 6/9/22 1:43 PM:


Thanks [~boroknagyz], I cleaned up my code and got the same results as you; 
sorry for the mistake above.
{code:java}
[localhost.localdomain:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=true;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.
 {code}
One of the things I did recently was migrate a Hive table to an Iceberg table. 
I expected the original Hive partition columns to remain usable in the WHERE 
clause after the migration. So, is there a solution for these errors on 
partition column values?

UPDATE: I saw IMPALA-11346, that is great!

 



> Impala should be able to read migrated partitioned Iceberg tables
> -
>
> Key: IMPALA-11053
> URL: https://issues.apache.org/jira/browse/IMPALA-11053
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> When Hive (and probably other engines as well) converts a legacy Hive table 
> to Iceberg it doesn't rewrite the data files.
> This means the data files have neither write ids nor the partition columns.
> Currently Impala expects the partition columns to be present in the data 
> files, so it won't be able to read converted partitioned tables.
> So we need to inject partition values from the Iceberg metadata, plus resolve 
> columns correctly (position-based resolution needs an offset).






[jira] [Commented] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables

2022-06-09 Thread LiPenglin (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552218#comment-17552218
 ] 

LiPenglin commented on IMPALA-11053:


Thanks [~boroknagyz], I cleaned up my code and got the same results as you; 
sorry for the mistake above.
{code:java}
[localhost.localdomain:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=true;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.
 {code}
One of the things I did recently was migrate a Hive table to an Iceberg table. 
I expected the original Hive partition columns to remain usable in the WHERE 
clause after the migration. So, is there a solution for these errors on 
partition column values?

 

 

> Impala should be able to read migrated partitioned Iceberg tables
> -
>
> Key: IMPALA-11053
> URL: https://issues.apache.org/jira/browse/IMPALA-11053
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> When Hive (and probably other engines as well) converts a legacy Hive table 
> to Iceberg it doesn't rewrite the data files.
> This means the data files have neither write ids nor the partition columns.
> Currently Impala expects the partition columns to be present in the data 
> files, so it won't be able to read converted partitioned tables.
> So we need to inject partition values from the Iceberg metadata, plus resolve 
> columns correctly (position-based resolution needs an offset).






[jira] [Assigned] (IMPALA-11346) Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

2022-06-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-11346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-11346:
--

Assignee: Zoltán Borók-Nagy

> Migrated partitioned Iceberg tables might return ERROR when WHERE condition 
> is used on partition column
> ---
>
> Key: IMPALA-11346
> URL: https://issues.apache.org/jira/browse/IMPALA-11346
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> {noformat}
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_bool=false;
> Fetched 0 row(s) in 0.11s
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_bool=true;
> ERROR: Unable to find SchemaNode for path 
> 'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where i=3;
> Fetched 0 row(s) in 0.12s
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where i=1;
> +---++---+--+---+--+---++--+
> | i | p_bool | p_int | p_bigint | p_float       | p_double | p_decimal | 
> p_date     | p_string |
> +---++---+--+---+--+---++--+
> | 1 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
> 2022-02-22 | impala   |
> +---++---+--+---+--+---++--+
> Fetched 1 row(s) in 0.12s
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_int=1;
> ERROR: Unable to find SchemaNode for path 
> 'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_int=3;
> Fetched 0 row(s) in 0.11s{noformat}
> So at least we don't get incorrect results, but we are getting errors on 
> partition column values that do exist.
> It seems like it works well with ORC.






[jira] [Commented] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables

2022-06-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552214#comment-17552214
 ] 

Zoltán Borók-Nagy commented on IMPALA-11053:


I was able to quickly fix it. Opened IMPALA-11346 to track the bug.

> Impala should be able to read migrated partitioned Iceberg tables
> -
>
> Key: IMPALA-11053
> URL: https://issues.apache.org/jira/browse/IMPALA-11053
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> When Hive (and probably other engines as well) converts a legacy Hive table 
> to Iceberg it doesn't rewrite the data files.
> This means the data files have neither write ids nor the partition columns.
> Currently Impala expects the partition columns to be present in the data 
> files, so it won't be able to read converted partitioned tables.
> So we need to inject partition values from the Iceberg metadata, plus resolve 
> columns correctly (position-based resolution needs an offset).






[jira] [Created] (IMPALA-11346) Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

2022-06-09 Thread Jira
Zoltán Borók-Nagy created IMPALA-11346:
--

 Summary: Migrated partitioned Iceberg tables might return ERROR 
when WHERE condition is used on partition column
 Key: IMPALA-11346
 URL: https://issues.apache.org/jira/browse/IMPALA-11346
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Zoltán Borók-Nagy


{noformat}
[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=false;
Fetched 0 row(s) in 0.11s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=true;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where i=3;
Fetched 0 row(s) in 0.12s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where i=1;
+---++---+--+---+--+---++--+
| i | p_bool | p_int | p_bigint | p_float       | p_double | p_decimal | p_date 
    | p_string |
+---++---+--+---+--+---++--+
| 1 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
2022-02-22 | impala   |
+---++---+--+---+--+---++--+
Fetched 1 row(s) in 0.12s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_int=1;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_int=3;
Fetched 0 row(s) in 0.11s{noformat}

So at least we don't get incorrect results, but we are getting errors on 
partition column values that do exist.

It seems like it works well with ORC.






[jira] [Work started] (IMPALA-11293) Add COMPACT command for Iceberg tables

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-11293 started by Tamas Mate.
---
> Add COMPACT command for Iceberg tables
> --
>
> Key: IMPALA-11293
> URL: https://issues.apache.org/jira/browse/IMPALA-11293
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
>
> Currently Impala cannot compact Iceberg tables.
> The following INSERT OVERWRITE statement could be used in the simple cases, 
> i.e. when the following conditions are met:
>  * all data files use the same partition spec (i.e. no partition evolution)
>  * no bucket partitioning (we currently forbid INSERT OVERWRITE for bucket 
> partitioning)
> {noformat}
> INSERT OVERWRITE t SELECT * FROM t;{noformat}
> We could have a command that compacts the Iceberg table (the syntax needs to 
> be the same as Hive's), e.g.:
> {noformat}
> ALTER TABLE t EXECUTE compaction();{noformat}
> At first, the compact command could simply be rewritten to the INSERT OVERWRITE 
> command, but it would also check that there's no partition evolution.
> The "no bucket" partitioning condition could be relaxed in this case, because 
> the result would be deterministic. I.e. the only condition we need to check 
> is that there was no partition evolution.
> Later, we could do compaction by
> {noformat}
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t FOR SYSTEM_TIME AS OF ...;{noformat}
> Currently time-travel queries are not optimized, but we could work around that 
> by doing the planning first:
> {noformat}
> Create the plan for:
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t;{noformat}
> Then execute them:
> {noformat}
> Actually execute:
> TRUNCATE TABLE t;
> INSERT INTO t SELECT * FROM t; (no need for time-travel, plan was created 
> before TRUNCATE){noformat}
> This would work around the planning overhead of time-travel queries.
> Also, we might add some locking for the table if possible.
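Putting the proposal together, the initial version could behave like this (a 
sketch of the proposed syntax, not something Impala supports yet):
{code:java}
-- Proposed, Hive-compatible compaction command:
ALTER TABLE t EXECUTE compaction();
-- Initially rewritten internally, after verifying there was no partition
-- evolution, to:
INSERT OVERWRITE t SELECT * FROM t;
{code}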






[jira] [Comment Edited] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables

2022-06-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552203#comment-17552203
 ] 

Zoltán Borók-Nagy edited comment on IMPALA-11053 at 6/9/22 1:20 PM:


Thanks [~LiPenglin], I'm observing a slightly different behavior:
{noformat}
[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=false;
Fetched 0 row(s) in 0.11s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=true;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where i=3;
Fetched 0 row(s) in 0.12s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where i=1;
+---++---+--+---+--+---++--+
| i | p_bool | p_int | p_bigint | p_float       | p_double | p_decimal | p_date 
    | p_string |
+---++---+--+---+--+---++--+
| 1 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
2022-02-22 | impala   |
+---++---+--+---+--+---++--+
Fetched 1 row(s) in 0.12s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_int=1;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_int=3;
Fetched 0 row(s) in 0.11s{noformat}
So at least I don't get incorrect results, but I am getting errors on 
partition column values that do exist.

UPDATE: it seems like it works well with ORC.



> Impala should be able to read migrated partitioned Iceberg tables
> -
>
> Key: IMPALA-11053
> URL: https://issues.apache.org/jira/browse/IMPALA-11053
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> When Hive (and probably other engines as well) converts a legacy Hive table 
> to Iceberg it doesn't rewrite the data files.
> This means the data files have neither write ids nor the partition columns.
> Currently Impala expects the partition columns to be present in the data 
> files, so it won't be able to read converted partitioned tables.
> So we need to inject partition values from the Iceberg metadata, plus resolve 
> columns correctly (position-based resolution needs an offset).

[jira] [Commented] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables

2022-06-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552203#comment-17552203
 ] 

Zoltán Borók-Nagy commented on IMPALA-11053:


Thanks [~LiPenglin], I'm observing a slightly different behavior:
{noformat}
[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=false;
Fetched 0 row(s) in 0.11s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=true;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where i=3;
Fetched 0 row(s) in 0.12s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where i=1;
+---++---+--+---+--+---++--+
| i | p_bool | p_int | p_bigint | p_float       | p_double | p_decimal | p_date 
    | p_string |
+---++---+--+---+--+---++--+
| 1 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
2022-02-22 | impala   |
+---++---+--+---+--+---++--+
Fetched 1 row(s) in 0.12s

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_int=1;
ERROR: Unable to find SchemaNode for path 
'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file 
'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.

[localhost:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_int=3;
Fetched 0 row(s) in 0.11s{noformat}
So at least I don't get incorrect results, but I am getting errors on 
partition column values that do exist.

> Impala should be able to read migrated partitioned Iceberg tables
> -
>
> Key: IMPALA-11053
> URL: https://issues.apache.org/jira/browse/IMPALA-11053
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> When Hive (and probably other engines as well) converts a legacy Hive table 
> to Iceberg it doesn't rewrite the data files.
> This means the data files have neither write ids nor the partition columns.
> Currently Impala expects the partition columns to be present in the data 
> files, so it won't be able to read converted partitioned tables.
> So we need to inject partition values from the Iceberg metadata, plus resolve 
> columns correctly (position-based resolution needs an offset).






[jira] [Resolved] (IMPALA-8011) Allow filtering on virtual column for file name

2022-06-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-8011.
---
Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Allow filtering on virtual column for file name
> ---
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Peter Ebert
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: built-in-function
> Fix For: Impala 4.2.0
>
>
> An additional performance enhancement would be the capability to filter on 
> file names using a virtual column.  This would be somewhat like the current 
> optimization of sorting data and skipping files based on parquet metadata, 
> but instead you put something in the file name to indicate what its contents 
> should be filtered on.
> For example, say you are writing first names and later searching for them. 
> During the write phase you put the first letter of each stored name into the 
> file name, so if a file stores Alice, Bob, and Cathy its name is "ABC". When 
> searching for David, a query could then filter on whether INPUT__FILE__NAME 
> contains "D" and skip reading files that don't.
> Another use: with a daily partition, you could put a timestamp into the file 
> name and then limit a search to the last hour even though the partition is 
> daily. This also gives you the ability to sort by another column, making 
> searches even faster on both.
>  
> This requires IMPALA-801
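To illustrate the idea (a sketch; it assumes the virtual column keeps Hive's 
INPUT__FILE__NAME name, and names_tbl is a hypothetical table):
{code:java}
-- Only read files whose name indicates they may contain a first name
-- starting with 'D'; all other files are skipped.
SELECT *
FROM names_tbl
WHERE INPUT__FILE__NAME LIKE '%D%'
  AND first_name = 'David';
{code}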






[jira] [Resolved] (IMPALA-801) Add function or virtual column for file name

2022-06-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-801.
--
Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Add function or virtual column for file name
> 
>
> Key: IMPALA-801
> URL: https://issues.apache.org/jira/browse/IMPALA-801
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 1.2.3
>Reporter: Udai Kiran Potluri
>Assignee: Zoltán Borók-Nagy
>Priority: Minor
>  Labels: built-in-function, impala-iceberg, ramp-up
> Fix For: Impala 4.2.0
>
>
> Hive can list the data files in a table. For example, the following query lists 
> all the data files for the table or partition:
> {noformat}
> select INPUT__FILE__NAME, count(*) from  where dt='20140210' 
> group by INPUT__FILE__NAME;
> {noformat}
> This has two advantages over the existing "show files" functionality:
> * The output can be used in arbitrary SQL statements.
> * You can see which record came from which file.






[jira] [Commented] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables

2022-06-09 Thread LiPenglin (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552144#comment-17552144
 ] 

LiPenglin commented on IMPALA-11053:


Hi [~boroknagyz],

This is OK when a full table scan is performed.

However, predicates in the WHERE clause do not work.
{code:java}
--- 
https://gerrit.cloudera.org/#/c/18240/11/testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-tables.test
[localhost.localdomain:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where p_bool=false;
+---++---+--+---+--+---++--+
| i | p_bool | p_int | p_bigint | p_float       | p_double | p_decimal | p_date 
    | p_string |
+---++---+--+---+--+---++--+
| 1 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
2022-02-22 | impala   |
| 2 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
2022-02-22 | impala   |
+---++---+--+---+--+---++--+
Fetched 2 row(s) in 0.13s
[localhost.localdomain:21050] default> select * from 
functional_parquet.iceberg_alltypes_part where i=3;
+---++---+--+---+--+---++--+
| i | p_bool | p_int | p_bigint | p_float       | p_double | p_decimal | p_date 
    | p_string |
+---++---+--+---+--+---++--+
| 1 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
2022-02-22 | impala   |
| 2 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 
2022-02-22 | impala   |
+---++---+--+---+--+---++--+
Fetched 2 row(s) in 0.16s {code}

> Impala should be able to read migrated partitioned Iceberg tables
> -
>
> Key: IMPALA-11053
> URL: https://issues.apache.org/jira/browse/IMPALA-11053
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> When Hive (and probably other engines as well) converts a legacy Hive table 
> to Iceberg it doesn't rewrite the data files.
> This means the data files have neither write ids nor the partition columns.
> Currently Impala expects the partition columns to be present in the data 
> files, so it won't be able to read converted partitioned tables.
> So we need to inject partition values from the Iceberg metadata, plus resolve 
> columns correctly (position-based resolution needs an offset).






[jira] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples

2022-06-09 Thread Jira


[ https://issues.apache.org/jira/browse/IMPALA-10267 ]


Zoltán Garaguly deleted comment on IMPALA-10267:
--

was (Author: zgaraguly):
Same issue happened here:
https://master-03.jenkins.cloudera.com/job/impala-private-parameterized/1060/

> Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
> -
>
> Key: IMPALA-10267
> URL: https://issues.apache.org/jira/browse/IMPALA-10267
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Assignee: Qifan Chen
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.0.0
>
>
> An exhaustive job hit two Impalad crashes with the following stack:
> {noformat}
>  2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x05209129
> Found by: call frame info
>  3  impalad!impala::HdfsScanner::WriteTemplateTuples(impala::TupleRow*, int) 
> [hdfs-scanner.cc : 235 + 0xf]
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02802013
> Found by: call frame info
>  4  impalad!impala::HdfsAvroScanner::ProcessRange(impala::RowBatch*) 
> [hdfs-avro-scanner.cc : 553 + 0x19]
> rbx = 0x0400   rbp = 0x7f82f98adc60
> rsp = 0x7f82f98ad7b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0283880d
> Found by: call frame info
>  5  impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 189 + 0x2b]
> rbx = 0x   rbp = 0x7f82f98adf40
> rsp = 0x7f82f98adc70   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x029302b5
> Found by: call frame info
>  6  impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 143 + 0x39]
> rbx = 0x0292fbd4   rbp = 0x7f82f98ae000
> rsp = 0x7f82f98adf50   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x028011c9
> Found by: call frame info
>  7  
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28]
> rbx = 0x8000   rbp = 0x7f82f98ae390
> rsp = 0x7f82f98ae010   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0297aa3d
> Found by: call frame info
>  8  impalad!impala::HdfsScanNode::ScannerThread(bool, long) 
> [hdfs-scan-node.cc : 418 + 0x27]
> rbx = 0x0001abc6a760   rbp = 0x7f82f98ae750
> rsp = 0x7f82f98ae3a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979dbe
> Found by: call frame info
>  9  
> impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()()
>  const + 0x30
> rbx = 0x0bbf   rbp = 0x7f82f98ae770
> rsp = 0x7f82f98ae760   r12 = 0x08e18f40
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979126
> Found by: call frame info{noformat}
> This seems to happen when running 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes on 
> Avro. Reading the code in HdfsAvroScanner::ProcessRange(), it seems 
> impossible for this value to be negative, so it's unclear what is happening.






[jira] [Created] (IMPALA-11345) Query failed when creating equal conjunction map for Parquet bloom filter

2022-06-09 Thread Yuchen Fan (Jira)
Yuchen Fan created IMPALA-11345:
---

 Summary: Query failed when creating equal conjunction map for 
Parquet bloom filter
 Key: IMPALA-11345
 URL: https://issues.apache.org/jira/browse/IMPALA-11345
 Project: IMPALA
  Issue Type: Bug
  Components: Backend, Distributed Exec
Affects Versions: Impala 4.1.0
 Environment: CentOS-7, Impala-4.1
Reporter: Yuchen Fan


When querying a Hive table to which columns were added without 'cascade', Impala 
encounters an error like "Unable to find SchemaNode for path 'db.table.column' in 
the schema of file 'hdfs://xxx/path/to/parquet_file_before_add_column'." I 
checked the Parquet file named in the error log and found that its schema is not 
compatible with the table metadata. The call stack is attached below; the path 
and table name are masked: 
{code:java}
I0609 18:04:25.970052 115413 status.cc:129] c94d0ab3fdf8f943:320300610002] 
Unable to find SchemaNode for path 'xxx_db.xxx_table.xxx_column' in the schema 
of file 'hdfs://xxx_nn/xxx_table_path/00_0'.
    @           0xea543b  impala::Status::Status()
    @          0x1e3225c  
impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap()
    @          0x1e363ea  impala::HdfsParquetScanner::Open()
    @          0x19b40d0  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
    @          0x1b5cbae  impala::HdfsScanNode::ProcessSplit()
    @          0x1b5e12a  impala::HdfsScanNode::ScannerThread()
    @          0x1b5e9c6  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
    @          0x18eafa9  impala::Thread::SuperviseThread()
    @          0x18ee11a  boost::detail::thread_data<>::run()
    @          0x2385510  thread_proxy
    @     0x7fb5b0745162  start_thread
    @     0x7fb5ad21df6c  __clone{code}
The error may be related to 
[IMPALA-10640|https://issues.apache.org/jira/browse/IMPALA-10640]. The Parquet 
bloom filter requires the right-hand values of an equality conjunct to match the 
current file schema, and the filter is unavailable when the column does not 
exist in every Parquet file scanned. I think we can disable the Parquet bloom 
filter for the affected query or scan node when this situation is detected.

How to reproduce (using impala-shell; see the consolidated sketch below):
 # create table parquet_test (id INT) stored as parquet;
 # insert into parquet_test values (1),(2),(3);
 # alter table parquet_test add columns (name STRING);
 # insert into parquet_test values (4, "James");
 # select * from parquet_test where name in ("Lily");
 # The error occurs.
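The same steps as one consolidated script (a sketch; the table name 
parquet_test is just an example):
{code:java}
-- Create a Parquet table and write a data file that only has column 'id'.
create table parquet_test (id INT) stored as parquet;
insert into parquet_test values (1),(2),(3);
-- Add a column; the existing data file is not rewritten.
alter table parquet_test add columns (name STRING);
insert into parquet_test values (4, "James");
-- The old file has no 'name' column, so building the equality-conjunct map
-- for the Parquet bloom filter fails.
select * from parquet_test where name in ("Lily");
{code}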






[jira] [Commented] (IMPALA-10947) SQL support for querying Iceberg metadata

2022-06-09 Thread LiPenglin (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552053#comment-17552053
 ] 

LiPenglin commented on IMPALA-10947:


Hi [~tmate], thanks for your reply, I got it.

> SQL support for querying Iceberg metadata
> -
>
> Key: IMPALA-10947
> URL: https://issues.apache.org/jira/browse/IMPALA-10947
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
>
> HIVE-25457 added support for querying Iceberg table metadata to Hive.
> They support the following syntax:
> SELECT * FROM default.iceberg_table.history;
> Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history
> Other than "history", the following metadata tables are available in Iceberg:
> * ENTRIES,
> * FILES,
> * HISTORY,
> * SNAPSHOTS,
> * MANIFESTS,
> * PARTITIONS,
> * ALL_DATA_FILES,
> * ALL_MANIFESTS,
> * ALL_ENTRIES
> Impala currently only supports "DESCRIBE HISTORY ". The above SELECT 
> syntax would be more convenient for users and more flexible, since users 
> could easily define filters in WHERE clauses. It would also keep us 
> consistent with other engines.
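For example, the proposed syntax would allow filters like this (a sketch; the 
column names follow Iceberg's standard history metadata table schema):
{code:java}
-- Proposed syntax, not yet implemented in Impala:
SELECT snapshot_id, made_current_at
FROM default.iceberg_table.history
WHERE is_current_ancestor = TRUE;
{code}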






[jira] [Commented] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples

2022-06-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552058#comment-17552058
 ] 

Zoltán Garaguly commented on IMPALA-10267:
--

Same issue happened here:
https://master-03.jenkins.cloudera.com/job/impala-private-parameterized/1060/

> Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
> -
>
> Key: IMPALA-10267
> URL: https://issues.apache.org/jira/browse/IMPALA-10267
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Assignee: Qifan Chen
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.0.0
>
>
> An exhaustive job hit two Impalad crashes with the following stack:
> {noformat}
>  2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x05209129
> Found by: call frame info
>  3  impalad!impala::HdfsScanner::WriteTemplateTuples(impala::TupleRow*, int) 
> [hdfs-scanner.cc : 235 + 0xf]
> rbx = 0x   rbp = 0x7f82f98ad7a0
> rsp = 0x7f82f98ad6b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02802013
> Found by: call frame info
>  4  impalad!impala::HdfsAvroScanner::ProcessRange(impala::RowBatch*) 
> [hdfs-avro-scanner.cc : 553 + 0x19]
> rbx = 0x0400   rbp = 0x7f82f98adc60
> rsp = 0x7f82f98ad7b0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0283880d
> Found by: call frame info
>  5  impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 189 + 0x2b]
> rbx = 0x   rbp = 0x7f82f98adf40
> rsp = 0x7f82f98adc70   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x029302b5
> Found by: call frame info
>  6  impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 143 + 0x39]
> rbx = 0x0292fbd4   rbp = 0x7f82f98ae000
> rsp = 0x7f82f98adf50   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x028011c9
> Found by: call frame info
>  7  
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28]
> rbx = 0x8000   rbp = 0x7f82f98ae390
> rsp = 0x7f82f98ae010   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x0297aa3d
> Found by: call frame info
>  8  impalad!impala::HdfsScanNode::ScannerThread(bool, long) 
> [hdfs-scan-node.cc : 418 + 0x27]
> rbx = 0x0001abc6a760   rbp = 0x7f82f98ae750
> rsp = 0x7f82f98ae3a0   r12 = 0x
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979dbe
> Found by: call frame info
>  9  
> impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()()
>  const + 0x30
> rbx = 0x0bbf   rbp = 0x7f82f98ae770
> rsp = 0x7f82f98ae760   r12 = 0x08e18f40
> r13 = 0x7f8306dd1690   r14 = 0x2f6631a0
> r15 = 0x72b8f2f0   rip = 0x02979126
> Found by: call frame info{noformat}
> This seems to happen when running 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes on 
> Avro. Reading the code in HdfsAvroScanner::ProcessRange(), it seems 
> impossible for this value to be negative, so it's unclear what is happening.






[jira] [Commented] (IMPALA-10947) SQL support for querying Iceberg metadata

2022-06-09 Thread Tamas Mate (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552049#comment-17552049
 ] 

Tamas Mate commented on IMPALA-10947:
-

Hi [~LiPenglin], yes I am working on this, just had to put it aside for a 
while. I would rather keep it as one task for now.

> SQL support for querying Iceberg metadata
> -
>
> Key: IMPALA-10947
> URL: https://issues.apache.org/jira/browse/IMPALA-10947
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
>
> HIVE-25457 added support for querying Iceberg table metadata to Hive.
> They support the following syntax:
> SELECT * FROM default.iceberg_table.history;
> Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history
> Other than "history", the following metadata tables are available in Iceberg:
> * ENTRIES,
> * FILES,
> * HISTORY,
> * SNAPSHOTS,
> * MANIFESTS,
> * PARTITIONS,
> * ALL_DATA_FILES,
> * ALL_MANIFESTS,
> * ALL_ENTRIES
> Impala currently only supports "DESCRIBE HISTORY ". The above SELECT 
> syntax would be more convenient for users and more flexible, since users 
> could easily define filters in WHERE clauses. It would also keep us 
> consistent with other engines.






[jira] [Work started] (IMPALA-10947) SQL support for querying Iceberg metadata

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10947 started by Tamas Mate.
---
> SQL support for querying Iceberg metadata
> -
>
> Key: IMPALA-10947
> URL: https://issues.apache.org/jira/browse/IMPALA-10947
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
>
> HIVE-25457 added support for querying Iceberg table metadata to Hive.
> They support the following syntax:
> SELECT * FROM default.iceberg_table.history;
> Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history
> Other than "history", the following metadata tables are available in Iceberg:
> * ENTRIES,
> * FILES,
> * HISTORY,
> * SNAPSHOTS,
> * MANIFESTS,
> * PARTITIONS,
> * ALL_DATA_FILES,
> * ALL_MANIFESTS,
> * ALL_ENTRIES
> Impala currently only supports "DESCRIBE HISTORY ". The above SELECT 
> syntax would be more convenient for users and more flexible, since users 
> could easily define filters in WHERE clauses. It would also keep us 
> consistent with other engines.






[jira] [Resolved] (IMPALA-11023) Impala should raise an error when a delete delta file is found in an Iceberg table

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Mate resolved IMPALA-11023.
-
Resolution: Fixed

> Impala should raise an error when a delete delta file is found in an Iceberg 
> table
> --
>
> Key: IMPALA-11023
> URL: https://issues.apache.org/jira/browse/IMPALA-11023
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> Impala currently doesn't support row-level deletes for Iceberg tables.
> Therefore we should raise an error when a delete delta file is found.






[jira] [Resolved] (IMPALA-11338) Update Impala version to 4.2.0-SNAPSHOT

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Mate resolved IMPALA-11338.
-
Resolution: Fixed

> Update Impala version to 4.2.0-SNAPSHOT
> ---
>
> Key: IMPALA-11338
> URL: https://issues.apache.org/jira/browse/IMPALA-11338
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Tamas Mate
>Priority: Minor
> Fix For: Impala 4.2.0
>
>
> With the release of 4.1.0, we should update master to version 4.2.0.






[jira] [Updated] (IMPALA-11023) Impala should raise an error when a delete delta file is found in an Iceberg table

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Mate updated IMPALA-11023:

Affects Version/s: Impala 4.0.0

> Impala should raise an error when a delete delta file is found in an Iceberg 
> table
> --
>
> Key: IMPALA-11023
> URL: https://issues.apache.org/jira/browse/IMPALA-11023
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> Impala currently doesn't support row-level deletes for Iceberg tables.
> Therefore we should raise an error when a delete delta file is found.






[jira] [Updated] (IMPALA-11023) Impala should raise an error when a delete delta file is found in an Iceberg table

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Mate updated IMPALA-11023:

Fix Version/s: Impala 4.1.0

> Impala should raise an error when a delete delta file is found in an Iceberg 
> table
> --
>
> Key: IMPALA-11023
> URL: https://issues.apache.org/jira/browse/IMPALA-11023
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> Impala currently doesn't support row-level deletes for Iceberg tables.
> Therefore we should raise an error when a delete delta file is found.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-2792) Syntactic sugar for computing aggregates over nested collections.

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Mate reassigned IMPALA-2792:
--

Assignee: (was: Tamas Mate)

> Syntactic sugar for computing aggregates over nested collections.
> -
>
> Key: IMPALA-2792
> URL: https://issues.apache.org/jira/browse/IMPALA-2792
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.3.0
>Reporter: Alexander Behm
>Priority: Major
>  Labels: complextype, nested_types, planner, ramp-up, usability
>
> For user convenience and SQL brevity, we should add syntax extensions to 
> concisely express aggregates over nested collections. Internally, we should 
> re-write the concise versions into the more verbose equivalent with a 
> correlated inline view.
> Example A:
> {code}
> New syntax:
> select count(c.orders) from customer c
> Internally rewrite to:
> select cnt from customer c, (select count(*) cnt from c.orders) v
> {code}
> Example B:
> {code}
> New syntax:
> select avg(c.orders.items.price) from customer c
> Internally rewrite to:
> select a from customer c, (select avg(price) a from c.orders.items) v
> {code}
> I suggest performing the rewrite inside StmtRewriter.java after rewriting all 
> subqueries from the WHERE clause.
> Similar syntactic improvements should be considered for analytic functions on 
> nested collections.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-11268) Allow STORED BY and STORED AS as well

2022-06-09 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Mate reassigned IMPALA-11268:
---

Assignee: (was: Tamas Mate)

> Allow STORED BY and STORED AS as well
> -
>
> Key: IMPALA-11268
> URL: https://issues.apache.org/jira/browse/IMPALA-11268
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently Impala only recognizes the STORED AS clause, and it uses it for 
> every file format and storage engine.
> Hive behaves differently: it uses STORED AS for file formats, and STORED BY 
> for storage engines like Kudu, HBase, and Iceberg.
> This is especially convenient for Iceberg users, because they can write the 
> following statement to create a table:
> CREATE TABLE ice_t (i int) STORED BY ICEBERG STORED AS PARQUET;
> We could extend Impala's syntax to allow the above as well. For 
> backward compatibility we still need to support STORED AS ICEBERG.
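A sketch of the two forms the proposal would accept (table names are hypothetical; the exact grammar is up to the implementation):

{code}
-- Hive-style: storage engine via STORED BY, file format via STORED AS
CREATE TABLE ice_t (i int) STORED BY ICEBERG STORED AS PARQUET;
-- existing Impala form, kept for backward compatibility
CREATE TABLE ice_t2 (i int) STORED AS ICEBERG;
{code}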



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11296) The executor has some resident threads that occupy CPU abnormally.

2022-06-09 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552021#comment-17552021
 ] 

Quanlong Huang commented on IMPALA-11296:
-

I can reproduce the issue on the 3.x branch:
{code:java}
Thread 1 (process 8963):
#0  0x0267391e in impala::HdfsScanner::InitTupleFromTemplate 
(this=0x12aff400, template_tuple=0x157bd000, tuple=0x144e9b92, 
tuple_byte_size=5) at /var/lib/jenkins/impala/be/src/exec/hdfs-scanner.h:537
#1  0x026c89bd in impala::HdfsScanner::InitTupleBuffer 
(this=0x12aff400, template_tuple=0x157bd000, tuple_mem=0x144e9b92 "", 
num_tuples=1024) at /var/lib/jenkins/impala/be/src/exec/hdfs-scanner.h:552
#2  0x026c777e in impala::HdfsOrcScanner::TransferTuples 
(this=0x12aff400, coll_reader=0x138aefc0, dst_batch=0x13c4aa80, 
do_batch_read=true) at 
/var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:728
#3  0x026c65c1 in impala::HdfsOrcScanner::AssembleRows 
(this=0x12aff400, row_batch=0x13c4aa80) at 
/var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:683
#4  0x026c4ea1 in impala::HdfsOrcScanner::GetNextInternal 
(this=0x12aff400, row_batch=0x13c4aa80) at 
/var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:560
#5  0x026c3832 in impala::HdfsOrcScanner::ProcessSplit 
(this=0x12aff400) at /var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:468
#6  0x027de5ae in impala::HdfsScanNode::ProcessSplit (this=0x1095c400, 
filter_ctxs=..., expr_results_pool=0x7f2814ca4410, scan_range=0x13ab44c0, 
scanner_thread_reservation=0x7f2814ca4368) at 
/var/lib/jenkins/impala/be/src/exec/hdfs-scan-node.cc:515
#7  0x027dd783 in impala::HdfsScanNode::ScannerThread (this=0x1095c400, 
first_thread=true, scanner_thread_reservation=8192) at 
/var/lib/jenkins/impala/be/src/exec/hdfs-scan-node.cc:417
#8  0x027dcae0 in impala::HdfsScanNodeoperator()(void) 
const (__closure=0x7f2814ca4b98) at 
/var/lib/jenkins/impala/be/src/exec/hdfs-scan-node.cc:338
#9  0x027df0d4 in 
boost::detail::function::void_function_obj_invoker0,
 void>::invoke(boost::detail::function::function_buffer &) 
(function_obj_ptr=...) at 
/var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
#10 0x01fc944c in boost::function0::operator() 
(this=0x7f2814ca4b90) at 
/var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:771
#11 0x0258a2ff in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) (name=..., category=..., 
functor=..., parent_thread_info=0x7f281169e840, thread_started=0x7f281169d660) 
at /var/lib/jenkins/impala/be/src/util/thread.cc:360
#12 0x02592583 in boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::string const&, std::string const&, boost::function, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0x13557dc0, 
f=@0x13557db8: 0x2589f98 , impala::ThreadDebugInfo const*, 
impala::Promise*)>, a=...) at 
/var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
#13 0x025924a7 in boost::_bi::bind_t, impala::ThreadDebugInfo const*, 
impala::Promise*), 
boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > 
>::operator()() (this=0x13557db8) at 
/var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
#14 0x0259246a in boost::detail::thread_data, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > > >::run() 
(this=0x13557c00) at 
/var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
#15 0x03dc1cba in thread_proxy ()
#16 0x7f28cd127e25 in start_thread () from /lib64/libpthread.so.0
#17 0x7f28c9c8b34d in clone () from /lib64/libc.so.6{code}
However, on the master branch it shows up as a different symptom: IMPALA-11344. 
We will fix that separately.

> The executor has some resident threads that occupy CPU abnormally.
> --
>
> Key: IMPALA-11296
> URL: https://issues.apache.org/jira/browse/IMPALA-11296
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: zhi tang
>Assignee: zhi tang
>Priority: Major
> Attachments: image-2022-05-17-16-40-52-110.png, top_info.png
>
>
> The executor has some resident threads that occupy CPU abnormally. The 
> following is the call stack information of a thread:
> !image-2022-05-17-16-40-52-110.png!

[jira] [Commented] (IMPALA-5845) Impala should de-duplicate row parsing error

2022-06-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552017#comment-17552017
 ] 

ASF subversion and git services commented on IMPALA-5845:
-

Commit 7273cfdfb901b9ef564c2737cf00c7a8abb57f07 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7273cfdfb ]

IMPALA-5845: Limit the number of non-fatal errors logging to INFO

RuntimeState::LogError() does both error aggregation to the coordinator
and logging the error to the log file, depending on the vlog_level. This
can flood the INFO log if the specified vlog_level is 1 and makes it
difficult to analyze other, more significant log lines. This patch limits
the number of errors logged to INFO based on the max_error_logs_per_instance
flag (default is 2000). When this number is exceeded, vlog_level=1 will
be downgraded to vlog_level=2.

To allow easy debugging in the future, this flag will be ignored if the
user sets query option max_errors < 0, in which case all errors
targeting vlog_level 1 will be logged.

This patch also fixes a bug where the error count is not increased for a
non-general error code that is already in the 'error_log_' map.

Testing:
- Add test_logging.py::TestLoggingCore

Change-Id: I924768ec461735c172fbf75d6415033bbdb77f9b
Reviewed-on: http://gerrit.cloudera.org:8080/18565
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
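
For reference, a usage sketch based on the commit message above (the flag and query option names are as stated there; the values are illustrative):

{code}
-- startup flag (illustrative value): cap non-fatal errors logged to INFO
-- per fragment instance, default 2000
--   impalad --max_error_logs_per_instance=2000
-- query option: a negative max_errors disables the cap while debugging
set max_errors=-1;
{code}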


> Impala should de-duplicate row parsing error
> 
>
> Key: IMPALA-5845
> URL: https://issues.apache.org/jira/browse/IMPALA-5845
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Juan Yu
>Assignee: Riza Suminto
>Priority: Major
>  Labels: ramp-up, supportability
> Fix For: Impala 4.2.0
>
>
> The Impala log file grew very quickly with lots of errors like:
>  I0824 10:44:46.527885  8679 runtime-state.cc:217] Error from query 
> 804d64b80df65fda:a5349b07: Error parsing row: file: 
> hdfs://nameservice1/user/hive/tpcds.db/store_sales/5.parq, before offset: 
> 120795952
> There are 622000 errors for only 141 unique files.
> Impala already de-duplicates similar errors in lots of scenarios; could the 
> row parsing errors be de-duplicated as well, to reduce log size and ease 
> troubleshooting?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8011) Allow filtering on virtual column for file name

2022-06-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552019#comment-17552019
 ] 

ASF subversion and git services commented on IMPALA-8011:
-

Commit 23d09638de35dcec6419a5e30df08fd5d8b27e7d in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=23d09638d ]

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of
the data file that stores the actual row. It can be used in several ways;
see the above two Jira tickets for examples. This virtual column is also
needed to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also record whether they refer to a virtual column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column: users
can invoke additional functions on it, filter rows, group by it,
etc.

Special care is needed for virtual columns when column masking/row
filtering is applicable to them. They are added as "hidden" select
list items to the table masking views, which means they are not
expanded by * expressions. They still need to be included in *
expressions, though, when they come from user-written views.

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Reviewed-on: http://gerrit.cloudera.org:8080/18514
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
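
As a usage sketch (the table name is hypothetical; per the commit message the virtual column can be used like any other column):

{code}
-- count rows per data file; 'store_sales' is a hypothetical table
select input__file__name, count(*)
from store_sales
group by input__file__name;
{code}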


> Allow filtering on virtual column for file name
> ---
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Peter Ebert
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: built-in-function
>
> An additional performance enhancement would be the capability to filter on 
> file names using a virtual column. This would be somewhat like the current 
> optimization of sorting data and skipping files based on Parquet metadata, 
> but instead you put something in the file name to indicate that its contents 
> should be filtered.
> For example, say you were writing first names and then searching for them. 
> During the writing phase you put the first letter of each stored name into 
> the file name, so if a file stores Alice, Bob, and Cathy, its name is "ABC". 
> Then, when searching for David, a query could filter on whether 
> INPUT__FILE__NAME contains "D" and skip reading the file.
> Another use: if you had a daily partition and put the timestamp into the 
> file name, you could limit the search to only the last hour even though the 
> partition is daily. This also gives you the ability to sort by another 
> column, making searches even faster on both.
>  
> This requires IMPALA-801.
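A sketch of the file-skipping filter described above (table name, data layout, and file-naming scheme are all hypothetical):

{code}
-- files whose names do not contain 'D' cannot hold a first name starting
-- with D, so the predicate on the virtual column lets the scan skip them
select name
from first_names
where name = 'David'
  and input__file__name like '%D%';
{code}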



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-801) Add function or virtual column for file name

2022-06-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552018#comment-17552018
 ] 

ASF subversion and git services commented on IMPALA-801:


Commit 23d09638de35dcec6419a5e30df08fd5d8b27e7d in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=23d09638d ]

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of
the data file that stores the actual row. It can be used in several ways;
see the above two Jira tickets for examples. This virtual column is also
needed to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also record whether they refer to a virtual column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column: users
can invoke additional functions on it, filter rows, group by it,
etc.

Special care is needed for virtual columns when column masking/row
filtering is applicable to them. They are added as "hidden" select
list items to the table masking views, which means they are not
expanded by * expressions. They still need to be included in *
expressions, though, when they come from user-written views.

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Reviewed-on: http://gerrit.cloudera.org:8080/18514
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add function or virtual column for file name
> 
>
> Key: IMPALA-801
> URL: https://issues.apache.org/jira/browse/IMPALA-801
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 1.2.3
>Reporter: Udai Kiran Potluri
>Assignee: Zoltán Borók-Nagy
>Priority: Minor
>  Labels: built-in-function, impala-iceberg, ramp-up
>
> Hive can list the data files in a table. For example, the following query lists 
> all the data files for the table or partition:
> {noformat}
> select INPUT__FILE__NAME, count(*) from <table_name> where dt='20140210' 
> group by INPUT__FILE__NAME;
> {noformat}
> This has two advantages over the existing "show files" functionality:
> * The output can be used in arbitrary SQL statements.
> * You can see which record came from which file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST

2022-06-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552016#comment-17552016
 ] 

ASF subversion and git services commented on IMPALA-10057:
--

Commit 97d3b25be3d32c5b3b10e5785cfb32351c4065b0 in impala's branch 
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=97d3b25be ]

IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT

As 4.1.0 has been released, this commit updates the master to 4.2.0.
This step needs to happen on each release; related changes are
IMPALA-10198 and IMPALA-10057.

Testing:
 - Ran a build

Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70
Reviewed-on: http://gerrit.cloudera.org:8080/18595
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Impala Public Jenkins 


> TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
> --
>
> Key: IMPALA-10057
> URL: https://issues.apache.org/jira/browse/IMPALA-10057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>  Labels: flaky
> Fix For: Impala 4.2.0
>
>
> For both the normal tests and the docker-based tests, the Impala logs 
> generated during FE_TEST/JDBC_TEST can be huge:
>  
> {noformat}
> $ du -c -h fe_test/ee_tests
> 4.0K  fe_test/ee_tests/minidumps/statestored
> 4.0K  fe_test/ee_tests/minidumps/impalad
> 4.0K  fe_test/ee_tests/minidumps/catalogd
> 16K   fe_test/ee_tests/minidumps
> 352K  fe_test/ee_tests/profiles
> 81G   fe_test/ee_tests
> 81G   total{noformat}
> Creating a tarball of these logs takes 10 minutes. The Impalad/catalogd logs 
> are filled with this error over and over:
> {noformat}
> E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected 
> exception thrown
> Java exception follows:
> java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: 
> org/apache/impala/common/TransactionKeepalive$HeartbeatContext
>   at 
> org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/impala/common/TransactionKeepalive$HeartbeatContext
>   ... 2 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.impala.common.TransactionKeepalive$HeartbeatContext
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 2 more{noformat}
> Two interesting points:
>  # The frontend/jdbc tests are passing, so all of these errors in the impalad 
> logs are not impacting tests.
>  # These errors aren't concurrent with any of the other tests (ee tests, 
> custom cluster tests, etc.).
> This is happening on normal core runs (including the GVO job that does 
> FE_TEST/JDBC_TEST) on both Ubuntu and Centos 7. It is also happening on 
> docker-based tests. A theory is that FE_TEST/JDBC_TEST have an Impala cluster 
> running and then invoke maven to run the tests. Maven could manipulate jars 
> while Impala is running. Maybe there is a race condition or conflict when 
> manipulating those jars that could cause the NoClassDefFoundError. It makes 
> no sense for Impala not to be able to find 
> TransactionKeepalive$HeartbeatContext.
> When it happens, it is in a tight loop, printing the message more than once 
> per millisecond. It fills the ERROR, WARNING, and INFO logs with that 
> message, sometimes for multiple Impalads and/or catalogd.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11338) Update Impala version to 4.2.0-SNAPSHOT

2022-06-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552014#comment-17552014
 ] 

ASF subversion and git services commented on IMPALA-11338:
--

Commit 97d3b25be3d32c5b3b10e5785cfb32351c4065b0 in impala's branch 
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=97d3b25be ]

IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT

As 4.1.0 has been released, this commit updates the master to 4.2.0.
This step needs to happen on each release; related changes are
IMPALA-10198 and IMPALA-10057.

Testing:
 - Ran a build

Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70
Reviewed-on: http://gerrit.cloudera.org:8080/18595
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Impala Public Jenkins 


> Update Impala version to 4.2.0-SNAPSHOT
> ---
>
> Key: IMPALA-11338
> URL: https://issues.apache.org/jira/browse/IMPALA-11338
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Tamas Mate
>Priority: Minor
> Fix For: Impala 4.2.0
>
>
> With the release of 4.1.0, we should update the master to version 4.2.0.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10198) Unify Java components into a single maven project

2022-06-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552015#comment-17552015
 ] 

ASF subversion and git services commented on IMPALA-10198:
--

Commit 97d3b25be3d32c5b3b10e5785cfb32351c4065b0 in impala's branch 
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=97d3b25be ]

IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT

As 4.1.0 has been released, this commit updates the master to 4.2.0.
This step needs to happen on each release; related changes are
IMPALA-10198 and IMPALA-10057.

Testing:
 - Ran a build

Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70
Reviewed-on: http://gerrit.cloudera.org:8080/18595
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Impala Public Jenkins 


> Unify Java components into a single maven project
> -
>
> Key: IMPALA-10198
> URL: https://issues.apache.org/jira/browse/IMPALA-10198
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.0.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> Currently, there are multiple maven projects in Impala's source. Each one is 
> built separately with a separate maven invocation, while sharing a parent pom 
> (impala-parent/pom.xml). This requires artificial CMake dependencies to avoid 
> concurrent maven invocations (e.g. 
> [https://github.com/apache/impala/commit/4c3f701204f92f8753cf65a97fe4804d1f77bc08]).
>  
> We should unify the Java projects into a single project with submodules. This 
> allows a single maven invocation, makes it easier to add new Java 
> submodules, and fixes the "mvn versions:set" command.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org