[jira] [Commented] (IMPALA-10756) Catalog failed to load metadata.
[ https://issues.apache.org/jira/browse/IMPALA-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552571#comment-17552571 ]

Maarten Wullink commented on IMPALA-10756:
------------------------------------------

Has this not been fixed in the just-released 4.1.0 version?

> Catalog failed to load metadata.
> --------------------------------
>
>                 Key: IMPALA-10756
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10756
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0.0
>         Environment: System: CentOS Linux release 7.9.2009 (Core)
>                      impala version: 4.0
>                      hive version: hive 3.1.2
>            Reporter: zhi tang
>            Priority: Major
>
> The Catalog throws an "Invalid method name: 'get_database_req'" exception when it loads the metadata. Details of the exception:
> {noformat}
> E0619 17:29:46.031193 301062 CatalogServiceCatalog.java:2614] Error executing getDatabase() metastore call: default
> Java exception follows:
> org.apache.thrift.TApplicationException: Invalid method name: 'get_database_req'
>         at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database_req(ThriftHiveMetastore.java:1337)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database_req(ThriftHiveMetastore.java:1324)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1940)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1924)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208)
>         at com.sun.proxy.$Proxy11.getDatabase(Unknown Source)
>         at org.apache.impala.catalog.CatalogServiceCatalog.invalidateTable(CatalogServiceCatalog.java:2608)
>         at org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:4558)
>         at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:187)
> E0619 17:29:46.036005 301062 catalog-server.cc:159] TableNotFoundException: Table not found: default.count_test
> E0619 17:29:55.036509 301062 CatalogServiceCatalog.java:2614] Error executing getDatabase() metastore call: default
> Java exception follows:
> org.apache.thrift.TApplicationException: Invalid method name: 'get_database_req'
>         [stack trace identical to the one above]
> E0619 17:29:55.036792 301062 catalog-server.cc:159] TableNotFoundException: Table not found: default.count_test
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
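The failure above appears to be a Thrift API mismatch: the metastore client bundled with Impala 4.0 invokes the newer get_database_req RPC, and a stock Hive 3.1.2 metastore has no handler for that method name, so the server answers with a TApplicationException. A minimal sketch of the general compatibility pattern (try the newer RPC, fall back to the legacy call). All class and function names here are hypothetical illustrations, not Impala's actual code:

```python
class UnknownMethodError(Exception):
    """Stand-in for org.apache.thrift.TApplicationException (UNKNOWN_METHOD)."""

class OldMetastore:
    """Hypothetical server that only speaks the older metastore API."""
    def get_database(self, name):
        # Legacy single-argument RPC, present in old and new servers.
        return {"name": name, "location": "/warehouse/%s.db" % name}

    def get_database_req(self, request):
        # An old server has no handler registered for this method name.
        raise UnknownMethodError("Invalid method name: 'get_database_req'")

def get_database_compat(client, name):
    """Prefer the newer request-object RPC; degrade to the legacy call
    when the server reports it does not know the method."""
    try:
        return client.get_database_req({"name": name})
    except UnknownMethodError:
        return client.get_database(name)
```

In practice the usual fix is to point Impala at a Hive Metastore at least as new as the client API it was built against; the fallback above only illustrates why the call chain in the stack trace dies inside recv_get_database_req.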
[jira] [Commented] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs
[ https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552513#comment-17552513 ]

Quanlong Huang commented on IMPALA-11344:
-----------------------------------------

[~tangzhi] Do you want to take this? As with what you did in IMPALA-11296, we just need to fix the code in OrcStructReader::TopLevelReadValueBatch().

> Selecting only the missing fields of ORC files should return NULLs
> ------------------------------------------------------------------
>
>                 Key: IMPALA-11344
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11344
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Critical
>              Labels: newbie, ramp-up
>
> While looking into the bug of IMPALA-11296, I found a bug in the same scenario (scanning only the missing columns of ORC files) in the current master branch.
> Create an ORC table with fields missing from the underlying files:
> {code:sql}
> hive> create external table missing_field_orc (f0 int) stored as orc;
> hive> insert into table missing_field_orc select 1;
> hive> alter table missing_field_orc add columns (f1 int);
> hive> select f1 from missing_field_orc;
> +------+
> | f1   |
> +------+
> | NULL |
> +------+
> hive> select f0, f1 from missing_field_orc;
> +-----+------+
> | f0  | f1   |
> +-----+------+
> | 1   | NULL |
> +-----+------+
> {code}
> Run the same queries in Impala:
> {code:sql}
> impala> VERSION;
> Shell version: impala shell build version not available
> Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)
> impala> invalidate metadata missing_field_orc;
> impala> select f1 from missing_field_orc;
> ERROR: Parse error in possibly corrupt ORC file: 'hdfs://localhost:20500/test-warehouse/missing_field_orc/00_0'. No columns found for this scan.
> impala> select f0, f1 from missing_field_orc;
> +----+------+
> | f0 | f1   |
> +----+------+
> | 1  | NULL |
> +----+------+
> {code}
> When selecting only the column 'f1', the query fails with an error. It should return NULL.
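For reference, the behavior Hive shows above is the expected one: a column that exists in the table schema but not in the file schema (because it was added later via ALTER TABLE) should materialize as NULL, even when it is the only column scanned. A toy Python sketch of that projection rule (hypothetical helper names, not the actual C++ code in OrcStructReader::TopLevelReadValueBatch()):

```python
def scan_batch(file_rows, file_columns, selected_columns):
    """Project selected columns out of file rows. Columns present in the
    table schema but missing from the file schema (schema evolution)
    yield NULL (None) instead of failing -- even when *every* selected
    column is missing from the file."""
    out = []
    for row in file_rows:
        record = dict(zip(file_columns, row))
        # dict.get returns None (NULL) for columns the file never had.
        out.append(tuple(record.get(col) for col in selected_columns))
    return out
```

With the example table, the file holds one row `(1,)` under schema `(f0)`; selecting only `f1` should produce a single all-NULL row rather than an error.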
[jira] [Updated] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs
[ https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-11344:
------------------------------------
    Labels: ramp-up  (was: )
[jira] [Updated] (IMPALA-11344) Selecting only the missing fields of ORC files should return NULLs
[ https://issues.apache.org/jira/browse/IMPALA-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-11344:
------------------------------------
    Labels: newbie ramp-up  (was: ramp-up)
[jira] [Commented] (IMPALA-11296) The executor has some resident threads that occupy CPU abnormally.
[ https://issues.apache.org/jira/browse/IMPALA-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552506#comment-17552506 ]

Quanlong Huang commented on IMPALA-11296:
-----------------------------------------

[~tangzhi]'s patch is under review: https://gerrit.cloudera.org/c/18571/
CC [~boroknagyz]

> The executor has some resident threads that occupy CPU abnormally.
> ------------------------------------------------------------------
>
>                 Key: IMPALA-11296
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11296
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.4.0
>            Reporter: zhi tang
>            Assignee: zhi tang
>            Priority: Major
>         Attachments: image-2022-05-17-16-40-52-110.png, top_info.png
>
> The executor has some resident threads that abnormally occupy CPU. The call stack of one such thread is shown in the attached screenshot:
> !image-2022-05-17-16-40-52-110.png!
[jira] [Commented] (IMPALA-11160) TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
[ https://issues.apache.org/jira/browse/IMPALA-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552504#comment-17552504 ]

Quanlong Huang commented on IMPALA-11160:
-----------------------------------------

[~csringhofer] Nice finding! It seems to be a bug in local catalog mode: the coordinator can get partially updated partition metadata. As we can see, it has the correct #Rows, #Files, and Size, but not the incrementalness. I guess setting sync_ddl=1 slows the process down enough to work around this.

> TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-11160
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11160
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: broken-build
>
> h3. Error Message
> {noformat}
> query_test/test_acid.py:220: in test_acid_compute_stats
>     self.run_test_case('QueryTest/acid-compute-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:718: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:554: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
>     assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E     '1',1,1,'2B','NOT CACHED','NOT CACHED',regex:.*,'true',regex:.* != '1',1,1,'2B','NOT CACHED','NOT CACHED','TEXT','false','hdfs://192.168.124.1:20500/test-warehouse/managed/test_acid_compute_stats_69ccf940.db/pt/p=1'
> E     'Total',1,1,'2B','0B','','','','' == 'Total',1,1,'2B','0B','','','',''
> {noformat}
> h3. Stacktrace
> {noformat}
> [identical to the Error Message above]
> {noformat}
> It happened in https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5359/
> *Please click on "Don't keep this build forever" once this issue is resolved*
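In the assertion above, the `regex:.*` tokens in the expected row are placeholders the test framework matches against the actual values; the row fails on the literal fields ('true' vs 'false'), not on the regex ones. A small sketch of that comparison style (hypothetical helper; the real logic lives in common/test_result_verifier.py):

```python
import re

def row_matches(expected, actual):
    """Compare two result rows field by field. An expected field of the
    form 'regex:<pattern>' matches when the pattern fully matches the
    actual field; all other fields must compare equal."""
    if len(expected) != len(actual):
        return False
    for exp, act in zip(expected, actual):
        if isinstance(exp, str) and exp.startswith("regex:"):
            if not re.fullmatch(exp[len("regex:"):], str(act)):
                return False
        elif exp != act:
            return False
    return True
```

Under this rule the failing row mismatches purely on the incremental-stats flag, which is consistent with the partially-updated-partition-metadata theory in the comment above.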
[jira] [Comment Edited] (IMPALA-11160) TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
[ https://issues.apache.org/jira/browse/IMPALA-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552501#comment-17552501 ]

Quanlong Huang edited comment on IMPALA-11160 at 6/10/22 2:15 AM:
------------------------------------------------------------------

Saw this again in https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5814

was (Author: stiga-huang):
Saw this again in https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5814
[jira] [Commented] (IMPALA-11160) TestAcid.test_acid_compute_stats failed in ubuntu-16.04-dockerised-tests
[ https://issues.apache.org/jira/browse/IMPALA-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552501#comment-17552501 ]

Quanlong Huang commented on IMPALA-11160:
-----------------------------------------

Saw this again in https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5814
[jira] [Updated] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
[ https://issues.apache.org/jira/browse/IMPALA-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Tran updated IMPALA-11260:
----------------------------------
    Attachment: (was: image-2022-06-09-16-17-39-445.png)

> Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-11260
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11260
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, Impala 3.4.0, Impala 3.4.1
>            Reporter: Quanlong Huang
>            Priority: Critical
>
> When running local catalog mode on Java11, the Ehcache sizeof lib complains that cache sizes may be underestimated:
> {code:java}
> W0421 20:50:44.238312  9819 ObjectGraphWalker.java:251] 744e548159a57cb5:879ee74c] The JVM is preventing Ehcache from accessing the subgraph beneath 'final jdk.internal.loader.URLClassPath jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be underestimated as a result
> Java exception follows:
> java.lang.reflect.InaccessibleObjectException: Unable to make field final jdk.internal.loader.URLClassPath jdk.internal.loader.ClassLoaders$AppClassLoader.ucp accessible: module java.base does not "opens jdk.internal.loader" to unnamed module @6ba7383d
>         at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
>         at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
>         at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176)
>         at java.base/java.lang.reflect.Field.setAccessible(Field.java:170)
>         at org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245)
>         at org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204)
>         at org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159)
>         at org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:1999)
>         at com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2010)
>         at com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2956)
>         at com.google.common.cache.LocalCache.replace(LocalCache.java:4258)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:540)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1056)
>         at org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:87)
>         at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:107)
>         at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:127)
>         at org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:310)
>         at org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:165)
>         at org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:141)
>         at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2014)
>         at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1926)
>         at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1750)
>         at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164){code}
> Similar errors on other classes:
> {code}
> The JVM is preventing Ehcache from accessing the subgraph beneath 'final jdk.internal.loader.AbstractClassLoaderValue jdk.internal.loader.AbstractClassLoaderValue$Sub.this$0' - cache sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'final jdk.internal.loader.URLClassPath jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private final java.lang.Object jdk.internal.loader.AbstractClassLoaderValue$Sub.key' - cache sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private final java.lang.String java.lang.module.Configuration.targetPlatform' - cache sizes may be underestimated as a result
> The JVM is preventing Ehcache from accessing the subgraph beneath 'private final java.lang.String java.lang.module.ModuleDescriptor.mainClass' - cache sizes may be underestimated as a result
> The JVM is preventing Ehcache from acc
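What the warning means operationally: Ehcache's reflective object-graph walk is denied access to JDK-internal fields under the Java 9+ module system, so the walk can only skip those subgraphs, and the cache weigher believes entries are smaller than they really are. A toy sketch of that failure mode (all names hypothetical, not the actual CatalogdMetaProvider$SizeOfWeigher code):

```python
class InaccessibleFieldError(Exception):
    """Stand-in for java.lang.reflect.InaccessibleObjectException."""

def weigh(entry, deep_size_of, fallback_bytes=1024):
    """Weigh a cache entry by deep size; when the reflective walk is
    blocked, fall back to a small fixed weight -- which is exactly how
    large entries end up under-counted against the cache capacity."""
    try:
        return deep_size_of(entry)
    except InaccessibleFieldError:
        return fallback_bytes

def blocked_walk(entry):
    # Simulates the module system denying setAccessible() on a field.
    raise InaccessibleFieldError(
        'module java.base does not "opens jdk.internal.loader"')
```

A common mitigation on Java 9+ is to open the offending packages to reflection with JVM flags such as `--add-opens java.base/jdk.internal.loader=ALL-UNNAMED`, though that trades away module encapsulation and whether Impala should require it is a separate question.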
[jira] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
[ https://issues.apache.org/jira/browse/IMPALA-11260 ]

Vincent Tran deleted comment on IMPALA-11260:
---------------------------------------------

was (Author: thundergun):
In the same way that this can hang {*}ImpalaServer::Start(){*}, this can also hang queries in {*}ImpalaServer::ExecuteInternal(){*}, since the call over JNI never returns because the Java threads in *org.apache.impala.service.JniFrontend.getCatalogMetrics()* are stuck here waiting for a *Future* that will never complete:
{noformat}
"Thread-17" #59 prio=5 os_prio=0 cpu=1404.89ms elapsed=5312.17s tid=0x0acee000 nid=0xae02 waiting on condition  [0x7f6443484000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method)
        - parking to wait for <0x7f6faf520b78> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194)
        at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.14/CompletableFuture.java:1796)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.14/ForkJoinPool.java:3128)
        at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.14/CompletableFuture.java:1823)
        at java.util.concurrent.CompletableFuture.get(java.base@11.0.14/CompletableFuture.java:1998)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509)
        at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:649)
        at org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170)
        at org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155)
        at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:779)
        at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221)

   Locked ownable synchronizers:
        - None
{noformat}
[jira] [Comment Edited] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
[ https://issues.apache.org/jira/browse/IMPALA-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552417#comment-17552417 ] Vincent Tran edited comment on IMPALA-11260 at 6/9/22 8:24 PM: --- In the same way that this can hang {*}ImpalaServer::Start(){*}, this can also hang queries in {*}ImpalaServer::ExecuteInternal(){*}, since the call over JNI never returns because the Java threads in *org.apache.impala.service.JniFrontend.getCatalogMetrics()* are stuck here waiting for a *Future* that will never complete: {noformat} "Thread-17" #59 prio=5 os_prio=0 cpu=1404.89ms elapsed=5312.17s tid=0x0acee000 nid=0xae02 waiting on condition [0x7f6443484000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method) - parking to wait for <0x7f6faf520b78> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194) at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.14/CompletableFuture.java:1796) at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.14/ForkJoinPool.java:3128) at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.14/CompletableFuture.java:1823) at java.util.concurrent.CompletableFuture.get(java.base@11.0.14/CompletableFuture.java:1998) at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:237) at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:509) at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTableList(CatalogdMetaProvider.java:649) at org.apache.impala.catalog.local.LocalDb.loadTableNames(LocalDb.java:170) at org.apache.impala.catalog.local.LocalDb.getAllTableNames(LocalDb.java:155) at org.apache.impala.service.Frontend.getCatalogMetrics(Frontend.java:779) at org.apache.impala.service.JniFrontend.getCatalogMetrics(JniFrontend.java:221) Locked 
ownable synchronizers: - None {noformat} > Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+ > - > > Key:
IMPALA-11260 > URL: https://issues.apache.org/jira/browse/IMPALA-11260 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, > Impala 3.4.0, Impala 3.4.1 >Reporter: Quanlong Huang >Priority: Critical > Attachments: image-2022-06-09-16-17-39-445.png > > > When running local catalog mode on Java11, the Ehcache sizeof lib complains > that cache sizes may be
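The stack in the comment above shows threads parked in an uninterruptible get() on a cache-load Future that nothing will ever complete. A minimal sketch of that blocking pattern (the class and method names here are hypothetical stand-ins, not Impala's actual code; Impala additionally wraps the wait in Guava's Uninterruptibles.getUninterruptibly(), so it cannot even be interrupted):

```java
import java.util.concurrent.CompletableFuture;

public class CacheLoadHang {
    // Hypothetical stand-in for the per-key future the cache stores: the thread
    // that starts a load is expected to complete it; if completion is skipped
    // (e.g. an exception escapes before complete()), every waiter parks forever.
    static final CompletableFuture<String> pendingLoad = new CompletableFuture<>();

    static String loadWithCaching() throws Exception {
        // Every caller blocks here until someone calls pendingLoad.complete(...).
        return pendingLoad.get();
    }

    public static void main(String[] args) throws Exception {
        // Healthy path: some thread eventually completes the future. The hang
        // reported above is exactly this program with the line below removed.
        new Thread(() -> pendingLoad.complete("table-list")).start();
        System.out.println(loadWithCaching());  // prints "table-list"
    }
}
```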
[jira] [Updated] (IMPALA-11260) Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+
[ https://issues.apache.org/jira/browse/IMPALA-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Tran updated IMPALA-11260: -- Attachment: image-2022-06-09-16-17-39-445.png > Catalog cache item sizes of CatalogdMetaProvider are underestimated on Java9+ > - > > Key: IMPALA-11260 > URL: https://issues.apache.org/jira/browse/IMPALA-11260 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 4.0.0, Impala 3.3.0, > Impala 3.4.0, Impala 3.4.1 >Reporter: Quanlong Huang >Priority: Critical > Attachments: image-2022-06-09-16-17-39-445.png > > > When running local catalog mode on Java11, the Ehcache sizeof lib complains > that cache sizes may be underestimated: > {code:java} > W0421 20:50:44.238312 9819 ObjectGraphWalker.java:251] > 744e548159a57cb5:879ee74c] The JVM is preventing Ehcache from > accessing the subgraph beneath 'final jdk.internal.loader.URLClassPath > jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be > underestimated as a result > Java exception follows: > java.lang.reflect.InaccessibleObjectException: Unable to make field final > jdk.internal.loader.URLClassPath > jdk.internal.loader.ClassLoaders$AppClassLoader.ucp accessible: module > java.base does not "opens jdk.internal.loader" to unnamed module @6ba7383d > at > java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340) > at > java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280) > at > java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:176) > at java.base/java.lang.reflect.Field.setAccessible(Field.java:170) > at > org.ehcache.sizeof.ObjectGraphWalker.getAllFields(ObjectGraphWalker.java:245) > at > org.ehcache.sizeof.ObjectGraphWalker.getFilteredFields(ObjectGraphWalker.java:204) > at > org.ehcache.sizeof.ObjectGraphWalker.walk(ObjectGraphWalker.java:159) > at 
org.ehcache.sizeof.SizeOf.deepSizeOf(SizeOf.java:74) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$SizeOfWeigher.weigh(CatalogdMetaProvider.java:1999) > at > com.google.common.cache.LocalCache$Segment.setValue(LocalCache.java:2010) > at > com.google.common.cache.LocalCache$Segment.replace(LocalCache.java:2956) > at com.google.common.cache.LocalCache.replace(LocalCache.java:4258) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:540) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadIcebergApiTable(CatalogdMetaProvider.java:1056) > at > org.apache.impala.catalog.local.LocalIcebergTable.loadIcebergTableViaMetaProvider(LocalIcebergTable.java:87) > at > org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:107) > at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:127) > at > org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:310) > at > org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:165) > at > org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:141) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2014) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1926) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1750) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164){code} > Similar errors on other classes: > {code} > The JVM is preventing Ehcache from accessing the subgraph beneath 'final > jdk.internal.loader.AbstractClassLoaderValue > jdk.internal.loader.AbstractClassLoaderValue$Sub.this$0' - cache sizes may be > underestimated as a result > The JVM is preventing Ehcache from accessing the subgraph beneath 'final > jdk.internal.loader.URLClassPath > jdk.internal.loader.ClassLoaders$AppClassLoader.ucp' - cache sizes may be > underestimated as a result > The JVM is 
preventing Ehcache from accessing the subgraph beneath 'private > final java.lang.Object jdk.internal.loader.AbstractClassLoaderValue$Sub.key' > - cache sizes may be underestimated as a result > The JVM is preventing Ehcache from accessing the subgraph beneath 'private > final java.lang.String java.lang.module.Configuration.targetPlatform' - cache > sizes may be underestimated as a result > The JVM is preventing Ehcache from accessing the subgraph beneath 'private > final java.lang.String java.lang.module.ModuleDescriptor.mainClass' - cache > sizes may be underestimated as a
[jira] [Work started] (IMPALA-10453) Support file/partition pruning via runtime filters on Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10453 started by Tamas Mate. --- > Support file/partition pruning via runtime filters on Iceberg > - > > Key: IMPALA-10453 > URL: https://issues.apache.org/jira/browse/IMPALA-10453 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tamas Mate >Priority: Major > Labels: iceberg, impala-iceberg, performance > > This is a placeholder to figure out what we'd need to do to support dynamic > file-level pruning in Iceberg using runtime filters, i.e. have parity with > partition pruning. > * If there is a single partition value per file, then applying bloom filters > to the row group stats would be effective at pruning files. > * If there are partition transforms, e.g. hash-based, then I think we > probably need to track the partition that the file is associated with and > then have some custom logic in the parquet scanner to do partition pruning. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
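The first bullet above — one partition value per file, checked against a runtime filter — can be sketched roughly as follows. This is a hypothetical simplification: the DataFile type and the predicate stand in for Impala's real bloom/min-max runtime filters, which are applied in the scan node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class IcebergFilePruner {
    // Hypothetical model of a data file that carries exactly one partition value.
    record DataFile(String path, int partitionValue) {}

    // Drop every file whose single partition value is rejected by the runtime
    // filter built from the join's build side; only surviving files are scanned.
    static List<DataFile> prune(List<DataFile> files, IntPredicate runtimeFilter) {
        List<DataFile> kept = new ArrayList<>();
        for (DataFile f : files) {
            if (runtimeFilter.test(f.partitionValue)) kept.add(f);
        }
        return kept;
    }
}
```

For hash or other partition transforms (the second bullet), the file-level value alone is not enough, which is why the issue suggests tracking the source partition per file instead.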
[jira] [Assigned] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
[ https://issues.apache.org/jira/browse/IMPALA-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer reassigned IMPALA-10267: Assignee: Csaba Ringhofer (was: Qifan Chen) > Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples > - > > Key: IMPALA-10267 > URL: https://issues.apache.org/jira/browse/IMPALA-10267 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0.0 >Reporter: Joe McDonnell >Assignee: Csaba Ringhofer >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0.0 > > > An exhaustive job hit two Impalad crashes with the following stack: > {noformat} > 2 impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9 > rbx = 0x rbp = 0x7f82f98ad7a0 > rsp = 0x7f82f98ad6a0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x05209129 > Found by: call frame info > 3 impalad!impala::HdfsScanner::WriteTemplateTuples(impala::TupleRow*, int) > [hdfs-scanner.cc : 235 + 0xf] > rbx = 0x rbp = 0x7f82f98ad7a0 > rsp = 0x7f82f98ad6b0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02802013 > Found by: call frame info > 4 impalad!impala::HdfsAvroScanner::ProcessRange(impala::RowBatch*) > [hdfs-avro-scanner.cc : 553 + 0x19] > rbx = 0x0400 rbp = 0x7f82f98adc60 > rsp = 0x7f82f98ad7b0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x0283880d > Found by: call frame info > 5 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) > [base-sequence-scanner.cc : 189 + 0x2b] > rbx = 0x rbp = 0x7f82f98adf40 > rsp = 0x7f82f98adc70 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x029302b5 > Found by: call frame info > 6 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 143 + 0x39] > rbx = 0x0292fbd4 rbp = 0x7f82f98ae000 > rsp = 0x7f82f98adf50 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x028011c9 > Found by: call frame info 
> 7 > impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, > impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28] > rbx = 0x8000 rbp = 0x7f82f98ae390 > rsp = 0x7f82f98ae010 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x0297aa3d > Found by: call frame info > 8 impalad!impala::HdfsScanNode::ScannerThread(bool, long) > [hdfs-scan-node.cc : 418 + 0x27] > rbx = 0x0001abc6a760 rbp = 0x7f82f98ae750 > rsp = 0x7f82f98ae3a0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02979dbe > Found by: call frame info > 9 > impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()() > const + 0x30 > rbx = 0x0bbf rbp = 0x7f82f98ae770 > rsp = 0x7f82f98ae760 r12 = 0x08e18f40 > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02979126 > Found by: call frame info{noformat} > This seems to happen when running > query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes on > Avro. In reading the code in HdfsAvroScanner::ProcessRange(), it seems > impossible for this value to be negative, so it's unclear what is happening.
[jira] [Commented] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
[ https://issues.apache.org/jira/browse/IMPALA-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552258#comment-17552258 ] Csaba Ringhofer commented on IMPALA-10267: -- My theory on what happens here is that there is an error during HdfsAvroScanner::ProcessRange(), but we actually still continue scanning: https://github.com/apache/impala/blob/23d09638de35dcec6419a5e30df08fd5d8b27e7d/be/src/exec/base-sequence-scanner.cc#L190 For example, first we set num_records_in_block_ to a lower value than the last record_pos_, and then fail here: https://github.com/apache/impala/blob/23d09638de35dcec6419a5e30df08fd5d8b27e7d/be/src/exec/hdfs-avro-scanner.cc#L512 The next time we call ProcessRange() we will assume that everything is ok, but num_records_in_block_ will be less than record_pos_.
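The theory above — num_records_in_block_ ending up below record_pos_ after a failed block — can be illustrated with a toy model. The field names mirror the scanner's counters, but this is a hypothetical simplification, not Impala code:

```java
class AvroBlockState {
    // Toy mirrors of the two HdfsAvroScanner counters; names only, no real logic.
    long numRecordsInBlock = 0;
    long recordPos = 0;

    // The count that would flow into WriteTemplateTuples() as num_tuples.
    long remainingRecords() { return numRecordsInBlock - recordPos; }
}

class NegativeTuplesDemo {
    public static void main(String[] args) {
        AvroBlockState s = new AvroBlockState();
        s.recordPos = 90;  // 90 records already consumed in the current block

        // Per the theory: a corrupt block header reports a record count *below*
        // the current position, the decode then fails, but scanning resumes
        // with this stale state instead of aborting the range.
        s.numRecordsInBlock = 50;

        // The next ProcessRange() call would compute a negative tuple count,
        // which is what trips the DCHECK in WriteTemplateTuples().
        System.out.println(s.remainingRecords());  // prints -40
    }
}
```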
[jira] [Updated] (IMPALA-11280) Zipping unnest hits DCHECK when querying from a view that has an IN operator
[ https://issues.apache.org/jira/browse/IMPALA-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-11280: -- Description: *Repro steps:* 1) Create a view that returns arrays and has an IN operator in the WHERE clause: {code:java} drop view if exists unnest_bug_view; create view unnest_bug_view as ( select id, arr1, arr2 from functional_parquet.complextypes_arrays where id % 2 = 1 and id in (select id from functional_parquet.alltypestiny) ); {code} 2) Unnest the arrays and filter by the unnested values in an outer SELECT: {code:java} select id, unnested_arr1, unnested_arr2 from (select id, unnest(arr1) as unnested_arr1, unnest(arr2) as unnested_arr2 from unnest_bug_view) a where a.unnested_arr1 < 5; {code} This hits a DCHECK in RowDescriptor::GetTupleIdx() {code:java} descriptors.cc:467] 5643fd6cdd5cece3:77942ead] Check failed: id < tuple_idx_map_.size() (3 vs. 2) RowDescriptor: Tuple(id=0 size=29 slots=[Slot(id=2 type=INT col_path=[0] offset=24 null=(offset=28 mask=4) slot_idx=2 field_idx=2), Slot(id=3 type=ARRAY col_path=[1] children_tuple_id=3 offset=0 null=(offset=28 mask=1) slot_idx=0 field_idx=0), Slot(id=5 type=ARRAY col_path=[2] children_tuple_id=4 offset=12 null=(offset=28 mask=2) slot_idx=1 field_idx=1)] tuple_path=[]) Tuple(id=1 size=5 slots=[Slot(id=0 type=INT col_path=[2] offset=0 null=(offset=4 mask=1) slot_idx=0 field_idx=0)] tuple_path=[]) *** Check failure stack trace: *** @ 0x36fe72c google::LogMessage::Fail() @ 0x36fffdc google::LogMessage::SendToLog() @ 0x36fe08a google::LogMessage::Flush() @ 0x3701c48 google::LogMessageFatal::~LogMessageFatal() @ 0x12e47ab impala::RowDescriptor::GetTupleIdx() @ 0x1b378f5 impala::SlotRef::Init() @ 0x1b25fea impala::ScalarExpr::Init() @ 0x1b665b2 impala::ScalarFnCall::Init() @ 0x1b2c44e impala::ScalarExpr::Create() @ 0x1b2c5df impala::ScalarExpr::Create() @ 0x1b2c6a0 impala::ScalarExpr::Create() @ 0x19ad286 impala::PartitionedHashJoinPlanNode::Init() @ 0x18b5d8d 
impala::PlanNode::CreateTreeHelper() @ 0x18b5cd9 impala::PlanNode::CreateTreeHelper() @ 0x18b5e48 impala::PlanNode::CreateTree() @ 0x12f4ca7 impala::FragmentState::Init() @ 0x12f839c impala::FragmentState::CreateFragmentStateMap() @ 0x126cedb impala::QueryState::StartFInstances() @ 0x125c4df impala::QueryExecMgr::ExecuteQueryHelper() {code} Some notes about the repro: - The inside of the select (without filtering on the unnested value) is OK. - If I unnest only one array then this is OK. - If I remove the IN clause from the view’s DDL then the query runs well. {*}Update{*}: I managed to do a repro without creating an actual view. This might reduce the complexity with the tuple/slot IDs for the investigation. {code:java} select id, unnested_arr1, unnested_arr2 from ( select id, unnest(arr1) as unnested_arr1, unnest(arr2) as unnested_arr2 from functional_parquet.complextypes_arrays where id in (select id from functional_parquet.alltypestiny)) a where a.unnested_arr1 < 5 {code}
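The DCHECK itself is a bounds check: the plan produces a SlotRef referencing tuple id 3, while the row descriptor's map only covers tuple ids 0 and 1 ("3 vs. 2"). A toy version of the failing check (a hypothetical simplification of RowDescriptor::GetTupleIdx(), not Impala's actual code):

```java
class RowDescriptorCheck {
    // tupleIdxMap[tupleId] gives that tuple's position in the row; the planner
    // bug yields a SlotRef whose tuple id lies outside this map entirely.
    static int getTupleIdx(int[] tupleIdxMap, int tupleId) {
        if (tupleId >= tupleIdxMap.length) {
            // Stand-in for the DCHECK: id < tuple_idx_map_.size()
            throw new IllegalStateException("Check failed: id < tuple_idx_map_.size() ("
                + tupleId + " vs. " + tupleIdxMap.length + ")");
        }
        return tupleIdxMap[tupleId];
    }
}
```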
[jira] [Comment Edited] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552214#comment-17552214 ] Zoltán Borók-Nagy edited comment on IMPALA-11053 at 6/9/22 2:00 PM: -I was able to quickly fix it.- Opened IMPALA-11346 to track the bug. UPDATE: the fix wasn't correct, still working on it. > Impala should be able to read migrated partitioned Iceberg tables > - > > Key: IMPALA-11053 > URL: https://issues.apache.org/jira/browse/IMPALA-11053 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > When Hive (and probably other engines as well) converts a legacy Hive table > to Iceberg, it doesn't rewrite the data files. > It means that the data files don't have write ids; moreover, they don't have > the partition columns either. > Currently Impala expects the partition columns to be present in the data > files, so it won't be able to read converted partitioned tables. > So we need to inject partition values from the Iceberg metadata, plus resolve > columns correctly (position-based resolution needs an offset).
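The last sentence of the description — inject partition values from Iceberg metadata and offset position-based resolution — might look roughly like this. The helper is hypothetical, and it assumes the partition columns trail the data columns in the table schema, as in legacy Hive tables:

```java
import java.util.List;

class MigratedIcebergResolver {
    // Resolve table column position i for a migrated data file: the file stores
    // only the data columns, so any position past fileRow.size() must be served
    // from the partition values recorded in Iceberg metadata, using an offset.
    static String columnValue(int i, List<String> fileRow, List<String> partitionValues) {
        if (i < fileRow.size()) {
            return fileRow.get(i);                       // ordinary data column, read from file
        }
        return partitionValues.get(i - fileRow.size());  // injected from metadata, with offset
    }
}
```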
[jira] [Comment Edited] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552218#comment-17552218 ] LiPenglin edited comment on IMPALA-11053 at 6/9/22 1:43 PM: Thanks [~boroknagyz]. I cleaned up my code and got the same results as you; sorry for the mistake above. {code:java} [localhost.localdomain:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_bool=true; ERROR: Unable to find SchemaNode for path 'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'. {code} One of the things I recently did was migrate from a Hive table to an Iceberg table. I expected the original Hive partition columns to remain usable in WHERE clauses after the migration. So, is there a solution for these errors on partition column values? UPDATE: I saw IMPALA-11346, that is great!
> Impala should be able to read migrated partitioned Iceberg tables > - > > Key: IMPALA-11053 > URL: https://issues.apache.org/jira/browse/IMPALA-11053 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > When Hive (and probably other engines as well) converts a legacy Hive table > to Iceberg it doesn't rewrite the data files. > This means that the data files don't have write ids; moreover, they don't have > the partition columns either. > Currently Impala expects the partition columns to be present in the data > files, so it won't be able to read converted partitioned tables. > So we need to inject partition values from the Iceberg metadata, plus resolve > columns correctly (position-based resolution needs an offset). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
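The last point of the description — injecting partition values from Iceberg metadata and offsetting position-based column resolution — can be sketched as follows. This is a hypothetical illustration, not Impala's actual code; the function name and schema representation are made up. The idea: a migrated data file holds only the table's non-partition columns in table-schema order, so a position-based resolver must skip over partition columns and source their values from metadata instead.

```python
# Hypothetical sketch: position-based column resolution for a migrated
# Iceberg table. Data files written by the legacy Hive table lack the
# partition columns, so those values must come from Iceberg metadata,
# and positions of the remaining columns must be shifted accordingly.

def file_position(table_schema, partition_cols, col_name):
    """Return the column's position inside the data file, assuming the file
    holds the table's non-partition columns in table-schema order.
    Returns None for partition columns (value injected from metadata)."""
    if col_name in partition_cols:
        return None  # not stored in the file at all
    table_pos = table_schema.index(col_name)
    # Offset: every partition column before this one is absent from the file.
    preceding = sum(1 for c in table_schema[:table_pos] if c in partition_cols)
    return table_pos - preceding

# Columns from the iceberg_alltypes_part example in this thread:
table = ["i", "p_bool", "p_int", "p_bigint"]
parts = {"p_bool", "p_int", "p_bigint"}
print(file_position(table, parts, "i"))       # → 0
print(file_position(table, parts, "p_bool"))  # → None (inject from metadata)
```

With this resolution in place, a predicate like `p_bool=true` is evaluated against the injected metadata value rather than looked up in the file schema, which is exactly the lookup that fails in the errors quoted above.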
[jira] [Assigned] (IMPALA-11346) Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
[ https://issues.apache.org/jira/browse/IMPALA-11346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-11346: -- Assignee: Zoltán Borók-Nagy > Migrated partitioned Iceberg tables might return ERROR when WHERE condition > is used on partition column > --- > > Key: IMPALA-11346 > URL: https://issues.apache.org/jira/browse/IMPALA-11346 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > {noformat} > [localhost:21050] default> select * from > functional_parquet.iceberg_alltypes_part where p_bool=false; > Fetched 0 row(s) in 0.11s > [localhost:21050] default> select * from > functional_parquet.iceberg_alltypes_part where p_bool=true; > ERROR: Unable to find SchemaNode for path > 'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file > 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'. 
> [localhost:21050] default> select * from > functional_parquet.iceberg_alltypes_part where i=3; > Fetched 0 row(s) in 0.12s > [localhost:21050] default> select * from > functional_parquet.iceberg_alltypes_part where i=1; > +---++---+--+---+--+---++--+ > | i | p_bool | p_int | p_bigint | p_float | p_double | p_decimal | > p_date | p_string | > +---++---+--+---+--+---++--+ > | 1 | true | 1 | 11 | 1.1002384 | 2.222 | 123.321 | > 2022-02-22 | impala | > +---++---+--+---+--+---++--+ > Fetched 1 row(s) in 0.12s > [localhost:21050] default> select * from > functional_parquet.iceberg_alltypes_part where p_int=1; > ERROR: Unable to find SchemaNode for path > 'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file > 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'. > [localhost:21050] default> select * from > functional_parquet.iceberg_alltypes_part where p_int=3; > Fetched 0 row(s) in 0.11s{noformat} > So we don't get incorrect results at least, but we do get errors on partition > column values that do exist. > It seems to work well with ORC.
[jira] [Commented] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552214#comment-17552214 ] Zoltán Borók-Nagy commented on IMPALA-11053: I was able to quickly fix it. Opened IMPALA-11346 to track the bug. > Impala should be able to read migrated partitioned Iceberg tables > - > > Key: IMPALA-11053 > URL: https://issues.apache.org/jira/browse/IMPALA-11053 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > When Hive (and probably other engines as well) converts a legacy Hive table > to Iceberg it doesn't rewrite the data files. > This means that the data files don't have write ids; moreover, they don't have > the partition columns either. > Currently Impala expects the partition columns to be present in the data > files, so it won't be able to read converted partitioned tables. > So we need to inject partition values from the Iceberg metadata, plus resolve > columns correctly (position-based resolution needs an offset).
[jira] [Created] (IMPALA-11346) Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
Zoltán Borók-Nagy created IMPALA-11346: -- Summary: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column Key: IMPALA-11346 URL: https://issues.apache.org/jira/browse/IMPALA-11346 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Zoltán Borók-Nagy {noformat} [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_bool=false; Fetched 0 row(s) in 0.11s [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_bool=true; ERROR: Unable to find SchemaNode for path 'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'. [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where i=3; Fetched 0 row(s) in 0.12s [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where i=1; +---++---+--+---+--+---++--+ | i | p_bool | p_int | p_bigint | p_float | p_double | p_decimal | p_date | p_string | +---++---+--+---+--+---++--+ | 1 | true | 1 | 11 | 1.1002384 | 2.222 | 123.321 | 2022-02-22 | impala | +---++---+--+---+--+---++--+ Fetched 1 row(s) in 0.12s [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_int=1; ERROR: Unable to find SchemaNode for path 'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'. 
[localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_int=3; Fetched 0 row(s) in 0.11s{noformat} So we don't get incorrect results at least, but we do get errors on partition column values that do exist. It seems to work well with ORC.
[jira] [Work started] (IMPALA-11293) Add COMPACT command for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-11293 started by Tamas Mate. --- > Add COMPACT command for Iceberg tables > -- > > Key: IMPALA-11293 > URL: https://issues.apache.org/jira/browse/IMPALA-11293 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > > Currently Impala cannot compact Iceberg tables. > The following INSERT OVERWRITE statement could be used in the simple cases, > i.e. when the following conditions meet: > * all data files use the same partition spec (i.e. no partition evolution) > * no bucket partitioning (we currently forbid INSERT OVERWRITE for bucket > partitioning) > {noformat} > INSERT OVERWRITE t SELECT * FROM t;{noformat} > We could have a command that compacts the Iceberg table (syntax needs to be > the same with Hive), e.g.: > {noformat} > ALTER TABLE t EXECUTE compaction();{noformat} > At first, the compact command could be just rewritten to the INSERT OVERWRITE > command, but it would also check that there's no partition evolution. > The "no bucket" partitioning condition could be relaxed in this case, because > the result would be deterministic. I.e. the only condition we need to check > is that there was no partition evolution. > Later, we could do compaction by > {noformat} > TRUNCATE TABLE t; > INSERT INTO t SELECT * FROM t FOR SYSTEM_TIME AS OF ...;{noformat} > Currently time-travel queries are not optimized, but we could workaround it > by doing planning at first of: > {noformat} > Create the plan for: > TRUNCATE TABLE t; > INSERT INTO t SELECT * FROM t;{noformat} > Then execute them: > {noformat} > Actually execute: > TRUNCATE TABLE t; > INSERT INTO t SELECT * FROM t; (no need for time-travel, plan was created > before TRUNCATE){noformat} > This could workaround the planning overhead of time-travel queries. 
> Also, we might add some locking for the table if possible.
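The eligibility conditions listed in IMPALA-11293 can be sketched as a small check. This is an illustrative sketch only — the function name and the string representation of partition transforms are assumptions, not Impala's API. It encodes the two rules from the description: the simple `INSERT OVERWRITE t SELECT * FROM t` rewrite is only safe when all data files share one partition spec (no partition evolution), and the bucket-partitioning restriction applies to plain INSERT OVERWRITE but could be relaxed for a dedicated `EXECUTE compaction()` command, whose result is deterministic.

```python
# Hypothetical sketch of the compaction-eligibility rules described above.
# `file_spec_ids`: the partition-spec id of each data file in the table.
# `spec_transforms`: transform names of the current spec, e.g. "bucket[16]".

def can_compact_with_insert_overwrite(file_spec_ids, spec_transforms,
                                      via_execute_compaction=False):
    if len(set(file_spec_ids)) > 1:
        # Partition evolution: files were written under different specs,
        # so a naive rewrite is not allowed.
        return False
    if not via_execute_compaction and any(
            t.startswith("bucket") for t in spec_transforms):
        # Plain INSERT OVERWRITE is forbidden for bucket partitioning;
        # a dedicated compaction command could relax this.
        return False
    return True

print(can_compact_with_insert_overwrite([1, 1], ["identity"]))      # → True
print(can_compact_with_insert_overwrite([1, 2], ["identity"]))      # → False
print(can_compact_with_insert_overwrite([1, 1], ["bucket[16]"]))    # → False
```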
[jira] [Comment Edited] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552203#comment-17552203 ] Zoltán Borók-Nagy edited comment on IMPALA-11053 at 6/9/22 1:20 PM: Thanks [~LiPenglin] I'm observing a bit different behavior: {noformat} [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_bool=false; Fetched 0 row(s) in 0.11s [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_bool=true; ERROR: Unable to find SchemaNode for path 'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'. [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where i=3; Fetched 0 row(s) in 0.12s [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where i=1; +---++---+--+---+--+---++--+ | i | p_bool | p_int | p_bigint | p_float | p_double | p_decimal | p_date | p_string | +---++---+--+---+--+---++--+ | 1 | true | 1 | 11 | 1.1002384 | 2.222 | 123.321 | 2022-02-22 | impala | +---++---+--+---+--+---++--+ Fetched 1 row(s) in 0.12s [localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_int=1; ERROR: Unable to find SchemaNode for path 'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'. 
[localhost:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_int=3; Fetched 0 row(s) in 0.11s{noformat} So I don't get incorrect results at least, but I get errors on partition column values that do exist. UPDATE: it seems like it works well with ORC.
> Impala should be able to read migrated partitioned Iceberg tables > - > > Key: IMPALA-11053 > URL: https://issues.apache.org/jira/browse/IMPALA-11053 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > When Hive (and probably other engines as well) converts a legacy Hive table > to Iceberg it doesn't rewrite the data files.
[jira] [Resolved] (IMPALA-8011) Allow filtering on virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-8011. --- Fix Version/s: Impala 4.2.0 Resolution: Fixed > Allow filtering on virtual column for file name > --- > > Key: IMPALA-8011 > URL: https://issues.apache.org/jira/browse/IMPALA-8011 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Peter Ebert >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: built-in-function > Fix For: Impala 4.2.0 > > > An additional performance enhancement would be the capability to filter on > file names using a virtual column. This would be somewhat like the current > optimization of sorting data and skipping files based on parquet metadata, > but instead you put something in the file name to indicate that its contents > can be filtered. > For example, say you were writing first names and then searching for them: > during the writing phase you put the first letter of each first name into > the file name. So if I'm storing Alice, Bob, Cathy, my file name is "ABC", > and when running a query you could filter on whether INPUT__FILE__NAME > contains "D" when searching for David, and skip reading the file. > Another use would be if you had a daily partition and put the timestamp > into the file name; you could then limit the search to only the last hour even though > your partition is daily. This also gives you the ability to sort by another > column, making searches even faster on both. > > This requires IMPALA-801
[jira] [Resolved] (IMPALA-801) Add function or virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-801. -- Fix Version/s: Impala 4.2.0 Resolution: Fixed > Add function or virtual column for file name > > > Key: IMPALA-801 > URL: https://issues.apache.org/jira/browse/IMPALA-801 > Project: IMPALA > Issue Type: New Feature > Components: Catalog >Affects Versions: Impala 1.2.3 >Reporter: Udai Kiran Potluri >Assignee: Zoltán Borók-Nagy >Priority: Minor > Labels: built-in-function, impala-iceberg, ramp-up > Fix For: Impala 4.2.0 > > > Hive can list the data files in a table. For example, the following query lists all > the data files for the table or partition: > {noformat} > select INPUT__FILE__NAME, count(*) from <table> where dt='20140210' > group by INPUT__FILE__NAME; > {noformat} > This has two advantages over the existing "show files" functionality: > * The output can be used in arbitrary SQL statements. > * You can see which record came from which file.
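The file-name pruning idea from IMPALA-8011 above — encode a hint in each file's name, then skip files whose hint rules out the predicate — can be illustrated with a tiny sketch. The file names and layout here are made up for illustration; this is not how Impala stores files.

```python
# Illustrative sketch of pruning by file name: each file's name carries the
# first letters of the first names it stores ("ABC" holds Alice, Bob, Cathy).
# A search for a name starting with "D" can then skip files without a "D".

def files_to_scan(file_names, first_letter):
    """Keep only files whose name stem contains the wanted first letter."""
    return [f for f in file_names if first_letter in f.split(".")[0]]

files = ["ABC.parquet", "DEF.parquet", "XYZ.parquet"]
print(files_to_scan(files, "D"))  # → ['DEF.parquet']; the other files are skipped
```

With the INPUT__FILE__NAME virtual column from IMPALA-801, the same pruning can be expressed directly in SQL as a WHERE predicate on the file name, so the scan never opens files that cannot match.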
[jira] [Commented] (IMPALA-11053) Impala should be able to read migrated partitioned Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552144#comment-17552144 ] LiPenglin commented on IMPALA-11053: Hi [~boroknagyz] This is ok when a full table scan is performed. However, the predicate in the WHERE clause does not work. {code:java} --- https://gerrit.cloudera.org/#/c/18240/11/testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-tables.test [localhost.localdomain:21050] default> select * from functional_parquet.iceberg_alltypes_part where p_bool=false; +---++---+--+---+--+---++--+ | i | p_bool | p_int | p_bigint | p_float | p_double | p_decimal | p_date | p_string | +---++---+--+---+--+---++--+ | 1 | true | 1 | 11 | 1.1002384 | 2.222 | 123.321 | 2022-02-22 | impala | | 2 | true | 1 | 11 | 1.1002384 | 2.222 | 123.321 | 2022-02-22 | impala | +---++---+--+---+--+---++--+ Fetched 2 row(s) in 0.13s [localhost.localdomain:21050] default> select * from functional_parquet.iceberg_alltypes_part where i=3; +---++---+--+---+--+---++--+ | i | p_bool | p_int | p_bigint | p_float | p_double | p_decimal | p_date | p_string | +---++---+--+---+--+---++--+ | 1 | true | 1 | 11 | 1.1002384 | 2.222 | 123.321 | 2022-02-22 | impala | | 2 | true | 1 | 11 | 1.1002384 | 2.222 | 123.321 | 2022-02-22 | impala | +---++---+--+---+--+---++--+ Fetched 2 row(s) in 0.16s {code} > Impala should be able to read migrated partitioned Iceberg tables > - > > Key: IMPALA-11053 > URL: https://issues.apache.org/jira/browse/IMPALA-11053 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > When Hive (and probably other engines as well) converts a legacy Hive table > to Iceberg it doesn't rewrite the data files. > It means that the data files don't have write ids, moreover they don't have > the partition columns neither. 
> Currently Impala expects the partition columns to be present in the data > files, so it won't be able to read converted partitioned tables. > So we need to inject partition values from the Iceberg metadata, plus resolve > columns correctly (position-based resolution needs an offset).
[jira] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
[ https://issues.apache.org/jira/browse/IMPALA-10267 ] Zoltán Garaguly deleted comment on IMPALA-10267: -- was (Author: zgaraguly): Same issue happened here: https://master-03.jenkins.cloudera.com/job/impala-private-parameterized/1060/ > Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples > - > > Key: IMPALA-10267 > URL: https://issues.apache.org/jira/browse/IMPALA-10267 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0.0 >Reporter: Joe McDonnell >Assignee: Qifan Chen >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0.0 > > > An exhaustive job hit two Impalad crashes with the following stack: > {noformat} > 2 impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9 > rbx = 0x rbp = 0x7f82f98ad7a0 > rsp = 0x7f82f98ad6a0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x05209129 > Found by: call frame info > 3 impalad!impala::HdfsScanner::WriteTemplateTuples(impala::TupleRow*, int) > [hdfs-scanner.cc : 235 + 0xf] > rbx = 0x rbp = 0x7f82f98ad7a0 > rsp = 0x7f82f98ad6b0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02802013 > Found by: call frame info > 4 impalad!impala::HdfsAvroScanner::ProcessRange(impala::RowBatch*) > [hdfs-avro-scanner.cc : 553 + 0x19] > rbx = 0x0400 rbp = 0x7f82f98adc60 > rsp = 0x7f82f98ad7b0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x0283880d > Found by: call frame info > 5 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) > [base-sequence-scanner.cc : 189 + 0x2b] > rbx = 0x rbp = 0x7f82f98adf40 > rsp = 0x7f82f98adc70 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x029302b5 > Found by: call frame info > 6 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 143 + 0x39] > rbx = 0x0292fbd4 rbp = 0x7f82f98ae000 > rsp = 0x7f82f98adf50 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x028011c9 > 
Found by: call frame info > 7 > impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, > impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28] > rbx = 0x8000 rbp = 0x7f82f98ae390 > rsp = 0x7f82f98ae010 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x0297aa3d > Found by: call frame info > 8 impalad!impala::HdfsScanNode::ScannerThread(bool, long) > [hdfs-scan-node.cc : 418 + 0x27] > rbx = 0x0001abc6a760 rbp = 0x7f82f98ae750 > rsp = 0x7f82f98ae3a0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02979dbe > Found by: call frame info > 9 > impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()() > const + 0x30 > rbx = 0x0bbf rbp = 0x7f82f98ae770 > rsp = 0x7f82f98ae760 r12 = 0x08e18f40 > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02979126 > Found by: call frame info{noformat} > This seems to happen when running > query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes on > Avro. In reading the code in HdfsAvroScanner ProcessScanRanger(), it seems > impossible for this value to be negative, so it's unclear what is happening. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-11345) Query failed when creating equal conjunction map for Parquet bloom filter
Yuchen Fan created IMPALA-11345: --- Summary: Query failed when creating equal conjunction map for Parquet bloom filter Key: IMPALA-11345 URL: https://issues.apache.org/jira/browse/IMPALA-11345 Project: IMPALA Issue Type: Bug Components: Backend, Distributed Exec Affects Versions: Impala 4.1.0 Environment: CentOS-7, Impala-4.1 Reporter: Yuchen Fan When querying a Hive table that had columns added without using 'cascade', Impala encounters an error like "Unable to find SchemaNode for path 'db.table.column' in the schema of file 'hdfs://xxx/path/to/parquet_file_before_add_column'." I checked the parquet file from the error log and found that its schema is not compatible with the table metadata. The call stack is attached below; paths and table names are masked: {code:java} I0609 18:04:25.970052 115413 status.cc:129] c94d0ab3fdf8f943:320300610002] Unable to find SchemaNode for path 'xxx_db.xxx_table.xxx_column' in the schema of file 'hdfs://xxx_nn/xxx_table_path/00_0'. @ 0xea543b impala::Status::Status() @ 0x1e3225c impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap() @ 0x1e363ea impala::HdfsParquetScanner::Open() @ 0x19b40d0 impala::HdfsScanNodeBase::CreateAndOpenScannerHelper() @ 0x1b5cbae impala::HdfsScanNode::ProcessSplit() @ 0x1b5e12a impala::HdfsScanNode::ScannerThread() @ 0x1b5e9c6 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x18eafa9 impala::Thread::SuperviseThread() @ 0x18ee11a boost::detail::thread_data<>::run() @ 0x2385510 thread_proxy @ 0x7fb5b0745162 start_thread @ 0x7fb5ad21df6c __clone{code} The error may be related to [IMPALA-10640|https://issues.apache.org/jira/browse/IMPALA-10640]. The bloom filter requires that the right-hand values of equality conjuncts match the current file schema. The filter is unavailable if the column does not exist in all parquet files scanned. 
I think we can disable the parquet bloom filter for this single query or scan node when such a situation is discovered. How to reproduce (using impala-shell): # create table parquet_test (id INT) stored as parquet; # insert into parquet_test values (1),(2),(3); # alter table parquet_test add columns (name STRING); # insert into parquet_test values (4, "James"); # select * from parquet_test where name in ("Lily"); # Error occurred.
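The mitigation suggested above — skip the bloom filter for files that predate the ADD COLUMNS instead of failing the scan — can be sketched as follows. This is a hypothetical illustration, not Impala's `CreateColIdx2EqConjunctMap()` implementation; the function name and the dict representation of equality conjuncts are assumptions.

```python
# Hypothetical sketch: when building the per-file map of equality conjuncts
# used for Parquet bloom filtering, drop conjuncts on columns this file does
# not contain (e.g. files written before ALTER TABLE ... ADD COLUMNS without
# 'cascade'), rather than raising an error for the whole scan.

def eq_conjuncts_for_file(file_schema, eq_conjuncts):
    """Map column name -> literal, keeping only columns present in the file."""
    return {col: lit for col, lit in eq_conjuncts.items() if col in file_schema}

old_file_schema = ["id"]                 # written before ADD COLUMNS (name STRING)
new_file_schema = ["id", "name"]         # written after the schema change

print(eq_conjuncts_for_file(old_file_schema, {"name": "Lily"}))  # → {} (filter skipped)
print(eq_conjuncts_for_file(new_file_schema, {"name": "Lily"}))  # → {'name': 'Lily'}
```

Rows from old files still have to be scanned without the bloom-filter shortcut, but the query no longer fails; only files that actually contain the column keep the optimization.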
[jira] [Commented] (IMPALA-10947) SQL support for querying Iceberg metadata
[ https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552053#comment-17552053 ] LiPenglin commented on IMPALA-10947: Hi [~tmate] Thanks for your reply, I got it. > SQL support for querying Iceberg metadata > - > > Key: IMPALA-10947 > URL: https://issues.apache.org/jira/browse/IMPALA-10947 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > > HIVE-25457 added support for querying Iceberg table metadata to Hive. > They support the following syntax: > SELECT * FROM default.iceberg_table.history; > Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history > Other than "history", the following metadata tables are available in Iceberg: > * ENTRIES, > * FILES, > * HISTORY, > * SNAPSHOTS, > * MANIFESTS, > * PARTITIONS, > * ALL_DATA_FILES, > * ALL_MANIFESTS, > * ALL_ENTRIES > Impala currently only supports "DESCRIBE HISTORY ". The above SELECT > syntax would be more convenient for users, and it would be more flexible, > as users could easily define filters in WHERE clauses. And of course we would > be consistent with other engines.
[jira] [Commented] (IMPALA-10267) Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples
[ https://issues.apache.org/jira/browse/IMPALA-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552058#comment-17552058 ] Zoltán Garaguly commented on IMPALA-10267: -- Same issue happened here: https://master-03.jenkins.cloudera.com/job/impala-private-parameterized/1060/ > Impala crashes in HdfsScanner::WriteTemplateTuples() with negative num_tuples > - > > Key: IMPALA-10267 > URL: https://issues.apache.org/jira/browse/IMPALA-10267 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0.0 >Reporter: Joe McDonnell >Assignee: Qifan Chen >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0.0 > > > An exhaustive job hit two Impalad crashes with the following stack: > {noformat} > 2 impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9 > rbx = 0x rbp = 0x7f82f98ad7a0 > rsp = 0x7f82f98ad6a0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x05209129 > Found by: call frame info > 3 impalad!impala::HdfsScanner::WriteTemplateTuples(impala::TupleRow*, int) > [hdfs-scanner.cc : 235 + 0xf] > rbx = 0x rbp = 0x7f82f98ad7a0 > rsp = 0x7f82f98ad6b0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02802013 > Found by: call frame info > 4 impalad!impala::HdfsAvroScanner::ProcessRange(impala::RowBatch*) > [hdfs-avro-scanner.cc : 553 + 0x19] > rbx = 0x0400 rbp = 0x7f82f98adc60 > rsp = 0x7f82f98ad7b0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x0283880d > Found by: call frame info > 5 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) > [base-sequence-scanner.cc : 189 + 0x2b] > rbx = 0x rbp = 0x7f82f98adf40 > rsp = 0x7f82f98adc70 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x029302b5 > Found by: call frame info > 6 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 143 + 0x39] > rbx = 0x0292fbd4 rbp = 0x7f82f98ae000 > rsp = 0x7f82f98adf50 r12 = 
0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x028011c9 > Found by: call frame info > 7 > impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, > impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28] > rbx = 0x8000 rbp = 0x7f82f98ae390 > rsp = 0x7f82f98ae010 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x0297aa3d > Found by: call frame info > 8 impalad!impala::HdfsScanNode::ScannerThread(bool, long) > [hdfs-scan-node.cc : 418 + 0x27] > rbx = 0x0001abc6a760 rbp = 0x7f82f98ae750 > rsp = 0x7f82f98ae3a0 r12 = 0x > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02979dbe > Found by: call frame info > 9 > impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()() > const + 0x30 > rbx = 0x0bbf rbp = 0x7f82f98ae770 > rsp = 0x7f82f98ae760 r12 = 0x08e18f40 > r13 = 0x7f8306dd1690 r14 = 0x2f6631a0 > r15 = 0x72b8f2f0 rip = 0x02979126 > Found by: call frame info{noformat} > This seems to happen when running > query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes on > Avro. Reading the code in HdfsAvroScanner::ProcessRange(), it seems > impossible for this value to be negative, so it's unclear what is happening.
[jira] [Commented] (IMPALA-10947) SQL support for querying Iceberg metadata
[ https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552049#comment-17552049 ] Tamas Mate commented on IMPALA-10947: - Hi [~LiPenglin], yes I am working on this, I just had to put it aside for a while. I would rather keep it as one task for now. > SQL support for querying Iceberg metadata > - > > Key: IMPALA-10947 > URL: https://issues.apache.org/jira/browse/IMPALA-10947 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > > HIVE-25457 added support for querying Iceberg table metadata to Hive. > They support the following syntax: > SELECT * FROM default.iceberg_table.history; > Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history > Other than "history", the following metadata tables are available in Iceberg: > * ENTRIES, > * FILES, > * HISTORY, > * SNAPSHOTS, > * MANIFESTS, > * PARTITIONS, > * ALL_DATA_FILES, > * ALL_MANIFESTS, > * ALL_ENTRIES > Impala currently only supports "DESCRIBE HISTORY ". The above SELECT > syntax would be more convenient for users, and it would also be more flexible, > as users could easily define filters in WHERE clauses. And of course we would > be consistent with other engines.
[jira] [Work started] (IMPALA-10947) SQL support for querying Iceberg metadata
[ https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10947 started by Tamas Mate. --- > SQL support for querying Iceberg metadata > - > > Key: IMPALA-10947 > URL: https://issues.apache.org/jira/browse/IMPALA-10947 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > > HIVE-25457 added support for querying Iceberg table metadata to Hive. > They support the following syntax: > SELECT * FROM default.iceberg_table.history; > Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history > Other than "history", the following metadata tables are available in Iceberg: > * ENTRIES, > * FILES, > * HISTORY, > * SNAPSHOTS, > * MANIFESTS, > * PARTITIONS, > * ALL_DATA_FILES, > * ALL_MANIFESTS, > * ALL_ENTRIES > Impala currently only supports "DESCRIBE HISTORY ". The above SELECT > syntax would be more convenient for users, and it would also be more flexible, > as users could easily define filters in WHERE clauses. And of course we would > be consistent with other engines.
[jira] [Resolved] (IMPALA-11023) Impala should raise an error when a delete delta file is found in an Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamas Mate resolved IMPALA-11023. - Resolution: Fixed > Impala should raise an error when a delete delta file is found in an Iceberg > table > -- > > Key: IMPALA-11023 > URL: https://issues.apache.org/jira/browse/IMPALA-11023 > Project: IMPALA > Issue Type: Bug > Components: Catalog, Frontend >Affects Versions: Impala 4.0.0 >Reporter: Zoltán Borók-Nagy >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > Impala currently doesn't support row-level deletes for Iceberg tables. > Therefore we should raise an error when a delete delta file is found.
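The behavior the fix calls for can be sketched as follows. This is a hypothetical helper, not Impala's implementation; the `content` values are illustrative stand-ins for Iceberg's data/delete file classification. The point is the design choice: fail fast with a clear error rather than silently return rows that should have been deleted:

```python
def check_no_delete_files(files):
    """Raise if any Iceberg delete file is present.

    'data' marks data files; anything else (e.g. 'position-deletes',
    'equality-deletes') marks a delete file this engine can't apply.
    """
    for f in files:
        if f["content"] != "data":
            raise NotImplementedError(
                "Iceberg table contains a delete file (%s); "
                "row-level deletes are not supported" % f["path"])

files = [{"path": "data/f1.parq", "content": "data"},
         {"path": "delete/d1.parq", "content": "position-deletes"}]
try:
    check_no_delete_files(files)
except NotImplementedError as e:
    print("error:", e)
```

A table with only data files passes the check unchanged; one stray delete file aborts the query loudly instead of producing incorrect results.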
[jira] [Resolved] (IMPALA-11338) Update Impala version to 4.2.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IMPALA-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamas Mate resolved IMPALA-11338. - Resolution: Fixed > Update Impala version to 4.2.0-SNAPSHOT > --- > > Key: IMPALA-11338 > URL: https://issues.apache.org/jira/browse/IMPALA-11338 > Project: IMPALA > Issue Type: Task > Components: Infrastructure >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Tamas Mate >Priority: Minor > Fix For: Impala 4.2.0 > > > With the release of 4.1.0, we should update the master to version 4.2.0.
[jira] [Updated] (IMPALA-11023) Impala should raise an error when a delete delta file is found in an Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamas Mate updated IMPALA-11023: Affects Version/s: Impala 4.0.0 > Impala should raise an error when a delete delta file is found in an Iceberg > table > -- > > Key: IMPALA-11023 > URL: https://issues.apache.org/jira/browse/IMPALA-11023 > Project: IMPALA > Issue Type: Bug > Components: Catalog, Frontend >Affects Versions: Impala 4.0.0 >Reporter: Zoltán Borók-Nagy >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > Impala currently doesn't support row-level deletes for Iceberg tables. > Therefore we should raise an error when a delete delta file is found.
[jira] [Updated] (IMPALA-11023) Impala should raise an error when a delete delta file is found in an Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamas Mate updated IMPALA-11023: Fix Version/s: Impala 4.1.0 > Impala should raise an error when a delete delta file is found in an Iceberg > table > -- > > Key: IMPALA-11023 > URL: https://issues.apache.org/jira/browse/IMPALA-11023 > Project: IMPALA > Issue Type: Bug > Components: Catalog, Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Tamas Mate >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.1.0 > > > Impala currently doesn't support row-level deletes for Iceberg tables. > Therefore we should raise an error when a delete delta file is found.
[jira] [Assigned] (IMPALA-2792) Syntactic sugar for computing aggregates over nested collections.
[ https://issues.apache.org/jira/browse/IMPALA-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamas Mate reassigned IMPALA-2792: -- Assignee: (was: Tamas Mate) > Syntactic sugar for computing aggregates over nested collections. > - > > Key: IMPALA-2792 > URL: https://issues.apache.org/jira/browse/IMPALA-2792 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Affects Versions: Impala 2.3.0 >Reporter: Alexander Behm >Priority: Major > Labels: complextype, nested_types, planner, ramp-up, usability > > For user convenience and SQL brevity, we should add syntax extensions to > concisely express aggregates over nested collections. Internally, we should > re-write the concise versions into the more verbose equivalent with a > correlated inline view. > Example A: > {code} > New syntax: > select count(c.orders) from customer c > Internally rewrite to: > select cnt from customer c, (select count(*) cnt from c.orders) v > {code} > Example B: > {code} > New syntax: > select avg(c.orders.items.price) from customer c > Internally rewrite to: > select a from customer c, (select avg(price) a from c.orders.items) v > {code} > I suggest performing the rewrite inside StmtRewriter.java after rewriting all > subqueries from the WHERE clause. > Similar syntactic improvements should be considered for analytic functions on > nested collections.
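The semantics of the proposed sugar and its correlated-inline-view rewrite can be illustrated with plain Python data (the customer/orders rows below are made up for the example): the aggregate runs once per outer row, over that row's nested collection.

```python
# Each customer row carries a nested "orders" collection, as in a nested
# Parquet schema. The rewrite produces one aggregate value per outer row:
#   select count(c.orders) from customer c
#   ~ select cnt from customer c, (select count(*) cnt from c.orders) v
customers = [
    {"name": "alice", "orders": [{"price": 10}, {"price": 20}]},
    {"name": "bob",   "orders": [{"price": 5}]},
]

# count(c.orders): the correlated view aggregates each customer's orders.
counts = [len(c["orders"]) for c in customers]

# avg(c.orders.items.price), simplified to one nesting level here.
avgs = [sum(o["price"] for o in c["orders"]) / len(c["orders"])
        for c in customers]

print(counts)  # [2, 1]
print(avgs)    # [15.0, 5.0]
```

Note this is a per-row aggregate, not a table-wide one: each outer row joins with exactly one aggregate row produced from its own collection, which is why the inline-view rewrite is semantically equivalent.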
[jira] [Assigned] (IMPALA-11268) Allow STORED BY and STORED AS as well
[ https://issues.apache.org/jira/browse/IMPALA-11268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamas Mate reassigned IMPALA-11268: --- Assignee: (was: Tamas Mate) > Allow STORED BY and STORED AS as well > - > > Key: IMPALA-11268 > URL: https://issues.apache.org/jira/browse/IMPALA-11268 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Currently Impala only recognizes the STORED AS clause, and it uses it for > every file format and storage engine. > Hive behaves differently: it uses STORED AS for file formats, and STORED BY > for storage engines like Kudu, HBase, Iceberg. > This is especially convenient for Iceberg users, because they can write the > following statement to create a table: > CREATE TABLE ice_t (i int) STORED BY ICEBERG STORED AS PARQUET; > We could extend Impala's syntax to allow the above as well. For > backward-compatibility we still need to support STORED AS ICEBERG as well.
[jira] [Commented] (IMPALA-11296) The executor has some resident threads that occupy CPU abnormally.
[ https://issues.apache.org/jira/browse/IMPALA-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552021#comment-17552021 ] Quanlong Huang commented on IMPALA-11296: - I can reproduce the issue on the 3.x branch: {code:java} Thread 1 (process 8963): #0 0x0267391e in impala::HdfsScanner::InitTupleFromTemplate (this=0x12aff400, template_tuple=0x157bd000, tuple=0x144e9b92, tuple_byte_size=5) at /var/lib/jenkins/impala/be/src/exec/hdfs-scanner.h:537 #1 0x026c89bd in impala::HdfsScanner::InitTupleBuffer (this=0x12aff400, template_tuple=0x157bd000, tuple_mem=0x144e9b92 "", num_tuples=1024) at /var/lib/jenkins/impala/be/src/exec/hdfs-scanner.h:552 #2 0x026c777e in impala::HdfsOrcScanner::TransferTuples (this=0x12aff400, coll_reader=0x138aefc0, dst_batch=0x13c4aa80, do_batch_read=true) at /var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:728 #3 0x026c65c1 in impala::HdfsOrcScanner::AssembleRows (this=0x12aff400, row_batch=0x13c4aa80) at /var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:683 #4 0x026c4ea1 in impala::HdfsOrcScanner::GetNextInternal (this=0x12aff400, row_batch=0x13c4aa80) at /var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:560 #5 0x026c3832 in impala::HdfsOrcScanner::ProcessSplit (this=0x12aff400) at /var/lib/jenkins/impala/be/src/exec/hdfs-orc-scanner.cc:468 #6 0x027de5ae in impala::HdfsScanNode::ProcessSplit (this=0x1095c400, filter_ctxs=..., expr_results_pool=0x7f2814ca4410, scan_range=0x13ab44c0, scanner_thread_reservation=0x7f2814ca4368) at /var/lib/jenkins/impala/be/src/exec/hdfs-scan-node.cc:515 #7 0x027dd783 in impala::HdfsScanNode::ScannerThread (this=0x1095c400, first_thread=true, scanner_thread_reservation=8192) at /var/lib/jenkins/impala/be/src/exec/hdfs-scan-node.cc:417 #8 0x027dcae0 in impala::HdfsScanNodeoperator()(void) const (__closure=0x7f2814ca4b98) at /var/lib/jenkins/impala/be/src/exec/hdfs-scan-node.cc:338 #9 0x027df0d4 in boost::detail::function::void_function_obj_invoker0, 
void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at /var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 #10 0x01fc944c in boost::function0::operator() (this=0x7f2814ca4b90) at /var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:771 #11 0x0258a2ff in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) (name=..., category=..., functor=..., parent_thread_info=0x7f281169e840, thread_started=0x7f281169d660) at /var/lib/jenkins/impala/be/src/util/thread.cc:360 #12 0x02592583 in boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0x13557dc0, f=@0x13557db8: 0x2589f98 , impala::ThreadDebugInfo const*, impala::Promise*)>, a=...) 
at /var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531 #13 0x025924a7 in boost::_bi::bind_t, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > >::operator()() (this=0x13557db8) at /var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222 #14 0x0259246a in boost::detail::thread_data, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > > >::run() (this=0x13557c00) at /var/lib/jenkins/impala/toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116 #15 0x03dc1cba in thread_proxy () #16 0x7f28cd127e25 in start_thread () from /lib64/libpthread.so.0 #17 0x7f28c9c8b34d in clone () from /lib64/libc.so.6{code} However, in the master branch, it's another symptom: IMPALA-11344. We will fix it separately. > The executor has some resident threads that occupy CPU abnormally. > -- > > Key: IMPALA-11296 > URL: https://issues.apache.org/jira/browse/IMPALA-11296 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.4.0 >Reporter: zhi tang >Assignee: zhi tang >Priority: Major > Attachments: image-2022-05-17-16-40-52-110.png, top_info.png > > > The executor has some resident threads that occupy CPU abnormally. The > following is the call stack information of a thread: > !i
[jira] [Commented] (IMPALA-5845) Impala should de-duplicate row parsing error
[ https://issues.apache.org/jira/browse/IMPALA-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552017#comment-17552017 ] ASF subversion and git services commented on IMPALA-5845: - Commit 7273cfdfb901b9ef564c2737cf00c7a8abb57f07 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=7273cfdfb ] IMPALA-5845: Limit the number of non-fatal errors logging to INFO RuntimeState::LogError() does both error aggregation to the coordinator and logging the error to the log file depending on the vlog_level. This can flood the INFO log if the specified vlog_level is 1 and makes it difficult to analyze other more significant log lines. This patch limits the number of errors logged to INFO based on the max_error_logs_per_instance flag (default is 2000). When this number is exceeded, vlog_level=1 will be downgraded to vlog_level=2. To allow easy debugging in the future, this flag will be ignored if the user sets query option max_errors < 0, in which case all errors targeting vlog_level 1 will be logged. This patch also fixes a bug where the error count is not increased for non-general error codes that are already in the 'error_log_' map.
Testing: - Add test_logging.py::TestLoggingCore Change-Id: I924768ec461735c172fbf75d6415033bbdb77f9b Reviewed-on: http://gerrit.cloudera.org:8080/18565 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Impala should de-duplicate row parsing error > > > Key: IMPALA-5845 > URL: https://issues.apache.org/jira/browse/IMPALA-5845 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Juan Yu >Assignee: Riza Suminto >Priority: Major > Labels: ramp-up, supportability > Fix For: Impala 4.2.0 > > > The Impala log file grew very quickly with lots of errors like > I0824 10:44:46.527885 8679 runtime-state.cc:217] Error from query > 804d64b80df65fda:a5349b07: Error parsing row: file: > hdfs://nameservice1/user/hive/tpcds.db/store_sales/5.parq, before offset: > 120795952 > There are 622000 errors for only 141 unique files. > Impala already de-duplicates similar errors in lots of scenarios; could the row > parsing error be de-duplicated as well to reduce log size and ease > troubleshooting?
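The capping strategy in the commit message above can be sketched in a few lines. This is an illustrative model, not Impala's code: once an instance has logged its quota of non-fatal errors at vlog_level 1, further ones are downgraded to vlog_level 2 so the INFO log stays readable, while aggregation to the coordinator is unaffected.

```python
# Illustrative sketch of per-instance error-log capping. The constant
# mirrors the max_error_logs_per_instance flag described above.
MAX_ERROR_LOGS_PER_INSTANCE = 2000

class ErrorLog:
    def __init__(self, max_errors=MAX_ERROR_LOGS_PER_INSTANCE):
        self.max_errors = max_errors
        self.logged = 0  # non-fatal errors seen by this instance

    def effective_vlog_level(self, requested_level):
        # Downgrade level-1 messages once the cap is exceeded; higher
        # (more verbose) levels are left alone.
        if requested_level == 1 and self.logged >= self.max_errors:
            return 2
        return requested_level

    def log_error(self, msg, vlog_level=1):
        level = self.effective_vlog_level(vlog_level)
        self.logged += 1
        # A real implementation would emit `msg` at `level` here.
        return level

log = ErrorLog(max_errors=3)
levels = [log.log_error("parse error") for _ in range(5)]
print(levels)  # [1, 1, 1, 2, 2]
```

Setting `max_errors` negative in the real patch disables the cap entirely; here that would correspond to skipping the downgrade check.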
[jira] [Commented] (IMPALA-8011) Allow filtering on virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552019#comment-17552019 ] ASF subversion and git services commented on IMPALA-8011: - Commit 23d09638de35dcec6419a5e30df08fd5d8b27e7d in impala's branch refs/heads/master from Zoltan Borok-Nagy [ https://gitbox.apache.org/repos/asf?p=impala.git;h=23d09638d ] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name Hive has virtual column INPUT__FILE__NAME which returns the data file name that stores the actual row. It can be used in several ways, see the above two Jira tickets for examples. This virtual column is also needed to support position-based delete files in Iceberg V2 tables. This patch also adds the foundations to support further table-level virtual columns later. Virtual columns are stored at the table level in a separate list from the table schema. During path resolution in Path.resolve() we also try to resolve virtual columns. Slot descriptors also store the information whether they refer to a virtual column. Currently we only add the INPUT__FILE__NAME virtual column. The value of this column can be set in the template tuple of the scanners. All kinds of operations are possible on this virtual column, users can invoke additional functions on it, can filter rows, can group by, etc. Special care is needed for virtual columns when column masking/row filtering is applicable on them. They are added as "hidden" select list items to the table masking views which means they don't expand by * expressions. They still need to be included in * expressions though when they are coming from user-written views. 
Testing: * analyzer tests * added e2e tests Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Reviewed-on: http://gerrit.cloudera.org:8080/18514 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Allow filtering on virtual column for file name > --- > > Key: IMPALA-8011 > URL: https://issues.apache.org/jira/browse/IMPALA-8011 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Peter Ebert >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: built-in-function > > An additional performance enhancement would be the capability to filter on > file names using a virtual column. This would be somewhat like the current > optimization of sorting data and skipping files based on parquet metadata, > but instead you put something in the file name to indicate its contents > should be filtered. > For example, say you were writing first names and then searching for them: > during the writing phase you put the first letter of the first name into > your file name, so if I'm storing Alice, Bob, Cathy, my file name is "ABC"; > then when doing a query you could filter based on where INPUT__FILE__NAME > contains "D" when searching for David and skip reading the file. > Another use would be if you had a daily partition and you put the timestamp > into the file name; you could then limit the search to only the last hour even though > your partition is daily. This also gives you the ability to sort by another > column, making searches even faster on both. > > This requires IMPALA-801
[jira] [Commented] (IMPALA-801) Add function or virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552018#comment-17552018 ] ASF subversion and git services commented on IMPALA-801: Commit 23d09638de35dcec6419a5e30df08fd5d8b27e7d in impala's branch refs/heads/master from Zoltan Borok-Nagy [ https://gitbox.apache.org/repos/asf?p=impala.git;h=23d09638d ] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name Hive has virtual column INPUT__FILE__NAME which returns the data file name that stores the actual row. It can be used in several ways, see the above two Jira tickets for examples. This virtual column is also needed to support position-based delete files in Iceberg V2 tables. This patch also adds the foundations to support further table-level virtual columns later. Virtual columns are stored at the table level in a separate list from the table schema. During path resolution in Path.resolve() we also try to resolve virtual columns. Slot descriptors also store the information whether they refer to a virtual column. Currently we only add the INPUT__FILE__NAME virtual column. The value of this column can be set in the template tuple of the scanners. All kinds of operations are possible on this virtual column, users can invoke additional functions on it, can filter rows, can group by, etc. Special care is needed for virtual columns when column masking/row filtering is applicable on them. They are added as "hidden" select list items to the table masking views which means they don't expand by * expressions. They still need to be included in * expressions though when they are coming from user-written views. 
Testing: * analyzer tests * added e2e tests Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Reviewed-on: http://gerrit.cloudera.org:8080/18514 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add function or virtual column for file name > > > Key: IMPALA-801 > URL: https://issues.apache.org/jira/browse/IMPALA-801 > Project: IMPALA > Issue Type: New Feature > Components: Catalog >Affects Versions: Impala 1.2.3 >Reporter: Udai Kiran Potluri >Assignee: Zoltán Borók-Nagy >Priority: Minor > Labels: built-in-function, impala-iceberg, ramp-up > > Hive can list the data files in a table. For eg the following query lists all > the data files for the table or partition: > {noformat} > select INPUT__FILE__NAME, count(*) from where dt='20140210' > group by INPUT__FILE__NAME; > {noformat} > This has two advantages over the existing "show files" functionality: > * The output can be used in arbitrary SQL statements. > * You can see which record came from which file.
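The file-pruning idea behind IMPALA-801/IMPALA-8011 can be shown with a small model (file names and contents below are made up; the predicate stands in for a `WHERE INPUT__FILE__NAME LIKE ...` filter): a predicate on the file-name virtual column lets the scanner skip whole files before reading any rows.

```python
# Toy table: file name -> rows it contains. The writer encoded a hint
# (the first letters of the names stored) into each file name.
files = {
    "part-ABC.parq": ["Alice", "Bob", "Cathy"],
    "part-DE.parq":  ["David", "Erin"],
}

def scan_with_name_filter(files, letter):
    rows = []
    for name, contents in files.items():
        # Predicate on the file-name virtual column: if the hint rules
        # out a match, the file is never opened at all.
        if letter not in name:
            continue
        rows.extend(r for r in contents if r.startswith(letter))
    return rows

print(scan_with_name_filter(files, "D"))  # ['David']
```

Searching for "David" touches only `part-DE.parq`; the other file is eliminated by its name alone, which is exactly the skip-before-read behavior the ticket asks for.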
[jira] [Commented] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
[ https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552016#comment-17552016 ] ASF subversion and git services commented on IMPALA-10057: -- Commit 97d3b25be3d32c5b3b10e5785cfb32351c4065b0 in impala's branch refs/heads/master from Tamas Mate [ https://gitbox.apache.org/repos/asf?p=impala.git;h=97d3b25be ] IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT As 4.1.0 has been released this commit updates the master to 4.2.0. This step needs to happen on each release, related changes are: IMPALA-10198, IMPALA-10057 Testing: - Ran a build Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70 Reviewed-on: http://gerrit.cloudera.org:8080/18595 Reviewed-by: Zoltan Borok-Nagy Tested-by: Impala Public Jenkins > TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST > -- > > Key: IMPALA-10057 > URL: https://issues.apache.org/jira/browse/IMPALA-10057 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Labels: flaky > Fix For: Impala 4.2.0 > > > For both the normal tests and the docker-based tests, the Impala logs > generated during FE_TEST/JDBC_TEST can be huge: > > {noformat} > $ du -c -h fe_test/ee_tests > 4.0K fe_test/ee_tests/minidumps/statestored > 4.0K fe_test/ee_tests/minidumps/impalad > 4.0K fe_test/ee_tests/minidumps/catalogd > 16K fe_test/ee_tests/minidumps > 352K fe_test/ee_tests/profiles > 81G fe_test/ee_tests > 81G total{noformat} > Creating a tarball of these logs takes 10 minutes.
The Impalad/catalogd logs > are filled with this error over and over: > {noformat} > E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected > exception thrown > Java exception follows: > java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: > org/apache/impala/common/TransactionKeepalive$HeartbeatContext > at > org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NoClassDefFoundError: > org/apache/impala/common/TransactionKeepalive$HeartbeatContext > ... 2 more > Caused by: java.lang.ClassNotFoundException: > org.apache.impala.common.TransactionKeepalive$HeartbeatContext > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 2 more{noformat} > Two interesting points: > # The frontend/jdbc tests are passing, so all of these errors in the impalad > logs are not impacting tests. > # These errors aren't concurrent with any of the other tests (ee tests, > custom cluster tests, etc). > This is happening on normal core runs (including the GVO job that does > FE_TEST/JDBC_TEST) on both Ubuntu and Centos 7. It is also happening on > docker-based tests. A theory is that FE_TEST/JDBC_TEST have an Impala cluster > running and then invoke maven to run the tests. Maven could manipulate jars > while Impala is running. Maybe there is a race-condition or conflict when > manipulating those jars that could cause the NoClassDefFoundError. It makes > no sense for Impala not to be able to find > TransactionKeepalive$HeartbeatContext. > When it happens, it is in a tight loop, printing the message more than once > per millisecond. It fills the ERROR, WARNING, and INFO logs with that > message, sometimes for multiple Impalads and/or catalogd.
[jira] [Commented] (IMPALA-11338) Update Impala version to 4.2.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IMPALA-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552014#comment-17552014 ] ASF subversion and git services commented on IMPALA-11338: -- Commit 97d3b25be3d32c5b3b10e5785cfb32351c4065b0 in impala's branch refs/heads/master from Tamas Mate [ https://gitbox.apache.org/repos/asf?p=impala.git;h=97d3b25be ] IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT As 4.1.0 has been released this commit updates the master to 4.2.0. This step needs to happen on each release, related changes are: IMPALA-10198, IMPALA-10057 Testing: - Ran a build Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70 Reviewed-on: http://gerrit.cloudera.org:8080/18595 Reviewed-by: Zoltan Borok-Nagy Tested-by: Impala Public Jenkins > Update Impala version to 4.2.0-SNAPSHOT > --- > > Key: IMPALA-11338 > URL: https://issues.apache.org/jira/browse/IMPALA-11338 > Project: IMPALA > Issue Type: Task > Components: Infrastructure >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Tamas Mate >Priority: Minor > Fix For: Impala 4.2.0 > > > With the release of 4.1.0, we should update the master to version 4.2.0.
[jira] [Commented] (IMPALA-10198) Unify Java components into a single maven project
[ https://issues.apache.org/jira/browse/IMPALA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552015#comment-17552015 ] ASF subversion and git services commented on IMPALA-10198: -- Commit 97d3b25be3d32c5b3b10e5785cfb32351c4065b0 in impala's branch refs/heads/master from Tamas Mate [ https://gitbox.apache.org/repos/asf?p=impala.git;h=97d3b25be ] IMPALA-11338: Update IMPALA_VERSION to 4.2.0-SNAPSHOT As 4.1.0 has been released this commit updates the master to 4.2.0. This step needs to happen on each release, related changes are: IMPALA-10198, IMPALA-10057 Testing: - Ran a build Change-Id: Idab47eedb27ca4be42300dfc2eeb81eefe407b70 Reviewed-on: http://gerrit.cloudera.org:8080/18595 Reviewed-by: Zoltan Borok-Nagy Tested-by: Impala Public Jenkins > Unify Java components into a single maven project > - > > Key: IMPALA-10198 > URL: https://issues.apache.org/jira/browse/IMPALA-10198 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 4.0.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Fix For: Impala 4.0.0 > > > Currently, there are multiple maven projects in Impala's source. Each one is > built separately with a separate maven invocation, while sharing a parent pom > (impala-parent/pom.xml). This requires artificial CMake dependencies to avoid > concurrent maven invocations (e.g. > [https://github.com/apache/impala/commit/4c3f701204f92f8753cf65a97fe4804d1f77bc08]). > > We should unify the Java projects into a single project with submodules. This > will allow a single maven invocation. This makes it easier to add new Java > submodules, and it fixes the "mvn versions:set" command.