[jira] [Resolved] (IMPALA-10064) Support constant propagation for range predicates
[ https://issues.apache.org/jira/browse/IMPALA-10064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha resolved IMPALA-10064.
---------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Support constant propagation for range predicates
> -------------------------------------------------
>
>                 Key: IMPALA-10064
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10064
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: Impala 4.0
>
> Consider the following table schema, view and two queries on the view:
> {noformat}
> create table tt1 (a1 int, b1 int, ts timestamp) partitioned by (mydate date);
> create view tt1_view as (select a1, b1, ts from tt1 where mydate = cast(ts as date));
>
> // query 1: (Good) constant on ts gets propagated
> explain select * from tt1_view where ts = '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    partition predicates: mydate = DATE '2019-07-01'
>    HDFS partitions=1/3 files=2 size=48B
>    predicates: db1.tt1.ts = TIMESTAMP '2019-07-01 00:00:00'
>    row-size=24B cardinality=1
>
> // query 2: (Not good) constant on ts does not get propagated
> explain select * from tt1_view where ts > '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    HDFS partitions=3/3 files=4 size=96B
>    predicates: db1.tt1.ts > TIMESTAMP '2019-07-01 00:00:00', mydate = CAST(ts AS DATE)
>    row-size=28B cardinality=1
> {noformat}
> Note that in query 1, with the equality condition on 'ts', the constant value is propagated to the 'mydate = CAST(ts as date)' predicate, which then gets applied as a partition predicate. In query 2, which has a range predicate, the constant is not propagated and no partition predicate is created for the scan. We should support the second case for constant propagation as well. Range predicates such as >, >=, <, <= involving date or timestamp literals should be considered, but we have to analyze the cases where the propagation is valid.
> E.g. with date_add and date_diff type functions, is there a potential for incorrect propagation? Note that a predicate can be a BETWEEN condition such as:
> {noformat}
> WHERE ts >= '2019-07-01' AND ts <= '2020-07-01'
> {noformat}
> In this case both bounds need to be applied.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
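The interval reasoning behind the requested propagation can be sketched outside of Impala. This is a hypothetical illustration in Python (Impala's actual planner code is in the Java frontend): a range on `ts` implies a range on `CAST(ts AS DATE)` precisely because the cast is monotonic, which is also why functions like date_add need a monotonicity check before propagating.

```python
from datetime import date, datetime

# Hypothetical helper; a sketch of the idea in IMPALA-10064, not planner code.
def propagate_range(ts_lo, ts_hi, cast_fn):
    """Given ts_lo <= ts <= ts_hi and a monotonically non-decreasing cast_fn
    (e.g. CAST(ts AS DATE)), derive bounds on the cast result. Monotonicity is
    what makes this valid: for a non-monotonic function the image of an
    interval need not be an interval."""
    return cast_fn(ts_lo), cast_fn(ts_hi)

# CAST(timestamp AS DATE) is monotonic, so bounds on ts become bounds on mydate,
# which can then be used as a partition predicate.
lo, hi = propagate_range(datetime(2019, 7, 1), datetime(2020, 7, 1),
                         lambda ts: ts.date())
assert (lo, hi) == (date(2019, 7, 1), date(2020, 7, 1))
```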
[jira] [Updated] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
[ https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell updated IMPALA-10057:
-----------------------------------
    Description: 
For both the normal tests and the docker-based tests, the Impala logs generated during FE_TEST/JDBC_TEST can be huge:
{noformat}
$ du -c -h fe_test/ee_tests
4.0K    fe_test/ee_tests/minidumps/statestored
4.0K    fe_test/ee_tests/minidumps/impalad
4.0K    fe_test/ee_tests/minidumps/catalogd
16K     fe_test/ee_tests/minidumps
352K    fe_test/ee_tests/profiles
81G     fe_test/ee_tests
81G     total{noformat}
Creating a tarball of these logs takes 10 minutes. The impalad/catalogd logs are filled with this error over and over:
{noformat}
E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext
        at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext
        ... 2 more
Caused by: java.lang.ClassNotFoundException: org.apache.impala.common.TransactionKeepalive$HeartbeatContext
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 2 more{noformat}
Two interesting points:
# The frontend/jdbc tests are passing, so all of these errors in the impalad logs are not impacting tests.
# These errors don't coincide with any of the other tests (ee tests, custom cluster tests, etc).
This is happening on normal core runs (including the GVO job that does FE_TEST/JDBC_TEST) on both Ubuntu and CentOS 7. It is also happening on docker-based tests.

A theory is that FE_TEST/JDBC_TEST have an Impala cluster running and then invoke maven to run the tests. Maven could manipulate jars while Impala is running. Maybe there is a race condition or conflict when manipulating those jars that could cause the NoClassDefFoundError. It otherwise makes no sense for Impala not to be able to find TransactionKeepalive$HeartbeatContext. When it happens, it is in a tight loop, printing the message more than once per millisecond. It fills the ERROR, WARNING, and INFO logs with that message, sometimes for multiple impalads and/or the catalogd.

  was:
For the docker-based tests, the Impala logs generated during FE_TEST are huge:
{noformat}
$ du -c -h fe_test/ee_tests
4.0K    fe_test/ee_tests/minidumps/statestored
4.0K    fe_test/ee_tests/minidumps/impalad
4.0K    fe_test/ee_tests/minidumps/catalogd
16K     fe_test/ee_tests/minidumps
352K    fe_test/ee_tests/profiles
81G     fe_test/ee_tests
81G     total{noformat}
Creating a tarball of these logs takes 10 minutes. The impalad/catalogd logs are filled with this error over and over:
{noformat}
E0805 06:08:45.485440 11219 TransactionKeepalive.java:137] Unexpected exception thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError:
        at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError:
        ... 2 more{noformat}
Two interesting points:
# The frontend tests are passing, so all of these errors in the impalad logs are not impacting tests.
# These errors aren't happening in any of the other tests (ee tests, custom cluster tests, etc). These errors are not seen outside the docker-based tests.
A theory is that FE_TEST runs mvn to build and run the frontend tests. If there were some bad interaction of mvn with the docker filesystem in manipulating the ~/.m2 directory, that may cause problems. One thing to try may be to copy the .m2 directory to make sure it is in the top docker layer (similar to what we do with kudu wal files).

> TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-10057
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10057
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Priority: Major
[jira] [Updated] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
[ https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell updated IMPALA-10057:
-----------------------------------
    Summary: TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST  (was: Impala logs during docker-based FE_TEST are massive)

> TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-10057
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10057
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Priority: Major
>
> For the docker-based tests, the Impala logs generated during FE_TEST are huge:
> {noformat}
> $ du -c -h fe_test/ee_tests
> 4.0K    fe_test/ee_tests/minidumps/statestored
> 4.0K    fe_test/ee_tests/minidumps/impalad
> 4.0K    fe_test/ee_tests/minidumps/catalogd
> 16K     fe_test/ee_tests/minidumps
> 352K    fe_test/ee_tests/profiles
> 81G     fe_test/ee_tests
> 81G     total{noformat}
> Creating a tarball of these logs takes 10 minutes. The impalad/catalogd logs are filled with this error over and over:
> {noformat}
> E0805 06:08:45.485440 11219 TransactionKeepalive.java:137] Unexpected exception thrown
> Java exception follows:
> java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError:
>         at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NoClassDefFoundError:
>         ... 2 more{noformat}
> Two interesting points:
> # The frontend tests are passing, so all of these errors in the impalad logs are not impacting tests.
> # These errors aren't happening in any of the other tests (ee tests, custom cluster tests, etc). These errors are not seen outside the docker-based tests.
> A theory is that FE_TEST runs mvn to build and run the frontend tests. If there were some bad interaction of mvn with the docker filesystem in manipulating the ~/.m2 directory, that may cause problems. One thing to try may be to copy the .m2 directory to make sure it is in the top docker layer (similar to what we do with kudu wal files).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10064) Support constant propagation for range predicates
[ https://issues.apache.org/jira/browse/IMPALA-10064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189818#comment-17189818 ]

ASF subversion and git services commented on IMPALA-10064:
----------------------------------------------------------

Commit 5e9f10d34cc2ba6e18b469a3a5ae3ed9f5f306b1 in impala's branch refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5e9f10d ]

IMPALA-10064: Support constant propagation for eligible range predicates

This patch adds support for constant propagation of range predicates
involving date and timestamp constants. Previously, only equality
predicates were considered for propagation. The new type of propagation
is shown by the following example:

Before constant propagation:
  WHERE date_col = CAST(timestamp_col as DATE)
    AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'

After constant propagation:
  WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01'
    AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01'
    AND date_col = CAST(timestamp_col as DATE)

As a consequence, since Impala supports table partitioning by date
columns but not timestamp columns, the above propagation enables
partition pruning based on timestamp ranges.

Existing code for equality based constant propagation was refactored and
consolidated into a new class which handles both equality and range
based constant propagation. Range based propagation is only applied to
date and timestamp columns.

Testing:
 - Added new range constant propagation tests to PlannerTest.
 - Added e2e test for range constant propagation based on a newly added
   date partitioned table.
 - Ran precommit tests.
Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Reviewed-on: http://gerrit.cloudera.org:8080/16346
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Support constant propagation for range predicates
> -------------------------------------------------
>
>                 Key: IMPALA-10064
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10064
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>
> Consider the following table schema, view and two queries on the view:
> {noformat}
> create table tt1 (a1 int, b1 int, ts timestamp) partitioned by (mydate date);
> create view tt1_view as (select a1, b1, ts from tt1 where mydate = cast(ts as date));
>
> // query 1: (Good) constant on ts gets propagated
> explain select * from tt1_view where ts = '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    partition predicates: mydate = DATE '2019-07-01'
>    HDFS partitions=1/3 files=2 size=48B
>    predicates: db1.tt1.ts = TIMESTAMP '2019-07-01 00:00:00'
>    row-size=24B cardinality=1
>
> // query 2: (Not good) constant on ts does not get propagated
> explain select * from tt1_view where ts > '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    HDFS partitions=3/3 files=4 size=96B
>    predicates: db1.tt1.ts > TIMESTAMP '2019-07-01 00:00:00', mydate = CAST(ts AS DATE)
>    row-size=28B cardinality=1
> {noformat}
> Note that in query 1, with the equality condition on 'ts', the constant value is propagated to the 'mydate = CAST(ts as date)' predicate, which then gets applied as a partition predicate. In query 2, which has a range predicate, the constant is not propagated and no partition predicate is created for the scan. We should support the second case for constant propagation as well. Range predicates such as >, >=, <, <= involving date or timestamp literals should be considered, but we have to analyze the cases where the propagation is valid. E.g. with date_add and date_diff type functions, is there a potential for incorrect propagation? Note that a predicate can be a BETWEEN condition such as:
> {noformat}
> WHERE ts >= '2019-07-01' AND ts <= '2020-07-01'
> {noformat}
> In this case both bounds need to be applied.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (IMPALA-9870) summary and profile command in impala-shell should show both original and retried info
[ https://issues.apache.org/jira/browse/IMPALA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189775#comment-17189775 ]

Sahil Takiar commented on IMPALA-9870:
--------------------------------------

WIP Patch: http://gerrit.cloudera.org:8080/16406

> summary and profile command in impala-shell should show both original and retried info
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-9870
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9870
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> If a query is retried, impala-shell still uses the original query handle containing the original query id. Subsequent "summary" and "profile" commands will return results of the original query. We should consider returning both the original and retried information.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true
[ https://issues.apache.org/jira/browse/IMPALA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10140:
------------------------------------
    Description: 
A customer faced the following error message randomly when running the following query on impalad version 3.2.0-cdh6.3.2 RELEASE:

set sync_ddl=true;
create database if not exists $dbname;

I0715 11:52:28.496253 51943 client-request-state.cc:187] a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.

From the catalog server log, we can see the following error message as well:

I0715 11:01:50.143303 220286 jni-util.cc:256] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
        at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
        at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374)
        at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154)

This looks to be another variation of the conditions described in IMPALA-7961, but the difference here is that this case is with "CREATE DATABASE ... IF NOT EXISTS"; the fix in IMPALA-7961 specifically targets the "CREATE TABLE ... IF NOT EXISTS" use case. To fix the issue, we should port the change in patch [https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function.

  was:
A customer faced the following error message randomly when running the following query on impalad version 3.2.0-cdh6.3.2 RELEASE.
([https://jira.cloudera.com/browse/ENGESC-3589)|https://jira.cloudera.com/browse/ENGESC-3589]

set sync_ddl=true;
create database if not exists $dbname;

I0715 11:52:28.496253 51943 client-request-state.cc:187] a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.

From the catalog server log, we can see the following error message as well:

I0715 11:01:50.143303 220286 jni-util.cc:256] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
        at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
        at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374)
        at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154)

This looks to be another variation of the conditions described in IMPALA-7961, but the difference here is that this case is with "CREATE DATABASE ... IF NOT EXISTS"; the fix in IMPALA-7961 specifically targets the "CREATE TABLE ... IF NOT EXISTS" use case. To fix the issue, we should port the change in patch [https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function.

> Throw CatalogException for query "create database if not exist" with sync_ddl as true
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10140
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10140
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>    Affects Versions: Impala 3.2.0
>            Reporter: Wenzhe Zhou
>            Priority: Critical
>
> Customer faced the following error message randomly when running the following query on impalad version 3.2.0-cdh6.3.2 RELEASE.
>
> set sync_ddl=true;
> create database if not exists $dbname;
>
> I0715 11:52:28.496253 51943 client-request-state.cc:187] a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>
> From the catalog server log, we can see the following error message as well.
>
> I0715 11:01:50.143303 220286 jni-util.cc:256] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>         at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
[jira] [Updated] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true
[ https://issues.apache.org/jira/browse/IMPALA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10140:
------------------------------------
    Priority: Critical  (was: Major)

> Throw CatalogException for query "create database if not exist" with sync_ddl as true
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10140
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10140
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>    Affects Versions: Impala 3.2.0
>            Reporter: Wenzhe Zhou
>            Priority: Critical
>
> Customer faced the following error message randomly when running the following query on impalad version 3.2.0-cdh6.3.2 RELEASE.
> ([https://jira.cloudera.com/browse/ENGESC-3589)|https://jira.cloudera.com/browse/ENGESC-3589]
>
> set sync_ddl=true;
> create database if not exists $dbname;
>
> I0715 11:52:28.496253 51943 client-request-state.cc:187] a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>
> From the catalog server log, we can see the following error message as well.
>
> I0715 11:01:50.143303 220286 jni-util.cc:256] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators.
>         at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
>         at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374)
>         at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154)
> This looks to be another variation of the conditions described in IMPALA-7961, but the difference here is that this case is with "CREATE DATABASE ... IF NOT EXISTS".
> The fix in IMPALA-7961 specifically targets the "CREATE TABLE ... IF NOT EXISTS" use case.
> To fix the issue, we should port the change in patch [https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
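The CatalogException quoted above comes from a bounded-retry wait: the catalog waits for the SYNC_DDL topic version to be broadcast and gives up after a fixed number of attempts. A no-op DDL (e.g. "CREATE DATABASE IF NOT EXISTS" on a database that already exists) may never advance the version it is waiting for, so the wait exhausts its attempts and throws. The following Python sketch illustrates that failure mode; the names are illustrative, not Impala's actual Java code in CatalogServiceCatalog.waitForSyncDdlVersion().

```python
# Hypothetical sketch of the bounded-retry pattern behind the error above.
class CatalogException(Exception):
    pass

def wait_for_sync_ddl_version(target_version, observed_version_fn, max_attempts=5):
    """Poll the broadcast catalog topic version until it reaches target_version,
    giving up after max_attempts polls, mirroring the '5 attempts' in the log."""
    for _ in range(max_attempts):
        if observed_version_fn() >= target_version:
            return True
    raise CatalogException(
        "Couldn't retrieve the catalog topic version for the SYNC_DDL "
        f"operation after {max_attempts} attempts.")

# The broadcast version eventually catches up: the wait succeeds.
versions = iter([3, 3, 4, 4, 4])
assert wait_for_sync_ddl_version(4, lambda: next(versions))

# A no-op DDL never bumps the version: the wait throws after 5 attempts.
try:
    wait_for_sync_ddl_version(10, lambda: 3)
except CatalogException:
    pass
```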
[jira] [Commented] (IMPALA-10106) Update DataSketches to version 2.1.0
[ https://issues.apache.org/jira/browse/IMPALA-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189746#comment-17189746 ]

ASF subversion and git services commented on IMPALA-10106:
----------------------------------------------------------

Commit f9936549dcab58390c5662ebdedb9c60838185a4 in impala's branch refs/heads/master from Adam Tamas
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f993654 ]

IMPALA-10106: Upgrade DataSketches to version 2.1.0

Upgrade the external DataSketches files for HLL/KLL to version 2.1.0

Tests:
- Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I4faa31c0b628a62c7e56a6c4b9549d0aaa8a02ff
Reviewed-on: http://gerrit.cloudera.org:8080/16360
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Update DataSketches to version 2.1.0
> ------------------------------------
>
>                 Key: IMPALA-10106
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10106
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Adam Tamas
>            Assignee: Adam Tamas
>            Priority: Minor
>
> Update the external DataSketches files for HLL/KLL to version 2.1.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles
[ https://issues.apache.org/jira/browse/IMPALA-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189724#comment-17189724 ]

Sahil Takiar commented on IMPALA-10141:
---------------------------------------

Adding some additional fields from {{/proc/net/dev}} to the per-node stats from system-info.cc might be useful as well. Fields like NET_RX_ERRS, NET_RX_DROP, NET_TX_ERRS, NET_TX_DROP might be useful for tracking transmit/receive errors or dropped packets. These stats are more generic, though, as they are not specific to the kRPC TCP connections and are truly at the host level. I'm also not sure exactly what they capture compared to the TCP stats. They seem more hardware specific; maybe they would capture host NIC issues.

> Include aggregate TCP metrics in per-node profiles
> --------------------------------------------------
>
>                 Key: IMPALA-10141
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10141
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Sahil Takiar
>            Priority: Major
>
> The /rpcz endpoint in the debug web UI includes a ton of useful TCP-level metrics per kRPC connection for all inbound/outbound connections. It would be useful to aggregate some of these metrics and put them in the per-node profiles. Since it is not currently possible to split these metrics out per query, they should be added at the per-host level. Furthermore, only metrics that can be sanely aggregated across all connections should be included. For example, tracking the number of retransmitted TCP packets across all connections for the duration of the query would be useful. TCP retransmissions should be rare and are typically indicative of network hardware issues or network congestion; having at least some high-level idea of the number of TCP retransmissions that occur during a query can drastically help determine whether the network is to blame for query slowness.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
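The per-interface error and drop counters mentioned in the comment above live in fixed columns of /proc/net/dev (receive: bytes, packets, errs, drop, ...; transmit: bytes, packets, errs, drop, ...). A minimal Python sketch of pulling them out follows; the metric names mirror the comment (NET_RX_ERRS, etc.), and the parsing is a plain illustration, not Impala's system-info.cc implementation.

```python
# Sample /proc/net/dev content (two header lines, then one line per interface).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1234567   9876    2    1    0     0          0         0  7654321   8765    3    4    0     0       0          0
"""

def parse_net_dev(text):
    """Extract rx/tx errs and drop counters per interface from /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        iface, rest = line.split(":", 1)
        f = rest.split()
        # Receive columns 0-7, transmit columns 8-15; errs/drop are at fixed offsets.
        stats[iface.strip()] = {
            "NET_RX_ERRS": int(f[2]), "NET_RX_DROP": int(f[3]),
            "NET_TX_ERRS": int(f[10]), "NET_TX_DROP": int(f[11]),
        }
    return stats

assert parse_net_dev(SAMPLE)["eth0"] == {
    "NET_RX_ERRS": 2, "NET_RX_DROP": 1, "NET_TX_ERRS": 3, "NET_TX_DROP": 4}
```

On a live host the same function could be fed `open("/proc/net/dev").read()`, with the caveat from the comment that these counters are host-wide, not per kRPC connection.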
[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading
[ https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189718#comment-17189718 ]

Sahil Takiar commented on IMPALA-10139:
---------------------------------------

The "network" time (calculated as {{int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns()}}) might be a more useful threshold value to use.

> Slow RPC logs can be misleading
> -------------------------------
>
>                 Key: IMPALA-10139
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10139
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Sahil Takiar
>            Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to successfully complete an RPC. The issue is that there are many reasons why an RPC might take a long time to complete. An RPC is considered complete only when the receiver has processed that RPC.
> The problem is that due to the client-driven back-pressure mechanism, it is entirely possible that the receiver does not process an RPC simply because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been called yet (it is indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not actually be slow themselves. What is worse is that because of the back-pressure mechanism, slowness from the client (e.g. Hue users) will propagate across all nodes involved in the query.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
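The alternative threshold suggested in the comment above can be sketched as follows: classify an RPC as "slow" based on estimated network time (total time minus the receiver-side latency reported in the response) rather than total completion time, so that stalls caused by receiver-side back-pressure are not misreported. This is a hypothetical Python illustration of the arithmetic, not Impala's C++ code; the 2-second threshold is an assumed value.

```python
# Sketch of thresholding on network time instead of total RPC time.
def is_slow_rpc(total_time_ns, receiver_latency_ns, threshold_ns=2_000_000_000):
    """Flag an RPC as slow only if (total - receiver-side processing latency),
    i.e. the estimated time spent on the network, exceeds the threshold."""
    network_time_ns = total_time_ns - receiver_latency_ns
    return network_time_ns > threshold_ns

# An RPC that took 10s overall but spent 9.5s queued behind the receiver's
# back-pressure has only 0.5s of network time, so it is not flagged...
assert not is_slow_rpc(10_000_000_000, 9_500_000_000)
# ...while one with 9s of genuine network time is.
assert is_slow_rpc(10_000_000_000, 1_000_000_000)
```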
[jira] [Created] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles
Sahil Takiar created IMPALA-10141:
-------------------------------------
             Summary: Include aggregate TCP metrics in per-node profiles
                 Key: IMPALA-10141
                 URL: https://issues.apache.org/jira/browse/IMPALA-10141
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Sahil Takiar


The /rpcz endpoint in the debug web UI includes a ton of useful TCP-level metrics per kRPC connection for all inbound/outbound connections. It would be useful to aggregate some of these metrics and put them in the per-node profiles. Since it is not currently possible to split these metrics out per query, they should be added at the per-host level. Furthermore, only metrics that can be sanely aggregated across all connections should be included. For example, tracking the number of retransmitted TCP packets across all connections for the duration of the query would be useful. TCP retransmissions should be rare and are typically indicative of network hardware issues or network congestion; having at least some high-level idea of the number of TCP retransmissions that occur during a query can drastically help determine whether the network is to blame for query slowness.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true
Wenzhe Zhou created IMPALA-10140: Summary: Throw CatalogException for query "create database if not exist" with sync_ddl as true Key: IMPALA-10140 URL: https://issues.apache.org/jira/browse/IMPALA-10140 Project: IMPALA Issue Type: Bug Components: Catalog, Frontend Affects Versions: Impala 3.2.0 Reporter: Wenzhe Zhou A customer randomly hit the following error when running the query below on impalad version 3.2.0-cdh6.3.2 RELEASE ([https://jira.cloudera.com/browse/ENGESC-3589]): set sync_ddl=true; create database if not exists $dbname; I0715 11:52:28.496253 51943 client-request-state.cc:187] a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators. From the Catalog server log, we can see the following error message as well: I0715 11:01:50.143303 220286 jni-util.cc:256] org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic version for the SYNC_DDL operation after 5 attempts. The operation has been successfully executed but its effects may have not been broadcast to all the coordinators. at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474) at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374) at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154) This looks to be another variation of the conditions described in IMPALA-7961, but the difference here is that this case involves "CREATE DATABASE ... IF NOT EXISTS", whereas the fix in IMPALA-7961 specifically targets the "CREATE TABLE ... IF NOT EXISTS" use case. To fix the issue, we should port the change in patch [https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function. 
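The CatalogException above comes out of a bounded retry: the catalog polls for the statestore topic version covering the SYNC_DDL operation and gives up after a fixed number of attempts. A simplified, hypothetical sketch of that shape (the real logic in CatalogServiceCatalog.waitForSyncDdlVersion() waits on a condition variable, not a sleep loop):

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical simplification of the retry shape behind the error above:
 * wait for the catalog topic version to reach the version covering a
 * SYNC_DDL operation, giving up after a fixed number of attempts.
 */
public class SyncDdlWaiter {
    static final int MAX_ATTEMPTS = 5;

    interface TopicVersionSource {
        /** Returns the latest catalog topic version broadcast so far. */
        long latestTopicVersion();
    }

    /** Returns the observed version once it reaches requiredVersion, or throws. */
    public static long waitForVersion(TopicVersionSource src, long requiredVersion)
            throws Exception {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; ++attempt) {
            long v = src.latestTopicVersion();
            if (v >= requiredVersion) return v;
            Thread.sleep(10); // the real code blocks on a condition variable with a timeout
        }
        throw new Exception("Couldn't retrieve the catalog topic version for the "
                + "SYNC_DDL operation after " + MAX_ATTEMPTS + " attempts.");
    }

    public static void main(String[] args) throws Exception {
        // Each poll sees the version advance by one, as if coordinators catch up.
        AtomicLong version = new AtomicLong(0);
        System.out.println("Reached version " + waitForVersion(version::incrementAndGet, 3L));
    }
}
```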
[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading
[ https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189698#comment-17189698 ] Sahil Takiar commented on IMPALA-10139: --- This is pretty easy to reproduce on master. I just ran the query "select * from functional.alltypes as a2, functional.alltypes as a1" and didn't fetch any results. A bunch of RPCs get sent, but are not processed because queues are probably full. Then the logs contain entries like: {code:java} I0902 13:25:34.797029 17168 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from 127.0.0.1:33354 (request call id 6737) took 218496ms. Request Metrics: {} I0902 13:25:34.797061 17168 rpcz_store.cc:273] Trace: 0902 13:21:56.300996 (+ 0us) impala-service-pool.cc:170] Inserting onto call queue 0902 13:21:56.301037 (+41us) impala-service-pool.cc:269] Handling call 0902 13:21:56.301048 (+11us) krpc-data-stream-recvr.cc:325] Enqueuing deferred RPC 0902 13:25:34.757315 (+218456267us) krpc-data-stream-recvr.cc:504] Processing deferred RPC 0902 13:25:34.757317 (+ 2us) krpc-data-stream-recvr.cc:524] Batch queue is full 0902 13:25:34.757319 (+ 2us) krpc-data-stream-recvr.cc:504] Processing deferred RPC 0902 13:25:34.757320 (+ 1us) krpc-data-stream-recvr.cc:524] Batch queue is full 0902 13:25:34.796800 (+ 39480us) krpc-data-stream-recvr.cc:504] Processing deferred RPC 0902 13:25:34.796803 (+ 3us) krpc-data-stream-recvr.cc:397] Deserializing batch 0902 13:25:34.797011 (+ 208us) krpc-data-stream-recvr.cc:424] Enqueuing deserialized batch 0902 13:25:34.797021 (+10us) inbound_call.cc:162] Queueing success response Metrics: {} I0902 13:25:34.797154 17105 krpc-data-stream-sender.cc:394] Slow TransmitData RPC to 127.0.0.1:27000 (fragment_instance_id=d447645333af3b77:671fbefe): took 3m38s. 
Receiver time: 3m38s Network time: 239.735us I0902 13:25:34.797215 3684 krpc-data-stream-sender.cc:428] d447645333af3b77:671fbefe0005] Long delay waiting for RPC to 127.0.0.1:27000 (fragment_instance_id=d447645333af3b77:671fbefe): took 3m38s {code} > Slow RPC logs can be misleading > --- > > Key: IMPALA-10139 > URL: https://issues.apache.org/jira/browse/IMPALA-10139 > Project: IMPALA > Issue Type: Improvement >Reporter: Sahil Takiar >Priority: Major > > The slow RPC logs added in IMPALA-9128 are based on the total time taken to > successfully complete an RPC. The issue is that there are many reasons why an > RPC might take a long time to complete. An RPC is considered complete only > when the receiver has processed that RPC. > The problem is that due to the client-driven back-pressure mechanism, it is > entirely possible that the receiver does not process an incoming RPC > because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been > called yet (it is indirectly called by {{ExchangeNode::GetNext}}). > This can lead to a flood of slow RPC logs, even though the RPCs might not > actually be slow themselves. What is worse is that because of the > back-pressure mechanism, slowness from the client (e.g. Hue users) will > propagate across all nodes involved in the query.
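The log lines above illustrate the core problem: the sender only knows the total round-trip time and the receiver-reported processing latency, so "Network time" is derived by subtraction. An RPC parked for minutes in the receiver's deferred queue is therefore logged as "slow" even though the network itself was idle. A sketch of that arithmetic, using illustrative names rather than Impala's actual code:

```java
/**
 * Sketch of the time breakdown behind the "Slow ... RPC" log lines:
 * network time = total round-trip time minus receiver-reported latency.
 * Queueing at the receiver inflates the total without touching the network.
 */
public class SlowRpcBreakdown {
    public static long networkTimeNs(long totalTimeNs, long receiverLatencyNs) {
        return totalTimeNs - receiverLatencyNs;
    }

    public static void main(String[] args) {
        // Numbers from the trace above: ~3m38s total, almost all of it
        // spent queued at the receiver waiting for GetBatch() to be called.
        long totalNs = 218_496_000_000L;
        long receiverNs = 218_495_760_265L;
        // Leaves 239735ns, i.e. the ~239.7us "Network time" in the log.
        System.out.println("Network time (ns): " + networkTimeNs(totalNs, receiverNs));
    }
}
```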
[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading
[ https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189696#comment-17189696 ] Sahil Takiar commented on IMPALA-10139: --- Linking IMPALA-3380, which has some details on why we don't add timeouts for TransmitData RPCs. > Slow RPC logs can be misleading > --- > > Key: IMPALA-10139 > URL: https://issues.apache.org/jira/browse/IMPALA-10139 > Project: IMPALA > Issue Type: Improvement >Reporter: Sahil Takiar >Priority: Major > > The slow RPC logs added in IMPALA-9128 are based on the total time taken to > successfully complete an RPC. The issue is that there are many reasons why an > RPC might take a long time to complete. An RPC is considered complete only > when the receiver has processed that RPC. > The problem is that due to the client-driven back-pressure mechanism, it is > entirely possible that the receiver does not process an incoming RPC > because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been > called yet (it is indirectly called by {{ExchangeNode::GetNext}}). > This can lead to a flood of slow RPC logs, even though the RPCs might not > actually be slow themselves. What is worse is that because of the > back-pressure mechanism, slowness from the client (e.g. Hue users) will > propagate across all nodes involved in the query.
[jira] [Commented] (IMPALA-10135) Insert events doesn't contain the inserted data files
[ https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189685#comment-17189685 ] Zoltán Borók-Nagy commented on IMPALA-10135: Thanks for taking care of this, Vihang! Please note that the problem with INSERT OVERWRITEs is not that we don't provide the files, but that we don't even send events at all, because 'partsPostInsert' is always empty for INSERT OVERWRITEs, therefore 'insertEventInfos' also remains empty: [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4543-L4553] > Insert events doesn't contain the inserted data files > - > > Key: IMPALA-10135 > URL: https://issues.apache.org/jira/browse/IMPALA-10135 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Vihang Karajgaonkar >Priority: Major > > When Impala generates INSERT EVENTs it doesn't add the newly inserted > datafiles. > The problem is that Impala misuses Sets.difference(set1, set2). From the API > doc at > [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] > "The returned set contains all elements that are contained by {{set1}} and > not contained by {{set2}}. {{set2}} may also contain elements not present in > {{set1}}; these are simply ignored." > So the name "difference" is a bit misleading, it's rather a subtraction > between set1 and set2. > Unfortunately Impala passes the parameters in wrong order: > Sets.difference(beforeInsert, afterInsert): > [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] > So the result will be always empty. > There's another problem with INSERT OVERWRITEs, as it doesn't send any INSERT > events. 
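The argument-order pitfall described above is easy to reproduce. This sketch emulates Guava's Sets.difference(s1, s2) semantics (the elements of s1 not contained in s2) using only java.util, showing why difference(beforeInsert, afterInsert) is always empty while the reversed order yields exactly the newly inserted files:

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Illustration of the Sets.difference argument-order bug: difference(s1, s2)
 * keeps the elements of s1 not contained in s2, so passing the pre-insert
 * file set first can never surface the newly added files.
 */
public class InsertEventFiles {
    /** Emulates Guava Sets.difference(s1, s2): elements of s1 not in s2. */
    public static Set<String> difference(Set<String> s1, Set<String> s2) {
        Set<String> result = new HashSet<>(s1);
        result.removeAll(s2);
        return result;
    }

    public static void main(String[] args) {
        Set<String> beforeInsert = Set.of("part-0001");
        Set<String> afterInsert = Set.of("part-0001", "part-0002");

        // Buggy order: every pre-insert file also exists after the insert,
        // so the result is always empty and the event carries no files.
        System.out.println(difference(beforeInsert, afterInsert)); // []

        // Correct order: yields exactly the newly inserted files.
        System.out.println(difference(afterInsert, beforeInsert)); // [part-0002]
    }
}
```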
[jira] [Commented] (IMPALA-9992) test_scanner_position seems flaky
[ https://issues.apache.org/jira/browse/IMPALA-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189681#comment-17189681 ] Joe McDonnell commented on IMPALA-9992: --- We output a list of the files at the start of the tests after dataload. For a run not impacted by this, logs/file-list-begin-1.log has these entries for complextypestbl_medium (in the ORC format): {noformat} drwxr-xr-x - jenkins supergroup 0 2020-09-02 05:11 /test-warehouse/managed/complextypestbl_medium_orc_def drwxr-xr-x - jenkins supergroup 0 2020-09-02 05:11 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001 -rw-r--r-- 3 jenkins supergroup 1 2020-09-02 05:11 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/_orc_acid_version -rw-r--r-- 3 jenkins supergroup 6513 2020-09-02 05:11 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_0_1 -rw-r--r-- 3 jenkins supergroup 6600 2020-09-02 05:11 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_1_0 -rw-r--r-- 3 jenkins supergroup 6671 2020-09-02 05:11 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_2_1{noformat} For a run impacted by this, it has this list in logs/file-list-begin-1.log: {noformat} drwxr-xr-x - jenkins supergroup 0 2020-09-01 03:05 /test-warehouse/managed/complextypestbl_medium_orc_def drwxr-xr-x - jenkins supergroup 0 2020-09-01 03:05 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001 -rw-r--r-- 3 jenkins supergroup 1 2020-09-01 03:05 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/_orc_acid_version -rw-r--r-- 3 jenkins supergroup 6513 2020-09-01 03:05 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_0_1 -rw-r--r-- 3 jenkins supergroup 6600 2020-09-01 03:05 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_1_0 -rw-r--r-- 3 jenkins supergroup 6671 2020-09-01 03:05 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_2_0 -rw-r--r-- 3 
jenkins supergroup 6671 2020-09-01 03:05 /test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_2_1{noformat} It looks like there is an extra file (bucket_2_0 and bucket_2_1 have the same size). This table is written by Hive during dataload. From the symptoms that I know about, this seems to only happen on ORC (but, of course, the file list would have the other formats if we have ever seen it elsewhere). > test_scanner_position seems flaky > - > > Key: IMPALA-9992 > URL: https://issues.apache.org/jira/browse/IMPALA-9992 > Project: IMPALA > Issue Type: Bug > Components: Backend, Frontend >Reporter: Fang-Yu Rao >Assignee: Bikramjeet Vig >Priority: Critical > Labels: broken-build, flaky > > [test_scanner_position|https://github.com/apache/impala/blob/master/tests/query_test/test_nested_types.py#L72-L76] > failed in a recent build when executing the following query at > [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-position.test#L646-L666] > {code:java} > select pos, item, count(*) > from complextypestbl_medium.int_array > group by 1, 2 > {code} > The error message is as follows. > {code:java} > ERROR:test_configuration:Comparing QueryTestResults (expected vs actual): > 0,-1,7300 != 0,-1,9856 > 0,1,7300 != 0,1,9524 > 0,NULL,7300 != 0,NULL,9700 > 1,1,7300 != 1,1,9700 > 1,2,7300 != 1,2,9524 > 2,2,7300 != 2,2,9700 > 2,3,7300 != 2,3,9524 > 3,NULL,7300 != 3,NULL,9700 > 4,3,7300 != 4,3,9700 > 5,NULL,7300 != 5,NULL,9700 > {code} > Maybe [~tarmstrong], [~bikram], and [~csringhofer] could offer some insight > into the issue since you were working on/reviewing the corresponding patch. > Assign the JIRA to [~tarmstrong] for now but please feel free to assign to > other as you find appropriate. Thanks! 
[jira] [Assigned] (IMPALA-10135) Insert events doesn't contain the inserted data files
[ https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned IMPALA-10135: Assignee: Vihang Karajgaonkar > Insert events doesn't contain the inserted data files > - > > Key: IMPALA-10135 > URL: https://issues.apache.org/jira/browse/IMPALA-10135 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Vihang Karajgaonkar >Priority: Major > > When Impala generates INSERT EVENTs it doesn't add the newly inserted > datafiles. > The problem is that Impala misuses Sets.difference(set1, set2). From the API > doc at > [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] > "The returned set contains all elements that are contained by {{set1}} and > not contained by {{set2}}. {{set2}} may also contain elements not present in > {{set1}}; these are simply ignored." > So the name "difference" is a bit misleading, it's rather a subtraction > between set1 and set2. > Unfortunately Impala passes the parameters in wrong order: > Sets.difference(beforeInsert, afterInsert): > [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] > So the result will be always empty. > There's another problem with INSERT OVERWRITEs, as it doesn't send any INSERT > events. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10135) Insert events doesn't contain the inserted data files
[ https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189644#comment-17189644 ] Vihang Karajgaonkar commented on IMPALA-10135: -- Thanks for reporting this issue [~boroknagyz]. I will take this up. The event type being used for generating the INSERT_EVENT defines the files as an optional field IIRC, so in theory it is okay to omit the files for this particular event type. Once we switch to the transactional insert event type (IMPALA-9664) for transactional tables, I think we will need to provide the files in both cases (overwrite vs. no overwrite). > Insert events doesn't contain the inserted data files > - > > Key: IMPALA-10135 > URL: https://issues.apache.org/jira/browse/IMPALA-10135 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > > When Impala generates INSERT EVENTs it doesn't add the newly inserted > datafiles. > The problem is that Impala misuses Sets.difference(set1, set2). From the API > doc at > [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] > "The returned set contains all elements that are contained by {{set1}} and > not contained by {{set2}}. {{set2}} may also contain elements not present in > {{set1}}; these are simply ignored." > So the name "difference" is a bit misleading, it's rather a subtraction > between set1 and set2. > Unfortunately Impala passes the parameters in wrong order: > Sets.difference(beforeInsert, afterInsert): > [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] > So the result will be always empty. > There's another problem with INSERT OVERWRITEs, as it doesn't send any INSERT > events. 
[jira] [Created] (IMPALA-10139) Slow RPC logs can be misleading
Sahil Takiar created IMPALA-10139: - Summary: Slow RPC logs can be misleading Key: IMPALA-10139 URL: https://issues.apache.org/jira/browse/IMPALA-10139 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar The slow RPC logs added in IMPALA-9128 are based on the total time taken to successfully complete an RPC. The issue is that there are many reasons why an RPC might take a long time to complete. An RPC is considered complete only when the receiver has processed that RPC. The problem is that due to the client-driven back-pressure mechanism, it is entirely possible that the receiver does not process an incoming RPC because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been called yet (it is indirectly called by {{ExchangeNode::GetNext}}). This can lead to a flood of slow RPC logs, even though the RPCs might not actually be slow themselves. What is worse is that because of the back-pressure mechanism, slowness from the client (e.g. Hue users) will propagate across all nodes involved in the query.
[jira] [Created] (IMPALA-10138) Add fragment instance id to RPC trace output
Sahil Takiar created IMPALA-10138: - Summary: Add fragment instance id to RPC trace output Key: IMPALA-10138 URL: https://issues.apache.org/jira/browse/IMPALA-10138 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar The RPC traces added in IMPALA-9128 are hard to correlate to specific queries because the output does not include the fragment instance id. I'm not sure if this is actually possible in the current kRPC code, but it would be nice if the tracing output included the fragment instance id. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-9954: - Parent: IMPALA-10137 Issue Type: Sub-task (was: Bug) > RpcRecvrTime can be negative > > > Key: IMPALA-9954 > URL: https://issues.apache.org/jira/browse/IMPALA-9954 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Priority: Major > Attachments: profile_034e7209bd98c96c_9a448dfc.txt > > > Saw this on a recent version of master. Attached the full runtime profile. > {code:java} > KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, > % non-child: 32.30%) > ExecOption: Unpartitioned Sender Codegen Disabled: not needed >- BytesSent (500.000ms): 0, 0 >- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: > 4.34 MB/sec ; Number of samples: 1) >- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; > Number of samples: 2) >- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: > -71077.000ns ; Number of samples: 2) >- EosSent: 1 (1) >- PeakMemoryUsage: 416.00 B (416) >- RowsSent: 100 (100) >- RpcFailure: 0 (0) >- RpcRetry: 0 (0) >- SerializeBatchTime: 2.880ms >- TotalBytesSent: 28.67 KB (29355) >- UncompressedRowBatchSize: 69.29 KB (70950) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-5473) Make diagnosing network issues easier
[ https://issues.apache.org/jira/browse/IMPALA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned IMPALA-5473: Assignee: (was: Michael Ho) > Make diagnosing network issues easier > - > > Key: IMPALA-5473 > URL: https://issues.apache.org/jira/browse/IMPALA-5473 > Project: IMPALA > Issue Type: Task >Affects Versions: Impala 2.10.0 >Reporter: Henry Robinson >Priority: Major > > With our current metrics in the profile, it's hard to debug queries that get > slow throughput from their exchanges. > The following cases have different causes, but similar symptoms (e.g. a high > {{InactiveTimer}} in the xchg profile): > 1. Downstream sender does not produce rows quickly (perhaps because *its* > child instances do not produce rows quickly). > 2. Downstream sender can not _send_ rows quickly, perhaps because of network > congestion. > 3. Downstream sender does not start producing rows until some time after the > upstream has started (captured by {{FirstBatchArrivalWaitTime}}). > 4. Downstream sender does not close stream until some time after all rows are > sent. > We should try to improve these metrics so that all the information about who > is slow, and why, is available clearly in the runtime profile. Distinguishing > cases 1 and 2 is particularly important. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10049) Include RPC call_id in slow RPC logs
[ https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-10049: -- Parent: IMPALA-10137 Issue Type: Sub-task (was: Improvement) > Include RPC call_id in slow RPC logs > > > Key: IMPALA-10049 > URL: https://issues.apache.org/jira/browse/IMPALA-10049 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Priority: Major > > The current code for logging slow RPCs on the sender side looks something > like this:
> {code:java}
> template <typename ResponsePBType>
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>     const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Receiver time: "
>             << PrettyPrinter::Print(resp_.receiver_latency_ns(), TUnit::TIME_NS)
>             << " Network time: " << PrettyPrinter::Print(network_time_ns, TUnit::TIME_NS);
> }
>
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>     const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Error: " << err.ToString();
> }
> {code}
> It would be nice to include the call_id in the logs as well so that RPCs can > more easily be traced. The RPC call_id is dumped in RPC traces on the > receiver side, as well as in the /rpcz output on the debug ui. 
[jira] [Updated] (IMPALA-10135) Insert events doesn't contain the inserted data files
[ https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10135: --- Description: When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles. The problem is that Impala misuses Sets.difference(set1, set2). From the API doc at [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] "The returned set contains all elements that are contained by {{set1}} and not contained by {{set2}}. {{set2}} may also contain elements not present in {{set1}}; these are simply ignored." So the name "difference" is a bit misleading, it's rather a subtraction between set1 and set2. Unfortunately Impala passes the parameters in wrong order: Sets.difference(beforeInsert, afterInsert): [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] So the result will be always empty. There's another problem with INSERT OVERWRITEs, as it doesn't send any INSERT events. was: When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles. The problem is that Impala misuses Sets.difference(set1, set2). From the API doc at [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] "The returned set contains all elements that are contained by {{set1}} and not contained by {{set2}}. {{set2}} may also contain elements not present in {{set1}}; these are simply ignored." So the name "difference" is a bit misleading, it's rather a subtraction between set1 and set2. Unfortunately Impala passes the parameters in wrong order: Sets.difference(beforeInsert, afterInsert): [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] So the result will be always empty. 
There's another problem with INSERT OVERWRITEs: in that case we never fill in the data files of the insert event. > Insert events doesn't contain the inserted data files > - > > Key: IMPALA-10135 > URL: https://issues.apache.org/jira/browse/IMPALA-10135 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > > When Impala generates INSERT EVENTs it doesn't add the newly inserted > datafiles. > The problem is that Impala misuses Sets.difference(set1, set2). From the API > doc at > [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] > "The returned set contains all elements that are contained by {{set1}} and > not contained by {{set2}}. {{set2}} may also contain elements not present in > {{set1}}; these are simply ignored." > So the name "difference" is a bit misleading, it's rather a subtraction > between set1 and set2. > Unfortunately Impala passes the parameters in wrong order: > Sets.difference(beforeInsert, afterInsert): > [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] > So the result will be always empty. > There's another problem with INSERT OVERWRITEs, as it doesn't send any INSERT > events.
[jira] [Commented] (IMPALA-6705) TotalNetworkSendTime in query profile is misleading
[ https://issues.apache.org/jira/browse/IMPALA-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189580#comment-17189580 ] Sahil Takiar commented on IMPALA-6705: -- I got bitten by this recently, and it was very confusing. Linking to IMPALA-10137. +1 on fixing this. > TotalNetworkSendTime in query profile is misleading > --- > > Key: IMPALA-6705 > URL: https://issues.apache.org/jira/browse/IMPALA-6705 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, > Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 2.12.0 >Reporter: Michael Ho >Priority: Major > Labels: observability > > {{TotalNetworkSendTime}} is actually measuring the time which a fragment > instance execution thread spent waiting for the completion of previous RPC. > This is a combination of: > - network time of sending the RPC payload to the destination > - processing and queuing time in the destination > - network time of sending the RPC response to the originating node > The name of this metric itself is misleading because it gives the impression > that it's the time spent sending the RPC payload to the destination so a > query profile with a high {{TotalNetworkSendTime}} may easily mislead a user > into concluding that there is something wrong with the network. In reality, > the receiving end could be overloaded and it's taking a huge amount of time > to respond to an RPC. > For this metric to be useful, we need to have a breakdown of those 3 > components above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
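The three components listed in the issue can be sketched with some arithmetic. The numbers below are invented purely to show why the lumped counter misleads: subtracting the receiver-reported latency is currently the only way to approximate the pure network cost.

```python
# Invented example values, in nanoseconds.
send_network_ns = 2_000_000       # payload travelling to the destination
receiver_ns = 40_000_000          # processing + queuing time at the destination
response_network_ns = 1_000_000   # response travelling back to the sender

# What the counter reports today: one opaque sum of all three components.
total_network_send_time_ns = send_network_ns + receiver_ns + response_network_ns

# Here a "slow network" reading is really a slow receiver.
network_only_ns = total_network_send_time_ns - receiver_ns
print(network_only_ns)  # 3000000
```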
[jira] [Comment Edited] (IMPALA-10122) Allow view authorization to be deferred until selection time
[ https://issues.apache.org/jira/browse/IMPALA-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188692#comment-17188692 ] Fang-Yu Rao edited comment on IMPALA-10122 at 9/2/20, 5:42 PM: --- Hi [~vihangk1], [~stigahuang], and [~csringhofer], please take a look at the description of the JIRA and let me know if you have any additional idea or comment. Thanks! Tagged [~jcamachorodriguez], [~jfs], [~hemanth619], and [~thejas] as well so that you know the issue is also tracked on the Impala side. was (Author: fangyurao): Hi [~vihangk1], [~stigahuang], and [~csringhofer], please take a look at the description of the JIRA and let me know if you have any additional idea or comment. Thanks! Tagged [~jcamachorodriguez], [~jfs], and [~thejas] as well so that you know the issue is also tracked on the Impala side. > Allow view authorization to be deferred until selection time > > > Key: IMPALA-10122 > URL: https://issues.apache.org/jira/browse/IMPALA-10122 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > Recall that currently Impala performs authorization with Ranger to check > whether the requesting user is granted the privilege of {{SELECT}} for the > underlying tables when a view is created and thus does not check whether the > requesting user is granted the {{SELECT}} privilege on the underlying tables > when the view is selected. > On the other hand, currently a Spark user is not allowed to directly create a > view in HMS without involving the Impala frontend, because Spark clients are > normal users (v.s. superusers). To relax this restriction, it would be good > to allow a Spark user to directly create a view in HMS without involving the > Impala frontend. However, it can be seen that the authorization check is > skipped for views created in this manner since HMS currently does not possess > the capability to perform the authorization. 
Due to this relaxation, for a > view created this way, the authorization of the view needs to be carried out > at the selection time to make sure the requesting user is indeed granted the > {{SELECT}} privileges on the underlying tables defined in the view. > There is also a corresponding Hive JIRA at HIVE-24026. Refer to there for > further details. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10137) Network Debugging / Supportability Improvements
[ https://issues.apache.org/jira/browse/IMPALA-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-10137: -- Labels: observability (was: ) > Network Debugging / Supportability Improvements > --- > > Key: IMPALA-10137 > URL: https://issues.apache.org/jira/browse/IMPALA-10137 > Project: IMPALA > Issue Type: Epic >Reporter: Sahil Takiar >Priority: Major > Labels: observability > > There are various improvements Impala should make to improve debugging of > network issues (e.g. slow RPCs, TCP retransmissions, etc.). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10049) Include RPC call_id in slow RPC logs
[ https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned IMPALA-10049: - Assignee: Sahil Takiar > Include RPC call_id in slow RPC logs > > > Key: IMPALA-10049 > URL: https://issues.apache.org/jira/browse/IMPALA-10049 > Project: IMPALA > Issue Type: Improvement >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > The current code for logging slow RPCs on the sender side looks something > like this: > {code:java} > template > void KrpcDataStreamSender::Channel::LogSlowRpc( > ¦ const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) { > int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns(); > LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_ > ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << > "): " > ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) > << ". " > ¦ ¦ ¦ ¦ ¦ << "Receiver time: " > ¦ ¦ ¦ ¦ ¦ << PrettyPrinter::Print(resp_.receiver_latency_ns(), > TUnit::TIME_NS) > ¦ ¦ ¦ ¦ ¦ << " Network time: " << PrettyPrinter::Print(network_time_ns, > TUnit::TIME_NS); > } > void KrpcDataStreamSender::Channel::LogSlowFailedRpc( > ¦ const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) { > LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_ > ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << > "): " > ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) > << ". " > ¦ ¦ ¦ ¦ ¦ << "Error: " << err.ToString(); > } {code} > It would be nice to include the call_id in the logs as well so that RPCs can > more easily be traced. The RPC call_id is dumped in RPC traces on the > receiver side, as well as in the /rpcz output on the debug ui. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
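A sketch of what the proposed log line might look like with the call_id included. This is Python pseudocode of the C++ `LogSlowRpc` above, not the actual implementation; the `call_id` field name and placement are hypothetical:

```python
def log_slow_rpc(rpc_name, address, fragment_instance_id, call_id,
                 total_time_ns, receiver_latency_ns):
    """Mirror of KrpcDataStreamSender::Channel::LogSlowRpc, with call_id added
    so a slow RPC can be correlated with receiver-side traces and /rpcz output."""
    network_time_ns = total_time_ns - receiver_latency_ns
    return (f"Slow {rpc_name} RPC to {address} "
            f"(fragment_instance_id={fragment_instance_id}, call_id={call_id}): "
            f"took {total_time_ns}ns. "
            f"Receiver time: {receiver_latency_ns}ns "
            f"Network time: {network_time_ns}ns")

print(log_slow_rpc("TransmitData", "host-1:27000", "e44af7c0:1", 42,
                   5_000_000, 4_000_000))
```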
[jira] [Assigned] (IMPALA-10049) Include RPC call_id in slow RPC logs
[ https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned IMPALA-10049: - Assignee: (was: Sahil Takiar) > Include RPC call_id in slow RPC logs > > > Key: IMPALA-10049 > URL: https://issues.apache.org/jira/browse/IMPALA-10049 > Project: IMPALA > Issue Type: Improvement >Reporter: Sahil Takiar >Priority: Major > > The current code for logging slow RPCs on the sender side looks something > like this: > {code:java} > template > void KrpcDataStreamSender::Channel::LogSlowRpc( > ¦ const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) { > int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns(); > LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_ > ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << > "): " > ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) > << ". " > ¦ ¦ ¦ ¦ ¦ << "Receiver time: " > ¦ ¦ ¦ ¦ ¦ << PrettyPrinter::Print(resp_.receiver_latency_ns(), > TUnit::TIME_NS) > ¦ ¦ ¦ ¦ ¦ << " Network time: " << PrettyPrinter::Print(network_time_ns, > TUnit::TIME_NS); > } > void KrpcDataStreamSender::Channel::LogSlowFailedRpc( > ¦ const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) { > LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_ > ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << > "): " > ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) > << ". " > ¦ ¦ ¦ ¦ ¦ << "Error: " << err.ToString(); > } {code} > It would be nice to include the call_id in the logs as well so that RPCs can > more easily be traced. The RPC call_id is dumped in RPC traces on the > receiver side, as well as in the /rpcz output on the debug ui. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10094) TestResetMetadata.test_refresh_updated_partitions fails due to connection error
[ https://issues.apache.org/jira/browse/IMPALA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10094. -- Fix Version/s: Impala 4.0 Resolution: Fixed > TestResetMetadata.test_refresh_updated_partitions fails due to connection > error > --- > > Key: IMPALA-10094 > URL: https://issues.apache.org/jira/browse/IMPALA-10094 > Project: IMPALA > Issue Type: Bug >Reporter: Norbert Luksa >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.0 > > > This has occurred in the last few builds in > impala-cdpd-master-staging-core-s3: > [https://master-02.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/14/] > Error message: > {code:java} > metadata/test_reset_metadata.py:49: in test_refresh_updated_partitions > "alter table {0} add partition (year=2020, month=8)".format(tbl)) > common/impala_test_suite.py:983: in run_stmt_in_hive > raise RuntimeError(stderr) > E RuntimeError: SLF4J: Class path contains multiple SLF4J bindings. > E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > E SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory] > E ERROR StatusLogger No log4j2 configuration file found. Using default > configuration: logging only errors to the console. Set system property > 'log4j2.debug' to show Log4j2 internal initialization logging. > E SLF4J: Class path contains multiple SLF4J bindings. 
> E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > E SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory] > E Connecting to jdbc:hive2://localhost:11050 > E 20/08/18 05:10:24 [main]: WARN jdbc.HiveConnection: Failed to connect to > localhost:11050 > E Could not open connection to the HS2 server. Please check the server URI > and if the URI is correct, then ask the administrator to check the server > status. > E Error: Could not open client transport with JDBC Uri: > jdbc:hive2://localhost:11050: java.net.ConnectException: Connection refused > (Connection refused) (state=08S01,code=0){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10137) Network Debugging / Supportability Improvements
Sahil Takiar created IMPALA-10137: - Summary: Network Debugging / Supportability Improvements Key: IMPALA-10137 URL: https://issues.apache.org/jira/browse/IMPALA-10137 Project: IMPALA Issue Type: Epic Reporter: Sahil Takiar There are various improvements Impala should make to improve debugging of network issues (e.g. slow RPCs, TCP retransmissions, etc.). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (IMPALA-10124) admission-controller-test fails with no such file or directory error
[ https://issues.apache.org/jira/browse/IMPALA-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189380#comment-17189380 ] Qifan Chen edited comment on IMPALA-10124 at 9/2/20, 5:01 PM: --- Thanks a lot for reporting the issue and providing the detailed info. Here are frames from the stack trace.
{code:java}
#0 0x0249fc6e in std::vector, std::allocator > >::operator[] (this=0x17c84e48, __n=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_vector.h:816
#1 0x024a5118 in std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > >, std::allocator, std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_handle_repeat ( this=0x7ffc5aa85d28, __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, std::allocator > > >, std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:207
#2 0x024a4f42 in std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > >, std::allocator, std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_dfs ( this=0x7ffc5aa85d28, __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, std::allocator > > >, std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:466
#3 0x024a62c3 in std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > >, std::allocator, std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_handle_match ( this=0x7ffc5aa85d28, __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, std::allocator > > >, std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=1) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:329
... ... ... ...
#5956 0x024260e6 in std::regex_match, std::allocator, char, std::__cxx11::regex_traits > (__s=..., __re=..., __flags=(unknown: 0)) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex.h:2169
#5957 0x0241e03f in impala::AdmissionControllerTest_TopNQueryCheck_Test::TestBody (this=0x15ac9480) at /home/qchen/Impala/be/src/scheduling/admission-controller-test.cc:1075
#5958 0x093f6bfa in void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) ()
#5959 0x093f002a in testing::Test::Run() ()
#5960 0x093f010c in testing::TestInfo::Run() ()
#5961 0x093f0245 in testing::TestCase::Run() ()
#5962 0x093f08f0 in testing::internal::UnitTestImpl::RunAllTests() ()
#5963 0x093f0a27 in testing::UnitTest::Run() ()
#5964 0x01d7de0b in main (argc=2, argv=0x7ffc5aa872b8) at /home/qchen/Impala/be/src/service/unified-betest-main.cc:48
{code}
The responding code is as follows.
{code:java}
1064   string mem_details_for_host0 =
1065       admission_controller->GetLogStringForTopNQueriesOnHost(HOST_0);
1066   // Verify that the 5 top ones appear in the following order.
1067   std::regex pattern_pools_for_host0(".*"+
1068       QUEUE_B+".*"+"id=0001:0002, consumed=10.00 MB"+".*"+
1069       QUEUE_A+".*"+"id=:, consumed=10.00 MB"+".*"+
1070       QUEUE_D+".*"+"id=0003:0011, consumed=9.00 MB"+".*"+
1071       "id=0003:000a, consumed=9.00 MB"+".*"+
1072       "id=0003:0007, consumed=9.00 MB"+".*"
1073       ,std::regex::basic
1074   );
1075   ASSERT_TRUE(std::regex_match(mem_details_for_host0, pattern_pools_for_host0));
1076
{code}
was (Author: sql_forever): Thanks a lot for reporting the issue and providing the detailed info. Here are frames from the stack trace.
#0 0x0249fc6e in std::vector, std::allocator > >::operator[] (this=0x17c84e48, __n=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_vector.h:816 #1 0x024a5118 in std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > >, std::allocator, std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_handle_repeat ( this=0x7ffc5aa85d28, __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, std::allocator > > >, std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:207 #2 0x024a4f42 in std::__detail::_Executor<__gnu_cxx::__normal_iterator,
[jira] [Commented] (IMPALA-10124) admission-controller-test fails with no such file or directory error
[ https://issues.apache.org/jira/browse/IMPALA-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189380#comment-17189380 ] Qifan Chen commented on IMPALA-10124: - Thanks a lot for reporting the issue and providing the detailed info. Here are frames from the stack trace. #0 0x0249fc6e in std::vector, std::allocator > >::operator[] (this=0x17c84e48, __n=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_vector.h:816 #1 0x024a5118 in std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > >, std::allocator, std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_handle_repeat ( this=0x7ffc5aa85d28, __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, std::allocator > > >, std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:207 #2 0x024a4f42 in std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > >, std::allocator, std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_dfs ( this=0x7ffc5aa85d28, __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, std::allocator > > >, std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=2) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:466 #3 0x024a62c3 in std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > >, std::allocator, std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_handle_match ( this=0x7ffc5aa85d28, __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, std::allocator > > >, std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=1) at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:329 ... ... ... ... #5956 0x024260e6 in std::regex_match, std::allocator, char, std::__cxx11::regex_traits > (__s=..., __re=..., __flags=(unknown: 0)) at /home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex.h:2169 #5957 0x0241e03f in impala::AdmissionControllerTest_TopNQueryCheck_Test::TestBody (this=0x15ac9480) at /home/qchen/Impala/be/src/scheduling/admission-controller-test.cc:1075 #5958 0x093f6bfa in void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) () #5959 0x093f002a in testing::Test::Run() () #5960 0x093f010c in testing::TestInfo::Run() () #5961 0x093f0245 in testing::TestCase::Run() () #5962 0x093f08f0 in testing::internal::UnitTestImpl::RunAllTests() () #5963 0x093f0a27 in testing::UnitTest::Run() () #5964 0x01d7de0b in main (argc=2, argv=0x7ffc5aa872b8) at /home/qchen/Impala/be/src/service/unified-betest-main.cc:48 > admission-controller-test fails with no such file or directory error > > > Key: IMPALA-10124 > URL: https://issues.apache.org/jira/browse/IMPALA-10124 > Project: IMPALA > Issue Type: Bug >Reporter: Yongzhi Chen >Assignee: Qifan Chen >Priority: Major > > In master-core-ubsan, the admission-controller-test fails : > 03:12:04 > /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/be/build/debug//scheduling/admission-controller-test: > line 10: 29380 Segmentation fault (core dumped) > ${IMPALA_HOME}/bin/run-jvm-binary.sh > ${IMPALA_HOME}/be/build/latest/service/unifiedbetests > --gtest_filter=${GTEST_FILTER} > --gtest_output=xml:${IMPALA_BE_TEST_LOGS_DIR}/${TEST_EXEC_NAME}.xml > -log_filename="${TEST_EXEC_NAME}" "$@" > 03:12:04 Traceback (most recent call last): > 03:12:04 File > 
"/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py", > line 71, in > 03:12:04 if __name__ == "__main__": main() > 03:12:04 File > "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py", > line 68, in main > 03:12:04 junitxml_prune_notrun(options.filename) > 03:12:04 File > "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py", > line 31, in junitxml_prune_notrun > 03:12:04 root = tree.parse(junitxml_filename) > 03:12:04 File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 647, in > parse > 03:12:04 source = open(source, "rb") > 03:12:04 IOError: [Errno 2] No
[jira] [Commented] (IMPALA-10071) Impala shouldn't create filename starting with underscore during TRUNCATE
[ https://issues.apache.org/jira/browse/IMPALA-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189315#comment-17189315 ] ASF subversion and git services commented on IMPALA-10071: -- Commit 502e1134be595ed5506424ee5f06dcf52f6fc646 in impala's branch refs/heads/master from Zoltan Borok-Nagy [ https://gitbox.apache.org/repos/asf?p=impala.git;h=502e113 ] IMPALA-10071: Impala shouldn't create filename starting with underscore during ACID TRUNCATE When Impala TRUNCATEs an ACID table, it creates a new base directory with the hidden file "_empty" in it. Newer Hive versions ignore files starting with underscore, therefore they ignore the whole base directory. To resolve this issue we can simply rename the empty file to "empty". Testing: * update acid-truncate.test accordingly Change-Id: Ia0557b9944624bc123c540752bbe3877312a7ac9 Reviewed-on: http://gerrit.cloudera.org:8080/16396 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins > Impala shouldn't create filename starting with underscore during TRUNCATE > - > > Key: IMPALA-10071 > URL: https://issues.apache.org/jira/browse/IMPALA-10071 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > > When Impala TRUNCATEs an ACID table, it creates a new base directory with the > hidden file "_empty" in it. > Newer Hive versions ignore files starting with underscore, therefore they > ignore the whole base directory. > To resolve this issue we can simply rename the empty file to "empty". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
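The reason the rename fixes the problem: Hive-style readers conventionally treat file names starting with `_` (or `.`) as hidden and skip them, so a base directory whose only file is `_empty` can be ignored entirely. A small sketch of that convention (illustrative, not Hive's actual code; the base directory name is made up):

```python
def is_hidden(name: str) -> bool:
    # Hive-style convention: names starting with '_' or '.' are ignored.
    return name.startswith(("_", "."))

listing = ["base_0000002/_empty", "base_0000002/empty"]
# A path is visible only if no component of it is hidden.
visible = [p for p in listing
           if not any(is_hidden(part) for part in p.split("/"))]
print(visible)  # ['base_0000002/empty']
```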
[jira] [Created] (IMPALA-10136) Cardinality estimates for aggregation operations don't consider conjuncts on grouping expressions correctly
Shant Hovsepian created IMPALA-10136: Summary: Cardinality estimates for aggregation operations don't consider conjuncts on grouping expressions correctly Key: IMPALA-10136 URL: https://issues.apache.org/jira/browse/IMPALA-10136 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 3.4.0 Reporter: Shant Hovsepian Assignee: Shant Hovsepian ComputeStats() in the PlanNode calls estimateNumGroups() for the AggregationNode to calculate the cardinality of a grouping expression. Then, in a later step, applyConjunctsSelectivity() is called to adjust the cardinality based on the available conjuncts. However, with aggregation operations certain conjuncts, i.e. those from the HAVING clause or conjuncts on the grouping expressions, affect the number of groups produced. ndv(day) = 11, count(alltypesagg) = 10280
{code:java}
Query: explain select day, count(*) from alltypesagg where day=2 group by 1
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.06MB Threads=4  |
| Per-Host Resource Estimates: Memory=52MB                    |
| Codegen disabled by planner                                 |
|                                                             |
| PLAN-ROOT SINK                                              |
| |                                                           |
| 04:EXCHANGE [UNPARTITIONED]                                 |
| |                                                           |
| 03:AGGREGATE [FINALIZE]                                     |
| |  output: count:merge(*)                                   |
| |  group by: `day`                                          |
| |  row-size=12B cardinality=11                              |
| |                                                           |
| 02:EXCHANGE [HASH(`day`)]                                   |
| |                                                           |
| 01:AGGREGATE [STREAMING]                                    |
| |  output: count(*)                                         |
| |  group by: `day`                                          |
| |  row-size=12B cardinality=11                              |
| |                                                           |
| 00:SCAN HDFS [functional.alltypesagg]                       |
|    partition predicates: `day` = 2                          |
|    HDFS partitions=1/11 files=1 size=74.48KB                |
|    row-size=4B cardinality=1.00K                            |
+-------------------------------------------------------------+
Fetched 24 row(s) in 0.02s
{code}
Given that the predicate day=2 applies to the grouping expression, the cardinality of the aggregation node should be 1 as opposed to 11. -- This message was sent by Atlassian Jira (v8.3.4#803005)
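The adjustment the report asks for can be sketched as follows. The formula is illustrative, not Impala's actual estimator: an equality conjunct on a grouping expression pins that expression to a single value, so it should cap that expression's contribution to the group count at 1.

```python
def estimate_num_groups(ndv_by_expr, equality_pred_exprs):
    """Illustrative group-count estimate: the product of per-expression NDVs,
    where an equality predicate on a grouping expression caps its NDV at 1."""
    groups = 1
    for expr, ndv in ndv_by_expr.items():
        groups *= 1 if expr in equality_pred_exprs else ndv
    return groups

# select day, count(*) from alltypesagg where day=2 group by day
print(estimate_num_groups({"day": 11}, set()))     # 11 (current behaviour)
print(estimate_num_groups({"day": 11}, {"day"}))   # 1  (desired: day=2 pins it)
```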
[jira] [Created] (IMPALA-10135) Insert events doesn't contain the inserted data files
Zoltán Borók-Nagy created IMPALA-10135: -- Summary: Insert events doesn't contain the inserted data files Key: IMPALA-10135 URL: https://issues.apache.org/jira/browse/IMPALA-10135 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy When Impala generates INSERT EVENTs it doesn't add the newly inserted data files. The problem is that Impala misuses Sets.difference(set1, set2). From the API doc at [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] "The returned set contains all elements that are contained by {{set1}} and not contained by {{set2}}. {{set2}} may also contain elements not present in {{set1}}; these are simply ignored." So the name "difference" is a bit misleading; it's rather a subtraction of set2 from set1. Unfortunately Impala passes the parameters in the wrong order, Sets.difference(beforeInsert, afterInsert): [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] So the result will always be empty. There's another problem with INSERT OVERWRITEs: in that case we never fill the data files of the insert event. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10135) Insert events doesn't contain the inserted data files
[ https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10135: --- Description: When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles. The problem is that Impala misuses Sets.difference(set1, set2). From the API doc at [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] "The returned set contains all elements that are contained by {{set1}} and not contained by {{set2}}. {{set2}} may also contain elements not present in {{set1}}; these are simply ignored." So the name "difference" is a bit misleading, it's rather a subtraction between set1 and set2. Unfortunately Impala passes the parameters in wrong order: Sets.difference(beforeInsert, afterInsert): [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] So the result will be always empty. There's another problem with INSERT OVERWRITEs, in that case we never fill the data files of the insert event. was: When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles. The problem is that Impala misuses Sets.difference(set1, set2). From the API doc at [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] "The returned set contains all elements that are contained by {{set1}} and not contained by {{set2}}. {{set2}} may also contain elements not present in {{set1}}; these are simply ignored." So the name "difference" is a bit misleading, its rather a subtraction between set1 and set2. 
Unfortunately Impala passes the parameters in wrong order: Sets.difference(beforeInsert, afterInsert): [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] So the result will be always empty. There's another problem with INSERT OVERWRITEs, in that case we never fill the data files of the insert event. > Insert events doesn't contain the inserted data files > - > > Key: IMPALA-10135 > URL: https://issues.apache.org/jira/browse/IMPALA-10135 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > > When Impala generates INSERT EVENTs it doesn't add the newly inserted > datafiles. > The problem is that Impala misuses Sets.difference(set1, set2). From the API > doc at > [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-] > "The returned set contains all elements that are contained by {{set1}} and > not contained by {{set2}}. {{set2}} may also contain elements not present in > {{set1}}; these are simply ignored." > So the name "difference" is a bit misleading, it's rather a subtraction > between set1 and set2. > Unfortunately Impala passes the parameters in wrong order: > Sets.difference(beforeInsert, afterInsert): > [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581] > So the result will be always empty. > There's another problem with INSERT OVERWRITEs, in that case we never fill > the data files of the insert event. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
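The argument-order bug described above can be reproduced with plain java.util collections. This is a sketch, not Impala's code: difference() mimics Guava's Sets.difference semantics (elements of set1 not contained in set2), and the file names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the Sets.difference argument-order bug using java.util only.
public class InsertEventFilesSketch {
    // Mimics Guava's Sets.difference(s1, s2): elements of s1 not in s2.
    static Set<String> difference(Set<String> s1, Set<String> s2) {
        Set<String> result = new HashSet<>(s1);
        result.removeAll(s2);
        return result;
    }

    public static void main(String[] args) {
        Set<String> beforeInsert = Set.of("old_file.parq");
        Set<String> afterInsert = Set.of("old_file.parq", "new_file.parq");

        // Buggy order: files present before but not after the insert.
        // An insert only adds files, so this is always empty.
        System.out.println(difference(beforeInsert, afterInsert)); // []

        // Correct order: files present after but not before the insert,
        // i.e. exactly the newly inserted data files.
        System.out.println(difference(afterInsert, beforeInsert)); // [new_file.parq]
    }
}
```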
[jira] [Commented] (IMPALA-10132) Implement ds_hll_estimate_bounds()
[ https://issues.apache.org/jira/browse/IMPALA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189257#comment-17189257 ] Adam Tamas commented on IMPALA-10132: - Impala is not able to return complex types as of now. We can either wait until that gets implemented or return the estimates as a string with the values separated by commas.
> Implement ds_hll_estimate_bounds()
> --
>
> Key: IMPALA-10132
> URL: https://issues.apache.org/jira/browse/IMPALA-10132
> Project: IMPALA
> Issue Type: Sub-task
>Reporter: Adam Tamas
>Priority: Major
>
> In Hive ds_hll_estimate_bounds() gives back an array of doubles.
> An example for a sketch created from a table that contains only a single
> value:
> {code:java}
> (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
> +---+
> | _c0 |
> +---+
> | [1.0,1.0,1.998634873453] |
> +---+
> {code}
> The values of the array are probably a lower bound, an estimate and an upper
> bound of the sketch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
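The string-based workaround suggested in the comment above could look like the following sketch. The helper name and formatting are hypothetical, not actual Impala code: it renders the [lower bound, estimate, upper bound] triple as one comma-separated string until complex return types are supported.

```java
// Hypothetical sketch of the workaround discussed above: until Impala can
// return complex types, emit the three bounds as a comma-separated string.
public class HllBoundsAsString {
    static String boundsAsString(double lower, double estimate, double upper) {
        return lower + "," + estimate + "," + upper;
    }

    public static void main(String[] args) {
        // Matches the single-value example from the issue description.
        System.out.println(boundsAsString(1.0, 1.0, 1.998634873453));
        // -> 1.0,1.0,1.998634873453
    }
}
```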
[jira] [Updated] (IMPALA-10132) Implement ds_hll_estimate_bounds()
[ https://issues.apache.org/jira/browse/IMPALA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Tamas updated IMPALA-10132: Description: In Hive ds_hll_estimate_bounds() gives back an array of doubles. An example for a sketch created from a table that contains only a single value:
{code:java}
(select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
+---+
| _c0 |
+---+
| [1.0,1.0,1.998634873453] |
+---+
{code}
The values of the array are probably a lower bound, an estimate and an upper bound of the sketch.

was: In hive ds_hll_estimate_bounds() gives back an array of doubles. An example for a sketch created from a table which contains only a single value: (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;) +---+ | _c0 | +---+ | [1.0,1.0,1.998634873453] | +---+ The values of the array is probably a lower bound, an estimate and an upper bound of the sketch.

> Implement ds_hll_estimate_bounds()
> --
>
> Key: IMPALA-10132
> URL: https://issues.apache.org/jira/browse/IMPALA-10132
> Project: IMPALA
> Issue Type: Sub-task
>Reporter: Adam Tamas
>Priority: Major
>
> In Hive ds_hll_estimate_bounds() gives back an array of doubles.
> An example for a sketch created from a table that contains only a single
> value:
> {code:java}
> (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
> +---+
> | _c0 |
> +---+
> | [1.0,1.0,1.998634873453] |
> +---+
> {code}
> The values of the array are probably a lower bound, an estimate and an upper
> bound of the sketch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10133) Implement ds_hll_stringify()
[ https://issues.apache.org/jira/browse/IMPALA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Tamas updated IMPALA-10133: Description: This function receives a string that is a serialized Apache DataSketches HLL sketch and returns its stringified format. The stringified format should look like the following and contain this data:
{code:java}
(select ds_hll_stringify(ds_hll_sketch(i)) from t;)
++
|_c0 |
++
| ### HLL SKETCH SUMMARY:
  Log Config K : 12
  Hll Target : HLL_4
  Current Mode : LIST
  Memory : true
  LB : 1.0
  Estimate : 1.0
  UB : 1.49929250618
  OutOfOrder Flag: false
  Coupon Count : 1
 |
++
{code}

was: This function receives a string that is a serialized Apache DataSketches HLL sketch and returns its stringified format. A stringified format should look like and contains the following data: (select ds_hll_stringify(ds_hll_sketch(i)) from t;) ++ |_c0 | ++ | ### HLL SKETCH SUMMARY: Log Config K : 12 Hll Target : HLL_4 Current Mode : LIST Memory : true LB : 1.0 Estimate : 1.0 UB : 1.49929250618 OutOfOrder Flag: false Coupon Count : 1 | ++

> Implement ds_hll_stringify()
> 
>
> Key: IMPALA-10133
> URL: https://issues.apache.org/jira/browse/IMPALA-10133
> Project: IMPALA
> Issue Type: Sub-task
>Reporter: Adam Tamas
>Priority: Major
>
> This function receives a string that is a serialized Apache DataSketches
> HLL sketch and returns its stringified format.
> The stringified format should look like the following and contain this data:
> {code:java}
> (select ds_hll_stringify(ds_hll_sketch(i)) from t;)
> ++
> |_c0 |
> ++
> | ### HLL SKETCH SUMMARY:
> Log Config K : 12
> Hll Target : HLL_4
> Current Mode : LIST
> Memory : true
> LB : 1.0
> Estimate : 1.0
> UB : 1.49929250618
> OutOfOrder Flag: false
> Coupon Count : 1
> |
> ++
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-10133) Implement ds_hll_stringify()
[ https://issues.apache.org/jira/browse/IMPALA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10133 started by Adam Tamas. --- > Implement ds_hll_stringify() > > > Key: IMPALA-10133 > URL: https://issues.apache.org/jira/browse/IMPALA-10133 > Project: IMPALA > Issue Type: Sub-task >Reporter: Adam Tamas >Assignee: Adam Tamas >Priority: Major > > This function receives a string that is a serialized Apache DataSketches > HLL sketch and returns its stringified format. > A stringified format should look like and contains the following data: > {code:java} > (select ds_hll_stringify(ds_hll_sketch(i)) from t;) > ++ > |_c0 | > ++ > | ### HLL SKETCH SUMMARY: > Log Config K : 12 > Hll Target : HLL_4 > Current Mode : LIST > Memory : true > LB : 1.0 > Estimate : 1.0 > UB : 1.49929250618 > OutOfOrder Flag: false > Coupon Count : 1 > | > ++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10134) Implement ds_hll_union_f()
Adam Tamas created IMPALA-10134: --- Summary: Implement ds_hll_union_f() Key: IMPALA-10134 URL: https://issues.apache.org/jira/browse/IMPALA-10134 Project: IMPALA Issue Type: Sub-task Reporter: Adam Tamas Implement ds_hll_union_f() and make sure it behaves similarly to Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work stopped] (IMPALA-10107) Implement HLL functions to have full compatibility with Hive
[ https://issues.apache.org/jira/browse/IMPALA-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10107 stopped by Adam Tamas. --- > Implement HLL functions to have full compatibility with Hive > > > Key: IMPALA-10107 > URL: https://issues.apache.org/jira/browse/IMPALA-10107 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Gabor Kaszab >Assignee: Adam Tamas >Priority: Minor > > ds_hll_estimate_bounds > ds_hll_stringify > ds_hll_union_f > For parameters and expected behaviour check Hive. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10133) Implement ds_hll_stringify()
Adam Tamas created IMPALA-10133: --- Summary: Implement ds_hll_stringify() Key: IMPALA-10133 URL: https://issues.apache.org/jira/browse/IMPALA-10133 Project: IMPALA Issue Type: Sub-task Reporter: Adam Tamas This function receives a string that is a serialized Apache DataSketches HLL sketch and returns its stringified format. The stringified format should look like the following and contain this data:
(select ds_hll_stringify(ds_hll_sketch(i)) from t;)
++
|_c0 |
++
| ### HLL SKETCH SUMMARY:
  Log Config K : 12
  Hll Target : HLL_4
  Current Mode : LIST
  Memory : true
  LB : 1.0
  Estimate : 1.0
  UB : 1.49929250618
  OutOfOrder Flag: false
  Coupon Count : 1
 |
++
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10132) Implement ds_hll_estimate_bounds()
Adam Tamas created IMPALA-10132: --- Summary: Implement ds_hll_estimate_bounds() Key: IMPALA-10132 URL: https://issues.apache.org/jira/browse/IMPALA-10132 Project: IMPALA Issue Type: Sub-task Reporter: Adam Tamas In Hive ds_hll_estimate_bounds() gives back an array of doubles. An example for a sketch created from a table that contains only a single value:
(select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
+---+
| _c0 |
+---+
| [1.0,1.0,1.998634873453] |
+---+
The values of the array are probably a lower bound, an estimate and an upper bound of the sketch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10131) Document ds_hll_* functions
Adam Tamas created IMPALA-10131: --- Summary: Document ds_hll_* functions Key: IMPALA-10131 URL: https://issues.apache.org/jira/browse/IMPALA-10131 Project: IMPALA Issue Type: New Feature Components: Docs Affects Versions: Impala 4.0 Reporter: Adam Tamas Several new functions were added recently to support Apache DataSketches KLL calculations. The purpose of these functions is to give approximate boundaries for a given dataset. KLL is an implementation of a very compact quantiles sketch with a lazy compaction scheme and nearly optimal accuracy per bit.
The newly introduced functions are:
ds_kll_sketch()
ds_kll_quantile()
ds_kll_quantiles_as_string()
ds_kll_n()
ds_kll_union()
ds_kll_rank()
ds_kll_pmf_as_string()
ds_kll_cdf_as_string()
ds_kll_stringify()
Related Jiras:
https://issues.apache.org/jira/browse/IMPALA-9959
https://issues.apache.org/jira/browse/IMPALA-9962
https://issues.apache.org/jira/browse/IMPALA-9963
https://issues.apache.org/jira/browse/IMPALA-10017
https://issues.apache.org/jira/browse/IMPALA-10018
https://issues.apache.org/jira/browse/IMPALA-10019
https://issues.apache.org/jira/browse/IMPALA-10020
https://issues.apache.org/jira/browse/IMPALA-10108
We should document these and mark them as experimental features so that users can try them out and hopefully give feedback. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IMPALA-10131) Document ds_kll_* functions
[ https://issues.apache.org/jira/browse/IMPALA-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Tamas updated IMPALA-10131: Summary: Document ds_kll_* functions (was: Document ds_hll_* functions) > Document ds_kll_* functions > --- > > Key: IMPALA-10131 > URL: https://issues.apache.org/jira/browse/IMPALA-10131 > Project: IMPALA > Issue Type: New Feature > Components: Docs >Affects Versions: Impala 4.0 >Reporter: Adam Tamas >Priority: Major > Labels: doc > > There were some new functions added recently to add support for Apache > DataSketches KLL calculations. These functions purpose is to give an > approximate boundaries for > a given dataset. It is an implementation of a very compact quantiles sketch > with lazy compaction scheme and nearly optimal accuracy per bit. > The newly introduced functions are: > ds_kll_sketch() > ds_kll_quantile() > ds_kll_quantiles_as_string() > ds_kll_n() > ds_kll_union() > ds_kll_rank() > ds_kll_pmf_as_string() > ds_kll_cdf_as_string() > ds_kll_stringify() > Related Jiras: > https://issues.apache.org/jira/browse/IMPALA-9959 > https://issues.apache.org/jira/browse/IMPALA-9962 > https://issues.apache.org/jira/browse/IMPALA-9963 > https://issues.apache.org/jira/browse/IMPALA-10017 > https://issues.apache.org/jira/browse/IMPALA-10018 > https://issues.apache.org/jira/browse/IMPALA-10019 > https://issues.apache.org/jira/browse/IMPALA-10020 > https://issues.apache.org/jira/browse/IMPALA-10108 > We should document these and mark them as experimental features so that users > can try out and hopefully give feedback. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10108) Implement ds_kll_stringify function
[ https://issues.apache.org/jira/browse/IMPALA-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189179#comment-17189179 ] ASF subversion and git services commented on IMPALA-10108: -- Commit 4cb3c3556e77ee24003383155ca5e1b70be4db6e in impala's branch refs/heads/master from Adam Tamas [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4cb3c35 ]
IMPALA-10108: Implement ds_kll_stringify function
This function receives a string that is a serialized Apache DataSketches KLL sketch and returns its stringified format. A stringified format should look like and contains the following data:
select ds_kll_stringify(ds_kll_sketch(float_col)) from functional_parquet.alltypestiny;
++
| ds_kll_stringify(ds_kll_sketch(float_col)) |
++
| ### KLL sketch summary:|
|K : 200|
|min K : 200|
|M : 8 |
|N : 8 |
|Epsilon: 1.33% |
|Epsilon PMF: 1.65% |
|Empty : false |
|Estimation mode: false |
|Levels : 1 |
|Sorted : false |
|Capacity items : 200|
|Retained items : 8 |
|Storage bytes : 64 |
|Min value : 0 |
|Max value : 1.1|
| ### End sketch summary |
||
++
Change-Id: I97f654a4838bf91e3e0bed6a00d78b2c7aa96f75
Reviewed-on: http://gerrit.cloudera.org:8080/16370
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
> Implement ds_kll_stringify function
> ---
>
> Key: IMPALA-10108
> URL: https://issues.apache.org/jira/browse/IMPALA-10108
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
>Reporter: Gabor Kaszab
>Assignee: Adam Tamas
>Priority: Major
>
> ds_kll_stringify() receives a string that is a serialized Apache DataSketches
> sketch and returns its stringified format by invoking the related function on
> the sketch's interface. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10090) Create aarch64 development environment on ubuntu 18.04
[ https://issues.apache.org/jira/browse/IMPALA-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189178#comment-17189178 ] ASF subversion and git services commented on IMPALA-10090: -- Commit 0098113d9582c3b93044eed4078a7f0724dda26f in impala's branch refs/heads/master from zhaorenhai [ https://gitbox.apache.org/repos/asf?p=impala.git;h=0098113 ] IMPALA-10090: Create aarch64 development environment on ubuntu 18.04 Including following changes: 1 build native-toolchain local by script on aarch64 platform 2 change some native-toolchain's lib version number 3 split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONETS to two things, because on aarch64, just need to download cdp components , but not need to download toolchain. 4 download hadoop aarch64 nativelibs , impala building needs these libs. With this commit, on ubuntu 18.04 aarch64 version, just need to run bin/bootstrap_development.sh, just like x86. Change-Id: I769668c834ab0dd504a822ed9153186778275d59 Reviewed-on: http://gerrit.cloudera.org:8080/16065 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Create aarch64 development environment on ubuntu 18.04 > -- > > Key: IMPALA-10090 > URL: https://issues.apache.org/jira/browse/IMPALA-10090 > Project: IMPALA > Issue Type: Sub-task >Reporter: zhaorenhai >Assignee: zhaorenhai >Priority: Major > > Including following changes: > 1 build native-toolchain local by script on aarch64 platform > 2 change some native-toolchain's lib version number > 3 split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONETS to two things, > because on aarch64, just need to download cdp components , but not need to > download toolchain. > 4 download hadoop aarch64 nativelibs , impala building needs these libs. > With this commit, on ubuntu 18.04 aarch64 version, just need to run > bin/bootstrap_development.sh, just like x86. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10094) TestResetMetadata.test_refresh_updated_partitions fails due to connection error
[ https://issues.apache.org/jira/browse/IMPALA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189177#comment-17189177 ] ASF subversion and git services commented on IMPALA-10094: -- Commit 28b1542db0c6c91a9d7fc626ac50c2bfc84dabb2 in impala's branch refs/heads/master from Vihang Karajgaonkar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=28b1542 ] IMPALA-10094: Skip test_refresh_updated_partitions on S3 The test test_refresh_updated_partitions runs some commands using Hive which causes it fail on S3 specific jobs since we don't run HiveServer2 in those environments. This patch skips the test on non-hdfs environments. Change-Id: I0d27dd76e772e396a07419a58821ba899ac74188 Reviewed-on: http://gerrit.cloudera.org:8080/16399 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > TestResetMetadata.test_refresh_updated_partitions fails due to connection > error > --- > > Key: IMPALA-10094 > URL: https://issues.apache.org/jira/browse/IMPALA-10094 > Project: IMPALA > Issue Type: Bug >Reporter: Norbert Luksa >Assignee: Vihang Karajgaonkar >Priority: Major > > This has occurred in the last few builds in > impala-cdpd-master-staging-core-s3: > [https://master-02.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/14/] > Error message: > {code:java} > metadata/test_reset_metadata.py:49: in test_refresh_updated_partitions > "alter table {0} add partition (year=2020, month=8)".format(tbl)) > common/impala_test_suite.py:983: in run_stmt_in_hive > raise RuntimeError(stderr) > E RuntimeError: SLF4J: Class path contains multiple SLF4J bindings. 
> E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > E SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory] > E ERROR StatusLogger No log4j2 configuration file found. Using default > configuration: logging only errors to the console. Set system property > 'log4j2.debug' to show Log4j2 internal initialization logging. > E SLF4J: Class path contains multiple SLF4J bindings. > E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: Found binding in > [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class] > E SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > E SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory] > E Connecting to jdbc:hive2://localhost:11050 > E 20/08/18 05:10:24 [main]: WARN jdbc.HiveConnection: Failed to connect to > localhost:11050 > E Could not open connection to the HS2 server. Please check the server URI > and if the URI is correct, then ask the administrator to check the server > status. 
> E Error: Could not open client transport with JDBC Uri: > jdbc:hive2://localhost:11050: java.net.ConnectException: Connection refused > (Connection refused) (state=08S01,code=0){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10115) Impala should check file schema as well to check full ACIDv2 files
[ https://issues.apache.org/jira/browse/IMPALA-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10115. Fix Version/s: Impala 4.0 Resolution: Fixed > Impala should check file schema as well to check full ACIDv2 files > -- > > Key: IMPALA-10115 > URL: https://issues.apache.org/jira/browse/IMPALA-10115 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Fix For: Impala 4.0 > > > Currently Impala checks file metadata 'hive.acid.version' to decide the full > ACID schema. > There are cases when Hive forgets to set this value for full ACID files, e.g. > major query-based compactions. > So if 'hive.acid.version' is not present, Impala should still look at the > schema elements to be sure about the file format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
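The fallback described in IMPALA-10115 can be sketched as follows. This is a hypothetical illustration, not Impala's actual code: the metadata and schema inputs are simplified to plain collections, presence of the 'hive.acid.version' key is treated as authoritative, and the fallback compares the top-level columns against Hive's full-ACID layout (operation, originalTransaction, bucket, rowId, currentTransaction, row).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed detection logic; not Impala's code.
public class AcidFileCheckSketch {
    // Top-level columns of a full ACIDv2 file in Hive's ACID layout.
    static final Set<String> FULL_ACID_COLS = Set.of(
        "operation", "originalTransaction", "bucket",
        "rowId", "currentTransaction", "row");

    static boolean isFullAcid(Map<String, String> fileMetadata,
                              List<String> topLevelCols) {
        // Simplification: trust the metadata key when Hive wrote it.
        if (fileMetadata.containsKey("hive.acid.version")) return true;
        // Fallback for files where Hive forgot to set the key (e.g. major
        // query-based compactions): inspect the schema elements instead.
        return topLevelCols.size() == FULL_ACID_COLS.size()
            && FULL_ACID_COLS.containsAll(topLevelCols);
    }

    public static void main(String[] args) {
        // Full-ACID schema without the metadata key is still detected.
        System.out.println(isFullAcid(Map.of(),
            List.of("operation", "originalTransaction", "bucket",
                    "rowId", "currentTransaction", "row"))); // true
        // An ordinary table schema is not.
        System.out.println(isFullAcid(Map.of(),
            List.of("a1", "b1", "ts"))); // false
    }
}
```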
[jira] [Assigned] (IMPALA-10115) Impala should check file schema as well to check full ACIDv2 files
[ https://issues.apache.org/jira/browse/IMPALA-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy reassigned IMPALA-10115: -- Assignee: Zoltán Borók-Nagy > Impala should check file schema as well to check full ACIDv2 files > -- > > Key: IMPALA-10115 > URL: https://issues.apache.org/jira/browse/IMPALA-10115 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > > Currently Impala checks file metadata 'hive.acid.version' to decide the full > ACID schema. > There are cases when Hive forgets to set this value for full ACID files, e.g. > major query-based compactions. > So if 'hive.acid.version' is not present, Impala should still look at the > schema elements to be sure about the file format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10090) Create aarch64 development environment on ubuntu 18.04
[ https://issues.apache.org/jira/browse/IMPALA-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaorenhai resolved IMPALA-10090. - Resolution: Fixed > Create aarch64 development environment on ubuntu 18.04 > -- > > Key: IMPALA-10090 > URL: https://issues.apache.org/jira/browse/IMPALA-10090 > Project: IMPALA > Issue Type: Sub-task >Reporter: zhaorenhai >Assignee: zhaorenhai >Priority: Major > > Including the following changes: > 1. build native-toolchain locally by script on the aarch64 platform > 2. change some native-toolchain library version numbers > 3. split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONENTS into two separate settings, > because on aarch64 only the CDP components need to be downloaded, not the > toolchain. > 4. download the hadoop aarch64 native libs, which the Impala build needs. > With this commit, on ubuntu 18.04 aarch64 you just need to run > bin/bootstrap_development.sh, just like on x86.
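The split in item 3 above treats the toolchain download and the CDP component download as two independent switches. A minimal sketch of that decision logic, with hypothetical function and step names (the real logic lives in Impala's shell bootstrap scripts, not in Python):

```python
def bootstrap_steps(env):
    """Return which bootstrap downloads to run, reading the two flags
    independently. On aarch64 the toolchain is built locally, so
    SKIP_TOOLCHAIN_BOOTSTRAP is set while the CDP components are
    still downloaded. Illustrative sketch; step names are made up.
    """
    steps = []
    if env.get("SKIP_TOOLCHAIN_BOOTSTRAP", "false") != "true":
        steps.append("download-toolchain")
    if env.get("DOWNLOAD_CDH_COMPONENTS", "true") == "true":
        steps.append("download-cdp-components")
    return steps

# aarch64 configuration: skip the toolchain, keep the CDP components.
print(bootstrap_steps({"SKIP_TOOLCHAIN_BOOTSTRAP": "true",
                       "DOWNLOAD_CDH_COMPONENTS": "true"}))
# -> ['download-cdp-components']
```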