[jira] [Resolved] (IMPALA-10064) Support constant propagation for range predicates

2020-09-02 Thread Aman Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved IMPALA-10064.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Support constant propagation for range predicates
> -
>
> Key: IMPALA-10064
> URL: https://issues.apache.org/jira/browse/IMPALA-10064
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: Impala 4.0
>
>
> Consider the following table schema, view and 2 queries on the view:
> {noformat}
> create table tt1 (a1 int, b1 int, ts timestamp) partitioned by (mydate date);
> create view tt1_view as (select a1, b1, ts from tt1 where mydate = cast(ts as date));
> // query 1:  (Good) constant on ts gets propagated
> explain select * from tt1_view where ts = '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    partition predicates: mydate = DATE '2019-07-01'
>    HDFS partitions=1/3 files=2 size=48B
>    predicates: db1.tt1.ts = TIMESTAMP '2019-07-01 00:00:00'
>    row-size=24B cardinality=1
> // query 2: (Not good) constant on ts does not get propagated
> explain select * from tt1_view where ts > '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    HDFS partitions=3/3 files=4 size=96B
>    predicates: db1.tt1.ts > TIMESTAMP '2019-07-01 00:00:00', mydate = CAST(ts AS DATE)
>    row-size=28B cardinality=1
> {noformat}
> Note that in query 1, with the equality condition on 'ts', the constant value 
> is propagated to the 'mydate = CAST(ts as date)' predicate, which then gets 
> applied as a partition predicate. In query 2, which has a range predicate, 
> the constant is not propagated and no partition predicate is created for the 
> scan. We should support constant propagation for the second case as well. 
> Constant predicates using >, >=, <, <= that involve date or timestamp 
> literals should be considered, but we have to analyze the cases where the 
> propagation is valid. For example, with date_add/date_diff-type functions 
> there is a potential for incorrect propagation.
> Note that a predicate can be a BETWEEN condition such as:
> {noformat}
> WHERE ts >= '2019-07-01' AND ts <= '2020-07-01'
> {noformat}
> In this case both bounds need to be applied.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST

2020-09-02 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-10057:
---
Description: 
For both the normal tests and the docker-based tests, the Impala logs 
generated during the FE_TEST/JDBC_TEST can be huge:

 
{noformat}
$ du -c -h fe_test/ee_tests
4.0K  fe_test/ee_tests/minidumps/statestored
4.0K  fe_test/ee_tests/minidumps/impalad
4.0K  fe_test/ee_tests/minidumps/catalogd
16K   fe_test/ee_tests/minidumps
352K  fe_test/ee_tests/profiles
81G   fe_test/ee_tests
81G   total{noformat}
Creating a tarball of these logs takes 10 minutes. The Impalad/catalogd logs 
are filled with this error over and over:
{noformat}
E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext
  at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext
  ... 2 more
Caused by: java.lang.ClassNotFoundException: org.apache.impala.common.TransactionKeepalive$HeartbeatContext
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 2 more{noformat}
Two interesting points:
 # The frontend/jdbc tests are passing, so these errors in the impalad logs 
are not impacting tests.
 # These errors aren't happening concurrently with any of the other tests (ee 
tests, custom cluster tests, etc).

This is happening on normal core runs (including the GVO job that does 
FE_TEST/JDBC_TEST) on both Ubuntu and CentOS 7. It is also happening on 
docker-based tests. A theory is that FE_TEST/JDBC_TEST have an Impala cluster 
running and then invoke maven to run the tests. Maven could manipulate jars 
while Impala is running. Maybe there is a race condition or conflict when 
manipulating those jars that could cause the NoClassDefFoundError. Otherwise it 
makes no sense for Impala not to be able to find 
TransactionKeepalive$HeartbeatContext.

When it happens, it is in a tight loop, printing the message more than once per 
millisecond. It fills the ERROR, WARNING, and INFO logs with that message, 
sometimes for multiple Impalads and/or catalogd.

  was:
For the docker-based tests, the Impala logs generated during the FE_TEST are 
huge:

 
{noformat}
$ du -c -h fe_test/ee_tests
4.0K  fe_test/ee_tests/minidumps/statestored
4.0K  fe_test/ee_tests/minidumps/impalad
4.0K  fe_test/ee_tests/minidumps/catalogd
16K   fe_test/ee_tests/minidumps
352K  fe_test/ee_tests/profiles
81G   fe_test/ee_tests
81G   total{noformat}
Creating a tarball of these logs takes 10 minutes. The Impalad/catalogd logs 
are filled with this error over and over:
{noformat}
E0805 06:08:45.485440 11219 TransactionKeepalive.java:137] Unexpected exception thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: 
  at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: 
  ... 2 more{noformat}
Two interesting points:
 # The frontend tests are passing, so all of these errors in the impalad logs 
are not impacting tests.
 # These errors aren't happening in any of the other tests (ee tests, custom 
cluster tests, etc). These errors are not seen outside the docker-based tests.

A theory is that FE_TEST runs mvn to build and run the frontend tests. If there 
were some bad interaction of mvn with the docker filesystem in manipulating the 
~/.m2 directory, that may cause problems. One thing to try may be to copy the 
.m2 directory to make sure it is in the top docker layer (similar to what we do 
with kudu wal files).

 


> TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
> --
>
> Key: IMPALA-10057
> URL: https://issues.apache.org/jira/browse/IMPALA-10057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Joe McDonnell
>Priority: Major
>

[jira] [Updated] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST

2020-09-02 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-10057:
---
Summary: TransactionKeepalive NoClassDefFoundError floods logs during 
JDBC_TEST/FE_TEST  (was: Impala logs during docker-based FE_TEST are massive)

> TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
> --
>
> Key: IMPALA-10057
> URL: https://issues.apache.org/jira/browse/IMPALA-10057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Joe McDonnell
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10064) Support constant propagation for range predicates

2020-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189818#comment-17189818
 ] 

ASF subversion and git services commented on IMPALA-10064:
--

Commit 5e9f10d34cc2ba6e18b469a3a5ae3ed9f5f306b1 in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5e9f10d ]

IMPALA-10064: Support constant propagation for eligible range predicates

This patch adds support for constant propagation of range predicates
involving date and timestamp constants. Previously, only equality
predicates were considered for propagation. The new type of propagation
is shown by the following example:

Before constant propagation:
 WHERE date_col = CAST(timestamp_col as DATE)
  AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'
After constant propagation:
 WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01'
  AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01'
  AND date_col = CAST(timestamp_col as DATE)

As a consequence, since Impala supports table partitioning by date
columns but not timestamp columns, the above propagation enables
partition pruning based on timestamp ranges.
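
The range propagation described above can be sketched roughly as follows. This is an illustrative Python sketch, not the planner code from the patch: the names propagate_to_date and propagate_between are hypothetical, and relaxing strict bounds (> to >=, < to <=) is an assumption made here because CAST(ts AS DATE) drops the time of day. The commit message only shows the inclusive BETWEEN case.

```python
from datetime import date, datetime
from typing import List, Tuple

# Hypothetical helper, not Impala's planner code: given a range predicate
# "ts <op> <literal>", derive a predicate implied on CAST(ts AS DATE).
# The cast drops the time of day, so it is monotonic but not strictly
# monotonic; strict bounds are therefore relaxed to inclusive ones.
_RELAXED = {">": ">=", ">=": ">=", "<": "<=", "<=": "<="}


def propagate_to_date(op: str, ts_literal: datetime) -> Tuple[str, date]:
    """Return (operator, date literal) implied for the date column."""
    if op not in _RELAXED:
        raise ValueError("not a range operator: " + op)
    return _RELAXED[op], ts_literal.date()


def propagate_between(lo: datetime, hi: datetime) -> List[Tuple[str, date]]:
    """BETWEEN lo AND hi is 'ts >= lo AND ts <= hi'; both bounds propagate."""
    return [propagate_to_date(">=", lo), propagate_to_date("<=", hi)]
```

The relaxation matters for a row such as ts = '2019-07-01 12:00:00': it satisfies ts > '2019-07-01', yet its date is still 2019-07-01, so a strict bound on the date column would wrongly prune its partition.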

Existing code for equality based constant propagation was refactored
and consolidated into a new class which handles both equality and
range based constant propagation. Range based propagation is only
applied to date and timestamp columns.

Testing:
 - Added new range constant propagation tests to PlannerTest.
 - Added e2e test for range constant propagation based on a newly
   added date partitioned table.
 - Ran precommit tests.

Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Reviewed-on: http://gerrit.cloudera.org:8080/16346
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Support constant propagation for range predicates
> -
>
> Key: IMPALA-10064
> URL: https://issues.apache.org/jira/browse/IMPALA-10064
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>






[jira] [Commented] (IMPALA-9870) summary and profile command in impala-shell should show both original and retried info

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189775#comment-17189775
 ] 

Sahil Takiar commented on IMPALA-9870:
--

WIP Patch: http://gerrit.cloudera.org:8080/16406

> summary and profile command in impala-shell should show both original and 
> retried info
> --
>
> Key: IMPALA-9870
> URL: https://issues.apache.org/jira/browse/IMPALA-9870
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> If a query is retried, impala-shell still uses the original query handle 
> containing the original query id. Subsequent "summary" and "profile" commands 
> will return results of the original query. We should consider returning both 
> the original and retried information.






[jira] [Updated] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true

2020-09-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10140:

Description: 
A customer randomly hit the following error message when running the following 
query on impalad version 3.2.0-cdh6.3.2 RELEASE.

set sync_ddl =true ; create database if not exists $dbname;

I0715 11:52:28.496253 51943 client-request-state.cc:187] 
a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the 
catalog topic version for the SYNC_DDL operation after 5 attempts.The operation 
has been successfully executed but its effects may have not been broadcast to 
all the coordinators.

 

From the Catalog server log, we can see the following error message as well.

I0715 11:01:50.143303 220286 jni-util.cc:256] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 5 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
 at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
 at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374)
 at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154)

This looks to be another variation of the conditions described in IMPALA-7961. 
The difference is that this case involves "CREATE DATABASE ... IF NOT EXISTS", 
whereas the fix in IMPALA-7961 specifically targets the "CREATE TABLE ... IF 
NOT EXISTS" use case.

To fix the issue, we should port the change in patch 
[https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function.

 

  was:
A customer randomly hit the following error message when running the following 
query on impalad version 3.2.0-cdh6.3.2 RELEASE 
(https://jira.cloudera.com/browse/ENGESC-3589).

set sync_ddl =true ; create database if not exists $dbname;

I0715 11:52:28.496253 51943 client-request-state.cc:187] 
a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the 
catalog topic version for the SYNC_DDL operation after 5 attempts.The operation 
has been successfully executed but its effects may have not been broadcast to 
all the coordinators.

 

From the Catalog server log, we can see the following error message as well.

I0715 11:01:50.143303 220286 jni-util.cc:256] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 5 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
 at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
 at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374)
 at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154)

This looks to be another variation of the conditions described in IMPALA-7961. 
The difference is that this case involves "CREATE DATABASE ... IF NOT EXISTS", 
whereas the fix in IMPALA-7961 specifically targets the "CREATE TABLE ... IF 
NOT EXISTS" use case.

To fix the issue, we should port the change in patch 
[https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function.

 


> Throw CatalogException for query "create database if not exist" with sync_ddl 
> as true
> -
>
> Key: IMPALA-10140
> URL: https://issues.apache.org/jira/browse/IMPALA-10140
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Wenzhe Zhou
>Priority: Critical
>

[jira] [Updated] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true

2020-09-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10140:

Priority: Critical  (was: Major)

> Throw CatalogException for query "create database if not exist" with sync_ddl 
> as true
> -
>
> Key: IMPALA-10140
> URL: https://issues.apache.org/jira/browse/IMPALA-10140
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Wenzhe Zhou
>Priority: Critical
>






[jira] [Commented] (IMPALA-10106) Update DataSketches to version 2.1.0

2020-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189746#comment-17189746
 ] 

ASF subversion and git services commented on IMPALA-10106:
--

Commit f9936549dcab58390c5662ebdedb9c60838185a4 in impala's branch 
refs/heads/master from Adam Tamas
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f993654 ]

IMPALA-10106: Upgrade DataSketches to version 2.1.0

Upgrade the external DataSketches files for HLL/KLL to version 2.1.0

tests:
-Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I4faa31c0b628a62c7e56a6c4b9549d0aaa8a02ff
Reviewed-on: http://gerrit.cloudera.org:8080/16360
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Update DataSketches to version 2.1.0
> 
>
> Key: IMPALA-10106
> URL: https://issues.apache.org/jira/browse/IMPALA-10106
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Adam Tamas
>Assignee: Adam Tamas
>Priority: Minor
>
> Update the external DataSketches files for HLL/KLL to version 2.1.0






[jira] [Commented] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189724#comment-17189724
 ] 

Sahil Takiar commented on IMPALA-10141:
---

Adding some additional fields from {{/proc/net/dev}} in the per-node stats from 
system-info.cc might be useful as well. Fields like NET_RX_ERRS, NET_RX_DROP, 
NET_TX_ERRS, NET_TX_DROP might be useful to track transmit / receive errors or 
dropped packets. Although these stats are probably more generic as they are not 
specific to the kRPC TCP connections and are truly at the host level. I'm also 
not sure what exactly they capture compared to the TCP stats. They seem more 
hardware specific, maybe they would capture host NIC issues.
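
As a rough illustration of what reading those fields could look like, here is a minimal Python sketch that parses the text of /proc/net/dev and sums the error and drop counters across interfaces. The function names, the lowercase field labels (rx_errs for NET_RX_ERRS, and so on) and the SAMPLE text are assumptions for illustration; this is not how system-info.cc is implemented.

```python
from typing import Dict

# Column order of /proc/net/dev after the "iface:" prefix on Linux.
_FIELDS = [
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop", "rx_fifo", "rx_frame",
    "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop", "tx_fifo", "tx_colls",
    "tx_carrier", "tx_compressed",
]


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Map each interface name to its named counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        iface, values = line.split(":", 1)
        stats[iface.strip()] = dict(zip(_FIELDS, (int(v) for v in values.split())))
    return stats


def host_error_counters(stats: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum receive/transmit error and drop counters over all interfaces."""
    keys = ("rx_errs", "rx_drop", "tx_errs", "tx_drop")
    return {k: sum(s[k] for s in stats.values()) for k in keys}


# Made-up sample in the /proc/net/dev layout, for illustration only.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  500000    4000    2    5    0     0          0         0   300000    3000    1    3    0     0       0          0
"""
```

Calling host_error_counters(parse_net_dev(SAMPLE)) gives host-level totals. As noted above, these counters cover all traffic on the host, not just the kRPC connections.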

> Include aggregate TCP metrics in per-node profiles
> --
>
> Key: IMPALA-10141
> URL: https://issues.apache.org/jira/browse/IMPALA-10141
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The /rpcz endpoint in the debug web ui includes a ton of useful TCP-level 
> metrics per kRPC connection for all inbound / outbound connections. It would 
> be useful to aggregate some of these metrics and put them in the per-node 
> profiles. Since it is not possible to currently split these metrics out per 
> query, they should be added at the per-host level. Furthermore, only metrics 
> that can be sanely aggregated across all connections should be included. For 
> example, tracking the number of Retransmitted TCP Packets across all 
> connections for the duration of the query would be useful. TCP 
> retransmissions should be rare and are typically indicative of network 
> hardware issues or network congestion. Having at least a high-level idea of 
> the number of TCP retransmissions that occur during a query can greatly help 
> determine if the network is to blame for query slowness.






[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189718#comment-17189718
 ] 

Sahil Takiar commented on IMPALA-10139:
---

The "network" time (calculated as {{int64_t network_time_ns = total_time_ns - 
resp_.receiver_latency_ns()}}) might be a more useful threshold value to use.
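
A minimal Python sketch of that idea follows. The subtraction mirrors the expression quoted above; the function names and the threshold value are hypothetical and only illustrate gating the slow RPC log on network time instead of total time.

```python
# Hypothetical threshold; the comment above does not name a value or a flag.
SLOW_NETWORK_THRESHOLD_NS = 2 * 1_000_000_000


def network_time_ns(total_time_ns: int, receiver_latency_ns: int) -> int:
    """Total RPC time minus the latency reported by the receiver."""
    return total_time_ns - receiver_latency_ns


def should_log_slow_rpc(total_time_ns: int, receiver_latency_ns: int,
                        threshold_ns: int = SLOW_NETWORK_THRESHOLD_NS) -> bool:
    """Log only when the network portion alone exceeds the threshold."""
    return network_time_ns(total_time_ns, receiver_latency_ns) > threshold_ns
```

With this check, an RPC that took 10 seconds in total but spent 9.5 seconds waiting on the receiver would not be logged as slow, while one that spent 9 seconds outside the receiver would be.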

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver has not processed an RPC simply 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} hasn't been 
> called yet (it is called indirectly by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.






[jira] [Created] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10141:
-

 Summary: Include aggregate TCP metrics in per-node profiles
 Key: IMPALA-10141
 URL: https://issues.apache.org/jira/browse/IMPALA-10141
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The /rpcz endpoint in the debug web ui includes a ton of useful TCP-level 
metrics per kRPC connection for all inbound / outbound connections. It would be 
useful to aggregate some of these metrics and put them in the per-node 
profiles. Since it is not possible to currently split these metrics out per 
query, they should be added at the per-host level. Furthermore, only metrics 
that can be sanely aggregated across all connections should be included. For 
example, tracking the number of Retransmitted TCP Packets across all 
connections for the duration of the query would be useful. TCP retransmissions 
should be rare and are typically indicative of network hardware issues or 
network congestion. Having at least a high-level idea of the number of TCP 
retransmissions that occur during a query can greatly help determine if the 
network is to blame for query slowness.
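
One possible shape for such an aggregate is sketched below in Python. It assumes, purely for illustration, that a cumulative retransmit counter can be snapshotted per connection at query start and end; the function name and the connection keys are hypothetical and do not reflect the actual /rpcz output.

```python
from typing import Dict


def retransmits_during_query(before: Dict[str, int], after: Dict[str, int]) -> int:
    """Sum per-connection increases in a cumulative retransmit counter.

    Connections that only appear in the second snapshot count from zero.
    Negative deltas (e.g. a connection that was reset and reopened) are
    clamped to zero instead of reducing the total.
    """
    return sum(max(0, count - before.get(conn, 0)) for conn, count in after.items())
```

For example, with snapshots {"host-a": 3, "host-b": 0} at query start and {"host-a": 5, "host-b": 1, "host-c": 2} at query end, the per-host total is 5. A plain sum like this is the kind of metric that can be sanely aggregated across connections, unlike per-connection values such as round-trip time.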






[jira] [Created] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true

2020-09-02 Thread Wenzhe Zhou (Jira)
Wenzhe Zhou created IMPALA-10140:


 Summary: Throw CatalogException for query "create database if not 
exist" with sync_ddl as true
 Key: IMPALA-10140
 URL: https://issues.apache.org/jira/browse/IMPALA-10140
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog, Frontend
Affects Versions: Impala 3.2.0
Reporter: Wenzhe Zhou


A customer intermittently hit the following error message when running the
following query on impalad version 3.2.0-cdh6.3.2 RELEASE
(https://jira.cloudera.com/browse/ENGESC-3589).

set sync_ddl =true ; create database if not exists $dbname;

I0715 11:52:28.496253 51943 client-request-state.cc:187] 
a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the 
catalog topic version for the SYNC_DDL operation after 5 attempts.The operation 
has been successfully executed but its effects may have not been broadcast to 
all the coordinators.

 

From the Catalog server log, we can see the following error message as well.

I0715 11:01:50.143303 220286 jni-util.cc:256] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 5 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
 at 
org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
 at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374)
 at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154)

This looks like another variation of the conditions described in IMPALA-7961.
The difference is that this case involves "CREATE DATABASE ... IF NOT EXISTS",
whereas the fix in IMPALA-7961 specifically targets the "CREATE TABLE ... IF
NOT EXISTS" use case.

To fix the issue, we should port the change in patch 
[https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function.

 



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189698#comment-17189698
 ] 

Sahil Takiar commented on IMPALA-10139:
---

This is pretty easy to reproduce on master. I just ran the query "select * from 
functional.alltypes as a2, functional.alltypes as a1" and didn't fetch any 
results. A bunch of RPCs get sent, but are not processed because queues are 
probably full. Then the logs contain entries like:
{code:java}
I0902 13:25:34.797029 17168 rpcz_store.cc:269] Call 
impala.DataStreamService.TransmitData from 127.0.0.1:33354 (request call id 
6737) took 218496ms. Request Metrics: {}
I0902 13:25:34.797061 17168 rpcz_store.cc:273] Trace:
0902 13:21:56.300996 (+ 0us) impala-service-pool.cc:170] Inserting onto 
call queue
0902 13:21:56.301037 (+41us) impala-service-pool.cc:269] Handling call
0902 13:21:56.301048 (+11us) krpc-data-stream-recvr.cc:325] Enqueuing 
deferred RPC
0902 13:25:34.757315 (+218456267us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.757317 (+ 2us) krpc-data-stream-recvr.cc:524] Batch queue is 
full
0902 13:25:34.757319 (+ 2us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.757320 (+ 1us) krpc-data-stream-recvr.cc:524] Batch queue is 
full
0902 13:25:34.796800 (+ 39480us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.796803 (+ 3us) krpc-data-stream-recvr.cc:397] Deserializing 
batch
0902 13:25:34.797011 (+   208us) krpc-data-stream-recvr.cc:424] Enqueuing 
deserialized batch
0902 13:25:34.797021 (+10us) inbound_call.cc:162] Queueing success response
Metrics: {}
I0902 13:25:34.797154 17105 krpc-data-stream-sender.cc:394] Slow TransmitData 
RPC to 127.0.0.1:27000 
(fragment_instance_id=d447645333af3b77:671fbefe): took 3m38s. Receiver 
time: 3m38s Network time: 239.735us
I0902 13:25:34.797215  3684 krpc-data-stream-sender.cc:428] 
d447645333af3b77:671fbefe0005] Long delay waiting for RPC to 
127.0.0.1:27000 (fragment_instance_id=d447645333af3b77:671fbefe): took 
3m38s {code}
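
The trace above can be read as a breakdown of where the 218496ms went. The sketch below does that arithmetic using only two numbers printed in the log: the total call time and the gap before the first "Processing deferred RPC" entry. The class and method names are made up for illustration.

```java
class TraceBreakdownSketch {
    // Fraction of the total RPC time spent in a single trace step.
    static double fraction(long stepMicros, long totalMillis) {
        return stepMicros / (totalMillis * 1000.0);
    }

    public static void main(String[] args) {
        long totalMillis = 218496L;       // "took 218496ms"
        long deferredMicros = 218456267L; // "(+218456267us) ... Processing deferred RPC"
        System.out.printf("time waiting as a deferred RPC: %.2f%% of total%n",
            100.0 * fraction(deferredMicros, totalMillis));
    }
}
```

With these inputs, the wait in the deferred queue accounts for roughly 99.98% of the total, so almost none of the reported 3m38s was spent on the network or on deserializing the batch.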

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to
> successfully complete an RPC. The issue is that there are many reasons why an
> RPC might take a long time to complete. An RPC is considered complete only
> when the receiver has processed that RPC.
> The problem is that, due to the client-driven back-pressure mechanism, it is
> entirely possible that the receiver does not process an RPC
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not
> actually be slow themselves. What is worse is that, because of the
> back-pressure mechanism, slowness from the client (e.g. Hue users) will
> propagate across all nodes involved in the query.






[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189696#comment-17189696
 ] 

Sahil Takiar commented on IMPALA-10139:
---

Linking IMPALA-3380 - which has some details why we don't add timeouts for 
TransmitData RPCs.




[jira] [Commented] (IMPALA-10135) Insert events doesn't contain the inserted data files

2020-09-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189685#comment-17189685
 ] 

Zoltán Borók-Nagy commented on IMPALA-10135:


Thanks for taking care of this, Vihang! Please note that the problem with 
INSERT OVERWRITEs is not that we don't provide the files, but that we don't 
even send events at all, because 'partsPostInsert' is always empty for INSERT 
OVERWRITEs, therefore 'insertEventInfos' also remains empty:

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4543-L4553]

 

> Insert events doesn't contain the inserted data files
> -
>
> Key: IMPALA-10135
> URL: https://issues.apache.org/jira/browse/IMPALA-10135
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When Impala generates INSERT EVENTs it doesn't add the newly inserted
> data files.
> The problem is that Impala misuses Sets.difference(set1, set2). From the API
> doc at
> [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]
> "The returned set contains all elements that are contained by {{set1}} and
> not contained by {{set2}}. {{set2}} may also contain elements not present in
> {{set1}}; these are simply ignored."
> So the name "difference" is a bit misleading; it is really a subtraction
> of set2 from set1.
> Unfortunately Impala passes the parameters in the wrong order:
> Sets.difference(beforeInsert, afterInsert):
> [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]
> So the result will always be empty.
> There's another problem with INSERT OVERWRITEs: they don't send any INSERT
> events.
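
The argument-order problem described in the quoted issue can be sketched with plain java.util sets. This is an illustration only: the difference() helper below copies the semantics quoted from the Guava doc (elements of set1 that are not in set2) but returns a copy rather than a view, and the file names are made up. It assumes a plain INSERT only adds files, so every file present before the insert is still present afterwards.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class InsertedFilesSketch {
    // Same semantics as the quoted doc: elements of set1 not contained in set2.
    static Set<String> difference(Set<String> set1, Set<String> set2) {
        Set<String> result = new TreeSet<>(set1);
        result.removeAll(set2);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        Set<String> beforeInsert = new TreeSet<>(Arrays.asList("part-0.parq"));
        Set<String> afterInsert =
            new TreeSet<>(Arrays.asList("part-0.parq", "part-1.parq"));

        // Order used in the reported bug: nothing is left over.
        System.out.println(difference(beforeInsert, afterInsert)); // prints []
        // Swapped order: yields the newly inserted file.
        System.out.println(difference(afterInsert, beforeInsert)); // prints [part-1.parq]
    }
}
```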






[jira] [Commented] (IMPALA-9992) test_scanner_position seems flaky

2020-09-02 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189681#comment-17189681
 ] 

Joe McDonnell commented on IMPALA-9992:
---

We output a list of the files at the start of the tests after dataload. For a 
run not impacted by this, logs/file-list-begin-1.log has these entries for 
complextypestbl_medium (in the ORC format):
{noformat}
drwxr-xr-x   - jenkins supergroup  0 2020-09-02 05:11 
/test-warehouse/managed/complextypestbl_medium_orc_def
drwxr-xr-x   - jenkins supergroup  0 2020-09-02 05:11 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001
-rw-r--r--   3 jenkins supergroup  1 2020-09-02 05:11 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/_orc_acid_version
-rw-r--r--   3 jenkins supergroup   6513 2020-09-02 05:11 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_0_1
-rw-r--r--   3 jenkins supergroup   6600 2020-09-02 05:11 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_1_0
-rw-r--r--   3 jenkins supergroup   6671 2020-09-02 05:11 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_2_1{noformat}
For a run impacted by this, it has this list in logs/file-list-begin-1.log:
{noformat}
drwxr-xr-x   - jenkins supergroup  0 2020-09-01 03:05 
/test-warehouse/managed/complextypestbl_medium_orc_def
drwxr-xr-x   - jenkins supergroup  0 2020-09-01 03:05 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001
-rw-r--r--   3 jenkins supergroup  1 2020-09-01 03:05 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/_orc_acid_version
-rw-r--r--   3 jenkins supergroup   6513 2020-09-01 03:05 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_0_1
-rw-r--r--   3 jenkins supergroup   6600 2020-09-01 03:05 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_1_0
-rw-r--r--   3 jenkins supergroup   6671 2020-09-01 03:05 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_2_0
-rw-r--r--   3 jenkins supergroup   6671 2020-09-01 03:05 
/test-warehouse/managed/complextypestbl_medium_orc_def/base_001/bucket_2_1{noformat}
It looks like there is an extra file (bucket_2_0 and bucket_2_1 have 
the same size). This table is written by Hive during dataload.

From the symptoms that I know about, this seems to only happen on ORC (but, of
course, the file list would have the other formats if we have ever seen it
elsewhere).

> test_scanner_position seems flaky
> -
>
> Key: IMPALA-9992
> URL: https://issues.apache.org/jira/browse/IMPALA-9992
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Reporter: Fang-Yu Rao
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: broken-build, flaky
>
> [test_scanner_position|https://github.com/apache/impala/blob/master/tests/query_test/test_nested_types.py#L72-L76]
>  failed in a recent build when executing the following query at 
> [https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/nested-types-scanner-position.test#L646-L666]
> {code:java}
> select pos, item, count(*)
> from complextypestbl_medium.int_array
> group by 1, 2
> {code}
> The error message is as follows.
> {code:java}
> ERROR:test_configuration:Comparing QueryTestResults (expected vs actual):
> 0,-1,7300 != 0,-1,9856
> 0,1,7300 != 0,1,9524
> 0,NULL,7300 != 0,NULL,9700
> 1,1,7300 != 1,1,9700
> 1,2,7300 != 1,2,9524
> 2,2,7300 != 2,2,9700
> 2,3,7300 != 2,3,9524
> 3,NULL,7300 != 3,NULL,9700
> 4,3,7300 != 4,3,9700
> 5,NULL,7300 != 5,NULL,9700
> {code}
> Maybe [~tarmstrong], [~bikram], and [~csringhofer] could offer some insight
> into the issue since you were working on/reviewing the corresponding patch.
> Assigning the JIRA to [~tarmstrong] for now, but please feel free to reassign
> it to others as you find appropriate. Thanks!






[jira] [Assigned] (IMPALA-10135) Insert events doesn't contain the inserted data files

2020-09-02 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned IMPALA-10135:


Assignee: Vihang Karajgaonkar




[jira] [Commented] (IMPALA-10135) Insert events doesn't contain the inserted data files

2020-09-02 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189644#comment-17189644
 ] 

Vihang Karajgaonkar commented on IMPALA-10135:
--

Thanks for reporting this issue [~boroknagyz]. I will take this up. The event
type which is being used for generating the INSERT_EVENT defines the files as an
optional field IIRC, so in theory it is okay to skip adding the files for this
particular event type. Once we switch this to use the transactional insert event
type (IMPALA-9664) for transactional tables, I think we will need to provide the
files in both cases (overwrite vs. no overwrite).




[jira] [Created] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10139:
-

 Summary: Slow RPC logs can be misleading
 Key: IMPALA-10139
 URL: https://issues.apache.org/jira/browse/IMPALA-10139
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The slow RPC logs added in IMPALA-9128 are based on the total time taken to
successfully complete an RPC. The issue is that there are many reasons why an
RPC might take a long time to complete. An RPC is considered complete only when
the receiver has processed that RPC.

The problem is that, due to the client-driven back-pressure mechanism, it is
entirely possible that the receiver does not process an RPC because
{{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been called yet
(indirectly called by {{ExchangeNode::GetNext}}).

This can lead to a flood of slow RPC logs, even though the RPCs might not
actually be slow themselves. What is worse is that, because of the
back-pressure mechanism, slowness from the client (e.g. Hue users) will
propagate across all nodes involved in the query.





[jira] [Created] (IMPALA-10138) Add fragment instance id to RPC trace output

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10138:
-

 Summary: Add fragment instance id to RPC trace output
 Key: IMPALA-10138
 URL: https://issues.apache.org/jira/browse/IMPALA-10138
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The RPC traces added in IMPALA-9128 are hard to correlate to specific queries 
because the output does not include the fragment instance id. I'm not sure if 
this is actually possible in the current kRPC code, but it would be nice if the 
tracing output included the fragment instance id.



[jira] [Updated] (IMPALA-9954) RpcRecvrTime can be negative

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-9954:
-
Parent: IMPALA-10137
Issue Type: Sub-task  (was: Bug)

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}






[jira] [Assigned] (IMPALA-5473) Make diagnosing network issues easier

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-5473:


Assignee: (was: Michael Ho)

> Make diagnosing network issues easier
> -
>
> Key: IMPALA-5473
> URL: https://issues.apache.org/jira/browse/IMPALA-5473
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 2.10.0
>Reporter: Henry Robinson
>Priority: Major
>
> With our current metrics in the profile, it's hard to debug queries that get 
> slow throughput from their exchanges. 
> The following cases have different causes, but similar symptoms (e.g. a high 
> {{InactiveTimer}} in the xchg profile):
> 1. Downstream sender does not produce rows quickly (perhaps because *its* 
> child instances do not produce rows quickly).
> 2. Downstream sender cannot _send_ rows quickly, perhaps because of network 
> congestion.
> 3. Downstream sender does not start producing rows until some time after the 
> upstream has started (captured by {{FirstBatchArrivalWaitTime}}).
> 4. Downstream sender does not close stream until some time after all rows are 
> sent.
> We should try to improve these metrics so that all the information about who 
> is slow, and why, is available clearly in the runtime profile. Distinguishing 
> cases 1 and 2 is particularly important.






[jira] [Updated] (IMPALA-10049) Include RPC call_id in slow RPC logs

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10049:
--
Parent: IMPALA-10137
Issue Type: Sub-task  (was: Improvement)

> Include RPC call_id in slow RPC logs
> 
>
> Key: IMPALA-10049
> URL: https://issues.apache.org/jira/browse/IMPALA-10049
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
>
> The current code for logging slow RPCs on the sender side looks something 
> like this:
> {code:java}
> template <typename ResponsePBType>
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>     const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Receiver time: "
>             << PrettyPrinter::Print(resp_.receiver_latency_ns(), TUnit::TIME_NS)
>             << " Network time: " << PrettyPrinter::Print(network_time_ns, TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>     const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Error: " << err.ToString();
> } {code}
> It would be nice to include the call_id in the logs as well so that RPCs can 
> more easily be traced. The RPC call_id is dumped in RPC traces on the 
> receiver side, as well as in the /rpcz output on the debug ui.






[jira] [Updated] (IMPALA-10135) Insert events doesn't contain the inserted data files

2020-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10135:
---
Description: 
When Impala generates INSERT EVENTs it doesn't add the newly inserted data files.

The problem is that Impala misuses Sets.difference(set1, set2). From the API 
doc at 
[https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]

"The returned set contains all elements that are contained by {{set1}} and not 
contained by {{set2}}. {{set2}} may also contain elements not present in 
{{set1}}; these are simply ignored."

So the name "difference" is a bit misleading; it is really a subtraction of
set2 from set1.

Unfortunately Impala passes the parameters in the wrong order:
Sets.difference(beforeInsert, afterInsert):

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

So the result will always be empty.

There's another problem with INSERT OVERWRITEs: they don't send any INSERT
events.

  was:
When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles.

The problem is that Impala misuses Sets.difference(set1, set2). From the API 
doc at 
[https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]

"The returned set contains all elements that are contained by {{set1}} and not 
contained by {{set2}}. {{set2}} may also contain elements not present in 
{{set1}}; these are simply ignored."

So the name "difference" is a bit misleading, it's rather a subtraction between 
set1 and set2.

Unfortunately Impala passes the parameters in wrong order: 
Sets.difference(beforeInsert, afterInsert):

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

So the result will be always empty.

There's another problem with INSERT OVERWRITEs, in that case we never fill the 
data files of the insert event.


> Insert events doesn't contain the inserted data files
> -
>
> Key: IMPALA-10135
> URL: https://issues.apache.org/jira/browse/IMPALA-10135
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> When Impala generates INSERT EVENTs it doesn't add the newly inserted 
> datafiles.
> The problem is that Impala misuses Sets.difference(set1, set2). From the API 
> doc at 
> [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]
> "The returned set contains all elements that are contained by {{set1}} and 
> not contained by {{set2}}. {{set2}} may also contain elements not present in 
> {{set1}}; these are simply ignored."
> So the name "difference" is a bit misleading; it is really a subtraction 
> of set2 from set1.
> Unfortunately Impala passes the parameters in the wrong order: 
> Sets.difference(beforeInsert, afterInsert):
> [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]
> So the result will always be empty.
> There's another problem with INSERT OVERWRITEs: they don't send any INSERT 
> events.
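
The argument-order problem described above can be illustrated with a small 
sketch. It is written in C++ rather than Java/Guava, and the names Minus and 
NewFiles are hypothetical helpers, not Impala or Guava APIs. Minus(a, b) 
mirrors the documented behaviour of Sets.difference(set1, set2): elements of 
the first set that are not in the second.

```cpp
#include <set>
#include <string>

// Returns the elements of `a` that are not in `b` (a minus b). As with Guava's
// Sets.difference(set1, set2), the result depends on the argument order.
std::set<std::string> Minus(const std::set<std::string>& a,
                            const std::set<std::string>& b) {
  std::set<std::string> result;
  for (const std::string& elem : a) {
    if (b.count(elem) == 0) result.insert(elem);
  }
  return result;
}

// Hypothetical helper: the files that an INSERT added. "before" is subtracted
// from "after"; the reverse order yields an empty set whenever the INSERT only
// added files.
std::set<std::string> NewFiles(const std::set<std::string>& before_insert,
                               const std::set<std::string>& after_insert) {
  return Minus(after_insert, before_insert);
}
```

With before = {a.parq, b.parq} and after = {a.parq, b.parq, c.parq}, 
Minus(before, after) is empty while NewFiles(before, after) contains only 
c.parq. The file names are made up for the example.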






[jira] [Commented] (IMPALA-6705) TotalNetworkSendTime in query profile is misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189580#comment-17189580
 ] 

Sahil Takiar commented on IMPALA-6705:
--

I got bitten by this recently, and it was very confusing. Linking to 
IMPALA-10137. +1 on fixing this.

> TotalNetworkSendTime in query profile is misleading
> ---
>
> Key: IMPALA-6705
> URL: https://issues.apache.org/jira/browse/IMPALA-6705
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Major
>  Labels: observability
>
> {{TotalNetworkSendTime}} actually measures the time that a fragment 
> instance execution thread spent waiting for the previous RPC to complete. 
> This is a combination of:
>  - network time of sending the RPC payload to the destination
>  - processing and queuing time in the destination
>  - network time of sending the RPC response to the originating node
> The name of this metric is misleading because it gives the impression 
> that it is the time spent sending the RPC payload to the destination, so a 
> query profile with a high {{TotalNetworkSendTime}} may easily mislead a user 
> into concluding that there is something wrong with the network. In reality, 
> the receiving end could be overloaded and taking a long time to respond to 
> an RPC.
> For this metric to be useful, we need a breakdown of the 3 components 
> above.
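
A minimal sketch of such a breakdown, assuming the receiver reports its own 
processing and queuing time in the RPC response (as the slow-RPC logging code 
quoted for IMPALA-10049 does with receiver_latency_ns). The struct and 
function names are hypothetical. Note that this only separates receiver time 
from total network time; splitting the request leg from the response leg 
would need additional timestamps.

```cpp
#include <cstdint>

// Hypothetical breakdown of the time a sender spent waiting on one RPC.
struct RpcTimeBreakdown {
  int64_t receiver_time_ns;  // processing + queuing, as reported by the receiver
  int64_t network_time_ns;   // the remainder: request transfer + response transfer
};

// Splits the total wait time measured on the sender using the latency that the
// receiver reported. Both inputs are durations, so no clock sync is assumed.
RpcTimeBreakdown BreakDownRpcTime(int64_t total_time_ns, int64_t receiver_latency_ns) {
  RpcTimeBreakdown breakdown;
  breakdown.receiver_time_ns = receiver_latency_ns;
  breakdown.network_time_ns = total_time_ns - receiver_latency_ns;
  return breakdown;
}
```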






[jira] [Comment Edited] (IMPALA-10122) Allow view authorization to be deferred until selection time

2020-09-02 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188692#comment-17188692
 ] 

Fang-Yu Rao edited comment on IMPALA-10122 at 9/2/20, 5:42 PM:
---

Hi [~vihangk1], [~stigahuang], and [~csringhofer], please take a look at the 
description of the JIRA and let me know if you have any additional idea or 
comment. Thanks!

Tagged [~jcamachorodriguez], [~jfs], [~hemanth619], and [~thejas] as well so 
that you know the issue is also tracked on the Impala side.



was (Author: fangyurao):
Hi [~vihangk1], [~stigahuang], and [~csringhofer], please take a look at the 
description of the JIRA and let me know if you have any additional idea or 
comment. Thanks!

Tagged [~jcamachorodriguez], [~jfs], and [~thejas] as well so that you know the 
issue is also tracked on the Impala side.


> Allow view authorization to be deferred until selection time
> 
>
> Key: IMPALA-10122
> URL: https://issues.apache.org/jira/browse/IMPALA-10122
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala performs authorization with Ranger when a view is created, 
> checking whether the requesting user has the {{SELECT}} privilege on the 
> underlying tables. It therefore does not check again whether the requesting 
> user has the {{SELECT}} privilege on the underlying tables when the view is 
> selected.
> On the other hand, a Spark user is currently not allowed to create a view 
> directly in HMS without involving the Impala frontend, because Spark clients 
> are normal users (vs. superusers). To relax this restriction, it would be 
> good to allow a Spark user to create a view directly in HMS without involving 
> the Impala frontend. However, the authorization check is skipped for views 
> created in this manner, since HMS currently cannot perform the authorization. 
> Because of this relaxation, the authorization of a view created this way 
> needs to be carried out at selection time to make sure the requesting user is 
> indeed granted the {{SELECT}} privilege on the underlying tables referenced 
> by the view.
> There is also a corresponding Hive JIRA, HIVE-24026. Refer to it for 
> further details.
>  






[jira] [Updated] (IMPALA-10137) Network Debugging / Supportability Improvements

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10137:
--
Labels: observability  (was: )

> Network Debugging / Supportability Improvements
> ---
>
> Key: IMPALA-10137
> URL: https://issues.apache.org/jira/browse/IMPALA-10137
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: observability
>
> There are various improvements Impala should make to improve debugging of 
> network issues (e.g. slow RPCs, TCP retransmissions, etc.).






[jira] [Assigned] (IMPALA-10049) Include RPC call_id in slow RPC logs

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-10049:
-

Assignee: Sahil Takiar

> Include RPC call_id in slow RPC logs
> 
>
> Key: IMPALA-10049
> URL: https://issues.apache.org/jira/browse/IMPALA-10049
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The current code for logging slow RPCs on the sender side looks something 
> like this:
> {code:java}
> template <typename ResponsePBType>
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>     const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Receiver time: "
>             << PrettyPrinter::Print(resp_.receiver_latency_ns(), TUnit::TIME_NS)
>             << " Network time: " << PrettyPrinter::Print(network_time_ns, TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>     const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Error: " << err.ToString();
> }
> {code}
> It would be nice to include the call_id in the logs as well so that RPCs can 
> more easily be traced. The RPC call_id is dumped in RPC traces on the 
> receiver side, as well as in the /rpcz output on the debug ui.
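
As a rough illustration of the request above, the sketch below builds a 
slow-RPC message that carries a call_id next to the fields already logged. 
FormatSlowRpcMessage is a hypothetical standalone helper, not Impala code; 
how the call_id is obtained from the RPC layer is outside the sketch, and the 
raw nanosecond output stands in for PrettyPrinter.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: formats a slow-RPC log line that includes the call_id so
// the line can be matched against receiver-side RPC traces and /rpcz output.
std::string FormatSlowRpcMessage(const std::string& rpc_name,
                                 const std::string& address,
                                 int64_t call_id, int64_t total_time_ns) {
  std::ostringstream ss;
  ss << "Slow " << rpc_name << " RPC to " << address
     << " (call_id=" << call_id << "): "
     << "took " << total_time_ns << "ns";
  return ss.str();
}
```

For example, FormatSlowRpcMessage("TransmitData", "host-1:27000", 42, 1500) 
yields "Slow TransmitData RPC to host-1:27000 (call_id=42): took 1500ns". 
The argument values are made up for the example.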






[jira] [Assigned] (IMPALA-10049) Include RPC call_id in slow RPC logs

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-10049:
-

Assignee: (was: Sahil Takiar)

> Include RPC call_id in slow RPC logs
> 
>
> Key: IMPALA-10049
> URL: https://issues.apache.org/jira/browse/IMPALA-10049
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The current code for logging slow RPCs on the sender side looks something 
> like this:
> {code:java}
> template <typename ResponsePBType>
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>     const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Receiver time: "
>             << PrettyPrinter::Print(resp_.receiver_latency_ns(), TUnit::TIME_NS)
>             << " Network time: " << PrettyPrinter::Print(network_time_ns, TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>     const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>             << "Error: " << err.ToString();
> }
> {code}
> It would be nice to include the call_id in the logs as well so that RPCs can 
> more easily be traced. The RPC call_id is dumped in RPC traces on the 
> receiver side, as well as in the /rpcz output on the debug ui.






[jira] [Resolved] (IMPALA-10094) TestResetMetadata.test_refresh_updated_partitions fails due to connection error

2020-09-02 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10094.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> TestResetMetadata.test_refresh_updated_partitions fails due to connection 
> error
> ---
>
> Key: IMPALA-10094
> URL: https://issues.apache.org/jira/browse/IMPALA-10094
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Norbert Luksa
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.0
>
>
> This has occurred in the last few builds in 
> impala-cdpd-master-staging-core-s3: 
> [https://master-02.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/14/]
> Error message:
> {code:java}
> metadata/test_reset_metadata.py:49: in test_refresh_updated_partitions
> "alter table {0} add partition (year=2020, month=8)".format(tbl))
> common/impala_test_suite.py:983: in run_stmt_in_hive
> raise RuntimeError(stderr)
> E   RuntimeError: SLF4J: Class path contains multiple SLF4J bindings.
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> E   SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> E   ERROR StatusLogger No log4j2 configuration file found. Using default 
> configuration: logging only errors to the console. Set system property 
> 'log4j2.debug' to show Log4j2 internal initialization logging.
> E   SLF4J: Class path contains multiple SLF4J bindings.
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> E   SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> E   Connecting to jdbc:hive2://localhost:11050
> E   20/08/18 05:10:24 [main]: WARN jdbc.HiveConnection: Failed to connect to 
> localhost:11050
> E   Could not open connection to the HS2 server. Please check the server URI 
> and if the URI is correct, then ask the administrator to check the server 
> status.
> E   Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://localhost:11050: java.net.ConnectException: Connection refused 
> (Connection refused) (state=08S01,code=0){code}
>  






[jira] [Created] (IMPALA-10137) Network Debugging / Supportability Improvements

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10137:
-

 Summary: Network Debugging / Supportability Improvements
 Key: IMPALA-10137
 URL: https://issues.apache.org/jira/browse/IMPALA-10137
 Project: IMPALA
  Issue Type: Epic
Reporter: Sahil Takiar


There are various improvements Impala should make to improve debugging of 
network issues (e.g. slow RPCs, TCP retransmissions, etc.).








[jira] [Comment Edited] (IMPALA-10124) admission-controller-test fails with no such file or directory error

2020-09-02 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189380#comment-17189380
 ] 

Qifan Chen edited comment on IMPALA-10124 at 9/2/20, 5:01 PM:
--

Thanks a lot for reporting the issue and providing the detailed info. 

 

Here are frames from the stack trace. 

 
{code:java}
#0 0x0249fc6e in std::vector, 
std::allocator > >::operator[] (this=0x17c84e48, 
__n=2)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_vector.h:816
 #1 0x024a5118 in std::_detail::_Executor, 
std::allocator > >, 
std::allocator, std::allocator > > 
> >, std::_cxx11::regex_traits, true>::_M_handle_repeat (
 this=0x7ffc5aa85d28, 
 _match_mode=std::detail::_Executor >, 
std::allocator > > >, std::_cxx11::regex_traits, 
true>::_Match_mode::_Exact, __i=2)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:207
 #2 0x024a4f42 in std::_detail::_Executor, 
std::allocator > >, 
std::allocator, std::allocator > > 
> >, std::_cxx11::regex_traits, true>::_M_dfs (
 this=0x7ffc5aa85d28, 
 _match_mode=std::detail::_Executor >, 
std::allocator > > >, std::_cxx11::regex_traits, 
true>::_Match_mode::_Exact, __i=2)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:466
 #3 0x024a62c3 in std::_detail::_Executor, 
std::allocator > >, 
std::allocator, std::allocator > > 
> >, std::_cxx11::regex_traits, true>::_M_handle_match (
 this=0x7ffc5aa85d28, 
 _match_mode=std::detail::_Executor >, 
std::allocator > > >, std::_cxx11::regex_traits, 
true>::_Match_mode::_Exact, __i=1)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:329
... ... ... ...
#5956 0x024260e6 in std::regex_match, 
std::allocator, char, std::_cxx11::regex_traits > (_s=..., 
__re=..., __flags=(unknown: 0)) at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex.h:2169
 #5957 0x0241e03f in 
impala::AdmissionControllerTest_TopNQueryCheck_Test::TestBody (this=0x15ac9480) 
at /home/qchen/Impala/be/src/scheduling/admission-controller-test.cc:1075
 #5958 0x093f6bfa in void 
testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::)(), char const) ()
 #5959 0x093f002a in testing::Test::Run() ()
 #5960 0x093f010c in testing::TestInfo::Run() ()
 #5961 0x093f0245 in testing::TestCase::Run() ()
 #5962 0x093f08f0 in testing::internal::UnitTestImpl::RunAllTests() ()
 #5963 0x093f0a27 in testing::UnitTest::Run() ()
 #5964 0x01d7de0b in main (argc=2, argv=0x7ffc5aa872b8) at 
/home/qchen/Impala/be/src/service/unified-betest-main.cc:48
{code}
 

 

The corresponding code is as follows.

 
{code:java}
1064 string mem_details_for_host0 =
1065     admission_controller->GetLogStringForTopNQueriesOnHost(HOST_0);
1066 // Verify that the 5 top ones appear in the following order.
1067 std::regex pattern_pools_for_host0(".*"+
1068     QUEUE_B+".*"+"id=0001:0002, consumed=10.00 MB"+".*"+
1069     QUEUE_A+".*"+"id=:, consumed=10.00 MB"+".*"+
1070     QUEUE_D+".*"+"id=0003:0011, consumed=9.00 MB"+".*"+
1071     "id=0003:000a, consumed=9.00 MB"+".*"+
1072     "id=0003:0007, consumed=9.00 MB"+".*"
1073     ,std::regex::basic
1074 );
1075 ASSERT_TRUE(std::regex_match(mem_details_for_host0, pattern_pools_for_host0));
1076{code}
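
The trace above runs to frame #5964, almost all of it inside the libstdc++ 
regex executor, which suggests (as an interpretation, not something stated in 
the comment) that the recursion depth of std::regex_match grows with the 
input. One possible way to check "these substrings appear in this order" 
without a regex is sketched below. ContainsInOrder is a hypothetical helper 
and not necessarily the fix that was applied to the test.

```cpp
#include <string>
#include <vector>

// Returns true if every string in `parts` occurs in `text`, in the given order
// and without overlapping. The loop is iterative, so stack usage does not grow
// with the length of `text`.
bool ContainsInOrder(const std::string& text, const std::vector<std::string>& parts) {
  std::string::size_type pos = 0;
  for (const std::string& part : parts) {
    std::string::size_type found = text.find(part, pos);
    if (found == std::string::npos) return false;
    pos = found + part.size();  // continue searching after this match
  }
  return true;
}
```

Taking the leftmost match for each part is sufficient here: if any ordered, 
non-overlapping placement of the parts exists, the leftmost one does too. 
This is not an exact substitute for every ".*"-style pattern (for instance, 
it ignores how "." treats newlines), so it is only a sketch of the idea.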



[jira] [Commented] (IMPALA-10124) admission-controller-test fails with no such file or directory error

2020-09-02 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189380#comment-17189380
 ] 

Qifan Chen commented on IMPALA-10124:
-

Thanks a lot for reporting the issue and providing the detailed info. 

 

Here are frames from the stack trace. 

#0 0x0249fc6e in std::vector, 
std::allocator > >::operator[] (this=0x17c84e48, 
__n=2)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_vector.h:816
#1 0x024a5118 in 
std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > 
>, std::allocator, 
std::allocator > > > >, std::__cxx11::regex_traits, 
true>::_M_handle_repeat (
 this=0x7ffc5aa85d28, 
 __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, 
std::allocator > > >, 
std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=2)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:207
#2 0x024a4f42 in 
std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > 
>, std::allocator, 
std::allocator > > > >, std::__cxx11::regex_traits, true>::_M_dfs (
 this=0x7ffc5aa85d28, 
 __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, 
std::allocator > > >, 
std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=2)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:466
#3 0x024a62c3 in 
std::__detail::_Executor<__gnu_cxx::__normal_iterator, std::allocator > 
>, std::allocator, 
std::allocator > > > >, std::__cxx11::regex_traits, 
true>::_M_handle_match (
 this=0x7ffc5aa85d28, 
 __match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator >, 
std::allocator > > >, 
std::__cxx11::regex_traits, true>::_Match_mode::_Exact, __i=1)
 at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex_executor.tcc:329

... ... ... ...

#5956 0x024260e6 in std::regex_match, 
std::allocator, char, std::__cxx11::regex_traits > (__s=..., 
__re=..., __flags=(unknown: 0)) at 
/home/qchen/Impala/toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/regex.h:2169
#5957 0x0241e03f in 
impala::AdmissionControllerTest_TopNQueryCheck_Test::TestBody (this=0x15ac9480) 
at /home/qchen/Impala/be/src/scheduling/admission-controller-test.cc:1075
#5958 0x093f6bfa in void 
testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) ()
#5959 0x093f002a in testing::Test::Run() ()
#5960 0x093f010c in testing::TestInfo::Run() ()
#5961 0x093f0245 in testing::TestCase::Run() ()
#5962 0x093f08f0 in testing::internal::UnitTestImpl::RunAllTests() ()
#5963 0x093f0a27 in testing::UnitTest::Run() ()
#5964 0x01d7de0b in main (argc=2, argv=0x7ffc5aa872b8) at 
/home/qchen/Impala/be/src/service/unified-betest-main.cc:48

> admission-controller-test fails with no such file or directory error
> 
>
> Key: IMPALA-10124
> URL: https://issues.apache.org/jira/browse/IMPALA-10124
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Qifan Chen
>Priority: Major
>
> In master-core-ubsan, the admission-controller-test fails :
> 03:12:04 
> /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/be/build/debug//scheduling/admission-controller-test:
>  line 10: 29380 Segmentation fault  (core dumped) 
> ${IMPALA_HOME}/bin/run-jvm-binary.sh 
> ${IMPALA_HOME}/be/build/latest/service/unifiedbetests 
> --gtest_filter=${GTEST_FILTER} 
> --gtest_output=xml:${IMPALA_BE_TEST_LOGS_DIR}/${TEST_EXEC_NAME}.xml 
> -log_filename="${TEST_EXEC_NAME}" "$@"
> 03:12:04 Traceback (most recent call last):
> 03:12:04   File 
> "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 71, in 
> 03:12:04 if __name__ == "__main__": main()
> 03:12:04   File 
> "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 68, in main
> 03:12:04 junitxml_prune_notrun(options.filename)
> 03:12:04   File 
> "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 31, in junitxml_prune_notrun
> 03:12:04 root = tree.parse(junitxml_filename)
> 03:12:04   File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 647, in 
> parse
> 03:12:04 source = open(source, "rb")
> 03:12:04 IOError: [Errno 2] No 

[jira] [Commented] (IMPALA-10071) Impala shouldn't create filename starting with underscore during TRUNCATE

2020-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189315#comment-17189315
 ] 

ASF subversion and git services commented on IMPALA-10071:
--

Commit 502e1134be595ed5506424ee5f06dcf52f6fc646 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=502e113 ]

IMPALA-10071: Impala shouldn't create filename starting with underscore during 
ACID TRUNCATE

When Impala TRUNCATEs an ACID table, it creates a new base directory
with the hidden file "_empty" in it. Newer Hive versions ignore files
starting with underscore, therefore they ignore the whole base
directory.

To resolve this issue we can simply rename the empty file to "empty".

Testing:
 * update acid-truncate.test accordingly

Change-Id: Ia0557b9944624bc123c540752bbe3877312a7ac9
Reviewed-on: http://gerrit.cloudera.org:8080/16396
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


> Impala shouldn't create filename starting with underscore during TRUNCATE
> -
>
> Key: IMPALA-10071
> URL: https://issues.apache.org/jira/browse/IMPALA-10071
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> When Impala TRUNCATEs an ACID table, it creates a new base directory with the 
> hidden file "_empty" in it.
> Newer Hive versions ignore files starting with underscore, therefore they 
> ignore the whole base directory.
> To resolve this issue we can simply rename the empty file to "empty".
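
To make the naming rule concrete, here is a small sketch of a hidden-name 
check. It assumes the common Hadoop-style convention that names starting with 
an underscore or a dot are treated as hidden; the message above only mentions 
the underscore case, and IsHiddenFileName is a hypothetical helper, not Hive 
or Impala code.

```cpp
#include <string>

// Assumed convention: a file name is "hidden" if it starts with '_' or '.'.
// Readers that follow this convention skip such files.
bool IsHiddenFileName(const std::string& name) {
  return !name.empty() && (name[0] == '_' || name[0] == '.');
}
```

Under this rule "_empty" is skipped while "empty" is visible, which is why 
the rename is enough to keep the new base directory from being ignored.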






[jira] [Created] (IMPALA-10136) Cardinality estimates for aggregation operations don't consider conjuncts on grouping expressions correctly

2020-09-02 Thread Shant Hovsepian (Jira)
Shant Hovsepian created IMPALA-10136:


 Summary: Cardinality estimates for aggregation operations don't 
consider conjuncts on grouping expressions correctly
 Key: IMPALA-10136
 URL: https://issues.apache.org/jira/browse/IMPALA-10136
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Shant Hovsepian
Assignee: Shant Hovsepian


ComputeStats() in the PlanNode calls estimateNumGroups() for the 
AggregationNode to calculate the cardinality of a grouping expression. Then, in 
a later step, applyConjunctsSelectivity() is called to adjust the cardinality 
based on the available conjuncts. However, with aggregation operations, certain 
conjuncts (i.e. those from the HAVING clause, or conjuncts on the grouping 
expressions) affect the number of groups produced.

ndv(day) = 11 

count(alltypesagg) = 10280
{code:java}
Query: explain select day, count(*) from alltypesagg where day=2 group by 1
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.06MB Threads=4 |
| Per-Host Resource Estimates: Memory=52MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 04:EXCHANGE [UNPARTITIONED]                                |
| |                                                          |
| 03:AGGREGATE [FINALIZE]                                    |
| |  output: count:merge(*)                                  |
| |  group by: `day`                                         |
| |  row-size=12B cardinality=11                             |
| |                                                          |
| 02:EXCHANGE [HASH(`day`)]                                  |
| |                                                          |
| 01:AGGREGATE [STREAMING]                                   |
| |  output: count(*)                                        |
| |  group by: `day`                                         |
| |  row-size=12B cardinality=11                             |
| |                                                          |
| 00:SCAN HDFS [functional.alltypesagg]                      |
|    partition predicates: `day` = 2                         |
|    HDFS partitions=1/11 files=1 size=74.48KB               |
|    row-size=4B cardinality=1.00K                           |
+------------------------------------------------------------+
Fetched 24 row(s) in 0.02s
{code}
 

Given that the predicate day=2 applies to the grouping expression, the 
cardinality of the aggregation node should be 1 as opposed to 11.
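
To make the expected estimate concrete, the following is a minimal sketch in Python, not the planner's actual code. It assumes that a grouping expression pinned to a single constant by an equality conjunct contributes exactly one distinct value; the function name and its arguments are hypothetical.

```python
def estimate_num_groups(ndvs, equality_bound):
    """Multiply the NDVs of the grouping expressions, counting any
    expression pinned to a constant by an equality conjunct as 1."""
    num_groups = 1
    for expr, ndv in ndvs.items():
        num_groups *= 1 if expr in equality_bound else ndv
    return num_groups


ndvs = {"day": 11}                           # ndv(day) = 11 from the report
print(estimate_num_groups(ndvs, set()))      # conjunct ignored
print(estimate_num_groups(ndvs, {"day"}))    # equality conjunct on day considered
```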



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10135) Insert events doesn't contain the inserted data files

2020-09-02 Thread Jira
Zoltán Borók-Nagy created IMPALA-10135:
--

 Summary: Insert events doesn't contain the inserted data files
 Key: IMPALA-10135
 URL: https://issues.apache.org/jira/browse/IMPALA-10135
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles.

The problem is that Impala misuses Sets.difference(set1, set2). From the API 
doc at 
[https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]

"The returned set contains all elements that are contained by {{set1}} and not 
contained by {{set2}}. {{set2}} may also contain elements not present in 
{{set1}}; these are simply ignored."

So the name "difference" is a bit misleading: it is an asymmetric subtraction 
of set2 from set1, not a symmetric difference.

Unfortunately, Impala passes the parameters in the wrong order: 
Sets.difference(beforeInsert, afterInsert):

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

So the result will always be empty.
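
The effect of the argument order can be reproduced with plain Python sets, used here only as a stand-in for Guava's Sets.difference (Python's `a - b` likewise keeps the elements of `a` that are not in `b`). The file names are made up for illustration, and the sketch assumes a plain INSERT that only adds files.

```python
before_insert = {"part-0.parq", "part-1.parq"}                 # hypothetical files
after_insert = {"part-0.parq", "part-1.parq", "part-2.parq"}   # one file added

# Wrong order: everything that existed before still exists afterwards.
wrong = before_insert - after_insert
# Intended order: files present after the insert but not before it.
new_files = after_insert - before_insert

print(wrong)
print(new_files)
```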

There's another problem with INSERT OVERWRITEs: in that case we never fill in 
the data files of the insert event.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10135) Insert events doesn't contain the inserted data files

2020-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10135:
---
Description: 
When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles.

The problem is that Impala misuses Sets.difference(set1, set2). From the API 
doc at 
[https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]

"The returned set contains all elements that are contained by {{set1}} and not 
contained by {{set2}}. {{set2}} may also contain elements not present in 
{{set1}}; these are simply ignored."

So the name "difference" is a bit misleading: it is an asymmetric subtraction 
of set2 from set1, not a symmetric difference.

Unfortunately, Impala passes the parameters in the wrong order: 
Sets.difference(beforeInsert, afterInsert):

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

So the result will always be empty.

There's another problem with INSERT OVERWRITEs: in that case we never fill in 
the data files of the insert event.

  was:
When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles.

The problem is that Impala misuses Sets.difference(set1, set2). From the API 
doc at 
[https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]

"The returned set contains all elements that are contained by {{set1}} and not 
contained by {{set2}}. {{set2}} may also contain elements not present in 
{{set1}}; these are simply ignored."

So the name "difference" is a bit misleading, its rather a subtraction between 
set1 and set2.

Unfortunately Impala passes the parameters in wrong order: 
Sets.difference(beforeInsert, afterInsert):

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

So the result will be always empty.

There's another problem with INSERT OVERWRITEs, in that case we never fill the 
data files of the insert event.


> Insert events doesn't contain the inserted data files
> -
>
> Key: IMPALA-10135
> URL: https://issues.apache.org/jira/browse/IMPALA-10135
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> When Impala generates INSERT EVENTs it doesn't add the newly inserted 
> datafiles.
> The problem is that Impala misuses Sets.difference(set1, set2). From the API 
> doc at 
> [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]
> "The returned set contains all elements that are contained by {{set1}} and 
> not contained by {{set2}}. {{set2}} may also contain elements not present in 
> {{set1}}; these are simply ignored."
> So the name "difference" is a bit misleading, it's rather a subtraction 
> between set1 and set2.
> Unfortunately Impala passes the parameters in wrong order: 
> Sets.difference(beforeInsert, afterInsert):
> [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]
> So the result will be always empty.
> There's another problem with INSERT OVERWRITEs, in that case we never fill 
> the data files of the insert event.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10132) Implement ds_hll_estimate_bounds()

2020-09-02 Thread Adam Tamas (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189257#comment-17189257
 ] 

Adam Tamas commented on IMPALA-10132:
-

Impala cannot currently return complex types.

We can either wait until that is implemented or return the estimates as a 
string with the values separated by commas.
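
A minimal sketch of the string workaround, assuming the values are simply joined with commas; the helper names are hypothetical and the numbers are taken from the Hive example in the issue description.

```python
def bounds_to_string(bounds):
    """Serialize a sequence of doubles as a comma-separated string."""
    return ",".join(repr(float(b)) for b in bounds)


def string_to_bounds(text):
    """Parse the comma-separated string back into a list of floats."""
    return [float(part) for part in text.split(",")]


encoded = bounds_to_string([1.0, 1.0, 1.998634873453])
print(encoded)
print(string_to_bounds(encoded))
```

A client that needs the individual numbers would have to parse the string itself, which is the main cost of this option compared with a real array type.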

> Implement ds_hll_estimate_bounds()
> --
>
> Key: IMPALA-10132
> URL: https://issues.apache.org/jira/browse/IMPALA-10132
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Adam Tamas
>Priority: Major
>
> In hive ds_hll_estimate_bounds() gives back an array of doubles.
> An example for a sketch created from a table which contains only a single 
> value:
> {code:java}
> (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
> +---+
> |  _c0  |
> +---+
> | [1.0,1.0,1.998634873453]  |
> +---+
> {code}
> The values of the array is probably a lower bound, an estimate and an upper 
> bound of the sketch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10132) Implement ds_hll_estimate_bounds()

2020-09-02 Thread Adam Tamas (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Tamas updated IMPALA-10132:

Description: 
In Hive, ds_hll_estimate_bounds() returns an array of doubles.

An example for a sketch created from a table that contains only a single value:

{code:java}
(select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
+---+
|  _c0  |
+---+
| [1.0,1.0,1.998634873453]  |
+---+
{code}

The values of the array are probably a lower bound, an estimate and an upper 
bound of the sketch.

  was:
In hive ds_hll_estimate_bounds() gives back an array of doubles.

An example for a sketch created from a table which contains only a single value:
(select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
+---+
|  _c0  |
+---+
| [1.0,1.0,1.998634873453]  |
+---+

The values of the array is probably a lower bound, an estimate and an upper 
bound of the sketch.


> Implement ds_hll_estimate_bounds()
> --
>
> Key: IMPALA-10132
> URL: https://issues.apache.org/jira/browse/IMPALA-10132
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Adam Tamas
>Priority: Major
>
> In hive ds_hll_estimate_bounds() gives back an array of doubles.
> An example for a sketch created from a table which contains only a single 
> value:
> {code:java}
> (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
> +---+
> |  _c0  |
> +---+
> | [1.0,1.0,1.998634873453]  |
> +---+
> {code}
> The values of the array is probably a lower bound, an estimate and an upper 
> bound of the sketch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10133) Implement ds_hll_stringify()

2020-09-02 Thread Adam Tamas (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Tamas updated IMPALA-10133:

Description: 
This function receives a string that is a serialized Apache DataSketches
HLL sketch and returns its stringified format.

The stringified format should look like the following and contain this data:

{code:java}
(select ds_hll_stringify(ds_hll_sketch(i)) from t;)
++
|_c0 |
++
| ### HLL SKETCH SUMMARY: 
  Log Config K   : 12
  Hll Target : HLL_4
  Current Mode   : LIST
  Memory : true
  LB : 1.0
  Estimate   : 1.0
  UB : 1.49929250618
  OutOfOrder Flag: false
  Coupon Count   : 1
 |
++
{code}


  was:
This function receives a string that is a serialized Apache DataSketches
HLL sketch and returns its stringified format.

A stringified format should look like and contains the following data:
(select ds_hll_stringify(ds_hll_sketch(i)) from t;)
++
|_c0 |
++
| ### HLL SKETCH SUMMARY: 
  Log Config K   : 12
  Hll Target : HLL_4
  Current Mode   : LIST
  Memory : true
  LB : 1.0
  Estimate   : 1.0
  UB : 1.49929250618
  OutOfOrder Flag: false
  Coupon Count   : 1
 |
++


> Implement ds_hll_stringify()
> 
>
> Key: IMPALA-10133
> URL: https://issues.apache.org/jira/browse/IMPALA-10133
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Adam Tamas
>Priority: Major
>
> This function receives a string that is a serialized Apache DataSketches
> HLL sketch and returns its stringified format.
> A stringified format should look like and contains the following data:
> {code:java}
> (select ds_hll_stringify(ds_hll_sketch(i)) from t;)
> ++
> |_c0 |
> ++
> | ### HLL SKETCH SUMMARY: 
>   Log Config K   : 12
>   Hll Target : HLL_4
>   Current Mode   : LIST
>   Memory : true
>   LB : 1.0
>   Estimate   : 1.0
>   UB : 1.49929250618
>   OutOfOrder Flag: false
>   Coupon Count   : 1
>  |
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-10133) Implement ds_hll_stringify()

2020-09-02 Thread Adam Tamas (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10133 started by Adam Tamas.
---
> Implement ds_hll_stringify()
> 
>
> Key: IMPALA-10133
> URL: https://issues.apache.org/jira/browse/IMPALA-10133
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Adam Tamas
>Assignee: Adam Tamas
>Priority: Major
>
> This function receives a string that is a serialized Apache DataSketches
> HLL sketch and returns its stringified format.
> A stringified format should look like and contains the following data:
> {code:java}
> (select ds_hll_stringify(ds_hll_sketch(i)) from t;)
> ++
> |_c0 |
> ++
> | ### HLL SKETCH SUMMARY: 
>   Log Config K   : 12
>   Hll Target : HLL_4
>   Current Mode   : LIST
>   Memory : true
>   LB : 1.0
>   Estimate   : 1.0
>   UB : 1.49929250618
>   OutOfOrder Flag: false
>   Coupon Count   : 1
>  |
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10134) Implement ds_hll_union_f()

2020-09-02 Thread Adam Tamas (Jira)
Adam Tamas created IMPALA-10134:
---

 Summary: Implement ds_hll_union_f()
 Key: IMPALA-10134
 URL: https://issues.apache.org/jira/browse/IMPALA-10134
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Adam Tamas


Implement ds_hll_union_f() and make sure it behaves the same way as in Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work stopped] (IMPALA-10107) Implement HLL functions to have full compatibility with Hive

2020-09-02 Thread Adam Tamas (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10107 stopped by Adam Tamas.
---
> Implement HLL functions to have full compatibility with Hive
> 
>
> Key: IMPALA-10107
> URL: https://issues.apache.org/jira/browse/IMPALA-10107
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Gabor Kaszab
>Assignee: Adam Tamas
>Priority: Minor
>
> ds_hll_estimate_bounds
> ds_hll_stringify
> ds_hll_union_f
> For parameters and expected behaviour check Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10133) Implement ds_hll_stringify()

2020-09-02 Thread Adam Tamas (Jira)
Adam Tamas created IMPALA-10133:
---

 Summary: Implement ds_hll_stringify()
 Key: IMPALA-10133
 URL: https://issues.apache.org/jira/browse/IMPALA-10133
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Adam Tamas


This function receives a string that is a serialized Apache DataSketches
HLL sketch and returns its stringified format.

The stringified format should look like the following and contain this data:
(select ds_hll_stringify(ds_hll_sketch(i)) from t;)
++
|_c0 |
++
| ### HLL SKETCH SUMMARY: 
  Log Config K   : 12
  Hll Target : HLL_4
  Current Mode   : LIST
  Memory : true
  LB : 1.0
  Estimate   : 1.0
  UB : 1.49929250618
  OutOfOrder Flag: false
  Coupon Count   : 1
 |
++



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10132) Implement ds_hll_estimate_bounds()

2020-09-02 Thread Adam Tamas (Jira)
Adam Tamas created IMPALA-10132:
---

 Summary: Implement ds_hll_estimate_bounds()
 Key: IMPALA-10132
 URL: https://issues.apache.org/jira/browse/IMPALA-10132
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Adam Tamas


In Hive, ds_hll_estimate_bounds() returns an array of doubles.

An example for a sketch created from a table that contains only a single value:
(select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;)
+---+
|  _c0  |
+---+
| [1.0,1.0,1.998634873453]  |
+---+

The values of the array are probably a lower bound, an estimate and an upper 
bound of the sketch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10131) Document ds_hll_* functions

2020-09-02 Thread Adam Tamas (Jira)
Adam Tamas created IMPALA-10131:
---

 Summary: Document ds_hll_* functions
 Key: IMPALA-10131
 URL: https://issues.apache.org/jira/browse/IMPALA-10131
 Project: IMPALA
  Issue Type: New Feature
  Components: Docs
Affects Versions: Impala 4.0
Reporter: Adam Tamas


Some new functions were added recently to support Apache DataSketches KLL 
calculations. The purpose of these functions is to give approximate quantile 
boundaries for a given dataset. KLL is an implementation of a very compact 
quantiles sketch with a lazy compaction scheme and nearly optimal accuracy 
per bit.

The newly introduced functions are:
 ds_kll_sketch()
ds_kll_quantile() 
ds_kll_quantiles_as_string()
ds_kll_n()
ds_kll_union()
ds_kll_rank()
ds_kll_pmf_as_string()
ds_kll_cdf_as_string()
ds_kll_stringify()
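
For orientation, the sketch below computes exact ranks and quantiles on a tiny list, which is the kind of answer that functions such as ds_kll_rank() and ds_kll_quantile() approximate on large data. It is plain Python, not the DataSketches algorithm, and the boundary conventions chosen here (rank as the fraction of values strictly below the probe, quantile by index into the sorted data) are assumptions that may differ from the library's.

```python
def exact_rank(values, probe):
    """Fraction of values strictly smaller than probe."""
    return sum(1 for v in values if v < probe) / len(values)


def exact_quantile(values, rank):
    """Value at the given normalized rank (0.0 to 1.0) in sorted order."""
    ordered = sorted(values)
    index = min(int(rank * len(ordered)), len(ordered) - 1)
    return ordered[index]


data = [5, 1, 4, 2, 3, 8, 7, 6]
print(exact_rank(data, 5))        # 4 of the 8 values are below 5
print(exact_quantile(data, 0.5))  # element at index 4 of the sorted list
```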

Related Jiras:
https://issues.apache.org/jira/browse/IMPALA-9959
https://issues.apache.org/jira/browse/IMPALA-9962
https://issues.apache.org/jira/browse/IMPALA-9963
https://issues.apache.org/jira/browse/IMPALA-10017
https://issues.apache.org/jira/browse/IMPALA-10018
https://issues.apache.org/jira/browse/IMPALA-10019
https://issues.apache.org/jira/browse/IMPALA-10020
https://issues.apache.org/jira/browse/IMPALA-10108

We should document these and mark them as experimental features so that users 
can try them out and hopefully give feedback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10131) Document ds_kll_* functions

2020-09-02 Thread Adam Tamas (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Tamas updated IMPALA-10131:

Summary: Document ds_kll_* functions  (was: Document ds_hll_* functions)

> Document ds_kll_* functions
> ---
>
> Key: IMPALA-10131
> URL: https://issues.apache.org/jira/browse/IMPALA-10131
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Docs
>Affects Versions: Impala 4.0
>Reporter: Adam Tamas
>Priority: Major
>  Labels: doc
>
> There were some new functions added recently to add support for Apache 
> DataSketches KLL calculations. These functions purpose is to give an 
> approximate boundaries for 
>  a given dataset. It is an implementation of a very compact quantiles sketch 
> with lazy compaction scheme and nearly optimal accuracy per bit.
> The newly introduced functions are:
>  ds_kll_sketch()
> ds_kll_quantile() 
> ds_kll_quantiles_as_string()
> ds_kll_n()
> ds_kll_union()
> ds_kll_rank()
> ds_kll_pmf_as_string()
> ds_kll_cdf_as_string()
> ds_kll_stringify()
> Related Jiras:
> https://issues.apache.org/jira/browse/IMPALA-9959
> https://issues.apache.org/jira/browse/IMPALA-9962
> https://issues.apache.org/jira/browse/IMPALA-9963
> https://issues.apache.org/jira/browse/IMPALA-10017
> https://issues.apache.org/jira/browse/IMPALA-10018
> https://issues.apache.org/jira/browse/IMPALA-10019
> https://issues.apache.org/jira/browse/IMPALA-10020
> https://issues.apache.org/jira/browse/IMPALA-10108
> We should document these and mark them as experimental features so that users 
> can try out and hopefully give feedback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10108) Implement ds_kll_stringify function

2020-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189179#comment-17189179
 ] 

ASF subversion and git services commented on IMPALA-10108:
--

Commit 4cb3c3556e77ee24003383155ca5e1b70be4db6e in impala's branch 
refs/heads/master from Adam Tamas
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4cb3c35 ]

IMPALA-10108: Implement ds_kll_stringify function

This function receives a string that is a serialized Apache DataSketches
KLL sketch and returns its stringified format.

The stringified format should look like the following and contain this data:

select ds_kll_stringify(ds_kll_sketch(float_col))
from functional_parquet.alltypestiny;
++
| ds_kll_stringify(ds_kll_sketch(float_col)) |
++
| ### KLL sketch summary:|
|K  : 200|
|min K  : 200|
|M  : 8  |
|N  : 8  |
|Epsilon: 1.33%  |
|Epsilon PMF: 1.65%  |
|Empty  : false  |
|Estimation mode: false  |
|Levels : 1  |
|Sorted : false  |
|Capacity items : 200|
|Retained items : 8  |
|Storage bytes  : 64 |
|Min value  : 0  |
|Max value  : 1.1|
| ### End sketch summary |
||
++

Change-Id: I97f654a4838bf91e3e0bed6a00d78b2c7aa96f75
Reviewed-on: http://gerrit.cloudera.org:8080/16370
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Implement ds_kll_stringify function
> ---
>
> Key: IMPALA-10108
> URL: https://issues.apache.org/jira/browse/IMPALA-10108
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Gabor Kaszab
>Assignee: Adam Tamas
>Priority: Major
>
> ds_kll_stringify() receives a string that is a serialized Apache DataSketches 
> sketch and returns its stringified format by invoking the related function on 
> the sketch's interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10090) Create aarch64 development environment on ubuntu 18.04

2020-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189178#comment-17189178
 ] 

ASF subversion and git services commented on IMPALA-10090:
--

Commit 0098113d9582c3b93044eed4078a7f0724dda26f in impala's branch 
refs/heads/master from zhaorenhai
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0098113 ]

IMPALA-10090: Create aarch64 development environment on ubuntu 18.04

This includes the following changes:
1. Build native-toolchain locally by script on the aarch64 platform.
2. Change the version numbers of some native-toolchain libs.
3. Split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONETS into two separate
   settings, because aarch64 only needs to download the CDP components,
   not the toolchain.
4. Download the Hadoop aarch64 native libs, which the Impala build needs.

With this commit, on the aarch64 version of Ubuntu 18.04,
you just need to run bin/bootstrap_development.sh, as on x86.

Change-Id: I769668c834ab0dd504a822ed9153186778275d59
Reviewed-on: http://gerrit.cloudera.org:8080/16065
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Create aarch64 development environment on ubuntu 18.04
> --
>
> Key: IMPALA-10090
> URL: https://issues.apache.org/jira/browse/IMPALA-10090
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: zhaorenhai
>Assignee: zhaorenhai
>Priority: Major
>
> Including following changes:
> 1 build native-toolchain local by script on aarch64 platform
> 2 change some native-toolchain's lib version number
> 3 split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONETS to two things, 
> because on aarch64, just need to download cdp components , but not need to 
> download toolchain.
> 4 download hadoop aarch64 nativelibs , impala building needs these libs.
> With this commit,  on ubuntu 18.04 aarch64 version, just need to run 
> bin/bootstrap_development.sh, just like x86.






[jira] [Commented] (IMPALA-10094) TestResetMetadata.test_refresh_updated_partitions fails due to connection error

2020-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189177#comment-17189177
 ] 

ASF subversion and git services commented on IMPALA-10094:
--

Commit 28b1542db0c6c91a9d7fc626ac50c2bfc84dabb2 in impala's branch 
refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=28b1542 ]

IMPALA-10094: Skip test_refresh_updated_partitions on S3

The test test_refresh_updated_partitions runs some commands using Hive, which
causes it to fail on S3-specific jobs since we don't run HiveServer2 in those
environments. This patch skips the test on non-HDFS environments.

Change-Id: I0d27dd76e772e396a07419a58821ba899ac74188
Reviewed-on: http://gerrit.cloudera.org:8080/16399
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
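
A minimal sketch of the skip described above, using only the Python standard library. It assumes the target filesystem is selected through an environment variable named TARGET_FILESYSTEM that defaults to "hdfs"; the helper `is_hdfs` and the test class are hypothetical stand-ins and are not the decorator used in the actual patch.

```python
import os
import unittest


def is_hdfs(env=None):
    """Return True when the (assumed) TARGET_FILESYSTEM variable selects HDFS."""
    env = os.environ if env is None else env
    return env.get("TARGET_FILESYSTEM", "hdfs") == "hdfs"


class TestResetMetadataSketch(unittest.TestCase):
    # The test drives Hive, so it is skipped wherever HiveServer2 is not
    # running, i.e. (per the commit message) on non-HDFS environments.
    @unittest.skipUnless(is_hdfs(), "requires HiveServer2, only run on HDFS")
    def test_refresh_updated_partitions(self):
        pass  # the Hive "alter table ... add partition" step would go here
```

Skipping at collection time keeps the S3 jobs green without attempting the JDBC connection that fails in the error message quoted below.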


> TestResetMetadata.test_refresh_updated_partitions fails due to connection 
> error
> ---
>
> Key: IMPALA-10094
> URL: https://issues.apache.org/jira/browse/IMPALA-10094
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Norbert Luksa
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> This has occurred in the last few builds in 
> impala-cdpd-master-staging-core-s3: 
> [https://master-02.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/14/]
> Error message:
> {code:java}
> metadata/test_reset_metadata.py:49: in test_refresh_updated_partitions
> "alter table {0} add partition (year=2020, month=8)".format(tbl))
> common/impala_test_suite.py:983: in run_stmt_in_hive
> raise RuntimeError(stderr)
> E   RuntimeError: SLF4J: Class path contains multiple SLF4J bindings.
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> E   SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> E   ERROR StatusLogger No log4j2 configuration file found. Using default 
> configuration: logging only errors to the console. Set system property 
> 'log4j2.debug' to show Log4j2 internal initialization logging.
> E   SLF4J: Class path contains multiple SLF4J bindings.
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/apache-hive-3.1.3000.7.2.2.0-135-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: Found binding in 
> [jar:file:/data0/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/cdp_components-5250295/hadoop-3.1.1.7.2.2.0-135/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> E   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> E   SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> E   Connecting to jdbc:hive2://localhost:11050
> E   20/08/18 05:10:24 [main]: WARN jdbc.HiveConnection: Failed to connect to 
> localhost:11050
> E   Could not open connection to the HS2 server. Please check the server URI 
> and if the URI is correct, then ask the administrator to check the server 
> status.
> E   Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://localhost:11050: java.net.ConnectException: Connection refused 
> (Connection refused) (state=08S01,code=0){code}
>  






[jira] [Resolved] (IMPALA-10115) Impala should check file schema as well to check full ACIDv2 files

2020-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10115.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Impala should check file schema as well to check full ACIDv2 files
> --
>
> Key: IMPALA-10115
> URL: https://issues.apache.org/jira/browse/IMPALA-10115
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 4.0
>
>
> Currently Impala checks the file metadata key 'hive.acid.version' to decide
> whether a file uses the full ACID schema.
> There are cases where Hive does not set this value for full ACID files, e.g.
> during major query-based compactions.
> So if 'hive.acid.version' is not present, Impala should still look at the
> schema elements to be sure about the file format.
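
The fallback described above can be sketched as follows. This is an illustration only, not Impala's implementation: the function `is_full_acid_file` and its arguments are hypothetical, and it assumes the usual Hive full ACID ORC layout, whose top-level fields are operation, originalTransaction, bucket, rowId, currentTransaction and row. Comparing names case-insensitively is also an assumption.

```python
# Top-level field names of the Hive full ACID ORC layout, in order.
ACID_FIELDS = (
    "operation",
    "originalTransaction",
    "bucket",
    "rowId",
    "currentTransaction",
    "row",
)


def is_full_acid_file(file_metadata, top_level_fields):
    """Return True if the file should be treated as a full ACID file.

    file_metadata: dict of user metadata keys stored in the file.
    top_level_fields: names of the file schema's top-level fields, in order.
    """
    # Existing behaviour: the metadata key alone decides.
    if "hive.acid.version" in file_metadata:
        return True
    # Fallback: the key is missing (e.g. after a major query-based
    # compaction), so inspect the schema elements instead.
    actual = [name.lower() for name in top_level_fields]
    expected = [name.lower() for name in ACID_FIELDS]
    return actual == expected
```

A file written without the metadata key but with the six ACID fields is then still recognised, while an ordinary file with user columns only is not.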






[jira] [Assigned] (IMPALA-10115) Impala should check file schema as well to check full ACIDv2 files

2020-09-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-10115:
--

Assignee: Zoltán Borók-Nagy

> Impala should check file schema as well to check full ACIDv2 files
> --
>
> Key: IMPALA-10115
> URL: https://issues.apache.org/jira/browse/IMPALA-10115
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Currently Impala checks the file metadata key 'hive.acid.version' to decide
> whether a file uses the full ACID schema.
> There are cases where Hive does not set this value for full ACID files, e.g.
> during major query-based compactions.
> So if 'hive.acid.version' is not present, Impala should still look at the
> schema elements to be sure about the file format.






[jira] [Resolved] (IMPALA-10090) Create aarch64 development environment on ubuntu 18.04

2020-09-02 Thread zhaorenhai (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaorenhai resolved IMPALA-10090.
-
Resolution: Fixed

> Create aarch64 development environment on ubuntu 18.04
> --
>
> Key: IMPALA-10090
> URL: https://issues.apache.org/jira/browse/IMPALA-10090
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: zhaorenhai
>Assignee: zhaorenhai
>Priority: Major
>
> This includes the following changes:
> 1. Build native-toolchain locally by script on the aarch64 platform.
> 2. Change the version numbers of some native-toolchain libraries.
> 3. Split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONETS into two separate
> settings, because on aarch64 only the CDP components need to be downloaded,
> not the toolchain.
> 4. Download the Hadoop aarch64 native libraries, which the Impala build needs.
> With this commit, on Ubuntu 18.04 aarch64 you only need to run
> bin/bootstrap_development.sh, just as on x86.



