[jira] [Updated] (IMPALA-10505) Avoid creating misleading audit logs when a requesting user does not have privileges on the underlying tables of a view

2021-02-12 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-10505:
-
Description: 
We found that misleading audit logs could be generated in Impala if a 
requesting user who has been granted privileges on a view does not have the 
privileges on the table(s) on which the view is based. The issue can be 
reproduced as follows.
 # Start an authorization-enabled Impala cluster.
 # As the user {{admin}}, execute "{{CREATE VIEW 
default.v_functional_alltypestiny AS SELECT id, bool_col FROM 
functional.alltypestiny;}}".
 # As the user {{admin}}, execute "{{GRANT SELECT ON TABLE 
default.v_functional_alltypestiny TO USER non_owner;}}".
 # As the user {{admin}}, execute "{{REFRESH AUTHORIZATION;}}".
 # Add a break point at 
[RangerBufferAuditHandler#flush()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java#L122]
 to observe the {{AuthzAuditEvent}}s added to '{{auditEvents_}}' after the 
following statement.
 # As the user {{non_owner}}, execute "{{SELECT COUNT(\*) FROM 
default.v_functional_alltypestiny;}}"

We will find that only one {{AuthzAuditEvent}} was logged. Specifically, the 
field '{{resourcePath}}' is "{{functional/alltypestiny}}" and the field 
'{{accessResult}}' is 0, indicating a failed authorization against the 
underlying table of the view. However, the user '{{non_owner}}' is, and should 
be, allowed to execute the statement, since it was granted the privilege on 
the view.

Therefore, we should remove this confusing log entry and retain only the 
audit log entry corresponding to the privilege check for the view, i.e., 
{{default.v_functional_alltypestiny}}.

I have the following findings after an initial investigation.

Under the hood, Impala performs two privilege checks: one for the view and 
one for the table on which the view is based. Since the user has been granted 
the {{SELECT}} privilege on the view, the first check succeeds, whereas the 
second check fails because the user does not have the {{SELECT}} privilege on 
the underlying table.

Each privilege check results in one audit log entry. The first audit entry is 
a successful audit event because it corresponds to the privilege check for the 
view. The second check, however, produces a failed audit event since it 
corresponds to the privilege check for the underlying table, on which the 
requesting user does not have the {{SELECT}} privilege. Impala performs the 
second check for a reason: in short, the requesting user is not allowed to 
access the runtime profile if the user does not have the privileges on the 
underlying table(s). Refer to 
[BaseAuthorizationChecker#authorize()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java#L175-L190]
 for further details.

On the other hand, among the audit events resulting from a query, if there is 
at least one failed audit event, only the first failed audit event is kept by 
Impala and sent to Ranger. That is why, in the end, we only saw that failed 
audit event.
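
As a very rough illustration of one possible direction (not the actual fix), 
the buffered events could be post-processed before they are flushed to Ranger, 
keeping only the successful event for the view when the statement as a whole 
was authorized. The sketch below is hypothetical: the class and method names 
are made up, and the {{getAccessResult()}} getter is assumed from the 
'{{accessResult}}' field described above.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

import org.apache.ranger.audit.model.AuthzAuditEvent;

// Hypothetical sketch only: if the statement as a whole was authorized,
// drop the failed events (accessResult == 0), e.g. the one for
// functional/alltypestiny, and keep the successful event for the view.
public class AuditEventFilterSketch {
  public static List<AuthzAuditEvent> filterForAuthorizedStmt(
      List<AuthzAuditEvent> bufferedEvents, boolean stmtAuthorized) {
    if (!stmtAuthorized) return bufferedEvents;
    return bufferedEvents.stream()
        .filter(event -> event.getAccessResult() != 0)
        .collect(Collectors.toList());
  }
}
{code}

Another possible direction would be to avoid buffering the failed event for 
the table-level check in the first place when that check is performed only to 
decide access to the runtime profile.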


[jira] [Created] (IMPALA-10506) Check if Impala LZ4 has same bug as ARROW-11301

2021-02-12 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-10506:
--

 Summary: Check if Impala LZ4 has same bug as ARROW-11301
 Key: IMPALA-10506
 URL: https://issues.apache.org/jira/browse/IMPALA-10506
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Tim Armstrong
Assignee: Csaba Ringhofer


I noticed ARROW-11301 in the context of a Parquet discussion 
(https://github.com/apache/parquet-format/pull/164/files/2dfe463c948948f7d9624bee3cdd4706eb3488b5#diff-a1727652430ce24c121536393f2ece63c5799a99583738f48aa8bb9fa71cb3f8)
 and wondered whether Impala has made the same mistake.

CC [~arawat] [~csringhofer] [~boroknagyz]






[jira] [Created] (IMPALA-10505) Avoid creating misleading audit logs when a requesting user does not have privileges on the underlying tables of a view

2021-02-12 Thread Fang-Yu Rao (Jira)
Fang-Yu Rao created IMPALA-10505:


 Summary: Avoid creating misleading audit logs when a requesting 
user does not have privileges on the underlying tables of a view
 Key: IMPALA-10505
 URL: https://issues.apache.org/jira/browse/IMPALA-10505
 Project: IMPALA
  Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao


We found that misleading audit logs could be generated in Impala if a 
requesting user who has been granted privileges on a view does not have the 
privileges on the table(s) on which the view is based. The issue can be 
reproduced as follows.
 # Start an authorization-enabled Impala cluster.
 # As the user {{admin}}, execute "{{CREATE VIEW 
default.v_functional_alltypestiny AS SELECT id, bool_col FROM 
functional.alltypestiny;}}".
 # As the user {{admin}}, execute "{{GRANT SELECT ON TABLE 
default.v_functional_alltypestiny TO USER non_owner;}}".
 # As the user {{admin}}, execute "{{REFRESH AUTHORIZATION;}}".
 # Add a break point at 
[RangerBufferAuditHandler#flush()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java#L122]
 to observe the {{AuthzAuditEvent}}s added to '{{auditEvents_}}' after the 
following statement.
 # As the user {{non_owner}}, execute "{{SELECT COUNT(*) FROM 
default.v_functional_alltypestiny;}}"

We will find that only one {{AuthzAuditEvent}} was logged. Specifically, the 
field '{{resourcePath}}' is "{{functional/alltypestiny}}" and the field 
'{{accessResult}}' is 0, indicating a failed authorization against the 
underlying table of the view. However, the user '{{non_owner}}' is, and should 
be, allowed to execute the statement, since it was granted the privilege on 
the view.

Therefore, we should remove this confusing log entry and retain only the 
audit log entry corresponding to the privilege check for the view, i.e., 
{{default.v_functional_alltypestiny}}.

I have the following findings after an initial investigation.

Under the hood, Impala performs two privilege checks: one for the view and 
one for the table on which the view is based. Since the user has been granted 
the {{SELECT}} privilege on the view, the first check succeeds, whereas the 
second check fails because the user does not have the {{SELECT}} privilege on 
the underlying table.

Each privilege check results in one audit log entry. The first audit entry is 
a successful audit event because it corresponds to the privilege check for the 
view. The second check, however, produces a failed audit event since it 
corresponds to the privilege check for the underlying table, on which the 
requesting user does not have the {{SELECT}} privilege. Impala performs the 
second check for a reason: in short, the requesting user is not allowed to 
access the runtime profile if the user does not have the privileges on the 
underlying table(s). Refer to 
[BaseAuthorizationChecker#authorize()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java#L175-L190]
 for further details.

On the other hand, among the audit events resulting from a query, if there is 
at least one failed audit event, only the first failed audit event is kept by 
Impala and sent to Ranger. That is why, in the end, we only saw that failed 
audit event.






[jira] [Work started] (IMPALA-9745) SELECT from view fails with "AnalysisException: No matching function with signature: to_timestamp(TIMESTAMP, STRING)" after expression rewrite.

2021-02-12 Thread Aman Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9745 started by Aman Sinha.
--
> SELECT from view fails with "AnalysisException: No matching function with 
> signature: to_timestamp(TIMESTAMP, STRING)" after expression rewrite.
> ---
>
> Key: IMPALA-9745
> URL: https://issues.apache.org/jira/browse/IMPALA-9745
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.11.0, Impala 4.0
>Reporter: Andrew Sherman
>Assignee: Aman Sinha
>Priority: Critical
>
> Simple test case
> {code}
> drop view if exists test_replication_view;
> drop table if exists test_replication;
> create table test_replication(cob string);
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-08');
> select * from test_replication;
> create view test_replication_view as select to_timestamp(cob, 'yyyy-MM-dd') 
> cob_ts,cob trade_date from test_replication;
> select 1 from test_replication_view deal WHERE trade_date = deal.cob_ts AND 
> deal.cob_ts = '2018-06-07';
> {code}
> The problem seems to be that after expression rewrite the type of cob has 
> become a timestamp and so we look for the function "to_timestamp(TIMESTAMP, 
> STRING)" instead of "to_timestamp(STRING, STRING)".
> A workaround is to run with
> {code}
> set enable_expr_rewrites=false;
> {code}
> For comparison a similar query runs OK in mysql
> {code}
> drop view if exists test_replication_view;
> drop table if exists test_replication;
> create table test_replication(cob varchar(255));
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-08');
> select * from test_replication;
> create view test_replication_view as select str_to_date(cob, '%Y-%m-%d') 
> cob_ts,cob trade_date from test_replication;
> select 1 from test_replication_view deal WHERE trade_date = deal.cob_ts AND 
> deal.cob_ts = '2018-06-07'
> {code}






[jira] [Resolved] (IMPALA-4805) Avoid hash exchanges before analytic functions in more situations.

2021-02-12 Thread Aman Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved IMPALA-4805.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Avoid hash exchanges before analytic functions in more situations.
> --
>
> Key: IMPALA-4805
> URL: https://issues.apache.org/jira/browse/IMPALA-4805
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Alexander Behm
>Assignee: Aman Sinha
>Priority: Major
>  Labels: performance, ramp-up
> Fix For: Impala 4.0
>
>
> This case works as expected. There is no hash exchange before 
> sort+analytic:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t1.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +---+
> | Explain String|
> +---+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 |
> |   |
> | PLAN-ROOT SINK|
> | | |
> | 07:EXCHANGE [UNPARTITIONED]   |
> | | |
> | 04:ANALYTIC   |
> | |  functions: count(*)|
> | |  partition by: t1.id|
> | | |
> | 03:SORT   |
> | |  order by: id ASC NULLS FIRST   |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]|
> | |  hash predicates: t1.id = t2.id |
> | |  runtime filters: RF000 <- t2.id|
> | | |
> | |--06:EXCHANGE [HASH(t2.id)]  |
> | |  |  |
> | |  01:SCAN HDFS [functional.alltypes t2]  |
> | | partitions=24/24 files=24 size=478.45KB |
> | | |
> | 05:EXCHANGE [HASH(t1.id)] |
> | | |
> | 00:SCAN HDFS [functional.alltypes t1] |
> |partitions=24/24 files=24 size=478.45KB|
> |runtime filters: RF000 -> t1.id|
> +---+
> {code}
> This equivalent case has an unnecessary hash exchange:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t2.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +---+
> | Explain String|
> +---+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 |
> |   |
> | PLAN-ROOT SINK|
> | | |
> | 08:EXCHANGE [UNPARTITIONED]   |
> | | |
> | 04:ANALYTIC   |
> | |  functions: count(*)|
> | |  partition by: t2.id|
> | | |
> | 03:SORT   |
> | |  order by: id ASC NULLS FIRST   |
> | | |
> | 07:EXCHANGE [HASH(t2.id)] |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]|
> | |  hash predicates: t1.id = t2.id |
> | |  runtime filters: RF000 <- t2.id|
> | | |
> | |--06:EXCHANGE [HASH(t2.id)]  |
> | |  |  |
> | |  01:SCAN HDFS [functional.alltypes t2]  |
> | | partitions=24/24 files=24 size=478.45KB |
> | |   

[jira] [Commented] (IMPALA-10501) Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <= num_buffered_values_

2021-02-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283868#comment-17283868
 ] 

Zoltán Borók-Nagy commented on IMPALA-10501:


The failing query was (from 
impalad.c34abf949077.invalid-user.log.INFO.20210211-023758.1):
{noformat}
I0211 03:55:22.140455 102573 Frontend.java:1587] 
be46bb72819942fd:85934edd] Analyzing query: select l_shipmode, 
o_orderpriority, count(*)
from tpch_nested_parquet.customer.c_orders o, o.o_lineitems l
where l_receiptdate < '1992-01-10'{noformat}
from test *test_parquet_stats.py::TestParquetStats::test_page_index*

It's hard to tell what went wrong without the data file. I'm planning to run 
load_nested and the above test in a loop, which will hopefully reproduce this 
issue.

> Hit DCHECK in parquet-column-readers.cc:  def_levels_.CacheRemaining() <= 
> num_buffered_values_
> --
>
> Key: IMPALA-10501
> URL: https://issues.apache.org/jira/browse/IMPALA-10501
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Blocker
>  Labels: broken-build, crash, flaky, parquet
> Attachments: consoleText.3.gz, impalad_coord_exec-0.tar.gz
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3814/
> {noformat}
> F0211 03:55:26.383247 14487 parquet-column-readers.cc:517] 
> be46bb72819942fd:85934edd0001] Check failed: def_levels_.CacheRemaining() 
> <= num_buffered_values_ (921 vs. 916) 
> *** Check failure stack trace: ***
> @  0x53646ec  google::LogMessage::Fail()
> @  0x5365fdc  google::LogMessage::SendToLog()
> @  0x536404a  google::LogMessage::Flush()
> @  0x5367c48  google::LogMessageFatal::~LogMessageFatal()
> @  0x2ff886f  
> impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @  0x2f8ae44  
> impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @  0x2f761bf  impala::ScalarColumnReader<>::ReadValueBatch<>()
> @  0x2f2889a  impala::ScalarColumnReader<>::ReadValueBatch()
> @  0x2ebd8c0  impala::HdfsParquetScanner::AssembleRows()
> @  0x2eb882e  impala::HdfsParquetScanner::GetNextInternal()
> @  0x2eb67bd  impala::HdfsParquetScanner::ProcessSplit()
> @  0x2aaf3f2  impala::HdfsScanNode::ProcessSplit()
> @  0x2aae773  impala::HdfsScanNode::ScannerThread()
> @  0x2aadadb  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x2aafe94  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x220e331  boost::function0<>::operator()()
> @  0x2842e7f  impala::Thread::SuperviseThread()
> @  0x284ae1c  boost::_bi::list5<>::operator()<>()
> @  0x284ad40  boost::_bi::bind_t<>::operator()()
> @  0x284ad01  boost::detail::thread_data<>::run()
> @  0x406b291  thread_proxy
> @ 0x7f2465cba6b9  start_thread
> @ 0x7f24627e64dc  clone
> rImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>   at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:866)
> {noformat}
> It was likely a fuzz test:
> {noformat}
> 19:55:23 
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
>  50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] 
> 19:55:23 [gw5] PASSED 
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
>  50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 

[jira] [Commented] (IMPALA-4805) Avoid hash exchanges before analytic functions in more situations.

2021-02-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283863#comment-17283863
 ] 

ASF subversion and git services commented on IMPALA-4805:
-

Commit 4721978e8fb6d80a9f023e568b983b12b14f8acc in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4721978 ]

IMPALA-4805: Avoid hash exchange before analytic function if appropriate

This patch avoids adding a hash exchange below an analytic function
that has "partition by b" as long as the child can satisfy that
requirement through an equivalence relationship, i.e., an exact match is
not required.

For example:
select count(*) over (partition by b) from t1, t2 where a = b

In this case, the analytic sort has a required partitioning on b, but the
child is an inner join whose output partition key could be either 'a' or
'b' (it happens to be 'a' given how the data partition was populated),
so we should still be able to use the child's partitioning without
adding a hash exchange. Note that for outer joins the logic is slightly
different.

Testing:
 - Added a new planner test with analytic function + inner join
   (outer join test case already existed before).

Change-Id: Icb6289d1e70cfb6bbd5b38eedb00856dbc85ac77
Reviewed-on: http://gerrit.cloudera.org:8080/16888
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Avoid hash exchanges before analytic functions in more situations.
> --
>
> Key: IMPALA-4805
> URL: https://issues.apache.org/jira/browse/IMPALA-4805
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Alexander Behm
>Assignee: Aman Sinha
>Priority: Major
>  Labels: performance, ramp-up
>
> This case works as expected. There is no hash exchange before 
> sort+analytic:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t1.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +---+
> | Explain String|
> +---+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 |
> |   |
> | PLAN-ROOT SINK|
> | | |
> | 07:EXCHANGE [UNPARTITIONED]   |
> | | |
> | 04:ANALYTIC   |
> | |  functions: count(*)|
> | |  partition by: t1.id|
> | | |
> | 03:SORT   |
> | |  order by: id ASC NULLS FIRST   |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED]|
> | |  hash predicates: t1.id = t2.id |
> | |  runtime filters: RF000 <- t2.id|
> | | |
> | |--06:EXCHANGE [HASH(t2.id)]  |
> | |  |  |
> | |  01:SCAN HDFS [functional.alltypes t2]  |
> | | partitions=24/24 files=24 size=478.45KB |
> | | |
> | 05:EXCHANGE [HASH(t1.id)] |
> | | |
> | 00:SCAN HDFS [functional.alltypes t1] |
> |partitions=24/24 files=24 size=478.45KB|
> |runtime filters: RF000 -> t1.id|
> +---+
> {code}
> This equivalent case has an unnecessary hash exchange:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t2.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
>   on t1.id = t2.id
> +---+
> | Explain String|
> +---+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 |
> |   |
> | PLAN-ROOT SINK|
> | | |
> | 08:EXCHANGE [UNPARTITIONED]

[jira] [Commented] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts

2021-02-12 Thread Kurt Deschler (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283861#comment-17283861
 ] 

Kurt Deschler commented on IMPALA-10503:


[http://gerrit.cloudera.org:8080/17061]

> testdata load hits hive memory limit errors during hive inserts
> ---
>
> Key: IMPALA-10503
> URL: https://issues.apache.org/jira/browse/IMPALA-10503
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Kurt Deschler
>Assignee: Kurt Deschler
>Priority: Major
>
> Hit these memory errors running the following on a 32GB host:
> {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed 
> on request. Exit code is 143}}
>  {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}}
>  {{]], TaskAttempt 1 failed, info=[Container 
> container_1600192631322_0036_01_06 finished with diagnostics set to 
> [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container 
> [pid=24715,containerID=container_1600192631322_0036_01_06] is running 
> 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
> physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing 
> container.}}






[jira] [Commented] (IMPALA-10504) Add tracing for remote block reads

2021-02-12 Thread Kurt Deschler (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283862#comment-17283862
 ] 

Kurt Deschler commented on IMPALA-10504:


[http://gerrit.cloudera.org:8080/17062]

> Add tracing for remote block reads
> --
>
> Key: IMPALA-10504
> URL: https://issues.apache.org/jira/browse/IMPALA-10504
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Kurt Deschler
>Assignee: Kurt Deschler
>Priority: Major
>
> While chasing performance issues, there were a large number of remote block 
> read messages in the logs. Need tracing to track down the source of these. 
> {noformat}
> Errors: Read 3.07 GB of data across network that was expected to be local.
> Block locality metadata for table 'tpcds_600_parquet.store_sales' may be 
> stale.
> This only affects query performance and not result correctness.
> One of the common causes for this warning is HDFS rebalancer moving some of 
> the file's blocks.
> If the issue persists, consider running "INVALIDATE METADATA 
> `tpcds_600_parquet`.`store_sales`"{noformat}
>  








[jira] [Work started] (IMPALA-10504) Add tracing for remote block reads

2021-02-12 Thread Kurt Deschler (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10504 started by Kurt Deschler.
--
> Add tracing for remote block reads
> --
>
> Key: IMPALA-10504
> URL: https://issues.apache.org/jira/browse/IMPALA-10504
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Kurt Deschler
>Assignee: Kurt Deschler
>Priority: Major
>
> While chasing performance issues, there were a large number of remote block 
> read messages in the logs. Need tracing to track down the source of these. 
> {noformat}
> Errors: Read 3.07 GB of data across network that was expected to be local.
> Block locality metadata for table 'tpcds_600_parquet.store_sales' may be 
> stale.
> This only affects query performance and not result correctness.
> One of the common causes for this warning is HDFS rebalancer moving some of 
> the file's blocks.
> If the issue persists, consider running "INVALIDATE METADATA 
> `tpcds_600_parquet`.`store_sales`"{noformat}
>  






[jira] [Assigned] (IMPALA-10504) Add tracing for remote block reads

2021-02-12 Thread Kurt Deschler (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Deschler reassigned IMPALA-10504:
--

Assignee: Kurt Deschler

> Add tracing for remote block reads
> --
>
> Key: IMPALA-10504
> URL: https://issues.apache.org/jira/browse/IMPALA-10504
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Kurt Deschler
>Assignee: Kurt Deschler
>Priority: Major
>
> While chasing performance issues, there were a large number of remote block 
> read messages in the logs. Need tracing to track down the source of these. 
> {noformat}
> Errors: Read 3.07 GB of data across network that was expected to be local.
> Block locality metadata for table 'tpcds_600_parquet.store_sales' may be 
> stale.
> This only affects query performance and not result correctness.
> One of the common causes for this warning is HDFS rebalancer moving some of 
> the file's blocks.
> If the issue persists, consider running "INVALIDATE METADATA 
> `tpcds_600_parquet`.`store_sales`"{noformat}
>  






[jira] [Created] (IMPALA-10504) Add tracing for remote block reads

2021-02-12 Thread Kurt Deschler (Jira)
Kurt Deschler created IMPALA-10504:
--

 Summary: Add tracing for remote block reads
 Key: IMPALA-10504
 URL: https://issues.apache.org/jira/browse/IMPALA-10504
 Project: IMPALA
  Issue Type: Improvement
Reporter: Kurt Deschler


While chasing performance issues, we saw a large number of remote block read 
messages in the logs. Tracing is needed to track down the source of these. 
{noformat}
Errors: Read 3.07 GB of data across network that was expected to be local.
Block locality metadata for table 'tpcds_600_parquet.store_sales' may be stale.
This only affects query performance and not result correctness.
One of the common causes for this warning is HDFS rebalancer moving some of the 
file's blocks.
If the issue persists, consider running "INVALIDATE METADATA 
`tpcds_600_parquet`.`store_sales`"{noformat}
 






[jira] [Work started] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts

2021-02-12 Thread Kurt Deschler (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10503 started by Kurt Deschler.
--
> testdata load hits hive memory limit errors during hive inserts
> ---
>
> Key: IMPALA-10503
> URL: https://issues.apache.org/jira/browse/IMPALA-10503
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Kurt Deschler
>Assignee: Kurt Deschler
>Priority: Major
>
> Hit these memory errors running the following on a 32GB host:
> {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed 
> on request. Exit code is 143}}
>  {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}}
>  {{]], TaskAttempt 1 failed, info=[Container 
> container_1600192631322_0036_01_06 finished with diagnostics set to 
> [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container 
> [pid=24715,containerID=container_1600192631322_0036_01_06] is running 
> 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
> physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing 
> container.}}






[jira] [Created] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts

2021-02-12 Thread Kurt Deschler (Jira)
Kurt Deschler created IMPALA-10503:
--

 Summary: testdata load hits hive memory limit errors during hive 
inserts
 Key: IMPALA-10503
 URL: https://issues.apache.org/jira/browse/IMPALA-10503
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.4.0
Reporter: Kurt Deschler
Assignee: Kurt Deschler


Hit these memory errors running the following on a 32GB host:

{{buildall.sh -format -testdata}} {{[2020-09-15 13:24:08.751]Container killed 
on request. Exit code is 143}}
 {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}}
 {{]], TaskAttempt 1 failed, info=[Container 
container_1600192631322_0036_01_06 finished with diagnostics set to 
[Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container 
[pid=24715,containerID=container_1600192631322_0036_01_06] is running 
14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing container.}}






[jira] [Created] (IMPALA-10502) delayed 'Invalidated objects in cache' cause 'Table already exists'

2021-02-12 Thread Adriano (Jira)
Adriano created IMPALA-10502:


 Summary: delayed 'Invalidated objects in cache' cause 'Table 
already exists'
 Key: IMPALA-10502
 URL: https://issues.apache.org/jira/browse/IMPALA-10502
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog, Clients, Frontend
Affects Versions: Impala 3.4.0
Reporter: Adriano


In a fast-paced environment where the interval between steps 1 and 2 is 
< 100 ms, a simplified pipeline looks like:

0- catalog 'on demand' in use and disableHmsSync (enabled or disabled: no 
difference)
1- open session to coord A -> DROP TABLE X -> close session
2- open session to coord A -> CREATE TABLE X -> close session

Result: step 2 can fail with "Table already exists" (a minimal client-side 
sketch of this sequence is included below).

During the internal investigation it was discovered that IMPALA-9913 will 
likely resolve the issue in almost all scenarios. However, since the 
investigation is still ongoing internally, it is worth tracking the event here 
as well. Once we are sure that IMPALA-9913 fixes these cases, we can close 
this as a duplicate; otherwise, we will carry on the investigation.
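
For completeness, here is a minimal, hypothetical JDBC sketch of the 
two-session sequence above (step 1 immediately followed by step 2 against the 
same coordinator). The connection URL, driver choice, table name, and schema 
are illustrative assumptions only; whether the CREATE actually fails depends 
on how quickly the invalidation from the DROP is applied in the catalog cache.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical repro sketch: two back-to-back sessions against the same
// coordinator. If the catalog invalidation from the DROP has not been
// applied yet, the CREATE may fail with "Table already exists".
public class DropCreateRaceSketch {
  // Illustrative URL; adjust host/port/auth for the actual cluster.
  private static final String URL =
      "jdbc:hive2://coordinator-a:21050/default;auth=noSasl";

  public static void main(String[] args) throws Exception {
    // Session 1: drop the table and close the session immediately.
    try (Connection c = DriverManager.getConnection(URL);
         Statement s = c.createStatement()) {
      s.execute("DROP TABLE IF EXISTS x");
    }
    // Session 2, opened < 100 ms later: recreate the same table.
    try (Connection c = DriverManager.getConnection(URL);
         Statement s = c.createStatement()) {
      s.execute("CREATE TABLE x (id INT)");  // may hit "Table already exists"
    }
  }
}
{code}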



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org