[jira] [Updated] (IMPALA-10505) Avoid creating misleading audit logs when a requesting user does not have privileges on the underlying tables of a view
[ https://issues.apache.org/jira/browse/IMPALA-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao updated IMPALA-10505: - Description:
We found that misleading audit logs can be generated in Impala when a requesting user who was granted privileges on a view does not have privileges on the table(s) on which the view is based. The issue can be reproduced as follows.
# Start an authorization-enabled Impala cluster.
# As the user {{admin}}, execute "{{CREATE VIEW default.v_functional_alltypestiny AS SELECT id, bool_col FROM functional.alltypestiny;}}".
# As the user {{admin}}, execute "{{GRANT SELECT ON TABLE default.v_functional_alltypestiny TO USER non_owner;}}".
# As the user {{admin}}, execute "{{REFRESH AUTHORIZATION;}}".
# Add a breakpoint at [RangerBufferAuditHandler#flush()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java#L122] to observe the {{AuthzAuditEvent}}'s added to '{{auditEvents_}}' after the following statement.
# As the user {{non_owner}}, execute "{{SELECT COUNT(*) FROM default.v_functional_alltypestiny;}}".
We will find that only 1 {{AuthzAuditEvent}} was logged. Specifically, the field '{{resourcePath}}' is "{{functional/alltypestiny}}" and the field '{{accessResult}}' is 0, indicating a failed authorization for the underlying table of the view. But the user '{{non_owner}}' is, and should be, allowed to execute the statement since it was granted the privilege on the view. Therefore, we should remove this confusing log entry and retain the audit log entry corresponding to the privilege check for the view, i.e., {{default.v_functional_alltypestiny}}.
I have the following findings after an initial investigation. Under the hood, Impala performs 2 privilege checks: one for the view and one for the table on which the view is based. Since the user has been granted the {{SELECT}} privilege on the view, the first check succeeds, whereas the second fails because the user does not have the {{SELECT}} privilege on the underlying table. Each privilege check results in one audit log entry generated by the Ranger server: the first entry is a successful audit event (the check on the view), while the second is a failed audit event (the check on the underlying table). Impala performs the second check for a reason: in short, the requesting user is not allowed to access the runtime profile unless the user has the privileges on the underlying table(s). Refer to [BaseAuthorizationChecker#authorize()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java#L175-L190] for further details. On the other hand, among the audit events resulting from a query, if any event failed, only the first failed audit event is kept by Impala and sent to Ranger. That is why we only saw that one failed audit event in the end.
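The event-pruning behavior described in the investigation (when any check fails, only the first failed audit event is forwarded to Ranger) can be sketched in a few lines of Java. {{AuditEvent}} here is a hypothetical stand-in for Ranger's {{AuthzAuditEvent}}, not Impala's actual class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Ranger's AuthzAuditEvent; only the two fields
// mentioned in the report (resourcePath, accessResult) are modeled.
class AuditEvent {
    final String resourcePath;
    final int accessResult; // 1 = access allowed, 0 = access denied

    AuditEvent(String resourcePath, int accessResult) {
        this.resourcePath = resourcePath;
        this.accessResult = accessResult;
    }
}

public class AuditFilterSketch {
    // Mirrors the behavior described above: if any event failed, keep only
    // the FIRST failed event; otherwise keep every (successful) event.
    static List<AuditEvent> filter(List<AuditEvent> events) {
        for (AuditEvent e : events) {
            if (e.accessResult == 0) {
                List<AuditEvent> kept = new ArrayList<>();
                kept.add(e);
                return kept;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<AuditEvent> events = new ArrayList<>();
        events.add(new AuditEvent("default/v_functional_alltypestiny", 1)); // view check: allowed
        events.add(new AuditEvent("functional/alltypestiny", 0));           // table check: denied
        List<AuditEvent> sent = filter(events);
        // Only the denied table-level event survives -- exactly the
        // misleading log entry this issue is about.
        System.out.println(sent.size() + " " + sent.get(0).resourcePath);
    }
}
```

With this pruning rule, the successful view-level event is discarded whenever the (expected) table-level failure is present, which explains why only the confusing entry reaches Ranger.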
[jira] [Created] (IMPALA-10506) Check if Impala LZ4 has same bug as ARROW-11301
Tim Armstrong created IMPALA-10506:
Summary: Check if Impala LZ4 has same bug as ARROW-11301
Key: IMPALA-10506
URL: https://issues.apache.org/jira/browse/IMPALA-10506
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Tim Armstrong
Assignee: Csaba Ringhofer
I noticed ARROW-11301 in the context of a Parquet discussion (https://github.com/apache/parquet-format/pull/164/files/2dfe463c948948f7d9624bee3cdd4706eb3488b5#diff-a1727652430ce24c121536393f2ece63c5799a99583738f48aa8bb9fa71cb3f8) and wondered if Impala has made the same mistake. CC [~arawat] [~csringhofer] [~boroknagyz]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10505) Avoid creating misleading audit logs when a requesting user does not have privileges on the underlying tables of a view
Fang-Yu Rao created IMPALA-10505:
Summary: Avoid creating misleading audit logs when a requesting user does not have privileges on the underlying tables of a view
Key: IMPALA-10505
URL: https://issues.apache.org/jira/browse/IMPALA-10505
Project: IMPALA
Issue Type: Bug
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao
[jira] [Work started] (IMPALA-9745) SELECT from view fails with "AnalysisException: No matching function with signature: to_timestamp(TIMESTAMP, STRING)" after expression rewrite.
[ https://issues.apache.org/jira/browse/IMPALA-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9745 started by Aman Sinha.
> SELECT from view fails with "AnalysisException: No matching function with signature: to_timestamp(TIMESTAMP, STRING)" after expression rewrite.
>
> Key: IMPALA-9745
> URL: https://issues.apache.org/jira/browse/IMPALA-9745
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.11.0, Impala 4.0
> Reporter: Andrew Sherman
> Assignee: Aman Sinha
> Priority: Critical
>
> Simple test case:
> {code}
> drop view if exists test_replication_view;
> drop table if exists test_replication;
> create table test_replication(cob string);
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-08');
> select * from test_replication;
> create view test_replication_view as select to_timestamp(cob, 'yyyy-MM-dd') cob_ts, cob trade_date from test_replication;
> select 1 from test_replication_view deal WHERE trade_date = deal.cob_ts AND deal.cob_ts = '2018-06-07';
> {code}
> The problem seems to be that after expression rewrite the type of cob has become a timestamp, so we look for the function "to_timestamp(TIMESTAMP, STRING)" instead of "to_timestamp(STRING, STRING)".
> A workaround is to run with
> {code}
> set enable_expr_rewrites=false;
> {code}
> For comparison, a similar query runs OK in MySQL:
> {code}
> drop view if exists test_replication_view;
> drop table if exists test_replication;
> create table test_replication(cob varchar(255));
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-07');
> insert into test_replication values('2018-06-08');
> select * from test_replication;
> create view test_replication_view as select str_to_date(cob, '%Y-%m-%d') cob_ts, cob trade_date from test_replication;
> select 1 from test_replication_view deal WHERE trade_date = deal.cob_ts AND deal.cob_ts = '2018-06-07'
> {code}
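The failure mode — an expression rewrite changing an argument's type so that the signature lookup no longer matches — can be illustrated with a toy registry. This is a simplified analogy, not Impala's actual function-resolution code; all names below are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class SignatureLookupSketch {
    // Toy function registry keyed by "name(argTypes)" strings. Impala's real
    // resolver is more involved, but the lookup-miss behavior is the same.
    static final Map<String, String> FNS = new HashMap<>();
    static {
        FNS.put("to_timestamp(STRING, STRING)", "builtin");
    }

    // Returns null when there is no matching signature, which corresponds to
    // the "No matching function with signature" AnalysisException above.
    static String resolve(String name, String... argTypes) {
        String sig = name + "(" + String.join(", ", argTypes) + ")";
        return FNS.get(sig);
    }

    public static void main(String[] args) {
        // Before the rewrite: cob is a STRING column, so the lookup succeeds.
        System.out.println(resolve("to_timestamp", "STRING", "STRING"));   // builtin
        // After the rewrite: the predicate comparison has coerced cob's type
        // to TIMESTAMP, so the same call no longer matches any signature.
        System.out.println(resolve("to_timestamp", "TIMESTAMP", "STRING")); // null
    }
}
```

Disabling expression rewrites (the workaround above) sidesteps the problem because the argument keeps its original STRING type at resolution time.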
[jira] [Resolved] (IMPALA-4805) Avoid hash exchanges before analytic functions in more situations.
[ https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha resolved IMPALA-4805. Fix Version/s: Impala 4.0 Resolution: Fixed
> Avoid hash exchanges before analytic functions in more situations.
>
> Key: IMPALA-4805
> URL: https://issues.apache.org/jira/browse/IMPALA-4805
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.8.0
> Reporter: Alexander Behm
> Assignee: Aman Sinha
> Priority: Major
> Labels: performance, ramp-up
> Fix For: Impala 4.0
>
> This case works as expected. There is no hash exchange before sort+analytic:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t1.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
> on t1.id = t2.id
> +---+
> | Explain String |
> +---+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 07:EXCHANGE [UNPARTITIONED] |
> | | |
> | 04:ANALYTIC |
> | | functions: count(*) |
> | | partition by: t1.id |
> | | |
> | 03:SORT |
> | | order by: id ASC NULLS FIRST |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED] |
> | | hash predicates: t1.id = t2.id |
> | | runtime filters: RF000 <- t2.id |
> | | |
> | |--06:EXCHANGE [HASH(t2.id)] |
> | | | |
> | | 01:SCAN HDFS [functional.alltypes t2] |
> | | partitions=24/24 files=24 size=478.45KB |
> | | |
> | 05:EXCHANGE [HASH(t1.id)] |
> | | |
> | 00:SCAN HDFS [functional.alltypes t1] |
> | partitions=24/24 files=24 size=478.45KB |
> | runtime filters: RF000 -> t1.id |
> +---+
> {code}
> This equivalent case has an unnecessary hash exchange:
> {code}
> explain select /* +straight_join */ count(*) over (partition by t2.id)
> from functional.alltypes t1
> inner join /* +shuffle */ functional.alltypes t2
> on t1.id = t2.id
> +---+
> | Explain String |
> +---+
> | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 08:EXCHANGE [UNPARTITIONED] |
> | | |
> | 04:ANALYTIC |
> | | functions: count(*) |
> | | partition by: t2.id |
> | | |
> | 03:SORT |
> | | order by: id ASC NULLS FIRST |
> | | |
> | 07:EXCHANGE [HASH(t2.id)] |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED] |
> | | hash predicates: t1.id = t2.id |
> | | runtime filters: RF000 <- t2.id |
> | | |
> | |--06:EXCHANGE [HASH(t2.id)] |
> | | | |
> | | 01:SCAN HDFS [functional.alltypes t2] |
> | | partitions=24/24 files=24 size=478.45KB |
> | |
[jira] [Commented] (IMPALA-10501) Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <= num_buffered_values_
[ https://issues.apache.org/jira/browse/IMPALA-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283868#comment-17283868 ] Zoltán Borók-Nagy commented on IMPALA-10501:
The failing query was (from impalad.c34abf949077.invalid-user.log.INFO.20210211-023758.1):
{noformat}
I0211 03:55:22.140455 102573 Frontend.java:1587] be46bb72819942fd:85934edd] Analyzing query: select l_shipmode, o_orderpriority, count(*) from tpch_nested_parquet.customer.c_orders o, o.o_lineitems l where l_receiptdate < '1992-01-10'
{noformat}
from the test *test_parquet_stats.py::TestParquetStats::test_page_index*. It's hard to tell what went wrong without the data file. I'm planning to run load_nested and the above test in a loop; that will hopefully reproduce this issue.
> Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <= num_buffered_values_
>
> Key: IMPALA-10501
> URL: https://issues.apache.org/jira/browse/IMPALA-10501
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: Tim Armstrong
> Assignee: Zoltán Borók-Nagy
> Priority: Blocker
> Labels: broken-build, crash, flaky, parquet
> Attachments: consoleText.3.gz, impalad_coord_exec-0.tar.gz
>
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3814/
> {noformat}
> F0211 03:55:26.383247 14487 parquet-column-readers.cc:517] be46bb72819942fd:85934edd0001] Check failed: def_levels_.CacheRemaining() <= num_buffered_values_ (921 vs. 916)
> *** Check failure stack trace: ***
> @ 0x53646ec google::LogMessage::Fail()
> @ 0x5365fdc google::LogMessage::SendToLog()
> @ 0x536404a google::LogMessage::Flush()
> @ 0x5367c48 google::LogMessageFatal::~LogMessageFatal()
> @ 0x2ff886f impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @ 0x2f8ae44 impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @ 0x2f761bf impala::ScalarColumnReader<>::ReadValueBatch<>()
> @ 0x2f2889a impala::ScalarColumnReader<>::ReadValueBatch()
> @ 0x2ebd8c0 impala::HdfsParquetScanner::AssembleRows()
> @ 0x2eb882e impala::HdfsParquetScanner::GetNextInternal()
> @ 0x2eb67bd impala::HdfsParquetScanner::ProcessSplit()
> @ 0x2aaf3f2 impala::HdfsScanNode::ProcessSplit()
> @ 0x2aae773 impala::HdfsScanNode::ScannerThread()
> @ 0x2aadadb _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @ 0x2aafe94 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x220e331 boost::function0<>::operator()()
> @ 0x2842e7f impala::Thread::SuperviseThread()
> @ 0x284ae1c boost::_bi::list5<>::operator()<>()
> @ 0x284ad40 boost::_bi::bind_t<>::operator()()
> @ 0x284ad01 boost::detail::thread_data<>::run()
> @ 0x406b291 thread_proxy
> @ 0x7f2465cba6b9 start_thread
> @ 0x7f24627e64dc clone
> rImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:866)
> {noformat}
> It was likely a fuzz test:
> {noformat}
> 19:55:23 query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit: 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> 19:55:23 [gw5] PASSED query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit: 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1,
[jira] [Commented] (IMPALA-4805) Avoid hash exchanges before analytic functions in more situations.
[ https://issues.apache.org/jira/browse/IMPALA-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283863#comment-17283863 ] ASF subversion and git services commented on IMPALA-4805:
Commit 4721978e8fb6d80a9f023e568b983b12b14f8acc in impala's branch refs/heads/master from Aman Sinha [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4721978 ]
IMPALA-4805: Avoid hash exchange before analytic function if appropriate
This patch avoids adding a hash exchange below an analytic function that has "partition by b" as long as the child can satisfy that requirement through an equivalence relationship, i.e., an exact match is not required. For example:
select count(*) over (partition by b) from t1, t2 where a = b
In this case, the analytic sort has a required partitioning on b, but the child is an inner join whose output partition key could be either 'a' or 'b' (it happens to be 'a' given how the data partition was populated), so we should still be able to use the child's partitioning without adding a hash exchange. Note that for outer joins the logic is slightly different.
Testing:
- Added a new planner test with analytic function + inner join (outer join test case already existed before).
Change-Id: Icb6289d1e70cfb6bbd5b38eedb00856dbc85ac77
Reviewed-on: http://gerrit.cloudera.org:8080/16888
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
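The equivalence-based check in the commit message above can be sketched as a membership test over value-equivalence classes implied by join predicates (e.g. a = b). This is a simplification of the planner's actual data-partition handling; all names below are illustrative:

```java
import java.util.List;
import java.util.Set;

public class PartitionEquivSketch {
    // Returns true when the child's existing hash partitioning can satisfy
    // the analytic node's required partition columns, treating columns in
    // the same equivalence class (from predicates like t1.id = t2.id) as
    // interchangeable -- so no extra hash exchange is needed.
    static boolean satisfies(Set<String> childPartitionCols,
                             Set<String> requiredCols,
                             List<Set<String>> equivClasses) {
        for (String req : requiredCols) {
            boolean ok = childPartitionCols.contains(req); // exact match
            for (Set<String> ec : equivClasses) {
                if (ec.contains(req)) {
                    // Any child partition column in req's class also works.
                    for (String c : childPartitionCols) ok |= ec.contains(c);
                }
            }
            if (!ok) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Join predicate t1.id = t2.id puts the two columns in one class.
        List<Set<String>> equiv = List.of(Set.of("t1.id", "t2.id"));
        // Child is hash-partitioned on t1.id; analytic needs "partition by t2.id".
        boolean reuse = satisfies(Set.of("t1.id"), Set.of("t2.id"), equiv);
        System.out.println(reuse); // true: the 07:EXCHANGE [HASH(t2.id)] can be dropped
    }
}
```

Under this rule the second explain plan above would lose its extra HASH(t2.id) exchange, matching the first plan.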
[jira] [Commented] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts
[ https://issues.apache.org/jira/browse/IMPALA-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283861#comment-17283861 ] Kurt Deschler commented on IMPALA-10503: [http://gerrit.cloudera.org:8080/17061] > testdata load hits hive memory limit errors during hive inserts > --- > > Key: IMPALA-10503 > URL: https://issues.apache.org/jira/browse/IMPALA-10503 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > Hit these memory errors running the following on a 32GB host: > {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed > on request. Exit code is 143}} > {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} > {{]], TaskAttempt 1 failed, info=[Container > container_1600192631322_0036_01_06 finished with diagnostics set to > [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container > [pid=24715,containerID=container_1600192631322_0036_01_06] is running > 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB > physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing > container.}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10504) Add tracing for remote block reads
[ https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283862#comment-17283862 ] Kurt Deschler commented on IMPALA-10504: [http://gerrit.cloudera.org:8080/17062] > Add tracing for remote block reads > -- > > Key: IMPALA-10504 > URL: https://issues.apache.org/jira/browse/IMPALA-10504 > Project: IMPALA > Issue Type: Improvement >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > While chasing performance issues, there were a large number of remote block > read messages in the logs. Need tracing to track down the source of these. > {noformat} > Errors: Read 3.07 GB of data across network that was expected to be local. > Block locality metadata for table 'tpcds_600_parquet.store_sales' may be > stale. > This only affects query performance and not result correctness. > One of the common causes for this warning is HDFS rebalancer moving some of > the file's blocks. > If the issue persists, consider running "INVALIDATE METADATA > `tpcds_600_parquet`.`store_sales`"{noformat}
[jira] [Updated] (IMPALA-10504) Add tracing for remote block reads
[ https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Deschler updated IMPALA-10504: --- Description: While chasing performance issues, there were a large number of remote block read messages in the logs. Need tracing to track down the source of these. {noformat} Errors: Read 3.07 GB of data across network that was expected to be local. Block locality metadata for table 'tpcds_600_parquet.store_sales' may be stale. This only affects query performance and not result correctness. One of the common causes for this warning is HDFS rebalancer moving some of the file's blocks. If the issue persists, consider running "INVALIDATE METADATA `tpcds_600_parquet`.`store_sales`"{noformat} was: http://gerrit.cloudera.org:8080/17062 While chasing performance issues, there were a large number of remote block read messages in the logs. Need tracing to track down the source of these. {noformat} Errors: Read 3.07 GB of data across network that was expected to be local. Block locality metadata for table 'tpcds_600_parquet.store_sales' may be stale. This only affects query performance and not result correctness. One of the common causes for this warning is HDFS rebalancer moving some of the file's blocks. If the issue persists, consider running "INVALIDATE METADATA `tpcds_600_parquet`.`store_sales`"{noformat} > Add tracing for remote block reads > -- > > Key: IMPALA-10504 > URL: https://issues.apache.org/jira/browse/IMPALA-10504 > Project: IMPALA > Issue Type: Improvement >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > While chasing performance issues, there were a large number of remote block > read messages in the logs. Need tracing to track down the source of these. > {noformat} > Errors: Read 3.07 GB of data across network that was expected to be local. > Block locality metadata for table 'tpcds_600_parquet.store_sales' may be > stale. 
> This only affects query performance and not result correctness. > One of the common causes for this warning is HDFS rebalancer moving some of > the file's blocks. > If the issue persists, consider running "INVALIDATE METADATA > `tpcds_600_parquet`.`store_sales`"{noformat}
[jira] [Updated] (IMPALA-10504) Add tracing for remote block reads
[ https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Deschler updated IMPALA-10504: --- Description: http://gerrit.cloudera.org:8080/17062 While chasing performance issues, there were a large number of remote block read messages in the logs. Need tracing to track down the source of these. {noformat} Errors: Read 3.07 GB of data across network that was expected to be local. Block locality metadata for table 'tpcds_600_parquet.store_sales' may be stale. This only affects query performance and not result correctness. One of the common causes for this warning is HDFS rebalancer moving some of the file's blocks. If the issue persists, consider running "INVALIDATE METADATA `tpcds_600_parquet`.`store_sales`"{noformat} was: While chasing performance issues, there were a large number of remote block read messages in the logs. Need tracing to track down the source of these. {noformat} Errors: Read 3.07 GB of data across network that was expected to be local. Block locality metadata for table 'tpcds_600_parquet.store_sales' may be stale. This only affects query performance and not result correctness. One of the common causes for this warning is HDFS rebalancer moving some of the file's blocks. If the issue persists, consider running "INVALIDATE METADATA `tpcds_600_parquet`.`store_sales`"{noformat} > Add tracing for remote block reads > -- > > Key: IMPALA-10504 > URL: https://issues.apache.org/jira/browse/IMPALA-10504 > Project: IMPALA > Issue Type: Improvement >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > http://gerrit.cloudera.org:8080/17062 > While chasing performance issues, there were a large number of remote block > read messages in the logs. Need tracing to track down the source of these. > {noformat} > Errors: Read 3.07 GB of data across network that was expected to be local. > Block locality metadata for table 'tpcds_600_parquet.store_sales' may be > stale. 
> This only affects query performance and not result correctness. > One of the common causes for this warning is HDFS rebalancer moving some of > the file's blocks. > If the issue persists, consider running "INVALIDATE METADATA > `tpcds_600_parquet`.`store_sales`"{noformat}
[jira] [Updated] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts
[ https://issues.apache.org/jira/browse/IMPALA-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Deschler updated IMPALA-10503: --- Description: Hit these memory errors running the following on a 32GB host: {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed on request. Exit code is 143}} {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} {{]], TaskAttempt 1 failed, info=[Container container_1600192631322_0036_01_06 finished with diagnostics set to [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container [pid=24715,containerID=container_1600192631322_0036_01_06] is running 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing container.}} was: http://gerrit.cloudera.org:8080/17061 Hit these memory errors running the following on a 32GB host: {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed on request. Exit code is 143}} {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} {{]], TaskAttempt 1 failed, info=[Container container_1600192631322_0036_01_06 finished with diagnostics set to [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container [pid=24715,containerID=container_1600192631322_0036_01_06] is running 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB virtual memory used. 
Killing container.}} > testdata load hits hive memory limit errors during hive inserts > --- > > Key: IMPALA-10503 > URL: https://issues.apache.org/jira/browse/IMPALA-10503 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > Hit these memory errors running the following on a 32GB host: > {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed > on request. Exit code is 143}} > {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} > {{]], TaskAttempt 1 failed, info=[Container > container_1600192631322_0036_01_06 finished with diagnostics set to > [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container > [pid=24715,containerID=container_1600192631322_0036_01_06] is running > 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB > physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing > container.}}
[jira] [Updated] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts
[ https://issues.apache.org/jira/browse/IMPALA-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Deschler updated IMPALA-10503: --- Description: http://gerrit.cloudera.org:8080/17061 Hit these memory errors running the following on a 32GB host: {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed on request. Exit code is 143}} {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} {{]], TaskAttempt 1 failed, info=[Container container_1600192631322_0036_01_06 finished with diagnostics set to [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container [pid=24715,containerID=container_1600192631322_0036_01_06] is running 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing container.}} was: Hit these memory errors running the following on a 32GB host: {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed on request. Exit code is 143}} {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} {{]], TaskAttempt 1 failed, info=[Container container_1600192631322_0036_01_06 finished with diagnostics set to [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container [pid=24715,containerID=container_1600192631322_0036_01_06] is running 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB virtual memory used. 
Killing container.}} > testdata load hits hive memory limit errors during hive inserts > --- > > Key: IMPALA-10503 > URL: https://issues.apache.org/jira/browse/IMPALA-10503 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > http://gerrit.cloudera.org:8080/17061 > Hit these memory errors running the following on a 32GB host: > {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed > on request. Exit code is 143}} > {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} > {{]], TaskAttempt 1 failed, info=[Container > container_1600192631322_0036_01_06 finished with diagnostics set to > [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container > [pid=24715,containerID=container_1600192631322_0036_01_06] is running > 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB > physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing > container.}}
[jira] [Work started] (IMPALA-10504) Add tracing for remote block reads
[ https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10504 started by Kurt Deschler. -- > Add tracing for remote block reads > -- > > Key: IMPALA-10504 > URL: https://issues.apache.org/jira/browse/IMPALA-10504 > Project: IMPALA > Issue Type: Improvement >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > While chasing performance issues, there were a large number of remote block > read messages in the logs. Need tracing to track down the source of these. > {noformat} > Errors: Read 3.07 GB of data across network that was expected to be local. > Block locality metadata for table 'tpcds_600_parquet.store_sales' may be > stale. > This only affects query performance and not result correctness. > One of the common causes for this warning is HDFS rebalancer moving some of > the file's blocks. > If the issue persists, consider running "INVALIDATE METADATA > `tpcds_600_parquet`.`store_sales`"{noformat}
[jira] [Assigned] (IMPALA-10504) Add tracing for remote block reads
[ https://issues.apache.org/jira/browse/IMPALA-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Deschler reassigned IMPALA-10504: -- Assignee: Kurt Deschler > Add tracing for remote block reads > -- > > Key: IMPALA-10504 > URL: https://issues.apache.org/jira/browse/IMPALA-10504 > Project: IMPALA > Issue Type: Improvement >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > While chasing performance issues, there were a large number of remote block > read messages in the logs. Need tracing to track down the source of these. > {noformat} > Errors: Read 3.07 GB of data across network that was expected to be local. > Block locality metadata for table 'tpcds_600_parquet.store_sales' may be > stale. > This only affects query performance and not result correctness. > One of the common causes for this warning is HDFS rebalancer moving some of > the file's blocks. > If the issue persists, consider running "INVALIDATE METADATA > `tpcds_600_parquet`.`store_sales`"{noformat}
[jira] [Created] (IMPALA-10504) Add tracing for remote block reads
Kurt Deschler created IMPALA-10504: -- Summary: Add tracing for remote block reads Key: IMPALA-10504 URL: https://issues.apache.org/jira/browse/IMPALA-10504 Project: IMPALA Issue Type: Improvement Reporter: Kurt Deschler While chasing performance issues, there were a large number of remote block read messages in the logs. Need tracing to track down the source of these. {noformat} Errors: Read 3.07 GB of data across network that was expected to be local. Block locality metadata for table 'tpcds_600_parquet.store_sales' may be stale. This only affects query performance and not result correctness. One of the common causes for this warning is HDFS rebalancer moving some of the file's blocks. If the issue persists, consider running "INVALIDATE METADATA `tpcds_600_parquet`.`store_sales`"{noformat}
[jira] [Work started] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts
[ https://issues.apache.org/jira/browse/IMPALA-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10503 started by Kurt Deschler. -- > testdata load hits hive memory limit errors during hive inserts > --- > > Key: IMPALA-10503 > URL: https://issues.apache.org/jira/browse/IMPALA-10503 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Kurt Deschler >Assignee: Kurt Deschler >Priority: Major > > Hit these memory errors running the following on a 32GB host: > {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed > on request. Exit code is 143}} > {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} > {{]], TaskAttempt 1 failed, info=[Container > container_1600192631322_0036_01_06 finished with diagnostics set to > [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container > [pid=24715,containerID=container_1600192631322_0036_01_06] is running > 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB > physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing > container.}}
[jira] [Created] (IMPALA-10503) testdata load hits hive memory limit errors during hive inserts
Kurt Deschler created IMPALA-10503: -- Summary: testdata load hits hive memory limit errors during hive inserts Key: IMPALA-10503 URL: https://issues.apache.org/jira/browse/IMPALA-10503 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.4.0 Reporter: Kurt Deschler Assignee: Kurt Deschler Hit these memory errors running the following on a 32GB host: {{buildall.sh-format -testdata }}{{[2020-09-15 13:24:08.751]Container killed on request. Exit code is 143}} {{[2020-09-15 13:24:08.751]Container exited with a non-zero exit code 143.}} {{]], TaskAttempt 1 failed, info=[Container container_1600192631322_0036_01_06 finished with diagnostics set to [Container failed, exitCode=-104. [2020-09-15 13:24:20.868]Container [pid=24715,containerID=container_1600192631322_0036_01_06] is running 14176256B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.9 GB of 2.1 GB virtual memory used. Killing container.}}
[jira] [Created] (IMPALA-10502) delayed 'Invalidated objects in cache' cause 'Table already exists'
Adriano created IMPALA-10502: Summary: delayed 'Invalidated objects in cache' cause 'Table already exists' Key: IMPALA-10502 URL: https://issues.apache.org/jira/browse/IMPALA-10502 Project: IMPALA Issue Type: Bug Components: Catalog, Clients, Frontend Affects Versions: Impala 3.4.0 Reporter: Adriano In a fast-paced environment where the interval between steps 1 and 2 is < 100ms, a simplified pipeline looks like:
0- catalog 'on demand' in use and disableHmsSync (enabled or disabled: no difference)
1- open session to coord A -> DROP TABLE X -> close session
2- open session to coord A -> CREATE TABLE X -> close session
Result: step 2 can fail with 'Table already exists'. During the internal investigation it was found that IMPALA-9913 should address the issue in almost all scenarios. However, since that investigation is still ongoing internally, it is worth tracking the issue here as well. Once we are sure that IMPALA-9913 fixes these cases, we can close this as a duplicate; otherwise, we can carry on the investigation here.
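The failing sequence from the pipeline above can be sketched as two back-to-back statements (the table name follows the report; the database and column list are placeholders):

```sql
-- Session 1 (coordinator A):
DROP TABLE default.x;

-- Session 2 (coordinator A), issued < 100ms later:
-- may fail with "Table already exists" if the coordinator's local
-- catalog cache has not yet applied the invalidation for the drop.
CREATE TABLE default.x (id INT);
```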