[ 
https://issues.apache.org/jira/browse/IMPALA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783086#comment-16783086
 ] 

Michael Ho commented on IMPALA-8274:
------------------------------------

FWIW, the bug above led to a crash like the following:
{noformat}
F0302 10:21:04.562525 22393 coordinator-backend-state.cc:571] Check failed: per_fragment_instance_idx < exec_summary.exec_stats.size() (62 vs. 1)  name=HDFS_SCAN_NODE (id=3) instance_id=e54a26423c426f58:ecf1f6b4000000b5 fragment_idx=4
{noformat}

{noformat}
(gdb) bt
#0  0x00007f0e215c0207 in raise () from ./sysroot/lib64/libc.so.6
#1  0x00007f0e215c18f8 in abort () from ./sysroot/lib64/libc.so.6
#2  0x00000000047fe4d4 in google::DumpStackTraceAndExit() ()
#3  0x00000000047f4f2d in google::LogMessage::Fail() ()
#4  0x00000000047f67d2 in google::LogMessage::SendToLog() ()
#5  0x00000000047f4907 in google::LogMessage::Flush() ()
#6  0x00000000047f7ece in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x000000000275dd6a in impala::Coordinator::BackendState::InstanceStats::Update (this=0x17d393910, exec_status=..., thrift_profile=..., exec_summary=0x1a72a940, scan_range_progress=0x1a72a8d8)
    at /usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/runtime/coordinator-backend-state.cc:571
#8  0x000000000275b0cf in impala::Coordinator::BackendState::ApplyExecStatusReport (this=0x2e71f0100, backend_exec_status=..., thrift_profiles=..., exec_summary=0x1a72a940, scan_range_progress=0x1a72a8d8,
    dml_exec_state=0x1a72aa80) at /usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/runtime/coordinator-backend-state.cc:337
#9  0x00000000027474bb in impala::Coordinator::UpdateBackendExecStatus (this=0x1a72a880, request=..., thrift_profiles=...) at /usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/runtime/coordinator.cc:713
#10 0x00000000020d5c46 in impala::ClientRequestState::UpdateBackendExecStatus (this=0xe3a1c000, request=..., thrift_profiles=...) at /usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/service/client-request-state.cc:1303
#11 0x0000000002038291 in impala::ControlService::ReportExecStatus (this=0x1596cad0, request=0x7835ba70, response=0x47894bfa0, rpc_context=0x47894aea0)
    at /usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/service/control-service.cc:152
#12 0x00000000020dbac4 in impala::ControlServiceIf::ControlServiceIf(scoped_refptr<kudu::MetricEntity> const&, scoped_refptr<kudu::rpc::ResultTracker> const&)::{lambda(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)#2}::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const ()
    at /usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/generated-sources/gen-cpp/control_service.service.cc:62
{noformat}

> Missing update to index into profiles vector in 
> Coordinator::BackendState::ApplyExecStatusReport()
> --------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8274
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8274
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Blocker
>
> {{idx}} isn't updated when we skip a duplicate or stale update of a
> fragment instance. As a result, we may end up passing the wrong profile
> to {{instance_stats->Update()}}. This may lead to random crashes in
> {{Coordinator::BackendState::InstanceStats::Update}}.
> {noformat}
>   int idx = 0;
>   const bool has_profile = thrift_profiles.profile_trees.size() > 0;
>   TRuntimeProfileTree empty_profile;
>   for (const FragmentInstanceExecStatusPB& instance_exec_status :
>            backend_exec_status.instance_exec_status()) {
>     int64_t report_seq_no = instance_exec_status.report_seq_no();
>     int instance_idx = GetInstanceIdx(instance_exec_status.fragment_instance_id());
>     DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
>     InstanceStats* instance_stats = instance_stats_map_[instance_idx];
>     int64_t last_report_seq_no = instance_stats->last_report_seq_no_;
>     DCHECK(instance_stats->exec_params_.instance_id ==
>         ProtoToQueryId(instance_exec_status.fragment_instance_id()));
>     // Ignore duplicate or out-of-order messages.
>     if (report_seq_no <= last_report_seq_no) {
>       VLOG_QUERY << Substitute("Ignoring stale update for query instance $0 with "
>           "seq no $1", PrintId(instance_stats->exec_params_.instance_id), report_seq_no);
>       continue; // <<--- XXX bad: idx is not updated for the skipped instance
>     }
>     DCHECK(!instance_stats->done_);
>     DCHECK(!has_profile || idx < thrift_profiles.profile_trees.size());
>     const TRuntimeProfileTree& profile =
>         has_profile ? thrift_profiles.profile_trees[idx++] : empty_profile;
>     instance_stats->Update(instance_exec_status, profile, exec_summary,
>         scan_range_progress);
> {noformat}
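
One possible way to avoid the mismatch described above is to advance the profile index for every instance in the report, including the ones that are skipped. The following is a minimal, self-contained sketch of that idea and not the actual Impala patch. {{InstanceReport}}, {{InstanceState}} and {{ApplyReports}} are hypothetical stand-ins for the real types, and the sketch assumes that a non-empty profile vector holds one entry per reported instance, in the same order as the reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for FragmentInstanceExecStatusPB (illustration only).
struct InstanceReport {
  int instance_idx;       // index of the fragment instance being reported
  int64_t report_seq_no;  // sequence number of this report
};

// Hypothetical stand-in for InstanceStats (illustration only).
struct InstanceState {
  int64_t last_report_seq_no = 0;
  std::string last_profile;  // profile most recently applied to this instance
};

// Applies a batch of reports. Assumption: when 'profiles' is non-empty,
// profiles[i] belongs to reports[i].
void ApplyReports(const std::vector<InstanceReport>& reports,
                  const std::vector<std::string>& profiles,
                  std::vector<InstanceState>& states) {
  const bool has_profile = !profiles.empty();
  const std::string empty_profile;
  std::size_t idx = 0;
  for (const InstanceReport& report : reports) {
    // Claim this report's profile slot before deciding whether to skip it,
    // so a skipped report cannot shift the profiles of later instances.
    const std::size_t profile_idx = idx++;
    InstanceState& state = states[report.instance_idx];
    // Ignore duplicate or out-of-order reports.
    if (report.report_seq_no <= state.last_report_seq_no) continue;
    assert(!has_profile || profile_idx < profiles.size());
    state.last_profile = has_profile ? profiles[profile_idx] : empty_profile;
    state.last_report_seq_no = report.report_seq_no;
  }
}
```

With a stale report for instance 0 followed by a fresh report for instance 1, this version applies the profile at index 1 to instance 1. A loop that only advances the index after the staleness check would apply the profile at index 0 instead.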



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
