Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11021 )
Change subject: IMPALA-6214: Determine and warn about stuck fragment instances. ...................................................................... Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11021/1/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/11021/1/be/src/runtime/krpc-data-stream-recvr.cc@230 PS1, Line 230: stringstream dstid; : : dstid << recvr_->dest_node_id(); : string frgId = PrintId(recvr_->fragment_instance_id()); : string watchdog_str = "fragment_instance_id= " + frgId + " node= " + dstid.str(); : memset(buf, BUFSZ, 0); : sprintf(buf, "%s : %d %s", __FILE__, __LINE__, watchdog_str.c_str()); constructing this once per batch seems like a big performance red flag. Why not construct this string in the KrpcDataStreamRecvr? I don't know this code super well, but I thought this was constant for a given instance. That said, I also don't know if this is even a safe usage of ScopedWatchKernelStack -- I believe it relies on its argument not ever getting freed, because there is no synchronization that ensures that the "watcher" thread doesn't read this pointer after the "watched" thread exits. In other words, I think you might have a race here where the watcher prints arbitrary junk off of the target thread's stack. http://gerrit.cloudera.org:8080/#/c/11021/1/be/src/runtime/krpc-data-stream-recvr.cc@239 PS1, Line 239: kudu::SetStackTraceSignal(SIGRTMIN + 1); you should call this at init time, not once per batch. I'm not even sure it's thread-safe, and if it is, it probably takes a lock so this would collapse throughput. -- To view, visit http://gerrit.cloudera.org:8080/11021 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I260a1d0a3477e5c6a46094e664500c3e2ed7de62 Gerrit-Change-Number: 11021 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Mon, 23 Jul 2018 23:56:24 +0000 Gerrit-HasComments: Yes