Mike Percy has posted comments on this change. Change subject: Add Google Breakpad support to Kudu ......................................................................
Patch Set 21: (12 comments) http://gerrit.cloudera.org:8080/#/c/5620/20/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: PS20, Line 500: ls -al $TEST_TMPDIR : find $TEST_TMPDIR > Maybe just "find $TEST_TMPDIR -ls" would yield the same thing as both of th Done http://gerrit.cloudera.org:8080/#/c/5620/21/java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java File java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java: Line 51: import static org.junit.Assert.assertTrue; > Unused? Done http://gerrit.cloudera.org:8080/#/c/5620/21/src/kudu/master/master_main.cc File src/kudu/master/master_main.cc: Line 62: CHECK_OK(BlockSigUSR1()); // Must be called before logging threads started. > Can we do this within InitGoogleLoggingSafe()? It'd affect a few more proce Done http://gerrit.cloudera.org:8080/#/c/5620/21/src/kudu/util/minidump-test.cc File src/kudu/util/minidump-test.cc: PS21, Line 85: #if defined(ADDRESS_SANITIZER) : // CHECK raises a SIGSEGV under ASAN and fails this test. : return; : #endif > This one is also surprising. CHECK() should yield a SIGABRT under the hood, I have investigated this a bit and so far I'm not quite sure why it's generating the SIGSEGV. Death tests can be difficult to debug under GDB because of all the forking (follow child and follow parent both resulted in me not being able to break on a SIGSEGV signal... I think I need to break on clone and change the fork-follow-mode setting in the middle of execution to debug it). I ran strace() to make sure and somewhere in the CHECK it is generating a SIGSEGV. This is the output: Death test: { while (google::_Check_string* _result = google::Check_EQImpl( google::GetReferenceableValue(1), google::GetReferenceableValue(0), "1" " " "==" " " "0")) google::LogMessageFatal("../../src/kudu/util/minidump-test.cc", 93, google::CheckOpString(_result)).stream(); } Result: died but not with expected error. Expected: kudu::MinidumpDeathTest_TestCheckStackTraceAndMinidump_Test::TestBody() Actual msg: [ DEATH ] F0123 21:44:57.211793 11054 minidump-test.cc:93] Check failed: 1 == 0 (1 vs. 0) [ DEATH ] *** Check failure stack trace: *** [ DEATH ] ASAN:DEADLYSIGNAL [ DEATH ] ================================================================= [ DEATH ] ==11054==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7ffd934add30 sp 0x7ffd934add28 T0) [ DEATH ] ==11054==Hint: pc points to the zero page. [ DEATH ] ==11054==The signal is caused by a READ memory access. [ DEATH ] ==11054==Hint: address points to the zero page. [ DEATH ] [ DEATH ] AddressSanitizer can not provide additional info. [ DEATH ] SUMMARY: AddressSanitizer: SEGV (<unknown module>) [ DEATH ] ==11054==ABORTING [ DEATH ] Line 113: // ASAN appears to catch SIGBUS, SIGSEGV, and SIGFPE and the process is not killed. > Is there any documentation to corroborate this (and the TSAN case)? It's so I actually can't find any documentation for it. I took a look at the ASAN source and it installs its own signal handlers while providing a way to override them via ASAN_OPTIONS, I believe (based on looking at IsHandleDeadlySignal() in thirdparty/llvm-3.9.1.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc). I don't know much more than that but IMHO it's not that big of a deal to just disable some tests / signals under ASAN. We maintain some ASAN coverage still, but we're mainly concerned about being able to get minidumps in RELEASE mode anyway. Line 117: #elif defined(THREAD_SANITIZER) > I think it's technically possible to build with both ASAN and TSAN, so just Done and done http://gerrit.cloudera.org:8080/#/c/5620/21/src/kudu/util/minidump.cc File src/kudu/util/minidump.cc: Line 68: return false; // NOLINT(*) > What does lint dislike about this? It's the clang tidy bot, and I think since kMinidumpPlatformSupported is a constexpr it doesn't handle this well and sees if (value && false) return false PS21, Line 178: non-user > Nit: non-user signal (or non-user-signal) Done Line 179: google::InstallFailureFunction(nullptr); > Maybe we should move this down to just before the breakpad exception handle Done PS21, Line 261: // Send SIGUSR1 signal to thread, which will wake it up. : kill(getpid(), SIGUSR1); > I found this confusing, since technically we're sending the signal to the e You're correct about how sigwait() works. I didn't look into using signalfd(), since sigwait() seems to work. Adding a warning to subprocess.h seems a little out of place, since it is kind of a standalone library. Line 270: CHECK_EQ(0, sigwait(&signals, &signal)); > maybe PCHECK() would be more appropriate here, since EINVAL via errno is te OK. I did something similar since sigwait() doesn't set errno. http://gerrit.cloudera.org:8080/#/c/5620/6/src/kudu/util/minidump.h File src/kudu/util/minidump.h: Line 41: // handling facilities, this class must be invoked after installing those > I don't follow issue #1. Every KuduTest includes a directory hierarchy that After our in-person conversation I went back to the original use case I had for this and found that if I integrated the handler into ServerBase I didn't need the singleton to access the instance from the log cleanup thread, since ServerBase owns that too. So I was able to get rid of the tricky parts of the singleton implementation (the lock and the weak_ptr) in the latest rev. -- To view, visit http://gerrit.cloudera.org:8080/5620 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I495695cc38b75377f20b0497093a81ed5baa887f Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes