Mike Percy has posted comments on this change.

Change subject: Add Google Breakpad support to Kudu
......................................................................


Patch Set 21:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/5620/20/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

PS20, Line 500:     ls -al $TEST_TMPDIR
              :     find $TEST_TMPDIR
> Maybe just "find $TEST_TMPDIR -ls" would yield the same thing as both of th
Done


http://gerrit.cloudera.org:8080/#/c/5620/21/java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
File java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java:

Line 51: import static org.junit.Assert.assertTrue;
> Unused?
Done


http://gerrit.cloudera.org:8080/#/c/5620/21/src/kudu/master/master_main.cc
File src/kudu/master/master_main.cc:

Line 62:   CHECK_OK(BlockSigUSR1()); // Must be called before logging threads 
started.
> Can we do this within InitGoogleLoggingSafe()? It'd affect a few more proce
Done


http://gerrit.cloudera.org:8080/#/c/5620/21/src/kudu/util/minidump-test.cc
File src/kudu/util/minidump-test.cc:

PS21, Line 85: #if defined(ADDRESS_SANITIZER)
             :   // CHECK raises a SIGSEGV under ASAN and fails this test.
             :   return;
             : #endif
> This one is also surprising. CHECK() should yield a SIGABRT under the hood,
I have investigated this a bit and so far I'm not quite sure why it's 
generating the SIGSEGV. Death tests can be difficult to debug under GDB because 
of all the forking (follow child and follow parent both resulted in me not 
being able to break on a SIGSEGV signal... I think I need to break on clone and 
change the fork-follow-mode setting in the middle of execution to debug it).

I ran strace() to make sure and somewhere in the CHECK it is generating a 
SIGSEGV. This is the output:

Death test: { while (google::_Check_string* _result = google::Check_EQImpl( 
google::GetReferenceableValue(1), google::GetReferenceableValue(0), "1" " " 
"==" " " "0")) google::LogMessageFatal("../../src/kudu/util/minidump-test.cc", 
93, google::CheckOpString(_result)).stream(); }
    Result: died but not with expected error.
  Expected: 
kudu::MinidumpDeathTest_TestCheckStackTraceAndMinidump_Test::TestBody()
Actual msg:
[  DEATH   ] F0123 21:44:57.211793 11054 minidump-test.cc:93] Check failed: 1 
== 0 (1 vs. 0)
[  DEATH   ] *** Check failure stack trace: ***
[  DEATH   ] ASAN:DEADLYSIGNAL
[  DEATH   ] =================================================================
[  DEATH   ] ==11054==ERROR: AddressSanitizer: SEGV on unknown address 
0x000000000000 (pc 0x000000000000 bp 0x7ffd934add30 sp 0x7ffd934add28 T0)
[  DEATH   ] ==11054==Hint: pc points to the zero page.
[  DEATH   ] ==11054==The signal is caused by a READ memory access.
[  DEATH   ] ==11054==Hint: address points to the zero page.
[  DEATH   ]
[  DEATH   ] AddressSanitizer can not provide additional info.
[  DEATH   ] SUMMARY: AddressSanitizer: SEGV (<unknown module>)
[  DEATH   ] ==11054==ABORTING
[  DEATH   ]


Line 113:   // ASAN appears to catch SIGBUS, SIGSEGV, and SIGFPE and the 
process is not killed.
> Is there any documentation to corroborate this (and the TSAN case)? It's so
I actually can't find any documentation for it. I took a look at the ASAN 
source and it installs its own signal handlers while providing a way to 
override them via ASAN_OPTIONS, I believe (based on looking at 
IsHandleDeadlySignal() in 
thirdparty/llvm-3.9.1.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc).

I don't know much more than that but IMHO it's not that big of a deal to just 
disable some tests / signals under ASAN. We maintain some ASAN coverage still, 
but we're mainly concerned about being able to get minidumps in RELEASE mode 
anyway.


Line 117: #elif defined(THREAD_SANITIZER)
> I think it's technically possible to build with both ASAN and TSAN, so just
Done and done


http://gerrit.cloudera.org:8080/#/c/5620/21/src/kudu/util/minidump.cc
File src/kudu/util/minidump.cc:

Line 68:     return false; // NOLINT(*)
> What does lint dislike about this?
It's the clang tidy bot, and I think since kMinidumpPlatformSupported is a 
constexpr it doesn't handle this well and sees if (value && false) return false


PS21, Line 178: non-user 
> Nit: non-user signal (or non-user-signal)
Done


Line 179:   google::InstallFailureFunction(nullptr);
> Maybe we should move this down to just before the breakpad exception handle
Done


PS21, Line 261:   // Send SIGUSR1 signal to thread, which will wake it up.
              :   kill(getpid(), SIGUSR1);
> I found this confusing, since technically we're sending the signal to the e
You're correct about how sigwait() works.

I didn't look into using signalfd(), since sigwait() seems to work.

Adding a warning to subprocess.h seems a little out of place, since it is kind 
of a standalone library.


Line 270:     CHECK_EQ(0, sigwait(&signals, &signal));
> maybe PCHECK() would be more appropriate here, since EINVAL via errno is te
OK. I did something similar since sigwait() doesn't set errno.


http://gerrit.cloudera.org:8080/#/c/5620/6/src/kudu/util/minidump.h
File src/kudu/util/minidump.h:

Line 41: // handling facilities, this class must be invoked after installing 
those
> I don't follow issue #1. Every KuduTest includes a directory hierarchy that
After our in-person conversation I went back to the original use case I had for 
this and found that if I integrated the handler into ServerBase I didn't need 
the singleton to access the instance from the log cleanup thread, since 
ServerBase owns that too. So I was able to get rid of the tricky parts of the 
singleton implementation (the lock and the weak_ptr) in the latest rev.


-- 
To view, visit http://gerrit.cloudera.org:8080/5620
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I495695cc38b75377f20b0497093a81ed5baa887f
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to