[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
huangtianhua...@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 2: Hi all, I have took test after upgrade libunwind to 1.4.0, it works, so I think we only have to upgrade libunwind for this issue. -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 2 Gerrit-Owner: RuiChen Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 18 May 2020 07:34:11 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
RuiChen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 2: > I uploaded libunwind 1.4.0 and verified thirdparty builds > successfully. You can update `LIBUNWIND_VERSION=1.4.0` in vars.sh > in this patch now as well. Thank you, will update patch and testing again. -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 2 Gerrit-Owner: RuiChen Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 18 May 2020 01:17:12 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 2: I uploaded libunwind 1.4.0 and verified thirdparty builds successfully. You can update `LIBUNWIND_VERSION=1.4.0` in vars.sh in this patch now as well. -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 2 Gerrit-Owner: RuiChen Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 14 May 2020 20:51:29 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
RuiChen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 2: https://github.com/apache/kudu/blob/master/thirdparty/vars.sh#L31 -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 2 Gerrit-Owner: RuiChen Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 14 May 2020 09:37:19 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
RuiChen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 2: > > Patch Set 1: > > > > Todd, thank you for replying to share these information, looks > like two things impact the part of code: 1.libunwind call malloc() > in thread stack trace collection, I find upstream have fixed it in > libunwind 1.4.0 https://github.com/libunwind/libunwind/pull/72 ; > 2.save waiting time for spawning thread. So I update the patch to > address #2, set tid after strings::Substitute that cause coredump > in patchset 2, I think we maybe update libunwind in the future. > > Why not update libunwind now? My point above is that, even if > you've fixed this particular core dump, you still have a potential > deadlock bug because of the call to malloc in libunwind. The libunwind source code package that Kudu depend on is in AWS S3, and I don't know how to push the 1.4.x release into it, so I commit the patch as workaround. Could you help to add a 1.4.x libunwind into S3? then I will update patch to apply it, thanks -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 2 Gerrit-Owner: RuiChen Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 14 May 2020 09:37:05 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 2: > Patch Set 1: > > Todd, thank you for replying to share these information, looks like two > things impact the part of code: 1.libunwind call malloc() in thread stack > trace collection, I find upstream have fixed it in libunwind 1.4.0 > https://github.com/libunwind/libunwind/pull/72 ; 2.save waiting time for > spawning thread. So I update the patch to address #2, set tid after > strings::Substitute that cause coredump in patchset 2, I think we maybe > update libunwind in the future. Why not update libunwind now? My point above is that, even if you've fixed this particular core dump, you still have a potential deadlock bug because of the call to malloc in libunwind. -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 2 Gerrit-Owner: RuiChen Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 13 May 2020 22:01:36 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15899 to look at the new patch set (#2). Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread Some logic before t->functor_() in SuperviseThread maybe cause coredump, GetThreadStack function will try to collect thread info even if tcmalloc ThreadCache haven't been inited completlly. Complete all of initialization, then set tid_ of thread in order to avoiding unexpect access. Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 --- M src/kudu/util/thread.cc 1 file changed, 3 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/15899/2 -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 2 Gerrit-Owner: RuiChen Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
RuiChen has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 1: Todd, thank you for replying to share these information, looks like two things impact the part of code: 1.libunwind call malloc() in thread stack trace collection, I find upstream have fixed it in libunwind 1.4.0 https://github.com/libunwind/libunwind/pull/72 ; 2.save waiting time for spawning thread. So I update the patch to address #2, set tid after strings::Substitute that cause coredump in patchset 2, I think we maybe update libunwind in the future. -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 1 Gerrit-Owner: RuiChen Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: RuiChen Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 12 May 2020 07:09:21 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 1: One further note I forgot to mention above: the code path here is purposefully as short as possible since the spawning thread is blocked waiting for the spawned thread to set the tid. In many cases, the spawning thread may be latency-sensitive, and is trying to defer work to a thread pool in order to return as quickly as possible. So, the more work we do before releasing the tid back to the spawner, the worse the latency will be. -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 1 Gerrit-Owner: RuiChen Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 11 May 2020 17:56:14 + Gerrit-HasComments: No
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15899 ) Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. Patch Set 1: Code-Review-1 (1 comment) http://gerrit.cloudera.org:8080/#/c/15899/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15899/1//COMMIT_MSG@11 PS1, Line 11: ThreadCache haven't been inited completlly. I'm not sure this is the right fix. The bug report says that the issue is that the signal handler is calling into malloc() before tcmalloc is fully initted, which certainly could cause a crash, but the fact that a signal handler is calling malloc() at all is already a big problem. On x86 we use framepointer-based unwinding so that the signal handler for stack trace collection is async-safe. If the libunwind stack unwinder on aarch64 isn't signal-safe, we should disable that code path entirely, rather than doing this fix. If we don't, I think there is still a risk of arbitrary deadlocks due to malloc not being reentrant (the signalled thread may be holding the PageHeap lock, for example) -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 1 Gerrit-Owner: RuiChen Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 11 May 2020 17:55:04 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread
RuiChen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15899 Change subject: KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread .. KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread Some logic before t->functor_() in SuperviseThread maybe cause coredump, GetThreadStack function will try to collect thread info even if tcmalloc ThreadCache haven't been inited completlly. Complete all of initialization, then set tid_ of thread in order to avoiding unexpect access. Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 --- M src/kudu/util/thread.cc 1 file changed, 7 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/15899/1 -- To view, visit http://gerrit.cloudera.org:8080/15899 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955 Gerrit-Change-Number: 15899 Gerrit-PatchSet: 1 Gerrit-Owner: RuiChen