[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-18 Thread Anonymous Coward (Code Review)
huangtianhua...@gmail.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 2:

Hi all,
I have took test after upgrade libunwind to 1.4.0, it works, so I think we only 
have to upgrade libunwind for this issue.


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 2
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 18 May 2020 07:34:11 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-17 Thread RuiChen (Code Review)
RuiChen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 2:

> I uploaded libunwind 1.4.0 and verified thirdparty builds
 > successfully. You can update `LIBUNWIND_VERSION=1.4.0` in vars.sh
 > in this patch now as well.

Thank you, will update patch and testing again.


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 2
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 18 May 2020 01:17:12 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-14 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 2:

I uploaded libunwind 1.4.0 and verified thirdparty builds successfully. You can 
update `LIBUNWIND_VERSION=1.4.0` in vars.sh in this patch now as well.


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 2
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 14 May 2020 20:51:29 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-14 Thread RuiChen (Code Review)
RuiChen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 2:

https://github.com/apache/kudu/blob/master/thirdparty/vars.sh#L31


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 2
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 14 May 2020 09:37:19 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-14 Thread RuiChen (Code Review)
RuiChen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 2:

> > Patch Set 1:
 > >
 > > Todd, thank you for replying to share these information, looks
 > like two things impact the part of code: 1.libunwind call malloc()
 > in thread stack trace collection, I find upstream have fixed it in
 > libunwind 1.4.0 https://github.com/libunwind/libunwind/pull/72 ;
 > 2.save waiting time for spawning thread. So I update the patch to
 > address #2, set tid after strings::Substitute that cause coredump
 > in patchset 2, I think we maybe update libunwind in the future.
 >
 > Why not update libunwind now? My point above is that, even if
 > you've fixed this particular core dump, you still have a potential
 > deadlock bug because of the call to malloc in libunwind.

The libunwind source code package that Kudu depend on is in AWS S3, and I don't 
know how to push the 1.4.x release into it, so I commit the patch as 
workaround. Could you help to add a 1.4.x libunwind into S3? then I will update 
patch to apply it, thanks


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 2
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 14 May 2020 09:37:05 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 2:

> Patch Set 1:
>
> Todd, thank you for replying to share these information, looks like two 
> things impact the part of code: 1.libunwind call malloc() in thread stack 
> trace collection, I find upstream have fixed it in libunwind 1.4.0 
> https://github.com/libunwind/libunwind/pull/72 ; 2.save waiting time for 
> spawning thread. So I update the patch to address #2, set tid after 
> strings::Substitute that cause coredump in patchset 2, I think we maybe 
> update libunwind in the future.

Why not update libunwind now? My point above is that, even if you've fixed this 
particular core dump, you still have a potential deadlock bug because of the 
call to malloc in libunwind.


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 2
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 13 May 2020 22:01:36 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-12 Thread RuiChen (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15899

to look at the new patch set (#2).

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..

KUDU-3096: Fix core dump when thread race between GetThreadStack and 
SuperviseThread

Some logic before t->functor_() in SuperviseThread maybe cause coredump,
GetThreadStack function will try to collect thread info even if tcmalloc
ThreadCache haven't been inited completlly.

Complete all of initialization, then set tid_ of thread in order to
avoiding unexpect access.

Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
---
M src/kudu/util/thread.cc
1 file changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/15899/2
--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 2
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-12 Thread RuiChen (Code Review)
RuiChen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 1:

Todd, thank you for replying to share these information, looks like two things 
impact the part of code: 1.libunwind call malloc() in thread stack trace 
collection, I find upstream have fixed it in libunwind 1.4.0 
https://github.com/libunwind/libunwind/pull/72 ; 2.save waiting time for 
spawning thread. So I update the patch to address #2, set tid after 
strings::Substitute that cause coredump in patchset 2, I think we maybe update 
libunwind in the future.


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 1
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: RuiChen 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 12 May 2020 07:09:21 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 1:

One further note I forgot to mention above: the code path here is purposefully 
as short as possible since the spawning thread is blocked waiting for the 
spawned thread to set the tid. In many cases, the spawning thread may be 
latency-sensitive, and is trying to defer work to a thread pool in order to 
return as quickly as possible. So, the more work we do before releasing the tid 
back to the spawner, the worse the latency will be.


--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 1
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 11 May 2020 17:56:14 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-11 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15899 )

Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..


Patch Set 1: Code-Review-1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15899/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15899/1//COMMIT_MSG@11
PS1, Line 11: ThreadCache haven't been inited completlly.
I'm not sure this is the right fix. The bug report says that the issue is that 
the signal handler is calling into malloc() before tcmalloc is fully initted, 
which certainly could cause a crash, but the fact that a signal handler is 
calling malloc() at all is already a big problem. On x86 we use 
framepointer-based unwinding so that the signal handler for stack trace 
collection is async-safe. If the libunwind stack unwinder on aarch64 isn't 
signal-safe, we should disable that code path entirely, rather than doing this 
fix. If we don't, I think there is still a risk of arbitrary deadlocks due to 
malloc not being reentrant (the signalled thread may be holding the PageHeap 
lock, for example)



--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 1
Gerrit-Owner: RuiChen 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 11 May 2020 17:55:04 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3096: Fix core dump when thread race between GetThreadStack and SuperviseThread

2020-05-11 Thread RuiChen (Code Review)
RuiChen has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15899


Change subject: KUDU-3096: Fix core dump when thread race between 
GetThreadStack and SuperviseThread
..

KUDU-3096: Fix core dump when thread race between GetThreadStack and 
SuperviseThread

Some logic before t->functor_() in SuperviseThread maybe cause coredump,
GetThreadStack function will try to collect thread info even if tcmalloc
ThreadCache haven't been inited completlly.

Complete all of initialization, then set tid_ of thread in order to
avoiding unexpect access.

Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
---
M src/kudu/util/thread.cc
1 file changed, 7 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/15899/1
--
To view, visit http://gerrit.cloudera.org:8080/15899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Gerrit-Change-Number: 15899
Gerrit-PatchSet: 1
Gerrit-Owner: RuiChen