Perl debug patches used to track down source of segfault

Eric Wong Mon, 01 Feb 2021 01:07:26 -0800

Attached are two patches against the Debian-packaged perl
5.28.1-6+deb10u1 which I used for tracking down the attempt to
access @DB::args of PublicInbox::Listener::event_step as the
source of the segfault.


I don't know Perl internals very well, and I was never an
advanced gdb user when I hacked C.

While most segfaults I saw did not emit any errors, maybe 1-2 of
them pointed out something from Carp (the error reporting and
backtrace module of Perl); so I wasn't sure if there were
multiple sources of segfaults.

I initially thought it was something warning during the
destruction phase because most of the code isn't doing anything
out-of-the-ordinary aside from short-lived workers.

I then ran lei with debugperl(1) (deb: perl-debug) instead of
the normal "perl" on my system.  That got me to the assertion
failure in sv.c::Perl_sv_clear at

        assert(SvTYPE(sv) != (svtype)SVTYPEMASK);

So I made the attached patch to sv.c to call Perl_op_dump (which
I learned about from the perlhacktips(1) manpage).

This patch to sv.c got me a dump which put me around lines 340-350
of Carp.pm.  Poking into Carp.pm brought me to this comment
around line 330:

        # Guard our serialization of the stack from stack refcounting bugs
        # NOTE this is NOT a complete solution, we cannot 100% guard against
        # these bugs.  However in many cases Perl *is* capable of detecting
        # them and throws an error when it does.  Unfortunately serializing
        # the arguments on the stack is a perfect way of finding these bugs,
        # even when they would not affect normal program flow that did not
        # poke around inside the stack.  Inside of Carp.pm it makes little
        # sense reporting these bugs, as Carp's job is to report the callers
        # errors, not the ones it might happen to tickle while doing so.
        # See: https://rt.perl.org/Public/Bug/Display.html?id=131046
        # and: https://rt.perl.org/Public/Bug/Display.html?id=52610
        # for more details and discussion. - Yves

The conundrum is Carp itself is generating backtraces, so
getting a Perl-level backtrace becomes tricky...

I decided to edit Carp.pm to store the caller-supplied error
message in a global variable, and to set the G_ERR environment
variable just before the point where it would occasionally segfault.

Once the environment variable is set, printing fields of the
global **environ array from gdb on the core dump where G_ERR is
stored would get me the $sub_name and offending error message.

It turned out the error was the confess call in
PublicInbox::Eml::body_str which was copied over from
Email::MIME, and $sub_name was PublicInbox::Listener::event_step.

That led me to realize $DescriptorMap{$fd} was being clobbered and
was no longer referenced, and to this workaround.

I didn't want to spend time compiling, relinking and installing
Perl again (and didn't want to do it the first time), but I
could've printed the G_ERR environment variable where I was doing
Perl_op_dump, too.

From 03712da7eb875b18879662079279241cd4fee2d0 Mon Sep 17 00:00:00 2001
From: Eric Wong <e...@80x24.org>
Date: Mon, 1 Feb 2021 05:42:42 +0000
Subject: [PATCH 1/2] sv.c: run Perl_op_dump before failing assertion

The debugperl(1) binary packaged by Debian alerted me
to an assertion in Perl_sv_clear failing at:

	assert(SvTYPE(sv) != (svtype)SVTYPEMASK);

To get more information about what Perl code is executing,
we can run Perl_op_dump on PL_op before triggering the
assertion.
---
 sv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sv.c b/sv.c
index 07865bb..2c950e0 100644
--- a/sv.c
+++ b/sv.c
@@ -6510,6 +6510,9 @@ Perl_sv_clear(pTHX_ SV *const orig_sv)
 	type = SvTYPE(sv);
 
 	assert(SvREFCNT(sv) == 0);
+	if ((SvTYPE(sv) == (svtype)SVTYPEMASK)) {
+	    Perl_op_dump(aTHX_ PL_op);
+	}
 	assert(SvTYPE(sv) != (svtype)SVTYPEMASK);
 
 	if (type <= SVt_IV) {

From ef7a02def4468ac7301c120ece586caea7351c4d Mon Sep 17 00:00:00 2001
From: Eric Wong <e...@80x24.org>
Date: Mon, 1 Feb 2021 05:34:12 +0000
Subject: [PATCH 2/2] carp: set G_ERR in environ before accessing @DB::args

In case accessing @DB::args causes a segfault and core dump,
one can open the core dump with gdb(1) and run "p environ[$IDX]"
to figure out what message and sub_name are tickling Perl5's
stack-not-refcounted behavior and causing a segfault.

($IDX is typically the last environment variable index in the
 environ(7) array, but it could be another number)
---
 dist/Carp/lib/Carp.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/dist/Carp/lib/Carp.pm b/dist/Carp/lib/Carp.pm
index 109b7fe..c87e373 100644
--- a/dist/Carp/lib/Carp.pm
+++ b/dist/Carp/lib/Carp.pm
@@ -302,6 +302,8 @@ BEGIN {
     }
 }
 
+our $g_err;
+
 sub caller_info {
     my $i = shift(@_) + 1;
     my %call_info;
@@ -327,6 +329,7 @@ sub caller_info {
 
     my $sub_name = Carp::get_subname( \%call_info );
     if ( $call_info{has_args} ) {
+$ENV{G_ERR} = "$sub_name $g_err";
         # Guard our serialization of the stack from stack refcounting bugs
         # NOTE this is NOT a complete solution, we cannot 100% guard against
         # these bugs.  However in many cases Perl *is* capable of detecting
@@ -594,6 +597,7 @@ sub ret_backtrace {
         $tid_msg = " thread $tid" if $tid;
     }
 
+$g_err = $err;
     my %i = caller_info($i);
     $mess = "$err at $i{file} line $i{line}$tid_msg";
     if( $. ) {
@@ -635,6 +639,7 @@ sub ret_summary {
         my $tid = threads->tid;
         $tid_msg = " thread $tid" if $tid;
     }
+$g_err = $err;
 
     my %i = caller_info($i);
     return "$err at $i{file} line $i{line}$tid_msg\.\n";
