Re: Random segmentation fault

2016-01-12 Thread John Dunlap
I ended up falling back to the previous Debian release. I'm planning to try
again in Debian 9.

On Tue, Jan 12, 2016 at 9:52 AM, FredB  wrote:

> Hello,
>
> I have exactly the same issue as the one described here:
> https://mail-archives.apache.org/mod_mbox/perl-modperl/201509.mbox/%3CCAC5eUSu85CoiT0MkQvwvxjfrhJcn13DqJfV=egfuxvwswyr...@mail.gmail.com%3E
>
> Same OS, Debian Jessie; there was no problem at all with the same
> configuration on Debian wheezy (apache2 2.2.22 and mod_perl 2.0.7).
> After a while apache2 crashes with "Out of memory", but the whole system
> is using less than 15% of its RAM ...
>
> *** Error in `/usr/sbin/apache2': double free or corruption (!prev):
> 0x7f859f10 ***
> [Tue Jan 12 14:32:25.852805 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1487 exit signal Segmentation fault (11)
> [Tue Jan 12 14:32:25.852857 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1488 exit signal Aborted (6)
> Out of memory!
> Out of memory!
> PerlIOApache_flush: flush can't be called before the response phase, 
> line 1599.
> *** longjmp causes uninitialized stack frame ***: /usr/sbin/apache2
> terminated
> === Backtrace: =
> /lib/x86_64-linux-gnu/libc.so.6(+0x731ff)[0x7f85dd8051ff]
> /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f85dd8884c7]
> /lib/x86_64-linux-gnu/libc.so.6(+0xf63fd)[0x7f85dd8883fd]
> /lib/x86_64-linux-gnu/libc.so.6(__longjmp_chk+0x29)[0x7f85dd888359]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(+0x4d2f1)[0x7f85da4f72f1]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(Perl_my_failure_exit+0x43)[0x7f85da5000b3]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(Perl_die_unwind+0x27d)[0x7f85da5af08d]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(Perl_vcroak+0x39)[0x7f85da551ab9]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(+0xa8584)[0x7f85da552584]
> /usr/lib/apache2/modules/mod_perl.so(+0x1e3d1)[0x7f85da8863d1]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(Perl_PerlIO_flush+0x3f)[0x7f85da5ef44f]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(PerlIOBase_close+0x26)[0x7f85da5f1346]
> /usr/lib/apache2/modules/mod_perl.so(+0x1e3e9)[0x7f85da8863e9]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(PerlIO__close+0x28)[0x7f85da5f13f8]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(Perl_PerlIO_close+0xf)[0x7f85da5f143f]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(Perl_io_close+0xba)[0x7f85da5cbe8a]
> /usr/lib/x86_64-linux-gnu/libperl.so.5.20(Perl_do_close+0x69)[0x7f85da5cbfb9]
> /usr/lib/apache2/modules/mod_perl.so(modperl_response_handler_cgi+0x138)[0x7f85da87b3a8]
> /usr/sbin/apache2(ap_run_handler+0x40)[0x7f85de69f2a0]
> /usr/sbin/apache2(ap_invoke_handler+0x69)[0x7f85de69f7e9]
> /usr/sbin/apache2(ap_process_async_request+0x392)[0x7f85de6b5682]
> /usr/sbin/apache2(+0x6b1f0)[0x7f85de6b21f0]
> /usr/sbin/apache2(ap_run_process_connection+0x40)[0x7f85de6a8b10]
> /usr/lib/apache2/modules/mod_mpm_event.so(+0x6d7a)[0x7f85dacb7d7a]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7f85ddb430a4]
> [Tue Jan 12 14:32:28.857012 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1649 exit signal Segmentation fault (11)
> [Tue Jan 12 14:32:28.857070 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1650 exit signal Segmentation fault (11)
> [Tue Jan 12 14:32:30.860185 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1756 exit signal Segmentation fault (11)
> [Tue Jan 12 14:32:31.861317 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1809 exit signal Segmentation fault (11)
> [Tue Jan 12 14:32:31.861365 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1810 exit signal Segmentation fault (11)
> [Tue Jan 12 14:32:34.865446 2016] [core:notice] [pid 19350:tid
> 140212941145984] AH00052: child pid 1968 exit signal Segmentation fault (11)
>
> Any help will be greatly appreciated
>
> Fred
>



-- 
John Dunlap
*CTO | Lariat *

*Direct:*
*j...@lariat.co *

*Customer Service:*
877.268.6667
supp...@lariat.co


Re: Random segmentation fault

2015-09-15 Thread Michael Schout
Steve, John:

I did a bisect today against perl git, using mod_perl 2.0.9 and apache
2.2.29 to find out where my two issues were caused.  It turns out that
both of my problems, which were:

- panic: attempt to copy freed scalar  to 

and also

- segmentation fault caused by a specific "return" statement

Are *BOTH* caused by the following commit made to perl between 5.19.6
and 5.19.7:

> commit 437e3a7dac994ebace1195549170c81f474d9c20
> Author: Matthew Horsfall 
> Date:   Wed Dec 11 18:28:21 2013 -0500
> 
> Optimise out PUSHMARK/RETURN if return is the last statement in a sub.
> 
> This makes:
> 
>   sub baz { return $cat; }
> 
> Behave like:
> 
>   sub baz { $cat; }
> 
> Which is notably faster.
...

I created a patch that reverts this (at least the change to op.c; the
other parts of the original patch are just new macros and a test case).
With it applied, both of my problems are fixed.

John:  I'd be interested to know if your problem is related.  If
possible, can you build perl with the attached patch applied and see  if
that fixes your segfault also?

This seems to be mod_perl specific.  I have a very
straightforward/minimal test case that causes the "panic" error under
mod_perl, but the same code runs fine under the command line outside of
mod_perl.

Regards,
Michael Schout
diff --git a/op.c b/op.c
index 7038526..dc42b56 100644
--- a/op.c
+++ b/op.c
@@ -11354,45 +11354,6 @@ Perl_rpeep(pTHX_ OP *o)
case OP_NEXTSTATE:
PL_curcop = ((COP*)o);  /* for warnings */
 
-   /* Optimise a "return ..." at the end of a sub to just be "...".
-* This saves 2 ops. Before:
-* 1  <;> nextstate(main 1 -e:1) v ->2
-* 4  <@> return K ->5
-* 2<0> pushmark s ->3
-* -<1> ex-rv2sv sK/1 ->4
-* 3  <#> gvsv[*cat] s ->4
-*
-* After:
-* -  <@> return K ->-
-* -<0> pushmark s ->2
-* -<1> ex-rv2sv sK/1 ->-
-* 2  <$> gvsv(*cat) s ->3
-*/
-   {
-   OP *next = o->op_next;
-   OP *sibling = o->op_sibling;
-   if (   OP_TYPE_IS(next, OP_PUSHMARK)
-   && OP_TYPE_IS(sibling, OP_RETURN)
-   && OP_TYPE_IS(sibling->op_next, OP_LINESEQ)
-   && OP_TYPE_IS(sibling->op_next->op_next, OP_LEAVESUB)
-   && cUNOPx(sibling)->op_first == next
-   && next->op_sibling && next->op_sibling->op_next
-   && next->op_next
-   ) {
-   /* Look through the PUSHMARK's siblings for one that
-* points to the RETURN */
-   OP *top = next->op_sibling;
-   while (top && top->op_next) {
-   if (top->op_next == sibling) {
-   top->op_next = sibling->op_next;
-   o->op_next = next->op_next;
-   break;
-   }
-   top = top->op_sibling;
-   }
-   }
-   }
-
/* Optimise 'my $x; my $y;' into 'my ($x, $y);'
  *
 * This latter form is then suitable for conversion into padrange


Re: Random segmentation fault

2015-09-15 Thread John Dunlap
First, let me thank you for your efforts! I doubt I could have gotten to the
bottom of that. Unfortunately, I have been unable to reproduce my problem
outside of a production environment, and I can't risk pushing a version of
perl into production that I've built myself just to test that. As unhelpful
as it is, the best option for my customers is to fall back to Debian 7
(perl 5.14.2).

On Tue, Sep 15, 2015 at 4:58 PM, Michael Schout  wrote:

> Steve, John:
>
> I did a bisect today against perl git, using mod_perl 2.0.9 and apache
> 2.2.29 to find out where my two issues were caused.  It turns out that
> both of my problems, which were:
>
> - panic: attempt to copy freed scalar  to 
>
> and also
>
> - segmentation fault caused by a specific "return" statement
>
> Are *BOTH* caused by the following commit made to perl between 5.19.6
> and 5.19.7:
>
> > commit 437e3a7dac994ebace1195549170c81f474d9c20
> > Author: Matthew Horsfall 
> > Date:   Wed Dec 11 18:28:21 2013 -0500
> >
> > Optimise out PUSHMARK/RETURN if return is the last statement in a
> sub.
> >
> > This makes:
> >
> >   sub baz { return $cat; }
> >
> > Behave like:
> >
> >   sub baz { $cat; }
> >
> > Which is notably faster.
> ...
>
> I created a patch that reverts this (at least the change to op.c; the
> other parts of the original patch are just new macros and a test case).
> With it applied, both of my problems are fixed.
>
> John:  I'd be interested to know if your problem is related.  If
> possible, can you build perl with the attached patch applied and see  if
> that fixes your segfault also?
>
> This seems to be mod_perl specific.  I have a very
> straightforward/minimal test case that causes the "panic" error under
> mod_perl, but the same code runs fine under the command line outside of
> mod_perl.
>
> Regards,
> Michael Schout
>





Re: Random segmentation fault

2015-09-14 Thread John Dunlap
I'll probably deal with this by staying on Debian 7 for the near future.
I'll attempt upgrading again in Debian 9.

On Mon, Sep 14, 2015 at 10:27 AM, Michael Schout  wrote:

> On 9/11/15 2:26 PM, John Dunlap wrote:
> > I found a lot of stuff like the following in my Apache logs. Is it
> > possible to get this kind of output from Apache when the server runs
> > out of memory? I wouldn't have expected so. It has all the hallmarks
> > of something more sinister.
>
> For what it's worth, I started seeing random segfaults somewhere between
> 5.18 and 5.20.  I actually have a bizarre way to reproduce the
> one I see reliably by moving a return in my code.  I'm not sure if mine
> is related to the segfault you are seeing, but you might try downgrading
> to 5.18 if that is an option and see if the problem goes away.
>
> I'm stuck on 5.16 until I can figure this out because regexes have nasty
> bugs in 5.18 (see https://rt.perl.org/Public/Bug/Display.html?id=125491).
>
> I am planning to bisect against perl 5.19 git to figure out where this
> broke, but I just haven't had time yet.
>
> Regards,
> Michael Schout
>
>
>




Re: Random segmentation fault

2015-09-14 Thread Michael Schout
On 9/11/15 2:26 PM, John Dunlap wrote:
> I found a lot of stuff like the following in my Apache logs. Is it
> possible to get this kind of output from Apache when the server runs
> out of memory? I wouldn't have expected so. It has all the hallmarks
> of something more sinister.

For what it's worth, I started seeing random segfaults somewhere between
5.18 and 5.20.  I actually have a bizarre way to reproduce the
one I see reliably by moving a return in my code.  I'm not sure if mine
is related to the segfault you are seeing, but you might try downgrading
to 5.18 if that is an option and see if the problem goes away.

I'm stuck on 5.16 until I can figure this out because regexes have nasty
bugs in 5.18 (see https://rt.perl.org/Public/Bug/Display.html?id=125491).

I am planning to bisect against perl 5.19 git to figure out where this
broke, but I just haven't had time yet.

Regards,
Michael Schout




Re: Random segmentation fault

2015-09-14 Thread John Dunlap
No, I have not. I don't have time to mess with compiling my own version and
I certainly don't have time to support a custom version. I always run the
version that comes with Debian out of the box.

On Mon, Sep 14, 2015 at 1:12 PM, Steve Hay 
wrote:

> Have you tried 5.20.3? This has just been released and contains a number
> of crash fixes. (I wonder if #123398 might be relevant?)
>
> On 14 September 2015 at 15:57, John Dunlap  wrote:
>
>> I'll probably deal with this by staying on Debian 7 for the near future.
>> I'll attempt upgrading again in Debian 9.
>>
>> On Mon, Sep 14, 2015 at 10:27 AM, Michael Schout  wrote:
>>
>>> On 9/11/15 2:26 PM, John Dunlap wrote:
>>> > I found a lot of stuff like the following in my Apache logs. Is it
>>> > possible to get this kind of output from Apache when the server runs
>>> > out of memory? I wouldn't have expected so. It has all the hallmarks
>>> > of something more sinister.
>>>
>>> For what it's worth, I started seeing random segfaults somewhere between
>>> 5.18 and 5.20.  I actually have a bizarre way to reproduce the
>>> one I see reliably by moving a return in my code.  I'm not sure if mine
>>> is related to the segfault you are seeing, but you might try downgrading
>>> to 5.18 if that is an option and see if the problem goes away.
>>>
>>> I'm stuck on 5.16 until I can figure this out because regexes have nasty
>>> bugs in 5.18 (see https://rt.perl.org/Public/Bug/Display.html?id=125491
>>> ).
>>>
>>> I am planning to bisect against perl 5.19 git to figure out where this
>>> broke, but I just haven't had time yet.
>>>
>>> Regards,
>>> Michael Schout
>>>
>>>
>>>
>>
>>
>>
>
>




Re: Random segmentation fault

2015-09-14 Thread Steve Hay
Have you tried 5.20.3? This has just been released and contains a number of
crash fixes. (I wonder if #123398 might be relevant?)

On 14 September 2015 at 15:57, John Dunlap  wrote:

> I'll probably deal with this by staying on Debian 7 for the near future.
> I'll attempt upgrading again in Debian 9.
>
> On Mon, Sep 14, 2015 at 10:27 AM, Michael Schout  wrote:
>
>> On 9/11/15 2:26 PM, John Dunlap wrote:
>> > I found a lot of stuff like the following in my Apache logs. Is it
>> > possible to get this kind of output from Apache when the server runs
>> > out of memory? I wouldn't have expected so. It has all the hallmarks
>> > of something more sinister.
>>
>> For what it's worth, I started seeing random segfaults somewhere between
>> 5.18 and 5.20.  I actually have a bizarre way to reproduce the
>> one I see reliably by moving a return in my code.  I'm not sure if mine
>> is related to the segfault you are seeing, but you might try downgrading
>> to 5.18 if that is an option and see if the problem goes away.
>>
>> I'm stuck on 5.16 until I can figure this out because regexes have nasty
>> bugs in 5.18 (see https://rt.perl.org/Public/Bug/Display.html?id=125491).
>>
>> I am planning to bisect against perl 5.19 git to figure out where this
>> broke, but I just haven't had time yet.
>>
>> Regards,
>> Michael Schout
>>
>>
>>
>
>
>


Re: Random segmentation fault

2015-09-14 Thread Michael Schout
On 9/14/15 12:12 PM, Steve Hay wrote:
> Have you tried 5.20.3? This has just been released and contains a
> number of crash fixes. (I wonder if #123398 might be relevant?)
I just tried 5.20.3.

For my issue (mentioned earlier in this thread), 5.20.3 does not help.

I'll post a follow-up message in a new thread when I have time, but
basically, I started seeing two different problems somewhere between
5.19.0 and 5.20.0:

1) "panic: attempt to copy freed scalar  to " is reliably reproduced
   when calling $cgi->param(x => ''); inside a TryCatch try { } block
   ($cgi is a CGI.pm object here).

   The test case I have works perfectly against plain perl, but produces
   the above panic under mod_perl.

   It seems to have something to do with TryCatch or Devel::Declare, as
   the problem goes away if I use eval { } instead of try { }.
   Unfortunately this is a large client codebase heavily invested in
   TryCatch, so moving away from that is not going to be fun/easy.

2) repeatable segfault by a certain subroutine doing something as simple as:

  sub foo {
      my ($self, $field, $type) = @_;

      if ($type = 'X') {
          return $self->_bar($field);
      }
  }

  The fun part of this one is that if I remove the "return" keyword, the
segfault goes away.
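
One way to check how a given perl build compiles the two forms is to dump the op trees of a sub with and without the trailing "return" and compare them. The following is only a sketch using the core B::Concise module; the sub names and $cat are made up for illustration, and the exact output differs between perl versions, so none is shown here.

```perl
use strict;
use warnings;
use B::Concise ();

our $cat = 'meow';

# Two subs that differ only in the trailing "return" keyword.
sub with_return    { return $cat; }
sub without_return { $cat; }

# Render the execution-order op listing of a code ref into a string.
sub op_dump {
    my ($code) = @_;
    my $buf = '';
    B::Concise::walk_output(\$buf);            # capture instead of printing
    B::Concise::compile('-exec', $code)->();   # walk the ops of the sub
    return $buf;
}

print "with return:\n",    op_dump(\&with_return),    "\n";
print "without return:\n", op_dump(\&without_return), "\n";
```

Running this under the command-line perl only shows what the compiler produced; it does not reproduce the mod_perl-specific crash.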

Regards,
Michael Schout




Re: Random segmentation fault

2015-09-06 Thread Dr James Smith

John,

Sometimes it's difficult to see what the error is because you can't see 
the request (it doesn't get logged).


To get round this - add:

 * a transhandler which writes a tag (e.g. ST), the request and the PID
   to the error log
 * a cleanuphandler which does the same... with a different tag (e.g. FI)

You can then get a better idea of what is causing the error, as the 
request that causes the segfault will have an ST just before the 
segfault but no FI. You will also have a history of all the requests 
handled by that PID (in case the problem is cumulative).
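
The two handlers described above could look roughly like this under mod_perl 2. This is a minimal sketch, not a tested module: the package name My::RequestTrace and the exact log line layout are assumptions made for illustration, and the httpd.conf lines are shown as comments.

```perl
package My::RequestTrace;   # hypothetical module name

use strict;
use warnings;

use Apache2::RequestRec ();   # $r->the_request
use Apache2::Log ();          # $r->log_error
use Apache2::Const -compile => qw(OK DECLINED);

# httpd.conf (assumed wiring):
#   PerlModule         My::RequestTrace
#   PerlTransHandler   My::RequestTrace::trans
#   PerlCleanupHandler My::RequestTrace::cleanup

# Translation phase: write the start tag, the PID and the request line,
# then decline so that Apache maps the URI as usual.
sub trans {
    my $r = shift;
    $r->log_error(sprintf 'ST pid=%d %s', $$, $r->the_request // '-');
    return Apache2::Const::DECLINED;
}

# Cleanup phase: write the finish tag for the same request.
sub cleanup {
    my $r = shift;
    $r->log_error(sprintf 'FI pid=%d %s', $$, $r->the_request // '-');
    return Apache2::Const::OK;
}

1;
```

Note that under a threaded MPM one PID serves several requests at once, so ST and FI lines from different requests of the same process can interleave in the error log.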


Some time ago (about 12 years) we were having errors with apparently 
random requests (including static images). Doing this, we discovered 
that the request which died was the request after one which talked to a 
particular Oracle database.


On the live site we just killed the child at the end of these 
requests... and then went back to diagnose the error...
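
Under mod_perl 2, retiring a child after particular requests can be sketched with $r->child_terminate from Apache2::RequestUtil, which asks the current child to exit once the request is over. The package name and the URI pattern below are hypothetical. As far as I know, child_terminate is only meant for non-threaded MPMs such as prefork, so treat this as an assumption to verify before using it with a threaded MPM.

```perl
package My::KillAfterRequest;   # hypothetical module name

use strict;
use warnings;

use Apache2::RequestRec ();     # $r->uri
use Apache2::RequestUtil ();    # $r->child_terminate
use Apache2::Const -compile => qw(OK);

# httpd.conf (assumed wiring):
#   PerlModule       My::KillAfterRequest
#   PerlFixupHandler My::KillAfterRequest

# Hypothetical pattern for the requests that leave the child in a bad state.
my $SUSPECT = qr{^/reports/};

# Fixup phase: mark the child for termination; it still finishes serving
# the current request before it exits.
sub handler {
    my $r = shift;
    $r->child_terminate if $r->uri =~ $SUSPECT;
    return Apache2::Const::OK;
}

1;
```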


James

On 03/09/2015 22:21, John Dunlap wrote:
Ever since upgrading from Debian 7 - which shipped with Apache 2.2 - 
to Debian 8 - which shipped with Apache 2.4 - my user base has been 
reporting that their browsers randomly tell them "No data received". 
To date, they have not been able to identify any kind of pattern which 
triggers it. I've been sifting through the server logs looking for 
problems and I'm seeing a lot of errors similar to the following:
[Thu Sep 03 21:12:52.382357 2015] [core:notice] [pid 13199:tid 
140364918835072] AH00052: child pid 2088 exit signal Segmentation 
fault (11)
[Thu Sep 03 21:13:03.406215 2015] [core:notice] [pid 13199:tid 
140364918835072] AH00052: child pid 2121 exit signal Segmentation 
fault (11)
[Thu Sep 03 21:13:05.417909 2015] [core:notice] [pid 13199:tid 
140364918835072] AH00052: child pid 2165 exit signal Segmentation 
fault (11)
[Thu Sep 03 21:13:08.433829 2015] [core:notice] [pid 13199:tid 
140364918835072] AH00052: child pid 2232 exit signal Segmentation 
fault (11)
[Thu Sep 03 21:15:53.614351 2015] [core:notice] [pid 13199:tid 
140364918835072] AH00052: child pid 2264 exit signal Segmentation 
fault (11)
[Thu Sep 03 21:16:03.637236 2015] [core:notice] [pid 13199:tid 
140364918835072] AH00052: child pid 2539 exit signal Segmentation 
fault (11)



Can someone give me some tips on how to proceed with troubleshooting 
this and, possibly, fixing it?







--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE.