Slow fork bomb message in latest version of POE

2014-03-24 Thread albertocurro
Guys,

 We have a product built on POE as its base framework, plus a few other 
libraries such as log4perl. It is basically a forward proxy composed of several 
modules, each of which is a POE::Session; all of them share an internal queue 
of tasks to be performed. Each module performs several tasks on 
initialization, and if anything goes wrong, croak() is called to stop the 
service. This is considered OK, since croak() is only called during 
initialization, while validation is being performed.

 The product is stable and works really well, but I recently updated POE to the 
latest version, and since then we see this message in the logs:

registering pdu failed: 263!
=== 5267 === 5 -> on_handle (from Handler/StoreRemote.pm at 87)
=== 5267 === 5 -> on_retry (from Handler/StoreRemote.pm at 141)
=== 5267 === 9 -> on_handle (from Handler/StoreRemote.pm at 87)
=== 5267 === 9 -> on_retry (from Handler/StoreRemote.pm at 141)
=== 5267 === !!! Kernel has child processes.
=== 5267 === !!! Stopped child process (PID 5373) reaped when POE::Kernel->run() is ready to return.
=== 5267 === !!! Stopped child process (PID 5374) reaped when POE::Kernel->run() is ready to return.
=== 5267 === !!! At least one child process is still running when POE::Kernel->run() is ready to return.
=== 5267 === !!! Be sure to use sig_child() to reap child processes.
=== 5267 === !!! In extreme cases, failure to reap child processes has
=== 5267 === !!! resulted in a slow 'fork bomb' that has halted systems.
mkdir /mnt/nfs99: Permission denied at Handler/Store.pm line 147

The first lines and the last line above are the actual errors, but the 
"=== 5267 === !!!" lines, from "Kernel has child processes." through the 
slow 'fork bomb' warning, are new since the upgrade.

I see it every time the service stops because of an unhandled condition, even 
when POE's event loop has already been running for hours. It was not visible 
before, and I can't get rid of it; I've tried several ways to avoid it, with no 
luck.

Any advice or alternative approach on this?

Many thanks
Alberto



Re: Slow fork bomb message in latest version of POE

2014-03-24 Thread Rocco Caputo
Hi, Alberto.

At program end time, POE runs a quick waitpid() check for child processes that 
may have leaked.  This check was added after a bug report where POE locked up a 
server after several days of running.  It turned out to be the reporter's 
application, but it was hard to debug.

Your program seems to have created two processes that it didn't reap: PIDs 5373 
and 5374.  The ideal solution is to reap those processes before exiting.  Your 
program can do this using POE::Kernel's sig_child() method.

In some cases, a third-party library will create processes and not properly 
clean them up.  It can be impossible to solve this case without modifying other 
people's code.

If you just want to ignore the problem, this might do the trick.  Put these 
lines in your last _stop handler.  They should reap the processes you've leaked 
before POE's check:

use POSIX ":sys_wait_h";
1 while waitpid(-1, WNOHANG) > 0;
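A self-contained sketch of that loop in plain Perl, with two hypothetical short-lived children standing in for leaked processes. Note the argument order, waitpid(PID, FLAGS), and that the loop only collects children that have already exited; a process that is still running when it executes is not reaped.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";   # exports WNOHANG

# Fork two short-lived children as stand-ins for leaked processes.
my @pids;
for my $n (1 .. 2) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit $n if $pid == 0;  # child: exit at once with a distinct status
    push @pids, $pid;
}

# Give the children time to exit, then reap every finished child
# without blocking. waitpid(-1, WNOHANG) returns a PID for each reaped
# child, 0 if children exist but none has exited, and -1 if none remain.
sleep 1;
my %reaped;
while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
    $reaped{$pid} = $? >> 8;   # exit code of the reaped child
}

print "reaped $_ (exit $reaped{$_})\n" for sort keys %reaped;
```

Because the loop never blocks, it cannot help with a child that is still alive at shutdown, such as a long-running server process; that child has to be stopped and waited for first.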

It's a bit of a pain, but I think it's better to explicitly ignore the problem 
than for it to go unnoticed by default.

Please let me know whether that resolves your problem.  It may not.  For 
example, the processes may still be open until an object is destroyed at global 
destruction time.

-- 
Rocco Caputo 




Re: Slow fork bomb message in latest version of POE

2014-03-24 Thread albertocurro
Hi Rocco,

 Many thanks for your quick answer! Unfortunately, the suggested solution only 
works partially. In some cases the "fork bomb" message is still with us :(

 One of those cases: under some configurations an nginx instance is started, 
so our product writes the configuration file and then starts nginx pointing to 
that file. BUT, if the configuration file cannot be written (the directory does 
not exist, etc.), the error is raised and I haven't found any way to handle it:

DEBUG - Created nginx temporary directory /opt/tmp/pull/instance1
DEBUG - Created nginx configuration directory /opt/etc/pull/instance1
DEBUG - Created nginx log directory /opt/log/pull/instance1
DEBUG - creating nginx configfile for instance 1 in /opt/etc/pull/instance1
=== 13991 === !!! Kernel has 1 child process(es).
=== 13991 === !!! At least one child process is still running when POE::Kernel->run() is ready to return.
=== 13991 === !!! Be sure to use sig_child() to reap child processes.
=== 13991 === !!! In extreme cases, failure to reap child processes has
=== 13991 === !!! resulted in a slow 'fork bomb' that has halted systems.
Could not open file: No such file or directory

 I've added a DIE handler in the main session to try to handle this:

$sig_session = POE::Session->create(
  inline_states => {
    _start => sub {
      $_[HEAP]{RELOADED} = 0;
      $_[KERNEL]->sig(TERM => '_sigterm');
      $_[KERNEL]->sig(INT  => '_sigterm');
      $_[KERNEL]->sig(DIE  => '_sigterm');
      $_[KERNEL]->sig(nginx_reload => '_sig_nginx_reload');
      $_[KERNEL]->alias_set('sighandler');
    },
    _sigdie => sub {
      print "Handling exception, calling stop\n";
      POE::Kernel->call($sig_session, '_stop');
    },
    _stop => sub {
      # Reap any existing pid (# 1825119)
      print "Handling stop\n";
      POE::Kernel->sig_child();
      use POSIX ":sys_wait_h";
      1 while waitpid(-1, WNOHANG) > 0;

      # Clear signal handlers...
      $_[KERNEL]->sig('TERM');
      # ...

But, as I said above, it's not working. Looking at POE's code, I can see the 
message lines are generated in Resources/Signals.pm, in the _data_sig_finalize() 
method (where POE already does what you recommended: waiting for the PIDs).

But _data_sig_finalize() is called from Kernel.pm (in _finalize_kernel) only 
after all the signals have been unregistered:

  my $self = shift;

  # Disable signal watching since there's now no place for them to go.
  foreach ($self->_data_sig_get_safe_signals()) {
    $self->loop_ignore_signal($_);
  }

  # Remove the kernel session's signal watcher.
  $self->_data_sig_remove($self->ID, "IDLE");

  # The main loop is done, no matter which event library ran it.
  # sig before loop so that it clears the signal_pipe file handler
  $self->_data_sig_finalize();
  $self->loop_finalize();

 Once we get there, none of the signal handlers in my main session can run, 
because the signals have already been unregistered. So, when an exception (die) 
happens inside POE::Kernel->run(), how can I handle it?

 Thanks a lot
 Alberto





Re: Slow fork bomb message in latest version of POE

2014-03-24 Thread Rocco Caputo
You are not using sig_child() as intended.  When used as intended, sig_child() 
will prevent shutdown until the child process has exited and has been reaped.  
The timing issues you're worried about should not exist.

-- 
Rocco Caputo 


Re: Slow fork bomb message in latest version of POE

2014-03-24 Thread albertocurro
Hi again,

 Sorry! There's a mistake in the code I posted: the DIE signal is shown mapped 
to _sigterm, while in our real code it points to _sigdie. Just clarifying before 
someone says "it can't work, you are pointing to the wrong method!" :D
 
 Alberto

 

Re: Slow fork bomb message in latest version of POE

2014-03-24 Thread albertocurro

Hi,

 Sorry, but I don't quite understand what you mean by "not using sig_child() as 
intended". Do you mean calling it from the main session so that each child 
process is closed properly?

 My issue is how to handle unexpected exceptions. They seem to be thrown 
without any control, killing POE's kernel along the way. I could be thinking 
about the timing the wrong way, though...

 Alberto


Re: Slow fork bomb message in latest version of POE

2014-03-24 Thread Rocco Caputo
Hi again.

What I mean is that I don't think you know what sig_child() does exactly, or 
how to use it.  I base this impression on two things: First, you're calling 
sig_child() from a place where it will never work and at a time that is 
obviously too late to do anything.  Second, it needs at least two parameters to 
work, but you're passing it nothing.
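A minimal sketch of the intended usage, assuming POE is installed and using a bare fork() as a hypothetical stand-in for starting nginx. sig_child() is called right after the fork, from the session that owns the child, with the child's PID and the name of the event to deliver; the event name child_exited is illustrative.

```perl
use strict;
use warnings;
use POE;        # loads POE::Kernel and POE::Session, exports KERNEL, ARG1, ...
use POSIX ();   # for POSIX::_exit in the child

my %exit_code;  # PID => exit code, filled in when the child is reaped

POE::Session->create(
    inline_states => {
        _start => sub {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;

            if ($pid == 0) {
                # Child: hypothetical stand-in for nginx. Sleep briefly,
                # then exit without running END blocks or destructors
                # inherited from the parent.
                sleep 1;
                POSIX::_exit(3);
            }

            # Parent: watch this PID right after forking. While the
            # watcher is pending it keeps this session alive, so
            # POE::Kernel->run() does not return until the child is reaped.
            $_[KERNEL]->sig_child($pid, 'child_exited');
        },

        # Delivered once the child is reaped: ARG0 is 'CHLD',
        # ARG1 the PID, ARG2 the raw wait status ($?).
        child_exited => sub {
            my ($pid, $status) = @_[ARG1, ARG2];
            $exit_code{$pid} = $status >> 8;
        },
    },
);

POE::Kernel->run();

print "child $_ exited with code $exit_code{$_}\n" for keys %exit_code;
```

In a real service the child would typically exec the server (or be started with POE::Wheel::Run), and shutdown would signal it with kill() and let the pending watcher hold the kernel open until its exit is reaped.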

I recommend not using SIGDIE for common exception handling.  Its scope is too 
broad, and your code will get ugly.  It's probably cleaner to use eval{} or 
Try::Tiny to convert your unexpected exceptions into expected ones.  If you 
catch them explicitly, then POE won't need to raise them, and there should be 
less strange behavior.
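A minimal sketch of that conversion using a plain eval{}, assuming a hypothetical write_nginx_config() helper that dies the way the "Could not open file" error in Alberto's log does. Try::Tiny's try/catch would express the same idea; eval is used here only to stay within core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for the code that writes the nginx
# configuration file. It dies on failure.
sub write_nginx_config {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "Could not open $path: $!\n";
    print {$fh} $content;
    close($fh) or die "Could not close $path: $!\n";
    return 1;
}

# Catch the exception at the call site and turn it into an ordinary
# return value, so the caller decides what to do (log it, skip starting
# nginx, shut down cleanly) instead of letting die() escape into
# POE's event dispatcher.
sub try_write_config {
    my ($path, $content) = @_;
    my $ok = eval { write_nginx_config($path, $content); 1 };
    return (1, undef) if $ok;
    return (0, $@ || "unknown error\n");
}

my ($ok, $err) = try_write_config('/nonexistent-dir/nginx.conf',
                                  "worker_processes 1;\n");
if ($ok) {
    print "config written\n";
} else {
    print "config failed: $err";
}
```

In a POE handler, the failure branch is where the program can skip starting nginx, so no child is left behind, and let its sessions wind down normally.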

The problem seems to be migrating.  I recommend caution against further 
clouding the original issue until it's resolved.

If you resolve your exceptions issue, and if you resolve your sig_child() usage 
issue, then your program should not be interrupted at inopportune times, and it 
should reap the nginx process before it exits.  This should resolve all 
outstanding issues, as I currently understand them.

-- 
Rocco Caputo 
