Bug#930728: pg_cltlcluster leaks logfile descriptor into PostgreSQL backends

2019-06-26 Thread Andrey Borodin
Hi!

> 26 июня 2019 г., в 19:22, Christoph Berg  написал(а):
> 
> Re: Andrey Borodin 2019-06-19 
> <4178a7b3-b742-49d7-afa9-fcb3439a0...@yandex-team.ru>
>> On my observation - yes, descriptor 4 is no longer in lsof.
> 
> Ah, I had not noticed the "4" before. Only 1 and 2 are expected.
> We can simply close() that one after dup2()'ing it to stdout/stderr.
> 
> Thanks for spotting!

Thanks for pushing fix for this.
Maybe we can somehow get rid of 1 and 2 attached to logfile (that will be 
logrotated and deleted)?
Can we redirect output to some temporary file and after pg_ctl append this file 
to logfile?

I simply want to be able to delete this file via logrotate...

Best regards, Andrey Borodin.


Bug#930728: pg_cltlcluster leaks logfile descriptor into PostgreSQL backends

2019-06-20 Thread Andrey Borodin



> 19 июня 2019 г., в 21:05, Andrey Borodin  написал(а):
> 
> But I've checked only on Yandex.Cloud clusters. I think I'll do repro on 
> fresh Ubuntu...

So, I've tried the fix on Ubuntu 18.
There are much more leaked descriptors in this case and fix helps only partially

postgres@x4m-pg-test:/home/x4mmm$ pg_ctlcluster 10 main start
postgres@x4m-pg-test:/home/x4mmm$ lsof | grep 'postgresql-10-main.log' | grep 
postgres | grep '/var/log'
postgres  4715 postgres1w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4715 postgres2w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4715 postgres4w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4715 postgres5w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4717 postgres1w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4717 postgres2w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4717 postgres4w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4717 postgres5w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4718 postgres1w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4718 postgres2w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4718 postgres4w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4718 postgres5w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4719 postgres1w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4719 postgres2w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4719 postgres4w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4719 postgres5w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4720 postgres1w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4720 postgres2w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4720 postgres4w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4720 postgres5w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4721 postgres1w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4721 postgres2w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4721 postgres4w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4721 postgres5w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4722 postgres1w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4722 postgres2w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4722 postgres4w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log
postgres  4722 postgres5w  REG  252,1 1332 
256830 /var/log/postgresql/postgresql-10-main.log

fixing pg_ctlcluster manually
postgres@x4m-pg-test:/usr/bin$ diff pg_ctlcluster pg_ctlcluster_copy 
208,213c208,210
<   #my $output = $info{'logfile'} && -r $info{'logfile'} ? 
$info{'logfile'} : '/dev/null';
<   #dup2(POSIX::open($output, POSIX::O_WRONLY | POSIX::O_APPEND | 
POSIX::O_CREAT), 1);
<   #dup2(POSIX::open($output, POSIX::O_WRONLY | POSIX::O_APPEND | 
POSIX::O_CREAT), 2);
<   
< open(STDOUT, ">>$info{logfile}") or die $!;
< open(STDERR,  '>')  or die $!;
---
> my $output = $info{'logfile'} && -r $info{'logfile'} ? 
> $info{'logfile'} : '/dev/null';
> dup2(POSIX::open($output, POSIX::O_WRONLY | POSIX::O_APPEND | 
> POSIX::O_CREAT), 1);
> dup2(POSIX::open($output, POSIX::O_WRONLY | POSIX::O_APPEND | 
> POSIX::O_CREAT), 2);

postgres@x4m-pg-test:/usr/bin$

Bug#930728: pg_cltlcluster leaks logfile descriptor into PostgreSQL backends

2019-06-19 Thread Andrey Borodin


> 19 июня 2019 г., в 20:57, Christoph Berg  написал(а):
> 
> Control: tags -1 moreinfo
> 
> Re: Andrey Borodin 2019-06-19 
> <3fd6aec7-55c8-4129-a46c-0bb3e4296...@yandex-team.ru>
>> Package: postgresql-common
>> Version: 199
>> 
>> When I execute
>> service postgresql start
>> I observe that logfile descriptor will leak into postmaster and every 
>> backend. This is not fd opened by PG's syslogger.
> 
> That's because we use logging to stderr.
> 
>> -my $fd = POSIX::open($info{'logfile'}, 
>> POSIX::O_WRONLY|POSIX::O_APPEND|POSIX::O_CREAT) or error "Could not open 
>> logfile $info{'logfile'}";
>> -dup2($fd, 1);
>> -dup2($fd, 2);
>> +open(STDOUT, ">>$info{logfile}") or die $!;
>> +open(STDERR,  '>')  or die $!;
> 
> Does that actually do things differently? Does stderr logging still
> work with that patch?
On my observation - yes, descriptor 4 is no longer in lsof.
I looked that I see some lines of pg_ctl output like "output will be redirected 
etcetc" in pglog. But not from stderr.

But I've checked only on Yandex.Cloud clusters. I think I'll do repro on fresh 
Ubuntu...

I'm not an expert in Perl, these two lines were my first experience, and even 
in them I needed a lot of help from colleagues with Perl background.


Best regards, Andrey Borodin.


Bug#930728: pg_cltlcluster leaks logfile descriptor into PostgreSQL backends

2019-06-19 Thread Andrey Borodin
Package: postgresql-common
Version: 199

When I execute
service postgresql start
I observe that logfile descriptor will leak into postmaster and every backend. 
This is not fd opened by PG's syslogger.

root@rc1b-tl7je3ulcm4z9v3c ~ # lsof | grep 'postgresql-10-data.log' | grep 
postgres | grep '/var/log'
postgres  28  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290011  postgres1w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290011  postgres2w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290011  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290011  postgres   13w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290013  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290014  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290015  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290016  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290017  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290018  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290019  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290020  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290036  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290211  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290256  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290257  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log
postgres  290258  postgres4w  REG  253,1 103172283  
   131900 /var/log/postgresql/postgresql-10-data.log

When logfile is rotated, descriptor remain to be open, but occupied space is 
not accounted by df. This leads to incorrect free space estimation on cluster.
When logfile is big enough during cluster start this can result in sudden 
cluster stop, which could be prevented by monitoring.

If I apply following patch descriptor stops leaking.

I use Ubuntu.
Description:Ubuntu 18.04.2 LTS
Release:18.04
Codename:   bionic

Best regards, Andrey Borodin.

diff --git a/pg_ctlcluster b/pg_ctlcluster
index 1690d3d..6039659 100755
--- a/pg_ctlcluster
+++ b/pg_ctlcluster
@@ -207,9 +207,8 @@ sub start {
 } else {
 setsid or error "could not start session: $!";
 if ($info{'logfile'}) {
-my $fd = POSIX::open($info{'logfile'}, 
POSIX::O_WRONLY|POSIX::O_APPEND|POSIX::O_CREAT) or error "Could not open 
logfile $info{'logfile'}";
-dup2($fd, 1);
-dup2($fd, 2);
+open(STDOUT, ">>$info{logfile}") or die $!;
+open(STDERR,  '>')  or die $!;
 }
 exec $pg_ctl @options or error "could not exec $pg_ctl @options: $!";
 }