Did you run a detailed du while the problem was occurring and again after the
reboot, and diff the two outputs to find any files or directories involved?
That said, I think you might consider posting on freebsd-[questions|stable] as
well.
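
In case it helps, here is a rough sketch of the du comparison I have in mind.
The function names snapshot_du and diff_du are made up for illustration, and
it assumes a POSIX sh with du, awk and sort available; adjust as needed.

```shell
#!/bin/sh
# Sketch only: snapshot_du and diff_du are hypothetical helper names.

# Print "path<TAB>KiB" for every directory under $1, sorted by path.
# -x keeps du on a single filesystem, -k reports sizes in 1 KiB units.
snapshot_du() {
    du -xk "$1" | awk -F'\t' '{ print $2 "\t" $1 }' | LC_ALL=C sort
}

# Compare two snapshot files and print "path<TAB>before<TAB>after" for every
# path whose size changed; "-" marks a path missing from one snapshot.
diff_du() {
    awk -F'\t' '
        NR == FNR { before[$1] = $2; next }
        {
            seen[$1] = 1
            if (!($1 in before))       print $1 "\t-\t" $2
            else if (before[$1] != $2) print $1 "\t" before[$1] "\t" $2
        }
        END { for (p in before) if (!(p in seen)) print p "\t" before[p] "\t-" }
    ' "$1" "$2" | LC_ALL=C sort
}
```

For example, redirect snapshot_du /usr/local/pgsql to a file outside that
partition while the space is missing, do the same after the reboot, and then
run diff_du on the two files (the file locations are up to you).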

On Wed 20 Mar 2013 11:49:07 Dan Thomas wrote:

Hi Guys,

We're seeing a problem with some of our FreeBSD/PostgreSQL servers "leaking" 
quite significant amounts of disk space:

    > df -h /usr/local/pgsql/
    Filesystem       Size    Used   Avail Capacity  Mounted on
    /dev/mfid1s1d    1.1T    772G    222G    78%    /usr/local/pgsql

    > du -sh /usr/local/pgsql/
    741G    /usr/local/pgsql/

Stopping Postgres doesn't fix it, but rebooting does, which points at the OS 
rather than PG to me. However, the leak is only apparent in the dedicated pgsql 
partition, and only on our database servers, so PostgreSQL seems to at least be 
involved. The partition itself is a relatively standard UFS partition:

    > grep /usr/local/pgsql /etc/fstab
    /dev/mfid1s1d   /usr/local/pgsql    ufs   rw   2   2

    > tunefs -p /usr/local/pgsql/
    tunefs: POSIX.1e ACLs: (-a)                                disabled
    tunefs: NFSv4 ACLs: (-N)                                   disabled
    tunefs: MAC multilabel: (-l)                               disabled
    tunefs: soft updates: (-n)                                 enabled
    tunefs: gjournal: (-J)                                     disabled
    tunefs: trim: (-t)                                         disabled
    tunefs: maximum blocks per file in a cylinder group: (-e)  2048
    tunefs: average file size: (-f)                            16384
    tunefs: average number of files in a directory: (-s)       64
    tunefs: minimum percentage of free space: (-m)             8%
    tunefs: optimization preference: (-o)                      time
    tunefs: volume label: (-L)                                 

lsof isn't showing any open files that have been unlinked (link count 0):

    > lsof +L /usr/local/pgsql/ | awk '{ print $8 }' | grep 0 | wc -l
    0

We're not creating filesystem snapshots:

    > find /usr/local/pgsql/ -flags snapshot
    >

Not all of our servers are leaking space; it's only the more recently installed 
systems. Here's a quick breakdown of versions:

    FreeBSD   PostgreSQL   Leaking?
    8.0       8.4.4        no
    8.2       9.0.4        no
    8.3       9.1.4        yes
    8.3       9.2.3        yes
    9.1       9.2.3        yes

Each of these servers is configured with a warm standby, so we've been 
switching them over to the standby to reclaim the space (rebooting the primary 
is too much downtime). The standby does *not* demonstrate this problem while 
it's being used as a standby, but it starts leaking space once it's been made 
the primary.

Initially I thought this might be related to WAL files, but the pg_xlog dir 
is symlinked outside of the /usr/local/pgsql partition that is demonstrating 
this problem:

    > ll /usr/local/pgsql/data/pg_xlog    
    lrwxr-xr-x 25B Oct 19 10:48 pg_xlog -> /usr/local/pglog/pg_xlog/

I've exhausted everything I can think of to try to solve this one. Has anyone 
got any ideas on how to go about debugging this?

Thanks,

Dan



-
Achilleas Mantzios
IT DEV
IT DEPT
Dynacom Tankers Mgmt
