Hello, Tom.

Once again it happened.

postgres 11009     1  0 Jan12 ?        00:00:12 /usr/bin/postmaster -D /data/idx/pgdata --silent-mode=true
postgres 11027 11009  0 Jan12 ?        00:26:55 postgres: logger process
postgres 11029 11009  0 Jan12 ?        00:00:21 postgres: writer process
postgres 11030 11009  0 Jan12 ?        00:16:50 postgres: stats collector process
postgres 16751 11009  1 20:40 ?        00:00:12 postgres: stat stat 10.0.0.2(41239) idle
postgres 16753 11009  0 20:40 ?        00:00:11 postgres: stat stat 10.0.0.2(41244) idle
postgres 16758 11009  3 20:41 ?        00:00:35 postgres: stat stat 10.0.0.2(50546) SELECT
postgres 16760 11009  0 20:42 ?        00:00:00 postgres: stat stat 10.0.0.2(50573) idle
postgres 16761 11009 99 20:42 ?        00:16:59 postgres: stat stat 10.0.0.2(50577) idle
postgres 16762 11009  0 20:43 ?        00:00:00 postgres: stat stat 10.0.0.2(50603) INSERT

I tried to use gdb, but without success.

machupicchu ~ # gdb /usr/bin/postgres 16761
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-unknown-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /usr/bin/postgres, process 16761
....

I enabled full logging to find the query after which PostgreSQL dies. From postgresql.conf:
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = off
log_min_messages = error
log_min_error_statement = error
log_min_duration_statement = 1000
log_duration = on
log_line_prefix = '%m, %s, %r, %p, '
log_statement = 'all'

But I can't find anything about PID 16761 or TCP port 50577 in the logs! I can find the neighbouring processes, but not this one.
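
A search like this can be sketched in a few lines of Python. This is only a sketch: it assumes the log_line_prefix shown above ('%m, %s, %r, %p, '), so that every log line starts with four comma-separated fields (timestamp, session start, remote host(port), PID). The function name backend_lines is made up for illustration.

```python
def backend_lines(lines, pid=None, port=None):
    """Yield log lines whose prefix names the given PID or client port.

    Assumes log_line_prefix = '%m, %s, %r, %p, ', i.e. each line begins
    with: timestamp, session start time, remote host(port), PID.
    """
    for line in lines:
        parts = line.split(", ", 4)
        if len(parts) < 5:
            # Continuation line (e.g. a multi-line statement) or other format.
            continue
        _timestamp, _session_start, remote, log_pid, _rest = parts
        if pid is not None and log_pid.strip() == str(pid):
            yield line
        elif port is not None and remote.endswith(f"({port})"):
            yield line
```

For example, passing an open log file as lines with pid=16761 or port=50577 would print every prefixed line for that backend, if there is one. Continuation lines of multi-line statements are skipped, because they carry no prefix.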

Also:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 4 24   1408  25832 180800 3738664    0    0     5    23   31    5  8  1 84  6
 4 24   1408  25832 180800 3738664    0    0     0     0 1010   14  0 50  0 50
 4 24   1408  25832 180800 3738664    0    0     0     0 1005   11  0 50  0 50
 4 24   1408  25832 180800 3738664    0    0     0     0 1005   11  0 50  0 50
 4 24   1408  25832 180800 3738664    0    0     0     0 1004   11  0 50  0 50
 4 24   1408  25832 180800 3738664    0    0     0     0 1005   13  0 50  0 50
 4 24   1408  25832 180800 3738664    0    0     0     0 1004   13  0 50  0 50
 4 24   1408  25832 180800 3738664    0    0     0     0 1004   11  0 50  0 50

It looks like the problem is I/O wait?
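
To check that reading, the vmstat samples can be summarized with a small Python sketch. It assumes the 16-column layout shown above and skips the first data row, which vmstat reports as an average since boot. The names parse_vmstat and summarize are made up for illustration.

```python
# Column order as printed in the vmstat header above.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

def parse_vmstat(text):
    """Return one dict per numeric data row; header lines are ignored."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows

def summarize(rows, keys=("b", "bi", "bo", "us", "sy", "id", "wa")):
    """Average the chosen columns over all rows except the first.

    The first row printed by vmstat is the average since boot,
    so it says little about the current stall.
    """
    samples = rows[1:]
    if not samples:
        return {}
    return {k: sum(r[k] for r in samples) / len(samples) for k in keys}
```

Run on the output above, this gives wa = 50 and sy = 50 with 24 blocked processes, while bi and bo are both 0. So the blocked processes are counted as waiting, but no blocks are actually being read or written during these samples.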

What should I do? Where should I dig?

In a few weeks I want to migrate to 2 redundant Sun Fire V440 servers with 4 storage units. The main idea is to have HW redundancy. But for now... I don't know what to say to my boss. The HW is fine, but there is a lot of data loss.. :)

P.S. I commented out my configuration for the _mem options. Now they are at the defaults.
