Thanks for the suggestions, Jeff and Magnus.
This morning the same issue occurred, but this time it also complained that
it failed to write to pg_clog (on previous days, the only failure was in
writing to pg_stat_tmp):


2013-12-17 07:13:04 GMT DETAIL:  Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:05 GMT ERROR:  could not access status of transaction 0
2013-12-17 07:13:05 GMT DETAIL:  Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:06 GMT ERROR:  could not access status of transaction 0
2013-12-17 07:13:06 GMT DETAIL:  Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:07 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:07 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG:  could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:25:15 GMT WARNING:  terminating connection because of crash of
another server process
2013-12-17 07:25:15 GMT DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2013-12-17 07:25:15 GMT HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2013-12-17 07:25:16 GMT LOG:  all server processes terminated;
reinitializing
2013-12-17 07:25:17 GMT LOG:  database system was interrupted; last known up
at 2013-12-17 07:08:22 GMT
2013-12-17 07:25:17 GMT LOG:  database system was not properly shut down;
automatic recovery in progress
2013-12-17 07:25:17 GMT LOG:  redo starts at 5BC/CF3F9888
2013-12-17 07:25:18 GMT LOG:  could not open file
"pg_xlog/00000001000005BC000000D6" (log file 1468, segment 214): No such
file or directory
2013-12-17 07:25:18 GMT LOG:  redo done at 5BC/D5FFFEC0
2013-12-17 07:25:18 GMT LOG:  last completed transaction was at log time
2013-12-17 07:25:12.989653+00
2013-12-17 07:25:19 GMT LOG:  database system is ready to accept connections
2013-12-17 07:25:19 GMT LOG:  autovacuum launcher started



It’s never failed to write to pg_xlog though, or at least, it’s never
reported a failure.

For now, I’ve altered the pg_stat_tmp path and we’ll see what happens
tomorrow. Ideally though, yes, I’d like to catch this happening at runtime
and get a better picture of what’s blowing out here.
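
For reference, the change was just this; the new path below is only an
example for this box, and stats_temp_directory is reloadable, so no restart
is needed (the directory must already exist and be writable by postgres):

# postgresql.conf
stats_temp_directory = '/mnt/stats_disk/pg_stat_tmp'

clone=# SELECT pg_reload_conf();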



Further to this, however, I notice that the base directory is consuming more
disk than I would have expected (all our relations are stored in
tablespaces assigned to other disks).
Looking for the larger culprits, I note the following files consuming 4.4GB
in total:

9.2/main/base/74641/2260957
9.2/main/base/74641/2260957.1
9.2/main/base/74641/2260957.2
9.2/main/base/74641/2260957.3
9.2/main/base/74641/2260957.4

I notice the timestamps fall just shy of the time we hit that
pg_stat_tmp issue.

-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.1
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.2
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.3
-rw------- 1 postgres postgres  328466432 Dec 17 06:56 2260957.4


They appear to be orphaned relfilenodes, though I’m not sure this check is definitive:

clone=# select pg_relation_filepath(2260957);
 pg_relation_filepath
----------------------

(1 row)
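

I suppose a more direct cross-check (my assumption here: relfilenode is what
names the file on disk, and mapped system catalogs keep relfilenode = 0, so
they’d never match) would be to query pg_class from within that database:

clone=# SELECT oid, relname FROM pg_class WHERE relfilenode = 2260957;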



I probably would have assumed these were temporary files, except that I
have temp_tablespaces set elsewhere.

Are these truly orphaned files that PostgreSQL no longer needs (possibly due
to a crash)?    Is there a mechanism to scan the PGDATA directories to
locate such files? Something like the sketch below is the best I’ve come up with.
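
A rough sketch of what I had in mind, using pg_ls_dir to compare this
database’s directory against pg_class. This is only my guess at an approach,
not a blessed mechanism; it needs superuser, and mapped system catalogs
(relfilenode = 0 in pg_class) will show up as false positives:

-- list numeric files in the current database's directory that no
-- pg_class row claims
SELECT f AS orphan_candidate
FROM pg_ls_dir('base/' || (SELECT oid::text
                           FROM pg_database
                           WHERE datname = current_database())) AS f
WHERE f ~ '^[0-9]+(\.[0-9]+)?$'   -- plain segments only; skips _fsm, _vm, pgsql_tmp
  AND NOT EXISTS (SELECT 1
                  FROM pg_class c
                  WHERE c.relfilenode::text = regexp_replace(f, '\..*$', ''));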
FYI, we’re running 9.2.4.


Thanks again,

Tim



From:  Jeff Janes <jeff.ja...@gmail.com>
Date:  Monday, 16 December 2013 17:32
To:  Tim Kane <tim.k...@gmail.com>
Cc:  pgsql-general General <pgsql-general@postgresql.org>
Subject:  Re: [GENERAL] pg_stat_tmp

On Mon, Dec 16, 2013 at 5:57 AM, Tim Kane <tim.k...@gmail.com> wrote:
> Hi all,
> 
> The past few days I’ve been encountering the following error, followed by a
> full db restart and recovery
> 
> 
> 2013-12-16 07:12:53 GMT LOG:  could not write temporary statistics file
> "pg_stat_tmp/pgstat.tmp": No space left on device

Is that the only thing in the logs?  pg_stat_tmp problems should not bring
down your database.  But problems with pg_xlog running out of space
certainly can--but they should also be logged.
 

> 
> This occurs at a time of moderate load, during the same set of operations each
> morning.
> Interestingly, when I execute this manually at any other time of day, the
> process completes normally.
> 
> I presume that the pg_stat_tmp location is system-wide and likely is not
> impacted by temp_tablespaces.
> The root partition, where PostgreSQL is installed, does not have a lot of disk
> available (4GB).
> 
> My first instinct here is to symlink pg_stat_tmp against another disk with a
> little more room to breathe; however, I’m surprised that pgstat.tmp would grow
> to be so large in the first place. Possibly there is something else at play
> here.

We don't know how large it is getting!  If pg_stat_tmp shares the same
partition as pg_xlog, base (as in the default configuration), and pg_log,
then any of those things could be filling up the partition, and pg_stat_tmp
could just be the canary, not the culprit.

Anyway, you don't need to use a symlink; you could just change
stats_temp_directory to point someplace else.

Your best bet is to run "du" or something similar to figure out where your
space is actually going.
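
For example, assuming the Debian-style layout your paths suggest:

du -sh /var/lib/postgresql/9.2/main/*

That will show at a glance whether it's base, pg_xlog, pg_log, or something
else eating the partition.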
 
Cheers,

Jeff

