Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-07-02 Thread Heikki Linnakangas

On 26.06.2013 17:15, Tom Lane wrote:
> Heikki Linnakangas hlinnakan...@vmware.com writes:
>> We've discussed retrying short writes before, and IIRC Tom has argued
>> that it shouldn't be necessary when writing to disk. Nevertheless, I
>> think we should retry in XLogWrite(). It can write much bigger chunks
>> than most write() calls, so there's more room for a short write to
>> happen there if it can happen at all. Secondly, it PANICs on failure, so
>> it would be nice to try a bit harder to avoid that.
>
> Seems reasonable.  My concern about the idea in general was the
> impossibility of being sure we'd protected every single write() call.
> But if we can identify specific call sites that seem at more risk than
> most, I'm okay with adding extra logic there.


Committed a patch to add a retry loop to XLogWrite().
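
For readers following along, here is a minimal sketch of what a retry loop around write() can look like. It is not the committed patch: write_all() is a hypothetical helper name, and treating a zero-byte write with no errno as ENOSPC is an assumption borrowed from the discussion in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly nbytes from buf to fd, retrying after short writes and
 * after EINTR. Returns 0 on success, -1 on failure with errno set.
 * write_all() is a hypothetical helper, not a PostgreSQL function.
 */
int
write_all(int fd, const char *buf, size_t nbytes)
{
    size_t      nleft = nbytes;

    assert(buf != NULL || nbytes == 0);

    while (nleft > 0)
    {
        ssize_t     written;

        errno = 0;
        written = write(fd, buf, nleft);
        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any data was written */
            return -1;          /* real error; errno says why */
        }
        if (written == 0)
        {
            /* no progress and no errno: give up rather than loop forever */
            if (errno == 0)
                errno = ENOSPC; /* assumption, as discussed in this thread */
            return -1;
        }
        /* short write: advance past what was written and try again */
        buf += written;
        nleft -= (size_t) written;
    }
    return 0;
}
```

A caller would only report a failure (or PANIC, in the XLogWrite() case) once write_all() returns -1, so a transient short write no longer looks like a hard error.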

I noticed that FileWrite() has some additional Windows-specific code to 
also retry on an ERROR_NO_SYSTEM_RESOURCES error. That's a bit scary, 
because we don't check for that in any other write() calls in the 
backend. If we really need to be prepared for that on Windows, I think 
that would need to be in a wrapper function in src/port or src/backend/port.


Would a Windows-person like to comment on that?

- Heikki


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-26 Thread Greg Stark
On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> (Though if it is, it's not apparent why such
> failures would only be manifesting on the pg_xlog files and not for
> anything else.)

Well, data files are only ever written to in 8k chunks. Maybe these
errors are only occurring on >8k xlog records such as records with
multiple full page images. I'm not sure how much we write for other
types of files but they won't be written to as frequently as xlog or
data files and might not cause errors that are as noticeable.


-- 
greg




Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-26 Thread Andres Freund
On 2013-06-26 13:14:37 +0100, Greg Stark wrote:
> On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> (Though if it is, it's not apparent why such
>> failures would only be manifesting on the pg_xlog files and not for
>> anything else.)
>
> Well data files are only ever written to in 8k chunks. Maybe these
> errors are only occurring on >8k xlog records such as records with
> multiple full page images. I'm not sure how much we write for other
> types of files but they won't be written to as frequently as xlog or
> data files and might not cause errors that are as noticeable.

We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
well...

Yuri, have you compiled postgres with nonstandard configure or
pg_config_manual.h settings?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-26 Thread Heikki Linnakangas

On 26.06.2013 15:21, Andres Freund wrote:
> On 2013-06-26 13:14:37 +0100, Greg Stark wrote:
>> On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> (Though if it is, it's not apparent why such
>>> failures would only be manifesting on the pg_xlog files and not for
>>> anything else.)
>>
>> Well data files are only ever written to in 8k chunks. Maybe these
>> errors are only occurring on >8k xlog records such as records with
>> multiple full page images. I'm not sure how much we write for other
>> types of files but they won't be written to as frequently as xlog or
>> data files and might not cause errors that are as noticeable.
>
> We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
> well...


Actually, XLogWrite() writes multiple pages at once. If all wal_buffers 
are dirty, it can try to write them all in one write() call.
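
As a worked check against the PANIC line in the original report (offset 13959168, length 1392640), the sketch below does the block arithmetic. It assumes the default 8 kB XLOG_BLCKSZ and 16 MB WAL segment size; describe_wal_write() and WalWriteSpan are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define XLOG_BLCKSZ   8192u                 /* default WAL block size (assumed) */
#define XLOG_SEG_SIZE (16u * 1024u * 1024u) /* default WAL segment size (assumed) */

/* Hypothetical summary of one multi-block WAL write. */
typedef struct
{
    uint32_t    first_block;        /* block number within the segment */
    uint32_t    nblocks;            /* number of blocks in this write */
    uint32_t    end_offset;         /* offset just past the write */
    int         fits_in_segment;    /* 1 if the write stays in one segment */
} WalWriteSpan;

WalWriteSpan
describe_wal_write(uint32_t offset, uint32_t length)
{
    WalWriteSpan span;

    /* this sketch assumes block-aligned offsets and lengths */
    assert(offset % XLOG_BLCKSZ == 0);
    assert(length % XLOG_BLCKSZ == 0);

    span.first_block = offset / XLOG_BLCKSZ;
    span.nblocks = length / XLOG_BLCKSZ;
    span.end_offset = offset + length;
    span.fits_in_segment = (span.end_offset <= XLOG_SEG_SIZE);
    return span;
}
```

With the reported values, 1392640 / 8192 = 170, so under these assumptions the failing call tried to write 170 blocks in a single write(), starting at block 1704 of the segment. That is far more than the single 8 kB block a data-file write covers.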


We've discussed retrying short writes before, and IIRC Tom has argued 
that it shouldn't be necessary when writing to disk. Nevertheless, I 
think we should retry in XLogWrite(). It can write much bigger chunks 
than most write() calls, so there's more room for a short write to 
happen there if it can happen at all. Secondly, it PANICs on failure, so 
it would be nice to try a bit harder to avoid that.


- Heikki




Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-26 Thread Andres Freund
On 2013-06-26 15:40:08 +0300, Heikki Linnakangas wrote:
> On 26.06.2013 15:21, Andres Freund wrote:
>> On 2013-06-26 13:14:37 +0100, Greg Stark wrote:
>>> On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>>> (Though if it is, it's not apparent why such
>>>> failures would only be manifesting on the pg_xlog files and not for
>>>> anything else.)
>>>
>>> Well data files are only ever written to in 8k chunks. Maybe these
>>> errors are only occurring on >8k xlog records such as records with
>>> multiple full page images. I'm not sure how much we write for other
>>> types of files but they won't be written to as frequently as xlog or
>>> data files and might not cause errors that are as noticeable.
>>
>> We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
>> well...
>
> Actually, XLogWrite() writes multiple pages at once. If all wal_buffers are
> dirty, it can try to write them all in one write() call.

Oh. Misremembered that.

> We've discussed retrying short writes before, and IIRC Tom has argued that
> it shouldn't be necessary when writing to disk. Nevertheless, I think we
> should retry in XLogWrite(). It can write much bigger chunks than most
> write() calls, so there's more room for a short write to happen there if it
> can happen at all. Secondly, it PANICs on failure, so it would be nice to
> try a bit harder to avoid that.

At the very least we should log the number of bytes actually written if
it was a short write, to make it possible to distinguish that case from a
direct ENOSPC response.
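
One way to make the two cases distinguishable in the log is sketched below. format_write_failure() and its message wording are hypothetical and are not PostgreSQL's actual error text; the point is only that a short write reports byte counts while a failed write reports errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Build a log message for a write() that did not write everything.
 * "written" is the return value of write(), "requested" the byte count
 * that was asked for, and "saved_errno" the errno captured right after
 * the call. Returns the result of snprintf().
 * Hypothetical helper for illustration only.
 */
int
format_write_failure(char *msg, size_t msglen, ssize_t written,
                     size_t requested, int saved_errno)
{
    assert(msg != NULL && msglen > 0);

    if (written < 0)
    {
        /* the kernel reported an error, e.g. a direct ENOSPC */
        return snprintf(msg, msglen, "could not write %zu bytes: %s",
                        requested, strerror(saved_errno));
    }

    /* no error was reported, but fewer bytes than requested were written */
    return snprintf(msg, msglen,
                    "short write: wrote only %zd of %zu bytes",
                    written, requested);
}
```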

This might also be caused by the fact that until recently the SIGALRM
handler didn't set SA_RESTART... If a backend decided to write out the
xlog directly it very well might have an active alarm...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-26 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
> We've discussed retrying short writes before, and IIRC Tom has argued
> that it shouldn't be necessary when writing to disk. Nevertheless, I
> think we should retry in XLogWrite(). It can write much bigger chunks
> than most write() calls, so there's more room for a short write to
> happen there if it can happen at all. Secondly, it PANICs on failure, so
> it would be nice to try a bit harder to avoid that.

Seems reasonable.  My concern about the idea in general was the
impossibility of being sure we'd protected every single write() call.
But if we can identify specific call sites that seem at more risk than
most, I'm okay with adding extra logic there.

regards, tom lane




[BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-25 Thread Yuri Levinsky
Dear All,

I have the following issue on Sun Solaris 10. The PostgreSQL version is
9.2.3. The WAL logging is minimal and there is no archiving. The DB has
restarted several times, while the box has been up for the last 23 days.
The PostgreSQL installation and files are under /data/postgres, which is
half empty. Is there some other destination that might cause the problem?
Can I log the space consumption and the directory name where the problem
is happening by some debug level or trace setting?

 

PANIC:  could not write to log file 81, segment 125 at offset 13959168,
length 1392640: No space left on device
LOG:  process 10203 still waiting for ShareLock on transaction 3010915
after 1004.113 ms
STATEMENT:  UPDATE tctuserinfo SET clickmiles = clickmiles + $1,
periodicalclickmiles = periodicalclickmiles + $2, active = $3,
activeupdatetime = $4, activationsetby = $5, smscid = $6, sdrtime = $7,
simvalue = simvalue + $8, totalsimvalue = totalsimvalue + $9,
firstclick = $10, lastclick = $11, firstactivationtime = $12,
cbchannel = $13, clickmilesupdatetime = $14, ci = $15, lac = $16,
bscid = $17, lastlocationupdatetime = $18, subscriptiontype = $19,
contentcategory = $20, livechannels = $21, contextclicks = $22
WHERE phonenumber = $23
LOG:  WAL writer process (PID 10476) was terminated by signal 6
LOG:  terminating any other active server processes
FATAL:  the database system is in recovery mode

 

But it looks OK:

 

dbnetapp:/vol/postgres
                       90G    44G    46G    49%    /data/postgres

 

Is it possible that heavy queries are consuming disk space (as temporary
space), and that it becomes OK again after the crash and recovery?

 



Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-25 Thread Lou Picciano
Yuri, You're sure the pg xlogs are going where you expect them to? Have you 
fine-tooth-combed your conf file for log file-related settings? Log files may 
well be directed to fs other than /data/postgres (as is common in our 
environments, e.g.)

Do a $ df -h on the various FSes involved...

Are you using Solaris 10 ACLs? Dig deeper on Tom's point on user-specific 
quotas. ZFS in use? Various quota settings under Solaris can get you really 
unexpected mileage.

Lou Picciano

----- Original Message -----
From: Yuri Levinsky
To: pgsql-bugs@postgresql.org
Sent: Tue, 25 Jun 2013 12:23:00 - (UTC)
Subject: [BUGS] Postgres crash? could not write to log file: No space left on device

Dear All,

I have the following issue on Sun Solaris 10. The PostgreSQL version is
9.2.3. The WAL logging is minimal and there is no archiving. The DB has
restarted several times, while the box has been up for the last 23 days.
The PostgreSQL installation and files are under /data/postgres, which is
half empty. Is there some other destination that might cause the problem?
Can I log the space consumption and the directory name where the problem
is happening by some debug level or trace setting?

PANIC:  could not write to log file 81, segment 125 at offset 13959168,
length 1392640: No space left on device
LOG:  process 10203 still waiting for ShareLock on transaction 3010915
after 1004.113 ms
STATEMENT:  UPDATE tctuserinfo SET clickmiles = clickmiles + $1,
periodicalclickmiles = periodicalclickmiles + $2, active = $3,
activeupdatetime = $4, activationsetby = $5, smscid = $6, sdrtime = $7,
simvalue = simvalue + $8, totalsimvalue = totalsimvalue + $9,
firstclick = $10, lastclick = $11, firstactivationtime = $12,
cbchannel = $13, clickmilesupdatetime = $14, ci = $15, lac = $16,
bscid = $17, lastlocationupdatetime = $18, subscriptiontype = $19,
contentcategory = $20, livechannels = $21, contextclicks = $22
WHERE phonenumber = $23
LOG:  WAL writer process (PID 10476) was terminated by signal 6
LOG:  terminating any other active server processes
FATAL:  the database system is in recovery mode

But it looks OK:

dbnetapp:/vol/postgres
                       90G    44G    46G    49%    /data/postgres

Is it possible that “heavy” queries are consuming disk space (as temporary
space), and that it becomes OK again after the crash and recovery?


Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-25 Thread Jeff Davis
On Tue, 2013-06-25 at 09:46 -0400, Tom Lane wrote:
> Yuri Levinsky yu...@celltick.com writes:
>> PANIC:  could not write to log file 81, segment 125 at offset 13959168,
>> length 1392640: No space left on device
>
> That's definitely telling you it got ENOSPC from a write in
> $PGDATA/pg_xlog.

Either that, or write() wrote less than expected but did not set errno.
It looks like we assume ENOSPC when errno is not set.
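
The pattern being described can be sketched as follows. This is an illustration rather than the backend source: write_or_assume_enospc() is a hypothetical name, and the write call is passed in as a function pointer only so that the stand-in short_writer() can simulate a short write that leaves errno untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Signature compatible with write(), so the real call can be passed in. */
typedef ssize_t (*write_fn) (int fd, const void *buf, size_t nbytes);

/*
 * Single-attempt write that treats anything other than a full write as
 * failure. If the call did not set errno, the problem is assumed to be
 * a lack of disk space. Returns 0 on success, -1 on failure.
 */
int
write_or_assume_enospc(write_fn do_write, int fd, const void *buf,
                       size_t nbytes)
{
    assert(do_write != NULL);

    errno = 0;
    if (do_write(fd, buf, nbytes) != (ssize_t) nbytes)
    {
        /* a short write leaves errno at 0, so it is reported as ENOSPC */
        if (errno == 0)
            errno = ENOSPC;
        return -1;
    }
    return 0;
}

/* Stand-in writer: claims to have written half the data, sets no errno. */
ssize_t
short_writer(int fd, const void *buf, size_t nbytes)
{
    (void) fd;
    (void) buf;
    return (ssize_t) (nbytes / 2);
}
```

Calling write_or_assume_enospc(short_writer, ...) fails with errno set to ENOSPC even though no device ever reported being full, which is the ambiguity raised here.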

Regards,
Jeff Davis






Re: [BUGS] Postgres crash? could not write to log file: No space left on device

2013-06-25 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
> On Tue, 2013-06-25 at 09:46 -0400, Tom Lane wrote:
>> That's definitely telling you it got ENOSPC from a write in
>> $PGDATA/pg_xlog.
>
> Either that, or write() wrote less than expected but did not set errno.

Good point.  I wonder if he's using a filesystem that is capable of
reporting partial writes for other reasons, eg maybe it allows signals
to end writes early.  (Though if it is, it's not apparent why such
failures would only be manifesting on the pg_xlog files and not for
anything else.)

regards, tom lane

