Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-12 Thread Martijn van Oosterhout
On Thu, Jan 11, 2007 at 11:10:38PM +, Simon Riggs wrote:
 On Thu, 2007-01-11 at 17:06 +, Gregory Stark wrote:
  Having a CRC in WAL but not in the heap seems kind of pointless. 
 
 Yes...
 
  If your
  hardware is unreliable the corruption could anywhere. 
 
 Agreed.

I thought the point was that the WAL protects against unexpected power
failure, that sort of thing. In that situation, the memory is the first
to be corrupted, and an active DMA transfer will thus be corrupted
also. We don't need to worry about the data, because the WAL is known
to be accurate.

The WAL does not protect against random data corruption, in normal
operation it is never read. If we want to detect random corruption,
we'd need checksum everywhere, yes. But that's not the goal here.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Simon Riggs
On Wed, 2007-01-10 at 23:32 -0500, Bruce Momjian wrote:
 Simon Riggs wrote:
  On Fri, 2007-01-05 at 22:57 -0500, Tom Lane wrote:
   Jim Nasby [EMAIL PROTECTED] writes:
On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:
Ok, so when you need CRC's on a replicate (but not on the master) you
   
Which sounds to me like a good reason to allow the option in  
recovery.conf as well...
   
   Actually, I'm not seeing the use-case for a slave having a different
   setting from the master at all?
   
 My backup server is less reliable than the primary.
   
 My backup server is more reliable than the primary.
   
   Somehow, neither of these statements seem likely to be uttered by
   a sane DBA ...
  
  If I take a backup of a server and bring it up on a new system, the
  blocks in the backup will not have been CRC checked before they go to
  disk.
  
  If I take the same server and now stream log records across to it, why
  *must* that data be CRC checked, when the original data has not been?
  
  I'm proposing choice, with a safe default. That's all.
 
 I am assuming this item is dead because no performance results have been
 reported.

It's been on my hold queue, as a result of its lack of clear acceptance.

Results from earlier tests show the routines which are dominated by CRC
checking overhead are prominent in a number of important workloads.
Those workloads all have a substantial disk component, so test results
will vary between no saving at all on a disk-bound system to some
savings on a CPU bound system.

Restore RecordIsValid() #1 on oprofile results at 50-70% CPU

COPYXLogInsert() #1 on oprofile results at 17% CPU
(full_page_writes = on)

OLTPno test with full_page_writes = on (less relevant)

OLTPXLogInsert() #5 on oprofile results at 1.2% CPU
(full_page_writes = off)
Removing the CRC checks on WAL would likely be the easiest to remove 1%
CPU on the system as it stands. Other changes require algorithmic or
architectural changes to improve matters, though gains can be much
larger. 1% doesn't sound much, but PostgreSQL is a very sleek beast
these days. As we improve things in other areas the importance of this
patch as a tuning switch will grow.

Clearly the current patch is not accepted, but can we imagine a patch
that saved substantial CPU time in these areas that would be acceptable?
*Always* as a non-default option, IMHO, with careful documentation as to
its possible use.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 COPY  XLogInsert() #1 on oprofile results at 17% CPU
   (full_page_writes = on)

But what portion of that is actually CRC-related?  XLogInsert does quite
a lot.

Anyway, I can't see degrading the reliability of the system for a gain
in the range of a few percent, which is the most that we'd be likely
to get here ... for a factor of two or more, maybe people would be
willing to take a risk.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Simon Riggs
On Thu, 2007-01-11 at 09:01 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  COPYXLogInsert() #1 on oprofile results at 17% CPU
  (full_page_writes = on)
 
 But what portion of that is actually CRC-related?  XLogInsert does quite
 a lot.
 
 Anyway, I can't see degrading the reliability of the system for a gain
 in the range of a few percent, which is the most that we'd be likely
 to get here ... for a factor of two or more, maybe people would be
 willing to take a risk.

All I would add is that the loss of reliability was not certain in all
cases, otherwise I myself would have dropped the idea long ago. With the
spectre of doubt surrounding this, I'm happy to drop the idea until we
have proof/greater certainty either way.

Patch revoked.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 What did you think about protecting against torn writes using id numbers every
 512 bytes.

Pretty much not happening; or are you volunteering to fix every part of
the system to tolerate injections of inserted data anywhere in a stored
datum?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 What did you think about protecting against torn writes using id numbers 
 every
 512 bytes.

 Pretty much not happening; or are you volunteering to fix every part of
 the system to tolerate injections of inserted data anywhere in a stored
 datum?

I was thinking to do it at a low level as the xlog records are prepared to be
written to the filesystem and as the data is being read from disk. I haven't
read that code yet to see where to inject it but I understand there's already
a copy happening and it could be done there. Even if we optimize out all the
copies we could do it in the actual i/o call using readv/writev.

I wasn't thinking of doing it on actual disk buffers since it doesn't help us
avoid full page writes, but it could be done there too using readv/writev in
the smgr. That might be useful for a full_page_writes=off system so even if it
can't guarantee no corruption at least it can guarantee no silent corruption.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Pretty much not happening; or are you volunteering to fix every part of
 the system to tolerate injections of inserted data anywhere in a stored
 datum?

 I was thinking to do it at a low level as the xlog records are prepared to be
 written to the filesystem and as the data is being read from disk. I haven't
 read that code yet to see where to inject it but I understand there's already
 a copy happening and it could be done there.

You understand wrong ... a tuple sitting on disk is normally read
directly from the shared buffer, and I don't think we want to pay for
copying it.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Pretty much not happening; or are you volunteering to fix every part of
 the system to tolerate injections of inserted data anywhere in a stored
 datum?

 I was thinking to do it at a low level as the xlog records are prepared to be
 written to the filesystem and as the data is being read from disk. I haven't
 read that code yet to see where to inject it but I understand there's already
 a copy happening and it could be done there.

 You understand wrong ... a tuple sitting on disk is normally read
 directly from the shared buffer, and I don't think we want to pay for
 copying it.

xlog records


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 You understand wrong ... a tuple sitting on disk is normally read
 directly from the shared buffer, and I don't think we want to pay for
 copying it.

 xlog records

Oh, sorry, had the wrong context in mind.  I'm still not very impressed
with the idea --- a CRC check will catch many kinds of problems, whereas
this approach catches exactly one kind of problem.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Oh, sorry, had the wrong context in mind.  I'm still not very impressed
 with the idea --- a CRC check will catch many kinds of problems, whereas
 this approach catches exactly one kind of problem.

Well in fairness I tossed in a throwaway comment at the end of that email
about heap pages. I'll do the same here since I can't resist. But the main
thread here is about xlog really.

It just seems to me like it's better to target each problem with a solution
that addresses it directly than have one feature that we hope hope addresses
them all more or less.

Having a CRC in WAL but not in the heap seems kind of pointless. If your
hardware is unreliable the corruption could anywhere. Depending on it to solve
multiple problems means we can't offer the option to disable it because it
would affect other things as well.

What I would like to see is a CRC option that would put CRC checks in every
disk page whether heap, index, WAL, control file, etc. I think we would
default that to off to match our current setup most closely.

Separately we would have a feature in WAL to detect torn pages so that we can
reliably detect the end of valid WAL. That would have to always be on. But
having it as a separate feature means the CRC could be optional.

Also, incidentally like I mentioned in my previous email, we could do the torn
page detection in heap pages too by handling it in the smgr using
readv/writev. No copies, no corrupted datums. Essentially the tags would be
inserted on the fly as the data was copied into kernel space.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-11 Thread Simon Riggs
On Thu, 2007-01-11 at 17:06 +, Gregory Stark wrote:
 Having a CRC in WAL but not in the heap seems kind of pointless. 

Yes...

 If your
 hardware is unreliable the corruption could anywhere. 

Agreed.

Other DBMS have one setting for the whole server; I've never seen
separate settings for WAL and data.

 Depending on it to solve
 multiple problems means we can't offer the option to disable it
 because it
 would affect other things as well.
 
 What I would like to see is a CRC option that would put CRC checks in
 every
 disk page whether heap, index, WAL, control file, etc. I think we
 would
 default that to off to match our current setup most closely.
 
 Separately we would have a feature in WAL to detect torn pages so that
 we can
 reliably detect the end of valid WAL. That would have to always be on.
 But
 having it as a separate feature means the CRC could be optional.

Your thoughts seem logical to me.

It does seem a bigger project than I'd envisaged, but doable, one day.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-10 Thread Bruce Momjian
Simon Riggs wrote:
 On Fri, 2007-01-05 at 22:57 -0500, Tom Lane wrote:
  Jim Nasby [EMAIL PROTECTED] writes:
   On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:
   Ok, so when you need CRC's on a replicate (but not on the master) you
  
   Which sounds to me like a good reason to allow the option in  
   recovery.conf as well...
  
  Actually, I'm not seeing the use-case for a slave having a different
  setting from the master at all?
  
  My backup server is less reliable than the primary.
  
  My backup server is more reliable than the primary.
  
  Somehow, neither of these statements seem likely to be uttered by
  a sane DBA ...
 
 If I take a backup of a server and bring it up on a new system, the
 blocks in the backup will not have been CRC checked before they go to
 disk.
 
 If I take the same server and now stream log records across to it, why
 *must* that data be CRC checked, when the original data has not been?
 
 I'm proposing choice, with a safe default. That's all.

I am assuming this item is dead because no performance results have been
reported.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org