On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 16:52, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:
> >> On 22.12.2010 15:59, Simon Riggs wrote:
> >>> On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
> >>>> My gut feeling is that a reasonable compromise is to set hint bits like
> >>>> we do today, but don't mark the page as dirty when only hint bits are
> >>>> set. That way you get the benefit of hint bits for tuples that are
> >>>> frequently accessed and stay in buffer cache. But you don't spend any
> >>>> extra I/O to set them. I'd really like to see a worst-case scenario
> >>>> benchmark of a patch that does that.
> >>>
> >>> That sounds great, but still prevents block checksums and that is a very
> >>> valuable feature for robustness.
> >>
> >> It does? The problem with block checksums is that if you modify a page
> >> and don't have a corresponding WAL record for it, like a hint bit
> >> update, you can have a torn page so that the checksum doesn't match.
> >> Refraining from dirtying the page when a hint bit is updated avoids the
> >> problem. With that change, we only ever write pages to disk that have a
> >> WAL record associated with it, with full-page images as necessary to
> >> avoid torn pages.
> >
> > Which then leads to a block CRC not matching the block in memory.
>
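The mismatch described in the last quoted line can be illustrated with a small sketch. This is a toy model only: the `Page` class, the `HINT_BIT` value, and the byte offset are hypothetical and do not reflect PostgreSQL's actual page layout or buffer manager. It assumes a CRC stored alongside the in-memory page, and shows that setting a hint bit without recomputing the CRC leaves the stored value stale.

```python
import zlib

PAGE_SIZE = 8192   # assumed page size for the sketch
HINT_BIT = 0x01    # hypothetical flag, not PostgreSQL's real infomask layout


class Page:
    """Toy in-memory page that carries a CRC of its own contents."""

    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.dirty = False
        self.crc = zlib.crc32(self.data)

    def set_hint_bit(self, offset, recompute_crc=False):
        # The page is deliberately NOT marked dirty: this models the
        # proposal of setting hint bits without scheduling any extra I/O.
        self.data[offset] |= HINT_BIT
        if recompute_crc:
            self.crc = zlib.crc32(self.data)

    def crc_matches(self):
        return zlib.crc32(self.data) == self.crc


# Hint bit set, CRC left alone: the in-memory block no longer matches.
page = Page()
page.set_hint_bit(100)
stale_ok = page.crc_matches()

# Hint bit set and CRC recomputed at the same time: the assertion holds,
# at the cost of one CRC computation per hint-bit update.
page2 = Page()
page2.set_hint_bit(100, recompute_crc=True)
fresh_ok = page2.crc_matches()
```

In this model the only ways to keep the in-memory assertion valid are to recompute the CRC on every hint-bit update or to set hint bits less often.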
> Do you envision that the CRC is calculated at every update, or only when
> a page is written out from the buffer cache?

At every update, so there is a clear assertion that the CRC matches the
block.

> If the former, you could
> recalculate the CRC at a hint bit update too. If the latter, the hint
> bits are included in the page image that you checksum just like any
> other data.

If we didn't have hint bits, we wouldn't need to recalculate the CRC
each time one was updated...

> > So what you suggest works only if we restrict CRC checking to blocks
> > incoming to the buffer cache, but leaves us unable to do CRC checks on
> > blocks once in the buffer cache. Since many blocks stay in cache almost
> > constantly, we're left with the situation that the most heavily used
> > parts of the database seldom get CRC checked.
>
> There's plenty of stuff in memory that's not covered by an
> application-level CRC. That's what ECC RAM is for.

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC. If you have large RAM, like
everybody now does, your incidence of this type of error will be much
higher than it was in previous years, so our perception of what is
necessary now to protect databases is out of date. We have data under
our care, and will be much more likely to receive this kind of error
because of the amount of RAM we use.

> Updating the CRC at
> every update to a page seems really expensive, but it's an orthogonal
> issue to hint bits.

Clearly, the frequency with which we set hint bits affects the frequency
we can sensibly update CRCs. It shouldn't be up to us to decide how much
protection a user wants to give their data. There might be two or three
settings that make sense, but clearly we need to be able to limit
hint-bit setting to allow us to have a usable CRC check.

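The point about large RAM can be made concrete with a rough calculation. The sketch below takes the 8% per-DIMM annual figure quoted above at face value and assumes errors on different DIMMs are independent; both are assumptions for illustration, and the function name is hypothetical.

```python
def p_any_dimm_error(per_dimm_rate: float, dimms: int) -> float:
    """Probability that at least one of `dimms` DIMMs sees an error in a
    year, assuming each fails independently at `per_dimm_rate`."""
    return 1.0 - (1.0 - per_dimm_rate) ** dimms


# Annual probability for servers with 1, 4, 8 and 16 DIMMs, using the
# 8% per-DIMM rate as quoted in the message.
for n in (1, 4, 8, 16):
    print(f"{n:2d} DIMMs: {p_any_dimm_error(0.08, n):.3f}")
```

Under these assumptions the per-server probability is close to one half at 8 DIMMs, which is the sense in which more RAM means a much higher incidence.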
So there is a very strong connection between turning this optimisation
off and gaining CRC checking as a feature.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services