On 3/17/13 1:41 PM, Simon Riggs wrote:
> So I'm now moving towards commit using a CRC algorithm. I'll put in a
> feature to allow the algorithm to be selected at initdb time, though
> that is mainly a convenience to allow us to more easily do further
> testing on speedups and whether there are any platform-specific
> regressions there.

That sounds reasonable. As I just posted, I'm hoping Ants can help make a pass over a CRC-16 version, since his pass over the Fletcher implementation was very productive. If you're spending time looking at this, I'd prefer to see you poking at the WAL-related aspects instead. More of us are capable of crunching CRC code than have your kind of practice with WAL changes.

I see the situation with checksums right now as similar to the commit/postpone decision for Hot Standby in 9.0. The code is uglier and surely buggier than we'd like, but it has been getting beaten on regularly for over a year now to knock problems out, and there are surely more bugs left to find. The improved testing that only comes from something being committed is probably necessary to really advance the coverage, though. And since adopting the feature is strictly opt-in, the bug exposure for non-adopters isn't that broad. The TLI rearrangement accounts for a lot of the patch, but that's pretty mechanical work that doesn't seem very risky.

There was one question that kept coming up in person this week (Simon, Jeff, Daniel, Josh Berkus, and I were all in the same place for a few days) that I wanted to address with some thoughts on-list. Given that the current overhead is right on the edge of being acceptable, the concern is whether committing this will lock the project into a permanent problem that can't be improved later. I think it's manageable, though. Here's how I interpret the data we have:

-The checksum has to change from Fletcher-16 to CRC-16. The "hairy" parts of the feature don't change much because of that, though. Exactly which checksum is produced is a pretty small detail from a code-correctness perspective; it's not like this restarts the testing cycle completely. The performance change should be quantified, though.
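
To make the shape of that change concrete, here's a rough sketch of a bitwise CRC-16 over a page image. The CCITT polynomial (0x1021) and all-ones starting value are assumptions on my part, and a committed version would surely use a table-driven loop rather than this bit-at-a-time one; the point is just how small the checksum-specific piece is compared to the rest of the feature:

/*
 * Sketch only: bit-at-a-time CRC-16 over a page image, using the CCITT
 * polynomial 0x1021 and an all-ones starting value (both assumptions,
 * not necessarily what the patch will pick).
 */
#include <stddef.h>
#include <stdint.h>

static uint16_t
page_crc16(const uint8_t *page, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= (uint16_t) page[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ 0x1021)
                                 : (uint16_t) (crc << 1);
    }
    return crc;
}

In practice this would run over an 8K block (BLCKSZ), presumably with the stored checksum field itself masked out, but that belongs to the page layout code rather than the algorithm choice.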

-Some common workloads will show no performance drop, like things that fit into shared_buffers and don't write hint bits.

-Some common workloads that write things seem to hit about a 2% drop, presumably because they hit one of the slower situations around 10% of the time.

-There are a decent number of hard-to-deal-with workloads that thrash pages between shared_buffers and the OS cache, and any approach here will regularly hit them with around a 20% drop. There's some hope that this will improve later, especially if a CRC is used and later versions can pick up the Intel i7 CRC32 hardware acceleration. The magnitude of this overhead doesn't seem very negotiable, though; we've heard enough comparisons with other people's implementations now to see that's near the best anyone does here. If the weird slowdowns some people report with very large values of shared_buffers are fixed, that will also make this situation better. That's on my hit list of things I really want to see sorted in the next release.
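
For reference on the hardware angle: the SSE4.2 crc32 instruction computes CRC-32C (the Castagnoli polynomial) rather than a 16-bit CRC, so picking it up later would mean either adopting that polynomial or folding the 32-bit result down to 16 bits. A rough sketch of what that path looks like, assuming -msse4.2 and leaving out the runtime CPU detection and software fallback a real build would need:

/*
 * Sketch only: hardware CRC-32C via the SSE4.2 intrinsics.  Not what
 * the patch does today, just an illustration of the path that could be
 * picked up later.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>

static uint32_t
page_crc32c_hw(const uint8_t *page, size_t len)
{
    uint64_t crc = 0xFFFFFFFF;
    size_t   i = 0;

    /* eight bytes per instruction, then finish the tail byte by byte */
    for (; i + 8 <= len; i += 8)
    {
        uint64_t chunk;

        memcpy(&chunk, page + i, sizeof(chunk));
        crc = _mm_crc32_u64(crc, chunk);
    }
    for (; i < len; i++)
        crc = _mm_crc32_u8((uint32_t) crc, page[i]);

    return (uint32_t) crc ^ 0xFFFFFFFF;
}

The per-byte cost of that loop is a fraction of a software CRC, which is why the hardware path keeps coming up as a possible escape hatch for these worst-case numbers.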

-The worst of the worst-case behavior is Jeff's "SELECTs now write a WAL-logged hint bit" test, which can easily exceed a 20% drop. There have been lots of features submitted over the last two releases that try to improve hint bit operations; some didn't show enough of a win to be worth the trouble. It may be, though, that in a checksummed environment those wins are suddenly big enough to matter. If any of those go in later, the worst case for checksums could improve too. Having to test both ways, with and without checksums, complicates the performance testing. But the project has to start adopting a better approach to that in the next year regardless, IMHO, and I'm scheduling time to help as much as I can with it. (That's a whole other discussion.)

-Having COPY FREEZE available now is a useful tool to eliminate a lot of the load-then-expensive-hint-bit-write scenarios I know exist in the real world. I think the docs for checksumming should even highlight that synergy.

As long as the feature is off by default, so that people have to turn it on to hit the biggest changed code paths, the exposure to potential bugs doesn't seem too bad. New WAL data is no fun, but it's not like this hasn't happened before.

For version <9.3+1>, there's a decent-sized list of potential performance improvements that seem possible. I don't see any reason to believe committing a CRC-16 based version of this will lock the implementation into a bad form that can't be optimized later. The comparison with Hot Standby seems apt here again. There was a decent list of rough edges that were hit by early 9.0 adopters only when they turned the feature on; many were then improved in 9.1. Checksumming seems like it could follow the same path: committed for 9.3, improvements expected during <9.3+1> development, generally considered well tested by the release of <9.3+1>.

On the testing front, we've seen on-list interest in this feature from companies like Heroku and Enova, who both have the resources and experience to help with testing too. Heroku can spin up test instances with workloads any number of ways. Enova can make a Londiste standby with checksums turned on and hit it with a logically replicated workload, while the master stays un-checksummed.

If this goes in, I fully intend to hold both companies to hitting the feature with as many workloads as they can help generate during (and beyond) beta. I have my own stress tests I'll keep running too. If the bug rate from the beta adopters is bad and doesn't improve, there is always the uncomfortable possibility of reverting it before the first RC.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

