On 3/17/13 1:41 PM, Simon Riggs wrote:
> So I'm now moving towards commit using a CRC algorithm. I'll put in a
> feature to allow the algorithm to be selected at initdb time, though
> that is mainly a convenience to allow us to more easily do further
> testing on speedups and whether there are any platform-specific
> regressions there.

That sounds reasonable. As I just posted, I'm hoping Ants can help make a pass over a CRC-16 version, since his pass over the Fletcher implementation was very productive. If you're spending time looking at this, I'd prefer to see you poking at the WAL-related aspects instead. More of us are capable of crunching CRC code than have your kind of practice with WAL changes.

I see the situation with checksums right now as similar to the commit/postpone decision for Hot Standby in 9.0. The code is uglier and surely buggier than we'd like, but it has been getting beaten on regularly for over a year now to knock problems out, and there are surely more bugs left to find. The improved testing that only comes from something being committed is probably necessary to really advance the coverage, though. And since adopting the feature is strictly opt-in, the bug exposure for non-adopters isn't that broad. The TLI rearrangement accounts for a lot of the patch, but that's pretty mechanical work that doesn't seem very risky.

There was one question that kept coming up in person this week (Simon, Jeff, Daniel, Josh Berkus, and I were all in the same place for a few days) that I wanted to address with some thoughts on-list. Given that the current overhead is right on the edge of being acceptable, the concern is whether committing this will lock the project into a permanent problem that can't be improved later. I think it's manageable, though. Here's how I interpret the data we have:

-The checksum has to change from Fletcher-16 to CRC-16. The "hairy" parts of the feature don't change much because of that, though. Exactly which checksum is produced is a pretty small detail from a code-correctness perspective; it's not like this restarts the testing cycle completely. The performance change should be quantified, though.
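
To make the shape of that change concrete, here's a rough sketch of a bitwise CRC-16 over a page image. The CCITT polynomial (0x1021) and all-ones starting value are assumptions on my part, and a committed version would surely use a table-driven loop rather than this bit-at-a-time one; the point is just how small the checksum-specific piece is compared to the rest of the feature:

/*
 * Sketch only: bit-at-a-time CRC-16 over a page image, using the CCITT
 * polynomial 0x1021 and an all-ones starting value (both assumptions,
 * not necessarily what the patch will pick).
 */
#include <stddef.h>
#include <stdint.h>

static uint16_t
page_crc16(const uint8_t *page, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= (uint16_t) page[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ 0x1021)
                                 : (uint16_t) (crc << 1);
    }
    return crc;
}

In practice this would run over an 8K block (BLCKSZ), presumably with the stored checksum field itself masked out, but that belongs to the page layout code rather than the algorithm choice.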

-Some common workloads will show no performance drop, like things that fit into shared_buffers and don't write hint bits.

-Some common workloads that write things seem to hit about a 2% drop, presumably because they hit one of the slower situations around 10% of the time.

-There are a decent number of hard-to-deal-with workloads that thrash pages between shared_buffers and the OS cache, and any approach here will regularly hit them with around a 20% drop. There's some hope that this will improve later, especially if a CRC is used and later versions can pick up the Intel i7 CRC32 hardware acceleration. The magnitude of this overhead doesn't seem very negotiable, though; we've heard enough comparisons with other people's implementations now to see that's near the best anyone does here. If the weird slowdowns some people report with very large values of shared_buffers are fixed, that will also make this situation better. That's on my hit list of things I really want to see sorted in the next release.
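
For reference on the hardware angle: the SSE4.2 crc32 instruction computes CRC-32C (the Castagnoli polynomial) rather than a 16-bit CRC, so picking it up later would mean either adopting that polynomial or folding the 32-bit result down to 16 bits. A rough sketch of what that path looks like, assuming -msse4.2 and leaving out the runtime CPU detection and software fallback a real build would need:

/*
 * Sketch only: hardware CRC-32C via the SSE4.2 intrinsics.  Not what
 * the patch does today, just an illustration of the path that could be
 * picked up later.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>

static uint32_t
page_crc32c_hw(const uint8_t *page, size_t len)
{
    uint64_t crc = 0xFFFFFFFF;
    size_t   i = 0;

    /* eight bytes per instruction, then finish the tail byte by byte */
    for (; i + 8 <= len; i += 8)
    {
        uint64_t chunk;

        memcpy(&chunk, page + i, sizeof(chunk));
        crc = _mm_crc32_u64(crc, chunk);
    }
    for (; i < len; i++)
        crc = _mm_crc32_u8((uint32_t) crc, page[i]);

    return (uint32_t) crc ^ 0xFFFFFFFF;
}

The per-byte cost of that loop is a fraction of a software CRC, which is why the hardware path keeps coming up as a possible escape hatch for these worst-case numbers.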

-The worst of the worst-case behavior is Jeff's "SELECTs now write a WAL-logged hint bit" test, which can easily exceed a 20% drop. There have been lots of features submitted over the last two releases that try to improve hint bit operations; some didn't show enough of a win to be worth the trouble. It may be, though, that in a checksummed environment those wins are suddenly big enough to matter. If any of those go in later, the worst case for checksums could improve too. Having to test both ways, with and without checksums, complicates the performance testing. But the project has to start adopting a better approach to that in the next year regardless, IMHO, and I'm scheduling time to help as much as I can with it. (That's a whole other discussion.)

-Having COPY FREEZE available now is a useful tool to eliminate a lot of the load-then-expensive-hint-bit-write scenarios I know exist in the real world. I think the docs for checksumming should even highlight that synergy.

As long as the feature is off by default, so that people have to turn it on to hit the biggest changed code paths, the exposure to potential bugs doesn't seem too bad. New WAL data is no fun, but it's not like this hasn't happened before.

For version <9.3+1>, there's a decent-sized list of potential performance improvements that seem possible. I don't see any reason to believe committing a CRC-16 based version of this will lock the implementation into a bad form that can't be optimized later. The comparison with Hot Standby seems apt here again. There was a decent list of rough edges that were hit by early 9.0 adopters only when they turned the feature on; many were then improved in 9.1. Checksumming seems like it could follow the same path: committed for 9.3, improvements expected during <9.3+1> development, generally considered well tested by the release of <9.3+1>.

On the testing front, we've seen on-list interest in this feature from companies like Heroku and Enova, who both have the resources and experience to help with testing too. Heroku can spin up test instances with workloads any number of ways. Enova can make a Londiste standby with checksums turned on and hit it with a logically replicated workload, while the master stays un-checksummed.

If this goes in, I fully intend to hold both companies to hitting the feature with as many workloads as they can help generate during (and beyond) beta. I have my own stress tests I'll keep running too. If the bug rate from the beta adopters is bad and doesn't improve, there is always the uncomfortable possibility of reverting it before the first RC.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

