On 2015-02-16 05:19:11 +0000, Kevin Grittner wrote:
> Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Jim Nasby <jim.na...@bluetreble.com> writes:
> >> On 2/15/15 10:36 AM, Tom Lane wrote:
> >>> I wonder if we couldn't achieve largely the same positive effects
> >>> through adding a simple transaction-level timeout option.
>
> We suggested this to our customer and got out of the meeting with it
> looking like it *might* fly.  In the next meeting, however, they said
> they had run it by others and reviewed the code and it was completely
> unacceptable -- they would not consider pg with this as the solution.
>
> >> A common use-case is long-running reports hitting relatively stable
> >> data in a database that also has tables with a high churn rate (ie:
> >> queue tables).  In those scenarios your only options right now are
> >> to suffer huge amounts of bloat in the high-churn tables or not do
> >> your reporting.  A simple transaction timeout only "solves" this by
> >> denying you reporting queries.
> >
> > Agreed, but Kevin's proposal has exactly the same problem only worse,
> > because (a) the reporting transactions might or might not fail (per
> > Murphy, they'll fail consistently only when you're on deadline), and
> > (b) if they do fail, there's nothing you can do short of increasing
> > the slack db-wide.
>
> These they were comfortable with, and did *not* consider to be
> unpredictable or something they could not do something about.  I
> really don't feel I can say more than that, though, without
> disclosing more than I should.
I agree that we need to do something about the dangers of long
snapshots, but I'm not sure this is it.  I'm not sure of the contrary
either.  What I'm definitely not a fan of, though, is this
implementation: having to add checks to a large number of places is a
recipe for silently wrong query results.

One thing I was wondering about recently was introducing an
optimization for repeatedly updated rows into the vacuum code: a row
whose xmin and xmax are the same committed transaction can be removed,
even if that xid is above the xmin horizon -- no other transaction is
ever going to see it.  While there's some hairiness in implementing
that, it doesn't seem too hard, and there are a fair number of cases
where it would allow less bloat to accumulate.  Obviously it would be
better if we had logic to do that for other patterns as well (where
the updates aren't in the same xact), but it seems like a good start.
(There's a toy sketch of this, and of the snapshot-aware vacuum check
discussed further down, at the end of this mail.)

> > There might be something in that, but again it's not much like this
> > patch.  The key point I think we're both making is that
> > nondeterministic failures are bad, especially when you're talking
> > about long-running, expensive-to-retry transactions.
>
> What the customer most doesn't want to be "nondeterministic" is
> whether the error is generated only when the snapshot has been used
> to read a page which has been modified since the snapshot was taken.
> If tables or materialized views are set up before a query and then
> not subsequently modified during the execution of the query, that
> query must not be canceled even if it runs for days, but it must not
> cause unconstrained bloat during the run.  So far I don't see any way
> to avoid including the LSN with the snapshot or modifying the index
> AMs.  Let's be clear on the footprint for that; for the btree
> implementation it is:

IIUC you never would want cancellations when accessing the tables
these long-running backends actually access.  The errors about too-old
snapshots are just a stopgap because we can't compute the xmin per
table in a meaningful way at the moment.  Right?

Is the bloat caused by rows these transactions can actually see, or
are the not-removed rows newer than the snapshot's xmax?

Since you actually don't want cancellations for the long-running
reporting queries, it might very well make sense to switch to a more
complicated version of HeapTupleSatisfiesVacuum() while such queries
are running: one that can recognize whether rows are actually visible
to any backend, or whether there are just snapshots that see older
rows.  I've previously wondered how hard this would be, but I don't
think it's *that* hard.  And it'd be a much more general solution.

Greetings,

Andres Freund
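
PS: To make the two ideas above a bit more concrete, here is a toy,
self-contained model.  This is emphatically *not* the real
heapam/tqual code -- every type and function name below is made up for
illustration, and it ignores subtransactions, hint bits, aborted and
in-progress transactions and everything else that makes the real thing
hairy.  It's only meant to show why the two checks are safe:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;

/* Just the parts of a heap tuple header that matter here. */
typedef struct {
    xid_t xmin;              /* inserting transaction */
    xid_t xmax;              /* deleting transaction, 0 = never deleted */
} tuple_t;

/* Toy snapshot: xids < xmin are finished, xids >= xmax hadn't started,
 * xip[] holds the transactions still running when it was taken. */
typedef struct {
    xid_t xmin;
    xid_t xmax;
    const xid_t *xip;
    int nxip;
} snapshot_t;

/* Stand-in for a clog lookup: did this transaction commit? */
typedef bool (*committed_fn)(xid_t xid);

/* Idea 1: a row inserted and deleted by the same committed transaction
 * can never be seen by any snapshot, so it is removable even if that
 * xid is above the global xmin horizon. */
static bool
dead_by_same_xact_rule(const tuple_t *tup, committed_fn committed)
{
    return tup->xmax != 0 && tup->xmin == tup->xmax && committed(tup->xmax);
}

/* Does this snapshot regard the given committed xid as already finished? */
static bool
xid_in_snapshot_past(xid_t xid, const snapshot_t *snap)
{
    if (xid >= snap->xmax)
        return false;                 /* started after the snapshot */
    for (int i = 0; i < snap->nxip; i++)
        if (snap->xip[i] == xid)
            return false;             /* still running at snapshot time */
    return true;
}

/* A tuple is visible to a snapshot iff it sees the insert but not the
 * delete. */
static bool
tuple_visible_to(const tuple_t *tup, const snapshot_t *snap,
                 committed_fn committed)
{
    if (!committed(tup->xmin) || !xid_in_snapshot_past(tup->xmin, snap))
        return false;
    if (tup->xmax != 0 && committed(tup->xmax) &&
        xid_in_snapshot_past(tup->xmax, snap))
        return false;
    return true;
}

/* Idea 2: instead of a single OldestXmin cutoff, ask whether any
 * registered snapshot can still see the deleted row. */
static bool
dead_to_all_snapshots(const tuple_t *tup, const snapshot_t *snaps,
                      int nsnaps, committed_fn committed)
{
    if (tup->xmax == 0 || !committed(tup->xmax))
        return false;                 /* not (cleanly) deleted: keep it */
    for (int i = 0; i < nsnaps; i++)
        if (tuple_visible_to(tup, &snaps[i], committed))
            return false;
    return true;
}

/* Demo: a queue row churned by xid 42, well after the report snapshot. */
static bool everything_committed(xid_t xid) { (void) xid; return true; }

int
main(void)
{
    tuple_t churn = {42, 42};
    snapshot_t report = {40, 41, NULL, 0};  /* taken before xid 42 began */

    printf("same-xact rule says removable: %d\n",
           dead_by_same_xact_rule(&churn, everything_committed));
    printf("per-snapshot rule says removable: %d\n",
           dead_to_all_snapshots(&churn, &report, 1, everything_committed));
    return 0;
}

The point of dead_to_all_snapshots() is that queue rows created and
deleted long after the reporting snapshot was taken are invisible to
it, so vacuum wouldn't have to keep them around just because that
snapshot's xmin is ancient.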