On 2/15/15 10:36 AM, Tom Lane wrote:
Kevin Grittner <kgri...@ymail.com> writes:
Tom Lane <t...@sss.pgh.pa.us> wrote:
Kevin Grittner <kgri...@ymail.com> writes:
What this patch does is add a GUC called old_snapshot_threshold.  It
defaults to -1, which leaves behavior matching unpatched code.  Values
above that allow tuples to be vacuumed away after the number of
transaction IDs specified by the GUC has been consumed.
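
For concreteness, a sketch of what that might look like in postgresql.conf;
the parameter name and the -1 default are taken from the description above,
while the non-default value and its interpretation are only illustrative
assumptions:

    # postgresql.conf sketch, per the patch as described above
    old_snapshot_threshold = -1         # default: behave like unpatched code
    #old_snapshot_threshold = 10000000  # allow vacuum to remove dead tuples once
                                        # this many transaction IDs have been
                                        # consumed since a snapshot was taken;
                                        # older snapshots then risk a
                                        # "snapshot too old" error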

TBH, I'm not sure why we'd wish to emulate Oracle's single worst
operational feature.

I've run into cases where people have suffered horribly bloated
databases because of one ill-behaved connection.  Some companies
don't want to be vulnerable to that and the disruption that
recovery from that bloat causes.

No doubt, preventing bloat is a good thing, but that doesn't mean this
is the best API we could create for the issue.  The proposition this
patch offers to DBAs is: "You can turn this knob to reduce bloat by some
hard-to-quantify factor.  The cost is that some long-running transactions
might fail.  You won't know which ones are at risk, the failures won't be
the same from time to time, and you won't be able to do anything to spare
high-value transactions from that fate except by turning that knob back
again globally (which requires a database restart)."  Maybe refugees from
Oracle will think that sounds good, but nobody else will.

I wonder if we couldn't achieve largely the same positive effects through
adding a simple transaction-level timeout option.  That would be far
easier for users to understand and manage, it would be trivial to allow
specific high-value transactions to run with a laxer setting, it does not
create any implementation-detail-dependent behavior that we'd be having to
explain to users forevermore, and (not incidentally) it would be a lot
simpler and more trustworthy to implement.  There's no well-defined
correlation between your setting and the net effect on database bloat,
but that's true with the "snapshot too old" approach as well.
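
For example, something shaped roughly like this; transaction_timeout is a
hypothetical parameter name used only to illustrate the API, not an existing
setting, and the table name is made up:

    -- hypothetical parameter, sketch only
    SET transaction_timeout = '5min';   -- session default: abort any transaction
                                        -- that runs longer than five minutes

    BEGIN;
    SET LOCAL transaction_timeout = 0;  -- this one high-value report is exempt,
                                        -- for this transaction only
    SELECT count(*) FROM monthly_report_data;
    COMMIT;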

A common use case is long-running reports hitting relatively stable data in a database that also has tables with a high churn rate (i.e., queue tables). In those scenarios your only options right now are to suffer huge amounts of bloat in the high-churn tables or not do your reporting at all. A simple transaction timeout only "solves" this by denying you the reporting queries.
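
To make that concrete, a minimal two-session sketch (table names are made up):

    -- session 1: a long-running report holds its snapshot open for hours
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM monthly_report_data;
    -- ... transaction stays open while the rest of the report runs ...

    -- session 2: the queue table keeps churning, but vacuum cannot reclaim
    -- the dead rows while the report's snapshot might still need them
    DELETE FROM job_queue WHERE completed;
    VACUUM VERBOSE job_queue;   -- "N dead row versions cannot be removed yet"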

An idea that I've had on this would be some way to "lock down" the tables that a long-running transaction could access. That would allow vacuum to ignore any snapshots that transaction had for tables it wasn't accessing. That's something that would be deterministic.
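
Purely as a strawman, the syntax might look something like this; nothing
resembling it exists today, it's only to make the idea concrete:

    -- hypothetical, strawman syntax only
    BEGIN;
    SET TRANSACTION SNAPSHOT TABLES (monthly_report_data, report_summary);
    -- vacuum could then ignore this transaction's snapshot for every other
    -- table, e.g. the high-churn job_queue
    SELECT count(*) FROM monthly_report_data;
    COMMIT;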
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


