On Fri, Apr 28, 2017 at 12:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> How so?  Shouldn't the indexscan go back and mark such tuples dead in
>>> the index, such that they'd be visited this way only once?  If that's
>>> not happening, maybe we should try to fix it.
>
>> Hmm.  Actually, I think the scenario I saw was where there was a large
>> number of tuples at the end of the index that weren't dead yet due to
>> an old snapshot held open.  That index was being scanned by lots of
>> short-running queries.  Those queries executed just fine, but they
>> took a long time to plan because they had to step over all of the dead
>> tuples in the index one by one.
>
> But that was the scenario that we intended to fix by changing to
> SnapshotDirty, no?  Or I guess not quite, because
> dead-but-still-visible-to-somebody tuples are rejected by SnapshotDirty.
Yup.

> Maybe we need another type of snapshot that would accept any
> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,
> but a tuple that was live more recently than the xmin horizon seems
> like it's acceptable enough.  HeapTupleSatisfiesVacuum already
> implements the right behavior, but we don't have a Snapshot-style
> interface for it.

Maybe.  What I know is that several people have found SnapshotDirty to
be problematic, and in the case with which I am acquainted, using
SnapshotAny fixed it.  I do not know whether, if everybody in the world
were using SnapshotAny, somebody would have the problem you're talking
about, or some other one.  And if they did, I don't know whether using
the new kind of snapshot you are proposing would fix it.  I do know
that giving SnapshotAny to people seems to have only improved things
according to the information currently available to me.

I don't, in general, share your intuition that using SnapshotAny is
the wrong thing.  We're looking up the last value in the index for
planning purposes.  It seems to me that, as far as the cost of
scanning the index is concerned, a dead tuple is almost as good as a
live one.  Until that tuple is not only marked dead, but removed from
the index page, it contributes to the cost of an index scan.

To put that another way, suppose the range of index keys is 0 to 2
million, but the heap tuples for values 1 million and up are committed
deleted.  All the index tuples remain (and may or may not be removable
depending on what other snapshots exist in the system).  Now, consider
the following three cases:

(a) index scan from 0 to 10,000
(b) index scan from 1,000,000 to 1,010,000
(c) index scan from 3,000,000 to 3,010,000

I contend that the cost of index scan (b) is a lot closer to the cost
of (a) than to the cost of (c).  (b) finds a whole bunch of index
tuples; (c) gives up immediately and goes home.
So I actually think that using the actual last value in the index -
2,000,000 - is conceptually *correct* regardless of whether it's
marked dead and regardless of whether the corresponding heap tuple is
dead.  The cost depends mostly on which tuples are present in the
index, not which table rows the user can see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers