On Wed, Oct 12, 2022 at 06:08, Ron <ronljohnso...@gmail.com> wrote:
> On 10/11/22 22:35, Julien Rouhaud wrote:
> > On Tue, Oct 11, 2022 at 07:42:55PM +0200, Guillaume Lelarge wrote:
> >> On Tue, Oct 11, 2022 at 18:27, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
> >>
> >>> On 2022-Oct-11, Tom Lane wrote:
> >>>
> >>>> Are there any tables in this query where extremal values of the join
> >>>> key are likely to be in recently-added or recently-dead rows? Does
> >>>> VACUUM'ing on the primary help?
> >>>
> >>> I remember having a hypothesis, upon getting a report of this exact
> >>> problem on a customer system once, that it could be due to killtuple not
> >>> propagating to standbys except by FPIs. I do not remember if we proved
> >>> that true or not. I do not remember observing that tables were being
> >>> read, however.
> >>>
> >> Thanks for your answers.
> >>
> >> The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'". I have
> >> no idea of the actual use of c3, but rows with c3 at '2022-10-10' (which
> >> is yesterday) are most probably recently added. I can ask my customer if
> >> you want, but this looks like a pretty safe bet.
> >>
> >> On the VACUUM question, I didn't say, but we're kind of wondering if it
> >> was lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on
> >> the database (and especially on the 1.6TB table which is part of the
> >> query). I'm kind of skeptical, because if the VACUUM wasn't enough on the
> >> standby, it should be the same on the primary.
> >>
> >> Actually, there are two things that really bug me:
> >> * why the difference between the primary and both standbys?
> >> * why now? (it worked great before this weekend, and the only thing I
> >> know happened before is a batch delete on Sunday... which may be a
> >> good-enough reason for things to get screwed, but once again, why only
> >> both standbys?)
> >>
> >> Julien Rouhaud also told me about killtuples, but I have no idea what
> >> they are. I suppose this is different from dead tuples. Anyway, if you
> >> can enlighten me, I'll be happy :)
> >
> > That's an optimisation where an index scan can mark an index entry as dead
> > (LP_DEAD) if it tries to fetch some data from the heap that turns out to
> > be all dead, so further scans won't have to check again (you can grep
> > kill_prior_tuple in the source for more details). As that's a hint bit, it
> > may not be replicated unless you enable wal_log_hints or data_checksums
> > (or write it as a FPI indeed), which could explain the discrepancy between
> > the primary (after a first slow index scan) and the standby nodes.
> >
> > But since your customer recreated their standbys from scratch *after* that
> > delete, all the nodes should have those hint bits set (Guillaume confirmed
> > off-list that they used a fresh BASE_BACKUP). Note that Guillaume also
> > confirmed off-list that the customer has checksums enabled, which means
> > that MarkBufferDirtyHint() should be guaranteed to mark the buffers as
> > dirty, so I'm out of ideas to explain the different behavior on standbys.
>
> Would EXPLAIN (VERBOSE, COSTS, FORMAT JSON) run on both nodes help show any
> differences?

No differences.

-- 
Guillaume.
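For anyone following along: the settings Julien mentions, and the VACUUM timing question, can be checked directly on each node. A minimal sketch in SQL ('big_table' is a placeholder, not the customer's actual table name):

```sql
-- Run on the primary AND on each standby.

-- If either of these reports "on", hint-bit changes (such as LP_DEAD
-- index entries set via kill_prior_tuple) are covered by WAL and should
-- reach the standbys.
SHOW data_checksums;
SHOW wal_log_hints;

-- Last manual and automatic VACUUM times for the table in question,
-- to rule out a stale VACUUM; 'big_table' is a placeholder name.
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'big_table';
```

If the settings and plans match across nodes (as they did here), the remaining differences would have to come from per-node on-disk state rather than configuration or the planner.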