Le mer. 12 oct. 2022 à 08:56, Guillaume Lelarge <guilla...@lelarge.info> a
écrit :

> Le mar. 11 oct. 2022 à 19:42, Guillaume Lelarge <guilla...@lelarge.info>
> a écrit :
>
>> Le mar. 11 oct. 2022 à 18:27, Alvaro Herrera <alvhe...@alvh.no-ip.org> a
>> écrit :
>>
>>> On 2022-Oct-11, Tom Lane wrote:
>>>
>>> > Are there any tables in this query where extremal values of the join
>>> > key are likely to be in recently-added or recently-dead rows?  Does
>>> > VACUUM'ing on the primary help?
>>>
>>> I remember having an hypothesis, upon getting a report of this exact
>>> problem on a customer system once, that it could be due to killtuple not
>>> propagating to standbys except by FPIs.  I do not remember if we proved
>>> that true or not.  I do not remember observing that tables were being
>>> read, however.
>>>
>>>
>> Thanks for your answers.
>>
>> The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'. I have
>> no idea on the actual use of c3 but rows with c3 at '2022-10-10' (which is
>> yesterday) is much probably recently-added. I can ask my customer if you
>> want but this looks like a pretty safe bet.
>>
>> On the VACUUM question, I didn't say, but we're kind of wondering if it
>> was lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on
>> the database (and especially on the 1.6TB table which is part of the
>> query). I'm kind of skeptical because if the VACUUM wasn't enough on the
>> standby, it should be the same on the primary.
>>
>>
> It appears that I was wrong. I just got an email from my customer saying
> they got their performance back after a VACUUM on the two main tables of
> the query. I'll have them on the phone in about an hour. I'll probably know
> more then. Still wondering why it was an issue on the standby and not on
> the primary. VACUUM cleans up tables and indexes, and this activity goes
> through WAL, doesn't it?
>
>
Just finished my phone call. So, they definitely have their performance
back. All they did was a VACUUM on two tables.

If I understand correctly, during "normal" operations, some information is
stored on the primary and sent to standbys. For some reason, only the
primary take them into account, standbys ignore them. That would explain
why, when we promoted a standby without doing anything else, it had much
better performance. VACUUM fixes the issue on a standby, probably by
storing this information in a different way. After VACUUM, standbys stop
ignoring this information, which helps get the performance back.

That sounds like a plausible explanation. I still have questions if you
don't mind:
* what is this information?
* where is it stored? my guess would be indexes
* why is it ignored on standbys and used on primary?

We didn't talk much about releases, so I guess that the
"standby-ignores-some-information" part is currently on all available
releases?

Thank you.


-- 
Guillaume.

Reply via email to