Hi, I have spent some time thinking about Michael's feedback and I want to see if I can propose an alternate design that we can come to a consensus on.
> I had to work around the assert at the end of pgstat_report_stat(), to > tell that under an xact the partial flushes were OK. I was wondering > about keeping the end assertion based on solely "force" intact, making > the flush routines return a status based on if we are in an xact. > > > That looks "simpler" that the previous proposal but who would be > > responsible to > > call pg_stat_report()? If that's the client responsibility that kind of > > look weird > > to me. If that's the core, how would that be scheduled? I think that the > > end solution should prevent to find similar issues as 039549d70f6 fixed, > > without > > delegating to the client. > > TBH, I don't like the requirement of setting SIGALRM in all the places > where we'd require it for the sake of this proposal, where > historically we have never done that, copying a mechanism that already > exists in the tree for procsigs, as the previous patch I posted proves > we could reuse. How about we add a timeout when a transaction goes idle? Currently we have IDLE_STATS_UPDATE_TIMEOUT, which is the timeout interval used when we have done more than one flush within a second. This only works when the session is idle (not in a transaction), and flushes everything. However, as I mentioned earlier in this thread, we can also schedule a flush that occurs when a transaction goes idle. This means when a transaction is started (BEGIN/START TRANSACTION) the proceeding idle-in-transaction can schedule a timeout of 10 seconds based on a new #define called PGSTAT_IDLE_TXN_INTERVAL. The timeout is not a fixed repeating interval. It is set on-demand each time the transaction enters idle-in-transaction state: first after the BEGIN, then again after each subsequent command completes. This is a trade-off: if a single command runs for a long time, no flush happens until it finishes and the transaction goes idle again. So, we still introduce a new timeout in this mechanism, but it does not need to be registered all over the place. So with this all the "safe to be flushed" stats will be flushed mid-transaction at some point, and all stats will be flushed at the end of a transaction. With that said, the only way I can conceive getting away from a a new timeout in this design is to track timestamp whenever we go into idle-intransaction, but I am not too comfy adding a GetCurrentTimestamp() there. -> if no action occurs at this point, we don't set another timeout > It's also not clear to me what a correct frequency of > the stat updates should be, and why it would make sense to force that > in a GUC; No GUC will be needed in what I am proposing. A 10s flush interval should be sufficient. > On top of that, I am not really convinced that there is a good reason > to remove the existing stat report calls we have already planted in > the tree for auxiliary processes, diverging from the stable branches > where these exist. The design being proposed keeps all the existing in-tact. So attached is a new proposal with tests and docs. In terms of test I fell back to the strategy used by Bertrand [0] with the INJECTION_POINT trick to reduce the flush interval. It does mean we keep a pg_sleep in the test, but I think we should not try to do anything different here. Also, with this design we can call the flush modes FLUSH_IN_TRANSACTION and FLUSH_AT_TXN_BOUNDARY. As the earlier proposals, the callbacks will still need to make the decision what stats to flush based on if it's called in-transaction or at boundary. [0] https://www.postgresql.org/message-id/aZbDYMrOkeCyIubO%40ip-10-97-1-34.eu-west-3.compute.internal -- Sami Imseih Amazon Web Services (AWS)
v12-0001-Add-periodic-in-transaction-stats-flushing.patch
Description: Binary data
