Hi Andres,

> I think this is a case where the potential work arounds are complex enough to 
> use significant resources to get right, and are likely to make properly 
> fixing the issue harder. I'm willing to comment on proposals that claim not 
> to be problmatic in those regards, but I have *SERIOUS* doubts they exist.

Alright. I'd still like to ask for your thoughts on it, though (below).
The proposed design touches buffer invalidation during the recovery process 
of a standby server.

The question was about "how" to remember those buffers that belong to 
truncated/dropped tables, right?

1. The time-consuming behavior is caused by scanning the whole shared buffer 
pool once per truncated/dropped table. So DURING the failover process, while 
WAL is being applied, we temporarily delay the scanning and invalidating of 
shared buffers. At the same time, we remember the relations/relfilenodes (of 
the dropped/truncated tables) by adding them to a hash table called the 
"skip list". 
2. After WAL is applied, the checkpointer (or bg writer) scans the shared 
buffers only ONCE, compares the pages against the skip list, and invalidates 
the relevant pages. Once those pages are dropped from shared memory, they 
will not be written back to disk.
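
To make the two steps concrete, here is a rough sketch of the idea in Python. 
It is only a toy model, not PostgreSQL code: Buffer, BufferPool and all method 
names are made up for illustration, a plain list stands in for shared buffers, 
and it ignores locking and any reads that arrive before the final scan.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Buffer:
    """Toy stand-in for one shared buffer slot (hypothetical, not a real struct)."""
    relfilenode: Optional[int]  # None means the slot is invalid/free
    block: int = 0
    dirty: bool = False


class BufferPool:
    """Toy model of the shared buffer pool plus the proposed skip list."""

    def __init__(self, buffers: Iterable[Buffer]) -> None:
        self.buffers = list(buffers)
        self.skip_list: Set[int] = set()  # relfilenodes dropped/truncated during replay
        self.full_scans = 0               # how many times the whole pool was scanned

    @staticmethod
    def _invalidate(buf: Buffer) -> None:
        # Dropping the page also clears the dirty flag, so it is never written out.
        buf.relfilenode = None
        buf.dirty = False

    def drop_relation_eager(self, relfilenode: int) -> None:
        """Behavior described as the problem: one full scan per drop/truncate."""
        self.full_scans += 1
        for buf in self.buffers:
            if buf.relfilenode == relfilenode:
                self._invalidate(buf)

    def drop_relation_deferred(self, relfilenode: int) -> None:
        """Step 1: during WAL replay, only remember the relfilenode."""
        self.skip_list.add(relfilenode)

    def flush_skip_list(self) -> int:
        """Step 2: after replay, scan once and invalidate every matching page."""
        if not self.skip_list:
            return 0
        self.full_scans += 1
        invalidated = 0
        for buf in self.buffers:
            if buf.relfilenode in self.skip_list:
                self._invalidate(buf)
                invalidated += 1
        self.skip_list.clear()
        return invalidated
```

In this model, dropping N tables costs N full scans with drop_relation_eager(), 
but a single scan with drop_relation_deferred() followed by one 
flush_skip_list() call, and both leave the pool in the same state.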

Assuming the theory works, this design will only affect the behavior of the 
checkpointer (or maybe the bg writer) during the recovery process / failover. 
Any feedback or thoughts?

BTW, are there any updates on whether the community will move forward anytime 
soon with the buffer mapping implementation you mentioned?


Regards,
Kirk Jamison

