Re: Separate HEAP WAL replay logic into its own file

2024-07-30 Thread Li, Yong

> I think that this proposal is reasonable but we need to get
> attention from a committer to move forward this proposal.
> 
> 
> Thanks,
> —
> kou

Thank you Kou for your review. I will move the CF to the next
phase and see what happens.


Regards,
Yong




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-07-26 Thread Li, Yong


> On Jul 25, 2024, at 12:51, Sutou Kouhei  wrote:
> 
> Hi,
> 
> THREAD SUMMARY:

Very nice summary.

> 
> Implementation:
> 
> The v18 patch set is the latest patch set. [6]
> It includes the following patches:
> 
> 0001: This adds a basic feature (Copy{From,To}Routine)
>  (This isn't enough for extending COPY format.
>  This just extracts minimal procedure sets to be
>  extendable as callback sets.)
> 0002: This uses Copy{From,To}Rountine for the existing
>  formats (text, csv and binary)
>  (This may not be committed because there is a
>  profiling related concern. See the following section
>  for details)
> 0003: This adds support for specifying custom format by
>  "COPY ... WITH (format 'my-format')"
>  (This also adds a test for this feature.)
> 0004: This exports Copy{From,To}StateData
>  (But this isn't enough to implement custom COPY
>  FROM/TO handlers as an extension.)
> 0005: This adds opaque member to Copy{From,To}StateData and
>  export some functions to read the next data and flush
>  the buffer
>  (We can implement a PoC Apache Arrow COPY FROM/TO
>  handler as an extension with this. [7])
> 
> Thanks,
> --
> kou
> 

This review is for 0001 only because the other patches are not ready
for commit.

The v18-0001 patch applies cleanly to HEAD. “make check-world” also
runs cleanly. The patch looks good for me.


Regards,
Yong

Re: Separate HEAP WAL replay logic into its own file

2024-07-26 Thread Li, Yong


> On Jul 23, 2024, at 09:54, Sutou Kouhei  wrote:
>
>
> Here are my comments for your patch:
>
> 1. Could you create your patch by "git format-patch -vN master"
>   or something? If you create your patch by "git format-patch",
>   we can apply your patch by "git am XXX.patch".
>

Thanks for your review. I’ve updated the patch to follow your
suggested format.

>
> 3. Could you remove '#include "access/heapam_xlog.h"' from
>   heapam.c because it's needless now.
>
>   BTW, it seems that we can remove more includes from
>   heapam.c:
>
> 4. Could you remove needless includes from heapam_xlog.c? It
>   seems that we can remove the following includes:

I have removed the redundant includes in the latest patch.

>
> 5. There are still WAL related codes in heapam.c:
>
>   4.1. log_heap_update()
>   4.2. log_heap_new_cid()
>   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
>   4.4. if (needwal) {...} in heap_multi_insert()
>   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
>   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
>   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
>   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
>   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
>   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
>   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
>   4.12. log_heap_visible()
>
>   Should we move them to head_xlog.c too?
>
>   If we should do it, separated commits will be easy to
>   review. For example, the 0001 patch moves existing codes
>   to head_xlog.c as-is. The 0002 patch extracts WAL related
>   codes in heap_insert() to heap_xlog.c and so on.
>
>
> Thanks,
> --
> kou

I followed the convention of most access methods. The “xlog”
file includes the WAL replay logic only. The logic that generates
the log records themselves stays with the code that performs
the changes. Take nbtree as an example, you can also find
WAL generating code in several _bt_insertxxx() functions inside
the nbtinsert.c file.

Please help review the updated file again. Thanks in advance!


Yong






v2-0001-heapam_refactor.patch
Description: v2-0001-heapam_refactor.patch


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-07-22 Thread Li, Yong
Hi Kou,

I tried to follow the thread but had to skip quite some discussions in the 
middle part of the thread. From what I read, it appears to me that there were a 
lot of back-and-forth discussions on the specific implementation details (i.e. 
do not touch existing format implementation), performance concerns and how to 
split the patches to make it more manageable.

My understanding is that the provided v17 patch aims to achieve the followings:
- Retain existing format implementations as built-in formats, and do not go 
through the new interface for them.
- Make sure that there is no sign of performance degradation.
- Refactoring the existing code to make it easier and possible to make copy 
handlers extensible. However, some of the infrastructure work that are required 
to make copy handler extensible are intentionally delayed for future patches. 
Some of the work were proposed as patches in earlier messages, but they were 
not explicitly referenced in recent messages.

Overall, the current v17 patch applies cleanly to HEAD. “make check-world” also 
runs cleanly. If my understanding of the current status of the patch is 
correct, the patch looks good to me.


Regards,
Yong

Re: Separate HEAP WAL replay logic into its own file

2024-06-19 Thread Li, Yong


> On Jun 18, 2024, at 20:42, Melanie Plageman  wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 9:12 PM Li, Yong  wrote:
>> 
>> As a newcomer, when I was walking through the code looking for WAL replay 
>> related code, it was relatively easy for me to find them for the B-Tree 
>> access method because of the “xlog” hint in the file names. It took me a 
>> while to find the same for the heap access method. When I finally found them 
>> (via text search), it was a small surprise. Having different file 
>> organizations for different access methods gives me this urge to make 
>> everything consistent. I think it will make it easier for newcomers, and it 
>> will reduce the mental load for everyone to remember that heap replay is 
>> inside the heapam.c not some “???xlog.c”.
> 
> That makes sense. The branch for PG18 has not been cut yet, so I
> recommend registering this patch for the July commitfest [1] so it
> doesn't get lost.
> 
> - Melanie
> 

Thanks for the positive feedback. I’ve added the patch to the July CF.

Yong



Re: Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Li, Yong


> On Jun 17, 2024, at 23:01, Melanie Plageman  wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 2:20 AM Li, Yong  wrote:
>> 
>> Hi PostgreSQL hackers,
>> 
>> For most access methods in PostgreSQL, the implementation of the access 
>> method itself and the implementation of its WAL replay logic are organized 
>> in separate source files.  However, the HEAP access method is an exception.  
>> Both the access method and the WAL replay logic are collocated in the same 
>> heapam.c.  To follow the pattern established by other access methods and to 
>> improve maintainability, I made the enclosed patch to separate HEAP’s replay 
>> logic into its own file.  The changes are straightforward.  Move the replay 
>> related functions into the new heapam_xlog.c file, push the common 
>> heap_execute_freeze_tuple() helper function into the heapam.h header, and 
>> adjust the build files.
> 
> I'm not against this change, but I am curious at what inspired this.
> Were you looking at Postgres code and simply noticed that there isn't
> a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
> wanted to change that? Or is there some specific reason this would
> help you as a Postgres developer, user, or ecosystem member?
> 
> - Melanie

As a newcomer, when I was walking through the code looking for WAL replay 
related code, it was relatively easy for me to find them for the B-Tree access 
method because of the “xlog” hint in the file names. It took me a while to find 
the same for the heap access method. When I finally found them (via text 
search), it was a small surprise. Having different file organizations for 
different access methods gives me this urge to make everything consistent. I 
think it will make it easier for newcomers, and it will reduce the mental load 
for everyone to remember that heap replay is inside the heapam.c not some 
“???xlog.c”.

Yong

Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Li, Yong
Hi PostgreSQL hackers,

For most access methods in PostgreSQL, the implementation of the access method 
itself and the implementation of its WAL replay logic are organized in separate 
source files.  However, the HEAP access method is an exception.  Both the 
access method and the WAL replay logic are collocated in the same heapam.c.  To 
follow the pattern established by other access methods and to improve 
maintainability, I made the enclosed patch to separate HEAP’s replay logic into 
its own file.  The changes are straightforward.  Move the replay related 
functions into the new heapam_xlog.c file, push the common 
heap_execute_freeze_tuple() helper function into the heapam.h header, and 
adjust the build files.

I hope people find this straightforward refactoring helpful.


Yong





heapam_refactor.patch
Description: heapam_refactor.patch


Re: Proposal to add page headers to SLRU pages

2024-06-13 Thread Li, Yong


> On Jun 10, 2024, at 16:01, Michael Paquier  wrote:
> 
> External Email
> 
> From: Michael Paquier 
> Subject: Re: Proposal to add page headers to SLRU pages
> Date: June 10, 2024 at 16:01:50 GMT+8
> To: Bertrand Drouvot 
> Cc: "Li, Yong" , Jeff Davis , Aleksander 
> Alekseev , PostgreSQL Hackers 
> , "Bagga, Rishu" , 
> Robert Haas , Andrey Borodin , 
> "Shyrabokau, Anton" 
> 
> 
> On Mon, Jun 10, 2024 at 07:19:56AM +0000, Bertrand Drouvot wrote:
>> On Tue, Mar 19, 2024 at 06:48:33AM +, Li, Yong wrote:
>>> Unfortunately, the test requires a setup of two different versions of PG. I 
>>> am not
>>> aware of an existing test infrastructure which can run automated tests 
>>> using two
>>> PGs. I did manually verify the output of pg_upgrade.
>> 
>> I think there is something in t/002_pg_upgrade.pl (see 
>> src/bin/pg_upgrade/TESTING),
>> that could be used to run automated tests using an old and a current version.
> 
> Cluster.pm relies on install_path for stuff, where it is possible to
> create tests with multiple nodes pointing to different installation
> paths.  This allows mixing nodes with different build options, or just
> different major versions like pg_upgrade's perl tests.
> —
> Michael
> 
> 

Thanks for pointing this out. Here is what I have tried:
1. Manually build and install PostgreSQL from the latest source code.
2. Following the instructions from src/bin/pg_upgrade to manually dump the 
regression database.
3. Apply the patch to the latest code, and build from the source.
4. Run “make check” by following the instructions from src/bin/pg_upgrade and 
setting up the olddump and oldinstall to point to the “old” installation used 
in step 2.

All tests pass.


Yong



Re: Proposal to add page headers to SLRU pages

2024-03-19 Thread Li, Yong

>> - New comments have been added to pg_upgrade to mention the SLRU
>>  page header change as the reason for upgrading clog files.
> 
> That seems reasonable, but were any alternatives discussed? Do we have
> consensus that this is the right thing to do?

In general, there are two approaches. Either we convert the existing clog files,
or we don’t.  The patch chooses to convert.

If we don’t, then the clog file code must be able to handle both formats. For,
XIDs in the range where the clog is written in the old format, segment and 
offset
computation must be done in one way, and for XIDs in a different range, it must
be computed in a different way.  To avoid changing the format in the middle of a
page, which must not happen, the new format must start from a clean page, 
possibly in a clean new segment.  If the database is extremely small and has 
only
a few transactions on the first page of clog, then we must either convert the 
whole
page (effectively the whole clog file), or we must skip the rest of the XIDs on 
the
page and ask the database to start from XIDs on the second page on restart.
Also, we need to consider where to store the cut-off XID and when to remove it.
All these details feel very complex and error prone to me.  Performing a 
one-time
conversion is the most efficient and straightforward approach to me. 

> 
> And if we use this approach, is there extra validation or testing that
> can be done?
> 
> Regards,
>Jeff Davis

Unfortunately, the test requires a setup of two different versions of PG. I am 
not
aware of an existing test infrastructure which can run automated tests using two
PGs. I did manually verify the output of pg_upgrade. 


Regards,
Yong



Re: Proposal to add page headers to SLRU pages

2024-03-11 Thread Li, Yong

> On Mar 9, 2024, at 05:22, Jeff Davis  wrote:
> 
> External Email
> 
> On Wed, 2024-03-06 at 12:01 +, Li, Yong wrote:
>> Rebase the patch against the latest HEAD.
> 
> The upgrade logic could use more comments explaining what's going on
> and why. As I understand it, it's a one-time conversion that needs to
> happen between 16 and 17. Is that right?
> 
> Regards,
>Jeff Davis
> 

> In the new code we effectively store only one LSN per page, which I
> understand is strictly worse.  Maybe the idea of doing away with these
> LSN groups should be reconsidered ... unless I completely misunderstand
> the whole thing.
> 
> --
> Álvaro Herrera PostgreSQL Developer  —


Thanks for the comments on LSN groups and pg_upgrade.

I have updated the patch to address both comments:
- The clog LSN group has been brought back.
  Now the page LSN on each clog page is used for honoring the write-ahead rule
  and it is always the highest LSN of all the LSN groups on the page.
  The LSN groups are used by TransactionIdGetStatus() as before.
- New comments have been added to pg_upgrade to mention the SLRU
  page header change as the reason for upgrading clog files.

Regards,
Yong

slru_page_header_v6.patch
Description: slru_page_header_v6.patch


Re: Proposal to add page headers to SLRU pages

2024-03-07 Thread Li, Yong

> On Mar 7, 2024, at 03:09, Stephen Frost  wrote:
> 
> External Email
> 
> From: Stephen Frost 
> Subject: Re: Proposal to add page headers to SLRU pages
> Date: March 7, 2024 at 03:09:59 GMT+8
> To: Alvaro Herrera 
> Cc: "Li, Yong" , Aleksander Alekseev 
> , PostgreSQL Hackers 
> , "Bagga, Rishu" , 
> Robert Haas , "Debnath, Shawn" , Andrey 
> Borodin , "Shyrabokau, Anton" 
> 
> 
> Greetings,
> 
> * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
>> I suppose this is important to do if we ever want to move SLRUs into
>> shared buffers.  However, I wonder about the extra time this adds to
>> pg_upgrade.  Is this something we should be concerned about?  Is there
>> any measurement/estimates to tell us how long this would be?  Right now,
>> if you use a cloning strategy for the data files, the upgrade should be
>> pretty quick ... but the amount of data in pg_xact and pg_multixact
>> could be massive, and the rewrite is likely to take considerable time.
> 
> While I definitely agree that there should be some consideration of
> this concern, it feels on-par with the visibility-map rewrite which was
> done previously.  Larger systems will likely have more to deal with than
> smaller systems, but it's still a relatively small portion of the data
> overall.
> 
> The benefit of this change, beyond just the possibility of moving them
> into shared buffers some day in the future, is that this would mean that
> SLRUs will have checksums (if the cluster has them enabled).  That
> benefit strikes me as well worth the cost of the rewrite taking some
> time and the minor loss of space due to the page header.
> 
> Would it be useful to consider parallelizing this work?  There's already
> parts of pg_upgrade which can be parallelized and so this isn't,
> hopefully, a big lift to add, but I'm not sure if there's enough work
> being done here CPU-wise, compared to the amount of IO being done, to
> have it make sense to run it in parallel.  Might be worth looking into
> though, at least, as disks have gotten to be quite fast.
> 
> Thanks!
> 
> Stephen
> 


Thank Alvaro and Stephen for your thoughtful comments.

I did a quick benchmark regarding pg_upgrade time, and here are the results.

Hardware spec:
MacBook Pro M1 Max - 10 cores, 64GB memory, 1TB Apple SSD

Operating system:
macOS 14.3.1

Complier:
Apple clang 15.0.0

Compiler optimization level: -O2


PG setups:
Old cluster:  PG 16.2 release (source build)
New cluster:  PG Git HEAD plus the patch (source build)


Benchmark steps:

1. Initdb for PG 16.2.
2. Initdb for PG HEAD.
3. Run pg_upgrade on the above empty database, and time the overall wall clock 
time.
4. In the old cluster, write 512MB all-zero dummy segment files (2048 segments) 
under pg_xact.
5. In the old cluster, write 512MB all-zero dummy segment files under 
pg_multixact/members.
6. In the old cluster, write 512MB all-zero dummy segment files under 
pg_multixact/offsets.
7. Purge the OS page cache.
7. Run pg_upgrade again, and time the overall wall clock time.


Test result:

On the empty database, pg_upgrade took 4.8 seconds to complete.

With 1.5GB combined SLRU data to convert, pg_upgrade took 11.5 seconds to 
complete.

It took 6.7 seconds to convert 1.5GB SLRU files for pg_upgrade.



For clog, 2048 segments can host about 2 billion transactions, right at the 
limit for wraparound.
That’s the maximum we can have.  2048 segments are also big for pg_multixact 
SLRUs.

Therefore, on a modern hardware, in the worst case, pg_upgrade will run for 7 
seconds longer.


Regards,

Yong

Re: Proposal to add page headers to SLRU pages

2024-03-06 Thread Li, Yong
Rebase the patch against the latest HEAD.

Regards,
Yong





slru_page_header_v5.patch
Description: slru_page_header_v5.patch


Re: locked reads for atomics

2024-01-16 Thread Li, Yong


> On Nov 28, 2023, at 05:00, Nathan Bossart  wrote:
> 
> External Email
> 
> Here's a v2 of the patch set in which I've attempted to address all
> feedback.  I've also added a pg_write_membarrier_u* pair of functions that
> provide an easy way to write to an atomic variable with full barrier
> semantics.  In the generic implementation, these are just aliases for an
> atomic exchange.
> 
> 0002 demonstrates how these functions might be used to eliminate the
> arch_lck spinlock, which is only ever used for one boolean variable.  My
> hope is that the membarrier functions make eliminating spinlocks for
> non-performance-sensitive code easy to reason about.
> 
> (We might be able to use a pg_atomic_flag instead for 0002, but that code
> seems intended for a slightly different use-case and has more complicated
> barrier semantics.)
> 
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com

Hi Nathan,

The patch looks good to me.

The patch adds two pairs of atomic functions that provide full-barrier 
semantics to atomic read/write operations. The patch also includes an example 
of how this new functions can be used to replace spin locks.

The patch applies cleanly to HEAD. “make check-world” also runs cleanly with no 
error.  I am moving it to Ready for Committers.

Regards,
Yong

Re: Proposal to add page headers to SLRU pages

2024-01-16 Thread Li, Yong
Rebase the patch against the latest HEAD.

Regards,
Yong



slru_page_header_v4.patch
Description: slru_page_header_v4.patch


Re: archive modules loose ends

2024-01-15 Thread Li, Yong


> On Nov 29, 2023, at 01:18, Nathan Bossart  wrote:
> 
> External Email
> 
> Here is a new version of the patch with feedback addressed.
> 
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com

Hi Nathan,

The patch looks good to me.  With the context explained in the thread, the 
patch is easy to understand.
The patch serves as a refactoring which pulls up common memory management and 
error handling concerns into the pgarch.c.  With the patch, individual archive 
callbacks can focus on copying the files and leave the boilerplate code to 
pgarch.c. 

The patch applies cleanly to HEAD.  “make check-world” also runs cleanly with 
no error.


Regards,
Yong

Re: Proposal to add page headers to SLRU pages

2024-01-04 Thread Li, Yong


> On Jan 2, 2024, at 19:35, Aleksander Alekseev  
> wrote:
>
> Thanks for the updated patch.
>
> cfbot seems to have some complaints regarding compiler warnings and
> also building the patch on Windows:
>
> http://cfbot.cputube.org/

Thanks for the information.  Here is the updated patch.

Regards,
Yong



slru_page_header_v3.patch
Description: slru_page_header_v3.patch


Re: Proposal to add page headers to SLRU pages

2023-12-18 Thread Li, Yong
> This work is being done in file.c – it seems to me the proper way to
> proceed would be to continue writing on-disk upgrade logic here.

> Besides that this looks good to me, would like to hear what others have to 
> say.

Thank you, Rishu for taking time to review the code.  I've updated the patch
and moved the on-disk upgrade logic to pg_upgrade/file.c.

I have also added this thread to the current Commitfest and hope this patch
will be  part of the 17 release.

The commitfest link:
https://commitfest.postgresql.org/46/4709/


Regards,
Yong,




slur_page_header_v2.patch
Description: slur_page_header_v2.patch


Re: Proposal to add page headers to SLRU pages

2023-12-08 Thread Li, Yong
Given so many different approaches were discussed, I have started a wiki to 
record and collaborate all efforts towards SLRU improvements.  The wiki 
provides a concise overview of all the ideas discussed and can serve as a 
portal for all historical discussions.  Currently, the wiki summarizes four 
recent threads ranging from identifier format change to page header change, to 
moving SLRU into the main buffer pool, to reduce lock contention on SLRU 
latches.  We can keep the patch related discussions in this thread and use the 
wiki as a live document for larger scale collaborations.

The wiki page is here:  https://wiki.postgresql.org/wiki/SLRU_improvements

Regarding the benefits of this patch, here is a detailed explanation:

  1.  Checksum is added to each page, allowing us to verify if a page has been 
corrupted when read from the disk.
  2.  The ad-hoc LSN group structure is removed from the SLRU cache control 
data and is replaced by the page LSN in the page header. This allows us to use 
the same WAL protocol as used by pages in the main buffer pool: flush all redo 
logs up to the page LSN before flushing the page itself. If we move SLRU caches 
into the main buffer pool, this change fits naturally.
  3.  It leaves further optimizations open. We can continue to pursue the goal 
of moving SLRU into the main buffer pool, or we can follow the lock partition 
idea. This change by itself does not conflict with either proposal.

Also, the patch is now complete and is ready for review.  All check-world tests 
including tap tests passed with this patch.


Regards,
Yong

From: Robert Haas 
Date: Friday, December 8, 2023 at 03:51
To: Debnath, Shawn 
Cc: Andrey Borodin , PostgreSQL Hackers 
, Aleksander Alekseev 
, Li, Yong , Shyrabokau, Anton 
, Bagga, Rishu 
Subject: Re: Proposal to add page headers to SLRU pages
External Email

On Thu, Dec 7, 2023 at 1:28 PM Debnath, Shawn  wrote:
> What is being proposed now is the simple and core functionality of introducing
> page headers to SLRU pages while continuing to be in the SLRU cache. This
> allows the whole project to be iterative and reviewers to better reason about
> the smaller set of changes being introduced into the codebase.
>
> Once the set of on-disk changes are in, we can follow up on optimizations.
> It may be moving to buffer cache or reviewing Dilip's approach in [1], we
> will have the option to be flexible in our approach.

I basically agree with this. I don't think we should let the perfect
be the enemy of the good. Shooting down this patch because it doesn't
do everything that we want is a recipe for getting nothing done at
all.

That said, I don't think that the original post on this thread
provides a sufficiently clear and detailed motivation for making this
change. For this to eventually be committed, it's going to need (among
other things) a commit message that articulates a convincing rationale
for whatever changes it makes. Here's what the original email said:

> It adds a checksum to each SLRU page, tracks page LSN as if it is a standard 
> page and eases future page enhancements.

Of those three things, in my opinion, the first is good and the other
two are too vague. I assume that most people who would be likely to
read a commit message would understand the value of pages having
checksums. But I can't immediately think of what the value of tracking
the page LSN as if it were a standard page might be, so that probably
needs more explanation. Similarly, at least one or two of the future
page enhancements that might be eased should be spelled out, and/or
the ways in which they would be made easier should be articulated.

--
Robert Haas
EDB: 
https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.enterprisedb.com%2F=05%7C01%7Cyoli%40ebay.com%7C2cad2fe1de8a40f3167608dbf75de73c%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C638375754901646398%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=hkccuGfpt1%2BKxuhk%2BJt%2F3HyYuJqQHYfizib76%2F9HtUU%3D=0<http://www.enterprisedb.com/>


slru_page_header_v1.patch
Description: slru_page_header_v1.patch


Proposal to add page headers to SLRU pages

2023-12-06 Thread Li, Yong
Hi all,

PostgreSQL currently maintains several data structures in the SLRU cache.  The 
current SLRU pages do not have any header, so it is impossible to checksum a 
page and verify its integrity.  It is very difficult to debug issues caused by 
corrupted SLRU pages.  Also, without a page header, page LSN is tracked in an 
ad-hoc fashion using LSN groups, which requires additional data structure in 
the shared memory.  At eBay, we are building on the patch shared by Rishu Bagga 
in [1], which adds the standard PageHeaderData to each SLRU page.  We believe 
that adding the standard page header to each SLRU page is the correct approach 
for the long run.  It adds a checksum to each SLRU page, tracks page LSN as if 
it is a standard page and eases future page enhancements.

The enclosed patch changes the address calculation logic for all 7 SLRUs in the 
following 6 files:
src/backend/access/transam/clog.c
src/backend/access/transam/commit_ts.c
src/backend/access/transam/multixact.c
src/backend/access/transam/subtrans.c
src/backend/commands/async.c
src/backend/storage/lmgr/predicate.c

The patch enables page checksum with changes to the following 2 files:
src/backend/access/transam/slru.c
src/bin/pg_checksums/pg_checksums.c

The patch removes the group LSNs defined for each SLRU cache. See changes to:
src/include/access/slru.h

The patch adds a few helper macros in the following files:
src/backend/storage/page/bufpage.c
src/include/storage/bufpage.h

The patch updates some test cases:
src/bin/pg_resetwal/t/001_basic.pl
src/test/modules/test_slru/test_slru.c

I am still working on patching the pg_upgrade.  Just love to hear your thoughts 
on the idea and the current patch.


Discussed with: Anton Shyrabokau and Shawn Debnath

[1] 
https://www.postgresql.org/message-id/flat/EFAAC0BE-27E9-4186-B925-79B7C696D5AC%40amazon.com


Regards,
Yong




slru_add_page_header.patch
Description: slru_add_page_header.patch