Re: Separate HEAP WAL replay logic into its own file
On Wed, Sep 18, 2024 at 08:40:02AM +, Li, Yong wrote: > I am so glad to see that my patch got committed. Thank you a lot for it! > This is my first accepted patch. It really means a lot to me. No problem. Thanks for the patch! -- Michael signature.asc Description: PGP signature
Re: Separate HEAP WAL replay logic into its own file
> On Sep 12, 2024, at 13:39, Michael Paquier wrote: > > External Email > > > I was looking at all that, and this is only moving code around. While > the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad > but historical, the header cleanup in heapam.c is nice. > > Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to > XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the > opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h > was less interesting. Just to say that I am agreeing with you here > and I have let this part as you suggested originally. > > I was wondering for a bit about the order of the functions for heap > and heap, but these are ordered in their own, which is also OK. I > have added a few more comments at the top of each subroutine for the > records to be more consistent, and applied the result. > — > Michael > I am so glad to see that my patch got committed. Thank you a lot for it! This is my first accepted patch. It really means a lot to me. Yong
Re: Separate HEAP WAL replay logic into its own file
On Thu, Sep 12, 2024 at 08:12:30AM +0900, Michael Paquier wrote: > It looks like my mind was wondering away when I wrote this part. > Sorry for the useless noise. I was looking at all that, and this is only moving code around. While the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad but historical, the header cleanup in heapam.c is nice. Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h was less interesting. Just to say that I am agreeing with you here and I have let this part as you suggested originally. I was wondering for a bit about the order of the functions for heap and heap, but these are ordered in their own, which is also OK. I have added a few more comments at the top of each subroutine for the records to be more consistent, and applied the result. -- Michael signature.asc Description: PGP signature
Re: Separate HEAP WAL replay logic into its own file
On Wed, Sep 11, 2024 at 04:41:49PM +0900, Michael Paquier wrote: > +#include "access/heapam_xlog.h" > > This is included in heapam.h, but missing from the patch. I guess > that you fat-fingered a `git add`. It looks like my mind was wondering away when I wrote this part. Sorry for the useless noise. -- Michael signature.asc Description: PGP signature
Re: Separate HEAP WAL replay logic into its own file
On Tue, Jul 30, 2024 at 06:48:26AM +, Li, Yong wrote: > Thank you Kou for your review. I will move the CF to the next > phase and see what happens. Quite a fan of what you are proposing here, knowing that heapam.c is still 8.8k lines of code even after moving the 1.3k lines dedicated to WAL records. +#include "access/heapam_xlog.h" This is included in heapam.h, but missing from the patch. I guess that you fat-fingered a `git add`. -- Michael signature.asc Description: PGP signature
Re: Separate HEAP WAL replay logic into its own file
> I think that this proposal is reasonable but we need to get > attention from a committer to move forward this proposal. > > > Thanks, > — > kou Thank you Kou for your review. I will move the CF to the next phase and see what happens. Regards, Yong
Re: Separate HEAP WAL replay logic into its own file
Hi, In <599e67d2-2929-4858-b8bc-f9c4ae889...@ebay.com> "Re: Separate HEAP WAL replay logic into its own file" on Fri, 26 Jul 2024 07:56:12 +, "Li, Yong" wrote: >> 1. Could you create your patch by "git format-patch -vN master" >> or something? If you create your patch by "git format-patch", >> we can apply your patch by "git am XXX.patch". >> > > Thanks for your review. I’ve updated the patch to follow your > suggested format. Thanks. I could apply your patch by "git am v2-0001-heapam_refactor.patch". Could you use the following format for the commit message next time? ${TITLE} ${DESCRIPTION} For example: Separate HEAP WAL replay logic into its own file Most access methods (i.e. nbtree and hash) use a separate file with "xlog" in its name for its WAL replay logic. Heap is one exception of this convention. To make it easier for newcomers to find the WAL replay logic for the heap access method, this patch isolates heap's replay logic in a new heapam_xlog.c file. This patch is a pure refactoring with no change to the logic. This is a commonly used Git's commit message format. See also other commit messages by "git log". >> 5. There are still WAL related codes in heapam.c: >> >> 4.1. log_heap_update() >> 4.2. log_heap_new_cid() >> 4.3. if (RelationNeedsWAL()) {...} in heap_insert() >> 4.4. if (needwal) {...} in heap_multi_insert() >> 4.5. if (RelationNeedsWAL()) {...} in heap_delete() >> 4.6. if (RelationNeedsWAL()) {...}s in heap_update() >> 4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple() >> 4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec() >> 4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative() >> 4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative() >> 4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update() >> 4.12. log_heap_visible() >> >> Should we move them to head_xlog.c too? >> >> If we should do it, separated commits will be easy to >> review. For example, the 0001 patch moves existing codes >> to head_xlog.c as-is. The 0002 patch extracts WAL related >> codes in heap_insert() to heap_xlog.c and so on. > > I followed the convention of most access methods. The “xlog” > file includes the WAL replay logic only. The logic that generates > the log records themselves stays with the code that performs > the changes. Take nbtree as an example, you can also find > WAL generating code in several _bt_insertxxx() functions inside > the nbtinsert.c file. You're right. Sorry. I think that this proposal is reasonable but we need to get attention from a committer to move forward this proposal. Thanks, -- kou
Re: Separate HEAP WAL replay logic into its own file
> On Jul 23, 2024, at 09:54, Sutou Kouhei wrote: > > > Here are my comments for your patch: > > 1. Could you create your patch by "git format-patch -vN master" > or something? If you create your patch by "git format-patch", > we can apply your patch by "git am XXX.patch". > Thanks for your review. I’ve updated the patch to follow your suggested format. > > 3. Could you remove '#include "access/heapam_xlog.h"' from > heapam.c because it's needless now. > > BTW, it seems that we can remove more includes from > heapam.c: > > 4. Could you remove needless includes from heapam_xlog.c? It > seems that we can remove the following includes: I have removed the redundant includes in the latest patch. > > 5. There are still WAL related codes in heapam.c: > > 4.1. log_heap_update() > 4.2. log_heap_new_cid() > 4.3. if (RelationNeedsWAL()) {...} in heap_insert() > 4.4. if (needwal) {...} in heap_multi_insert() > 4.5. if (RelationNeedsWAL()) {...} in heap_delete() > 4.6. if (RelationNeedsWAL()) {...}s in heap_update() > 4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple() > 4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec() > 4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative() > 4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative() > 4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update() > 4.12. log_heap_visible() > > Should we move them to head_xlog.c too? > > If we should do it, separated commits will be easy to > review. For example, the 0001 patch moves existing codes > to head_xlog.c as-is. The 0002 patch extracts WAL related > codes in heap_insert() to heap_xlog.c and so on. > > > Thanks, > -- > kou I followed the convention of most access methods. The “xlog” file includes the WAL replay logic only. The logic that generates the log records themselves stays with the code that performs the changes. Take nbtree as an example, you can also find WAL generating code in several _bt_insertxxx() functions inside the nbtinsert.c file. Please help review the updated file again. Thanks in advance! Yong v2-0001-heapam_refactor.patch Description: v2-0001-heapam_refactor.patch
Re: Separate HEAP WAL replay logic into its own file
Hi, I'm reviewing patches in commitfest 2024-07: https://commitfest.postgresql.org/48/ This is the 5th patch: https://commitfest.postgresql.org/48/5054/ FYI: https://commitfest.postgresql.org/48/4681/ is my patch. In "Re: Separate HEAP WAL replay logic into its own file" on Tue, 18 Jun 2024 01:12:42 +, "Li, Yong" wrote: > As a newcomer, when I was walking through the code looking > for WAL replay related code, it was relatively easy for me > to find them for the B-Tree access method because of the > “xlog” hint in the file names. It took me a while to > find the same for the heap access method. When I finally > found them (via text search), it was a small > surprise. Having different file organizations for > different access methods gives me this urge to make > everything consistent. I think it will make it easier for > newcomers, and it will reduce the mental load for everyone > to remember that heap replay is inside the heapam.c not > some “???xlog.c”. It makes sense. Here are my comments for your patch: 1. Could you create your patch by "git format-patch -vN master" or something? If you create your patch by "git format-patch", we can apply your patch by "git am XXX.patch". 2. I confirmed that all heapam.c -> heapam_xlog.c/heapam.h moves don't change implementations. I re-moved moved codes to heapam.c and there is no diff in heapam.c. 3. Could you remove '#include "access/heapam_xlog.h"' from heapam.c because it's needless now. BTW, it seems that we can remove more includes from heapam.c: diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index bc6d4868975..f1671072576 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -31,42 +31,24 @@ */ #include "postgres.h" -#include "access/bufmask.h" #include "access/heapam.h" -#include "access/heapam_xlog.h" #include "access/heaptoast.h" #include "access/hio.h" #include "access/multixact.h" -#include "access/parallel.h" -#include "access/relscan.h" #include "access/subtrans.h" #include "access/syncscan.h" -#include "access/sysattr.h" -#include "access/tableam.h" -#include "access/transam.h" #include "access/valid.h" #include "access/visibilitymap.h" -#include "access/xact.h" -#include "access/xlog.h" #include "access/xloginsert.h" -#include "access/xlogutils.h" -#include "catalog/catalog.h" #include "commands/vacuum.h" -#include "miscadmin.h" #include "pgstat.h" -#include "port/atomics.h" #include "port/pg_bitutils.h" -#include "storage/bufmgr.h" -#include "storage/freespace.h" #include "storage/lmgr.h" #include "storage/predicate.h" #include "storage/procarray.h" -#include "storage/standby.h" #include "utils/datum.h" #include "utils/injection_point.h" #include "utils/inval.h" -#include "utils/relcache.h" -#include "utils/snapmgr.h" #include "utils/spccache.h" --- We may want to work on removing needless includes as a separated cleanup task. 4. Could you remove needless includes from heapam_xlog.c? It seems that we can remove the following includes: diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c index b372f2b4afc..af4976f382d 100644 --- a/src/backend/access/heap/heapam_xlog.c +++ b/src/backend/access/heap/heapam_xlog.c @@ -16,16 +16,11 @@ #include "access/bufmask.h" #include "access/heapam.h" -#include "access/heapam_xlog.h" -#include "access/transam.h" #include "access/visibilitymap.h" #include "access/xlog.h" #include "access/xlogutils.h" -#include "port/atomics.h" -#include "storage/bufmgr.h" #include "storage/freespace.h" #include "storage/standby.h" -#include "utils/relcache.h" 5. There are still WAL related codes in heapam.c: 4.1. log_heap_update() 4.2. log_heap_new_cid() 4.3. if (RelationNeedsWAL()) {...} in heap_insert() 4.4. if (needwal) {...} in heap_multi_insert() 4.5. if (RelationNeedsWAL()) {...} in heap_delete() 4.6. if (RelationNeedsWAL()) {...}s in heap_update() 4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple() 4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec() 4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative() 4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative() 4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update() 4.12. log_heap_visible() Should we move them to head_xlog.c too? If we should do it, separated commits will be easy to review. For example, the 0001 patch moves existing codes to head_xlog.c as-is. The 0002 patch extracts WAL related codes in heap_insert() to heap_xlog.c and so on. Thanks, -- kou
Re: Separate HEAP WAL replay logic into its own file
> On Jun 18, 2024, at 20:42, Melanie Plageman wrote: > > External Email > > On Mon, Jun 17, 2024 at 9:12 PM Li, Yong wrote: >> >> As a newcomer, when I was walking through the code looking for WAL replay >> related code, it was relatively easy for me to find them for the B-Tree >> access method because of the “xlog” hint in the file names. It took me a >> while to find the same for the heap access method. When I finally found them >> (via text search), it was a small surprise. Having different file >> organizations for different access methods gives me this urge to make >> everything consistent. I think it will make it easier for newcomers, and it >> will reduce the mental load for everyone to remember that heap replay is >> inside the heapam.c not some “???xlog.c”. > > That makes sense. The branch for PG18 has not been cut yet, so I > recommend registering this patch for the July commitfest [1] so it > doesn't get lost. > > - Melanie > Thanks for the positive feedback. I’ve added the patch to the July CF. Yong
Re: Separate HEAP WAL replay logic into its own file
On Mon, Jun 17, 2024 at 9:12 PM Li, Yong wrote: > > As a newcomer, when I was walking through the code looking for WAL replay > related code, it was relatively easy for me to find them for the B-Tree > access method because of the “xlog” hint in the file names. It took me a > while to find the same for the heap access method. When I finally found them > (via text search), it was a small surprise. Having different file > organizations for different access methods gives me this urge to make > everything consistent. I think it will make it easier for newcomers, and it > will reduce the mental load for everyone to remember that heap replay is > inside the heapam.c not some “???xlog.c”. That makes sense. The branch for PG18 has not been cut yet, so I recommend registering this patch for the July commitfest [1] so it doesn't get lost. - Melanie [1] https://commitfest.postgresql.org/48/
Re: Separate HEAP WAL replay logic into its own file
> On Jun 17, 2024, at 23:01, Melanie Plageman wrote: > > External Email > > On Mon, Jun 17, 2024 at 2:20 AM Li, Yong wrote: >> >> Hi PostgreSQL hackers, >> >> For most access methods in PostgreSQL, the implementation of the access >> method itself and the implementation of its WAL replay logic are organized >> in separate source files. However, the HEAP access method is an exception. >> Both the access method and the WAL replay logic are collocated in the same >> heapam.c. To follow the pattern established by other access methods and to >> improve maintainability, I made the enclosed patch to separate HEAP’s replay >> logic into its own file. The changes are straightforward. Move the replay >> related functions into the new heapam_xlog.c file, push the common >> heap_execute_freeze_tuple() helper function into the heapam.h header, and >> adjust the build files. > > I'm not against this change, but I am curious at what inspired this. > Were you looking at Postgres code and simply noticed that there isn't > a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you > wanted to change that? Or is there some specific reason this would > help you as a Postgres developer, user, or ecosystem member? > > - Melanie As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me to find them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the same for the heap access method. When I finally found them (via text search), it was a small surprise. Having different file organizations for different access methods gives me this urge to make everything consistent. I think it will make it easier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the heapam.c not some “???xlog.c”. Yong
Re: Separate HEAP WAL replay logic into its own file
On Mon, Jun 17, 2024 at 2:20 AM Li, Yong wrote: > > Hi PostgreSQL hackers, > > For most access methods in PostgreSQL, the implementation of the access > method itself and the implementation of its WAL replay logic are organized in > separate source files. However, the HEAP access method is an exception. > Both the access method and the WAL replay logic are collocated in the same > heapam.c. To follow the pattern established by other access methods and to > improve maintainability, I made the enclosed patch to separate HEAP’s replay > logic into its own file. The changes are straightforward. Move the replay > related functions into the new heapam_xlog.c file, push the common > heap_execute_freeze_tuple() helper function into the heapam.h header, and > adjust the build files. I'm not against this change, but I am curious at what inspired this. Were you looking at Postgres code and simply noticed that there isn't a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you wanted to change that? Or is there some specific reason this would help you as a Postgres developer, user, or ecosystem member? - Melanie
Separate HEAP WAL replay logic into its own file
Hi PostgreSQL hackers, For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WAL replay logic are organized in separate source files. However, the HEAP access method is an exception. Both the access method and the WAL replay logic are collocated in the same heapam.c. To follow the pattern established by other access methods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file. The changes are straightforward. Move the replay related functions into the new heapam_xlog.c file, push the common heap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files. I hope people find this straightforward refactoring helpful. Yong heapam_refactor.patch Description: heapam_refactor.patch