Re: Separate HEAP WAL replay logic into its own file

2024-09-18 Thread Michael Paquier
On Wed, Sep 18, 2024 at 08:40:02AM +, Li, Yong wrote:
> I am so glad to see that my patch got committed. Thank you a lot for it!
> This is my first accepted patch. It really means a lot to me.

No problem.  Thanks for the patch!
--
Michael


signature.asc
Description: PGP signature


Re: Separate HEAP WAL replay logic into its own file

2024-09-18 Thread Li, Yong


> On Sep 12, 2024, at 13:39, Michael Paquier  wrote:
> 
> External Email
> 
> 
> I was looking at all that, and this is only moving code around.  While
> the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad
> but historical, the header cleanup in heapam.c is nice.
> 
> Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to
> XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the
> opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h
> was less interesting.  Just to say that I am agreeing with you here
> and I have let this part as you suggested originally.
> 
> I was wondering for a bit about the order of the functions for heap
> and heap, but these are ordered in their own, which is also OK.  I
> have added a few more comments at the top of each subroutine for the
> records to be more consistent, and applied the result.
> —
> Michael
> 

I am so glad to see that my patch got committed. Thank you a lot for it!
This is my first accepted patch. It really means a lot to me.

Yong

Re: Separate HEAP WAL replay logic into its own file

2024-09-11 Thread Michael Paquier
On Thu, Sep 12, 2024 at 08:12:30AM +0900, Michael Paquier wrote:
> It looks like my mind was wondering away when I wrote this part.
> Sorry for the useless noise.

I was looking at all that, and this is only moving code around.  While
the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad
but historical, the header cleanup in heapam.c is nice.

Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to
XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the
opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h
was less interesting.  Just to say that I am agreeing with you here
and I have let this part as you suggested originally.

I was wondering for a bit about the order of the functions for heap
and heap, but these are ordered in their own, which is also OK.  I
have added a few more comments at the top of each subroutine for the
records to be more consistent, and applied the result.
--
Michael


signature.asc
Description: PGP signature


Re: Separate HEAP WAL replay logic into its own file

2024-09-11 Thread Michael Paquier
On Wed, Sep 11, 2024 at 04:41:49PM +0900, Michael Paquier wrote:
> +#include "access/heapam_xlog.h"
> 
> This is included in heapam.h, but missing from the patch.  I guess
> that you fat-fingered a `git add`.

It looks like my mind was wondering away when I wrote this part.
Sorry for the useless noise.
--
Michael


signature.asc
Description: PGP signature


Re: Separate HEAP WAL replay logic into its own file

2024-09-11 Thread Michael Paquier
On Tue, Jul 30, 2024 at 06:48:26AM +, Li, Yong wrote:
> Thank you Kou for your review. I will move the CF to the next
> phase and see what happens.

Quite a fan of what you are proposing here, knowing that heapam.c is
still 8.8k lines of code even after moving the 1.3k lines dedicated to
WAL records.

+#include "access/heapam_xlog.h"

This is included in heapam.h, but missing from the patch.  I guess
that you fat-fingered a `git add`.
--
Michael


signature.asc
Description: PGP signature


Re: Separate HEAP WAL replay logic into its own file

2024-07-29 Thread Li, Yong

> I think that this proposal is reasonable but we need to get
> attention from a committer to move forward this proposal.
> 
> 
> Thanks,
> —
> kou

Thank you Kou for your review. I will move the CF to the next
phase and see what happens.


Regards,
Yong




Re: Separate HEAP WAL replay logic into its own file

2024-07-29 Thread Sutou Kouhei
Hi,

In <599e67d2-2929-4858-b8bc-f9c4ae889...@ebay.com>
  "Re: Separate HEAP WAL replay logic into its own file" on Fri, 26 Jul 2024 
07:56:12 +,
  "Li, Yong"  wrote:

>> 1. Could you create your patch by "git format-patch -vN master"
>>   or something? If you create your patch by "git format-patch",
>>   we can apply your patch by "git am XXX.patch".
>>
> 
> Thanks for your review. I’ve updated the patch to follow your
> suggested format.

Thanks. I could apply your patch by "git am
v2-0001-heapam_refactor.patch".

Could you use the following format for the commit message
next time?


${TITLE}

${DESCRIPTION}


For example:


Separate HEAP WAL replay logic into its own file

Most access methods (i.e. nbtree and hash) use a separate
file with "xlog" in its name for its WAL replay logic. Heap
is one exception of this convention. To make it easier for
newcomers to find the WAL replay logic for the heap access
method, this patch isolates heap's replay logic in a new
heapam_xlog.c file. This patch is a pure refactoring with no
change to the logic.


This is a commonly used Git's commit message format. See
also other commit messages by "git log".

>> 5. There are still WAL related codes in heapam.c:
>>
>>   4.1. log_heap_update()
>>   4.2. log_heap_new_cid()
>>   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
>>   4.4. if (needwal) {...} in heap_multi_insert()
>>   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
>>   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
>>   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
>>   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
>>   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
>>   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
>>   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
>>   4.12. log_heap_visible()
>>
>>   Should we move them to head_xlog.c too?
>>
>>   If we should do it, separated commits will be easy to
>>   review. For example, the 0001 patch moves existing codes
>>   to head_xlog.c as-is. The 0002 patch extracts WAL related
>>   codes in heap_insert() to heap_xlog.c and so on.
> 
> I followed the convention of most access methods. The “xlog”
> file includes the WAL replay logic only. The logic that generates
> the log records themselves stays with the code that performs
> the changes. Take nbtree as an example, you can also find
> WAL generating code in several _bt_insertxxx() functions inside
> the nbtinsert.c file.

You're right. Sorry.


I think that this proposal is reasonable but we need to get
attention from a committer to move forward this proposal.


Thanks,
-- 
kou




Re: Separate HEAP WAL replay logic into its own file

2024-07-26 Thread Li, Yong


> On Jul 23, 2024, at 09:54, Sutou Kouhei  wrote:
>
>
> Here are my comments for your patch:
>
> 1. Could you create your patch by "git format-patch -vN master"
>   or something? If you create your patch by "git format-patch",
>   we can apply your patch by "git am XXX.patch".
>

Thanks for your review. I’ve updated the patch to follow your
suggested format.

>
> 3. Could you remove '#include "access/heapam_xlog.h"' from
>   heapam.c because it's needless now.
>
>   BTW, it seems that we can remove more includes from
>   heapam.c:
>
> 4. Could you remove needless includes from heapam_xlog.c? It
>   seems that we can remove the following includes:

I have removed the redundant includes in the latest patch.

>
> 5. There are still WAL related codes in heapam.c:
>
>   4.1. log_heap_update()
>   4.2. log_heap_new_cid()
>   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
>   4.4. if (needwal) {...} in heap_multi_insert()
>   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
>   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
>   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
>   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
>   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
>   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
>   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
>   4.12. log_heap_visible()
>
>   Should we move them to head_xlog.c too?
>
>   If we should do it, separated commits will be easy to
>   review. For example, the 0001 patch moves existing codes
>   to head_xlog.c as-is. The 0002 patch extracts WAL related
>   codes in heap_insert() to heap_xlog.c and so on.
>
>
> Thanks,
> --
> kou

I followed the convention of most access methods. The “xlog”
file includes the WAL replay logic only. The logic that generates
the log records themselves stays with the code that performs
the changes. Take nbtree as an example, you can also find
WAL generating code in several _bt_insertxxx() functions inside
the nbtinsert.c file.

Please help review the updated file again. Thanks in advance!


Yong






v2-0001-heapam_refactor.patch
Description: v2-0001-heapam_refactor.patch


Re: Separate HEAP WAL replay logic into its own file

2024-07-22 Thread Sutou Kouhei
Hi,

I'm reviewing patches in commitfest 2024-07:
https://commitfest.postgresql.org/48/

This is the 5th patch:
https://commitfest.postgresql.org/48/5054/

FYI: https://commitfest.postgresql.org/48/4681/ is my patch.

In 
  "Re: Separate HEAP WAL replay logic into its own file" on Tue, 18 Jun 2024 
01:12:42 +,
  "Li, Yong"  wrote:

> As a newcomer, when I was walking through the code looking
> for WAL replay related code, it was relatively easy for me
> to find them for the B-Tree access method because of the
> “xlog” hint in the file names. It took me a while to
> find the same for the heap access method. When I finally
> found them (via text search), it was a small
> surprise. Having different file organizations for
> different access methods gives me this urge to make
> everything consistent. I think it will make it easier for
> newcomers, and it will reduce the mental load for everyone
> to remember that heap replay is inside the heapam.c not
> some “???xlog.c”.

It makes sense.


Here are my comments for your patch:

1. Could you create your patch by "git format-patch -vN master"
   or something? If you create your patch by "git format-patch",
   we can apply your patch by "git am XXX.patch".

2. I confirmed that all heapam.c -> heapam_xlog.c/heapam.h
   moves don't change implementations. I re-moved moved
   codes to heapam.c and there is no diff in heapam.c.

3. Could you remove '#include "access/heapam_xlog.h"' from
   heapam.c because it's needless now.

   BTW, it seems that we can remove more includes from
   heapam.c:


diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bc6d4868975..f1671072576 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -31,42 +31,24 @@
  */
 #include "postgres.h"
 
-#include "access/bufmask.h"
 #include "access/heapam.h"
-#include "access/heapam_xlog.h"
 #include "access/heaptoast.h"
 #include "access/hio.h"
 #include "access/multixact.h"
-#include "access/parallel.h"
-#include "access/relscan.h"
 #include "access/subtrans.h"
 #include "access/syncscan.h"
-#include "access/sysattr.h"
-#include "access/tableam.h"
-#include "access/transam.h"
 #include "access/valid.h"
 #include "access/visibilitymap.h"
-#include "access/xact.h"
-#include "access/xlog.h"
 #include "access/xloginsert.h"
-#include "access/xlogutils.h"
-#include "catalog/catalog.h"
 #include "commands/vacuum.h"
-#include "miscadmin.h"
 #include "pgstat.h"
-#include "port/atomics.h"
 #include "port/pg_bitutils.h"
-#include "storage/bufmgr.h"
-#include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "storage/procarray.h"
-#include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/injection_point.h"
 #include "utils/inval.h"
-#include "utils/relcache.h"
-#include "utils/snapmgr.h"
 #include "utils/spccache.h"
---

   We may want to work on removing needless includes as a
   separated cleanup task.

4. Could you remove needless includes from heapam_xlog.c? It
   seems that we can remove the following includes:


diff --git a/src/backend/access/heap/heapam_xlog.c 
b/src/backend/access/heap/heapam_xlog.c
index b372f2b4afc..af4976f382d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -16,16 +16,11 @@
 
 #include "access/bufmask.h"
 #include "access/heapam.h"
-#include "access/heapam_xlog.h"
-#include "access/transam.h"
 #include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xlogutils.h"
-#include "port/atomics.h"
-#include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "storage/standby.h"
-#include "utils/relcache.h"


5. There are still WAL related codes in heapam.c:

   4.1. log_heap_update()
   4.2. log_heap_new_cid()
   4.3. if (RelationNeedsWAL()) {...} in heap_insert()
   4.4. if (needwal) {...} in heap_multi_insert()
   4.5. if (RelationNeedsWAL()) {...} in heap_delete()
   4.6. if (RelationNeedsWAL()) {...}s in heap_update()
   4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
   4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
   4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
   4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
   4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
   4.12. log_heap_visible()

   Should we move them to head_xlog.c too?

   If we should do it, separated commits will be easy to
   review. For example, the 0001 patch moves existing codes
   to head_xlog.c as-is. The 0002 patch extracts WAL related
   codes in heap_insert() to heap_xlog.c and so on.


Thanks,
-- 
kou




Re: Separate HEAP WAL replay logic into its own file

2024-06-18 Thread Li, Yong


> On Jun 18, 2024, at 20:42, Melanie Plageman  wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 9:12 PM Li, Yong  wrote:
>> 
>> As a newcomer, when I was walking through the code looking for WAL replay 
>> related code, it was relatively easy for me to find them for the B-Tree 
>> access method because of the “xlog” hint in the file names. It took me a 
>> while to find the same for the heap access method. When I finally found them 
>> (via text search), it was a small surprise. Having different file 
>> organizations for different access methods gives me this urge to make 
>> everything consistent. I think it will make it easier for newcomers, and it 
>> will reduce the mental load for everyone to remember that heap replay is 
>> inside the heapam.c not some “???xlog.c”.
> 
> That makes sense. The branch for PG18 has not been cut yet, so I
> recommend registering this patch for the July commitfest [1] so it
> doesn't get lost.
> 
> - Melanie
> 

Thanks for the positive feedback. I’ve added the patch to the July CF.

Yong



Re: Separate HEAP WAL replay logic into its own file

2024-06-18 Thread Melanie Plageman
On Mon, Jun 17, 2024 at 9:12 PM Li, Yong  wrote:
>
> As a newcomer, when I was walking through the code looking for WAL replay 
> related code, it was relatively easy for me to find them for the B-Tree 
> access method because of the “xlog” hint in the file names. It took me a 
> while to find the same for the heap access method. When I finally found them 
> (via text search), it was a small surprise. Having different file 
> organizations for different access methods gives me this urge to make 
> everything consistent. I think it will make it easier for newcomers, and it 
> will reduce the mental load for everyone to remember that heap replay is 
> inside the heapam.c not some “???xlog.c”.

That makes sense. The branch for PG18 has not been cut yet, so I
recommend registering this patch for the July commitfest [1] so it
doesn't get lost.

- Melanie

[1] https://commitfest.postgresql.org/48/




Re: Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Li, Yong


> On Jun 17, 2024, at 23:01, Melanie Plageman  wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 2:20 AM Li, Yong  wrote:
>> 
>> Hi PostgreSQL hackers,
>> 
>> For most access methods in PostgreSQL, the implementation of the access 
>> method itself and the implementation of its WAL replay logic are organized 
>> in separate source files.  However, the HEAP access method is an exception.  
>> Both the access method and the WAL replay logic are collocated in the same 
>> heapam.c.  To follow the pattern established by other access methods and to 
>> improve maintainability, I made the enclosed patch to separate HEAP’s replay 
>> logic into its own file.  The changes are straightforward.  Move the replay 
>> related functions into the new heapam_xlog.c file, push the common 
>> heap_execute_freeze_tuple() helper function into the heapam.h header, and 
>> adjust the build files.
> 
> I'm not against this change, but I am curious at what inspired this.
> Were you looking at Postgres code and simply noticed that there isn't
> a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
> wanted to change that? Or is there some specific reason this would
> help you as a Postgres developer, user, or ecosystem member?
> 
> - Melanie

As a newcomer, when I was walking through the code looking for WAL replay 
related code, it was relatively easy for me to find them for the B-Tree access 
method because of the “xlog” hint in the file names. It took me a while to find 
the same for the heap access method. When I finally found them (via text 
search), it was a small surprise. Having different file organizations for 
different access methods gives me this urge to make everything consistent. I 
think it will make it easier for newcomers, and it will reduce the mental load 
for everyone to remember that heap replay is inside the heapam.c not some 
“???xlog.c”.

Yong

Re: Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Melanie Plageman
On Mon, Jun 17, 2024 at 2:20 AM Li, Yong  wrote:
>
> Hi PostgreSQL hackers,
>
> For most access methods in PostgreSQL, the implementation of the access 
> method itself and the implementation of its WAL replay logic are organized in 
> separate source files.  However, the HEAP access method is an exception.  
> Both the access method and the WAL replay logic are collocated in the same 
> heapam.c.  To follow the pattern established by other access methods and to 
> improve maintainability, I made the enclosed patch to separate HEAP’s replay 
> logic into its own file.  The changes are straightforward.  Move the replay 
> related functions into the new heapam_xlog.c file, push the common 
> heap_execute_freeze_tuple() helper function into the heapam.h header, and 
> adjust the build files.

I'm not against this change, but I am curious at what inspired this.
Were you looking at Postgres code and simply noticed that there isn't
a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
wanted to change that? Or is there some specific reason this would
help you as a Postgres developer, user, or ecosystem member?

- Melanie




Separate HEAP WAL replay logic into its own file

2024-06-16 Thread Li, Yong
Hi PostgreSQL hackers,

For most access methods in PostgreSQL, the implementation of the access method 
itself and the implementation of its WAL replay logic are organized in separate 
source files.  However, the HEAP access method is an exception.  Both the 
access method and the WAL replay logic are collocated in the same heapam.c.  To 
follow the pattern established by other access methods and to improve 
maintainability, I made the enclosed patch to separate HEAP’s replay logic into 
its own file.  The changes are straightforward.  Move the replay related 
functions into the new heapam_xlog.c file, push the common 
heap_execute_freeze_tuple() helper function into the heapam.h header, and 
adjust the build files.

I hope people find this straightforward refactoring helpful.


Yong





heapam_refactor.patch
Description: heapam_refactor.patch