Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-18 Thread Noah Misch
On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> On Sat, May 4, 2024 at 7:36 AM Joe Conway  wrote:
> > On 5/3/24 11:44, Peter Eisentraut wrote:
> > > On 03.05.24 16:13, Tom Lane wrote:
> > >> Peter Eisentraut  writes:
> > >>> On 30.04.24 19:29, Tom Lane wrote:
> >  Also, the bigger picture here is the seeming assumption that "if
> >  we change pg_trgm then it will be safe to replicate from x86 to
> >  arm".  I don't believe that that's a good idea and I'm unwilling
> >  to promise that it will work, regardless of what we do about
> >  char signedness.  That being the case, I don't want to invest a
> >  lot of effort in the signedness issue.  Option (1) is clearly
> >  a small change with little if any risk of future breakage.
> > >>
> > >>> But note that option 1 would prevent some replication that is currently
> > >>> working.
> > >>
> > >> The point of this thread though is that it's working only for small
> > >> values of "work".  People are rightfully unhappy if it seems to work
> > >> and then later they get bitten by compatibility problems.
> > >>
> > >> Treating char signedness as a machine property in pg_control would
> > >> signal that we don't intend to make it work, and would ensure that
> > >> even the most minimal testing would find out that it doesn't work.
> > >>
> > >> If we do not do that, it seems to me we have to buy into making
> > >> it work.  That would mean dealing with the consequences of an
> > >> incompatible change in pg_trgm indexes, and then going through
> > >> the same dance again the next time(s) similar problems are found.
> > >
> > > Yes, that is understood.  But anecdotally, replicating between x86-64 
> > > arm64 is
> > > occasionally used for upgrades or migrations.  In practice, this appears 
> > > to have
> > > mostly worked.  If we now discover that it won't work with certain index
> > > extension modules, it's usable for most users. Even if we say, you have to
> > > reindex everything afterwards, it's probably still useful for these 
> > > scenarios.

Like you, I would not introduce a ControlFileData block for sign-of-char,
given the signs of breakage in extension indexing only.  That would lose much
useful migration capability.  I'm sympathetic to Tom's point, which I'll
attempt to summarize as: a ControlFileData block is a promise we know how to
keep, whereas we should expect further bug discoveries without it.  At the
same time, I put more weight on the apparently-wide span of functionality that
works fine.  (I wonder whether any static analyzer does or practically could
detect char sign dependence with a decent false positive rate.)

> +1
> 
> How about extending amcheck to support GIN and GIst indexes so that it
> can detect potential data incompatibility due to changing 'char' to
> 'unsigned char'? I think these new tests would be useful also for
> users to check if they really need to reindex indexes due to such
> changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> Users who upgraded to PG18 can run the new amcheck tests on the
> primary as well as the standby.

If I were standardizing pg_trgm on one or the other notion of "char", I would
choose signed char, since I think it's still the majority.  More broadly, I
see these options to fix pg_trgm:

1. Change to signed char.  Every arm64 system needs to scan pg_trgm indexes.
2. Change to unsigned char.  Every x86 system needs to scan pg_trgm indexes.
3. Offer both, as an upgrade path.  For example, pg_trgm could have separate
   operator classes gin_trgm_ops and gin_trgm_ops_unsigned.  Running
   pg_upgrade on an unsigned-char system would automatically map v17
   gin_trgm_ops to v18 gin_trgm_ops_unsigned.  This avoids penalizing any
   architecture with upgrade-time scans.

Independently, having amcheck for GIN and/or GiST would be great.




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-14 Thread Masahiko Sawada
On Sat, May 4, 2024 at 7:36 AM Joe Conway  wrote:
>
> On 5/3/24 11:44, Peter Eisentraut wrote:
> > On 03.05.24 16:13, Tom Lane wrote:
> >> Peter Eisentraut  writes:
> >>> On 30.04.24 19:29, Tom Lane wrote:
>  Also, the bigger picture here is the seeming assumption that "if
>  we change pg_trgm then it will be safe to replicate from x86 to
>  arm".  I don't believe that that's a good idea and I'm unwilling
>  to promise that it will work, regardless of what we do about
>  char signedness.  That being the case, I don't want to invest a
>  lot of effort in the signedness issue.  Option (1) is clearly
>  a small change with little if any risk of future breakage.
> >>
> >>> But note that option 1 would prevent some replication that is currently
> >>> working.
> >>
> >> The point of this thread though is that it's working only for small
> >> values of "work".  People are rightfully unhappy if it seems to work
> >> and then later they get bitten by compatibility problems.
> >>
> >> Treating char signedness as a machine property in pg_control would
> >> signal that we don't intend to make it work, and would ensure that
> >> even the most minimal testing would find out that it doesn't work.
> >>
> >> If we do not do that, it seems to me we have to buy into making
> >> it work.  That would mean dealing with the consequences of an
> >> incompatible change in pg_trgm indexes, and then going through
> >> the same dance again the next time(s) similar problems are found.
> >
> > Yes, that is understood.  But anecdotally, replicating between x86-64 arm64 
> > is
> > occasionally used for upgrades or migrations.  In practice, this appears to 
> > have
> > mostly worked.  If we now discover that it won't work with certain index
> > extension modules, it's usable for most users. Even if we say, you have to
> > reindex everything afterwards, it's probably still useful for these 
> > scenarios.
>
> +1

+1

How about extending amcheck to support GIN and GIst indexes so that it
can detect potential data incompatibility due to changing 'char' to
'unsigned char'? I think these new tests would be useful also for
users to check if they really need to reindex indexes due to such
changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
Users who upgraded to PG18 can run the new amcheck tests on the
primary as well as the standby.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-03 Thread Joe Conway

On 5/3/24 11:44, Peter Eisentraut wrote:

On 03.05.24 16:13, Tom Lane wrote:

Peter Eisentraut  writes:

On 30.04.24 19:29, Tom Lane wrote:

Also, the bigger picture here is the seeming assumption that "if
we change pg_trgm then it will be safe to replicate from x86 to
arm".  I don't believe that that's a good idea and I'm unwilling
to promise that it will work, regardless of what we do about
char signedness.  That being the case, I don't want to invest a
lot of effort in the signedness issue.  Option (1) is clearly
a small change with little if any risk of future breakage.



But note that option 1 would prevent some replication that is currently
working.


The point of this thread though is that it's working only for small
values of "work".  People are rightfully unhappy if it seems to work
and then later they get bitten by compatibility problems.

Treating char signedness as a machine property in pg_control would
signal that we don't intend to make it work, and would ensure that
even the most minimal testing would find out that it doesn't work.

If we do not do that, it seems to me we have to buy into making
it work.  That would mean dealing with the consequences of an
incompatible change in pg_trgm indexes, and then going through
the same dance again the next time(s) similar problems are found.


Yes, that is understood.  But anecdotally, replicating between x86-64 arm64 is 
occasionally used for upgrades or migrations.  In practice, this appears to have 
mostly worked.  If we now discover that it won't work with certain index 
extension modules, it's usable for most users. Even if we say, you have to 
reindex everything afterwards, it's probably still useful for these scenarios.


+1

I have heard similar anecdotes, and the reported experience goes even further -- 
many such upgrade/migration uses, with exceedingly rare reported failures.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-03 Thread Peter Eisentraut

On 03.05.24 16:13, Tom Lane wrote:

Peter Eisentraut  writes:

On 30.04.24 19:29, Tom Lane wrote:

Also, the bigger picture here is the seeming assumption that "if
we change pg_trgm then it will be safe to replicate from x86 to
arm".  I don't believe that that's a good idea and I'm unwilling
to promise that it will work, regardless of what we do about
char signedness.  That being the case, I don't want to invest a
lot of effort in the signedness issue.  Option (1) is clearly
a small change with little if any risk of future breakage.



But note that option 1 would prevent some replication that is currently
working.


The point of this thread though is that it's working only for small
values of "work".  People are rightfully unhappy if it seems to work
and then later they get bitten by compatibility problems.

Treating char signedness as a machine property in pg_control would
signal that we don't intend to make it work, and would ensure that
even the most minimal testing would find out that it doesn't work.

If we do not do that, it seems to me we have to buy into making
it work.  That would mean dealing with the consequences of an
incompatible change in pg_trgm indexes, and then going through
the same dance again the next time(s) similar problems are found.


Yes, that is understood.  But anecdotally, replicating between x86-64 
arm64 is occasionally used for upgrades or migrations.  In practice, 
this appears to have mostly worked.  If we now discover that it won't 
work with certain index extension modules, it's usable for most users. 
Even if we say, you have to reindex everything afterwards, it's probably 
still useful for these scenarios.


The way I understand the original report, the issue has to do 
specifically with how signed and unsigned chars compare differently.  I 
don't imagine this is used anywhere in the table/heap code.  So it's 
plausible that this issue is really contained to indexes.


On the other hand, if we put in a check against this, then at least we 
can answer any user questions about this with more certainty: No, won't 
work, here is why.






Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 30.04.24 19:29, Tom Lane wrote:
>> Also, the bigger picture here is the seeming assumption that "if
>> we change pg_trgm then it will be safe to replicate from x86 to
>> arm".  I don't believe that that's a good idea and I'm unwilling
>> to promise that it will work, regardless of what we do about
>> char signedness.  That being the case, I don't want to invest a
>> lot of effort in the signedness issue.  Option (1) is clearly
>> a small change with little if any risk of future breakage.

> But note that option 1 would prevent some replication that is currently 
> working.

The point of this thread though is that it's working only for small
values of "work".  People are rightfully unhappy if it seems to work
and then later they get bitten by compatibility problems.

Treating char signedness as a machine property in pg_control would
signal that we don't intend to make it work, and would ensure that
even the most minimal testing would find out that it doesn't work.

If we do not do that, it seems to me we have to buy into making
it work.  That would mean dealing with the consequences of an
incompatible change in pg_trgm indexes, and then going through
the same dance again the next time(s) similar problems are found.

regards, tom lane




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-03 Thread Peter Eisentraut

On 30.04.24 19:29, Tom Lane wrote:

1) Assume that char signedness is somehow a property of bits-on-disk
even though it's weird.  Then pg_trgm indexes are correct, but we need
to store char signedness in pg_control.
2) Assume that char signedness is not a property of bits-on-disk.
Then pg_trgm indexes are buggy and need to be fixed.
What do you think?

Also, the bigger picture here is the seeming assumption that "if
we change pg_trgm then it will be safe to replicate from x86 to
arm".  I don't believe that that's a good idea and I'm unwilling
to promise that it will work, regardless of what we do about
char signedness.  That being the case, I don't want to invest a
lot of effort in the signedness issue.  Option (1) is clearly
a small change with little if any risk of future breakage.


But note that option 1 would prevent some replication that is currently 
working.






Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-30 Thread Masahiko Sawada
On Wed, May 1, 2024 at 2:29 AM Tom Lane  wrote:
>
> Alexander Korotkov  writes:
> > I agree that storing char signedness might seem weird.  But it appears
> > that we already store indexes that depend on char signedness.  So,
> > it's effectively property of bits-on-disk even though it affects
> > indirectly.  Then I see two options to make the picture consistent.
> > 1) Assume that char signedness is somehow a property of bits-on-disk
> > even though it's weird.  Then pg_trgm indexes are correct, but we need
> > to store char signedness in pg_control.
> > 2) Assume that char signedness is not a property of bits-on-disk.
> > Then pg_trgm indexes are buggy and need to be fixed.
> > What do you think?
>
> The problem with option (2) is the assumption that pg_trgm's behavior
> is the only bug of this kind, either now or in the future.  I think
> that's just about an impossible standard to meet, because there's no
> realistic way to test whether char signedness is affecting things.
> (Sure, you can compare results across platforms, but maybe you
> just didn't test the right case.)
>
> Also, the bigger picture here is the seeming assumption that "if
> we change pg_trgm then it will be safe to replicate from x86 to
> arm".  I don't believe that that's a good idea and I'm unwilling
> to promise that it will work, regardless of what we do about
> char signedness.  That being the case, I don't want to invest a
> lot of effort in the signedness issue.

I think that the char signedness issue is an issue also for developers
(and extension authors) since it could lead to confusion and potential
bugs in the future due to that. x86 developers would think of char as
always being signed and write code that will misbehave on arm
machines. For example, since logical replication should behave
correctly even in cross-arch replication all developers need to be
aware of that. I thought of using the -funsigned-char (or
-fsigned-char) compiler flag to avoid that but it would have a broader
impact not only on indexes.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-30 Thread Tom Lane
Alexander Korotkov  writes:
> I agree that storing char signedness might seem weird.  But it appears
> that we already store indexes that depend on char signedness.  So,
> it's effectively property of bits-on-disk even though it affects
> indirectly.  Then I see two options to make the picture consistent.
> 1) Assume that char signedness is somehow a property of bits-on-disk
> even though it's weird.  Then pg_trgm indexes are correct, but we need
> to store char signedness in pg_control.
> 2) Assume that char signedness is not a property of bits-on-disk.
> Then pg_trgm indexes are buggy and need to be fixed.
> What do you think?

The problem with option (2) is the assumption that pg_trgm's behavior
is the only bug of this kind, either now or in the future.  I think
that's just about an impossible standard to meet, because there's no
realistic way to test whether char signedness is affecting things.
(Sure, you can compare results across platforms, but maybe you
just didn't test the right case.)

Also, the bigger picture here is the seeming assumption that "if
we change pg_trgm then it will be safe to replicate from x86 to
arm".  I don't believe that that's a good idea and I'm unwilling
to promise that it will work, regardless of what we do about
char signedness.  That being the case, I don't want to invest a
lot of effort in the signedness issue.  Option (1) is clearly
a small change with little if any risk of future breakage.
Option (2) ... not so much.

regards, tom lane




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-30 Thread Alexander Korotkov
On Tue, Apr 30, 2024 at 7:54 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > Given this, should we try to do better with binary compatibility
> > checks using ControlFileData?  AFAICS they are supposed to check if
> > the database cluster is binary compatible with the running
> > architecture.  But it obviously allows incompatibilities.
>
> Perhaps.  pg_control already covers endianness, which I think
> is the root of the hashing differences I showed.  Adding a field
> for char signedness feels a little weird, since it's not directly
> a property of the bits-on-disk, but maybe we should.

I agree that storing char signedness might seem weird.  But it appears
that we already store indexes that depend on char signedness.  So,
it's effectively property of bits-on-disk even though it affects
indirectly.  Then I see two options to make the picture consistent.
1) Assume that char signedness is somehow a property of bits-on-disk
even though it's weird.  Then pg_trgm indexes are correct, but we need
to store char signedness in pg_control.
2) Assume that char signedness is not a property of bits-on-disk.
Then pg_trgm indexes are buggy and need to be fixed.
What do you think?

--
Regards,
Alexander Korotkov




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-30 Thread Tom Lane
Alexander Korotkov  writes:
> Given this, should we try to do better with binary compatibility
> checks using ControlFileData?  AFAICS they are supposed to check if
> the database cluster is binary compatible with the running
> architecture.  But it obviously allows incompatibilities.

Perhaps.  pg_control already covers endianness, which I think
is the root of the hashing differences I showed.  Adding a field
for char signedness feels a little weird, since it's not directly
a property of the bits-on-disk, but maybe we should.

regards, tom lane




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-30 Thread Alexander Korotkov
On Tue, Apr 23, 2024 at 5:57 PM Tom Lane  wrote:
> "Guo, Adam"  writes:
> > I would like to report an issue with the pg_trgm extension on
> > cross-architecture replication scenarios. When an x86_64 standby
> > server is replicating from an aarch64 primary server or vice versa,
> > the gist_trgm_ops opclass returns different results on the primary
> > and standby.
>
> I do not think that is a supported scenario.  Hash functions and
> suchlike are not guaranteed to produce the same results on different
> CPU architectures.  As a quick example, I get
>
> regression=# select hashfloat8(34);
>  hashfloat8
> 
>21570837
> (1 row)
>
> on x86_64 but
>
> postgres=# select hashfloat8(34);
>  hashfloat8
> 
>  -602898821
> (1 row)
>
> on ppc32 thanks to the endianness difference.

Given this, should we try to do better with binary compatibility
checks using ControlFileData?  AFAICS they are supposed to check if
the database cluster is binary compatible with the running
architecture.  But it obviously allows incompatibilities.

--
Regards,
Alexander Korotkov




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-30 Thread Tom Lane
Masahiko Sawada  writes:
> On Tue, Apr 30, 2024 at 12:37 PM Tom Lane  wrote:
>> I think this will break existing indexes that are working fine.
>> Yeah, it would have been better to avoid the difference, but
>> it's too late now.

> True. So it will be a PG18 item.

How will it be any better in v18?  It's still an on-disk
compatibility break for affected platforms.

Now, people could recover by reindexing affected indexes,
but I think we need to have a better justification than this
for making them do so.

regards, tom lane




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-30 Thread Masahiko Sawada
On Tue, Apr 30, 2024 at 12:37 PM Tom Lane  wrote:
>
> Masahiko Sawada  writes:
> > On Tue, Apr 23, 2024 at 11:57 PM Tom Lane  wrote:
> >> Reject as not a bug.  Discourage people from thinking that physical
> >> replication will work across architectures.
>
> > While cross-arch physical replication is not supported, I think having
> > architecture dependent differences is not good and It's legitimate to
> > fix it. FYI the 'char' data type comparison is done as though char is
> > unsigned. I've attached a small patch to fix it. What do you think?
>
> I think this will break existing indexes that are working fine.
> Yeah, it would have been better to avoid the difference, but
> it's too late now.

True. So it will be a PG18 item.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-29 Thread Tom Lane
Masahiko Sawada  writes:
> On Tue, Apr 23, 2024 at 11:57 PM Tom Lane  wrote:
>> Reject as not a bug.  Discourage people from thinking that physical
>> replication will work across architectures.

> While cross-arch physical replication is not supported, I think having
> architecture dependent differences is not good and It's legitimate to
> fix it. FYI the 'char' data type comparison is done as though char is
> unsigned. I've attached a small patch to fix it. What do you think?

I think this will break existing indexes that are working fine.
Yeah, it would have been better to avoid the difference, but
it's too late now.

regards, tom lane




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-29 Thread Masahiko Sawada
On Tue, Apr 23, 2024 at 11:57 PM Tom Lane  wrote:
>
> "Guo, Adam"  writes:
> > I would like to report an issue with the pg_trgm extension on
> > cross-architecture replication scenarios. When an x86_64 standby
> > server is replicating from an aarch64 primary server or vice versa,
> > the gist_trgm_ops opclass returns different results on the primary
> > and standby.
>
> I do not think that is a supported scenario.  Hash functions and
> suchlike are not guaranteed to produce the same results on different
> CPU architectures.  As a quick example, I get
>
> regression=# select hashfloat8(34);
>  hashfloat8
> 
>21570837
> (1 row)
>
> on x86_64 but
>
> postgres=# select hashfloat8(34);
>  hashfloat8
> 
>  -602898821
> (1 row)
>
> on ppc32 thanks to the endianness difference.
>
> > Given that this has problem has come up before and seems likely to
> > come up again, I'm curious what other broad solutions there might be
> > to resolve it?
>
> Reject as not a bug.  Discourage people from thinking that physical
> replication will work across architectures.

While cross-arch physical replication is not supported, I think having
architecture dependent differences is not good and It's legitimate to
fix it. FYI the 'char' data type comparison is done as though char is
unsigned. I've attached a small patch to fix it. What do you think?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_signedness_issue_in_pg_trgm.patch
Description: Binary data


Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-23 Thread Tom Lane
"Guo, Adam"  writes:
> I would like to report an issue with the pg_trgm extension on
> cross-architecture replication scenarios. When an x86_64 standby
> server is replicating from an aarch64 primary server or vice versa,
> the gist_trgm_ops opclass returns different results on the primary
> and standby.

I do not think that is a supported scenario.  Hash functions and
suchlike are not guaranteed to produce the same results on different
CPU architectures.  As a quick example, I get

regression=# select hashfloat8(34);
 hashfloat8 

   21570837
(1 row)

on x86_64 but

postgres=# select hashfloat8(34);
 hashfloat8 

 -602898821
(1 row)

on ppc32 thanks to the endianness difference.

> Given that this has problem has come up before and seems likely to
> come up again, I'm curious what other broad solutions there might be
> to resolve it?

Reject as not a bug.  Discourage people from thinking that physical
replication will work across architectures.

regards, tom lane




pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-04-23 Thread Guo, Adam
Hi all,

I would like to report an issue with the pg_trgm extension on 
cross-architecture replication scenarios. When an x86_64 standby server is 
replicating from an aarch64 primary server or vice versa, the gist_trgm_ops 
opclass returns different results on the primary and standby. Masahiko 
previously reported a similar issue affecting the pg_bigm extension [1].

To reproduce, execute the following on the x86_64 primary server:

CREATE EXTENSION pg_trgm;
CREATE TABLE tbl (c text);
CREATE INDEX ON tbl USING gist (c gist_trgm_ops);
INSERT INTO tbl VALUES ('Bóbr');

On the x86_64 primary server:

postgres=> select * from tbl where c like '%Bób%';
  c
--
Bóbr
(1 row)

On the aarch64 replica server:

postgres=> select * from tbl where c like '%Bób%';
c
---
(0 rows)

The root cause looks the same as the pg_bigm issue that Masahiko reported. To 
compare trigrams, pg_trgm uses a numerical comparison of chars [2]. On x86_64 a 
char is signed by default, whereas on aarch64 it is unsigned by default. 
gist_trgm_ops expects the trigram list to be sorted, but due to the different 
signedness of chars, the sort order is broken when replicating the values 
across architectures.

The different sort behaviour can be demonstrated using show_trgm.

On the x86_64 primary server:

postgres=> SELECT show_trgm('Bóbr');
show_trgm
--
{0x89194c,"  b","br ",0x707c72,0x7f7849}
(1 row)

On the aarch64 replica server:

postgres=> SELECT show_trgm('Bóbr');
show_trgm
--
{"  b","br ",0x707c72,0x7f7849,0x89194c}
(1 row)

One simple solution for this specific case is to declare the char signedness in 
the CMPPCHAR macro.

--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -42,7 +42,7 @@
typedef char trgm[3];
 #define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
-#define CMPPCHAR(a,b,i)  CMPCHAR( *(((const char*)(a))+i), *(((const 
char*)(b))+i) )
+#define CMPPCHAR(a,b,i)  CMPCHAR( *(((unsigned char*)(a))+i), *(((unsigned 
char*)(b))+i) )
#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? 
CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) )
 #define CPTRGM(a,b) do {   
\

Alternatively, Postgres can be compiled with -funsigned-char or -fsigned-char. 
I came across a code comment suggesting that this may not be a good idea in 
general [3].

Given that this has problem has come up before and seems likely to come up 
again, I'm curious what other broad solutions there might be to resolve it? 
Looking forward to any feedback, thanks!

Best,

Adam Guo
Amazon Web Services: https://aws.amazon.com

[1] 
https://osdn.net/projects/pgbigm/lists/archive/hackers/2024-February/000370.html
[2] 
https://github.com/postgres/postgres/blob/480bc6e3ed3a5719cdec076d4943b119890e8171/contrib/pg_trgm/trgm.h#L45
[3] 
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/cash.c#L114-L123