Re: Fix some ubsan/asan related issues

2024-02-05 Thread Tristan Partin

On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:

Hi,

On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
> From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Mon, 29 Jan 2024 18:03:39 -0600
> Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
>  NULL

> If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
> the API contract of memcpy in glibc. The two pointer arguments are
> marked as nonnull, even in the event the amount to copy is 0 bytes.

It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
that something useful?


Dropped. Will change on the Neon side. Should we add an assert 
somewhere for good measure? Near the memcpy in question?



> From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Wed, 24 Jan 2024 17:07:01 -0600
> Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
> 
> The ecpg is parser is extremely leaky, so we need to silence leak

> detection.

This does stuff beyond epcg...


Dropped.


> +if get_option('b_sanitize').contains('address')
> +  cdata.set('USE_ADDRESS_SANITIZER', 1)
> +endif
>  
>  ###

>  # NLS / Gettext
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index ac409b0006..e18e716d9c 100644
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
> @@ -338,6 +338,17 @@ do { \
>output_failed = true, output_errno = errno; \
>  } while (0)
>  
> +#ifdef USE_ADDRESS_SANITIZER


When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
implement this ourselves.


Thanks!


> +const char *__asan_default_options(void);
> +
> +const char *__asan_default_options(void)
> +{
> +  return "detect_leaks=0";
> +}
> +
> +#endif

Wonder if we should move this into some static library and link it into all
binaries that don't want leak detection? It doesn't seem great to have to
adjust this in a bunch of files if we want to adjust the options...


See attached patches. Here is what I found to be necessary to get 
a -Db_sanitize=address,undefined build to successfully make it through 
tests. I do have a few concerns about the patch.


1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak 
  sanitizer is enabled. So you will see me use this, to make some 
  include directives work. I don't like this as a final solution 
  because someone could use -fsanitize=leak.
2. I tracked down what seems to be a valid leak in adt/xml.c. Attached 
  (test.sql) is a fairly minimal reproduction of what is needed to show 
  the leak. I didn't spend too much time tracking it down. Might get to 
  it later, who knows. Below you will find the backtrace, and whoever 
  wants to try their hand at fixing it will need to comment out 
  xmlNewNode in the leak.supp file.
3. I don't love the new library name. Maybe it should be name more lsan 
  specific.
4. Should pg_attribute_no_asan be renamed to 
  pg_attribute_no_sanitize_address? That would match 
  pg_attribute_no_sanitize_alignment.


I will also attach a Meson test log for good measure. I didn't try 
testing anything that requires PG_TEST_EXTRA, but I suspect that 
everything will be fine.


Alexander, I haven't yet gotten to the things you pointed out in the 
sibling thread.


==221848==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 120 byte(s) in 1 object(s) allocated from:
   #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 
7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
   #1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 
3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
   #2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
   #3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
   #4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
   #5 0xe91f2c in ExecInterpExprStillValid 
../src/backend/executor/execExprInterp.c:1881
   #6 0x109632d in ExecEvalExprSwitchContext 
../src/include/executor/executor.h:355
   #7 0x109655d in ExecProject ../src/include/executor/executor.h:389
   #8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
   #9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
   #10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
   #11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
   #12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
   #13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
   #14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
   #15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
   #16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
   #17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
   #18 0x170dbce in BackendRun 

Re: Fix some ubsan/asan related issues

2024-01-31 Thread Tristan Partin

On Tue Jan 30, 2024 at 10:00 PM CST, Alexander Lakhin wrote:

Hello,

30.01.2024 18:57, Tristan Partin wrote:
> Patch 1:
>
> Passing NULL as a second argument to memcpy breaks ubsan, ...

Maybe you would like to fix also one more similar place, reached with:
create extension xml2;
select xslt_process('',
$$http://www.w3.org/1999/XSL/Transform;>


$$);

varlena.c:201:26: runtime error: null pointer passed as argument 2, which is 
declared to never be null

There is also an issue with pg_bsd_indent, I stumble upon when doing
`make check-world` with the sanitizers enabled:
https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com

31.01.2024 00:23, Andres Freund wrote:
> The reason asan fails is that it uses a "shadow stack" to track stack variable
> lifetimes. These confuse our stack depth check. CI doesn't have the issue
> because the compiler doesn't yet enable the feature, locally I get around it
> by using ASAN_OPTIONS=detect_stack_use_after_return=0:...

Even with detect_stack_use_after_return=0, clang-18's asan makes the test
012_subtransactions.pl fail:
2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: 
SELECT hs_subxids(201);
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth 
limit exceeded
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth 
(currently 2048kB), after ensuring the platform's stack depth limit is adequate.


(All the other tests pass.)
Though the same test passes when I use clang-16.


Thanks Alexander! I will try and take a look at these.

--
Tristan Partin
Neon (https://neon.tech)




Re: Fix some ubsan/asan related issues

2024-01-30 Thread Alexander Lakhin

Hello,

30.01.2024 18:57, Tristan Partin wrote:

Patch 1:

Passing NULL as a second argument to memcpy breaks ubsan, ...


Maybe you would like to fix also one more similar place, reached with:
create extension xml2;
select xslt_process('',
$$http://www.w3.org/1999/XSL/Transform;>


$$);

varlena.c:201:26: runtime error: null pointer passed as argument 2, which is 
declared to never be null

There is also an issue with pg_bsd_indent, I stumble upon when doing
`make check-world` with the sanitizers enabled:
https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com

31.01.2024 00:23, Andres Freund wrote:

The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes. These confuse our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...


Even with detect_stack_use_after_return=0, clang-18's asan makes the test
012_subtransactions.pl fail:
2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: 
SELECT hs_subxids(201);
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth 
limit exceeded
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth 
(currently 2048kB), after ensuring the platform's stack depth limit is adequate.


(All the other tests pass.)
Though the same test passes when I use clang-16.

Best regards,
Alexander

Re: Fix some ubsan/asan related issues

2024-01-30 Thread Andres Freund
Hi,

On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
> From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Mon, 29 Jan 2024 18:03:39 -0600
> Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
>  NULL

> If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
> the API contract of memcpy in glibc. The two pointer arguments are
> marked as nonnull, even in the event the amount to copy is 0 bytes.

It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
that something useful?


> From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Wed, 24 Jan 2024 17:07:01 -0600
> Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
> 
> The ecpg is parser is extremely leaky, so we need to silence leak
> detection.

This does stuff beyond epcg...


> +if get_option('b_sanitize').contains('address')
> +  cdata.set('USE_ADDRESS_SANITIZER', 1)
> +endif
>  
>  ###
>  # NLS / Gettext
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index ac409b0006..e18e716d9c 100644
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
> @@ -338,6 +338,17 @@ do { \
>   output_failed = true, output_errno = errno; \
>  } while (0)
>  
> +#ifdef USE_ADDRESS_SANITIZER

When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
implement this ourselves.


> +const char *__asan_default_options(void);
> +
> +const char *__asan_default_options(void)
> +{
> + return "detect_leaks=0";
> +}
> +
> +#endif

Wonder if we should move this into some static library and link it into all
binaries that don't want leak detection? It doesn't seem great to have to
adjust this in a bunch of files if we want to adjust the options...

Greetings,

Andres Freund




Re: Fix some ubsan/asan related issues

2024-01-30 Thread Andres Freund
Hi,

On 2024-01-30 22:05:28 +0200, Heikki Linnakangas wrote:
> On 30/01/2024 17:57, Tristan Partin wrote:
> > In my effort to try to see if the test suite would pass with asan
> > enabled, I ran into a max_stack_depth issue. I tried maxing it out
> > (hence, the patch), but that still didn't remedy my issue. I tried to
> > look on the list for any relevant emails, but nothing turned up. Maybe
> > others have not attempted this before? Seems doubtful.
> > 
> > Not entirely sure how to fix this issue. I personally find asan
> > extremely effective, even more than valgrind, so it would be great if
> > I could run Postgres with asan enabled to catch various stupid C issues
> > I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
> > just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
> > seem to leave enough stack space for Postgres.
> 
> I'm a bit confused by these. We already run with sanitizer in the cirrus CI.
> What does this enable that we're not already doing?

The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes. These confuse our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...

The checks are actually quite useful, so making our stack depth check work
with asan would be worthwhile.

I discussed this in a bit more in
https://postgr.es/m/20231129193920.4vphw7dqxjzf5v5b%40awork3.anarazel.de

Greetings,

Andres Freund




Re: Fix some ubsan/asan related issues

2024-01-30 Thread Heikki Linnakangas

On 30/01/2024 17:57, Tristan Partin wrote:

Patch 1:

Passing NULL as a second argument to memcpy breaks ubsan, and there
didn't seem to be anything preventing that in the LogLogicalMessage()
codepath. Here is a preventative measure in LogLogicalMessage() and an
Assert() in CopyXLogRecordToWAL().


For the audience: We ran into this one with the neon extension. The 
solution is to call LogLogicalMessage("", 0, ...) instead of 
LogLogicalMessage(NULL, 0, ...). . But it's true that it's pointlessfor 
LogLogicalMessage to call XLogRegisterData() if the message is empty. If 
we do this, we should check for 'size == 0' rather than 'message == NULL'.



Patch 2:

Support building with -Db_sanitize=address in Meson. Various executables
are leaky which can cause the builds to fail and tests to fail, when we
are fine leaking this memory.

Personally, I am a big stickler for always freeing my memory in
executables even if it gets released at program exit because it makes
the leak sanitizer much more effective. However (!), I am not here to
change the philosophy of memory management in one-off executables, so
I have just silenced memory leaks in various executables for now.

Patch 3:

THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!

In my effort to try to see if the test suite would pass with asan
enabled, I ran into a max_stack_depth issue. I tried maxing it out
(hence, the patch), but that still didn't remedy my issue. I tried to
look on the list for any relevant emails, but nothing turned up. Maybe
others have not attempted this before? Seems doubtful.

Not entirely sure how to fix this issue. I personally find asan
extremely effective, even more than valgrind, so it would be great if
I could run Postgres with asan enabled to catch various stupid C issues
I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
seem to leave enough stack space for Postgres.


I'm a bit confused by these. We already run with sanitizer in the cirrus 
CI. What does this enable that we're not already doing?


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Fix some ubsan/asan related issues

2024-01-30 Thread Tristan Partin
Spend so much time writing out the email, I once again forget 
attachments...UGH.


--
Tristan Partin
Neon (https://neon.tech)
From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 29 Jan 2024 18:03:39 -0600
Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
 NULL

If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
the API contract of memcpy in glibc. The two pointer arguments are
marked as nonnull, even in the event the amount to copy is 0 bytes.
---
 src/backend/access/transam/xlog.c | 1 +
 src/backend/replication/logical/message.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 478377c4a2..929888beb5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1288,6 +1288,7 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
 		}
 
 		Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || rdata_len == 0);
+		Assert(rdata_data != NULL);
 		memcpy(currpos, rdata_data, rdata_len);
 		currpos += rdata_len;
 		CurrPos += rdata_len;
diff --git a/src/backend/replication/logical/message.c b/src/backend/replication/logical/message.c
index 2ac34e7781..126c57ef6e 100644
--- a/src/backend/replication/logical/message.c
+++ b/src/backend/replication/logical/message.c
@@ -67,7 +67,8 @@ LogLogicalMessage(const char *prefix, const char *message, size_t size,
 	XLogBeginInsert();
 	XLogRegisterData((char *) , SizeOfLogicalMessage);
 	XLogRegisterData(unconstify(char *, prefix), xlrec.prefix_size);
-	XLogRegisterData(unconstify(char *, message), size);
+	if (message)
+		XLogRegisterData(unconstify(char *, message), size);
 
 	/* allow origin filtering */
 	XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
-- 
Tristan Partin
Neon (https://neon.tech)

From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Wed, 24 Jan 2024 17:07:01 -0600
Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address

The ecpg is parser is extremely leaky, so we need to silence leak
detection.
---
 meson.build|  3 +++
 src/bin/initdb/initdb.c| 11 +++
 src/bin/pg_config/pg_config.c  | 10 ++
 src/bin/pg_resetwal/pg_resetwal.c  | 10 ++
 src/include/pg_config.h.in |  5 +
 src/interfaces/ecpg/preproc/ecpg.c | 11 +++
 6 files changed, 50 insertions(+)

diff --git a/meson.build b/meson.build
index 8ed51b6aae..d8c524d6f6 100644
--- a/meson.build
+++ b/meson.build
@@ -2530,6 +2530,9 @@ cdata.set_quoted('PG_VERSION_STR',
   )
 )
 
+if get_option('b_sanitize').contains('address')
+  cdata.set('USE_ADDRESS_SANITIZER', 1)
+endif
 
 ###
 # NLS / Gettext
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ac409b0006..e18e716d9c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -338,6 +338,17 @@ do { \
 		output_failed = true, output_errno = errno; \
 } while (0)
 
+#ifdef USE_ADDRESS_SANITIZER
+
+const char *__asan_default_options(void);
+
+const char *__asan_default_options(void)
+{
+	return "detect_leaks=0";
+}
+
+#endif
+
 /*
  * Escape single quotes and backslashes, suitably for insertions into
  * configuration files or SQL E'' strings.
diff --git a/src/bin/pg_config/pg_config.c b/src/bin/pg_config/pg_config.c
index 77d09ccfc4..26d0b2f62b 100644
--- a/src/bin/pg_config/pg_config.c
+++ b/src/bin/pg_config/pg_config.c
@@ -67,6 +67,16 @@ static const InfoItem info_items[] = {
 	{NULL, NULL}
 };
 
+#ifdef USE_ADDRESS_SANITIZER
+
+const char *__asan_default_options(void);
+
+const char *__asan_default_options(void)
+{
+	return "detect_leaks=0";
+}
+
+#endif
 
 static void
 help(void)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..54f1ce5e44 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -89,6 +89,16 @@ static void KillExistingWALSummaries(void);
 static void WriteEmptyXLOG(void);
 static void usage(void);
 
+#ifdef USE_ADDRESS_SANITIZER
+
+const char *__asan_default_options(void);
+
+const char *__asan_default_options(void)
+{
+	return "detect_leaks=0";
+}
+
+#endif
 
 int
 main(int argc, char *argv[])
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 07e73567dc..ce0c700b6d 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -668,6 +668,11 @@
 /* Define to 1 if strerror_r() returns int. */
 #undef STRERROR_R_INT
 
+/* Define to 1 if using the address sanitizer. Typically this can be detecte
+ * with __has_feature(address_sanitizer), but GCC doesn't support it with C99.
+ * Remove it when the standard is bumped. */
+#undef USE_ADDRESS_SANITIZER
+
 /* Define to 1 to use ARMv8 CRC Extension. */
 #undef USE_ARMV8_CRC32C
 
diff --git