Re: Out of memory error handling in frontend code

2023-10-06 Thread Frédéric Yhuel

Hi Daniel,

Thank you for your answer.

On 9/28/23 14:02, Daniel Gustafsson wrote:

>> On 28 Sep 2023, at 10:14, Frédéric Yhuel  wrote:
>>
>> After some time, we understood that the 20 million large objects were
>> responsible for pg_dump's huge memory usage (more than 10 GB).
>
> This sounds like a known issue [0] which has been reported several times, and
> one we should get around to fixing sometime.

Indeed, I saw some of these reports afterwards :)

>> I think a more useful error message would help for such cases.
>
> Knowing that this is a case that pops up, I agree that we could do better
> around the messaging here.

>> I haven't tried to get the patch ready for review; I know that the format of
>> the messages isn't right. I'd like to know what you think of the idea, first.
>
> I don't think adding more details is a bad idea, but it shouldn't require any
> knowledge about internals, so I think messages like the one below need to be
> reworded to be more helpful.
>
> +   if (loinfo == NULL)
> +   {
> +   pg_fatal("getLOs: out of memory");
> +   }



OK, here is a second version of the patch.

I didn't try to fix the path getLOs -> AssignDumpId -> catalogid_insert 
-> [...] -> catalogid_allocate, but that's annoying because it accounts 
for 11% of the memory allocations in valgrind's output.


Best regards,
Frédéric

From 575903c1fd4ccbe49c762d1d6424e2264ba6cfad Mon Sep 17 00:00:00 2001
From: Frédéric Yhuel 
Date: Mon, 18 Sep 2023 08:18:19 +0200
Subject: [PATCH] pg_dump: fix OOM handling

Exit with better error messages when there's not enough memory
to process large objects.
---
 src/bin/pg_dump/pg_backup_archiver.c | 45 +---
 src/bin/pg_dump/pg_backup_archiver.h |  4 +++
 src/bin/pg_dump/pg_dump.c| 33 
 src/common/fe_memutils.c | 14 +++--
 src/include/common/fe_memutils.h |  1 +
 5 files changed, 80 insertions(+), 17 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index ab351e457e..fedbe67a07 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1101,6 +1101,22 @@ WriteData(Archive *AHX, const void *data, size_t dLen)
 	AH->WriteDataPtr(AH, data, dLen);
 }
 
+#define fill_in_new_toc(a,b) \
+do { \
+	if (opts->b) \
+	{ \
+		newToc->a = pg_strdup_extended(opts->b, MCXT_ALLOC_NO_OOM); \
+		if (newToc->a == NULL) \
+		{ \
+			goto error; \
+		} \
+	} \
+	else \
+	{ \
+		newToc->a = NULL; \
+	} \
+} while(0)
+
 /*
  * Create a new TOC entry. The TOC was designed as a TOC, but is now the
  * repository for all metadata. But the name has stuck.
@@ -1118,7 +1134,11 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
 	TocEntry   *newToc;
 
-	newToc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
+	newToc = (TocEntry *) pg_malloc_extended(sizeof(TocEntry), MCXT_ALLOC_NO_OOM|MCXT_ALLOC_ZERO);
+	if (newToc == NULL)
+	{
+		goto error;
+	}
 
 	AH->tocCount++;
 	if (dumpId > AH->maxDumpId)
@@ -1133,15 +1153,15 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
 	newToc->dumpId = dumpId;
 	newToc->section = opts->section;
 
-	newToc->tag = pg_strdup(opts->tag);
-	newToc->namespace = opts->namespace ? pg_strdup(opts->namespace) : NULL;
-	newToc->tablespace = opts->tablespace ? pg_strdup(opts->tablespace) : NULL;
-	newToc->tableam = opts->tableam ? pg_strdup(opts->tableam) : NULL;
-	newToc->owner = opts->owner ? pg_strdup(opts->owner) : NULL;
-	newToc->desc = pg_strdup(opts->description);
-	newToc->defn = opts->createStmt ? pg_strdup(opts->createStmt) : NULL;
-	newToc->dropStmt = opts->dropStmt ? pg_strdup(opts->dropStmt) : NULL;
-	newToc->copyStmt = opts->copyStmt ? pg_strdup(opts->copyStmt) : NULL;
+	fill_in_new_toc(tag, tag);
+	fill_in_new_toc(namespace, namespace);
+	fill_in_new_toc(tablespace, tablespace);
+	fill_in_new_toc(tableam, tableam);
+	fill_in_new_toc(owner, owner);
+	fill_in_new_toc(desc, description);
+	fill_in_new_toc(defn, createStmt);
+	fill_in_new_toc(dropStmt, dropStmt);
+	fill_in_new_toc(copyStmt, copyStmt);
 
 	if (opts->nDeps > 0)
 	{
@@ -1166,6 +1186,11 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
 		AH->ArchiveEntryPtr(AH, newToc);
 
 	return newToc;
+
+error:
+	pg_log_error("Could not add a new archive entry: %s", strerror(errno));
+	pg_log_error_hint(LO_OOM_HINT);
+	exit(1);
 }
 
 /* Public */
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 18b38c17ab..626fe2cb11 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -97,6 +97,10 @@ typedef struct _archiveHandle ArchiveHandle;
 typedef struct _tocEntry TocEntry;
 struct ParallelState;
 
+#define LO_OOM_HINT "If the database contains a large number of large objects, "\
+		"consider using the \"-B\" option to exclude them from the dump "\
+		"if it makes sense. Otherwise, 

Re: Out of memory error handling in frontend code

2023-09-28 Thread Daniel Gustafsson
> On 28 Sep 2023, at 10:14, Frédéric Yhuel  wrote:

> After some time, we understood that the 20 million large objects were 
> responsible for pg_dump's huge memory usage (more than 10 GB).

This sounds like a known issue [0] which has been reported several times, and
one we should get around to fixing sometime.

> I think a more useful error message would help for such cases.

Knowing that this is a case that pops up, I agree that we could do better around
the messaging here.

> I haven't tried to get the patch ready for review; I know that the format of 
> the messages isn't right. I'd like to know what you think of the idea, 
> first.

I don't think adding more details is a bad idea, but it shouldn't require any
knowledge about internals, so I think messages like the one below need to be
reworded to be more helpful.

+   if (loinfo == NULL)
+   {
+   pg_fatal("getLOs: out of memory");
+   }

--
Daniel Gustafsson

[0] https://www.postgresql.org/message-id/7da8823d83a2b66bdd917aa6cb2c5c2619d86011.ca...@credativ.de





Out of memory error handling in frontend code

2023-09-28 Thread Frédéric Yhuel

Hello,

One of our customers recently complained that his pg_dump stopped 
abruptly with the message "out of memory".


After some time, we understood that the 20 million large objects were 
responsible for pg_dump's huge memory usage (more than 10 GB).


I think a more useful error message would help for such cases. Indeed, 
it's not always possible to ask the customer to run pg_dump with 
"valgrind --tool=massif" on their server.


Now, I understand that we don't want to add too much of this to the frontend 
code; it would be a lot of pain for not much gain.


But I wonder if we could add some checks in a few strategic places, as 
in the attached patch.


The idea would be to print a more useful error message in most cases, and 
keep the basic "out of memory" for the remaining ones.


I haven't tried to get the patch ready for review; I know that the format 
of the messages isn't right. I'd like to know what you think of the 
idea, first.


Best regards,
Frédéric

From 81aa4ae59778f1193d6e1a8c81931502c941e997 Mon Sep 17 00:00:00 2001
From: Frédéric Yhuel 
Date: Mon, 18 Sep 2023 08:18:19 +0200
Subject: [PATCH] pg_dump: fix OOM handling

Exit with better error messages when there's not enough memory
to process large objects.
---
 src/bin/pg_dump/pg_backup_archiver.c | 10 +++---
 src/bin/pg_dump/pg_backup_archiver.h |  6 --
 src/bin/pg_dump/pg_dump.c| 13 +++--
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 39ebcfec32..8370e26075 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1112,13 +1112,17 @@ WriteData(Archive *AHX, const void *data, size_t dLen)
 
 /* Public */
 TocEntry *
-ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
-			 ArchiveOpts *opts)
+ArchiveEntry2(Archive *AHX, CatalogId catalogId, DumpId dumpId,
+			 ArchiveOpts *opts, const char* caller)
 {
 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
 	TocEntry   *newToc;
 
-	newToc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
+	newToc = (TocEntry *) pg_malloc_extended(sizeof(TocEntry), MCXT_ALLOC_NO_OOM|MCXT_ALLOC_ZERO);
+	if (newToc == NULL)
+	{
+		pg_fatal("%s: could not add a new archive entry: %s", caller, strerror(errno));
+	}
 
 	AH->tocCount++;
 	if (dumpId > AH->maxDumpId)
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 18b38c17ab..60872744a6 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -400,8 +400,10 @@ typedef struct _archiveOpts
 } ArchiveOpts;
 #define ARCHIVE_OPTS(...) &(ArchiveOpts){__VA_ARGS__}
 /* Called to add a TOC entry */
-extern TocEntry *ArchiveEntry(Archive *AHX, CatalogId catalogId,
-			  DumpId dumpId, ArchiveOpts *opts);
+extern TocEntry *ArchiveEntry2(Archive *AHX, CatalogId catalogId,
+	  DumpId dumpId, ArchiveOpts *opts, const char* caller);
+
+#define ArchiveEntry(a, b, c, d) ArchiveEntry2(a, b, c, d, __func__)
 
 extern void WriteHead(ArchiveHandle *AH);
 extern void ReadHead(ArchiveHandle *AH);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7977d6a9c0..8413d4f115 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3566,7 +3566,11 @@ getLOs(Archive *fout)
 	/*
 	 * Each large object has its own "BLOB" archive entry.
 	 */
-	loinfo = (LoInfo *) pg_malloc(ntups * sizeof(LoInfo));
+	loinfo = (LoInfo *) pg_malloc_extended(ntups * sizeof(LoInfo), MCXT_ALLOC_NO_OOM);
+	if (loinfo == NULL)
+	{
+		pg_fatal("getLOs: out of memory");
+	}
 
 	for (i = 0; i < ntups; i++)
 	{
@@ -3606,7 +3610,12 @@ getLOs(Archive *fout)
 	 */
 	if (ntups > 0)
 	{
-		lodata = (DumpableObject *) pg_malloc(sizeof(DumpableObject));
+		lodata = (DumpableObject *) pg_malloc_extended(sizeof(DumpableObject), MCXT_ALLOC_NO_OOM);
+		if (lodata == NULL)
+		{
+			pg_fatal("getLOs: out of memory");
+		}
+
 		lodata->objType = DO_LARGE_OBJECT_DATA;
 		lodata->catId = nilCatalogId;
 		AssignDumpId(lodata);
-- 
2.39.2