I have committed some of the unrelated formatting changes separately, so
what's left now is attached.
On 17.05.23 01:38, Andres Freund wrote:
- I left this for smgrzeroextend():
FIXME: why both blocknum and nblocks
I guess you're suggesting that we would do an lstat() to figure out the size
instead? Or use some cached size? That'd not be trivial to add - and it just
seems unrelated to smgzerorextend(), it's just as true for smgrextend().
What smgzerorextend() does now seems sensible to me. I'd just like one
or two sentences that document its API. Right now, blocknum and nblocks
are not documented at all. Of course, we can guess, but I'm also
interested in edge cases. What are you supposed to pass for blocknum?
What happens if it's not right after the current last block? You
answered that, but is that just what happens to happen, or do we
actually want to support that? What happens if blocknum is *before* the
currently last block? Would that overwrite the last existing blocks
with zero? What if nblocks is negative? Does that zero out blocks
backwards?
Obviously, the answer for most of these right now is that you're not
supposed to do that. But as the smgrextend() + hash index case shows,
these things tend to grow in unexpected directions.
Also, slightly unrelated to the API, did we consider COW file systems?
Like, is explicitly allocating blocks of zeroes sensible on btrfs?
From 17ffc6055c7755d110ca19f21bc328bf19197896 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Fri, 19 May 2023 16:49:03 +0200
Subject: [PATCH v2] WIP: Improve smgr source code comments
Discussion:
https://www.postgresql.org/message-id/flat/22fed8ba-01c3-2008-a256-4ea912d68fab%40enterprisedb.com
---
src/backend/storage/smgr/md.c | 17 +++++------------
src/backend/storage/smgr/smgr.c | 24 ++++++++++++++++--------
2 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 42e3501255..a19bab47f2 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -449,11 +449,11 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber
forknum, bool isRedo)
/*
* mdextend() -- Add a block to the specified relation.
*
- * The semantics are nearly the same as mdwrite(): write at the
- * specified position. However, this is to be used for the case of
- * extending a relation (i.e., blocknum is at or beyond the current
- * EOF). Note that we assume writing a block beyond current EOF
- * causes intervening file space to become filled with zeroes.
+ * This is to be used for the case of extending a relation (i.e., blocknum is
+ * at or beyond the current EOF). Note that writing a block beyond current
+ * EOF must cause the intervening file space to become filled with zeroes.
+ * The POSIX file system APIs do that automatically, so we don't need to do
+ * anything about that.
*/
void
mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
@@ -516,9 +516,6 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber
blocknum,
/*
* mdzeroextend() -- Add new zeroed out blocks to the specified relation.
- *
- * Similar to mdextend(), except the relation can be extended by multiple
- * blocks at once and the added blocks will be filled with zeroes.
*/
void
mdzeroextend(SMgrRelation reln, ForkNumber forknum,
@@ -800,10 +797,6 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber
blocknum,
/*
* mdwrite() -- Write the supplied block at the appropriate location.
- *
- * This is to be used only for updating already-existing blocks of a
- * relation (ie, those before the current EOF). To extend a relation,
- * use mdextend().
*/
void
mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index f76c4605db..c4c6c5a373 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -30,7 +30,9 @@
/*
* This struct of function pointers defines the API between smgr.c and
- * any individual storage manager module. Note that smgr subfunctions are
+ * any individual storage manager module.
+ *
+ * Note that smgr subfunctions are
* generally expected to report problems via elog(ERROR). An exception is
* that smgr_unlink should use elog(WARNING), rather than erroring out,
* because we normally unlink relations during post-commit/abort cleanup,
@@ -104,8 +106,7 @@ static void smgrshutdown(int code, Datum arg);
/*
- * smgrinit(), smgrshutdown() -- Initialize or shut down storage
- * managers.
+ * smgrinit() -- Initialize storage managers.
*
* Note: smgrinit is called during backend startup (normal or standalone
* case), *not* during postmaster start. Therefore, any resources created
@@ -144,6 +145,8 @@ smgrshutdown(int code, Datum arg)
/*
* smgropen() -- Return an SMgrRelation object, creating it if need be.
*
+ * "backend" is for temporary tables, otherwise InvalidBackendId.
+ *
* This does not attempt to actually open the underlying file.
*/
SMgrRelation
@@ -368,6 +371,9 @@ smgrcloserellocator(RelFileLocatorBackend rlocator)
* Given an already-created (but presumably unused) SMgrRelation,
* cause the underlying disk file or other storage for the fork
* to be created.
+ *
+ * isRedo is true during recovery. In that case, the underlying storage may
+ * already exist.
*/
void
smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
@@ -490,8 +496,8 @@ smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)
* The semantics are nearly the same as smgrwrite(): write at the
* specified position. However, this is to be used for the case of
* extending a relation (i.e., blocknum is at or beyond the current
- * EOF). Note that we assume writing a block beyond current EOF
- * causes intervening file space to become filled with zeroes.
+ * EOF). Writing a block beyond current EOF must cause the
+ * intervening file space to become filled with zeroes.
*/
void
smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
@@ -514,9 +520,11 @@ smgrextend(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum,
/*
* smgrzeroextend() -- Add new zeroed out blocks to a file.
*
- * Similar to smgrextend(), except the relation can be extended by
- * multiple blocks at once and the added blocks will be filled with
- * zeroes.
+ * Extend the relation by the given number of blocks, which will be filled
+ * with zeroes. This is similar to smgrextend() but only extends and does not
+ * write a buffer of data.
+ *
+ * FIXME: why both blocknum and nblocks
*/
void
smgrzeroextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
--
2.40.1