On Mon, Aug 09, 2021 at 01:08:42PM -0400, Robert Haas wrote:
> To reproduce, initialize a cluster with wal_level=minimal and
> max_wal_senders=0. Then from psql:
> 
> \! mkdir /tmp/goose
> 
> CHECKPOINT;
> CREATE TABLESPACE goose LOCATION '/tmp/goose';
> SET wal_skip_threshold=0;
> BEGIN;
> CREATE TABLE wild (a int, b text) TABLESPACE goose;
> INSERT INTO wild VALUES (1, 'chase');
> COMMIT;
> SELECT * FROM wild;
> 
> As expected, you will see one row in table 'wild'. Now perform an
> immediate shutdown. Restart the server. Table 'wild' is now empty.

Thanks for finding the problem.  It's a bad problem.

> The problem appears to be that tblspc_redo() calls
> create_tablespace_directories(), which says:
> 
>         /*
>          * Our theory for replaying a CREATE is to forcibly drop the target
>          * subdirectory if present, and then recreate it. This may be more
>          * work than needed, but it is simple to implement.
>          */
> 
> Unfortunately, this theory (which dates to
> c86f467d18aa58e18fd85b560b46d8de014e6017, vintage 2010, by Bruce) is
> correct only with wal_level>minimal. At wal_level='minimal', we can
> replay the record to recreate the relfilenode, but not the records
> that would have created the contents. However, note that if the table
> is smaller than wal_skip_threshold, then we'll log full-page images of
> the contents at commit time even at wal_level='minimal' after which we
> have no problem. As far as I can see, this bug has "always" existed,
> but before c6b92041d38512a4176ed76ad06f713d2e6c01a8 (2020, Noah) you
> would have needed a different test case. Specifically, you would have
> needed to use COPY to put the row in the table, and you would have
> needed to omit setting wal_skip_threshold since it didn't exist yet.

Agreed.

> I don't presently have a specific idea about how to fix this.

Can't recovery just not delete the directory, create it if doesn't exist, and
be happy if it does exist?  Like the attached WIP.  If we think it's possible
for a crash during mkdir to leave a directory having the wrong permissions,
adding a chmod would be in order.
diff --git a/src/backend/commands/tablespace.c 
b/src/backend/commands/tablespace.c
index a54239a..2928ecf 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -589,7 +589,6 @@ create_tablespace_directories(const char *location, const 
Oid tablespaceoid)
 {
        char       *linkloc;
        char       *location_with_version_dir;
-       struct stat st;
 
        linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
        location_with_version_dir = psprintf("%s/%s", location,
@@ -614,39 +613,22 @@ create_tablespace_directories(const char *location, const 
Oid tablespaceoid)
                                                        location)));
        }
 
-       if (InRecovery)
-       {
-               /*
-                * Our theory for replaying a CREATE is to forcibly drop the 
target
-                * subdirectory if present, and then recreate it. This may be 
more
-                * work than needed, but it is simple to implement.
-                */
-               if (stat(location_with_version_dir, &st) == 0 && 
S_ISDIR(st.st_mode))
-               {
-                       if (!rmtree(location_with_version_dir, true))
-                               /* If this failed, MakePGDirectory() below is 
going to error. */
-                               ereport(WARNING,
-                                               (errmsg("some useless files may 
be left behind in old database directory \"%s\"",
-                                                               
location_with_version_dir)));
-               }
-       }
-
        /*
         * The creation of the version directory prevents more than one 
tablespace
         * in a single location.
         */
        if (MakePGDirectory(location_with_version_dir) < 0)
        {
-               if (errno == EEXIST)
-                       ereport(ERROR,
-                                       (errcode(ERRCODE_OBJECT_IN_USE),
-                                        errmsg("directory \"%s\" already in 
use as a tablespace",
-                                                       
location_with_version_dir)));
-               else
+               if (errno != EEXIST)
                        ereport(ERROR,
                                        (errcode_for_file_access(),
                                         errmsg("could not create directory 
\"%s\": %m",
                                                        
location_with_version_dir)));
+               else if (!InRecovery)
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_OBJECT_IN_USE),
+                                        errmsg("directory \"%s\" already in 
use as a tablespace",
+                                                       
location_with_version_dir)));
        }
 
        /*

Reply via email to