[PATCH] initdb: do not exit after warn_on_mount_point

2022-09-10 Thread andrey . arapov
Hello,

please find my first patch for PostgreSQL is attached.
Kind regards,
Andrey Arapov


0001-initdb-do-not-exit-after-warn_on_mount_point.patch
Description: Binary data


Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-10 Thread Tom Lane
andrey.ara...@nixaid.com writes:
> please find my first patch for PostgreSQL is attached.

You haven't explained why you think this would be a good
change, or even a safe one.

regards, tom lane




Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread andrey . arapov
Hi Tom,

I've updated the patch by adding the explanation behind and more comments. 
(please see the attachment)

Have slightly improved the logic so that it does not report an error
"directory \"%s\" exists but is not empty"
when it is only supposed to warn the user about the mountpoint, without exiting.

To me, my patch looks like a typo fix where exit(1) should not be called on the 
warn_on_mount_point(),
but only warn and continue as more people are mounting the device at 
`/var/lib/postgresql/data` (PGDATA) in the containerized world (K8s 
deployments, especially now in the Akash Network I am working for) for making 
sure their data persist.

As a workaround, we either have to `rmdir /var/lib/postgresql/data/lost+found` 
before running `docker-entrypoint.sh postgres` which in turn calls the 
`initdb`, or, alternatively we have to pass 
`PGDATA=/var/lib/postgresql/data/` while mounting persistent storage 
over `/var/lib/postgresql/data` path so that it won't exit on the very first 
run.
To me, this in itself is an odd behavior, which led me to finding this typo 
(from my point of view) to which I've made this patch.


Please let me know if it makes sense or requires more information / explanation.


Kind regards,
Andrey Arapov


September 10, 2022 5:10 PM, "Tom Lane"  wrote:

> andrey.ara...@nixaid.com writes:
> 
>> please find my first patch for PostgreSQL is attached.
> 
> You haven't explained why you think this would be a good
> change, or even a safe one.
> 
> regards, tom lane


0001-initdb-do-not-exit-when-PGDATA-PGDATA-pg_wal-is-a-mo.patch
Description: Binary data


Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Julien Rouhaud
Hi,

On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>
> Have slightly improved the logic so that it does not report an error
> "directory \"%s\" exists but is not empty"
> when it is only supposed to warn the user about the mountpoint, without
> exiting.
>
> To me, my patch looks like a typo fix where exit(1) should not be called on
> the warn_on_mount_point(), but only warn and continue as more people are
> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> containerized world (K8s deployments, especially now in the Akash Network I
> am working for) for making sure their data persist.

This definitely isn't a typo, as it's really strongly discouraged to use a
mount point for the data directory.  You can refer to this thread [1] for more
explanations.

[1] 
https://www.postgresql.org/message-id/flat/CAKoxK%2B6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w%40mail.gmail.com




Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Tom Lane
Julien Rouhaud  writes:
> On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>> Have slightly improved the logic so that it does not report an error
>> "directory \"%s\" exists but is not empty"
>> when it is only supposed to warn the user about the mountpoint, without
>> exiting.

> This definitely isn't a typo, as it's really strongly discouraged to use a
> mount point for the data directory.

Absolutely.  I think maybe the problem here is that warn_on_mount_point()
is a pretty bad name for that helper function, as this is not "just a
warning".

regards, tom lane




Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Julien Rouhaud
On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
> September 11, 2022 12:18 PM, "Julien Rouhaud"  wrote:
>
> > Hi,
> >
> > On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
> >
> >> Have slightly improved the logic so that it does not report an error
> >> "directory \"%s\" exists but is not empty"
> >> when it is only supposed to warn the user about the mountpoint, without
> >> exiting.
> >>
> >> To me, my patch looks like a typo fix where exit(1) should not be called on
> >> the warn_on_mount_point(), but only warn and continue as more people are
> >> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> >> containerized world (K8s deployments, especially now in the Akash Network I
> >> am working for) for making sure their data persist.
> >
> > This definitely isn't a typo, as it's really strongly discouraged to use a
> > mount point for the data directory. You can refer to this thread [1] for 
> > more
> > explanations.
> >
> > [1]
> > https://www.postgresql.org/message-id/flat/CAKoxK+6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w@mail.
> > mail.com
>
>
> I've read the "why not using a mountpoint as PGDATA?" thread [1] as well as
> Bugzilla "postgresql-setup fails when /var/lib/pgsql/data is mount point"
> thead [2] but could not find any good reason why not to mount the PGDATA
> directly, except probably for the NFS mount point, but who does that anyway?

What about this part in Tom's original answer:

3. If, some day, the filesystem is accidentally unmounted while the database is
up, it will continue to write into files that are now getting placed in the
mount-point directory on the parent volume.  This usually results in an
unrecoverably messed-up database by the time you realize what's going wrong.
(There are horror stories about such cases in the PG community mailing list
archives, dating from before we installed the don't-use-a-mount-point defenses
in initdb.)

> Everyone using containerized postgresql image cannot use /var/lib/postgresql
> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
> issue [4] due to [5].  Hence, everyone using containerized version of
> postgresql with the device (say Ceph's RBD) mounted over
> /var/lib/postgresql/data directory has to either specify:
>
> - PGDATA=/var/lib/postgresql/data/
>
> OR
>
> make sure to remove $PGDATA/lost+found directory.

initdb had this behavior for almost 10 years, and for good reasons: it prevents
actual problems that were seen in the field.

It's unfortunate that the docker postgres images didn't take that behavior into
account, which was introduced more than a year before any work started on those
images, but if you're not happy with those workarounds it's something that
should be changed in the docker images.

> Both of these hacks are only for the initdb to fail detect the mountpoint
> which, on its own, is supposed to be only a warning which is seen from the
> function name warn_on_mount_point() and its messages [6] look to be of only
> advisory nature to me.

As Tom confirmed at [1], you shouldn't assume anything about the criticality
based on the function name.  If anything, the "warn" part is only saying that
the function itself won't exit(1).  And yes this function is only adding
information on the reason why the given directory can't be used, but it doesn't
change the fact that the given directory can't be used.

[1] https://www.postgresql.org/message-id/813162.1662908...@sss.pgh.pa.us




Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Tom Lane
Julien Rouhaud  writes:
> On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
>> Everyone using containerized postgresql image cannot use /var/lib/postgresql
>> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
>> issue [4] due to [5].

> initdb had this behavior for almost 10 years, and for good reasons: it 
> prevents
> actual problems that were seen in the field.

The long and the short of this is that one person losing their data
outweighs thousands of people being very mildly inconvenienced by
having to create an extra level of directory.  I understand that you
don't think so, but you'd change your mind PDQ if it was *your* data
that got irretrievably corrupted.

We are not going to remove this check.

If anything, the fault I'd find with the existing code is that it's not
sufficiently thorough about detecting what is a mount point.  AFAICS,
neither the lost+found check nor the extra-dot-files check will trigger
on modern filesystems such as XFS.  I wonder if we could do something
like comparing the stat(2) st_dev numbers for the proposed data directory
vs. its parent directory.

regards, tom lane




Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-12 Thread andrey . arapov
September 11, 2022 12:18 PM, "Julien Rouhaud"  wrote:

> Hi,
> 
> On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
> 
>> Have slightly improved the logic so that it does not report an error
>> "directory \"%s\" exists but is not empty"
>> when it is only supposed to warn the user about the mountpoint, without
>> exiting.
>> 
>> To me, my patch looks like a typo fix where exit(1) should not be called on
>> the warn_on_mount_point(), but only warn and continue as more people are
>> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
>> containerized world (K8s deployments, especially now in the Akash Network I
>> am working for) for making sure their data persist.
> 
> This definitely isn't a typo, as it's really strongly discouraged to use a
> mount point for the data directory. You can refer to this thread [1] for more
> explanations.
> 
> [1]
> https://www.postgresql.org/message-id/flat/CAKoxK+6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w@mail.
> mail.com


I've read the "why not using a mountpoint as PGDATA?" thread [1] as well as 
Bugzilla "postgresql-setup fails when /var/lib/pgsql/data is mount point" thead 
[2] but could not find any good reason why not to mount the PGDATA directly,
except probably for the NFS mount point, but who does that anyway?
And using NFS for PostgreSQL PGDATA is a way for finding problems starting with 
poor performance ending up with corrupted DB. lol

The only point that hooked my attention was the pg_upgrade, but then I've tried 
pg_upgrade'ing postgresql:13 to postgresql:14, everything went without issues. 
I wrote a step-by-step doc for that purpose [3].

That leaves me unconvinced on why would `initdb` quit when detecting PGDATA 
being a mountpoint.

Everyone using containerized postgresql image cannot use /var/lib/postgresql as 
the mountpoint but has to use /var/lib/postgresql/data instead due to this 
issue [4] due to [5].
Hence, everyone using containerized version of postgresql with the device (say 
Ceph's RBD) mounted over /var/lib/postgresql/data directory has to either 
specify:

- PGDATA=/var/lib/postgresql/data/

OR

make sure to remove $PGDATA/lost+found directory.

Both of these hacks are only for the initdb to fail detect the mountpoint 
which, on its own, is supposed to be only a warning which is seen from the 
function name warn_on_mount_point() and its messages [6] look to be of only 
advisory nature to me.

I would understand initdb to exit if it detects the mountpoint being is NFS, 
otherwise, there is no reason for not letting one use the mountpoint over 
PGDATA as neither it breaks the pg_upgrade functionality as shown in this 
message.


[1] 
https://www.postgresql.org/message-id/flat/cakoxk+6h40imynm5p31bf0dnpn-5f5zerojcaj6bkvajxde...@mail.gmail.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1247477
[3] https://gist.github.com/andy108369/aa3bf1707054542f2fa944f6d39aef64
[4] https://github.com/docker-library/postgres/issues/404#issuecomment-773755801
[5] 
https://github.com/docker-library/postgres/blob/3c20b7bdb915ecb1648fb468ab53080c58bb1716/Dockerfile-debian.template#L184
[6] 
https://github.com/postgres/postgres/blob/7fed801135bae14d63b11ee4a10f6083767046d8/src/bin/initdb/initdb.c#L2616-L2626


Kind regards,
Andrey Arapov




Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-12 Thread andrey . arapov
September 12, 2022 7:51 AM, "Tom Lane"  wrote:

> Julien Rouhaud  writes:
> 
>> On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
>>> Everyone using containerized postgresql image cannot use /var/lib/postgresql
>>> as the mountpoint but has to use /var/lib/postgresql/data instead due to 
>>> this
>>> issue [4] due to [5].
>> 
>> initdb had this behavior for almost 10 years, and for good reasons: it 
>> prevents
>> actual problems that were seen in the field.
> 
> The long and the short of this is that one person losing their data
> outweighs thousands of people being very mildly inconvenienced by
> having to create an extra level of directory. I understand that you
> don't think so, but you'd change your mind PDQ if it was *your* data
> that got irretrievably corrupted.
> 
> We are not going to remove this check.
> 
> If anything, the fault I'd find with the existing code is that it's not
> sufficiently thorough about detecting what is a mount point. AFAICS,
> neither the lost+found check nor the extra-dot-files check will trigger
> on modern filesystems such as XFS. I wonder if we could do something
> like comparing the stat(2) st_dev numbers for the proposed data directory
> vs. its parent directory.
> 
> regards, tom lane


Thank you for the explanation Tom & Julien!

This is not an issue for the containerized PostgreSQL where the mountpoint is a 
required dependency without which the container won't even spawn nor start the 
scripts (such as initdb).
This also makes it difficult to comprehend the reason initdb exits when it sees 
PGDATA is a mountpoint it at the first hand.

That and the "Warn about initdb using mount-points" commit [4] made me thinking 
that this was a typo.

But I do understand that having it exiting on detecting PGDATA being a 
mountpoint have saved people from having their DB corrupted is a good enough 
reason for keeping thing as they are now.


== Summarizing the issue ==

The problem is that PostgreSQL will irrecoverably corrupt when the `PGDATA` 
mountpoint gets accidentally unmounted while the DB is up. [1]

The `PGDATA` can't just accidentally get unmounted due to the files being 
locked by a process there, unless there is a time window between the DB 
locking/unlocking the data in it of course.
Or unless someone forcibly unmounts it.

(I'd expect the DB to detect this & immediately terminate with a fatal error)

Or more likely is when `PGDATA` directory gets mounted back again _while_ 
PostgreSQL is running with a newly initialized cluster in the empty PGDATA 
directory (say, due to start-scripts running `initdb` or there was an old 
instance of a previously initialized cluster in the underlying PGDATA directory 
[with mountpoint being unmounted]), and then gets closed which in turn causes 
it writing the wrong data into the correct pg_control file.


== The solution (partial) ==

The solution to this was built into the `initdb` [4] - which makes sure it 
fails when it finds the `PGDATA` is a mountpoint.
This is currently achieved by `initdb` finding the `lost+found` directory in 
PGDATA, which is not a robust solution, since there are filesystems that do not 
place that file onto the mountpoint such as XFS, BTRFS, eCryptfs, ..., or one 
can erase the `lost+found` directory.
Hence, this is only a partial solution to the problem and looks to be more of a 
workaround nature IMHO.


== The recommendation ==

The recommendation is to mount an upper directory instead of PGDATA directly 
(one above the PGDATA, e.g. `$PGDATA/../.`) to make sure the DB or a certain* 
PostgreSQL start script will fail due to a missing `PGDATA` directory instead 
of running the `initdb` to re-init the cluster.

== The reason ==

This is because, as already mentioned before, running the `initdb` can lead to 
a situation where one can irrecoverably corrupt the DB by mounting the original 
PGDATA backing device back over again and stopping/re-starting the DB which 
will irrecoverably overwrite the good pg_control file.
(* - I assume that on some systems the postgresql start-script runs `initdb` 
when detects PGDATA directory is empty before starting the DB; like in the 
postgresql docker container [2] (by checking the presence of 
`$PGDATA/PG_VERSION`). And the default postgresql start script [3] runs the DB 
directly without issuing `initdb`.)


== Next steps? ==

I'm wondering whether that's the same issue for mariadb / mysql ...

And whether there could be a better way to handle this issue and protect the 
PostgreSQL from corrupting the DB in the event the PGDATA gets accidentally 
unmounted / re-mounted leading the the issue described above.
Solving this would allow people to actually mount the PGDATA directly, without 
the need to fix the `initdb`'s mountpoint detection for filesystems without 
`lost+found` directory.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1247477#c1
[2] 
https://github.com/docker-library/postgres/blob/74e51d102aede/docker-entrypoint.sh#L317
[3] 
h