Hi!
PROBLEM

One of our customers ran into the following behaviour of a Postgres cluster: when disk space is exhausted, Postgres keeps working for as long as WAL can still be written. Clients get 'ERROR: could not extend file "base/XXXXX/XXXXX": No space left on device' messages and their transactions are aborted, but the cluster itself keeps running for quite some time.

This behaviour is, of course, perfectly legitimate. The cluster just passes the OS error on to the user and can do nothing about it, expecting that space may become available later. On the other hand, users keep sending more data, and more and more transactions get aborted.

There are several ways to diagnose this situation:
- an external monitoring system;
- log analysis;
- creating/dropping a table and analysing the result.

Each has its advantages and disadvantages. I'm not going to dive deeper here, if you don't mind.

In this particular case, the customer mentioned above would be glad to have a mechanism to stop the cluster. Again, in this concrete case.

PROPOSAL

My proposal is to add a tablespace option that configures which behaviour is appropriate for a particular user. I've decided to call this option "on_no_space" for now. If anyone has a better name for this feature, please let me know.

So, the idea is to add both a GUC and a tablespace option called "on_no_space". The tablespace option defines the behaviour of the cluster for a particular tablespace in an out-of-space situation. The GUC defines the default value of the tablespace option.

The patch is attached as a PoC.
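To make the intended precedence between the two settings concrete, here is a minimal standalone sketch in C. It is not taken from the patch: the enum, the function name resolve_on_no_space and the "error" value are hypothetical and only serve as an illustration. I assume that "error" stands for the current behaviour and that a tablespace without the option falls back to the GUC.

```c
#include <assert.h>

/*
 * Hypothetical representation of the "on_no_space" setting.
 * Only "fatal" appears in the example session; "error" is assumed to
 * stand for today's behaviour (the transaction is aborted with ERROR).
 */
typedef enum OnNoSpace
{
    ON_NO_SPACE_UNSET = -1,     /* tablespace option not specified */
    ON_NO_SPACE_ERROR,
    ON_NO_SPACE_FATAL
} OnNoSpace;

/*
 * Pick the effective behaviour for one tablespace: an explicitly set
 * tablespace option wins, otherwise the GUC default applies.
 */
OnNoSpace
resolve_on_no_space(OnNoSpace tablespace_opt, OnNoSpace guc_default)
{
    /* The GUC always has a concrete value; only the option may be unset. */
    assert(guc_default != ON_NO_SPACE_UNSET);

    if (tablespace_opt != ON_NO_SPACE_UNSET)
        return tablespace_opt;
    return guc_default;
}
```

Under these assumptions, with the GUC set to "error", a tablespace created WITH (on_no_space=fatal) resolves to "fatal", while a tablespace created without the option resolves to "error".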
Here's what it looks like:

===============================================================================================
== Create a 100 MB disk
$ dd if=/dev/zero of=/tmp/foo.img bs=100M count=1
$ mkfs.ext4 /tmp/foo.img
$ mkdir /tmp/foo
$ sudo mount -t ext4 -o loop /tmp/foo.img /tmp/foo
$ sudo chown -R orlov:orlov /tmp/foo

===============================================================================================
== In psql
postgres=# CREATE TABLESPACE foo LOCATION '/tmp/foo' WITH (on_no_space=fatal);
CREATE TABLESPACE
postgres=# \db+
                                       List of tablespaces
    Name    | Owner | Location | Access privileges |       Options       |  Size   | Description
------------+-------+----------+-------------------+---------------------+---------+-------------
 foo        | orlov | /tmp/foo |                   | {on_no_space=fatal} | 0 bytes |
...
postgres=# CREATE TABLE bar(qux int, quux text) WITH (autovacuum_enabled = false) TABLESPACE foo;
CREATE TABLE
postgres=# INSERT INTO bar(qux, quux) SELECT id, md5(id::text) FROM generate_series(1, 10000000) AS id;
FATAL:  could not extend file "pg_tblspc/16384/PG_16_202211121/5/16385": No space left on device
HINT:  Check free disk space.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
===============================================================================================

CAVEATS

Again, I've posted this patch as a PoC. It is not a complete implementation of the described functionality. AFAICS, there are the following problems:
- I had to put a get_tablespace_elevel call into RelationGetBufferForTuple in order to get the tablespace into the cache; otherwise, a cache miss in get_tablespace triggers an assertion failure in lock.c:887 (Assert("!IsRelationExtensionLockHeld")). This assertion was added by commit 15ef6ff4 (see [0] for details).
- What should the error be when mdextend is called for a reason other than inserting a tuple into a heap (applying WAL, for example)?
Maybe just adding a GUC, without the ability to customize the "out of disk space" behaviour for individual tablespaces, is enough?

I would appreciate your opinions on the subject.

--
Best regards,
Maxim Orlov.

[0] https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B%2BSs%3DGn1LA%40mail.gmail.com
v1-0001-Add-out-of-disk-space-elog-level.patch