I have a question about the files in .../data/postgresql/11/main/base,
specifically in relation to very large tables and how they are written.

I have been attempting to restore a relatively large database with
pg_restore and it has been running for more than a week. (I also have
another thread about the same restore, related to network vs. local disk I/O.)

I ran the pg_restore in verbose mode using multiple jobs so I can tell what
has finished and what has not. The database had 66 tables and most of them
have been restored. Two of the tables were quite large (billions of rows
translating to between 1 and 2TB of data on disk for those two tables) and
those two tables are pretty much the only things remaining that have not
been reported as finished by pg_restore.

As the process has been going for a week, I have been tracking the machine
(a dedicated piece of hardware, non-virtualized) and have been noticing a
progressive slowdown (as tracked by iostat). There is nothing running on
the machine besides postgresql and the server is only doing this restore,
nothing else. It is now, on average, running at less than 25% of the speed
that it was running four days ago (as measured by rate of I/O).

I started to dig into what was happening on the machine and I noticed the
following:

iotop reports that two postgres processes (I assume each processing one of
the two tables that need to be processed) are doing all the I/O. That
makes sense:

Total DISK READ :    1473.81 K/s | Total DISK WRITE :     617.30 K/s
Actual DISK READ:    1473.81 K/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 6601 be/4 postgres  586.44 K/s    7.72 K/s  0.00 % 97.39 % postgres: 11/main: postg~s thebruteff [local] COPY
 6600 be/4 postgres  887.37 K/s  601.87 K/s  0.00 % 93.42 % postgres: 11/main: postg~s thebruteff [local] COPY
  666 be/3 root        0.00 B/s    7.72 K/s  0.00 %  5.73 % [jbd2/sda1-8]
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]

So, the next thing I thought I would do is an "lsof" command for each of
those two processes to see what they were writing. That was a bit of a
surprise:

# lsof -p 6600 | wc -l;
840

# lsof -p 6601 | wc -l;
906

Is that normal, for there to be so many open files? Roughly 900 open
files for each of the processes?
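
For anyone who wants to break that count down further, here is a minimal
Python sketch that groups a process's open files by relation file node. It
assumes a Linux /proc filesystem and that the data files sit under a
directory named base/<database oid>/, as they do here. The function names
open_paths and count_by_relation are made up for illustration, and this is
not something lsof or PostgreSQL provides.

```python
import os
import re
from collections import Counter

# Matches data file names such as "27083", "27083.7" or "27082_fsm".
_REL_FILE = re.compile(r"^(\d+)(?:_(?:fsm|vm|init))?(?:\.\d+)?$")


def open_paths(pid):
    """Return the target of every file descriptor of a process (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    paths = []
    for fd in os.listdir(fd_dir):
        try:
            paths.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return paths


def count_by_relation(paths):
    """Count open data files per relation file node.

    Only paths of the form .../base/<database oid>/<file> are counted, so
    sockets, pipes and WAL files are ignored.
    """
    counts = Counter()
    for path in paths:
        db_dir = os.path.dirname(path)
        if os.path.basename(os.path.dirname(db_dir)) != "base":
            continue
        match = _REL_FILE.match(os.path.basename(path))
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run as the postgres user or root, count_by_relation(open_paths(6600)) would
show how many of the open files belong to each file node, which should make
it clear whether the ~900 entries are mostly segments of one large table.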

The next thing I did was look at the actual data files, to see how many there
are. In my case they are in postgresql/11/main/base/24576 and there are
2076 files there. That made sense. However, I found that when I list them
by modification date I see something interesting:

-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.7
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.8
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.9
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.10
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.11
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.12
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.13
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.14
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.16
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.15
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.17
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.18
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.19
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.21
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.22
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.23
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.24
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.25
-rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.26
-rw------- 1 postgres postgres   19062784 Oct  8 13:05 27082_fsm
-rw------- 1 postgres postgres  544489472 Oct  8 13:05 27077.34
-rw------- 1 postgres postgres  169705472 Oct  8 13:05 27082.72
-rw------- 1 postgres postgres  978321408 Oct  8 13:05 27083.27
-rw------- 1 postgres postgres  342925312 Oct  8 13:05 27076.88

As you can see, the file size is capped at 1 GB, and as the giant table has
grown it has added more files in this directory. However, the mysterious
thing to me is that it keeps modifying those files constantly, even the
ones that are completely full. So for the two large tables it has been
restoring all week, the modification time for the ever-growing list of
files is being updated constantly.
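
To make the naming in that listing concrete, here is a small Python sketch
that parses names such as 27083.27 or 27082_fsm into file node, fork and
segment number, then tallies the main-fork segments per file node. The 1 GB
constant is taken from the listing above. The function names parse_segment
and summarize are hypothetical helpers, not part of any PostgreSQL tool.

```python
import re
from collections import defaultdict

SEGMENT_BYTES = 1073741824  # size of the full files in the listing above

_NAME = re.compile(r"^(?P<node>\d+)(?:_(?P<fork>fsm|vm|init))?(?:\.(?P<seg>\d+))?$")


def parse_segment(name):
    """Split a data file name into (file node, fork, segment number).

    Returns None for names that do not look like relation data files.
    A name without a ".N" suffix is treated as segment 0.
    """
    match = _NAME.match(name)
    if match is None:
        return None
    return match.group("node"), match.group("fork") or "main", int(match.group("seg") or 0)


def summarize(entries):
    """Tally main-fork segments per file node.

    entries is an iterable of (file name, size in bytes) pairs.
    """
    totals = defaultdict(lambda: {"segments": 0, "full": 0, "bytes": 0})
    for name, size in entries:
        parsed = parse_segment(name)
        if parsed is None or parsed[1] != "main":
            continue  # skip free space map and other non-main forks
        node = parsed[0]
        totals[node]["segments"] += 1
        totals[node]["full"] += int(size == SEGMENT_BYTES)
        totals[node]["bytes"] += size
    return dict(totals)
```

Feeding it the (name, size) pairs from a directory listing gives, per file
node, how many segments exist and how many of them have reached the 1 GB cap.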

Could it be that that's why I am seeing a slowdown over the course of the
week: that as the number of files for the table has grown, the system is
spending more and more time seeking around the disk to touch all those
files?

Does anyone who understands the details of postgresql's interaction with
the file system have an explanation for why all those files which are full
are being touched constantly?
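
For reference, this is roughly how the "touched constantly" observation
could be checked repeatedly without eyeballing ls output. It is a minimal
Python sketch that lists the files in a directory whose modification time
falls within a recent window. The name recently_modified and the 60 second
default are arbitrary choices for illustration. It only reports mtimes and
says nothing about which process wrote to a file or why.

```python
import os
import time


def recently_modified(directory, window_seconds=60, now=None):
    """Return (name, size, age in seconds) for files modified within the window."""
    now = time.time() if now is None else now
    hits = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            info = entry.stat()
            age = now - info.st_mtime
            if age <= window_seconds:
                hits.append((entry.name, info.st_size, age))
    # Oldest first, mirroring a listing sorted by modification date.
    return sorted(hits, key=lambda item: item[2], reverse=True)
```

Calling it every minute on the database directory and comparing the results
would show whether every full 1 GB segment really is touched in each
interval, or only a changing subset of them.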
