Hi

pg_basebackup -F t fails when fsync() takes longer than tcp_user_timeout in the following environment.

[Environment]
Postgres 13dev (master branch)
Red Hat Enterprise Postgres 7.4

[Error]
$ pg_basebackup -F t --progress --verbose -h <hostname> -D <directory>
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5A000060 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15647"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[Analysis]
- pg_basebackup -F t creates a tar file for each tablespace and calls fsync() on each one. (In contrast, -F p calls fsync() only once, at the end.)
- While pg_basebackup is in fsync() on the tar file of one tablespace, the wal sender is already sending the contents of the next tablespace. If fsync() takes a long time, the TCP socket of pg_basebackup answers the wal sender with "zero window" packets. This means the receive buffer of that socket is full, because pg_basebackup cannot read from it during fsync().
- The socket of the wal sender keeps retrying the send, but resets the connection once tcp_user_timeout has elapsed. After the reset, pg_basebackup can no longer receive data and fails with the error above.

[Solution]
I think fsync() for each tablespace is not necessary. As with pg_basebackup -F p, I think fsync() is necessary only once, at the end.

Could you give me any comments?

Regards,
Ryohei Takahashi
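
To make the proposal concrete, here is a minimal sketch of the two orderings. It is not pg_basebackup code: receive_tablespaces(), its arguments, and the file names are hypothetical, and the "received" data is just bytes already held in memory. It only shows where the fsync() calls fall relative to the writes.

```python
import os
import tempfile


def receive_tablespaces(directory, tablespaces, sync_each):
    """Write one file per tablespace and record the order of operations.

    tablespaces: list of (file_name, data_bytes) pairs (hypothetical input).
    sync_each:   True  -> fsync() right after each file is written
                          (the per-tablespace behaviour described above);
                 False -> fsync() every file only after all data is written.
    """
    events = []
    written = []
    for name, data in tablespaces:
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()  # push Python's buffer to the OS before any fsync()
            events.append(("write", name))
            if sync_each:
                # The receiver is blocked here and cannot read the socket.
                os.fsync(f.fileno())
                events.append(("fsync", name))
        written.append((name, path))

    if not sync_each:
        # All data has been received; blocking now no longer stalls the sender.
        for name, path in written:
            fd = os.open(path, os.O_RDWR)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            events.append(("fsync", name))
    return events


if __name__ == "__main__":
    data = [("base.tar", b"a" * 16), ("16384.tar", b"b" * 16)]
    with tempfile.TemporaryDirectory() as d:
        print(receive_tablespaces(d, data, sync_each=True))
        print(receive_tablespaces(d, data, sync_each=False))
```

With sync_each=False, every fsync() happens after the last write, so no long fsync() can overlap with the period in which the wal sender is still pushing data.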