Hello all,

pg_rewind throws the following error when a large file is present in the slave server's data directory:
unexpected result while sending file list: ERROR: value "2148000000" is out of range for type integer
CONTEXT: COPY fetchchunks, line 2402, column begin: "2148000000"

How to reproduce
----------------------------
1. Set up replication between Server A (master) and Server B (slave).
2. Promote the slave server (Server B).
3. Stop the old master (Server A).
4. Create a large file in the newly promoted master's (Server B) data directory using the command below:

[root@localhost data]# dd if=/dev/zero of=large.file bs=1024 count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 8.32263 s, 492 MB/s

5. Execute the pg_rewind command from the old master (Server A):

./pg_rewind -D /home/enterprisedb/master/ --debug --progress --source-server="port=5661 user=enterprisedb dbname=edb"

IMHO, this seems to be a bug in pg_rewind.

As mentioned in the pg_rewind documentation, there are a few files that are copied whole:

"Copy all other files such as pg_xact and configuration files from the source cluster to the target cluster (everything except the relation files)."
-- https://www.postgresql.org/docs/devel/static/app-pgrewind.html

Those files are copied in chunks of at most CHUNKSIZE (default 1000000) bytes at a time. In the process, pg_rewind creates a table with the following schema and loads it with information about the blocks that need to be copied.

CREATE TEMPORARY TABLE fetchchunks(path text, begin int4, len int4);

postgres=# select * from fetchchunks where begin != 0;
              path               |  begin  |   len
---------------------------------+---------+---------
 pg_wal/000000010000000000000002 | 1000000 | 1000000
 pg_wal/000000010000000000000002 | 2000000 | 1000000
 pg_wal/000000010000000000000002 | 3000000 | 1000000
 pg_wal/000000010000000000000002 | 4000000 | 1000000
...... and so on.

The range of begin is -2147483648 to +2147483647.
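To illustrate the arithmetic, here is a minimal standalone C sketch. It is not pg_rewind code, and the function name first_offset_beyond_int4 is made up for illustration. It walks chunk offsets in CHUNKSIZE steps, the way the begin values in fetchchunks advance, and reports the first offset that no longer fits in an int4 column.

```c
#include <assert.h>
#include <stdint.h>

/* Chunk size as quoted in this report (default 1000000 bytes). */
#define CHUNKSIZE 1000000

/*
 * Hypothetical helper: walk the chunk start offsets of a file of the
 * given size and return the first one that exceeds the int4 range.
 * Returns -1 if every offset fits.
 */
static int64_t
first_offset_beyond_int4(int64_t file_size)
{
    int64_t begin;

    assert(file_size >= 0);

    for (begin = 0; begin < file_size; begin += CHUNKSIZE)
    {
        /* INT32_MAX is 2147483647, the upper bound of int4. */
        if (begin > INT32_MAX)
            return begin;
    }
    return -1;
}
```

For the 4096000000-byte file created in step 4, the function returns 2148000000, which is the value reported in the error message. Any file of 2147483647 bytes or less yields -1, since its last chunk starts at offset 2147000000 at most.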
For a 4GB file, begin definitely goes beyond 2147483647, and it throws the following error:

unexpected result while sending file list: ERROR: value "2148000000" is out of range for type integer
CONTEXT: COPY fetchchunks, line 2659, column begin: "2148000000"

I guess we have to change the data type to bigint. Also, we need some implementation of ntohl() for 8-byte data types. I've attached a script to reproduce the error and a draft patch.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
standby-server-setup.sh
Description: Bourne shell script
fix_copying_large_file_pg_rewind_v1.patch
Description: application/download