Moved to -hackers.

On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken <sc...@deltaex.com> wrote:
> Thank you Masahiko! I've tested and confirmed that this patch fixes the
> problem.
>
Thank you for testing. This issue should be added to the open items
list, since it causes a server crash. I'll add it.

> On Fri, Jul 28, 2017 at 3:07 AM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
>>
>> On Mon, Jul 24, 2017 at 4:22 PM, <sc...@deltaex.com> wrote:
>> > The following bug has been logged on the website:
>> >
>> > Bug reference: 14758
>> > Logged by: Scott Milliken
>> > Email address: sc...@deltaex.com
>> > PostgreSQL version: 10beta2
>> > Operating system: Linux 4.10.0-27-generic #30~16.04.2-Ubuntu S
>> > Description:
>> >
>> > I'm testing logical replication on 10beta2, and found a segfault that I
>> > can reliably reproduce with an index on a not-actually immutable function.
>> >
>> > Here's the function in question:
>> >
>> > ```
>> > CREATE OR REPLACE FUNCTION public.immutable_random(integer)
>> > RETURNS double precision
>> > LANGUAGE sql
>> > IMMUTABLE
>> > AS $function$SELECT random();
>> > $function$;
>> > ```
>> >
>> > It's not actually immutable since it's calling random (a hack to get an
>> > efficient random sort on a table).
>> >
>> > (Aside: I'd understand if it errored on creation of the index, but would
>> > really prefer to keep using this instead of tablesample because it's
>> > fast, deterministic, and doesn't have sampling biases like the SYSTEM
>> > sampling.)
>> >
>> >
>> > Here's full reproduction instructions:
>> >
>> >
>> > Primary:
>> > ```
>> > mkdir -p /tmp/test-seg0
>> > PGPORT=5301 initdb -D /tmp/test-seg0
>> > echo "wal_level = logical" >> /tmp/test-seg0/postgresql.conf
>> > PGPORT=5301 pg_ctl -D /tmp/test-seg0 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi; sleep 1; done
>> > psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5301 -E utf8 test
>> >
>> > psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE testtbl;"
>> > psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1, 'a');"
>> > ```
>> >
>> > Secondary:
>> > ```
>> > mkdir -p /tmp/test-seg1
>> > PGPORT=5302 initdb -D /tmp/test-seg1
>> > PGPORT=5302 pg_ctl -D /tmp/test-seg1 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi; sleep 1; done
>> > psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5302 -E utf8 test
>> >
>> > psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5302 -U test test -c 'CREATE FUNCTION public.immutable_random(integer) RETURNS double precision LANGUAGE sql IMMUTABLE AS $function$ SELECT random(); $function$'
>> > psql -p 5302 -U test test -c "CREATE INDEX ix_testtbl_random ON testtbl USING btree (immutable_random(id));"
>> > psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub CONNECTION 'port=5301 user=test dbname=test' PUBLICATION testpub;"
>> > ```
>> >
>> > The secondary crashes with a segfault:
>> >
>> > ```
>> > 2017-07-23 23:55:37.961 PDT [4823] LOG: logical replication table synchronization worker for subscription "test0_testpub", table "testtbl" has started
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG: worker process: logical replication worker for subscription 16396 sync 16386 (PID 4823) was terminated by signal 11: Segmentation fault
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG: terminating any other active server processes
>> > 2017-07-23 23:55:38.245 PDT [4763] WARNING: terminating connection because of crash of another server process
>> > 2017-07-23 23:55:38.245 PDT [4763] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> > 2017-07-23 23:55:38.245 PDT [4763] HINT: In a moment you should be able to reconnect to the database and repeat your command.
>> > 2017-07-23 23:55:38.247 PDT [4758] LOG: all server processes terminated; reinitializing
>> > 2017-07-23 23:55:38.256 PDT [4826] LOG: database system was interrupted; last known up at 2017-07-23 23:55:36 PDT
>> > 2017-07-23 23:55:38.809 PDT [4826] LOG: database system was not properly shut down; automatic recovery in progress
>> > 2017-07-23 23:55:38.812 PDT [4826] LOG: redo starts at 0/173AEA0
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG: invalid record length at 0/17B50B0: wanted 24, got 0
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG: redo done at 0/17B5070
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG: last completed transaction was at log time 2017-07-23 23:55:37.962957-07
>> > ```
>> >
>>
>> Thank you for the report and the precise reproduction steps!
>> I could reproduce this issue, and it seems to me that the cause is
>> that the table sync worker didn't get a snapshot before starting the
>> table copy. The attached patch fixes this problem.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
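
For anyone re-running the reproduction with the patch applied, here is a rough sketch of how the result could be checked from the shell. It is not part of the patch. The helper names wait_until and sync_ready are made up for illustration, and the port, user, and database are the ones assumed from the report above. The sketch polls pg_subscription_rel on the secondary until every table of the subscription reports srsubstate 'r' (ready), and gives up after a bounded number of attempts instead of looping forever.

```shell
# Hypothetical helper (not from the thread): run a command until it succeeds,
# making at least one and at most $1 attempts, one second apart.
wait_until() {
    wu_attempts=$1
    shift
    wu_i=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$wu_i" -ge "$wu_attempts" ]; then
            return 1
        fi
        wu_i=$((wu_i + 1))
        sleep 1
    done
}

# Hypothetical check: succeeds once all subscribed tables on the secondary
# are in state 'r' (ready). Port 5302, user "test", and database "test" are
# taken from the reproduction steps and are assumptions outside that setup.
sync_ready() {
    sr_state=$(psql -p 5302 -U test test -A -t -c \
        "SELECT bool_and(srsubstate = 'r') FROM pg_subscription_rel") || return 1
    [ "$sr_state" = "t" ]
}

# Example usage, to be run after CREATE SUBSCRIPTION on the secondary:
#   wait_until 30 pg_isready -d postgres -p 5302
#   wait_until 30 sync_ready && psql -p 5302 -U test test -c "SELECT * FROM testtbl;"
```

If the sync worker still crashes, sync_ready should keep failing and wait_until returns non-zero after the last attempt, which is easier to script around than the endless for (( ; ; )) loop.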