On Fri, Jul 3, 2015 at 2:20 PM, Martijn van Oosterhout
<klep...@svana.org> wrote:
> On Fri, Jul 03, 2015 at 12:21:02AM +0200, Andres Freund wrote:
>> Hi,
>>
>> On 2015-07-03 00:05:24 +0200, Martijn van Oosterhout wrote:
>> > === Start with an empty database
>>
>> My guess is you have wal_level = minimal?
>
> Default config, was just initdb'd. So yes, the default wal_level =
> minimal.
>
>> > === Note the index file is 8KB.
>> > === At this point nuke the database server (in this case it was simply
>> > === destroying the container it was running in.
>>
>> How did you continue from there? The container has persistent storage?
>> Or are you repapplying the WAL to somewhere else?
>
> The container has persistant storage on the host. What I think is
> actually unusual is that the script that started postgres was missing
> an 'exec" so postgres never gets the signal to shutdown.
>
>> > martijn@martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p 
>> > /data/postgres/pg_xlog/ 000000010000000000000001 |grep -wE 
>> > '16389|16387|16393'
>> > rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 
>> > 0/016A9240, prev 0/016A9200, bkp: 0000, desc: checkpoint: redo 0/16A9240; 
>> > tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; 
>> > oldest xid 673 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; 
>> > shutdown
>> > rmgr: Storage     len (rec/tot):     16/    48, tx:          0, lsn: 
>> > 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc: file create: base/16385/16387
>> > rmgr: Sequence    len (rec/tot):    158/   190, tx:        686, lsn: 
>> > 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc: log: rel 1663/16385/16387
>> > rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 
>> > 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc: file create: base/16385/16389
>> > rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 
>> > 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc: file create: base/16385/16393
>> > rmgr: Sequence    len (rec/tot):    158/   190, tx:        686, lsn: 
>> > 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc: log: rel 1663/16385/16387
>> > rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 
>> > 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc: file truncate: 
>> > base/16385/16389 to 0 blocks
>> > rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 
>> > 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc: file truncate: 
>> > base/16385/16393 to 0 blocks
>> > pg_xlogdump: FATAL:  error in WAL record at 0/16BE710: record with zero 
>> > length at 0/16BE740
>>
>> Note that the truncate will lead to a new, different, relfilenode.
>
> Really? Comparing the relfilenodes gives the same values before and
> after the truncate.

Yep, the relfilenodes are not changed in this case because CREATE TABLE and
TRUNCATE were executed in the same transaction block.

>> > ctmp=# select * from test;
>> > ERROR:  could not read block 0 in file "base/16385/16393": read only 0 of 
>> > 8192 bytes
>>
>> Hm. I can't reproduce this. Can you include a bit more details about how
>> to reproduce?
>
> Hmm, for me it is 100% reproducable. Are you familiar with Docker? I
> can probably construct a Dockerfile that reproduces it pretty reliably.

I could reproduce the problem in the master branch by doing
the following steps.

1. start the PostgreSQL server with wal_level = minimal
2. execute the following SQL statements
     begin;
     create table test(id serial primary key);
     truncate table test;
     commit;
3. shutdown the server with immediate mode
4. restart the server (crash recovery occurs)
5. execute the following SQL statement
    select * from test;

The optimization of TRUNCATE opereation that we can use when
CREATE TABLE and TRUNCATE are executed in the same transaction block
seems to cause the problem. In this case, only index file truncation is
logged, and index creation in btbuild() is not logged because wal_level
is minimal. Then at the subsequent crash recovery, index file is truncated
to 0 byte... Very simple fix is to log an index creation in that case,
but not sure if that's ok to do..

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to