Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread 李海龙

Hi Steve and pgsql-hackers,

I've encountered a similar phenomenon with 9.4.



1.  environment

1.1 OS version

postgres@lhl-Latitude-E5420:~$ cat /etc/issue
Ubuntu 13.10 \n \l

postgres@lhl-Latitude-E5420:~$ uname -av
Linux lhl-Latitude-E5420 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


1.2 PostgreSQL version


postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata --version
pg_controldata (PostgreSQL) 9.4beta2
postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_config
BINDIR = /opt/pg94/bin
DOCDIR = /opt/pg94/share/doc/postgresql
HTMLDIR = /opt/pg94/share/doc/postgresql
INCLUDEDIR = /opt/pg94/include
PKGINCLUDEDIR = /opt/pg94/include/postgresql
INCLUDEDIR-SERVER = /opt/pg94/include/postgresql/server
LIBDIR = /opt/pg94/lib
PKGLIBDIR = /opt/pg94/lib/postgresql
LOCALEDIR = /opt/pg94/share/locale
MANDIR = /opt/pg94/share/man
SHAREDIR = /opt/pg94/share/postgresql
SYSCONFDIR = /opt/pg94/etc/postgresql
PGXS = /opt/pg94/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg94' '--with-perl' '--with-libxml' '--with-libxslt' '--with-ossp-uuid'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed -Wl,-rpath,'/opt/pg94/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.4beta2


2.  phenomenon

I have a PostgreSQL datadir named /export/pg94beta1_data/ which was 
initialized with PostgreSQL 9.4beta1,



postgres@lhl-Latitude-E5420:~$ /opt/pg94/bin/pg_controldata /export/pg94beta1_data/
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            937
Catalog version number:               201405111
Database system identifier:           6014427290583411360
Database cluster state:               in production
pg_control last modified:             2014-07-27 (Sunday) 16:36:50
Latest checkpoint location:           0/17462890
Prior checkpoint location:            0/17462828
Latest checkpoint's REDO location:    0/17462890
Latest checkpoint's REDO WAL file:    00010017
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0/1387
Latest checkpoint's NextOID:          0
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        715
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            2014-07-27 (Sunday) 16:36:50
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            minimal
Current wal_log_hints setting:        off
Current max_connections setting:      100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         65793
Date/time type storage:               floating-point numbers
Float4 argument passing:              by reference
Float8 argument passing:              by reference
Data page checksum version:           307500851



but the server complained as follows when I started it with
PostgreSQL 9.4beta2:

postgres@lhl-Latitude-E5420:~$  /opt/pg94/bin/pg_ctl -D /export/pg94beta1_data/ start
server starting
postgres@lhl-Latitude-E5420:~$ [2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 1 0]FATAL:  database files are incompatible with server
[2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 2 0]DETAIL: The database cluster was initialized with PG_CONTROL_VERSION 937, but the server was compiled with PG_CONTROL_VERSION 942.
[2014-07-27 19:23:57.922 CST 27983 53d4e14d.6d4f 3 0]HINT:  It looks like you need to initdb.




I had always assumed that a PG_CONTROL_VERSION mismatch should not
come up when upgrading PostgreSQL between minor versions.

Are there some important differences in

Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread Tom Lane
李海龙 hailong...@qunar.com writes:
> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was
> initialized with PostgreSQL 9.4beta1,
> [ and 9.4beta2 won't start with it ]

This is expected; you need to initdb.  Or use pg_upgrade to upgrade
the cluster.  We had to change pg_control format post-beta1.

regards, tom lane
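
The check behind the FATAL message quoted earlier in this thread is, in spirit, a plain equality test on the pg_control version number. A minimal sketch, assuming only the text format of the pg_controldata output shown in this thread; `control_version` and `check_compatible` are hypothetical helper names, not part of PostgreSQL:

```python
import re


def control_version(pg_controldata_output):
    """Return the pg_control version number found in the output, or None."""
    match = re.search(
        r"^pg_control version number:\s*(\d+)\s*$",
        pg_controldata_output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def check_compatible(cluster_version, server_version):
    """Any difference at all means the cluster cannot be used as-is."""
    if cluster_version != server_version:
        return (
            "database files are incompatible with server: cluster has "
            f"PG_CONTROL_VERSION {cluster_version}, "
            f"server expects {server_version}"
        )
    return "ok"


# Two lines copied from the output posted in this thread.
sample = "pg_control version number:937\nCatalog version number:   201405111\n"
print(check_compatible(control_version(sample), 942))
```

There is no "close enough" case here: 937 against 942 fails just as a major-version mismatch would, which is why a change between betas forces a re-initdb.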


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread 李海龙


Understood!

Before I wrote my last email, I had already initialized a new cluster with
PostgreSQL 9.4beta2 and restored a pg_dumpall dump of /export/pg94beta1_data/ into it.


Thanks

Best Regards!

At 2014-07-28 00:35 +08, Tom Lane wrote:
> 李海龙 hailong...@qunar.com writes:
>> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was
>> initialized with PostgreSQL 9.4beta1,
>> [ and 9.4beta2 won't start with it ]
> This is expected; you need to initdb.  Or use pg_upgrade to upgrade
> the cluster.  We had to change pg_control format post-beta1.
>
> regards, tom lane





Re: [HACKERS] 9.4 pg_control corruption

2014-07-27 Thread Josh Berkus
On 07/27/2014 09:35 AM, Tom Lane wrote:
> 李海龙 hailong...@qunar.com writes:
>> I have a PostgreSQL datadir named /export/pg94beta1_data/ which was
>> initialized with PostgreSQL 9.4beta1,
>> [ and 9.4beta2 won't start with it ]
>
> This is expected; you need to initdb.  Or use pg_upgrade to upgrade
> the cluster.  We had to change pg_control format post-beta1.

Thank you for testing that though!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] 9.4 pg_control corruption

2014-07-08 Thread Steve Singer
I've encountered a corrupt pg_control  file on my 9.4 development 
cluster.  I've mostly been using the cluster for changeset extraction / 
slony testing.


This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change 
discussed in another thread) but would have had the initdb done with an 
earlier 9.4 snapshot.




/usr/local/pgsql94wal/bin$ ./pg_controldata ../data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            937
Catalog version number:               201405111
Database system identifier:           6014096177254975326
Database cluster state:               in production
pg_control last modified:             Tue 08 Jul 2014 06:15:58 PM EDT
Latest checkpoint location:           5/44DC5FC8
Prior checkpoint location:            5/44C58B88
Latest checkpoint's REDO location:    5/44DC5FC8
Latest checkpoint's REDO WAL file:    000100050044
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1558590
Latest checkpoint's NextOID:          505898
Latest checkpoint's NextMultiXactId:  3285
Latest checkpoint's NextMultiOffset:  6569
Latest checkpoint's oldestXID:        1281
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Tue 08 Jul 2014 06:15:23 PM EDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            logical
Current wal_log_hints setting:        off
Current max_connections setting:      200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         65793
Date/time type storage:               floating-point numbers
Float4 argument passing:              by reference
Float8 argument passing:              by reference
Data page checksum version:           2602751502
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$
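
For readers cross-checking the output above: the "REDO WAL file" name can be derived from the TimeLineID and the REDO location. A rough sketch, assuming the usual 24-hex-digit segment naming (timeline, then the two halves of the segment number) and the 16777216-byte segment size reported above; `wal_file_name` is a hypothetical helper, not a PostgreSQL function:

```python
WAL_SEGMENT_SIZE = 16777216  # "Bytes per WAL segment" from the output above


def wal_file_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Map a timeline and an LSN written as 'hi/lo' (hex) to a segment file name."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    position = (hi << 32) | lo              # 64-bit byte position in the WAL stream
    segment_no = position // segment_size   # which segment holds that byte
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_no // segments_per_id,
        segment_no % segments_per_id,
    )


# REDO location and TimeLineID taken from the output above.
print(wal_file_name(1, "5/44DC5FC8"))  # 000000010000000500000044
```

With the default segment size, the last group is simply the top byte of the low half of the LSN (0x44 here).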


Before this, postgres crashed and seemed to have problems recovering. I
might have hit CTRL-C, but I didn't do anything drastic like issue a kill -9.



test1 [unknown] 2014-07-08 18:15:18.986 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:20.482 EDTWARNING:  terminating 
connection because of crash of another server process
test1 [unknown] 2014-07-08 18:15:20.482 EDTDETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and 
exit, because another server process exited abnormally and possibly 
corrupted shared memory.
test1 [unknown] 2014-07-08 18:15:20.482 EDTHINT:  In a moment you should 
be able to reconnect to the database and repeat your command.
  2014-07-08 18:15:20.483 EDTLOG:  all server processes terminated; 
reinitializing
  2014-07-08 18:15:20.720 EDTLOG:  database system was interrupted; 
last known up at 2014-07-08 18:15:15 EDT
  2014-07-08 18:15:20.865 EDTLOG:  database system was not properly 
shut down; automatic recovery in progress

  2014-07-08 18:15:20.954 EDTLOG:  redo starts at 5/41023848
  2014-07-08 18:15:23.153 EDTLOG:  unexpected pageaddr 4/D8DC6000 in 
log segment 000100050044, offset 14442496

  2014-07-08 18:15:23.153 EDTLOG:  redo done at 5/44DC5F60
  2014-07-08 18:15:23.153 EDTLOG:  last completed transaction was at 
log time 2014-07-08 18:15:17.874937-04
test2 [unknown] 2014-07-08 18:15:24.247 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:24.772 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.281 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.547 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.548 EDTFATAL:  the database system 
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.549 EDTFATAL:  the database system 
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.557 EDTFATAL:  the database system 
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.582 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.584 EDTFATAL:  the database system 
is in 

Re: [HACKERS] 9.4 pg_control corruption

2014-07-08 Thread Tom Lane
Steve Singer st...@ssinger.info writes:
> I've encountered a corrupt pg_control  file on my 9.4 development
> cluster.  I've mostly been using the cluster for changeset extraction /
> slony testing.
>
> This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change
> discussed in another thread) but would have had the initdb done with an
> earlier 9.4 snapshot.

Somehow or other you missed the update to pg_control version number 942.
There's no obvious reason to think that this pg_control file is corrupt
on its own terms, but the pg_controldata version you're using expects
the 942 layout.  The fact that the server wasn't complaining about this
suggests that you've not recompiled the server, or at least not xlog.c.
Possibly the odd failure to restart indicates that you have a partially
updated server executable?

regards, tom lane
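
To illustrate the layout point: a reader built for a newer struct layout looks for the CRC at a different offset, so a perfectly valid older file fails the check. The sketch below uses a made-up field layout and zlib.crc32 as a stand-in; neither the fields nor the CRC routine is meant to match the real pg_control, and all names are hypothetical.

```python
import struct
import zlib

# Hypothetical toy layouts, NOT the real ControlFileData struct.
OLD_LAYOUT = struct.Struct("<IIi")   # version, catalog version, one setting
NEW_LAYOUT = struct.Struct("<IIii")  # same, plus one field added before the CRC
CRC = struct.Struct("<I")


def write_control(layout, *fields):
    """Pack the fields and append a CRC computed over the packed body."""
    body = layout.pack(*fields)
    return body + CRC.pack(zlib.crc32(body))


def crc_matches(layout, blob):
    """Verify the CRC the way a reader compiled for `layout` would."""
    # Pad with zeros so a short file can still be read with a longer layout.
    padded = blob.ljust(layout.size + CRC.size, b"\0")
    body = padded[:layout.size]
    (stored,) = CRC.unpack_from(padded, layout.size)
    return zlib.crc32(body) == stored


old_file = write_control(OLD_LAYOUT, 937, 201405111, 100)
print(crc_matches(OLD_LAYOUT, old_file))  # True: file is fine on its own terms
print(crc_matches(NEW_LAYOUT, old_file))  # False: reader expects another layout
```

The file bytes never change between the two checks; only the reader's idea of where the fields and the CRC live does.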




Re: [HACKERS] 9.4 pg_control corruption

2014-07-08 Thread Steve Singer

On 07/08/2014 10:14 PM, Tom Lane wrote:
> Steve Singer st...@ssinger.info writes:
>> I've encountered a corrupt pg_control  file on my 9.4 development
>> cluster.  I've mostly been using the cluster for changeset extraction /
>> slony testing.
>>
>> This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change
>> discussed in another thread) but would have had the initdb done with an
>> earlier 9.4 snapshot.
>
> Somehow or other you missed the update to pg_control version number 942.
> There's no obvious reason to think that this pg_control file is corrupt
> on its own terms, but the pg_controldata version you're using expects
> the 942 layout.  The fact that the server wasn't complaining about this
> suggests that you've not recompiled the server, or at least not xlog.c.
> Possibly the odd failure to restart indicates that you have a partially
> updated server executable?



The server is complaining about that; it started to after the crash
(which is why I ran pg_controldata).


ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$ ./postgres -D ../data
  2014-07-08 22:28:57.796 EDTFATAL:  database files are incompatible 
with server
  2014-07-08 22:28:57.796 EDTDETAIL:  The database cluster was 
initialized with PG_CONTROL_VERSION 937, but the server was compiled 
with PG_CONTROL_VERSION 942.

  2014-07-08 22:28:57.796 EDTHINT:  It looks like you need to initdb.
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$


The server seemed fine (and it was 9.4, because I was using 9.4 features).
The server crashed.
The server performed crash recovery.
The server wouldn't start, and pg_controldata shows the attached
output.


I wasn't recompiling or reinstalling around this time either.




> regards, tom lane








Re: [HACKERS] 9.4 pg_control corruption

2014-07-08 Thread Tom Lane
Steve Singer st...@ssinger.info writes:
> On 07/08/2014 10:14 PM, Tom Lane wrote:
>> There's no obvious reason to think that this pg_control file is corrupt
>> on its own terms, but the pg_controldata version you're using expects
>> the 942 layout.  The fact that the server wasn't complaining about this
>> suggests that you've not recompiled the server, or at least not xlog.c.
>
> The server  is complaining about that, it started to after the crash

Then you updated your sources, recompiled and reinstalled, but failed to
restart the server when you did that.  Else it would have complained on
the spot.

If you had any valuable data in the installation, we could talk about how
to get it out; but since you didn't I'd suggest just re-initdb and move
on.  I don't see anything unexpected here.

regards, tom lane

