On 12/11/2013 09:44 AM, Erikjan Rijkers wrote:
I don't know whether the below constitutes a bug, but:

Daily (sometimes even more often) I recompile 9.4devel (after git pull)  to run 
a large dev database (100 GB or so).

To avoid frequent initdb and many-hour-restore of data, I do this only when the 
following two #defines are unchanged:
   CATALOG_VERSION_NO in src/include/catalog/catversion.h, and
   PG_CONTROL_VERSION in src/include/catalog/pg_control.h

the goal being to always run the latest db, without having to reload the ~100 
GB dev db unexpectedly at inconvenient times.

Generally, this works OK.

However, the last few weeks I sometimes get, after such recompiling,  'invalid 
magic number' errors from which I don't know
how to recover (it means, apparently, an initdb is needed and I have then to 
reload the database).


2013-12-11 00:15:25.627 CET 25304 LOG:  received smart shutdown request
2013-12-11 00:15:25.631 CET 25306 LOG:  shutting down
2013-12-11 00:15:25.904 CET 25306 LOG:  database system is shut down
2013-12-11 08:11:59.858 CET 25490 LOG:  database system was shut down at 
2013-12-11 00:15:25 CET
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000000010000006300000034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid primary checkpoint record
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000000010000006300000034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid secondary checkpoint record
2013-12-11 08:11:59.901 CET 25490 PANIC:  could not locate a valid checkpoint 
record
2013-12-11 08:12:00.326 CET 25492 FATAL:  the database system is starting up
2013-12-11 08:12:01.328 CET 25493 FATAL:  the database system is starting up
2013-12-11 08:12:01.682 CET 25489 LOG:  startup process (PID 25490) was 
terminated by signal 6: Aborted
2013-12-11 08:12:01.682 CET 25489 LOG:  aborting startup due to startup process 
failure


My question is two-fold:

1. (general:)  is this 'invalid magic number' unexpected, and should it be 
reported always?

The magic number it's complaining about is the constant stored in the WAL logs, that acts as a version number. Think of it as the CATALOG_VERSION_NO, but for the WAL log format. It's bumped whenever the WAL format changes.

2. (for my setup specifically:)  is there any way that I can recognize, 
beforehand, at the code base level, such an
impending 'invalid magic number' state?

Keep an eye on the XLOG_PAGE_MAGIC constaint in xlog_internal.h.

Can de db be recovered from easily? (although this dev database is expendable, 
it takes many hours to rebuild; I'd like to
avoid that if possible).

Yes, you can use pg_resetxlog to recover.

BTW, you could also use pg_upgrade. That should work also when catversion has been bumped.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to