Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
I think the main problem is that postgres reads the wrong xlog file. I attached strace to the postgres process and then grepped the log:

# cat /tmp/strace-log | fgrep xlog
5546 stat("pg_xlog", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546 stat("pg_xlog/archive_status", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546 open("pg_xlog/0001.history", O_RDONLY) = -1 ENOENT (No such file or directory)
5546 open("pg_xlog/0001000F0052", O_RDONLY) = 4
5546 open("pg_xlog/0001000F0052", O_RDONLY) = 4

It opens only one file. But I have a lot of files (I parsed the filenames into a more readable format):

0001001000D2  tli:1 log:16 seg:210
0001001000EE  tli:1 log:16 seg:238
0001001000D3  tli:1 log:16 seg:211
0001001000E2  tli:1 log:16 seg:226
0001001000D5  tli:1 log:16 seg:213
0001001000E8  tli:1 log:16 seg:232
0001001000F7  tli:1 log:16 seg:247
0001001000DF  tli:1 log:16 seg:223
0001001000DC  tli:1 log:16 seg:220
0001001000E7  tli:1 log:16 seg:231
0001001000EA  tli:1 log:16 seg:234
0001001000D1  tli:1 log:16 seg:209
0001001000DD  tli:1 log:16 seg:221
0001001000F5  tli:1 log:16 seg:245
0001001000E0  tli:1 log:16 seg:224
0001001000EB  tli:1 log:16 seg:235
0001001000D0  tli:1 log:16 seg:208
0001001000F4  tli:1 log:16 seg:244
0001001000F6  tli:1 log:16 seg:246
0001001000D7  tli:1 log:16 seg:215
0001001000DB  tli:1 log:16 seg:219
0001001000E4  tli:1 log:16 seg:228
0001001000DE  tli:1 log:16 seg:222
0001001000E9  tli:1 log:16 seg:233
0001001000D4  tli:1 log:16 seg:212
0001001000D9  tli:1 log:16 seg:217
0001001000F3  tli:1 log:16 seg:243
0001001000E5  tli:1 log:16 seg:229
0001001000DA  tli:1 log:16 seg:218
0001001000EC  tli:1 log:16 seg:236
0001001000D6  tli:1 log:16 seg:214
0001001000EF  tli:1 log:16 seg:239
0001001000E6  tli:1 log:16 seg:230
0001001000E1  tli:1 log:16 seg:225
0001001000F0  tli:1 log:16 seg:240
0001001000D8  tli:1 log:16 seg:216
0001001000CF  tli:1 log:16 seg:207
0001001000ED  tli:1 log:16 seg:237
0001001000E3  tli:1 log:16 seg:227
0001001000F1  tli:1 log:16 seg:241
0001001000F2  tli:1 log:16 seg:242
0001001000F8  tli:1 log:16 seg:248

So I think the main problem is that the pg_control file is corrupted (I guess). pg_resetxlog -n says:

bash-3.2$ pg_resetxlog -n /var/lib/pgsql/data
could not change directory to "/root"
pg_control values:

*First log file ID after reset: 16*
First log file segment after reset: 249
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5592178670599662815
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/7760685
Latest checkpoint's NextOID: 2556003
Latest checkpoint's NextMultiXactId: 3925
Latest checkpoint's NextMultiOffset: 7901
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

*And main question, how to fo
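The tli/log/seg breakdown of the xlog file names can be reproduced with a short script. This is only a sketch, assuming each name consists of three equal-width hexadecimal fields (timeline, log file ID, segment). The names appear in this message as 12 characters (4 per field), while PostgreSQL itself writes 24-character names (8 per field), so the field width is derived from the name length instead of being hard-coded. The function name parse_wal_name is hypothetical, not part of any PostgreSQL tool.

```python
def parse_wal_name(name):
    """Split a WAL segment file name into (timeline, log, segment).

    Assumes three equal-width hex fields; raises ValueError for anything
    else (for example a ".history" file).
    """
    if len(name) == 0 or len(name) % 3 != 0:
        raise ValueError("unexpected WAL file name length: %r" % name)
    width = len(name) // 3
    fields = [name[i * width:(i + 1) * width] for i in range(3)]
    # int(..., 16) raises ValueError if a field is not hexadecimal
    tli, log, seg = (int(field, 16) for field in fields)
    return tli, log, seg


if __name__ == "__main__":
    # File names taken from the strace output and the directory listing
    for name in ("0001000F0052", "0001001000CF", "0001001000F8"):
        tli, log, seg = parse_wal_name(name)
        print("%s tli:%d log:%d seg:%d" % (name, tli, log, seg))
```

With these inputs, the file postgres tries to open parses as log 15, segment 82, whereas every file listed in pg_xlog belongs to log 16 (segments 207 to 248).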
Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
On 2.12.2011 09:16, Oleg Serov wrote:
> Hello!
>
> i've don't try to do reindex. There was enough space.

Not sure whether you tried to reindex or not. And what do you mean by 'there was enough space'? For example, with ext2 (and ext3/ext4) it is rather easy to exhaust inodes long before the device is actually full. What filesystem are you using, anyway?

This seems like an I/O issue; you should check the hardware and the settings (e.g. which caches are enabled). Post more details, if possible. Have you checked the S.M.A.R.T. info from the drives?

> And i have a full data-directory backup, when i've stop server, before
> start.

Good. Have you moved it to a different machine? Otherwise you don't have a backup, just a copy.

Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
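The point about inodes running out before blocks do can be checked directly. Below is a minimal sketch using Python's os.statvfs (Unix only); the function name fs_usage and the default path "." are illustrative assumptions, and in practice the path would be the data directory or the filesystem holding pg_stat_tmp.

```python
import os
import sys


def fs_usage(path):
    """Return (blocks_used_fraction, inodes_used_fraction) for the
    filesystem that holds `path`.

    inodes_used_fraction is None when the filesystem reports no fixed
    inode count.
    """
    st = os.statvfs(path)
    blocks_used = 1.0 - st.f_bfree / st.f_blocks if st.f_blocks else 0.0
    inodes_used = 1.0 - st.f_ffree / st.f_files if st.f_files else None
    return blocks_used, inodes_used


if __name__ == "__main__":
    # Hypothetical usage: pass the directory to check as the first argument
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    blocks, inodes = fs_usage(target)
    print("blocks used: %.1f%%" % (blocks * 100))
    if inodes is None:
        print("inodes used: not reported by this filesystem")
    else:
        print("inodes used: %.1f%%" % (inodes * 100))
```

A filesystem that shows plenty of free blocks but nearly 100% inode usage would still fail to create new files, which is the situation described above for ext2/ext3/ext4.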
Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
On 12/02/2011 09:08 AM, Oleg Serov wrote:
> Then i've analyzed log, and found this:
> 7 days ago appears this errors:
> db= LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
> db= WARNING: pgstat wait timeout
> ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076

Now that you've taken a file-level backup (hopefully copied to a different computer), do you think it might be worth doing an fsck of the file system? I'm wondering if your underlying storage has been doing something dodgy.

--
Craig Ringer
Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
Hello!

i've don't try to do reindex. There was enough space.
And i have a full data-directory backup, when i've stop server, before start.

2011/12/2 Venkat Balaji
>
> 2011/12/2 Oleg Serov
>
>> And, i'm an idiot.
>>
>> My DB version:
>> PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
>> 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit
>>
>> 2011/12/2 Oleg Serov
>>
>>> Hello, i have a problem.
>>>
>>> I've got a production server, working fine. Then i've got strange error:
>>> > ERROR: right sibling's left-link doesn't match: block 147 links to
>>> 407 instead of expected 146 in index "order_status_key"'
>>> And decided to backup all server. So i shut-down VPS with server and
>>> backup all data.
>>> Then, after i booted it - and then - i've got Data loss.
>>>
>
> This seems to be an Index corruption. Did you try re-indexing ? Index
> creation might have failed, re-indexing would re-organize the Index tuples.
> If you are sure about disk corruption, try and "re-create" or "create
> concurrent Index" on a different disk.
>
>>> I've lost data, that have been written to DB around 10-100 hours
>>> (different tables, have different last updated value).
>>>
>>> Then i've analyzed log, and found this:
>>> 7 days ago appears this errors:
>>> db= LOG: could not rename temporary statistics file
>>> "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
>>> db= WARNING: pgstat wait timeout
>>> ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076
>
> This should be a free space issue, do you have enough space in
> "pg_stat_tmp" disk ?
>
>>> 5 days ago:
>>> a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied ---
>>> flushed only to F/526512E0
>>> 83238 db= WARNING: could not write block 54 of base/16384/2619
>>> 83239 db= CONTEXT: writing block 54 of relation base/16384/2619
>>>
>>> And today:
>>> 18 db= LOG: could not open file "pg_xlog/0001000F0052"
>>> (log file 15, segment 82):
>>> 19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied ---
>>> flushed only to F/52FDF0E0
>>>
>>> There is any ability to recover fresh data from database?
>
> What kind of backups you have available ?
>
> Thanks
> VB

--
Best regards, Oleg
Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
2011/12/2 Oleg Serov

> And, i'm an idiot.
>
> My DB version:
> PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit
>
> 2011/12/2 Oleg Serov
>
>> Hello, i have a problem.
>>
>> I've got a production server, working fine. Then i've got strange error:
>> > ERROR: right sibling's left-link doesn't match: block 147 links to 407
>> instead of expected 146 in index "order_status_key"'
>> And decided to backup all server. So i shut-down VPS with server and
>> backup all data.
>> Then, after i booted it - and then - i've got Data loss.

This seems to be index corruption. Did you try re-indexing? Index creation might have failed; re-indexing would re-organize the index tuples. If you are sure about disk corruption, try to "re-create" the index or "create concurrent Index" on a different disk.

>> I've lost data, that have been written to DB around 10-100 hours
>> (different tables, have different last updated value).
>>
>> Then i've analyzed log, and found this:
>> 7 days ago appears this errors:
>> db= LOG: could not rename temporary statistics file
>> "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
>> db= WARNING: pgstat wait timeout
>> ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076

This should be a free space issue. Do you have enough space on the "pg_stat_tmp" disk?

>> 5 days ago:
>> a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied ---
>> flushed only to F/526512E0
>> 83238 db= WARNING: could not write block 54 of base/16384/2619
>> 83239 db= CONTEXT: writing block 54 of relation base/16384/2619
>>
>> And today:
>> 18 db= LOG: could not open file "pg_xlog/0001000F0052"
>> (log file 15, segment 82):
>> 19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied ---
>> flushed only to F/52FDF0E0
>>
>> There is any ability to recover fresh data from database?

What kind of backups do you have available?

Thanks
VB
Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)
And, I'm an idiot.

My DB version:
PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit

2011/12/2 Oleg Serov

> Hello, i have a problem.
>
> I've got a production server, working fine. Then i've got strange error:
> > ERROR: right sibling's left-link doesn't match: block 147 links to 407
> instead of expected 146 in index "order_status_key"'
> And decided to backup all server. So i shut-down VPS with server and
> backup all data.
> Then, after i booted it - and then - i've got Data loss.
>
> I've lost data, that have been written to DB around 10-100 hours
> (different tables, have different last updated value).
>
> Then i've analyzed log, and found this:
> 7 days ago appears this errors:
> db= LOG: could not rename temporary statistics file
> "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
> db= WARNING: pgstat wait timeout
> ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076
>
> 5 days ago:
> a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied ---
> flushed only to F/526512E0
> 83238 db= WARNING: could not write block 54 of base/16384/2619
> 83239 db= CONTEXT: writing block 54 of relation base/16384/2619
>
> And today:
> 18 db= LOG: could not open file "pg_xlog/0001000F0052"
> (log file 15, segment 82):
> 19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied ---
> flushed only to F/52FDF0E0
>
> There is any ability to recover fresh data from database?
>
> Thanks!

--
Best regards, Oleg
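The positions in the "xlog flush request" errors are written as logid/offset in hexadecimal, so they can be mapped to a log file and segment number. The sketch below assumes 16777216 bytes per WAL segment, the value reported by pg_resetxlog -n elsewhere in this thread; the function name lsn_to_segment is hypothetical.

```python
WAL_SEGMENT_BYTES = 16777216  # assumed: "Bytes per WAL segment" from pg_resetxlog -n


def lsn_to_segment(lsn, segment_bytes=WAL_SEGMENT_BYTES):
    """Map a position such as 'F/DC1A22D8' to (log_id, segment_number)."""
    log_hex, offset_hex = lsn.split("/")
    log_id = int(log_hex, 16)
    offset = int(offset_hex, 16)
    # The segment is the byte offset within the log divided by the segment size
    return log_id, offset // segment_bytes


if __name__ == "__main__":
    # Positions copied from the error messages quoted above
    for lsn in ("F/DC1A22D8", "F/526512E0", "F/52FDF0E0"):
        log_id, seg = lsn_to_segment(lsn)
        print("%s -> log:%d seg:%d" % (lsn, log_id, seg))
```

Under that assumption, the requested flush position F/DC1A22D8 falls in log 15, segment 220, while both flushed positions (F/526512E0 and F/52FDF0E0) fall in log 15, segment 82, which is the file the server reports it could not open.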