Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-03 Thread Tomas Vondra
On 2.12.2011 09:16, Oleg Serov wrote:
 Hello!
 
 i've don't try to do reindex. There was enough space.

Not sure whether you tried to reindex or not. And what do you mean by
'there was enough space'? For example, with ext2 (and ext3/ext4) it is
rather easy to exhaust inodes long before the device is actually
full. What filesystem are you using, anyway?
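A quick way to answer the "enough space" question for both bytes and inodes is a minimal Python sketch around the standard library's `os.statvfs` (Unix only). The function name `free_space_report` is hypothetical, and the paths are assumptions based on the data directory used with pg_resetxlog elsewhere in this thread.

```python
import os


def free_space_report(path):
    """Report free bytes and free inodes on the filesystem holding `path`.

    On ext2/ext3/ext4 a filesystem can run out of inodes while plenty of
    bytes remain free, so both figures are worth checking.
    """
    st = os.statvfs(path)
    total_bytes = st.f_blocks * st.f_frsize
    free_bytes = st.f_bavail * st.f_frsize  # bytes available to non-root users
    total_inodes = st.f_files
    free_inodes = st.f_favail  # inodes available to non-root users
    return {
        "free_bytes": free_bytes,
        "bytes_free_pct": 100.0 * free_bytes / total_bytes if total_bytes else None,
        "free_inodes": free_inodes,
        # Some filesystems report zero total inodes; treat that as "unknown".
        "inodes_free_pct": 100.0 * free_inodes / total_inodes if total_inodes else None,
    }


if __name__ == "__main__":
    # Assumed locations (default RPM layout); adjust to your installation.
    for p in ("/var/lib/pgsql/data", "/var/lib/pgsql/data/pg_stat_tmp"):
        if os.path.isdir(p):
            print(p, free_space_report(p))
```

From the shell, `df -h` and `df -i` give the same two views.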

This seems like an I/O issue; you should check the hardware and the
settings (e.g. which caches are enabled). Post more details, if
possible. Have you checked the S.M.A.R.T. info from the drives?

 And I have a full backup of the data directory, taken after I stopped
 the server and before I started it again.

Good. Have you moved it to a different machine? Otherwise you don't have
a backup, just a copy.
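Once the copy has been moved to another machine, it is worth confirming that it arrived intact. A minimal Python sketch, assuming the server is stopped while the manifest is taken, hashes every file under a directory with SHA-256 and compares two such manifests; `manifest` and `compare` are hypothetical helper names, not part of any PostgreSQL tool.

```python
import hashlib
import os


def manifest(root):
    """Map every file under `root` (as a relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large relation files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare(original, copy):
    """Return (missing_from_copy, extra_in_copy, different) as sorted lists."""
    missing = sorted(set(original) - set(copy))
    extra = sorted(set(copy) - set(original))
    different = sorted(k for k in set(original) & set(copy) if original[k] != copy[k])
    return missing, extra, different
```

Run `manifest()` against the data directory and against the copy on the other machine, save each result (for instance with `json.dump`), and pass both to `compare()`; three empty lists mean the copy matches file for file.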

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-03 Thread Oleg Serov
I think the main problem is that PostgreSQL reads the wrong xlog file.

I launched strace on the postgres process and then grepped its output:
# cat /tmp/strace-log  | fgrep xlog
5546  stat(pg_xlog, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546  stat(pg_xlog/archive_status, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546  open(pg_xlog/0001.history, O_RDONLY) = -1 ENOENT (No such file or directory)
5546  open(pg_xlog/0001000F0052, O_RDONLY) = 4
5546  open(pg_xlog/0001000F0052, O_RDONLY) = 4

It opens only one xlog file.
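The same filtering can be done a little more precisely in Python. This minimal sketch pulls every `open()` (or `openat()`, which newer strace versions print) of a file under `pg_xlog` out of a strace log, together with its return value. The regular expression is an assumption about strace's usual output format; quotes around the path are optional because the copy in this thread lost them, and the function names are hypothetical.

```python
import re

# Matches open()/openat() calls on files under pg_xlog in strace output, e.g.
#   5546  open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4
XLOG_OPEN = re.compile(
    r'open(?:at)?\((?:AT_FDCWD,\s*)?"?(pg_xlog/[^",)]+)"?,[^)]*\)\s*=\s*(-?\d+)'
)


def xlog_opens(lines):
    """Return (path, return value) for every open() of a pg_xlog file."""
    found = []
    for line in lines:
        m = XLOG_OPEN.search(line)
        if m:
            found.append((m.group(1), int(m.group(2))))
    return found


def opened_segments(lines):
    """Distinct pg_xlog files that were opened successfully (fd >= 0)."""
    return sorted({path for path, fd in xlog_opens(lines) if fd >= 0})
```

For example, `opened_segments(open("/tmp/strace-log"))` lists the WAL files the server actually managed to open.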

But I have a lot of files in pg_xlog (I parsed the file names into a more readable format):
0001001000D2  tli:1   log:16  seg:210
0001001000EE  tli:1   log:16  seg:238
0001001000D3  tli:1   log:16  seg:211
0001001000E2  tli:1   log:16  seg:226
0001001000D5  tli:1   log:16  seg:213
0001001000E8  tli:1   log:16  seg:232
0001001000F7  tli:1   log:16  seg:247
0001001000DF  tli:1   log:16  seg:223
0001001000DC  tli:1   log:16  seg:220
0001001000E7  tli:1   log:16  seg:231
0001001000EA  tli:1   log:16  seg:234
0001001000D1  tli:1   log:16  seg:209
0001001000DD  tli:1   log:16  seg:221
0001001000F5  tli:1   log:16  seg:245
0001001000E0  tli:1   log:16  seg:224
0001001000EB  tli:1   log:16  seg:235
0001001000D0  tli:1   log:16  seg:208
0001001000F4  tli:1   log:16  seg:244
0001001000F6  tli:1   log:16  seg:246
0001001000D7  tli:1   log:16  seg:215
0001001000DB  tli:1   log:16  seg:219
0001001000E4  tli:1   log:16  seg:228
0001001000DE  tli:1   log:16  seg:222
0001001000E9  tli:1   log:16  seg:233
0001001000D4  tli:1   log:16  seg:212
0001001000D9  tli:1   log:16  seg:217
0001001000F3  tli:1   log:16  seg:243
0001001000E5  tli:1   log:16  seg:229
0001001000DA  tli:1   log:16  seg:218
0001001000EC  tli:1   log:16  seg:236
0001001000D6  tli:1   log:16  seg:214
0001001000EF  tli:1   log:16  seg:239
0001001000E6  tli:1   log:16  seg:230
0001001000E1  tli:1   log:16  seg:225
0001001000F0  tli:1   log:16  seg:240
0001001000D8  tli:1   log:16  seg:216
0001001000CF  tli:1   log:16  seg:207
0001001000ED  tli:1   log:16  seg:237
0001001000E3  tli:1   log:16  seg:227
0001001000F1  tli:1   log:16  seg:241
0001001000F2  tli:1   log:16  seg:242
0001001000F8  tli:1   log:16  seg:248
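The listing above covers log 16, segments 207 to 248 (hex CF to F8). Checking such a listing for holes by hand is error-prone, so here is a minimal Python sketch that splits each name into its timeline, log and segment fields and reports gaps. It assumes full 24-hex-digit file names as they appear on disk (the 12-character names quoted in this thread appear to have been shortened by the mail archive); the function names are hypothetical.

```python
import re

# In this era a WAL segment file name is 24 hex digits: 8 for the timeline ID,
# 8 for the log file ID and 8 for the segment number within that log file.
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def parse_wal_name(name):
    """Split a WAL segment file name into (timeline, log, seg)."""
    if not SEGMENT_NAME.fullmatch(name):
        raise ValueError("not a WAL segment file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def missing_segments(names):
    """List (timeline, log, seg) holes between the lowest and highest segment
    present for each (timeline, log) pair. Other entries, such as
    '00000001.history' or 'archive_status', are skipped. Gaps that span a
    log-ID boundary are not detected.
    """
    present = {}
    for name in names:
        if SEGMENT_NAME.fullmatch(name):
            tli, log, seg = parse_wal_name(name)
            present.setdefault((tli, log), set()).add(seg)
    holes = []
    for (tli, log), segs in sorted(present.items()):
        holes.extend((tli, log, s) for s in range(min(segs), max(segs) + 1) if s not in segs)
    return holes
```

Typical use would be `missing_segments(os.listdir("/var/lib/pgsql/data/pg_xlog"))`; an empty list means no holes inside each log file's range.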

So I guess the main problem is that the pg_control file is corrupted.

Here is what pg_resetxlog -n says:
bash-3.2$ pg_resetxlog -n /var/lib/pgsql/data
could not change directory to /root
pg_control values:

*First log file ID after reset:        16*
First log file segment after reset:   249
pg_control version number:            843
Catalog version number:               200904091
Database system identifier:           5592178670599662815
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/7760685
Latest checkpoint's NextOID:          2556003
Latest checkpoint's NextMultiXactId:  3925
Latest checkpoint's NextMultiOffset:  7901
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
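The numbers in this message can be lined up directly: the server opens log 15, segment 82 (the 0F/52 file seen in strace), the surviving files all belong to log 16, segments 207 to 248, and pg_resetxlog -n proposes to start at log 16, segment 249, one past the highest file present. A minimal sketch, assuming the 8.4-era timeline/log/segment naming and timeline 1, rebuilds full 24-digit names to make the comparison explicit; `wal_file_name` is a hypothetical helper.

```python
def wal_file_name(tli, log, seg):
    """Build a 24-hex-digit WAL segment file name from its three parts."""
    return "%08X%08X%08X" % (tli, log, seg)


# Numbers taken from this thread; timeline 1 throughout.
present = {wal_file_name(1, 16, seg) for seg in range(207, 249)}  # log 16, seg 207..248
requested = wal_file_name(1, 15, 82)     # what the server opened: log 15, segment 82
reset_start = wal_file_name(1, 16, 249)  # pg_resetxlog -n: log 16, segment 249

print("requested segment present:", requested in present)
print("highest present segment:", max(present))
print("pg_resetxlog would start:", reset_start)
```

This prints `False`, then `0000000100000010000000F8`, then `0000000100000010000000F9`: the segment the server wants is not among the files left in pg_xlog. Plain string `max()` is safe here because all names have the same fixed width and prefix.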

*And the main question: how to force 

Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-02 Thread Oleg Serov
Hello!

i've don't try to do reindex. There was enough space.

And I have a full backup of the data directory, taken after I stopped
the server and before I started it again.


-- 
Best regards,

Oleg


Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-02 Thread Craig Ringer

On 12/02/2011 09:08 AM, Oleg Serov wrote:

Then I went through the log and found this:
Seven days ago, these errors appeared:
db= LOG:  could not rename temporary statistics file pg_stat_tmp/pgstat.tmp to pg_stat_tmp/pgstat.stat:

db= WARNING:  pgstat wait timeout
 ERROR:  missing chunk number 0 for toast value 2550017 in pg_toast_17076


Now that you've taken a file-level backup (hopefully copied to a 
different computer), do you think it might be worth doing an fsck of the 
file system? I'm wondering if your underlying storage has been doing 
something dodgy.


--
Craig Ringer



[GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-01 Thread Oleg Serov
Hello, I have a problem.

I have a production server that was working fine. Then I got a strange error:
 ERROR:  right sibling's left-link doesn't match: block 147 links to 407 instead of expected 146 in index "order_status_key"
So I decided to back up the whole server: I shut down the VPS running it and backed up all the data.
Then, after I booted it again, I found that data had been lost.

I lost the data written to the DB during roughly the last 10-100 hours (it varies by table, judging by their last-updated values).

Then I went through the log and found this:
Seven days ago, these errors appeared:
db= LOG:  could not rename temporary statistics file pg_stat_tmp/pgstat.tmp to pg_stat_tmp/pgstat.stat:
db= WARNING:  pgstat wait timeout
 ERROR:  missing chunk number 0 for toast value 2550017 in pg_toast_17076

Five days ago, many of these:
ERROR:  xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/526512E0
  83238 db= WARNING:  could not write block 54 of base/16384/2619
  83239 db= CONTEXT:  writing block 54 of relation base/16384/2619

And today:
 18 db= LOG:  could not open file pg_xlog/0001000F0052 (log file 15, segment 82):
 19 db= ERROR:  xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/52FDF0E0
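The positions in the "xlog flush request" errors can be mapped onto WAL segment files. In 8.4, a location written X/Y names log file X, and the segment within it is Y divided by the segment size. The minimal sketch below (function name hypothetical) assumes the default 16 MB segments, which matches the "Bytes per WAL segment" value in this server's pg_resetxlog output. It shows that the flush request F/DC1A22D8 falls in segment 220 of log 15, while WAL had only been flushed into segment 82 of log 15, the same segment (0F/52) the server later could not open. In other words, a data page carries a WAL position well ahead of the point to which the server had flushed WAL.

```python
# Assumes the default 16 MB WAL segment size (16777216 bytes).
XLOG_SEG_SIZE = 16 * 1024 * 1024


def lsn_to_segment(lsn, seg_size=XLOG_SEG_SIZE):
    """Map an 8.4-style WAL location 'LOGID/OFFSET' (both hex) to (log, seg),
    where seg = OFFSET // segment size."""
    log_hex, offset_hex = lsn.split("/")
    return int(log_hex, 16), int(offset_hex, 16) // seg_size


for lsn in ("F/DC1A22D8", "F/526512E0", "F/52FDF0E0"):
    log, seg = lsn_to_segment(lsn)
    print("%s -> log %d, segment %d (hex %02X)" % (lsn, log, seg, seg))
```

This prints log 15, segment 220 (hex DC) for the flush request and log 15, segment 82 (hex 52) for both "flushed only to" positions.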

Is there any way to recover the recent data from the database?

Thanks!


Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-01 Thread Oleg Serov
And I'm an idiot.

My DB version:
PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit


-- 
Best regards,

Oleg


Re: [GENERAL] Postgresql + corrupted disk = data loss. (Need help for database recover)

2011-12-01 Thread Venkat Balaji
2011/12/2 Oleg Serov sero...@gmail.com

 And I'm an idiot.

 My DB version:
 PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit



 2011/12/2 Oleg Serov sero...@gmail.com

 Hello, I have a problem.

 I have a production server that was working fine. Then I got a strange error:
  ERROR:  right sibling's left-link doesn't match: block 147 links to 407 instead of expected 146 in index "order_status_key"
 So I decided to back up the whole server: I shut down the VPS running it and backed up all the data.
 Then, after I booted it again, I found that data had been lost.


This looks like index corruption. Did you try reindexing? Index
creation might have failed; reindexing would rebuild the index tuples.
If you are sure the disk is corrupted, try re-creating the index, or
creating it concurrently, on a different disk.


 I lost the data written to the DB during roughly the last 10-100 hours
 (it varies by table, judging by their last-updated values).

 Then I went through the log and found this:
 Seven days ago, these errors appeared:
 db= LOG:  could not rename temporary statistics file pg_stat_tmp/pgstat.tmp to pg_stat_tmp/pgstat.stat:
 db= WARNING:  pgstat wait timeout
  ERROR:  missing chunk number 0 for toast value 2550017 in pg_toast_17076


This looks like a free-space issue. Do you have enough space on the
disk holding pg_stat_tmp?

 Five days ago, many of these:
 ERROR:  xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/526512E0
   83238 db= WARNING:  could not write block 54 of base/16384/2619
   83239 db= CONTEXT:  writing block 54 of relation base/16384/2619

 And today:
  18 db= LOG:  could not open file pg_xlog/0001000F0052 (log file 15, segment 82):
  19 db= ERROR:  xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/52FDF0E0


 Is there any way to recover the recent data from the database?


What kind of backups do you have available?


Thanks
VB