Re: [GENERAL] Help - corruption issue?

2011-04-26 Thread Scott Marlowe
On Mon, Apr 25, 2011 at 8:50 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 On Tuesday, April 26, 2011, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 25.4.2011 18:16, Phoenix Kiula napsal(a):
 Sorry, spoke too soon.

 I can COPY individual chunks to files. Did that by year, and at least
 the dumping worked.

 Now I need to pull the data in at the destination server.

 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:

 COPY links, line 15272357:
 16426447     9s2q7   9s2q7   N       
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.

 Is this any use at all?  Would appreciate any pointers!

 So the dump worked fina and it fails when loading it back into the DB?
 Have you checked the output file (just see the tail). Can you post the
 part that causes issues? Just the line 16426447 and few lines around.

 regards
 Tomas

 From the old server:
 Yearly COPY files worked. Pg_dumpall was giving problems.

 In the new server:
 COPY FROM worked. All files appear to have been copied. Then I create
 the primary key index, and another index. Many records are there, but
 many are not there! There's no error, just that some records/rows just
 didn't make it.

Are you sure you're getting all the data out of the source (broken)
database you think you are?  Are you sure those rows are in the dump?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-26 Thread Phoenix Kiula
On Tue, Apr 26, 2011 at 3:24 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Apr 25, 2011 at 8:50 PM, Phoenix Kiula phoenix.ki...@gmail.com 
 wrote:
 On Tuesday, April 26, 2011, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 25.4.2011 18:16, Phoenix Kiula napsal(a):
 Sorry, spoke too soon.

 I can COPY individual chunks to files. Did that by year, and at least
 the dumping worked.

 Now I need to pull the data in at the destination server.

 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:

 COPY links, line 15272357:
 16426447     9s2q7   9s2q7   N       
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.

 Is this any use at all?  Would appreciate any pointers!

 So the dump worked fina and it fails when loading it back into the DB?
 Have you checked the output file (just see the tail). Can you post the
 part that causes issues? Just the line 16426447 and few lines around.

 regards
 Tomas

 From the old server:
 Yearly COPY files worked. Pg_dumpall was giving problems.

 In the new server:
 COPY FROM worked. All files appear to have been copied. Then I create
 the primary key index, and another index. Many records are there, but
 many are not there! There's no error, just that some records/rows just
 didn't make it.

 Are you sure you're getting all the data out of the source (broken)
 database you think you are?  Are you sure those rows are in the dump?



Actually I am not. Some rows are missing.

Will a COUNT(*) on the two databases -- old and new -- be sufficient
and reliable information about the number of rows that went AWOL?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-26 Thread Tomas Vondra
Dne 26.4.2011 04:50, Phoenix Kiula napsal(a):
 Tomas, the line where it crashed, here are the 10 or so lines around it:
 
  head -15272350 /backup/links/links_all.txt | tail -20

No, those lines are before the one that causes problems - line number is
15272357, and you've printed just 15272350 lines using head. Do this

$ head -15272367 /backup/links/links_all.txt | tail -20

That should give us 10 lines before, 10 lines after.

Tomas.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-26 Thread Tomas Vondra
Dne 26.4.2011 14:41, Phoenix Kiula napsal(a):
 On Tue, Apr 26, 2011 at 3:24 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 Are you sure you're getting all the data out of the source (broken)
 database you think you are?  Are you sure those rows are in the dump?
 
 
 
 Actually I am not. Some rows are missing.
 
 Will a COUNT(*) on the two databases -- old and new -- be sufficient
 and reliable information about the number of rows that went AWOL?

That should give us at least some idea if the copy worked. Have you
checked the postmaster.log (and kernel log in /var/log/messages) why the
new DB crashed when you do SELECT * FROM mytable LIMIT 1 (as TL
recommended yesterday)?

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Phoenix Kiula
On Fri, Apr 22, 2011 at 8:35 PM,  t...@fuzzy.cz wrote:
 On Fri, Apr 22, 2011 at 8:20 PM,  t...@fuzzy.cz wrote:
 On Fri, Apr 22, 2011 at 7:07 PM,  t...@fuzzy.cz wrote:
 In the pg_dumpall backup process, I get this error. Does this help?


 Well, not really - it's just another incarnation of the problem we've
 already seen. PostgreSQL reads the data, and at some point it finds out
 it
 needs to allocate 4294967293B of memory. Which is strange, because it's
 actually a negative number (-3 AFAIK).

 It's probably caused by data corruption (incorrect length for a field).

 There are ways to find out more about the cause, e.g. here:

 http://archives.postgresql.org/pgsql-hackers/2005-10/msg01198.php

 but you need to have a pg compiled with debug support. I guess the
 packaged version does not support that, but maybe you can get the
 sources
 and compile them on your own.

 If it really is a data corruption, you might try to locate the corrupted
 blocks like this:

 -- get number of blocks
 SELECT relpages FROM pg_class WHERE relname = 'table_name';

 -- get items for each block (read the problematic column)
 FOR block IN 1..relpages LOOP
 SELECT AVG(length(colname)) FROM table_name WHERE ctid =
 '(block,0)'::ctid AND ctid  '(block+1,0)'::ctid;


 Thanks for this. Very useful. What is this -- a function? How should I
 execute this query?

 It's a pseudocode - you need to implement that in whatever language you
 like. You could do that in PL/pgSQL but don't forget it's probably going
 to crash when you hit the problematic block so I'd probably implement that
 in outside the DB (with a logic to continue the loop once the connection
 dies).

 And 'ctid' is a pseudocolumn that means '(block#, row#)' i.e. it's
 something like a physical location of the row.

 regards
 Tomas



A question.

Is data dumped from COPY TO command any use?

It has taken me days, but I have managed to COPY my large table in chunks.

If I subsequently COPY FROM these files, would this be a workable solution?

My fear based on my ignorance is that maybe the data corruption, if
any exists, will also get COPY-ied  and therefore transferred into the
fresh database.

Is this fear justified, or is COPY a viable alternative?

Thanks!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Phoenix Kiula
On Mon, Apr 25, 2011 at 9:19 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 On Fri, Apr 22, 2011 at 8:35 PM,  t...@fuzzy.cz wrote:
 On Fri, Apr 22, 2011 at 8:20 PM,  t...@fuzzy.cz wrote:
 On Fri, Apr 22, 2011 at 7:07 PM,  t...@fuzzy.cz wrote:
 In the pg_dumpall backup process, I get this error. Does this help?


 Well, not really - it's just another incarnation of the problem we've
 already seen. PostgreSQL reads the data, and at some point it finds out
 it
 needs to allocate 4294967293B of memory. Which is strange, because it's
 actually a negative number (-3 AFAIK).

 It's probably caused by data corruption (incorrect length for a field).

 There are ways to find out more about the cause, e.g. here:

 http://archives.postgresql.org/pgsql-hackers/2005-10/msg01198.php

 but you need to have a pg compiled with debug support. I guess the
 packaged version does not support that, but maybe you can get the
 sources
 and compile them on your own.

 If it really is a data corruption, you might try to locate the corrupted
 blocks like this:

 -- get number of blocks
 SELECT relpages FROM pg_class WHERE relname = 'table_name';

 -- get items for each block (read the problematic column)
 FOR block IN 1..relpages LOOP
 SELECT AVG(length(colname)) FROM table_name WHERE ctid =
 '(block,0)'::ctid AND ctid  '(block+1,0)'::ctid;


 Thanks for this. Very useful. What is this -- a function? How should I
 execute this query?

 It's a pseudocode - you need to implement that in whatever language you
 like. You could do that in PL/pgSQL but don't forget it's probably going
 to crash when you hit the problematic block so I'd probably implement that
 in outside the DB (with a logic to continue the loop once the connection
 dies).

 And 'ctid' is a pseudocolumn that means '(block#, row#)' i.e. it's
 something like a physical location of the row.

 regards
 Tomas



 A question.

 Is data dumped from COPY TO command any use?

 It has taken me days, but I have managed to COPY my large table in chunks.

 If I subsequently COPY FROM these files, would this be a workable solution?

 My fear based on my ignorance is that maybe the data corruption, if
 any exists, will also get COPY-ied  and therefore transferred into the
 fresh database.

 Is this fear justified, or is COPY a viable alternative?

 Thanks!




Sorry, spoke too soon.

I can COPY individual chunks to files. Did that by year, and at least
the dumping worked.

Now I need to pull the data in at the destination server.

If I COPY each individual file back into the table, it works. Slowly,
but seems to work. I tried to combine all the files into one go, then
truncate the table, and pull it all in in one go (130 million rows or
so) but this time it gave the same error. However, it pointed out a
specific row where the problem was:


COPY links, line 15272357:
16426447   9s2q7   9s2q7   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


Is this any use at all?  Would appreciate any pointers!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Alban Hertroys
On 25 Apr 2011, at 18:16, Phoenix Kiula wrote:

 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:
 
 
 COPY links, line 15272357:
 16426447 9s2q7   9s2q7   N   
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.
 
 
 Is this any use at all?  Would appreciate any pointers!


I didn't follow the entire thread, so maybe someone mentioned this already, 
but...
Usually if we see error messages like those it turns out the OS is killing the 
postgres process with it's equivalent of a low-on-memory-killer. I know Linux's 
got such a beast, and that you can turn it off.

It's a frequently recurring issue on this list, there's bound to be some 
pointers in the archives ;)

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4db5b02411674566889782!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Tomas Vondra
Dne 25.4.2011 19:31, Alban Hertroys napsal(a):
 On 25 Apr 2011, at 18:16, Phoenix Kiula wrote:
 
 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:


 COPY links, line 15272357:
 164264479s2q7   9s2q7   N   
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.


 Is this any use at all?  Would appreciate any pointers!
 
 
 I didn't follow the entire thread, so maybe someone mentioned this already, 
 but...
 Usually if we see error messages like those it turns out the OS is killing 
 the postgres process with it's equivalent of a low-on-memory-killer. I know 
 Linux's got such a beast, and that you can turn it off.
 
 It's a frequently recurring issue on this list, there's bound to be some 
 pointers in the archives ;)

Not sure if this COPY failure is caused by the same issue as before, but
the original issue was caused by this

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid memory alloc
request size 4294967293
pg_dump: The command was: COPY public.links (id, link_id, alias,
aliasentered, url, user_known, user_id, url_encrypted, title, private,
private_key, status, create_date, modify_date, disable_in_statistics,
user_running_id, url_host_long) TO stdout;
pg_dumpall: pg_dump failed on database snipurl, exiting

i.e. a bad memory alloc request (with negative size). That does not seem
like an OOM killing the backend.

regards
Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Phoenix Kiula
On Tue, Apr 26, 2011 at 1:56 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 25.4.2011 19:31, Alban Hertroys napsal(a):
 On 25 Apr 2011, at 18:16, Phoenix Kiula wrote:

 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:


 COPY links, line 15272357:
 16426447    9s2q7   9s2q7   N       
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
      This probably means the server terminated abnormally
      before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.


 Is this any use at all?  Would appreciate any pointers!


 I didn't follow the entire thread, so maybe someone mentioned this already, 
 but...
 Usually if we see error messages like those it turns out the OS is killing 
 the postgres process with it's equivalent of a low-on-memory-killer. I know 
 Linux's got such a beast, and that you can turn it off.

 It's a frequently recurring issue on this list, there's bound to be some 
 pointers in the archives ;)

 Not sure if this COPY failure is caused by the same issue as before, but
 the original issue was caused by this

 pg_dump: SQL command failed
 pg_dump: Error message from server: ERROR:  invalid memory alloc
 request size 4294967293
 pg_dump: The command was: COPY public.links (id, link_id, alias,
 aliasentered, url, user_known, user_id, url_encrypted, title, private,
 private_key, status, create_date, modify_date, disable_in_statistics,
 user_running_id, url_host_long) TO stdout;
 pg_dumpall: pg_dump failed on database snipurl, exiting

 i.e. a bad memory alloc request (with negative size). That does not seem
 like an OOM killing the backend.



Most likely you're right.

I did a COPY FROM and populated the entire table. In my hard disk, the
space consumption went up by 64GB.

Yet, when I do a SELECT * FROM mytable LIMIT 1 the entire DB
crashes. There is no visible record.

What's this?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Tomas Vondra
Dne 25.4.2011 20:40, Phoenix Kiula napsal(a):
 
 I did a COPY FROM and populated the entire table. In my hard disk, the
 space consumption went up by 64GB.

So you have dumped the table piece by piece, it worked, and now you have
a complete copy of the table? All the rows?

 Yet, when I do a SELECT * FROM mytable LIMIT 1 the entire DB
 crashes. There is no visible record.
 
 What's this?

H, that's strange ... you're saying that's a freshly populated DB?

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Tom Lane
Phoenix Kiula phoenix.ki...@gmail.com writes:
 I did a COPY FROM and populated the entire table. In my hard disk, the
 space consumption went up by 64GB.

 Yet, when I do a SELECT * FROM mytable LIMIT 1 the entire DB
 crashes. There is no visible record.

There should certainly be a visible record somewhere, ie, the
postmaster log.  It might also be productive to look in the kernel log
for events around the same time --- OOM kills would be recorded there,
and if the true story here is hardware problems there might also be
kernel log messages about that.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Tomas Vondra
Dne 25.4.2011 18:16, Phoenix Kiula napsal(a):
 Sorry, spoke too soon.
 
 I can COPY individual chunks to files. Did that by year, and at least
 the dumping worked.
 
 Now I need to pull the data in at the destination server.
 
 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:

 COPY links, line 15272357:
 16426447 9s2q7   9s2q7   N   
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.
 
 Is this any use at all?  Would appreciate any pointers!

So the dump worked fina and it fails when loading it back into the DB?
Have you checked the output file (just see the tail). Can you post the
part that causes issues? Just the line 16426447 and few lines around.

regards
Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Shashank Tripathi
On Tuesday, April 26, 2011, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 25.4.2011 18:16, Phoenix Kiula napsal(a):
 Sorry, spoke too soon.

 I can COPY individual chunks to files. Did that by year, and at least
 the dumping worked.

 Now I need to pull the data in at the destination server.

 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:

 COPY links, line 15272357:
 16426447     9s2q7   9s2q7   N       
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.

 Is this any use at all?  Would appreciate any pointers!

 So the dump worked fina and it fails when loading it back into the DB?
 Have you checked the output file (just see the tail). Can you post the
 part that causes issues? Just the line 16426447 and few lines around.

 regards
 Tomas

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




Ok let me explain.

Pg_dumpall did not work. It kept on crashing.

So I did copy, with conditional commands, copying one year at a time.
This process took me a day and a half but I now have files with copy
dumps for last 11 years.

On the fresh server, instead of 'copy from' with 11 files I
cocatenated the files into one.

Then in a transaction, I imported this file into the new database, which has:

Begin
Truncate table
Copy from into table
Commit

This worked. I confirmed by checking for new disk usage in the ~/data
folder. it has gone up by 64gig.

Yet that SQL gives me no rows.

-- 
Shashank Tripathi
+1 646 755 9860
+65 932 55 600

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-25 Thread Phoenix Kiula
 On Tuesday, April 26, 2011, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 25.4.2011 18:16, Phoenix Kiula napsal(a):
 Sorry, spoke too soon.

 I can COPY individual chunks to files. Did that by year, and at least
 the dumping worked.

 Now I need to pull the data in at the destination server.

 If I COPY each individual file back into the table, it works. Slowly,
 but seems to work. I tried to combine all the files into one go, then
 truncate the table, and pull it all in in one go (130 million rows or
 so) but this time it gave the same error. However, it pointed out a
 specific row where the problem was:

 COPY links, line 15272357:
 16426447     9s2q7   9s2q7   N       
 http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;i...;
 server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.

 Is this any use at all?  Would appreciate any pointers!

 So the dump worked fina and it fails when loading it back into the DB?
 Have you checked the output file (just see the tail). Can you post the
 part that causes issues? Just the line 16426447 and few lines around.

 regards
 Tomas





From the old server:
Yearly COPY files worked. Pg_dumpall was giving problems.

In the new server:
COPY FROM worked. All files appear to have been copied. Then I create
the primary key index, and another index. Many records are there, but
many are not there! There's no error, just that some records/rows just
didn't make it.

I did the COPY FROM in a transaction block. If there had been an
error, then commit would have rolledback, right? It didn't. It
committed. No errors. Just that some data has not come in.

How can I get more info on why?

Tomas, the line where it crashed, here are the 10 or so lines around it:







 head -15272350 /backup/links/links_all.txt | tail -20
164264229s2pi   9s2pi   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;index=digital-musicamp;keywords=Cannibal+Corpse+-+Split+Wide+Openamp;linkCode=ur2amp;tag=dmp3-20
 0   121.214.194.133 7a69d5842739e20b56c0103d1a6ec172e58f9e07   
 \N  Y   2009-01-10
20:59:31.135881 2009-01-10 20:59:31.135881  \N  \N
164264239s2pj   9s2pj   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;index=digital-musicamp;keywords=Juana+Fe+-+la+murga+finalamp;linkCode=ur2amp;tag=dmp3-20
 0   201.215.6.104   5e2ae1f363c7854c13a101a60b32a9a1ade26767   
 \N  Y   2009-01-10
20:59:31.593474 2009-01-10 20:59:31.593474  Y   \N  \N
158978629gqva   9gqva   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;index=digital-musicamp;keywords=Boyz+II+Men+-+Ill+Make+Love+To+Youamp;linkCode=ur2amp;tag=dmp3-20
0   76.10.185.873c840fa5428c0464556dccb7d1013a6ec53d1743
N   Y   2009-01-04
19:40:50.734967 2009-01-10 20:59:32.286937  N   \N  \N
1513014990ahx   90ahx   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;index=digital-musicamp;keywords=The+Killers+-+All+The+Pretty+Facesamp;linkCode=ur2amp;tag=dmp3-20
0   65.25.74.1415eb2a1bb48d4926d8eaf946fb544ce11c50a9e5b
N   Y   2008-12-22
14:54:20.813923 2009-01-10 20:59:33.896232  N   \N  \N
164264259s2pl   9s2pl   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;index=digital-musicamp;keywords=Freddy+Quinn+-+Junge%2C+Komm+Bald+Wiederamp;linkCode=ur2amp;tag=dmp3-20
  0   123.100.137.226 fb7af64a4b886f074a6443b8d43f571c3083f51c  
  \N  Y   2009-01-10
20:59:33.986764 2009-01-10 20:59:33.986764  Y   \N  \N
163917569rbyk   9rbyk   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;index=digital-musicamp;keywords=Closure+In+Moscow+-+Ofelia...+Ofeliaamp;linkCode=ur2amp;tag=dmp3-20
  0   71.233.18.39a4f95f246b89523785b736530fb4b3a335195c4b  
  N   Y   2009-01-10
13:20:54.86346  2009-01-10 20:59:34.641193  N   \N  \N
162299289nv3c   9nv3c   N   
http://www.amazon.com/gp/search?camp=1789amp;creative=9325amp;ie=UTF8amp;index=digital-musicamp;keywords=Ministry+of+Sound+-+Freestylers+%2F+Push+Upamp;linkCode=ur2amp;tag=dmp3-20
   0   24.60.222.70b455933eb976b39313f5da56afcd9db29d3f7bde 
   N   Y   2009-01-08
19:35:19.842463 2009-01-10 20:59:35.343552  N   \N  \N
164264279s2pn   9s2pn   N   
http://www.annehelmond.nl/2007/11/26/celebrating-two-thousand-delicious-bookmarks/
  195.190.28.97   22a06537e25985273297471dbeb3fb6ae217cb90
\N  Y   2009-01-10

Re: [GENERAL] Help - corruption issue?

2011-04-22 Thread tv
 On Fri, Apr 22, 2011 at 12:06 PM, Phoenix Kiula phoenix.ki...@gmail.com
 wrote:
 On Fri, Apr 22, 2011 at 12:51 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 21.4.2011 07:16, Phoenix Kiula napsal(a):
 Tomas,

 I did a crash log with the strace for PID of the index command as you
 suggested.

 Here's the output:
 http://www.heypasteit.com/clip/WNR

 Also including below, but because this will wrap etc, you can look at
 the link above.

 Thanks for any ideas or pointers!



 Process 15900 attached - interrupt to quit

 Nope, that's the psql process - you need to attach to the backend
 process that's created to handle the connection. Whenever you create a
 connection (from a psql), a new backend process is forked to handle
 that
 single connection - this is the process you need to strace.

 You can either see that in 'ps ax' (the PID is usually +1 with respect
 to the psql process), or you can do this

  SELECT pg_backend_pid();

 as that will give you PID of the backend for the current connection.





 Thanks. Did that.

 The crash.log is a large-ish file, about 24KB. Here's the last 10
 lines though. Does this help?



  ~  tail -10 /root/crash.log
 read(58, `\1\0\0\230\337\0\343\1\0\0\0P\0T\r\0 \3
 \374\236\2\2T\215\312\1\354\235\32\2..., 8192) = 8192
 write(97, 213.156.60\0\0 \0\0\0\37\0\364P\3\0\34@\22\0\0\000210,
 8192) = 8192
 read(58, `\1\0\0\274\362\0\343\1\0\0\0T\0\210\r\0 \3
 0\217\352\1\240\236\272\0024\235\322\2..., 8192) = 8192
 read(58, [\1\0\0\354)c*\1\0\0\0T\0\214\r\0 \3
 \254\236\242\2\340\220\342\2\\\235\232\2..., 8192) = 8192
 read(58, \\\1\0\0\200\245\207\32\1\0\0\0\\\0\340\r\0 \3
 \237\272\1\304\235\262\2\340\215\322\1..., 8192) = 8192
 read(58, \350\0\0\0\274\311x\323\1\0\0\0\\\\r\0 \3
 \200\236\372\2(\235\252\2\34\234\22\2..., 8192) = 8192
 read(58, ;\1\0\0|#\265\30\1\0\0\0`\0h\r\0 \3
 \324\236R\2\314\235\n\2h\215\362\1..., 8192) = 8192
 read(58, c\1\0\\24%u\1\0\0\0\230\0\210\r\0 \3
 \240\226\32\16\260\235\252\1p\222Z\10..., 8192) = 8192
 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
 Process 17161 detached



 The full crash.log file is here if needed:
 https://www.yousendit.com/download/ VnBxcmxjNDJlM1JjR0E9PQ

 Btw, this happens when I try to create an index on one of the columns
 in my table.

 Just before this, I had created another index on modify_date  (a
 timestamp column) and it went fine.

 Does that mean anything?

 Thanks




 Probably a dumb and ignorant question, but should I be reseting the xlog?
 http://postgresql.1045698.n5.nabble.com/SIGSEGV-when-trying-to-start-in-single-user-mode-td1924418.html

Nope, that's a different problem I guess - you don't have problems with
starting up a database (when the logs are replayed), so this would not
help (and it might cause other issues).

Anyway I haven't found anything useful in the strace output - it seems it
works fine, reads about 500MB (each of the 'read' calls corresponds to 8kB
of data) of data and then suddenly ends. A bit strange is the last line is
not complete ...

Anyway, this is where my current knowledge of how processes in PostgreSQL
ends. If I was sitting at the terminal, I'd probably continue by try and
error to find out more details about the segfault, but that's not very
applicable over e-mail.

So let's hope some of the pg gurus who read this list will enlighten us
with a bit more knowledge.

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-22 Thread Phoenix Kiula
On Fri, Apr 22, 2011 at 7:07 PM,  t...@fuzzy.cz wrote:
 On Fri, Apr 22, 2011 at 12:06 PM, Phoenix Kiula phoenix.ki...@gmail.com
 wrote:
 On Fri, Apr 22, 2011 at 12:51 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 21.4.2011 07:16, Phoenix Kiula napsal(a):
 Tomas,

 I did a crash log with the strace for PID of the index command as you
 suggested.

 Here's the output:
 http://www.heypasteit.com/clip/WNR

 Also including below, but because this will wrap etc, you can look at
 the link above.

 Thanks for any ideas or pointers!



 Process 15900 attached - interrupt to quit

 Nope, that's the psql process - you need to attach to the backend
 process that's created to handle the connection. Whenever you create a
 connection (from a psql), a new backend process is forked to handle
 that
 single connection - this is the process you need to strace.

 You can either see that in 'ps ax' (the PID is usually +1 with respect
 to the psql process), or you can do this

  SELECT pg_backend_pid();

 as that will give you PID of the backend for the current connection.





 Thanks. Did that.

 The crash.log is a large-ish file, about 24KB. Here's the last 10
 lines though. Does this help?



  ~  tail -10 /root/crash.log
 read(58, `\1\0\0\230\337\0\343\1\0\0\0P\0T\r\0 \3
 \374\236\2\2T\215\312\1\354\235\32\2..., 8192) = 8192
 write(97, 213.156.60\0\0 \0\0\0\37\0\364P\3\0\34@\22\0\0\000210,
 8192) = 8192
 read(58, `\1\0\0\274\362\0\343\1\0\0\0T\0\210\r\0 \3
 0\217\352\1\240\236\272\0024\235\322\2..., 8192) = 8192
 read(58, [\1\0\0\354)c*\1\0\0\0T\0\214\r\0 \3
 \254\236\242\2\340\220\342\2\\\235\232\2..., 8192) = 8192
 read(58, \\\1\0\0\200\245\207\32\1\0\0\0\\\0\340\r\0 \3
 \237\272\1\304\235\262\2\340\215\322\1..., 8192) = 8192
 read(58, \350\0\0\0\274\311x\323\1\0\0\0\\\\r\0 \3
 \200\236\372\2(\235\252\2\34\234\22\2..., 8192) = 8192
 read(58, ;\1\0\0|#\265\30\1\0\0\0`\0h\r\0 \3
 \324\236R\2\314\235\n\2h\215\362\1..., 8192) = 8192
 read(58, c\1\0\\24%u\1\0\0\0\230\0\210\r\0 \3
 \240\226\32\16\260\235\252\1p\222Z\10..., 8192) = 8192
 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
 Process 17161 detached



 The full crash.log file is here if needed:
 https://www.yousendit.com/download/ VnBxcmxjNDJlM1JjR0E9PQ

 Btw, this happens when I try to create an index on one of the columns
 in my table.

 Just before this, I had created another index on modify_date  (a
 timestamp column) and it went fine.

 Does that mean anything?

 Thanks




 Probably a dumb and ignorant question, but should I be reseting the xlog?
 http://postgresql.1045698.n5.nabble.com/SIGSEGV-when-trying-to-start-in-single-user-mode-td1924418.html

 Nope, that's a different problem I guess - you don't have problems with
 starting up a database (when the logs are replayed), so this would not
 help (and it might cause other issues).

 Anyway I haven't found anything useful in the strace output - it seems it
 works fine, reads about 500MB (each of the 'read' calls corresponds to 8kB
 of data) of data and then suddenly ends. A bit strange is the last line is
 not complete ...

 Anyway, this is where my current knowledge of how processes in PostgreSQL
 ends. If I was sitting at the terminal, I'd probably continue by try and
 error to find out more details about the segfault, but that's not very
 applicable over e-mail.

 So let's hope some of the pg gurus who read this list will enlighten us
 with a bit more knowledge.

 regards
 Tomas







In the pg_dumpall backup process, I get this error. Does this help?


pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid memory alloc
request size 4294967293
pg_dump: The command was: COPY public.links (id, link_id, alias,
aliasentered, url, user_known, user_id, url_encrypted, title, private,
private_key, status, create_date, modify_date, disable_in_statistics,
user_running_id, url_host_long) TO stdout;
pg_dumpall: pg_dump failed on database snipurl, exiting


Thanks!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-22 Thread tv
 On Fri, Apr 22, 2011 at 7:07 PM,  t...@fuzzy.cz wrote:
 In the pg_dumpall backup process, I get this error. Does this help?


Well, not really - it's just another incarnation of the problem we've
already seen. PostgreSQL reads the data, and at some point it finds out it
needs to allocate 4294967293B of memory. Which is strange, because it's
actually a negative number (-3 AFAIK).

It's probably caused by data corruption (incorrect length for a field).

There are ways to find out more about the cause, e.g. here:

http://archives.postgresql.org/pgsql-hackers/2005-10/msg01198.php

but you need to have a pg compiled with debug support. I guess the
packaged version does not support that, but maybe you can get the sources
and compile them on your own.

If it really is a data corruption, you might try to locate the corrupted
blocks like this:

-- get number of blocks
SELECT relpages FROM pg_class WHERE relname = 'table_name';

-- get items for each block (read the problematic column)
FOR block IN 1..relpages LOOP
SELECT AVG(length(colname)) FROM table_name WHERE ctid =
'(block,0)'::ctid AND ctid  '(block+1,0)'::ctid;

and once it fails remember the block ID (and restart - there might be more).

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-22 Thread Phoenix Kiula
On Fri, Apr 22, 2011 at 8:20 PM,  t...@fuzzy.cz wrote:
 On Fri, Apr 22, 2011 at 7:07 PM,  t...@fuzzy.cz wrote:
 In the pg_dumpall backup process, I get this error. Does this help?


 Well, not really - it's just another incarnation of the problem we've
 already seen. PostgreSQL reads the data, and at some point it finds out it
 needs to allocate 4294967293B of memory. Which is strange, because it's
 actually a negative number (-3 AFAIK).

 It's probably caused by data corruption (incorrect length for a field).

 There are ways to find out more about the cause, e.g. here:

 http://archives.postgresql.org/pgsql-hackers/2005-10/msg01198.php

 but you need to have a pg compiled with debug support. I guess the
 packaged version does not support that, but maybe you can get the sources
 and compile them on your own.

 If it really is a data corruption, you might try to locate the corrupted
 blocks like this:

 -- get number of blocks
 SELECT relpages FROM pg_class WHERE relname = 'table_name';

 -- get items for each block (read the problematic column)
 FOR block IN 1..relpages LOOP
 SELECT AVG(length(colname)) FROM table_name WHERE ctid =
 '(block,0)'::ctid AND ctid  '(block+1,0)'::ctid;


Thanks for this. Very useful. What is this -- a function? How should I
execute this query?

Thanks!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-22 Thread tv
 On Fri, Apr 22, 2011 at 8:20 PM,  t...@fuzzy.cz wrote:
 On Fri, Apr 22, 2011 at 7:07 PM,  t...@fuzzy.cz wrote:
 In the pg_dumpall backup process, I get this error. Does this help?


 Well, not really - it's just another incarnation of the problem we've
 already seen. PostgreSQL reads the data, and at some point it finds out
 it
 needs to allocate 4294967293B of memory. Which is strange, because it's
 actually a negative number (-3 AFAIK).

 It's probably caused by data corruption (incorrect length for a field).

 There are ways to find out more about the cause, e.g. here:

 http://archives.postgresql.org/pgsql-hackers/2005-10/msg01198.php

 but you need to have a pg compiled with debug support. I guess the
 packaged version does not support that, but maybe you can get the
 sources
 and compile them on your own.

 If it really is a data corruption, you might try to locate the corrupted
 blocks like this:

 -- get number of blocks
 SELECT relpages FROM pg_class WHERE relname = 'table_name';

 -- get items for each block (read the problematic column)
 FOR block IN 1..relpages LOOP
 SELECT AVG(length(colname)) FROM table_name WHERE ctid =
 '(block,0)'::ctid AND ctid  '(block+1,0)'::ctid;


 Thanks for this. Very useful. What is this -- a function? How should I
 execute this query?

It's a pseudocode - you need to implement that in whatever language you
like. You could do that in PL/pgSQL but don't forget it's probably going
to crash when you hit the problematic block so I'd probably implement that
in outside the DB (with a logic to continue the loop once the connection
dies).

And 'ctid' is a pseudocolumn that means '(block#, row#)' i.e. it's
something like a physical location of the row.

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-21 Thread Tomas Vondra
Dne 21.4.2011 07:16, Phoenix Kiula napsal(a):
 Tomas,
 
 I did a crash log with the strace for PID of the index command as you
 suggested.
 
 Here's the output:
 http://www.heypasteit.com/clip/WNR
 
 Also including below, but because this will wrap etc, you can look at
 the link above.
 
 Thanks for any ideas or pointers!
 
 
 
 Process 15900 attached - interrupt to quit

Nope, that's the psql process - you need to attach to the backend
process that's created to handle the connection. Whenever you create a
connection (from a psql), a new backend process is forked to handle that
single connection - this is the process you need to strace.

You can either see that in 'ps ax' (the PID is usually +1 with respect
to the psql process), or you can do this

  SELECT pg_backend_pid();

as that will give you PID of the backend for the current connection.

regards
Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-21 Thread Phoenix Kiula
On Fri, Apr 22, 2011 at 12:51 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 21.4.2011 07:16, Phoenix Kiula napsal(a):
 Tomas,

 I did a crash log with the strace for PID of the index command as you
 suggested.

 Here's the output:
 http://www.heypasteit.com/clip/WNR

 Also including below, but because this will wrap etc, you can look at
 the link above.

 Thanks for any ideas or pointers!



 Process 15900 attached - interrupt to quit

 Nope, that's the psql process - you need to attach to the backend
 process that's created to handle the connection. Whenever you create a
 connection (from a psql), a new backend process is forked to handle that
 single connection - this is the process you need to strace.

 You can either see that in 'ps ax' (the PID is usually +1 with respect
 to the psql process), or you can do this

  SELECT pg_backend_pid();

 as that will give you PID of the backend for the current connection.





Thanks. Did that.

The crash.log is a large-ish file, about 24KB. Here's the last 10
lines though. Does this help?



 ~  tail -10 /root/crash.log
read(58, `\1\0\0\230\337\0\343\1\0\0\0P\0T\r\0 \3
\374\236\2\2T\215\312\1\354\235\32\2..., 8192) = 8192
write(97, 213.156.60\0\0 \0\0\0\37\0\364P\3\0\34@\22\0\0\000210,
8192) = 8192
read(58, `\1\0\0\274\362\0\343\1\0\0\0T\0\210\r\0 \3
0\217\352\1\240\236\272\0024\235\322\2..., 8192) = 8192
read(58, [\1\0\0\354)c*\1\0\0\0T\0\214\r\0 \3
\254\236\242\2\340\220\342\2\\\235\232\2..., 8192) = 8192
read(58, \\\1\0\0\200\245\207\32\1\0\0\0\\\0\340\r\0 \3
\237\272\1\304\235\262\2\340\215\322\1..., 8192) = 8192
read(58, \350\0\0\0\274\311x\323\1\0\0\0\\\\r\0 \3
\200\236\372\2(\235\252\2\34\234\22\2..., 8192) = 8192
read(58, ;\1\0\0|#\265\30\1\0\0\0`\0h\r\0 \3
\324\236R\2\314\235\n\2h\215\362\1..., 8192) = 8192
read(58, c\1\0\\24%u\1\0\0\0\230\0\210\r\0 \3
\240\226\32\16\260\235\252\1p\222Z\10..., 8192) = 8192
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 17161 detached



The full crash.log file is here if needed:
https://www.yousendit.com/download/ VnBxcmxjNDJlM1JjR0E9PQ

Btw, this happens when I try to create an index on one of the columns
in my table.

Just before this, I had created another index on modify_date  (a
timestamp column) and it went fine.

Does that mean anything?

Thanks

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-21 Thread Phoenix Kiula
On Fri, Apr 22, 2011 at 12:06 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 On Fri, Apr 22, 2011 at 12:51 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 21.4.2011 07:16, Phoenix Kiula napsal(a):
 Tomas,

 I did a crash log with the strace for PID of the index command as you
 suggested.

 Here's the output:
 http://www.heypasteit.com/clip/WNR

 Also including below, but because this will wrap etc, you can look at
 the link above.

 Thanks for any ideas or pointers!



 Process 15900 attached - interrupt to quit

 Nope, that's the psql process - you need to attach to the backend
 process that's created to handle the connection. Whenever you create a
 connection (from a psql), a new backend process is forked to handle that
 single connection - this is the process you need to strace.

 You can either see that in 'ps ax' (the PID is usually +1 with respect
 to the psql process), or you can do this

  SELECT pg_backend_pid();

 as that will give you PID of the backend for the current connection.





 Thanks. Did that.

 The crash.log is a large-ish file, about 24KB. Here's the last 10
 lines though. Does this help?



  ~  tail -10 /root/crash.log
 read(58, `\1\0\0\230\337\0\343\1\0\0\0P\0T\r\0 \3
 \374\236\2\2T\215\312\1\354\235\32\2..., 8192) = 8192
 write(97, 213.156.60\0\0 \0\0\0\37\0\364P\3\0\34@\22\0\0\000210,
 8192) = 8192
 read(58, `\1\0\0\274\362\0\343\1\0\0\0T\0\210\r\0 \3
 0\217\352\1\240\236\272\0024\235\322\2..., 8192) = 8192
 read(58, [\1\0\0\354)c*\1\0\0\0T\0\214\r\0 \3
 \254\236\242\2\340\220\342\2\\\235\232\2..., 8192) = 8192
 read(58, \\\1\0\0\200\245\207\32\1\0\0\0\\\0\340\r\0 \3
 \237\272\1\304\235\262\2\340\215\322\1..., 8192) = 8192
 read(58, \350\0\0\0\274\311x\323\1\0\0\0\\\\r\0 \3
 \200\236\372\2(\235\252\2\34\234\22\2..., 8192) = 8192
 read(58, ;\1\0\0|#\265\30\1\0\0\0`\0h\r\0 \3
 \324\236R\2\314\235\n\2h\215\362\1..., 8192) = 8192
 read(58, c\1\0\\24%u\1\0\0\0\230\0\210\r\0 \3
 \240\226\32\16\260\235\252\1p\222Z\10..., 8192) = 8192
 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
 Process 17161 detached



 The full crash.log file is here if needed:
 https://www.yousendit.com/download/ VnBxcmxjNDJlM1JjR0E9PQ

 Btw, this happens when I try to create an index on one of the columns
 in my table.

 Just before this, I had created another index on modify_date  (a
 timestamp column) and it went fine.

 Does that mean anything?

 Thanks




Probably a dumb and ignorant question, but should I be reseting the xlog?
http://postgresql.1045698.n5.nabble.com/SIGSEGV-when-trying-to-start-in-single-user-mode-td1924418.html

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-20 Thread Phoenix Kiula
 On a fast network it should only take a few minutes.  Now rsyncing
 live 2.4 TB databases, that takes time. :)  Your raptors, if they're
 working properly, should be able to transfer at around 80 to
 100Megabytes a second.  10 to 15 seconds a gig.  30 minutes or so via
 gig ethernet.  I'd run iostat and see how well my drive array was
 performing during a large, largely sequential copy.


OK. An update.

We have changed all the hardware except disks.

REINDEX still gave this problem:

--
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
--


So I rebooted and logged back in a single user mode. All services
stopped. All networking stopped. Only postgresql started. I tried the
REINDEX again.

Same problem :(

This means the problem is likely with data?

I do have a pg_dumpall dump from 1 day before. Will lose some data,
but should have most of it.

Is it worth it for me to try and restore from there? What's the best
thing to do right now?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-20 Thread Tomas Vondra
Dne 20.4.2011 12:56, Phoenix Kiula napsal(a):
 On a fast network it should only take a few minutes.  Now rsyncing
 live 2.4 TB databases, that takes time. :)  Your raptors, if they're
 working properly, should be able to transfer at around 80 to
 100Megabytes a second.  10 to 15 seconds a gig.  30 minutes or so via
 gig ethernet.  I'd run iostat and see how well my drive array was
 performing during a large, largely sequential copy.
 
 
 OK. An update.
 
 We have changed all the hardware except disks.

OK, so the card is working and the drives are fine. Have you run the
tw_cli tool to check the drives? Because it's probably the last thing
that might be faulty and was not replaced.

 REINDEX still gave this problem:
 
 --
 server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.
 --

Hm, have you checked if there's something else in the logs? More details
about the crash or something like that.

I'd probably try to run strace on the backend, to get more details about
where it crashes. Just find out the PID of the backend dedicated to your
psql session, do

$ strace -p PID  crash.log 21

and then run the REINDEX. Once it crashes you can see the last few lines
from the logfile.

 So I rebooted and logged back in a single user mode. All services
 stopped. All networking stopped. Only postgresql started. I tried the
 REINDEX again.
 
 Same problem :(
 
 This means the problem is likely with data?

Well, maybe. It might be a problem with the data, it might be a bug in
postgres ...

 I do have a pg_dumpall dump from 1 day before. Will lose some data,
 but should have most of it.
 
 Is it worth it for me to try and restore from there? What's the best
 thing to do right now?

So have you done the file backup? That's the first thing I'd do.

Anyway what's best depends on how important is the missing piece of
data. We still don't know how to fix the problem, but it sure seems like
a corrupted data.

I think you already know which table is corrupted, right? In that case
you may actually try to find the bad block and erase it (and maybe do a
copy so that we can see what's wrong with it and how it might happen).
There's a very nice guide on how to do that

http://blog.endpoint.com/2010/06/tracking-down-database-corruption-with.html

It sure seems like the problem you have (invalid alloc request etc.).
The really annoying part is locating the block, as you have to scan
through the table (which sucks with such big table).

And yes, if there's corruption, there might be more corrupted blocks.

regards
Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-20 Thread Tomas Vondra
Dne 20.4.2011 22:11, Tomas Vondra napsal(a):
 There's a very nice guide on how to do that
 
 http://blog.endpoint.com/2010/06/tracking-down-database-corruption-with.html
 
 It sure seems like the problem you have (invalid alloc request etc.).
 The really annoying part is locating the block, as you have to scan
 through the table (which sucks with such big table).
 
 And yes, if there's corruption, there might be more corrupted blocks.

BTW, there's a setting 'zero_damaged_pages' that might help with this

http://www.postgresql.org/docs/8.2/static/runtime-config-developer.html

see this talk for more details how to use it

http://www.casitconf.org/casitconf11/Tech_track_2_files/cascadia_postgres_rbernier.pdf

Anyway don't play with this without the file backup, as this will zero
the blocks.

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-20 Thread Phoenix Kiula
On Thu, Apr 21, 2011 at 7:27 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 20.4.2011 22:11, Tomas Vondra napsal(a):
 There's a very nice guide on how to do that

 http://blog.endpoint.com/2010/06/tracking-down-database-corruption-with.html

 It sure seems like the problem you have (invalid alloc request etc.).
 The really annoying part is locating the block, as you have to scan
 through the table (which sucks with such big table).

 And yes, if there's corruption, there might be more corrupted blocks.

 BTW, there's a setting 'zero_damaged_pages' that might help with this

 http://www.postgresql.org/docs/8.2/static/runtime-config-developer.html

 see this talk for more details how to use it

 http://www.casitconf.org/casitconf11/Tech_track_2_files/cascadia_postgres_rbernier.pdf

 Anyway don't play with this without the file backup, as this will zero
 the blocks.

 Tomas






Thanks Tomas. Very handy info.

FIRST: is there anyone on this list who offers PG admin support?
Please write to me directly.

Second, for the strace, which process should I use?


ps auxwww|grep ^postgres
postgres  4320  0.0  0.1 440192 10824 ?  Ss   08:49   0:00
/usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres  4355  0.0  0.0 11724  964 ?Ss   08:49   0:00
postgres: logger process
postgres  4365  0.0  0.0 440396 3268 ?   Ss   08:49   0:00
postgres: writer process
postgres  4366  0.0  0.0 11860 1132 ?Ss   08:49   0:00
postgres: stats collector process
postgres 15795  0.0  0.0  7136 1440 pts/0S22:44   0:00 -bash
postgres 15900  0.0  0.0  7860 1956 pts/0S+   22:44   0:00 psql -h
localhost MYDOMAIN -E MYDOMAIN_MYDOMAIN
postgres 15901  0.0  0.0 441124 3072 ?   Ss   22:44   0:00
postgres: MYDOMAIN_MYDOMAIN MYDOMAIN 127.0.0.1(34346) idle


Third, I have the backup in two ways:

1. I took a backup of the entire /pgsql/data folder. PG was shutdown
at the time.
2. I have a pg_dumpall file but it is missing one day's data (still
useful as last resort).

Will #1 have corrupt data in it?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-20 Thread Phoenix Kiula
On Thu, Apr 21, 2011 at 11:49 AM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 On Thu, Apr 21, 2011 at 7:27 AM, Tomas Vondra t...@fuzzy.cz wrote:
 Dne 20.4.2011 22:11, Tomas Vondra napsal(a):
 There's a very nice guide on how to do that

 http://blog.endpoint.com/2010/06/tracking-down-database-corruption-with.html

 It sure seems like the problem you have (invalid alloc request etc.).
 The really annoying part is locating the block, as you have to scan
 through the table (which sucks with such big table).

 And yes, if there's corruption, there might be more corrupted blocks.

 BTW, there's a setting 'zero_damaged_pages' that might help with this

 http://www.postgresql.org/docs/8.2/static/runtime-config-developer.html

 see this talk for more details how to use it

 http://www.casitconf.org/casitconf11/Tech_track_2_files/cascadia_postgres_rbernier.pdf

 Anyway don't play with this without the file backup, as this will zero
 the blocks.

 Tomas






 Thanks Tomas. Very handy info.

 FIRST: is there anyone on this list who offers PG admin support?
 Please write to me directly.

 Second, for the strace, which process should I use?


 ps auxwww|grep ^postgres
 postgres  4320  0.0  0.1 440192 10824 ?      Ss   08:49   0:00
 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
 postgres  4355  0.0  0.0 11724  964 ?        Ss   08:49   0:00
 postgres: logger process
 postgres  4365  0.0  0.0 440396 3268 ?       Ss   08:49   0:00
 postgres: writer process
 postgres  4366  0.0  0.0 11860 1132 ?        Ss   08:49   0:00
 postgres: stats collector process
 postgres 15795  0.0  0.0  7136 1440 pts/0    S    22:44   0:00 -bash
 postgres 15900  0.0  0.0  7860 1956 pts/0    S+   22:44   0:00 psql -h
 localhost MYDOMAIN -E MYDOMAIN_MYDOMAIN
 postgres 15901  0.0  0.0 441124 3072 ?       Ss   22:44   0:00
 postgres: MYDOMAIN_MYDOMAIN MYDOMAIN 127.0.0.1(34346) idle


 Third, I have the backup in two ways:

 1. I took a backup of the entire /pgsql/data folder. PG was shutdown
 at the time.
 2. I have a pg_dumpall file but it is missing one day's data (still
 useful as last resort).

 Will #1 have corrupt data in it?




Tomas,

I did a crash log with the strace for PID of the index command as you
suggested.

Here's the output:
http://www.heypasteit.com/clip/WNR

Also including below, but because this will wrap etc, you can look at
the link above.

Thanks for any ideas or pointers!



Process 15900 attached - interrupt to quit
read(0, r, 1) = 1
write(1, r, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, e, 1) = 1
write(1, e, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, i, 1) = 1
write(1, i, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, n, 1) = 1
write(1, n, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, d, 1) = 1
write(1, d, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, e, 1) = 1
write(1, e, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, x, 1) = 1
write(1, x, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0,  , 1) = 1
write(1,  , 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, l, 1) = 1
write(1, l, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, i, 1) = 1
write(1, i, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, n, 1) = 1
write(1, n, 1)= 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, \177, 1)  = 1
write(1, \10\33[K, 4) = 4
rt_sigprocmask(SIG_BLOCK, NULL, 

[GENERAL] Help - corruption issue?

2011-04-18 Thread Phoenix Kiula
While doing a PG dump, I seem to have a problem:

ERROR: invalid memory alloc request size 4294967293

Upon googling, this seems to be a data corruption issue!

( Came about while doing performance tuning as being discussed on the
PG-PERFORMANCE list:
http://postgresql.1045698.n5.nabble.com/REINDEX-takes-half-a-day-and-still-not-complete-td4005943.html
)

One of the older messages suggests that I do file level backup and
restore the data.
http://archives.postgresql.org/pgsql-admin/2008-05/msg00191.php

How does one do this -- should I copy the data folder? What are the
specific steps?

I'm on PG 8.2.9, CentOS 5, with 8GB of RAM. The disks are four SATAII
disks on RAID 1.

Thanks!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Filip Rembiałkowski
Phoenix,

how large (in total) is this database)?

can you copy (cp -a) the data directory somewhere? I would do this
just in case :-)


regarding the manual recovery process:

1. you'll have to isolate corrupted table.
you can do this by dumping all tables one-by-one (pg_dump -t TABLE)
until you get the error.

2. find the record which is corupted... approach like this might work:
select count(*) from the_corrupted_table where PK_column = some_value.

3 .you should try to dump the table by chunks - skipping the corrupted
row(s) if possible

4. if above method does not work, you can try manually hex-editing
(zeroing) some bytes (with postgres shut down) to make dump work
again.


PS. obligatory note:

8.2.9 Release Date: 2008-06-12; 8.2.21 Release Date: 2011-04-18
seems like you were running almost three years without bugfixes.
aside from fixing your current problem, I would first do the upgrade
to avoid more corruption.






2011/4/18 Phoenix Kiula phoenix.ki...@gmail.com

 While doing a PG dump, I seem to have a problem:

 ERROR: invalid memory alloc request size 4294967293

 Upon googling, this seems to be a data corruption issue!

 ( Came about while doing performance tuning as being discussed on the
 PG-PERFORMANCE list:
 http://postgresql.1045698.n5.nabble.com/REINDEX-takes-half-a-day-and-still-not-complete-td4005943.html
 )

 One of the older messages suggests that I do file level backup and
 restore the data.
 http://archives.postgresql.org/pgsql-admin/2008-05/msg00191.php

 How does one do this -- should I copy the data folder? What are the
 specific steps?

 I'm on PG 8.2.9, CentOS 5, with 8GB of RAM. The disks are four SATAII
 disks on RAID 1.

 Thanks!

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Phoenix Kiula
Thanks Filip.

I know which table it is. It's my largest table with over 125 million rows.

All the others are less than 100,000 rows. Most are in fact less than 25,000.

Now, which specific part of the table is corrupted -- if it is row
data, then can I dump specific parts of that table? How? Pg_dumpall
does not seem to have an option to have a WHERE clause?

If the lead index is corrupt, then issuing a reindex should work. So I
disconnected all other users. The DB was doing nothing. And then I
started a psql session and issued the command reindex database MYDB.
After 3 hours, I see this error:



[QUOTE]
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: WARNING:
terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
Failed.
!
[/UNQUOTE]


What am I to do now? Even reindex is not working. I can try to drop
indexes and create them again. Will that help?





2011/4/18 Filip Rembiałkowski plk.zu...@gmail.com:
 Phoenix,

 how large (in total) is this database)?

 can you copy (cp -a) the data directory somewhere? I would do this
 just in case :-)


 regarding the manual recovery process:

 1. you'll have to isolate corrupted table.
 you can do this by dumping all tables one-by-one (pg_dump -t TABLE)
 until you get the error.

 2. find the record which is corupted... approach like this might work:
 select count(*) from the_corrupted_table where PK_column = some_value.

 3 .you should try to dump the table by chunks - skipping the corrupted
 row(s) if possible

 4. if above method does not work, you can try manually hex-editing
 (zeroing) some bytes (with postgres shut down) to make dump work
 again.


 PS. obligatory note:

 8.2.9 Release Date: 2008-06-12; 8.2.21 Release Date: 2011-04-18
 seems like you were running almost three years without bugfixes.
 aside from fixing your current problem, I would first do the upgrade
 to avoid more corruption.






 2011/4/18 Phoenix Kiula phoenix.ki...@gmail.com

 While doing a PG dump, I seem to have a problem:

 ERROR: invalid memory alloc request size 4294967293

 Upon googling, this seems to be a data corruption issue!

 ( Came about while doing performance tuning as being discussed on the
 PG-PERFORMANCE list:
 http://postgresql.1045698.n5.nabble.com/REINDEX-takes-half-a-day-and-still-not-complete-td4005943.html
 )

 One of the older messages suggests that I do file level backup and
 restore the data.
 http://archives.postgresql.org/pgsql-admin/2008-05/msg00191.php

 How does one do this -- should I copy the data folder? What are the
 specific steps?

 I'm on PG 8.2.9, CentOS 5, with 8GB of RAM. The disks are four SATAII
 disks on RAID 1.

 Thanks!

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread tv
 Thanks Filip.

 I know which table it is. It's my largest table with over 125 million
 rows.

 All the others are less than 100,000 rows. Most are in fact less than
 25,000.

 Now, which specific part of the table is corrupted -- if it is row
 data, then can I dump specific parts of that table? How? Pg_dumpall
 does not seem to have an option to have a WHERE clause?

 If the lead index is corrupt, then issuing a reindex should work. So I
 disconnected all other users. The DB was doing nothing. And then I
 started a psql session and issued the command reindex database MYDB.
 After 3 hours, I see this error:



 [QUOTE]
 server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
 The connection to the server was lost. Attempting reset: WARNING:
 terminating connection because of crash of another server process
 DETAIL:  The postmaster has commanded this server process to roll back
 the current transaction and exit, because another server process
 exited abnormally and possibly corrupted shared memory.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 Failed.
 !
 [/UNQUOTE]


 What am I to do now? Even reindex is not working. I can try to drop
 indexes and create them again. Will that help?

It might help, but as someone already pointed out, you're running a
version that's 3 years old. So do a hot file backup (stop the db and copy
the data directory to another machine), check the hardware (especially the
RAID controller and RAM), upgrade to the latest 8.2.x version and then try
again.

I'll post a bit more info into the other thread, as it's related to the
reindex performance and not to this issue.

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Merlin Moncure
2011/4/18 Phoenix Kiula phoenix.ki...@gmail.com:
 Thanks Filip.

 I know which table it is. It's my largest table with over 125 million rows.

 All the others are less than 100,000 rows. Most are in fact less than 25,000.

 Now, which specific part of the table is corrupted -- if it is row
 data, then can I dump specific parts of that table? How? Pg_dumpall
 does not seem to have an option to have a WHERE clause?

 If the lead index is corrupt, then issuing a reindex should work. So I
 disconnected all other users. The DB was doing nothing. And then I
 started a psql session and issued the command reindex database MYDB.
 After 3 hours, I see this error:



 [QUOTE]
 server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 The connection to the server was lost. Attempting reset: WARNING:
 terminating connection because of crash of another server process
 DETAIL:  The postmaster has commanded this server process to roll back
 the current transaction and exit, because another server process
 exited abnormally and possibly corrupted shared memory.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 Failed.
 !
 [/UNQUOTE]


 What am I to do now? Even reindex is not working. I can try to drop
 indexes and create them again. Will that help?

it might.  take a full file system backup first and drop the indexes.
before recreating them, take a regular dump (with pg_dump) and if it
goes through, you're golden, rebuild the indexes, *update the
postmaster to latest 8.2*, and you can go back online.

merllin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Phoenix Kiula
On Mon, Apr 18, 2011 at 11:02 PM,  t...@fuzzy.cz wrote:
 Thanks Filip.

 I know which table it is. It's my largest table with over 125 million
 rows.

 All the others are less than 100,000 rows. Most are in fact less than
 25,000.

 Now, which specific part of the table is corrupted -- if it is row
 data, then can I dump specific parts of that table? How? Pg_dumpall
 does not seem to have an option to have a WHERE clause?

 If the lead index is corrupt, then issuing a reindex should work. So I
 disconnected all other users. The DB was doing nothing. And then I
 started a psql session and issued the command reindex database MYDB.
 After 3 hours, I see this error:



 [QUOTE]
 server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
 The connection to the server was lost. Attempting reset: WARNING:
 terminating connection because of crash of another server process
 DETAIL:  The postmaster has commanded this server process to roll back
 the current transaction and exit, because another server process
 exited abnormally and possibly corrupted shared memory.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 Failed.
 !
 [/UNQUOTE]


 What am I to do now? Even reindex is not working. I can try to drop
 indexes and create them again. Will that help?

 It might help, but as someone already pointed out, you're running a
 version that's 3 years old. So do a hot file backup (stop the db and copy
 the data directory to another machine), check the hardware (especially the
 RAID controller and RAM), upgrade to the latest 8.2.x version and then try
 again.

 I'll post a bit more info into the other thread, as it's related to the
 reindex performance and not to this issue.

 regards
 Tomas



Thanks. For CentOS (RedHat?) the latest is 8.2.19 right? Not the
8.2.20 that's mentioned on front page of PG.org.

http://www.pgrpms.org/8.2/redhat/rhel-4-i386/repoview/

Question:  will upgrading from 8.2.9 to 8.2.19 have some repercussions
in terms of huge changes or problems?

I know 9.x had some new additions including casting etc (or is that
irrelevant to me?) but if 8.2.19 is safe in terms of not requiring
anything new from my side, then I can do the upgrade quickly.

Welcome any advice.

Thanks!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Tomas Vondra
Dne 18.4.2011 20:27, Phoenix Kiula napsal(a):


 What am I to do now? Even reindex is not working. I can try to drop
 indexes and create them again. Will that help?

 It might help, but as someone already pointed out, you're running a
 version that's 3 years old. So do a hot file backup (stop the db and copy
 the data directory to another machine), check the hardware (especially the
 RAID controller and RAM), upgrade to the latest 8.2.x version and then try
 again.

 I'll post a bit more info into the other thread, as it's related to the
 reindex performance and not to this issue.

 regards
 Tomas
 
 Thanks. For CentOS (RedHat?) the latest is 8.2.19 right? Not the
 8.2.20 that's mentioned on front page of PG.org.

Centos is probably a bit delayed behind the source version. If you want
to stick with the binary version, go with the 8.2.19.

 http://www.pgrpms.org/8.2/redhat/rhel-4-i386/repoview/
 
 Question:  will upgrading from 8.2.9 to 8.2.19 have some repercussions
 in terms of huge changes or problems?

Those minor versions are mostly bugfixes and small improvements. So no,
I wouldn't expect huge problems.

 I know 9.x had some new additions including casting etc (or is that
 irrelevant to me?) but if 8.2.19 is safe in terms of not requiring
 anything new from my side, then I can do the upgrade quickly.

Don't do that right now. When doing 'minor' upgrades, you don't need to
dump/restore the database - you can just replace the binaries and it
should work as the file format does not change between minor versions
(and 8.2.9 - 8.2.19 is a minor upgrade).

Still, do the file backup as described in the previous posts. You could
even do an online backup using pg_backup_start/pg_backup_stop etc.

To upgrade from 8.2 to 9.0 you'd need to do pg_dump backup and then
restore the database. Which is of scope right now, I guess.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Scott Marlowe
On Mon, Apr 18, 2011 at 5:44 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Still, do the file backup as described in the previous posts. You could
 even do an online backup using pg_backup_start/pg_backup_stop etc.

As soon as you have a working file system backup, get the tw_cli
utility for the 3ware cards downloaded and LOOK at what it has to say
about your RAID controller, drives, and array health.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Phoenix Kiula
On Tue, Apr 19, 2011 at 8:35 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Apr 18, 2011 at 5:44 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Still, do the file backup as described in the previous posts. You could
 even do an online backup using pg_backup_start/pg_backup_stop etc.

 As soon as you have a working file system backup, get the tw_cli
 utility for the 3ware cards downloaded and LOOK at what it has to say
 about your RAID controller, drives, and array health.



I am with SoftLayer. They're a very professional bunch. They even
changed my BBU last night. The RAID card is working. The memory and
the hardware are also tested.

I have now upgraded to 8.2.19.

Then I restarted the server, and dropped indexes. When I recreate the
first index, the same thing happens:

--
# CREATE INDEX idx_links_userid ON links (user_id);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
--


There is nothing going on in the server other than this command. All
other users are blocked!

Logging is enabled but does not have anything!

I am now worried. What is this problem?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Scott Marlowe
On Mon, Apr 18, 2011 at 8:52 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 On Tue, Apr 19, 2011 at 8:35 AM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Apr 18, 2011 at 5:44 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Still, do the file backup as described in the previous posts. You could
 even do an online backup using pg_backup_start/pg_backup_stop etc.

 As soon as you have a working file system backup, get the tw_cli
 utility for the 3ware cards downloaded and LOOK at what it has to say
 about your RAID controller, drives, and array health.



 I am with SoftLayer. They're a very professional bunch. They even
 changed my BBU last night. The RAID card is working. The memory and
 the hardware are also tested.

So, RAID is good for sure?  As in someone logged into the machine, and
went to the tw_cli utility and asked it about the status of the
physical drives and virtual RAID array and the card said yes they're
good?  No bad sectors being remapped?  Hmmm.  One of my old tests when
things were acting up was to see if the server could compile the linux
kernel or pgsql back when it took 1.5 hours to do.  If you keep
getting sig 11s on production kernel compiles something's wrong with
the system, software or hardware.

 I have now upgraded to 8.2.19.

 Then I restarted the server, and dropped indexes. When I recreate the
 first index, the same thing happens:

 --
 # CREATE INDEX idx_links_userid ON links (user_id);
 server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.
 --

What do the Postgresql logs say at this time?  oh wait...

 There is nothing going on in the server other than this command. All
 other users are blocked!

 Logging is enabled but does not have anything!

System logs maybe?  Something about a process getting killed?  Have
you tried turning up the verbosity of the pg logs?

 I am now worried. What is this problem?

We gotta check one thing at a time really.

If you copy the dir off to another machine and run pgsql 8.2.latest or
thereabouts, can you then create the index?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Phoenix Kiula
 System logs maybe?  Something about a process getting killed?  Have
 you tried turning up the verbosity of the pg logs?


Syslog has to be compiled with PG? How do I enable it? Where should I
look for it?

The documentation, whenever it mentions syslog, always just assumes
the expression If syslog is enabled. Well where do I enable it?  -
http://www.postgresql.org/docs/8.2/static/runtime-config-logging.html

Would appreciate some guidance on this.


 We gotta check one thing at a time really.

 If you copy the dir off to another machine and run pgsql 8.2.latest or
 thereabouts, can you then create the index?


I will try this. Transferring 106GB of data, even zipped, is a huge
ask and just the management will take over a day or so. I was hoping
we could do without this.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help - corruption issue?

2011-04-18 Thread Scott Marlowe
On Mon, Apr 18, 2011 at 9:23 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 System logs maybe?  Something about a process getting killed?  Have
 you tried turning up the verbosity of the pg logs?


 Syslog has to be compiled with PG? How do I enable it? Where should I
 look for it?

 The documentation, whenever it mentions syslog, always just assumes
 the expression If syslog is enabled. Well where do I enable it?  -
 http://www.postgresql.org/docs/8.2/static/runtime-config-logging.html

 Would appreciate some guidance on this.

No I meant the system logs, the ones in /var/log/yadayada.  Like
/var/log/message, things like that.  See if any of them have anything
interesting happening when things go badly.

syslog is logging using the syslog system which puts logs from various
processes into the /var/log dir, like /var/log/pgsql.  Assuming you
have a stock RHEL install I'd expect the pgsql logs to be in
/var/log/pgsql or thereabouts.

 We gotta check one thing at a time really.

 If you copy the dir off to another machine and run pgsql 8.2.latest or
 thereabouts, can you then create the index?


 I will try this. Transferring 106GB of data, even zipped, is a huge
 ask and just the management will take over a day or so. I was hoping
 we could do without this.

On a fast network it should only take a few minutes.  Now rsyncing
live 2.4 TB databases, that takes time. :)  Your raptors, if they're
working properly, should be able to transfer at around 80 to
100Megabytes a second.  10 to 15 seconds a gig.  30 minutes or so via
gig ethernet.  I'd run iostat and see how well my drive array was
performing during a large, largely sequential copy.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general