RE : [HACKERS] Stability problems

2002-11-15 Thread Verger Nicolas
You're right, it was a hardware problem.
Thanks for your help.

Nicolas VERGER



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



RE : RE : [HACKERS] Stability problems

2002-11-15 Thread Verger Nicolas
> > Scott you're right, it was a hardware problem.
> > Thanks for your help.
> >
> 
> Glad to be of help.  What was the problem?  Bad memory or bad hard drive?
> Just curious.

It was a bad 512 MB memory module and a bad memory slot on the
motherboard.
Our hosting provider never checked memory before, but from now on it will
run the test systematically.

Nicolas VERGER





Re: RE : [HACKERS] Stability problems

2002-11-12 Thread scott.marlowe
On Tue, 12 Nov 2002, Nicolas VERGER wrote:

> Scott you're right, it was a hardware problem.
> Thanks for your help.
> 

Glad to be of help.  What was the problem?  Bad memory or bad hard drive?  
Just curious.





RE : [HACKERS] Stability problems

2002-11-12 Thread Nicolas VERGER
Scott you're right, it was a hardware problem.
Thanks for your help.

Nicolas VERGER






Re: [HACKERS] Stability problems

2002-11-06 Thread scott.marlowe
I would recommend checking your memory (look for memtest86 online
somewhere.  Good tool.)  Anytime a machine seems to act flaky, there's a
better than even chance it has a bad bit of memory in it.






Re: [HACKERS] Stability problems

2002-11-06 Thread Tom Lane
"Nicolas VERGER" <[EMAIL PROTECTED]> writes:
> 2002-11-05 14:46:44 [5768]   FATAL 2:  failed to add item with len = 191
> to page 150 (free space 4294967096, nusd 0, noff 0)

> template1=# select version();
> PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Hmm.  This looks a lot like the bug I recently noted in vacuum's
free-space calculations --- but that bug only affects machines where
MAXALIGN > 4, which I would not expect for an Intel machine.  Anyway
you might try this patch:

*** pgsql-server/src/backend/commands/vacuum.c  2002/10/21 22:06:19 1.243
--- pgsql-server/src/backend/commands/vacuum.c  2002/10/31 19:25:29 1.244
***************
*** 1753,1759 ****
      }
      to_vacpage->free -= MAXALIGN(tlen);
      if (to_vacpage->offsets_used >= to_vacpage->offsets_free)
!         to_vacpage->free -= MAXALIGN(sizeof(ItemIdData));
      (to_vacpage->offsets_used)++;
      if (free_vtmove == 0)
      {
--- 1753,1759 ----
      }
      to_vacpage->free -= MAXALIGN(tlen);
      if (to_vacpage->offsets_used >= to_vacpage->offsets_free)
!         to_vacpage->free -= sizeof(ItemIdData);
      (to_vacpage->offsets_used)++;
      if (free_vtmove == 0)
      {

(Line numbers are for recent CVS tip and are off a little for 7.2, but
there's only one occurrence of MAXALIGN(sizeof(... in vacuum.c; you
can't miss it.)
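
A note on the number in the quoted log line: 4294967096 equals 2^32 - 200, which is what a 32-bit unsigned counter shows after being decremented 200 bytes below zero. The sketch below uses Python rather than C to model that wraparound and the MAXALIGN rounding that the patch removes. The helper names and the 4-byte size for ItemIdData are assumptions made for illustration, not code taken from the PostgreSQL source.

```python
UINT32_MASK = 0xFFFFFFFF  # models a 32-bit unsigned size counter

ITEM_ID_SIZE = 4  # assumed sizeof(ItemIdData) in bytes


def maxalign(length: int, alignment: int) -> int:
    """Round length up to the next multiple of alignment (a power of two)."""
    return (length + alignment - 1) & ~(alignment - 1)


def sub_u32(value: int, amount: int) -> int:
    """Subtract as 32-bit unsigned arithmetic does: wrap instead of going negative."""
    return (value - amount) & UINT32_MASK


if __name__ == "__main__":
    # A counter that ends up 200 bytes below zero shows the value from the log.
    print(sub_u32(0, 200))  # 4294967096
    # Compare the line-pointer charge with and without MAXALIGN rounding.
    for alignment in (4, 8):
        print(alignment, ITEM_ID_SIZE, maxalign(ITEM_ID_SIZE, alignment))
```

With an alignment of 4 the rounded and unrounded sizes are the same, which fits the remark that the bug should only matter where MAXALIGN > 4. With an alignment of 8 each line pointer would be charged 8 bytes instead of 4.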

While you are at it, be sure to update to 7.2.3.  There are some
*critical* bug fixes in 7.2.3.

regards, tom lane




[HACKERS] Stability problems

2002-11-06 Thread Nicolas VERGER
Hi,
I have strange stability problems.
I can't access a table (the table is different each time I get the
problem; it could be a system table (pg_am) or a user-defined one).
I can't "select *" the whole table, but I can "select * limit x offset y", so
it appears that only one tuple is in a bad state. I can't vacuum or pg_dump
this table either.
The error disappears after some time.
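
The LIMIT/OFFSET probing described above could be automated with a binary search. The following is a minimal sketch in Python. `can_read_prefix` is a hypothetical callback (not part of PostgreSQL or any driver) that would run something like "select * from t limit n" on a fresh connection and report whether it succeeded. The sketch assumes a single bad row and a stable row order between probes, neither of which is guaranteed without an ORDER BY.

```python
from typing import Callable, Optional


def find_first_bad_row(row_count: int,
                       can_read_prefix: Callable[[int], bool]) -> Optional[int]:
    """Return the 0-based offset of the first unreadable row, or None.

    can_read_prefix(n) is a hypothetical probe: True if the first n rows
    can be fetched without the backend failing.
    """
    if can_read_prefix(row_count):
        return None  # the whole table is readable
    lo, hi = 0, row_count  # invariant: prefix lo is readable, prefix hi is not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if can_read_prefix(mid):
            lo = mid
        else:
            hi = mid
    return lo  # the row at offset lo is the first one that breaks the read


if __name__ == "__main__":
    bad_offset = 37  # simulated position of the bad row
    print(find_first_bad_row(100, lambda n: n <= bad_offset))  # 37
```

Each probe here stands in for a query that may crash the backend, so a real probe would need to reconnect after every failure.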

I get the following error in the log when I select the 'bad' row:


2002-11-05 11:26:42 [3062]   DEBUG:  server process (pid 4551) was terminated by signal 11
2002-11-05 11:26:42 [3062]   DEBUG:  terminating any other active server processes
2002-11-05 11:26:42 [4555]   FATAL 1:  The database system is in recovery mode
2002-11-05 11:26:42 [3062]   DEBUG:  all server processes terminated; reinitializing shared memory and semaphores
2002-11-05 11:26:42 [4557]   DEBUG:  database system was interrupted at 2002-11-05 11:23:00 CET
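
For readers scanning a log like this one, the signal number in the first line can be turned into a name. The sketch below is an illustration in Python. The regular expression and the function name are assumptions written for this example, and signal numbering is platform-specific (on Linux, signal 11 is SIGSEGV, a segmentation fault).

```python
import re
import signal

# Pattern for the postmaster's crash report line, as it appears in the log above.
CRASH_RE = re.compile(
    r"server process \(pid (\d+)\) was\s+terminated by signal (\d+)")


def describe_crash(log_line: str):
    """Return (pid, signal_number, signal_name) for a crash line, else None."""
    match = CRASH_RE.search(log_line)
    if match is None:
        return None
    pid, signum = int(match.group(1)), int(match.group(2))
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"  # number not defined on this platform
    return pid, signum, name


if __name__ == "__main__":
    line = ("2002-11-05 11:26:42 [3062]   DEBUG:  server process (pid 4551) "
            "was terminated by signal 11")
    print(describe_crash(line))
```

The pattern allows any whitespace between "was" and "terminated", so it also matches when the mail client has wrapped the line at that point.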



I get the following error in the log when vacuuming the 'bad' table:


2002-11-05 14:46:44 [5768]   FATAL 2:  failed to add item with len = 191 to page 150 (free space 4294967096, nusd 0, noff 0)
2002-11-05 14:46:44 [5569]   DEBUG:  server process (pid 5768) exited with exit code 2
2002-11-05 14:46:44 [5569]   DEBUG:  terminating any other active server processes
2002-11-05 14:46:44 [5771]   NOTICE:  Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
2002-11-05 14:46:44 [5772]   NOTICE:  Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
2002-11-05 14:46:44 [5569]   DEBUG:  all server processes terminated; reinitializing shared memory and semaphores
2002-11-05 14:46:44 [5774]   DEBUG:  database system was interrupted at 2002-11-05 14:46:40 CET



template1=# select version();
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Is it a lock problem? Is there a way to log it?


Thanks to all for doing such a good job.

Nicolas VERGER

