RE : RE : [HACKERS] Stability problems
> > Scott you're right, it was a hardware problem.
> > Thanks for your help.
>
> Glad to be of help. What was the problem? Bad memory or bad hard drive?
> Just curious.

It was a bad 512 MB memory module and a bad memory slot on the motherboard.
Our hosting provider never used to check memory, but it will now run the test systematically.

Nicolas VERGER

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
RE : [HACKERS] Stability problems
You're right, it was a hardware problem.
Thanks for your help.

Nicolas VERGER

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On behalf of scott.marlowe
Sent: Wednesday, 6 November 2002 21:38
To: Nicolas VERGER
Cc: 'PostgreSQL Hackers Mailing List'
Subject: Re: [HACKERS] Stability problems

I would recommend checking your memory (look for memtest86 online somewhere. Good tool.)

Anytime a machine seems to act flaky there's a better than even chance it has a bad bit of memory in it.

On Wed, 6 Nov 2002, Nicolas VERGER wrote:

> Hi,
> I have strange stability problems.
> I can't access a table (the table is different each time I get the problem; it could be a system table (pg_am) or a user-defined one):
> I can't select * from the whole table, but I can select * with limit x offset y, so it appears that only one tuple is in a bad state.
> I can't vacuum or pg_dump this table either.
> The error disappears after waiting some time.
>
> I get the following error in the log when selecting the 'bad' line:
>
> 2002-11-05 11:26:42 [3062] DEBUG: server process (pid 4551) was terminated by signal 11
> 2002-11-05 11:26:42 [3062] DEBUG: terminating any other active server processes
> 2002-11-05 11:26:42 [4555] FATAL 1: The database system is in recovery mode
> 2002-11-05 11:26:42 [3062] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
> 2002-11-05 11:26:42 [4557] DEBUG: database system was interrupted at 2002-11-05 11:23:00 CET
>
> I get the following error in the log when vacuuming the 'bad' table:
>
> 2002-11-05 14:46:44 [5768] FATAL 2: failed to add item with len = 191 to page 150 (free space 4294967096, nusd 0, noff 0)
> 2002-11-05 14:46:44 [5569] DEBUG: server process (pid 5768) exited with exit code 2
> 2002-11-05 14:46:44 [5569] DEBUG: terminating any other active server processes
> 2002-11-05 14:46:44 [5771] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am going to terminate your database system connection and exit.
>     Please reconnect to the database system and repeat your query.
> 2002-11-05 14:46:44 [5772] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am going to terminate your database system connection and exit.
>     Please reconnect to the database system and repeat your query.
> 2002-11-05 14:46:44 [5569] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
> 2002-11-05 14:46:44 [5774] DEBUG: database system was interrupted at 2002-11-05 14:46:40 CET
>
> template1=# select version();
> PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
>
> Is it a lock problem? Is there a way to log it?
>
> Thanks to all for doing such a good job.
>
> Nicolas VERGER
Re: RE : [HACKERS] Stability problems
On Tue, 12 Nov 2002, Nicolas VERGER wrote:

> Scott you're right, it was a hardware problem.
> Thanks for your help.

Glad to be of help. What was the problem? Bad memory or bad hard drive?
Just curious.
Re: [HACKERS] Stability problems
Nicolas VERGER [EMAIL PROTECTED] writes:
> 2002-11-05 14:46:44 [5768] FATAL 2: failed to add item with len = 191 to page 150 (free space 4294967096, nusd 0, noff 0)

> template1=# select version();
> PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Hmm. This looks a lot like the bug I recently noted in vacuum's free-space calculations --- but that bug only affects machines where MAXALIGN > 4, which I would not expect for an Intel machine. Anyway you might try this patch:

*** pgsql-server/src/backend/commands/vacuum.c  2002/10/21 22:06:19  1.243
--- pgsql-server/src/backend/commands/vacuum.c  2002/10/31 19:25:29  1.244
***************
*** 1753,1759 ****
              }
              to_vacpage->free -= MAXALIGN(tlen);
              if (to_vacpage->offsets_used >= to_vacpage->offsets_free)
!                 to_vacpage->free -= MAXALIGN(sizeof(ItemIdData));
              (to_vacpage->offsets_used)++;
              if (free_vtmove == 0)
              {
--- 1753,1759 ----
              }
              to_vacpage->free -= MAXALIGN(tlen);
              if (to_vacpage->offsets_used >= to_vacpage->offsets_free)
!                 to_vacpage->free -= sizeof(ItemIdData);
              (to_vacpage->offsets_used)++;
              if (free_vtmove == 0)
              {

(Line numbers are for recent CVS tip and are off a little for 7.2, but there's only one occurrence of MAXALIGN(sizeof(... in vacuum.c; you can't miss it.)

While you are at it, be sure to update to 7.2.3. There are some *critical* bug fixes in 7.2.3.

regards, tom lane
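To see why the patch above only matters where MAXALIGN is larger than 4, here is a rough Python sketch of the arithmetic. It is not PostgreSQL code: `maxalign` is a hypothetical transliteration of an align-up macro, and the 4-byte size for ItemIdData is an assumption that the thread does not state.

```python
# Illustrative only: compares the per-item-pointer charge before and after
# the patch for two alignment settings.

ITEM_ID_SIZE = 4  # ASSUMPTION: sizeof(ItemIdData) in bytes; not stated in the thread


def maxalign(length: int, alignof: int) -> int:
    """Round length up to the next multiple of alignof (alignof is a power of two)."""
    return (length + alignof - 1) & ~(alignof - 1)


def item_id_charge(alignof: int) -> tuple[int, int]:
    """Return (old_charge, patched_charge) in bytes for one new item pointer."""
    old_charge = maxalign(ITEM_ID_SIZE, alignof)  # MAXALIGN(sizeof(ItemIdData))
    patched_charge = ITEM_ID_SIZE                 # sizeof(ItemIdData)
    return old_charge, patched_charge


for alignof in (4, 8):
    old_charge, patched_charge = item_id_charge(alignof)
    print(f"alignment {alignof}: old={old_charge} patched={patched_charge} "
          f"difference={old_charge - patched_charge}")
```

Under these assumptions the two expressions agree when the alignment is 4 and differ by 4 bytes per item pointer when it is 8, which fits the remark that the bug should not show up on a typical Intel machine.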
Re: [HACKERS] Stability problems
I would recommend checking your memory (look for memtest86 online somewhere. Good tool.)

Anytime a machine seems to act flaky there's a better than even chance it has a bad bit of memory in it.

On Wed, 6 Nov 2002, Nicolas VERGER wrote:

> Hi,
> I have strange stability problems.
> I can't access a table (the table is different each time I get the problem; it could be a system table (pg_am) or a user-defined one):
> I can't select * from the whole table, but I can select * with limit x offset y, so it appears that only one tuple is in a bad state.
> I can't vacuum or pg_dump this table either.
> The error disappears after waiting some time.
>
> I get the following error in the log when selecting the 'bad' line:
>
> 2002-11-05 11:26:42 [3062] DEBUG: server process (pid 4551) was terminated by signal 11
> 2002-11-05 11:26:42 [3062] DEBUG: terminating any other active server processes
> 2002-11-05 11:26:42 [4555] FATAL 1: The database system is in recovery mode
> 2002-11-05 11:26:42 [3062] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
> 2002-11-05 11:26:42 [4557] DEBUG: database system was interrupted at 2002-11-05 11:23:00 CET
>
> I get the following error in the log when vacuuming the 'bad' table:
>
> 2002-11-05 14:46:44 [5768] FATAL 2: failed to add item with len = 191 to page 150 (free space 4294967096, nusd 0, noff 0)
> 2002-11-05 14:46:44 [5569] DEBUG: server process (pid 5768) exited with exit code 2
> 2002-11-05 14:46:44 [5569] DEBUG: terminating any other active server processes
> 2002-11-05 14:46:44 [5771] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am going to terminate your database system connection and exit.
>     Please reconnect to the database system and repeat your query.
> 2002-11-05 14:46:44 [5772] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am going to terminate your database system connection and exit.
>     Please reconnect to the database system and repeat your query.
> 2002-11-05 14:46:44 [5569] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
> 2002-11-05 14:46:44 [5774] DEBUG: database system was interrupted at 2002-11-05 14:46:40 CET
>
> template1=# select version();
> PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
>
> Is it a lock problem? Is there a way to log it?
>
> Thanks to all for doing such a good job.
>
> Nicolas VERGER
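The "free space 4294967096" figure in the vacuum error quoted above is far larger than any page could hold. One way to read it, assuming the value was printed as an unsigned 32-bit quantity (plausible on i686, but an assumption), is as a small negative number that wrapped around. The sketch below only does that reinterpretation; the helper name `as_signed_32` is made up for illustration.

```python
# Illustrative only: reinterpret the logged free-space figure as a signed
# 32-bit value to see what an underflowed counter would have held.

REPORTED_FREE_SPACE = 4294967096  # copied from the FATAL 2 log line


def as_signed_32(value: int) -> int:
    """Interpret an unsigned 32-bit integer as a two's-complement signed one."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


signed_free_space = as_signed_32(REPORTED_FREE_SPACE)
print(signed_free_space)  # -200
```

Read this way, the free-space counter had gone 200 bytes below zero. That is consistent with either a miscalculation or corrupted memory; the number alone does not say which.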