[ADMIN] Vacuum Memory Exhausted error

2003-06-17 Thread Greg Spiegelberg
Morning,

I have multiple databases running on a Solaris 9, PostgreSQL 7.3.2
system, and nightly I run a process that, for each database, first
performs a VACUUM ANALYZE; and then runs vacuumlo on the same database.
The problem is that the VACUUM ANALYZE; results in a

  ERROR:  Memory exhausted in AllocSetAlloc(1048575996)

The value (1048575996) doesn't change regardless of which database the
vacuum is running on.  Additionally, the tables in all of our databases
are reindexed every 6 hours.
That's the problem.

My current workaround is, prior to the VACUUM ANALYZE, to drop the
indexes on all non-system, non-postgres tables, perform the vacuum on
each of those tables, and finally recreate the indexes.  The reindex
every 6 hours is currently disabled as well.  This is the only
solution I have to date; a per-table sketch is below.
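
For illustration, a minimal sketch of the sequence for a single,
hypothetical table "mytable" with one index "mytable_col_idx" (the real
names come from our schemas):

  DROP INDEX mytable_col_idx;                      -- drop the table's index(es)
  VACUUM ANALYZE mytable;                          -- vacuum/analyze with no indexes in place
  CREATE INDEX mytable_col_idx ON mytable (col);   -- rebuild afterwards
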
Any input would be greatly appreciated.

Greg

--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.




Re: [ADMIN] Vacuum Memory Exhausted error

2003-06-17 Thread Tom Lane
Greg Spiegelberg [EMAIL PROTECTED] writes:
 The problem is that the VACUUM ANALYZE; results in a
ERROR:  Memory exhausted in AllocSetAlloc(1048575996)

If you do VACUUM and ANALYZE separately, which part shows the failure?
(I'm betting on ANALYZE.)  And have you narrowed down which table it
fails on?  (Adding the VERBOSE option would let you track that.)
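
For instance (a minimal sketch, to be run separately in each database;
VERBOSE reports each table as it is processed, so the last table named
before the error is the suspect):

  VACUUM VERBOSE;
  ANALYZE VERBOSE;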

This looks to me like a data-corruption problem, specifically a bad
value of the length word in a variable-length field.  (Adjacent data
is probably clobbered too, but the length problem is showing itself
first.)  See previous threads in the archives for general advice about
tracking down broken rows and getting rid of them.

regards, tom lane



Re: [ADMIN] Vacuum Memory Exhausted error

2003-06-17 Thread Tom Lane
Greg Spiegelberg [EMAIL PROTECTED] writes:
 I have isolated the table but can't seem to find the info on finding
 the bad row and removing it.

Well, it's not rocket science: you just do trial and error to see which
rows you can select without getting the error.  I'd try a SELECT
COUNT(*) first to check that there is no corruption of tuple headers.
If that works, identify which column contains the damage by seeing
whether you can do SELECT max(col) for each column left-to-right.

Then identify the broken row by doing

  SELECT broken-col FROM table OFFSET n LIMIT 1

for various values of n --- this reads and discards n rows, then reads
and returns another, so if it doesn't fail then the first n+1 rows are
good.  If you understand the principle of binary search you can home
in on the target row quickly.

Actually it's a little bit complicated because of an undocumented fact:
in current releases the LIMIT mechanism actually reads one more row
than it needs to, so when you have narrowed down the exact n at which
it first fails, the broken row is actually identifiable by

  SELECT ctid FROM table OFFSET n+1 LIMIT 1

Double-check that you have identified the right row by verifying that

  SELECT * FROM table WHERE ctid = 'ctid obtained above'

blows up --- if not, you're off by one in the LIMIT stuff.
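
As a compressed illustration of that sequence (all names and numbers
here are hypothetical: a table "mytable", a suspect column "payload",
and a first-failing offset of 5000):

  SELECT count(*) FROM mytable;                     -- tuple headers readable?
  SELECT max(payload) FROM mytable;                 -- repeat per column to find the damaged one
  SELECT payload FROM mytable OFFSET 5000 LIMIT 1;  -- binary-search the offset until it first fails
  SELECT ctid FROM mytable OFFSET 5001 LIMIT 1;     -- n+1 yields the broken row's ctid
  SELECT * FROM mytable WHERE ctid = '(123,4)';     -- should blow up for the ctid just obtained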

Once you have found the broken row, learn what you can from it (with
luck you can select at least the first few columns) and then delete it
by ctid.
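
Continuing the same hypothetical sketch, the damaged row would then be
removed by its physical location:

  DELETE FROM mytable WHERE ctid = '(123,4)';       -- the ctid identified above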

This all assumes that there is exactly one point of corruption, which is
a really bad assumption when dealing with real cases.  Keep in mind that
there is likely to be more than one broken row, and that some of the
corruption may show only as incorrect values and not anything that
provokes an error report.  Once you can select all the data in the
table, do what you can to validate your data.

BTW, ctid is the physical location of a row, which is expressed in the
form '(block number, line number)'.  Once you have determined which
block(s) contain broken data, it would be interesting to dump them out
with a tool like pg_filedump (see http://sources.redhat.com/rhdb/).
The pattern of wrong data might possibly suggest something about the
cause.
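
A sketch of locating that file, assuming a default data-directory
layout and the same hypothetical table name:

  -- the database's directory under $PGDATA/base is named by its OID
  SELECT oid FROM pg_database WHERE datname = current_database();
  -- the table's file within that directory is named by its relfilenode
  SELECT relfilenode FROM pg_class WHERE relname = 'mytable';
  -- i.e. the file to dump is $PGDATA/base/<database oid>/<relfilenode>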

regards, tom lane
