Frederik Ramm wrote:

Dear PostgreSQL community,

I hope you can help me with a problem I'm having - I'm stuck and don't know how to debug this further.

I have a rather large nightly process that imports a lot of data from the OpenStreetMap project into a PostGIS database, then proceeds to do all sorts of things - creating spatial indexes, computing bounding boxes, simplifying geometries, that kind of stuff. The whole job usually takes about five hours.

I'm running this on a Quad-Core Linux (Ubuntu, PostgreSQL 8.3) machine with 8 GB RAM.

Every other night, the process aborts with some strange error message, and never at the same position:

ERROR:  invalid page header in block 166406 of relation "node_tags"

ERROR: could not open segment 2 of relation 1663/24253056/24253895 (target block 1421295656): No such file or directory

ERROR:  Unknown geometry type: 10

When I continue the process after the failure, it will usually work.
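
As a rough sanity check on the second message, the block numbers can be mapped to on-disk segment files. The sketch below assumes PostgreSQL's default build settings of 8 kB pages and 1 GB segment files, neither of which is stated in this thread; the helper name locate_block is hypothetical. Under those assumptions the reported target block would sit far beyond anything a table of this size could contain, which suggests a corrupted block pointer rather than a genuinely missing file.

```python
# Assumed defaults (not confirmed in this thread): 8 kB pages, 1 GB segments.
BLOCK_SIZE = 8192
SEGMENT_SIZE = 1024 ** 3
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 blocks per segment


def locate_block(block_number):
    """Return (segment index, byte offset within the whole relation)."""
    segment = block_number // BLOCKS_PER_SEGMENT
    offset = block_number * BLOCK_SIZE
    return segment, offset


if __name__ == "__main__":
    # Block numbers taken from the two error messages quoted above.
    for block in (166406, 1421295656):
        segment, offset = locate_block(block)
        print(f"block {block}: segment {segment}, "
              f"relation offset {offset / 1024 ** 3:.1f} GiB")
```

With these assumed sizes, block 166406 falls in segment 1, which is plausible for a large table, while block 1421295656 would fall in segment 10843, more than 10 TiB into the relation.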

I know you all think "hardware problem" now. Of course this was my first guess as well. I ran a memory test for a night, which found no errors; I downgraded to "failsafe defaults" for all BIOS timings, again no change. I ran "cpuburn" and all sorts of other things to grill the hardware - nothing.

Then I bought an entirely new machine; similar setup, but using a Gigabyte instead of an Asus mainboard, a different chipset, a slightly faster Quad-Core processor, and again 8 GB RAM and Ubuntu "Hardy" with PostgreSQL 8.3 and matching PostGIS.

Believe it or not, this machine shows the *same* problems. It is not 100% reproducible, sometimes the job works fully, but every other day it just breaks down with one of the funny messages like above. No memtest errors here either.

Both machines are "consumer" quality, i.e. normal Intel processors and not the "server" (Xeon) stock.

I am at a loss - how can I proceed? This looks like a hardware problem all right, but such similar problems on two such different machines? Is there something wrong with Intel's Quad-Core CPUs?

What could I do to have a better chance of reproducing the error and ultimately identifying the component responsible? Is there some kind of "PostgreSQL load test", something like "cpuburn" for PostgreSQL?

Have there been other reports of intermittent problems like mine, and does anybody have any blind guesses...?

Thanks
Frederik


Hi Frederik,

We did find a memory clobber in the PostGIS ANALYZE routine a while back, but the fix hasn't yet made it into a release.

If you are building from source, please can you try applying the patch here: http://code.google.com/p/postgis/issues/detail?id=43 and reporting back whether it helps or not?


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
