For the record, here are the results of our (ongoing) investigation into the index/heap corruption problems I reported a couple of weeks ago.

We were able to trigger the problem with kernels 2.6.16, 2.6.17 and 2.6.18.rc1, with 2.6.16 seeming to be the most flaky.

By replacing the NFS-mounted netapp with a fibre-channel SAN, we have eliminated the problem on all kernels.

From this, it would seem to be an NFS bug introduced after 2.6.14, though we cannot rule out a postgres bug exposed by unusual timing issues.

Our starting systems are:

  Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
  Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux (and others)
  kernel boot option: elevator=deadline
  16 Gigs of RAM
  postgresql-8.0.8-1PGDG
  Bonded e1000/tg3 NICs with 8192 MTU.
  Slony 1.1.5

  NetApp FAS270 OnTap 7.0.3
  Mounted with the NFS options
  rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac
  Jumbo frames 8192 MTU.

All postgres data and logs are stored on the netapp.

All test results were reproduced with postgres 8.0.8.

__
Marc

On Fri, 2006-06-30 at 23:20 -0400, Tom Lane wrote:
> Marc Munro <[EMAIL PROTECTED]> writes:
> > We tried all of these suggestions and still get the problem. Nothing
> > interesting in the log file so I guess the Asserts did not fire.
>
> Not surprising, it was a long shot that any of those things were really
> broken. But worth testing.
>
> > We are going to try experimenting with different kernels now. Unless
> > anyone has any other suggestions.
>
> Right at the moment I have no better ideas :-(
>
> regards, tom lane