Tom Lane wrote:
Travis Cross <[EMAIL PROTECTED]> writes:
I'm noticing that a handful (4-16) of rows with duplicate columns
(uid,token) are sneaking into the table every day despite the
primary key constraint.

Corrupt index, looks like ... you might try reindexing the index.

I probably should have mentioned that I have indeed done a REINDEX on the table a couple of times in the past, suspecting that issue, and having seen it resolve similar issues on this list. Upon your suggestion, I'm running one right now, and I will probably dump and reload the entire database after hours, unless anyone thinks that would be a bad idea (or unproductive in tracking this down).

I don't believe that the PANIC you show has anything directly to do
with duplicate entries.  It is a symptom of corrupt index structure.
Now a corrupt index might also explain failure to notice duplications,
but changing your application isn't going to fix whatever is causing
it.  You need to look for server-side causes.

Indeed, you are correct. I should also mention that the problem seems to build over time, in the sense that everything will run fine for awhile (a few days), and then will crash repeatedly. Deleting the duplicate rows seems to reset the counter -- of course, I cannot run a successful REINDEX until I have deleted those duplicate rows.

Any database or system crashes on this server (before this problem
started)?

No. In fact, this box, and a sister box running similar hardware, have been models of system stability. My uptimes are 46 and 87 days, respectively, representing the time since I've done a kernel upgrade and the time since I plugged the boxes into the rack. The sister box is running real-time voice services.

Do you *know* that the disk drive will not lie about write
complete?

"Know" is such a strong word ;) Honestly, I have very little idea. I understand the nature of the problem this presents, as I've read the very fine PostgreSQL manual many times over the years.

Because the drives I use are specifically designed to operate well in a RAID environment, I would 'hope' that the drives perform honest write operations.

I wonder if there is a utility to perform a deterministic test of this...

What is the platform and storage system, anyway?

The platform is:

Linux 2.6.16.9 (w/o loadable modules)
Supermicro PDSMi (a single processor P-D board)
2G ECC DDRII SDRAM

The storage system is:

On-board SATA ICH7R Controller
2 x WD3200SD hard drives running in a Linux RAID 1 configuration.
That is to say: Western Digital 320G SATA 'enterprise' drives. The drives have a somewhat unique feature: time-limited error recovery, which is supposed to let the RAID controller/software deal with errors after a certain point (7 seconds), rather than continuing to block, and causing the drive to fall out of the array.

The drive:
http://www.westerndigital.com/en/products/products.asp?driveid=114&language=en

I'll run file system consistency checks tonight to see if I can pick out a proximal cause for all this chaos.

I really do appreciate the assistance.

Cheers,

-- Travis

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org

Reply via email to