On Apr 14, 2010, at 2:31 PM, Tom Lane wrote:

> I wrote:
>> [ theory about cause of Rusty's crash ]
> 
> I started to doubt this theory after wondering why the problem hadn't
> been exposed by CLOBBER_CACHE_ALWAYS testing, which is done routinely
> by the buildfarm.  That setting would surely cause the cache flush to
> happen at the troublesome time.  After a good deal more investigation,
> I found out why it doesn't crash with that.  The problematic case is
> for a relation that has rd_newRelfilenodeSubid nonzero but
> rd_createSubid zero (ie, it's been truncated in the current xact).
> Given that, RelationFlushRelation will attempt a rebuild but
> RelationCacheInvalidate won't exempt the relation from destruction.
> However, if you do a TRUNCATE under CLOBBER_CACHE_ALWAYS, the relcache
> entry gets blown away immediately at the conclusion of that command,
> because we'll do a RelationCacheInvalidate as a consequence of
> CLOBBER_CACHE_ALWAYS.  When the relcache entry is rebuilt for later use,
> it won't have rd_newRelfilenodeSubid set, so it's not a hazard anymore.
> In order to expose this bug, the relcache entry has to survive past the
> TRUNCATE and then a cache flush has to occur while we are in process of
> rebuilding it, not before.
> 
> What this suggests is that CLOBBER_CACHE_ALWAYS is actually too strong
> to provide a thorough test of cache flush hazards.  Maybe we need an
> alternate setting along the lines of CLOBBER_CACHE_SOMETIMES that would
> randomly choose whether or not to flush at any given opportunity.  But
> if such a setup did produce a crash, it'd be awfully hard to reproduce
> for investigation.  Ideas?
> 
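
One way to square random flushing with reproducibility might be to drive the
flush decision from a seeded generator and log the seed, so that a crashing
run can be replayed with the same sequence of decisions. The sketch below is
only an illustration of that idea: ClobberState, clobber_init and
clobber_should_flush are hypothetical names, not existing PostgreSQL code,
and it assumes the flush opportunities occur in the same order on replay.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a seeded "flush sometimes" decision. */
typedef struct ClobberState
{
    uint32_t seed;      /* log this so a failing run can be replayed */
    uint32_t state;     /* current generator state */
    uint64_t ncalls;    /* flush opportunities seen so far */
} ClobberState;

/* Start a reproducible decision sequence from the given seed. */
static void
clobber_init(ClobberState *cs, uint32_t seed)
{
    cs->seed = seed;
    cs->state = seed ? seed : 1;    /* xorshift state must be nonzero */
    cs->ncalls = 0;
}

/* Marsaglia's 32-bit xorshift step; deterministic for a given state. */
static uint32_t
clobber_next(ClobberState *cs)
{
    uint32_t x = cs->state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    cs->state = x;
    return x;
}

/*
 * Decide whether to flush at this opportunity.  Returns true for roughly
 * "percent" out of every 100 calls; 0 never flushes, 100 always flushes.
 */
static bool
clobber_should_flush(ClobberState *cs, unsigned percent)
{
    cs->ncalls++;
    return (clobber_next(cs) % 100) < percent;
}
```

Reporting the seed and ncalls at crash time would identify which opportunity
triggered the flush. That only helps if the replayed run reaches the same
opportunities in the same order, which timing-dependent cases may not.
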
> There is another slightly odd thing here, which is that the stack trace
> Rusty provided clearly shows the crash occurring during processing of a
> local relcache invalidation message for the truncated relation.  This
> would be expected during execution of the TRUNCATE itself, but at that
> point the rel has positive refcnt so there's no problem.  According to
> the stack trace the active SQL command is an INSERT ... SELECT, and I
> wouldn't expect that to queue any relcache invals.  Are there any
> triggers or other unusual things in the real application (not the
> watered-down test case) that would be triggered in INSERT/SELECT?
> 
>                       regards, tom lane


There are no triggers or other unusual things going on in the real application.
This worked in 8.3.9 but started failing when going to 8.4.3.

The test case program was the smallest thing I could write to reproduce the 
problem consistently on my machine, but I couldn't reproduce it consistently on 
other machines and architectures.  I'm glad Heikki was able to also see the 
crash on his hardware.  I can take Heikki's patch back out and get a new stack 
trace from the test program if that would be useful to you.

Thanks,

Rusty
--
Rusty Conover
rcono...@infogears.com
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
http://www.infogears.com
http://www.gearbuyer.com
http://www.footwearbuyer.com
