On Fri, Oct 28, 2005 at 04:58:56PM -0400, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > All of them have in common that the slotno being passed ($3 below) is in
> > SLRU_PAGE_READ_IN_PROGRESS state ... could it be a problem with lock
> > reordering?  Maybe somebody is trying to read in a page, and somebody
> > else steals the buffer from under them.  Not sure how likely is that.
>
> It's even more interesting than that: in all three cases,
> SlruSelectLRUPage has selected a "least recently used" page that is
> still in READ_IN_PROGRESS state (ie, we haven't finished faulting it in)
> and is recursively calling SimpleLruReadPage to wait for that condition
> to terminate.
>
> Apparently, Jim's setup could desperately do with a larger SLRU arena
> for pg_subtrans, because this is supposed to be a never-happen path ---
> if you can't finish loading a page before you need its slot for
> something else, you are thrashing with a capital T.
>
> I suppose there's a bug in this path, but I'm darned if I can see what
> it is.  There are a number of obvious inefficiencies, but those
> shouldn't be important given that this isn't supposed to happen much.
> But how's it getting to the Assert failure?

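For anyone following along, the situation described above can be modeled roughly as below. This is only a toy sketch, not the code in slru.c: the type and function names (PageStatus, Slot, select_lru_slot, victim_needs_wait) are made up for illustration, and the victim policy is assumed to be "take an empty slot if there is one, otherwise the least recently used slot whatever its state".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot states, loosely named after those in the thread. */
typedef enum {
    PAGE_EMPTY,
    PAGE_READ_IN_PROGRESS,
    PAGE_CLEAN,
    PAGE_DIRTY
} PageStatus;

typedef struct {
    PageStatus status;
    unsigned   lru_count;   /* larger value = touched longer ago */
} Slot;

/*
 * Assumed policy: return the first empty slot if any, otherwise the
 * slot with the largest lru_count, without looking at its state.
 */
int select_lru_slot(const Slot *slots, int nslots)
{
    int best = 0;

    assert(slots != NULL && nslots > 0);
    for (int i = 0; i < nslots; i++) {
        if (slots[i].status == PAGE_EMPTY)
            return i;
        if (slots[i].lru_count > slots[best].lru_count)
            best = i;
    }
    return best;
}

/*
 * Returns 1 when the chosen victim is still being read in, i.e. the
 * caller would have to wait for that read before reusing the slot.
 */
int victim_needs_wait(const Slot *slots, int nslots)
{
    int victim = select_lru_slot(slots, nslots);

    return slots[victim].status == PAGE_READ_IN_PROGRESS;
}
```

With only a handful of slots and every one of them busy, the oldest slot can easily be one whose read has not completed, which is the "never-happen" case; give the model a spare empty slot and the wait goes away.
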
If it helps, this is a ~250G database that's (now) on an 8-way (Opteron,
I think) machine with 32G of RAM. shared_buffers is set to 1G. My client
also has a 4-way machine with 16G, although it seemed to have trouble
producing useful cores.
-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461