The trace didn't look like it was the result of stuck ManifoldCF locks.

Karl

On Mon, Feb 11, 2013 at 11:43 AM, Erlend Garåsen <[email protected]> wrote:
> On 11.02.13 17.14, Karl Wright wrote:
>>
>> I've confirmed that there's a deadlock of some kind; it's occurring as
>> a result of the hopcount depth tracking. The issue seems to be nested
>> transactions; PostgreSQL doesn't appear to be dealing with these
>> properly in all cases, and winds up getting a deadlock that it doesn't
>> detect.
>>
>> I have to look carefully at the code to see if it can be restructured.
>
>
> Perhaps I should wait till I start the job once again?
>
> When I had the last version from trunk deployed on our test version, the job
> ran for over three days without any problems. This deadlock occurred only
> one hour before I started the job earlier today with version 1.1.1 RC0
> installed. The only thing I did prior to the upgrade was to stop the running
> job, stopping the Agent process and the Resin instance. I _did_ notice that
> the Agent process was still running, so I killed it (-9) and cleaned locks
> thereafter. I doubt that this had some impact on PG, but I'm mentioning this
> anyway in order to provide as much information as possible. By looking at the
> bash history, I can confirm that I did everything in the correct order: killed
> the process and then ran the lock clean command class.
>
> Our agent control script is also attached in case we're doing something
> wrong.
>
>
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
