I've been getting periodic entries in .../mailman/logs/locks that show:

Oct 08 08:33:50 2004 (6969) listname.lock unexpected linkcount: -1
Oct 08 08:33:50 2004 (6969) listname.lock lifetime has expired, breaking

Lots of error messages, but no apparent problems with list delivery. I probably would not have noticed but for an "oops" that tried to gateway 30,000+ news messages into a test list. This flooded the log nicely and caught my attention....

Final analysis: while waiting for a lock to be freed, a waiting process enters a race condition when the holding process releases the lock. The result is that the nonexistent lock file is checked for its link count (__linkcount() returns -1), and then has its lifetime checked (__releasetime() returns -1, so time.time() > -1 + CLOCK_SLOP is always true and the lifetime looks expired).

The patch:

-------------------------------------------------
*** mailman-2.1.5/Mailman/LockFile.py   Mon Mar 31 22:28:16 2003
--- LockFile.py Tue Oct 12 14:05:21 2004
***************
*** 264,269 ****
--- 264,271 ----
                  # The link failed for some reason, possibly because someone
                  # else already has the lock (i.e. we got an EEXIST), or for
                  # some other bizarre reason.
+                 self.__writelog ('Link attempt failed. OSError is %s' %
+                             os.strerror(e.errno))
                  if e.errno == errno.ENOENT:
                      # TBD: in some Linux environments, it is possible to get
                      # an ENOENT, which is truly strange, because this means
***************
*** 283,290 ****
                  elif self.__linkcount() <> 2:
                      # Somebody's messin' with us! Log this, and try again
                      # later. TBD: should we raise an exception?
                      self.__writelog('unexpected linkcount: %d' %
!                                     self.__linkcount(), important=True)
                  elif self.__read() == self.__tmpfname:
                      # It was us that already had the link.
                      self.__writelog('already locked')
--- 285,297 ----
                  elif self.__linkcount() <> 2:
                      # Somebody's messin' with us! Log this, and try again
                      # later. TBD: should we raise an exception?
+                     links = self.__linkcount()
+                     if links == -1: # The lock was cleared already!
+                         self.__writelog(
+                             'No lockfile after a lockfile exists error?')
+                         continue
                      self.__writelog('unexpected linkcount: %d' %
!                                     links, important=True)
                  elif self.__read() == self.__tmpfname:
                      # It was us that already had the link.
                      self.__writelog('already locked')
***************
*** 299,305 ****
                  raise TimeOutError
              # Okay, we haven't timed out, but we didn't get the lock. Let's
              # find if the lock lifetime has expired.
!             if time.time() > self.__releasetime() + CLOCK_SLOP:
                  # Yes, so break the lock.
                  self.__break()
                  self.__writelog('lifetime has expired, breaking',
--- 306,317 ----
                  raise TimeOutError
              # Okay, we haven't timed out, but we didn't get the lock. Let's
              # find if the lock lifetime has expired.
!             rel_time = self.__releasetime()
!             if (rel_time == -1): # Lock does not exist anymore?
!                 self.__writelog(
!                     'Checked the release time of a non-existant lock.')
!                 continue
!             elif time.time() > rel_time + CLOCK_SLOP:
                  # Yes, so break the lock.
                  self.__break()
                  self.__writelog('lifetime has expired, breaking',
--------------------------------------------------

Brian.
-- 
Brian Greenberg
[EMAIL PROTECTED]
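
For anyone who wants to see the failure mode in isolation, here is a minimal standalone sketch of it. It is not Mailman's code: it is written for Python 3, the helper names (linkcount, releasetime, classify) are hypothetical, the CLOCK_SLOP value is an assumption chosen for illustration, and it assumes the release time is taken from the lock file's mtime. It only shows why a -1 sentinel from a vanished lock file makes the unpatched expiry test fire, and how checking for -1 first avoids that.

```python
import errno
import os
import tempfile
import time

CLOCK_SLOP = 10  # assumed value, for illustration only


def linkcount(path):
    """Return the hard-link count of path, or -1 if the file is gone."""
    try:
        return os.stat(path).st_nlink
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return -1


def releasetime(path):
    """Return the mtime of path, or -1 if the file is gone.

    Assumption: the lock's release time is stored as the file's mtime.
    """
    try:
        return os.stat(path).st_mtime
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return -1


def classify(path, now=None):
    """Report what an unpatched and a patched waiter would decide."""
    now = time.time() if now is None else now
    links = linkcount(path)
    rel_time = releasetime(path)
    # Unpatched test: with rel_time == -1 this is true for any sane clock.
    unpatched_breaks = now > rel_time + CLOCK_SLOP
    # Patched test: treat -1 as "lock already released" and just retry.
    if links == -1 or rel_time == -1:
        patched = "retry"
    elif now > rel_time + CLOCK_SLOP:
        patched = "break"
    else:
        patched = "wait"
    return links, rel_time, unpatched_breaks, patched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # The file is never created, simulating a holder that has
        # already released the lock before the waiter inspects it.
        lockfile = os.path.join(d, "listname.lock")
        print(classify(lockfile))
```

Calling classify() on a path that does not exist reports -1 for both values, shows the unpatched comparison deciding the lock has expired, and shows the patched logic choosing to retry instead.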
------------------------------------------------------
Mailman-Users mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/