I've been getting periodic entries in .../mailman/logs/locks that show:

Oct 08 08:33:50 2004 (6969) listname.lock unexpected linkcount: -1
Oct 08 08:33:50 2004 (6969) listname.lock lifetime has expired, breaking

Lots of error messages, but no apparent problems with list delivery.
I probably would not have noticed but for an "oops" that tried to
gateway 30,000+ news messages into a test list.  This flooded the log
nicely and caught my attention.

Final analysis: while waiting for a lock to be freed, a waiting
process enters a race condition when the holding process releases the
lock.  The waiter then checks the link count of the now-nonexistent
lock file (__linkcount() returns -1), and then checks its lifetime
(__releasetime() returns -1, which results in an expired lifetime, so
the waiter logs that it is breaking a lock that is already gone).
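
To make the arithmetic concrete, here is a minimal standalone sketch of
the two checks, not the actual Mailman code.  It assumes (as the log
output suggests) that both helpers return -1 when the lock file is
missing.  The function names and the CLOCK_SLOP value of 10 seconds are
stand-ins chosen for illustration, and it is written for Python 3
rather than the Python 2 that Mailman 2.1.5 uses.

```python
import os
import time

# Assumed value for illustration only; the real constant lives in LockFile.py.
CLOCK_SLOP = 10


def linkcount(path):
    """Return the hard-link count of path, or -1 if the file is gone."""
    try:
        return os.stat(path).st_nlink
    except OSError:
        return -1


def releasetime(path):
    """Return the modification time of path, or -1 if the file is gone."""
    try:
        return os.stat(path).st_mtime
    except OSError:
        return -1


def looks_expired(path):
    """Unguarded check: a missing file yields -1, so this is always true."""
    return time.time() > releasetime(path) + CLOCK_SLOP


def looks_expired_guarded(path):
    """Guarded check: treat -1 as 'lock already released', not 'expired'."""
    rel_time = releasetime(path)
    if rel_time == -1:
        return False  # nothing to break; the caller should simply retry
    return time.time() > rel_time + CLOCK_SLOP
```

With a path that does not exist, linkcount() and releasetime() both give
-1, looks_expired() reports an expired lock, and looks_expired_guarded()
does not.  That is the same distinction the patch below draws with its
two "continue" branches.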

The patch:

*** mailman-2.1.5/Mailman/LockFile.py   Mon Mar 31 22:28:16 2003
--- LockFile.py Tue Oct 12 14:05:21 2004
*** 264,269 ****
--- 264,271 ----
                  # The link failed for some reason, possibly because someone
                  # else already has the lock (i.e. we got an EEXIST), or for
                  # some other bizarre reason.
+               self.__writelog ('Link attempt failed.  OSError is %s' % 
+                                                        os.strerror(e.errno))
                  if e.errno == errno.ENOENT:
                      # TBD: in some Linux environments, it is possible to get
                      # an ENOENT, which is truly strange, because this means
*** 283,290 ****
                  elif self.__linkcount() <> 2:
                      # Somebody's messin' with us!  Log this, and try again
                      # later.  TBD: should we raise an exception?
                      self.__writelog('unexpected linkcount: %d' %
!                                     self.__linkcount(), important=True)
                  elif self.__read() == self.__tmpfname:
                      # It was us that already had the link.
                      self.__writelog('already locked')
--- 285,297 ----
                  elif self.__linkcount() <> 2:
                      # Somebody's messin' with us!  Log this, and try again
                      # later.  TBD: should we raise an exception?
+                     links = self.__linkcount()             
+                     if links == -1:  # The lock was cleared already!
+                         self.__writelog(
+                             'No lockfile after a lockfile exists error?')
+                         continue                           
                      self.__writelog('unexpected linkcount: %d' %
!                                     links, important=True) 
                  elif self.__read() == self.__tmpfname:
                      # It was us that already had the link.
                      self.__writelog('already locked')
*** 299,305 ****
                  raise TimeOutError
              # Okay, we haven't timed out, but we didn't get the lock.  Let's
              # find if the lock lifetime has expired.
!             if time.time() > self.__releasetime() + CLOCK_SLOP:
                  # Yes, so break the lock.
                  self.__writelog('lifetime has expired, breaking',
--- 306,317 ----
                  raise TimeOutError
              # Okay, we haven't timed out, but we didn't get the lock.  Let's
              # find if the lock lifetime has expired.
!             rel_time = self.__releasetime()
!             if (rel_time == -1):      # Lock does not exist anymore?
!                 self.__writelog(
!                         'Checked the release time of a non-existant lock.')
!                 continue
!             elif time.time() > rel_time + CLOCK_SLOP:
                  # Yes, so break the lock.
                  self.__writelog('lifetime has expired, breaking',

Brian Greenberg