On 1 December 2015 at 08:37, Ben Nemec <openst...@nemebean.com> wrote:
> On 11/30/2015 12:42 PM, Joshua Harlow wrote:
>> Hi all,
>>
>> I just wanted to bring up an issue, a possible solution, and get feedback
>> on it from folks, because it seems to be an ongoing problem that shows up
>> not when an application is initially deployed but as ongoing operation of
>> that application proceeds (i.e. after it has been running for a period of
>> time).
>>
>> The gist of the problem is the following:
>>
>> A <<pick your favorite openstack project>> has a need to ensure that no
>> application on the same machine can manipulate a given resource on that
>> same machine, so it uses the lock file pattern (acquire a *local* lock
>> file for that resource, manipulate that resource, release that lock
>> file) to do actions on that resource in a safe manner (note this does
>> not ensure safety outside of that machine; lock files are *not*
>> distributed locks).
>>
>> The API that we expose from oslo is typically accessed via the following:
>>
>>   oslo_concurrency.lockutils.synchronized(name, lock_file_prefix=None,
>>       external=False, lock_path=None, semaphores=None, delay=0.01)
>>
>> or via its underlying library (which I extracted from oslo.concurrency
>> and have improved to add more usefulness) @
>> http://fasteners.readthedocs.org/
>>
>> The issue though for <<your favorite openstack project>> is that each of
>> these projects now typically has a large number of lock files that exist
>> or have existed, and no easy way to determine when those lock files can
>> be deleted (afaik no? periodic task exists in said projects to clean up
>> lock files, or to delete them when they are no longer in use...), so
>> bugs like https://bugs.launchpad.net/cinder/+bug/1432387 appear and
>> there is no simple solution to clean lock files up (since
>> oslo.concurrency is really not the right layer to know when a lock can
>> or can not be deleted; only the application knows that...)
>>
>> So then we get a few creative solutions like the following:
>>
>> - https://review.openstack.org/#/c/241663/
>> - https://review.openstack.org/#/c/239678/
>> - (and others?)
>>
>> So I wanted to ask the question: how are people involved in <<your
>> favorite openstack project>> cleaning up these files (are they at all?)
>>
>> Another idea that I have been proposing is to use offset locks.
>>
>> This would allow for not creating X lock files, but instead creating a
>> *single* lock file per project and using offsets into it as the way to
>> lock. For example, nova could/would create a 1MB (or larger/smaller)
>> *empty* file for locks; that would allow for 1,048,576 locks to be used
>> at the same time, which honestly should be way more than enough, and
>> then there would not need to be any lock cleanup at all... Is there any
>> reason this wasn't done back when this lock file code was created?
>> (https://github.com/harlowja/fasteners/pull/10 adds this functionality
>> to the underlying library if people want to look it over)
>
> I think the main reason was that even with a million locks available,
> you'd have to find a way to hash the lock names to offsets in the file,
> and a million isn't a very large collision space for that. Having two
> differently named locks that hashed to the same offset would lead to
> incredibly confusing bugs.
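To make the offset idea (and the collision worry) concrete, here is a rough
sketch using plain POSIX byte-range locks. The file path, slot count and
helper names below are made up for illustration; this is not the API from
the fasteners pull request:

    import fcntl
    import zlib

    # Hypothetical single per-project lock file, pre-created at 1 MiB.
    LOCK_FILE = '/var/lib/nova/locks.img'
    NUM_SLOTS = 1024 * 1024  # one byte per lock slot

    def name_to_offset(name):
        # Hash the lock name into the available slots.  With only ~1M slots,
        # two differently named locks can land on the same offset and would
        # then silently share a lock - the collision problem described above.
        return zlib.crc32(name.encode('utf-8')) % NUM_SLOTS

    def lock_resource(fd, name):
        # Take an exclusive byte-range lock on the single byte at the
        # computed offset instead of creating a per-name lock file.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, name_to_offset(name))

(fd here would be the project's single lock file opened read-write; releasing
is the same lockf() call with LOCK_UN on the same one-byte range.)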
>
> We could switch to requiring the projects to provide the offsets instead
> of hashing a string value, but that's just pushing the collision problem
> off onto every project that uses us.
>
> So that's the problem as I understand it, but where does that leave us
> for solutions? First, there's
> https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/lockutils.py#L151
> which allows consumers to delete lock files when they're done with them.
> Of course, in that case the onus is on the caller to make sure the lock
> couldn't possibly be in use anymore.
>
> Second, is this actually a problem? Modern filesystems have absurdly
> large limits on the number of files in a directory, so it's highly
> unlikely we would ever exhaust that, and we're creating all zero byte
> files so there shouldn't be a significant space impact either. In the
> past I believe our recommendation has been to simply create a cleanup
> job that runs on boot, before any of the OpenStack services start, that
> deletes all of the lock files. At that point you know it's safe to
> delete them, and it prevents your lock file directory from growing forever.
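For reference, the first option Ben mentions looks roughly like this in
practice. This is a sketch only, with a made-up lock name, and it assumes
the remove_external_lock_file() helper linked above:

    from oslo_concurrency import lockutils

    LOCK_NAME = 'delete-volume-12345'  # made-up example lock name

    @lockutils.synchronized(LOCK_NAME, external=True)
    def delete_volume():
        ...  # last operation that ever needs this lock

    # Only once the application *knows* the resource is gone for good (and
    # nothing can try to take the lock again) is it safe to remove the file:
    lockutils.remove_external_lock_file(LOCK_NAME)

The boot-time cleanup Ben describes is the other end of the spectrum: blow
away everything under the lock_path before any service starts, when nothing
can possibly be holding a lock.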
Not that high - ext3 (still the default for nova ephemeral partitions!) has
a limit of 64k files in one directory.

That said, I don't disagree - my thinking is that we should advise putting
such files on a tmpfs.

-Rob