Re: "unfairness" of juju/mutex
On 17 November 2016 at 12:12, Stuart Bishop wrote:
> On 17 November 2016 at 02:34, roger peppe wrote:
>>
>> +1 to using blocking flock. Polling is a bad idea with a heavily
>> contended lock.
>>
>> FWIW I still think that mutexing all unit hooks is a bad idea
>> that's really only there to paper over the problem that apt-get
>> doesn't work well concurrently.
>
> apt is just the one you commonly trip over. If there was no mutex, then
> charms would need to do their own locking for every single resource they
> need to access that might potentially also be accessed by a subordinate
> (now or in the future), and hope subordinates also use the lock. So I
> think mutexing unit hooks on the same machine is a fantastic idea :)
> Just something innocuous like 'adduser' can collide with a subordinate
> wanting to stick a config file in that user's home directory.

Surely a hook mutex primitive (e.g. "mutex adduser ...") would have been
more appropriate than the sledgehammer approach of mutexing everything all
the time? Sometimes I might want a hook to run for a long time (or it
might unfortunately block on the network), and turning off all subordinate
hooks while that happens doesn't seem right to me.

Anyway, I appreciate that it's too late now. We can't change this
assumption because it'll break all the charms that rely on it.

cheers,
rog.

--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
Re: "unfairness" of juju/mutex
FWIW this is being tracked in https://bugs.launchpad.net/juju/+bug/1642541

On Thu, 17 Nov 2016 at 04:17 Nate Finch wrote:
> Just for historical reference: the original implementation of the new OS
> mutex used flock, until Dave mentioned that it presented problems with
> file management (files getting renamed, deleted, etc.).
>
> In general, I'm definitely on the side of using flock, though I don't
> think that necessarily solves our starvation problem; it depends on how
> flock is implemented and the specific behavior of our units.
Re: "unfairness" of juju/mutex
Just for historical reference: the original implementation of the new OS
mutex used flock, until Dave mentioned that it presented problems with
file management (files getting renamed, deleted, etc.).

In general, I'm definitely on the side of using flock, though I don't
think that necessarily solves our starvation problem; it depends on how
flock is implemented and the specific behavior of our units.
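The file-management hazard mentioned here can be shown directly: flock locks the open file's inode, not its path, so if the lock file is deleted and recreated while held, a second process can lock the new file and both then believe they hold the mutex. A minimal Linux-only sketch (the path and function name are invented for illustration):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// lockRace demonstrates why deleting lock files under flock is unsafe.
// It returns the error from a second non-blocking flock taken after the
// lock file is removed and recreated; that flock succeeds (nil error)
// because the second open refers to a brand-new inode.
func lockRace(path string) error {
	f1, err := os.OpenFile(path, os.O_CREATE|os.O_RDONLY, 0o600)
	if err != nil {
		return err
	}
	defer f1.Close()
	if err := syscall.Flock(int(f1.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		return err
	}
	// Some cleanup process "tidies up" the lock file while it is held.
	if err := os.Remove(path); err != nil {
		return err
	}
	// A second locker recreates the file and locks the new inode.
	f2, err := os.OpenFile(path, os.O_CREATE|os.O_RDONLY, 0o600)
	if err != nil {
		return err
	}
	defer f2.Close()
	return syscall.Flock(int(f2.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
}

func main() {
	// Prints <nil>: both descriptors now hold an "exclusive" lock.
	fmt.Println(lockRace("/tmp/juju-demo.lock"))
}
```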
Re: "unfairness" of juju/mutex
On Wed, Nov 16, 2016 at 11:26 PM John Meinel wrote:
> So we just ran into an issue when you are running multiple units on the
> same machine and one of them is particularly busy.
>
> The specific case is when deploying Openstack and colocating things like
> "monitoring" charms with the "keystone" charm. Keystone itself has *lots*
> of things that relate to it, so it wants to fire something like 50
> relation-joined+changed hooks.
>
> The symptom is that unit-keystone ends up acquiring and re-acquiring the
> uniter hook lock for approximately 50 minutes and starves out all other
> units from coming up, because they can't run any of their hooks.
>
> From what I can tell, on Linux we are using
> net.Listen("abstract-unix-socket") and then polling at a 250ms interval
> to see if we can grab that socket.
>
> However, that means that every process that *doesn't* have the lock has
> an average time of 125ms to wake up and notice that the lock isn't held.
> However, a process that had the lock but has more hooks to fire is just
> going to release the lock, do a bit of logic, and then be ready to
> acquire the lock again, most likely much faster than 125ms.
>
> We *could* introduce some sort of sleep there, to give some other
> processes a chance. And/or use a range of times, instead of a fixed
> 250ms. (If sometimes you sleep for 50ms, etc.)
>
> However, if we were using something like 'flock' then it has a blocking
> mode, where it can give you the lock as soon as someone else releases it.
>
> AIUI the only reason we liked abstract unix sockets was to not have a
> file on disk, but we had a whole directory on disk, and flock seems like
> it still gives us better sharing primitives than net.Listen.
>
> Thoughts?

+1 to blocking file lock. We could probably leave Windows alone, and just
do that on *nix.

> John
> =:->
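A blocking flock acquire on *nix is small. The sketch below is a hedged illustration, not the juju/mutex API: the function names and lock path are invented. The point is that the kernel parks blocked waiters and wakes one as soon as the holder releases, so there is no polling interval at all:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// acquire blocks inside the kernel until the exclusive lock is free,
// so a waiter is woken as soon as the holder releases -- no 250ms poll.
func acquire(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDONLY, 0o600)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// release drops the lock; closing the descriptor would also release it.
func release(f *os.File) error {
	defer f.Close()
	return syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
}

func main() {
	f, err := acquire("/tmp/uniter-hook.lock") // hypothetical path
	if err != nil {
		panic(err)
	}
	fmt.Println("lock held; running hook...")
	if err := release(f); err != nil {
		panic(err)
	}
}
```

One caveat: Linux does not promise FIFO ordering among blocked flock waiters, so blocking alone removes the poll-interval bias but is not a hard fairness guarantee.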
Re: "unfairness" of juju/mutex
+1 to using blocking flock. Polling is a bad idea with a heavily contended
lock.

FWIW I still think that mutexing all unit hooks is a bad idea that's
really only there to paper over the problem that apt-get doesn't work well
concurrently.

cheers,
rog.

On 16 November 2016 at 15:26, John Meinel wrote:
> So we just ran into an issue when you are running multiple units on the
> same machine and one of them is particularly busy.
>
> The specific case is when deploying Openstack and colocating things like
> "monitoring" charms with the "keystone" charm. Keystone itself has lots
> of things that relate to it, so it wants to fire something like 50
> relation-joined+changed hooks.
>
> The symptom is that unit-keystone ends up acquiring and re-acquiring the
> uniter hook lock for approximately 50 minutes and starves out all other
> units from coming up, because they can't run any of their hooks.
>
> From what I can tell, on Linux we are using
> net.Listen("abstract-unix-socket") and then polling at a 250ms interval
> to see if we can grab that socket.
>
> However, that means that every process that doesn't have the lock has an
> average time of 125ms to wake up and notice that the lock isn't held.
> However, a process that had the lock but has more hooks to fire is just
> going to release the lock, do a bit of logic, and then be ready to
> acquire the lock again, most likely much faster than 125ms.
>
> We could introduce some sort of sleep there, to give some other
> processes a chance. And/or use a range of times, instead of a fixed
> 250ms. (If sometimes you sleep for 50ms, etc.)
>
> However, if we were using something like 'flock' then it has a blocking
> mode, where it can give you the lock as soon as someone else releases it.
>
> AIUI the only reason we liked abstract-unix-sockets was to not have a
> file on disk, but we had a whole directory on disk, and flock seems like
> it still gives us better sharing primitives than net.Listen.
>
> Thoughts?
>
> John
> =:->
"unfairness" of juju/mutex
So we just ran into an issue when you are running multiple units on the
same machine and one of them is particularly busy.

The specific case is when deploying Openstack and colocating things like
"monitoring" charms with the "keystone" charm. Keystone itself has *lots*
of things that relate to it, so it wants to fire something like 50
relation-joined+changed hooks.

The symptom is that unit-keystone ends up acquiring and re-acquiring the
uniter hook lock for approximately 50 minutes and starves out all other
units from coming up, because they can't run any of their hooks.

From what I can tell, on Linux we are using
net.Listen("abstract-unix-socket") and then polling at a 250ms interval to
see if we can grab that socket.

However, that means that every process that *doesn't* have the lock has an
average time of 125ms to wake up and notice that the lock isn't held.
However, a process that had the lock but has more hooks to fire is just
going to release the lock, do a bit of logic, and then be ready to acquire
the lock again, most likely much faster than 125ms.

We *could* introduce some sort of sleep there, to give some other
processes a chance. And/or use a range of times, instead of a fixed 250ms.
(If sometimes you sleep for 50ms, etc.)

However, if we were using something like 'flock' then it has a blocking
mode, where it can give you the lock as soon as someone else releases it.

AIUI the only reason we liked abstract unix sockets was to not have a file
on disk, but we had a whole directory on disk, and flock seems like it
still gives us better sharing primitives than net.Listen.

Thoughts?

John
=:->
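The polling scheme described above can be sketched as follows. This is a simplified reconstruction, not the actual juju/mutex code; the socket name is made up, and the "@" prefix is how Go's net package addresses the Linux abstract socket namespace:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// pollAcquire mimics the described behavior: try to own an abstract
// unix socket, and on failure sleep a fixed 250ms before retrying.
// A fresh waiter therefore sleeps ~125ms on average after a release,
// while the previous holder can re-acquire almost instantly -- which
// is exactly the starvation being reported.
func pollAcquire(name string) net.Listener {
	for {
		l, err := net.Listen("unix", "@"+name) // "@" = abstract namespace (Linux)
		if err == nil {
			return l // owning the socket == holding the lock
		}
		time.Sleep(250 * time.Millisecond)
	}
}

func main() {
	l := pollAcquire("demo-machine-lock")
	fmt.Println("lock acquired")
	l.Close() // release: the next poller may sleep up to 250ms before noticing
}
```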