Re: "unfairness" of juju/mutex

2016-11-17 Thread roger peppe
On 17 November 2016 at 12:12, Stuart Bishop wrote:
> On 17 November 2016 at 02:34, roger peppe wrote:
>>
>> +1 to using blocking flock. Polling is a bad idea with a heavily contended
>> lock.
>>
>> FWIW I still think that mutexing all unit hooks is a bad idea
>> that's really only there to paper over the problem that apt-get
>> doesn't work well concurrently.
>
>
> apt is just the one you commonly trip over. If there was no mutex, then
> charms would need to do their own locking for every single resource they
> need to access that might potentially also be accessed by a subordinate (now
> or in the future), and hope subordinates also use the lock. So I think
> mutexing unit hooks on the same machine is a fantastic idea :) Just
> something innocuous like 'adduser' can collide with a subordinate wanting to
> stick a config file in that user's home directory.

Surely a hook mutex primitive (e.g. "mutex adduser ...") would have been
more appropriate than the sledgehammer approach of mutexing everything
all the time? Sometimes I might want a hook to run for a long time
(or it might unfortunately block on the network), and turning off all
subordinate hooks while that happens doesn't seem right to me.

Anyway, I appreciate that it's too late now. We can't change this assumption
because it'll break all the charms that rely on it.

  cheers,
rog.

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: "unfairness" of juju/mutex

2016-11-17 Thread Adam Collard
FWIW this is being tracked in https://bugs.launchpad.net/juju/+bug/1642541


On Thu, 17 Nov 2016 at 04:17 Nate Finch wrote:

> Just for historical reference.  The original implementation of the new OS
> mutex used flock until Dave mentioned that it presented problems with file
> management (files getting renamed, deleted, etc).
>
> In general, I'm definitely on the side of using flock, though I don't
> think that necessarily solves our starvation problem; it depends on how
> flock is implemented and on the specific behavior of our units.


Re: "unfairness" of juju/mutex

2016-11-16 Thread Nate Finch
Just for historical reference.  The original implementation of the new OS
mutex used flock until Dave mentioned that it presented problems with file
management (files getting renamed, deleted, etc).

In general, I'm definitely on the side of using flock, though I don't think
that necessarily solves our starvation problem; it depends on how flock is
implemented and on the specific behavior of our units.


Re: "unfairness" of juju/mutex

2016-11-16 Thread Andrew Wilkins
On Wed, Nov 16, 2016 at 11:26 PM John Meinel wrote:

> So we just ran into an issue when you are running multiple units on the
> same machine and one of them is particularly busy.
>
> The specific case is when deploying Openstack and colocating things like
> "monitoring" charms with the "keystone" charm. Keystone itself has *lots* of
> things that relate to it, so it wants to fire something like 50
> relation-joined+changed hooks.
>
> The symptom is that unit-keystone ends up acquiring and re-acquiring the
> uniter hook lock for approximately 50 minutes and starves out all other
> units from coming up, because they can't run any of their hooks.
>
> From what I can tell, on Linux we are using
> net.Listen("abstract-unix-socket") and then polling at a 250ms interval to
> see if we can grab that socket.
>
> However, that means that every process that *doesn't* have the lock has
> an average time of 125ms to wake up and notice that the lock isn't held.
> However, a process that had the lock but has more hooks to fire is just
> going to release the lock, do a bit of logic, and then be ready to acquire
> the lock again, most likely much faster than 125ms.
>
> We *could* introduce some sort of sleep there, to give some other
> processes a chance. And/or use a range of times, instead of a fixed 250ms.
> (If sometimes you sleep for 50ms, etc).
>
> However, if we were using something like 'flock' then it has a blocking
> mode, where it can give you the lock as soon as someone else releases it.
>
> AIUI the only reason we liked abstract-unix-sockets was to not have a file
> on disk, but we had a whole directory on disk, and flock seems like it
> still gives us better sharing primitives than net.Listen.
>

+1 to blocking file lock. We could probably leave Windows alone, and just
do that on *nix.

> Thoughts?
> John
> =:->


Re: "unfairness" of juju/mutex

2016-11-16 Thread roger peppe
+1 to using blocking flock. Polling is a bad idea with a heavily contended
lock.

FWIW I still think that mutexing all unit hooks is a bad idea
that's really only there to paper over the problem that apt-get
doesn't work well concurrently.

  cheers,
rog.


On 16 November 2016 at 15:26, John Meinel wrote:
> So we just ran into an issue when you are running multiple units on the same
> machine and one of them is particularly busy.
>
> The specific case is when deploying Openstack and colocating things like
> "monitoring" charms with the "keystone" charm. Keystone itself has lots of
> things that relate to it, so it wants to fire something like 50
> relation-joined+changed hooks.
>
> The symptom is that unit-keystone ends up acquiring and re-acquiring the
> uniter hook lock for approximately 50 minutes and starves out all other
> units from coming up, because they can't run any of their hooks.
>
> From what I can tell, on Linux we are using
> net.Listen("abstract-unix-socket") and then polling at a 250ms interval to
> see if we can grab that socket.
>
> However, that means that every process that doesn't have the lock has an
> average time of 125ms to wake up and notice that the lock isn't held.
> However, a process that had the lock but has more hooks to fire is just
> going to release the lock, do a bit of logic, and then be ready to acquire
> the lock again, most likely much faster than 125ms.
>
> We could introduce some sort of sleep there, to give some other processes a
> chance. And/or use a range of times, instead of a fixed 250ms. (If sometimes
> you sleep for 50ms, etc).
>
> However, if we were using something like 'flock' then it has a blocking
> mode, where it can give you the lock as soon as someone else releases it.
>
> AIUI the only reason we liked abstract-unix-sockets was to not have a file
> on disk, but we had a whole directory on disk, and flock seems like it still
> gives us better sharing primitives than net.Listen.
>
> Thoughts?
> John
> =:->
>



"unfairness" of juju/mutex

2016-11-16 Thread John Meinel
So we just ran into an issue when you are running multiple units on the
same machine and one of them is particularly busy.

The specific case is when deploying Openstack and colocating things like
"monitoring" charms with the "keystone" charm. Keystone itself has *lots* of
things that relate to it, so it wants to fire something like 50
relation-joined+changed hooks.

The symptom is that unit-keystone ends up acquiring and re-acquiring the
uniter hook lock for approximately 50 minutes and starves out all other
units from coming up, because they can't run any of their hooks.

From what I can tell, on Linux we are using
net.Listen("abstract-unix-socket") and then polling at a 250ms interval to
see if we can grab that socket.

However, that means that every process that *doesn't* have the lock has an
average time of 125ms to wake up and notice that the lock isn't held.
However, a process that had the lock but has more hooks to fire is just
going to release the lock, do a bit of logic, and then be ready to acquire
the lock again, most likely much faster than 125ms.

We *could* introduce some sort of sleep there to give other processes a
chance, and/or randomize the poll interval instead of a fixed 250ms (so a
waiter sometimes sleeps only 50ms, etc.).

However, if we were using something like 'flock' then it has a blocking
mode, where it can give you the lock as soon as someone else releases it.

AIUI the only reason we liked abstract-unix-sockets was to not have a file
on disk, but we had a whole directory on disk, and flock seems like it
still gives us better sharing primitives than net.Listen.

Thoughts?
John
=:->