Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-23 Thread Jarek Poplawski
On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote:
...
> If this work doesn't rearm itself - yes. (otherwise, the same ->func
> can run twice _at the same time_)
> 
> But again, in this case wait_on_work() after try_to_grab_pending() == 1
> doesn't block, so we can just do
> 
>   if (cancel_work_sync(w))
>   w->func();
> 

...but, if it were run just before work_clear_pending()?

Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-22 Thread Jarek Poplawski
On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote:
> On 10/22, Jarek Poplawski wrote:
...
> > OK, I know I'm dumber and dumber everyday,
> 
> You are not alone. I have the same feeling about myself!

Feeling is not the same, only true knowledge counts!

> 
> > these all flushes are
> > rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
> > looks perfectly fine
> 
> Yes,
> 
> > Then, if by any chance I'm right, something like flush_work_sync
> > (or changed flush_scheduled_work, if there is no problem with such
> > a change of implementation) could be safely (if it's called without
> > locks used by flushed work only) done cancel_work_sync() way, by
> > running a work function after try_to_grab_pending() returns 1
> 
> If this work doesn't rearm itself - yes. (otherwise, the same ->func
> can run twice _at the same time_)
> 
> But again, in this case wait_on_work() after try_to_grab_pending() == 1
> doesn't block, so we can just do
> 
>   if (cancel_work_sync(w))
>   w->func();

Of course, I should have written it much shorter by saying
flush_scheduled_work could be done internally just like you suggested!

My point is to make this all safer and simpler, so we don't have to
remember each time about these differences wrt. locking. Then checking
of such patches could be much easier. Unless this current behavior
of flush_scheduled_work has any real advantages, worth of this
potential unsafety. (Btw. is there any reason to use this with
rearming works, anyway?)

Thanks,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-22 Thread Oleg Nesterov
On 10/22, Jarek Poplawski wrote:
>
> On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
> > On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
> > > On 10/18, Jarek Poplawski wrote:
> > > >
> > > > +/**
> > > > + * flush_work_sync - block until a work_struct's callback has 
> > > > terminated
> > > 
> > > ^^^
> > > Hmm...
> > > 
> > > > + * Similar to cancel_work_sync() but will only busy wait (without 
> > > > cancel)
> > > > + * if the work is queued.
> > > 
> > > Yes, it won't block, but will spin in busy-wait loop until all other works
> > > scheduled before this work are finished. Not good. After that it really
> > > blocks waiting for this work to complete.
> > > 
> > > And I am a bit confused. We can't use flush_workqueue() because some of 
> > > the
> > > queued work_structs may take rtnl_lock, yes? But in that case we can't use
> > > the new flush_work_sync() helper as well, no?
> 
> OK, I know I'm dumber and dumber everyday,

You are not alone. I have the same feeling about myself!

> these all flushes are
> rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
> looks perfectly fine

Yes,

> Then, if by any chance I'm right, something like flush_work_sync
> (or changed flush_scheduled_work, if there is no problem with such
> a change of implementation) could be safely (if it's called without
> locks used by flushed work only) done cancel_work_sync() way, by
> running a work function after try_to_grab_pending() returns 1

If this work doesn't rearm itself - yes. (otherwise, the same ->func
can run twice _at the same time_)

But again, in this case wait_on_work() after try_to_grab_pending() == 1
doesn't block, so we can just do

if (cancel_work_sync(w))
w->func();

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-21 Thread Jarek Poplawski
On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
> On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
> > On 10/18, Jarek Poplawski wrote:
> > >
> > > +/**
> > > + * flush_work_sync - block until a work_struct's callback has terminated
> > ^^^
> > Hmm...
> > 
> > > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
> > > + * if the work is queued.
> > 
> > Yes, it won't block, but will spin in busy-wait loop until all other works
> > scheduled before this work are finished. Not good. After that it really
> > blocks waiting for this work to complete.
> > 
> > And I am a bit confused. We can't use flush_workqueue() because some of the
> > queued work_structs may take rtnl_lock, yes? But in that case we can't use
> > the new flush_work_sync() helper as well, no?

OK, I know I'm dumber and dumber everyday, but it seems in a hurry I
got it wrong again or miss something (as usual): these all flushes are
rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
looks perfectly fine... (Or am I wrong because: ...?)

Then, if by any chance I'm right, something like flush_work_sync
(or changed flush_scheduled_work, if there is no problem with such
a change of implementation) could be safely (if it's called without
locks used by flushed work only) done cancel_work_sync() way, by
running a work function after try_to_grab_pending() returns 1 (after
list_del_init - of course without respecting a queue order).

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-19 Thread Johannes Berg
On Thu, 2007-10-18 at 19:48 +0400, Oleg Nesterov wrote:

> > +void flush_work_sync(struct work_struct *work)

> If we really the new helper, perhaps we can make it a bit better?
> 
> 1. Modify insert_work() to take the "struct list_head *at" parameter instead
>of "int tail". I think this patch will also cleanup the code a bit, and
>shrink a couple of bytes from .text
> 
> 2. flush_work_sync() inserts a barrier right after this work and blocks.
>We still need some retry logic to handle the queueing is in progress
>of course, but we won't spin waiting for the other works.

3. Add lockdep annotation like the other API. :) Andrew just sent my
patch (used to be two patches by somebody's request but that's fine)
titled "workqueue: debug flushing deadlocks with lockdep" to Linus.

johannes


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-19 Thread Jarek Poplawski
On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
...
> sched_work_sync() with rtnl_lock(). It's only less probable to lockup
> with this than with flush_schedule_work().

...But, not much less...

Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-19 Thread Jarek Poplawski
On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
> On 10/18, Jarek Poplawski wrote:
> >
> > +/**
> > + * flush_work_sync - block until a work_struct's callback has terminated
> ^^^
> Hmm...
> 
> > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
> > + * if the work is queued.
> 
> Yes, it won't block, but will spin in busy-wait loop until all other works
> scheduled before this work are finished. Not good. After that it really
> blocks waiting for this work to complete.
> 
> And I am a bit confused. We can't use flush_workqueue() because some of the
> queued work_structs may take rtnl_lock, yes? But in that case we can't use
> the new flush_work_sync() helper as well, no?

OOOPS!

Of course, we can't!!! I remembered there was this issue long time
ago, but then I've had some break in tracking net & workqueue. So,
while reading this patch I was alarmed at first, and self-misled
later. I think, there is definitely needed some warning about
locking (or unlocking) during these flush_ & cancel_ functions.
(Btw, I've very much wondered now, why I didn't notice at that 'old'
time, that you added such a great feature (wrt. locking) and I even
didn't notice this...). 

So, Maciej (and other readers of this thread) - I withdraw my false
opinion from my second message here: it's very wrong to call this
sched_work_sync() with rtnl_lock(). It's only less probable to lockup
with this than with flush_schedule_work().

> 
> If we can't just cancel the work, can't we do something like
> 
>   if (cancel_work_sync(w))
>   w->func(w);
> 
> instead?
> 
> > +void flush_work_sync(struct work_struct *work)
> > +{
> > +   int ret;
> > +
> > +   do {
> > +   ret = work_pending(work);
> > +   wait_on_work(work);
> > +   if (ret)
> > +   cpu_relax();
> > +   } while (ret);
> > +}
> 
> If we really the new helper, perhaps we can make it a bit better?
> 
> 1. Modify insert_work() to take the "struct list_head *at" parameter instead
>of "int tail". I think this patch will also cleanup the code a bit, and
>shrink a couple of bytes from .text

Looks like a very good idea, but I need more time to rethink this.
Probably some code example should be helpful.

> 
> 2. flush_work_sync() inserts a barrier right after this work and blocks.
>We still need some retry logic to handle the queueing is in progress
>of course, but we won't spin waiting for the other works.

Until monday I should have an opinion on that (today a bit under
fire...).

> 
> What do you think?

Since there is no gain wrt. locking with my current proposal, I
withdraw this patch of course.

It looks like my wrong patch was great idea because we got this very
precious Oleg's opinion! (I know I'm a genius sometimes...)

Thanks very much,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-18 Thread Maciej W. Rozycki
On Thu, 18 Oct 2007, Oleg Nesterov wrote:

> If we can't just cancel the work, can't we do something like
> 
>   if (cancel_work_sync(w))
>   w->func(w);
> 
> instead?

 We do an equivalent of this -- all that we care about that w->func(w) 
would do is enable_irq() and the rest we want to skip at this point.  We 
probably do not need the counter in the end though.

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-10-18 Thread Oleg Nesterov
On 10/18, Jarek Poplawski wrote:
>
> +/**
> + * flush_work_sync - block until a work_struct's callback has terminated
^^^
Hmm...

> + * Similar to cancel_work_sync() but will only busy wait (without cancel)
> + * if the work is queued.

Yes, it won't block, but will spin in busy-wait loop until all other works
scheduled before this work are finished. Not good. After that it really
blocks waiting for this work to complete.

And I am a bit confused. We can't use flush_workqueue() because some of the
queued work_structs may take rtnl_lock, yes? But in that case we can't use
the new flush_work_sync() helper as well, no?

If we can't just cancel the work, can't we do something like

if (cancel_work_sync(w))
w->func(w);

instead?

> +void flush_work_sync(struct work_struct *work)
> +{
> + int ret;
> +
> + do {
> + ret = work_pending(work);
> + wait_on_work(work);
> + if (ret)
> + cpu_relax();
> + } while (ret);
> +}

If we really the new helper, perhaps we can make it a bit better?

1. Modify insert_work() to take the "struct list_head *at" parameter instead
   of "int tail". I think this patch will also cleanup the code a bit, and
   shrink a couple of bytes from .text

2. flush_work_sync() inserts a barrier right after this work and blocks.
   We still need some retry logic to handle the queueing is in progress
   of course, but we won't spin waiting for the other works.

What do you think?

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html