Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote: ... > If this work doesn't rearm itself - yes. (otherwise, the same ->func > can run twice _at the same time_) > > But again, in this case wait_on_work() after try_to_grab_pending() == 1 > doesn't block, so we can just do > > if (cancel_work_sync(w)) > w->func(); > ...but, if it were run just before work_clear_pending()? Jarek P. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote: > On 10/22, Jarek Poplawski wrote: ... > > OK, I know I'm dumber and dumber everyday, > > You are not alone. I have the same feeling about myself! Feeling is not the same, only true knowledge counts! > > > these all flushes are > > rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync > > looks perfectly fine > > Yes, > > > Then, if by any chance I'm right, something like flush_work_sync > > (or changed flush_scheduled_work, if there is no problem with such > > a change of implementation) could be safely (if it's called without > > locks used by flushed work only) done cancel_work_sync() way, by > > running a work function after try_to_grab_pending() returns 1 > > If this work doesn't rearm itself - yes. (otherwise, the same ->func > can run twice _at the same time_) > > But again, in this case wait_on_work() after try_to_grab_pending() == 1 > doesn't block, so we can just do > > if (cancel_work_sync(w)) > w->func(); Of course, I should have written it much shorter by saying flush_scheduled_work could be done internally just like you suggested! My point is to make this all safer and simpler, so we don't have to remember each time about these differences wrt. locking. Then checking of such patches could be much easier. Unless this current behavior of flush_scheduled_work has any real advantages, worth of this potential unsafety. (Btw. is there any reason to use this with rearming works, anyway?) Thanks, Jarek P. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On 10/22, Jarek Poplawski wrote: > > On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote: > > On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote: > > > On 10/18, Jarek Poplawski wrote: > > > > > > > > +/** > > > > + * flush_work_sync - block until a work_struct's callback has > > > > terminated > > > > > > ^^^ > > > Hmm... > > > > > > > + * Similar to cancel_work_sync() but will only busy wait (without > > > > cancel) > > > > + * if the work is queued. > > > > > > Yes, it won't block, but will spin in busy-wait loop until all other works > > > scheduled before this work are finished. Not good. After that it really > > > blocks waiting for this work to complete. > > > > > > And I am a bit confused. We can't use flush_workqueue() because some of > > > the > > > queued work_structs may take rtnl_lock, yes? But in that case we can't use > > > the new flush_work_sync() helper as well, no? > > OK, I know I'm dumber and dumber everyday, You are not alone. I have the same feeling about myself! > these all flushes are > rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync > looks perfectly fine Yes, > Then, if by any chance I'm right, something like flush_work_sync > (or changed flush_scheduled_work, if there is no problem with such > a change of implementation) could be safely (if it's called without > locks used by flushed work only) done cancel_work_sync() way, by > running a work function after try_to_grab_pending() returns 1 If this work doesn't rearm itself - yes. (otherwise, the same ->func can run twice _at the same time_) But again, in this case wait_on_work() after try_to_grab_pending() == 1 doesn't block, so we can just do if (cancel_work_sync(w)) w->func(); Oleg. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote: > On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote: > > On 10/18, Jarek Poplawski wrote: > > > > > > +/** > > > + * flush_work_sync - block until a work_struct's callback has terminated > > ^^^ > > Hmm... > > > > > + * Similar to cancel_work_sync() but will only busy wait (without cancel) > > > + * if the work is queued. > > > > Yes, it won't block, but will spin in busy-wait loop until all other works > > scheduled before this work are finished. Not good. After that it really > > blocks waiting for this work to complete. > > > > And I am a bit confused. We can't use flush_workqueue() because some of the > > queued work_structs may take rtnl_lock, yes? But in that case we can't use > > the new flush_work_sync() helper as well, no? OK, I know I'm dumber and dumber everyday, but it seems in a hurry I got it wrong again or miss something (as usual): these all flushes are rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync looks perfectly fine... (Or am I wrong because: ...?) Then, if by any chance I'm right, something like flush_work_sync (or changed flush_scheduled_work, if there is no problem with such a change of implementation) could be safely (if it's called without locks used by flushed work only) done cancel_work_sync() way, by running a work function after try_to_grab_pending() returns 1 (after list_del_init - of course without respecting a queue order). Regards, Jarek P. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Thu, 2007-10-18 at 19:48 +0400, Oleg Nesterov wrote: > > +void flush_work_sync(struct work_struct *work) > If we really the new helper, perhaps we can make it a bit better? > > 1. Modify insert_work() to take the "struct list_head *at" parameter instead >of "int tail". I think this patch will also cleanup the code a bit, and >shrink a couple of bytes from .text > > 2. flush_work_sync() inserts a barrier right after this work and blocks. >We still need some retry logic to handle the queueing is in progress >of course, but we won't spin waiting for the other works. 3. Add lockdep annotation like the other API. :) Andrew just sent my patch (used to be two patches by somebody's request but that's fine) titled "workqueue: debug flushing deadlocks with lockdep" to Linus. johannes signature.asc Description: This is a digitally signed message part
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote: ... > sched_work_sync() with rtnl_lock(). It's only less probable to lockup > with this than with flush_schedule_work(). ...But, not much less... Jarek P. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote: > On 10/18, Jarek Poplawski wrote: > > > > +/** > > + * flush_work_sync - block until a work_struct's callback has terminated > ^^^ > Hmm... > > > + * Similar to cancel_work_sync() but will only busy wait (without cancel) > > + * if the work is queued. > > Yes, it won't block, but will spin in busy-wait loop until all other works > scheduled before this work are finished. Not good. After that it really > blocks waiting for this work to complete. > > And I am a bit confused. We can't use flush_workqueue() because some of the > queued work_structs may take rtnl_lock, yes? But in that case we can't use > the new flush_work_sync() helper as well, no? OOOPS! Of course, we can't!!! I remembered there was this issue long time ago, but then I've had some break in tracking net & workqueue. So, while reading this patch I was alarmed at first, and self-misled later. I think, there is definitely needed some warning about locking (or unlocking) during these flush_ & cancel_ functions. (Btw, I've very much wondered now, why I didn't notice at that 'old' time, that you added such a great feature (wrt. locking) and I even didn't notice this...). So, Maciej (and other readers of this thread) - I withdraw my false opinion from my second message here: it's very wrong to call this sched_work_sync() with rtnl_lock(). It's only less probable to lockup with this than with flush_schedule_work(). > > If we can't just cancel the work, can't we do something like > > if (cancel_work_sync(w)) > w->func(w); > > instead? > > > +void flush_work_sync(struct work_struct *work) > > +{ > > + int ret; > > + > > + do { > > + ret = work_pending(work); > > + wait_on_work(work); > > + if (ret) > > + cpu_relax(); > > + } while (ret); > > +} > > If we really the new helper, perhaps we can make it a bit better? > > 1. Modify insert_work() to take the "struct list_head *at" parameter instead >of "int tail". I think this patch will also cleanup the code a bit, and >shrink a couple of bytes from .text Looks like a very good idea, but I need more time to rethink this. Probably some code example should be helpful. > > 2. flush_work_sync() inserts a barrier right after this work and blocks. >We still need some retry logic to handle the queueing is in progress >of course, but we won't spin waiting for the other works. Until monday I should have an opinion on that (today a bit under fire...). > > What do you think? Since there is no gain wrt. locking with my current proposal, I withdraw this patch of course. It looks like my wrong patch was great idea because we got this very precious Oleg's opinion! (I know I'm a genius sometimes...) Thanks very much, Jarek P. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On Thu, 18 Oct 2007, Oleg Nesterov wrote: > If we can't just cancel the work, can't we do something like > > if (cancel_work_sync(w)) > w->func(w); > > instead? We do an equivalent of this -- all that we care about that w->func(w) would do is enable_irq() and the rest we want to skip at this point. We probably do not need the counter in the end though. Maciej - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes
On 10/18, Jarek Poplawski wrote: > > +/** > + * flush_work_sync - block until a work_struct's callback has terminated ^^^ Hmm... > + * Similar to cancel_work_sync() but will only busy wait (without cancel) > + * if the work is queued. Yes, it won't block, but will spin in busy-wait loop until all other works scheduled before this work are finished. Not good. After that it really blocks waiting for this work to complete. And I am a bit confused. We can't use flush_workqueue() because some of the queued work_structs may take rtnl_lock, yes? But in that case we can't use the new flush_work_sync() helper as well, no? If we can't just cancel the work, can't we do something like if (cancel_work_sync(w)) w->func(w); instead? > +void flush_work_sync(struct work_struct *work) > +{ > + int ret; > + > + do { > + ret = work_pending(work); > + wait_on_work(work); > + if (ret) > + cpu_relax(); > + } while (ret); > +} If we really the new helper, perhaps we can make it a bit better? 1. Modify insert_work() to take the "struct list_head *at" parameter instead of "int tail". I think this patch will also cleanup the code a bit, and shrink a couple of bytes from .text 2. flush_work_sync() inserts a barrier right after this work and blocks. We still need some retry logic to handle the queueing is in progress of course, but we won't spin waiting for the other works. What do you think? Oleg. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html