Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Andy Fleming


On Dec 7, 2006, at 11:05, Jeff Garzik wrote:

Yes, I merged the code, but looking deeper at phy its clear I  
missed some things.


Looking into libphy's workqueue stuff, it has the following sequence:

disable interrupts
schedule_work()

... time passes ...
... workqueue routine is called ...

enable interrupts
handle interrupt

I really have to question if a workqueue was the best choice of  
direction for such a sequence.  You don't want to put off handling  
an interrupt, with interrupts disabled, for a potentially unbounded  
amount of time.


Maybe the best course of action is to mark it with CONFIG_BROKEN  
until it gets fixed.



Yes, this is one of the design choices I made to be able to lock  
properly around MDIO transactions.


1) MDIO transactions take a long time
2) Some interfaces provide an interrupt to indicate the transaction  
has completed.


So I didn't want an irq-disabling spin lock.  It would prevent 2 from  
ever being used, and would mean all interrupts were disabled for  
thousands of cycles for MDIO transactions.


So I decreed that no MDIO transactions can happen in interrupt  
context.  But the registers to disable the specific PHY's interrupt  
are only accessible through MDIO, so in order to be able to follow  
that edict AND ever handle the interrupt, I need to disable the  
interrupt.  But I only disable the PHY's interrupt (which is likely  
shared among some devices).  I agree it's ugly, but I haven't yet  
figured out another way to do it.


I'd love to eliminate the work queue altogether.  I keep thinking  
that I could do something with semaphores, or spin_trylocks, but I'm  
not sure about the best way to let the interrupt handlers do their  
thing.


Andy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Linus Torvalds


On Thu, 7 Dec 2006, Andrew Morton wrote:
> 
> We _can_ trust it in the context of
> 
>   void flush_work(struct work_struct *work)

Yes, but the way the bits are defined, the "pending" bit is not meaningful 
as a synchronization event, for example - because _other_ users can't 
trust it once they've dispatched the function. So even in the synchronous 
run/flush_scheduled_work() kind of situation, you end up having to work 
with the fact that nobody _else_ can rely on the data structures, and that 
they are designed to work that way..

> ho-hum.  I'll take a look at turning that into something which compiles,
> then I'll convert a few oft-used flush_scheduled_work() callers over to use
> it.  To do this on a sensible timescale perhaps means that we should export
> current_is_keventd(), get the howling hordes off our backs.

Well, I simply committed my work that doesn't guarantee synchronization - 
the synchronization can now be added in kernel/workqueue.c any way we 
want. It's better than what we used to have, for sure, in both compiling 
and solving the practical problem, but also as a "go forward" point.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Andrew Morton
On Thu, 7 Dec 2006 10:01:56 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Thu, 7 Dec 2006, Andrew Morton wrote:
> > 
> > umm..  Putting a work_struct* into struct cpu_workqueue_struct and then
> > doing appropriate things with cpu_workqueue_struct.lock might work.
> 
> Yeah, that looks sane. We can't hide anything in "struct work", because we 
> can't trust it any more once it's been dispatched,

We _can_ trust it in the context of

void flush_work(struct work_struct *work)

because the caller "owns" the work_struct.  It'd be pretty nutty for the
caller to pass in a pointer to something which could be freed at any time. 
Most flush_work_queue() callers do something like:


flush_scheduled_work();
kfree(my_object_which_contains_a_work_struct);

hopefully libphy follows that model...

> but adding a pointer to 
> the cpu_workqueue_struct that is only used to compare against another 
> pointer sounds fine.

ho-hum.  I'll take a look at turning that into something which compiles,
then I'll convert a few oft-used flush_scheduled_work() callers over to use
it.  To do this on a sensible timescale perhaps means that we should export
current_is_keventd(), get the howling hordes off our backs.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Andrew Morton
On Thu, 7 Dec 2006 09:57:15 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 07 Dec 2006 12:05:38 -0500
> Jeff Garzik <[EMAIL PROTECTED]> wrote:
> 
> > Yes, I merged the code, but looking deeper at phy its clear I missed 
> > some things.
> > 
> > Looking into libphy's workqueue stuff, it has the following sequence:
> > 
> > disable interrupts
> > schedule_work()
> > 
> > ... time passes ...
> > ... workqueue routine is called ...
> > 
> > enable interrupts
> > handle interrupt
> > 
> > I really have to question if a workqueue was the best choice of 
> > direction for such a sequence.  You don't want to put off handling an 
> > interrupt, with interrupts disabled, for a potentially unbounded amount 
> > of time.
> 
> That'll lock the box on UP, or if the timer fires on the current CPU?

oh. "disable interrupts" == disable_irq(), not local_irq_disable()?

Not so bad ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Maciej W. Rozycki
On Thu, 7 Dec 2006, Jeff Garzik wrote:

> Looking into libphy's workqueue stuff, it has the following sequence:
> 
>   disable interrupts
>   schedule_work()
> 
>   ... time passes ...
>   ... workqueue routine is called ...
> 
>   enable interrupts
>   handle interrupt
> 
> I really have to question if a workqueue was the best choice of direction for
> such a sequence.  You don't want to put off handling an interrupt, with
> interrupts disabled, for a potentially unbounded amount of time.

 This is because to ack the interrupt in the device the MDIO bus has to be 
accessed and I gather for some implementations it may be too obnoxiously 
slow for the interrupt context to cope with.  Note that only the interrupt 
line used for the PHY is disabled (though obviously with consequences to 
any sharers).

 Andy, could you please comment?

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Linus Torvalds


On Thu, 7 Dec 2006, Andrew Morton wrote:
> 
> umm..  Putting a work_struct* into struct cpu_workqueue_struct and then
> doing appropriate things with cpu_workqueue_struct.lock might work.

Yeah, that looks sane. We can't hide anything in "struct work", because we 
can't trust it any more once it's been dispatched, but adding a pointer to 
the cpu_workqueue_struct that is only used to compare against another 
pointer sounds fine.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Andrew Morton
On Thu, 07 Dec 2006 12:05:38 -0500
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Yes, I merged the code, but looking deeper at phy its clear I missed 
> some things.
> 
> Looking into libphy's workqueue stuff, it has the following sequence:
> 
>   disable interrupts
>   schedule_work()
> 
>   ... time passes ...
>   ... workqueue routine is called ...
> 
>   enable interrupts
>   handle interrupt
> 
> I really have to question if a workqueue was the best choice of 
> direction for such a sequence.  You don't want to put off handling an 
> interrupt, with interrupts disabled, for a potentially unbounded amount 
> of time.

That'll lock the box on UP, or if the timer fires on the current CPU?

> Maybe the best course of action is to mark it with CONFIG_BROKEN until 
> it gets fixed.

hm, maybe.  I wonder if as a short-term palliative we could remove the
current_is_keventd() call and drop rtnl_lock.  Or export current_is_keventd ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Andrew Morton
On Thu, 7 Dec 2006 08:49:02 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Wed, 6 Dec 2006, Andrew Morton wrote:
> >
> > But this will return to the caller if the callback is presently running on
> > a different CPU.  The whole point here is to be able to reliably kill off
> > the pending work so that the caller can free resources.
> 
> I mentioned that in one of the emails.
> 
> We do not _have_ the information to not do that. It simply doesn't exist. 
> We can either wait for _all_ pending entries on the to complete (by 
> waiting for the workqueue counters for added/removed to be the same), or 
> we can have the race.

Well we'll need to add the infrastructure to be able to do this, won't we? 
The whole point of calling flush_scheduled_work() (which we're trying to
replace/simplify) is to block the caller until it is safe to release
resources.

It might be a challenge to do this without adding more stuff to work_struct
though.

umm..  Putting a work_struct* into struct cpu_workqueue_struct and then
doing appropriate things with cpu_workqueue_struct.lock might work.



Something along these lines.  The keventd-calls-flush_work() case rather
sucks though.


diff -puN kernel/workqueue.c~a kernel/workqueue.c
--- a/kernel/workqueue.c~a
+++ a/kernel/workqueue.c
@@ -323,6 +323,7 @@ static void run_workqueue(struct cpu_wor
work_func_t f = work->func;
 
list_del_init(cwq->worklist.next);
+   cwq->current_work = work;
spin_unlock_irqrestore(&cwq->lock, flags);
 
BUG_ON(get_wq_data(work) != cwq);
@@ -342,6 +343,7 @@ static void run_workqueue(struct cpu_wor
}
 
spin_lock_irqsave(&cwq->lock, flags);
+   cwq->current_work = NULL;
cwq->remove_sequence++;
wake_up(&cwq->work_done);
}
@@ -425,6 +427,64 @@ static void flush_cpu_workqueue(struct c
}
 }
 
+static void wait_on_work(struct cpu_workqueue_struct *cwq,
+   struct work_struct *work)
+{
+   DEFINE_WAIT(wait);
+
+   spin_lock_irq(&cwq->lock);
+   while (cwq->current_work == work) {
+   prepare_to_wait(&cwq->work_done, &wait, TASK_UNINTERRUPTIBLE);
+   spin_unlock_irq(&cwq->lock);
+   schedule();
+   spin_lock_irq(&cwq->lock);
+   }
+   finish_wait(&cwq->work_done, &wait);
+   spin_unlock_irq(&cwq->lock);
+}
+
+static void flush_one_work(struct cpu_workqueue_struct *cwq,
+   struct work_struct *work)
+{
+   spin_lock_irq(&cwq->lock);
+   if (test_and_clear_bit(WORK_STRUCT_PENDING, &work->management)) {
+   list_del_init(&work->entry);
+   spin_unlock_irq(&cwq->lock);
+   return;
+   }
+   spin_unlock_irq(&cwq->lock);
+
+   /* It's running, or it has completed */
+
+   if (cwq->thread == current) {
+   /* This stinks */
+   /*
+* Probably keventd trying to flush its own queue. So simply run
+* it by hand rather than deadlocking.
+*/
+   run_workqueue(cwq);
+   } else {
+   wait_on_work(cwq, work);
+   }
+}
+
+void flush_work(struct work_struct *work)
+{
+   might_sleep();
+
+   if (is_single_threaded(wq)) {
+   /* Always use first cpu's area. */
+   flush_one_work(per_cpu_ptr(wq->cpu_wq, singlethread_cpu), work);
+   } else {
+   int cpu;
+
+   mutex_lock(&workqueue_mutex);
+   for_each_online_cpu(cpu)
+   flush_one_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
+   mutex_unlock(&workqueue_mutex);
+   }
+}
+
 /**
  * flush_workqueue - ensure that any scheduled work has run to completion.
  * @wq: workqueue to flush
_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Jeff Garzik
Yes, I merged the code, but looking deeper at phy its clear I missed 
some things.


Looking into libphy's workqueue stuff, it has the following sequence:

disable interrupts
schedule_work()

... time passes ...
... workqueue routine is called ...

enable interrupts
handle interrupt

I really have to question if a workqueue was the best choice of 
direction for such a sequence.  You don't want to put off handling an 
interrupt, with interrupts disabled, for a potentially unbounded amount 
of time.


Maybe the best course of action is to mark it with CONFIG_BROKEN until 
it gets fixed.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Linus Torvalds


On Wed, 6 Dec 2006, Andrew Morton wrote:
>
> But this will return to the caller if the callback is presently running on
> a different CPU.  The whole point here is to be able to reliably kill off
> the pending work so that the caller can free resources.

I mentioned that in one of the emails.

We do not _have_ the information to not do that. It simply doesn't exist. 
We can either wait for _all_ pending entries on the to complete (by 
waiting for the workqueue counters for added/removed to be the same), or 
we can have the race.

> Also, I worry that this code can run the callback on the caller's CPU. 

Right.

> Users of per-cpu workqueues can legitimately assume that each callback runs
> on the right CPU.

Not if they use this interface, they can't.

They asked for it, they get it. That's the unix philosophy. 

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Maciej W. Rozycki
On Wed, 6 Dec 2006, Linus Torvalds wrote:

> I didn't get any answers on this. I'd like to get this issue resolved, but 
> since I don't even use libphy on my main machine, I need somebody else to 
> test it for me.

 I tested it in the evening with the system I implemented it for 
originally.  It seems to work -- it does not oops as would the original 
code without the call to flush_scheduled_work() nor does it deadlock as 
would code with the call (and no special precautions).

 Thanks a lot (and congrats for still being capable of doing something 
else from merging).

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread Andrew Morton
> On Thu, 07 Dec 2006 10:29:39 + David Howells <[EMAIL PROTECTED]> wrote:
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > I guess I don't understand exactly what problem the noautorel stuff is
> > trying to solve.  It _seems_ to me that in all cases we can simply stuff
> > the old `data' field in alongside the controlling work_struct or
> > delayed_work which wants to operate on it.
> 
> The problem is that you have to be able to guarantee that the data is still
> accessible once you clear the pending bit.  The pending bit is your only
> guaranteed protection, and once it is clear, the containing structure might be
> deallocated.
> 
> I would like to be able to get rid of the NAR bit too, but I'm not confident
> that in all cases I can.  It'll take a bit more study of the code to be able
> to do that.
> 

But anyone who is going to free the structure which contains the
work_struct would need to run flush_workqueue() beforehand, after having
ensured that the work won't reschedule itself.  So the
struct-which-contains-the-work_struct is safe during the callback's
execution.

If that's not being done then the code was buggy in 2.6.19, too..
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-07 Thread David Howells
Andrew Morton <[EMAIL PROTECTED]> wrote:

> I guess I don't understand exactly what problem the noautorel stuff is
> trying to solve.  It _seems_ to me that in all cases we can simply stuff
> the old `data' field in alongside the controlling work_struct or
> delayed_work which wants to operate on it.

The problem is that you have to be able to guarantee that the data is still
accessible once you clear the pending bit.  The pending bit is your only
guaranteed protection, and once it is clear, the containing structure might be
deallocated.

I would like to be able to get rid of the NAR bit too, but I'm not confident
that in all cases I can.  It'll take a bit more study of the code to be able
to do that.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Andrew Morton
On Wed, 6 Dec 2006 22:42:07 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> But I wouldn't want to think about an implementation as long as we have
> that WORK_STRUCT_NOAUTOREL horror in there.  Can we just nuke that?  Only
> three drivers need it and I bet they can be modified to use the usual
> mechanisms.

I guess I don't understand exactly what problem the noautorel stuff is
trying to solve.  It _seems_ to me that in all cases we can simply stuff
the old `data' field in alongside the controlling work_struct or
delayed_work which wants to operate on it.

Bridge is the simple case..

diff -puN net/bridge/br_private.h~bridge-avoid-using-noautorel-workqueues 
net/bridge/br_private.h
--- a/net/bridge/br_private.h~bridge-avoid-using-noautorel-workqueues
+++ a/net/bridge/br_private.h
@@ -83,6 +83,7 @@ struct net_bridge_port
struct timer_list   message_age_timer;
struct kobject  kobj;
struct delayed_work carrier_check;
+   struct net_device   *carrier_check_dev;
struct rcu_head rcu;
 };
 
diff -puN net/bridge/br_if.c~bridge-avoid-using-noautorel-workqueues 
net/bridge/br_if.c
--- a/net/bridge/br_if.c~bridge-avoid-using-noautorel-workqueues
+++ a/net/bridge/br_if.c
@@ -83,14 +83,11 @@ static void port_carrier_check(struct wo
struct net_device *dev;
struct net_bridge *br;
 
-   dev = container_of(work, struct net_bridge_port,
-  carrier_check.work)->dev;
-   work_release(work);
-
+   p = container_of(work, struct net_bridge_port, carrier_check.work);
+   dev = p->carrier_check_dev;
rtnl_lock();
-   p = dev->br_port;
-   if (!p)
-   goto done;
+   if (!dev->br_port)
+   goto done;  /* Can this happen? */
br = p->br;
 
if (netif_carrier_ok(dev))
@@ -280,7 +277,8 @@ static struct net_bridge_port *new_nbp(s
p->port_no = index;
br_init_port(p);
p->state = BR_STATE_DISABLED;
-   INIT_DELAYED_WORK_NAR(&p->carrier_check, port_carrier_check);
+   p->carrier_check_dev = dev;
+   INIT_DELAYED_WORK(&p->carrier_check, port_carrier_check);
br_stp_port_timer_init(p);
 
kobject_init(&p->kobj);
_


What am I missing?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Andrew Morton
On Wed, 6 Dec 2006 17:21:50 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Wed, 6 Dec 2006, Linus Torvalds wrote:
> > 
> > How about something like this?
> 
> I didn't get any answers on this. I'd like to get this issue resolved, but 
> since I don't even use libphy on my main machine, I need somebody else to 
> test it for me.
> 
> Just to remind you all, here's the patch again. This is identical to the 
> previous version except for the trivial cleanup to use "work_pending()" 
> instead of open-coding it in two places.
> 
>   Linus
> 
> ...
>
> +static int __run_work(struct cpu_workqueue_struct *cwq, struct work_struct 
> *work)
> +{
> + int ret = 0;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&cwq->lock, flags);
> + /*
> +  * We need to re-validate the work info after we've gotten
> +  * the cpu_workqueue lock. We can run the work now iff:
> +  *
> +  *  - the wq_data still matches the cpu_workqueue_struct
> +  *  - AND the work is still marked pending
> +  *  - AND the work is still on a list (which will be this
> +  *workqueue_struct list)
> +  *
> +  * All these conditions are important, because we
> +  * need to protect against the work being run right
> +  * now on another CPU (all but the last one might be
> +  * true if it's currently running and has not been
> +  * released yet, for example).
> +  */
> + if (get_wq_data(work) == cwq
> + && work_pending(work)
> + && !list_empty(&work->entry)) {
> + work_func_t f = work->func;
> + list_del_init(&work->entry);
> + spin_unlock_irqrestore(&cwq->lock, flags);
> +
> + if (!test_bit(WORK_STRUCT_NOAUTOREL, &work->management))
> + work_release(work);
> + f(work);
> +
> + spin_lock_irqsave(&cwq->lock, flags);
> + cwq->remove_sequence++;
> + wake_up(&cwq->work_done);
> + ret = 1;
> + }
> + spin_unlock_irqrestore(&cwq->lock, flags);
> + return ret;
> +}
> +
> +/**
> + * run_scheduled_work - run scheduled work synchronously
> + * @work: work to run
> + *
> + * This checks if the work was pending, and runs it
> + * synchronously if so. It returns a boolean to indicate
> + * whether it had any scheduled work to run or not.
> + *
> + * NOTE! This _only_ works for normal work_structs. You
> + * CANNOT use this for delayed work, because the wq data
> + * for delayed work will not point properly to the per-
> + * CPU workqueue struct, but will change!
> + */
> +int fastcall run_scheduled_work(struct work_struct *work)
> +{
> + for (;;) {
> + struct cpu_workqueue_struct *cwq;
> +
> + if (!work_pending(work))
> + return 0;

But this will return to the caller if the callback is presently running on
a different CPU.  The whole point here is to be able to reliably kill off
the pending work so that the caller can free resources.

> + if (list_empty(&work->entry))
> + return 0;
> + /* NOTE! This depends intimately on __queue_work! */
> + cwq = get_wq_data(work);
> + if (!cwq)
> + return 0;
> + if (__run_work(cwq, work))
> + return 1;
> + }
> +}
> +EXPORT_SYMBOL(run_scheduled_work);

Also, I worry that this code can run the callback on the caller's CPU. 
Users of per-cpu workqueues can legitimately assume that each callback runs
on the right CPU.  I doubt if many callers _do_ do that - there's
schedule_delayed_work_on(), but that's a bit different.

A solution to both problems is of course to block the caller if the
callback is running.  We can perhaps borrow cwq->work_done for that.


But I wouldn't want to think about an implementation as long as we have
that WORK_STRUCT_NOAUTOREL horror in there.  Can we just nuke that?  Only
three drivers need it and I bet they can be modified to use the usual
mechanisms.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, Linus Torvalds wrote:
> 
> How about something like this?

I didn't get any answers on this. I'd like to get this issue resolved, but 
since I don't even use libphy on my main machine, I need somebody else to 
test it for me.

Just to remind you all, here's the patch again. This is identical to the 
previous version except for the trivial cleanup to use "work_pending()" 
instead of open-coding it in two places.

Linus


diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 4044bb1..e175f39 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -587,8 +587,7 @@ int phy_stop_interrupts(struct phy_device *phydev)
 * Finish any pending work; we might have been scheduled
 * to be called from keventd ourselves, though.
 */
-   if (!current_is_keventd())
-   flush_scheduled_work();
+   run_scheduled_work(&phydev->phy_queue);
 
free_irq(phydev->irq, phydev);
 
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 4a3ea83..a601ed5 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -160,6 +160,7 @@ extern int queue_delayed_work_on(int cpu, struct 
workqueue_struct *wq,
 extern void FASTCALL(flush_workqueue(struct workqueue_struct *wq));
 
 extern int FASTCALL(schedule_work(struct work_struct *work));
+extern int FASTCALL(run_scheduled_work(struct work_struct *work));
 extern int FASTCALL(schedule_delayed_work(struct delayed_work *work, unsigned 
long delay));
 
 extern int schedule_delayed_work_on(int cpu, struct delayed_work *work, 
unsigned long delay);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8d1e7cb..36f9b78 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -103,6 +103,79 @@ static inline void *get_wq_data(struct work_struct *work)
return (void *) (work->management & WORK_STRUCT_WQ_DATA_MASK);
 }
 
+static int __run_work(struct cpu_workqueue_struct *cwq, struct work_struct 
*work)
+{
+   int ret = 0;
+   unsigned long flags;
+
+   spin_lock_irqsave(&cwq->lock, flags);
+   /*
+* We need to re-validate the work info after we've gotten
+* the cpu_workqueue lock. We can run the work now iff:
+*
+*  - the wq_data still matches the cpu_workqueue_struct
+*  - AND the work is still marked pending
+*  - AND the work is still on a list (which will be this
+*workqueue_struct list)
+*
+* All these conditions are important, because we
+* need to protect against the work being run right
+* now on another CPU (all but the last one might be
+* true if it's currently running and has not been
+* released yet, for example).
+*/
+   if (get_wq_data(work) == cwq
+   && work_pending(work)
+   && !list_empty(&work->entry)) {
+   work_func_t f = work->func;
+   list_del_init(&work->entry);
+   spin_unlock_irqrestore(&cwq->lock, flags);
+
+   if (!test_bit(WORK_STRUCT_NOAUTOREL, &work->management))
+   work_release(work);
+   f(work);
+
+   spin_lock_irqsave(&cwq->lock, flags);
+   cwq->remove_sequence++;
+   wake_up(&cwq->work_done);
+   ret = 1;
+   }
+   spin_unlock_irqrestore(&cwq->lock, flags);
+   return ret;
+}
+
+/**
+ * run_scheduled_work - run scheduled work synchronously
+ * @work: work to run
+ *
+ * This checks if the work was pending, and runs it
+ * synchronously if so. It returns a boolean to indicate
+ * whether it had any scheduled work to run or not.
+ *
+ * NOTE! This _only_ works for normal work_structs. You
+ * CANNOT use this for delayed work, because the wq data
+ * for delayed work will not point properly to the per-
+ * CPU workqueue struct, but will change!
+ */
+int fastcall run_scheduled_work(struct work_struct *work)
+{
+   for (;;) {
+   struct cpu_workqueue_struct *cwq;
+
+   if (!work_pending(work))
+   return 0;
+   if (list_empty(&work->entry))
+   return 0;
+   /* NOTE! This depends intimately on __queue_work! */
+   cwq = get_wq_data(work);
+   if (!cwq)
+   return 0;
+   if (__run_work(cwq, work))
+   return 1;
+   }
+}
+EXPORT_SYMBOL(run_scheduled_work);
+
 /* Preempt must be disabled. */
 static void __queue_work(struct cpu_workqueue_struct *cwq,
 struct work_struct *work)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, David Howells wrote:
>
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> >  (a) "volatile" on kernel data is basically always a bug, and you should 
> >  use locking.
> 
> But what about when you're building a lock?  Actually, I suspect correct usage
> of asm constraints and memory barriers trumps volatile anyway even there.

The word you look for is not "suspect".

You _cannot_ build a lock using "volatile", unless your CPU is strictly 
in-order and has an in-order memory subsystem too (so, for example, while 
all ia64 implementations today are in-order, they do /not/ have an 
in-order memory subsystem). Only then could you do locking with volatile 
and some crazy Peterson's algorithm.

I don't think any such CPU actually exists.

Anyway, we've had this discussion before on linux-kernel, it really boils 
down to that "volatile" is basically never correct with the exception of 
flags that don't have any meaning and that you don't actually _care_ about 
the exact value (the low word of "jiffies" being the canonical example of 
something where "volatile" is actually fine, and where - as long as you 
can load it atomically - "volatile" really does make sense).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote:

>  (a) "volatile" on kernel data is basically always a bug, and you should 
>  use locking.

But what about when you're building a lock?  Actually, I suspect correct usage
of asm constraints and memory barriers trumps volatile anyway even there.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, Linus Torvalds wrote:
>
> and remove the "volatile" from all the bitop accessor functions.

It might also be interesting to see if this would change code-size at all. 

There's a number of things that check different bits in the same word 
right now, and they just reload the word unnecessarily and do multiple 
tests. Some of the page flags functions obviously already work around this 
by doing horrible things by hand instead, eg:

(page->flags & (
1 << PG_lru |
1 << PG_private |
1 << PG_locked  |
1 << PG_active  |
1 << PG_reclaim |
1 << PG_slab|
1 << PG_swapcache |
1 << PG_writeback |
1 << PG_reserved |
1 << PG_buddy ))

in the free_pages_check() thing. It may make sense there, but we really 
_should_ allow gcc to just do things like this for us, and just use the 
proper functions to test bits.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, Linus Torvalds wrote:
> 
> Sadly, gcc doesn't do it in this case. Still, I'd rather keep the source 
> clean, and hope that the compiler improves eventually, than to make the 
> code uglier.

Actually, it's our own damn fault.

We've long had our arguments "const volatile" to test_bit(), which 
basically means that gcc can't do the optimization. Damn.

And they need to be "volatile" not because we _want_ the thing to be 
volaile, but because these things are occationally used on volatile 
objects (so if the function doesn't take a volatile pointer, it would 
warn).

That's why so many of these helper functions use "const volatile" 
pointers, which on the face of it would seem strange if you don't realize 
that it's more about a C type issue than about the _actual_ type being 
either "const" or "volatile".

Sad. I guess we could remove the "const volatile" from the _cast_, but the 
thing is, if the underlying object we're actually looking at really _is_ 
volatile, we shouldn't do that. GAAH.

Really sad. I doubt any of the callers actually want the "volatile" access 
at all. Things like  definitely _don't_ want it.

I suspect we should just face up to the fact that 

 (a) "volatile" on kernel data is basically always a bug, and you should 
 use locking. "volatile" doesn't help anything at all with memory 
 ordering and friends, so it's insane to think it "solves" anything on 
 its own.
 (b) on "iomem" pointers it does make sense, but those need special 
 accessor functions _anyway_, so things like test_bit() wouldn't work 
 on them.
 (c) if you spin on a value changing, you should use "cpu_relax()" or 
 "barrier()" anyway, which will force gcc to re-load any values from 
 memory over the loop.

and remove the "volatile" from all the bitop accessor functions.

Or at least from "test_bit()". It's not like it's synchronous _anyway_ 
(there's no memory barriers etc).

Hmm? Comments? linux-arch added to Cc, just in case people care (this 
particular thing is in , so it _is_ architecture- 
specific, but we should probably all agree on it first)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, Jeff Garzik wrote:
>
> Honestly I wonder if some of these situations really want
> "kill_scheduled_work_unless_it_is_already_running_right_now_if_so_wait_for_it"

We could do that, and the code to do it would be fairly close to what the 
"run it" code is. Just replace the

if (!test_bit(WORK_STRUCT_NOAUTOREL, &work->management))
work_release(work);
f(work);

with an unconditional

work_release(work);

instead.

However, I think I'd prefer my patch for now, if only because that 
"work_release()" thing kind of worries me. You're supposed to either do 
the release yourself in the work function _after_ you've done all 
book-keeping, or if the thing doesn't need any book-keeping at all, you 
can do the "autorelease" thing. The "kill without running" breaks that 
conceptual model.

So a "kill_work()" usage should almost always end up being something like

if (kill_work(work))
release anything that running the work function would release

but then for the (probably common) case where there is nothing that the 
work function releases, that would obviously boil down to just a

kill_work(work);

However, my point is that the _workqueue_ logic cannot know what the user 
of the workqueue wants, so the "run_scheduled_work()" approach is much 
saner in this respect.

NOTE NOTE NOTE! In neither case can we do anything at all about a 
workqueue entry that is actually _already_ running on another CPU. My 
suggested patch will simply not run it at all synchronously (and return 
0), and a "kill_work()" thing couldn't do anything either. The required 
synchronization for something like that simply doesn't exist.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, Linus Torvalds wrote:
> 
> Gcc should do it for us, afaik. I didn't check, but gcc is generally 
> pretty good at combining logical operations like this, because it's very 
> common.

Sadly, gcc doesn't do it in this case. Still, I'd rather keep the source 
clean, and hope that the compiler improves eventually, than to make the 
code uglier.

I replaced both of the "test_bit(WORK_STRUCT_PENDING, &work->management)" 
with "work_pending(work)" in my tree.

Now somebody who knows how to trigger this thing should just _test_ it.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, David Howells wrote:
> 
> + if (get_wq_data(work) == cwq
> + && test_bit(WORK_STRUCT_PENDING, &work->management)
> 
> I wonder if those can be combined, perhaps:

Gcc should do it for us, afaik. I didn't check, but gcc is generally 
pretty good at combining logical operations like this, because it's very 
common.

> Otherwise for i386 the compiler can't combine them because test_bit() is done
> with inline asm.

Nope. Look again.

test_bit() with a constant number is done very much in C, and very much on 
purpose. _Exactly_ to allow the compiler to combine these kinds of things.

> And:
> 
> + if (!test_bit(WORK_STRUCT_PENDING, &work->management))
> 
> Should possibly be:
> 
> + if (!work_pending(work))

Yeah, that's a worthy cleanup.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> test_bit() with a constant number is done very much in C, and very much on 
> purpose. _Exactly_ to allow the compiler to combine these kinds of things.

Ah...  I've read that several times, and each time I've assumed it's referring
to *addr not nr as being constant.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Jeff Garzik
Honestly I wonder if some of these situations really want 
"kill_scheduled_work_unless_it_is_already_running_right_now_if_so_wait_for_it"


Since its during shutdown, usually the task just wants to know that the 
code in the workqueue won't be touching the hardware or data structures 
after  point.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> How about something like this?

At first glance, this looks reasonable.

It also looks like it should be used to replace a lot of the
cancel_delayed_work() calls that attempt to cancel _undelayed_ work items.
That would allow a number of work items to be downgraded from delayed_work to
work_struct.

Also, the name "run_scheduled_work" sort of suggests that the work *will* be
run regardless of whether it was pending or not.  Given the confusion over
cancel_delayed_work(), I imagine this will rain confusion too.

+   if (get_wq_data(work) == cwq
+   && test_bit(WORK_STRUCT_PENDING, &work->management)

I wonder if those can be combined, perhaps:

+   if ((work->management & ~WORK_STRUCT_NOAUTOREL) ==
+   ((unsigned long) cwq | (1 << WORK_STRUCT_PENDING))

Otherwise for i386 the compiler can't combine them because test_bit() is done
with inline asm.

And:

+   if (!test_bit(WORK_STRUCT_PENDING, &work->management))

Should possibly be:

+   if (!work_pending(work))

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Linus Torvalds


On Wed, 6 Dec 2006, Andrew Morton wrote:
> 
> I think so too.  But it would be imprudent to hang around waiting for me
> to write it :(

How about something like this?

This
 (a) depends on the just-merged "struct work" cleanup
 (b) is totally untested
 (c) probably kills you slowly and painfully
 (d) may breed frikken sharks with lasers on their frikken heads!
 (e) does compile, but I don't guarantee anything else.
 (f) may, in other words, be totally broken

And, btw: it may not work. Just in case that wasn't clear. This is a quick 
hack from me just sitting down and seeing if I can still do kernel 
programming, or whether I'm just relegated to merge other peoples code.

Linus

PS. It might be broken.

PPS. David Howells added to participant list, hopefully he can 
double-check all my assumptions, since he's touched the workqueue code 
last. Tag, you're it!


diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 4044bb1..e175f39 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -587,8 +587,7 @@ int phy_stop_interrupts(struct phy_device *phydev)
 * Finish any pending work; we might have been scheduled
 * to be called from keventd ourselves, though.
 */
-   if (!current_is_keventd())
-   flush_scheduled_work();
+   run_scheduled_work(&phydev->phy_queue);
 
free_irq(phydev->irq, phydev);
 
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 4a3ea83..a601ed5 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -160,6 +160,7 @@ extern int queue_delayed_work_on(int cpu, struct 
workqueue_struct *wq,
 extern void FASTCALL(flush_workqueue(struct workqueue_struct *wq));
 
 extern int FASTCALL(schedule_work(struct work_struct *work));
+extern int FASTCALL(run_scheduled_work(struct work_struct *work));
 extern int FASTCALL(schedule_delayed_work(struct delayed_work *work, unsigned 
long delay));
 
 extern int schedule_delayed_work_on(int cpu, struct delayed_work *work, 
unsigned long delay);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8d1e7cb..fcacf06 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -103,6 +103,79 @@ static inline void *get_wq_data(struct work_struct *work)
return (void *) (work->management & WORK_STRUCT_WQ_DATA_MASK);
 }
 
+static int __run_work(struct cpu_workqueue_struct *cwq, struct work_struct 
*work)
+{
+   int ret = 0;
+   unsigned long flags;
+
+   spin_lock_irqsave(&cwq->lock, flags);
+   /*
+* We need to re-validate the work info after we've gotten
+* the cpu_workqueue lock. We can run the work now iff:
+*
+*  - the wq_data still matches the cpu_workqueue_struct
+*  - AND the work is still marked pending
+*  - AND the work is still on a list (which will be this
+*workqueue_struct list)
+*
+* All these conditions are important, because we
+* need to protect against the work being run right
+* now on another CPU (all but the last one might be
+* true if it's currently running and has not been
+* released yet, for example).
+*/
+   if (get_wq_data(work) == cwq
+   && test_bit(WORK_STRUCT_PENDING, &work->management)
+   && !list_empty(&work->entry)) {
+   work_func_t f = work->func;
+   list_del_init(&work->entry);
+   spin_unlock_irqrestore(&cwq->lock, flags);
+
+   if (!test_bit(WORK_STRUCT_NOAUTOREL, &work->management))
+   work_release(work);
+   f(work);
+
+   spin_lock_irqsave(&cwq->lock, flags);
+   cwq->remove_sequence++;
+   wake_up(&cwq->work_done);
+   ret = 1;
+   }
+   spin_unlock_irqrestore(&cwq->lock, flags);
+   return ret;
+}
+
+/**
+ * run_scheduled_work - run scheduled work synchronously
+ * @work: work to run
+ *
+ * This checks if the work was pending, and runs it
+ * synchronously if so. It returns a boolean to indicate
+ * whether it had any scheduled work to run or not.
+ *
+ * NOTE! This _only_ works for normal work_structs. You
+ * CANNOT use this for delayed work, because the wq data
+ * for delayed work will not point properly to the per-
+ * CPU workqueue struct, but will change!
+ */
+int fastcall run_scheduled_work(struct work_struct *work)
+{
+   for (;;) {
+   struct cpu_workqueue_struct *cwq;
+
+   if (!test_bit(WORK_STRUCT_PENDING, &work->management))
+   return 0;
+   if (list_empty(&work->entry))
+   return 0;
+   /* NOTE! This depends intimately on __queue_work! */
+   cwq = get_wq_data(work);
+   if (!cwq)
+   return 0;
+   if (__run_work(cwq, work))
+   return 1;
+   }
+}
+EXPORT_SYMBOL(run_scheduled_work);
+
 /* Preempt must

Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Andrew Morton
On Wed, 6 Dec 2006 15:25:22 + (GMT)
"Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote:

> > Maybe the lesson here is that flush_scheduled_work() is a bad function.
> > It should really be flush_this_work(struct work_struct *w).  That is in
> > fact what approximately 100% of the flush_scheduled_work() callers actually
> > want to do.
> 
>  There may be cases where flush_scheduled_work() is indeed needed, but 
> likely outside drivers, and I agree such a specific call would be useful 
> and work here.

I think so too.  But it would be imprudent to hang around waiting for me
to write it :(
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Maciej W. Rozycki
On Tue, 5 Dec 2006, Andrew Morton wrote:

> But running flush_scheduled_work() from within dev_close() is a very
> sensible thing to do, and dev_close is called under rtnl_lock().
> davem is -> thattaway ;)

 And when within dev_close() there is quite a chance there is 
linkwatch_event() somewhere in the event queue already. ;-)

> Ah.  The point is that the phy code doesn't want to flush _all_ pending
> callbacks.  It only wants to flush its own one.  And its own one doesn't
> take rtnl_lock().
> 
> IOW, the phy code has no interest in running some random other subsystem's
> callback - it just wants to run its own.  Hence no deadlock.

 Both are true.  It's linkwatch_event() that's somewhere in the queue 
already that makes the trouble here.

> Maybe the lesson here is that flush_scheduled_work() is a bad function.
> It should really be flush_this_work(struct work_struct *w).  That is in
> fact what approximately 100% of the flush_scheduled_work() callers actually
> want to do.

 There may be cases where flush_scheduled_work() is indeed needed, but 
likely outside drivers, and I agree such a specific call would be useful 
and work here.

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-06 Thread Maciej W. Rozycki
On Tue, 5 Dec 2006, Andy Fleming wrote:

> We need to make sure there are no more pending phy_change() invocations in the
> work queue, this is true, however I'm not convinced that this avoids the
> problem.  And now that I come back to this email after Linus's response, let
> me add that I agree with his suggestion.  I still don't think it solves the
> original problem, though.  Unless I'm missing something, Maciej's suggested
> fix (having the driver invoke phy_stop_interrupts() from a work queue) doesn't
> stop the race:
> 
> * Schedule stop_interrupts and freeing of memory.
> * interrupt occurs, and schedules phy_change
> * work_queue triggers, and stop_interrupts is invoked.  It doesn't call
> flush_scheduled_work, because it's being called from keventd.
> * The invoker of stop_interrupts (presumably some function in the driver)
> frees up memory, including the phy_device.
> * phy_change is invoked() from the work queue, and starts accessing freed
> memory

 This is not going to happen with my other changes to the file applied.  
The reason is at the time phy_stop_interrupts() is called phy_stop() has 
already run and switched the state of the PHY being handled to PHY_HALTED.  
As a result any subsequent calls to phy_interrupt() that might have 
happened after phy_stop() have not scheduled calls to phy_change() for 
this PHY as will not any that may happen up until free_irq() have 
unregistered the interrupt for the PHY.

> I suggested this, mostly so that drivers wouldn't have to be aware of this.
> But I'm not quite sure what happens when you unload a module.  Does some stuff
> stay behind if needed?  If you unload the ethernet driver, that will usually
> remove the bus controller for the PHY, which would prevent any scheduled code
> from accessing the PHY.

 Hmm, I am unsure if there is anything that would ensure flushing of the 
queue after its stop() call has finished and before a driver is removed 
(its module_exit() call is invoked), probably nothing, and that is why I 
put explicit flush_scheduled_work() in sb1250-mac.c:sbmac_remove() and the 
driver's open() call checks whether a possible previous instance of the 
structure used by phy_change() have not been freed yet.  There may be a 
cleaner way of doing it, but I will have to think about it.

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Roland Dreier
 > Ah.  The point is that the phy code doesn't want to flush _all_ pending
 > callbacks.  It only wants to flush its own one.  And its own one doesn't
 > take rtnl_lock().

OK, got it.  You're absolutely correct.

 > Maybe the lesson here is that flush_scheduled_work() is a bad function.
 > It should really be flush_this_work(struct work_struct *w).  That is in
 > fact what approximately 100% of the flush_scheduled_work() callers actually
 > want to do.

I think flush_this_work() runs into trouble if it means "make sure
everything up to  has completed" because it still syncs
with everything before , which has the same risk of
deadlock.  And I'm not totally sure everyone who does
flush_scheduled_work() really means "cancel my work" -- they might mean
"finish up my work".

For example I would have to do some archeology to remember exactly
what I needed flush_scheduled_work() when I wrote drivers/infiniband/ulp/ipoib

 - R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Roland Dreier
 > But running flush_scheduled_work() from within dev_close() is a very
 > sensible thing to do, and dev_close is called under rtnl_lock().

I can't argue with that -- this has actually bitten me in the past.

Hmm, I'll try to understand why we need rtnl_lock() to cover dev_close...

 - R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Andrew Morton
On Tue, 05 Dec 2006 13:37:37 -0800
Roland Dreier <[EMAIL PROTECTED]> wrote:

>  > a) Ban the calling of flush_scheduled_work() from under rtnl_lock(). 
>  >Sounds hard.
> 
> Unfortunate if this is happening a lot.  It seems like the most
> sensible fix -- flush_scheduled_work() is in effect calling into
> an unknown and changeable in the future set of functions (since it
> waits for them to finish), and it seems error-prone to hold a lock
> across such a call.

yes, I agree.  It's really bad to be calling flush_scheduled_work() with
any locks held at all.  Fragile, hard-to-maintain, source of
once-in-a-blue-moon failures, etc.  I guess lockdep will help.

But running flush_scheduled_work() from within dev_close() is a very
sensible thing to do, and dev_close is called under rtnl_lock().
davem is -> thattaway ;)


>  >This will almost work, as long as it's done in workqueue.c with
>  >appropriate locking.  The bug occurs when some other CPU is running
>  >phy_change() right now - we'll end up freeing data which that CPU is
>  >presently playing with.
>  > 
>  >But perhaps we can take care of this within workqueue.c.  We need a
>  >cancel function which will cancel the work and, if its callback is
>  >presently executing it will block until that execution has completed.
> 
> I may be misunderstanding you, but this seems to deadlock in exactly
> the same way: if someone calls this cancel routine holding rtnl_lock,
> and the work function that will also take rtnl_lock has just started,
> it will get stuck when the work function tries to take rtnl_lock.

Ah.  The point is that the phy code doesn't want to flush _all_ pending
callbacks.  It only wants to flush its own one.  And its own one doesn't
take rtnl_lock().

IOW, the phy code has no interest in running some random other subsystem's
callback - it just wants to run its own.  Hence no deadlock.

Maybe the lesson here is that flush_scheduled_work() is a bad function.
It should really be flush_this_work(struct work_struct *w).  That is in
fact what approximately 100% of the flush_scheduled_work() callers actually
want to do.

hmm.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Roland Dreier
 > a) Ban the calling of flush_scheduled_work() from under rtnl_lock(). 
 >Sounds hard.

Unfortunate if this is happening a lot.  It seems like the most
sensible fix -- flush_scheduled_work() is in effect calling into
an unknown and changeable in the future set of functions (since it
waits for them to finish), and it seems error-prone to hold a lock
across such a call.

 >This will almost work, as long as it's done in workqueue.c with
 >appropriate locking.  The bug occurs when some other CPU is running
 >phy_change() right now - we'll end up freeing data which that CPU is
 >presently playing with.
 > 
 >But perhaps we can take care of this within workqueue.c.  We need a
 >cancel function which will cancel the work and, if its callback is
 >presently executing it will block until that execution has completed.

I may be misunderstanding you, but this seems to deadlock in exactly
the same way: if someone calls this cancel routine holding rtnl_lock,
and the work function that will also take rtnl_lock has just started,
it will get stuck when the work function tries to take rtnl_lock.

 - R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Andrew Morton
On Tue, 5 Dec 2006 14:59:31 -0600
Andy Fleming <[EMAIL PROTECTED]> wrote:

> Ok, I think this is the summary:
> 
> - phy_change() is the work queue callback function (scheduled when a  
> PHY interrupt occurs)
> 
> - dev_close() invokes the controller's stop/close/whatever function,  
> and it calls phy_disconnect()
> 
> - phy_disconnect() calls phy_stop_interrupts().  To prevent any  
> pending phy_change() calls from getting confused, phy_stop_interrupts 
> () needs to flush the queue.  Otherwise, subsequent memory freeings  
> will leave phy_change() hanging.
> 
> - If phy_stop_interrupts() calls flush_scheduled_work(), keventd will  
> execute its queues while rtnl_lock is held, providing opportunity for  
> other callbacks to deadlock.
> 
> - innocent puppies are slaughtered, and the world mourns.
> 

ah, OK.  So it's some other queued-up callback which takes rtnl_lock.

But I still don't see what's special about keventd.  If, say, /sbin/ip is
running flush_scheduled_work() under rtnl_lock then it too should deadlock
in this scenario.

> 
> Maciej's solution is to schedule phy_disconnect() to be called from a  
> work queue.  That solution should work, but it sounds like it doesn't  
> require the check for if keventd is running.
> 
> Of course, my objection to it is that it now requires the ethernet  
> controller to be excessively aware of the details of how the PHY Lib  
> is handling the PHY interrupts (by scheduling them on a work queue).

So what's a good fix?

a) Ban the calling of flush_scheduled_work() from under rtnl_lock(). 
   Sounds hard.

b) Ban the queueing of callback functions which take rtnl_lock().  This
   sounds like a plain bad idea - callbacks are low-level things which
   ought to be able to take locks.

c) Cancel the phy_change() callback within phy_stop_interrupts() so we
   don't need to run flush_scheduled_work() at all.

   This will almost work, as long as it's done in workqueue.c with
   appropriate locking.  The bug occurs when some other CPU is running
   phy_change() right now - we'll end up freeing data which that CPU is
   presently playing with.

   But perhaps we can take care of this within workqueue.c.  We need a
   cancel function which will cancel the work and, if its callback is
   presently executing it will block until that execution has completed.

   If we require of the calling subsystem a) that the work will not get
   rescheduled (that means phy_interrupt()) and b) that the callback does
   not rearm the work then things get simpler.

   But still not very simple.  It gets ugly with per-CPU qorkqueues.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Andy Fleming


On Dec 5, 2006, at 14:39, Andrew Morton wrote:


On Tue, 5 Dec 2006 17:48:05 + (GMT)
"Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote:


 Essentially there is a race when disconnecting from a PHY, because
interrupt delivery uses the event queue for processing.  The  
function to
handle interrupts that is called from the event queue is phy_change 
().
It takes a pointer to a structure that is associated with the  
PHY.  At the
time phy_stop_interrupts() is called there may be one or more  
calls to
phy_change() still pending on the event queue.  They may not be  
able to be
processed until the structure passed to phy_change() have been  
freed, at

which point calling the function is wrong.

 One way of avoiding it is calling flush_scheduled_work() from
phy_stop_interrupts().  This is fine as long as a caller of
phy_stop_interrupts() (not necessarily the immediate one calling into
libphy) does not hold the netlink lock.


So let me try to rephrase...

- phy_change() is the workqueue callback function.  It is executed by
  keventd.

- Something under phy_change() takes rtnl_lock() (but what??)



I don't think it's phy_change().  It's something else that may be  
scheduled.





- phy_stop_interrupts() does flush_scheduled_work().  This has to
  following logic:

  - if I am kevetnd, run phy_change() directly.

  - If I am not keventd, wait for keventd() to run phy_change()

- So if the caller of phy_stop_interrupt() already holds rtnl_lock(),
  and if that caller is keventd then it will recur onto rntl_lock()  
and

  will deadlock.

Problem is, if the caller of phy_stop_interrupt() is *not* keventd,  
that
caller will still deadlock, because that caller is waiting for  
keventd to

run phy_change(), and keventd cannot do that, because the not-keventd
process already holds rtnl_lock.


Now, afaict, there are only two callers of phy_stop_interrupts(): the
close() handlers of gianfar.c and fs_enet-main.c (confusingly held in
netdevice.stop (confusingly called by dev_close())).  Via  
phy_disconnect.

Did I miss anything?



Right now, that's probably about right.




And the dev_close() caller holds rtnl_lock.




Ok, I think this is the summary:

- phy_change() is the work queue callback function (scheduled when a  
PHY interrupt occurs)


- dev_close() invokes the controller's stop/close/whatever function,  
and it calls phy_disconnect()


- phy_disconnect() calls phy_stop_interrupts().  To prevent any  
pending phy_change() calls from getting confused, phy_stop_interrupts 
() needs to flush the queue.  Otherwise, subsequent memory freeings  
will leave phy_change() hanging.


- If phy_stop_interrupts() calls flush_scheduled_work(), keventd will  
execute its queues while rtnl_lock is held, providing opportunity for  
other callbacks to deadlock.


- innocent puppies are slaughtered, and the world mourns.


Maciej's solution is to schedule phy_disconnect() to be called from a  
work queue.  That solution should work, but it sounds like it doesn't  
require the check for if keventd is running.


Of course, my objection to it is that it now requires the ethernet  
controller to be excessively aware of the details of how the PHY Lib  
is handling the PHY interrupts (by scheduling them on a work queue).


Andy


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Andrew Morton
On Tue, 5 Dec 2006 17:48:05 + (GMT)
"Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote:

>  Essentially there is a race when disconnecting from a PHY, because 
> interrupt delivery uses the event queue for processing.  The function to 
> handle interrupts that is called from the event queue is phy_change().  
> It takes a pointer to a structure that is associated with the PHY.  At the 
> time phy_stop_interrupts() is called there may be one or more calls to 
> phy_change() still pending on the event queue.  They may not be able to be 
> processed until the structure passed to phy_change() have been freed, at 
> which point calling the function is wrong.
> 
>  One way of avoiding it is calling flush_scheduled_work() from 
> phy_stop_interrupts().  This is fine as long as a caller of 
> phy_stop_interrupts() (not necessarily the immediate one calling into 
> libphy) does not hold the netlink lock.

So let me try to rephrase...

- phy_change() is the workqueue callback function.  It is executed by
  keventd.

- Something under phy_change() takes rtnl_lock() (but what??)

- phy_stop_interrupts() does flush_scheduled_work().  This has to
  following logic:

  - if I am kevetnd, run phy_change() directly.

  - If I am not keventd, wait for keventd() to run phy_change()

- So if the caller of phy_stop_interrupt() already holds rtnl_lock(),
  and if that caller is keventd then it will recur onto rntl_lock() and
  will deadlock.

Problem is, if the caller of phy_stop_interrupt() is *not* keventd, that
caller will still deadlock, because that caller is waiting for keventd to
run phy_change(), and keventd cannot do that, because the not-keventd
process already holds rtnl_lock.


Now, afaict, there are only two callers of phy_stop_interrupts(): the
close() handlers of gianfar.c and fs_enet-main.c (confusingly held in
netdevice.stop (confusingly called by dev_close())).  Via phy_disconnect. 
Did I miss anything?

And the dev_close() caller holds rtnl_lock.


Summary:

a) Please tell us what code under phy_change() wants to take rtnl_lock

b) I think it should deadlock whether or not the caller of
   phy_stop_interrupt() is keventd.  What am I missing?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Andrew Morton
On Tue, 5 Dec 2006 10:07:21 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Tue, 5 Dec 2006, Maciej W. Rozycki wrote:
> >
> >  One way of avoiding it is calling flush_scheduled_work() from 
> > phy_stop_interrupts().  This is fine as long as a caller of 
> > phy_stop_interrupts() (not necessarily the immediate one calling into 
> > libphy) does not hold the netlink lock.
> > 
> >  If a caller indeed holds the netlink lock, then a driver effectively 
> > calling phy_stop_interrupts() may arrange for the function to be itself 
> > scheduled through the event queue.  This has the effect of avoiding the 
> > race as well, as the queue is processed in order, except it causes more 
> > hassle for the driver.
> 
> I would personally be ok with "flush_scheduled_work()" _itself_ noticing 
> that it is actually waiting to flush "itself", and just being a no-op in 
> that case.

It does do that:

static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
if (cwq->thread == current) {
/*
 * Probably keventd trying to flush its own queue. So simply run
 * it by hand rather than deadlocking.
 */
run_workqueue(cwq);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Andy Fleming


On Dec 5, 2006, at 11:48, Maciej W. Rozycki wrote:



 Essentially there is a race when disconnecting from a PHY, because
interrupt delivery uses the event queue for processing.  The  
function to

handle interrupts that is called from the event queue is phy_change().
It takes a pointer to a structure that is associated with the PHY.   
At the

time phy_stop_interrupts() is called there may be one or more calls to
phy_change() still pending on the event queue.  They may not be  
able to be
processed until the structure passed to phy_change() have been  
freed, at

which point calling the function is wrong.

 One way of avoiding it is calling flush_scheduled_work() from
phy_stop_interrupts().  This is fine as long as a caller of
phy_stop_interrupts() (not necessarily the immediate one calling into
libphy) does not hold the netlink lock.

 If a caller indeed holds the netlink lock, then a driver effectively
calling phy_stop_interrupts() may arrange for the function to be  
itself
scheduled through the event queue.  This has the effect of avoiding  
the
race as well, as the queue is processed in order, except it causes  
more
hassle for the driver.  Hence the choice was left to the driver's  
author
-- if a driver "knows" the netlink lock is not going to be held at  
that 
point, it may call phy_stop_interrupts() directly, otherwise it  
shall use

the event queue.



We need to make sure there are no more pending phy_change()  
invocations in the work queue, this is true, however I'm not  
convinced that this avoids the problem.  And now that I come back to  
this email after Linus's response, let me add that I agree with his  
suggestion.  I still don't think it solves the original problem,  
though.  Unless I'm missing something, Maciej's suggested fix (having  
the driver invoke phy_stop_interrupts() from a work queue) doesn't  
stop the race:


* Schedule stop_interrupts and freeing of memory.
* interrupt occurs, and schedules phy_change
* work_queue triggers, and stop_interrupts is invoked.  It doesn't  
call flush_scheduled_work, because it's being called from keventd.
* The invoker of stop_interrupts (presumably some function in the  
driver) frees up memory, including the phy_device.
* phy_change is invoked() from the work queue, and starts accessing  
freed memory


Am I misunderstanding how work queues operate?

If I'm right, an ugly solution would have to disable the PHY  
interrupts before scheduling the cleanup.  But is there really no way  
to tell the kernel to squash all pending work that came from *this*  
work_queue?






 With such an assumption in place the function has to check somehow
whether it has been scheduled through the queue or not and act
accordingly, which is why that "if" clause is there.

 Now I gather the conclusion was the whole mess was going to be  
included

within libphy and not exposed to Ethernet MAC drivers.  This way the
library would schedule both phy_stop_interrupts() and  
mdiobus_unregister()
(which is actually the function freeing the PHY device structure)  
through

the event queue as needed without a MAC driver having to know.



I suggested this, mostly so that drivers wouldn't have to be aware of  
this.  But I'm not quite sure what happens when you unload a module.   
Does some stuff stay behind if needed?  If you unload the ethernet  
driver, that will usually remove the bus controller for the PHY,  
which would prevent any scheduled code from accessing the PHY.



Andy


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Maciej W. Rozycki
On Mon, 4 Dec 2006, Steve Fox wrote:

> Unfortunately Maciej didn't get cc'ed correctly on your mail. Hopefully
> I've added the right person to this post as well as Andy who has also
> recently changed this code.

 Thanks -- my parser of the LKML does not always trigger.

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Linus Torvalds


On Tue, 5 Dec 2006, Maciej W. Rozycki wrote:
>
>  One way of avoiding it is calling flush_scheduled_work() from 
> phy_stop_interrupts().  This is fine as long as a caller of 
> phy_stop_interrupts() (not necessarily the immediate one calling into 
> libphy) does not hold the netlink lock.
> 
>  If a caller indeed holds the netlink lock, then a driver effectively 
> calling phy_stop_interrupts() may arrange for the function to be itself 
> scheduled through the event queue.  This has the effect of avoiding the 
> race as well, as the queue is processed in order, except it causes more 
> hassle for the driver.

I would personally be ok with "flush_scheduled_work()" _itself_ noticing 
that it is actually waiting to flush "itself", and just being a no-op in 
that case.

However, having outside code do that detection for it seems to be ugly as 
hell. And something like the network drievr layer shouldn't know about 
keventd details like this.

So if you can move that deadlock avoidance logic into 
"flush_scheduled_work()" itself, we'd avoid the stupid export, and we'd 
also avoid the driver layer having to care about these things. It would 
still be _ugly_, but I think it would be less so.

Hmm?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-05 Thread Maciej W. Rozycki
On Sun, 3 Dec 2006, Andrew Morton wrote:

> wtf?  That code was merged?  This bug has been known for months and after
> several unsuccessful attempts at trying to understand what on earth that
> hackery is supposed to be doing I gave up on it and asked Jeff to look after
> it.

 I am surprised it got merged too -- my understanding from the discussion 
was it was to be sorted out somehow within libphy, one way or another.

> Maciej, please, in very small words written in a very large font, explain to
> us what is going on in phy_stop_interrupts()?  Include pictures.

 Would ASCII art qualify as pictures?  Regardless, I am providing a 
textual description, which please feel free to skip to the conclusion 
below if uninterested in the details.

 Essentially there is a race when disconnecting from a PHY, because 
interrupt delivery uses the event queue for processing.  The function to 
handle interrupts that is called from the event queue is phy_change().  
It takes a pointer to a structure that is associated with the PHY.  At the 
time phy_stop_interrupts() is called there may be one or more calls to 
phy_change() still pending on the event queue.  They may not be able to be 
processed until the structure passed to phy_change() have been freed, at 
which point calling the function is wrong.

 One way of avoiding it is calling flush_scheduled_work() from 
phy_stop_interrupts().  This is fine as long as a caller of 
phy_stop_interrupts() (not necessarily the immediate one calling into 
libphy) does not hold the netlink lock.

 If a caller indeed holds the netlink lock, then a driver effectively 
calling phy_stop_interrupts() may arrange for the function to be itself 
scheduled through the event queue.  This has the effect of avoiding the 
race as well, as the queue is processed in order, except it causes more 
hassle for the driver.  Hence the choice was left to the driver's author 
-- if a driver "knows" the netlink lock is not going to be held at that 
point, it may call phy_stop_interrupts() directly, otherwise it shall use 
the event queue.

 With such an assumption in place the function has to check somehow 
whether it has been scheduled through the queue or not and act 
accordingly, which is why that "if" clause is there.

 Now I gather the conclusion was the whole mess was going to be included 
within libphy and not exposed to Ethernet MAC drivers.  This way the 
library would schedule both phy_stop_interrupts() and mdiobus_unregister() 
(which is actually the function freeing the PHY device structure) through 
the event queue as needed without a MAC driver having to know.

 And the whole question that remains is whether it is Andy (cc-ed) or me 
who is more competent to implement this change. ;-)

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-04 Thread Steve Fox
On Sun, 2006-12-03 at 01:16 -0800, Andrew Morton wrote:
> On Sun, 03 Dec 2006 00:50:55 -0500
> Ben Collins <[EMAIL PROTECTED]> wrote:
> 
> > Fixes this problem when libphy is compiled as module:
> >
> > WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined!
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 17c2f03..1cf226b 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -608,6 +608,7 @@ int current_is_keventd(void)
> > return ret;
> >
> >  }
> > +EXPORT_SYMBOL_GPL(current_is_keventd);
> >
> >  #ifdef CONFIG_HOTPLUG_CPU
> >  /* Take the work from this (downed) CPU. */
> 
> wtf?  That code was merged?  This bug has been known for months and after
> several unsuccessful attempts at trying to understand what on earth that
> hackery is supposed to be doing I gave up on it and asked Jeff to look after
> it.
> 
> Maciej, please, in very small words written in a very large font, explain to
> us what is going on in phy_stop_interrupts()?  Include pictures.

Unfortunately Maciej didn't get cc'ed correctly on your mail. Hopefully
I've added the right person to this post as well as Andy who has also
recently changed this code.

We're also hitting this on one of the test.kernel.org machines, bl6-13.

-- 

Steve Fox
IBM Linux Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export current_is_keventd() for libphy

2006-12-03 Thread Andrew Morton
On Sun, 03 Dec 2006 00:50:55 -0500
Ben Collins <[EMAIL PROTECTED]> wrote:

> Fixes this problem when libphy is compiled as module:
> 
> WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined!
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 17c2f03..1cf226b 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -608,6 +608,7 @@ int current_is_keventd(void)
>   return ret;
>  
>  }
> +EXPORT_SYMBOL_GPL(current_is_keventd);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  /* Take the work from this (downed) CPU. */

wtf?  That code was merged?  This bug has been known for months and after
several unsuccessful attempts at trying to understand what on earth that
hackery is supposed to be doing I gave up on it and asked Jeff to look after
it.

Maciej, please, in very small words written in a very large font, explain to
us what is going on in phy_stop_interrupts()?  Include pictures.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Export current_is_keventd() for libphy

2006-12-02 Thread Ben Collins
Fixes this problem when libphy is compiled as module:

WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined!

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 17c2f03..1cf226b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -608,6 +608,7 @@ int current_is_keventd(void)
return ret;
 
 }
+EXPORT_SYMBOL_GPL(current_is_keventd);
 
 #ifdef CONFIG_HOTPLUG_CPU
 /* Take the work from this (downed) CPU. */

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/