date:20070413

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Nick Piggin


William Lee Irwin III wrote:
> Andrew Morton wrote:
>>> Do a full pagetable walk, with all the associated locking from within
>>> a systemtap script?  I'd be surprised.  Maybe if it's mostly hand-coded
>>> in C, perhaps.  Then you just end up with the same thing, don't you?
>
> On Fri, Apr 13, 2007 at 01:40:08PM +1000, Nick Piggin wrote:
>> And my problem isn't with the hardcoded pagetable walker. Yeah, we'd
>> probably still keep the pagetable callback walker thingy with Matt's
>> associated cleanups (and my subsequent ones to clean it up more and
>> move it to mm/): there are other in-kernel users for that anyway.
>> The point is the proc API, and exposing random little parts of deep
>> kernel internals that some people happen to find useful at the time.
>> (which is why we have an incredible proliferation of these things).
>> With systemtap scripts, you could walk pagetables and print *the exact
>> page information you want*, or you could walk pfns, or LRU, or page_tree,
>> or walk the page tree then the rmap structures. And you can selectively
>> cull out items you don't care about if you only care about a subset of
>> items, based on arbitrary criteria. And you can most likely do all that
>> more efficiently than with a conglomeration of various /proc files
>> (assuming they even provide what you want in the first place).
>
> The EM guys are unwilling or unable for support-oriented reasons to
> deal with anything but unmodified kernels as shipped by distros.

And I think major distros ship with kprobes enabled, so that is yet
another reason why systemtap should be considered before adding these
proc interfaces.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Nick Piggin


William Lee Irwin III wrote:
> Andrew Morton wrote:
>>> Then you just end up with the same thing, don't you?
>
> On Fri, Apr 13, 2007 at 12:50:20PM +1000, Nick Piggin wrote:
>> Well _you_ do, because that happens to be exactly what you want. Bill
>> ends up with something that displays page_mapcount instead. And I
>> end up with something that traverses LRU lists rather than pfns. And
>> none of it goes in /proc/ or linux-2.6/.
>> So it isn't really the same thing at all.
>
> The EM guys aren't dealing with the database; they're dealing with some
> enterprise management thingie that does things like control how many
> client connections are allowed for each database instance. Unless
> they're doing less than I expect, and are largely something like procps
> on steroids and enterprise silliness.

Ah, OK. Anyway, with kprobes/systemtap they can do whatever they like
and none of us need to care in the slightest ;)

--
SUSE Labs, Novell Inc.

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread William Lee Irwin III

William Lee Irwin III wrote:
>> The EM guys are unwilling or unable for support-oriented reasons to
>> deal with anything but unmodified kernels as shipped by distros.

On Fri, Apr 13, 2007 at 05:03:43PM +1000, Nick Piggin wrote:
> And I think major distros ship with kprobes enabled, so that is yet
> another reason why systemtap should be considered before adding these
> proc interfaces.

I'll have to check in and see if that will work for them. A lot of this
is about customer/distro/support interaction constraints on how it works
as opposed to purely technical affairs.


-- wli

Re: [patch 05/10] add "permit user mounts in new namespace" clone flag

2007-04-13 Thread Miklos Szeredi

> question: how is mounting filesystems (loopback,
> fuse, etc) secured in such way that the user
> cannot 'create' device nodes with 'unfortunate'
> permissions?

All unprivileged mounts have "nosuid,nodev" added to their options.
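
The rule stated above can be sketched in userspace C. This is only an illustration, not the code from the patch series: the flag values and the helper name effective_mount_flags() are hypothetical stand-ins, and the real kernel flag constants are deliberately not used here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only (not the kernel's values). */
#define SK_MNT_NOSUID 0x01u
#define SK_MNT_NODEV  0x02u
#define SK_MNT_RDONLY 0x04u

/*
 * Return the flags a mount would actually get: an unprivileged caller
 * always has nosuid and nodev forced on, whatever was requested.
 */
static unsigned effective_mount_flags(unsigned requested, bool privileged)
{
    if (!privileged)
        requested |= SK_MNT_NOSUID | SK_MNT_NODEV;
    return requested;
}
```

With nodev forced on, device nodes that appear inside a user-mounted filesystem cannot be opened as devices, which is the concern raised in the question.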

Miklos

Re: [patch 05/10] add "permit user mounts in new namespace" clone flag

2007-04-13 Thread Miklos Szeredi

> Given the existence of shared subtrees, allowing/denying this at the mount
> namespace level is silly and wrong.
> 
> If we need more than just the filesystem permission checks, can we
> make it a mount flag, settable with mount and remount, that allows
> non-privileged users to create mount points under it
> in directories they have full read/write access to?

OK, that makes sense.

> I don't like the use of clone flags for this purpose but in this
> case the shared subtrees are a much more fundamental reason for not
> doing this at the namespace level.

I'll drop the clone flag, and add a mount flag instead.

Thanks,
Miklos

[PATCH] wait_for_helper: remove unneeded do_sigaction()

2007-04-13 Thread Oleg Nesterov

allow_signal(SIGCHLD) does all the necessary work, so there is no need to
call do_sigaction() before it.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5/kernel/kmod.c~WH 2007-04-05 12:18:28.0 +0400
+++ 2.6.21-rc5/kernel/kmod.c    2007-04-13 11:14:00.0 +0400
@@ -181,14 +181,9 @@ static int wait_for_helper(void *data)
 {
struct subprocess_info *sub_info = data;
pid_t pid;
-   struct k_sigaction sa;
 
/* Install a handler: if SIGCLD isn't handled sys_wait4 won't
 * populate the status, but will return -ECHILD. */
-   sa.sa.sa_handler = SIG_IGN;
-   sa.sa.sa_flags = 0;
-   siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
-   do_sigaction(SIGCHLD, &sa, NULL);
allow_signal(SIGCHLD);
 
pid = kernel_thread(call_usermodehelper, sub_info, SIGCHLD);


[PATCH] worker_thread: don't play with SIGCHLD

2007-04-13 Thread Oleg Nesterov

depends on Eric's

kthread-dont-depend-on-work-queues-take-2.patch

worker_thread() inherits ignored SIGCHLD from its parent, kthreadd.
We can remove unneeded do_sigaction().
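
The autoreap behaviour this relies on can be observed from userspace. The following is a minimal sketch, assuming a Linux/POSIX system; wait_errno_with_sigchld() is a hypothetical helper name, and it only mirrors the semantics, it is not kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once, wait for it with SIGCHLD set to the
 * given disposition, and return the errno from waitpid() (0 on success).
 */
static int wait_errno_with_sigchld(void (*disposition)(int))
{
    int status = 0;
    int err = 0;
    pid_t pid;

    signal(SIGCHLD, disposition);
    pid = fork();
    assert(pid >= 0);
    if (pid == 0)
        _exit(7);               /* child: terminate immediately */

    /* With SIG_IGN the child is reaped automatically, so there is
     * nothing left to collect and waitpid() fails with ECHILD. */
    if (waitpid(pid, &status, 0) < 0)
        err = errno;

    signal(SIGCHLD, SIG_DFL);   /* restore the default disposition */
    return err;
}
```

This is also why wait_for_helper() in kernel/kmod.c must not leave SIGCHLD ignored: it needs the exit status, and an ignored SIGCHLD yields -ECHILD instead.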

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- OLD/kernel/workqueue.c~wt   2007-04-05 12:20:35.0 +0400
+++ OLD/kernel/workqueue.c  2007-04-13 11:21:43.0 +0400
@@ -289,7 +289,6 @@ static int worker_thread(void *__cwq)
 {
struct cpu_workqueue_struct *cwq = __cwq;
DEFINE_WAIT(wait);
-   struct k_sigaction sa;
 
if (!cwq->wq->freezeable)
current->flags |= PF_NOFREEZE;
@@ -300,12 +299,6 @@ static int worker_thread(void *__cwq)
 */
numa_default_policy();
 
-   /* SIG_IGN makes children autoreap: see do_notify_parent(). */
-   sa.sa.sa_handler = SIG_IGN;
-   sa.sa.sa_flags = 0;
-   siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
-   do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
-
for (;;) {
if (cwq->wq->freezeable)
try_to_freeze();


[PATCH] change kernel threads to ignore signals instead of blocking them

2007-04-13 Thread Oleg Nesterov

On top of Eric's

kthread-dont-depend-on-work-queues-take-2.patch

Currently kernel threads use sigprocmask(SIG_BLOCK) to protect against signals.
This doesn't prevent signal delivery; it only blocks signal_wake_up().
Every "killall -33 kthreadd" means a "struct siginfo" leak.

Change kthreadd_setup() to set all handlers to SIG_IGN instead of blocking them
(make a new helper ignore_signals() for that). If the kernel thread needs some
signal, it should use allow_signal() anyway, and in that case it should not use
CLONE_SIGHAND.
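
The difference between blocking and ignoring is visible from userspace as well. Below is a minimal sketch, assuming a POSIX system and a single-threaded process; both helper names are hypothetical, and the code illustrates the semantics only, not the kernel change itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Block SIGUSR1, send it to ourselves, report whether it stays queued. */
static int pending_when_blocked(void)
{
    sigset_t set, old, pend;
    int is_pending;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    assert(sigprocmask(SIG_BLOCK, &set, &old) == 0);

    raise(SIGUSR1);                     /* queued, not delivered */
    sigpending(&pend);
    is_pending = sigismember(&pend, SIGUSR1);

    /* Setting SIG_IGN discards the pending instance, so restoring
     * the old mask afterwards cannot kill the process. */
    signal(SIGUSR1, SIG_IGN);
    sigprocmask(SIG_SETMASK, &old, NULL);
    signal(SIGUSR1, SIG_DFL);
    return is_pending;
}

/* Ignore SIGUSR1 (unblocked), send it, report whether anything is queued. */
static int pending_when_ignored(void)
{
    sigset_t pend;
    int is_pending;

    signal(SIGUSR1, SIG_IGN);
    raise(SIGUSR1);                     /* dropped at send time */
    sigpending(&pend);
    is_pending = sigismember(&pend, SIGUSR1);

    signal(SIGUSR1, SIG_DFL);
    return is_pending;
}
```

A blocked signal that is never unblocked stays queued forever, which is the leak described above; an ignored one is never queued in the first place.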

Note that we can't change daemonize() (should die!) in the same way, because
it can be used along with CLONE_SIGHAND. This means that allow_signal() still
should unblock the signal to work correctly with daemonize()ed threads.

However, disallow_signal() doesn't block the signal any longer but ignores it.

NOTE: with or without this patch the kernel threads are not protected from
handle_stop_signal(). This seems harmless, but it is not good.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

 include/linux/sched.h |1 +
 kernel/exit.c |2 +-
 kernel/kthread.c  |   17 +++--
 kernel/signal.c   |   10 ++
 4 files changed, 15 insertions(+), 15 deletions(-)

--- 2.6.21-rc5/include/linux/sched.h~1_SIGIGN   2007-04-05 12:18:28.0 +0400
+++ 2.6.21-rc5/include/linux/sched.h    2007-04-13 00:09:56.0 +0400
@@ -1299,6 +1299,7 @@ extern int in_egroup_p(gid_t);
 
 extern void proc_caches_init(void);
 extern void flush_signals(struct task_struct *);
+extern void ignore_signals(struct task_struct *);
 extern void flush_signal_handlers(struct task_struct *, int force_default);
 extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
 
--- 2.6.21-rc5/kernel/signal.c~1_SIGIGN 2007-04-05 12:18:28.0 +0400
+++ 2.6.21-rc5/kernel/signal.c  2007-04-13 02:14:06.0 +0400
@@ -329,6 +329,16 @@ void flush_signals(struct task_struct *t
spin_unlock_irqrestore(&t->sighand->siglock, flags);
 }
 
+void ignore_signals(struct task_struct *t)
+{
+   int i;
+
+   for (i = 0; i < _NSIG; ++i)
+   t->sighand->action[i].sa.sa_handler = SIG_IGN;
+
+   flush_signals(t);
+}
+
 /*
  * Flush all handlers for a task.
  */
--- 2.6.21-rc5/kernel/kthread.c~1_SIGIGN        2007-04-12 23:18:09.0 +0400
+++ 2.6.21-rc5/kernel/kthread.c 2007-04-13 02:27:39.0 +0400
@@ -215,24 +215,13 @@ EXPORT_SYMBOL(kthread_stop);
 static __init void kthreadd_setup(void)
 {
struct task_struct *tsk = current;
-   struct k_sigaction sa;
-   sigset_t blocked;
 
set_task_comm(tsk, "kthreadd");
 
-   /* Block and flush all signals */
-   sigfillset(&blocked);
-   sigprocmask(SIG_BLOCK, &blocked, NULL);
-   flush_signals(tsk);
-
-   /* SIG_IGN makes children autoreap: see do_notify_parent(). */
-   sa.sa.sa_handler = SIG_IGN;
-   sa.sa.sa_flags = 0;
-   siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
-   do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
+   ignore_signals(tsk);
 
-   set_user_nice(current, -5);
-   set_cpus_allowed(current, CPU_MASK_ALL);
+   set_user_nice(tsk, -5);
+   set_cpus_allowed(tsk, CPU_MASK_ALL);
 }
 
 int kthreadd(void *unused)
--- 2.6.21-rc5/kernel/exit.c~1_SIGIGN   2007-04-12 23:23:50.0 +0400
+++ 2.6.21-rc5/kernel/exit.c    2007-04-13 10:17:06.0 +0400
@@ -348,7 +348,7 @@ int disallow_signal(int sig)
return -EINVAL;
 
spin_lock_irq(&current->sighand->siglock);
-   sigaddset(&current->blocked, sig);
+   current->sighand->action[(sig)-1].sa.sa_handler = SIG_IGN;
recalc_sigpending();
spin_unlock_irq(&current->sighand->siglock);
return 0;


68328serial & pm_register

2007-04-13 Thread Christoph Hellwig

Hi Greg,

68328serial is the last driver to call pm_register, and is thus the only
thing still using and keeping alive the really old PM scheme.  Any chance
to convert it over to platform devices (which would also clean up a lot of
the ifdef mess in the driver), or simply rip out that rudimentary PM support?

On a less urgent basis, is there any chance to convert the driver
to use serial_core, which it doesn't despite living in drivers/serial?

Re: [PATCH 2/7] [RFC] Common power driver for Linux gadgets

2007-04-13 Thread Anton Vorontsov

Hello David,

On Thu, Apr 12, 2007 at 11:06:46PM -0700, David Brownell wrote:
> > This driver is used to avoid code/logic duplication across the different
> > machines we are porting at handhelds.org. pda_power registers the
> > machines' power supplies, and takes care of notifying batteries about
> > power changes through the external power interface.
> 
> It gets USB power management wrong though.  Have a look at two I2C drivers
> in drivers/i2c/chips:  tps65010.c (which would talk to this API) and then
> isp1301_omap.c (which would talk to the tps65010 driver).  The tps65010
> chip accepts the same two power sources you're addressing here, as part
> of its battery charging responsibilities. [1]
> 
> Key points:
> 
>  - The API needs to say *how much power* can be drawn.  Common values
>are 8 mA (OTG peripherals before configuration), 100 mA (non-OTG ones
>before configuration), 500 mA (high power configurations) ... but the
>exact value depends on what's listed in the configuration descriptor
>for the chosen configuration, any value 0..500 mA (increments of two)
>could be appropriate.
> 
>  - Sensing VBUS power is not the same thing as being allowed to consume
>it.  Again, USB OTG devices are different:  OTG hosts **SUPPLY** the
>current, so you really don't want to be telling the OTG transceiver to
>fire up its charge pump (say, 3.0V VBAT converted to 5V VBUS) while at
>the same time telling the battery manager to use that battery-derived
>VBUS current to recharge its battery!  Not only would that waste power,
>but it also deprives the peripheral of the power it needs to draw.
> 
> In general, no component other than the USB peripheral (or host) controller
> driver has any business trying to control how VBUS power is used.  It's
> likely to get it wrong ... as shown by this patch.  ;)
> 
> And that's exactly why the USB gadget API has had calls to manage the USB
> power consumption since mid-2004.  And I wonder why H5000 code evidently
> doesn't implement those calls.
> 
> - Dave
> 
> [1] You may find http://www.linux-usb.org/gadget/h2-otg.html useful too.
> It gives a high level map of the complicated OTG case, including
> those drivers, and points out how simpler peripheral-only systems
> could behave with the same drivers.  However it does turn out to
> be useful if USB peripheral drivers have a "transceiver" notion
> that can receive "VBUS present" interrupts, letting the peripheral
> controller driver power down almost *everything* to save power,
> and that otg_transceiver interface does that job.
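
The power-budget rules quoted above can be written down as a small function. This is a hedged sketch only: the enum, its values and vbus_draw_limit_ma() are hypothetical names, not part of the USB gadget API. It assumes the configuration descriptor's bMaxPower field is expressed in 2 mA units, as in USB 2.0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical peripheral states for this sketch. */
enum gadget_state {
    GADGET_UNCONFIGURED_OTG,    /* OTG peripheral, not yet configured */
    GADGET_UNCONFIGURED,        /* non-OTG peripheral, not yet configured */
    GADGET_CONFIGURED           /* a configuration has been selected */
};

/*
 * Upper bound, in mA, on what may be drawn from VBUS.
 * b_max_power is the bMaxPower byte of the chosen configuration.
 */
static unsigned vbus_draw_limit_ma(enum gadget_state state,
                                   unsigned char b_max_power,
                                   bool supplying_vbus)
{
    unsigned ma;

    /* An OTG host supplies VBUS itself: never charge from it. */
    if (supplying_vbus)
        return 0;

    switch (state) {
    case GADGET_UNCONFIGURED_OTG:
        return 8;
    case GADGET_UNCONFIGURED:
        return 100;
    case GADGET_CONFIGURED:
        ma = 2u * b_max_power;
        return ma > 500 ? 500 : ma;
    }
    return 0;
}
```

The point of the sketch is that "VBUS is present" alone never appears as an input: the limit depends on the role and the negotiated configuration, which only the USB controller driver knows.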

I see. The devices I currently have just consume power from the USB host,
i.e. they are able to boot using only USB with a discharged battery. :-/
What's more, the HP iPaq hx4700 is able to charge its battery from USB
(using that driver). We just set the USB charging GPIO, and it starts
consuming power from the host.

But I got the point, and yes, I can't explain why it works correctly.

(1) Anyway, are you hinting that you'd want to see some
"usb_vbus_start_consume_power(how much)" callback, which would be
handled by the gadget driver? Or should it just start using some gadget
API? Or must pda_battery even become a "usb power supplicant" gadget
itself to consume power legitimately?

-

I'm stuck in the null-modem century, so don't be surprised by my
USB-dumbness. I will read http://www.linux-usb.org/gadget/h2-otg.html,
of course.

Though I'd appreciate answers to (1), so that I can think in the right
direction from the start.

Much thanks for your comments.

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Christoph Hellwig

On Fri, Apr 13, 2007 at 05:05:47PM +1000, Nick Piggin wrote:
> Ah, OK. Anyway, with kprobes/systemtap they can do whatever they like
> and none of us need to care in the slightest ;)

Umm, folks.  systemtap basically means people compile kernel modules
from an odd scripting language with embedded C snippets.  The kernel
modules don't use normal exported APIs but use kallsyms and DWARF info
to poke into every possible private bit.

Saying you don't care in the slightest whether Oracle will load huge
amounts of code into the kernel that depends on intimate implementation
details, and that you don't even have source to debug, is not what
I'd call "none of us need to care in the slightest", at least for those
of you working for distributions that may actually have to debug this
shit in the end.

While we're at it, what happened to the idea of tainting the kernel
as soon as kprobes are placed in the kernel to at least make people
aware of it?


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Nick Piggin


Christoph Hellwig wrote:
> On Fri, Apr 13, 2007 at 05:05:47PM +1000, Nick Piggin wrote:
>> Ah, OK. Anyway, with kprobes/systemtap they can do whatever they like
>> and none of us need to care in the slightest ;)
>
> Umm, folks.  systemtap basically means people compile kernel modules
> from an odd scripting language with embedded C snippets.  The kernel
> modules don't use normal exported APIs but use kallsyms and DWARF info
> to poke into every possible private bit.
>
> Saying you don't care in the slightest whether Oracle will load huge
> amounts of code into the kernel that depends on intimate implementation
> details, and that you don't even have source to debug, is not what
> I'd call "none of us need to care in the slightest", at least for those
> of you working for distributions that may actually have to debug this
> shit in the end.

Yeah good point ;) I just meant the wider "we".

With all the problems kprobes has, something like poking deep into
kernel internals seems like a good thing to use it for instead of
hardcoding that stuff into the kernel. If not, then why did we even
merge it in the first place?

We could distribute some systemtap scripts, and even distribute some
basic useful ones like this sort of page info in the kernel source
tree.

> While we're at it, what happened to the idea of tainting the kernel
> as soon as kprobes are placed in the kernel to at least make people
> aware of it?

Seems like a very good idea.


--
SUSE Labs, Novell Inc.

Re: [PATCH][RFC] Kill off legacy power management stuff.

2007-04-13 Thread Stephen Rothwell

On Thu, 12 Apr 2007 20:33:16 -0400 (EDT) "Robert P. J. Day" <[EMAIL PROTECTED]> 
wrote:
>
> just something i threw together, not in final form, but it represents
> tossing the legacy PM stuff.  at the moment, the menuconfig entry for
> PM_LEGACY lists it as "DEPRECATED", while the help screen calls it
> "obsolete."  that's a good sign that it's getting close to the time
> for it to go, and the removal is fairly straightforward, but there's
> no mention of its removal in the feature removal schedule file.

One thing that comes to mind is that you will need some way to make sure
that only one of ACPI and APM gets initialized ...

--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: [PATCH] Stop pmac_zilog from abusing 8250's device numbers.

2007-04-13 Thread Geert Uytterhoeven

On Thu, 12 Apr 2007, David Lang wrote:
> On Thu, 12 Apr 2007, Gerhard Mack wrote:
> > Sometimes it's not the speed, it's the cost. The best I've ever done is
> > 5.5 interfaces per U, although with a better motherboard and case it might
> > have been different.
> 
> I have a bunch of servers from Rackable: dual-core CPU, 2G RAM, 2x GigE on
> the motherboard (1x 100M on motherboard), 4x Intel E1000 quad-port cards,
> 120G SATA drive, DVD burner, floppy
> 
> 3U, 18 gig ports, just under $5k

I don't think we're still talking about _serial_ ports?

Gr{oetje,eeting}s,

Geert

P.S. Yes, Ethernet is serial ;-)
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Christoph Hellwig

On Fri, Apr 13, 2007 at 06:03:45PM +1000, Nick Piggin wrote:
> Yeah good point ;) I just meant the wider "we".
> 
> With all the problems kprobes has, something like poking deep into
> kernel internals seems like a good thing to use it for instead of
> hardcoding that stuff into the kernel. If not, then why did we even
> merge it in the first place?

It's very nice to poke deep into the kernel for development purposes.
For example, for the spu scheduler work I'm doing currently I have
a module using kprobes (not the systemtap crap, because it's big, bloated,
in an odd language, and does a lot of really wrong things in its runtime).

This module allows me to put probes into various places in the scheduler
and writes them into a ringbuffer with timestamps, allowing me to
trace what's going on there.  This is really neat.  Unfortunately it
breaks as soon as I do some major reshuffling, because then the points
it hooks up to are not there anymore.  That's perfectly fine for my
setup, because _I_ know what I have to change when it breaks, and can
easily fix that.  Now imagine a similar module to trace pagecache activity
used by a third-party monitoring tool.  We now get a major change to
the pagecache (say to make it lockless), and the probes just break.  In
the best case it just doesn't work anymore, in the worst case it crashes
the kernel.  Now if the app vendor at least gave me their source I
could at least fix it to not crash, but there's a fair enough chance
they poke into bits that simply aren't there anymore.
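
The timestamped ringbuffer part of such a tracer can be sketched in plain userspace C. This is an illustration under assumptions, not the module described above: all names are hypothetical, the timestamp is supplied by the caller, and the kprobes hookup itself (the fragile part) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_RING_SIZE 4       /* tiny on purpose, to show wrap-around */

struct trace_event {
    uint64_t timestamp;         /* supplied by the caller in this sketch */
    int probe_id;               /* which probe point fired */
};

struct trace_ring {
    struct trace_event slots[TRACE_RING_SIZE];
    size_t head;                /* next slot to write */
    size_t count;               /* number of valid events, <= size */
};

/* Record one event, overwriting the oldest once the ring is full. */
static void trace_ring_record(struct trace_ring *r, uint64_t ts, int probe_id)
{
    assert(r != NULL);
    r->slots[r->head].timestamp = ts;
    r->slots[r->head].probe_id = probe_id;
    r->head = (r->head + 1) % TRACE_RING_SIZE;
    if (r->count < TRACE_RING_SIZE)
        r->count++;
}

/* Return the i-th surviving event, oldest first, or NULL if out of range. */
static const struct trace_event *trace_ring_get(const struct trace_ring *r,
                                                size_t i)
{
    size_t start;

    if (i >= r->count)
        return NULL;
    start = (r->head + TRACE_RING_SIZE - r->count) % TRACE_RING_SIZE;
    return &r->slots[(start + i) % TRACE_RING_SIZE];
}
```

The buffer is the stable half of the tool. What breaks on a reshuffle is the list of probe points feeding it, since those name internal functions that may no longer exist.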

Now if we have a proper user interface with real code behind it we can
have a defined interface.  If the interface is bad enough (or just too
low-level) we might still have that last problem, of statistics that were
once there going away as well, but we can deal with that gracefully by
declaring parts of the stats volatile and making sure people don't mess
with them.

To summarize, I really love kprobes to ease my debugging work, but using
it for any kind of production code is a total nightmare.

> We could distribute some systemtap scripts, and even distribute some
> basic useful ones like this sort of page info in the kernel source
> tree.

We could not really distribute systemtap scripts with the kernel.
systemtap is a bloody complicated piece of shit outside the kernel
tree that breaks every time we change kernel internals.  We could
provide useful kprobes modules, a proper tracing system (ltt-ng-lite)
and surrounding infrastructure.


Re: [AppArmor 00/41] AppArmor security module overview

2007-04-13 Thread Rob Meijer

I've posted on the subject before, and as no one seemed to truly relate
to the concept I consequently dropped my efforts, but as you seem to be half
a step in the general right direction, this may be a good time to bring
it up again.

If instead of 'least privilege' and fat profiles, you would opt for 'least
authority' and (potentially) thin profiles, this would make the whole also
usable for systems initially designed with security in mind.
To accomplish such a thing I proposed that the non-'at' family of calls
for filesystems should be considered privileged operations (within
a least authority context), and that 'at' family invocations with uptree
path tokens should also be considered privileged (within a least authority
context). The non-uptree invocations of the 'at' family should be
considered unprivileged, and should thus always be allowed,
separately from any profile.
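
The "no uptree tokens" test could look roughly like this. It is a minimal sketch with a hypothetical helper name, path_is_downtree(), and it only inspects the path string: symlinks inside the directory could still lead upwards, so a real implementation would have to deal with those as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `path`, as passed to an 'at' family call together with
 * a directory handle, stays below that directory: it must be relative
 * and must not contain a ".." component.
 */
static bool path_is_downtree(const char *path)
{
    const char *p = path;

    if (path == NULL || *path == '\0' || *path == '/')
        return false;           /* empty or absolute: not covered */

    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of this component */

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false;       /* uptree token */
        p += len;
        while (*p == '/')       /* skip separators */
            p++;
    }
    return true;
}
```

Under the proposal, a call that passes this check would always be allowed, and anything else would fall back to the profile.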

With this distinction implemented, you could then potentially use thin
profiles and dynamically exchange authority by passing along directory
file handles when needed.

With the 'least privilege' approach the profile would need to grant
each privilege the application might need. But if you were able to use
file handles, socket handles and directory handles as tokens of
authorization that can be communicated separately and freely, without
least privilege profiles stopping them from being used, the profiles
needed would start out much thinner, and would dynamically expand to just
a limited subset of the fat least privilege profile.

I believe that being able to distinguish between the least authority
(no uptree) usage of open file/dir/socket handles and other operations
is essential. If a profile blocks a program from using these open
file/dir/socket handles in a way that does not violate least authority
(that is, no uptree tokens in 'at' family calls), this would be a major
design flaw.

Rob

On Thu, April 12, 2007 11:08, [EMAIL PROTECTED] wrote:

> AppArmor's Overall Design
> =========================
>
> AppArmor protects systems from vulnerable software by confining
> processes, giving them "least privilege" access to the system's
> resources: with least privilege, processes are allowed exactly what they
> need, nothing more, and nothing less. Systems are thus protected from
> bugs in applications that would lead to privilege escalation, such as
> remote system access because of a buffer overflow in a web server, etc.
>
> AppArmor does this by defining application profiles which list allowed
> accesses, and assigning those profiles to processes. AppArmor does *not*
> require the user to confine all processes on the system.  Rather, you
> need to provide AppArmor profiles for every process that is potentially
> subject to manipulation by the attacker. For instance, to defend against
> network attack, confine all processes that access the network.
>
> The corollary to this is that attacks against AppArmor that start with
> "assume some unconfined process does ..." are outside the AppArmor
> threat model. Any process that might do something malicious to an
> AppArmor system should be confined by an AppArmor profile.
>
> The kernel manages many different kinds of resources.  AppArmor
> currently controls access to two key resources among them: files, and
> POSIX Capabilities.  (Additional protection for things like rlimits,
> interprocess communication, and network access are being worked on, and
> are expected to become available in a future version.)
>
>
> File Access Control
> -------------------
>
> Application profiles control file access by pathname: each profile
> contains a list of fully qualified pathnames (potentially containing
> globbing) and associated access modes: read (r), write (w), different
> kinds of execute (ix, Px, px, Ux, ux), create hard-link (l), and memory
> map for execution (m).
>
> For example, the following two rules permit read access to any file
> below /srv/www/htdocs/**.html, and memory map for execution (m), execute
> inheriting the current profile (ix), and read (r) access to
> /usr/sbin/suexec2:
>
> /srv/www/htdocs/**.html r,
> /usr/sbin/suexec2 mixr,
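
As a rough illustration of the two rules quoted above, here is a userspace sketch. It is not AppArmor's matcher: it borrows POSIX fnmatch() without FNM_PATHNAME, so that "*" also crosses "/" and "**" behaves like a recursive wildcard, which only approximates the quoted globbing. The struct, table and function names are hypothetical, and only single-letter modes such as r, w, l and m are checked.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct path_rule {
    const char *pattern;        /* fully qualified pathname, may glob */
    const char *modes;          /* access modes, e.g. "r" or "mixr" */
};

/* The two rules from the quoted example. */
static const struct path_rule sketch_profile[] = {
    { "/srv/www/htdocs/**.html", "r" },
    { "/usr/sbin/suexec2", "mixr" },
};

/* Return true if some rule matches `path` and lists `mode`. */
static bool profile_allows(const struct path_rule *rules, size_t n,
                           const char *path, char mode)
{
    if (mode == '\0')
        return false;           /* strchr() would match the terminator */

    for (size_t i = 0; i < n; i++) {
        if (fnmatch(rules[i].pattern, path, 0) == 0 &&
            strchr(rules[i].modes, mode) != NULL)
            return true;
    }
    return false;
}
```

Anything that no rule matches is denied, which is what makes the profile a whitelist.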


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread William Lee Irwin III

On Fri, Apr 13, 2007 at 08:51:42AM +0100, Christoph Hellwig wrote:
> Umm, folks.  systemtap basically means people compile kernel modules
> from an odd scripting language with embedded C snippets.  The kernel
> modules don't use normal exported APIs but use kallsyms and DWARF info
> to poke into every possible private bit.
> Saying you don't care in the slightest whether Oracle will load huge
> amounts of code into the kernel that depends on intimate implementation
> details, and that you don't even have source to debug, is not what
> I'd call "none of us need to care in the slightest", at least for those
> of you working for distributions that may actually have to debug this
> shit in the end.
> While we're at it, what happened to the idea of tainting the kernel
> as soon as kprobes are placed in the kernel to at least make people
> aware of it?

This is for a system monitoring app outside the database proper that
just happens to be done by the same .com as makes the database. It's
got little to do with the database itself apart from knowing how to
tell the database to e.g. let fewer clients in. The part that deals
with this is sort of like a custom procps that does things rather
specifically how they need them, including being portable to other
OS's IIRC, though the scope of the app is much larger than that.

They're actually quite concerned about issues of this sort since they
want to run all the time in the background in order to respond to
system conditions, though probably not necessarily rapid-fire sorts of
responses to second-by-second changes in conditions.

Basically, they're not a debugging affair, and they need to be able to
run in supported conditions. They're rather disinterested in things
that would, say, taint the kernel or take customers out of supported
configurations. They'll fall back to the known-grossly-inaccurate
RSS-based estimates they're using now in preference to such.

They don't want omnibus back doors into the kernel and I honestly
expect them to NAK the systemtap solution. They really want the
"uniquely attributable RSS" or "proportional RSS" reported directly,
and it takes some doing to convince them that this can't be done
directly for various reasons, e.g. floating point in the kernel won't
fly. They can program sufficiently well to maintain a database of pfn's,
pid's of processes mapping them, and user virtual addresses they're
mapped at (easy enough to kick off a database instance for it if they
don't feel comfortable just hashing the triples) and do the tabulation
from there, though they're not happy having to do so much of the
calculation themselves. Actually, I promised them reporting of mapcount
which would make per-process UARSS/PRSS calculation able to be done on
a process-by-process basis, though I can probably convince them to do
whole-system pfn-by-pfn tabulation if such is lacking.
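
The per-process tabulation from mapcounts could be done in userspace along these lines. This is a sketch under assumptions: the input is taken to be one mapcount per resident page of the process, and the struct and function names are hypothetical. Floating point is fine here precisely because the arithmetic happens outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct rss_totals {
    unsigned long unique_pages;     /* "uniquely attributable RSS" */
    double proportional_pages;      /* "proportional RSS" */
};

/*
 * mapcounts[i] is the number of mappings of the i-th resident page.
 * A page mapped once counts fully towards both totals; a page mapped
 * k times contributes 1/k to the proportional total only.
 */
static struct rss_totals tabulate_rss(const unsigned *mapcounts, size_t n)
{
    struct rss_totals t = { 0, 0.0 };

    assert(mapcounts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (mapcounts[i] == 0)
            continue;               /* not mapped: skip */
        if (mapcounts[i] == 1)
            t.unique_pages++;
        t.proportional_pages += 1.0 / mapcounts[i];
    }
    return t;
}
```

Without a reported mapcount the same totals need the whole-system table of (pfn, pid, address) triples, since the sharing count has to be reconstructed by counting.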

-- wli

Re: [AppArmor 37/41] AppArmor: Main Part

2007-04-13 Thread Andreas Gruenbacher

On Thursday 12 April 2007 12:37, Alan Cox wrote:
> The proc file system may not be mounted at /proc. There are environments
> where this is done for good reason (eg not wanting the /proc info exposed
> to a low trust environment). Another is when FUSE is providing an
> arbitrated proc either by merging across clusters or by removing stuff.
> [...]
> Why can't this be done in the profile itself to avoid kernel special case
> uglies and inflexibility ?

Good points. I'm in fact not sure how this could have been missed, and indeed 
it makes more sense to put this in profiles.

Thanks,
Andreas

Re: [PATCH][RFC] Kill off legacy power management stuff.

2007-04-13 Thread Rafael J. Wysocki

[appropriate CCs added]

On Friday, 13 April 2007 02:33, Robert P. J. Day wrote:
> 
> just something i threw together, not in final form, but it represents
> tossing the legacy PM stuff.  at the moment, the menuconfig entry for
> PM_LEGACY lists it as "DEPRECATED", while the help screen calls it
> "obsolete."  that's a good sign that it's getting close to the time
> for it to go, and the removal is fairly straightforward, but there's
> no mention of its removal in the feature removal schedule file.

It's been like this for a long long time.  I think you're right that it can be
dropped, but I don't know the details (eg. why it hasn't been dropped yet).
 
> NOTE:  this is not a working patch as it will fail on a MIPS or FR-V
> build, as i didn't remove the final vestiges from those two
> architectures.  that would require simply killing off the remaining
> calls to pm_send_all(), that's all.  (i think.)
> 
> anyway, this has been compile-tested on x86 with "make allyesconfig."
> 
> 
>  Documentation/pm.txt |  123 ---
>  arch/i386/kernel/apm.c   |   27 
>  drivers/acpi/bus.c   |   14 --
>  drivers/net/3c509.c  |1
>  drivers/serial/68328serial.c |   59 -
>  include/linux/pm.h   |   70 ---
>  include/linux/pm_legacy.h|   41 --
>  kernel/power/Kconfig |   10 -
>  kernel/power/Makefile|1
>  kernel/power/pm.c|  209 -
>  10 files changed, 1 insertion(+), 554 deletions(-)
> 
> 
> diff --git a/Documentation/pm.txt b/Documentation/pm.txt
> index da8589a..d0fcfe2 100644
> --- a/Documentation/pm.txt
> +++ b/Documentation/pm.txt
> @@ -36,93 +36,6 @@ system the associated daemon will exit gracefully.
>apmd:   http://worldvisions.ca/~apenwarr/apmd/
>acpid:  http://acpid.sf.net/
> 
> -Driver Interface -- OBSOLETE, DO NOT USE!
> -*
> -
> -Note: pm_register(), pm_access(), pm_dev_idle() and friends are
> -obsolete. Please do not use them. Instead you should properly hook
> -your driver into the driver model, and use its suspend()/resume()
> -callbacks to do this kind of stuff.
> -
> -If you are writing a new driver or maintaining an old driver, it
> -should include power management support.  Without power management
> -support, a single driver may prevent a system with power management
> -capabilities from ever being able to suspend (safely).
> -
> -Overview:
> -1) Register each instance of a device with "pm_register"
> -2) Call "pm_access" before accessing the hardware.
> -   (this will ensure that the hardware is awake and ready)
> -3) Your "pm_callback" is called before going into a
> -   suspend state (ACPI D1-D3) or after resuming (ACPI D0)
> -   from a suspend.
> -4) Call "pm_dev_idle" when the device is not being used
> -   (optional but will improve device idle detection)
> -5) When unloaded, unregister the device with "pm_unregister"
> -
> -/*
> - * Description: Register a device with the power-management subsystem
> - *
> - * Parameters:
> - *   type - device type (PCI device, system device, ...)
> - *   id - instance number or unique identifier
> - *   cback - request handler callback (suspend, resume, ...)
> - *
> - * Returns: Registered PM device or NULL on error
> - *
> - * Examples:
> - *   dev = pm_register(PM_SYS_DEV, PM_SYS_VGA, vga_callback);
> - *
> - *   struct pci_dev *pci_dev = pci_find_dev(...);
> - *   dev = pm_register(PM_PCI_DEV, PM_PCI_ID(pci_dev), callback);
> - */
> -struct pm_dev *pm_register(pm_dev_t type, unsigned long id, pm_callback 
> cback);
> -
> -/*
> - * Description: Unregister a device with the power management subsystem
> - *
> - * Parameters:
> - *   dev - PM device previously returned from pm_register
> - */
> -void pm_unregister(struct pm_dev *dev);
> -
> -/*
> - * Description: Unregister all devices with a matching callback function
> - *
> - * Parameters:
> - *   cback - previously registered request callback
> - *
> - * Notes: Provided for easier porting from old APM interface
> - */
> -void pm_unregister_all(pm_callback cback);
> -
> -/*
> - * Power management request callback
> - *
> - * Parameters:
> - *   dev - PM device previously returned from pm_register
> - *   rqst - request type
> - *   data - data, if any, associated with the request
> - *
> - * Returns: 0 if the request is successful
> - *  EINVAL if the request is not supported
> - *  EBUSY if the device is now busy and cannot handle the request
> - *  ENOMEM if the device was unable to handle the request due to 
> memory
> - *
> - * Details: The device request callback will be called before the
> - *  device/system enters a suspend state (ACPI D1-D3) or
> - *  or after the device/system resumes from suspend (ACPI D0).
> - *  For PM_SUSPEND, the ACPI D-state being entered is passed
> - *  as the "data" argument to the callback.  The device
> - *  driver should

Re: [PATCH][RFC] Kill off legacy power management stuff.

2007-04-13 Thread Robert P. J. Day

On Fri, 13 Apr 2007, Stephen Rothwell wrote:

> On Thu, 12 Apr 2007 20:33:16 -0400 (EDT) "Robert P. J. Day" <[EMAIL 
> PROTECTED]> wrote:
> >
> > just something i threw together, not in final form, but it
> > represents tossing the legacy PM stuff.  at the moment, the
> > menuconfig entry for PM_LEGACY lists it as "DEPRECATED", while the
> > help screen calls it "obsolete."  that's a good sign that it's
> > getting close to the time for it to go, and the removal is fairly
> > straightforward, but there's no mention of its removal in the
> > feature removal schedule file.
>
> One thing that comes to mind is that you will need some way to make sure
> that only one of ACPI and APM get initialized ...

i don't see how that has anything to do with removing legacy PM
support.  you can select both ACPI and APM *now*.  if that's a bad
thing, then fixing it is a completely independent issue.

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Nick Piggin


Christoph Hellwig wrote:

On Fri, Apr 13, 2007 at 06:03:45PM +1000, Nick Piggin wrote:


Yeah good point ;) I just meant the wider "we".

With all the problems kprobes has, something like poking deep into
kernel internals seems like a good thing to use it for instead of
hardcoding that stuff into the kernel. If not, then why did we even
merge it in the first place?



It's very nice to poke deep into the kernel for development purposes.
For example for the spu scheduler work I'm doing currently I have
a module using kprobes (not the systemtap crap, because it's big, bloated,
in an odd language, and does a lot of really wrong things in its runtime).


OK, I pick systemtap because I don't know any better... but kprobes
is what I mean for the kernel interface.



This module allows me to put probes into various places in the scheduler
and writes them into a ringbuffer with timestamps allowing me to
trace what's going on there.  This is really neat.  Unfortunately it
breaks as soon as I do some major reshuffling because then the points
it hooks up to are not there anymore.  That's perfectly fine for my
setup, because _I_ know what I have to change when it breaks, and can
easily fix that.  Now imagine a similar module to trace pagecache activity
used by a third-party monitoring tool.  We now get a major change to
the pagecache (say to make it lockless), and the probes just break.  In
the best case it just doesn't work anymore, in the worst case it crashes
the kernel.  Now if the app vendor at least gave me their source I
could at least fix it to not crash, but there's a fair enough chance
they poke into bits that simply aren't there anymore.

Now if we have a proper user interface with real code behind it we can
have a defined interface.  If the interface is bad enough (or just too
lowlevel) we might have the last problem of statistics that were there
once going away as well, but we can deal with that gracefully by declaring
parts of the stats volatile and make sure people don't mess with them.

To summarize, I really love kprobes to ease my debugging work, but using
it for any kind of production code is a total nightmare.


OK, well Matt's stuff that he needs doesn't have to be kprobes at all,
and yes if lots of people want the same thing then we could distribute
it with the kernel.

But at least make it into its own module with a debugfs interface or
something. I mean, exposing a PG_name-to-nr and page count pfn and flags
as a supposedly formal proc interface doesn't sound nice to me. Page
flags does not tell you what is going on in the VM, it gives you a tiny
window into "something". Between reading a /proc/pid/ pfn and finding
the pfn's page flags, it could be used for something completely different
anyway.



We could distribute some systemtap scripts, and even distribute some
basic useful ones like this sort of page info in the kernel source
tree.



We could not really distribute systemtap scripts with the kernel.
systemtap is a bloody complicated piece of shit outside the kernel
tree that breaks all the time we change kernel internals.  We could
provide useful kprobes modules, a proper tracing system (ltt-ng-lite)
and surrounding infrastructure.


OK ;)

--
SUSE Labs, Novell Inc.

Re: [PATCH][RFC] Kill off legacy power management stuff.

2007-04-13 Thread Stephen Rothwell

On Fri, 13 Apr 2007 04:20:10 -0400 (EDT) "Robert P. J. Day" <[EMAIL PROTECTED]> 
wrote:
>
> On Fri, 13 Apr 2007, Stephen Rothwell wrote:
> >
> > One thing that comes to mind is that you will need some way to make sure
> > that only one of ACPI and APM get initialized ...
>
> i don't see how that has anything to do with removing legacy PM
> support.  you can select both ACPI and APM *now*.  if that's a bad
> thing, then fixing it is a completely independent issue.

Except your patch removes this hunk:

@@ -2264,14 +2248,6 @@ static int __init apm_init(void)
apm_info.disabled = 1;
return -ENODEV;
}
-   if (PM_IS_ACTIVE()) {
-   printk(KERN_NOTICE "apm: overridden by ACPI.\n");
-   apm_info.disabled = 1;
-   return -ENODEV;
-   }
-#ifdef CONFIG_PM_LEGACY
-   pm_active = 1;
-#endif

in apm.c and a similar piece of the ACPI initialisation that prevented
one initialising if the other had already initialised.
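
The effect of the removed check can be sketched as a first-come-wins guard: whichever of the two subsystems initialises first claims power management, and the other backs off with -ENODEV. This is a user-space illustration only; sketch_pm_active stands in for the kernel's pm_active flag, and the sketch_* function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

static int sketch_pm_active;	/* stand-in for pm_active */

/* Claim power management; fails if someone else already holds it. */
static int sketch_claim_pm(void)
{
	if (sketch_pm_active)
		return -ENODEV;	/* "overridden" by the earlier claimant */
	sketch_pm_active = 1;
	return 0;
}

/* Give the claim back, e.g. when the owning driver unloads. */
void sketch_release_pm(void)
{
	assert(sketch_pm_active);
	sketch_pm_active = 0;
}

int sketch_acpi_init(void) { return sketch_claim_pm(); }
int sketch_apm_init(void)  { return sketch_claim_pm(); }
```

Dropping the flag without a replacement would let both init paths succeed, which is the situation the hunk was guarding against.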

--
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: [PATCH 2/7] [RFC] Common power driver for Linux gadgets

2007-04-13 Thread David Brownell

On Friday 13 April 2007 12:36 am, Anton Vorontsov wrote:
> Hello David,
> > 
> >  - The API needs to say *how much power* can be drawn...
> > 
> >  - Sensing VBUS power is not the same thing as being allowed to consume
> >it.  Again, USB OTG devices are different:  OTG hosts **SUPPLY** the
> >current, ...
> 
> I see. The devices I currently have just consume power from the USB host,
> i.e. they are able to boot using only USB with a discharged battery. :-/
> What's more, the HP iPaq hx4700 is able to charge its battery from USB
> (using that driver). We just set the USB charging GPIO, and it starts
> consuming power from the host.

Sounds like it could be more power than the host expects it to consume;
or in some cases, not as much as it could ... ISTR only the Ethernet
gadget defaults to 100mA (in non-OTG configs), the others are set for
only 2 mA (expecting self-powered configs).

> But I got the point, and yes I can't explain why it works correctly.

It probably doesn't work correctly.  But it's not broken enough to
fail badly.

> (1) Anyway, you're hinting that you'd want to see some
> "usb_vbus_start_consume_power(how much)" callback, which will be
> handled by gadget driver? 

The gadget API includes usb_gadget_vbus_draw(gadget, mA) which the
gadget drivers (e.g. network, storage, tty) issue.  How a given board
implements that is best wrapped in a transceiver abstraction, which
would also issue the usb_gadget_vbus_{connect,disconnect}(gadget)
notifications back to the controller.
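
As a rough illustration of "how much power", the value a gadget driver hands to usb_gadget_vbus_draw() could be derived from the connection state and the bMaxPower field of the active configuration descriptor, which USB 2.0 expresses in 2 mA units. The helper below is a hypothetical user-space sketch: the enum, the function name, and the choice to report 0 mA while suspended are assumptions, not part of the gadget API.

```c
#include <assert.h>

enum sketch_usb_state {
	SKETCH_USB_SUSPENDED,
	SKETCH_USB_UNCONFIGURED,	/* attached, no configuration selected yet */
	SKETCH_USB_CONFIGURED,
};

/*
 * How many mA the board may take from VBUS.
 * bMaxPower is in 2 mA units; 250 (= 500 mA) is the USB 2.0 ceiling.
 */
unsigned int sketch_vbus_budget_ma(enum sketch_usb_state state,
				   unsigned char bMaxPower)
{
	assert(bMaxPower <= 250);

	switch (state) {
	case SKETCH_USB_CONFIGURED:
		return 2u * bMaxPower;	/* what the host agreed to */
	case SKETCH_USB_UNCONFIGURED:
		return 100;		/* one unit load before configuration */
	default:
		return 0;		/* suspended: no charging budget in this sketch */
	}
}
```

A configuration that advertises bMaxPower = 1 yields a 2 mA budget, matching the self-powered default mentioned above, so a charger wired to such a gadget would be told not to draw anything useful.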

That is, in the pure-peripheral case there are at least three entities involved
outside the battery/power framework:  the gadget driver (what protocol is
used over USB), the peripheral controller driver (talking to USB hardware),
and the transceiver (sensing VBUS and managing power state transitions of
that controller).

In dumb cases (most reference boards!) the transceiver is stubbed
out, and there's no power management either to put the USB stuff
into lowpower mode when it's inactive, or to make use of VBUS power
to help manage the battery (what battery?).  In less dumb cases,
VBUS is only used to sense whether the host is there.  In vaguely
smart cases VBUS will be used to power the USB hardware (saving
maybe 40 mA of battery power in one case).  Recharging the battery
is for some reason not always supported even in product boards.

> Or it should just start using some gadget api? 
> Or even pda_battery must become an "usb power supplicant" gadget itself
> to consume power by law?

I don't know about this "supplicant" thing.  That's actually below
what the gadget API exposes -- implementation detail, but you're
thinking about how to implement such board-specific details.  I
guess I'd think that's a board-specific detail in the same way
that the transceiver is; you'll observe isp1301_omap is where
such things hook up in that example implementation.

- Dave

> I'm stuck in null-modem century, so don't wonder my USB-dumbness.
> Will read http://www.linux-usb.org/gadget/h2-otg.html, of course.
> 
> Though I'd appreciate answers for (1), thus I can think in right
> direction from the start.
> 
> Much thanks for your comments.
> 
> -- 
> Anton Vorontsov
> email: [EMAIL PROTECTED]
> backup email: [EMAIL PROTECTED]
> irc://irc.freenode.org/bd2
> 

Re: [AppArmor 37/41] AppArmor: Main Part

2007-04-13 Thread Andreas Gruenbacher

On Thursday 12 April 2007 12:37, Alan Cox wrote:
> > +   if (PTR_ERR(sa->name) == -ENOENT && (check & AA_CHECK_FD))
> > +   denied_mask = 0;
>
> Now there is an interesting question. Is PTR_ERR() safe for kernel
> pointers on all platforms or just for user ones ?

It's used for kernel pointers all over the place and mmap also mixes user 
addresses with -Exxx, so it's definitely supposed to work. I'm not sure how 
exactly the topmost page is kept from getting mapped.
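
For reference, the trick being discussed encodes a small negative errno in the pointer value itself, which lands in the last few KiB of the address space where no valid object is expected to live. Here is a user-space sketch modelled on that idea; the sketch_* names and the 4095 limit are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095	/* assumed size of the reserved range */

/* Encode a negative errno as a pointer value. */
static inline void *sketch_err_ptr(long err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Recover the errno; only meaningful if sketch_is_err() is true. */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr falls in the top SKETCH_MAX_ERRNO bytes of the address space. */
static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static int sketch_value = 42;

/* Hypothetical lookup: a real pointer on success, encoded -ENOENT otherwise. */
void *sketch_lookup(int exists)
{
	return exists ? (void *)&sketch_value : sketch_err_ptr(-ENOENT);
}
```

Note that calling sketch_ptr_err() on an ordinary pointer just returns its address as a number, so a comparison against -ENOENT is harmless there as long as nothing valid is ever mapped in that top range.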

Andreas

Re: [PATCH][RFC] Kill off legacy power management stuff.

2007-04-13 Thread Robert P. J. Day

On Fri, 13 Apr 2007, Stephen Rothwell wrote:

> On Fri, 13 Apr 2007 04:20:10 -0400 (EDT) "Robert P. J. Day" <[EMAIL 
> PROTECTED]> wrote:
> >
> > On Fri, 13 Apr 2007, Stephen Rothwell wrote:
> > >
> > > One thing that comes to mind is that you will need some way to make sure
> > > that only one of ACPI and APM get initialized ...
> >
> > i don't see how that has anything to do with removing legacy PM
> > support.  you can select both ACPI and APM *now*.  if that's a bad
> > thing, then fixing it is a completely independent issue.
>
> Except your patch removes this hunk:
>
> @@ -2264,14 +2248,6 @@ static int __init apm_init(void)
>   apm_info.disabled = 1;
>   return -ENODEV;
>   }
> - if (PM_IS_ACTIVE()) {
> - printk(KERN_NOTICE "apm: overridden by ACPI.\n");
> - apm_info.disabled = 1;
> - return -ENODEV;
> - }
> -#ifdef CONFIG_PM_LEGACY
> - pm_active = 1;
> -#endif
>
> in apm.c and a similar piece of the ACPI initialisation that
> prevented one initialising if the other had already initialised.

ah, gotcha.  i'll take a closer look at that once i land in sunny
california.  but not right away.  :-)

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: Linux 2.6.21-rc6

2007-04-13 Thread Mattia Dongili

On Thu, Apr 12, 2007 at 09:26:44PM +0300, Maxim Levitsky wrote:
> On Thursday 12 April 2007 18:14:02 Mattia Dongili wrote:
> > On Thu, Apr 05, 2007 at 07:50:11PM -0700, Linus Torvalds wrote:
> > ...
> > > Maxim Levitsky (1):
> > >   Add suspend/resume for HPET
> > 
> > This one breaks resume for me (from STR) on a vaio SZ. Reverting this
> > commit allows resuming again but leaves me with some periodic and 
> > unpleasant:
> > 
> > [  155.232000] BUG: soft lockup detected on CPU#1!
> > [  155.232000]  [] show_trace_log_lvl+0x1a/0x2f
> > [  155.232000]  [] show_trace+0x12/0x14
> > [  155.232000]  [] dump_stack+0x16/0x18
> > [  155.232000]  [] softlockup_tick+0xa7/0xb6
> > [  155.232000]  [] run_local_timers+0x12/0x14
> > [  155.232000]  [] update_process_times+0x3e/0x63
> > [  155.232000]  [] tick_sched_timer+0x50/0x95
> > [  155.232000]  [] hrtimer_interrupt+0x10b/0x18b
> > [  155.232000]  [] smp_apic_timer_interrupt+0x6c/0x7e
> > [  155.232000]  [] apic_timer_interrupt+0x28/0x30
> > [  155.232000]  [] cpu_idle+0x1b/0xc7
> > [  155.232000]  [] start_secondary+0x32b/0x333
> > [  155.232000]  [<>] run_init_process+0x3fefed10/0x19
> > [  155.232000]  ===
> > 
> > FWIW: I hit the same BUG() in -rc5.
> > full boot+suspend+resume log: 
> > http://oioio.altervista.org/linux/kern-2.6.21-rc6.log
> > .config: http://oioio.altervista.org/linux/config-2.6.21-rc6-1
> > 
> > I'm available to test more patches or to provide other info.
> 
> Strange,strange...

Yes... strange. I can't reproduce the resume breakage anymore, with or
without your patch. I still have the soft lockup anyway after resuming.
I'll still keep trying, for now just disregard my previous mail.

-- 
mattia
:wq!

Re: [AppArmor 37/41] AppArmor: Main Part

2007-04-13 Thread Nick Piggin


Andreas Gruenbacher wrote:

On Thursday 12 April 2007 12:37, Alan Cox wrote:


+   if (PTR_ERR(sa->name) == -ENOENT && (check & AA_CHECK_FD))
+   denied_mask = 0;


Now there is an interesting question. Is PTR_ERR() safe for kernel
pointers on all platforms or just for user ones ?



It's used for kernel pointers all over the place and mmap also mixes user 
addresses with -Exxx, so it's definitely supposed to work. I'm not sure how 
exactly the topmost page is kept from getting mapped.


Yeah, the comments indicate it was first used for dentries.

I wonder if it shouldn't be using the NULL page instead? (ie 0-4095)

--
SUSE Labs, Novell Inc.

Re: How should an exit routine wait for release() callbacks?

2007-04-13 Thread Cornelia Huck

On Thu, 12 Apr 2007 17:23:18 -0400 (EDT),
Alan Stern <[EMAIL PROTECTED]> wrote:

> Here's a not-so-theoretical question.
> 
> I've got a module which registers a struct device.  (It represents a
> virtual device, not a real one, but that doesn't matter.)  Obviously the
> module's exit routine has to wait until the release() routine for that
> device has been invoked -- if it returned too early then the release()
> call would oops.
> 
> How should it wait?

Device lifetime vs. module lifetime - that's a fun one...

> 
> The most straightforward approach is to use a struct completion, like 
> this:
> 
> 	static struct {
> 		struct device dev;
> 		...
> 	} my_dev;
> 
> 	static DECLARE_COMPLETION(my_completion);
> 
> 	static void my_release(struct device *dev)
> 	{
> 		complete(&my_completion);
> 	}
> 
> 	static void __exit my_exit(void)
> 	{
> 		device_unregister(&my_dev.dev);
> 		wait_for_completion(&my_completion);
> 	}
> 
> The problem is that there is no guarantee a context switch won't take
> place after my_release() has called complete() and before my_release()  
> returns.  If that happens and my_exit() finishes running, then the module
> will be unloaded and the next context switch back to finish off
> my_release() will oops.
> 
> Other approaches have similar defects.  So how can this problem be solved?

What I see that a device driver may do now is the following:
- disallow module unloading (duh)
- move the release function outside the module

To make the completion approach work, the complete() would need to be
after the release function. This would imply an upper layer, but this
upper layer would need to access the completion structure in the
module...

One could think about a owner field (for getting/putting the module
reference) for the object (with a final module_put() after the release
function has been called). The problem there would be that it would
preclude unloading of the module if there isn't a "self destruct" knob
for the object.
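
The owner-field idea can be sketched in user space as follows: the object pins its owning module when it is set up, and the core drops that pin only after the release function has returned, so the module's code cannot go away underneath release(). Everything here (struct names, sketch_* functions, the single-threaded plain-int refcounts) is a hypothetical illustration, not an existing driver-core API.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_module {
	int refcount;			/* module may unload only at zero */
};

struct sketch_object {
	int refcount;
	struct sketch_module *owner;	/* the proposed owner field */
	void (*release)(struct sketch_object *obj);
};

/* Records how many owner references existed while release() was running. */
int sketch_owner_refs_seen_in_release = -1;

void sketch_record_release(struct sketch_object *obj)
{
	sketch_owner_refs_seen_in_release = obj->owner->refcount;
}

void sketch_object_init(struct sketch_object *obj, struct sketch_module *owner,
			void (*release)(struct sketch_object *))
{
	obj->refcount = 1;
	obj->owner = owner;
	obj->release = release;
	owner->refcount++;		/* like taking a module reference */
}

void sketch_object_put(struct sketch_object *obj)
{
	struct sketch_module *owner;

	assert(obj->refcount > 0);
	if (--obj->refcount > 0)
		return;
	owner = obj->owner;		/* saved first: release() may free obj */
	obj->release(obj);
	owner->refcount--;		/* final put, strictly after release() */
}

int sketch_module_can_unload(const struct sketch_module *mod)
{
	return mod->refcount == 0;
}
```

The drawback described above is visible here too: as long as the object exists, sketch_module_can_unload() stays false, so something other than the module's exit routine has to trigger the final sketch_object_put().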

Re: [PATCH 10/30 #2] Use menuconfig objects - I2C

2007-04-13 Thread Jean Delvare

On Wed, 11 Apr 2007 14:28:30 +0200 (MEST), Jan Engelhardt wrote:
> Allow the whole I2C menu to be disabled at once without diving into
> the submenus for deselecting all options (should the user desire so).
> 
> Signed-off-by: Jan Engelhardt <[EMAIL PROTECTED]>
> 
>  Kconfig|   15 ++---
>  algos/Kconfig  |8 +---
>  busses/Kconfig |   93 
> +++--
>  chips/Kconfig  |   23 ++
>  4 files changed, 62 insertions(+), 77 deletions(-)

Applied, thanks.

-- 
Jean Delvare

[PATCH] [KERNEL-DOC] kill warnings when building mandocs

2007-04-13 Thread Borislav Petkov

This patch shuts up warnings of the sort:

make -C /mnt/samsung_200/sam/kernel/trees/21-rc6/build \
KBUILD_SRC=/mnt/samsung_200/sam/kernel/trees/21-rc6 \
KBUILD_EXTMOD="" -f /mnt/samsung_200/sam/kernel/trees/21-rc6/Makefile 
mandocs
make -f /mnt/samsung_200/sam/kernel/trees/21-rc6/scripts/Makefile.build 
obj=scripts/basic
make -f /mnt/samsung_200/sam/kernel/trees/21-rc6/scripts/Makefile.build 
obj=Documentation/DocBook mandocs
  SRCTREE=/mnt/samsung_200/sam/kernel/trees/21-rc6/ 
/mnt/samsung_200/sam/kernel/trees/21-rc6/build/scripts/basic/docproc doc 
/mnt/samsung_200/sam/kernel/trees/21-rc6/Documentation/DocBook/wanbook.tmpl 
>Documentation/DocBook/wanbook.xml
  if grep -q refentry Documentation/DocBook/wanbook.xml; then xmlto man -m 
/mnt/samsung_200/sam/kernel/trees/21-rc6/Documentation/DocBook/stylesheet.xsl 
-o Documentation/DocBook/man Documentation/DocBook/wanbook.xml ; gzip -f 
Documentation/DocBook/man/*.9; fi
Note: meta version: No productnumber or alternative sppp_close
Note: meta version: No refmiscinfo@class=version    sppp_close
Note: Writing sppp_close.9
Note: meta version: No productnumber or alternative sppp_open
Note: meta version: No refmiscinfo@class=version    sppp_open

by adding a RefMiscInfo xml tag in the form of the current kernel version to
the function, struct and enum definitions in files included by kernel-doc when
building 'mandocs'.  However, the version string appears truncated on the
manpage due to some constraints in the xml DTD for the man header, I believe,
for the troff output is truncated too.


Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>

Index: 21-rc6/scripts/kernel-doc
===
--- 21-rc6.orig/scripts/kernel-doc
+++ 21-rc6/scripts/kernel-doc
@@ -326,6 +326,34 @@ while ($ARGV[0] =~ m/^-(.*)/) {
 }
 }
 
+# get kernel version
+sub get_kernel_version() {
+
+   my $version;
+   open (FILE, "../Makefile") || die "Can't open man kernel Makefile: $!";
+
+   EOF: while (my $line = <FILE>)
+   {
+   if ($line =~ /VERSION\s+=\s+(\d+)/) {
+   $version .= $1;
+   next;
+   }
+   if ($line =~ /PATCHLEVEL\s+=\s+(\d+)/) {
+   $version .= ".$1";
+   next;
+   }
+   if ($line =~ /SUBLEVEL\s+=\s+(\d+)/) {
+   $version .= ".$1";
+   next;
+   }
+   if ($line =~ /EXTRAVERSION\s+=\s+(.*)$/) {
+   $version .= $1;
+   last EOF;
+   }
+   }
+   return $version;
+}
+
 
 # generate a sequence of code that will splice in highlighting information
 # using the s// operator.
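@@ -592,6 +620,7 @@ sub output_function_xml(%) {
 print "<refmeta>\n";
 print " <refentrytitle><phrase>".$args{'function'}."</phrase></refentrytitle>\n";
 print " <manvolnum>9</manvolnum>\n";
+print " <refmiscinfo class=\"version\">" . get_kernel_version() . "</refmiscinfo>\n";
 print "</refmeta>\n";
 print "<refnamediv>\n";
 print " <refname>".$args{'function'}."</refname>\n";
@@ -668,6 +697,7 @@ sub output_struct_xml(%) {
 print "<refmeta>\n";
 print " <refentrytitle><phrase>".$args{'type'}." ".$args{'struct'}."</phrase></refentrytitle>\n";
 print " <manvolnum>9</manvolnum>\n";
+print " <refmiscinfo class=\"version\">" . get_kernel_version() . "</refmiscinfo>\n";
 print "</refmeta>\n";
 print "<refnamediv>\n";
 print " <refname>".$args{'type'}." ".$args{'struct'}."</refname>\n";
@@ -752,6 +782,7 @@ sub output_enum_xml(%) {
 print "<refmeta>\n";
 print " <refentrytitle><phrase>enum ".$args{'enum'}."</phrase></refentrytitle>\n";
 print " <manvolnum>9</manvolnum>\n";
+print " <refmiscinfo class=\"version\">" . get_kernel_version() . "</refmiscinfo>\n";
 print "</refmeta>\n";
 print "<refnamediv>\n";
 print " <refname>enum ".$args{'enum'}."</refname>\n";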

Re: [PATCH] [KERNEL-DOC] kill warnings when building mandocs

2007-04-13 Thread Borislav Petkov

Sorry for the improper whitespaces, here's a correct version.

Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>


Index: 21-rc6/scripts/kernel-doc
===
--- 21-rc6.orig/scripts/kernel-doc
+++ 21-rc6/scripts/kernel-doc
@@ -326,6 +326,32 @@ while ($ARGV[0] =~ m/^-(.*)/) {
 }
 }
 
+# get kernel version
+sub get_kernel_version() {
+my $version;
+open (FILE, "../Makefile") || die "Can't open man kernel Makefile: $!";
+
+EOF: while (my $line = <FILE>)
+{
+   if ($line =~ /VERSION\s+=\s+(\d+)/) {
+$version .= $1;
+next;
+   }
+   if ($line =~ /PATCHLEVEL\s+=\s+(\d+)/) {
+$version .= ".$1";
+next;
+   }
+   if ($line =~ /SUBLEVEL\s+=\s+(\d+)/) {
+$version .= ".$1";
+next;
+   }
+   if ($line =~ /EXTRAVERSION\s+=\s+(.*)$/) {
+$version .= $1;
+last EOF;
+   }
+}
+return $version;
+}
 
 # generate a sequence of code that will splice in highlighting information
 # using the s// operator.
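@@ -592,6 +618,7 @@ sub output_function_xml(%) {
 print "<refmeta>\n";
 print " <refentrytitle><phrase>".$args{'function'}."</phrase></refentrytitle>\n";
 print " <manvolnum>9</manvolnum>\n";
+print " <refmiscinfo class=\"version\">" . get_kernel_version() . "</refmiscinfo>\n";
 print "</refmeta>\n";
 print "<refnamediv>\n";
 print " <refname>".$args{'function'}."</refname>\n";
@@ -668,6 +695,7 @@ sub output_struct_xml(%) {
 print "<refmeta>\n";
 print " <refentrytitle><phrase>".$args{'type'}." ".$args{'struct'}."</phrase></refentrytitle>\n";
 print " <manvolnum>9</manvolnum>\n";
+print " <refmiscinfo class=\"version\">" . get_kernel_version() . "</refmiscinfo>\n";
 print "</refmeta>\n";
 print "<refnamediv>\n";
 print " <refname>".$args{'type'}." ".$args{'struct'}."</refname>\n";
@@ -752,6 +780,7 @@ sub output_enum_xml(%) {
 print "<refmeta>\n";
 print " <refentrytitle><phrase>enum ".$args{'enum'}."</phrase></refentrytitle>\n";
 print " <manvolnum>9</manvolnum>\n";
+print " <refmiscinfo class=\"version\">" . get_kernel_version() . "</refmiscinfo>\n";
 print "</refmeta>\n";
 print "<refnamediv>\n";
 print " <refname>enum ".$args{'enum'}."</refname>\n";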

Re: [PATCH 0/4] i386 - pte update optimizations

2007-04-13 Thread Keir Fraser

On 13/4/07 03:24, "Zachary Amsden" <[EMAIL PROTECTED]> wrote:

>> You do know that P6 and higher don't do locked bus references as long
>> as the value is in the cache, right?
> 
> Yes.  Even then, last time I clocked instructions, xchg was still slower
> than read / write, although I could be misremembering.  And it's not
> totally clear that they will always be in cached state, however, and for
> SMP, we still want to drop the implicit lock in cases where the
> processor might not know they are cached exclusive, but we know there
> are no other racing users.  And there are plenty of old processors out
> there to still make it worthwhile.

LOCKed instructions suck really badly on the netburst microarchitecture (like
factor of 10x, or not far off). I think it's probably because of their side
effect of serialising memory accesses, causing horrible pipeline stalls.
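
The two update styles being compared can be sketched with C11 atomics: an atomic exchange (which on x86 typically becomes xchg with a memory operand, implicitly locked) versus a separate load and store that is only safe when the caller knows nobody else can touch the entry. This is a user-space illustration; sketch_pte_t and both function names are hypothetical, and which instructions a compiler actually emits is not guaranteed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef _Atomic(uint32_t) sketch_pte_t;

/* Safe against racing updaters: one atomic read-modify-write. */
uint32_t sketch_pte_get_and_clear(sketch_pte_t *pte)
{
	assert(pte != NULL);
	return atomic_exchange(pte, 0);
}

/*
 * Cheaper variant: a plain load followed by a plain store.  Only correct
 * if the caller guarantees there are no other racing users of *pte.
 */
uint32_t sketch_pte_get_and_clear_exclusive(sketch_pte_t *pte)
{
	uint32_t old;

	assert(pte != NULL);
	old = atomic_load_explicit(pte, memory_order_relaxed);
	atomic_store_explicit(pte, 0, memory_order_relaxed);
	return old;
}
```

Both return the old value and leave the entry cleared; the difference is purely whether an update from another CPU between the load and the store could be lost.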

 -- Keir


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Christoph Hellwig

On Fri, Apr 13, 2007 at 06:25:46PM +1000, Nick Piggin wrote:
> But at least make it into its own module with a debugfs interface or
> something. I mean, exposing a PG_name-to-nr and page count pfn and flags
> as a supposedly formal proc interface doesn't sound nice to me. Page
> flags does not tell you what is going on in the VM, it gives you a tiny
> window into "something". Between reading a /proc/pid/ pfn and finding
> the pfn's page flags, it could be used for something completely different
> anyway.

I agree that exposing numerical values of page flags is not a very good
idea at all.  If we really want to expose this information it should
at least be in string form, although that is quite a bit of a maintenance
horror as well.
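
A string-form export might look roughly like the sketch below, which turns a flags word into names joined by '|' so that userspace never depends on bit positions. The name table and its bit assignments are assumptions for illustration only, and bits beyond the table are silently ignored here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative table: index == bit number (assumed, not a stable ABI). */
static const char *const sketch_flag_names[] = {
	"locked", "error", "referenced", "uptodate", "dirty", "lru",
};

/*
 * Write the names of the set flags into buf, separated by '|'.
 * Returns 0 on success, -1 if buf is too small.
 */
int sketch_format_page_flags(unsigned long flags, char *buf, size_t len)
{
	size_t nflags = sizeof(sketch_flag_names) / sizeof(sketch_flag_names[0]);
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t bit = 0; bit < nflags; bit++) {
		int n;

		if (!(flags & (1UL << bit)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", sketch_flag_names[bit]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	return 0;
}
```

The maintenance cost mentioned above shows up as the table itself: every flag that is added, renamed or removed means touching it, and userspace still has to cope with names it has never seen.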


Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug

2007-04-13 Thread Gautham R Shenoy

On Thu, Apr 12, 2007 at 08:00:04PM +0400, Oleg Nesterov wrote:
> On 04/12, Srivatsa Vaddagiri wrote:
> >
> > On Tue, Apr 03, 2007 at 10:48:20PM +0530, Srivatsa Vaddagiri wrote:
> > > > Actually, we should do this before destroy_workqueue() calls 
> > > > flush_workqueue().
> > > > Otherwise flush_cpu_workqueue() can hang forever in a similar manner.
> > > 
> > > Yep. I guess these are a class of freezer deadlocks very similar to vfork
> > > parent waiting on child case. I get a feeling these should become common
> > > outside of kthread too (A waits on B for something, B gets frozen, which
> > > means A won't freeze causing freezer to fail). Can freezer detect this
> > > dependency somehow and thaw B automatically? Probably not that easy ..
> > 
> > I wonder if there is some value in "enforcing" an order in which
> > processes get frozen i.e freeze A first before B. That may solve the
> > deadlocks we have been discussing wrt kthread_stop and flush_workqueue
> > as well.
> 
> Perhaps we can add "atomic_t xxx" to task_struct.
> 
> 	int freezing(struct task_struct *p)
> 	{
> 		return test_tsk_thread_flag(p, TIF_FREEZE)
> 			&& atomic_read(&p->xxx) == 0;
> 	}
> 
> 	void xxx_start(struct task_struct *p)
> 	{
> 		atomic_inc(&p->xxx);
> 		thaw_process(p);
> 	}
> 
> 	void xxx_end(struct task_struct *p)
> 	{
> 		atomic_dec(&p->xxx);
> 	}
> 
> Now,
> 
>   xxx_start(p);
>   ... wait for something which depends on p...
>   xxx_end(p);
> 
> Of course we need other changes, freeze_process() should check ->xxx, etc.
> I am not sure this makes sense.

I think this is racy.
Say the task which does xxx_start(p) [let's call it task q] is not
freezeable, and task p is already frozen when q calls xxx_start(). Then we
might be in a situation where

- The freezer has declared the whole system to be frozen. Hence the thread
  issuing the 'freeze' (suspend/hotplug) can go ahead and do whatever it
  wants to.

- Task 'p', which was supposed to be frozen, is now running and
  *surprise* we have a thread running on a cpu which ain't there any more!

I feel we need some kind of a global rwlock.


DEFINE_RWLOCK(freezer_status_lock);

int xxx_start(struct task_struct *p)
{
	int ret;

	ret = read_trylock(&freezer_status_lock);
	if (ret)
		/* We've succeeded. So let's thaw the chap. */
		thaw_process(p);
	/*
	 * If we've failed to acquire the trylock, that means the freezer
	 * doesn't depend on us. So let's simply wait without thawing p.
	 */
	return ret;
}


void xxx_end(int state)
{
	if (state)
		read_unlock(&freezer_status_lock);
}


int try_to_freeze_tasks()
{
	do {
		/* The regular freezer code */

		if (!todo && !write_trylock(&freezer_status_lock))
			todo = 1;
		/*
		 * Somebody holds the read lock, so go round again.  When
		 * the freezer goes back, it will find task 'p' woken up
		 * and hence wait for it to get frozen.
		 */
	} while (todo);
}

void try_to_thaw_tasks()
{
	.
	.
	.
	write_unlock(&freezer_status_lock);
}


int state = xxx_start(p);
... wait for something which depends on p...
xxx_end(state);

However, now that I look at it again, I see that it will fail in our case
where we might need to nest the try_to_freeze_tasks call.

Hmm, we don't have a rwlock variant that allows multiple writers, now do
we?!
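
The trylock protocol and the nesting problem can be played through in user
space. This is only a model under stated assumptions: toy_rwlock and its
helpers are hypothetical stand-ins built on C11 atomics, not the kernel's
rwlock_t, and only the trylock paths are modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy trylock-only reader/writer lock standing in for the proposed
 * freezer_status_lock: state > 0 counts readers (tasks waiting on someone),
 * state == -1 means the writer (the freezer) holds it.
 */
struct toy_rwlock {
	atomic_int state;
};

static bool toy_read_trylock(struct toy_rwlock *l)
{
	int old = atomic_load(&l->state);

	while (old >= 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&l->state, &old, old + 1))
			return true;
	}
	return false;	/* the writer holds the lock */
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	int prev = atomic_fetch_sub(&l->state, 1);

	assert(prev > 0);
	(void)prev;
}

static bool toy_write_trylock(struct toy_rwlock *l)
{
	int expected = 0;

	/* Succeeds only when there is no reader and no other writer. */
	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	int prev = atomic_exchange(&l->state, 0);

	assert(prev == -1);
	(void)prev;
}
```

While a reader holds the lock, toy_write_trylock() fails, so the freezer has
to make another pass. Once the writer holds it, toy_read_trylock() fails and
the waiter leaves p frozen. A second toy_write_trylock() also fails, which is
the nesting problem: a nested freeze cannot take the lock again.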


> 
> Oleg.
> 

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

Re: [PATCH 0/30] Use menuconfig objects

2007-04-13 Thread Jean Delvare

Jan,

On Tue, 10 Apr 2007 21:17:40 +0200 (MEST), Jan Engelhardt wrote:
> the following patch series turns some menus into menuconfigs, so they 
> can be disabled whilst "walking" through the parent menu (check the 
> videos [1], [2] to see what I mean), allowing lots of options to be 
> disabled _quickly_.
> 
> I'll send the patches (as a reply to this mail) piece by piece out
> when I figure out the maintainers to Cc for each.
> 
> [1] 6.04 MB(1000s) 70 sec http://jengelh.hopto.org/mc/without.ogg
> [2] 1.96 MB(1000s) 44 sec http://jengelh.hopto.org/mc/with.ogg

You might want to try the -q flag of quilt push and quilt pop.

> I have not poked on all subsystems (it is quite a lot), but I can 
> already give a glimpse (`ls -1`) of who gets one and who does not.
> And, I am not completely finished changing some subsystems - but
> the patches are already big enough for a first wave. I want to get
> some feedback first. Thanks!
> 
> `ls -1`:
>   menuconfig-acpi.diff
>   menuconfig-apm.diff
>   menuconfig-ata.diff
>   menuconfig-block.diff
>   menuconfig-connector.diff
>   menuconfig-crypto.diff
>   menuconfig-crypto2.diff
>   menuconfig-dccp.diff
>   menuconfig-fusion.diff
>   menuconfig-i2c.diff
>   menuconfig-i2o.diff
>   menuconfig-ide.diff
>   menuconfig-ieee1394.diff
>   menuconfig-md.diff
>   menuconfig-modules.diff
>   menuconfig-mtd.diff
>   menuconfig-net-ipvs.diff
>   menuconfig-net-sctp.diff
>   menuconfig-net-tipc.diff
>   menuconfig-netdev-arcnet.diff
>   menuconfig-netdev-phy.diff
>   menuconfig-netdev-tr.diff
>   menuconfig-netdev.diff
>   menuconfig-oldcd.diff
>   menuconfig-parport.diff
>   menuconfig-pcihotplug.diff
>   menuconfig-pcmcia.diff
>   menuconfig-pnp.diff
>   menuconfig-scsi.diff
>   menuconfig-w1.diff

As far as I can see, the hwmon subsystem would benefit from that too.

-- 
Jean Delvare

Re: [linux-pm] [PATCH][RFC] Kill off legacy power management stuff.

2007-04-13 Thread David Brownell

On Friday 13 April 2007 1:22 am, Rafael J. Wysocki wrote:
> [appropriate CCs added]
> 
> On Friday, 13 April 2007 02:33, Robert P. J. Day wrote:
> > 
> > just something i threw together, not in final form, but it represents
> > tossing the legacy PM stuff.  at the moment, the menuconfig entry for
> > PM_LEGACY lists it as "DEPRECATED", while the help screen calls it
> > "obsolete."  that's a good sign that it's getting close to the time
> > for it to go, and the removal is fairly straightforward, but there's
> > no mention of its removal in the feature removal schedule file.
> 
> It's been like this for a long long time.  I think you're right that it can be
> dropped, but I don't know the details (eg. why it hasn't been dropped yet).

I was just thinking about this the other day.  I did an inventory of
the actual _users_ of this code when I added the deprecation to Kconfig,
and the only driver that would catch the notification was for some
old m68k platform serial driver (Amiga?) ... that won't have changed.

So the only reason not to remove it at that time was to make sure that
folk had a chance to squawk about it going away.  I think it's past
time for this ancient stuff to vanish, since it's been obsolete for
most of 2.5..2.6 and there's only that one event consumer left.

Go for it!

- Dave


Re: [PATCH 2/7] [RFC] Common power driver for Linux gadgets

2007-04-13 Thread Anton Vorontsov

On Fri, Apr 13, 2007 at 01:42:36AM -0700, David Brownell wrote:
> On Friday 13 April 2007 12:36 am, Anton Vorontsov wrote:
> > Hello David,
> > > 
> > >  - The API needs to say *how much power* can be drawn...
> > > 
> > >  - Sensing VBUS power is not the same thing as being allowed to consume
> > >    it.  Again, USB OTG devices are different:  OTG hosts **SUPPLY** the
> > >    current, ...
> > 
> > I see. The devices I currently have just consume power from the USB host,
> > i.e. they are able to boot using only USB and a discharged battery. :-/
> > Moreover, the HP iPaq hx4700 is able to charge its battery from USB (using
> > that driver). We just set the USB charging GPIO, and it starts consuming
> > power from the host.
> 
> Sounds like it could be more power than the host expects it to consume;
> or in some cases, not as much as it could ... ISTR only the Ethernet
> gadget defaults to 100mA (in non-OTG configs), the others are set for
> only 2 mA (expecting self-powered configs).
> 
> 
> > But I got the point, and yes I can't explain why it works correctly.
> 
> It probably doesn't work correctly.  But it's not broken enough to
> fail badly.

Can that comment be an explanation?

--- drivers/usb/gadget/pxa2xx_udc.c:
static const struct usb_gadget_ops pxa2xx_udc_ops = {
	.get_frame	= pxa2xx_udc_get_frame,
	.wakeup		= pxa2xx_udc_wakeup,
	.vbus_session	= pxa2xx_udc_vbus_session,
	.pullup		= pxa2xx_udc_pullup,

	// .vbus_draw ... boards may consume current from VBUS, up to
	// 100-500mA based on config.  the 500uA suspend ceiling means
	// that exclusively vbus-powered PXA designs violate USB specs.
};


Comparing to omap_udc.

--- drivers/usb/gadget/omap_udc.c
static int omap_vbus_draw(struct usb_gadget *gadget, unsigned mA)
{
	struct omap_udc *udc;

	udc = container_of(gadget, struct omap_udc, gadget);
	if (udc->transceiver)
		return otg_set_power(udc->transceiver, mA);
	return -EOPNOTSUPP;
}
[...]
static struct usb_gadget_ops omap_gadget_ops = {
	.get_frame	= omap_get_frame,
	.wakeup		= omap_wakeup,
	.set_selfpowered = omap_set_selfpowered,
	.vbus_session	= omap_vbus_session,
	.vbus_draw	= omap_vbus_draw,
	.pullup		= omap_pullup,
};



Regarding the API: if all you want is to know how much power may be drawn
from VBUS, we can extend the external power interface so that suppliers can
state their power consumption requirements in mA/uA. Those requests would be
forwarded to the power supply driver, and the power driver would forward
them to the USB transceiver (via a platform hook).

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2

Re: 2.6.21-rc5: swsusp: Not enough free memory

2007-04-13 Thread Jiri Slaby

On 4/12/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

On Wednesday, 11 April 2007 17:02, Jiri Slaby wrote:
> Rafael J. Wysocki napsal(a):
> > On Wednesday, 11 April 2007 12:45, Jiri Slaby wrote:
> >> Rafael J. Wysocki napsal(a):
> >>> On Wednesday, 11 April 2007 09:36, Jiri Slaby wrote:
>  Rafael J. Wysocki napsal(a):
> > On Monday, 9 April 2007 22:07, Jiri Slaby wrote:
> >> I have bad news for you :(. I thought I had an unpatched kernel, but it
> >> happens in -rc6 too.
> > I guess you mean you're still seeing the 'not enough memory to suspend'
> > problem?
>  Yes:
>  Disabling non-boot CPUs ...
>  kvm: disabling virtualization on CPU1
>  Breaking affinity for irq 9
>  CPU 1 is now offline
>  SMP alternatives: switching to UP code
>  CPU1 is down
>  swsusp: critical section:
>  swsusp: Need to copy 158309 pages
>  swsusp: Not enough free memory
>  Error -12 suspending
>  Enabling non-boot CPUs ...
>  SMP alternatives: switching to SMP code
>  Booting processor 1/2 APIC 0x1
>  Initializing CPU#1
> >>> How reproducible is it?  I'm going to try to reproduce it on one of my
> >>> boxes.
> >> My tip is one of three cases: after some work on fresh boot -- some
> >> consumers such as thunderbird, firefox, 10 or so terminals with
> >> gnome-session. Single xterm + gnome-session seems not to be a problem.
> >
> > Does the workaround with setting the image size below 1/2 of RAM work for
> > you?
>
> Yes. Yesterday I had to set the value to 350M -- 400M was not enough.

Well, I can't reproduce it.

Can you please try to reproduce it with the appended patch applied and send
the output of dmesg to me?

Greetings,
Rafael

---
 kernel/power/snapshot.c |4 ++--
 kernel/power/swsusp.c   |   16 
 2 files changed, 14 insertions(+), 6 deletions(-)

Shrinking memory...  Pages needed: 128103 normal, 0 highmem
Pages needed: 125226 normal, 0 highmem
Pages needed: -5757 normal, 0 highmem
Pages needed: -5757 normal, 0 highmem
Pages needed: -5757 normal, 0 highmem
Pages needed: -5757
Pages needed: 127953 normal, 0 highmem
Pages needed: 125076 normal, 0 highmem
Pages needed: -6043 normal, 0 highmem
Pages needed: -6043 normal, 0 highmem
Pages needed: -6043 normal, 0 highmem
Pages needed: -6043
done (200 pages freed)
Freed 800 kbytes in 0.16 seconds (5.00 MB/s)
Suspending console(s)
...
CPU1 is down
swsusp: critical section:
swsusp: Need to copy 131358 pages
swsusp: Normal pages needed: 131358
swsusp: Normal pages needed: 131358 + 1024 + 22, available pages: 130607
swsusp: Not enough free memory
Error -12 suspending
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1
Not responding.
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
kvm: disabling virtualization on CPU1
Error taking CPU1 up: -5
PCI: Setting latency timer of device :00:01.0 to 64

Please note the CPU#1 bring up problem too. Whole dmesg at:
http://www.fi.muni.cz/~xslaby/sklad/dmesg.nomem
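
The failing check can be redone by hand from the "Normal pages needed" line
in the log: 131358 + 1024 + 22 = 132404 pages wanted against 130607
available, so the snapshot is 1797 pages short. A small C sketch of that
arithmetic follows. The function and parameter names are mine, the check is
assumed to be a plain comparison of the sum against the available count, and
the 4 KiB page size is inferred from "200 pages freed ... 800 kbytes".

```c
#include <assert.h>

/* Assumed page size in KiB, inferred from the log (200 pages == 800 kbytes). */
#define ASSUMED_PAGE_KIB 4UL

/*
 * Number of pages missing for the snapshot, or 0 if 'available' covers the
 * pages to copy plus the two extra terms printed in the log.
 */
static unsigned long snapshot_shortfall(unsigned long to_copy,
					unsigned long io_reserve,
					unsigned long metadata,
					unsigned long available)
{
	unsigned long needed = to_copy + io_reserve + metadata;

	assert(needed >= to_copy);	/* no wrap-around */
	return needed > available ? needed - available : 0;
}

/* Convert a page count to KiB using the assumed page size. */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages * ASSUMED_PAGE_KIB;
}
```

With the figures above, snapshot_shortfall(131358, 1024, 22, 130607) gives
1797 pages, or 7188 KiB, which is roughly 7 MB of memory missing at the point
where the error is reported.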

regards,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

fastcalls, was Re: [patch] generic rwsems

2007-04-13 Thread Christoph Hellwig

On Fri, Apr 13, 2007 at 12:04:16PM +0200, Nick Piggin wrote:
> Remove one level of indirection (kernel/rwsem.c -> lib/rwsem.c), and
> give a bit of a cleanup (eg remove the fastcall junk) to make the
> code a bit easier to read.

Apropos fastcalls, now that -mregparm=3 is the default on i386 and
FASTCALL/fastcall is a noop everywhere else, can we please get rid
of this attribute entirely?  If any other architecture provides
a more optimal non-standard calling convention we can just use
it by default in kernelspace as a few architectures already do.

Re: [PATCH 2/7] [RFC] Common power driver for Linux gadgets

2007-04-13 Thread David Brownell

On Friday 13 April 2007 2:52 am, Anton Vorontsov wrote:
> > > But I got the point, and yes I can't explain why it works correctly.
> > 
> > It probably doesn't work correctly.  But it's not broken enough to
> > fail badly.
> 
> Can that comment be an explanation?
> 
> --- drivers/usb/gadget/pxa2xx_udc.c:
> static const struct usb_gadget_ops pxa2xx_udc_ops = {
> 	.get_frame	= pxa2xx_udc_get_frame,
> 	.wakeup		= pxa2xx_udc_wakeup,
> 	.vbus_session	= pxa2xx_udc_vbus_session,
> 	.pullup		= pxa2xx_udc_pullup,
> 
> 	// .vbus_draw ... boards may consume current from VBUS, up to
> 	// 100-500mA based on config.  the 500uA suspend ceiling means
> 	// that exclusively vbus-powered PXA designs violate USB specs.

That's basically a "plug in implementation here".  Nobody's yet done
that on a platform that _uses_ the VBUS power.

> };
> 
> 
> Comparing to omap_udc.
> 
> --- drivers/usb/gadget/omap_udc.c
> static int omap_vbus_draw(struct usb_gadget *gadget, unsigned mA)
> {
> 	struct omap_udc *udc;
> 
> 	udc = container_of(gadget, struct omap_udc, gadget);
> 	if (udc->transceiver)
> 		return otg_set_power(udc->transceiver, mA);

Where the transceiver would then delegate to something else,
like the tps65010 driver.

> 	return -EOPNOTSUPP;
> }
> [...]
> static struct usb_gadget_ops omap_gadget_ops = {
> 	.get_frame	= omap_get_frame,
> 	.wakeup		= omap_wakeup,
> 	.set_selfpowered = omap_set_selfpowered,
> 	.vbus_session	= omap_vbus_session,
> 	.vbus_draw	= omap_vbus_draw,

... which has most certainly been on platforms which are hooked
up to draw power from VBUS.

> 	.pullup		= omap_pullup,
> };
> 
> 
> 
> Regarding API. If you all you want is to know how much power you need to
> ask from VBUS, we can extend external power interface... thus suppliers
> could ask their power consumption requirements in mA/uA, and these
> requests will be forwarded to power supply driver, and power driver will
> forward that request to USB transceiver (via platform hook).

I don't follow what you're saying.  The control flow *MUST* be that
the USB stack provides the only indication of how much power may
be drawn through the VBUS supply.  Nothing else in the system has
the knowledge of what's legal, and when.

If you want to talk about a "supplier", the way to put it might
then be that the USB stack is saying "here's N mA power for you";
it's supplying the power, not the other way around.

There's no "ask" involved, since the host controls "N".  So the
host supplies, the USB gadget stack interprets that, and some
power component must obey.  That includes rules like reducing
VBUS draw to ~500 uA when the host suspends the USB link.
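
The direction of that flow can be sketched as a function from link state to
a current budget that the power component then has to respect. This is a
simplified user-space model, not the gadget API: the enum, the function name
and the microamp return type are hypothetical, and the figures are the usual
USB 2.0 limits for a bus-powered device (100 mA before configuration, up to
500 mA as declared by the configuration, about 500 uA when suspended).

```c
#include <assert.h>

/* Hypothetical link states as seen by the gadget stack. */
enum usb_link_state {
	LINK_UNCONFIGURED,	/* attached, no configuration selected yet */
	LINK_CONFIGURED,	/* host has selected a configuration */
	LINK_SUSPENDED,		/* host has suspended the link */
};

/*
 * VBUS current budget in microamps.  'config_max_ma' is the draw declared
 * by the selected configuration and is only used in the configured state.
 */
static unsigned int vbus_budget_ua(enum usb_link_state state,
				   unsigned int config_max_ma)
{
	assert(config_max_ma <= 500);

	switch (state) {
	case LINK_CONFIGURED:
		return config_max_ma * 1000;
	case LINK_SUSPENDED:
		return 500;		/* suspend ceiling, simplified */
	case LINK_UNCONFIGURED:
	default:
		return 100 * 1000;	/* one unit load before configuration */
	}
}
```

The budget is computed from what the host has done and handed down; nothing
below the USB stack gets to request a value.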

- Dave


Re: [patch] generic rwsems

2007-04-13 Thread Nick Piggin

On Fri, Apr 13, 2007 at 12:04:16PM +0200, Nick Piggin wrote:
> OK, this patch is against 2.6.21-rc6 + Mathieu's atomic_long patches.
> 
> Last time this came up I was asked to get some numbers, so here are
> some in the changelog, captured with a simple kernel module tester.
> I got motivated again because of the MySQL/glibc/mmap_sem issue.
> 
> This patch converts all architectures to a generic rwsem implementation,
> which will compile down to the same code for i386, or powerpc, for
> example, and will allow some (eg. x86-64) to move away from spinlock
> based rwsems.

Oh, and it also converts powerpc and sparc64 to 64-bit counters, so
they can handle more than 32K tasks waiting (which was apparently a
real problem for SGI, and is probably a good thing).

But that reminds me:
> +/*
> + * the semaphore definition
> + */
> +struct rw_semaphore {
> +	atomic_long_t		count;
> +	spinlock_t		wait_lock;
> +	struct list_head	wait_list;
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +	struct lockdep_map dep_map;
> +#endif
> +};

I think I should put wait_lock after wait_list, so as to get a better
packing on most 64-bit architectures.

Re: [Feature Request?] Inline compression of process core dumps

2007-04-13 Thread Alan Cox

> I saw that too, and unfortunately I don't know what that condition 
> represents, either.  It's the only other element in that if statement 
> that could make it take that path, so I'm assuming that's part of the 
> problem.

Multiple mm's mean multiple threads with a different set of mappings,
which would fit for UML. Either way there should be a check for !pipe
before appending the pid

Re: HPA patches

2007-04-13 Thread Alan Cox

> Pondering about this, it's ATA_LBA according to the docs, specifying
> that the address is an LBA.

This is true for some commands, but not all. It gets used for other stuff
too.

Re: [patch] generic rwsems

2007-04-13 Thread Andi Kleen

On Friday 13 April 2007 12:04:16 Nick Piggin wrote:
> OK, this patch is against 2.6.21-rc6 + Mathieu's atomic_long patches.
> 
> Last time this came up I was asked to get some numbers, so here are
> some in the changelog, captured with a simple kernel module tester.
> I got motivated again because of the MySQL/glibc/mmap_sem issue.
> 
> This patch converts all architectures to a generic rwsem implementation,
> which will compile down to the same code for i386, or powerpc, for
> example, and will allow some (eg. x86-64) to move away from spinlock
> based rwsems.
> 
> Comments?

Fine for me from the x86-64 side. Some more validation with a test suite
would be good though.

-Andi

Re: fastcalls, was Re: [patch] generic rwsems

2007-04-13 Thread Nick Piggin

On Fri, Apr 13, 2007 at 11:19:30AM +0100, Christoph Hellwig wrote:
> On Fri, Apr 13, 2007 at 12:04:16PM +0200, Nick Piggin wrote:
> > Remove one level of indirection (kernel/rwsem.c -> lib/rwsem.c), and
> > give a bit of a cleanup (eg remove the fastcall junk) to make the
> > code a bit easier to read.
> 
> Apropos fastcalls, now that -mregparm=3 is the default on i386 and
> FASTCALL/fastcall is a noop everywhere else, can we please get rid
> of this attribute entirely?  If any other architecture provides
> a more optimal non-standard calling convention we can just use
> it by default in kernelspace as a few architectures already do.

I don't see why not. David objected to my removing them, I think because
the rwsem code called some of them from asm (but of course that's removed
too). However AFAIK, that situation should be using the asmlinkage attribute
anyway.

fastcall in kernel is annoying... it seems to imply that we normally
make rather slow calls ;)
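
For reference, a small sketch of what the two annotations amount to. The
macro names my_fastcall and my_asmlinkage are stand-ins, not the kernel's
definitions; the sketch assumes GCC, where regparm is an i386-only function
attribute, and expands the macros to nothing on every other target.

```c
#include <assert.h>

/*
 * On i386 with GCC, regparm(N) passes up to N integer arguments in
 * registers.  regparm(3) is what a "fastcall" style annotation asks for,
 * regparm(0) forces all arguments onto the stack, which is what code
 * called from assembly usually expects.
 */
#if defined(__GNUC__) && defined(__i386__)
#define my_fastcall	__attribute__((regparm(3)))
#define my_asmlinkage	__attribute__((regparm(0)))
#else
#define my_fastcall
#define my_asmlinkage
#endif

/* Arguments may travel in registers on i386. */
static my_fastcall long add3_fast(long a, long b, long c)
{
	return a + b + c;
}

/* Arguments always travel on the stack on i386. */
static my_asmlinkage long add3_stack(long a, long b, long c)
{
	return a + b + c;
}
```

Once the whole build uses -mregparm=3, the first annotation changes nothing,
while the second still matters for functions entered from assembly.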

Re: [patch] generic rwsems

2007-04-13 Thread Nick Piggin

On Fri, Apr 13, 2007 at 12:53:49PM +0200, Andi Kleen wrote:
> On Friday 13 April 2007 12:04:16 Nick Piggin wrote:
> > OK, this patch is against 2.6.21-rc6 + Mathieu's atomic_long patches.
> > 
> > Last time this came up I was asked to get some numbers, so here are
> > some in the changelog, captured with a simple kernel module tester.
> > I got motivated again because of the MySQL/glibc/mmap_sem issue.
> > 
> > This patch converts all architectures to a generic rwsem implementation,
> > which will compile down to the same code for i386, or powerpc, for
> > example, and will allow some (eg. x86-64) to move away from spinlock
> > based rwsems.
> > 
> > Comments?
> 
> Fine for me from the x86-64 side. Some more validation with a test suite
> would be good though.

David had a test suite somewhere, so I'll give that a run.


Re: [Feature Request?] Inline compression of process core dumps

2007-04-13 Thread Andi Kleen

Alan Cox <[EMAIL PROTECTED]> writes:

> > I saw that too, and unfortunately I don't know what that condition 
> > represents, either.  It's the only other element in that if statement 
> > that could make it take that path, so I'm assuming that's part of the 
> > problem.
> 
> Multiple mm's mean multiple threads with a different set of mappings,
> which would fit for UML. Either way there should be a check for !pipe
> before appending the pid

Here's a patch. It just doesn't do any formatting for the pipe case.

-Andi

Fix core to pipe for multithreaded processes

I also removed the BKL around format_corename because it seems unneeded.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3-test/fs/exec.c
===
--- linux-2.6.21-rc3-test.orig/fs/exec.c
+++ linux-2.6.21-rc3-test/fs/exec.c
@@ -1501,9 +1501,6 @@ int do_coredump(long signr, int exit_cod
 * lock_kernel() because format_corename() is controlled by sysctl, which
 * uses lock_kernel()
 */
-   lock_kernel();
-   format_corename(corename, core_pattern, signr);
-   unlock_kernel();
if (corename[0] == '|') {
/* SIGPIPE can happen, but it's just never processed */
if(call_usermodehelper_pipe(corename+1, NULL, NULL, &file)) {
@@ -1512,10 +1509,12 @@ int do_coredump(long signr, int exit_cod
goto fail_unlock;
}
ispipe = 1;
-   } else
+   } else {
+   format_corename(corename, core_pattern, signr);
file = filp_open(corename,
 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
 0600);
+   }
if (IS_ERR(file))
goto fail_unlock;
inode = file->f_path.dentry->d_inode;
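
The intended behaviour, leaving a pipe pattern alone and formatting only
real file names, can be modelled in user space. This is a simplified sketch
and not the kernel's format_corename(): the function name, the append_pid
parameter and the helper path in the usage note are hypothetical, and only
the "%p" specifier is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a core file name from 'pattern' into 'out'.
 *
 * A pattern starting with '|' names a helper program: it is copied through
 * unchanged and true is returned.  Otherwise "%p" is replaced by the pid,
 * and ".<pid>" is appended when 'append_pid' is set and the pattern had no
 * "%p".  Output that does not fit in 'len' bytes is truncated.
 */
static bool build_corename(const char *pattern, int pid, bool append_pid,
			   char *out, size_t len)
{
	bool pid_in_pattern = false;
	size_t used = 0;
	const char *p;

	assert(pattern != NULL && out != NULL && len > 0);
	out[0] = '\0';

	if (pattern[0] == '|') {
		snprintf(out, len, "%s", pattern);
		return true;		/* pipe: no formatting, no pid */
	}

	for (p = pattern; *p && used + 1 < len; p++) {
		if (p[0] == '%' && p[1] == 'p') {
			int n = snprintf(out + used, len - used, "%d", pid);

			if (n < 0 || (size_t)n >= len - used) {
				out[used] = '\0';
				break;
			}
			used += (size_t)n;
			pid_in_pattern = true;
			p++;		/* skip the 'p' */
		} else {
			out[used++] = *p;
			out[used] = '\0';
		}
	}

	if (append_pid && !pid_in_pattern)
		snprintf(out + used, len - used, ".%d", pid);
	return false;
}
```

For example, build_corename("core", 1234, true, ...) yields "core.1234",
while a pattern such as "|/usr/bin/helper" comes back untouched with a true
return value, so the caller knows to spawn the helper instead of opening a
file.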

Re: [PATCH 0/4] i386 - pte update optimizations

2007-04-13 Thread Andi Kleen

Keir Fraser <[EMAIL PROTECTED]> writes:

> On 13/4/07 03:24, "Zachary Amsden" <[EMAIL PROTECTED]> wrote:
> 
> >> You do know that P6 and higher don't do locked bus references as long
> >> as the value is in the cache, right?
> > 
> > Yes.  Even then, last time I clocked instructions, xchg was still slower
> > than read / write, although I could be misremembering.  And it's not
> > totally clear that they will always be in cached state, however, and for
> > SMP, we still want to drop the implicit lock in cases where the
> > processor might not know they are cached exclusive, but we know there
> > are no other racing users.  And there are plenty of old processors out
> > there to still make it worthwhile.
> 
> LOCKed instructions suck really badly on the netburst microarchitecture (like
> factor of 10x, or not far off). I think it's probably because of their side
> effect of serialising memory accesses, causing horrible pipeline stalls.

Unfortunately they tend to be HyperThreaded usually (except for early ones 
and Celerons) and need the LOCK anyways.

-Andi

PROBLEM: kernel BUG at mm/rmap.c:522!

2007-04-13 Thread Francesco Ricci

  First run the ver_linux script included as scripts/ver_linux, which
reports the version of some important subsystems.  Run this script with
the command "sh scripts/ver_linux".

-> umh... I cannot find this script

Use that information to fill in all fields of the bug report form, and
post it to the mailing list with a subject of "PROBLEM: " for easy
identification by the developers

[1.] One line summary of the problem:
kernel BUG at mm/rmap.c:522!

[2.] Full description of the problem/report:
alert message repeated on every terminal, Iceape (browser) crashed when
closing a tab (multitab browsing).

[3.] Keywords (i.e., modules, networking, kernel):
kernel bug

[4.] Kernel version (from /proc/version):
Linux version 2.6.18-4-686 (Debian 2.6.18.dfsg.1-12) ([EMAIL PROTECTED])
(gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Mon Mar
26 17:17:36 UTC 2007

[5.] Output of Oops.. message (if applicable) with symbolic information 
 resolved (see Documentation/oops-tracing.txt)
no oops

[6.] A small shell script or example program which triggers the
 problem (if possible)
n/a

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)
script not found

[7.2.] Processor information (from /proc/cpuinfo):
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping: 4
cpu MHz : 1996.778
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm up
bogomips: 3998.39

[7.3.] Module information (from /proc/modules):
ext3 119240 1 - Live 0xf0e33000
jbd 52456 1 ext3, Live 0xf0e01000
mbcache 8356 1 ext3, Live 0xf0cea000
mga 58176 2 - Live 0xf0cce000
drm 61332 3 mga, Live 0xf0c7c000
ppdev 8676 0 - Live 0xf0c64000
lp 11012 0 - Live 0xf0c6
button 6672 0 - Live 0xf0c5d000
ac 5188 0 - Live 0xf0c5a000
battery 9636 0 - Live 0xf0c56000
ipv6 226016 23 - Live 0xf0c95000
fuse 39828 1 - Live 0xf0b95000
dm_snapshot 15552 0 - Live 0xf0b9
dm_mirror 19152 0 - Live 0xf0a8a000
dm_mod 50232 2 dm_snapshot,dm_mirror, Live 0xf0ba
loop 15048 0 - Live 0xf0aba000
snd_via82xx 26008 0 - Live 0xf0ac3000
gameport 14632 1 snd_via82xx, Live 0xf0ab5000
snd_ac97_codec 83104 1 snd_via82xx, Live 0xf0ad1000
snd_ac97_bus 2400 1 snd_ac97_codec, Live 0xf09e
tsdev 7520 0 - Live 0xf0ab2000
snd_pcm_oss 38368 0 - Live 0xf0a95000
snd_mixer_oss 15200 1 snd_pcm_oss, Live 0xf0a9
snd_pcm 68676 3 snd_via82xx,snd_ac97_codec,snd_pcm_oss, Live 0xf0aa
snd_page_alloc 9640 2 snd_via82xx,snd_pcm, Live 0xf09e2000
snd_mpu401_uart 8064 1 snd_via82xx, Live 0xf0a5d000
snd_seq_dummy 3844 0 - Live 0xf09b9000
snd_seq_oss 28768 0 - Live 0xf0a47000
snd_seq_midi 8192 0 - Live 0xf09fb000
snd_seq_midi_event 7008 2 snd_seq_oss,snd_seq_midi, Live 0xf09ea000
i2c_viapro 8244 0 - Live 0xf09e6000
i2c_core 19680 1 i2c_viapro, Live 0xf0a41000
snd_seq 45680 6 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_seq_midi_event, Live 0xf0a5
via_ircc 23188 0 - Live 0xf0a2b000
snd_timer 20996 2 snd_pcm,snd_seq, Live 0xf0a3a000
snd_rawmidi 22560 2 snd_mpu401_uart,snd_seq_midi, Live 0xf0a33000
snd_seq_device 7820 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_seq,snd_rawmidi, Live 0xf09db000
irda 162588 1 via_ircc, Live 0xf0a61000
pcspkr 3072 0 - Live 0xf09de000
parport_pc 32132 1 - Live 0xf09bb000
parport 33256 3 ppdev,lp,parport_pc, Live 0xf0a21000
floppy 53156 0 - Live 0xf09ed000
via_agp 9664 1 - Live 0xf09b5000
psmouse 35016 0 - Live 0xf09d1000
serio_raw 6660 0 - Live 0xf0851000
snd 47012 11 snd_via82xx,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_mpu401_uart,snd_seq_oss,snd_seq,snd_timer,snd_rawmidi,snd_seq_device, Live 0xf09c4000
soundcore 9248 1 snd, Live 0xf08e
rtc 12372 0 - Live 0xf08db000
crc_ccitt 2240 1 irda, Live 0xf08ba000
shpchp 33024 0 - Live 0xf09ab000
pci_hotplug 28704 1 shpchp, Live 0xf08c2000
agpgart 29896 2 drm,via_agp, Live 0xf08cc000
evdev 9088 1 - Live 0xf08be000
reiserfs 212640 2 - Live 0xf08e5000
ide_cd 36064 0 - Live 0xf0894000
cdrom 32544 1 ide_cd, Live 0xf089f000
ide_disk 14848 6 - Live 0xf084c000
generic 5476 0 [permanent], Live 0xf084
via_rhine 22664 0 - Live 0xf088d000
mii 5344 1 via_rhine, Live 0xf0843000
ehci_hcd 28136 0 - Live 0xf0831000
uhci_hcd 21164 0 - Live 0xf0839000
via82cxxx 8388 0 [permanent], Live 0xf081d000
ide_core 110504 4 ide_cd,ide_disk,generic,via82cxxx, Live 0xf0871000
usbcore 112644 3 ehci_hcd,uhci_hcd, Live 0xf0854000
thermal 13608 0 - Live 0xf082c000
processor 28840 1 thermal, Live 0xf0823000
fan 4804 0 - Live 0xf0805000

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
cat /proc/ioports 
-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060

Re: [PATCH 0/4] i386 - pte update optimizations

2007-04-13 Thread Keir Fraser

On 13/4/07 13:27, "Andi Kleen" <[EMAIL PROTECTED]> wrote:

>> LOCKed instructions suck really badly on the netburst microarchitecture (like
>> factor of 10x, or not far off). I think it's probably because of their side
>> effect of serialising memory accesses, causing horrible pipeline stalls.
> 
> Unfortunately they tend to be HyperThreaded usually (except for early ones
> and Celerons) and need the LOCK anyways.

Fair point, although quite a few people disable HT.

 -- Keir

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/7] [RFC] Common power driver for Linux gadgets

2007-04-13 Thread Anton Vorontsov

On Fri, Apr 13, 2007 at 03:25:13AM -0700, David Brownell wrote:
> On Friday 13 April 2007 2:52 am, Anton Vorontsov wrote:
> > > > But I got the point, and yes I can't explain why it works correctly.
> > > 
> > > It probably doesn't work correctly.  But it's not broken enough to
> > > fail badly.
> > 
> > Can that comment be an explanation?
> > 
> > --- drivers/usb/gadget/pxa2xx_udc.c:
> > static const struct usb_gadget_ops pxa2xx_udc_ops = {
> > .get_frame  = pxa2xx_udc_get_frame,
> > .wakeup = pxa2xx_udc_wakeup,
> > .vbus_session   = pxa2xx_udc_vbus_session,
> > .pullup = pxa2xx_udc_pullup,
> > 
> > // .vbus_draw ... boards may consume current from VBUS, up to
> > // 100-500mA based on config.  the 500uA suspend ceiling means
> > // that exclusively vbus-powered PXA designs violate USB specs.
> 
> That's basically a "plug in implementation here".  Nobody's yet done
> that on a platform that _uses_ the VBUS power.
> 
> 
> > };
> > 
> > 
> > Comparing to omap_udc.
> > 
> > --- drivers/usb/gadget/omap_udc.c
> > static int omap_vbus_draw(struct usb_gadget *gadget, unsigned mA)
> > {
> > struct omap_udc *udc;
> > 
> > udc = container_of(gadget, struct omap_udc, gadget);
> > if (udc->transceiver)
> > return otg_set_power(udc->transceiver, mA);
> 
> Where the transceiver would then delegate to something else,
> like the tps65010 driver.
> 
> 
> > return -EOPNOTSUPP;
> > }
> > [...]
> > static struct usb_gadget_ops omap_gadget_ops = {
> > .get_frame  = omap_get_frame,
> > .wakeup = omap_wakeup,
> > .set_selfpowered= omap_set_selfpowered,
> > .vbus_session   = omap_vbus_session,
> > .vbus_draw  = omap_vbus_draw,
> 
>  which has most certainly been on platforms which are hooked
> up to draw power from VBUS.
> 
> 
> > .pullup = omap_pullup,
> > };
> > 
> > 
> > 
> > Regarding API. If all you want is to know how much power you need to
> > ask from VBUS, we can extend external power interface... thus suppliers
> > could ask their power consumption requirements in mA/uA, and these
> > requests will be forwarded to power supply driver, and power driver will
> > forward that request to USB transceiver (via platform hook).
> 
> I don't follow what you're saying.  The control flow *MUST* be that
> the USB stack provides the only indication of how much power may
> be drawn through the VBUS supply.  Nothing else in the system has
> the knowledge of what's legal, and when.
> 
> If you want to talk about a "supplier", the way to put it might
> then be that the USB stack is saying "here's N mA power for you";
> it's supplying the power, not the other way around.
> 
> There's no "ask" involved, since the host controls "N".  So the
> host supplies, the USB gadget stack interprets that, and some
> power component must obey.  That includes rules like reducing
> VBUS draw to ~500 uA when the host suspends the USB link.

This framework (pda_power) supports charging from USB, but it is only
a dispatcher that implements the logic. Issues like exactly how many mA
to draw, and how all that is communicated to the USB stack, are left to
specific devices and/or additional drivers that use the framework.

In this regard, "USB" is just a label; it could be replaced with
"alternative power source".

> - Dave
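
The control flow David describes (the host decides N, the gadget stack
interprets it, the power component obeys) can be sketched in userspace C.
This is only a model under stated assumptions: the enum, the function name
and the 100 mA pre-configuration figure are illustrative and are not taken
from pda_power, omap_udc or pxa2xx_udc.

```c
#include <assert.h>

/* Hypothetical link states as seen by the gadget stack. */
enum usb_link_state { USB_UNCONFIGURED, USB_CONFIGURED, USB_SUSPENDED };

/*
 * Return the VBUS current ceiling in microamps for a given link state.
 * config_mA is the draw allowed by the configuration the host selected.
 */
unsigned vbus_ceiling_uA(enum usb_link_state state, unsigned config_mA)
{
	assert(config_mA <= 500);	/* the thread mentions 100-500mA */

	switch (state) {
	case USB_SUSPENDED:
		return 500;		/* ~500 uA suspend ceiling */
	case USB_CONFIGURED:
		return config_mA * 1000;
	case USB_UNCONFIGURED:
	default:
		return 100 * 1000;	/* assumed budget before configuration */
	}
}
```

In such a model a charger driver never asks for current; it is only told
the ceiling each time the link state changes.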

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2

Re: 2.6.20.6 vanilla does't boot

2007-04-13 Thread Denis Kirjanov


I updated the BIOS to the latest version, but the problem persists.
The boot option pci=noacpi did not solve the problem. The BIOS bug
report disappears when setting pci=nommconf, but the kernel still does
not boot.

Re: [patch] generic rwsems

2007-04-13 Thread David Howells

Nick Piggin <[EMAIL PROTECTED]> wrote:

> I think I should put wait_lock after wait_list, so as to get a better
> packing on most 64-bit architectures.

It makes no difference.  struct lockdep_map contains at least one pointer and
so is going to be 8-byte aligned (assuming it's there at all).  struct
rw_semaphore contains at least one pointer/long, so it will be padded out to
8-byte size.

If you want to make a difference, you'd need to add __attribute__((packed))
but you would need to be very careful with that.
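
The padding argument can be checked with a small userspace sketch. The two
structs below are hypothetical stand-ins (a long counter, an int-sized lock
and a two-pointer list head), not the real struct rw_semaphore, and the
sketch assumes an ABI where long and pointers have the same size.

```c
#include <stddef.h>

/* Hypothetical model of a list head: two pointers. */
struct list_head_model {
	struct list_head_model *next, *prev;
};

/* The sketch assumes LP64 or ILP32; refuse to build elsewhere. */
_Static_assert(sizeof(long) == sizeof(void *),
	       "sketch assumes long and pointers have the same size");

/* wait_lock placed before wait_list */
struct rwsem_lock_first {
	long count;
	int wait_lock;
	struct list_head_model wait_list;
};

/* wait_lock placed after wait_list */
struct rwsem_lock_last {
	long count;
	struct list_head_model wait_list;
	int wait_lock;
};

size_t rwsem_size_lock_first(void)
{
	return sizeof(struct rwsem_lock_first);
}

size_t rwsem_size_lock_last(void)
{
	return sizeof(struct rwsem_lock_last);
}
```

On such an ABI both orderings come out the same size, because the struct is
padded to pointer alignment either way: the padding only moves from after
wait_lock to the end of the struct.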

David

ACPI problem revealed

2007-04-13 Thread Fausto Carvalho


Bill Gates once said:

http://antitrust.slated.org/www.iowaconsumercase.org/011607/3000/PX03020.pdf
--
Fausto Carvalho

Re: How should an exit routine wait for release() callbacks?

2007-04-13 Thread Markus Rechberger


Alan,

seems like you have the same problem as the dvb framework has/had.

http://mcentral.de/hg/~mrec/v4l-dvb-stable

The last 3 changesets do the trick and avoid the oops: they delay the
deinitialization of the device until the last user has closed the device node.


Markus

Cornelia Huck wrote:

On Thu, 12 Apr 2007 17:23:18 -0400 (EDT),
Alan Stern <[EMAIL PROTECTED]> wrote:

  

Here's a not-so-theoretical question.

I've got a module which registers a struct device.  (It represents a
virtual device, not a real one, but that doesn't matter.)  Obviously the
module's exit routine has to wait until the release() routine for that
device has been invoked -- if it returned too early then the release()
call would oops.

How should it wait?



Device lifetime vs. module lifetime - that's a fun one...

  
The most straightforward approach is to use a struct completion, like 
this:


static struct {
	struct device dev;
	...
} my_dev;

static DECLARE_COMPLETION(my_completion);

static void my_release(struct device *dev)
{
	complete(&my_completion);
}

static void __exit my_exit(void)
{
	device_unregister(&my_dev.dev);
	wait_for_completion(&my_completion);
}

The problem is that there is no guarantee a context switch won't take
place after my_release() has called complete() and before my_release()
returns.  If that happens and my_exit() finishes running, then the module
will be unloaded and the next context switch back to finish off
my_release() will oops.

Other approaches have similar defects.  So how can this problem be solved?



What I see a device driver can do now is the following:
- disallow module unloading (duh)
- move the release function outside the module

To make the completion approach work, the complete() would need to be
after the release function. This would imply an upper layer, but this
upper layer would need to access the completion structure in the
module...

One could think about an owner field (for getting/putting the module
reference) for the object (with a final module_put() after the release
function has been called). The problem there would be that it would
preclude unloading of the module if there isn't a "self destruct" knob
for the object.
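
The owner-field idea can be illustrated with a single-threaded userspace
model. Everything here is hypothetical (struct module_model, struct
object_model and the helper names are not kernel APIs); the only point is
the ordering: the module reference is dropped after release() has returned,
by code that lives outside the module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct module / struct device. */
struct module_model {
	int refcnt;		/* references that pin the module's code */
};

struct object_model {
	int refcnt;
	struct module_model *owner;
	void (*release)(struct object_model *obj);
};

int release_calls;		/* bookkeeping for the demo release() */

void demo_release(struct object_model *obj)
{
	(void)obj;
	release_calls++;
}

void object_init(struct object_model *obj, struct module_model *owner,
		 void (*release)(struct object_model *obj))
{
	obj->refcnt = 1;
	obj->owner = owner;
	obj->release = release;
	owner->refcnt++;	/* models taking a module reference */
}

void object_get(struct object_model *obj)
{
	obj->refcnt++;
}

void object_put(struct object_model *obj)
{
	struct module_model *owner = obj->owner;

	assert(obj->refcnt > 0);
	if (--obj->refcnt > 0)
		return;
	obj->release(obj);	/* may free obj, so owner was saved first */
	owner->refcnt--;	/* final put, after release() has returned */
}

int module_can_unload(const struct module_model *mod)
{
	return mod->refcnt == 0;
}
```

The model also shows the drawback mentioned above: as long as the object
exists, module_can_unload() stays false, so something other than module
unload has to destroy the object first.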




  



--
  |   AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System   |  Register Court Dresden: HRA 4896
Research  |  General Partner authorized to represent:
 Center   | AMD Saxony LLC (Wilmington, Delaware, US)
  | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy





Re: 2.6.21-rc5: swsusp: Not enough free memory

2007-04-13 Thread Rafael J. Wysocki

On Friday, 13 April 2007 12:14, Jiri Slaby wrote:
> On 4/12/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > On Wednesday, 11 April 2007 17:02, Jiri Slaby wrote:
> > > > Rafael J. Wysocki wrote:
> > > > > On Wednesday, 11 April 2007 12:45, Jiri Slaby wrote:
> > > >> Rafael J. Wysocki wrote:
> > > >>> On Wednesday, 11 April 2007 09:36, Jiri Slaby wrote:
> > >  Rafael J. Wysocki wrote:
> > > > On Monday, 9 April 2007 22:07, Jiri Slaby wrote:
> > > >> I have bad news for you :(. I thought I had an unpatched kernel, but it
> > > >> happens in -rc6 too.
> > > > I guess you mean you're still seeing the 'not enough memory to suspend'
> > > > problem?
> > >  Yes:
> > >  Disabling non-boot CPUs ...
> > >  kvm: disabling virtualization on CPU1
> > >  Breaking affinity for irq 9
> > >  CPU 1 is now offline
> > >  SMP alternatives: switching to UP code
> > >  CPU1 is down
> > >  swsusp: critical section:
> > >  swsusp: Need to copy 158309 pages
> > >  swsusp: Not enough free memory
> > >  Error -12 suspending
> > >  Enabling non-boot CPUs ...
> > >  SMP alternatives: switching to SMP code
> > >  Booting processor 1/2 APIC 0x1
> > >  Initializing CPU#1
> > > >>> How reproducible is it?  I'm going to try to reproduce it on one of 
> > > >>> my boxes.
> > > >> My guess is one in three cases: after some work on a fresh boot, with some
> > > >> consumers such as thunderbird, firefox, 10 or so terminals with
> > > >> gnome-session. A single xterm + gnome-session seems not to be a problem.
> > > >
> > > > Does the workaround of setting the image size below 1/2 of RAM work
> > > > for you?
> > >
> > > Yes. Yesterday I had to set the value to 350M -- 400M was not enough.
> >
> > Well, I can't reproduce it.
> >
> > Can you please try to reproduce it with the appended patch applied and send
> > the output of dmesg to me?
> >
> > Greetings,
> > Rafael
> >
> > ---
> >  kernel/power/snapshot.c |4 ++--
> >  kernel/power/swsusp.c   |   16 
> >  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> Shrinking memory...  Pages needed: 128103 normal, 0 highmem
> Pages needed: 125226 normal, 0 highmem
> Pages needed: -5757 normal, 0 highmem
> Pages needed: -5757 normal, 0 highmem
> Pages needed: -5757 normal, 0 highmem
> Pages needed: -5757
> Pages needed: 127953 normal, 0 highmem
> Pages needed: 125076 normal, 0 highmem
> Pages needed: -6043 normal, 0 highmem
> Pages needed: -6043 normal, 0 highmem
> Pages needed: -6043 normal, 0 highmem
> Pages needed: -6043
> done (200 pages freed)
> Freed 800 kbytes in 0.16 seconds (5.00 MB/s)
> Suspending console(s)
> ...
> CPU1 is down
> swsusp: critical section:
> swsusp: Need to copy 131358 pages
> swsusp: Normal pages needed: 131358
> swsusp: Normal pages needed: 131358 + 1024 + 22, available pages: 130607

Well, it looks like someone allocated about 6000 pages after we had freed
enough memory for suspending.

I suspect one of the device drivers plays some dirty tricks in its .suspend()
routine.  Now the question is which one.

I wonder if setting PAGES_FOR_IO in include/linux/suspend.h to eg. 8192 will
help.
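
For reference, the arithmetic behind the failing check in the log above can
be written out as a small userspace sketch. Reading the "+ 1024" term as
the I/O reserve (PAGES_FOR_IO) and the "+ 22" term as image metadata pages
is an interpretation of the log line, and the function names are made up
for illustration.

```c
#include <assert.h>

/*
 * Pages still missing for the snapshot: a positive result means the
 * "Not enough free memory" path is taken.
 */
long swsusp_shortfall(long to_copy, long io_reserve, long meta, long available)
{
	assert(to_copy >= 0 && io_reserve >= 0 && meta >= 0 && available >= 0);
	return to_copy + io_reserve + meta - available;
}

/* Convert a page count to kbytes for a given page size in bytes. */
long pages_to_kbytes(long pages, long page_size_bytes)
{
	assert(page_size_bytes >= 1024);
	return pages * (page_size_bytes / 1024);
}
```

With the numbers from the log, swsusp_shortfall(131358, 1024, 22, 130607)
gives 1797 pages, and pages_to_kbytes(200, 4096) gives the 800 kbytes
reported as freed.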

> swsusp: Not enough free memory
> Error -12 suspending
> Enabling non-boot CPUs ...
> SMP alternatives: switching to SMP code
> Booting processor 1/2 APIC 0x1
> Not responding.
> Inquiring remote APIC #1...
> ... APIC #1 ID: failed
> ... APIC #1 VERSION: failed
> ... APIC #1 SPIV: failed
> kvm: disabling virtualization on CPU1
> Error taking CPU1 up: -5
> PCI: Setting latency timer of device :00:01.0 to 64
> 
> Please note the CPU#1 bring up problem too.

Yes, and this seems to be related to the APIC.

Gautham, can you please tell me who's the right person to ask about it?

Rafael

Re: [patch 0/8] unprivileged mount syscall

2007-04-13 Thread Miklos Szeredi

> On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > > 1. clone the master namespace.
> > > 
> > > 2. in the new namespace
> > > 
> > >   move the tree under /share/$me to /
> > > for each ($user, $what, $how) {
> > > move /share/$user/$what to /$what
> > >   if ($how == slave) {
> > >  make the mount tree under /$what as slave
> > > }
> > > }
> > > 
> > > 3. in the new namespace make the tree under 
> > >/share as private and unmount /share
> > 
> > Thanks.  I get the basic idea now: the namespace itself need not be
> > shared between the sessions, it is enough if "share" propagation is
> > set up between the different namespaces of a user.
> > 
> > I don't yet see either in your or Viro's description how the trees
> > under /share/$USER are initialized.  I guess they are recursively
> > bound from /, and are made slaves.
> 
> yes. I suppose, when a userid is created one of the steps would be
> 
> mount --rbind / /share/$USER
> mount --make-rslave /share/$USER
> mount --make-rshared /share/$USER

Thinking a bit more about this, I'm quite sure most users wouldn't
even want private namespaces.  It would be enough to

  chroot /share/$USER

and be done with it.

Private namespaces are only good for keeping a bunch of mounts
referenced by a group of processes.  But my guess is, that the natural
behavior for users is to see a persistent set of mounts.

If for example they mount something on a remote machine, then log out
from the ssh session and later log back in, they would want to see
their previous mount still there.

Miklos

Re: RFC: initramfs unpack point and rules

2007-04-13 Thread Krzysztof Halasa

Al Viro <[EMAIL PROTECTED]> writes:

> Nope.  The point is to have it as early as possible, so that we had more
> or less normal environment when drivers, etc. are being initialized.

But traditionally the "normal environment" is a root fs not yet
mounted. Do the drivers need initramfs? Which drivers?

In both cases it's a bit before /sbin/init (or /init) is launched,
udevd isn't running and firmware can't be loaded.

OTOH when something goes wrong before console drivers are initialized,
nothing can be seen. And with initramfs (especially with the external
one) it's easy to screw something up.
-- 
Krzysztof Halasa

Re: [patch] generic rwsems

2007-04-13 Thread David Howells

Nick Piggin <[EMAIL PROTECTED]> wrote:

> This patch converts all architectures to a generic rwsem implementation,
> which will compile down to the same code for i386, or powerpc, for example,
> and will allow some (eg. x86-64) to move away from spinlock based rwsems.

Which are better on UP kernels because spinlocks degrade to nothing, and then
you're left with a single disable/enable interrupt pair per operation, and no
requirement for atomic ops at all.

What you propose may wind up with several per op because if the CPU does not
support atomic ops directly and cannot emulate them through other atomic ops,
then these have to be emulated by:

	atomic_op() {
		spin_lock_irqsave
		do op
		spin_unlock_irqrestore
	}
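
A userspace approximation of that emulation, as a sketch only: a C11
atomic_flag stands in for the spinlock (it needs nothing stronger than a
test-and-set), and the interrupt disabling that spin_lock_irqsave performs
has no userspace equivalent, so it appears only as a comment. The type and
function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical emulated atomic: a plain long guarded by a tiny spinlock. */
typedef struct {
	atomic_flag lock;
	long value;
} emu_atomic_long;

long emu_atomic_add_return(emu_atomic_long *v, long delta)
{
	long result;

	/* kernel: spin_lock_irqsave(), which would also disable interrupts */
	while (atomic_flag_test_and_set_explicit(&v->lock, memory_order_acquire))
		;		/* spin until the flag was previously clear */

	v->value += delta;	/* "do op" */
	result = v->value;

	/* kernel: spin_unlock_irqrestore() */
	atomic_flag_clear_explicit(&v->lock, memory_order_release);
	return result;
}
```

Every emulated operation pays for a lock and unlock pair, which is the cost
being pointed out: an rwsem built from several such operations pays it
several times per acquire or release.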

> Move to an architecture independent rwsem implementation, using the
> better of the two rwsem implementations

That's not necessarily the case, as I said above.

Furthermore, the spinlock implementation struct is smaller on 64-bit machines,
and is less prone to counter overrun on 32-bit machines.

> Out-of-line the fastpaths, to bring rw-semaphores into line with
> mutexes and spinlocks WRT our icache vs function call policy.

That should depend on whether you optimise for space or for speed.  Function
calls are relatively heavyweight.

Please re-inline and fix Ingo's mess if you must clean up.  Take the i386
version, for instance, I'd made it so that the compiler didn't know it was
taking a function call when it went down the slow path, thus meaning the
compiler didn't have to deal with that.  Furthermore, it only interpolated two
or three instructions into the calling code in the fastpath.  It's a real shame
that gcc inline asm doesn't allow you to use status flags as boolean returns,
otherwise I could reduce that even further.

> Spinlock based rwsems are inferior to atomic based ones on most
> architectures that can do atomic ops without spinlocks:

Note the "most" in your statement...

Think about it.  This algorithm is only optimal where XADD is available.  If
you don't have XADD, but you do have LL/SC or CMPXCHG, you can do better.

If the only atomic op you have is XCHG, then this is a really poor choice;
similarly if you are using a UP-compiled kernel.
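
One possible reading of the XADD versus CMPXCHG remark, sketched with C11
atomics in userspace. The count encoding (readers add 1, a writer adds a
large negative bias) and both function names are assumptions made for the
illustration; neither function is the kernel's rwsem code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encoding: count < 0 means a writer holds the lock. */
#define WRITER_BIAS (-0x10000L)

/* XADD style: add unconditionally, then undo if a writer was there. */
bool read_trylock_xadd(atomic_long *count)
{
	long old = atomic_fetch_add_explicit(count, 1, memory_order_acquire);

	if (old >= 0)
		return true;
	atomic_fetch_sub_explicit(count, 1, memory_order_relaxed);
	return false;
}

/* CMPXCHG (or LL/SC) style: only modify the count when it is allowed. */
bool read_trylock_cas(atomic_long *count)
{
	long old = atomic_load_explicit(count, memory_order_relaxed);

	while (old >= 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(count, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}
```

The compare-and-exchange variant never writes a value it later has to take
back, which is one way an architecture without XADD could avoid simply
imitating the XADD algorithm.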

David

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Ananth N Mavinakayanahalli

On Fri, Apr 13, 2007 at 12:50:20PM +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> >On Fri, 13 Apr 2007 12:18:56 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
> >
> >
> >>>I guess one could generate an answer to the static question with systemtap,
> >>>by accumulating running counts across the application lifetime and then
> >>>snapshotting them.  Sounds hard though.
> >>
> >>Can't you just traverse arbitrary kernel data structures at a given point
> >>in time, exactly like the /proc/ call is doing?
> >
> >
> >Do a full pagetable walk, with all the associated locking from within
> >a systemtap script?  I'd be surprised.  Maybe if it's mostly hand-coded
> >in C, perhaps.
> 
> It looks like you can traverse arbitrary data structures, yes.
> 
> It definitely seems like you can use some kernel functions, but the
> ones I saw may just be systemtap facilities. But what is so surprising
> about being able to call a kernel function when running in kernel
> context? Perhaps there is some fundamental limitation of kprobes that
> I don't understand.

The main requirement for kprobes handlers is that they can't sleep. You
could definitely call a kernel function from kprobe handlers as long as
the function doesn't sleep.

Ananth

Re: [patch] generic rwsems

2007-04-13 Thread Christoph Hellwig

On Fri, Apr 13, 2007 at 12:44:50PM +0100, David Howells wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > I think I should put wait_lock after wait_list, so as to get a better
> > packing on most 64-bit architectures.
> 
> It makes no difference.  struct lockdep_map contains at least one pointer and
> so is going to be 8-byte aligned (assuming it's there at all).  struct
> rw_semaphore contains at least one pointer/long, so it will be padded out to
> 8-byte size.

I hope people are not going to enable lockdep on their production systems :)


[patch] dont force uclinux mtd map to be root dev

2007-04-13 Thread Mike Frysinger

the cheesy uclinux mtd maps can be used for more than just the root device, so 
i think we should drop the forcing.  also, i feel like this is a policy 
decision that shouldnt be in the kernel in the first place.  people who have 
been lazy and boot with uclinux mtd maps and dont put root= into their 
commandline can simply add the appropriate root= line either into their 
bootloader or into the compiled in bootargs

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
---
diff --git a/drivers/mtd/maps/uclinux.c b/drivers/mtd/maps/uclinux.c
index 389fea2..14ffb1a 100644
--- a/drivers/mtd/maps/uclinux.c
+++ b/drivers/mtd/maps/uclinux.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -89,10 +88,6 @@ int __init uclinux_mtd_init(void)
uclinux_ram_mtdinfo = mtd;
add_mtd_partitions(mtd, uclinux_romfs, NUM_PARTITIONS);
 
-   printk("uclinux[mtd]: set %s to be root filesystem\n",
-   uclinux_romfs[0].name);
-   ROOT_DEV = MKDEV(MTD_BLOCK_MAJOR, 0);
-
return(0);
 }
 



Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Ananth N Mavinakayanahalli

On Fri, Apr 13, 2007 at 12:54:36PM +1000, Nick Piggin wrote:
> Matt Mackall wrote:
> >On Fri, Apr 13, 2007 at 12:21:25PM +1000, Nick Piggin wrote:
> >
> >>Matt Mackall wrote:
> >>
> >>>On Fri, Apr 13, 2007 at 11:42:29AM +1000, Nick Piggin wrote:
> >>
> If kprobes is simply crappy and doesn't work properly for this, then I
> could accept that. I'm not someone trying to get this info. So why can't
> it be used? (not just for kpagemap, but for clear_refs and all that gunk
> too).
> >>>
> >>>
> >>>kprobes is good for looking at events, but bad for looking at state.
> >>>Especially metric shitloads of state.
> >>
> >>Why? Why is a kprobes trap significantly more expensive than a read
> >>syscall?
> >
> >
> >I guess I'm not clear on what you're proposing. From my understanding
> >of kprobes (admittedly not an expert), this is hard to do and not a
> >very good match.
> 
> But you have an idea that it is bad for exposing lots of data. Why?
> (I'm not a kprobes expert either, these are not rhetorical questions)

You could tie your kprobe module to use relay channels. Kprobe handlers
run lockless and using the per-cpu relay channels will provide a fast
transport mechanism for exposing lots of data.

http://relayfs.sourceforge.net/examples.html#tprintk_kprobes is an
example using the earlier relayfs interface. It shouldn't be that hard
to change it to use the newer relay stuff.

AFAIK acme is using a similar mechanism for ctracer
(http://oops.ghostprotocols.net:81/blog/?p=50)

Ananth

Re: 2.6.21-rc5: swsusp: Not enough free memory

2007-04-13 Thread Nigel Cunningham

Hi.

On Fri, 2007-04-13 at 14:00 +0200, Rafael J. Wysocki wrote:
> > 
> > Shrinking memory...  Pages needed: 128103 normal, 0 highmem
> > Pages needed: 125226 normal, 0 highmem
> > Pages needed: -5757 normal, 0 highmem
> > Pages needed: -5757 normal, 0 highmem
> > Pages needed: -5757 normal, 0 highmem
> > Pages needed: -5757
> > Pages needed: 127953 normal, 0 highmem
> > Pages needed: 125076 normal, 0 highmem
> > Pages needed: -6043 normal, 0 highmem
> > Pages needed: -6043 normal, 0 highmem
> > Pages needed: -6043 normal, 0 highmem
> > Pages needed: -6043
> > done (200 pages freed)
> > Freed 800 kbytes in 0.16 seconds (5.00 MB/s)
> > Suspending console(s)
> > ...
> > CPU1 is down
> > swsusp: critical section:
> > swsusp: Need to copy 131358 pages
> > swsusp: Normal pages needed: 131358
> > swsusp: Normal pages needed: 131358 + 1024 + 22, available pages: 130607
> 
> Well, it looks like someone allocated about 6000 pages after we had freed
> enough memory for suspending.

We have a tunable allowance in Suspend2 for this, because fglrx
allocates a lot of pages in its suspend routine if DRI is enabled. I
think some other drivers do too, but fglrx is the main one I know.

Nigel




Re: [CRYPTO] is it really optimized ?

2007-04-13 Thread Helge Hafting


Francis Moreau wrote:

Hi,

After reading the crypto code and trying to implement an AES driver,
I'm wondering if the current implementation is optimal. My plan is to
use the AES driver _exclusively_ to encrypt filesystems, by using
eCryptfs for example.

But it seems that because the current implementation of the crypto
core allows the drivers to be accessed by any part of the kernel at
any time, the AES driver is forced to do extra work for each block
ciphering, mainly: (a) set the key in the AES controller, (b)
generate the decryption key if in decrypt mode.

So is this interpretation right ? If so wouldn't it be appropriate to
introduce a mechanism to reserve this AES hardware for a special
purpose (filesystem encryptions) and thus make it as fast as possible
?


Would this really help?
When reading/writing files, most of the time is i/o-wait, isn't it?

Reserving the device exclusively seems excessive. How about
a quick test to see if someone else has been using it since
the last time your crypto-fs used it?  If nobody else used it, then
you don't need to reset the key and so on.

If nobody else is using the AES controller, you get the same speed
as with a reservation.  If something else is using AES then it won't be
as fast, but then the AES controller has been used for other useful
work as well.  Other parts of the kernel surely won't use it just for
fun. :-)
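
The "quick test" could look roughly like the following userspace model.
The struct and function names are invented for illustration and are not
part of the kernel crypto API; a real driver would also need locking
around the check and would have to invalidate the cached owner whenever
that user changes its key.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an AES engine that remembers its last user. */
struct aes_hw_model {
	const void *last_user;	/* context whose key is currently loaded */
	unsigned key_loads;	/* how often the key had to be programmed */
};

/*
 * Prepare the engine for one request from "user".
 * Returns 1 if the key had to be (re)programmed, 0 if it was still loaded.
 */
int aes_hw_prepare(struct aes_hw_model *hw, const void *user)
{
	assert(hw != NULL && user != NULL);

	if (hw->last_user == user)
		return 0;	/* nobody else used the engine in between */

	hw->last_user = user;
	hw->key_loads++;	/* models setting the key in the controller */
	return 1;
}
```

If the filesystem is the only user, the key is programmed once and every
later request takes the fast path, which matches the point that a
reservation would not be any faster in that case.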


Helge Hafting



Re: [patch] dont force uclinux mtd map to be root dev

2007-04-13 Thread Christoph Hellwig

On Fri, Apr 13, 2007 at 08:19:31AM -0400, Mike Frysinger wrote:
> the cheesy uclinux mtd maps can be used for more than just the root device, so
> i think we should drop the forcing.  also, i feel like this is a policy
> decision that shouldnt be in the kernel in the first place.  people who have 
> been lazy and boot with uclinux mtd maps and dont put root= into their 
> commandline can simply add the appropriate root= line either into their 
> bootloader or into the compiled in bootargs

Agreed.  I remember trying to get rid of this a while ago, but I can't
remember what happened to it.


[PATCH] apm: Fix incorrect comment

2007-04-13 Thread Alan Cox

HZ has not been fixed at 100 for some time.

Signed-off-by: Alan Cox <[EMAIL PROTECTED]>

diff -u --exclude-from /usr/src/exclude --new-file --recursive linux.vanilla-2.6.21-rc6-mm1/arch/i386/kernel/apm.c linux-2.6.21-rc6-mm1/arch/i386/kernel/apm.c
--- linux.vanilla-2.6.21-rc6-mm1/arch/i386/kernel/apm.c 2007-04-12 14:15:03.0 +0100
+++ linux-2.6.21-rc6-mm1/arch/i386/kernel/apm.c 2007-04-12 14:27:23.0 +0100
@@ -1173,7 +1173,7 @@
unsigned long flags;
 
spin_lock_irqsave(&i8253_lock, flags);
-   /* set the clock to 100 Hz */
+   /* set the clock to HZ */
outb_p(0x34, PIT_MODE); /* binary, mode 2, LSB/MSB, ch 0 */
udelay(10);
outb_p(LATCH & 0xff, PIT_CH0);  /* LSB */
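
As background for the comment change, here is a userspace sketch of how
the reload value written to the PIT depends on HZ. The tick rate constant
is the commonly cited PIT input frequency (about 1.193182 MHz) and is an
assumption about this tree; the helper names are made up for the example.

```c
#include <assert.h>

/* Assumed PIT input clock in Hz; the exact constant may differ per tree. */
#define PIT_TICK_RATE 1193182UL

/* Reload value for a timer interrupt rate of hz, rounded to nearest. */
unsigned long pit_latch(unsigned long hz)
{
	assert(hz > 0);
	return (PIT_TICK_RATE + hz / 2) / hz;
}

/* The value goes out as two bytes, low byte first. */
unsigned char pit_latch_lsb(unsigned long hz)
{
	return (unsigned char)(pit_latch(hz) & 0xff);
}

unsigned char pit_latch_msb(unsigned long hz)
{
	return (unsigned char)((pit_latch(hz) >> 8) & 0xff);
}
```

Under that assumption HZ=100 gives a reload value of 11932, while HZ=250
and HZ=1000 give 4773 and 1193, so the old "100 Hz" comment only described
one of the possible configurations.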

[PATCH] aacraid: 2.6.21-rc6-mm1 aacraid not finding device

2007-04-13 Thread Salyzyn, Mark

Thanks to Steve Fox and Duane Cox for their help investigating this
issue; I'd like to report that we found the problem. The patch Steve Fox
isolated below did not accommodate older adapters properly: it issued a
command they do not support when retrieving storage parameters about the
arrays. This simple patch resolves the problem (and more accurately mimics
the logic of the original code before the patch).

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's
handling of patches.

This attached patch is against current scsi-misc-2.6 and can apply to
2.6.21-rc6-mm1. Please consider it for expedited inclusion.

Signed-off-by: Mark Salyzyn <[EMAIL PROTECTED]>

---

Sincerely -- Mark Salyzyn

> -Original Message-
> From: Steve Fox [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, April 10, 2007 6:21 PM
> To: Andrew Morton
> Cc: [EMAIL PROTECTED]; Salyzyn, Mark; 
> [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
> [EMAIL PROTECTED]
> Subject: Re: 2.6.21-rc6-mm1 aacraid not finding device
> 
> 
> On Sun, 2007-04-08 at 14:35 -0700, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/

Since 2.6.21-rc5-mm1, one of the test.kernel.org machines (elm3b239) has
not been able to boot because it cannot find the SCSI device. You can
view http://test.kernel.org/abat/82623/debug/console.log for the latest
boot log (rc6-mm1).

I tracked this down to the git-scsi-misc patch in the -mm tree and then
bisected the scsi-misc git tree until I reached the commit below from
Mark Salyzyn:

fe76df4235986cfacc2d3b71cef7c42bc1a6dd6c

[SCSI] aacraid: Fix blocking issue with container probing function (cast update)

This is a pretty big patch, so hopefully Mark can take a look at it.
lspci shows

01:02.0 RAID bus controller: Adaptec AAC-RAID (rev 02)
0f:02.0 SCSI storage controller: Adaptec AIC-9410W SAS (Razor ASIC non-RAID) (rev 08)
1d:02.0 SCSI storage controller: Adaptec AIC-9410W SAS (Razor ASIC non-RAID) (rev 08)
2b:02.0 SCSI storage controller: Adaptec AIC-9410W SAS (Razor ASIC non-RAID) (rev 08)

on 2.6.21-rc6. Let me know if I can provide more details.

-- 

Steve Fox
IBM Linux Technology Center

aacraid_logical_bugfix.patch
Description: aacraid_logical_bugfix.patch

Re: [Feature Request?] Inline compression of process core dumps

2007-04-13 Thread Alan Cox

> Looking at the code, it seems to me that format_corename() is appending 
> .pid, regardless if !core_uses_pid and corename[0]=='|', in which case 
> it creates an invalid path for call_usermodehelper_pipe().
> 
> Bug in the code, or bug in my methods?

This looks somewhat better and might do the trick. Also fixes a very very
obscure security corner case. If you change core pattern to start with
the program name then the user can run a program called "|myevilhack" as
it stands. The patch checks for "|" in the pattern not the output and
doesn't nail a pid on to a piped name.

Signed-off-by: Alan Cox <[EMAIL PROTECTED]>

diff -u --exclude-from /usr/src/exclude --new-file --recursive linux.vanilla-2.6.21-rc6-mm1/fs/exec.c linux-2.6.21-rc6-mm1/fs/exec.c
--- linux.vanilla-2.6.21-rc6-mm1/fs/exec.c  2007-04-12 14:15:05.0 +0100
+++ linux-2.6.21-rc6-mm1/fs/exec.c  2007-04-13 13:11:20.709835952 +0100
@@ -1265,13 +1265,17 @@
  * name into corename, which must have space for at least
  * CORENAME_MAX_SIZE bytes plus one byte for the zero terminator.
  */
-static void format_corename(char *corename, const char *pattern, long signr)
+static int format_corename(char *corename, const char *pattern, long signr)
 {
const char *pat_ptr = pattern;
char *out_ptr = corename;
char *const out_end = corename + CORENAME_MAX_SIZE;
int rc;
int pid_in_pattern = 0;
+   int ispipe = 0;
+   
+   if (*pattern == '|')
+   ispipe = 1;
 
/* Repeat as long as we have more pattern to process and more output
   space */
@@ -1362,8 +1366,8 @@
 *
 * If core_pattern does not include a %p (as is the default)
 * and core_uses_pid is set, then .%pid will be appended to
-* the filename */
-   if (!pid_in_pattern
+* the filename. Do not do this for piped commands. */
+   if (!ispipe && !pid_in_pattern
 && (core_uses_pid || atomic_read(&current->mm->mm_users) != 1)) {
rc = snprintf(out_ptr, out_end - out_ptr,
  ".%d", current->tgid);
@@ -1371,8 +1375,9 @@
goto out;
out_ptr += rc;
}
-  out:
+out:
*out_ptr = 0;
+   return ispipe;
 }
 
 static void zap_process(struct task_struct *start)
@@ -1523,16 +1528,15 @@
 * uses lock_kernel()
 */
lock_kernel();
-   format_corename(corename, core_pattern, signr);
+   ispipe = format_corename(corename, core_pattern, signr);
unlock_kernel();
-   if (corename[0] == '|') {
+   if (ispipe) {
/* SIGPIPE can happen, but it's just never processed */
if(call_usermodehelper_pipe(corename+1, NULL, NULL, &file)) {
printk(KERN_INFO "Core dump to %s pipe failed\n",
   corename);
goto fail_unlock;
}
-   ispipe = 1;
} else
file = filp_open(corename,
 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] generic rwsems

2007-04-13 Thread Nick Piggin

David, you keep saying the same things and don't listen to me.

On Fri, Apr 13, 2007 at 01:09:42PM +0100, David Howells wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > This patch converts all architectures to a generic rwsem implementation,
> > which will compile down to the same code for i386, or powerpc, for
> > example,
> 
> > and will allow some (eg. x86-64) to move away from spinlock based rwsems.
> 
> Which are better on UP kernels because spinlocks degrade to nothing, and then
> you're left with a single disable/enable interrupt pair per operation, and no
> requirement for atomic ops at all.

On UP, if an IRQ disable/enable pair operation is _faster_ than the atomic
op, then the architecture can and should implement atomic ops on UP by
doing exactly that.

> What you propose may wind up with several per op because if the CPU does not
> support atomic ops directly and cannot emulate them through other atomic ops,
> then these have to be emulated by:
> 
>   atomic_op() {
>   spin_lock_irqsave
>   do op
>   spin_unlock_irqrestore
>   }

Yes, this is the case on our 2 premier SMP powerhouse architectures,
sparc32 and parisc.

The other way happens to be better for everyone else, which is why I
think your suggestion to instead move everyone to the spinlock version
was weird.

Finally: as I said, even for those 2 architectures, this may not be so
bad because it uses a different spinlock for the fastpath than it does
for the slowpath. So your uncontested, cache cold case will get a little
slower, but the contested case could improve a lot (eg. I saw an order of
magnitude improvement).

> > Move to an architecture independent rwsem implementation, using the
> > better of the two rwsem implementations
> 
> That's not necessarily the case, as I said above.
> 
> Furthermore, the spinlock implementation struct is smaller on 64-bit machines,
> and is less prone to counter overrun on 32-bit machines.

I think 64-bit machines will be happy to take the extra word if they
have double the single threaded performance, quadruple the parallel read
performance, and 10 times the contested read performance.

32-bit machines might indeed overflow, but if it hasn't been a problem
for i386 or (even 64-bit) powerpc yet, is it a real worry? 

> > Out-of-line the fastpaths, to bring rw-semaphores into line with
> > mutexes and spinlocks WRT our icache vs function call policy.
> 
> That should depend on whether you optimise for space or for speed.  Function
> calls are relatively heavyweight.

This is what spinlocks and mutexes do, and they're much more common than
rwsems. I'm just trying to make it consistent, and you can muck around
with it all you want after that. It is actually very easy to inline
things now, unlike before my patch.

> Please re-inline and fix Ingo's mess if you must clean up.  Take the i386
> version, for instance, I'd made it so that the compiler didn't know it was
> taking a function call when it went down the slow path, thus meaning the
> compiler didn't have to deal with that.  Furthermore, it only interpolated two
> or three instructions into the calling code in the fastpath.  It's a real
> shame that gcc inline asm doesn't allow you to use status flags as boolean
> returns, otherwise I could reduce that even further.

I cleaned it.

> > Spinlock based rwsems are inferior to atomic based ones on most
> > architectures that can do atomic ops without spinlocks:
> 
> Note the "most" in your statement...
> 
> Think about it.  This algorithm is only optimal where XADD is available.  If
> you don't have XADD, but you do have LL/SC or CMPXCHG, you can do better.

You keep saying this too, and I have thought about it but I couldn't think
of a much better way. I'm not saying you're wrong, but why don't you just
tell me what that better way is?

> If the only atomic op you have is XCHG, then this is a really poor choice;

What is better? Spinlocks? Considering that only 2 dead archs really care
at this stage, and that I have good reason to believe the contested case
will be improved, why don't you come up with some numbers to prove me
wrong?

> similarly if you are using a UP-compiled kernel.

Then your UP-compiled kernel's atomic ops are suboptimal, not the rwsem
implementation.

Anyway, thanks for taking the time again. If you would please address each
of my points, then we might finally be able to stop having this discussion
every 6 months ;)

Nick


Re: PROBLEM: kernel BUG at mm/rmap.c:522!

2007-04-13 Thread Hugh Dickins

On Fri, 13 Apr 2007, Francesco Ricci wrote:
> 
> [7.7.] Other information that might be relevant to the problem
>(please look in /proc and include all information that you
>think to be relevant):

Thanks for your report, and for your patience in supplying all that
scarcely relevant info you were asked for.

> 
> from dmesg:
> Apr 13 11:31:35 localhost kernel: Bad pte = 461b43d4, process = ???,
> vm_flags = 75, vaddr = b7868000

Oh dear, one of your page tables has got corrupted, the page table
entry for virtual address b7868000 contains 461b43d4 - nonsense.

> Apr 13 11:31:35 localhost kernel:  [] vm_normal_page+0x3e/0x53
> Apr 13 11:31:35 localhost kernel:  [] unmap_vmas+0x183/0x4af
> Apr 13 11:31:35 localhost kernel:  [] exit_mmap+0x6a/0xd7
> Apr 13 11:31:35 localhost kernel:  [] mmput+0x20/0x76
> Apr 13 11:31:35 localhost kernel:  [] do_exit+0x193/0x71b
> Apr 13 11:31:35 localhost kernel:  [] sys_exit_group+0x0/0xd
> Apr 13 11:31:35 localhost kernel:  []
> get_signal_to_deliver+0x395/0x3bc
> Apr 13 11:31:35 localhost kernel:  [] do_notify_resume+0x71/0x5d7
> Apr 13 11:31:35 localhost kernel:  []
> default_wake_function+0x0/0xc
> Apr 13 11:31:35 localhost kernel:  [] do_gettimeofday+0x31/0xce
> Apr 13 11:31:35 localhost kernel:  [] sys_futex+0xdc/0xf1
> Apr 13 11:31:35 localhost kernel:  [] work_notifysig+0x13/0x19
> Apr 13 11:31:35 localhost kernel: [ cut here ]
> Apr 13 11:31:35 localhost kernel: kernel BUG at mm/rmap.c:522!
> Apr 13 11:31:35 localhost kernel: invalid opcode:  [#1]
> Apr 13 11:31:35 localhost kernel: SMP 
> Apr 13 11:31:35 localhost kernel: Modules linked in: smbfs ext3 jbd
> mbcache mga drm ppdev lp button ac battery ipv6 fuse dm_snapshot dm_mirror
> dm_mod loop tsdev snd_via82xx gameport snd_ac97_codec snd_ac97_bus
> snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_mpu401_uart
> snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq
> i2c_viapro i2c_core snd_timer snd_rawmidi snd_seq_device via_agp
> parport_pc parport via_ircc psmouse serio_raw floppy snd soundcore pcspkr
> rtc agpgart shpchp pci_hotplug irda crc_ccitt evdev reiserfs ide_cd cdrom
> ide_disk generic via_rhine mii ehci_hcd uhci_hcd via82cxxx ide_core
> usbcore thermal processor fan
> Apr 13 11:31:35 localhost kernel: CPU:0
> Apr 13 11:31:35 localhost kernel: EIP:0060:[]Not tainted
> VLI
> Apr 13 11:31:35 localhost kernel: EFLAGS: 00210286   (2.6.18-4-686 #1) 
> Apr 13 11:31:35 localhost kernel: EIP is at page_remove_rmap+0x14/0x2d
> Apr 13 11:31:35 localhost kernel: eax:    ebx: c100   ecx:
> c100   edx: 
> Apr 13 11:31:35 localhost kernel: esi: b7869000   edi:    ebp:
> d08011a4   esp: c9be9e14
> Apr 13 11:31:35 localhost kernel: ds: 007b   es: 007b   ss: 0068
> Apr 13 11:31:35 localhost kernel: Process iceape-bin (pid: 20500,
> ti=c9be8000 task=c2128aa0 task.ti=c9be8000)
> Apr 13 11:31:35 localhost kernel: Stack: c014b7d5  df10d278
> c9be9e7c  0001 b7884000 c4bc7b78 
> Apr 13 11:31:35 localhost kernel:e80fbe40 c16058a0 
> ffe2 c121002c c4bc7b78 0011ed07 b7884000 
> Apr 13 11:31:35 localhost kernel: c9be9e7c df4c7710
> e80fbe40 c9be9eb8 c014de31  c9be9e78 

And the next page table entry, for virtual address b7869000, has also
got corrupted: I'm rather guessing, but I believe the c100 implies
it's looking at pfn 0, so the page table entry in question would be
that 0001 seen on the stack (1 for present).

That by itself would suggest a single-bit error, which would point
you to running memtest86 overnight to check your RAM.  Worth a try.

But the 461b43d4 before it suggests corruption from elsewhere in
the kernel: sorry, I've no clue on that.  Just wait and see if this
happens again, and whether a pattern emerges - unless someone else
can suggest something better.

Hugh

> Apr 13 11:31:35 localhost kernel: Call Trace:
> Apr 13 11:31:35 localhost kernel:  [] unmap_vmas+0x25e/0x4af
> Apr 13 11:31:35 localhost kernel:  [] exit_mmap+0x6a/0xd7
> Apr 13 11:31:35 localhost kernel:  [] mmput+0x20/0x76
> Apr 13 11:31:35 localhost kernel:  [] do_exit+0x193/0x71b
> Apr 13 11:31:35 localhost kernel:  [] sys_exit_group+0x0/0xd
> Apr 13 11:31:35 localhost kernel:  []
> get_signal_to_deliver+0x395/0x3bc
> Apr 13 11:31:35 localhost kernel:  [] do_notify_resume+0x71/0x5d7
> Apr 13 11:31:35 localhost kernel:  []
> default_wake_function+0x0/0xc
> Apr 13 11:31:35 localhost kernel:  [] do_gettimeofday+0x31/0xce
> Apr 13 11:31:35 localhost kernel:  [] sys_futex+0xdc/0xf1
> Apr 13 11:31:35 localhost kernel:  [] work_notifysig+0x13/0x19
> Apr 13 11:31:35 localhost kernel: Code: ff ff 85 c0 89 c6 75 c9 b0 01 86
> 43 28 83 c4 20 89 e8 5b 5e 5f 5d c3 89 c1 90 83 40 08 ff 0f 98 c0 84 c0 74
> 1e 8b 41 08 40 79 08 <0f> 0b 0a 02 40 ab 29 c0 8b 51 10 89 c8 83 f2 01 83
> e2 01 e9 97 
> Apr 13 11:31:35 localhost kernel: EIP: []
> page_remove_rmap+0x14/0x2d SS:ESP 0068:c9be9e14
> A

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-13 Thread Nick Piggin


Ananth N Mavinakayanahalli wrote:

On Fri, Apr 13, 2007 at 12:50:20PM +1000, Nick Piggin wrote:



It definitely seems like you can use some kernel functions, but the
ones I saw may just be systemtap facilities. But what is so surprising
about being able to call a kernel function when running in kernel
context? Perhaps there is some fundamental limitation of kprobes that
I don't understand.



The main requirement for kprobes handlers is that they can't sleep. You
could definitely call a kernel function from kprobe handlers as long as
the function doesn't sleep.


That would be enough to access basically all the VM data structures.

--
SUSE Labs, Novell Inc.

[2.6.20.4] BUG: dentry xattrs still in use in shrink_dcache_for_umount() with reiserfs

2007-04-13 Thread Andrea Righi

I can reproduce the problem when unmounting my /var (reiserfs), but it doesn't
occur with /usr or /opt, which are reiserfs too.

It seems very similar to this issue: 
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc3/2.6.18-rc3-mm2/hot-fixes/reiserfs-make-sure-all-dentries-refs-are-released-before-calling-kill_block_super-try-2.patch

How can xattrs->d_count be 1 if the dentry is explicitly released
in reiserfs_kill_sb(), before calling kill_block_super()?

(config attached)

-Andrea

BUG: Dentry dfcd2570{i=21bc,n=xattrs} still in use (1) [unmount of reiserfs dm-4]
[ cut here ]
kernel BUG at fs/dcache.c:623!
invalid opcode:  [#1]
PREEMPT
Modules linked in: fuse hdaps ipt_TCPMSS xt_tcpudp ipt_LOG xt_limit 
cpufreq_ondemand cpufreq_powersave cpufreq_userspace speedstep_centrino 
ibm_acpi backlight button battery ac snd_pcm_oss snd_mixer_oss snd_seq 
snd_seq_device xt_state ipt_REJECT usbhid iptable_mangle iptable_nat nf_nat 
iptable_filter nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables x_tables 
hci_usb bluetooth ehci_hcd uhci_hcd usbcore snd_intel8x0 snd_ac97_codec 
ac97_bus snd_pcm snd_timer snd_page_alloc tg3 reiserfs dm_snapshot dm_mod fan 
thermal processor sr_mod cdrom ata_piix ahci
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010286   (2.6.20.4 #8)
EIP is at shrink_dcache_for_umount_subtree+0x236/0x270
eax: 0055   ebx: dfcd25c8   ecx:    edx: 
esi: dfcd2570   edi: f5afad24   ebp: 21bc   esp: f6bd9ed0
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 6659, ti=f6bd8000 task=f6374a70 task.ti=f6bd8000)
Stack: c038f488 dfcd2570 21bc dfcd25c8 0001 f8bee08b f5afad24 f5afac00
   f5afac00 f6bd9f40 dfff4ac0 c0173de5 f8bf92e0 f5afac00 c0164208 c1a32c80
   f5afac00 c1a32c80 f5afac00 c0164325 f5afac00 f8bf92a0 c0164545 
Call Trace:
 [] shrink_dcache_for_umount+0x25/0x50
 [] generic_shutdown_super+0x18/0x110
 [] kill_block_super+0x25/0x40
 [] deactivate_super+0x55/0x80
 [] sys_umount+0x4a/0x210
 [] unmap_region+0xb9/0x120
 [] do_page_fault+0x327/0x650
 [] sys_oldumount+0x15/0x20
 [] sysenter_past_esp+0x5d/0x81
 ===
Code: 8b 00 74 03 8b 6a 20 89 7c 24 18 89 4c 24 10 89 5c 24 0c 89 6c 24 08 89 
74 24 04 89 44 24 14 c7 04 24 88 f4 38 c0 e8 ea 87 fa ff <0f> 0b eb fe 0f 0b eb 
fe 83 c4 1c 5b 5e 5f 5d e9 16 bf 1c 00 e8
EIP: [] shrink_dcache_for_umount_subtree+0x236/0x270 SS:ESP 
0068:f6bd9ed0




config.gz
Description: GNU Zip compressed data

[PATCH 1/3] fix kthread_create() vs freezer theoretical race

2007-04-13 Thread Oleg Nesterov

kthread() sleeps in TASK_INTERRUPTIBLE state waiting for the first wakeup.
In theory, this wakeup may come from freeze_process()->signal_wake_up(),
so the task can disappear even before kthread_create() sets its ->comm.

Change kthread() to use TASK_UNINTERRUPTIBLE.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5/kernel/kthread.c~0_FREEZER   2007-04-13 14:52:44.0 +0400
+++ 2.6.21-rc5/kernel/kthread.c 2007-04-13 15:36:43.0 +0400
@@ -70,7 +70,7 @@ static int kthread(void *_create)
data = create->data;
 
/* OK, tell user we're spawned, wait for stop or wakeup */
-   __set_current_state(TASK_INTERRUPTIBLE);
+   __set_current_state(TASK_UNINTERRUPTIBLE);
complete(&create->started);
schedule();
 
@@ -174,7 +174,7 @@ EXPORT_SYMBOL(kthread_create);
  */
 void kthread_bind(struct task_struct *k, unsigned int cpu)
 {
-   BUG_ON(k->state != TASK_INTERRUPTIBLE);
+   BUG_ON(k->state != TASK_UNINTERRUPTIBLE);
/* Must have done schedule() in kthread() before we set_task_cpu */
wait_task_inactive(k);
set_task_cpu(k, cpu);


[PATCH 2/3] make kthread_create() more scalable

2007-04-13 Thread Oleg Nesterov

If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
complete(&create->started) + schedule(). After that it can't be woken because
nobody can see the new task yet. This means:

- we don't need tasklist_lock for find_task_by_pid().

- create_kthread() doesn't need to wait for create->started. Instead,
  kthread_create() first waits for create->created to get the result of
  kernel_thread(), then waits for create->started to synchronize with
  kthread().
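
The two-stage handshake described in the list above can be modelled in userspace roughly as follows. This is a pthreads sketch under stated assumptions, not the kernel code: the "completion" is rebuilt from a mutex and a condition variable, sketch_kthreadd() stands in for kthreadd, and the caller joins the worker at the end only so that the on-stack create structure stays valid. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal stand-in for a kernel completion. */
struct sketch_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int done;
};

static void sketch_init_completion(struct sketch_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void sketch_complete(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void sketch_wait_for_completion(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

struct sketch_create_info {
    int (*threadfn)(void *data);
    void *data;
    struct sketch_completion created;   /* spawn attempt finished */
    struct sketch_completion started;   /* new thread is running */
    int result;                         /* 0 or negative error */
    pthread_t worker;
};

/* The new thread: copy what it needs, announce itself, then run. */
static void *sketch_kthread(void *arg)
{
    struct sketch_create_info *create = arg;
    int (*threadfn)(void *) = create->threadfn;
    void *data = create->data;

    sketch_complete(&create->started);
    threadfn(data);
    return NULL;
}

/* Stand-in for kthreadd: spawn, report the result, do not wait further. */
static void *sketch_kthreadd(void *arg)
{
    struct sketch_create_info *create = arg;
    int err = pthread_create(&create->worker, NULL, sketch_kthread, create);

    create->result = -err;
    sketch_complete(&create->created);
    return NULL;
}

/* Caller side: wait for "created" first, then for "started". */
static int sketch_kthread_run_once(int (*threadfn)(void *), void *data)
{
    struct sketch_create_info create = { .threadfn = threadfn, .data = data };
    pthread_t spawner;
    int err;

    assert(threadfn != NULL);
    sketch_init_completion(&create.created);
    sketch_init_completion(&create.started);

    err = pthread_create(&spawner, NULL, sketch_kthreadd, &create);
    if (err != 0)
        return -err;

    sketch_wait_for_completion(&create.created);
    pthread_join(spawner, NULL);
    if (create.result < 0)
        return create.result;           /* spawn failed: nothing to wait for */

    sketch_wait_for_completion(&create.started);
    pthread_join(create.worker, NULL);  /* sketch only: keeps create alive */
    return 0;
}

/* Example thread function used to exercise the sketch. */
static int sketch_set_flag(void *data)
{
    *(int *)data = 1;
    return 0;
}
```

The point of the split is that the spawner never blocks on the new thread. Only the original caller waits for "started", so the spawner is free to service the next request.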

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5/kernel/kthread.c~1_CREATE 2007-04-13 14:39:21.0 +0400
+++ 2.6.21-rc5/kernel/kthread.c 2007-04-13 14:52:44.0 +0400
@@ -24,11 +24,11 @@ struct kthread_create_info
/* Information passed to kthread() from kthreadd. */
int (*threadfn)(void *data);
void *data;
+   struct completion created;
struct completion started;
 
/* Result passed back to kthread_create() from kthreadd. */
-   struct task_struct *result;
-   struct completion done;
+   pid_t result;
 
struct list_head list;
 };
@@ -91,15 +91,9 @@ static void create_kthread(struct kthrea
 
/* We want our own signal handler (we take no signals by default). */
pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
-   if (pid < 0) {
-   create->result = ERR_PTR(pid);
-   } else {
-   wait_for_completion(&create->started);
-   read_lock(&tasklist_lock);
-   create->result = find_task_by_pid(pid);
-   read_unlock(&tasklist_lock);
-   }
-   complete(&create->done);
+   create->result = pid;
+
+   complete(&create->created);
 }
 
 /**
@@ -127,27 +121,31 @@ struct task_struct *kthread_create(int (
   ...)
 {
struct kthread_create_info create;
+   struct task_struct *ret;
+   va_list args;
 
create.threadfn = threadfn;
create.data = data;
+   init_completion(&create.created);
init_completion(&create.started);
-   init_completion(&create.done);
 
spin_lock(&kthread_create_lock);
list_add_tail(&create.list, &kthread_create_list);
-   wake_up_process(kthreadd_task);
spin_unlock(&kthread_create_lock);
+   wake_up_process(kthreadd_task);
 
-   wait_for_completion(&create.done);
+   wait_for_completion(&create.created);
+   if (create.result < 0)
+   return ERR_PTR(create.result);
 
-   if (!IS_ERR(create.result)) {
-   va_list args;
-   va_start(args, namefmt);
-   vsnprintf(create.result->comm, sizeof(create.result->comm),
- namefmt, args);
-   va_end(args);
-   }
-   return create.result;
+   wait_for_completion(&create.started);
+   ret = find_task_by_pid(create.result);
+
+   va_start(args, namefmt);
+   vsnprintf(ret->comm, sizeof(ret->comm), namefmt, args);
+   va_end(args);
+
+   return ret;
 }
 EXPORT_SYMBOL(kthread_create);
 


[PATCH 3/3] make kthread_stop() scalable

2007-04-13 Thread Oleg Nesterov

It's a shame kthread_stop() (may take a while!) runs with a global semaphore
held. With this patch kthread() allocates all necessary data (struct kthread)
on its own stack, and the kthread_stop_xxx globals are deleted.

HACKS:

- re-use task_struct->set_child_tid to point to "struct kthread"

- use do_exit() directly to preserve "struct kthread" on stack

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5/kernel/kthread.c~3_STOP  2007-04-13 16:28:41.0 +0400
+++ 2.6.21-rc5/kernel/kthread.c 2007-04-13 16:24:37.0 +0400
@@ -17,7 +17,25 @@
 
 static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
-struct task_struct *kthreadd_task;
+struct task_struct *kthreadd_task __read_mostly;
+
+struct kthread {
+   int should_stop;
+   struct task_struct *task;
+
+   struct completion exited;
+   int err;
+};
+
+static inline struct kthread *to_kthread(struct task_struct *t)
+{
+   return (void*)t->set_child_tid;
+}
+
+static inline void set_kthread(struct kthread *self)
+{
+   current->set_child_tid = (void __user*)self;
+}
 
 struct kthread_create_info
 {
@@ -27,24 +45,12 @@ struct kthread_create_info
struct completion created;
struct completion started;
 
-   /* Result passed back to kthread_create() from kthreadd. */
-   pid_t result;
+   /* Result passed back to kthread_create() from kthread. */
+   struct kthread *result;
 
struct list_head list;
 };
 
-struct kthread_stop_info
-{
-   struct task_struct *k;
-   int err;
-   struct completion done;
-};
-
-/* Thread stopping is done by setthing this var: lock serializes
- * multiple kthread_stop calls. */
-static DEFINE_MUTEX(kthread_stop_lock);
-static struct kthread_stop_info kthread_stop_info;
-
 /**
  * kthread_should_stop - should this kthread return now?
  *
@@ -54,20 +60,28 @@ static struct kthread_stop_info kthread_
  */
 int kthread_should_stop(void)
 {
-   return (kthread_stop_info.k == current);
+   return to_kthread(current)->should_stop;
 }
 EXPORT_SYMBOL(kthread_should_stop);
 
 static int kthread(void *_create)
 {
-   struct kthread_create_info *create = _create;
-   int (*threadfn)(void *data);
-   void *data;
-   int ret = -EINTR;
+   struct kthread self = {
+   .task = current,
+   .err = -EINTR,
+   };
 
/* Copy data: it's on kthread's stack */
-   threadfn = create->threadfn;
-   data = create->data;
+   struct kthread_create_info *create = _create;
+   int (*threadfn)(void *data) = create->threadfn;
+   void *data = create->data;
+
+   /*
+* This should be enough to assure that self is still on
+* stack when we enter do_exit()
+*/
+   set_kthread(&self);
+   create->result = &self;
 
/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -75,13 +89,13 @@ static int kthread(void *_create)
schedule();
 
if (!kthread_should_stop())
-   ret = threadfn(data);
+   self.err = threadfn(data);
 
-   /* It might have exited on its own, w/o kthread_stop.  Check. */
-   if (kthread_should_stop()) {
-   kthread_stop_info.err = ret;
-   complete(&kthread_stop_info.done);
-   }
+   /* It might have exited on its own, w/o kthread_stop. Check. */
+   if (kthread_should_stop())
+   complete(&self.exited);
+
+   do_exit(0);
return 0;
 }
 
@@ -91,7 +105,7 @@ static void create_kthread(struct kthrea
 
/* We want our own signal handler (we take no signals by default). */
pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
-   create->result = pid;
+   create->result = ERR_PTR(pid);
 
complete(&create->created);
 }
@@ -135,11 +149,11 @@ struct task_struct *kthread_create(int (
wake_up_process(kthreadd_task);
 
wait_for_completion(&create.created);
-   if (create.result < 0)
-   return ERR_PTR(create.result);
+   if (IS_ERR(create.result))
+   return (void*)create.result;
 
wait_for_completion(&create.started);
-   ret = find_task_by_pid(create.result);
+   ret = create.result->task;
 
va_start(args, namefmt);
vsnprintf(ret->comm, sizeof(ret->comm), namefmt, args);
@@ -183,27 +197,23 @@ EXPORT_SYMBOL(kthread_bind);
  */
 int kthread_stop(struct task_struct *k)
 {
+   struct kthread *kthread;
int ret;
 
-   mutex_lock(&kthread_stop_lock);
-
/* It could exit after stop_info.k set, but before wake_up_process. */
get_task_struct(k);
+   kthread = to_kthread(k);
 
-   /* Must init completion *before* thread sees kthread_stop_info.k */
-   init_completion(&kthread_stop_info.done);
+   /* Must init completion *before* thread sees ->should_stop */
+   init_completion(&

RE: [PATCH 1/1] cciss: kconfig patch to make cciss dependent onscsi for SG_IO ioctl

2007-04-13 Thread Cameron, Steve


Well, with SCSI turned off, it didn't compile:


drivers/block/cciss.c: In function `cciss_ioctl':
drivers/block/cciss.c:1180: error: `SCSI_IOCTL_GET_IDLUN' undeclared (first use 
in this function)
drivers/block/cciss.c:1180: error: (Each undeclared identifier is reported only 
once
drivers/block/cciss.c:1180: error: for each function it appears in.)
drivers/block/cciss.c:1181: error: `SCSI_IOCTL_GET_BUS_NUMBER' undeclared 
(first use in this function)
make[2]: *** [drivers/block/cciss.o] Error 1
make[1]: *** [drivers/block] Error 2
make: *** [drivers] Error 2

-- steve


-Original Message-
From: James Bottomley [mailto:[EMAIL PROTECTED]
Sent: Thu 4/12/2007 5:07 PM
To: Mike Miller (OS Dev)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
Cameron, Steve
Subject: Re: [PATCH 1/1] cciss: kconfig patch to make cciss dependent onscsi 
for SG_IO ioctl
 
On Thu, 2007-04-12 at 16:27 -0500, Mike Miller (OS Dev) wrote:
> This kconfig patch makes cciss dependent on scsi for the new SG_IO ioctl we
> just added. If cciss is built into the kernel it makes sure that scsi is
> also statically linked. If scsi is a module then cciss will also be built
> as a module. Please consider this for inclusion.

Erm, I don't think this is necessary: SG_IO is a property of the block
layer, not SCSI (as in it will work without SCSI even present).  SCSI
has a special driver: the sg driver that has its own implementation of
SG_IO, but that's the only connection.

James




Re: [PATCH 1/1] cciss: kconfig patch to make cciss dependent onscsi for SG_IO ioctl

2007-04-13 Thread Christoph Hellwig

On Fri, Apr 13, 2007 at 01:58:41PM +0100, Cameron, Steve wrote:
> 
> Well, with SCSI turned off, it didn't compile:
> 
> 
> drivers/block/cciss.c: In function `cciss_ioctl':
> drivers/block/cciss.c:1180: error: `SCSI_IOCTL_GET_IDLUN' undeclared (first 
> use in this function)
> drivers/block/cciss.c:1180: error: (Each undeclared identifier is reported 
> only once
> drivers/block/cciss.c:1180: error: for each function it appears in.)
> drivers/block/cciss.c:1181: error: `SCSI_IOCTL_GET_BUS_NUMBER' undeclared 
> (first use in this function)

The two are defined in scsi/scsi.h unconditionally.  Make sure to add
an unconditional include of that header to your driver, apparently you're
only picking it up by accident when CONFIG_SCSI is enabled.


Re: [patch] generic rwsems

2007-04-13 Thread William Lee Irwin III

On Fri, Apr 13, 2007 at 02:43:03PM +0200, Nick Piggin wrote:
> Yes, this is the case on our 2 premier SMP powerhouse architectures,
> sparc32 and parisc.

sparc32 is ultra-legacy and I have a tremendous amount of work to do on
SMP there. I don't feel that efficiency of locking primitives is a
crucial issue for sparc32.


-- wli

Re: How should an exit routine wait for release() callbacks?

2007-04-13 Thread Cornelia Huck

On Fri, 13 Apr 2007 13:42:04 +0200,
"Markus Rechberger" <[EMAIL PROTECTED]> wrote:

> seems like you have the same problem as the dvb framework has/had.
> 
> http://mcentral.de/hg/~mrec/v4l-dvb-stable
> 
> The last 3 changesets do the trick to not oops, it will delay the 
> deinitialization of the device till the last user closed the device node.

Probably dumb question (since I'm not at all familiar with the dvb
code): Isn't that a different race you're solving there? I don't see
any driver core objects involved (except class devices created by
class_device_create, which obviously don't have the release function
problem). This looks more like a race of "we want an object to go
away, but a user still has a file open" (which would be similar to the
kobject<->sysfs lifetime rules issues, where work is currently ongoing).

Re: [PATCH] worker_thread: don't play with SIGCHLD

2007-04-13 Thread Eric W. Biederman

Oleg Nesterov <[EMAIL PROTECTED]> writes:

> depends on Eric's
>
>   kthread-dont-depend-on-work-queues-take-2.patch
>
> worker_thread() inherits ignored SIGCHLD from its parent, kthreadd.
> We can remove unneeded do_sigaction().

Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>

Looks good.

We could delete all of:

>   set_user_nice(current, -5);
> 
>   /* Block and flush all signals */
>   sigfillset(&blocked);
>   sigprocmask(SIG_BLOCK, &blocked, NULL);
>   flush_signals(current);
> 
>   /*
>* We inherited MPOL_INTERLEAVE from the booting kernel.
>* Set MPOL_DEFAULT to insure node local allocations.
>*/
>   numa_default_policy();
> 
>   /* SIG_IGN makes children autoreap: see do_notify_parent(). */
>   sa.sa.sa_handler = SIG_IGN;
>   sa.sa.sa_flags = 0;
>   siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
>   do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);

If we wanted to, as that all comes by default from kthreadd.

Eric

RE: [PATCH 1/1] cciss: kconfig patch to make cciss dependent onscsi for SG_IO ioctl

2007-04-13 Thread Cameron, Steve



Hmm, now that I look again, those ioctls it's complaining
about are ones for which we just return ENOTTY, so I guess 
we don't really need them listed explicitly, and if they
weren't, the Kconfig patch would be unnecessary.

-- steve

-Original Message-
From: Cameron, Steve
Sent: Fri 4/13/2007 7:58 AM
To: James Bottomley; Mike Miller (OS Dev)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [PATCH 1/1] cciss: kconfig patch to make cciss dependent onscsi 
for SG_IO ioctl
 

Well, with SCSI turned off, it didn't compile:


drivers/block/cciss.c: In function `cciss_ioctl':
drivers/block/cciss.c:1180: error: `SCSI_IOCTL_GET_IDLUN' undeclared (first use 
in this function)
drivers/block/cciss.c:1180: error: (Each undeclared identifier is reported only 
once
drivers/block/cciss.c:1180: error: for each function it appears in.)
drivers/block/cciss.c:1181: error: `SCSI_IOCTL_GET_BUS_NUMBER' undeclared 
(first use in this function)
make[2]: *** [drivers/block/cciss.o] Error 1
make[1]: *** [drivers/block] Error 2
make: *** [drivers] Error 2

-- steve


-Original Message-
From: James Bottomley [mailto:[EMAIL PROTECTED]
Sent: Thu 4/12/2007 5:07 PM
To: Mike Miller (OS Dev)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
Cameron, Steve
Subject: Re: [PATCH 1/1] cciss: kconfig patch to make cciss dependent onscsi 
for SG_IO ioctl
 
On Thu, 2007-04-12 at 16:27 -0500, Mike Miller (OS Dev) wrote:
> This kconfig patch makes cciss dependent on scsi for the new SG_IO ioctl we
> just added. If cciss is built into the kernel it makes sure that scsi is
> also statically linked. If scsi is a module then cciss will also be built
> as a module. Please consider this for inclusion.

Erm, I don't think this is necessary: SG_IO is a property of the block
layer, not SCSI (as in it will work without SCSI even present).  SCSI
has a special driver: the sg driver that has its own implementation of
SG_IO, but that's the only connection.

James





Re: [patch 0/8] unprivileged mount syscall

2007-04-13 Thread Serge E. Hallyn

Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > > > 1. clone the master namespace.
> > > > 
> > > > 2. in the new namespace
> > > > 
> > > > move the tree under /share/$me to /
> > > > for each ($user, $what, $how) {
> > > > move /share/$user/$what to /$what
> > > > if ($how == slave) {
> > > >  make the mount tree under /$what as slave
> > > > }
> > > > }
> > > > 
> > > > 3. in the new namespace make the tree under 
> > > >/share as private and unmount /share
> > > 
> > > Thanks.  I get the basic idea now: the namespace itself need not be
> > > shared between the sessions, it is enough if "share" propagation is
> > > set up between the different namespaces of a user.
> > > 
> > > I don't yet see either in your or Viro's description how the trees
> > > under /share/$USER are initialized.  I guess they are recursively
> > > bound from /, and are made slaves.
> > 
> > yes. I suppose, when a userid is created one of the steps would be
> > 
> > mount --rbind / /share/$USER
> > mount --make-rslave /share/$USER
> > mount --make-rshared /share/$USER
> 
> Thinking a bit more about this, I'm quite sure most users wouldn't
> even want private namespaces.  It would be enough to
> 
>   chroot /share/$USER
> 
> and be done with it.
> 
> Private namespaces are only good for keeping a bunch of mounts
> referenced by a group of processes.  But my guess is, that the natural
> behavior for users is to see a persistent set of mounts.
> 
> If for example they mount something on a remote machine, then log out
> from the ssh session and later log back in, they would want to see
> their previous mount still there.
> 
> Miklos

Agreed on desired behavior, but not on chroot sufficing.  It actually
sounds like you want exactly what was outlined in the OLS paper.

Users still need to be in a different mounts namespace from the admin
user so long as we consider the deluser and backup problems to be
legitimate problems (well, so long as user mounts are allowed).  So,
when they log in, pam gives them a new namespace and chroots them into
/share/$USER.
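
The account-creation steps quoted above (the three mount commands) can be
sketched with mount(2) directly. This is only an illustration: the function
name setup_user_tree() is hypothetical, the caller is assumed to be root, and
the per-login steps (new namespace, chroot) are not shown.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: build the per-user tree under share_dir
 * (for example "/share/alice"), mirroring the quoted commands:
 *
 *   mount --rbind / /share/$USER
 *   mount --make-rslave /share/$USER
 *   mount --make-rshared /share/$USER
 *
 * Returns 0 on success, -1 on the first failing step.
 */
static int setup_user_tree(const char *share_dir)
{
    /* Recursive bind of the whole tree onto share_dir. */
    if (mount("/", share_dir, NULL, MS_BIND | MS_REC, NULL) < 0) {
        perror("rbind");
        return -1;
    }
    /* Receive mount events from the master tree, but send none back. */
    if (mount("none", share_dir, NULL, MS_SLAVE | MS_REC, NULL) < 0) {
        perror("make-rslave");
        return -1;
    }
    /* Let copies of this tree (the user's sessions) share mounts. */
    if (mount("none", share_dir, NULL, MS_SHARED | MS_REC, NULL) < 0) {
        perror("make-rshared");
        return -1;
    }
    return 0;
}
```

A login helper would call setup_user_tree() once per account, not per
session; the target directory has to exist beforehand.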

Assuming I'm thinking clearly  :)

-serge

Re: [CRYPTO] is it really optimized ?

2007-04-13 Thread Francis Moreau

Hi,

On 4/13/07, Helge Hafting <[EMAIL PROTECTED]> wrote:
> Francis Moreau wrote:
> > So is this interpretation right ? If so wouldn't it be appropriate to
> > introduce a mechanism to reserve this AES hardware for a special
> > purpose (filesystem encryptions) and thus make it as fast as possible
> > ?
> >
> Would this really help?
> When reading/writing files, most of the time is i/o-wait, isn't it?

Well, some systems, especially embedded ones, don't use hard drives for
mass storage but rather MTD devices such as flash...

> Reserving the device exclusively seems excessive. How about
> a quick test to see if someone else has been using it since
> the last time your crypto-fs used it?  If nobody else used it, then
> you don't need to reset the key and so on.

You would spend the same time making the test as loading the key...

> If nobody else is using the AES controller, you get the same speed
> as with a reservation.

I don't think this is right. AES hardware can do a block encryption in
very few cycles. If you copy the key, or test whether it has changed as
you proposed, then you basically increase the execution time by a factor
of 2 or 3...

> If something else is using AES then it won't be
> as fast, but then the AES controller has been used for other useful
> work as well.  Other parts of the kernel surely won't use it just for
> fun. :-)

You said it: other parts of the kernel are unlikely to use it. So why
not optimize the common case ?

The crypto core already seems to implement a priority mechanism. But I
don't think I'm able to say "I'd like to use this algo for encrypting
filesystems. If another part of the kernel wants to use this algo then
give it the generic one". This choice really seems to depend on the
system the kernel is running on.
--
Francis

Re: [patch] generic rwsems

2007-04-13 Thread David Howells

Nick Piggin <[EMAIL PROTECTED]> wrote:

> The other way happens to be better for everyone else, which is why I
> think your suggestion to instead move everyone to the spinlock version
> was weird.

No, you misunderstand me.  My preferred solution is to leave it up to the arch
and not to make it generic, though I'm not averse to providing some prepackaged
solutions for an arch to pick from if it wants to.

> Finally: as I said, even for those 2 architectures, this may not be so
> bad because it is using a different spinlock for the fastpath as it is
> for the slowpath. So your uncontested, cache cold case will get a little
> slower, but the contested case could improve a lot (eg. I saw an order of
> magnitude improvement).

Agreed.  I can see why the spinlock implementation is bad on SMP.  By all means
change those cases, and reduce the spinlock implementation to an
interrupt-disablement-only version that may be used on UP only.

> 32-bit machines might indeed overflow, but if it hasn't been a problem
> for i386 or (even 64-bit) powerpc yet, is it a real worry? 

It has happened, I believe.  People have tried having >32766 threads on a
32-bit box.  Mad it may be, but...

> This is what spinlocks and mutexes do, and they're much more common than
> rwsems. I'm just trying to make it consistent, and you can muck around
> with it all you want after that. It is actually very easy to inline
> things now, unlike before my patch.

My original stuff was very easy to inline until Ingo got hold of it.

> > Think about it.  This algorithm is only optimal where XADD is available.  If
> > you don't have XADD, but you do have LL/SC or CMPXCHG, you can do better.
> 
> You keep saying this too, and I have thought about it but I couldn't think
> of a much better way. I'm not saying you're wrong, but why don't you just
> tell me what that better way is?

Look at how the counter works in the spinlock case.  With the addition of an
extra flag in the counter to indicate there's stuff waiting on the queue, you
can manipulate the counter if it appears safe to do so, otherwise you have to
fall back to the slow path and take a spin lock.

Break the counter down like this:

0x0000  - not locked; queue empty
0x4000  - locked by writer; queue empty
0xc000  - locked by writer; queue occupied
0x0nnn  - n readers; queue empty
0x8nnn  - n readers; queue occupied

Now here's a rough guide as to how the main operations would work:

 (*) down_read of unlocked

cmpxchg(0 -> 1) -> 0 [okay - you've got the lock]

 (*) down_read of readlocked.

cmpxchg(0 -> 1) -> n [failed to get the lock]
do
  n = cmpxchg(old_n -> old_n+1)
until n == old_n

 (*) down_read of writelocked or contended readlocked.

cmpxchg(0 -> 1) -> 0x8000|n  [lock contended]
goto slowpath
  spinlock
  try to get lock again
  if still contended
    mark counter contended
    add to queue
    spinunlock
    sleep
  else
    spinunlock

 (*) down_write of unlocked

cmpxchg(0 -> 0x4000) -> 0 [okay - you've got the lock]

 (*) down_write of locked, contended or otherwise

cmpxchg(0 -> 0x4000) -> nz [failed]
goto slowpath
  spinlock
  try to get lock again
  if still unavailable
    mark counter contended
    add to queue
    spinunlock
    sleep
  else
    spinunlock

 (*) up_read

x = cmpxchg(1 -> 0)
if x == 0
  done
else
   do
 x = cmpxchg(old_x -> old_x-1)
   until x == old_x
   if old_x == 0x8000
 wake_up_writer

 (*) up_write

x = cmpxchg(0x8000 -> 0)
if x == 0
  done
else
  wake_up_readers

You can actually do better with LL/SC here because for the initial attempt with
CMPXCHG in each case you have to guess what the numbers will be.  Furthermore,
you might actually be able to do an "atomic increment or set contention flag"
operation.

Note that the contention flag may only be set or cleared in the slow path
whilst you are holding the spinlock.
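
As a rough userspace illustration of the fast paths outlined above, here is a
sketch using C11 atomics in place of the kernel's cmpxchg. The type and
function names (rws_sketch, rws_*_fast) are made up for this example, the flag
values are taken as they appear in the breakdown above, and the slow path
(spinlock, queue, sleeping, waking) is left out entirely. Reader-count overflow
is not handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RWS_WRITER    0x4000u   /* locked by writer */
#define RWS_CONTENDED 0x8000u   /* queue occupied */

/* Hypothetical stand-in for the rwsem counter. */
struct rws_sketch {
    _Atomic uint32_t count;
};

/* Try to take a read lock; false means "go to the slow path". */
static bool rws_down_read_fast(struct rws_sketch *s)
{
    uint32_t old = 0;   /* first guess: unlocked */

    for (;;) {
        /* Never touch the counter once a writer or a queue is seen. */
        if (old & (RWS_WRITER | RWS_CONTENDED))
            return false;
        /* On failure 'old' is refreshed with the value actually seen. */
        if (atomic_compare_exchange_weak(&s->count, &old, old + 1))
            return true;
    }
}

/* Try to take the write lock; only possible from the unlocked state. */
static bool rws_down_write_fast(struct rws_sketch *s)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&s->count, &expected, RWS_WRITER);
}

/* Drop a read lock; true means the slow path must wake a waiter. */
static bool rws_up_read_fast(struct rws_sketch *s)
{
    uint32_t old = atomic_load(&s->count);

    while (!atomic_compare_exchange_weak(&s->count, &old, old - 1))
        ;   /* retry with the refreshed value */
    /* Last reader gone while the queue is occupied. */
    return old == (RWS_CONTENDED | 1u);
}

/* Drop the write lock; true means the queue is occupied and the
 * release has to be finished in the slow path. */
static bool rws_up_write_fast(struct rws_sketch *s)
{
    uint32_t expected = RWS_WRITER;

    return !atomic_compare_exchange_strong(&s->count, &expected, 0);
}
```

Each helper only ever changes the counter when neither flag blocks it, which
matches the rule that the contention flag is set and cleared under the
spinlock alone.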

Doing down_read and up_read with CMPXCHG is a pain.  XADD or LL/SC would be
better, and LOCK INC/ADD/DEC/SUB won't work.  You can't use XADD in down_*() as
you may not change the bottom part of the counter if you're going to end up
queuing.

Actually, looking at it, it might be better to have the counter start off at
0x8000 for "unlocked, no contention" and clear the no-contention flag when
you queue something.  That way you can check for the counter becoming 0 in the
up_*() functions as a trigger to go and invoke the slowpath.  Then you could
use LOCK DEC/SUB on i386 rather than XADD as you only need to check the Z flag.

Note there is a slight window whereby a reader can jump a writer that's
transiting betwee

Re: 68328serial & pm_register

2007-04-13 Thread Greg Ungerer



Hi Christoph,

Christoph Hellwig wrote:
> 68328serial is the last driver to call pm_register and thus using and
> keeping alive the really old PM scheme.  Any chance to convert it over
> to platform devices (which would also clean up a lot of the ifdef
> mess in the driver), or simply rip out that rudimentary PM support?

I don't have any hardware that uses this, I can only compile test it.
The occasional patch I submit for it is just to keep it compiling.
But I am happy to take out what PM support it has.

> On a less urgent basis, is there any chance to convert the driver
> to use serial_core, which it doesn't despite living in drivers/serial?

I'd have to leave that to someone with hardware.

Regards
Greg



--

Greg Ungerer  --  Chief Software Dude   EMAIL: [EMAIL PROTECTED]
SnapGear -- a Secure Computing Company  PHONE:   +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com

Re: [Feature Request?] Inline compression of process core dumps

2007-04-13 Thread Christopher S. Aker


Alan Cox wrote:
> > Looking at the code, it seems to me that format_corename() is appending
> > .pid, regardless if !core_uses_pid and corename[0]=='|', in which case
> > it creates an invalid path for call_usermodehelper_pipe().
> >
> > Bug in the code, or bug in my methods?
>
> This looks somewhat better and might do the trick. Also fixes a very very
> obscure security corner case. If you change core pattern to start with
> the program name then the user can run a program called "|myevilhack" as
> it stands. The patch checks for "|" in the pattern not the output and
> doesn't nail a pid on to a piped name.




Works great now.  Queue this sucker up!

# cat /proc/sys/kernel/core_pattern
|/home/caker/bin/dumper.pl
# ./linux

Segmentation fault (core dumped)
# file /tmp/dumper.out
/tmp/dumper.out: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style
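
For illustration, the behaviour described above (decide "pipe or file" from
the pattern, never from the expanded name, and do not append a pid to a piped
name) can be modelled in userspace roughly as follows. This is not the kernel's
format_corename(); sketch_format_corename() is a hypothetical helper that only
understands %p and %e.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand 'pattern' into 'out'.  Returns true if the core should be piped
 * to a helper.  The decision looks at the pattern only, so a program
 * named "|myevilhack" cannot turn a "%e" pattern into a pipe.
 */
static bool sketch_format_corename(char *out, size_t len, const char *pattern,
                                   const char *comm, int pid,
                                   bool core_uses_pid)
{
    bool ispipe = (pattern[0] == '|');
    bool pid_in_pattern = false;
    size_t used = 0;

    if (len == 0)
        return ispipe;
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0' && used < len - 1; p++) {
        int n;

        if (*p != '%' || p[1] == '\0') {
            /* Ordinary character (or a trailing '%'): copy it. */
            out[used++] = *p;
            out[used] = '\0';
            continue;
        }
        p++;
        switch (*p) {
        case 'p':                       /* pid of the dumping process */
            pid_in_pattern = true;
            n = snprintf(out + used, len - used, "%d", pid);
            break;
        case 'e':                       /* executable name */
            n = snprintf(out + used, len - used, "%s", comm);
            break;
        default:                        /* "%%" and anything unknown */
            n = snprintf(out + used, len - used, "%c", *p);
            break;
        }
        if (n < 0 || (size_t)n >= len - used) {
            used = len - 1;             /* output was truncated */
            break;
        }
        used += (size_t)n;
    }

    /* Only plain file names get ".pid" nailed on. */
    if (!ispipe && core_uses_pid && !pid_in_pattern)
        snprintf(out + used, len - used, ".%d", pid);

    return ispipe;
}
```

With the pattern shown in the transcript, the helper path comes back
unchanged and the function reports a pipe; with a "%e" pattern and a program
called "|myevilhack" it reports a plain file.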


Thanks for everyone's help.

-Chris


Re: [PATCH] worker_thread: don't play with SIGCHLD

2007-04-13 Thread Oleg Nesterov

On 04/13, Eric W. Biederman wrote:
>
> Oleg Nesterov <[EMAIL PROTECTED]> writes:
> 
> > depends on Eric's
> >
> > kthread-dont-depend-on-work-queues-take-2.patch
> >
> > worker_thread() inherits ignored SIGCHLD from its parent, kthreadd.
> > We can remove unneeded do_sigaction().
> 
> Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>
> 
> Looks good.
> 
> We could delete all of:
> 
> > set_user_nice(current, -5);
> > 
> > /* Block and flush all signals */
> > sigfillset(&blocked);
> > sigprocmask(SIG_BLOCK, &blocked, NULL);
> > flush_signals(current);

this one is already deleted in -mm...

> > /*
> >  * We inherited MPOL_INTERLEAVE from the booting kernel.
> >  * Set MPOL_DEFAULT to insure node local allocations.
> >  */
> > numa_default_policy();

Yes, I forgot about this!

> If we wanted to, as that all comes by default from kthreadd.

Thanks, will send an update.

Oleg.


Re: [patch 05/10] add "permit user mounts in new namespace" clone flag

2007-04-13 Thread Serge E. Hallyn

Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > Given the existence of shared subtrees allowing/denying this at the mount
> > namespace level is silly and wrong.
> > 
> > If we need more than just the filesystem permission checks can we
> > make it a mount flag settable with mount and remount that allows
> > non-privileged users the ability to create mount points under it
> > in directories they have full read/write access to.
> 
> OK, that makes sense.
> 
> > I don't like the use of clone flags for this purpose but in this
> > case the shared subtrees are a much more fundamental reason for not
> > doing this at the namespace level.
> 
> I'll drop the clone flag, and add a mount flag instead.
> 
> Thanks,
> Miklos

Makes sense, so then on login pam has to spawn a new user namespace and
construct a root fs with no shared subtrees and with the
user-mounts-allowed flag specified?

-serge

Re: [Kernel-discuss] Re: [PATCH 3/7] [RFC] Battery monitoring class

2007-04-13 Thread Henrique de Moraes Holschuh

On Fri, 13 Apr 2007, Anton Vorontsov wrote:
> > With fixed-units files, having *_energy and *_capacity isn't too clear
> > either... Nor is it consistent with SBS, since SBS uses "capacity" to
> > refer to either energy or charge, depending on a units attribute.
> > 
> > As a compromise, how about using "energy" and "charge" for quantities,
> > and "charging" (i.e., a verb) when referring to the operation?
> 
> It would be great compromise! Please please please!

I can live with it, although I'd rather just use the units (zero margin of
error or confusion).

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [PATCH 3/7] [RFC] Battery monitoring class

2007-04-13 Thread Anton Vorontsov

On Thu, Apr 12, 2007 at 03:25:03AM +0400, Anton Vorontsov wrote:
> Here is the battery monitor class. According to the first copyright string,
> we have been maintaining it since 2003. I took a few days and cleaned it up
> to be more suitable for mainline inclusion.
> 
> It differs from battery class at git://git.infradead.org/battery-2.6.git:
> 
> * It uses the external power kernel interface, i.e. it does not fake
>   external power supplies as batteries. (The same thing David Woodhouse
>   planned last year.)
> 
> * It has a predefined set of attributes, which eliminates code duplication
>   in battery drivers. It also gives the opportunity to write emulation
>   drivers for legacy stuff (an APM emulation driver follows).
> 
>   If a driver can't provide some attribute, it will not appear in sysfs.
> 
> * It insists on reusing its predefined attributes *and* their units,
>   so userspace gets the expected values for any battery.
>   
>   Common units are also required for APM/ACPI emulation.
>   
>   Our battery class insists on reuse but does not force it. If some
>   battery driver can't convert its own raw values (can't imagine why), then
>   the driver is free to implement its own attributes *and* an additional
>   _units attribute. This scheme is discouraged, though.
> 
> * LED support. Each battery registers its triggers, and gadgets with LEDs
>   can quickly bind to battery-charging / battery-full triggers.
> 
> Here is how it looks from user space:
> 
> # ls /sys/class/battery/main-battery/
> capacity  max_capacity  max_voltage   min_current  power   subsystem  uevent
> current   max_current   min_capacity  min_voltage  status  temp   voltage
> # cat /sys/class/battery/main-battery/status
> Full
> # cat /sys/class/leds/h5400\:green-right/trigger
> none h5400-radio timer hwtimer main-battery-charging [main-battery-full]
> # cat /sys/class/leds/h5400\:green-right/brightness
> 255
> 
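
A userspace consumer of these attributes could be as small as the sketch
below. It assumes the sysfs layout from the listing above (for example
/sys/class/battery/main-battery/status); read_sysfs_attr() is a hypothetical
helper, not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one single-line sysfs attribute into buf and strip the trailing
 * newline.  Returns 0 on success, -1 if the file is missing or empty.
 */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Called with the status path from the listing, it would fill the buffer with
a string such as "Full"; a driver that does not provide an attribute simply
makes the call fail, since the file is absent.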

Changes:

- Cleanups based on comments from Randy Dunlap.

- Attribute creation scheme changed drastically. No more tons of
  macro-created functions. Compiled code should be much smaller.
  Also, adding new "standard" attributes is a trivial task now (a matter
  of adding two lines, one in battery.c and another in battery.h).

- charge (as quantity) in mAh, energy in mWh.


I'll convert mXh to uXh a bit later, if there are no further objections
against uXh. I'd also like to hear if there are any objections to the
mA/mV -> uA/uV conversion. I think we'd better keep all units at the
same order/precision.


Subject: [PATCH] [take2] Battery monitoring class


Signed-off-by: Anton Vorontsov <[EMAIL PROTECTED]>
---
 drivers/Kconfig   |2 +
 drivers/Makefile  |1 +
 drivers/battery/Kconfig   |   11 ++
 drivers/battery/Makefile  |1 +
 drivers/battery/battery.c |  290 +
 include/linux/battery.h   |  113 ++
 6 files changed, 418 insertions(+), 0 deletions(-)
 create mode 100644 drivers/battery/Kconfig
 create mode 100644 drivers/battery/Makefile
 create mode 100644 drivers/battery/battery.c
 create mode 100644 include/linux/battery.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index c546de3..c3a0038 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -56,6 +56,8 @@ source "drivers/w1/Kconfig"
 
 source "drivers/power/Kconfig"
 
+source "drivers/battery/Kconfig"
+
 source "drivers/hwmon/Kconfig"
 
 source "drivers/mfd/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 2bdaae7..7cbfd37 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_RTC_LIB) += rtc/
 obj-$(CONFIG_I2C)  += i2c/
 obj-$(CONFIG_W1)   += w1/
 obj-$(CONFIG_EXTERNAL_POWER)   += power/
+obj-$(CONFIG_BATTERY)  += battery/
 obj-$(CONFIG_HWMON)+= hwmon/
 obj-$(CONFIG_PHONE)+= telephony/
 obj-$(CONFIG_MD)   += md/
diff --git a/drivers/battery/Kconfig b/drivers/battery/Kconfig
new file mode 100644
index 000..c386593
--- /dev/null
+++ b/drivers/battery/Kconfig
@@ -0,0 +1,11 @@
+
+menu "Battery support"
+
+config BATTERY
+   tristate "Battery monitoring support"
+   select EXTERNAL_POWER
+   help
+ Say Y here to enable generic battery status reporting in
+ the /sys filesystem.
+
+endmenu
diff --git a/drivers/battery/Makefile b/drivers/battery/Makefile
new file mode 100644
index 000..a2239cb
--- /dev/null
+++ b/drivers/battery/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_BATTERY)  += battery.o
diff --git a/drivers/battery/battery.c b/drivers/battery/battery.c
new file mode 100644
index 000..6c87fe3
--- /dev/null
+++ b/drivers/battery/battery.c
@@ -0,0 +1,290 @@
+/*
+ *  Universal battery monitor class
+ *
+ *  Copyright (c) 2007  Anton Vorontsov <[EMAIL PROTECTED]>
+ *  Copyright (c) 2004  Szabolcs Gyurko
+ *  Copyright (c) 2003  Ian Molton <[EMAIL PROTECTED]>
+ *
+ *  Modified: 2004, Oct Szabolcs Gyurko
+ *
+ *  You may use this code as per GPL version 2
+ *
+ * All v

Re: [PATCH 1/7] [RFC] External power framework

2007-04-13 Thread Anton Vorontsov

On Thu, Apr 12, 2007 at 03:24:46AM +0400, Anton Vorontsov wrote:
> External power framework - power supplies and power supplicants.
> 
> Supplicants (batteries so far) may ask to be notified when a power supply
> arrives or goes away. This framework is used by the battery class (next
> patches).
> 
> A supply may be bound to several supplicants (think main and backup
> batteries).
> 
> Supplicants may also consume power from several external supplies
> (say AC and USB).
> 
> Here is how it looks from userspace:
> 
>   # pwd
>   /sys/class/power_supply
>   # ls
>   ac  usb
>   # cat ac/online usb/online
>   1
>   0

Cleaned up version based on comments from Randy Dunlap.

Subject: [PATCH] [take2] External power framework


Signed-off-by: Anton Vorontsov <[EMAIL PROTECTED]>
---
 drivers/Kconfig|2 +
 drivers/Makefile   |1 +
 drivers/power/Kconfig  |   13 ++
 drivers/power/Makefile |1 +
 drivers/power/external_power.c |  326 
 include/linux/external_power.h |   54 +++
 6 files changed, 397 insertions(+), 0 deletions(-)
 create mode 100644 drivers/power/Kconfig
 create mode 100644 drivers/power/Makefile
 create mode 100644 drivers/power/external_power.c
 create mode 100644 include/linux/external_power.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 050323f..c546de3 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -54,6 +54,8 @@ source "drivers/spi/Kconfig"
 
 source "drivers/w1/Kconfig"
 
+source "drivers/power/Kconfig"
+
 source "drivers/hwmon/Kconfig"
 
 source "drivers/mfd/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 3a718f5..2bdaae7 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_I2O) += message/
 obj-$(CONFIG_RTC_LIB)  += rtc/
 obj-$(CONFIG_I2C)  += i2c/
 obj-$(CONFIG_W1)   += w1/
+obj-$(CONFIG_EXTERNAL_POWER)   += power/
 obj-$(CONFIG_HWMON)+= hwmon/
 obj-$(CONFIG_PHONE)+= telephony/
 obj-$(CONFIG_MD)   += md/
diff --git a/drivers/power/Kconfig b/drivers/power/Kconfig
new file mode 100644
index 000..17349c1
--- /dev/null
+++ b/drivers/power/Kconfig
@@ -0,0 +1,13 @@
+
+menu "External power support"
+
+config EXTERNAL_POWER
+   tristate "External power kernel interface"
+   help
+ Say Y here to enable kernel external power detection interface,
+ like AC or USB. Information also will exported to userspace via
+ /sys/class/external_power/ directory.
+
+ This interface is mandatory for battery class support.
+
+endmenu
diff --git a/drivers/power/Makefile b/drivers/power/Makefile
new file mode 100644
index 000..c303b45
--- /dev/null
+++ b/drivers/power/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_EXTERNAL_POWER)  += external_power.o
diff --git a/drivers/power/external_power.c b/drivers/power/external_power.c
new file mode 100644
index 000..310ea4b
--- /dev/null
+++ b/drivers/power/external_power.c
@@ -0,0 +1,326 @@
+/*
+ * Linux kernel interface for external power suppliers/supplicants
+ *
+ * Copyright (c) 2007  Anton Vorontsov <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct class *power_supply_class;
+
+static LIST_HEAD(supplicants);
+static struct rw_semaphore supplicants_sem;
+
+struct bound_supply {
+   struct power_supply *psy;
+   struct list_head node;
+};
+
+struct bound_supplicant {
+   struct power_supplicant *pst;
+   struct list_head node;
+};
+
+int power_supplicant_am_i_supplied(struct power_supplicant *pst)
+{
+   int ret = 0;
+   struct bound_supply *bpsy;
+
+   pr_debug("%s\n", __FUNCTION__);
+   down(&power_supply_class->sem);
+   list_for_each_entry(bpsy, &pst->bound_supplies, node) {
+   if (bpsy->psy->is_online(bpsy->psy)) {
+   ret = 1;
+   goto out;
+   }
+   }
+out:
+   up(&power_supply_class->sem);
+
+   return ret;
+}
+
+static void unbind_pst_from_psys(struct power_supplicant *pst)
+{
+   struct bound_supply *bpsy, *bpsy_tmp;
+   struct bound_supplicant *bpst, *bpst_tmp;
+
+   list_for_each_entry_safe(bpsy, bpsy_tmp, &pst->bound_supplies, node) {
+   list_for_each_entry_safe(bpst, bpst_tmp,
+   &bpsy->psy->bound_supplicants, node) {
+   if (bpst->pst == pst) {
+   list_del(&bpst->node);
+   kfree(bpst);
+   break;
+   }
+   }
+   list_del(&bpsy->node);
+   kfree(bpsy);
+   }
+
+   retur

Re: [PATCH 2/7] [RFC] Common power driver for Linux gadgets

2007-04-13 Thread Anton Vorontsov

On Thu, Apr 12, 2007 at 03:24:56AM +0400, Anton Vorontsov wrote:
> This driver is meant to stop code/logic duplication across the different
> machines we are porting at handhelds.org. pda_power registers a machine's
> power supplies and takes care of notifying batteries about power
> changes through the external power interface.
> 
> This driver should be suitable for almost every Linux gadget today.

Changes:

- implement a timer, used for two purposes:
   1) on some machines you can't read is_{ac,usb}_online() values right
      after the interrupt; we have to wait a bit to read reliable values.
   2) irq debouncing

- cleanups


Subject: [PATCH] [take2] pda_power driver


Signed-off-by: Anton Vorontsov <[EMAIL PROTECTED]>
---
 drivers/power/Kconfig |8 ++
 drivers/power/Makefile|1 +
 drivers/power/pda_power.c |  231 +
 include/linux/pda_power.h |   25 +
 4 files changed, 265 insertions(+), 0 deletions(-)
 create mode 100644 drivers/power/pda_power.c
 create mode 100644 include/linux/pda_power.h

diff --git a/drivers/power/Kconfig b/drivers/power/Kconfig
index 17349c1..b87779e 100644
--- a/drivers/power/Kconfig
+++ b/drivers/power/Kconfig
@@ -10,4 +10,12 @@ config EXTERNAL_POWER
 
  This interface is mandatory for battery class support.
 
+config PDA_POWER
+   tristate "Generic PDA/phone power driver"
+   depends on EXTERNAL_POWER
+   help
+ Say Y here to enable generic power driver for PDAs and phones with
+ one or two external power supplies (AC/USB) connected to main and
+ backup batteries, and optional builtin charger.
+
 endmenu
diff --git a/drivers/power/Makefile b/drivers/power/Makefile
index c303b45..6f084e7 100644
--- a/drivers/power/Makefile
+++ b/drivers/power/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_EXTERNAL_POWER)  += external_power.o
+obj-$(CONFIG_PDA_POWER)   += pda_power.o
diff --git a/drivers/power/pda_power.c b/drivers/power/pda_power.c
new file mode 100644
index 000..ae90bd7
--- /dev/null
+++ b/drivers/power/pda_power.c
@@ -0,0 +1,231 @@
+/*
+ * Common power driver for PDAs and phones with one or two external
+ * power supplies (AC/USB) connected to main and backup batteries,
+ * and optional builtin charger.
+ *
+ * Copyright 2007 Anton Vorontsov <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * include/linux/ioport.h does not provide flags for generic IRQ trigger
+ * types. So, we're using "ISA PnP IRQ specific bits", and converting them.
+ */
+static unsigned int get_irq_flags(struct resource *res)
+{
+   unsigned int flags = IRQF_DISABLED;
+
+   if (res->flags & IORESOURCE_IRQ_HIGHEDGE)
+   flags |= IRQF_TRIGGER_RISING;
+   if (res->flags & IORESOURCE_IRQ_LOWEDGE)
+   flags |= IRQF_TRIGGER_FALLING;
+   if (res->flags & IORESOURCE_IRQ_HIGHLEVEL)
+   flags |= IRQF_TRIGGER_HIGH;
+   if (res->flags & IORESOURCE_IRQ_LOWLEVEL)
+   flags |= IRQF_TRIGGER_LOW;
+   if (res->flags & IORESOURCE_IRQ_SHAREABLE)
+   flags |= IRQF_SHARED;
+
+   return flags;
+}
+
+static struct resource *ac_irq, *usb_irq;
+static struct pda_power_pdata *pdata;
+static struct timer_list isr_timer;
+
+static int pda_power_is_ac_online(struct power_supply *psy)
+{
+   return pdata->is_ac_online ? pdata->is_ac_online() : 0;
+}
+
+static int pda_power_is_usb_online(struct power_supply *psy)
+{
+   return pdata->is_usb_online ? pdata->is_usb_online() : 0;
+}
+
+static char *pda_power_supplied_to[] = {
+   "main-battery",
+   "backup-battery",
+};
+
+static struct power_supply pda_power_supplies[] = {
+   {
+   .name = "ac",
+   .type = "ac",
+   .supplied_to = pda_power_supplied_to,
+   .num_supplicants = ARRAY_SIZE(pda_power_supplied_to),
+   .is_online = pda_power_is_ac_online,
+   },
+   {
+   .name = "usb",
+   .type = "dc",
+   .supplied_to = pda_power_supplied_to,
+   .num_supplicants = ARRAY_SIZE(pda_power_supplied_to),
+   .is_online = pda_power_is_usb_online,
+   },
+};
+
+static void update_charger(void)
+{
+   if (!pdata->set_charge)
+   return;
+
+   if (pdata->is_ac_online && pdata->is_ac_online()) {
+   pr_debug("pda_power: charger on (AC)\n");
+   pdata->set_charge(PDA_POWER_CHARGE_AC);
+   }
+   else if (pdata->is_usb_online && pdata->is_usb_online()) {
+   pr_debug("pda_power: charger on (USB)\n");
+   pdata->set_charge(PDA_POWER_CHARGE_USB);
+   }
+   else {
+   pr_debug("pda_power: charger off\n");
+   pdata->set_charge(0);
+

Re: [PATCH 6/7] [RFC] ds2760 battery driver

2007-04-13 Thread Anton Vorontsov

On Thu, Apr 12, 2007 at 03:25:45AM +0400, Anton Vorontsov wrote:
> This is a driver for batteries with a ds2760 chip inside. Such batteries are
> used in almost every HP iPaq and HTC PDA/phone.

Changes:

- follow battery class changes (get rid of vast amount of macro-created
  functions).

- cleanups based on comments by Randy Dunlap.


Subject: [PATCH] [take2] ds2760 battery driver


Signed-off-by: Anton Vorontsov <[EMAIL PROTECTED]>
---
 drivers/battery/Kconfig  |7 +
 drivers/battery/Makefile |1 +
 drivers/battery/ds2760_battery.c |  493 ++
 include/linux/ds2760_battery.h   |   32 +++
 4 files changed, 533 insertions(+), 0 deletions(-)
 create mode 100644 drivers/battery/ds2760_battery.c
 create mode 100644 include/linux/ds2760_battery.h

diff --git a/drivers/battery/Kconfig b/drivers/battery/Kconfig
index c386593..0c14ae0 100644
--- a/drivers/battery/Kconfig
+++ b/drivers/battery/Kconfig
@@ -8,4 +8,11 @@ config BATTERY
  Say Y here to enable generic battery status reporting in
  the /sys filesystem.
 
+config BATTERY_DS2760
+   tristate "DS2760 battery driver (HP iPAQ & others)"
+   depends on BATTERY && W1
+   select W1_SLAVE_DS2760
+   help
+ Say Y here to enable support for batteries with ds2760 chip.
+
 endmenu
diff --git a/drivers/battery/Makefile b/drivers/battery/Makefile
index a2239cb..9902513 100644
--- a/drivers/battery/Makefile
+++ b/drivers/battery/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_BATTERY)  += battery.o
+obj-$(CONFIG_BATTERY_DS2760)   += ds2760_battery.o
diff --git a/drivers/battery/ds2760_battery.c b/drivers/battery/ds2760_battery.c
new file mode 100644
index 000..d996994
--- /dev/null
+++ b/drivers/battery/ds2760_battery.c
@@ -0,0 +1,493 @@
+/*
+ * Driver for batteries with DS2760 chips inside.
+ *
+ * Copyright (c) 2007 Anton Vorontsov
+ *   2004 Matt Reimer
+ *   2004 Szabolcs Gyurko
+ *
+ * Use consistent with the GNU GPL is permitted,
+ * provided that this copyright notice is
+ * preserved in its entirety in all copies and derived works.
+ *
+ * Author:  Anton Vorontsov <[EMAIL PROTECTED]>
+ *  February 2007
+ *
+ *  Matt Reimer <[EMAIL PROTECTED]>
+ *  April 2004, 2005
+ *
+ *  Szabolcs Gyurko <[EMAIL PROTECTED]>
+ *  September 2004
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../w1/w1.h"
+#include "../w1/slaves/w1_ds2760.h"
+
+struct ds2760_device_info {
+   struct battery_info *bi;
+
+   /* DS2760 data, valid after calling ds2760_battery_read_status() */
+   unsigned long update_time;  /* jiffies when data read */
+   char raw[DS2760_DATA_SIZE]; /* raw DS2760 data */
+   int voltage_raw;/* units of 4.88 mV */
+   int voltage_mV; /* units of mV */
+   int current_raw;/* units of 0.625 mA */
+   int current_mA; /* units of mA */
+   int accum_current_raw;  /* units of 0.25 mAh */
+   int accum_current_mAh;  /* units of mAh */
+   int temp_raw;   /* units of 0.125 C */
+   int temp_C; /* units of 0.1 C */
+   int rated_capacity; /* units of mAh */
+   int rem_capacity;   /* percentage */
+   int full_active_mAh;/* units of mAh */
+   int empty_mAh;  /* units of mAh */
+   int life_min;   /* units of minutes */
+   int charge_status;  /* BATTERY_STATUS_* */
+
+   int full_counter;
+   struct battery batt;
+   struct device *w1_dev;
+   struct workqueue_struct *monitor_wqueue;
+   struct delayed_work monitor_work;
+};
+
+static unsigned int cache_time = 1000;
+module_param(cache_time, uint, 0644);
+MODULE_PARM_DESC(cache_time, "cache time in milliseconds");
+
+/* Some batteries have their rated capacity stored as N * 10 mAh, while
+ * others use an index into this table. */
+static int rated_capacities[] = {
+   0,
+   920,/* Samsung */
+   920,/* BYD */
+   920,/* Lishen */
+   920,/* NEC */
+   1440,   /* Samsung */
+   1440,   /* BYD */
+   1440,   /* Lishen */
+   1440,   /* NEC */
+   2880,   /* Samsung */
+   2880,   /* BYD */
+   2880,   /* Lishen */
+   2880/* NEC */
+};
+
+/* array is level at temps 0C, 10C, 20C, 30C, 40C
+ * temp is in Celsius */
+static int battery_interpolate(int array[], int temp)
+{
+   int index, dt;
+
+   if (temp <= 0)
+   return array[0];
+   if (temp >= 40)
+   return array[4];
+
+   index = temp / 10;
+   dt= temp % 10;
+
+   return array[index] + (((array[index + 1] - array[index]) * dt) / 10);
+}
+
+static int ds2760_battery_read_status(struct ds2760_device_in
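
The comment above rated_capacities[] says the rated capacity is stored either as N * 10 mAh or as an index into the table. The part of the patch that decodes it is cut off here, so the following is only a minimal userspace sketch of one plausible decoding. The helper name decode_rated_capacity() and the rule "a raw value smaller than the table size is an index" are assumptions, not taken from the patch.

```c
#include <stddef.h>

/* Same values as the table in the patch above, in mAh. */
static const int rated_capacities[] = {
	0,
	920, 920, 920, 920,	/* Samsung, BYD, Lishen, NEC */
	1440, 1440, 1440, 1440,	/* Samsung, BYD, Lishen, NEC */
	2880, 2880, 2880, 2880	/* Samsung, BYD, Lishen, NEC */
};

#define N_RATED (sizeof(rated_capacities) / sizeof(rated_capacities[0]))

/*
 * Hypothetical helper: decode the raw rated-capacity byte.
 * Assumption: a value that fits the table is an index into it,
 * anything larger is N in "N * 10 mAh".
 */
static int decode_rated_capacity(unsigned char raw)
{
	if (raw < N_RATED)
		return rated_capacities[raw];
	return raw * 10;
}
```

Under that assumption a raw value of 5 decodes to 1440 mAh through the table, while a raw value of 92 decodes to 920 mAh as 92 * 10.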

Re: [PATCH 7/7] [RFC] APM emulation driver for class batteries

2007-04-13 Thread Anton Vorontsov

On Thu, Apr 12, 2007 at 03:26:44AM +0400, Anton Vorontsov wrote:
> It finds battery with "main_battery" flag set (or with max_capacity if no
> batteries marked as main), and converts battery values to APM form.

Changes:

- follows battery class changes

- minor cleanups


Subject: [PATCH] [take2] APM emulation driver for class batteries


Signed-off-by: Anton Vorontsov <[EMAIL PROTECTED]>
---
 drivers/battery/Kconfig |7 +++
 drivers/battery/Makefile|1 +
 drivers/battery/apm_power.c |  122 +++
 3 files changed, 130 insertions(+), 0 deletions(-)
 create mode 100644 drivers/battery/apm_power.c

diff --git a/drivers/battery/Kconfig b/drivers/battery/Kconfig
index 0c14ae0..bbf8283 100644
--- a/drivers/battery/Kconfig
+++ b/drivers/battery/Kconfig
@@ -15,4 +15,11 @@ config BATTERY_DS2760
help
  Say Y here to enable support for batteries with ds2760 chip.
 
+config APM_POWER
+   tristate "APM emulation"
+   depends on BATTERY && APM
+   help
+ Say Y here to enable APM status emulation using
+ battery class devices.
+
 endmenu
diff --git a/drivers/battery/Makefile b/drivers/battery/Makefile
index 9902513..cea5807 100644
--- a/drivers/battery/Makefile
+++ b/drivers/battery/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_BATTERY)  += battery.o
 obj-$(CONFIG_BATTERY_DS2760)   += ds2760_battery.o
+obj-$(CONFIG_APM_POWER)+= apm_power.o
diff --git a/drivers/battery/apm_power.c b/drivers/battery/apm_power.c
new file mode 100644
index 000..367a17d
--- /dev/null
+++ b/drivers/battery/apm_power.c
@@ -0,0 +1,122 @@
+/*
+ * Copyright (c) 2007 Eugeny Boger
+ *
+ * Use consistent with the GNU GPL is permitted,
+ * provided that this copyright notice is
+ * preserved in its entirety in all copies and derived works.
+ */
+
+#include 
+#include 
+#include 
+
+#define BATTERY_PROP(bat, prop) ({ \
+   void *value = bat->get_property(bat, BATTERY_PROP_##prop); \
+   value ? *(int*)value : 0;  \
+})
+
+#define MBATTERY_PROP(prop) BATTERY_PROP(main_battery, prop)
+
+static struct battery *main_battery;
+
+static void (*old_apm_get_power_status)(struct apm_power_info*);
+
+static void apm_battery_find_main_battery(void)
+{
+   struct device *dev;
+   struct battery *bat, *batm;
+   int max_charge = 0;
+
+   main_battery = NULL;
+   batm = NULL;
+   list_for_each_entry(dev, &battery_class->devices, node) {
+   bat = dev_get_drvdata(dev);
+   /* If none of the battery devices has the 'main_battery' flag set,
+  choose the one with max CHARGE */
+   if (BATTERY_PROP(bat, MAX_CHARGE) > max_charge) {
+   batm = bat;
+   max_charge = BATTERY_PROP(bat, MAX_CHARGE);
+   }
+
+   if (bat->main_battery)
+   main_battery = bat;
+   }
+   if (!main_battery)
+   main_battery = batm;
+}
+
+static void apm_battery_apm_get_power_status(struct apm_power_info *info)
+{
+   int bat_current;
+
+   down(&battery_class->sem);
+   apm_battery_find_main_battery();
+   if (!main_battery) {
+   up(&battery_class->sem);
+   return;
+   }
+
+   if (MBATTERY_PROP(STATUS) == BATTERY_STATUS_FULL)
+   info->battery_life = 100;
+   else if (MBATTERY_PROP(MAX_CHARGE) - MBATTERY_PROP(MIN_CHARGE))
+   info->battery_life = ((MBATTERY_PROP(CHARGE) -
+  MBATTERY_PROP(MIN_CHARGE)) * 100) /
+(MBATTERY_PROP(MAX_CHARGE) -
+ MBATTERY_PROP(MIN_CHARGE));
+   else
+   info->battery_life = -1;
+
+   if ((MBATTERY_PROP(STATUS) == BATTERY_STATUS_CHARGING) ||
+   (MBATTERY_PROP(STATUS) == BATTERY_STATUS_NOT_CHARGING) ||
+   (MBATTERY_PROP(STATUS) == BATTERY_STATUS_FULL))
+   info->ac_line_status = APM_AC_ONLINE;
+   else
+   info->ac_line_status = APM_AC_OFFLINE;
+
+   if (MBATTERY_PROP(STATUS) == BATTERY_STATUS_CHARGING)
+   info->battery_status = APM_BATTERY_STATUS_CHARGING;
+   else {
+   if (info->battery_life > 50)
+   info->battery_status = APM_BATTERY_STATUS_HIGH;
+   else if (info->battery_life > 5)
+   info->battery_status = APM_BATTERY_STATUS_LOW;
+   else
+   info->battery_status = APM_BATTERY_STATUS_CRITICAL;
+   }
+   info->battery_flag = info->battery_status;
+
+   bat_current = MBATTERY_PROP(CURRENT);
+   if (bat_current)
+   info->time = ((MBATTERY_PROP(CHARGE) -
+ MBATTERY_PROP(MIN_CHARGE)) * 60) /
+bat_current;
+   else
+   info->time = -1;
+
+
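
The arithmetic in apm_battery_apm_get_power_status() can be checked in isolation. This is a minimal userspace sketch that mirrors the expressions in the patch. The function names and the enum are hypothetical, and the units (mAh for charge, mA for current) are an assumption.

```c
/* Hypothetical stand-ins for the APM_BATTERY_STATUS_* values. */
enum apm_level { LEVEL_HIGH, LEVEL_LOW, LEVEL_CRITICAL };

/* Percentage of the usable range, or -1 when the range is empty. */
static int battery_life_percent(int charge, int min_charge, int max_charge)
{
	int range = max_charge - min_charge;

	if (range == 0)
		return -1;
	return ((charge - min_charge) * 100) / range;
}

/* Same thresholds as the patch: above 50 is high, above 5 is low. */
static enum apm_level battery_level(int life)
{
	if (life > 50)
		return LEVEL_HIGH;
	if (life > 5)
		return LEVEL_LOW;
	return LEVEL_CRITICAL;
}

/* Remaining time in minutes, or -1 when no current is flowing. */
static int remaining_minutes(int charge, int min_charge, int bat_current)
{
	if (bat_current == 0)
		return -1;
	return ((charge - min_charge) * 60) / bat_current;
}
```

For example, 720 out of a 0..1440 range gives 50, which is still "low" because the comparison is strictly greater than 50. An unknown life of -1 falls through to the critical branch, which matches the if/else chain in the patch.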

[PATCH] worker_thread: don't play with SIGCHLD and numa policy

2007-04-13 Thread Oleg Nesterov

depends on Eric's

kthread-dont-depend-on-work-queues-take-2.patch

worker_thread() inherits ignored SIGCHLD and numa_default_policy() from its
parent, kthreadd. No need to setup this again.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>
Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>

--- OLD/kernel/workqueue.c~wt   2007-04-05 12:20:35.0 +0400
+++ OLD/kernel/workqueue.c  2007-04-13 17:43:23.0 +0400
@@ -289,23 +289,10 @@ static int worker_thread(void *__cwq)
 {
struct cpu_workqueue_struct *cwq = __cwq;
DEFINE_WAIT(wait);
-   struct k_sigaction sa;
 
if (!cwq->wq->freezeable)
current->flags |= PF_NOFREEZE;
 
-   /*
-* We inherited MPOL_INTERLEAVE from the booting kernel.
-* Set MPOL_DEFAULT to insure node local allocations.
-*/
-   numa_default_policy();
-
-   /* SIG_IGN makes children autoreap: see do_notify_parent(). */
-   sa.sa.sa_handler = SIG_IGN;
-   sa.sa.sa_flags = 0;
-   siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
-   do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
-
for (;;) {
if (cwq->wq->freezeable)
try_to_freeze();


retry [PATCH] partition : add support for sysv68 partitions

2007-04-13 Thread Philippe De Muyter

Hi all,

Add support for the Motorola sysv68 disk partition table (slices in motorola
doc).

Signed-off-by: Philippe De Muyter <[EMAIL PROTECTED]>

diff -r 1b54f1d81bc5 fs/partitions/Kconfig
--- a/fs/partitions/Kconfig Thu Apr 12 15:44:52 2007 -0700
+++ b/fs/partitions/Kconfig Fri Apr 13 15:51:58 2007 +0200
@@ -236,3 +236,12 @@ config EFI_PARTITION
help
  Say Y here if you would like to use hard disks under Linux which
  were partitioned using EFI GPT.
+
+config SYSV68_PARTITION
+   bool "SYSV68 partition table support" if PARTITION_ADVANCED
+   default y if M68K
+   help
+ Say Y here if you would like to be able to read the hard disk
+ partition table format used by Motorola Delta machines (using
+ sysv68).
+ Otherwise, say N.
diff -r 1b54f1d81bc5 fs/partitions/Makefile
--- a/fs/partitions/Makefile Thu Apr 12 15:44:52 2007 -0700
+++ b/fs/partitions/Makefile Fri Apr 13 15:51:58 2007 +0200
@@ -17,3 +17,4 @@ obj-$(CONFIG_IBM_PARTITION) += ibm.o
 obj-$(CONFIG_IBM_PARTITION) += ibm.o
 obj-$(CONFIG_EFI_PARTITION) += efi.o
 obj-$(CONFIG_KARMA_PARTITION) += karma.o
+obj-$(CONFIG_SYSV68_PARTITION) += sysv68.o
diff -r 1b54f1d81bc5 fs/partitions/check.c
--- a/fs/partitions/check.c Thu Apr 12 15:44:52 2007 -0700
+++ b/fs/partitions/check.c Fri Apr 13 15:51:58 2007 +0200
@@ -34,6 +34,7 @@
 #include "ultrix.h"
 #include "efi.h"
 #include "karma.h"
+#include "sysv68.h"
 
 #ifdef CONFIG_BLK_DEV_MD
 extern void md_autodetect_dev(dev_t dev);
@@ -104,6 +105,9 @@ static int (*check_part[])(struct parsed
 #endif
 #ifdef CONFIG_KARMA_PARTITION
karma_partition,
+#endif
+#ifdef CONFIG_SYSV68_PARTITION
+   sysv68_partition,
 #endif
NULL
 };
diff -r 1b54f1d81bc5 fs/partitions/sysv68.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/fs/partitions/sysv68.c Fri Apr 13 15:51:58 2007 +0200
@@ -0,0 +1,92 @@
+/*
+ *  fs/partitions/sysv68.c
+ *
+ *  Copyright (C) 2007 Philippe De Muyter <[EMAIL PROTECTED]>
+ */
+
+#include "check.h"
+#include "sysv68.h"
+
+/*
+ * Volume ID structure: on first 256-bytes sector of disk
+ */
+
+struct volumeid {
+   u8  vid_unused[248];
+   u8  vid_mac[8]; /* ASCII string "MOTOROLA" */
+};
+
+/*
+ * config block: second 256-bytes sector on disk
+ */
+
+struct dkconfig {
+   u8  ios_unused0[128];
+   __be32  ios_slcblk; /* Slice table block number */
+   __be16  ios_slccnt; /* Number of entries in slice table */
+   u8  ios_unused1[122];
+};
+
+/*
+ * combined volumeid and dkconfig block
+ */
+
+struct dkblk0 {
+   struct volumeid dk_vid;
+   struct dkconfig dk_ios;
+};
+
+/*
+ * Slice Table Structure
+ */
+
+struct slice {
+   __be32  nblocks;/* slice size (in blocks) */
+   __be32  blkoff; /* block offset of slice */
+};
+
+
+int sysv68_partition(struct parsed_partitions *state, struct block_device *bdev)
+{
+   int i, slices;
+   int slot = 1;
+   Sector sect;
+   unsigned char *data;
+   struct dkblk0 *b;
+   struct slice *slice;
+
+   data = read_dev_sector(bdev, 0, &sect);
+   if (!data)
+   return -1;
+
+   b = (struct dkblk0 *) data;
+   if (memcmp(b->dk_vid.vid_mac, "MOTOROLA", sizeof(b->dk_vid.vid_mac))) {
+   put_dev_sector(sect);
+   return 0;
+   }
+   slices = be16_to_cpu(b->dk_ios.ios_slccnt);
+   i = be32_to_cpu(b->dk_ios.ios_slcblk);
+   put_dev_sector(sect);
+
+   data = read_dev_sector(bdev, i, &sect);
+   if (!data)
+   return -1;
+
+   slices -= 1; /* last slice is the whole disk */
+   printk("sysV68: %s(s%u)", state->name, slices);
+   slice = (struct slice *)data;
+   for (i = 0 ; i < slices; i++, slice++) {
+   if (slot == state->limit)
+   break;
+   if (be32_to_cpu(slice->nblocks)) {
+   put_partition(state, slot,
+   be32_to_cpu(slice->blkoff),
+   be32_to_cpu(slice->nblocks));
+   printk("(s%u)", i);
+   }
+   slot++;
+   }
+   printk("\n");
+   put_dev_sector(sect);
+   return 1;
+}
diff -r 1b54f1d81bc5 fs/partitions/sysv68.h
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/fs/partitions/sysv68.h Fri Apr 13 15:51:58 2007 +0200
@@ -0,0 +1,1 @@
+extern int sysv68_partition(struct parsed_partitions *state, struct block_device *bdev);
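
The on-disk layout described by struct volumeid and struct dkconfig can be exercised outside the kernel. This is a minimal userspace sketch, assuming the first two 256-byte sectors have been read into one 512-byte buffer, as the patch does by reading one 512-byte sector. The helper names and offset macros are hypothetical. The offsets themselves follow from the struct definitions in the patch.

```c
#include <stdint.h>
#include <string.h>

#define SYSV68_SECTOR	256			/* sysv68 sector size */
#define VID_MAC_OFF	248			/* vid_mac in sector 0 */
#define SLCBLK_OFF	(SYSV68_SECTOR + 128)	/* ios_slcblk in sector 1 */
#define SLCCNT_OFF	(SYSV68_SECTOR + 132)	/* ios_slccnt in sector 1 */

/* Read big-endian values byte by byte, independent of host order. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Returns 1 and fills *slcblk and *slccnt when the "MOTOROLA" magic
 * is present, 0 otherwise.
 */
static int sysv68_parse_block0(const uint8_t *blk, uint32_t *slcblk,
			       uint16_t *slccnt)
{
	if (memcmp(blk + VID_MAC_OFF, "MOTOROLA", 8) != 0)
		return 0;
	*slcblk = get_be32(blk + SLCBLK_OFF);
	*slccnt = get_be16(blk + SLCCNT_OFF);
	return 1;
}
```

A caller would pass the first 512 bytes of a disk image. As in the patch, the last entry counted by slccnt describes the whole disk, so slccnt - 1 slices remain to be registered.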

Re: [PATCH 2/8] Add container pointer on struct page

2007-04-13 Thread Jean-Pierre Dion


Hi Pavel,

I was involved in the work on the
memory controller for res groups a few months ago.

I see that you propose to modify the struct
page to point to rss container struct.
This has made some debate because of the struct
page size increase, but this allows a quicker
scan to reclaim pages (I mean having per-container
lists of active/inactive pages).
We (here at Bull and others) proposed this implementation
for res groups and I am interested in knowing
if this has a chance of being accepted today (hope so).
I know this uses memory for internal management
and considerably increases the memory used on
a large memory configuration, but in that case
we have a lot of memory, so where is the issue?
We tested this on a 28 GB server and it worked.
Also we can use a larger page size to reduce
the overhead, and I believe this makes sense
on large servers with big memory.
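
To put a rough number on that overhead, here is a small sketch of the arithmetic. It assumes one extra 8-byte pointer per struct page and reads "28 GB" as 28 GiB. The pointer size, the page sizes and the function name are illustrative assumptions, not measurements from our tests.

```c
#include <stdint.h>

/*
 * Extra bytes of struct page space needed when every page gains
 * one pointer of ptr_size bytes. All arguments are in bytes.
 */
static uint64_t page_pointer_overhead(uint64_t ram_bytes,
				      uint64_t page_size,
				      uint64_t ptr_size)
{
	return (ram_bytes / page_size) * ptr_size;
}
```

With 4 KiB pages, 28 GiB is 7,340,032 pages, so the extra pointer costs 56 MiB, about 0.2% of memory. With 64 KiB pages the same server has 458,752 pages and the cost drops to 3.5 MiB.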

So we balance between using more memory internally
and so getting faster access to pages for reclaim,
or do nothing. ;-)


jean-pierre



Re: [patch] generic rwsems

2007-04-13 Thread Nick Piggin

On Fri, Apr 13, 2007 at 02:31:52PM +0100, David Howells wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > The other way happens to be better for everyone else, which is why I
> > think your suggestion to instead move everyone to the spinlock version
> > was weird.
> 
> No, you misunderstand me.  My preferred solution is to leave it up to the arch
> and not to make it generic, though I'm not averse to providing some
> prepackaged solutions for an arch to pick from if it wants to.

Just doesn't seem to be much payoff. I know you didn't think there is
anything wrong with 2 different implementations and hundreds of lines
of arch specific assembly, but there is really little gain. Sure you
might be able to optimise a few cycles off the i386 asm, but damn I
just blitzed that sort of improvement on x86-64... and I still doubt
it will be very noticeable because rwsems don't get called like
spinlocks.

> > Finally: as I said, even for those 2 architectures, this may not be so
> > bad because it is using a different spinlock for the fastpath as it is
> > for the slowpath. So your uncontested, cache cold case will get a little
> > slower, but the contested case could improve a lot (eg. I saw an order of
> > magnitude improvement).
> 
> Agreed.  I can see why the spinlock implementation is bad on SMP.  By all
> means change those cases, and reduce the spinlock implementation to an
> interrupt disablement only version that may only be used on UP only.

So you missed my point about this above. If your UP atomic ops are
slower than interrupt disabling, then implement the damn things using
interrupt disabling instead of whatever it is you are using that is
slower! ;)

> > 32-bit machines might indeed overflow, but if it hasn't been a problem
> > for i386 or (even 64-bit) powerpc yet, is it a real worry? 
> 
> It has happened, I believe.  People have tried having >32766 threads on a
> 32-bit box.  Mad it may be, but...

Anyway, I doubt all the 32-bit archs using atomics would convert to the
slower spinlocks, so maybe this just has to be a known issue.

> > This is what spinlocks and mutexes do, and they're much more common than
> > rwsems. I'm just trying to make it consistent, and you can muck around
> > with it all you want after that. It is actually very easy to inline
> > things now, unlike before my patch.
> 
> My original stuff was very easy to inline until Ingo got hold of it.

I agree that all the locking has turned pretty messy, but that isn't
my fault.

> > > Think about it.  This algorithm is only optimal where XADD is available.
> > > If you don't have XADD, but you do have LL/SC or CMPXCHG, you can do better.
> > 
> > You keep saying this too, and I have thought about it but I couldn't think
> > of a much better way. I'm not saying you're wrong, but why don't you just
> > tell me what that better way is?
> 
> Look at how the counter works in the spinlock case.  With the addition of an
> extra flag in the counter to indicate there's stuff waiting on the queue, you
> can manipulate the counter if it appears safe to do so, otherwise you have to
> fall back to the slow path and take a spin lock.

[snip]

Ah, thanks. Yeah actually I remember you describing this at LCA, so I
apologise for saying you didn't ;)

Really, that isn't going to do much for performance (nothing as dramatic
as the x86-64 spinlock->atomic conversion). However it will reduce the
lock size by 8 bytes on 64-bit and fix the overflow on 32-bit...

So why don't we implement this as the generic version? UP archs won't
care because atomic_cmpxchg is generally just an interrupt disable,
similarly for sparc and parisc. Most others except x86 do atomic_add_return
with llsc or cas anyway, and even if the cmpxchg is a tiny bit slower for
x86, at least it should be much faster than the spinlock version for
x86-64, and will solve the overflow for i386. 

What do you say?
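
To make the idea concrete, here is a rough userspace sketch of such a fast path using C11 atomics. It is not the code from either of our patches. The names, the bit layout of the counter and the memory orderings are all assumptions, and the slow path (spinlock plus wait queue) is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RW_WAITERS	0x1u	/* something is queued: use the slow path */
#define RW_WRITER	0x2u	/* a writer holds the lock */
#define RW_READER	0x4u	/* readers are counted in units of this */

struct rwsem_sketch {
	atomic_uint count;
};

/* Take a read lock if no writer and no waiters are visible. */
static bool down_read_fast(struct rwsem_sketch *sem)
{
	unsigned int old = atomic_load_explicit(&sem->count,
						memory_order_relaxed);

	while (!(old & (RW_WAITERS | RW_WRITER))) {
		/* On failure the CAS reloads 'old' and we re-check it. */
		if (atomic_compare_exchange_weak_explicit(&sem->count, &old,
				old + RW_READER,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;	/* caller falls back to the spinlock slow path */
}

/* Take the write lock only if the counter is completely idle. */
static bool down_write_fast(struct rwsem_sketch *sem)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong_explicit(&sem->count,
			&expected, RW_WRITER,
			memory_order_acquire, memory_order_relaxed);
}

/* Drop a read lock; true means the last reader must wake waiters. */
static bool up_read_fast(struct rwsem_sketch *sem)
{
	unsigned int old = atomic_fetch_sub_explicit(&sem->count, RW_READER,
						     memory_order_release);

	return (old - RW_READER) == RW_WAITERS;
}
```

Because the counter is only changed when it looks safe to do so, the reader count never needs a negative bias, which is what would take care of the 32-bit overflow.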

> > > similarly if you are using a UP-compiled kernel.
> > 
> > Then your UP-compiled kernel's atomic ops are suboptimal, not the rwsem
> > implementation.
> 
> That's usually a function of the CPU designer.

I mean your atomic_xxx functions are suboptimal for that design of CPU.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: How should an exit routine wait for release() callbacks?

2007-04-13 Thread Markus Rechberger


Cornelia Huck wrote:
> On Fri, 13 Apr 2007 13:42:04 +0200,
> "Markus Rechberger" <[EMAIL PROTECTED]> wrote:
>
> > seems like you have the same problem as the dvb framework has/had.
> >
> > http://mcentral.de/hg/~mrec/v4l-dvb-stable
> >
> > The last 3 changesets do the trick to not oops, it will delay the
> > deinitialization of the device till the last user closed the device node.
>
> Probably dumb question (since I'm not at all familiar with the dvb
> code): Isn't that a different race you're solving there? I don't see
> any driver core objects involved (except class devices created by
> class_device_create, which obviously don't have the release function
> problem). This looks more like a race of "we want an object to go
> away, but a user still has a file open" (which would be similar to the
> kobject<->sysfs lifetime rules issues, where work is currently ongoing).

Most dvb usb drivers call the device node unregistration when a device
gets unplugged (when
At this time the file handle can still be open. The patch on that site
sets a flag that disallows any further access to the device node (the
DVB framework has 4 such nodes).
This can happen at any time, so while someone is reading or accessing
the device some structures might already have gone away, and this could
cause an oops.
The problem in the DVB framework is file operation related: the last
user calls fops_put on the existing structure and sets the pointer to
NULL before it wakes up the other function, which frees the file
operation structure.
In Alan's case, isn't there any users flag available that shows that
the structure is still being accessed?
If there is, he could set a flag when he enters my_exit which would
disable access to all other functions by returning an error value at
the beginning of those functions. The only way out would be to call
my_release for existing users and wake up my_exit when the last
reference to that structure is gone.
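
A rough sketch of that idea, reduced to its state machine. This is not Alan's driver and not the DVB code. The struct, the function bodies and the choice of -ENODEV are assumptions, and the locking and the wait queue that a real driver needs are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state. */
struct my_dev {
	int users;	/* number of open file handles */
	bool exiting;	/* set once my_exit has started */
};

/* New users are refused once the exit path has started. */
static int my_open(struct my_dev *d)
{
	if (d->exiting)
		return -ENODEV;
	d->users++;
	return 0;
}

/* Every other entry point bails out early in the same way. */
static int my_read(struct my_dev *d)
{
	if (d->exiting)
		return -ENODEV;
	return 0;
}

/* Returns true when this was the last user and my_exit may be woken. */
static bool my_release(struct my_dev *d)
{
	d->users--;
	return d->exiting && d->users == 0;
}

/* Returns true when no user is left and my_exit need not wait. */
static bool my_exit_begin(struct my_dev *d)
{
	d->exiting = true;
	return d->users == 0;
}
```

In a real driver the flag and the counter would be protected by a lock, and my_exit would sleep on a wait queue or completion until my_release reports the last user gone.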


Some more information about the whole driver/scenario would be helpful.

Markus

--
  |   AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System|  Register Court Dresden: HRA 4896
Research  |  General Partner authorized to represent:
Center| AMD Saxony LLC (Wilmington, Delaware, US)
  | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


