date:20071130

Re: VMware on 2.6.24: patches to enable vmmon and vmnet to build/run

2007-11-30 Thread Jeff Chua

On Dec 1, 2007 3:21 AM, Mark Lord <[EMAIL PROTECTED]> wrote:
> I've hacked my copy of VMware-6.01 to work with kernel 2.6.24-rc*,
> and dumped my patches for vmmon and vmnet onto my server at:

Thank you! Now I'm one step closer to 2.6.24.

I wonder if anyone has a patch for vpnclient-linux-x86_64-4.8.00.0490-k9 for 2.6.24?

> Feel free to use them or ignore them.  Flames to /dev/null

Totally agree with that.

Thank you!

Jeff.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sched: cpu accounting controller (V2)

2007-11-30 Thread Paul Menage

Hi Vatsa,

Thanks, this looks pretty good.

On Nov 30, 2007 4:42 AM, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
>
> - Removed load average information. I felt it needs more thought (esp
>   to deal with SMP and virtualized platforms) and can be added for
>   2.6.25 after more discussions.

The "load" value was never a load average, it was just a count of the
% cpu time used in the previous fixed window of time, updated at the
end of each window.

Maybe we can instead do something based on tracking the length of the run
queue for the cgroup?
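
The fixed-window "load" value described above can be sketched as follows. This is a minimal user-space illustration, not the controller's code: the struct, its fields and cpu_window_charge() are hypothetical names, and charging a whole delta to the window in which it is reported is a simplification.

```c
#include <stdint.h>

/* Hypothetical per-group accounting state; all names are illustrative. */
struct cpu_window {
    uint64_t window_len;    /* fixed window length, in ns */
    uint64_t window_start;  /* start of the current window, in ns */
    uint64_t used;          /* cpu time charged in the current window, in ns */
    unsigned int load;      /* % cpu used in the last completed window */
};

/* Charge `delta` ns of cpu time, reported at time `now` (ns). */
static void cpu_window_charge(struct cpu_window *w, uint64_t now, uint64_t delta)
{
    /* Close every window that has fully elapsed before charging. */
    while (now - w->window_start >= w->window_len) {
        w->load = (unsigned int)(w->used * 100 / w->window_len);
        w->used = 0;
        w->window_start += w->window_len;
    }
    w->used += delta;
}
```

The value only changes at window boundaries, which is why it behaves like a per-window usage percentage rather than a decaying load average.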

Paul

Re: [patch 2.6.24-rc3] rtc-cmos alarm acts as oneshot

2007-11-30 Thread Ingo Molnar


* David Brownell <[EMAIL PROTECTED]> wrote:

> Start making the rtc-cmos alarm act more like a oneshot alarm by 
> disabling that alarm after its IRQ fires.  (ACPI hooks are also 
> needed.)
> 
> The Linux RTC framework has previously been a bit vague in this area, 
> but any other behavior is problematic and not very portable.  RTCs 
> with full YYYY-MM-DD HH:MM[:SS] alarms won't have a problem here.  
> Only ones with partial match criteria, with the most visible example 
> being the PC RTC, get confused.  (Because the criteria will match 
> repeatedly.)
> 
> Update comments relating to that oneshot behavior and timezone 
> handling. (Timezones are another issue that's mostly visible with 
> rtc-cmos.  That's because PCs often dual-boot MS-Windows, which likes 
> its RTC to match local wall-clock time instead of UTC.)

Cool. I'm still wondering about:

  http://bugzilla.kernel.org/show_bug.cgi?id=7014

basically, we had universally working /dev/rtc before, now it appears we 
don't have it anymore. Or were the problems in this bugzilla present with 
the old code too?
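
The repeated-match problem David describes for partial-match alarms can be sketched in a few lines. This is an illustration only, assuming an alarm that compares hour, minute and second; the types and function names are hypothetical and are not taken from rtc-cmos.

```c
#include <stdbool.h>

struct hms { int hour, min, sec; };

/* Partial match: compares time of day only, so it is true once every day. */
static bool alarm_matches(struct hms alarm, struct hms now)
{
    return alarm.hour == now.hour && alarm.min == now.min &&
           alarm.sec == now.sec;
}

/* One-shot behaviour: the alarm disarms itself after the first match. */
struct oneshot_alarm { struct hms when; bool enabled; };

static bool alarm_fire(struct oneshot_alarm *a, struct hms now)
{
    if (!a->enabled || !alarm_matches(a->when, now))
        return false;
    a->enabled = false;   /* same idea as disabling the alarm after its IRQ */
    return true;
}
```

Without the enabled flag, the same time of day would match again 24 hours later; an alarm with a full date does not have that problem.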

Ingo

Re: [patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-11-30 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > Our debugger team has a prototype implementation for their debugger. 
> > But that will not be available for some time.
> > 
> > I hope that we get gdb support, soon, but that would take a while if 
> > I had to do it.
> 
> i'm wondering what the main use-case would be then, and what the gdb 
> folks think about the current API. (Roland?)

here's a forwarded mail from Roland about the patch and APIs. (and i 
hope that now i can stop playing the middleman, with everyone Cc:-ed :)

>
From: Roland McGrath <[EMAIL PROTECTED]>
Subject: Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)]

Cool.  It's been on my list to look into exposing those features 
somehow. I hadn't planned on doing it until after the utrace stuff 
settles and there is a more coherent interface context in which to do 
it.

If they are tackling the MSR hacking and context switch and so forth, 
I'd like to see them start out by just adding block-step 
(debugctlmsr.btf) with the PTRACE_SINGLEBLOCK interface as ia64 has.  
That should lay some of the same groundwork needed here, but is much 
simpler.

I am not really in favor of this new ptrace interface.  I think they 
should look around across arch's and think about sane general-purpose 
interfaces for features of this kind that might be built with some 
commonality across machines.  Also do it in a layered way from 
low-level, with something usable for kernel-mode too.  The discussion 
Alan Stern and I had on LKML that started as kwatch and became 
hw_breakpoint is an example of how I would go at this set of features 
too.

Re: What can we do to get ready for memory controller merge in 2.6.25

2007-11-30 Thread Paul Menage

On Nov 29, 2007 6:11 PM, Nick Piggin <[EMAIL PROTECTED]> wrote:
> And also some
> results or even anecdotes of where this is going to be used would be
> interesting...

We want to be able to run multiple isolated jobs on the same machine.
So being able to limit how much memory each job can consume, in terms
of anonymous memory and page cache, is useful. I've not had much time
to look at the patches in great detail, but they seem to provide a
sensible way to assign and enforce static limits on a bunch of jobs.

Some of our requirements are a bit beyond this, though:

In our experience, users are not good at figuring out how much memory
they really need. In general they tend to massively over-estimate
their requirements. So we want some way to determine how much of its
allocated memory a job is actively using, and how much could be thrown
away or swapped out without bothering the job too much.

Of course, the definition of "active use" is tricky - one possibility
that we're looking at is "has been accessed within the last N
seconds", where N can be configured appropriately for different jobs
depending on the job's latency requirements. Active use should also be
reported for pages that can't be easily freed quickly, e.g. mlocked or
dirty pages, or anon pages on a swapless system. Inactive pages should
be easily freeable, and be the first ones to go in the event of memory
pressure. (From a scheduling point of view we can treat them as free
memory, and schedule more jobs on the machine)

The existing active/inactive distinction doesn't really capture this,
since it's relative rather than absolute.
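
The absolute definition of "active use" proposed above might look like this. It is a sketch under stated assumptions: struct page_info and page_in_active_use() are hypothetical, timestamps are in seconds, and nothing here reflects how the memory controller patches actually track pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page summary; field names are illustrative only. */
struct page_info {
    uint64_t last_access;  /* time of last access, in seconds */
    bool mlocked;
    bool dirty;
    bool anon;
};

/*
 * A page counts as "in active use" if it was touched within the last
 * n_secs, or if it cannot be freed quickly anyway.
 */
static bool page_in_active_use(const struct page_info *p, uint64_t now,
                               uint64_t n_secs, bool have_swap)
{
    if (p->mlocked || p->dirty)
        return true;
    if (p->anon && !have_swap)
        return true;
    return now - p->last_access <= n_secs;
}
```

Because n_secs is an absolute threshold chosen per job, the result does not depend on how busy other pages are, unlike the relative active/inactive lists.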

We want to be able to overcommit a machine, so the sums of the cgroup
memory limits can add up to more than the total machine memory. So we
need control over what happens when there's global memory pressure,
and a way to ensure that the low-latency jobs don't get bogged down in
reclaim (or OOM) due to the activity of batch jobs.

Paul

Re: git guidance

2007-11-30 Thread Al Boldi

Jing Xue wrote:
> Quoting Al Boldi <[EMAIL PROTECTED]>:
> > Sure, browsing is the easy part, but Version Control starts when things
> > become writable.
>
> But how is that supposed to work?  What happens when you make some
> changes to a file and save it?  Do you want the "git file system" to
> commit it right away or wait until you issue a "commit" command?
> The first behavior would obviously be wrong, and the second would make
> the "file system" not operationally transparent anyways. Right?

Not sure what you mean by operationally transparent?  It would be transparent 
for the updating client,  and the rest of the git-users would need to wait 
for the commit from the updating client; which is ok, as this transparency 
is not meant to change the server-side git-update semantic.

Linus Torvalds wrote:
> On Thu, 29 Nov 2007, Jing Xue wrote:
> > By the way, the only SCM I have worked with that tries to mount its
> > repository (or a view on top of it) as a file system is ClearCase with
> > its dynamic views. And, between the buggy file system implementation,
> > the intrusion on workflow, and the lack of scalability, at least in
> > the organization I worked for, it turned out to be a horrible,
> > horrible, horrible idea.

Judging an idea, based on a flawed implementation, doesn't prove that the 
idea itself is flawed.

And...
> Doing a read-only mount setup tends to be pretty easy, but it's largely
> pointless except for specialty uses. Ie it's obviously not useful for
> actual *development*, but it can be useful for some other cases.
>
> For example, a read-only revctrl filesystem can be a _very_ useful thing
> for test-farms, where you may have hundreds of clients that run tests on
> possibly different versions at the same time. In situations like that, the
> read-only mount can actually often be done as a user-space NFS server on
> some machine.
>
> The advantage is that you don't need to export close to infinite amounts
> of versions from a "real" filesystem, or make the clients have their own
> copies. And if you do it as a user-space NFS server (or samba, for that
> matter), it's even portable, unlike many other approaches. The read-only
> part also makes 99% of all the complexity go away, and it turns out to be
> a fairly easy exercise to do.
>
> So I don't think the filesystem approach is _wrong_ per se. But yes, doing
> it read-write is almost invariably a big mistake. On operating systems
> that support a "union mount" approach, it's likely much better to have a
> read-only revctl thing, and then over-mount a regular filesystem on top of
> it.

You could probably do that, or you could instead use cp -al.  Both would 
require some hacks to allow some basic version control.

> Trying to make it read-write from the revctl engine standpoint is almost
> certainly totally insane.

Sure, you wouldn't want to change the git-engine update semantics, as that 
sits on the server and handles all users.  But what the git model is 
currently missing is a client manager.  Right now, this is being worked 
around by replicating the git tree on the client, which still doesn't 
provide the required transparency.

IOW, git currently only implements the server-side use-case, but fails to 
deliver on the client-side.  By introducing a git-client manager that 
handles the transparency needs of a single user, it should be possible to 
clearly isolate update semantics for both the client and the server, each 
handling their specific use-case.

Thanks!

--
Al


Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-11-30 Thread Michael Cree


Bob Tracy wrote:

Andrew Morton wrote:

Could be something changed in sysfs.  Please double-check the config
options, make sure that something important didn't get disabled.


 Here's
hoping someone else is seeing this or can replicate it in the meantime.


Snap.

2.6.24-rc2 works fine.   2.6.24-rc3 boots on Alpha but once /dev is 
populated no partitions of the scsi sub-system are seen.  Looks like ide 
sub-system similarly affected.


Managed to get boot log.  Follows below (with output of various /proc info).

Cheerz
Michael.


Linux version 2.6.24-rc3 ([EMAIL PROTECTED]) (gcc version 4.1.3 20071019 (prerelease) (Debian 4.1.2-17)) #1 Mon Nov 26 19:28:58 NZDT 2007

Booting on Tsunami variation Monet using machine vector Monet from SRM
Major Options: EV67 LEGACY_START VERBOSE_MCHECK
Command line: ro root=/dev/sda3 console=ttyS0
memcluster 0, usage 1, start0, end  215
memcluster 1, usage 0, start  215, end   131062
memcluster 2, usage 1, start   131062, end   131072
freeing pages 215:384
freeing pages 930:131062
reserving pages 930:932
4096K Bcache detected; load hit latency 21 cycles, load miss latency 127 cycles
Console graphics on hose 0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130167
Kernel command line: ro root=/dev/sda3 console=ttyS0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Using epoch = 2000
Turning on RTC interrupts.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Dentry cache hash table entries: 131072 (order: 7, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 6, 524288 bytes)
Memory: 1030896k/1048496k available (2786k kernel code, 15216k reserved, 370k data, 168k init)
Mount-cache hash table entries: 512
net_namespace: 120 bytes
NET: Registered protocol family 16
PCI: Bridge: 0001:01:08.0
  IO window: 8000-8fff
  MEM window: 0900-090f
  PREFETCH window: disabled.
SMC37c669 Super I/O Controller found @ 0x3f0
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
NET: Registered protocol family 2
IP route cache hash table entries: 8192 (order: 3, 65536 bytes)
TCP established hash table entries: 32768 (order: 6, 524288 bytes)
TCP bind hash table entries: 32768 (order: 5, 262144 bytes)
TCP: Hash tables configured (established 32768 bind 32768)
TCP reno registered
srm_env: version 0.0.6 loaded successfully
io scheduler noop registered
io scheduler cfq registered (default)
tridentfb: Trident framebuffer 0.7.8-NEWAPI initializing
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
rtc: SRM (post-2000) epoch (2000) detected
Real Time Clock Driver v1.12ac
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 2.88M
FDC 0 is a post-1991 82077
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
CY82C693: IDE controller (0x1080:0xc693 rev 0x00) at  PCI slot :00:07.1
CY82C693: not 100% native mode: will probe irqs later
CY82C693U driver v0.34 99-13-12 Andreas S. Krebs ([EMAIL PROTECTED])
ide0: BM-DMA at 0x8400-0x8407, BIOS settings: hda:pio, hdb:pio
CY82C693: port 0x01f0 already claimed by ide0
ALI15X3: IDE controller (0x10b9:0x5228 rev 0xc6) at  PCI slot 0001:02:09.1
ALI15X3: 100% native mode on irq 28
ide1: BM-DMA at 0x28410-0x28417, BIOS settings: hdc:DMA, hdd:DMA
ide2: BM-DMA at 0x28418-0x2841f, BIOS settings: hde:pio, hdf:pio
hdf: LITE-ON DVDRW SOHW-1653S, ATAPI CD/DVD-ROM drive
hde: ST3200822A, ATA DISK drive
ide2 at 0x28438-0x2843f,0x2844e on irq 28
hde: max request size: 512KiB
hde: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
hde: cache flushes supported
 hde: hde1
qla1280: QLA1040 found on PCI bus 1, dev 6
scsi(0:0): Resetting SCSI BUS
scsi0 : QLogic QLA1040 PCI to SCSI Host Adapter
   Firmware version:  7.65.06, Driver version 3.26
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
scsi 0:0:1:0: Direct-Access SEAGATE  ST336706LW   0109 PQ: 0 ANSI: 3
scsi(0:0:1:0): Sync: period 10, offset 12, Wide
input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
atkbd.c: keyboard reset failed on isa0060/serio1
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 15
scsi: waiting for bus probes to complete ...
sd 0:0:1:0: [sda] 71687370 512-byte hardware sectors (36704 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:1:0: [sda] 71687370 512-byte hardware sectors (36704 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
 s

Re: [PATCH] Remove rcu_assign_pointer() penalty for NULL pointers

2007-11-30 Thread Paul E. McKenney

On Sat, Dec 01, 2007 at 12:07:52PM +1100, Herbert Xu wrote:
> On Fri, Nov 30, 2007 at 04:37:21PM -0800, Paul E. McKenney wrote:
> > 
> > The rcu_assign_pointer() primitive currently unconditionally executes
> > a memory barrier, even when a NULL pointer is being assigned.  This
> > has led some to avoid using rcu_assign_pointer() for NULL pointers,
> > which loses the self-documenting advantages of rcu_assign_pointer().
> > This patch uses __builtin_constant_p() to omit needless memory barriers
> > for NULL-pointer assignments at compile time with no runtime penalty,
> > as discussed in the following thread:
> > 
> > http://www.mail-archive.com/[EMAIL PROTECTED]/msg54852.html
> > 
> > Tested on x86_64 and ppc64, also compiled the four cases (NULL/non-NULL
> > and const/non-const) with gcc version 4.1.2, and hand-checked the
> > assembly output.
> > 
> > Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
> 
> Acked-by: Herbert Xu <[EMAIL PROTECTED]>
> 
> Thanks a lot for following through with this Paul!

No problem -- after all, it is not every day that one gets the opportunity
to make a simple change that speeds things up and makes kernel hackers'
lives a bit simpler.  ;-)
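
The compile-time trick described in the quoted changelog can be approximated in user space as below. This is a sketch, not the kernel macro: sketch_assign_pointer() and sketch_wmb() are hypothetical names, the release fence is an assumed stand-in for smp_wmb(), and the counter exists only so the effect can be observed. It relies on the GCC/Clang builtin __builtin_constant_p().

```c
#include <stddef.h>

/* Counts barriers so the effect is observable in user space (illustrative). */
static int sketch_barriers;

/* Stand-in for the kernel's smp_wmb(); the release fence is an assumption. */
#define sketch_wmb() \
    do { sketch_barriers++; __atomic_thread_fence(__ATOMIC_RELEASE); } while (0)

/*
 * Skip the barrier only when the compiler can see that v is a constant
 * NULL; any non-constant or non-NULL value still gets the barrier.
 */
#define sketch_assign_pointer(p, v) \
    do { \
        if (!__builtin_constant_p(v) || ((v) != NULL)) \
            sketch_wmb(); \
        (p) = (v); \
    } while (0)
```

Since the condition is resolved at compile time for a literal NULL, the check itself is expected to add no runtime cost, which matches the "no runtime penalty" claim in the changelog.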

Thanx, Paul

Re: use of fixmap on non-x86/sh?

2007-11-30 Thread Benjamin Herrenschmidt

On Sat, 2007-12-01 at 10:03 +0900, Paul Mundt wrote:
> There are of course things that make this more attractive on x86,
> especially with regards to the global bit and preservation across a
> TLB
> flush, there's a note in arch/sh/mm/init.c above __set_fixmap() about
> that. fixmap doesn't really have any special behaviour that makes an
> architecture implementation problematic at least.

On ppc, we are mostly looking into memory mapped config space, which for
some new PCIe bridges is huge (about 512M per port on the 440SPe). Those
processors have 36 bits physical addresses and 32 bits virtual.

So I suppose we can just move our current kmap_atomic implementation out
of highmem, call it fixmap, and add a slot for use by PCI config space
access (those are fully spinlocked, so the per-cpu aspect is just what
we need, just like kmap_atomic).

Ben.


Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Mark Lord


Mark Lord wrote:

Arjan van de Ven wrote:

On Fri, 30 Nov 2007 22:44:25 -0500
Mark Lord <[EMAIL PROTECTED]> wrote:

all you need to do in your kernel module is call

...

set_acceptable_latency("mark", 5);

and to remove the constraint again you just do

remove_acceptable_latency("mark");

..

Then why not have a sysfs entry for scripts to write to
to trigger the exact same end result?  :)


that's what is in current -mm pretty much
well not sysfs, but it goes via a file descriptor
(so that if the process that sets the constraint dies, the latency
requirement can be given up automatically)

...

But doesn't that approach also make it nearly impossible to script 

...

Okay, I have a working trivial kernel module that I can load/unload
to tweak this.  But a simple sysfs attribute would be *so much* better
as a permanent kernel feature.

Binary interfaces (fd) are fine for some uses, but not nice for scripts.

Cheers

Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-30 Thread H. Peter Anvin


Adrian Bunk wrote:


I have read the help text, but are the advantages of HZ == 300 really 
visible or was this more theoretical?


In the latter case, we might remove the HZ == 300 choice instead.



Well, we have, for various architectures:

HZ == 48, 100, 128, 250, 256, 300, 1000, 1024

You'd have to kill 48, 128, 256, 300 and 1024.
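
One reading of this list, offered as an assumption rather than something stated in the thread, is that the values to kill are exactly those for which milliseconds and jiffies do not convert by a whole-number factor. The helper name below is hypothetical.

```c
#include <stdbool.h>

#define MSEC_PER_SEC 1000

/* True if milliseconds and jiffies convert by an exact integer factor. */
static bool hz_converts_exactly(unsigned int hz)
{
    return hz != 0 &&
           (MSEC_PER_SEC % hz == 0 || hz % MSEC_PER_SEC == 0);
}
```

Under that criterion 100, 250 and 1000 pass, while 48, 128, 256, 300 and 1024 need a multiply-and-divide conversion, which is where overflow handling becomes a concern.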

-hpa


[GIT PULL] please pull infiniband.git

2007-11-30 Thread Roland Dreier

Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get two small fixes for 2.6.24:

Jack Morgenstein (1):
  IPoIB: Fix oops if xmit is called when priv->broadcast is NULL

Joachim Fenkes (1):
  IB/ehca: Fix static rate if path faster than link

 drivers/infiniband/hw/ehca/ehca_av.c  |8 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c |3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_av.c b/drivers/infiniband/hw/ehca/ehca_av.c
index 453eb99..f7782c8 100644
--- a/drivers/infiniband/hw/ehca/ehca_av.c
+++ b/drivers/infiniband/hw/ehca/ehca_av.c
@@ -76,8 +76,12 @@ int ehca_calc_ipd(struct ehca_shca *shca, int port,
 
link = ib_width_enum_to_int(pa.active_width) * pa.active_speed;
 
-   /* IPD = round((link / path) - 1) */
-   *ipd = ((link + (path >> 1)) / path) - 1;
+   if (path >= link)
+   /* no need to throttle if path faster than link */
+   *ipd = 0;
+   else
+   /* IPD = round((link / path) - 1) */
+   *ipd = ((link + (path >> 1)) / path) - 1;
 
return 0;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a03a65e..c9f6077 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -460,6 +460,9 @@ static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid)
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ipoib_path *path;
 
+   if (!priv->broadcast)
+   return NULL;
+
path = kzalloc(sizeof *path, GFP_ATOMIC);
if (!path)
return NULL;
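
The arithmetic in the ehca hunk can be checked in isolation. The function below is a stand-alone sketch with a hypothetical name; link and path are assumed to be relative rates in the same unit, and the guard against a non-positive path is an addition for the sketch, not part of the patch.

```c
/* Inter-packet delay from relative link and path rates (same unit). */
static int calc_ipd_sketch(int link, int path)
{
    if (path <= 0)
        return -1;   /* guard added for the sketch; not in the patch */
    if (path >= link)
        return 0;    /* no need to throttle if path faster than link */
    /* IPD = round((link / path) - 1) */
    return ((link + (path >> 1)) / path) - 1;
}
```

With link = 4 and path = 12, the old unconditional formula gives (4 + 6) / 12 - 1 = -1 in integer arithmetic, which is the case the new branch avoids.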

Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Mark Lord


Arjan van de Ven wrote:

On Fri, 30 Nov 2007 22:44:25 -0500
Mark Lord <[EMAIL PROTECTED]> wrote:

all you need to do in your kernel module is call

add_latency_constraint("mark_wants_his_mouse", 5);

Okay, and how to change it back again?  (thanks)



sorry I misremember it's called
set_acceptable_latency("mark", 5);

and to remove the constraint again you just do

remove_acceptable_latency("mark");

(and there's modify_ too to change existing)


Then why not have a sysfs entry for scripts to write to
to trigger the exact same end result?  :)


that's what is in current -mm pretty much
well not sysfs, but it goes via a file descriptor
(so that if the process that sets the constraint dies, the latency
requirement can be given up automatically)

...

But doesn't that approach also make it nearly impossible to script 

Having to hack source for any apps that one wants to use it on
sounds rather barbaric.. 


(sure I *can* still script it, with background tasks and wait and ..
but that's getting rather complex again).

Meanwhile.. I'll now cook up a module to try the set/remove_acceptable_latency 
thing.

Thanks
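
The thread says the -mm interface goes via a file descriptor but does not name it. The following is a minimal user-space sketch, assuming the device node is /dev/cpu_dma_latency and that it takes a 32-bit value in microseconds, as in the PM QoS interface that later went into mainline. latency_hold() is a hypothetical helper name.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Open `path`, write the tolerable latency in microseconds, and return
 * the still-open descriptor, or -1 on error. The request is assumed to
 * last only while the descriptor stays open.
 */
static int latency_hold(const char *path, int32_t usec)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    if (write(fd, &usec, sizeof(usec)) != (ssize_t)sizeof(usec)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use something like `int fd = latency_hold("/dev/cpu_dma_latency", 5);` and keep fd open for as long as the constraint should apply. Closing it, or exiting, is assumed to drop the request, which is why a one-line write from a shell script is not enough and a helper process has to stay alive.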

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-11-30 Thread serge

Quoting KaiGai Kohei ([EMAIL PROTECTED]):
> Serge E. Hallyn wrote:
> > The capability bounding set is a set beyond which capabilities
> > cannot grow.  Currently cap_bset is per-system.  It can be
> > manipulated through sysctl, but only init can add capabilities.
> > Root can remove capabilities.  By default it includes all caps
> > except CAP_SETPCAP.
> 
> Serge,
> 
> This feature interests me.
> I think you intend to apply this feature to the primary process
> of a security container.
> However, it is also worth applying when a session starts up.
> 
> The following PAM module drops the capability bounding bits
> specified in the fifth field of the user's /etc/passwd entry.
> This code is just an example for now, but I think the feature is worth considering.
> 
> build and install:
> # gcc -Wall -c pam_cap_drop.c
> # gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
> # cp pam_cap_drop.so /lib/security
> 
> modify /etc/passwd as follows:
> 
> tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
>^^
> example:
> [EMAIL PROTECTED] ~]$ ping 192.168.1.1
> PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
> 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms
> 
> --- 192.168.1.1 ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms
> 
> [EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
> [EMAIL PROTECTED]'s password:
> Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
> [EMAIL PROTECTED] ~]$ export LANG=C
> [EMAIL PROTECTED] ~]$ ping 192.168.1.1
> ping: icmp open socket: Operation not permitted
> 
> [EMAIL PROTECTED] ~]$ su
> Password:
> pam_cap_bset[6921]: user root does not have 'cap_drop=' property
> [EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
> CapInh: 
> CapPrm: dffe
> CapEff: dffe
> [EMAIL PROTECTED] tak]#

Neat.  A bigger-stick version of not adding the account to
group wheel.  I'll use that.

Is there any reason not to have a separate /etc/login.capbounds
config file, though, so the account can still have a full name?
Did you only use that for convenience of proof of concept, or
is there another reason?
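
The gecos convention used in the example above, "cap_drop=" followed by a comma-separated list of names, can be parsed without modifying the input string. This is a sketch only: parse_cap_drop() and the shortened name table are hypothetical, and the numeric values are the ones defined in linux/capability.h. The quoted module appears to hand each value to prctl() with PR_CAPBSET_DROP; the sketch stops at parsing.

```c
#include <string.h>

/* Small illustrative subset of the name table. */
struct cap_name { const char *name; int value; };

static const struct cap_name cap_names[] = {
    { "cap_chown", 0 },
    { "cap_kill", 5 },
    { "cap_net_admin", 12 },
    { "cap_net_raw", 13 },
    { "cap_sys_admin", 21 },
};

/*
 * Collect the capability numbers listed after "cap_drop=" in a gecos
 * string. Returns how many were stored in out[] (at most max), 0 if the
 * property is absent, or -1 on an unknown or empty name.
 */
static int parse_cap_drop(const char *gecos, int *out, int max)
{
    const char *p = strstr(gecos, "cap_drop=");
    size_t count = sizeof(cap_names) / sizeof(cap_names[0]);
    int n = 0;

    if (!p)
        return 0;
    p += strlen("cap_drop=");
    while (*p && n < max) {
        size_t len = strcspn(p, ",");
        size_t i;

        for (i = 0; i < count; i++)
            if (strlen(cap_names[i].name) == len &&
                !strncmp(p, cap_names[i].name, len))
                break;
        if (i == count)
            return -1;
        out[n++] = cap_names[i].value;
        p += len;
        if (*p == ',')
            p++;
    }
    return n;
}
```

The same parser would work unchanged on a line read from a separate configuration file, such as the /etc/login.capbounds file suggested above, since it only needs the string that carries the property.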

> # BTW, I replaced the James's address in the Cc: list,
> # because MTA does not accept it.

Thanks!  I don't know what happened to my alias for him...

thanks,
-serge

> -- 
> KaiGai Kohei <[EMAIL PROTECTED]>
> 
> 
> pam_cap_drop.c
> 
> 
> /*
>  * pam_cap_drop.c module -- drop capabilities bounding set
>  *
>  * Copyright: 2007 KaiGai Kohei <[EMAIL PROTECTED]>
>  */
> 
> #include <stdlib.h>
> #include <string.h>
> #include <syslog.h>
> #include <pwd.h>
> #include <sys/types.h>
> #include <sys/prctl.h>
> 
> #include <security/pam_modules.h>
> 
> #ifndef PR_CAPBSET_DROP
> #define PR_CAPBSET_DROP 24
> #endif
> 
> static char *captable[] = {
>   "cap_chown",
>   "cap_dac_override",
>   "cap_dac_read_search",
>   "cap_fowner",
>   "cap_fsetid",
>   "cap_kill",
>   "cap_setgid",
>   "cap_setuid",
>   "cap_setpcap",
>   "cap_linux_immutable",
>   "cap_net_bind_service",
>   "cap_net_broadcast",
>   "cap_net_admin",
>   "cap_net_raw",
>   "cap_ipc_lock",
>   "cap_ipc_owner",
>   "cap_sys_module",
>   "cap_sys_rawio",
>   "cap_sys_chroot",
>   "cap_sys_ptrace",
>   "cap_sys_pacct",
>   "cap_sys_admin",
>   "cap_sys_boot",
>   "cap_sys_nice",
>   "cap_sys_resource",
>   "cap_sys_time",
>   "cap_sys_tty_config",
>   "cap_mknod",
>   "cap_lease",
>   "cap_audit_write",
>   "cap_audit_control",
>   "cap_setfcap",
>   NULL,
> };
> 
> 
> PAM_EXTERN int
> pam_sm_open_session(pam_handle_t *pamh, int flags,
> int argc, const char **argv)
> {
>   struct passwd *pwd;
>   char *pos, *buf;
>   char *username = NULL;
> 
>   /* open system logger */
>   openlog("pam_cap_bset", LOG_PERROR | LOG_PID, LOG_AUTHPRIV);
> 
>   /* get the unix username */
>   if (pam_get_item(pamh, PAM_USER, (void *) &username) != PAM_SUCCESS || !username)
>   return PAM_USER_UNKNOWN;
> 
>   /* get the passwd entry */
>   pwd = getpwnam(username);
>   if (!pwd)
>   return PAM_USER_UNKNOWN;
> 
>   /* Is there "cap_drop=" ? */
>   pos = strstr(pwd->pw_gecos, "cap_drop=");
>   if (pos) {
>   buf = strdup(pos + sizeof("cap_drop=") - 1);
>   if (!buf)
>   return PAM_SESSION_ERR;
>   pos = strtok(buf, ",");
>   while (pos) {
>   int rc, i;
> 
>   for (i=0; captable[i]; i++) {
>   if (!strcmp(pos, captable[i])) {
>   rc = prct

Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Mark Lord


Mark Lord wrote:

..
And I just figured out the powertop:  it needed the kernel timers
patch from the powertop site that was originally for 2.6.21..
Any chance of somebody actually pushing that patch upstream some year ??
Patch reproduced here for interest's sake only.
Hey, look who's on the Signed-off list for it, *Arjan* !

...

Mmm.. hey, I was using this patch on 2.6.23 as well..

I wonder if perhaps this is the culprit for the mysterious 1-second delays
that sometimes slow down resume (from RAM) on my machine from 2 seconds
(normal) to 20 seconds (slow) ?   

Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Mark Lord


Arjan van de Ven wrote:

On Fri, 30 Nov 2007 22:31:17 -0500
Mark Lord <[EMAIL PROTECTED]> wrote:


Arjan van de Ven wrote:

On Fri, 30 Nov 2007 22:14:08 -0500
Mark Lord <[EMAIL PROTECTED]> wrote:


in -mm there is.. the QoS stuff allows you to set maximum
tolerable

..

That's encouraging, I think, but not for 2.6.24.


latency. If your app can't take any latency, you should set
those... and the side effect is that the kernel will not do
long-latency C-states or P-state transitions..

..

I don't mind the cpufreq changing (actually, I want it to drop in
cpufreq to save power and keep the fan off), but the C-states just
kill this app.

The app is VMware.  I force the max_state=1 when launching,

ah but then it's even easier... and can be done in 2.6.24 already.
VMWare after all has a kernel module, and the latency stuff is in
2.6.23 and 2.6.24 available inside the kernel already.

..

Oh, I'm perfectly happy to write my own kernel module if that's what


all you need to do in your kernel module is call

add_latency_constraint("mark_wants_his_mouse", 5);


Okay, and how to change it back again?  (thanks)

Then why not have a sysfs entry for scripts to write to
to trigger the exact same end result?  :)


Speaking of which.. what's with powertop on 2.6.24 ???
It's gone from 100-200 wakeups/sec to 2 wakeups/sec !!!


ho hum.. Lenovo T61?
I have some reports that that happens once in a while (but it's not
limited to .24 and it's also real, it's not a powertop bug but it
actually is waking up that much)..

..

No, it's my hefty Dell Inspiron 9400.

And I just figured out the powertop:  it needed the kernel timers
patch from the powertop site that was originally for 2.6.21.. 


Any chance of somebody actually pushing that patch upstream some year ??
Patch reproduced here for interest's sake only.
Hey, look who's on the Signed-off list for it, *Arjan* !

* * *


From e6f2ff1e4763212f1dcc945db76fb744b951ac53 Mon Sep 17 00:00:00 2001

From: Josh Triplett <[EMAIL PROTECTED]>
Date: Sun, 13 May 2007 15:21:39 -0700
Subject: [PATCH] Lengthen and align background timers to decrease wakeups

This patch changes a few background timers in the Linux kernel to
1) be aligned to full seconds so that multiple timers get handled in one
  processor wakeup
2) have longer timeouts for those timers that can use such longer timeouts

Some of these are a bit crude, but it's effective.

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
[Josh: Updates for 2.6.22-rc1]
Signed-off-by: Josh Triplett <[EMAIL PROTECTED]>
---
 kernel/time/clocksource.c |   10 ++++++++--
 mm/page-writeback.c       |    8 ++++----
 mm/slab.c                 |    6 +++---
 net/core/neighbour.c      |    4 ++--
 4 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3db5c3c..77308c4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -79,11 +79,17 @@ static int watchdog_resumed;
/*
 * Interval: 0.5sec Threshold: 0.0625s
 */
-#define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_INTERVAL (HZ*10)
#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)

+static int secondtime;
+
static void clocksource_ratewd(struct clocksource *cs, int64_t delta)
{
+   if (!secondtime) {
+   secondtime = 1;
+   return;
+   };
if (delta > -WATCHDOG_THRESHOLD && delta < WATCHDOG_THRESHOLD)
return;

@@ -145,7 +151,7 @@ static void clocksource_watchdog(unsigned long data)

if (!list_empty(&watchdog_list)) {
__mod_timer(&watchdog_timer,
-   watchdog_timer.expires + WATCHDOG_INTERVAL);
+   round_jiffies(watchdog_timer.expires + WATCHDOG_INTERVAL));
}
spin_unlock(&watchdog_lock);
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index eec1481..26318e5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -77,7 +77,7 @@ int vm_dirty_ratio = 10;
/*
 * The interval between `kupdate'-style writebacks, in jiffies
 */
-int dirty_writeback_interval = 5 * HZ;
+int dirty_writeback_interval = 15 * HZ;

/*
 * The longest number of jiffies for which data is allowed to remain dirty
@@ -450,7 +450,7 @@ static void wb_kupdate(unsigned long arg)

oldest_jif = jiffies - dirty_expire_interval;
start_jif = jiffies;
-   next_jif = start_jif + dirty_writeback_interval;
+   next_jif = round_jiffies(start_jif + dirty_writeback_interval);
nr_to_write = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_UNSTABLE_NFS) +
(inodes_stat.nr_inodes - inodes_stat.nr_unused);
@@ -467,7 +467,7 @@ static void wb_kupdate(unsigned long arg)
nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
}
if (time_before(next_jif, jiffies + HZ))
-   next_jif = jiffies + HZ;
+   next_jif = round_jiffies(jiffies + HZ);
if (dirty_writeback_interval)
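
The idea behind the round_jiffies() calls in the patch above is that a
timer which does not need sub-second precision can have its expiry moved
to a whole-second boundary, so that several such timers fire in the same
wakeup. The sketch below is a simplified user-space model of that
rounding only. It is not the kernel's implementation, and the name
round_up_to_second and its hz parameter are made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: round a tick count up to the next multiple of hz,
 * i.e. the next whole second.  Values already on a boundary are kept. */
static unsigned long round_up_to_second(unsigned long ticks, unsigned long hz)
{
    unsigned long rem;

    assert(hz != 0);
    rem = ticks % hz;
    return rem ? ticks + (hz - rem) : ticks;
}
```

With hz = 1000, timers due at ticks 1001 and 1999 both end up at 2000,
so one wakeup can service both.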

Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Mark Lord


Arjan van de Ven wrote:
> On Fri, 30 Nov 2007 22:14:08 -0500
> Mark Lord <[EMAIL PROTECTED]> wrote:
>
> > > in -mm there is.. the QoS stuff allows you to set maximum tolerable
> > ..
> >
> > That's encouraging, I think, but not for 2.6.24.
> >
> > > latency. If your app cant take any latency, you should set those...
> > > and the side effect is that the kernel will not do long-latency
> > > C-states or P-state transitions..
> > ..
> >
> > I don't mind the cpufreq changing (actually, I want it to drop in
> > cpugfreq to save power and keep the fan off), but the C-states just
> > kill this app.
> >
> > The app is VMware.  I force the max_state=1 when launching,
>
> ah but then its' even easier... and can be done in 2.6.24 already.
> VMWare after all has a kernel module, and the latency stuff is in
> 2.6.23 and 2.6.24 available inside the kernel already.
..

Oh, I'm perfectly happy to write my own kernel module if that's what
is going to be needed here, but just doing an echo into sysfs was simpler.
But yes, it appears to have no effect even with the chmod patch I posted,
so something different is needed here.

> > I'm not sure about C?? -- it could be C8 or even be C2 or whatever.
> > I suppose I should find out, but that really takes a lot of fuss
> > (hours) to measure, and isn't strictly repeatable.
>
> (also hope you don't have one of those AMD machines where the bios
> turns C1 into C2/C3/etc behind the OSes back ;-)
..

Just a nice Intel Core2 Duo notebook.

Speaking of which.. what's with powertop on 2.6.24 ???
It's gone from 100-200 wakeups/sec to 2 wakeups/sec !!!

New thread time..

Re: [RFC] kobject: add kobject_init_ng, kobject_add_ng, and kobject_init_and_add functions

2007-11-30 Thread Alan Stern

On Fri, 30 Nov 2007, Greg KH wrote:

>  /**
> + * kobject_init_ng - initialize a kobject structure
> + * @kobj: pointer to the kobject to initialize
> + * @ktype: pointer to the ktype for this kobject.
> + * @parent: pointer to the parent of this kobject.
> + * @fmt: the name of the kobject.
> + *
> + * This function will properly initialize a kobject such that it can then
> + * be passed to the kobject_add() call.
> + *
> + * If the function returns an error, the memory allocated by the kobject
> + * can be safely freed, no other functions need to be called.
> + */
> +void kobject_init_ng(struct kobject *kobj, struct kobj_type *ktype)

Kerneldoc needs to be updated -- no @parent or @fmt.  Also no error
returns.  But you could say that after this routine runs, the kobject
should be deallocated by kobject_put() and not by calling kfree()
directly.

> +/**
> + * kobject_add_ng - the main kobject add function
> + * @kobj: the kobject to add
> + * @parent: pointer to the parent of the kobject.
> + *
> + * The kobject name is set and added to the kobject hierarchy in this
> + * function.
> + *
> + * If @parent is set, then the parent of the @kobj will be set to it.
> + * If @parent is NULL, then the parent of the @kobj will be set to the
> + * kobject associated with the kset assigned to this kobject.  If no kset
> + * is assigned to the kobject, then the kobject will be located in the
> + * root of the sysfs tree.
> + *
> + * If this function returns an error, kobject_put() must be called to
> + * properly clean up the memory associated with the object.
> + *
> + * If the function is successful, the only way to properly clean up the
> + * memory is with a call to kobject_del().

In which case kobject_put() isn't needed?

> + *
> + * Under no instance should the kobject that is passed to this function
> + * be directly freed with a call to kfree(), that can leak memory.
> + */

Should you say something here about uevents?

> +int kobject_add_ng(struct kobject *kobj, struct kobject *parent,
> +const char *fmt, ...)
> +{
> + va_list args;
> + int retval;
> +
> + if (!kobj)
> + return -EINVAL;
> +
> + va_start(args, fmt);
> + retval = kobject_set_name_vargs(kobj, fmt, args);
> + va_end(args);
> + if (retval) {
> + printk(KERN_ERR "kobject: can not set name properly!\n");
> + return retval;
> + }
> + kobj->parent = parent;
> + return kobject_add(kobj);
> +}
> +EXPORT_SYMBOL(kobject_add_ng);

Looks like this should call kobject_add_varg() instead of duplicating
its code.
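
The refactoring suggested here is the usual C pattern of keeping the real
work in a va_list-taking function and making the "..." variant a thin
wrapper around it. A minimal user-space sketch follows. The struct and
function names (obj, obj_add_varg, obj_add, demo_obj_add) are invented
for illustration and are not the kobject API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

struct obj {
    char name[32];
    const struct obj *parent;
};

/* Does the real work: formats the name and records the parent. */
static int obj_add_varg(struct obj *o, const struct obj *parent,
                        const char *fmt, va_list args)
{
    int n;

    assert(o != NULL);
    n = vsnprintf(o->name, sizeof(o->name), fmt, args);
    if (n < 0 || (size_t)n >= sizeof(o->name))
        return -1;  /* formatting failed or name was truncated */
    o->parent = parent;
    return 0;
}

/* Thin variadic wrapper: no duplicated logic, just va_start/va_end. */
static int obj_add(struct obj *o, const struct obj *parent,
                   const char *fmt, ...)
{
    va_list args;
    int ret;

    va_start(args, fmt);
    ret = obj_add_varg(o, parent, fmt, args);
    va_end(args);
    return ret;
}

/* Usage example: returns 0 if "port3" was set with the right parent. */
static int demo_obj_add(void)
{
    struct obj root = { "root", NULL };
    struct obj child;

    if (obj_add(&child, &root, "port%d", 3) != 0)
        return -1;
    if (child.parent != &root)
        return -1;
    return (child.name[4] == '3' && child.name[5] == '\0') ? 0 : -1;
}
```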

> +/**
> + * kobject_init_and_add - initialize a kobject structure and add it to the kobject hierarchy
> + * @kobj: pointer to the kobject to initialize
> + * @ktype: pointer to the ktype for this kobject.
> + * @parent: pointer to the parent of this kobject.
> + * @fmt: the name of the kobject.
> + *
> + * This function will properly initialize a kobject and then call
> + * kobject_add().
> + *
> + * If the function returns an error, the kobject passed to this function
> + * must be cleaned up by calling kobject_put(), and not by directly
> + * trying to call kfree() on the kobject.
> + *
> + * If this function succeeds, the only way to properly clean up the
> + * kobject is to call kobject_destroy(), which will clean up all of the

kobject_destroy()?  Where did that come from?  Or did you mean 
kobject_del()?

> + * needed sysfs objects, and the kobject itself (by calling back to the
> + * ktype->release() function.)
> + *
> + * Note that the kobject_uevent() call should be called after this
> + * function succeeds, so that userspace can properly know that the
> + * kobject was created.
> + */

Could the comments be made shorter by saying merely that this routine 
combines calls to kobject_init() and kobject_add_ng()?

> +int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
> +  struct kobject *parent, const char *fmt, ...)
> +{
> + va_list args;
> + int retval;
> +
> + kobject_init_ng(kobj, ktype);
> +
> + va_start(args, fmt);
> + retval = kobject_add_varg(kobj, parent, fmt, args);
> + va_end(args);
> +
> + return retval;
> +}
> +EXPORT_SYMBOL(kobject_init_and_add);

Looks okay.

Did you want to add an extra kobject_put() to the end of kobject_del()?  
Or did you want to define a new kobject_destroy() that combines calls 
to kobject_del() and kobject_put()?

Alan Stern
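
The cleanup rules under discussion (never free the object directly, drop
a reference and let the release callback do the freeing) follow the usual
reference-counting pattern. Below is a small user-space model of that
pattern, assuming a single-threaded caller. All names (struct ref_obj,
ref_init, ref_get, ref_put) are invented for the sketch and are not the
kobject functions.

```c
#include <assert.h>

struct ref_obj {
    int refcount;
    void (*release)(struct ref_obj *obj);  /* runs when count hits 0 */
};

static void ref_init(struct ref_obj *obj, void (*release)(struct ref_obj *))
{
    obj->refcount = 1;  /* the creator holds the first reference */
    obj->release = release;
}

static void ref_get(struct ref_obj *obj)
{
    assert(obj->refcount > 0);
    obj->refcount++;
}

/* Returns 1 if this call dropped the last reference and ran release(). */
static int ref_put(struct ref_obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount > 0)
        return 0;
    if (obj->release)
        obj->release(obj);
    return 1;
}

static int released;  /* counts release() invocations in the demo */

static void count_release(struct ref_obj *obj)
{
    (void)obj;
    released++;
}

/* Usage example: two references, two puts, release runs exactly once. */
static int demo_refcount(void)
{
    struct ref_obj o;

    ref_init(&o, count_release);
    ref_get(&o);
    if (ref_put(&o) != 0)  /* one reference still outstanding */
        return -1;
    if (ref_put(&o) != 1)  /* last reference: release() must run */
        return -1;
    return released;
}
```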


Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Arjan van de Ven

On Fri, 30 Nov 2007 22:14:08 -0500
Mark Lord <[EMAIL PROTECTED]> wrote:

> > in -mm there is.. the QoS stuff allows you to set maximum tolerable
> ..
> 
> That's encouraging, I think, but not for 2.6.24.
> 
> > latency. If your app cant take any latency, you should set those...
> > and the side effect is that the kernel will not do long-latency
> > C-states or P-state transitions..
> ..
> 
> I don't mind the cpufreq changing (actually, I want it to drop in
> cpugfreq to save power and keep the fan off), but the C-states just
> kill this app.
> 
> The app is VMware.  I force the max_state=1 when launching,

ah but then its' even easier... and can be done in 2.6.24 already.
VMWare after all has a kernel module, and the latency stuff is in
2.6.23 and 2.6.24 available inside the kernel already.

> 
> I'm not sure about C?? -- it could be C8 or even be C2 or whatever.
> I suppose I should find out, but that really takes a lot of fuss
> (hours) to measure, and isn't strictly repeatable.

(also hope you don't have one of those AMD machines where the bios
turns C1 into C2/C3/etc behind the OSes back ;-)


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Mark Lord


Arjan van de Ven wrote:
> On Fri, 30 Nov 2007 21:52:40 -0500
> Mark Lord <[EMAIL PROTECTED]> wrote:
>
> > Pallipadi, Venkatesh wrote:
> > >
> > > Exporting it as read only should be OK. We also need to know if
> > > there are hard user space dependency on writing to this from
> > > userspace.
> > ..
> >
> > Well, actually..  my scripts have a firm need to write "1" to it,
> > and then later restore the original value.
> >
> > This is needed to *greatly* speed up an otherwise sluggish binary I
> > use,
>
> just curious, but this does sound like the c-state code has a bug...
> independent of the sysfs thing, I think that really needs solving
>
> Can you describe the behavior a little? Or provide information to the
> degree that some of us can figure out how to tweak the algorithm..,.
>
> > as well as whenever I want to semi-accurately benchmark I/O.
> >
> > Is there another way to achieve exactly the same behaviour?
>
> in -mm there is.. the QoS stuff allows you to set maximum tolerable
..

That's encouraging, I think, but not for 2.6.24.

> latency. If your app cant take any latency, you should set those... and
> the side effect is that the kernel will not do long-latency C-states or
> P-state transitions..
..

I don't mind the cpufreq changing (actually, I want it to drop in cpufreq
to save power and keep the fan off), but the C-states just kill this app.

The app is VMware.  I force the max_state=1 when launching,
and restore it to (prior value) 8 when it exits.
Makes a *huge* difference for text input and the like.
Yes, there's something there that could get fixed in the app,
and maybe one day will get fixed.  But I'm not holding my breath.

I think it manages to resonate at exactly the harmonic required that
the chip transitions to C?? just prior to the app wanting to wake up 
and do another poll.  Just kills it.


I'm not sure about C?? -- it could be C8 or even be C2 or whatever.
I suppose I should find out, but that really takes a lot of fuss (hours)
to measure, and isn't strictly repeatable.
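
The "write 1 when launching, restore the old value on exit" pattern
described above can be sketched in C as below. This is only an
illustration of the save/set/restore sequence. The helper names are
invented, the file is assumed to hold a single non-negative integer, and
the path would be whatever exposes max_cstate on the system in question
(assumed here to be something like
/sys/module/processor/parameters/max_cstate). As noted elsewhere in the
thread, the write may have no effect once cpuidle owns the C-state policy.

```c
#include <assert.h>
#include <stdio.h>

/* Read a non-negative integer from a sysfs-style file; -1 on error. */
static int read_int_file(const char *path)
{
    FILE *f;
    int val = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}

/* Write an integer to the file; returns 0 on success, -1 on error. */
static int write_int_file(const char *path, int val)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", val) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Set a temporary value and return the previous one so the caller can
 * restore it later with write_int_file(); returns -1 on error. */
static int swap_int_file(const char *path, int new_val)
{
    int old = read_int_file(path);

    if (old < 0)
        return -1;
    if (write_int_file(path, new_val) != 0)
        return -1;
    return old;
}
```

A launcher would call swap_int_file(path, 1) before starting the
application and write_int_file(path, old) after it exits.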






Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Arjan van de Ven

On Fri, 30 Nov 2007 21:52:40 -0500
Mark Lord <[EMAIL PROTECTED]> wrote:

> Pallipadi, Venkatesh wrote:

> > 
> > Exporting it as read only should be OK. We also need to know if
> > there are hard user space dependency on writing to this from
> > userspace.
> ..
> 
> Well, actually..  my scripts have a firm need to write "1" to it,
> and then later restore the original value.
> 
> This is needed to *greatly* speed up an otherwise sluggish binary I
> use, 

just curious, but this does sound like the c-state code has a bug...
independent of the sysfs thing, I think that really needs solving

Can you describe the behavior a little? Or provide information to the
degree that some of us can figure out how to tweak the algorithm..,.

>as well as whenever I want to semi-accurately benchmark I/O.
> 
> Is there another way to achieve exactly the same behaviour?

in -mm there is.. the QoS stuff allows you to set maximum tolerable
latency. If your app cant take any latency, you should set those... and
the side effect is that the kernel will not do long-latency C-states or
P-state transitions..

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Mark Lord


Pallipadi, Venkatesh wrote:
> > On Fri, 30 Nov 2007 14:06:55 -0800
> > "Pallipadi, Venkatesh" <[EMAIL PROTECTED]> wrote:
> >
> > Please don't go off-list like this.  I put Mark's original
> > mailing list cc's back.
>
> Sorry for missing some cc's earlier. I blindly did a reply-all to the
> mm-commits mail I got.
>
> > > I will have to Nack this. max_cstate was intentionally
> > > removed for a couple of reasons:
> >
> > It broke userspace without any warning or migration period, afaict.
>
> Yes. That's true. I will have to take the blame for that. It has been
> known for a while during cpuidle development. But, it was never
> documented as deprecated.
>
> > > 1) All in kernel users of max_cstate should rather be using
> > > pm_qos/latency interfaces. All such max_cstate usages must already be
> > > migrated.
> >
> > That code isn't merged.
>
> All kernel part is already merged. I mean, there are no drivers that
> depend on max_cstate. They use latency_notifier thing today and their
> migration to pm_qos part is not merged yet.
>
> > > 2) Supporting max_cstate as a dynamic parameter cleanly is no longer
> > > possible in acpi/processor_idle.c as the C-state policy has moved to
> > > cpuidle instead. It can be done if it is needed. But, just below patch
> > > will not really work with cpuidle.
> > >
> > > Selecting max_cstate at boot time as a debug option still works without
> > > this patch.
> > >
> > > So, just this patch will not get back the functionality with cpuidle.
> > > In fact changing it at run time will have no effect. Question however is:
> > > Is there a real need to revive this parameter so that user can change
> > > max_cstate at run time?
> >
> > It is not known whether Mark is actually writing to this thing.  Perhaps
> > read-only permissions would be a suitable fix?
>
> Exporting it as read only should be OK. We also need to know if there
> are hard user space dependency on writing to this from userspace.
..

Well, actually..  my scripts have a firm need to write "1" to it,
and then later restore the original value.

This is needed to *greatly* speed up an otherwise sluggish binary I use,
as well as whenever I want to semi-accurately benchmark I/O.

Is there another way to achieve exactly the same behaviour?

Thanks

Re: Bogus PCI vendor ID

2007-11-30 Thread Kyle McMartin

On Wed, Nov 28, 2007 at 09:41:24PM +0100, Kai Ruhnau wrote:
> Kyle McMartin wrote:
> > On Sun, Nov 25, 2007 at 01:36:19PM +0100, Kai Ruhnau wrote:
> >   
> >> If this is the same like the kernel option 'pci=conf1', that fixes the
> >> vendor IDs.
> >> 
> >
> > Same effect. Ubuntu and many other distros are shipping kernels with
> > MMCONFIG off by default for reasons like this. Check to see if you have
> > an updated BIOS from your motherboard manufacturer?
> >   
> 
> No, nothing available there. (OEM board, running fine under Vista, I
> don't expect to get support there...)
> 

Ah, that's interesting, as Vista is supposed to require working MMCONFIG
support to be certified.

regards,
Kyle

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-11-30 Thread KaiGai Kohei

Serge E. Hallyn wrote:
> The capability bounding set is a set beyond which capabilities
> cannot grow.  Currently cap_bset is per-system.  It can be
> manipulated through sysctl, but only init can add capabilities.
> Root can remove capabilities.  By default it includes all caps
> except CAP_SETPCAP.

Serge,

This feature interests me.
I think you intend to apply it to the primary process
of a security container.
However, it is also worthwhile to apply it when a session is starting up.

The following PAM module drops the capability bounding bits
listed in the fifth field of the user's /etc/passwd entry.
This code is just an example for now, but I think the feature is worth considering.

build and install:
# gcc -Wall -c pam_cap_drop.c
# gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
# cp pam_cap_drop.so /lib/security

modify /etc/passwd as follows:

tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
   ^^
example:
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms

[EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
[EMAIL PROTECTED]'s password:
Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
[EMAIL PROTECTED] ~]$ export LANG=C
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
ping: icmp open socket: Operation not permitted

[EMAIL PROTECTED] ~]$ su
Password:
pam_cap_bset[6921]: user root does not have 'cap_drop=' property
[EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
CapInh: 
CapPrm: dffe
CapEff: dffe
[EMAIL PROTECTED] tak]#

# BTW, I replaced the James's address in the Cc: list,
# because MTA does not accept it.
-- 
KaiGai Kohei <[EMAIL PROTECTED]>


pam_cap_drop.c


/*
 * pam_cap_drop.c module -- drop capabilities bounding set
 *
 * Copyright: 2007 KaiGai Kohei <[EMAIL PROTECTED]>
 */

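/* Header names were stripped from the archived copy; the list below is
 * what the code in this file needs, not necessarily the original set. */
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <sys/prctl.h>
#include <sys/types.h>

#include <security/pam_modules.h>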

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

static char *captable[] = {
"cap_chown",
"cap_dac_override",
"cap_dac_read_search",
"cap_fowner",
"cap_fsetid",
"cap_kill",
"cap_setgid",
"cap_setuid",
"cap_setpcap",
"cap_linux_immutable",
"cap_net_bind_service",
"cap_net_broadcast",
"cap_net_admin",
"cap_net_raw",
"cap_ipc_lock",
"cap_ipc_owner",
"cap_sys_module",
"cap_sys_rawio",
"cap_sys_chroot",
"cap_sys_ptrace",
"cap_sys_pacct",
"cap_sys_admin",
"cap_sys_boot",
"cap_sys_nice",
"cap_sys_resource",
"cap_sys_time",
"cap_sys_tty_config",
"cap_mknod",
"cap_lease",
"cap_audit_write",
"cap_audit_control",
"cap_setfcap",
NULL,
};


PAM_EXTERN int
pam_sm_open_session(pam_handle_t *pamh, int flags,
                    int argc, const char **argv)
{
    struct passwd *pwd;
    char *pos, *buf;
    char *username = NULL;

    /* open system logger */
    openlog("pam_cap_bset", LOG_PERROR | LOG_PID, LOG_AUTHPRIV);

    /* get the unix username */
    if (pam_get_item(pamh, PAM_USER, (void *) &username) != PAM_SUCCESS
        || !username)
        return PAM_USER_UNKNOWN;

    /* get the passwd entry */
    pwd = getpwnam(username);
    if (!pwd)
        return PAM_USER_UNKNOWN;

    /* Is there "cap_drop=" ? */
    pos = strstr(pwd->pw_gecos, "cap_drop=");
    if (pos) {
        buf = strdup(pos + sizeof("cap_drop=") - 1);
        if (!buf)
            return PAM_SESSION_ERR;
        pos = strtok(buf, ",");
        while (pos) {
            int rc, i;

            for (i=0; captable[i]; i++) {
                if (!strcmp(pos, captable[i])) {
                    rc = prctl(PR_CAPBSET_DROP, i);
                    if (rc < 0) {
                        syslog(LOG_NOTICE, "user %s could not drop %s (%s)",
                               username, captable[i], strerror(errno));
                        break;
                    }
                    syslog(LOG_NOTICE, "user %s drops %s\n", username, captable[i]);
                    goto next;
                }
            }
br
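
The name-to-bit lookup done by the module above can be exercised on its
own, without PAM or prctl(). The sketch below is a user-space
illustration only: parse_cap_drop, cap_index and demo_mask are invented
names, the table is cut short at cap_net_raw, and instead of dropping
anything it just collects the capability numbers into a bitmask.

```c
#include <assert.h>
#include <string.h>

/* First 14 capability names, in the same order as captable[] above. */
static const char *const cap_names[] = {
    "cap_chown", "cap_dac_override", "cap_dac_read_search",
    "cap_fowner", "cap_fsetid", "cap_kill", "cap_setgid",
    "cap_setuid", "cap_setpcap", "cap_linux_immutable",
    "cap_net_bind_service", "cap_net_broadcast", "cap_net_admin",
    "cap_net_raw", NULL,
};

/* Index of the capability whose name is name[0..len), or -1. */
static int cap_index(const char *name, size_t len)
{
    int i;

    for (i = 0; cap_names[i]; i++)
        if (strlen(cap_names[i]) == len && !strncmp(name, cap_names[i], len))
            return i;
    return -1;
}

/* Parse "cap_drop=a,b,..." out of a GECOS string and set bit i in *mask
 * for every known name.  Returns the number of unknown names, or -1 if
 * the string has no "cap_drop=" key. */
static int parse_cap_drop(const char *gecos, unsigned long *mask)
{
    static const char key[] = "cap_drop=";
    const char *pos = strstr(gecos, key);
    int unknown = 0;

    assert(mask != NULL);
    *mask = 0;
    if (!pos)
        return -1;
    pos += sizeof(key) - 1;
    while (*pos) {
        size_t len = strcspn(pos, ",");
        int idx = cap_index(pos, len);

        if (idx >= 0)
            *mask |= 1UL << idx;
        else
            unknown++;
        pos += len;
        if (*pos == ',')
            pos++;
    }
    return unknown;
}

/* Usage example: the mask for a GECOS string, or 0 if nothing valid. */
static unsigned long demo_mask(const char *gecos)
{
    unsigned long mask;

    if (parse_cap_drop(gecos, &mask) != 0)
        return 0;
    return mask;
}
```

Unlike the strtok() loop in the module, this version does not modify or
copy the input string.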

Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-11-30 Thread Mingming Cao

On Fri, 2007-11-30 at 17:08 -0600, Eric Sandeen wrote:
> Mingming Cao wrote:
> > [PATCH] jbd2 stats through procfs
> > 
> > The patch below updates the jbd stats patch to 2.6.20/jbd2.
> > The initial patch was posted by Alex Tomas in December 2005
> > (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
> > It provides statistics via procfs such as transaction lifetime and size.
> > 
> > [ This probably should be rewritten to use debugfs?   -- Ted]
> > 
> > Signed-off-by: Johann Lombardi <[EMAIL PROTECTED]>
> 
> I've started going through this one to clean it up to the point where it
> can go forward.  It's been sitting at the top of the unstable portion of
> the patch queue for long enough, I think :)
> 
Thanks!

> I've converted the msecs to jiffies until the user boundary, changed the
> union #defines as suggested by Andrew, and various other little issues etc.
> 
> Remaining to do is a generic time-difference calculator (instead of
> jbd2_time_diff), and looking into whether it should be made a config
> option; I tend to think it should, but it's fairly well sprinkled
> through the code, so I'll see how well that works.
> 
> Also did we ever decide if this should go to debugfs?
> 

I thought it was decided to keep it on procfs as debugfs is not always
on...
> Thanks,
> 
> -Eric
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel Development & Objective-C

2007-11-30 Thread J.A. Magallón

On Sat, 1 Dec 2007 00:31:19 +, Al Viro <[EMAIL PROTECTED]> wrote:

> On Sat, Dec 01, 2007 at 12:19:50AM +0100, J.A. Magallón wrote:
> > A vtable in C++ takes exactly the same space that the function
> > table pointer present in every driver nowadays... and probably
> > the virtual method call that C++ does itself with
> > 
> > thing->do_something(with,this)
> > 
> > like
> > push thing
> > push with
> > push this
> > call THING_vtable+indexof(do_something) // constants at compile time
> 
> This is not what vtables are.  Think for a minute - all codepaths arriving
> to that point in your code will pick the address to call from the same
> location.  Either the contents of that location is constant (in which case
> you could bloody well call it directly in the first place) *or* it has to
> somehow be reassigned back and forth, according to the value of this.  The
> former is dumb, the latter - outright insane.
> 
> The contents of vtables is constant.  The whole point of that thing is
> to deal with the situations where we _can't_ tell which derived class
> this ->do_something() is from; if we could tell which vtable it is at
> compile time, we wouldn't need to bother at all.
> 

Yup, my mistake (that's why I said I will learn something). I was thinking
of non-virtual methods. For virtual ones you have to fetch the vtable
start address and index from it.

> It's a tradeoff - we pay the extra memory access (fetch vtable pointer, then 
> fetch method from vtable) for not having to store a slew of method pointers
> in each instance of base class.  But the extra memory access is very much
> there.  It can be further optimized away if you have several method calls
> for the same object next to each other (then vtable can be picked once),
> but it's still done at runtime.
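
In C terms, the two memory accesses described above look roughly like the
following. The sketch is a hand-rolled vtable with made-up names (struct
shape, struct shape_ops). It shows that the call site first loads the ops
pointer from the object and then loads the function pointer from that
table, both at run time.

```c
#include <assert.h>

struct shape;

/* One constant table per "class", shared by all of its instances. */
struct shape_ops {
    int (*area)(const struct shape *s);
};

struct shape {
    const struct shape_ops *ops;  /* the only per-instance overhead */
    int w, h;
};

static int rect_area(const struct shape *s) { return s->w * s->h; }
static int tri_area(const struct shape *s)  { return s->w * s->h / 2; }

static const struct shape_ops rect_ops = { .area = rect_area };
static const struct shape_ops tri_ops  = { .area = tri_area };

/* The caller cannot know which table it will hit: load s->ops, then
 * load ->area from that table, then make the indirect call. */
static int shape_area(const struct shape *s)
{
    assert(s && s->ops && s->ops->area);
    return s->ops->area(s);
}

/* Usage example: the same call site dispatches to two different methods. */
static int demo_total_area(void)
{
    struct shape shapes[] = {
        { &rect_ops, 4, 3 },
        { &tri_ops,  4, 3 },
    };

    return shape_area(&shapes[0]) + shape_area(&shapes[1]);
}
```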

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam03 (gcc 4.2.2 (4.2.2-1mdv2008.1)) SMP Sat Nov
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Re: [PATCH] Remove rcu_assign_pointer() penalty for NULL pointers

2007-11-30 Thread Herbert Xu

On Fri, Nov 30, 2007 at 04:37:21PM -0800, Paul E. McKenney wrote:
> 
> The rcu_assign_pointer() primitive currently unconditionally executes
> a memory barrier, even when a NULL pointer is being assigned.  This
> has led some to avoid using rcu_assign_pointer() for NULL pointers,
> which loses the self-documenting advantages of rcu_assign_pointer().
> This patch uses __builtin_constant_p() to omit needless memory barriers
> for NULL-pointer assignments at compile time with no runtime penalty,
> as discussed in the following thread:
> 
>   http://www.mail-archive.com/[EMAIL PROTECTED]/msg54852.html
> 
> Tested on x86_64 and ppc64, also compiled the four cases (NULL/non-NULL
> and const/non-const) with gcc version 4.1.2, and hand-checked the
> assembly output.
> 
> Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>

Acked-by: Herbert Xu <[EMAIL PROTECTED]>

Thanks a lot for following through with this Paul!
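
The trick described in the patch can be tried out in user space. In the
sketch below the memory barrier is replaced by a counter so that its
effect is observable. assign_pointer, fake_wmb and demo_assign are
stand-in names, and the macro is modelled on the description above rather
than copied from the patch. It relies on the GCC builtin
__builtin_constant_p().

```c
#include <assert.h>
#include <stddef.h>

static int barrier_count;  /* stand-in for the cost of a real barrier */
#define fake_wmb() (barrier_count++)

/* Skip the barrier only when v is a compile-time constant NULL. */
#define assign_pointer(p, v)                                \
    do {                                                    \
        if (!__builtin_constant_p(v) || ((v) != NULL))      \
            fake_wmb();                                     \
        (p) = (v);                                          \
    } while (0)

static int value = 42;
static int *published;

/* Returns the number of barriers executed by one constant-NULL
 * assignment followed by one assignment of a non-NULL target. */
static int demo_assign(int *target)
{
    barrier_count = 0;
    assign_pointer(published, NULL);    /* constant NULL: no barrier */
    assert(published == NULL);
    assign_pointer(published, target);  /* real pointer: barrier */
    assert(published == target);
    return barrier_count;
}
```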
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: use of fixmap on non-x86/sh?

2007-11-30 Thread Paul Mundt

On Fri, Nov 30, 2007 at 04:14:55PM -0600, Kumar Gala wrote:
> Ben and I are talking about using fixmap on ppc for applications
> similar to its use on x86.  However, in poking around other arches
> (sparc, mips), they appear to have some support but not as complete as
> x86.
> 
> For example both SPARC & MIPS reference __set_fixmap() in asm/fixmap.h  
> but I can't find an implementation on either.
> 
That's probably because people got lazy with copying around the
definitions -- perhaps surprisingly this happens quite frequently in arch
headers ;-)

MIPS has a fixrange_init() which does things in more or less one shot.
__set_fixmap() is a good abstraction if you're interested in poking at
individual fixmaps, but at least the kmap fixmaps have special handling all
over the place (look for kmap_pte in the various highmem implementations),
and there are few fixmaps otherwise.

> So I was wondering if there was some reason fixmap isn't as well
> supported, or if it's just used for a specific function on SPARC,
> MIPS, etc. and they don't need as much functionality out of it as x86
> does.
> 
There are of course things that make this more attractive on x86,
especially with regards to the global bit and preservation across a TLB
flush, there's a note in arch/sh/mm/init.c above __set_fixmap() about
that. fixmap doesn't really have any special behaviour that makes an
architecture implementation problematic at least.

[RFC] kobject: the new functions in use

2007-11-30 Thread Greg KH

Here's an example of a number of different places in the kernel that
have been converted from the older kobject functions to the new
versions.

Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 fs/char_dev.c      |    6 ++----
 kernel/module.c    |   14 ++++++--------
 kernel/params.c    |    6 ++----
 kernel/user.c      |    9 ++++-----
 mm/slub.c          |    9 ++++-----
 net/bridge/br_if.c |   10 +++-------
 6 files changed, 21 insertions(+), 33 deletions(-)

--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -510,9 +510,8 @@ struct cdev *cdev_alloc(void)
 {
struct cdev *p = kzalloc(sizeof(struct cdev), GFP_KERNEL);
if (p) {
-   p->kobj.ktype = &ktype_cdev_dynamic;
INIT_LIST_HEAD(&p->list);
-   kobject_init(&p->kobj);
+   kobject_init_ng(&p->kobj, &ktype_cdev_dynamic);
}
return p;
 }
@@ -529,8 +528,7 @@ void cdev_init(struct cdev *cdev, const 
 {
memset(cdev, 0, sizeof *cdev);
INIT_LIST_HEAD(&cdev->list);
-   cdev->kobj.ktype = &ktype_cdev_default;
-   kobject_init(&cdev->kobj);
+   kobject_init_ng(&cdev->kobj, &ktype_cdev_default);
cdev->ops = fops;
 }
 
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1218,18 +1218,16 @@ int mod_sysfs_init(struct module *mod)
err = -EINVAL;
goto out;
}
-   memset(&mod->mkobj.kobj, 0, sizeof(mod->mkobj.kobj));
-   err = kobject_set_name(&mod->mkobj.kobj, "%s", mod->name);
-   if (err)
-   goto out;
-   mod->mkobj.kobj.kset = module_kset;
-   mod->mkobj.kobj.ktype = &module_ktype;
mod->mkobj.mod = mod;
 
-   kobject_init(&mod->mkobj.kobj);
+   memset(&mod->mkobj.kobj, 0, sizeof(mod->mkobj.kobj));
+   mod->mkobj.kobj.kset = module_kset;
+   err = kobject_init_and_add(&mod->mkobj.kobj, &module_ktype, NULL,
+  "%s", mod->name);
+   if (err)
+   kobject_put(&mod->mkobj.kobj);
 
/* delay uevent until full sysfs population */
-   err = kobject_add(&mod->mkobj.kobj);
 out:
return err;
 }
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -561,11 +561,9 @@ static void __init kernel_param_sysfs_se
 
mk->mod = THIS_MODULE;
mk->kobj.kset = module_kset;
-   mk->kobj.ktype = &module_ktype;
-   kobject_set_name(&mk->kobj, name);
-   kobject_init(&mk->kobj);
-   ret = kobject_add(&mk->kobj);
+   ret = kobject_init_and_add(&mk->kobj, &module_ktype, NULL, "%s", name);
if (ret) {
+   kobject_put(&mk->kobj);
printk(KERN_ERR "Module '%s' failed to be added to sysfs, "
  "error number %d\n", name, ret);
printk(KERN_ERR "The system will be unstable now.\n");
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -181,13 +181,12 @@ static int uids_user_create(struct user_
int error;
 
memset(kobj, 0, sizeof(struct kobject));
-   kobj->ktype = &uids_ktype;
kobj->kset = uids_kset;
-   kobject_init(kobj);
-   kobject_set_name(&up->kobj, "%d", up->uid);
-   error = kobject_add(kobj);
-   if (error)
+   error = kobject_init_and_add(kobj, &uids_ktype, NULL, "%d", up->uid);
+   if (error) {
+   kobject_put(kobj);
goto done;
+   }
 
kobject_uevent(kobj, KOBJ_ADD);
 done:
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4021,13 +4021,12 @@ static int sysfs_slab_add(struct kmem_ca
name = create_unique_id(s);
}
 
-   kobject_set_name(&s->kobj, name);
s->kobj.kset = slab_kset;
-   s->kobj.ktype = &slab_ktype;
-   kobject_init(&s->kobj);
-   err = kobject_add(&s->kobj);
-   if (err)
+   err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, name);
+   if (err) {
+   kobject_put(&s->kobj);
return err;
+   }
 
err = sysfs_create_group(&s->kobj, &slab_attr_group);
if (err)
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -258,12 +258,6 @@ static struct net_bridge_port *new_nbp(s
p->state = BR_STATE_DISABLED;
br_stp_port_timer_init(p);
 
-   kobject_init(&p->kobj);
-   kobject_set_name(&p->kobj, SYSFS_BRIDGE_PORT_ATTR);
-   p->kobj.ktype = &brport_ktype;
-   p->kobj.parent = &(dev->dev.kobj);
-   p->kobj.kset = NULL;
-
return p;
 }
 
@@ -379,7 +373,8 @@ int br_add_if(struct net_bridge *br, str
if (IS_ERR(p))
return PTR_ERR(p);
 
-   err = kobject_add(&p->kobj);
+   err = kobject_init_and_add(&p->kobj, &brport_ktype, &(dev->dev.kobj),
+  SYSFS_BRIDGE_PORT_ATTR);
if (err)
goto err0;
 
@@ -416,6 +411,7 @@ err2:
br_fdb_delete_by_port(br, p, 1);
 err1:
kobject_del(&p->kobj);
+   return err;
 err0:
kobject_put(&p->kobj);
return err;

[RFC] kobject_init changes - take 2

2007-11-30 Thread Greg KH

After Alan pointed out my stupidity, here are some new patches :)

They add three new functions:
kobject_init_ng()
kobject_add_ng()
kobject_init_and_add()

The "_ng" portion will go away after all of the current kernel users of
kobject_init() and kobject_add() are converted over.

There's also a second patch that shows how they are used, and how this
actually saves code in the callers.
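
For anyone who wants to see the caller-side pattern in isolation, here is
a minimal userspace sketch.  Everything in it (mock_kobj, mock_init_and_add,
mock_put, register_thing) is a made-up stand-in and not the kernel API; it
only mirrors the rule the converted callers follow: if the combined
init-and-add step fails, drop the reference with a put rather than freeing
the object directly.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in; NOT the kernel's struct kobject. */
struct mock_kobj {
	int refcount;
	char name[32];
	void (*release)(struct mock_kobj *k);
};

static int released;	/* counts release() calls, for demonstration only */

static void mock_release(struct mock_kobj *k)
{
	(void)k;
	released++;
}

/* Drop one reference; the last put runs the release callback. */
static void mock_put(struct mock_kobj *k)
{
	if (--k->refcount == 0 && k->release)
		k->release(k);
}

/*
 * Hypothetical analogue of an init-and-add helper: the init half always
 * takes the first reference, the add half may fail (simulated here by
 * rejecting an empty name).
 */
static int mock_init_and_add(struct mock_kobj *k,
			     void (*release)(struct mock_kobj *),
			     const char *name)
{
	memset(k, 0, sizeof(*k));
	k->refcount = 1;
	k->release = release;
	if (!name || !*name)
		return -EINVAL;
	snprintf(k->name, sizeof(k->name), "%s", name);
	return 0;
}

/* Caller pattern: on failure, put the object instead of freeing it. */
int register_thing(struct mock_kobj *k, const char *name)
{
	int err = mock_init_and_add(k, mock_release, name);

	if (err)
		mock_put(k);
	return err;
}
```

The point of the put on the error path is that the release callback is the
single place where the object's memory gets cleaned up, whether the add
step succeeded or not.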

Any further objections about these changes?

thanks,

greg k-h

[RFC] kobject: add kobject_init_ng, kobject_add_ng, and kobject_init_and_add functions

2007-11-30 Thread Greg KH

This is what the kobject_init function is going to become.  Add it to
the kernel and then we can convert over the current kobject_init() users
before renaming it.

Also add a kobject_init_and_add function which bundles up what a lot of
the current callers want to do all at once, and it properly handles the
memory usage, unlike kobject_register().

Cc: Kay Sievers <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 include/linux/kobject.h |   11 ++-
 lib/kobject.c   |  152 ++--
 2 files changed, 156 insertions(+), 7 deletions(-)

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -79,9 +79,16 @@ static inline const char * kobject_name(
 }
 
 extern void kobject_init(struct kobject *);
-extern void kobject_cleanup(struct kobject *);
-
+extern void kobject_init_ng(struct kobject *kobj, struct kobj_type *ktype);
 extern int __must_check kobject_add(struct kobject *);
+extern int __must_check kobject_add_ng(struct kobject *kobj,
+  struct kobject *parent,
+  const char *fmt, ...);
+extern int __must_check kobject_init_and_add(struct kobject *kobj,
+struct kobj_type *ktype,
+struct kobject *parent,
+const char *fmt, ...);
+
 extern void kobject_del(struct kobject *);
 
 extern struct kobject * __must_check kobject_create(const char *name,
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -348,6 +348,149 @@ int kobject_set_name(struct kobject *kob
 EXPORT_SYMBOL(kobject_set_name);
 
 /**
+ * kobject_init_ng - initialize a kobject structure
+ * @kobj: pointer to the kobject to initialize
+ * @ktype: pointer to the ktype for this kobject.
+ * @parent: pointer to the parent of this kobject.
+ * @fmt: the name of the kobject.
+ *
+ * This function will properly initialize a kobject such that it can then
+ * be passed to the kobject_add() call.
+ *
+ * If the function returns an error, the memory allocated by the kobject
+ * can be safely freed, no other functions need to be called.
+ */
+void kobject_init_ng(struct kobject *kobj, struct kobj_type *ktype)
+{
+   char *err_str;
+
+   if (!kobj) {
+   err_str = "invalid kobject pointer!";
+   goto error;
+   }
+   if (!ktype) {
+   err_str = "must have a ktype to be initialized properly!";
+   goto error;
+   }
+   if (atomic_read(&kobj->kref.refcount)) {
+   /* do not error out as sometimes we can recover */
+   printk(KERN_ERR "kobject: reference count is already set, "
+  "something is seriously wrong.\n");
+   dump_stack();
+   }
+
+   kref_init(&kobj->kref);
+   INIT_LIST_HEAD(&kobj->entry);
+   kobj->ktype = ktype;
+   return;
+
+error:
+   printk(KERN_ERR "kobject: %s\n", err_str);
+   dump_stack();
+}
+EXPORT_SYMBOL(kobject_init_ng);
+
+static int kobject_add_varg(struct kobject *kobj, struct kobject *parent,
+   const char *fmt, va_list vargs)
+{
+   va_list aq;
+   int retval;
+
+   va_copy(aq, vargs);
+   retval = kobject_set_name_vargs(kobj, fmt, aq);
+   va_end(aq);
+   if (retval) {
+   printk(KERN_ERR "kobject: can not set name properly!\n");
+   return retval;
+   }
+   kobj->parent = parent;
+   return kobject_add(kobj);
+}
+
+/**
+ * kobject_add_ng - the main kobject add function
+ * @kobj: the kobject to add
+ * @parent: pointer to the parent of the kobject.
+ *
+ * The kobject name is set and added to the kobject hierarchy in this
+ * function.
+ *
+ * If @parent is set, then the parent of the @kobj will be set to it.
+ * If @parent is NULL, then the parent of the @kobj will be set to the
+ * kobject associated with the kset assigned to this kobject.  If no kset
+ * is assigned to the kobject, then the kobject will be located in the
+ * root of the sysfs tree.
+ *
+ * If this function returns an error, kobject_put() must be called to
+ * properly clean up the memory associated with the object.
+ *
+ * If the function is successful, the only way to properly clean up the
+ * memory is with a call to kobject_del().
+ *
+ * Under no instance should the kobject that is passed to this function
+ * be directly freed with a call to kfree(), that can leak memory.
+ */
+int kobject_add_ng(struct kobject *kobj, struct kobject *parent,
+  const char *fmt, ...)
+{
+   va_list args;
+   int retval;
+
+   if (!kobj)
+   return -EINVAL;
+
+   va_start(args, fmt);
+   retval = kobject_set_name_vargs(kobj, fmt, args);
+   va_end(args);
+   if (retval) {
+   printk(KERN_ERR "kobject: can not set name properly!\n");
+   return retval;
+   }
+   kobj->parent = par

Re: [RFC] kobject: add kobject_init_ng and kobject_init_and_add functions

2007-11-30 Thread Greg KH

On Fri, Nov 30, 2007 at 06:22:37PM -0500, Alan Stern wrote:
> On Fri, 30 Nov 2007, Greg KH wrote:
> >  And you
> > can't know that, so you have to call kobject_put() in order to be safe
> > and clean up everything.
> > 
> > Now why did we not do the final kobject_put() in kobject_del() as well?
> > Doing two calls, always in order, seems a bit strange, anyone know why
> > it's this way?
> 
> To be symmetrical with kobject_init() and kobject_add().  Besides, 
> isn't there kobject_unregister()?  Presumably it will go away along 
> with kobject_register(), though.

Yes, it will go away too, once everyone gets converted.

> > > You could put that a little less strongly.  After kobject_init() you
> > > SHOULD call kobject_put() to clean up properly, and after kobject_add()
> > > you MUST call kobject_del() and kobject_put().
> >
> > No, in looking at the code, you only need to call kobject_del() to clean
> > everything up properly, if kobject_add() succeeds.  No need to call
> > kobject_put() yet again.
> 
> Sorry, yes, that's what I meant.  After a successful call to 
> kobject_add() you must call kobject_del() to undo the _add, and then
> kobject_put() for the final cleanup.

No, kobject_del() does the put for you[1].  All that is needed is a call
to kobject_del().

I'll post the updated patches in a minute, they look and seem to work
much better.

thanks,

greg k-h

[PATCH] Remove rcu_assign_pointer() penalty for NULL pointers

2007-11-30 Thread Paul E. McKenney

Hello!

The rcu_assign_pointer() primitive currently unconditionally executes
a memory barrier, even when a NULL pointer is being assigned.  This
has led some to avoid using rcu_assign_pointer() for NULL pointers,
which loses the self-documenting advantages of rcu_assign_pointer().
This patch uses __builtin_constant_p() to omit needless memory barriers
for NULL-pointer assignments at compile time with no runtime penalty,
as discussed in the following thread:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg54852.html

Tested on x86_64 and ppc64, also compiled the four cases (NULL/non-NULL
and const/non-const) with gcc version 4.1.2, and hand-checked the
assembly output.
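
As a rough illustration of the idea, here is a userspace sketch of the
same test.  The names (assign_pointer_sketch, counted_wmb, barrier_count
and the two helper functions) are invented for the example, and the
counter plus the __atomic_thread_fence() call merely stand in for
smp_wmb(); this is not the kernel macro itself.  It relies on GNU C
statement expressions, so it needs gcc or a compatible compiler.

```c
#include <stddef.h>

static int barrier_count;	/* instrumentation only, not part of the patch */

/* Stand-in for smp_wmb(): a release fence plus a counter. */
static inline void counted_wmb(void)
{
	__atomic_thread_fence(__ATOMIC_RELEASE);
	barrier_count++;
}

/*
 * Skip the barrier only when the value is a compile-time constant
 * that is also NULL; every other assignment keeps the barrier.
 */
#define assign_pointer_sketch(p, v) \
	({ \
		if (!__builtin_constant_p(v) || ((v) != NULL)) \
			counted_wmb(); \
		(p) = (v); \
	})

static int item = 42;
static int *shared;

/* Clearing the pointer with a literal NULL: no barrier expected. */
int barriers_for_null_store(void)
{
	barrier_count = 0;
	assign_pointer_sketch(shared, NULL);
	return barrier_count;
}

/* Publishing a real object: the barrier must still be executed. */
int barriers_for_object_store(void)
{
	int *obj = &item;

	barrier_count = 0;
	assign_pointer_sketch(shared, obj);
	return barrier_count;
}
```

A variable that merely happens to hold NULL at runtime is not a
compile-time constant, so that case keeps its barrier as well.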

Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
---

 rcupdate.h |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff -urpNa -X dontdiff linux-2.6.24-rc1-ego/include/linux/rcupdate.h linux-2.6.24-rc1-egoxu/include/linux/rcupdate.h
--- linux-2.6.24-rc1-ego/include/linux/rcupdate.h   2007-11-06 15:30:02.0 -0800
+++ linux-2.6.24-rc1-egoxu/include/linux/rcupdate.h 2007-11-30 09:06:11.0 -0800
@@ -191,10 +191,13 @@ static inline void rcu_preempt_boost(voi
  * code.
  */
 
-#define rcu_assign_pointer(p, v)   ({ \
-   smp_wmb(); \
-   (p) = (v); \
-   })
+#define rcu_assign_pointer(p, v) \
+   ({ \
+   if (!__builtin_constant_p(v) || \
+   ((v) != NULL)) \
+   smp_wmb(); \
+   (p) = (v); \
+   })
 
 /**
  * synchronize_sched - block until all CPUs have exited any non-preemptive

Re: [patch] ata: ahci: Enclosure Management via LED rev2

2007-11-30 Thread Kristen Carlson Accardi

Enclosure Management via LED

This patch implements Enclosure Management via the LED protocol as specified
in AHCI specification.

Signed-off-by: Kristen Carlson Accardi <[EMAIL PROTECTED]>
---
This revision makes the change to the comment requested by Mark Lord,
fixes some bugs in the bit shifting for writing the new led state,
and implements a show function so that led status can be read as
well as written.

 drivers/ata/ahci.c|  184 +-
 drivers/ata/libata-scsi.c |5 -
 include/linux/libata.h|2 
 3 files changed, 187 insertions(+), 4 deletions(-)

Index: 2.6-git/drivers/ata/ahci.c
===
--- 2.6-git.orig/drivers/ata/ahci.c 2007-11-30 12:04:12.0 -0800
+++ 2.6-git/drivers/ata/ahci.c  2007-11-30 18:02:19.0 -0800
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define DRV_NAME   "ahci"
@@ -92,6 +93,8 @@ enum {
HOST_IRQ_STAT   = 0x08, /* interrupt status */
HOST_PORTS_IMPL = 0x0c, /* bitmap of implemented ports */
HOST_VERSION= 0x10, /* AHCI spec. version compliancy */
+   HOST_EM_LOC = 0x1c, /* Enclosure Management location */
+   HOST_EM_CTL = 0x20, /* Enclosure Management Control */
 
/* HOST_CTL bits */
HOST_RESET  = (1 << 0),  /* reset controller; self-clear */
@@ -99,6 +102,7 @@ enum {
HOST_AHCI_EN= (1 << 31), /* AHCI enabled */
 
/* HOST_CAP bits */
+   HOST_CAP_EMS= (1 << 6),  /* Enclosure Management support */
HOST_CAP_SSC= (1 << 14), /* Slumber capable */
HOST_CAP_PMP= (1 << 17), /* Port Multiplier support */
HOST_CAP_CLO= (1 << 24), /* Command List Override support */
@@ -193,6 +197,10 @@ enum {
  ATA_FLAG_ACPI_SATA | ATA_FLAG_AN |
  ATA_FLAG_IPM,
AHCI_LFLAG_COMMON   = ATA_LFLAG_SKIP_D2H_BSY,
+
+   /* em_ctl bits */
+   EM_CTL_RST  = (1 << 9), /* Reset */
+   EM_CTL_TM   = (1 << 8), /* Transmit Message */
 };
 
 struct ahci_cmd_hdr {
@@ -216,6 +224,7 @@ struct ahci_host_priv {
u32 port_map;   /* port map to use */
u32 saved_cap;  /* saved initial cap */
u32 saved_port_map; /* saved initial port_map */
+   u32 em_loc; /* enclosure management location */
 };
 
 struct ahci_port_priv {
@@ -231,6 +240,7 @@ struct ahci_port_priv {
unsigned intncq_saw_dmas:1;
unsigned intncq_saw_sdb:1;
u32 intr_mask;  /* interrupts to enable */
+   u16 led_state;  /* saved current led state */
 };
 
 static int ahci_scr_read(struct ata_port *ap, unsigned int sc_reg, u32 *val);
@@ -572,6 +582,11 @@ static struct pci_driver ahci_pci_driver
 #endif
 };
 
+static int ahci_em_messages = 1;
+module_param(ahci_em_messages, int, 0444);
+/* add other LED protocol types when they become supported */
+MODULE_PARM_DESC(ahci_em_messages,
+   "Set AHCI Enclosure Management Message type (0 = disabled, 1 = LED)");
 
 static inline int ahci_nr_ports(u32 cap)
 {
@@ -1079,6 +1094,148 @@ static int ahci_reset_controller(struct 
return 0;
 }
 
+/* LED Enclosure Management routines */
+static int ahci_reset_em(struct ata_host *host)
+{
+   void __iomem *mmio = host->iomap[AHCI_PCI_BAR];
+   u32 em_ctl;
+
+   em_ctl = readl(mmio + HOST_EM_CTL);
+   if ((em_ctl & EM_CTL_TM) || (em_ctl & EM_CTL_RST))
+   return -EINVAL;
+
+   writel(em_ctl | EM_CTL_RST, mmio + HOST_EM_CTL);
+   return 0;
+}
+
+static int ahci_transmit_led_message(struct ata_port *ap, int led_num,
+   int state)
+{
+   struct ahci_host_priv *hpriv = ap->host->private_data;
+   void __iomem *mmio = ap->host->iomap[AHCI_PCI_BAR];
+   struct ahci_port_priv *pp = ap->private_data;
+   u32 em_ctl;
+   u32 message[] = {0, 0};
+   unsigned long flags;
+
+   spin_lock_irqsave(ap->lock, flags);
+
+   /*
+* if we are still busy transmitting a previous message,
+* do not allow
+*/
+   em_ctl = readl(mmio + HOST_EM_CTL);
+   if (em_ctl & EM_CTL_TM) {
+   spin_unlock_irqrestore(ap->lock, flags);
+   return -EINVAL;
+   }
+
+   /*
+* create message header - this is all zero except for
+* the message size, which is 4 bytes.
+*/
+   message[0] |= (4 << 8);
+
+   pp->led_state &= ~(7 << (3*led_num));
+
+   /*
+* create the actual message
+* This does not support Port Multiplier at this time
+* due to lack of hardware for

Re: Kernel Development & Objective-C

2007-11-30 Thread Al Viro

On Sat, Dec 01, 2007 at 12:31:19AM +, Al Viro wrote:
> somehow be reassigned back and forth, according to the value of this.  The
s/this/thing/, of course

Re: [PATCH] Declare PNP option parsing functions as __init

2007-11-30 Thread Rene Herman


On 01-12-07 00:52, Bjorn Helgaas wrote:

> On Friday 30 November 2007 04:37:26 pm Rene Herman wrote:
> > On 30-11-07 18:04, Thomas Renninger wrote:
> > > If I have not overseen something, it should be rather obvious that those
> > > can all be declared __init...
> > > ---
> > >
> > > Declare PNP option parsing functions as __init
> > >
> > > There are three kind of parse functions provided by PNP acpi/bios:
> > >  - get current resources
> > >  - set resources
> > >  - get possible resources
> > > The first two may be needed later at runtime.
> > > The possible resource settings should never change dynamically.
> > > And even if this would make any sense (I doubt it), the current
> > > implementation
> > > only parses possible resource settings at early init time:
> > >   -> declare all the option parsing __init
> > >
> > > Signed-off-by: Thomas Renninger <[EMAIL PROTECTED]>
> >
> > Yes. Obviousness aside,
> >
> > (0) pnpacpi_add_device  is only caller of
> > ...
>
> I agree this is probably safe in the current implementation.
>
> However, I think the current implementation is just broken because
> we can't really handle hotplug of ACPI devices.  Specifically, I think
> the first TBD in acpi_bus_check_device() should be fleshed out so it
> does something like pnpacpi_add_device().
>
> So my dissenting opinion is that this patch would just get reverted
> soon anyway when somebody finishes implementing ACPI hotplug, and
> therefore it's not worth doing.

The PnPBIOS bits should still be fine at least I guess. And, it would seem 
this is rather essential to Thomas' efforts of making this stuff dynamic in 
the first place anyway. In these threads, Alan Cox (added to CC) earlier 
commented that with required locking when things are quite _that_ dynamic a 
different setup than the one currently on the table seems in order.


I don't know, but small steps may make sense given that it seems dynamic 
allocation is fairly wanted with the massive number of PnPACPI resources 
people are reporting since the warning about overrunning them was added.


Rene.


Re: [PATCH] Avoid overflows in kernel/time.c

2007-11-30 Thread Adrian Bunk

On Thu, Nov 29, 2007 at 04:19:51PM -0800, H. Peter Anvin wrote:
> When the conversion factor between jiffies and milli- or microseconds
> is not a single multiply or divide, as for the case of HZ == 300, we
> currently do a multiply followed by a divide.  The intervening
> result, however, is subject to overflows, especially since the
> fraction is not simplified (for HZ == 300, we multiply by 300 and
> divide by 1000).
>...
>  kernel/Makefile |8 +++
>  kernel/time.c   |   29 +---
>  kernel/timeconst.bc |  123 +++
>  3 files changed, 152 insertions(+), 8 deletions(-)
>  create mode 100644 kernel/timeconst.bc
>...
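
To make the overflow described above concrete, here is a small sketch for
HZ == 300 using 32-bit arithmetic.  The function names and SKETCH_HZ are
invented for the example, rounding is ignored (plain truncating division),
and none of this is the code from the patch, which takes a different route
(it adds kernel/timeconst.bc) that is not reproduced here.

```c
#include <stdint.h>

#define SKETCH_HZ 300u	/* the HZ value discussed in the thread */

/* Multiply first, divide second, all in 32 bits: the product can wrap. */
uint32_t msecs_to_ticks_naive(uint32_t ms)
{
	return (ms * SKETCH_HZ) / 1000u;
}

/* Same fraction reduced to 3/10: the product wraps much later. */
uint32_t msecs_to_ticks_reduced(uint32_t ms)
{
	return (ms * 3u) / 10u;
}

/* 64-bit intermediate: the product cannot wrap for any 32-bit input. */
uint32_t msecs_to_ticks_wide(uint32_t ms)
{
	return (uint32_t)(((uint64_t)ms * SKETCH_HZ) / 1000u);
}
```

With ms = 20000000 (about five and a half hours) the 32-bit product
20000000 * 300 = 6000000000 exceeds 2^32 = 4294967296 and wraps to
1705032704, so the naive version returns 1705032 instead of 6000000.  The
reduced and wide versions both return 6000000 for that input.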

I have read the help text, but are the advantages of HZ == 300 really 
visible or was this more theoretical?

In the latter case, we might remove the HZ == 300 choice instead.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: Kernel Development & Objective-C

2007-11-30 Thread Al Viro

On Sat, Dec 01, 2007 at 12:19:50AM +0100, J.A. Magallón wrote:
> An vtable in C++ takes exactly the same space that the function
> table pointer present in every driver nowadays... and probably
> the virtual method call that C++ does itself with
> 
>   thing->do_something(with,this)
> 
> like
>   push thing
>   push with
>   push this
>   call THING_vtable+indexof(do_something) // constants at compile time

This is not what vtables are.  Think for a minute - all codepaths arriving
at that point in your code will pick the address to call from the same
location.  Either the contents of that location is constant (in which case
you could bloody well call it directly in the first place) *or* it has to
somehow be reassigned back and forth, according to the value of this.  The
former is dumb, the latter - outright insane.

The contents of vtables is constant.  The whole point of that thing is
to deal with the situations where we _can't_ tell which derived class
this ->do_something() is from; if we could tell which vtable it is at
compile time, we wouldn't need to bother at all.

It's a tradeoff - we pay the extra memory access (fetch vtable pointer, then 
fetch method from vtable) for not having to store a slew of method pointers
in each instance of base class.  But the extra memory access is very much
there.  It can be further optimized away if you have several method calls
for the same object next to each other (then vtable can be picked once),
but it's still done at runtime.
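
The two fetches can be spelled out in plain C.  This is only a sketch with
made-up names (shape, shape_ops, make_rect, make_tri, shape_area); the
point is that the single call site in shape_area() cannot know at compile
time which table it will be handed.

```c
/* One constant table of methods per "class". */
struct shape;

struct shape_ops {
	int (*area)(const struct shape *s);
};

/* Each instance carries one pointer to its class's table. */
struct shape {
	const struct shape_ops *ops;
	int w, h;
};

static int rect_area(const struct shape *s)
{
	return s->w * s->h;
}

static int tri_area(const struct shape *s)
{
	return s->w * s->h / 2;
}

/* The tables themselves never change after they are built. */
static const struct shape_ops rect_ops = { .area = rect_area };
static const struct shape_ops tri_ops = { .area = tri_area };

struct shape make_rect(int w, int h)
{
	struct shape s = { .ops = &rect_ops, .w = w, .h = h };
	return s;
}

struct shape make_tri(int w, int h)
{
	struct shape s = { .ops = &tri_ops, .w = w, .h = h };
	return s;
}

/*
 * One call site for every derived type: fetch s->ops, fetch ->area
 * from that table, then make the indirect call.
 */
int shape_area(const struct shape *s)
{
	return s->ops->area(s);
}
```

Storing the method pointers directly in every struct shape would save the
first fetch, at the cost of one pointer per method per instance.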

Re: capabilities-introduce-per-process-capability-bounding-set.patch breaks FC6 Avahi

2007-11-30 Thread Jeff Dike

On Fri, Nov 30, 2007 at 11:29:47PM +0100, Jiri Slaby wrote:
> Nope, try this :):
> http://lkml.org/lkml/2007/11/28/390

Excellent, thanks.

I just wanted to make sure that someone knew about this.

Jeff

-- 
Work email - jdike at linux dot intel dot com

Re: Kernel Development & Objective-C

2007-11-30 Thread Arnaldo Carvalho de Melo

Em Fri, Nov 30, 2007 at 11:40:13PM +, Alan Cox escreveu:
> > BCPL was typeless, as was the successor B (between Bell Labs and GE we 
> 
> B isn't quite typeless. It has minimal inbuilt support for concepts like
> strings (although you can of course multiply a string by an array
> pointer ;))
> 
> It also had some elegances that C lost, notably 
> 
>   case 1..5:

Hey, the language we use, gcC has this too 8-)

[EMAIL PROTECTED] net-2.6.25]$ find . -name "*.c" | xargs grep 'case.\+\.\.' | wc -l
400
[EMAIL PROTECTED] net-2.6.25]$ find . -name "*.c" | xargs grep 'case.\+\.\.' | head
./kernel/signal.c:  default: /* this is just in case for now ... */
./kernel/audit.c:   case AUDIT_FIRST_USER_MSG ...  AUDIT_LAST_USER_MSG:
./kernel/audit.c:   case AUDIT_FIRST_USER_MSG2 ...  AUDIT_LAST_USER_MSG2:
./kernel/audit.c:   case AUDIT_FIRST_USER_MSG ...  AUDIT_LAST_USER_MSG:
./kernel/audit.c:   case AUDIT_FIRST_USER_MSG2 ...  AUDIT_LAST_USER_MSG2:
./kernel/timer.c:* well, in that case 2.2.x was broken anyways...
./arch/frv/kernel/traps.c:  case TBR_TT_TRAP2 ... TBR_TT_TRAP126:
./arch/frv/kernel/ptrace.c: case 0 ... PT__END - 1:
./arch/frv/kernel/ptrace.c: case 0 ... PT__END-1:
./arch/frv/kernel/gdb-stub.c:   case GDB_REG_GR(1) ...  GDB_REG_GR(63):
[EMAIL PROTECTED] net-2.6.25]$
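
For reference, a tiny standalone example of the GNU C case-range extension
that the grep hits above are using.  The function name and the ranges are
invented for illustration, and it needs gcc or a compatible compiler since
this is not standard C.

```c
/*
 * GNU C case ranges: "case LOW ... HIGH:" matches every value from
 * LOW to HIGH inclusive.  Keep spaces around the "..." so that it is
 * not lexed as part of a number.
 */
int range_bucket(int value)
{
	switch (value) {
	case 1 ... 5:
		return 1;	/* first range */
	case 6 ... 10:
		return 2;	/* second range */
	default:
		return 0;	/* outside both ranges */
	}
}
```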

- Arnaldo

Re: Out of tree module using LSM

2007-11-30 Thread James Morris

On Fri, 30 Nov 2007, Crispin Cowan wrote:

> > The only case of this so far has been Multiadm, although there seems to be 
> > no reason for it to stay out of tree.
> >   
> Dazuko. It has the same yucky code issues as Talpa, but AFAIK is pure
> GPL2 and thus is clean on the license issues.
>
> That these modules are valid modules that users want to use, are GPL
> clean, and are *not* something LKML wants to upstream because of code
> issues, is precisely why the LSM interface makes sense.

I think the idea is to try and fix code quality issues through 
participation in the upstream process, rather than have upstream maintain 
stable kernel APIs which are naturally mismatched to the unknown 
requirements of out of tree users.

- James
-- 
James Morris 
<[EMAIL PROTECTED]>


Re: [PATCH 4/4] net: Implement the per network namespace sysctl infrastructure

2007-11-30 Thread Serge E. Hallyn

Quoting Eric W. Biederman ([EMAIL PROTECTED]):
> "Serge E. Hallyn" <[EMAIL PROTECTED]> writes:
> 
> >
> > Hey Eric,
> >
> > the patches look nice.
> >
> > The hand-forcing of the passed-in net_ns into a copy of current->nsproxy
> > does make it seem like nsproxy may not be the best choice of what to
> > pass in.  Doesn't only net_sysctl_root->lookup() look at the argument?
> 
> Yes.  Although I call it from __register_sysctl_paths.
> 
> > But I assume you don't want to be more general than sending in a
> > nsproxy so as to dissuade abuse of this interface for needlessly complex
> > sysctl interfaces?
> 
> A bit of that.  I would love to pass in a task_struct so you can use
> anything from a task.  The trouble is I don't have any task_structs or
> nsproxys with the proper value at the point where I am first setting
> this up.  Further I have to have the full sysctl lookup working or I
> could not call sysctl_check.
> 
> > (Well I expect that'll become clear once the the patches using this
> > come out.)
> >
> > Are you planning to use this infrastructure for the uts and ipc
> > sysctls as well?
> 
> Yes.  Where it comes in especially useful, is I can move /proc/sys
> to /proc/sys//task//sys.  And get a particular processes
> view of sysctl.  
> 
> We also get a little more reuse of common functions.
> 
> Otherwise Pavel does have a point that using this for uts and ipc
> is not a savings lines of code wise.
> 
> After having seen Pavel changes I am asking myself if there is a sane
> way to remove the ctl_name argument from the ctl_path.
> 
> Anyway where I am with the nsproxy question was that I don't
> see anything easily better.  What I have works and gets the job
> done, and doesn't have any module unload races or holes where a sloppy
> programmer can mess up the sysctl tree.  We needed a solution.
> Trying any harder to find something better would take ages.  So
> I figured this implementation was good enough.

I agree.  So it's already in -mm but still

Acked-by: Serge Hallyn <[EMAIL PROTECTED]>

thanks,
-serge

Re: [PATCH] Declare PNP option parsing functions as __init

2007-11-30 Thread Bjorn Helgaas

On Friday 30 November 2007 04:37:26 pm Rene Herman wrote:
> On 30-11-07 18:04, Thomas Renninger wrote:
> > If I have not overseen something, it should be rather obvious that those
> > can all be declared __init...
> > ---
> > 
> > Declare PNP option parsing functions as __init
> > 
> > There are three kind of parse functions provided by PNP acpi/bios:
> >  - get current resources
> >  - set resources
> >  - get possible resources
> > The first two may be needed later at runtime.
> > The possible resource settings should never change dynamically.
> > And even if this would make any sense (I doubt it), the current 
> > implementation
> > only parses possible resource settings at early init time:
> >   -> declare all the option parsing __init
> > 
> > Signed-off-by: Thomas Renninger <[EMAIL PROTECTED]>
> 
> Yes. Obviousness aside,
> 
> (0) pnpacpi_add_device  is only caller of
> ...

I agree this is probably safe in the current implementation.

However, I think the current implementation is just broken because
we can't really handle hotplug of ACPI devices.  Specifically, I think
the first TBD in acpi_bus_check_device() should be fleshed out so it
does something like pnpacpi_add_device().

So my dissenting opinion is that this patch would just get reverted
soon anyway when somebody finishes implementing ACPI hotplug, and
therefore it's not worth doing.

Bjorn

Re: Kernel Development & Objective-C

2007-11-30 Thread Nicholas Miell


On Sat, 2007-12-01 at 00:19 +0100, J.A. Magallón wrote:

> An vtable in C++ takes exactly the same space that the function
> table pointer present in every driver nowadays... and probably
> the virtual method call that C++ does itself with
> 
>   thing->do_something(with,this)
> 
> like
>   push thing
>   push with
>   push this
>   call THING_vtable+indexof(do_something) // constants at compile time
> 
> is much more efficient that what gcc can mangle to do with
> 
>   thing->do_something(with,this,thing)
> 
>   push with
>   push this
>   push thing
>   get thing+offsetof(do_something) // not constant at compile time
>   dereference it
>   call it
> 
> (that is, get a generic field on a structure and use it as jump address)
> 
> In short, the kernel is object oriented, implements OO programming by
> hand, but the compiler lacks the knowledge that it is object oriented
> programming so it could do some optimizations.

struct test;
struct testVtbl
{
int (*fn1)(struct test *t, int x, int y);
int (*fn2)(struct test *t, int x, int y);
};
struct test
{
struct testVtbl *vtbl;
int x, y;
};
void testCall(struct test *t, int x, int y)
{
t->vtbl->fn1(t, x, y);
t->vtbl->fn2(t, x, y);
}

and

struct test
{
virtual int fn1(int x, int y);
virtual int fn2(int x, int y);

int x, y;
};

void testCall(struct test *t, int x, int y)
{
t->fn1(x, y);
t->fn2(x, y);
}

generate instruction-for-instruction identical code.

-- 
Nicholas Miell <[EMAIL PROTECTED]>


Re: Possibly SATA related freeze killed networking and RAID

2007-11-30 Thread Tejun Heo

Phillip Susi wrote:
> Tejun Heo wrote:
>> Because SFF ATA controller don't have IRQ pending bit.  You don't know
>> whether IRQ is raised or not.  Plus, accessing the status register which
>> clears pending IRQ can be very slow on PATA machines.  It has to go
>> through the PCI and ATA bus and come back.  So, unconditionally trying
>> to clear IRQ by accessing Status can incur noticeable overhead if the
>> IRQ is shared with devices which raise a lot of IRQs.
> 
> There HAS to be a way to determine if that device generated the
> interrupt, or the interrupt can not be shared.  Since the kernel said
> nobody cared about the interrupt, that indicates that the sata driver
> checked the status register and realized the sata chip didn't generate
> the interrupt, and returned to the kernel letting it know that the
> interrupt was not for it.

Surprise, surprise.  There's no way to tell whether the controller
raised an interrupt or not if no command is in progress.  As I said
before, there's no IRQ pending bit.  While processing commands, you can
tell by looking at other status registers but when there's nothing in
flight and the controller determines it's a good time to raise a
spurious interrupt, there's no way you can tell.  That dang SFF
interface is like 15+ years old.

But we can still make things pretty robust.  We're working on it.

Thanks.

-- 
tejun

Re: [PATCH] ipwireless_cs driver for 4G PC Card

2007-11-30 Thread Michael Robb

>At the moment they seem to be ending up under serial, so I would prefer
>consistency between the USB tty interfaces for 3G cards, the stuff like
>Nozomi and any newer goodies.

That makes sense - I was wondering how the device drivers are going to
be categorised given that the same hardware can be interfaced to the system
either through PCMCIA, USB ports, or hardwired directly onto the motherboard.

- Michael

Re: Out of tree module using LSM

2007-11-30 Thread Crispin Cowan

James Morris wrote:
> On Fri, 30 Nov 2007, Crispin Cowan wrote:
>> restored faces a lot of challenges, but I hope that some kind of
>> solution can be found, because the alternative is to effectively force
>> vendors like Sophos to do it the "dirty" way by fishing in memory for
>> the syscall table.
>> 
> I don't think this is quite correct.
>
> The alternative is to engage with the kernel community to become part of 
> the development process, to ensure that appropriate APIs are implemented, 
> and to get code upstream before shipping it.
>   
That would be part of the "some kind of solution can be found" so I
think we are in agreement here.

> In any case, a patch to revert the dynamic aspect of LSM has been posted 
> by Arjan (and acked by myself) for the case of valid out of tree users.  
> The only case of this so far has been Multiadm, although there seems to be 
> no reason for it to stay out of tree.
>   
Dazuko. It has the same yucky code issues as Talpa, but AFAIK is pure
GPL2 and thus is clean on the license issues.

That these modules are valid modules that users want to use, are GPL
clean, and are *not* something LKML wants to upstream because of code
issues, is precisely why the LSM interface makes sense.

Crispin

-- 
Crispin Cowan, Ph.D.   http://crispincowan.com/~crispin
CEO, Mercenary Linux   http://mercenarylinux.com/
   Itanium. Vista. GPLv3. Complexity at work


Re: Kernel Development & Objective-C

2007-11-30 Thread Alan Cox

> BCPL was typeless, as was the successor B (between Bell Labs and GE we 

B isn't quite typeless. It has minimal inbuilt support for concepts like
strings (although you can of course multiply a string by an array
pointer ;))

It also had some elegances that C lost, notably 

case 1..5:

the ability to do non-zero-biased arrays

x[40];
x-=10;

and the ability to reassign function names.

printk = wombat;

as well as stuff like free(function);
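
For readers who only know C, the following is a rough sketch of how close
standard C gets to those B idioms. All names are hypothetical. Standard C
has no range labels (GCC offers `case 1 ... 5:` as an extension), and
moving a pointer below the start of its array is undefined behaviour, so
the sketch uses a bounds test and an index offset instead.

```c
#include <assert.h>

/* Hypothetical names throughout; this only approximates the B idioms. */

/* B: "case 1..5:". Standard C has no range labels, so test the bounds. */
int in_one_to_five(int v)
{
	return v >= 1 && v <= 5;
}

/*
 * B: "x[40]; x-=10;". Keep the real base and subtract the lower bound
 * on every access instead of rebasing the pointer.
 */
enum { LOW = 10, COUNT = 40 };
int storage[COUNT];

int *at(int i)
{
	assert(i >= LOW && i < LOW + COUNT);
	return &storage[i - LOW];
}

/*
 * B: "printk = wombat;". A C function name cannot be reassigned, but a
 * function pointer can be retargeted.
 */
int default_log(int v) { return v; }
int wombat(int v) { return -v; }
int (*log_fn)(int) = default_log;
```

After `log_fn = wombat;` every call through `log_fn` lands in `wombat()`,
which is about as near as C comes to reassigning a function name.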

Alan (who learned B before C, and is still waiting for P)

[PATCH 27/28] blk_end_request: changing scsi mid-layer for bidi (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts bidi of scsi mid-layer to use blk_end_request().

rq->next_rq represents a pair of bidi requests.
(There is no other use of 'next_rq' in struct request.)
For both requests in the pair, end_that_request_chunk() should be
called before end_that_request_last() is called for one of them.
Since the calls to end_that_request_first()/chunk() and
end_that_request_last() are packaged into blk_end_request(),
the handling of next_rq completion has to be moved into
blk_end_request(), too.

Bidi sets its specific value to rq->data_len before the request is
completed so that upper-layer can read it.
This setting must be between end_that_request_chunk() and
end_that_request_last(), because rq->data_len may be used
in end_that_request_chunk() by blk_trace and so on.
To satisfy the requirement, use blk_end_request_callback() which
is added in PATCH 25 only for the tricky drivers.

If bidi didn't reuse rq->data_len and instead added new members to struct
request for the specific value, it could set them before
end_that_request_chunk() and use the standard blk_end_request() like below.

void scsi_end_bidi_request(struct scsi_cmnd *cmd)
{
        struct request *req = cmd->request;

        /* 'resid' is the hypothetical new member described above */
        req->resid = scsi_out(cmd)->resid;
        req->next_rq->resid = scsi_in(cmd)->resid;

        if (blk_end_request(req, 1, req->data_len))
                BUG();

        scsi_release_buffers(cmd);
        scsi_next_command(cmd);
}
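
The ordering constraint described above (data completion may still read
data_len, so the residual can only be stored afterwards) can be sketched
with a small userspace mock. This is not kernel code: the struct and every
mock_* function are hypothetical stand-ins, and the residual value is
arbitrary.

```c
#include <assert.h>

/* Userspace mock; the struct and functions are hypothetical stand-ins. */
struct mock_request {
	unsigned int data_len;     /* transfer length, later reused for resid */
	unsigned int traced_len;   /* what the tracing step observed */
	int completed;
};

/* Stand-in for the data-completion step, which may read data_len. */
void mock_complete_data(struct mock_request *rq)
{
	rq->traced_len = rq->data_len;
}

/* Stand-in for the final completion step. */
void mock_complete_request(struct mock_request *rq)
{
	rq->completed = 1;
}

/* Data first, then the driver hook, then the request itself. */
void mock_end_request_callback(struct mock_request *rq,
			       void (*cb)(struct mock_request *))
{
	assert(!rq->completed);   /* a request is completed only once */
	mock_complete_data(rq);
	if (cb)
		cb(rq);
	mock_complete_request(rq);
}

/* Plays the role of scsi_end_bidi_request_cb(): publish the residual. */
void mock_set_resid(struct mock_request *rq)
{
	rq->data_len = 512;   /* arbitrary residual value for the demo */
}
```

Because the hook runs between the two steps, the tracing stand-in still
sees the original length while the upper layer later reads the residual.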

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c   |   18 +
 drivers/scsi/scsi_lib.c |   66 
 2 files changed, 52 insertions(+), 32 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c
===
--- 2.6.24-rc3-mm2.orig/drivers/scsi/scsi_lib.c
+++ 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c
@@ -629,28 +629,6 @@ void scsi_run_host_queues(struct Scsi_Ho
scsi_run_queue(sdev->request_queue);
 }
 
-static void scsi_finalize_request(struct scsi_cmnd *cmd, int uptodate)
-{
-   struct request_queue *q = cmd->device->request_queue;
-   struct request *req = cmd->request;
-   unsigned long flags;
-
-   add_disk_randomness(req->rq_disk);
-
-   spin_lock_irqsave(q->queue_lock, flags);
-   if (blk_rq_tagged(req))
-   blk_queue_end_tag(q, req);
-
-   end_that_request_last(req, uptodate);
-   spin_unlock_irqrestore(q->queue_lock, flags);
-
-   /*
-* This will goose the queue request function at the end, so we don't
-* need to worry about launching another command.
-*/
-   scsi_next_command(cmd);
-}
-
 /*
  * Function:scsi_end_request()
  *
@@ -921,6 +899,20 @@ void scsi_release_buffers(struct scsi_cm
 EXPORT_SYMBOL(scsi_release_buffers);
 
 /*
+ * Called from blk_end_request_callback() after all DATA in rq and its next_rq
+ * are completed before rq is completed/freed.
+ */
+static int scsi_end_bidi_request_cb(struct request *rq)
+{
+   struct scsi_cmnd *cmd = rq->special;
+
+   rq->data_len = scsi_out(cmd)->resid;
+   rq->next_rq->data_len = scsi_in(cmd)->resid;
+
+   return 0;
+}
+
+/*
  * Bidi commands Must be complete as a whole, both sides at once.
  * If part of the bytes were written and lld returned
  * scsi_in()->resid and/or scsi_out()->resid this information will be left
@@ -931,22 +923,32 @@ void scsi_end_bidi_request(struct scsi_c
 {
struct request *req = cmd->request;
 
-   end_that_request_chunk(req, 1, req->data_len);
-   req->data_len = scsi_out(cmd)->resid;
-
-   end_that_request_chunk(req->next_rq, 1, req->next_rq->data_len);
-   req->next_rq->data_len = scsi_in(cmd)->resid;
-
-   scsi_release_buffers(cmd);
-
/*
 *FIXME: If ll_rw_blk.c is changed to also put_request(req->next_rq)
-*   in end_that_request_last() then this WARN_ON must be removed.
+*   in blk_end_request() then this WARN_ON must be removed.
 *   for now, upper-driver must have registered an end_io.
 */
WARN_ON(!req->end_io);
 
-   scsi_finalize_request(cmd, 1);
+   /*
+* blk_end_request() family take care of data completion of next_rq.
+*
+* req->data_len and req->next_rq->data_len must be set after
+* all data are completed, since they may be referenced during
+* the data completion process.
+* So use the callback feature of blk_end_request() here.
+*
+* NOTE: If bidi doesn't reuse the data_len field for upper-layer's
+*   reference (e.g. adds new members for it to struct request),
+*   we can use the standard blk_end_request() interface here.
+*/
+   if (blk_end_request_callback(req, 1, req->data_len,
+scsi_end_bidi_request_cb))
+   /* req has not been completed */
+   BUG();
+
+

Re: [PATCH] Declare PNP option parsing functions as __init

2007-11-30 Thread Rene Herman


On 30-11-07 18:04, Thomas Renninger wrote:

> If I have not overseen something, it should be rather obvious that those
> can all be declared __init...
> ---
>
> Declare PNP option parsing functions as __init
>
> There are three kind of parse functions provided by PNP acpi/bios:
>  - get current resources
>  - set resources
>  - get possible resources
> The first two may be needed later at runtime.
> The possible resource settings should never change dynamically.
> And even if this would make any sense (I doubt it), the current implementation
> only parses possible resource settings at early init time:
>   -> declare all the option parsing __init
>
> Signed-off-by: Thomas Renninger <[EMAIL PROTECTED]>

Yes. Obviousness aside,

(0) pnpacpi_add_device                          is only caller of
(1)   pnpacpi_parse_resource_option_data        is only caller of
(2)     pnpacpi_option_resource                 is only caller of
(3)       pnpacpi_parse_irq_option
(3)       pnpacpi_parse_dma_option
(3)       pnpacpi_parse_port_option
(3)       pnpacpi_parse_fixed_port_option
(3)       pnpacpi_parse_mem24_option
(3)       pnpacpi_parse_mem32_option
(3)       pnpacpi_parse_fixed_mem32_option
(3)       pnpacpi_parse_address_option
(3)       pnpacpi_parse_ext_irq_option

and

(0) build_devlist                               is only caller of
(1)   insert_device                             is only caller of
(2)     pnpbios_parse_data_stream               is only caller of
(3)       pnpbios_parse_resource_option_data    is only caller of
(4)         pnpbios_parse_mem_option
(4)         pnpbios_parse_mem32_option
(4)         pnpbios_parse_fixed_mem32_option
(4)         pnpbios_parse_irq_option
(4)         pnpbios_parse_dma_option
(4)         pnpbios_parse_port_option
(4)         pnpbios_parse_fixed_port_option

which given that both (0)s are __init already, means all are fine indeed.
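
The argument above is a walk up a single-caller chain: a function may be
__init if it is the __init root, or if its only caller may be __init.
A minimal userspace sketch of that check follows. The checker and its
struct are hypothetical, and the edge table copies only part of the
pnpacpi chain listed above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical checker: an edge means "caller is the only caller of callee". */
struct edge {
	const char *caller;
	const char *callee;
};

/* Subset of the pnpacpi chain from the listing. */
const struct edge pnpacpi_edges[] = {
	{ "pnpacpi_add_device", "pnpacpi_parse_resource_option_data" },
	{ "pnpacpi_parse_resource_option_data", "pnpacpi_option_resource" },
	{ "pnpacpi_option_resource", "pnpacpi_parse_irq_option" },
	{ "pnpacpi_option_resource", "pnpacpi_parse_dma_option" },
};

/*
 * Returns 1 if fn may be marked __init: either it is the __init root, or
 * its sole caller may be marked __init. Assumes the edges contain no cycle.
 */
int may_be_init(const struct edge *edges, int n, const char *root,
		const char *fn)
{
	int i;

	assert(edges && root && fn);

	if (strcmp(fn, root) == 0)
		return 1;
	for (i = 0; i < n; i++)
		if (strcmp(edges[i].callee, fn) == 0)
			return may_be_init(edges, n, root, edges[i].caller);
	return 0;   /* caller unknown: nothing can be concluded */
}
```

A function that does not appear as a callee in the table is rejected, which
mirrors the need to confirm "only caller" for every level by hand.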

Acked-By: Rene Herman <[EMAIL PROTECTED]>

Rene.


[PATCH 26/28] blk_end_request: changing ide-cd (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts ide-cd (cdrom_newpc_intr()) to use blk_end_request().

ide-cd (cdrom_newpc_intr()) has two tricky behaviors, listed below, which
require blk_end_request_callback().
It needs to:
  1. call post_transform_command() to modify request contents
  2. delay completing the request until DRQ_STAT is cleared
after end_that_request_first() and before end_that_request_last().

As for the second one, ide-cd waits for the interrupt from the device.
So blk_end_request_callback() has to return without completing the request
even if nothing is left over in the request.
ide-cd uses a dummy callback function, which just returns '1',
to tell blk_end_request_callback() to do that.

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/ide/ide-cd.c |   78 +++
 1 files changed, 61 insertions(+), 17 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/ide/ide-cd.c
===
--- 2.6.24-rc3-mm2.orig/drivers/ide/ide-cd.c
+++ 2.6.24-rc3-mm2/drivers/ide/ide-cd.c
@@ -1669,6 +1669,37 @@ static void post_transform_command(struc
}
 }
 
+/*
+ * Called from blk_end_request_callback() after the data of the request
+ * is completed and before the request is completed.
+ */
+static int cdrom_newpc_intr_dma_cb(struct request *rq)
+{
+   ide_drive_t *drive = rq->q->queuedata;
+   spinlock_t *ide_lock = rq->q->queue_lock;
+   unsigned long flags = 0UL;
+
+   rq->data_len = 0;
+   post_transform_command(rq);
+
+   spin_lock_irqsave(ide_lock, flags);
+   HWGROUP(drive)->rq = NULL;
+   spin_unlock_irqrestore(ide_lock, flags);
+
+   return 0;
+}
+
+/*
+ * Called from blk_end_request_callback() after the data of the request
+ * is completed and before the request is completed.
+ * By returning value '1', blk_end_request_callback() returns immediately
+ * without completing the request.
+ */
+static int cdrom_newpc_intr_dummy_cb(struct request *rq)
+{
+   return 1;
+}
+
 typedef void (xfer_func_t)(ide_drive_t *, void *, u32);
 
 /*
@@ -1707,9 +1738,16 @@ static ide_startstop_t cdrom_newpc_intr(
return ide_error(drive, "dma error", stat);
}
 
-   end_that_request_chunk(rq, 1, rq->data_len);
-   rq->data_len = 0;
-   goto end_request;
+   /*
+* post_transform_command() needs to be called after
+* the data of the request is completed, since it may
+* modify the data area of the request.
+* So use the callback special feature of blk_end_request().
+*/
+   if (blk_end_request_callback(rq, 1, rq->data_len,
+cdrom_newpc_intr_dma_cb))
+   BUG();
+   return ide_stopped;
}
 
/*
@@ -1727,8 +1765,18 @@ static ide_startstop_t cdrom_newpc_intr(
/*
 * If DRQ is clear, the command has completed.
 */
-   if ((stat & DRQ_STAT) == 0)
-   goto end_request;
+   if ((stat & DRQ_STAT) == 0) {
+   if (!rq->data_len)
+   post_transform_command(rq);
+
+   spin_lock_irqsave(&ide_lock, flags);
+   if (__blk_end_request(rq, 1, 0))
+   BUG();
+   HWGROUP(drive)->rq = NULL;
+   spin_unlock_irqrestore(&ide_lock, flags);
+
+   return ide_stopped;
+   }
 
/*
 * check which way to transfer data
@@ -1781,7 +1829,14 @@ static ide_startstop_t cdrom_newpc_intr(
rq->data_len -= blen;
 
if (rq->bio)
-   end_that_request_chunk(rq, 1, blen);
+   /*
+* The request can't be completed until DRQ is cleared.
+* So complete the data, but don't complete the request
+* using the dummy function for the callback feature
+* of blk_end_request().
+*/
+   blk_end_request_callback(rq, 1, blen,
+cdrom_newpc_intr_dummy_cb);
else
rq->data += blen;
}
@@ -1802,17 +1857,6 @@ static ide_startstop_t cdrom_newpc_intr(
 
ide_set_handler(drive, cdrom_newpc_intr, rq->timeout, NULL);
return ide_started;
-
-end_request:
-   if (!rq->data_len)
-   post_transform_command(rq);
-
-   spin_lock_irqsave(&ide_lock, flags);
-   blkdev_dequeue_request(rq);
-   end_that_request_last(rq, 1);
-   HWGROUP(drive)->rq = NULL;
-   spin_unlock_irqrestore(&ide_lock, flags);
-   return ide_stopped;
 }
 
 static ide_startstop_t cdrom_write_intr(ide_drive_t *drive)

[PATCH 28/28] blk_end_request: remove/unexport end_that_request_* (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch removes the following functions:
  o end_that_request_first()
  o end_that_request_chunk()
and stops exporting the function below:
  o end_that_request_last()
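
A practical note for callers of the removed sector-based wrapper: it
shifted its count by 9 before calling the byte-based helper, while the
blk_end_request() family takes bytes directly. A minimal sketch of the
conversion follows; the helper name and the enum are hypothetical and are
not part of this series.

```c
#include <assert.h>

/* Hypothetical helper: 512-byte sectors, matching the "<< 9" in the old wrapper. */
enum { SECTOR_SHIFT_MODEL = 9 };

unsigned int sectors_to_bytes(unsigned int nr_sectors)
{
	/* Refuse counts whose byte value would not fit in an unsigned int. */
	assert(nr_sectors <= (~0u >> SECTOR_SHIFT_MODEL));
	return nr_sectors << SECTOR_SHIFT_MODEL;
}
```

A caller that used to pass a sector count would pass the converted byte
count instead.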

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |   61 -
 include/linux/blkdev.h |   15 
 2 files changed, 21 insertions(+), 55 deletions(-)

Index: 2.6.24-rc3-mm2/block/ll_rw_blk.c
===
--- 2.6.24-rc3-mm2.orig/block/ll_rw_blk.c
+++ 2.6.24-rc3-mm2/block/ll_rw_blk.c
@@ -3415,6 +3415,20 @@ static void blk_recalc_rq_sectors(struct
}
 }
 
+/**
+ * __end_that_request_first - end I/O on a request
+ * @req:  the request being processed
+ * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
+ * @nr_bytes: number of bytes to complete
+ *
+ * Description:
+ * Ends I/O on a number of bytes attached to @req, and sets it up
+ * for the next range of segments (if any) in the cluster.
+ *
+ * Return:
+ * 0 - we are done with this request, call end_that_request_last()
+ * 1 - still buffers pending for this request
+ **/
 static int __end_that_request_first(struct request *req, int uptodate,
int nr_bytes)
 {
@@ -3531,49 +3545,6 @@ static int __end_that_request_first(stru
return 1;
 }
 
-/**
- * end_that_request_first - end I/O on a request
- * @req:  the request being processed
- * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
- * @nr_sectors: number of sectors to end I/O on
- *
- * Description:
- * Ends I/O on a number of sectors attached to @req, and sets it up
- * for the next range of segments (if any) in the cluster.
- *
- * Return:
- * 0 - we are done with this request, call end_that_request_last()
- * 1 - still buffers pending for this request
- **/
-int end_that_request_first(struct request *req, int uptodate, int nr_sectors)
-{
-   return __end_that_request_first(req, uptodate, nr_sectors << 9);
-}
-
-EXPORT_SYMBOL(end_that_request_first);
-
-/**
- * end_that_request_chunk - end I/O on a request
- * @req:  the request being processed
- * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
- * @nr_bytes: number of bytes to complete
- *
- * Description:
- * Ends I/O on a number of bytes attached to @req, and sets it up
- * for the next range of segments (if any). Like end_that_request_first(),
- * but deals with bytes instead of sectors.
- *
- * Return:
- * 0 - we are done with this request, call end_that_request_last()
- * 1 - still buffers pending for this request
- **/
-int end_that_request_chunk(struct request *req, int uptodate, int nr_bytes)
-{
-   return __end_that_request_first(req, uptodate, nr_bytes);
-}
-
-EXPORT_SYMBOL(end_that_request_chunk);
-
 /*
  * splice the completion data to a local structure and hand off to
  * process_completion_queue() to complete the requests
@@ -3653,7 +3624,7 @@ EXPORT_SYMBOL(blk_complete_request);
 /*
  * queue lock must be held
  */
-void end_that_request_last(struct request *req, int uptodate)
+static void end_that_request_last(struct request *req, int uptodate)
 {
struct gendisk *disk = req->rq_disk;
int error;
@@ -3688,8 +3659,6 @@ void end_that_request_last(struct reques
__blk_put_request(req->q, req);
 }
 
-EXPORT_SYMBOL(end_that_request_last);
-
 static inline void __end_request(struct request *rq, int uptodate,
 unsigned int nr_bytes)
 {
Index: 2.6.24-rc3-mm2/include/linux/blkdev.h
===
--- 2.6.24-rc3-mm2.orig/include/linux/blkdev.h
+++ 2.6.24-rc3-mm2/include/linux/blkdev.h
@@ -717,19 +717,16 @@ static inline void blk_run_address_space
 }
 
 /*
- * end_request() and friends. Must be called with the request queue spinlock
- * acquired. All functions called within end_request() _must_be_ atomic.
+ * blk_end_request() and friends.
+ * __blk_end_request() and end_request() must be called with
+ * the request queue spinlock acquired.
  *
  * Several drivers define their own end_request and call
- * end_that_request_first() and end_that_request_last()
- * for parts of the original function. This prevents
- * code duplication in drivers.
+ * blk_end_request() for parts of the original function.
+ * This prevents code duplication in drivers.
  */
 extern int blk_end_request(struct request *rq, int uptodate, int nr_bytes);
 extern int __blk_end_request(struct request *rq, int uptodate, int nr_bytes);
-extern int end_that_request_first(struct request *, int, int);
-extern int end_that_request_chunk(struct request *, int, int);
-extern void end_that_request_last(struct request *, int);
 extern void end_request(struct request *, int);
 extern void end_queued_request(struct request *, int);
 extern void end_dequeued_

[PATCH 25/28] blk_end_request: add callback feature (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch adds a variant of the interface, blk_end_request_callback(),
which has driver callback feature.

There are 2 drivers which need to do special work between
end_that_request_first() and end_that_request_last():
ide-cd and scsi bidi.

For such drivers, blk_end_request_callback() allows them to pass
a callback function which is called between end_that_request_first()
and end_that_request_last().

This interface should/will be removed after the drivers remove
such tricky behaviors.

o ide-cd (cdrom_newpc_intr())
  cdrom_newpc_intr() needs to:
    1. call post_transform_command() to modify request contents
    2. delay completing the request until DRQ_STAT is cleared
  after end_that_request_first() and before end_that_request_last().

  As for the second one, ide-cd waits for the interrupt from the device.
  So end_that_request_first() and end_that_request_last() are called
  separately in ide-cd.
  This means blk_end_request_callback() has to return without
  completing the request even if nothing is left over in the request.
  To satisfy the requirement, the callback function has a return value
  so that drivers can tell blk_end_request_callback() to return
  without completing the request.

o scsi mid-layer for bidi request (scsi_end_bidi_request())
  Bidi sets its specific value to rq->data_len before the request is
  completed so that upper-layer can read it.
  This setting must be between end_that_request_chunk() and
  end_that_request_last(), because rq->data_len may be used
  in end_that_request_chunk() by blk_trace and so on.
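
The return-value contract described above can be illustrated with a
userspace mock of the control flow. This is a sketch only: the struct and
the mock_*, defer_cb and proceed_cb names are hypothetical, and locking,
tracing and disk randomness are left out.

```c
#include <assert.h>

/* Userspace mock of the control flow; all names are hypothetical. */
struct mock_request {
	unsigned int bytes_left;
	int completed;
};

/* Returns 1 while data remains, 0 once all bytes are accounted for. */
int mock_end_data(struct mock_request *rq, unsigned int nr_bytes)
{
	rq->bytes_left -= (nr_bytes < rq->bytes_left) ? nr_bytes : rq->bytes_left;
	return rq->bytes_left != 0;
}

int mock_end_request_callback(struct mock_request *rq, unsigned int nr_bytes,
			      int (*drv_callback)(struct mock_request *))
{
	assert(!rq->completed);   /* a request is completed only once */

	if (mock_end_data(rq, nr_bytes))
		return 1;         /* leftover data in the request */

	if (drv_callback && drv_callback(rq))
		return 1;         /* driver asked to defer completion */

	rq->completed = 1;
	return 0;
}

/* Always defers, in the spirit of a dummy callback returning 1. */
int defer_cb(struct mock_request *rq) { (void)rq; return 1; }

/* Lets completion proceed. */
int proceed_cb(struct mock_request *rq) { (void)rq; return 0; }
```

A driver that has to wait for a later interrupt would pass the deferring
callback while data is moving, then finish the request with a zero-byte
call once the device is ready.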

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |   50 +
 include/linux/blkdev.h |3 ++
 2 files changed, 53 insertions(+)

Index: 2.6.24-rc3-mm2/block/ll_rw_blk.c
===
--- 2.6.24-rc3-mm2.orig/block/ll_rw_blk.c
+++ 2.6.24-rc3-mm2/block/ll_rw_blk.c
@@ -3850,6 +3850,56 @@ int __blk_end_request(struct request *rq
 }
 EXPORT_SYMBOL_GPL(__blk_end_request);
 
+/**
+ * blk_end_request_callback - Special helper function for tricky drivers
+ * @rq:   the request being processed
+ * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
+ * @nr_bytes: number of bytes to complete
+ * @drv_callback: function called between completion of bios in the request
+ *and completion of the request.
+ *If the callback returns non 0, this helper returns without
+ *completion of the request.
+ *
+ * Description:
+ * Ends I/O on a number of bytes attached to @rq.
+ * If @rq has leftover, sets it up for the next range of segments.
+ *
+ * This special helper function is used only for existing tricky drivers.
+ * (e.g. cdrom_newpc_intr() of ide-cd)
+ * This interface will be removed when such drivers are rewritten.
+ * Don't use this interface in other places anymore.
+ *
+ * Return:
+ * 0 - we are done with this request
+ * 1 - this request is not freed yet.
+ * this request still has pending buffers or
+ * the driver doesn't want to finish this request yet.
+ **/
+int blk_end_request_callback(struct request *rq, int uptodate, int nr_bytes,
+int (drv_callback)(struct request *))
+{
+   struct request_queue *q = rq->q;
+   unsigned long flags = 0UL;
+
+   if (blk_fs_request(rq) || blk_pc_request(rq)) {
+   if (__end_that_request_first(rq, uptodate, nr_bytes))
+   return 1;
+   }
+
+   /* Special feature for tricky drivers */
+   if (drv_callback && drv_callback(rq))
+   return 1;
+
+   add_disk_randomness(rq->rq_disk);
+
+   spin_lock_irqsave(q->queue_lock, flags);
+   complete_request(rq, uptodate);
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(blk_end_request_callback);
+
 static void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio)
 {
Index: 2.6.24-rc3-mm2/include/linux/blkdev.h
===
--- 2.6.24-rc3-mm2.orig/include/linux/blkdev.h
+++ 2.6.24-rc3-mm2/include/linux/blkdev.h
@@ -733,6 +733,9 @@ extern void end_that_request_last(struct
 extern void end_request(struct request *, int);
 extern void end_queued_request(struct request *, int);
 extern void end_dequeued_request(struct request *, int);
+extern int blk_end_request_callback(struct request *rq, int uptodate,
+   int nr_bytes,
+   int (drv_callback)(struct request *));
 extern void blk_complete_request(struct request *);
 
 /*

[PATCH 23/28] blk_end_request: changing cpqarray (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts cpqarray to use blk_end_request().

cpqarray is a little bit different from "normal" drivers.
cpqarray directly calls bio_endio() and disk_stat_add()
when completing request.  But those can be replaced with
__end_that_request_first().
After the replacement, request completion procedures of
those drivers become like the following:
o end_that_request_first()
o add_disk_randomness()
o end_that_request_last()
This can be converted to blk_end_request() by following
the rule (b) mentioned in the patch subject
"[PATCH 01/28] blk_end_request: add new request completion interface".

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/cpqarray.c |   27 ++-
 1 files changed, 2 insertions(+), 25 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/cpqarray.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/cpqarray.c
+++ 2.6.24-rc3-mm2/drivers/block/cpqarray.c
@@ -167,7 +167,6 @@ static void start_io(ctlr_info_t *h);
 
 static inline void addQ(cmdlist_t **Qptr, cmdlist_t *c);
 static inline cmdlist_t *removeQ(cmdlist_t **Qptr, cmdlist_t *c);
-static inline void complete_buffers(struct bio *bio, int ok);
 static inline void complete_command(cmdlist_t *cmd, int timeout);
 
 static irqreturn_t do_ida_intr(int irq, void *dev_id);
@@ -980,19 +979,6 @@ static void start_io(ctlr_info_t *h)
}
 }
 
-static inline void complete_buffers(struct bio *bio, int ok)
-{
-   struct bio *xbh;
-
-   while (bio) {
-   xbh = bio->bi_next;
-   bio->bi_next = NULL;
-   
-   bio_endio(bio, ok ? 0 : -EIO);
-
-   bio = xbh;
-   }
-}
 /*
  * Mark all buffers that cmd was responsible for
  */
@@ -1030,18 +1016,9 @@ static inline void complete_command(cmdl
 pci_unmap_page(hba[cmd->ctlr]->pci_dev, cmd->req.sg[i].addr,
cmd->req.sg[i].size, ddir);
 
-   complete_buffers(rq->bio, ok);
-
-   if (blk_fs_request(rq)) {
-   const int rw = rq_data_dir(rq);
-
-   disk_stat_add(rq->rq_disk, sectors[rw], rq->nr_sectors);
-   }
-
-   add_disk_randomness(rq->rq_disk);
-
DBGPX(printk("Done with %p\n", rq););
-   end_that_request_last(rq, ok ? 1 : -EIO);
+   if (__blk_end_request(rq, ok, blk_rq_bytes(rq)))
+   BUG();
 }
 
 /*

[PATCH 22/28] blk_end_request: changing cciss (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts cciss to use blk_end_request().

cciss is a little bit different from "normal" drivers.
cciss directly calls bio_endio() and disk_stat_add()
when completing request.  But those can be replaced with
__end_that_request_first().
After the replacement, request completion procedures of
those drivers become like the following:
o end_that_request_first()
o add_disk_randomness()
o end_that_request_last()
This can be converted to blk_end_request() by following
the rule (a) mentioned in the patch subject
"[PATCH 01/28] blk_end_request: add new request completion interface".

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/cciss.c |   25 +++--
 1 files changed, 3 insertions(+), 22 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/cciss.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/cciss.c
+++ 2.6.24-rc3-mm2/drivers/block/cciss.c
@@ -1187,17 +1187,6 @@ static int cciss_ioctl(struct inode *ino
}
 }
 
-static inline void complete_buffers(struct bio *bio, int status)
-{
-   while (bio) {
-   struct bio *xbh = bio->bi_next;
-
-   bio->bi_next = NULL;
-   bio_endio(bio, status ? 0 : -EIO);
-   bio = xbh;
-   }
-}
-
 static void cciss_check_queues(ctlr_info_t *h)
 {
int start_queue = h->next_to_run;
@@ -1263,21 +1252,14 @@ static void cciss_softirq_done(struct re
pci_unmap_page(h->pdev, temp64.val, cmd->SG[i].Len, ddir);
}
 
-   complete_buffers(rq->bio, (rq->errors == 0));
-
-   if (blk_fs_request(rq)) {
-   const int rw = rq_data_dir(rq);
-
-   disk_stat_add(rq->rq_disk, sectors[rw], rq->nr_sectors);
-   }
-
 #ifdef CCISS_DEBUG
printk("Done with %p\n", rq);
 #endif /* CCISS_DEBUG */
 
-   add_disk_randomness(rq->rq_disk);
+   if (blk_end_request(rq, (rq->errors == 0), blk_rq_bytes(rq)))
+   BUG();
+
spin_lock_irqsave(&h->lock, flags);
-   end_that_request_last(rq, (rq->errors == 0));
cmd_free(h, cmd, 1);
cciss_check_queues(h);
spin_unlock_irqrestore(&h->lock, flags);
@@ -2544,7 +2526,6 @@ after_error_processing:
}
cmd->rq->data_len = 0;
cmd->rq->completion_data = cmd;
-   blk_add_trace_rq(cmd->rq->q, cmd->rq, BLK_TA_COMPLETE);
blk_complete_request(cmd->rq);
 }
 

[PATCH 24/28] blk_end_request: changing ide normal caller (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts "normal" parts of ide to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/ide/ide-cd.c |6 +++---
 drivers/ide/ide-io.c |   17 ++---
 2 files changed, 9 insertions(+), 14 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/ide/ide-cd.c
===
--- 2.6.24-rc3-mm2.orig/drivers/ide/ide-cd.c
+++ 2.6.24-rc3-mm2/drivers/ide/ide-cd.c
@@ -655,9 +655,9 @@ static void cdrom_end_request (ide_drive
BUG();
} else {
spin_lock_irqsave(&ide_lock, flags);
-   end_that_request_chunk(failed, 0,
-   failed->data_len);
-   end_that_request_last(failed, 0);
+   if (__blk_end_request(failed, 0,
+ failed->data_len))
+   BUG();
spin_unlock_irqrestore(&ide_lock, flags);
}
} else
Index: 2.6.24-rc3-mm2/drivers/ide/ide-io.c
===
--- 2.6.24-rc3-mm2.orig/drivers/ide/ide-io.c
+++ 2.6.24-rc3-mm2/drivers/ide/ide-io.c
@@ -78,14 +78,9 @@ static int __ide_end_request(ide_drive_t
ide_dma_on(drive);
}
 
-   if (!end_that_request_chunk(rq, uptodate, nr_bytes)) {
-   add_disk_randomness(rq->rq_disk);
-   if (dequeue) {
-   if (!list_empty(&rq->queuelist))
-   blkdev_dequeue_request(rq);
+   if (!__blk_end_request(rq, uptodate, nr_bytes)) {
+   if (dequeue)
HWGROUP(drive)->rq = NULL;
-   }
-   end_that_request_last(rq, uptodate);
ret = 0;
}
 
@@ -290,9 +285,9 @@ static void ide_complete_pm_request (ide
drive->blocked = 0;
blk_start_queue(drive->queue);
}
-   blkdev_dequeue_request(rq);
HWGROUP(drive)->rq = NULL;
-   end_that_request_last(rq, 1);
+   if (__blk_end_request(rq, 1, 0))
+   BUG();
spin_unlock_irqrestore(&ide_lock, flags);
 }
 
@@ -402,10 +397,10 @@ void ide_end_drive_cmd (ide_drive_t *dri
}
 
spin_lock_irqsave(&ide_lock, flags);
-   blkdev_dequeue_request(rq);
HWGROUP(drive)->rq = NULL;
rq->errors = err;
-   end_that_request_last(rq, !rq->errors);
+   if (__blk_end_request(rq, !rq->errors, 0))
+   BUG();
spin_unlock_irqrestore(&ide_lock, flags);
 }
 

[PATCH 21/28] blk_end_request: changing xsysace (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts xsysace to use blk_end_request().

xsysace is a little bit different from "normal" drivers.
xsysace driver has a state machine in it.
It calls end_that_request_first() and end_that_request_last()
from different states. (ACE_FSM_STATE_REQ_TRANSFER and
ACE_FSM_STATE_REQ_COMPLETE, respectively.)

However, those states are consecutive and without any interruption
in between.
So we can just follow the standard conversion rule (b) mentioned in
the patch subject "[PATCH 01/28] blk_end_request: add new request
completion interface".

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/xsysace.c |5 +
 1 files changed, 1 insertion(+), 4 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/xsysace.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/xsysace.c
+++ 2.6.24-rc3-mm2/drivers/block/xsysace.c
@@ -703,7 +703,7 @@ static void ace_fsm_dostate(struct ace_d
 
/* bio finished; is there another one? */
i = ace->req->current_nr_sectors;
-   if (end_that_request_first(ace->req, 1, i)) {
+   if (__blk_end_request(ace->req, 1, i)) {
/* dev_dbg(ace->dev, "next block; h=%li c=%i\n",
 *  ace->req->hard_nr_sectors,
 *  ace->req->current_nr_sectors);
@@ -718,9 +718,6 @@ static void ace_fsm_dostate(struct ace_d
break;
 
case ACE_FSM_STATE_REQ_COMPLETE:
-   /* Complete the block request */
-   blkdev_dequeue_request(ace->req);
-   end_that_request_last(ace->req, 1);
ace->req = NULL;
 
/* Finished request; go to idle state */

[PATCH 18/28] blk_end_request: changing s390 (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts s390 to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/s390/block/dasd.c  |4 +---
 drivers/s390/char/tape_block.c |3 +--
 2 files changed, 2 insertions(+), 5 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/s390/block/dasd.c
===
--- 2.6.24-rc3-mm2.orig/drivers/s390/block/dasd.c
+++ 2.6.24-rc3-mm2/drivers/s390/block/dasd.c
@@ -1080,10 +1080,8 @@ dasd_int_handler(struct ccw_device *cdev
 static inline void
 dasd_end_request(struct request *req, int uptodate)
 {
-   if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
+   if (__blk_end_request(req, uptodate, blk_rq_bytes(req)))
BUG();
-   add_disk_randomness(req->rq_disk);
-   end_that_request_last(req, uptodate);
 }
 
 /*
Index: 2.6.24-rc3-mm2/drivers/s390/char/tape_block.c
===
--- 2.6.24-rc3-mm2.orig/drivers/s390/char/tape_block.c
+++ 2.6.24-rc3-mm2/drivers/s390/char/tape_block.c
@@ -76,9 +76,8 @@ tapeblock_trigger_requeue(struct tape_de
 static void
 tapeblock_end_request(struct request *req, int uptodate)
 {
-   if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
+   if (__blk_end_request(req, uptodate, blk_rq_bytes(req)))
BUG();
-   end_that_request_last(req, uptodate);
 }
 
 static void

[PATCH 20/28] blk_end_request: changing ide-scsi (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts ide-scsi to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/scsi/ide-scsi.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/scsi/ide-scsi.c
===
--- 2.6.24-rc3-mm2.orig/drivers/scsi/ide-scsi.c
+++ 2.6.24-rc3-mm2/drivers/scsi/ide-scsi.c
@@ -918,8 +918,8 @@ static int idescsi_eh_reset (struct scsi
}
 
/* kill current request */
-   blkdev_dequeue_request(req);
-   end_that_request_last(req, 0);
+   if (__blk_end_request(req, 0, 0))
+   BUG();
if (blk_sense_request(req))
kfree(scsi->pc->buffer);
kfree(scsi->pc);
@@ -928,8 +928,8 @@ static int idescsi_eh_reset (struct scsi
 
/* now nuke the drive queue */
while ((req = elv_next_request(drive->queue))) {
-   blkdev_dequeue_request(req);
-   end_that_request_last(req, 0);
+   if (__blk_end_request(req, 0, 0))
+   BUG();
}
 
HWGROUP(drive)->rq = NULL;

Re: [lm-sensors] [PATCH 1/1] HWMON: coretemp, suspend fix

2007-11-30 Thread Rafael J. Wysocki

On Saturday, 1 of December 2007, Rafael J. Wysocki wrote:
> On Friday, 30 of November 2007, Jiri Slaby wrote:
> > On 11/30/2007 11:15 PM, Jean Delvare wrote:
> > > Hi Jiri,
> > 
[--snip--]
> > > 
> > > Should this change go to the stable tree(s) as well?
> > 
> > Sorry, I have no idea. Rafael?
> 
> Well, actually, having looked once again at the patch, I think that it's
> slightly wrong.  Namely, it looks like we just should drop all of the _FROZEN
> actions from there.
> 
> Fixed patch follows and I think it's also a candidate for -stable.

Crap, I forgot to add the sign-off, so here it goes again:

---
Subject: HWMON: coretemp, suspend fix
 
It's not permitted to unregister a device after devices have been suspended.
It causes deadlocks to appear on systems with coretemp hwmon loaded.  To avoid
this, we can make coretemp_cpu_callback() do nothing if the _FROZEN bit is set
in action.
 
Also, in other cases it's generally too late to unregister the coretemp device
if the CPU is already dead, so it should be unregistered on CPU_DOWN_PREPARE.
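
As a rough illustration of the dispatch described above, here is a user-space
sketch in C.  It is not the driver code: the DEMO_* constants and
demo_cpu_callback() are hypothetical names, the numeric values are placeholders
chosen for this sketch, and the add/remove calls are replaced by return codes
so the behaviour can be checked outside the kernel.

```c
/* Outcome of the modeled callback (hypothetical, for illustration only). */
enum demo_result { DEMO_IGNORED, DEMO_ADDED, DEMO_REMOVED };

/* Placeholder action codes; assume the kernel's own definitions differ. */
#define DEMO_CPU_ONLINE        0x0002UL
#define DEMO_CPU_DOWN_PREPARE  0x0005UL
#define DEMO_CPU_DOWN_FAILED   0x0006UL
#define DEMO_CPU_DEAD          0x0007UL
#define DEMO_CPU_TASKS_FROZEN  0x0010UL  /* the "_FROZEN" bit */

/*
 * Mirrors the shape of the patched switch: only the non-frozen
 * ONLINE / DOWN_FAILED / DOWN_PREPARE actions are handled.
 */
static enum demo_result demo_cpu_callback(unsigned long action)
{
	switch (action) {
	case DEMO_CPU_ONLINE:
	case DEMO_CPU_DOWN_FAILED:
		return DEMO_ADDED;      /* stands in for coretemp_device_add() */
	case DEMO_CPU_DOWN_PREPARE:
		return DEMO_REMOVED;    /* stands in for coretemp_device_remove() */
	default:
		return DEMO_IGNORED;    /* _FROZEN variants and CPU_DEAD land here */
	}
}
```

Because the frozen variants have no case label, an action such as
DEMO_CPU_ONLINE | DEMO_CPU_TASKS_FROZEN falls through to the default branch
and is ignored, which is the behaviour the description asks for.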
 
Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> (frozen fix)
Cc: Mark M. Hoffman <[EMAIL PROTECTED]>
Cc: Jiri Slaby <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/hwmon/coretemp.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/hwmon/coretemp.c
===
--- linux-2.6.orig/drivers/hwmon/coretemp.c
+++ linux-2.6/drivers/hwmon/coretemp.c
@@ -337,11 +337,10 @@ static int coretemp_cpu_callback(struct 
 
switch (action) {
case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
+   case CPU_DOWN_FAILED:
coretemp_device_add(cpu);
break;
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
+   case CPU_DOWN_PREPARE:
coretemp_device_remove(cpu);
break;
}

[PATCH 19/28] blk_end_request: changing scsi (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts scsi mid-layer to use blk_end_request().

The comment above scsi_next_command() is not related to this change.
It had originally been there before scsi_next_command() was included
in scsi_finalize_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/scsi/scsi_lib.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c
===
--- 2.6.24-rc3-mm2.orig/drivers/scsi/scsi_lib.c
+++ 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c
@@ -683,7 +683,7 @@ static struct scsi_cmnd *scsi_end_reques
 * If there are blocks left over at the end, set up the command
 * to queue the remainder of them.
 */
-   if (end_that_request_chunk(req, uptodate, bytes)) {
+   if (blk_end_request(req, uptodate, bytes)) {
int leftover = (req->hard_nr_sectors << 9);
 
if (blk_pc_request(req))
@@ -691,7 +691,7 @@ static struct scsi_cmnd *scsi_end_reques
 
/* kill remainder if no retrys */
if (!uptodate && blk_noretry_request(req))
-   end_that_request_chunk(req, 0, leftover);
+   blk_end_request(req, 0, leftover);
else {
if (requeue) {
/*
@@ -706,7 +706,11 @@ static struct scsi_cmnd *scsi_end_reques
}
}
 
-   scsi_finalize_request(cmd, uptodate);
+   /*
+* This will goose the queue request function at the end, so we don't
+* need to worry about launching another command.
+*/
+   scsi_next_command(cmd);
return NULL;
 }
 

[PATCH 16/28] blk_end_request: changing i2o_block (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts i2o_block to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/message/i2o/i2o_block.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/message/i2o/i2o_block.c
===
--- 2.6.24-rc3-mm2.orig/drivers/message/i2o/i2o_block.c
+++ 2.6.24-rc3-mm2/drivers/message/i2o/i2o_block.c
@@ -426,22 +426,18 @@ static void i2o_block_end_request(struct
struct request_queue *q = req->q;
unsigned long flags;
 
-   if (end_that_request_chunk(req, uptodate, nr_bytes)) {
+   if (blk_end_request(req, uptodate, nr_bytes)) {
int leftover = (req->hard_nr_sectors << KERNEL_SECTOR_SHIFT);
 
if (blk_pc_request(req))
leftover = req->data_len;
 
if (end_io_error(uptodate))
-   end_that_request_chunk(req, 0, leftover);
+   blk_end_request(req, 0, leftover);
}
 
-   add_disk_randomness(req->rq_disk);
-
spin_lock_irqsave(q->queue_lock, flags);
 
-   end_that_request_last(req, uptodate);
-
if (likely(dev)) {
dev->open_queue_depth--;
list_del(&ireq->queue);

[PATCH 17/28] blk_end_request: changing mmc (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts mmc to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/mmc/card/block.c |   24 +---
 drivers/mmc/card/queue.c |4 ++--
 2 files changed, 7 insertions(+), 21 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/mmc/card/block.c
===
--- 2.6.24-rc3-mm2.orig/drivers/mmc/card/block.c
+++ 2.6.24-rc3-mm2/drivers/mmc/card/block.c
@@ -348,15 +348,7 @@ static int mmc_blk_issue_rq(struct mmc_q
 * A block was successfully transferred.
 */
spin_lock_irq(&md->lock);
-   ret = end_that_request_chunk(req, 1, brq.data.bytes_xfered);
-   if (!ret) {
-   /*
-* The whole request completed successfully.
-*/
-   add_disk_randomness(req->rq_disk);
-   blkdev_dequeue_request(req);
-   end_that_request_last(req, 1);
-   }
+   ret = __blk_end_request(req, 1, brq.data.bytes_xfered);
spin_unlock_irq(&md->lock);
} while (ret);
 
@@ -386,27 +378,21 @@ static int mmc_blk_issue_rq(struct mmc_q
else
bytes = blocks << 9;
spin_lock_irq(&md->lock);
-   ret = end_that_request_chunk(req, 1, bytes);
+   ret = __blk_end_request(req, 1, bytes);
spin_unlock_irq(&md->lock);
}
} else if (rq_data_dir(req) != READ &&
   (card->host->caps & MMC_CAP_MULTIWRITE)) {
spin_lock_irq(&md->lock);
-   ret = end_that_request_chunk(req, 1, brq.data.bytes_xfered);
+   ret = __blk_end_request(req, 1, brq.data.bytes_xfered);
spin_unlock_irq(&md->lock);
}
 
mmc_release_host(card->host);
 
spin_lock_irq(&md->lock);
-   while (ret) {
-   ret = end_that_request_chunk(req, 0,
-   req->current_nr_sectors << 9);
-   }
-
-   add_disk_randomness(req->rq_disk);
-   blkdev_dequeue_request(req);
-   end_that_request_last(req, 0);
+   while (ret)
+   ret = __blk_end_request(req, 0, blk_rq_cur_bytes(req));
spin_unlock_irq(&md->lock);
 
return 0;
Index: 2.6.24-rc3-mm2/drivers/mmc/card/queue.c
===
--- 2.6.24-rc3-mm2.orig/drivers/mmc/card/queue.c
+++ 2.6.24-rc3-mm2/drivers/mmc/card/queue.c
@@ -94,8 +94,8 @@ static void mmc_request(struct request_q
printk(KERN_ERR "MMC: killing requests for dead queue\n");
while ((req = elv_next_request(q)) != NULL) {
do {
-   ret = end_that_request_chunk(req, 0,
-   req->current_nr_sectors << 9);
+   ret = __blk_end_request(req, 0,
+   blk_rq_cur_bytes(req));
} while (ret);
}
return;

[PATCH 14/28] blk_end_request: changing xen-blkfront (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts xen-blkfront to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/xen-blkfront.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/xen-blkfront.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/xen-blkfront.c
+++ 2.6.24-rc3-mm2/drivers/block/xen-blkfront.c
@@ -494,10 +494,9 @@ static irqreturn_t blkif_interrupt(int i
dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
"request: %x\n", bret->status);
 
-   ret = end_that_request_first(req, uptodate,
-   req->hard_nr_sectors);
+   ret = __blk_end_request(req, uptodate,
+   blk_rq_bytes(req));
BUG_ON(ret);
-   end_that_request_last(req, uptodate);
break;
default:
BUG();

[PATCH 15/28] blk_end_request: changing viocd (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts viocd to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/cdrom/viocd.c |5 +
 1 files changed, 1 insertion(+), 4 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/cdrom/viocd.c
===
--- 2.6.24-rc3-mm2.orig/drivers/cdrom/viocd.c
+++ 2.6.24-rc3-mm2/drivers/cdrom/viocd.c
@@ -302,11 +302,8 @@ static void viocd_end_request(struct req
if (!nsectors)
nsectors = 1;
 
-   if (end_that_request_first(req, uptodate, nsectors))
+   if (__blk_end_request(req, uptodate, nsectors << 9))
BUG();
-   add_disk_randomness(req->rq_disk);
-   blkdev_dequeue_request(req);
-   end_that_request_last(req, uptodate);
 }
 
 static int rwreq;

[PATCH 13/28] blk_end_request: changing viodasd (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts viodasd to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/viodasd.c |5 +
 1 files changed, 1 insertion(+), 4 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/viodasd.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/viodasd.c
+++ 2.6.24-rc3-mm2/drivers/block/viodasd.c
@@ -232,10 +232,7 @@ static struct block_device_operations vi
 static void viodasd_end_request(struct request *req, int uptodate,
int num_sectors)
 {
-   if (end_that_request_first(req, uptodate, num_sectors))
-   return;
-   add_disk_randomness(req->rq_disk);
-   end_that_request_last(req, uptodate);
+   __blk_end_request(req, uptodate, num_sectors << 9);
 }
 
 /*

[PATCH 11/28] blk_end_request: changing sx8 (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts sx8 to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/sx8.c |4 +---
 1 files changed, 1 insertion(+), 3 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/sx8.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/sx8.c
+++ 2.6.24-rc3-mm2/drivers/block/sx8.c
@@ -749,11 +749,9 @@ static inline void carm_end_request_queu
struct request *req = crq->rq;
int rc;
 
-   rc = end_that_request_first(req, uptodate, req->hard_nr_sectors);
+   rc = __blk_end_request(req, uptodate, blk_rq_bytes(req));
assert(rc == 0);
 
-   end_that_request_last(req, uptodate);
-
rc = carm_put_request(host, crq);
assert(rc == 0);
 }

[PATCH 12/28] blk_end_request: changing ub (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts ub to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/ub.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/ub.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/ub.c
+++ 2.6.24-rc3-mm2/drivers/block/ub.c
@@ -816,8 +816,8 @@ static void ub_end_rq(struct request *rq
uptodate = 0;
rq->errors = scsi_status;
}
-   end_that_request_first(rq, uptodate, rq->hard_nr_sectors);
-   end_that_request_last(rq, uptodate);
+   if (__blk_end_request(rq, uptodate, blk_rq_bytes(rq)))
+   BUG();
 }
 
 static int ub_rw_cmd_retry(struct ub_dev *sc, struct ub_lun *lun,

[PATCH 10/28] blk_end_request: changing sunvdc (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts sunvdc to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/sunvdc.c |5 +
 1 files changed, 1 insertion(+), 4 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/sunvdc.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/sunvdc.c
+++ 2.6.24-rc3-mm2/drivers/block/sunvdc.c
@@ -214,10 +214,7 @@ static void vdc_end_special(struct vdc_p
 
 static void vdc_end_request(struct request *req, int uptodate, int num_sectors)
 {
-   if (end_that_request_first(req, uptodate, num_sectors))
-   return;
-   add_disk_randomness(req->rq_disk);
-   end_that_request_last(req, uptodate);
+   __blk_end_request(req, uptodate, num_sectors << 9);
 }
 
 static void vdc_end_one(struct vdc_port *port, struct vio_dring_state *dr,

[PATCH 09/28] blk_end_request: changing ps3disk (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts ps3disk to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/ps3disk.c |6 +-
 1 files changed, 1 insertion(+), 5 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/ps3disk.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/ps3disk.c
+++ 2.6.24-rc3-mm2/drivers/block/ps3disk.c
@@ -280,11 +280,7 @@ static irqreturn_t ps3disk_interrupt(int
}
 
spin_lock(&priv->lock);
-   if (!end_that_request_first(req, uptodate, num_sectors)) {
-   add_disk_randomness(req->rq_disk);
-   blkdev_dequeue_request(req);
-   end_that_request_last(req, uptodate);
-   }
+   __blk_end_request(req, uptodate, num_sectors << 9);
priv->req = NULL;
ps3disk_do_request(dev, priv->queue);
spin_unlock(&priv->lock);

[PATCH 08/28] blk_end_request: changing nbd (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts nbd to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/nbd.c |4 +---
 1 files changed, 1 insertion(+), 3 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/nbd.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/nbd.c
+++ 2.6.24-rc3-mm2/drivers/block/nbd.c
@@ -108,9 +108,7 @@ static void nbd_end_request(struct reque
req, uptodate? "done": "failed");
 
spin_lock_irqsave(q->queue_lock, flags);
-   if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
-   end_that_request_last(req, uptodate);
-   }
+   __blk_end_request(req, uptodate, req->nr_sectors << 9);
spin_unlock_irqrestore(q->queue_lock, flags);
 }
 

[PATCH 07/28] blk_end_request: changing floppy (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts floppy to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/floppy.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/floppy.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/floppy.c
+++ 2.6.24-rc3-mm2/drivers/block/floppy.c
@@ -2290,18 +2290,16 @@ static int do_format(int drive, struct f
 static void floppy_end_request(struct request *req, int uptodate)
 {
unsigned int nr_sectors = current_count_sectors;
+   unsigned int drive = (unsigned int)req->rq_disk->private_data;
 
/* current_count_sectors can be zero if transfer failed */
if (!uptodate)
nr_sectors = req->current_nr_sectors;
-   if (end_that_request_first(req, uptodate, nr_sectors))
+   if (__blk_end_request(req, uptodate, nr_sectors << 9))
return;
-   add_disk_randomness(req->rq_disk);
-   floppy_off((long)req->rq_disk->private_data);
-   blkdev_dequeue_request(req);
-   end_that_request_last(req, uptodate);
 
/* We're done with the request */
+   floppy_off(drive);
current_req = NULL;
 }
 

Re: [lm-sensors] [PATCH 1/1] HWMON: coretemp, suspend fix

2007-11-30 Thread Rafael J. Wysocki

On Friday, 30 of November 2007, Jiri Slaby wrote:
> On 11/30/2007 11:15 PM, Jean Delvare wrote:
> > Hi Jiri,
> 
> Hi.
> 
> > On Fri, 30 Nov 2007 15:12:46 +0100, Jiri Slaby wrote:
> >> Ok, I don't see it merged in the latest -mm (mmotm). Could you, Mark,
> >> Rafael, sign off this version of the patch (Mark's + Rafael's fix)?
> >>
> >> --
> >>
> >> From: Mark M. Hoffman <[EMAIL PROTECTED]>
> >>
> >> coretemp, suspend fix
> >>
> >> It's not permitted to unregister a device/CPU while frozen and going to sleep.
> >> It causes a deadlock on systems where the coretemp hwmon driver is loaded.
> >> Do it only in non-frozen states instead.
> >>
> >> Cc: Rafael J. Wysocki <[EMAIL PROTECTED]> (frozen fix)
> >> Cc: Mark M. Hoffman <[EMAIL PROTECTED]>
> >> Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>
> >>
> >> ---
> >> commit 4f0e19b172ed18fb29e8006c4470fd37aa245a7a
> >> tree bec1cc4f7a499efe94c5f9d2d208db325914f28e
> >> parent 877dcc2ef6c7c17a64155cf201886c49622250e9
> >> author Jiri Slaby <[EMAIL PROTECTED]> Tue, 27 Nov 2007 20:19:47 +0100
> >> committer Jiri Slaby <[EMAIL PROTECTED]> Thu, 29 Nov 2007 23:41:11 +0100
> >>
> >>  drivers/hwmon/coretemp.c |6 --
> >>  1 files changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> >> index 5c82ec7..ce7457d 100644
> >> --- a/drivers/hwmon/coretemp.c
> >> +++ b/drivers/hwmon/coretemp.c
> >> @@ -338,11 +338,13 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
> >>switch (action) {
> >>case CPU_ONLINE:
> >>case CPU_ONLINE_FROZEN:
> >> +  case CPU_DOWN_FAILED:
> >>coretemp_device_add(cpu);
> >> +  case CPU_DOWN_FAILED_FROZEN:
> >>break;
> >> -  case CPU_DEAD:
> >> -  case CPU_DEAD_FROZEN:
> >> +  case CPU_DOWN_PREPARE:
> >>coretemp_device_remove(cpu);
> >> +  case CPU_DOWN_PREPARE_FROZEN:
> >>break;
> >>}
> >>return NOTIFY_OK;
> > 
> > Should this change go to the stable tree(s) as well?
> 
> Sorry, I have no idea. Rafael?

Well, actually, having looked once again at the patch, I think that it's
slightly wrong.  Namely, it looks like we just should drop all of the _FROZEN
actions from there.

Fixed patch follows and I think it's also a candidate for -stable.

---
Subject: HWMON: coretemp, suspend fix

It's not permitted to unregister a device after devices have been suspended.
It causes deadlocks to appear on systems with coretemp hwmon loaded.  To avoid
this, we can make coretemp_cpu_callback() do nothing if the _FROZEN bit is set
in action.

Also, in other cases it's generally too late to unregister the coretemp device
if the CPU is already dead, so it should be unregistered on CPU_DOWN_PREPARE.

Cc: Rafael J. Wysocki <[EMAIL PROTECTED]> (frozen fix)
Cc: Mark M. Hoffman <[EMAIL PROTECTED]>
Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/hwmon/coretemp.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/hwmon/coretemp.c
===
--- linux-2.6.orig/drivers/hwmon/coretemp.c
+++ linux-2.6/drivers/hwmon/coretemp.c
@@ -337,11 +337,10 @@ static int coretemp_cpu_callback(struct 
 
switch (action) {
case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
+   case CPU_DOWN_FAILED:
coretemp_device_add(cpu);
break;
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
+   case CPU_DOWN_PREPARE:
coretemp_device_remove(cpu);
break;
}

[PATCH 06/28] blk_end_request: changing DAC960 (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts DAC960 to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/block/DAC960.c |5 +
 1 files changed, 1 insertion(+), 4 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/block/DAC960.c
===
--- 2.6.24-rc3-mm2.orig/drivers/block/DAC960.c
+++ 2.6.24-rc3-mm2/drivers/block/DAC960.c
@@ -3464,10 +3464,7 @@ static inline bool DAC960_ProcessComplet
pci_unmap_sg(Command->Controller->PCIDevice, Command->cmd_sglist,
Command->SegmentCount, Command->DmaDirection);
 
-if (!end_that_request_first(Request, UpToDate, Command->BlockCount)) {
-   add_disk_randomness(Request->rq_disk);
-   end_that_request_last(Request, UpToDate);
-
+if (!__blk_end_request(Request, UpToDate, Command->BlockCount << 9)) {
if (Command->Completion) {
complete(Command->Completion);
Command->Completion = NULL;

[PATCH 05/28] blk_end_request: changing um (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts um to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 arch/um/drivers/ubd_kern.c |   10 +-
 1 files changed, 1 insertion(+), 9 deletions(-)

Index: 2.6.24-rc3-mm2/arch/um/drivers/ubd_kern.c
===
--- 2.6.24-rc3-mm2.orig/arch/um/drivers/ubd_kern.c
+++ 2.6.24-rc3-mm2/arch/um/drivers/ubd_kern.c
@@ -481,15 +481,7 @@ int thread_fd = -1;
 
 static void ubd_end_request(struct request *req, int bytes, int uptodate)
 {
-   if (!end_that_request_first(req, uptodate, bytes >> 9)) {
-   struct ubd *dev = req->rq_disk->private_data;
-   unsigned long flags;
-
-   add_disk_randomness(req->rq_disk);
-   spin_lock_irqsave(&dev->lock, flags);
-   end_that_request_last(req, uptodate);
-   spin_unlock_irqrestore(&dev->lock, flags);
-   }
+   blk_end_request(req, uptodate, bytes);
 }
 
 /* Callable only from interrupt context - otherwise you need to do

[PATCH 03/28] blk_end_request: changing block layer core (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts core parts of block layer to use blk_end_request().

The 'dequeue' argument was originally introduced for end_dequeued_request(),
where no attempt should be made to dequeue the request as it's already
dequeued.
However, it's not necessary, as this can be checked with
list_empty(&rq->queuelist).
(A dequeued request has an empty list and a queued request doesn't.)

As a result of this patch, end_queued_request() and
end_dequeued_request() become identical.  A later patch will merge
and rename them and change the users of those functions.
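
The list_empty() check can be pictured with a small user-space sketch in C.
The model_* names below are hypothetical stand-ins written for this example
(they mimic, but are not, the kernel's list helpers), and the sketch assumes
that dequeuing re-initialises the request's list node, which is what makes
the empty test meaningful.

```c
#include <stdbool.h>

/* Minimal circular doubly-linked list node (hypothetical model). */
struct model_list {
	struct model_list *next, *prev;
};

/* An initialised node points at itself, i.e. it is "empty". */
static void model_list_init(struct model_list *h)
{
	h->next = h;
	h->prev = h;
}

static bool model_list_empty(const struct model_list *h)
{
	return h->next == h;
}

/* Link node n in front of head, i.e. at the tail of the queue. */
static void model_list_add_tail(struct model_list *n, struct model_list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and re-initialise it so it reads as empty afterwards. */
static void model_list_del_init(struct model_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	model_list_init(n);
}

/* Hypothetical request carrying only the list node used for queuing. */
struct model_request {
	struct model_list queuelist;
};

/* Dequeue only if still queued; returns true when a dequeue happened. */
static bool model_dequeue_if_queued(struct model_request *rq)
{
	if (model_list_empty(&rq->queuelist))
		return false;           /* already dequeued: nothing to do */
	model_list_del_init(&rq->queuelist);
	return true;
}
```

With this shape a single completion helper can serve both queued and
already-dequeued requests, so a separate 'dequeue' flag carries no extra
information.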

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c |   25 ++---
 1 files changed, 10 insertions(+), 15 deletions(-)

Index: 2.6.24-rc3-mm2/block/ll_rw_blk.c
===
--- 2.6.24-rc3-mm2.orig/block/ll_rw_blk.c
+++ 2.6.24-rc3-mm2/block/ll_rw_blk.c
@@ -368,8 +368,8 @@ void blk_ordered_complete_seq(struct req
q->ordseq = 0;
rq = q->orig_bar_rq;
 
-   end_that_request_first(rq, uptodate, rq->hard_nr_sectors);
-   end_that_request_last(rq, uptodate);
+   if (__blk_end_request(rq, uptodate, blk_rq_bytes(rq)))
+   BUG();
 }
 
 static void pre_flush_end_io(struct request *rq, int error)
@@ -486,9 +486,9 @@ int blk_do_ordered(struct request_queue 
 * ORDERED_NONE while this request is on it.
 */
blkdev_dequeue_request(rq);
-   end_that_request_first(rq, -EOPNOTSUPP,
-  rq->hard_nr_sectors);
-   end_that_request_last(rq, -EOPNOTSUPP);
+   if (__blk_end_request(rq, -EOPNOTSUPP,
+ blk_rq_bytes(rq)))
+   BUG();
*rqp = NULL;
return 0;
}
@@ -3691,14 +3691,9 @@ void end_that_request_last(struct reques
 EXPORT_SYMBOL(end_that_request_last);
 
 static inline void __end_request(struct request *rq, int uptodate,
-unsigned int nr_bytes, int dequeue)
+unsigned int nr_bytes)
 {
-   if (!end_that_request_chunk(rq, uptodate, nr_bytes)) {
-   if (dequeue)
-   blkdev_dequeue_request(rq);
-   add_disk_randomness(rq->rq_disk);
-   end_that_request_last(rq, uptodate);
-   }
+   __blk_end_request(rq, uptodate, nr_bytes);
 }
 
 /**
@@ -3741,7 +3736,7 @@ EXPORT_SYMBOL_GPL(blk_rq_cur_bytes);
  **/
 void end_queued_request(struct request *rq, int uptodate)
 {
-   __end_request(rq, uptodate, blk_rq_bytes(rq), 1);
+   __end_request(rq, uptodate, blk_rq_bytes(rq));
 }
 EXPORT_SYMBOL(end_queued_request);
 
@@ -3758,7 +3753,7 @@ EXPORT_SYMBOL(end_queued_request);
  **/
 void end_dequeued_request(struct request *rq, int uptodate)
 {
-   __end_request(rq, uptodate, blk_rq_bytes(rq), 0);
+   __end_request(rq, uptodate, blk_rq_bytes(rq));
 }
 EXPORT_SYMBOL(end_dequeued_request);
 
@@ -3784,7 +3779,7 @@ EXPORT_SYMBOL(end_dequeued_request);
  **/
 void end_request(struct request *req, int uptodate)
 {
-   __end_request(req, uptodate, req->hard_cur_sectors << 9, 1);
+   __end_request(req, uptodate, req->hard_cur_sectors << 9);
 }
 EXPORT_SYMBOL(end_request);
 

[PATCH 04/28] blk_end_request: changing arm (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch converts arm to use blk_end_request().

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---

 arch/arm/plat-omap/mailbox.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: 2.6.24-rc3-mm2/arch/arm/plat-omap/mailbox.c
===
--- 2.6.24-rc3-mm2.orig/arch/arm/plat-omap/mailbox.c
+++ 2.6.24-rc3-mm2/arch/arm/plat-omap/mailbox.c
@@ -117,7 +117,8 @@ static void mbox_tx_work(struct work_str
 
spin_lock(q->queue_lock);
blkdev_dequeue_request(rq);
-   end_that_request_last(rq, 0);
+   if (__blk_end_request(rq, 0, 0))
+   BUG();
spin_unlock(q->queue_lock);
}
 }
@@ -151,7 +152,8 @@ static void mbox_rx_work(struct work_str
 
spin_lock_irqsave(q->queue_lock, flags);
blkdev_dequeue_request(rq);
-   end_that_request_last(rq, 0);
+   if (__blk_end_request(rq, 0, 0))
+   BUG();
spin_unlock_irqrestore(q->queue_lock, flags);
 
mbox->rxq->callback((void *)msg);
@@ -265,7 +267,8 @@ omap_mbox_read(struct device *dev, struc
 
spin_lock_irqsave(q->queue_lock, flags);
blkdev_dequeue_request(rq);
-   end_that_request_last(rq, 0);
+   if (__blk_end_request(rq, 0, 0))
+   BUG();
spin_unlock_irqrestore(q->queue_lock, flags);
 
if (unlikely(mbox_seq_test(mbox, *p))) {

[PATCH 01/28] blk_end_request: add new request completion interface (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch adds 2 new interfaces for request completion:
  o blk_end_request()   : called without queue lock
  o __blk_end_request() : called with queue lock held

Some device drivers call some of the generic functions below between
end_that_request_{first/chunk} and end_that_request_last().
  o add_disk_randomness()
  o blk_queue_end_tag()
  o blkdev_dequeue_request()
These are now called in blk_end_request() as a part of generic
request completion.
So all device drivers end up calling the above functions.

"Normal" drivers can be converted to use blk_end_request()
in a standard way shown below.

 a) end_that_request_{chunk/first}
    spin_lock_irqsave()
    (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
    end_that_request_last()
    spin_unlock_irqrestore()
    => blk_end_request()

 b) spin_lock_irqsave()
    end_that_request_{chunk/first}
    (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
    end_that_request_last()
    spin_unlock_irqrestore()
    => spin_lock_irqsave()
       __blk_end_request()
       spin_unlock_irqrestore()

 c) end_that_request_last()
    => __blk_end_request()
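
The return-value contract of the conversions above can be sketched in
userspace C. This is only a toy model under stated assumptions:
struct toy_request, toy_end_first(), toy_end_last() and
toy_end_request() are hypothetical stand-ins, not the kernel functions,
and queue locking, tags and dequeueing are left out entirely.

```c
#include <assert.h>

/* Hypothetical stand-in for struct request: it only tracks bytes left. */
struct toy_request {
    unsigned int bytes_left;   /* bytes not yet completed */
    int finished;              /* set once final completion has run */
};

/* Models the "first" step: complete nr_bytes, return 1 if leftover remains. */
static int toy_end_first(struct toy_request *rq, unsigned int nr_bytes)
{
    if (nr_bytes > rq->bytes_left)
        nr_bytes = rq->bytes_left;
    rq->bytes_left -= nr_bytes;
    return rq->bytes_left != 0;
}

/* Models the "last" step: final cleanup, valid only when nothing is left. */
static void toy_end_last(struct toy_request *rq)
{
    assert(rq->bytes_left == 0);
    rq->finished = 1;
}

/*
 * Models the 1-step interface: returns 1 while buffers are still
 * pending and 0 once the request is completely done.
 */
static int toy_end_request(struct toy_request *rq, unsigned int nr_bytes)
{
    if (toy_end_first(rq, nr_bytes))
        return 1;
    toy_end_last(rq);
    return 0;
}
```

In this model a caller completing an 8192-byte request in two 4096-byte
chunks would see 1 from the first call and 0 from the second, and never
has to call the "last" step itself.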

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |   67 +
 include/linux/blkdev.h |2 +
 2 files changed, 69 insertions(+)

Index: 2.6.24-rc3-mm2/block/ll_rw_blk.c
===
--- 2.6.24-rc3-mm2.orig/block/ll_rw_blk.c
+++ 2.6.24-rc3-mm2/block/ll_rw_blk.c
@@ -3769,6 +3769,73 @@ void end_request(struct request *req, in
 }
 EXPORT_SYMBOL(end_request);
 
+static void complete_request(struct request *rq, int uptodate)
+{
+   if (blk_rq_tagged(rq))
+   blk_queue_end_tag(rq->q, rq);
+
+   /* rq->queuelist of dequeued request should be list_empty() */
+   if (!list_empty(&rq->queuelist))
+   blkdev_dequeue_request(rq);
+
+   end_that_request_last(rq, uptodate);
+}
+
+/**
+ * blk_end_request - Helper function for drivers to complete the request.
+ * @rq:   the request being processed
+ * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
+ * @nr_bytes: number of bytes to complete
+ *
+ * Description:
+ * Ends I/O on a number of bytes attached to @rq.
+ * If @rq has leftover, sets it up for the next range of segments.
+ *
+ * Return:
+ * 0 - we are done with this request
+ * 1 - still buffers pending for this request
+ **/
+int blk_end_request(struct request *rq, int uptodate, int nr_bytes)
+{
+   struct request_queue *q = rq->q;
+   unsigned long flags = 0UL;
+
+   if (blk_fs_request(rq) || blk_pc_request(rq)) {
+   if (__end_that_request_first(rq, uptodate, nr_bytes))
+   return 1;
+   }
+
+   add_disk_randomness(rq->rq_disk);
+
+   spin_lock_irqsave(q->queue_lock, flags);
+   complete_request(rq, uptodate);
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(blk_end_request);
+
+/**
+ * __blk_end_request - Helper function for drivers to complete the request.
+ *
+ * Description:
+ * Must be called with queue lock held unlike blk_end_request().
+ **/
+int __blk_end_request(struct request *rq, int uptodate, int nr_bytes)
+{
+   if (blk_fs_request(rq) || blk_pc_request(rq)) {
+   if (__end_that_request_first(rq, uptodate, nr_bytes))
+   return 1;
+   }
+
+   add_disk_randomness(rq->rq_disk);
+
+   complete_request(rq, uptodate);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(__blk_end_request);
+
 static void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio)
 {
Index: 2.6.24-rc3-mm2/include/linux/blkdev.h
===
--- 2.6.24-rc3-mm2.orig/include/linux/blkdev.h
+++ 2.6.24-rc3-mm2/include/linux/blkdev.h
@@ -725,6 +725,8 @@ static inline void blk_run_address_space
  * for parts of the original function. This prevents
  * code duplication in drivers.
  */
+extern int blk_end_request(struct request *rq, int uptodate, int nr_bytes);
+extern int __blk_end_request(struct request *rq, int uptodate, int nr_bytes);
 extern int end_that_request_first(struct request *, int, int);
 extern int end_that_request_chunk(struct request *, int, int);
 extern void end_that_request_last(struct request *, int);

[PATCH 02/28] blk_end_request: add/export functions to get request size (take 3)

2007-11-30 Thread Kiyoshi Ueda

This patch adds/exports functions to get the size of request in bytes.
They are useful because blk_end_request() takes bytes
as a completed I/O size instead of sectors.
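
As a rough illustration of the sector-to-byte conversion these helpers
perform for filesystem requests, here is a userspace sketch. The struct
and function names (toy_fs_request, toy_rq_bytes, toy_rq_cur_bytes) are
hypothetical; only the "<< 9" shift for 512-byte sectors is taken from
the patch, and the non-fs request paths are not modelled.

```c
#include <assert.h>

#define TOY_SECTOR_SHIFT 9   /* 512-byte sectors, matching "<< 9" in the patch */

/* Hypothetical, reduced view of a filesystem request: sector counts only. */
struct toy_fs_request {
    unsigned long hard_nr_sectors;     /* sectors left in the whole request */
    unsigned int current_nr_sectors;   /* sectors left in the current segment */
};

/* Bytes left to complete in the entire request. */
static unsigned int toy_rq_bytes(const struct toy_fs_request *rq)
{
    return (unsigned int)(rq->hard_nr_sectors << TOY_SECTOR_SHIFT);
}

/* Bytes left to complete in the current segment. */
static unsigned int toy_rq_cur_bytes(const struct toy_fs_request *rq)
{
    /* The current segment cannot be larger than the whole request. */
    assert(rq->current_nr_sectors <= rq->hard_nr_sectors);
    return rq->current_nr_sectors << TOY_SECTOR_SHIFT;
}
```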

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |   25 ++---
 include/linux/blkdev.h |8 
 2 files changed, 30 insertions(+), 3 deletions(-)

Index: 2.6.24-rc3-mm2/include/linux/blkdev.h
===
--- 2.6.24-rc3-mm2.orig/include/linux/blkdev.h
+++ 2.6.24-rc3-mm2/include/linux/blkdev.h
@@ -736,6 +736,14 @@ extern void end_dequeued_request(struct 
 extern void blk_complete_request(struct request *);
 
 /*
+ * blk_end_request() takes bytes instead of sectors as a complete size.
+ * blk_rq_bytes() returns bytes left to complete in the entire request.
+ * blk_rq_cur_bytes() returns bytes left to complete in the current segment.
+ */
+extern unsigned int blk_rq_bytes(struct request *rq);
+extern unsigned int blk_rq_cur_bytes(struct request *rq);
+
+/*
  * end_that_request_first/chunk() takes an uptodate argument. we account
  * any value <= as an io error. 0 means -EIO for compatability reasons,
  * any other < 0 value is the direct error type. An uptodate value of
Index: 2.6.24-rc3-mm2/block/ll_rw_blk.c
===
--- 2.6.24-rc3-mm2.orig/block/ll_rw_blk.c
+++ 2.6.24-rc3-mm2/block/ll_rw_blk.c
@@ -3701,13 +3701,32 @@ static inline void __end_request(struct 
}
 }
 
-static unsigned int rq_byte_size(struct request *rq)
+/**
+ * blk_rq_bytes - Returns bytes left to complete in the entire request
+ **/
+unsigned int blk_rq_bytes(struct request *rq)
 {
if (blk_fs_request(rq))
return rq->hard_nr_sectors << 9;
 
return rq->data_len;
 }
+EXPORT_SYMBOL_GPL(blk_rq_bytes);
+
+/**
+ * blk_rq_cur_bytes - Returns bytes left to complete in the current segment
+ **/
+unsigned int blk_rq_cur_bytes(struct request *rq)
+{
+   if (blk_fs_request(rq))
+   return rq->current_nr_sectors << 9;
+
+   if (rq->bio)
+   return rq->bio->bi_size;
+
+   return rq->data_len;
+}
+EXPORT_SYMBOL_GPL(blk_rq_cur_bytes);
 
 /**
  * end_queued_request - end all I/O on a queued request
@@ -3722,7 +3741,7 @@ static unsigned int rq_byte_size(struct 
  **/
 void end_queued_request(struct request *rq, int uptodate)
 {
-   __end_request(rq, uptodate, rq_byte_size(rq), 1);
+   __end_request(rq, uptodate, blk_rq_bytes(rq), 1);
 }
 EXPORT_SYMBOL(end_queued_request);
 
@@ -3739,7 +3758,7 @@ EXPORT_SYMBOL(end_queued_request);
  **/
 void end_dequeued_request(struct request *rq, int uptodate)
 {
-   __end_request(rq, uptodate, rq_byte_size(rq), 0);
+   __end_request(rq, uptodate, blk_rq_bytes(rq), 0);
 }
 EXPORT_SYMBOL(end_dequeued_request);
 

[PATCH 00/28] blk_end_request: full I/O completion handler (take 3)

2007-11-30 Thread Kiyoshi Ueda

Hello Jens,

The following is the updated patch-set for blk_end_request().
Changes since the last version are only minor updates to catch up
with the base kernel changes.
Do you agree with the implementation of blk_end_request()?
If there's no problem, could you merge it into your tree?
Or does it have to be merged to -mm tree first?


Boaz,
Could you review the newly added PATCH 27 which converts the bidi part,
and give me your comments?
It uses blk_end_request_callback() in PATCH 25, which was only for
the tricky ide-cd driver.
If bidi added a 'resid' member to struct request instead of reusing
'data_len' for the other purpose, it could use the standard
blk_end_request() instead.

-- Changes from the previous post -
Changes between take2 and take3:
  o Rebased on top of 2.6.24-rc3-mm2
  o Added a bidi patch, which changes bidi to use blk_end_request()
(PATCH 27)
  o Dropped blk_rq_size() which was to get size of entire request
because rq_byte_size() has been added to ll_rw_blk.c (PATCH 02)
  o Removed 'dequeue' argument, which was added in 2.6.23-rc7-mm1,
from __end_request() (PATCH 03)
  o Removed lguest patch because lguest has been changed not to use
end_that_request_{chunk/last} directly.

Changes between take1 and take2:
  o Rebased on top of 2.6.23-rc4-mm1
  o Don't pass the lock held information (PATCH 01)
  o Removed sect2byte() macro (PATCH 02)
  o fixed blk_rq_size() and blk_rq_cur_size() for blk_pc_requests (PATCH 02)
  o Separated the patch for changes of end_that_request_* user (PATCH 03-26)
  o Removed the patch which changes the role of rq->end_io()
from this patch-set because some more discussions are needed
about it.
---


A summary of each patch is below:
  01/28: add new request completion interface, blk_end_request()
  02/28: add some functions to get the size of request in bytes
  03/28: convert to use blk_end_request() (core parts of block layer)
  04/28: convert to use blk_end_request() (arm)
  05/28: convert to use blk_end_request() (um)
  06/28: convert to use blk_end_request() (DAC960)
  07/28: convert to use blk_end_request() (floppy)
  08/28: convert to use blk_end_request() (nbd)
  09/28: convert to use blk_end_request() (ps3disk)
  10/28: convert to use blk_end_request() (sunvdc)
  11/28: convert to use blk_end_request() (sx8)
  12/28: convert to use blk_end_request() (ub)
  13/28: convert to use blk_end_request() (viodasd)
  14/28: convert to use blk_end_request() (xen-blkfront)
  15/28: convert to use blk_end_request() (viocd)
  16/28: convert to use blk_end_request() (i2o_block)
  17/28: convert to use blk_end_request() (mmc)
  18/28: convert to use blk_end_request() (s390)
  19/28: convert to use blk_end_request() (scsi mid-layer)
  20/28: convert to use blk_end_request() (ide-scsi)
  21/28: convert to use blk_end_request() (xsysace)
  22/28: convert to use blk_end_request() (cciss)
  23/28: convert to use blk_end_request() (cpqarray)
  24/28: convert to use blk_end_request() (normal parts of ide)
  25/28: add a variant of blk_end_request() having callback feature
  26/28: convert to use blk_end_request() (ide-cd, cdrom_newpc_intr())
  27/28: convert to use blk_end_request() (scsi bidi)
  28/28: remove/unexport no longer needed end_that_request_*

I have tested this patch-set on two machines,
IA64+QLA1280+QLA2200 box and x86_64+SATA+IDE-CDROM box.


Below is an explanation of the need for this patch-set and its details.

SUMMARY
===
This set of patches changes the request completion interface
between device drivers and the block layer from the current 2-step
procedure, using end_that_request_{first/chunk} and
end_that_request_last(), to a 1-step procedure.

This patch-set prepares for realizing another patch-set which changes
the role of rq->end_io().  It allows request-based multipath to hook
in before completing each chunk of request, check errors for it and
retry it using another path if error is detected.


BACKGROUND
==
The patch-set which changes the role of rq->end_io() is necessary
to allow device stacking at request level, that is request-based
device-mapper multipath.
Currently, device-mapper is implemented as a stacking block device
at BIO level.  OTOH, request-based dm will stack at request level
to allow better multipathing decision.
To allow device stacking at request level, the completion procedure
needs to provide a hook for it.
For example, dm-multipath has to check errors and retry with other
paths if necessary before returning the I/O result to upper layer.
struct request has 'end_io' hook currently.  But it's called at
the very late stage of completion handling where the I/O result
is already returned to the upper layer.
So we need something here.

The first approach to hook in completion of each chunk of request
was adding a new rq->end_io_first() hook and calling it on the top
of __end_that_request_first().
  - http://marc.theaimsgroup.com

Re: [RFC] kobject: add kobject_init_ng and kobject_init_and_add functions

2007-11-30 Thread Alan Stern

On Fri, 30 Nov 2007, Greg KH wrote:

> > However if kobject_add() is never called, or if it is called and it 
> > fails, then it's okay to use kfree().  It's not clear whether this 
> > distinction will matter in practice.  It's probably best to document 
> > this using your stronger description.
> 
> No, if kobject_add() fails, kobject_put() still must be called in order
> to free up the name pointer, unless you are somehow guessing that the
> "kobject_set_name()" portion of kobject_add() somehow failed.

Actually I imagined that if kobject_add() failed it would back out all 
the changes it made -- which means it would deallocate the name 
string.  But requiring people to call kobject_put() will do this just 
as well.

>  And you
> can't know that, so you have to call kobject_put() in order to be safe
> and clean up everything.
> 
> Now why did we not do the final kobject_put() in kobject_del() as well?
> Doing two calls, always in order, seems a bit strange, anyone know why
> it's this way?

To be symmetrical with kobject_init() and kobject_add().  Besides, 
isn't there kobject_unregister()?  Presumably it will go away along 
with kobject_register(), though.

> > You could put that a little less strongly.  After kobject_init() you
> > SHOULD call kobject_put() to clean up properly, and after kobject_add()
> > you MUST call kobject_del() and kobject_put().
>
> No, in looking at the code, you only need to call kobject_del() to clean
> everything up properly, if kobject_add() succeeds.  No need to call
> kobject_put() yet again.

Sorry, yes, that's what I meant.  After a successful call to 
kobject_add() you must call kobject_del() to undo the _add, and then
kobject_put() for the final cleanup.
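
For readers following the thread, the lifecycle being discussed can be
sketched as a toy reference-counted object in userspace C. Everything
here is an assumption made for illustration: toy_kobject and the toy_*
functions are hypothetical and only mimic the behaviour described in
this exchange (the name is allocated during "add" and is freed only by
the final "put", even when "add" fails). It is not the kobject code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an object with init/add/del/put stages. */
struct toy_kobject {
    int refcount;       /* set to 1 by toy_init() */
    int in_hierarchy;   /* set by a successful toy_add() */
    char *name;         /* allocated inside toy_add() */
    int released;       /* set when the last reference is dropped */
};

static void toy_init(struct toy_kobject *k)
{
    memset(k, 0, sizeof(*k));
    k->refcount = 1;
}

/* Allocates the name first; a later failure leaves the name allocated. */
static int toy_add(struct toy_kobject *k, const char *name, int simulate_failure)
{
    k->name = malloc(strlen(name) + 1);
    if (!k->name)
        return -1;
    strcpy(k->name, name);
    if (simulate_failure)
        return -1;
    k->in_hierarchy = 1;
    return 0;
}

/* Undoes toy_add(), but does not drop the reference. */
static void toy_del(struct toy_kobject *k)
{
    k->in_hierarchy = 0;
}

/* Drops a reference; the last put frees the name. */
static void toy_put(struct toy_kobject *k)
{
    assert(k->refcount > 0);
    if (--k->refcount == 0) {
        free(k->name);
        k->name = NULL;
        k->released = 1;
    }
}
```

In this model, freeing the structure directly after a failed toy_add()
would leak the name, which is the argument made above for always
finishing with a "put".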

Alan Stern


Re: Kernel Development & Objective-C

2007-11-30 Thread J.A. Magallón

On Fri, 30 Nov 2007 11:29:55 +0100, "Loïc Grenié" <[EMAIL PROTECTED]> wrote:

> 2007/11/29, Ben Crowhurst <[EMAIL PROTECTED]>:
> > Has Objective-C ever been considered for kernel development?
> >
> > regards,
> > BPC
> 

Well, I really would like to learn some things here, could we
keep this off-topic thread alive just a bit, please ?
(I know, I'm going to gain a troll's fame because I can't avoid these
discussions, it's one of my secret vices...)

>No, it has not. Any language that looks remotely like an OO language
>   has not ever been considered for (Linux) kernel development and for
>   most, if not all, other operating systems kernels.
> 

I think BeOS was C++ and OSX is C+ObjectiveC (and runs on an iPhone).
Original MacOS (from 6 to 9) was Pascal (and a mac SE was very near
to embedded hardware :) ).

I do not advocate to rewrite Linux in C++, but don't say a kernel written
in C++ can not be efficient.

> Various problems occur in an object oriented language. One of them
>   is garbage collection: it provokes asynchronous delays and, during
>   an interrupt or a system call for a real time task, the kernel cannot
>   wait. 

C++ (and, from what I read in another answer, ObjectiveC too) has no
garbage collection. It does not do anything you did not tell it to do.
It just allows you to change this

struct buffer *x;
x = kmalloc(...);
x->sz = 128;
x->buff = kmalloc(...);
...
kfree(x->buff);
kfree(x);

to
struct buffer *x;
x = new buffer(128);  /* the constructor allocates x->buff itself,
                         because _you_ programmed it to, so you,
                         poor programmer, can't forget */
...
delete x;             /* the destructor was also programmed to free
                         x->buff, so you have one less memory leak
                         to worry about */

>   Another is memory overhead: all the magic that OO languages
>   provide take space in memory and Linux kernel is used in embedded
>   systems with very tight memory requirements.
> 

A vtable in C++ takes exactly the same space as the function
table pointer present in every driver nowadays... and probably
the virtual method call that C++ does itself with

thing->do_something(with,this)

like
push thing
push with
push this
call THING_vtable+indexof(do_something) // constants at compile time

is much more efficient than what gcc can manage to do with

thing->do_something(with,this,thing)

push with
push this
push thing
get thing+offsetof(do_something) // not constant at compile time
dereference it
call it

(that is, get a generic field on a structure and use it as jump address)

In short, the kernel is object oriented, implements OO programming by
hand, but the compiler lacks the knowledge that it is object oriented
programming so it could do some optimizations.
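
The "object orientation by hand" mentioned here usually looks like the
following in plain C. This is a minimal sketch with made-up names
(toy_disk, toy_disk_ops, TOY_SATA, TOY_IDE), not code from any real
driver: each object carries a pointer to a shared, constant table of
function pointers, which plays the role of a vtable pointer.

```c
#include <assert.h>

enum toy_kind { TOY_SATA, TOY_IDE };

struct toy_disk;

/* One shared, constant table per driver type, comparable to a vtable. */
struct toy_disk_ops {
    enum toy_kind (*kind)(const struct toy_disk *d);
    int (*sector_size)(const struct toy_disk *d);
};

struct toy_disk {
    const struct toy_disk_ops *ops;   /* plays the role of the vtable pointer */
    int id;
};

static enum toy_kind sata_kind(const struct toy_disk *d) { (void)d; return TOY_SATA; }
static enum toy_kind ide_kind(const struct toy_disk *d) { (void)d; return TOY_IDE; }
static int common_sector_size(const struct toy_disk *d) { (void)d; return 512; }

static const struct toy_disk_ops toy_sata_ops = {
    .kind = sata_kind,
    .sector_size = common_sector_size,
};

static const struct toy_disk_ops toy_ide_ops = {
    .kind = ide_kind,
    .sector_size = common_sector_size,
};

/* "Virtual call" by hand: load the table pointer, load the slot, call it. */
static enum toy_kind toy_disk_kind(const struct toy_disk *d)
{
    assert(d->ops && d->ops->kind);
    return d->ops->kind(d);
}
```

Whether a C++ compiler would generate better code for the equivalent
virtual call is the claim under debate in this message; the sketch only
shows the C side of the comparison.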

> Lots of people will think of better reasons why ObjC is not used...

People usually complain about RTTI or exceptions, but benefits versus
memory space should be seriously considered (sure there is something
in current drivers to ask 'are you a SATA or an IDE disk?').

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam03 (gcc 4.2.2 (4.2.2-1mdv2008.1)) SMP Sat Nov
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Re: Kernel Development & Objective-C

2007-11-30 Thread Bill Davidsen


David Newall wrote:
Jan Engelhardt wrote:
On Nov 30 2007 11:20, Xavier Bestel wrote:
On Fri, 2007-11-30 at 19:09 +0900, KOSAKI Motohiro wrote:

Has Objective-C ever been considered for kernel development?

Why not C# instead ?

Why not Haskell nor Erlang instead ? :-D

I heard of a bash compiler. That would enable development time
rationalization and maximize the collaborative convergence of a
community-oriented synergy.

Fortran90 it has to be.

It used to be written in BCPL; or was that Multics?

BCPL was typeless, as was the successor B (between Bell Labs and GE we
wrote thousands of lines of B, ported to 8080, GE600, etc). C introduced
types, and the rest is history. Multics is written in PL/1, and I wrote
a lot of PL/1 subset G back when as well. You don't know slow compile
until you get a seven pass compiler with each pass on floppy.



--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-11-30 Thread Eric Sandeen

Mingming Cao wrote:
> [PATCH] jbd2 stats through procfs
> 
> The patch below updates the jbd stats patch to 2.6.20/jbd2.
> The initial patch was posted by Alex Tomas in December 2005
> (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
> It provides statistics via procfs such as transaction lifetime and size.
> 
> [ This probably should be rewritten to use debugfs?   -- Ted]
> 
> Signed-off-by: Johann Lombardi <[EMAIL PROTECTED]>

I've started going through this one to clean it up to the point where it
can go forward.  It's been sitting at the top of the unstable portion of
the patch queue for long enough, I think :)

I've converted the msecs to jiffies until the user boundary, changed the
union #defines as suggested by Andrew, and various other little issues etc.

Remaining to do is a generic time-difference calculator (instead of
jbd2_time_diff), and looking into whether it should be made a config
option; I tend to think it should, but it's fairly well sprinkled
through the code, so I'll see how well that works.
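
A generic tick-difference helper of the kind mentioned above could look
roughly like this in userspace C. It is only a sketch under assumptions:
TOY_HZ, toy_ticks_elapsed() and toy_ticks_to_msecs() are hypothetical
names, the tick rate of 250 is picked arbitrarily, and the real kernel
helpers are not used here. The point is that statistics stay in ticks
internally and are converted to milliseconds only when reported.

```c
#include <assert.h>

#define TOY_HZ 250UL   /* assumed tick rate, chosen for illustration only */

/*
 * Elapsed ticks between two counter samples. Unsigned subtraction is
 * modulo 2^N, so the result stays correct across one counter wrap.
 */
static unsigned long toy_ticks_elapsed(unsigned long start, unsigned long end)
{
    return end - start;
}

/* Convert ticks to milliseconds at the user boundary only. */
static unsigned long toy_ticks_to_msecs(unsigned long ticks)
{
    /* This simple form is exact only when TOY_HZ divides 1000. */
    assert(1000UL % TOY_HZ == 0);
    return ticks * (1000UL / TOY_HZ);
}
```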

Also, did we ever decide whether this should go to debugfs?

Thanks,

-Eric

Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-11-30 Thread Rafael J. Wysocki

On Friday, 30 of November 2007, Andrew Morton wrote:
> On Sat, 01 Dec 2007 11:30:01 +1300
> Michael Cree <[EMAIL PROTECTED]> wrote:
> 
> > Bob Tracy wrote:
> > > Andrew Morton wrote:
> > >> Could be something change in sysfs.  Please double-check the config
> > >> options, make sure that something important didn't get disabled.
> > >>
> > >  Here's
> > > hoping someone else is seeing this or can replicate it in the meantime.
> > 
> > Snap.
> > 
> > 2.6.24-rc2 works fine.   2.6.24-rc3 boots on Alpha but once /dev is 
> > populated no partitions of the scsi sub-system are seen.  Looks like ide 
> > sub-system similarly affected.
> 
> Rafael, I assume you have this regression in the list?

Yes, http://bugzilla.kernel.org/show_bug.cgi?id=9457

Re: WARNING: at kernel/resource.c:189 __release_resource

2007-11-30 Thread Bjorn Helgaas

On Friday 30 November 2007 03:49:55 pm Jiri Slaby wrote:
> On 11/30/2007 10:08 PM, Bjorn Helgaas wrote:
> > On Thursday 29 November 2007 05:42:07 pm Andrew Morton wrote:
> >> On Thu, 29 Nov 2007 16:40:37 -0700
> >>> Maybe we could either remove the pnp_{stop,start}_dev() calls
> >>> from the suspend/resume path, or move the PNP resource management
> >>> out of pnp_{start,stop}_dev().
> >>>
> >>> Bjorn
> >>>
> >>> [1] http://lkml.org/lkml/2005/11/30/39
> >> So was this particular problem caused/exposed by
> >> pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch, or is
> >> it in mainline?
> > 
> > I'm pretty sure this problem is caused by that patch, so we
> > shouldn't see this in mainline.
> > 
> > Jiri, can you try the additional patch below, please?
> > 
> > Index: linux-mm/drivers/pnp/driver.c
> > ===
> > --- linux-mm.orig/drivers/pnp/driver.c  2007-11-30 13:58:25.0 -0700
> > +++ linux-mm/drivers/pnp/driver.c   2007-11-30 13:59:37.0 -0700
> > @@ -161,13 +161,6 @@
> > return error;
> > }
> >  
> > -   if (!(pnp_drv->flags & PNP_DRIVER_RES_DO_NOT_CHANGE) &&
> > -   pnp_can_disable(pnp_dev)) {
> > -   error = pnp_stop_dev(pnp_dev);
> > -   if (error)
> > -   return error;
> > -   }
> > -
> > if (pnp_dev->protocol && pnp_dev->protocol->suspend)
> > pnp_dev->protocol->suspend(pnp_dev, state);
> > return 0;
> > @@ -185,12 +178,6 @@
> > if (pnp_dev->protocol && pnp_dev->protocol->resume)
> > pnp_dev->protocol->resume(pnp_dev);
> >  
> > -   if (!(pnp_drv->flags & PNP_DRIVER_RES_DO_NOT_CHANGE)) {
> > -   error = pnp_start_dev(pnp_dev);
> > -   if (error)
> > -   return error;
> > -   }
> > -
> > if (pnp_drv->resume)
> > return pnp_drv->resume(pnp_dev);
> >  
> 
> No, it breaks suspend.

Thanks for trying it.  What are the symptoms?  I'd like to understand
why we need to stop the devices before suspend.

Bjorn

SoftMAC: Getting essid from req_essid

2007-11-30 Thread Ray Lee

Hey there Larry, all,

git blame fingered commit id efe870f9 (from Larry) for adding a couple
of fairly harmless looking messages to
net/ieee80211/softmac/ieee80211softmac_wx.c . The problem is that one
of them is clogging up my syslog to the tune of once a second or so
("SoftMAC: Getting essid from req_essid"), and rolling everything else
out of my dmesg.

I just rebooted into 2.6.23-rc3+some, and after 36 minutes of uptime I
already have:

$ dmesg | cut -d ']' -f2- | sort | uniq -c | sort -nr | head -3

   1047  SoftMAC: Getting essid from req_essid
 38  SoftMAC: Getting essid from associate_essid
 22  SoftMAC: Scanning finished: scanned 13 channels starting with channel 1

Is the message important for debugging, or can I make a patch to yank
the silly thing?

Ray

Re: [PATCH] xfs: revert to double-buffering readdir

2007-11-30 Thread Chris Wedgwood

On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote:

> Looks like the readdir is in the bowels of the btree code when
> filldir gets called here, there are probably locks on several
> buffers in the btree at this point. This will only show up for large
> directories I bet.

I see it for fairly small directories.  Larger than what you can stuff
into an inode but less than a block (I'm not checking but fairly sure
that's the case).

> Just rambling, not a single line of code was consulted in writing
> this message.

Can you explain why the offset is capped and treated in an 'odd way'
at all?

+   curr_offset = filp->f_pos;
+   if (curr_offset == 0x7fff)
+   offset = 0x;
+   else
+   offset = filp->f_pos;

and later the offset to filldir is masked.  Is that some restriction
in filldir?

Re: [PATCH 1/2] mac80211: add power management support -v2

2007-11-30 Thread John W. Linville

On Sat, Nov 17, 2007 at 01:13:58AM +0100, Miguel Botón wrote:

> This patch adds power management support in mac80211.
> 
> This allows us to enable power management through the "iwconfig  
> power " command.
> The code is based on "mac80211-10.0.0" but it is a little bit modified.
> 
> Signed-off-by: Miguel Botón <[EMAIL PROTECTED]>
> 
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index 5fcc4c1..c82b6fa 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -452,6 +452,8 @@ struct ieee80211_conf {
>   u8 antenna_max;
>   u8 antenna_sel_tx;
>   u8 antenna_sel_rx;
> +
> + u8 power_management_enable; /* flag to enable/disable power management */
>  };
>  
>  /**

I'm not overly happy with this.  What about folding this into the
flags variable?
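
Folding the boolean into a flags word might look something like the
sketch below. The names are hypothetical (toy_conf, TOY_CONF_POWER_MGMT
and the bit position are invented for illustration and are not taken
from mac80211), so treat it as one possible shape rather than the
intended change.

```c
#include <assert.h>

/* Hypothetical flag bit; a real patch would pick the name and position. */
#define TOY_CONF_POWER_MGMT (1u << 2)

/* Hypothetical, reduced configuration struct with a single flags word. */
struct toy_conf {
    unsigned int flags;
};

/* Set or clear the power management bit without touching other flags. */
static void toy_conf_set_power_mgmt(struct toy_conf *conf, int enable)
{
    assert(conf);
    if (enable)
        conf->flags |= TOY_CONF_POWER_MGMT;
    else
        conf->flags &= ~TOY_CONF_POWER_MGMT;
}

/* Returns 1 if power management is enabled, 0 otherwise. */
static int toy_conf_power_mgmt(const struct toy_conf *conf)
{
    return (conf->flags & TOY_CONF_POWER_MGMT) != 0;
}
```

Compared with a dedicated u8 member, this keeps the struct from growing
for every new on/off setting, at the cost of going through a mask for
each access.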

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: Kernel Development & Objective-C

2007-11-30 Thread J.A. Magallón

On Fri, 30 Nov 2007 19:09:45 +0900, KOSAKI Motohiro <[EMAIL PROTECTED]> wrote:

> > > Has Objective-C ever been considered for kernel development?
> > 
> > Why not C# instead ?
> 
> Why not Haskell nor Erlang instead ? :-D
> 

Flash

http://www.lagmonster.info/humor/windowsrg.html

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam03 (gcc 4.2.2 (4.2.2-1mdv2008.1)) SMP Sat Nov
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Re: WARNING: at kernel/resource.c:189 __release_resource

2007-11-30 Thread Jiri Slaby

On 11/30/2007 10:08 PM, Bjorn Helgaas wrote:
> On Thursday 29 November 2007 05:42:07 pm Andrew Morton wrote:
>> On Thu, 29 Nov 2007 16:40:37 -0700
>>> Maybe we could either remove the pnp_{stop,start}_dev() calls
>>> from the suspend/resume path, or move the PNP resource management
>>> out of pnp_{start,stop}_dev().
>>>
>>> Bjorn
>>>
>>> [1] http://lkml.org/lkml/2005/11/30/39
>> So was this particular problem caused/exposed by
>> pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch, or is
>> it in mainline?
> 
> I'm pretty sure this problem is caused by that patch, so we
> shouldn't see this in mainline.
> 
> Jiri, can you try the additional patch below, please?
> 
> Index: linux-mm/drivers/pnp/driver.c
> ===
> --- linux-mm.orig/drivers/pnp/driver.c 2007-11-30 13:58:25.0 -0700
> +++ linux-mm/drivers/pnp/driver.c 2007-11-30 13:59:37.0 -0700
> @@ -161,13 +161,6 @@
>   return error;
>   }
>  
> - if (!(pnp_drv->flags & PNP_DRIVER_RES_DO_NOT_CHANGE) &&
> - pnp_can_disable(pnp_dev)) {
> - error = pnp_stop_dev(pnp_dev);
> - if (error)
> - return error;
> - }
> -
>   if (pnp_dev->protocol && pnp_dev->protocol->suspend)
>   pnp_dev->protocol->suspend(pnp_dev, state);
>   return 0;
> @@ -185,12 +178,6 @@
>   if (pnp_dev->protocol && pnp_dev->protocol->resume)
>   pnp_dev->protocol->resume(pnp_dev);
>  
> - if (!(pnp_drv->flags & PNP_DRIVER_RES_DO_NOT_CHANGE)) {
> - error = pnp_start_dev(pnp_dev);
> - if (error)
> - return error;
> - }
> -
>   if (pnp_drv->resume)
>   return pnp_drv->resume(pnp_dev);
>  

No, it breaks suspend.

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University

Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-11-30 Thread Andrew Morton

On Sat, 01 Dec 2007 11:30:01 +1300
Michael Cree <[EMAIL PROTECTED]> wrote:

> Bob Tracy wrote:
> > Andrew Morton wrote:
> >> Could be something change in sysfs.  Please double-check the config
> >> options, make sure that something important didn't get disabled.
> >>
> >  Here's
> > hoping someone else is seeing this or can replicate it in the meantime.
> 
> Snap.
> 
> 2.6.24-rc2 works fine.   2.6.24-rc3 boots on Alpha but once /dev is 
> populated no partitions of the scsi sub-system are seen.  Looks like ide 
> sub-system similarly affected.

Rafael, I assume you have this regression in the list?

> Managed to get boot log.  Follows below (with output of various /proc info).
> 
> Cheerz
> Michael.
> 
> 
> Linux version 2.6.24-rc3 ([EMAIL PROTECTED]) (gcc version 4.1.3 20071019 (prerelease) (Debian 4.1.2-17)) #1 Mon Nov 26 19:28:58 NZDT 2007
> Booting on Tsunami variation Monet using machine vector Monet from SRM
> Major Options: EV67 LEGACY_START VERBOSE_MCHECK
> Command line: ro root=/dev/sda3 console=ttyS0
> memcluster 0, usage 1, start0, end  215
> memcluster 1, usage 0, start  215, end   131062
> memcluster 2, usage 1, start   131062, end   131072
> freeing pages 215:384
> freeing pages 930:131062
> reserving pages 930:932
> 4096K Bcache detected; load hit latency 21 cycles, load miss latency 127 cycles
> Console graphics on hose 0
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130167
> Kernel command line: ro root=/dev/sda3 console=ttyS0
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> Using epoch = 2000
> Turning on RTC interrupts.
> Console: colour VGA+ 80x25
> console [ttyS0] enabled
> Dentry cache hash table entries: 131072 (order: 7, 1048576 bytes)
> Inode-cache hash table entries: 65536 (order: 6, 524288 bytes)
> Memory: 1030896k/1048496k available (2786k kernel code, 15216k reserved, 370k data, 168k init)
> Mount-cache hash table entries: 512
> net_namespace: 120 bytes
> NET: Registered protocol family 16
> PCI: Bridge: 0001:01:08.0
>IO window: 8000-8fff
>MEM window: 0900-090f
>PREFETCH window: disabled.
> SMC37c669 Super I/O Controller found @ 0x3f0
> Linux Plug and Play Support v0.97 (c) Adam Belay
> SCSI subsystem initialized
> NET: Registered protocol family 2
> IP route cache hash table entries: 8192 (order: 3, 65536 bytes)
> TCP established hash table entries: 32768 (order: 6, 524288 bytes)
> TCP bind hash table entries: 32768 (order: 5, 262144 bytes)
> TCP: Hash tables configured (established 32768 bind 32768)
> TCP reno registered
> srm_env: version 0.0.6 loaded successfully
> io scheduler noop registered
> io scheduler cfq registered (default)
> tridentfb: Trident framebuffer 0.7.8-NEWAPI initializing
> isapnp: Scanning for PnP cards...
> isapnp: No Plug & Play device found
> rtc: SRM (post-2000) epoch (2000) detected
> Real Time Clock Driver v1.12ac
> Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> Floppy drive(s): fd0 is 2.88M
> FDC 0 is a post-1991 82077
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> CY82C693: IDE controller (0x1080:0xc693 rev 0x00) at  PCI slot :00:07.1
> CY82C693: not 100% native mode: will probe irqs later
> CY82C693U driver v0.34 99-13-12 Andreas S. Krebs ([EMAIL PROTECTED])
>  ide0: BM-DMA at 0x8400-0x8407, BIOS settings: hda:pio, hdb:pio
> CY82C693: port 0x01f0 already claimed by ide0
> ALI15X3: IDE controller (0x10b9:0x5228 rev 0xc6) at  PCI slot 0001:02:09.1
> ALI15X3: 100% native mode on irq 28
>  ide1: BM-DMA at 0x28410-0x28417, BIOS settings: hdc:DMA, 
> hdd:DMA
>  ide2: BM-DMA at 0x28418-0x2841f, BIOS settings: hde:pio, 
> hdf:pio
> hdf: LITE-ON DVDRW SOHW-1653S, ATAPI CD/DVD-ROM drive
> hde: ST3200822A, ATA DISK drive
> ide2 at 0x28438-0x2843f,0x2844e on irq 28
> hde: max request size: 512KiB
> hde: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, 
> UDMA(100)
> hde: cache flushes supported
>   hde: hde1
> qla1280: QLA1040 found on PCI bus 1, dev 6
> scsi(0:0): Resetting SCSI BUS
> scsi0 : QLogic QLA1040 PCI to SCSI Host Adapter
> Firmware version:  7.65.06, Driver version 3.26
> serio: i8042 KBD port at 0x60,0x64 irq 1
> serio: i8042 AUX port at 0x60,0x64 irq 12
> mice: PS/2 mouse device common for all mice
> scsi 0:0:1:0: Direct-Access SEAGATE  ST336706LW   0109 PQ: 0 ANSI: 3
> scsi(0:0:1:0): Sync: period 10, offset 12, Wide
> input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> atkbd.c: keyboard reset failed on isa0060/serio1
> TCP cubic registered
> Initializing XFRM netlink socket
> NET: Registered protocol family 1
> NET: Registered protocol family 17
> NET: Registered protocol family 15
> scsi: waiting for bus probes to complete ...
> sd 0:0:1:

Re: [PATCH] xfs: revert to double-buffering readdir

2007-11-30 Thread Stephen Lord




Wow, was it really that long ago!

Looks like the readdir is in the bowels of the btree code when filldir
gets called here, there are probably locks on several buffers in the
btree at this point. This will only show up for large directories I bet.

The xfs readdir code has the complete xfs inode number in its hands at
this point (filldir is not necessarily getting all the bits of it). All
we are doing the lookup for really is to get the inode number back again
so we can get the inode and get the attributes. Rather dumb really.
There has got to be a way of doing a callout structure here so that the
inode number can be pushed through filldir and back into an fs specific
call. The fs then can do a lookup by id - which is what it does most of
the time for resolving nfs handles anyway. Should be more efficient than
the current scheme.

Just rambling, not a single line of code was consulted in writing this
message.

You want to make a big fat btree directory for testing this stuff. Make
sure it gets at least a couple of layers of node blocks.

Steve

On Nov 30, 2007, at 1:22 AM, Timothy Shimmin wrote:


Christoph Hellwig wrote:

The current readdir implementation deadlocks on btree buffer locks
because nfsd calls back into ->lookup from the filldir callback.  The
only short-term fix for this is to revert to the old inefficient
double-buffering scheme.


Probably why Steve did this: :)

xfs_file.c

revision 1.40
date: 2001/03/15 23:33:20;  author: lord;  state: Exp;  lines: +54 -17
modid: 2.4.x-xfs:slinx:90125a
Change linvfs_readdir to allocate a buffer, call xfs to fill it, and
then call the filldir function on each entry. This is instead of doing
the filldir deep in the bowels of xfs which causes locking problems.
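
The double-buffering idea described in that changelog can be sketched in
plain userspace C. This is only an illustration under stated assumptions,
not the XFS or linvfs code: every sketch_* name is hypothetical, the
"locked" flag stands in for the btree buffer locks, and sketch_lookup()
simply returns an error where the real code would deadlock.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 8
#define SKETCH_NAME_LEN 32

struct sketch_dirent {
    unsigned long long ino;
    char name[SKETCH_NAME_LEN];
};

/* Toy directory; "locked" stands in for the btree buffer locks. */
struct sketch_dir {
    int locked;
    size_t count;
    struct sketch_dirent entries[SKETCH_MAX_ENTRIES];
};

typedef int (*sketch_filldir_t)(void *ctx, const char *name,
                                unsigned long long ino);

/* Lookup needs the same lock; it fails where real code would deadlock. */
int sketch_lookup(struct sketch_dir *dir, const char *name,
                  unsigned long long *ino)
{
    size_t i;

    if (dir->locked)
        return -1;
    for (i = 0; i < dir->count; i++) {
        if (strcmp(dir->entries[i].name, name) == 0) {
            *ino = dir->entries[i].ino;
            return 0;
        }
    }
    return -2;
}

/* Phase 1 copies entries under the lock; phase 2 calls filldir with the
 * lock dropped, so the callback may safely re-enter sketch_lookup(). */
int sketch_readdir(struct sketch_dir *dir, sketch_filldir_t filldir, void *ctx)
{
    struct sketch_dirent buf[SKETCH_MAX_ENTRIES];
    size_t i, n;

    dir->locked = 1;
    n = dir->count < SKETCH_MAX_ENTRIES ? dir->count : SKETCH_MAX_ENTRIES;
    memcpy(buf, dir->entries, n * sizeof(buf[0]));
    dir->locked = 0;

    for (i = 0; i < n; i++) {
        if (filldir(ctx, buf[i].name, buf[i].ino) != 0)
            break;
    }
    return (int)i; /* number of entries the callback accepted */
}

/* nfsd-like callback (hypothetical) that looks up every entry it is given. */
struct sketch_ctx {
    struct sketch_dir *dir;
    int lookups_ok;
};

int sketch_nfsd_filldir(void *ctx, const char *name, unsigned long long ino)
{
    struct sketch_ctx *c = (struct sketch_ctx *)ctx;
    unsigned long long found = 0;

    if (sketch_lookup(c->dir, name, &found) != 0 || found != ino)
        return -1;
    c->lookups_ok++;
    return 0;
}
```

Calling sketch_readdir() with sketch_nfsd_filldir as the callback succeeds
for every entry because the lock is no longer held during the callback.
The cost is the extra copy into the intermediate buffer, which is the
inefficiency the thread refers to.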





RE: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Pallipadi, Venkatesh

 

>On Fri, 30 Nov 2007 14:06:55 -0800
>"Pallipadi, Venkatesh" <[EMAIL PROTECTED]> wrote:
>
>Please don't go off-list like this.  I put Mark's original mailing list cc's
>back.

Sorry for missing some cc's earlier. I blindly did a reply-all to the
mm-commits mail I got.

>> I will have to Nack this. max_cstate was intentionally removed for a
>> couple of reasons:
>
>It broke userspace without any warning or migration period, afaict.

Yes. That's true. I will have to take the blame for that. It has been
known for a while during cpuidle development. But, it was never
documented as deprecated.
 
>> 1) All in kernel users of max_cstate should rather be using
>> pm_qos/latency interfaces. All such max_cstate usages must already be
>> migrated.
>
>That code isn't merged.

All of the kernel part is already merged. I mean, there are no drivers that
depend on max_cstate. They use the latency_notifier thing today, and their
migration to pm_qos is not merged yet.

>> 2) Supporting max_cstate as a dynamic parameter cleanly is no longer
>> possible in acpi/processor_idle.c as the C-state policy has moved to
>> cpuidle instead. It can be done if it is needed. But, just below patch
>> will not really work with cpuidle.
>> 
>> Selecting max_cstate at boot time as a debug option still works without
>> this patch.
>> 
>> So, just this patch will not get back the functionality with cpuidle.
>> In fact changing it at run time will have no effect. Question however is:
>> Is there a real need to revive this parameter so that user can change
>> max_cstate at run time?
>
>It is not known whether Mark is actually writing to this thing.  Perhaps
>read-only permissions would be a suitable fix?
>

Exporting it as read-only should be OK. We also need to know whether there
is any hard userspace dependency on writing to it.

Thanks,
Venki  

Re: [bug] SLOB, tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()

2007-11-30 Thread Matt Mackall

On Fri, Nov 30, 2007 at 10:14:18AM +0100, Ingo Molnar wrote:
> 
> * Matt Mackall <[EMAIL PROTECTED]> wrote:
> 
> > > plus, and this is a slob question i guess, how come we drop into 
> > > clear_highpage() for a kzalloc()??
> > 
> > Good question. Looks like kzalloc switched from doing a memset to
> > passing a GFP_ZERO flag down to kmalloc. Slob didn't get completely
> > updated to reflect this, so it blindly propagates the flag onto
> > __alloc_pages and does a harmless double-clear.
> > 
> > Someone should remind us what the point of moving the kzalloc memset
> > down into the allocators was. We now have all three allocators doing:
> > 
> >     if (unlikely((flags & __GFP_ZERO) && ptr))
> >             memset(ptr, 0, obj_size(cachep));
> > 
> > and needing to mask flags before passing them to page allocators,
> > which hardly seems better than kzalloc unconditionally doing the
> > memset. Wouldn't it be better/faster/smaller to make kzalloc a
> > non-inline?
> > 
> > Slob also has a nice second path for large kmallocs where it just 
> > calls the page allocator directly which also needs this treatment. 
> > Which does the right thing with non-highmem systems, but can hit this 
> > bug otherwise.
> > 
> > Below is a totally untested patch. Alternately, we could simply tweak 
> > clear_highpage to remove the limitation, but that would leave slob 
> > doing an extra clear.
> 
> ok, this fixes the debug warning.
> 

But the question remains: is this the right fix? The commit in
question is here:

http://www.kernel.org/hg/linux-2.6/rev/13683609d67a

Christoph, remind us what's the upside here? It seems to me it would
be better to have separate non-inline kzalloc and kcalloc functions
that did the memset instead.
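
A rough userspace sketch of that alternative, assuming nothing about the
real slab, slub or slob internals: the zeroing lives in one out-of-line
wrapper, so the allocator underneath never has to look at a zero flag.
All sketch_* names are hypothetical, and the 0xAA fill only simulates
recycled, dirty memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for kmalloc(); flags are ignored in this sketch.
 * The 0xAA fill simulates memory that was used before. */
static void *sketch_kmalloc(size_t size, unsigned int flags)
{
    void *ptr = malloc(size);

    (void)flags;
    if (ptr)
        memset(ptr, 0xAA, size);
    return ptr;
}

/* Out-of-line zeroing allocator: the memset happens here and nowhere else. */
void *sketch_kzalloc(size_t size, unsigned int flags)
{
    void *ptr = sketch_kmalloc(size, flags);

    if (ptr)
        memset(ptr, 0, size);
    return ptr;
}

/* Array variant: reject n * size overflow, then reuse sketch_kzalloc(). */
void *sketch_kcalloc(size_t n, size_t size, unsigned int flags)
{
    if (size != 0 && n > (size_t)-1 / size)
        return NULL;
    return sketch_kzalloc(n * size, flags);
}
```

With this shape the callers of the page allocator would not need to mask
a zero flag out of the gfp flags, at the cost of one extra function call
per zeroed allocation.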

Another smaller open question is whether we want to remove the
in_interrupt restriction from clear_highpage.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [RFC] kobject: add kobject_init_ng and kobject_init_and_add functions

2007-11-30 Thread Greg KH

On Fri, Nov 30, 2007 at 05:10:33PM -0500, Alan Stern wrote:
> On Fri, 30 Nov 2007, Greg KH wrote:
> 
> > Ok, how about this:
> > void kobject_init(struct kobject *kobj, struct ktype *ktype);
> > 
> > and then:
> > int kobject_add(struct kobject *kobj, struct kobject *parent, const 
> > char *fmt, ...);
> > 
> > After we call kobject_init() we HAVE to call kobject_put() to clean up
> > properly.  So, if kobject_add() fails, we still need to clean up with
> > kobject_put();
> 
> You could put that a little less strongly.  After kobject_init() you
> SHOULD call kobject_put() to clean up properly, and after kobject_add()
> you MUST call kobject_del() and kobject_put().

No, in looking at the code, you only need to call kobject_del() to clean
everything up properly, if kobject_add() succeeds.  No need to call
kobject_put() yet again.

Can someone else verify that this really is correct?

thanks,

greg k-h

p.s. I think it's time to start a "travel to .nz and kick a certain
ex-kernel-developer around a bit" fund...

Re: capabilities-introduce-per-process-capability-bounding-set.patch breaks FC6 Avahi

2007-11-30 Thread Jiri Slaby

On 11/30/2007 08:41 PM, Jeff Dike wrote:
> avahi-daemon fails to start on FC6 when
> capabilities-introduce-per-process-capability-bounding-set.patch is
> applied.
> 
> strace shows
>   capset(0x19980330, 0, {CAP_SETGID|CAP_SETUID|CAP_SYS_CHROOT, 
> CAP_SETGID|CAP_SETUID|CAP_SYS_CHROOT, 0}) = -1 EPERM (Operation not permitted)
> 
> I don't know if this is expected, but the changelog doesn't seem to
> imply that this will break things.

Nope, try this :):
http://lkml.org/lkml/2007/11/28/390

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University

Re: [RFC] kobject: add kobject_init_ng and kobject_init_and_add functions

2007-11-30 Thread Greg KH

On Fri, Nov 30, 2007 at 05:10:33PM -0500, Alan Stern wrote:
> On Fri, 30 Nov 2007, Greg KH wrote:
> 
> > Ok, how about this:
> > void kobject_init(struct kobject *kobj, struct ktype *ktype);
> > 
> > and then:
> > int kobject_add(struct kobject *kobj, struct kobject *parent, const 
> > char *fmt, ...);
> > 
> > After we call kobject_init() we HAVE to call kobject_put() to clean up
> > properly.  So, if kobject_add() fails, we still need to clean up with
> > kobject_put();
> 
> You could put that a little less strongly.  After kobject_init() you
> SHOULD call kobject_put() to clean up properly, and after kobject_add()
> you MUST call kobject_del() and kobject_put().
> 
> However if kobject_add() is never called, or if it is called and it 
> fails, then it's okay to use kfree().  It's not clear whether this 
> distinction will matter in practice.  It's probably best to document 
> this using your stronger description.

No, if kobject_add() fails, kobject_put() still must be called in order
to free up the name pointer, unless you are somehow guessing that the
"kobject_set_name()" portion of kobject_add() somehow failed.  And you
can't know that, so you have to call kobject_put() in order to be safe
and clean up everything.
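
The rule in the paragraph above can be modelled with a toy reference
count in userspace C. This is a sketch under assumptions, not the kobject
code: struct sketch_kobj and the sketch_kobj_* functions are hypothetical,
fail_register is an invented knob that forces the add step to fail after
the name has been allocated, and the sketch assumes del does not drop the
initial reference.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct kobject; not the kernel structure. */
struct sketch_kobj {
    int refcount;
    int added;
    int released;
    char *name;
};

/* Initialization cannot fail and leaves one reference held. */
void sketch_kobj_init(struct sketch_kobj *k)
{
    k->refcount = 1;
    k->added = 0;
    k->released = 0;
    k->name = NULL;
}

/* Sets the name first, then "registers" the object. A failure after the
 * name was allocated leaves the name behind for the final put to free. */
int sketch_kobj_add(struct sketch_kobj *k, const char *name, int fail_register)
{
    size_t len = strlen(name) + 1;

    k->name = (char *)malloc(len);
    if (!k->name)
        return -1;
    memcpy(k->name, name, len);
    if (fail_register)
        return -1;
    k->added = 1;
    return 0;
}

/* Unregisters only; the initial reference is still held afterwards. */
void sketch_kobj_del(struct sketch_kobj *k)
{
    k->added = 0;
}

/* Drops one reference; the last put frees the name. A real release
 * callback would also free the containing object at this point. */
void sketch_kobj_put(struct sketch_kobj *k)
{
    if (--k->refcount > 0)
        return;
    free(k->name);
    k->name = NULL;
    k->released = 1;
}
```

In this model a caller that sees sketch_kobj_add() fail cannot tell
whether the name was allocated, so only sketch_kobj_put() cleans up
safely, while a bare free of the object would leak the name.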

Now why did we not do the final kobject_put() in kobject_del() as well?
Doing two calls, always in order, seems a bit strange, anyone know why
it's this way?

> The same sort of rule should apply to other kernel objects, like struct 
> device.  After initialization you have to do a final _put, before that 
> you just do a kfree().  (And initialization cannot fail.)

Yes.

> > That means we _can_ create a:
> > int kobject_init_and_add(struct kobject *kobj, struct ktype *ktype, 
> > struct kobject *parent, const char *fmt, ...);
> > 
> > and if that fails, then again, you have to call kobject_put() to clean
> > things up, right?
> 
> Right.  Because you know that the failure must have occurred in the 
> _add portion.

Ok, good, I might get this right yet :)

thanks,

greg k-h

Re: pnpacpi : exceeded the max number of IO resources

2007-11-30 Thread Rene Herman


On 30-11-07 14:14, Chris Holvenstot wrote:


For what it is worth I too have seen this problem this morning and it
DOES appear to be new (in contrast to a previous comment)

The message:  pnpacpi: exceeded the max number of mem resources: 12

is displayed each time the system is booted with the 2.6.24-rc3-git5
kernel but is NOT displayed when booting 2.6.24-rc3-git4

I have made no changes in my config file between these two kernels other
than to accept any new defaults when running make oldconfig.

If you had already narrowed it down to a change between git4 and git5 I
apologize for wasting your time.  Have to run to work now.


Thanks, and re-added the proper CCs. Sigh...

Well, yes, the warning is actually new as well. Previously your kernel just 
silently ignored 8 more mem resources than it does now it seems.


Given that people are hitting these limits, it might make sense to just do 
away with the warning for 2.6.24 again while waiting for the dynamic code?


Rene.

Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree

2007-11-30 Thread Andrew Morton

On Fri, 30 Nov 2007 14:06:55 -0800
"Pallipadi, Venkatesh" <[EMAIL PROTECTED]> wrote:

Please don't go off-list like this.  I put Mark's original mailing list cc's
back.

> 
> I will have to Nack this. max_cstate was intentionally removed for a
> couple of reasons:

It broke userspace without any warning or migration period, afaict.

> 1) All in kernel users of max_cstate should rather be using
> pm_qos/latency interfaces. All such max_cstate usages must already be
> migrated.

That code isn't merged.

> 2) Supporting max_cstate as a dynamic parameter cleanly is no longer
> possible in acpi/processor_idle.c as the C-state policy has moved to
> cpuidle instead. It can be done if it is needed. But, just below patch
> will not really work with cpuidle.
> 
> Selecting max_cstate at boot time as a debug option still works without
> this patch.
> 
> So, just this patch will not get back the functionality with cpuidle.
> In fact changing it at run time will have no effect. Question however is:
> Is there a real need to revive this parameter so that user can change
> max_cstate at run time?

It is not known whether Mark is actually writing to this thing.  Perhaps
read-only permissions would be a suitable fix?


