date:20070518

building i386 requires s390: "drivers/crypto/Kconfig" sourcing s390 arch

2007-05-18 Thread Linda Walsh


There seems to be an include of the s390-specific config in
drivers/crypto/Kconfig: source "arch/s390/crypto/Kconfig"

The line doesn't seem to be needed for an i386 build (I haven't
tried x86_64, though).

I take it that this was a braino?






-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: ht CPU flag

2007-05-18 Thread Bernd Eckenfels

On Fri, May 18, 2007 at 12:07:09PM -0700, Siddha, Suresh B wrote:
> On Fri, May 18, 2007 at 11:45:59AM -0700, H. Peter Anvin wrote:
> > IIRC, the HT flag is also reported for multicore CPUs.
> 
> Yes. That's correct.

And for some Single-Core Non-HT CPUs.

Gruss
Bernd
-- 
  (OO) -- [EMAIL PROTECTED] --
 ( .. )[EMAIL PROTECTED],linux.de,debian.org}  http://www.eckes.org/
  o--o   1024D/E383CD7E  [EMAIL PROTECTED]  v:+497211603874  f:+49721151516129
(OO)  When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!

Re: [patch] Add the device IDs for AMD/ATI SB700

2007-05-18 Thread Dave Jones

On Wed, May 09, 2007 at 08:36:54AM -0400, Jeff Garzik wrote:
 > Henry Su wrote:
 > > From: [EMAIL PROTECTED]
 > > Adding the device ID for AMD/ATI SB700.
 > > Signed-off-by:henry su <[EMAIL PROTECTED]>
 > 
 > Time to train new people...
 > 
 > You need to split up your patches:
 > * send I2C and PCI quirk patches to GregKH
 > * send drivers/ide/* stuff to Bart
 > * send drivers/ata/* patches to me

Additionally, these patches seem to be horrendously MIME damaged.
patch(1) sees this..

diff -Nur linux-2.6.21.1.orig/include/linux/pci+AF8-ids.h 
linux-2.6.21.1/include/linux/pci+AF8-ids.h
--- linux-2.6.21.1.orig/include/linux/pci+AF8-ids.h 2007-05-16 
13:28:54.405386000 +-0800
+-+-+- linux-2.6.21.1/include/linux/pci+AF8-ids.h  2007-05-16 
13:45:29.936636000 +-0800
+AEAAQA- -371,6 +-371,9 +AEAAQA-
+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP600+AF8-SRAID 0x4381
+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP600+AF8-SMBUS 0x4385
+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP600+AF8-IDE   0x438c
+-+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP700+AF8-SATA  0x4390
+-+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP700+AF8-SMBUS 0x4395
+-+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP700+AF8-IDE   0x439c
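
For what it's worth, the damage above looks like UTF-7: `+AF8-` is how UTF-7 spells `_`, `+ACM-` is `#`, `+AEAAQA-` is `@@`, and `+-` is a literal `+`. A minimal Python sketch, assuming the mailer really did apply UTF-7 to the patch body (the helper name `decode_utf7_line` is made up for illustration), recovers readable lines:

```python
def decode_utf7_line(line):
    """Decode one line of UTF-7 damaged text back to plain characters."""
    # The damaged text is pure ASCII, so encode it to bytes first and let
    # Python's built-in "utf-7" codec undo the +...- escape sequences.
    return line.encode("ascii").decode("utf-7")


# Three lines copied from the damaged excerpt above.
damaged = [
    "+AEAAQA- -371,6 +-371,9 +AEAAQA-",
    "+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP600+AF8-IDE   0x438c",
    "+-+ACM-define PCI+AF8-DEVICE+AF8-ID+AF8-ATI+AF8-IXP700+AF8-SATA  0x4390",
]

for line in damaged:
    print(decode_utf7_line(line))
```

Decoding line by line like this may be enough to read the patch; the wrapped header lines would still need rejoining by hand before patch(1) accepts it.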

Dave

-- 
http://www.codemonkey.org.uk

Re: [PATCH] x86-64 highres/dyntick support 2.6.22-rc1-v5

2007-05-18 Thread Christoph Lameter

On Thu, 17 May 2007, Frank Sorenson wrote:

> > Please boot with slub_debug.
> 
> No debugging output at all.  Still hangs with only:
>   Kernel alive
>   Kernel direct mapping tables up to 1 @ 8000-d000

Hmm. No other output? Could it be that early console output is not 
available? Try earlyprintk=xx? Try another platform that has working early 
printk support (x86_64 seems broken to me)?


Re: ht CPU flag

2007-05-18 Thread eugene




On Fri, 18 May 2007, Chris Snook wrote:

> [EMAIL PROTECTED] wrote:
> > I have Pentium D CPU, which many Windows utilities like cpuz, wcpuid,
> > everest identify as D 930 (Dual Core, 3GHz). From Intel site I find out
> > that it has no HT feature, nor Windows XP identify it as HT.
> >
> > Why do I have "ht" flag in cpuinfo?
>
> The "ht" flag merely means "I know how to report hyperthreaded logical
> processors if I have them."  My Woodcrest Xeon 5110 and my Athlon64 X2 both
> have the "ht" flag, and correctly report the zero hyperthreaded logical
> processors they each have.
>
> -- Chris

Thanks, Chris.

Am I right that it is the chipset on the mainboard that is saying "I know",
not the CPU itself? Is it better to switch off HT support in the BIOS?

Is it possible to generate the CPU name as "Pentium D 930" in /proc/cpuinfo?
On the other server I have some 2GHz HT Xeons which can't be identified on
the Intel site because of a strange naming pattern.
I tried to find a utility for Linux to solve this, but it looks like
everybody is using /proc/cpuinfo, which is not enough :)


Regards, Eugene.

Re: [PATCH 2/3] drm modesetting core

2007-05-18 Thread Luca Tettamanti

On Thu, May 17, 2007 at 06:04:54PM -0700, Jesse Barnes wrote:
> On Thursday, May 17, 2007, Luca Tettamanti wrote:
> > On Thu, May 17, 2007 at 03:37:45PM -0700, Jesse Barnes wrote:
> > > This patch adds the core of the new DRM based modesetting system.
> >
> > A couple of comments on drm_fb since I'm somewhat familiar with fb code:
> > > new file mode 100644
> > > index 000..0d06792
> > > --- /dev/null
> > > +++ b/linux-core/drm_edid.c
> > > @@ -0,0 +1,467 @@
> > > +/*
> > > + * Copyright (c) 2007 Intel Corporation
> > > + *   Jesse Barnes <[EMAIL PROTECTED]>
> > > + *
> > > + * DDC probing routines (drm_ddc_read & drm_do_probe_ddc_edid) originally from
> > > + * FB layer.
> >
> > Hum, why are you duplicating them here? fbmon.c has the
> > infrastructure for parsing and even fixing known-broken EDIDs.
> 
> Yeah, there's more sharing that could be done... though I don't think the 
> fb layer has the bits to actually grab EDIDs.

There are the I2C functions (fb_do_probe_ddc_edid, fb_ddc_read - I wrote
them for the radeon driver, but they are now available for general use)
which will issue the read command; fbmon.c has the stuff for parsing the
EDID; you usually build a DB of supported modes which is then used to
validate the mode requested by the user. Of course each driver has to
implement the I2C adapter.

> Also, DRM is shared with BSD...

Your patch already uses 'struct i2c_adapter' in drm_edid.c, is it
portable?

Luca
-- 
"Vorrei morire ucciso dagli agi. Vorrei che di me si dicesse: ``Com'è
morto?'' ``Gli è scoppiato il portafogli''" -- Marcello Marchesi

Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Siddha, Suresh B

On Fri, May 18, 2007 at 12:02:16PM -0700, Eric W. Biederman wrote:
> I will look closer but I do believe that from the ioapic to the cpu the 2.6.21
> code should be fairly robust with respect to inflight messages from the ioapic
> to the local apics and the cpus.  What I failed to consider were inflight
> messages in the other direction arriving out of order.

Yes. Inflight messages from cpu to ioapic was what I was referring to.

thanks,
suresh

Re: ht CPU flag

2007-05-18 Thread Chris Snook


[EMAIL PROTECTED] wrote:
> I have Pentium D CPU, which many Windows utilities like cpuz, wcpuid,
> everest identify as D 930 (Dual Core, 3GHz). From Intel site I find out
> that it has no HT feature, nor Windows XP identify it as HT.
>
> Why do I have "ht" flag in cpuinfo?


The "ht" flag merely means "I know how to report hyperthreaded logical 
processors if I have them."  My Woodcrest Xeon 5110 and my Athlon64 X2 both have 
the "ht" flag, and correctly report the zero hyperthreaded logical processors 
they each have.


-- Chris
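
As a rough illustration of the point above: the flag alone says nothing about whether hyperthreading is in use, and comparing the "siblings" and "cpu cores" fields of /proc/cpuinfo is more telling. The sketch below is Python, assumes both fields are present (they may be missing on older or non-x86 kernels), and uses a made-up sample rather than real output from any CPU mentioned in this thread. The function names are hypothetical.

```python
def parse_first_cpu(cpuinfo_text):
    """Return the key/value fields of the first processor block."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ht_summary(cpuinfo_text):
    """Compare the 'ht' flag with what siblings/cores actually say."""
    cpu = parse_first_cpu(cpuinfo_text)
    siblings = int(cpu.get("siblings", "1"))  # logical CPUs per package
    cores = int(cpu.get("cpu cores", "1"))    # physical cores per package
    return {
        "ht_flag": "ht" in cpu.get("flags", "").split(),
        "threads_per_core": siblings // cores,
        "smt_active": siblings > cores,
    }


# Made-up sample resembling a dual-core CPU without hyperthreading.
SAMPLE = """processor\t: 0
model name\t: example dual-core CPU
siblings\t: 2
cpu cores\t: 2
flags\t\t: fpu vme ht
"""

print(ht_summary(SAMPLE))
```

On a live system the same function could be fed `open("/proc/cpuinfo").read()`.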

Re: 2.6.21-mm1 and now 2.6.21-git: SLUB Crashes on boot - crypto?

2007-05-18 Thread Luca Tettamanti

On Fri, May 18, 2007 at 11:14:55PM +1000, Herbert Xu wrote:
> On Fri, May 18, 2007 at 02:09:54PM +0200, Luca wrote:
> > 
> > Well, pretty sure. The OOPS says 2.6.22-rc1-libata-g705962cc-dirty,
> > git agrees and I've done a full rebuild. The .config is generated
> > using 'make oldconfig' using the 2.6.21 as baseline, maybe ALGAPI is
> > coming from there?
> 
> Sorry, my mistake.  That bug only happens if you have padlock turned on.
> 
> Anyway, if possible could you post the complete dmesg when it crashes?
> I'd like to see what has happened up to the point where it crashes.

Output from serial console is enlightening (sort of...):

Loading IPsec SA/SP database from /etc/ipsec-tools.conf: BUG: unable to
handle kernel paging request at virtual address 6b6b6ceb printing eip:
b0141aef
[oops]

Problem is that:
- /etc/ipsec-tools.conf is empty (everything is commented out), it's
  a leftover of previous experiments.
- AH and ESP are disabled in the kernel since I don't use them anymore.

Removing setkey script from init.d makes the OOPS disappear though;
nothing happens if I manually run setkey after the boot...

This is the full log:

Linux version 2.6.22-rc1-libata-g705962cc-dirty ([EMAIL PROTECTED]) (gcc 
version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #63 SMP PREEMPT Thu May 
17 00:22:29 CEST 2007
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009c800 (usable)
 BIOS-e820: 0009c800 - 000a (reserved)
 BIOS-e820: 000e4000 - 0010 (reserved)
 BIOS-e820: 0010 - 3ff9 (usable)
 BIOS-e820: 3ff9 - 3ff9e000 (ACPI data)
 BIOS-e820: 3ff9e000 - 3ffe (ACPI NVS)
 BIOS-e820: 3ffe - 4000 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ffb0 - 0001 (reserved)
1023MB LOWMEM available.
found SMP MP-table at 000ff780
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   262032
early_node_map[1] active PFN ranges
0:0 ->   262032
DMI 2.4 present.
ACPI: RSDP 000FA980, 0024 (r2 ACPIAM)
ACPI: XSDT 3FF90100, 0054 (r1 KOZIRO FRONTIER 12000611 MSFT   97)
ACPI: FACP 3FF90290, 00F4 (r3 MSTEST OEMFACP  12000611 MSFT   97)
ACPI: DSDT 3FF905C0, 8F8C (r1  A0637 A06370000 INTL 20060113)
ACPI: FACS 3FF9E000, 0040
ACPI: APIC 3FF90390, 006C (r1 MSTEST OEMAPIC  12000611 MSFT   97)
ACPI: MCFG 3FF90400, 003C (r1 MSTEST OEMMCFG  12000611 MSFT   97)
ACPI: SLIC 3FF90440, 0176 (r1 KOZIRO FRONTIER 12000611 MSFT   97)
ACPI: OEMB 3FF9E040, 007B (r1 MSTEST AMI_OEM  12000611 MSFT   97)
ACPI: HPET 3FF99550, 0038 (r1 MSTEST OEMHPET  12000611 MSFT   97)
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
ACPI: HPET id: 0x8086a202 base: 0xfed0
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 5000 (gap: 4000:bee0)
Built 1 zonelists.  Total pages: 259985
Kernel command line: BOOT_IMAGE=linux-2.6.22r1 ro video=radeonfb:[EMAIL 
PROTECTED] lapic apic=verbose root=/dev/mapper/mainVol-root console=tty0 
console=ttyS0,57600n8
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=b042e000 soft=b042c000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 2135.141 MHz processor.
Console: colour VGA+ 80x25
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:8
... MAX_LOCK_DEPTH:  30
... MAX_LOCKDEP_KEYS:2048
... CLASSHASH_SIZE:   1024
... MAX_LOCKDEP_ENTRIES: 8192
... MAX_LOCKDEP_CHAINS:  16384
... CHAINHASH_SIZE:  8192
 memory used by lock dependency info: 992 kB
 per task-struct memory footprint: 1200 bytes

| Locking API testsuite:

 | spin |wlock |rlock |mutex | wsem | rsem |
  --
 A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
 A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
 A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
 A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
 A-B-B-C-C-D-D-A deadlock:

Re: [PATCH] 2.6.21-git15 - Kconfig Cleanup

2007-05-18 Thread Matt LaPlante

On Fri, 18 May 2007 20:01:54 +0200
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Fri, May 18, 2007 at 01:04:41PM -0400, Matt LaPlante wrote:
> 
> > ping?
> 
> No one disagreed, and trivial patches will be forwarded again during the 
> 2.6.23 merge window.

Ok, I didn't know it would be acceptable without an ack from someone... (is 
Randy on vacation? :)

I know we've discussed the logic behind trivial merges going in prior
to RC candidates, and generally I think it's sound enough (we don't
want to disrupt the merging of patches that actually "matter," etc).
I don't want to create a redundant conversation here, but I'm
compelled to ask for opinions on this...  I think the Kconfigs are a
fairly prominent kernel feature, and would expect a lot of systems
people will be seeing them when the new version goes final.  That also
means that a lot more people will be seeing the "trivial" errors in
spelling or grammar that are being left in until the next version.  In
this round, for example, the new blackfin entries were really in need
of some love.

I don't really know how many people will report such things, or submit
duplicate patches for them before we actually get to the next kernel
cycle, but it seems like a waste to me.  I don't really care so much
about fixes to the Documentation texts or source comments because, in
my estimation, they probably have a much smaller audience than the
kernel configuration interface.  I guess I'm just more sensitive to
the presentation aspects of a project than the average developer, but
I can't help feel it's a shame when we're willing to show the public
such an unpolished face in a "final" product.  To me, good code is
taken down a notch by haphazard presentation.

Thoughts?

Cheers,
Matt

> 
> cu
> Adrian
> 
> -- 
> 
>"Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
>"Only a promise," Lao Er said.
>Pearl S. Buck - Dragon Seed
> 

-- 
Matt LaPlante
CCNP, CCDP, A+, Linux+, CQS
[EMAIL PROTECTED]


Re: ht CPU flag

2007-05-18 Thread Siddha, Suresh B

On Fri, May 18, 2007 at 11:45:59AM -0700, H. Peter Anvin wrote:
> IIRC, the HT flag is also reported for multicore CPUs.

Yes. That's correct.

glitch1 v1.6 script update and results on cfs-v13

2007-05-18 Thread Bill Davidsen

The glitch1 script has been vastly updated, and now runs by itself after 
being started. It produces files with the fps from glxgears and a 
"fairloops" file which indicates the number of loops for each of the 
scrolling xterms. This gives a good indication of fairness: all 
processes should have about the same number of loops.
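
A small sketch of how such loop counts could be turned into a fairness figure. It is Python, does not parse the actual "fairloops" file (its format is not described here), and the counts are hypothetical numbers, not results from glitch1:

```python
import statistics


def fairness(loop_counts):
    """Summarise how evenly the loops were spread across processes."""
    mean = statistics.mean(loop_counts)
    return {
        "mean": mean,
        # Gap between the busiest and the least busy process, relative to the mean.
        "spread": (max(loop_counts) - min(loop_counts)) / mean,
        # Sample standard deviation relative to the mean.
        "rel_stdev": statistics.stdev(loop_counts) / mean,
    }


# Hypothetical loop counts for four scrolling xterms.
counts = [1000, 1040, 980, 1060]
print(fairness(counts))
```

A spread close to zero means every process got about the same number of loops.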


Testing 2.6.21.1-cfs-v13:

Using all default settings, all four processes ran the same number of 
loops over 40sec within about 8%. I'll have some neat results with 
standard deviation by the end of the weekend, it's supposed to rain. 
Visual inspection of the glxgears while running looked smooth as a 
baby's ass.


Current self-running script attached. I'm writing a doc; in the meantime, 
the comments should hopefully be clear enough if you want to tune it.


*Note*: these values only make sense when various schedulers and tuning 
values are compared on the same machine. So I'll be testing on three 
machines, with dual-core, hyperthreaded uni, and pure uni. Unless I see 
a hint that one of these cases is handled less well than the others I 
won't compare across machines.


--
Bill Davidsen
  He was a full-time professional cat, not some moonlighting
ferret or weasel. He knew about these things.


glitch1.sh
Description: Bourne shell script

Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Eric W. Biederman

"Siddha, Suresh B" <[EMAIL PROTECTED]> writes:

> On Fri, May 18, 2007 at 11:28:25AM -0700, Yinghai Lu wrote:
>> On 5/18/07, Siddha, Suresh B <[EMAIL PROTECTED]> wrote:
>> >
>> > If the vector number stays same during irq migration and if we reset remote
>> > IRR bit using the above method(edge and then back to level) during
>> > irq migration, then we have a problem. A new interrupt arriving on a new
>> > cpu will set the remote IRR bit and now the old inflight EOI broadcast
>> > reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the
>> > broadcast msg is same), while the kernel code still assumes that the remote
>> > IRR bit is still set. This will lead to more problems and issues.
>> 
>> could add some lines to __assign_irq_vector to make sure old_vector != vector.
>
> hmm..
> what happens when there is a second (and very quick) irq migration which
> brings the irq back to the old cpu (or to a third cpu) with the old vector?
>
> Point is, we are not taking care of the inflight messages (which can,
> theoretically, get delayed for a long time).

I will look closer but I do believe that from the ioapic to the cpu the 2.6.21
code should be fairly robust with respect to inflight messages from the ioapic
to the local apics and the cpus.  What I failed to consider were inflight
messages in the other direction arriving out of order.

Part of the problem in the general case is the only way you can tell an inflight
message was transmitted is that a message that you can prove followed the first
message arrives somewhere.

I'm in the middle of tracking two other problems so I can't review this in
detail until later today at the earliest.

Eric

[patch 2/2] timerfd use waitqueue lock ...

2007-05-18 Thread Davide Libenzi

timerfd was using the unlocked waitqueue operations while holding its
own lock rather than the waitqueue lock, so poll_wait() could race with
it. This patch makes timerfd use the waitqueue lock directly.
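
The idea can be sketched as a toy user-space model in Python. This is not kernel code: `WaitQueue`, `TimerCtx` and their methods are invented names that only mimic the shape of the waitqueue API, and waking is simplified to handing back and clearing the waiter list. It shows one lock guarding both the waiter list and the context state, with the "locked" helpers refusing to run unless that lock is held:

```python
import threading


class WaitQueue:
    """Toy stand-in for a wait queue head: one lock plus a list of waiters."""

    def __init__(self):
        self.lock = threading.Lock()
        self.waiters = []

    def add_wait(self, waiter):
        # Self-locking variant, the kind of call poll_wait() ends up making.
        with self.lock:
            self.waiters.append(waiter)

    def add_wait_locked(self, waiter):
        # Unlocked variant: only safe if the caller already holds self.lock.
        if not self.lock.locked():
            raise RuntimeError("caller must hold the waitqueue lock")
        self.waiters.append(waiter)

    def wake_up_locked(self):
        # Unlocked variant: only safe if the caller already holds self.lock.
        if not self.lock.locked():
            raise RuntimeError("caller must hold the waitqueue lock")
        woken, self.waiters = self.waiters, []
        return woken


class TimerCtx:
    """Context whose state is protected by the waitqueue's own lock."""

    def __init__(self):
        self.wqh = WaitQueue()
        self.expired = False

    def timer_fired(self):
        # State change and wakeup happen under the same lock add_wait() takes,
        # so a concurrent add_wait() cannot interleave with the list update.
        with self.wqh.lock:
            self.expired = True
            return self.wqh.wake_up_locked()


ctx = TimerCtx()
ctx.wqh.add_wait("poller")
print(ctx.timer_fired(), ctx.expired)
```

With a separate private lock in `TimerCtx`, `wake_up_locked()` would touch the list without excluding `add_wait()`, which is the race described above.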


Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


Index: linux-2.6.mod/fs/timerfd.c
===
--- linux-2.6.mod.orig/fs/timerfd.c 2007-05-18 11:11:09.0 -0700
+++ linux-2.6.mod/fs/timerfd.c  2007-05-18 11:11:16.0 -0700
@@ -24,7 +24,6 @@
 struct timerfd_ctx {
struct hrtimer tmr;
ktime_t tintv;
-   spinlock_t lock;
wait_queue_head_t wqh;
int expired;
 };
@@ -39,10 +38,10 @@
struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
unsigned long flags;
 
-   spin_lock_irqsave(&ctx->lock, flags);
+   spin_lock_irqsave(&ctx->wqh.lock, flags);
ctx->expired = 1;
wake_up_locked(&ctx->wqh);
-   spin_unlock_irqrestore(&ctx->lock, flags);
+   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
return HRTIMER_NORESTART;
 }
@@ -83,10 +82,10 @@
 
poll_wait(file, &ctx->wqh, wait);
 
-   spin_lock_irqsave(&ctx->lock, flags);
+   spin_lock_irqsave(&ctx->wqh.lock, flags);
if (ctx->expired)
events |= POLLIN;
-   spin_unlock_irqrestore(&ctx->lock, flags);
+   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
return events;
 }
@@ -101,7 +100,7 @@
 
if (count < sizeof(ticks))
return -EINVAL;
-   spin_lock_irq(&ctx->lock);
+   spin_lock_irq(&ctx->wqh.lock);
res = -EAGAIN;
if (!ctx->expired && !(file->f_flags & O_NONBLOCK)) {
__add_wait_queue(&ctx->wqh, );
@@ -115,9 +114,9 @@
res = -ERESTARTSYS;
break;
}
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(&ctx->wqh.lock);
schedule();
-   spin_lock_irq(&ctx->lock);
+   spin_lock_irq(&ctx->wqh.lock);
}
__remove_wait_queue(&ctx->wqh, );
__set_current_state(TASK_RUNNING);
@@ -139,7 +138,7 @@
} else
ticks = 1;
}
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(&ctx->wqh.lock);
if (ticks)
res = put_user(ticks, buf) ? -EFAULT: sizeof(ticks);
return res;
@@ -176,7 +175,6 @@
return -ENOMEM;
 
init_waitqueue_head(&ctx->wqh);
-   spin_lock_init(&ctx->lock);
 
timerfd_setup(ctx, clockid, flags, );
 
@@ -202,10 +200,10 @@
 * it to the new values.
 */
for (;;) {
-   spin_lock_irq(&ctx->lock);
+   spin_lock_irq(&ctx->wqh.lock);
if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
break;
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(&ctx->wqh.lock);
cpu_relax();
}
/*
@@ -213,7 +211,7 @@
 */
timerfd_setup(ctx, clockid, flags, );
 
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(&ctx->wqh.lock);
fput(file);
}
 


[patch 1/2] eventfd use waitqueue lock ...

2007-05-18 Thread Davide Libenzi

eventfd was using the unlocked waitqueue operations while holding its
own lock rather than the waitqueue lock, so poll_wait() could race with
it. This patch makes eventfd use the waitqueue lock directly.
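
For readers skimming the diff below, its unchanged context lines also show how the counter behaves: additions are clamped at ULLONG_MAX, and the poll mask is derived from the counter. A rough Python model of just that arithmetic follows. The function names are invented, the poll constants are the usual Linux values, and the C `(int)` cast and all locking are left out:

```python
ULLONG_MAX = 2**64 - 1
POLLIN, POLLOUT, POLLERR = 0x001, 0x004, 0x008  # usual Linux poll bit values


def eventfd_add(count, n):
    """Add n to the counter, clamping at ULLONG_MAX; returns (count, added)."""
    if n < 0:
        raise ValueError("negative n (the C code returns -EINVAL)")
    if ULLONG_MAX - count < n:
        n = ULLONG_MAX - count  # only add what still fits
    return count + n, n


def poll_events(count):
    """Poll mask as computed in the context lines of the diff."""
    events = 0
    if count > 0:
        events |= POLLIN    # something to read
    if count == ULLONG_MAX:
        events |= POLLERR   # counter saturated
    if ULLONG_MAX - 1 > count:
        events |= POLLOUT   # room for at least a write of 1
    return events


print(eventfd_add(ULLONG_MAX - 2, 5))
print(poll_events(0), poll_events(1), poll_events(ULLONG_MAX))
```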


Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide



Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2007-05-18 10:33:39.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2007-05-18 11:05:01.0 -0700
@@ -17,7 +17,6 @@
 #include 
 
 struct eventfd_ctx {
-   spinlock_t lock;
wait_queue_head_t wqh;
/*
 * Every time that a write(2) is performed on an eventfd, the
@@ -45,13 +44,13 @@
 
if (n < 0)
return -EINVAL;
-   spin_lock_irqsave(&ctx->lock, flags);
+   spin_lock_irqsave(&ctx->wqh.lock, flags);
if (ULLONG_MAX - ctx->count < n)
n = (int) (ULLONG_MAX - ctx->count);
ctx->count += n;
if (waitqueue_active(&ctx->wqh))
wake_up_locked(&ctx->wqh);
-   spin_unlock_irqrestore(&ctx->lock, flags);
+   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
return n;
 }
@@ -70,14 +69,14 @@
 
poll_wait(file, &ctx->wqh, wait);
 
-   spin_lock_irqsave(&ctx->lock, flags);
+   spin_lock_irqsave(&ctx->wqh.lock, flags);
if (ctx->count > 0)
events |= POLLIN;
if (ctx->count == ULLONG_MAX)
events |= POLLERR;
if (ULLONG_MAX - 1 > ctx->count)
events |= POLLOUT;
-   spin_unlock_irqrestore(&ctx->lock, flags);
+   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
return events;
 }
@@ -92,7 +91,7 @@
 
if (count < sizeof(ucnt))
return -EINVAL;
-   spin_lock_irq(&ctx->lock);
+   spin_lock_irq(&ctx->wqh.lock);
res = -EAGAIN;
ucnt = ctx->count;
if (ucnt > 0)
@@ -110,9 +109,9 @@
res = -ERESTARTSYS;
break;
}
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(&ctx->wqh.lock);
schedule();
-   spin_lock_irq(&ctx->lock);
+   spin_lock_irq(&ctx->wqh.lock);
}
__remove_wait_queue(&ctx->wqh, );
__set_current_state(TASK_RUNNING);
@@ -122,7 +121,7 @@
if (waitqueue_active(&ctx->wqh))
wake_up_locked(&ctx->wqh);
}
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(>wqh.lock);
if (res > 0 && put_user(ucnt, (__u64 __user *) buf))
return -EFAULT;
 
@@ -143,7 +142,7 @@
return -EFAULT;
if (ucnt == ULLONG_MAX)
return -EINVAL;
-   spin_lock_irq(&ctx->lock);
+   spin_lock_irq(&ctx->wqh.lock);
res = -EAGAIN;
if (ULLONG_MAX - ctx->count > ucnt)
res = sizeof(ucnt);
@@ -159,9 +158,9 @@
res = -ERESTARTSYS;
break;
}
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(&ctx->wqh.lock);
schedule();
-   spin_lock_irq(&ctx->lock);
+   spin_lock_irq(&ctx->wqh.lock);
}
__remove_wait_queue(&ctx->wqh, );
__set_current_state(TASK_RUNNING);
@@ -171,7 +170,7 @@
if (waitqueue_active(&ctx->wqh))
wake_up_locked(&ctx->wqh);
}
-   spin_unlock_irq(&ctx->lock);
+   spin_unlock_irq(&ctx->wqh.lock);
 
return res;
 }
@@ -210,7 +209,6 @@
return -ENOMEM;
 
init_waitqueue_head(&ctx->wqh);
-   spin_lock_init(&ctx->lock);
ctx->count = count;
 
/*


Re: [patch 00/10] Slab defragmentation V2

2007-05-18 Thread Michal Piotrowski


On 18/05/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> For Dave: You can find the patchset also at
>
> http://ftp.kernel.org/pub/linux/kernel/people/christoph/slub-defrag


s/slub-defrag/slab-defrag

http://ftp.kernel.org/pub/linux/kernel/people/christoph/slab-defrag/

Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)

Re: [PATCH 1/2] Char: cyclades, add firmware loading

2007-05-18 Thread Jiri Slaby


On 5/18/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Fri, 18 May 2007 19:49:11 +0200 (CEST)
> Jiri Slaby <[EMAIL PROTECTED]> wrote:
> > cyclades, add firmware loading
>
> [...]
>
> The second patch fixes a bug in 2.6.22-rc1 and in 2.6.21 (yes?) but


I think the bug has been there since the driver was merged (the original
driver available on the vendor's site contains this issue too), so yes.

thanks,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: ht CPU flag

2007-05-18 Thread H. Peter Anvin

Bernd Eckenfels wrote:
> In article <[EMAIL PROTECTED]> you wrote:
>> I have Pentium D CPU, which many Windows utilities like cpuz, wcpuid, 
>> everest identify as D 930 (Dual Core, 3GHz). From Intel site I find out 
>> that it has no HT feature, nor Windows XP identify it as HT.
> 
> the ht flag reported by the CPU and cpuinfo is not a reliable indication of
> whether HT is available on your CPU or your motherboard/BIOS.
> 
>> Why do I have "ht" flag in cpuinfo?
> 
> Because your CPU reports it. You will see that also in cpuz output.
> 
> However, you can see HT in the siblings value (for a single core it will be
> 2 if you have HT, I am not sure if it is 4 for a dual core CPU)

IIRC, the HT flag is also reported for multicore CPUs.

-hpa

Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Siddha, Suresh B

On Fri, May 18, 2007 at 11:28:25AM -0700, Yinghai Lu wrote:
> On 5/18/07, Siddha, Suresh B <[EMAIL PROTECTED]> wrote:
> >
> > If the vector number stays same during irq migration and if we reset remote
> > IRR bit using the above method(edge and then back to level) during
> > irq migration, then we have a problem. A new interrupt arriving on a new
> > cpu will set the remote IRR bit and now the old inflight EOI broadcast
> > reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the
> > broadcast msg is same), while the kernel code still assumes that the remote
> > IRR bit is still set. This will lead to more problems and issues.
> 
> could add some lines to __assign_irq_vector to make sure old_vector != vector.

hmm..
what happens when there is a second (and very quick) irq migration which brings
the irq back to the old cpu (or to a third cpu) with the old vector?

Point is, we are not taking care of the inflight messages (which can,
theoretically, get delayed for a long time).
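
To make the scenario concrete, here is a toy Python model of it. It is not APIC code: `ToyRTE` and `stale_eoi_clears_irr` are invented names, the vector numbers are arbitrary, and the only behaviour modelled is the one described in this thread, namely that an EOI broadcast clears remote IRR when its vector matches the RTE and that migration resets remote IRR via the edge-then-level trick.

```python
class ToyRTE:
    """Toy I/O APIC redirection table entry: a vector plus the remote IRR bit."""

    def __init__(self, vector):
        self.vector = vector
        self.remote_irr = False

    def deliver(self):
        # A level-triggered interrupt is accepted by a local APIC.
        self.remote_irr = True

    def eoi_broadcast(self, vector):
        # An EOI broadcast clears remote IRR only if its vector matches.
        if vector == self.vector:
            self.remote_irr = False

    def migrate(self, new_vector):
        # Migration that resets remote IRR (edge, then back to level).
        self.vector = new_vector
        self.remote_irr = False


def stale_eoi_clears_irr(vectors_after_migration, old_vector=0x31):
    """True if a delayed EOI wrongly clears remote IRR after the migration(s)."""
    rte = ToyRTE(old_vector)
    rte.deliver()                 # interrupt handled on the old cpu
    stale_eoi = rte.vector        # its EOI is sent but still in flight
    for vector in vectors_after_migration:
        rte.migrate(vector)
    rte.deliver()                 # new interrupt arrives on the new cpu
    rte.eoi_broadcast(stale_eoi)  # the old EOI finally reaches the RTE
    return not rte.remote_irr     # cleared while the kernel assumes it is set


print(stale_eoi_clears_irr([0x31]))        # same vector kept
print(stale_eoi_clears_irr([0x41]))        # old_vector != vector
print(stale_eoi_clears_irr([0x41, 0x31]))  # quick second migration back
```

In this model, forcing a different vector avoids the problem for a single migration, but a second migration back to the old vector brings it back, which is the concern raised above.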

thanks,
suresh

Re: [2.6 patch] drivers/net/wireless/libertas/fw.c: fix use-before-check

2007-05-18 Thread John W. Linville

On Sat, May 19, 2007 at 01:06:49AM +0800, Eugene Teo wrote:
> NULL checks should be performed before the dereference.
> 
> Spotted by the Coverity checker.
> 
> Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>

This does not apply against 2.6.22-rc1.  Please rediff and repost.

Thanks,

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [patch] fix unused setup_nr_node_ids

2007-05-18 Thread Miklos Szeredi

> That doesn't do much to improve overall readability.
> 
> I suspect the warning was only there because the stubbed version of
> setup_nr_node_ids() forgot to be declared static inline, yes?
> 
> How about this?

Yes, looks good.

Thanks,
Miklos

Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Yinghai Lu


On 5/18/07, Siddha, Suresh B <[EMAIL PROTECTED]> wrote:
> If the vector number stays same during irq migration and if we reset remote
> IRR bit using the above method(edge and then back to level) during
> irq migration, then we have a problem. A new interrupt arriving on a new
> cpu will set the remote IRR bit and now the old inflight EOI broadcast
> reaches IOAPIC RTE(resetting the remote IRR bit, because the vector in the
> broadcast msg is same), while the kernel code still assumes that the remote
> IRR bit is still set. This will lead to more problems and issues.

could add some lines to __assign_irq_vector to make sure old_vector != vector.

YH

Re: [patch 00/10] Slab defragmentation V2

2007-05-18 Thread Christoph Lameter

For Dave: You can find the patchset also at

http://ftp.kernel.org/pub/linux/kernel/people/christoph/slub-defrag


Re: [patch 06/10] xfs: inode defragmentation support

2007-05-18 Thread Christoph Lameter

Rats. Missing a piece due to the need to change the parameters of
kmem_zone_init_flags (Isn't it possible to use kmem_cache_create
directly?).

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: slub/fs/xfs/xfs_vfsops.c
===
--- slub.orig/fs/xfs/xfs_vfsops.c   2007-05-18 11:23:27.0 -0700
+++ slub/fs/xfs/xfs_vfsops.c2007-05-17 22:14:34.0 -0700
@@ -109,13 +109,13 @@ xfs_init(void)
xfs_inode_zone =
kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
KM_ZONE_HWALIGN | KM_ZONE_RECLAIM |
-   KM_ZONE_SPREAD, NULL);
+   KM_ZONE_SPREAD, NULL, NULL);
xfs_ili_zone =
kmem_zone_init_flags(sizeof(xfs_inode_log_item_t), "xfs_ili",
-   KM_ZONE_SPREAD, NULL);
+   KM_ZONE_SPREAD, NULL, NULL);
xfs_chashlist_zone =
kmem_zone_init_flags(sizeof(xfs_chashlist_t), "xfs_chashlist",
-   KM_ZONE_SPREAD, NULL);
+   KM_ZONE_SPREAD, NULL, NULL);
 
/*
 * Allocate global trace buffers.


Re: Weird hard disk noise on shutdown (bug #7674)

2007-05-18 Thread Lennart Sorensen

On Fri, May 18, 2007 at 03:16:45AM -0400, Rob Landley wrote:
> Because the controller's far more likely to go than the moving parts...

Having redundant everything would be preferable.  But cables do
sometimes fail.  Not sure about the controllers, but if the controller
went you might lose the whole RAID.

> On the models I saw they also literally gold plated the connectors, spun the 
> disks faster, and basically did everything they could think of to make the 
> same basic bundle of components more expensive.  (Of course the fundamental 
> thing you do to make it more expensive is have smaller production runs in the 
> first place...)

Well, I guess the theory is that a gold plated connector won't corrode as
much so the connection is less likely to fail.  Not sure if there is any
proof to back up that theory.

--
Len Sorensen

Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Eric W. Biederman

"Yinghai Lu" <[EMAIL PROTECTED]> writes:

> On 5/18/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> We can solve the problem without doing that, and keeping the same
>> vector number during migration keeps x86 from scaling.
>
> I mean ioapic level irq could be limited. new device could use MSI or
> HT irq directly and less irq routing problem.

Possibly.  It really doesn't buy us anything until most irqs are MSI
which they are not yet.

>> Personally I would prefer to disallow irq migration.
> ? typo?
> For amd platform with different hypertransport chain on different
> nodes, irq migration is needed.

irqs not on cpu0 are needed.  irq migration is less necessary, and I
periodically think we are insane for supporting it.

Eric

Re: [patch] CFS scheduler, -v13

2007-05-18 Thread Anant Nitya

On Friday 18 May 2007 15:56:07 Ingo Molnar wrote:
> * Anant Nitya <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > Been testing this version of CFS from last an hour or so and still
> > facing same lag problems while browsing sites with heavy JS and or
> > flash usage. Mouse movement is pathetic and audio starts to skip. I
> > haven't face this behavior with CFS till v11.
>
> i have just tried 5 different versions of the Flash plugin and i cannot
> reproduce this (flash games are still smooth and acceptable even with
> the system significantly overloaded with 5 infite loops or with a kernel
> build), so it would be nice if you could help me debug this problem.
>
> The last version that worked for you was v11, correct? The biggest v11
> -> v12 change was the yield workaround, and while testing your workload
> i also noticed that all Flash versions except the latest one (9.0 r31)
> use sys_sched_yield() quite frequently. So it would be nice to know
> which plugin version you are using (and which Firefox version): you can
> check that by typing about:plugins into firefox. Furthermore, could you
> also try the following tune:
Hi
I am using konqueror and about:plugins gives back this information regarding 
flashplayer.
Shockwave Flash  Shockwave Flash 9.0 r31  libflashplayer.so
  application/x-shockwave-flash - Shockwave Flash (swf)
application/futuresplash - FutureSplash Player (spl)
>
>echo 0 > /proc/sys/kernel/sched_yield_bug_workaround
>
> and this:
>
>echo 2 > /proc/sys/kernel/sched_yield_bug_workaround
>
These values do make browsing visibly smoother, but it still lags, though the
lag time is shorter than with the original values.

> if none of this changes behavior then please send me the output of the
> following:
>
>   strace -ttt -TTT -o strace.txt -f -p `pidof firefox-bin`
>   < reproduce the lag in firefox >
>   < Ctrl-C the strace >
>
> and send me the strace.txt file (off-line, it's going to be large).
> Thanks,
I am sending you all this information off-list.

Regards 
Ananitya


Re: [PATCH] MM : alloc_large_system_hash() can free some memory for non power-of-two bucketsize

2007-05-18 Thread Christoph Lameter

On Fri, 18 May 2007, Eric Dumazet wrote:

>   table = (void*) __get_free_pages(GFP_ATOMIC, order);

ATOMIC? Is there some reason why we need atomic here?

> + /*
> +  * If bucketsize is not a power-of-two, we may free
> +  * some pages at the end of hash table.
> +  */
> + if (table) {
> + unsigned long alloc_end = (unsigned long)table +
> + (PAGE_SIZE << order);
> + unsigned long used = (unsigned long)table +
> + PAGE_ALIGN(size);
> + while (used < alloc_end) {
> + free_page(used);

Isn't this going to interfere with the kernel_map_pages debug stuff?
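
For readers following the quoted hunk, here is a minimal userspace sketch of
the page arithmetic it performs. It is not the kernel code: SKETCH_PAGE_SIZE
and the sketch_* names are hypothetical stand-ins for the kernel's PAGE_SIZE
and PAGE_ALIGN(), and a 4 KiB page is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this illustration only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round a byte count up to a whole number of pages. */
static unsigned long sketch_page_align(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Number of pages at the end of a 2^order page allocation that a table
 * of `size` bytes never touches. These are the pages the quoted loop
 * would hand back one at a time.
 */
unsigned long sketch_unused_tail_pages(unsigned long size, unsigned int order)
{
	unsigned long alloc_bytes = SKETCH_PAGE_SIZE << order;
	unsigned long used_bytes = sketch_page_align(size);

	/* The table must fit into the allocation in the first place. */
	assert(used_bytes <= alloc_bytes);
	return (alloc_bytes - used_bytes) / SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a table that needs five pages but was allocated with
order 3 (eight pages) leaves three tail pages to free, while a table whose
size is an exact power-of-two number of pages leaves none.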


Re: [rfc] increase struct page size?!

2007-05-18 Thread Christoph Lameter

On Fri, 18 May 2007, Nick Piggin wrote:

> However we don't have to let those 8 bytes go to waste: we can use them
> to store the virtual address of the page, which kind of makes sense for
> 64-bit, because they can likely to use complicated memory models.

That is not a valid consideration anymore. There is a virtual memmap update
pending with the sparsemem folks that will simplify things.

> Many batch operations on struct page are completely random, and as such, I
> think it is better if each struct page fits completely into a single
> cacheline even if it means being slightly larger.

Right. That would simplify the calculations.

> Don't let this space go to waste though, we can use page->virtual in order
> to optimise page_address operations.

page->virtual is a benefit if the page is cache hot. Otherwise it may 
cause a useless lookup.

I wonder if there are other uses for the free space?
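
A rough userspace sketch of the cacheline argument quoted above. The sizes
are assumptions for illustration: the quoted "those 8 bytes" suggests growing
a 56-byte struct page to 64 bytes, but this excerpt does not state the sizes,
and sketch_straddlers is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many of `count` back-to-back objects of `obj_size` bytes,
 * starting at a line-aligned address, cross a cacheline boundary.
 */
unsigned long sketch_straddlers(unsigned long obj_size,
				unsigned long line_size,
				unsigned long count)
{
	unsigned long i, crossing = 0;

	assert(obj_size > 0 && obj_size <= line_size);
	for (i = 0; i < count; i++) {
		unsigned long first = (i * obj_size) / line_size;
		unsigned long last = (i * obj_size + obj_size - 1) / line_size;

		if (first != last)
			crossing++;
	}
	return crossing;
}
```

With the assumed 56-byte object and 64-byte line, six of every eight
consecutive objects straddle two lines; with a 64-byte object none do, which
is the effect the quoted proposal is after for random accesses.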


Re: [rfc] increase struct page size?!

2007-05-18 Thread Christoph Lameter

On Fri, 18 May 2007, Nick Piggin wrote:

> The page->virtual thing is just a bonus (although have you seen what
> sort of hoops SPARSEMEM has to go through to find page_address?! It
> will definitely be a win on those architectures).

That is on the way out. See the discussion on virtual memmap support in
sparsemem.

Re: [PATCH 1/2] Char: cyclades, add firmware loading

2007-05-18 Thread Andrew Morton

On Fri, 18 May 2007 19:49:11 +0200 (CEST)
Jiri Slaby <[EMAIL PROTECTED]> wrote:

> cyclades, add firmware loading
> 

eww, it adds a variable called "tmp".

This change isn't appropriate to 2.6.22.

The second patch fixes a bug in 2.6.22-rc1 and in 2.6.21 (yes?) but
includes irrelevant changes which aren't appropriate to 2.6.22 and which
depend on patch #1.

So I split the bugfix out into a separate patch.

[patch 09/10] sockets: inode defragmentation support

2007-05-18 Thread clameter

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 net/socket.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: slub/net/socket.c
===
--- slub.orig/net/socket.c  2007-05-18 00:54:30.0 -0700
+++ slub/net/socket.c   2007-05-18 01:03:31.0 -0700
@@ -264,6 +264,17 @@ static void init_once(void *foo, struct 
inode_init_once(&ei->vfs_inode);
 }
 
+static void *sock_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v,
+   offsetof(struct socket_alloc, vfs_inode));
+}
+
+static struct kmem_cache_ops sock_kmem_cache_ops = {
+   .get = sock_get_inodes,
+   .kick = kick_inodes
+};
+
 static int init_inodecache(void)
 {
sock_inode_cachep = kmem_cache_create("sock_inode_cache",
@@ -273,7 +284,7 @@ static int init_inodecache(void)
   SLAB_RECLAIM_ACCOUNT |
   SLAB_MEM_SPREAD),
  init_once,
- NULL);
+ &sock_kmem_cache_ops);
if (sock_inode_cachep == NULL)
return -ENOMEM;
return 0;


[patch 02/10] SLUB: slab defragmentation and kmem_cache_vacate

2007-05-18 Thread clameter

Slab defragmentation occurs when the slabs are shrunk (after inode, dentry
shrinkers have been run from the reclaim code) or when a manual shrinking
is requested via slabinfo. During the shrink operation SLUB will generate a
list of partially populated slabs sorted by the number of objects in use.

We extract pages off that list that are only filled less than a quarter and
attempt to motivate the users of those slabs to either remove the objects
or move the objects.

Targeted reclaim allows a single slab to be targeted for reclaim. This is
done by calling

kmem_cache_vacate(page);

It will return 1 on success, 0 if the operation failed.


In order for a slabcache to support defragmentation a couple of functions
must be defined via kmem_cache_ops. These are

void *get(struct kmem_cache *s, int nr, void **objects)

Must obtain a reference to the listed objects. SLUB guarantees that
the objects are still allocated. However, other threads may be blocked
in slab_free attempting to free objects in the slab. These may succeed
as soon as get() returns to the slab allocator. The function must
be able to detect the situation and void the attempts to handle such
objects (by for example voiding the corresponding entry in the objects
array).

No slab operations may be performed in get(). Interrupts
are disabled. What can be done is very limited. The slab lock
for the page with the object is taken. Any attempt to perform a slab
operation may lead to a deadlock.

get() returns a private pointer that is passed to kick. Should we
be unable to obtain all references then that pointer may indicate
to the kick() function that it should not attempt any object removal
or move but simply remove the reference counts.

void kick(struct kmem_cache *, int nr, void **objects, void *get_result)

After SLUB has established references to the objects in a
slab it will drop all locks and then use kick() to move objects out
of the slab. The existence of the object is guaranteed by virtue of
the earlier obtained references via get(). The callback may perform
any slab operation since no locks are held at the time of call.

The callback should remove the object from the slab in some way. This
may be accomplished by reclaiming the object and then running
kmem_cache_free() or reallocating it and then running
kmem_cache_free(). Reallocation is advantageous because the partial
slabs were just sorted to have the partial slabs with the most objects
first. Allocation is likely to result in filling up a slab so that
it can be removed from the partial list.

Kick() does not return a result. SLUB will check the number of
remaining objects in the slab. If all objects were removed then
we know that the operation was successful.

If a kmem_cache_vacate on a page fails then the slab usually has a pretty
low usage ratio. Go through the slab and resequence the freelist so that
object addresses increase as we allocate objects. This will trigger the
cacheline prefetcher when we start allocating from the slab again and
thereby increase allocation speed.
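
The get()/kick() contract described above can be sketched in plain userspace
C. This is only an illustration under stated assumptions, not the SLUB
implementation: struct sketch_obj, its fields, SKETCH_MAX_OBJS and all
sketch_* functions are hypothetical, locking is left out entirely, and an
object is treated as removable only when nobody else holds a reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_OBJS 16

/* Hypothetical stand-in for a cached object; not a kernel structure. */
struct sketch_obj {
	int refcount;	/* references held by users of the object */
	int dying;	/* a free is already in progress elsewhere */
	int in_slab;	/* 1 while the object still occupies its slot */
};

/* Same shape as the two callbacks described above (cache argument omitted). */
struct sketch_cache_ops {
	void *(*get)(int nr, void **objects);
	void (*kick)(int nr, void **objects, void *get_result);
};

/* get(): pin every live object and void the entries we must not touch. */
static void *sketch_get(int nr, void **objects)
{
	intptr_t abort_flag = 0;
	int i;

	for (i = 0; i < nr; i++) {
		struct sketch_obj *o = objects[i];

		if (o->dying) {
			objects[i] = NULL;
			continue;
		}
		o->refcount++;
		if (o->refcount > 1)	/* someone else still uses it */
			abort_flag = 1;
	}
	/* The private pointer tells kick() whether removal is pointless. */
	return (void *)abort_flag;
}

/* kick(): drop our references; remove the objects unless told to abort. */
static void sketch_kick(int nr, void **objects, void *get_result)
{
	intptr_t abort_flag = (intptr_t)get_result;
	int i;

	for (i = 0; i < nr; i++) {
		struct sketch_obj *o = objects[i];

		if (!o)
			continue;
		o->refcount--;
		if (!abort_flag && o->refcount == 0)
			o->in_slab = 0;	/* stands in for kmem_cache_free() */
	}
}

const struct sketch_cache_ops sketch_ops = {
	.get = sketch_get,
	.kick = sketch_kick,
};

/* Returns 1 if every object left the slab, 0 otherwise. */
int sketch_vacate(const struct sketch_cache_ops *ops,
		  struct sketch_obj *slab, int nr)
{
	void *v[SKETCH_MAX_OBJS];
	void *priv;
	int i, left = 0;

	assert(nr >= 0 && nr <= SKETCH_MAX_OBJS);
	for (i = 0; i < nr; i++)
		v[i] = &slab[i];

	priv = ops->get(nr, v);		/* "locks held": only take references */
	ops->kick(nr, v, priv);		/* "locks dropped": do the real work */

	for (i = 0; i < nr; i++)
		left += slab[i].in_slab;
	return left == 0;
}
```

As in the description, success is judged by counting what remains in the slab
after kick() returns, not by a return value from kick() itself.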

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slab.h |   31 +
 mm/slab.c|9 +
 mm/slob.c|9 +
 mm/slub.c|  264 +--
 4 files changed, 303 insertions(+), 10 deletions(-)

Index: slub/include/linux/slab.h
===
--- slub.orig/include/linux/slab.h  2007-05-18 00:13:39.0 -0700
+++ slub/include/linux/slab.h   2007-05-18 00:13:40.0 -0700
@@ -39,6 +39,36 @@ void __init kmem_cache_init(void);
 int slab_is_available(void);
 
 struct kmem_cache_ops {
+   /*
+* Called with slab lock held and interrupts disabled.
+* No slab operation may be performed.
+*
+* Parameters passed are the number of objects to process
+* and an array of pointers to objects for which we
+* need references.
+*
+* Returns a pointer that is passed to the kick function.
+* If all objects cannot be moved then the pointer may
+* indicate that this won't work and then kick can simply
+* remove the references that were already obtained.
+*
+* The array passed to get() is also passed to kick(). The
+* function may remove objects by setting array elements to NULL.
+*/
+   void *(*get)(struct kmem_cache *, int nr, void **);
+
+   /*
+* Called with no locks held and interrupts enabled.
+* Any operation may be performed in kick().
+*
+* Parameters passed are the number of objects in the array,
+* the array of pointers to the objects and the pointer
+

[patch 10/10] ext2 ext3 ext4: support inode slab defragmentation

2007-05-18 Thread clameter

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/ext2/super.c |   16 ++--
 fs/ext3/super.c |   14 +-
 fs/ext4/super.c |   15 ++-
 3 files changed, 41 insertions(+), 4 deletions(-)

Index: slub/fs/ext2/super.c
===
--- slub.orig/fs/ext2/super.c   2007-05-18 10:19:12.0 -0700
+++ slub/fs/ext2/super.c   2007-05-18 10:24:03.0 -0700
@@ -168,14 +168,26 @@ static void init_once(void * foo, struct
mutex_init(&ei->truncate_mutex);
inode_init_once(&ei->vfs_inode);
 }
- 
+
+static void *ext2_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v,
+   offsetof(struct ext2_inode_info, vfs_inode));
+}
+
+static struct kmem_cache_ops ext2_kmem_cache_ops = {
+   ext2_get_inodes,
+   kick_inodes
+};
+
 static int init_inodecache(void)
 {
ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
 sizeof(struct ext2_inode_info),
 0, (SLAB_RECLAIM_ACCOUNT|
SLAB_MEM_SPREAD),
-init_once, NULL);
+init_once,
+&ext2_kmem_cache_ops);
if (ext2_inode_cachep == NULL)
return -ENOMEM;
return 0;
Index: slub/fs/ext3/super.c
===
--- slub.orig/fs/ext3/super.c   2007-05-18 10:22:01.0 -0700
+++ slub/fs/ext3/super.c   2007-05-18 10:23:04.0 -0700
@@ -475,13 +475,25 @@ static void init_once(void * foo, struct
inode_init_once(&ei->vfs_inode);
 }
 
+static void *ext3_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v,
+   offsetof(struct ext3_inode_info, vfs_inode));
+}
+
+static struct kmem_cache_ops ext3_kmem_cache_ops = {
+   ext3_get_inodes,
+   kick_inodes
+};
+
 static int init_inodecache(void)
 {
ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
 sizeof(struct ext3_inode_info),
 0, (SLAB_RECLAIM_ACCOUNT|
SLAB_MEM_SPREAD),
-init_once, NULL);
+init_once,
+&ext3_kmem_cache_ops);
if (ext3_inode_cachep == NULL)
return -ENOMEM;
return 0;
Index: slub/fs/ext4/super.c
===
--- slub.orig/fs/ext4/super.c   2007-05-18 10:23:15.0 -0700
+++ slub/fs/ext4/super.c   2007-05-18 10:23:48.0 -0700
@@ -535,13 +535,26 @@ static void init_once(void * foo, struct
inode_init_once(&ei->vfs_inode);
 }
 
+static void *ext4_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v,
+   offsetof(struct ext4_inode_info, vfs_inode));
+}
+
+static struct kmem_cache_ops ext4_kmem_cache_ops = {
+   ext4_get_inodes,
+   kick_inodes
+};
+
+
 static int init_inodecache(void)
 {
ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
 sizeof(struct ext4_inode_info),
 0, (SLAB_RECLAIM_ACCOUNT|
SLAB_MEM_SPREAD),
-init_once, NULL);
+init_once,
+&ext4_kmem_cache_ops);
if (ext4_inode_cachep == NULL)
return -ENOMEM;
return 0;


[patch 08/10] shmem: inode defragmentation support

2007-05-18 Thread clameter

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/shmem.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: slub/mm/shmem.c
===
--- slub.orig/mm/shmem.c   2007-05-18 00:54:30.0 -0700
+++ slub/mm/shmem.c 2007-05-18 01:02:26.0 -0700
@@ -2337,11 +2337,22 @@ static void init_once(void *foo, struct 
 #endif
 }
 
+static void *shmem_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v,
+   offsetof(struct shmem_inode_info, vfs_inode));
+}
+
+static struct kmem_cache_ops shmem_kmem_cache_ops = {
+   .get = shmem_get_inodes,
+   .kick = kick_inodes
+};
+
 static int init_inodecache(void)
 {
shmem_inode_cachep = kmem_cache_create("shmem_inode_cache",
sizeof(struct shmem_inode_info),
-   0, 0, init_once, NULL);
+   0, 0, init_once, &shmem_kmem_cache_ops);
if (shmem_inode_cachep == NULL)
return -ENOMEM;
return 0;


[patch 03/10] Dentry defragmentation

2007-05-18 Thread clameter

This patch allows the removal of unused or negative dentry entries in a
partially populated slab page.

get() uses the dcache lock and then works with dget_locked to obtain a
reference to the dentry. An additional complication is that the dentry
may be in process of being freed or it may just have been allocated.
We add an additional flag to d_flags to be able to determine the
status of an object.

kick() is called after get() has been used and after the slab has dropped
all of its own locks. The dentry pruning for unused entries works in a
straightforward way.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/dcache.c|  100 +
 include/linux/dcache.h |4 +
 2 files changed, 96 insertions(+), 8 deletions(-)

Index: slub/fs/dcache.c
===
--- slub.orig/fs/dcache.c   2007-05-18 10:53:01.0 -0700
+++ slub/fs/dcache.c   2007-05-18 10:58:38.0 -0700
@@ -136,6 +136,7 @@ static struct dentry *d_kill(struct dent
 
list_del(&dentry->d_u.d_child);
dentry_stat.nr_dentry--;/* For d_free, below */
+   dentry->d_flags &= ~DCACHE_ENTRY_VALID;
/*drops the locks, at that point nobody can reach this dentry */
dentry_iput(dentry);
parent = dentry->d_parent;
@@ -952,6 +953,7 @@ struct dentry *d_alloc(struct dentry * p
if (parent)
list_add(&dentry->d_u.d_child, &parent->d_subdirs);
dentry_stat.nr_dentry++;
+   dentry->d_flags |= DCACHE_ENTRY_VALID;
spin_unlock(&dcache_lock);
 
return dentry;
@@ -2114,18 +2116,100 @@ static void __init dcache_init_early(voi
INIT_HLIST_HEAD(&dentry_hashtable[loop]);
 }
 
+/*
+ * The slab is holding off frees. Thus we can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static void *get_dentries(struct kmem_cache *s, int nr, void **v)
+{
+	struct dentry *dentry;
+	unsigned long abort = 0;
+	int i;
+
+	spin_lock(&dcache_lock);
+	for (i = 0; i < nr; i++) {
+		dentry = v[i];
+		/*
+		 * if DCACHE_ENTRY_VALID is not set then the dentry
+		 * may be already in the process of being freed.
+		 */
+		if (abort || !(dentry->d_flags & DCACHE_ENTRY_VALID))
+			v[i] = NULL;
+		else {
+			dget_locked(dentry);
+			abort = atomic_read(&dentry->d_count) > 1;
+		}
+	}
+	spin_unlock(&dcache_lock);
+	return (void *)abort;
+}
+
+/*
+ * Slab has dropped all the locks. Get rid of the
+ * refcount we obtained earlier and also rid of the
+ * object.
+ */
+static void kick_dentries(struct kmem_cache *s, int nr, void **v, void *private)
+{
+	struct dentry *dentry;
+	unsigned long abort = (unsigned long)private;
+	int i;
+
+	spin_lock(&dcache_lock);
+	for (i = 0; i < nr; i++) {
+		dentry = v[i];
+		if (!dentry)
+			continue;
+
+		if (abort)
+			goto put_dentry;
+
+		spin_lock(&dentry->d_lock);
+		if (atomic_read(&dentry->d_count) > 1) {
+			/*
+			 * Reference count was increased.
+			 * We need to abandon the freeing of
+			 * objects.
+			 */
+			abort = 1;
+			spin_unlock(&dentry->d_lock);
+put_dentry:
+			spin_unlock(&dcache_lock);
+			dput(dentry);
+			spin_lock(&dcache_lock);
+			continue;
+		}
+
+		/* Remove from LRU */
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		/* Drop the entry */
+		prune_one_dentry(dentry, 1);
+	}
+	spin_unlock(&dcache_lock);
+	/*
+	 * dentries are freed using RCU so we need to wait until RCU
+	 * operations are complete
+	 */
+	if (!abort)
+		synchronize_rcu();
+}
+
+static struct kmem_cache_ops dentry_kmem_cache_ops = {
+   .get = get_dentries,
+   .kick = kick_dentries,
+};
+
 static void __init dcache_init(unsigned long mempages)
 {
int loop;
 
-   /* 
-* A constructor could be added for stable state like the lists,
-* but it is probably not worth it because of the cache nature
-* of the dcache. 
-*/
-   dentry_cache = KMEM_CACHE(dentry,
-   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
-   
+   dentry_cache = KMEM_CACHE_OPS(dentry,
+   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD,
+   &dentry_kmem_cache_ops);
+
register_shrinker(&dcache_shrinker);
 
/* Hash may have been set up in dcache_init_early */
Index:

[patch 05/10] reiserfs: inode defragmentation support

2007-05-18 Thread clameter

Add inode defrag support

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/reiserfs/super.c |   14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

Index: slub/fs/reiserfs/super.c
===
--- slub.orig/fs/reiserfs/super.c   2007-05-18 00:54:30.0 -0700
+++ slub/fs/reiserfs/super.c   2007-05-18 00:57:12.0 -0700
@@ -520,6 +520,17 @@ static void init_once(void *foo, struct 
 #endif
 }
 
+static void *reiserfs_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v,
+   offsetof(struct reiserfs_inode_info, vfs_inode));
+}
+
+static struct kmem_cache_ops reiserfs_kmem_cache_ops = {
+   .get = reiserfs_get_inodes,
+   .kick = kick_inodes
+};
+
 static int init_inodecache(void)
 {
reiserfs_inode_cachep = kmem_cache_create("reiser_inode_cache",
@@ -527,7 +538,8 @@ static int init_inodecache(void)
 reiserfs_inode_info),
  0, (SLAB_RECLAIM_ACCOUNT|
SLAB_MEM_SPREAD),
- init_once, NULL);
+ init_once,
+ &reiserfs_kmem_cache_ops);
if (reiserfs_inode_cachep == NULL)
return -ENOMEM;
return 0;


[patch 00/10] Slab defragmentation V2

2007-05-18 Thread clameter

Hugh: Could you have a look at this? There is lots of critical locking
here

Support for Slab defragmentation and targeted reclaim. The current
state of affairs is that a large portion of inode and dcache slab caches
can be effectively reclaimed. The remaining problems are:

1. We cannot reclaim dentries / inodes that are in active use.
   Probably inadvisable anyways and maybe impossible if we cannot track
   down all the references to dentries. The active dentries / inodes are
   usually only a small subset of the inode / dentries in the system.

2. Directory reclaim is an issue both for dentries and inodes. A solution
   here may be complex. However, the majority of dentries and inodes are
   *not* directories. So I think that we are fine even without being
   able to reclaim slabs with directory entries in them. These slabs
   will move to the head of the partial list and soon be filled up with
   other entries. This means that the directory entries will tend to
   aggregate over time in certain slabs.

Slab defragmentation is performed during kmem_cache_shrink. This is usually
triggered through the slab shrinkers but can also be manually triggered
through the slabinfo command by running "slabinfo -s".

Support is also provided for antifrag/defrag to evict a specific slab page
through the kmem_cache_vacate function call.  Since we can now target the
freeing of slab pages we may now be able to remove a page that hinders
the freeing of a higher order page in the antifrag/defrag code.

V1->V2
- Clean up control flow using a state variable. Simplify API. Back to 2
  functions that now take arrays of objects.
- Inode defrag support for a set of filesystems
- Fix up dentry defrag support to work on negative dentries by adding
  a new dentry flag that indicates that a dentry is not in the process
  of being freed or allocated.


[patch 07/10] procfs: inode defragmentation support

2007-05-18 Thread clameter

Hmmm... Do we really need this?

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/proc/inode.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

Index: slub/fs/proc/inode.c
===
--- slub.orig/fs/proc/inode.c   2007-05-18 00:54:30.0 -0700
+++ slub/fs/proc/inode.c   2007-05-18 01:00:36.0 -0700
@@ -111,14 +111,25 @@ static void init_once(void * foo, struct
 
inode_init_once(&ei->vfs_inode);
 }
- 
+
+static void *proc_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v,
+   offsetof(struct proc_inode, vfs_inode));
+}
+
+static struct kmem_cache_ops proc_kmem_cache_ops = {
+   .get = proc_get_inodes,
+   .kick = kick_inodes
+};
+
 int __init proc_init_inodecache(void)
 {
proc_inode_cachep = kmem_cache_create("proc_inode_cache",
 sizeof(struct proc_inode),
 0, (SLAB_RECLAIM_ACCOUNT|
SLAB_MEM_SPREAD),
-init_once, NULL);
+init_once, &proc_kmem_cache_ops);
if (proc_inode_cachep == NULL)
return -ENOMEM;
return 0;


[patch 04/10] Generic inode defragmentation

2007-05-18 Thread clameter

This implements the ability to remove a list of inodes from the inode
cache. In order to remove an inode we may have to write out the pages
of an inode, the inode itself and remove the dentries referring to the
node.

Provide generic functionality that can be used by filesystems that have
their own inode caches to also tie into the defragmentation functions
that are made available here.
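
The hook for filesystems relies on the generic inode being embedded at a fixed
offset inside the filesystem's own inode structure. A minimal userspace sketch
of that pointer adjustment follows; the structures and sketch_to_embedded()
are hypothetical illustrations, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic inode. */
struct sketch_inode {
	unsigned long i_ino;
};

/* Hypothetical filesystem inode that embeds the generic one. */
struct sketch_fs_inode {
	int fs_private;
	struct sketch_inode vfs_inode;
};

/*
 * The slab hands out pointers to the start of each filesystem inode.
 * Shift every pointer by `offset` so that the generic code sees the
 * embedded struct instead.
 */
void sketch_to_embedded(int nr, void **v, size_t offset)
{
	int i;

	assert(v != NULL);
	for (i = 0; i < nr; i++)
		v[i] = (char *)v[i] + offset;
}
```

A filesystem would pass offsetof(struct sketch_fs_inode, vfs_inode) as the
offset, so the shared code never needs to know the layout of the containing
structure. The cast to char * keeps the arithmetic within standard C;
arithmetic directly on void * is a GNU extension.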

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/inode.c |   92 -
 include/linux/fs.h |5 ++
 2 files changed, 96 insertions(+), 1 deletion(-)

Index: slub/fs/inode.c
===
--- slub.orig/fs/inode.c   2007-05-18 00:50:36.0 -0700
+++ slub/fs/inode.c 2007-05-18 00:55:40.0 -0700
@@ -1361,6 +1361,96 @@ static int __init set_ihash_entries(char
 }
 __setup("ihash_entries=", set_ihash_entries);
 
+static void *get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	int i;
+
+	spin_lock(&inode_lock);
+	for (i = 0; i < nr; i++) {
+		struct inode *inode = v[i];
+
+		if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+			v[i] = NULL;
+		else
+			__iget(inode);
+	}
+	spin_unlock(&inode_lock);
+	return NULL;
+}
+
+/*
+ * Function for filesystems that embed struct inode into their own
+ * structures. The offset is the offset of the struct inode in the fs inode.
+ */
+void *fs_get_inodes(struct kmem_cache *s, int nr, void **v, unsigned long offset)
+{
+	int i;
+
+	for (i = 0; i < nr; i++)
+		v[i] += offset;
+
+	return get_inodes(s, nr, v);
+}
+EXPORT_SYMBOL(fs_get_inodes);
+
+void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
+{
+	struct inode *inode;
+	int i;
+	int abort = 0;
+	LIST_HEAD(freeable);
+	struct super_block *sb;
+
+	for (i = 0; i < nr; i++) {
+		inode = v[i];
+		if (!inode)
+			continue;
+
+		if (inode_has_buffers(inode) || inode->i_data.nrpages) {
+			if (remove_inode_buffers(inode))
+				invalidate_mapping_pages(&inode->i_data,
+								0, -1);
+		}
+
+		if (inode->i_state & I_DIRTY)
+			write_inode_now(inode, 1);
+
+		if (atomic_read(&inode->i_count) > 1)
+			d_prune_aliases(inode);
+	}
+
+	mutex_lock(&iprune_mutex);
+	for (i = 0; i < nr; i++) {
+		inode = v[i];
+		if (!inode)
+			continue;
+
+		sb = inode->i_sb;
+		iput(inode);
+		if (abort || !(sb->s_flags & MS_ACTIVE))
+			continue;
+
+		spin_lock(&inode_lock);
+		if (!can_unuse(inode)) {
+			abort = 1;
+			spin_unlock(&inode_lock);
+			continue;
+		}
+		list_move(&inode->i_list, &freeable);
+		inode->i_state |= I_FREEING;
+		inodes_stat.nr_unused--;
+		spin_unlock(&inode_lock);
+	}
+	dispose_list(&freeable);
+	mutex_unlock(&iprune_mutex);
+}
+EXPORT_SYMBOL(kick_inodes);
+
+static struct kmem_cache_ops inode_kmem_cache_ops = {
+   .get = get_inodes,
+   .kick = kick_inodes
+};
+
 /*
  * Initialize the waitqueues and inode hash table.
  */
@@ -1399,7 +1489,7 @@ void __init inode_init(unsigned long mem
 (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
 SLAB_MEM_SPREAD),
 init_once,
-NULL);
+&inode_kmem_cache_ops);
register_shrinker(&icache_shrinker);
 
/* Hash may have been set up in inode_init_early */
Index: slub/include/linux/fs.h
===
--- slub.orig/include/linux/fs.h   2007-05-18 00:50:36.0 -0700
+++ slub/include/linux/fs.h 2007-05-18 00:54:33.0 -0700
@@ -1608,6 +1608,11 @@ static inline void insert_inode_hash(str
__insert_inode_hash(inode, inode->i_ino);
 }
 
+/* Helpers to realize inode defrag support in filesystems */
+extern void kick_inodes(struct kmem_cache *, int, void **, void *);
+extern void *fs_get_inodes(struct kmem_cache *, int nr, void **,
+   unsigned long offset);
+
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
 extern void file_kill(struct file *f);


[patch 06/10] xfs: inode defragmentation support

2007-05-18 Thread clameter

Add slab defrag support.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/xfs/linux-2.6/kmem.h  |5 +++--
 fs/xfs/linux-2.6/xfs_buf.c   |2 +-
 fs/xfs/linux-2.6/xfs_super.c |   13 -
 3 files changed, 16 insertions(+), 4 deletions(-)

Index: slub/fs/xfs/linux-2.6/kmem.h
===
--- slub.orig/fs/xfs/linux-2.6/kmem.h   2007-05-18 00:54:30.0 -0700
+++ slub/fs/xfs/linux-2.6/kmem.h   2007-05-18 00:58:38.0 -0700
@@ -79,9 +79,10 @@ kmem_zone_init(int size, char *zone_name
 
 static inline kmem_zone_t *
 kmem_zone_init_flags(int size, char *zone_name, unsigned long flags,
-void (*construct)(void *, kmem_zone_t *, unsigned long))
+void (*construct)(void *, kmem_zone_t *, unsigned long),
+const struct kmem_cache_ops *ops)
 {
-   return kmem_cache_create(zone_name, size, 0, flags, construct, NULL);
+   return kmem_cache_create(zone_name, size, 0, flags, construct, ops);
 }
 
 static inline void
Index: slub/fs/xfs/linux-2.6/xfs_buf.c
===
--- slub.orig/fs/xfs/linux-2.6/xfs_buf.c  2007-05-18 00:54:30.0 -0700
+++ slub/fs/xfs/linux-2.6/xfs_buf.c 2007-05-18 00:58:38.0 -0700
@@ -1832,7 +1832,7 @@ xfs_buf_init(void)
 #endif
 
xfs_buf_zone = kmem_zone_init_flags(sizeof(xfs_buf_t), "xfs_buf",
-   KM_ZONE_HWALIGN, NULL);
+   KM_ZONE_HWALIGN, NULL, NULL);
if (!xfs_buf_zone)
goto out_free_trace_buf;
 
Index: slub/fs/xfs/linux-2.6/xfs_super.c
===
--- slub.orig/fs/xfs/linux-2.6/xfs_super.c  2007-05-18 00:54:30.0 -0700
+++ slub/fs/xfs/linux-2.6/xfs_super.c   2007-05-18 00:58:38.0 -0700
@@ -355,13 +355,24 @@ xfs_fs_inode_init_once(
inode_init_once(vn_to_inode((bhv_vnode_t *)vnode));
 }
 
+static void *xfs_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+   return fs_get_inodes(s, nr, v, offsetof(bhv_vnode_t, v_inode));
+};
+
+static struct kmem_cache_ops xfs_kmem_cache_ops = {
+   .get = xfs_get_inodes,
+   .kick = kick_inodes
+};
+
 STATIC int
 xfs_init_zones(void)
 {
xfs_vnode_zone = kmem_zone_init_flags(sizeof(bhv_vnode_t), "xfs_vnode",
KM_ZONE_HWALIGN | KM_ZONE_RECLAIM |
KM_ZONE_SPREAD,
-   xfs_fs_inode_init_once);
+   xfs_fs_inode_init_once,
+   &xfs_kmem_cache_ops);
if (!xfs_vnode_zone)
goto out;
 

-- 

[patch 01/10] SLUB: add support for kmem_cache_ops

2007-05-18 Thread clameter

We use the parameter formerly used by the destructor to pass an optional
pointer to a kmem_cache_ops structure to kmem_cache_create.

kmem_cache_ops is created as empty. Later patches populate kmem_cache_ops.

Create a KMEM_CACHE_OPS macro that allows the specification of the
kmem_cache_ops.

Code to handle kmem_cache_ops is added to SLUB. SLAB and SLOB are updated
to be able to take a kmem_cache_ops structure but will ignore it.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slab.h |   13 +
 include/linux/slub_def.h |1 +
 mm/slab.c|6 +++---
 mm/slob.c|2 +-
 mm/slub.c|   44 ++--
 5 files changed, 44 insertions(+), 22 deletions(-)

Index: slub/include/linux/slab.h
===
--- slub.orig/include/linux/slab.h  2007-05-15 21:19:51.0 -0700
+++ slub/include/linux/slab.h   2007-05-15 21:27:07.0 -0700
@@ -38,10 +38,13 @@ typedef struct kmem_cache kmem_cache_t _
 void __init kmem_cache_init(void);
 int slab_is_available(void);
 
+struct kmem_cache_ops {
+};
+
 struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
unsigned long,
void (*)(void *, struct kmem_cache *, unsigned long),
-   void (*)(void *, struct kmem_cache *, unsigned long));
+   const struct kmem_cache_ops *s);
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
@@ -59,9 +62,11 @@ int kmem_ptr_validate(struct kmem_cache 
  * f.e. add cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
-   sizeof(struct __struct), __alignof__(struct __struct),\
-   (__flags), NULL, NULL)
+#define KMEM_CACHE_OPS(__struct, __flags, __ops) \
+   kmem_cache_create(#__struct, sizeof(struct __struct), \
+   __alignof__(struct __struct), (__flags), NULL, (__ops))
+
+#define KMEM_CACHE(__struct, __flags) KMEM_CACHE_OPS(__struct, __flags, NULL)
 
 #ifdef CONFIG_NUMA
 extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-05-15 21:25:46.0 -0700
+++ slub/mm/slub.c  2007-05-15 21:29:36.0 -0700
@@ -294,6 +294,9 @@ static inline int check_valid_pointer(st
return 1;
 }
 
+struct kmem_cache_ops slub_default_ops = {
+};
+
 /*
  * Slow version of get and set free pointer.
  *
@@ -2003,11 +2006,13 @@ static int calculate_sizes(struct kmem_c
 static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags,
const char *name, size_t size,
size_t align, unsigned long flags,
-   void (*ctor)(void *, struct kmem_cache *, unsigned long))
+   void (*ctor)(void *, struct kmem_cache *, unsigned long),
+   const struct kmem_cache_ops *ops)
 {
memset(s, 0, kmem_size);
s->name = name;
s->ctor = ctor;
+   s->ops = ops;
s->objsize = size;
s->flags = flags;
s->align = align;
@@ -2191,7 +2196,7 @@ static struct kmem_cache *create_kmalloc
 
down_write(_lock);
if (!kmem_cache_open(s, gfp_flags, name, size, ARCH_KMALLOC_MINALIGN,
-   flags, NULL))
+   flags, NULL, &slub_default_ops))
goto panic;
 
list_add(>list, _caches);
@@ -2505,12 +2510,16 @@ static int slab_unmergeable(struct kmem_
if (s->ctor)
return 1;
 
+   if (s->ops != &slub_default_ops)
+   return 1;
+
return 0;
 }
 
 static struct kmem_cache *find_mergeable(size_t size,
size_t align, unsigned long flags,
-   void (*ctor)(void *, struct kmem_cache *, unsigned long))
+   void (*ctor)(void *, struct kmem_cache *, unsigned long),
+   const struct kmem_cache_ops *ops)
 {
struct list_head *h;
 
@@ -2520,6 +2529,9 @@ static struct kmem_cache *find_mergeable
if (ctor)
return NULL;
 
+   if (ops != &slub_default_ops)
+   return NULL;
+
size = ALIGN(size, sizeof(void *));
align = calculate_alignment(flags, align, size);
size = ALIGN(size, align);
@@ -2555,13 +2567,15 @@ static struct kmem_cache *find_mergeable
 struct kmem_cache *kmem_cache_create(const char *name, size_t size,
size_t align, unsigned long flags,
void (*ctor)(void *, struct kmem_cache *, unsigned long),
-   void (*dtor)(void *, struct kmem_cache *, unsigned long))
+   const struct kmem_cache_ops *ops)
 {
struct kmem_cache

Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Siddha, Suresh B

Eric,

On Fri, May 18, 2007 at 07:40:53AM -0700, Eric W. Biederman wrote:
> Still in any of those I don't see a problem with switching to edge
> triggered mode and then back again.  Either Remote IRR will keep
> its current state or it will be cleared.   Remote IRR should not
> get set (when it was clear) by such a procedure because that
> would mess up the normal interrupt enable sequence that happens
> on boot.  So I'm pretty certain toggling the edge bit is harmless
> and it may actually clear Remote IRR for us.
...
> 
> I think going more the way that this code has gone on arch/i386 with
> real functions is preferable.

There is an issue with this suggestion. An EOI broadcast msg to this
IOAPIC may still be in flight (delayed but still alive), and that can
cause a problem if we switch to edge and back to level to reset the
remote IRR bit.

If the vector number stays the same during irq migration and we reset the
remote IRR bit using the above method (edge and then back to level) during
irq migration, then we have a problem. A new interrupt arriving on the new
cpu will set the remote IRR bit, and then the old in-flight EOI broadcast
reaches the IOAPIC RTE and resets the remote IRR bit (because the vector in
the broadcast msg is the same), while the kernel code still assumes that the
remote IRR bit is set. This will lead to more problems.

thanks,
suresh

Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Yinghai Lu


On 5/18/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> We can solve the problem without doing that, and keeping the same
> vector number during migration keeps x86 from scaling.

I mean ioapic level irqs could be limited. New devices could use MSI or
HT irqs directly and have fewer irq routing problems.

> Personally I would prefer to disallow irq migration.

? typo?
For AMD platforms with different HyperTransport chains on different
nodes, irq migration is needed.

YH

Re: [Linux-fbdev-devel] [PATCH] atmel_lcdfb: AT91/AT32 LCD Controller framebuffer driver

2007-05-18 Thread Jan Altenberg

Hi Nicolas,

> + info->fix.line_length = info->var.xres_virtual * (info->var.bits_per_pixel / 8);

line_length will always be 0 for bits_per_pixel < 8.
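
A minimal sketch of the truncation, assuming plain unsigned arithmetic as in
the quoted line. The helper names are hypothetical and not part of the driver,
and the rounded variant is only one possible way to avoid the problem, not
necessarily the fix the driver should take.

```c
#include <assert.h>

/* Same shape as the quoted expression: the division happens first, so any
 * depth below 8 bits per pixel truncates to 0 bytes per pixel. */
static unsigned int line_length_truncating(unsigned int xres_virtual,
                                           unsigned int bits_per_pixel)
{
    return xres_virtual * (bits_per_pixel / 8);
}

/* Hypothetical alternative: multiply first, then round up to whole bytes. */
static unsigned int line_length_rounded(unsigned int xres_virtual,
                                        unsigned int bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return (xres_virtual * bits_per_pixel + 7) / 8;
}
```

For depths that are multiples of 8 both helpers agree, so only the 1, 2 and
4 bpp modes are affected.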

Jan


Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Eric W. Biederman

"Yinghai Lu" <[EMAIL PROTECTED]> writes:

> Eric,
>
> ioapic level irqs are limited, so keeping the vector number unchanged
> when migrating to other cpus could help.

We can solve the problem without doing that, and keeping the same
vector number during migration keeps x86 from scaling.  Personally
I would prefer to disallow irq migration.

> It will need small modifications to assign_irq_vector and
> irq_complete_move/smp_irq_move_cleanup_interrupt, because they assume
> the vector must change.

Yes it does.

Eric

Re: [2.6 patch] drivers/net/wireless/libertas/rx.c: fix use-after-free

2007-05-18 Thread John W. Linville

First, please send all wireless patches to
[EMAIL PROTECTED], and be sure to CC me as well...thanks!

On Sat, May 19, 2007 at 12:50:31AM +0800, Eugene Teo wrote:
> libertas_upload_rx_packet() calls netif_rx() before returning, and it
> always returns 0. Also, libertas_upload_rx_packet() initializes
> skb->protocol anyway.
> 
> Spotted by the Coverity checker.

A nearly identical patch was posted by Florin Malita <[EMAIL PROTECTED]>
to netdev (also the wrong list) on Wednesday evening.

>  done:
> LEAVE();
> 
> -   skb->protocol = __constant_htons(0x0019);   /* ETH_P_80211_RAW */
> -

Except for this part...is this intentional?

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [2.6 patch] drivers/net/wireless/libertas/fw.c: fix use-before-check

2007-05-18 Thread John W. Linville

This should be sent to linux-wireless (and CC'ed to me) as well...

On Sat, May 19, 2007 at 01:06:49AM +0800, Eugene Teo wrote:
> NULL checks should be performed before the dereference.
> 
> Spotted by the Coverity checker.
> 
> Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/net/wireless/libertas/fw.c b/drivers/net/wireless/libertas/fw.c
> index 441123c..5c63c9b 100644
> --- a/drivers/net/wireless/libertas/fw.c
> +++ b/drivers/net/wireless/libertas/fw.c
> @@ -333,18 +333,22 @@ static void command_timer_fn(unsigned long data)
> unsigned long flags;
> 
> ptempnode = adapter->cur_cmd;
> +   if (ptempnode == NULL) {
> +   lbs_pr_debug(1, "PTempnode Empty\n");
> +   return;
> +   }
> +
> cmd = (struct cmd_ds_command *)ptempnode->bufvirtualaddr;
> +   if (!cmd) {
> +   lbs_pr_debug(1, "cmd is NULL\n");
> +   return;
> +   }
> 
> lbs_pr_info("command_timer_fn fired (%x)\n", cmd->command);
> 
> if (!adapter->fw_ready)
> return;
> 
> -   if (ptempnode == NULL) {
> -   lbs_pr_debug(1, "PTempnode Empty\n");
> -   return;
> -   }
> -
> spin_lock_irqsave(&adapter->driver_lock, flags);
> adapter->cur_cmd = NULL;
> spin_unlock_irqrestore(&adapter->driver_lock, flags);

-- 
John W. Linville
[EMAIL PROTECTED]

Re: [PATCH] 2.6.21-git15 - Kconfig Cleanup

2007-05-18 Thread Adrian Bunk

On Fri, May 18, 2007 at 01:04:41PM -0400, Matt LaPlante wrote:

> ping?

No one disagreed, and trivial patches will be forwarded again during the
2.6.23 merge window.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: [patch] fix unused setup_nr_node_ids

2007-05-18 Thread Andrew Morton

On Fri, 18 May 2007 12:39:14 +0200
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> This is now the only (!) compiler warning I get in my UML build :)
> 
> 
> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> mm/page_alloc.c:931: warning: 'setup_nr_node_ids' defined but not used
> 
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> ---
> 
> Index: linux/mm/page_alloc.c
> ===
> --- linux.orig/mm/page_alloc.c2007-04-26 13:07:11.0 +0200
> +++ linux/mm/page_alloc.c 2007-04-26 13:07:12.0 +0200
> @@ -914,7 +914,10 @@ static int rmqueue_bulk(struct zone *zon
>  #if MAX_NUMNODES > 1
>  int nr_node_ids __read_mostly = MAX_NUMNODES;
>  EXPORT_SYMBOL(nr_node_ids);
> +#endif
>  
> +#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
> +#if MAX_NUMNODES > 1
>  /*
>   * Figure out the number of possible node ids.
>   */
> @@ -930,6 +933,7 @@ static void __init setup_nr_node_ids(voi
>  #else
>  static void __init setup_nr_node_ids(void) {}
>  #endif
> +#endif
>  
>  #ifdef CONFIG_NUMA
>  /*

That doesn't do much to improve overall readability.

I suspect the warning was only there because the stubbed version of
setup_nr_node_ids() forgot to be declared static inline, yes?

How about this?

--- a/mm/page_alloc.c~fix-unused-setup_nr_node_ids
+++ a/mm/page_alloc.c
@@ -136,6 +136,11 @@ static unsigned long __meminitdata dma_r
 #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 
+#if MAX_NUMNODES > 1
+int nr_node_ids __read_mostly = MAX_NUMNODES;
+EXPORT_SYMBOL(nr_node_ids);
+#endif
+
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
@@ -669,26 +674,6 @@ static int rmqueue_bulk(struct zone *zon
return i;
 }
 
-#if MAX_NUMNODES > 1
-int nr_node_ids __read_mostly = MAX_NUMNODES;
-EXPORT_SYMBOL(nr_node_ids);
-
-/*
- * Figure out the number of possible node ids.
- */
-static void __init setup_nr_node_ids(void)
-{
-   unsigned int node;
-   unsigned int highest = 0;
-
-   for_each_node_mask(node, node_possible_map)
-   highest = node;
-   nr_node_ids = highest + 1;
-}
-#else
-static void __init setup_nr_node_ids(void) {}
-#endif
-
 #ifdef CONFIG_NUMA
 /*
  * Called from the vmstat counter updater to drain pagesets of this
@@ -2733,6 +2718,26 @@ void __meminit free_area_init_node(int n
 }
 
 #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+
+#if MAX_NUMNODES > 1
+/*
+ * Figure out the number of possible node ids.
+ */
+static void __init setup_nr_node_ids(void)
+{
+   unsigned int node;
+   unsigned int highest = 0;
+
+   for_each_node_mask(node, node_possible_map)
+   highest = node;
+   nr_node_ids = highest + 1;
+}
+#else
+static inline void setup_nr_node_ids(void)
+{
+}
+#endif
+
 /**
  * add_active_range - Register a range of PFNs backed by physical memory
  * @nid: The node ID the range resides on
_


Re: ht CPU flag

2007-05-18 Thread Bernd Eckenfels

In article <[EMAIL PROTECTED]> you wrote:
> I have Pentium D CPU, which many Windows utilities like cpuz, wcpuid, 
> everest identify as D 930 (Dual Core, 3GHz). From Intel site I find out 
> that it has no HT feature, nor Windows XP identify it as HT.

The ht flag reported by the CPU and shown in cpuinfo is not a reliable
indication of whether HT is available on your CPU or your motherboard/BIOS.

> Why do I have "ht" flag in cpuinfo?

Because your CPU reports it. You will see that also in cpuz output.

However, you can detect HT from the siblings value (for a single-core CPU it
will be 2 if you have HT; I am not sure if it is 4 for a dual-core CPU).
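
As a rough illustration of checking siblings rather than the ht flag, the
sketch below compares the "siblings" and "cpu cores" fields of /proc/cpuinfo
text. It assumes an x86 kernel that exports both fields; the function names
are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the integer value of the first line in text that starts with key
 * and contains "key ... : value", or -1 if no such line exists. */
static int cpuinfo_field(const char *text, const char *key)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL);
    klen = strlen(key);

    while (line && *line) {
        if (strncmp(line, key, klen) == 0) {
            const char *colon = strchr(line, ':');
            const char *eol = strchr(line, '\n');
            int value;

            /* Only accept a colon that belongs to this line. */
            if (colon && (!eol || colon < eol) &&
                sscanf(colon + 1, "%d", &value) == 1)
                return value;
        }
        line = strchr(line, '\n');
        if (line)
            line++;     /* step past the newline to the next line */
    }
    return -1;
}

/* 1 if there are more logical siblings than cores (HT in use), 0 if not,
 * -1 if either field is missing from the text. */
static int ht_active(const char *cpuinfo)
{
    int siblings = cpuinfo_field(cpuinfo, "siblings");
    int cores = cpuinfo_field(cpuinfo, "cpu cores");

    if (siblings < 0 || cores < 0)
        return -1;
    return siblings > cores;
}
```

Comparing against "cpu cores" sidesteps the question of what siblings reads
on a dual-core part: a dual core without HT would show both fields as 2.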

Gruss
Bernd

Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-18 Thread Badari Pulavarty

On Fri, 2007-05-18 at 09:35 +0200, Jens Axboe wrote:
> On Thu, May 17 2007, Badari Pulavarty wrote:
> > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > > On Wed, May 16 2007, Badari Pulavarty wrote:
> > > > On Tue, 2007-05-15 at 19:50 +0200, Jens Axboe wrote:
> > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> > > > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > > > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > > Updated version of the patch - this time I'll just attach the patch
> > > > > > > > > > file...
> > > > > > > > 
> > > > > > > > Missing scatterlist.h inclusions..
> > > > > > > > 
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c: In function 'sym_scatter':
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration of function 'for_each_sg'
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ';' before '{' token
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable 'tp'
> > > > > > > > > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c: In function 'qla24xx_build_scsi_iocbs':
> > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of function 'for_each_sg'
> > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ';' before '{' token
> > > > > > > 
> > > > > > > Thanks, will fix those. What arch? I tested it here.
> > > > > > 
> > > > > > I am playing with them on ppc64.
> > > > > 
> > > > > > Ah ok, you need the updated patch series for ppc64 support. Builds
> > > > > > fine here on ppc64. See the #sglist branch of the block repo:
> > > > > 
> > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > 
> > > > > I can mail you an updated patch, if you want.
> > > > 
> > > > 
> > > > Here is the whole panic stack..
> > > 
> > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > and backout this patch and see if it boots?
> > 
> > I increased max_segments to 1024 on my qla2200 attached disks and
> > simple "dd" (direct read) resulted in following:
> > 
> > elm3b29:/sys/block/sdd/queue # echo 1024 > max_segments
> > elm3b29:/sys/block/sdd/queue # cat max_hw_sectors_kb > max_sectors_kb
> > elm3b29:/mnt # dd iflag=direct if=./z of=/dev/null bs=512M
> > 
> > Unable to handle kernel paging request at 1008 RIP:
> >  [] __rmqueue+0x6f/0x120
> 
> Ouch, that's a bug. I don't think the oom path has been tested yet,
> perhaps this is hitting it.
> 
> Can you try with this debug patch, plus enable the slab debugging
> helpers (like poisoning)?
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 7456992..a479d1e 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -793,6 +793,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
>   return ret;
>  enomem:
>   if (ret) {
> + printk(KERN_ERR "scsi: failed to allocate sg table\n");
>   /*
>* Free entries chained off ret. Since we were trying to
>* allocate another sglist, we know that all entries are of
> 

Not much help. I get all kinds of weird panics. This time I got (with
the above debug patch):

general protection fault:  [1] SMP
CPU 1
Modules linked in: jfs sg sd_mod qla2xxx firmware_class
scsi_transport_fc scsi_mod vfat fat ipv6 thermal processor fan button
battery ac dm_mod floppy parport_pc lp parport
Pid: 56, comm: kblockd/1 Not tainted 2.6.22-rc1-sg #8
RIP: 0010:[]  [] kmem_cache_alloc
+0x36/0x70
RSP: 0018:81017abbfc10  EFLAGS: 00010002
RAX:  RBX: 0082 RCX: 0664
RDX: 81019ff2b8a0 RSI: 00011220 RDI: 8068d120
RBP: 81017abbfc20 R08: 39f8 R09: 
R10: 81019cbee700 R11: 0188 R12: 8101df2a64e0
R13: 00011220 R14: 8101df2a6510 R15: 81017abbfc50
FS:  2b505b027f20() GS:81018021f300()
knlGS:f7da26b0
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2b505b029008 CR3: 00019af73000 CR4: 06e0
Process kblockd/1 (pid: 56, threadinfo 81017abbe000, task
81017a571440)
Stack:  7abbfc30  81017abbfc30
8025d001
 81017abbfcb0 8025d122 81017abbfc60 80219dc0
 880e5da6 00ad 81017abbfcd0 8021a366
Call Trace:
 [] mempool_alloc_slab+0x11/0x20
 [] mempool_alloc+0x42/0x110
 [] flush_gart+0x40/0x50
 []

Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-18 Thread Badari Pulavarty

On Fri, 2007-05-18 at 19:03 +0200, Jens Axboe wrote:
> On Fri, May 18 2007, Badari Pulavarty wrote:
> > On Fri, 2007-05-18 at 09:33 +0200, Jens Axboe wrote:
> > > On Thu, May 17 2007, Badari Pulavarty wrote:
> > > > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > > > .. 
> > > > > > > 
> > > > > > > > Ah ok, you need the updated patch series for ppc64 support.
> > > > > > > > Builds fine here on ppc64. See the #sglist branch of the block repo:
> > > > > > > 
> > > > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > > > 
> > > > > > > I can mail you an updated patch, if you want.
> > > > > > 
> > > > > > 
> > > > > > Here is the whole panic stack..
> > > > > 
> > > > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > > > and backout this patch and see if it boots?
> > > > 
> > > > Yes. It boots fine with following backed out.
> > > > 
> > > > Looking at the code ide_probe.c: hwif_init() is doing
> > > > 
> > > > > hwif->sg_table = kmalloc(sizeof(struct scatterlist)*hwif->sg_max_nents,
> > > > >  GFP_KERNEL);
> > > > > 
> > > > > blk_rq_map_sg() is looking for the chaining info and going over the end
> > > > > of the allocation.
> > > 
> > > Hmm, looks ok, I'm guessing it's just missing a memset (or just turn it
> > > into a kzalloc())?
> > > 
> > 
> > Even with backing out all the ide changes, I get this on boot
> > once in a while.
> 
> Yep, I think the ide changes are fine as such, the problem is the
> missing memset/kzalloc. Can you try that?

kzalloc() made it better. I haven't seen ide panics anymore. I will try
it again after applying ide patches.
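
To illustrate why the zeroing matters, here is a userspace sketch with a
hypothetical chainable entry (this is not the real struct scatterlist
layout). A walker that treats a non-NULL link as "continue in another table"
only terminates safely when unused entries are zeroed, which is what calloc()
(standing in for kzalloc()) guarantees and plain malloc()/kmalloc() does not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a chainable scatterlist entry. */
struct sg_entry {
    struct sg_entry *next_table;  /* non-NULL: continue in another table */
    unsigned int length;          /* payload length of a data entry */
};

/* calloc() plays the role of kzalloc(): every link starts out NULL. */
static struct sg_entry *sg_table_alloc(size_t nents)
{
    assert(nents > 0);
    return calloc(nents, sizeof(struct sg_entry));
}

/* Sum the lengths of all data entries, following chain links. Every table
 * is assumed to hold nents entries. With uninitialized memory a garbage
 * next_table would send this walk off the end of the allocation. */
static unsigned int sg_total_length(const struct sg_entry *table, size_t nents)
{
    unsigned int total = 0;
    size_t i = 0;

    while (i < nents) {
        if (table[i].next_table) {
            table = table[i].next_table;  /* jump to the chained table */
            i = 0;
            continue;
        }
        total += table[i].length;
        i++;
    }
    return total;
}
```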

Thanks
Badari



[PATCH 2/2] Char: cyclades, fix sparse warning

2007-05-18 Thread Jiri Slaby

cyclades, fix sparse warning

+ one possible deadlock (omitted unlock)

Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit a9dbc0b98956d548b1aee3f55b3799a12946ace4
tree 1e62235a9bf1edb7c206932c9a10d1e9e77cb0a0
parent 7a9aa4781fc5aa6493bb3a7ac59b3c9e5f20fa76
author Jiri Slaby <[EMAIL PROTECTED]> Fri, 18 May 2007 19:42:55 +0200
committer Jiri Slaby <[EMAIL PROTECTED]> Fri, 18 May 2007 19:42:55 +0200

 drivers/char/cyclades.c |   19 +--
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/char/cyclades.c b/drivers/char/cyclades.c
index 829be9d..01d1da0 100644
--- a/drivers/char/cyclades.c
+++ b/drivers/char/cyclades.c
@@ -1100,6 +1100,7 @@ static void cyy_intr_chip(struct cyclades_card *cinfo, int chip,
 
if (data & info->ignore_status_mask) {
info->icount.rx++;
+   spin_unlock(&cinfo->card_lock);
return;
}
if (tty_buffer_request_room(tty, 1)) {
@@ -1889,11 +1890,11 @@ static void cyz_poll(unsigned long arg)
struct cyclades_card *cinfo;
struct cyclades_port *info;
struct tty_struct *tty;
-   static struct FIRM_ID *firm_id;
-   static struct ZFW_CTRL *zfw_ctrl;
-   static struct BOARD_CTRL *board_ctrl;
-   static struct CH_CTRL *ch_ctrl;
-   static struct BUF_CTRL *buf_ctrl;
+   struct FIRM_ID __iomem *firm_id;
+   struct ZFW_CTRL __iomem *zfw_ctrl;
+   struct BOARD_CTRL __iomem *board_ctrl;
+   struct CH_CTRL __iomem *ch_ctrl;
+   struct BUF_CTRL __iomem *buf_ctrl;
unsigned long expires = jiffies + HZ;
int card, port;
 
@@ -2037,7 +2038,6 @@ static int startup(struct cyclades_port *info)
struct ZFW_CTRL __iomem *zfw_ctrl;
struct BOARD_CTRL __iomem *board_ctrl;
struct CH_CTRL __iomem *ch_ctrl;
-   int retval;
 
base_addr = card->base_addr;
 
@@ -2409,7 +2409,6 @@ block_til_ready(struct tty_struct *tty, struct file *filp,
struct ZFW_CTRL __iomem *zfw_ctrl;
struct BOARD_CTRL __iomem *board_ctrl;
struct CH_CTRL __iomem *ch_ctrl;
-   int retval;
 
base_addr = cinfo->base_addr;
firm_id = base_addr + ID_ADDRESS;
@@ -4905,7 +4904,7 @@ static int __devinit cyz_load_fw(struct pci_dev *pdev, void __iomem *base_addr,
struct FIRM_ID __iomem *fid = base_addr + ID_ADDRESS;
struct CUSTOM_REG __iomem *cust = base_addr;
struct ZFW_CTRL __iomem *pt_zfwctrl;
-   u8 *tmp;
+   void __iomem *tmp;
u32 mailbox, status;
unsigned int i;
int retval;
@@ -4967,13 +4966,13 @@ static int __devinit cyz_load_fw(struct pci_dev *pdev, void __iomem *base_addr,
udelay(100);
 
/* clear memory */
-   for (tmp = base_addr; (void *)tmp < base_addr + RAM_SIZE; tmp++)
+   for (tmp = base_addr; tmp < base_addr + RAM_SIZE; tmp++)
cy_writeb(tmp, 255);
if (mailbox != 0) {
/* set window to last 512K of RAM */
cy_writel(_addr->loc_addr_base, WIN_RAM + RAM_SIZE);
//sleep(1);
-   for (tmp = base_addr; (void *)tmp < base_addr + RAM_SIZE; tmp++)
+   for (tmp = base_addr; tmp < base_addr + RAM_SIZE; tmp++)
cy_writeb(tmp, 255);
/* set window to beginning of RAM */
cy_writel(_addr->loc_addr_base, WIN_RAM);

Re: Fork Bombing Attack

2007-05-18 Thread Valdis . Kletnieks

On Fri, 18 May 2007 22:52:15 +0530, Anand Jahagirdar said:

> output is 8050. When root or any other user changes the ulimit by typing
> "ulimit -u value", the ulimit value is changed for that session and not
> permanently.

Right.  That value is only applied to that process, and its children. Or more
correctly, those children that don't themselves change the value again - the
distinction is crucial in this case.
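
For illustration, this is roughly what "ulimit -u" does underneath, assuming
a Linux system where RLIMIT_NPROC is available. The helper names are
hypothetical. The limit is set on the calling process and inherited across
fork(); other sessions are unaffected, and a child may change it again
within the hard limit.

```c
#include <assert.h>
#include <sys/resource.h>

/* Lower the soft RLIMIT_NPROC of the calling process to at most wanted.
 * Returns 0 on success, -1 on failure. */
static int lower_nproc_limit(rlim_t wanted)
{
    struct rlimit rl;

    assert(wanted > 0);         /* a zero limit is refused in this sketch */
    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    if (wanted > rl.rlim_max)
        wanted = rl.rlim_max;   /* soft limit may not exceed the hard limit */
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NPROC, &rl);
}

/* Read back the soft limit that now applies to this process. */
static rlim_t current_nproc_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return RLIM_INFINITY;
    return rl.rlim_cur;
}
```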

> actually ulimit should help to prevent fork bombing attack

Right. It *helps* prevent it.  But it isn't by itself a total cure.

> but it won't, and a fork bombing attack still takes down machines running
> the latest Linux distributions.
> 
> Will anyone please tell me why this is so?

Because it only requires *one* process not subject to the ulimit, or a group of
cooperating processes subject to the limit, to bypass that particular
protection.

ulimits are a fairly good and inexpensive way to guard against *accidental*
runaway processes from trashing the system.  They're at best a 95% solution,
and won't stop *every* case.

Consider - you determine that a small fork bomb that launches more than
7,500 processes will kill your system, so you set the ulimit to 7000.

I, as an attacker, am using a compromised userid on your system (think about
it for a moment - if I'm a *legit* user of the system, and use my own userid
for it, I'm a self-limiting problem, as I can only do it once, after which I
have to worry about getting fired, possible legal/criminal action, etc).

1) Fork bomb 6,500 processes - and have each one check the 'ulimit -m' value
and proceed to malloc() and then dirty that amount minus 5 or 10 megabytes.

2) Instead of using *one* compromised userid, I use two, and launch 6,000
processes from each...

3) Lots of *other* possibilities.



[PATCH 1/2] Char: cyclades, add firmware loading

2007-05-18 Thread Jiri Slaby

cyclades, add firmware loading

Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit 7a9aa4781fc5aa6493bb3a7ac59b3c9e5f20fa76
tree 18d85e52a13cd780cf808a43d68b569a8546e2ab
parent 4ea1257b890befc706f6d43562ba68671db39195
author Jiri Slaby <[EMAIL PROTECTED]> Fri, 18 May 2007 18:45:35 +0200
committer Jiri Slaby <[EMAIL PROTECTED]> Fri, 18 May 2007 18:45:35 +0200

 drivers/char/cyclades.c |  351 ---
 1 files changed, 328 insertions(+), 23 deletions(-)

diff --git a/drivers/char/cyclades.c b/drivers/char/cyclades.c
index c72ee97..829be9d 100644
--- a/drivers/char/cyclades.c
+++ b/drivers/char/cyclades.c
@@ -646,6 +646,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -680,6 +681,44 @@ static void cy_send_xchar(struct tty_struct *tty, char ch);
 
 #define STD_COM_FLAGS (0)
 
+/* firmware stuff */
+#define ZL_MAX_BLOCKS  16
+#define DRIVER_VERSION 0x02010203
+#define RAM_SIZE 0x8
+
+#define Z_FPGA_LOADED(X)   ((readl(&(X)->init_ctrl) & (1<<17)) != 0)
+
+enum zblock_type {
+   ZBLOCK_PRG = 0,
+   ZBLOCK_FPGA = 1
+};
+
+struct zfile_header {
+   char name[64];
+   char date[32];
+   char aux[32];
+   u32 n_config;
+   u32 config_offset;
+   u32 n_blocks;
+   u32 block_offset;
+   u32 reserved[9];
+} __attribute__ ((packed));
+
+struct zfile_config {
+   char name[64];
+   u32 mailbox;
+   u32 function;
+   u32 n_blocks;
+   u32 block_list[ZL_MAX_BLOCKS];
+} __attribute__ ((packed));
+
+struct zfile_block {
+   u32 type;
+   u32 file_offset;
+   u32 ram_offset;
+   u32 size;
+} __attribute__ ((packed));
+
 static struct tty_driver *cy_serial_driver;
 
 #ifdef CONFIG_ISA
@@ -4738,17 +4777,295 @@ static int __init cy_detect_isa(void)
 }  /* cy_detect_isa */
 
 #ifdef CONFIG_PCI
-static void __devinit plx_init(void __iomem * addr, __u32 initctl)
+static inline int __devinit cyc_isfwstr(const char *str, unsigned int size)
+{
+   unsigned int a;
+
+   for (a = 0; a < size && *str; a++, str++)
+   if (*str & 0x80)
+   return -EINVAL;
+
+   for (; a < size; a++, str++)
+   if (*str)
+   return -EINVAL;
+
+   return 0;
+}
+
+static inline void __devinit cyz_fpga_copy(void __iomem *fpga, u8 *data,
+   unsigned int size)
+{
+   for (; size > 0; size--) {
+   cy_writel(fpga, *data++);
+   udelay(10);
+   }
+}
+
+static void __devinit plx_init(struct pci_dev *pdev, int irq,
+   struct RUNTIME_9060 __iomem *addr)
 {
/* Reset PLX */
-   cy_writel(addr + initctl, readl(addr + initctl) | 0x4000);
+   cy_writel(&addr->init_ctrl, readl(&addr->init_ctrl) | 0x4000);
udelay(100L);
-   cy_writel(addr + initctl, readl(addr + initctl) & ~0x4000);
+   cy_writel(&addr->init_ctrl, readl(&addr->init_ctrl) & ~0x4000);
 
/* Reload Config. Registers from EEPROM */
-   cy_writel(addr + initctl, readl(addr + initctl) | 0x2000);
+   cy_writel(&addr->init_ctrl, readl(&addr->init_ctrl) | 0x2000);
udelay(100L);
-   cy_writel(addr + initctl, readl(addr + initctl) & ~0x2000);
+   cy_writel(&addr->init_ctrl, readl(&addr->init_ctrl) & ~0x2000);
+
+   /* For some yet unknown reason, once the PLX9060 reloads the EEPROM,
+* the IRQ is lost and, thus, we have to re-write it to the PCI config.
+* registers. This will remain here until we find a permanent fix.
+*/
+   pci_write_config_byte(pdev, PCI_INTERRUPT_LINE, irq);
+}
+
+static int __devinit __cyz_load_fw(const struct firmware *fw,
+   const char *name, const u32 mailbox, void __iomem *base,
+   void __iomem *fpga)
+{
+   void *ptr = fw->data;
+   struct zfile_header *h = ptr;
+   struct zfile_config *c, *cs;
+   struct zfile_block *b, *bs;
+   unsigned int a, tmp, len = fw->size;
+#define BAD_FW KERN_ERR "Bad firmware: "
+   if (len < sizeof(*h)) {
+   printk(BAD_FW "too short: %u<%zu\n", len, sizeof(*h));
+   return -EINVAL;
+   }
+
+   cs = ptr + h->config_offset;
+   bs = ptr + h->block_offset;
+
+   if ((void *)(cs + h->n_config) > ptr + len ||
+   (void *)(bs + h->n_blocks) > ptr + len) {
+   printk(BAD_FW "too short");
+   return  -EINVAL;
+   }
+
+   if (cyc_isfwstr(h->name, sizeof(h->name)) ||
+   cyc_isfwstr(h->date, sizeof(h->date))) {
+   printk(BAD_FW "bad formatted header string\n");
+   return -EINVAL;
+   }
+
+   if (strncmp(name, h->name, sizeof(h->name))) {
+   printk(BAD_FW "bad name '%s' (expected '%s')\n", h->name, name);
+   return -EINVAL;
+   }
+
+   tmp = 0;
+   for (c = cs; c < cs + h->n_config; c++) {
+   for (a = 0; a < c->n_blocks; a++)
+

Re: [PATCH 12/15] Make NFS client use seq_list_xxx helpers

2007-05-18 Thread Trond Myklebust

On Fri, 2007-05-18 at 14:04 +0400, Pavel Emelianov wrote:
> This includes /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
> entries.
> 
> Both need to show the header and use the list_head.
> 
> Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>

Acked-by: Trond Myklebust <[EMAIL PROTECTED]>


> 
> ---
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 50c6821..10355ec 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -1232,23 +1232,9 @@ static int nfs_server_list_open(struct i
>   */
>  static void *nfs_server_list_start(struct seq_file *m, loff_t *_pos)
>  {
> - struct list_head *_p;
> - loff_t pos = *_pos;
> -
>   /* lock the list against modification */
>   spin_lock(&nfs_client_lock);
> -
> - /* allow for the header line */
> - if (!pos)
> - return SEQ_START_TOKEN;
> - pos--;
> -
> - /* find the n'th element in the list */
> - list_for_each(_p, &nfs_client_list)
> - if (!pos--)
> - break;
> -
> - return _p != &nfs_client_list ? _p : NULL;
> + return seq_list_start_head(&nfs_client_list, *_pos);
>  }
>  
>  /*
> @@ -1256,14 +1242,7 @@ static void *nfs_server_list_start(struc
>   */
>  static void *nfs_server_list_next(struct seq_file *p, void *v, loff_t *pos)
>  {
> - struct list_head *_p;
> -
> - (*pos)++;
> -
> - _p = v;
> - _p = (v == SEQ_START_TOKEN) ? nfs_client_list.next : _p->next;
> -
> - return _p != &nfs_client_list ? _p : NULL;
> + return seq_list_next(v, &nfs_client_list, pos);
>  }
>  
>  /*
> @@ -1282,7 +1261,7 @@ static int nfs_server_list_show(struct s
>   struct nfs_client *clp;
>  
>   /* display header on line 1 */
> - if (v == SEQ_START_TOKEN) {
> + if (v == &nfs_client_list) {
>   seq_puts(m, "NV SERVER   PORT USE HOSTNAME\n");
>   return 0;
>   }
> @@ -1323,23 +1302,9 @@ static int nfs_volume_list_open(struct i
>   */
>  static void *nfs_volume_list_start(struct seq_file *m, loff_t *_pos)
>  {
> - struct list_head *_p;
> - loff_t pos = *_pos;
> -
>   /* lock the list against modification */
>   spin_lock(&nfs_client_lock);
> -
> - /* allow for the header line */
> - if (!pos)
> - return SEQ_START_TOKEN;
> - pos--;
> -
> - /* find the n'th element in the list */
> - list_for_each(_p, &nfs_volume_list)
> - if (!pos--)
> - break;
> -
> - return _p != &nfs_volume_list ? _p : NULL;
> + return seq_list_start_head(&nfs_volume_list, *_pos);
>  }
>  
>  /*
> @@ -1347,14 +1312,7 @@ static void *nfs_volume_list_start(struc
>   */
>  static void *nfs_volume_list_next(struct seq_file *p, void *v, loff_t *pos)
>  {
> - struct list_head *_p;
> -
> - (*pos)++;
> -
> - _p = v;
> - _p = (v == SEQ_START_TOKEN) ? nfs_volume_list.next : _p->next;
> -
> - return _p != &nfs_volume_list ? _p : NULL;
> + return seq_list_next(v, &nfs_volume_list, pos);
>  }
>  
>  /*
> @@ -1375,7 +1333,7 @@ static int nfs_volume_list_show(struct s
>   char dev[8], fsid[17];
>  
>   /* display header on line 1 */
> - if (v == SEQ_START_TOKEN) {
> + if (v == &nfs_volume_list) {
>   seq_puts(m, "NV SERVER   PORT DEV FSID\n");
>   return 0;
>   }
> 
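
The iteration pattern that the patch above switches to can be modelled in userspace roughly as follows. This is only a sketch: the list_node type and the model_* names are hypothetical stand-ins, not the kernel's struct list_head or its seq_list_* helpers. It assumes, as the patch implies, that position 0 yields the list head itself (so show() can print the header line) and that position n yields the (n-1)-th element.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled on struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Return the pos-th element (0-based), or NULL when pos is past the end. */
static struct list_node *model_seq_list_start(struct list_node *head, long pos)
{
	struct list_node *p;

	assert(head != NULL);
	for (p = head->next; p != head; p = p->next)
		if (pos-- == 0)
			return p;
	return NULL;
}

/* Position 0 yields the head itself; show() treats that as "print header". */
static struct list_node *model_seq_list_start_head(struct list_node *head,
						   long pos)
{
	if (pos == 0)
		return head;
	return model_seq_list_start(head, pos - 1);
}

/* Advance the position and return the next element, or NULL at the end. */
static struct list_node *model_seq_list_next(struct list_node *v,
					     struct list_node *head, long *pos)
{
	struct list_node *n = v->next;

	++*pos;
	return n == head ? NULL : n;
}
```

With this shape the show() callback only needs to compare its argument against the list head to decide whether to print the header, which is what the "v == &nfs_client_list" test in the patch does.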


Re: 2.6.22-rc1-mm1

2007-05-18 Thread Edward Shishkin


Richard Purdie wrote:
> On Wed, 2007-05-16 at 10:06 -0700, Andrew Morton wrote:
> > On Wed, 16 May 2007 18:00:43 +0100 Richard Purdie <[EMAIL PROTECTED]> wrote:
> > > On Wed, 2007-05-16 at 09:50 -0700, Randy Dunlap wrote:
> > > > On Tue, 15 May 2007 20:19:14 -0700 Andrew Morton wrote:
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/
> > > >
> > > > LZO build fails on allyesconfig:
> > > >
> > > > lib/built-in.o: In function `lzo1x_1_compress':
> > > > lib/lzo/minilzo.c:724: multiple definition of `lzo1x_1_compress'
> > > > fs/built-in.o:fs/reiser4/plugin/compress/minilzo.c:1307: first defined here
> > > > ld: Warning: size of symbol `lzo1x_1_compress' changed from 1541 in fs/built-in.o to 244 in lib/built-in.o
> > > > lib/built-in.o: In function `lzo1x_decompress':
> > > > lib/lzo/minilzo.c:885: multiple definition of `lzo1x_decompress'
> > > > fs/built-in.o:fs/reiser4/plugin/compress/minilzo.c:1466: first defined here
> > > > ld: Warning: size of symbol `lzo1x_decompress' changed from 1047 in fs/built-in.o to 678 in lib/built-in.o
> > > > make: *** [.tmp_vmlinux1] Error 1
> > > > make: Target `all' not remade because of errors.
> > >
> > > Looks like reiser4 contains a copy of minilzo used as some kind of
> > > compression plugin. It can be dropped in favour of the version in
> > > lib/lzo/, they'll be compatible.
> > >
> > > Andrew: Do you want a patch to remove it from reiser4?
> >
> > yes please.
>
> Sent.
>
> I also noticed that reiser4 is using lzo1x_decompress(), not
> lzo1x_decompress_safe(). The unsafe version is open to buffer overflows
> through malicious data since it performs no validation of where it
> writes output to.

Hm, if you accept unknown drive, then yes, it is open..

> I'm not sure whether thats acceptable in filesystem
> code, I'd suspect not?

Ok, we will consider safe decompression,
moreover, as I remember, it doesn't lead to
sensible performance drop..

Thanks for this point,
Edward.

> Fixing it is a case of s/lzo1x_decompress(/lzo1x_decompress_safe(/ in
> fs/reiser4/plugin/compress/compress.c...
>
> Richard
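
To illustrate the point about validating where output is written, here is a minimal sketch of a bounds-checked decoder. It deliberately uses a toy run-length format, not the LZO format, and toy_decode_safe is a hypothetical name. The only thing it is meant to show is the kind of check a "safe" decompressor performs before each write, so that hostile on-disk data cannot overrun the output buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy run-length format: a sequence of (count, value) byte pairs.
 * This is NOT the LZO format; it only illustrates the bounds checks.
 * Returns 0 on success and stores the decoded length in *out_len,
 * or -1 if the input is truncated or the output would overflow.
 */
static int toy_decode_safe(const unsigned char *in, size_t in_len,
			   unsigned char *out, size_t out_cap,
			   size_t *out_len)
{
	size_t ip = 0, op = 0;

	assert(in != NULL && out != NULL && out_len != NULL);

	while (ip < in_len) {
		size_t count;
		unsigned char value;

		if (in_len - ip < 2)
			return -1;	/* truncated pair: input overrun */
		count = in[ip];
		value = in[ip + 1];
		ip += 2;

		if (count > out_cap - op)
			return -1;	/* would write past the output buffer */
		while (count--)
			out[op++] = value;
	}
	*out_len = op;
	return 0;
}
```

An unchecked variant would simply omit the two early returns, which is exactly what makes it unsuitable for data read from an untrusted disk.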



Re: [PATCH][RESEND] PIE randomization

2007-05-18 Thread Andrew Morton

On Thu, 17 May 2007 22:24:11 +0200 (CEST)
Jan Kratochvil <[EMAIL PROTECTED]> wrote:

> This patch is using mmap()'s randomization functionality in such a way 
> that it maps the main executable of (specially compiled/linked -pie/-fpie) 
> ET_DYN binaries onto a random address (in cases in which mmap() is allowed 
> to perform a randomization).
> 
> Origin of this patch is in exec-shield 
> (http://people.redhat.com/mingo/exec-shield/)


From: Andrew Morton <[EMAIL PROTECTED]>

- the compiler knows how to inline things

- return -EINVAL on zero-size, not -EIO

- reduce scope of local `interp_map_addr', remove unneeded initialisation,
  add needed comment.

- coding-style repairs

Cc: Jan Kratochvil <[EMAIL PROTECTED]>
Cc: Jiri Kosina <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Roland McGrath <[EMAIL PROTECTED]>
Cc: Jakub Jelinek <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/binfmt_elf.c |   26 +-
 1 files changed, 17 insertions(+), 9 deletions(-)

diff -puN fs/binfmt_elf.c~pie-randomization-fix fs/binfmt_elf.c
--- a/fs/binfmt_elf.c~pie-randomization-fix
+++ a/fs/binfmt_elf.c
@@ -322,17 +322,17 @@ static unsigned long elf_map(struct file
 
 #endif /* !elf_map */
 
-static inline unsigned long total_mapping_size(struct elf_phdr *cmds, int nr)
+static unsigned long total_mapping_size(struct elf_phdr *cmds, int nr)
 {
int i, first_idx = -1, last_idx = -1;
 
-   for (i = 0; i < nr; i++)
+   for (i = 0; i < nr; i++) {
if (cmds[i].p_type == PT_LOAD) {
last_idx = i;
if (first_idx == -1)
first_idx = i;
}
-
+   }
if (first_idx == -1)
return 0;
 
@@ -396,8 +396,10 @@ static unsigned long load_elf_interp(str
}
 
total_size = total_mapping_size(elf_phdata, interp_elf_ex->e_phnum);
-   if (!total_size)
+   if (!total_size) {
+   error = -EINVAL;
goto out_close;
+   }
 
eppnt = elf_phdata;
for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
@@ -586,7 +588,8 @@ static int load_elf_binary(struct linux_
int elf_exec_fileno;
int retval, i;
unsigned int size;
-   unsigned long elf_entry, interp_load_addr = 0, interp_map_addr = 0;
+   unsigned long elf_entry;
+   unsigned long interp_load_addr = 0;
unsigned long start_code, end_code, start_data, end_data;
unsigned long reloc_func_desc = 0;
char passed_fileno[6];
@@ -908,7 +911,7 @@ static int load_elf_binary(struct linux_
 * default mmap base, as well as whatever program they
 * might try to exec.  This is because the brk will
 * follow the loader, and is not movable.  */
-#if defined(__i386__) || defined(__x86_64__)
+#ifdef CONFIG_X86
load_bias = 0;
 #else
load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
@@ -992,16 +995,21 @@ static int load_elf_binary(struct linux_
}
 
if (elf_interpreter) {
-   if (interpreter_type == INTERPRETER_AOUT)
+   if (interpreter_type == INTERPRETER_AOUT) {
elf_entry = load_aout_interp(&loc->interp_ex,
 interpreter);
-   else {
+   } else {
+   unsigned long interp_map_addr;  /* unused */
+
elf_entry = load_elf_interp(&loc->interp_elf_ex,
interpreter,
&interp_map_addr,
load_bias);
if (!BAD_ADDR(elf_entry)) {
-   /* load_elf_interp() returns relocation adjustment */
+   /*
+* load_elf_interp() returns relocation
+* adjustment
+*/
interp_load_addr = elf_entry;
elf_entry += loc->interp_elf_ex.e_entry;
}
_


Re: [patch] x86_64, irq: check remote IRR bit before migrating level triggered irq

2007-05-18 Thread Yinghai Lu


Eric,

The number of ioapic level-triggered irqs is limited, so we could keep
the vector number unchanged when migrating to other CPUs. That could help.

It will need small modifications to assign_irq_vector and
irq_complete_move/smp_irq_move_cleanup_interrupt, because they assume
the vector must change.

YH

Re: [PATCH - 1/1] Documentation/HOWTO

2007-05-18 Thread Valdis . Kletnieks

On Fri, 18 May 2007 12:01:41 EDT, "Robert P. J. Day" said:
> On Fri, 18 May 2007, debian developer wrote:

> > -your skills, and other developers will be aware of your presence. Fixing
> > bugs is one of the best ways to get merits among other developers, because
> > not many people like wasting time fixing other people's bugs.
>^^^ 
> you might want to find a less demeaning term for debugging than
> "wasting time."  just a thought.

Wasn't his text, not his fault.  At least his patch is an improvement - now
HOWTO only equates it with "wasting time" once, rather than twice. ;)



Re: Fork Bombing Attack

2007-05-18 Thread Anand Jahagirdar

Hello All,
  I found one more interesting thing related to the fork
bombing attack. I have set the following in /etc/security/limits.conf:

[EMAIL PROTECTED] hard  nproc  3000
[EMAIL PROTECTED] hard  nproc  500

I then tried to execute the fork bombing program on the same machine. It
killed the box completely and the machine needed a reboot. Will anybody
please tell me why this is so, even after setting limits in
/etc/security/limits.conf?

About ulimit:
ulimit is set by default to some value for all users (root, guest).
On my machine with the FC4 distribution, the command "ulimit -u"
gave me 3055; on another machine with the FC6 distribution the
output is 8050. When root or any other user changes the limit by typing
"ulimit -u value", the value is changed for that session only, not
permanently. ulimit should help to prevent a fork bombing attack,
but it does not, and a fork bomb still takes down machines running
the latest Linux distributions.

Will anybody please tell me why this is so?

Regards
Anand

On 5/18/07, Ahmed S. Darwish <[EMAIL PROTECTED]> wrote:

On 5/18/07, Anand Jahagirdar <[EMAIL PROTECTED]> wrote:
> Hello All
>    I tried to execute a program which creates 8152 processes
> ( i=0; while( i<14) i++ fork(); ) with ulimit 8200. This program
> created 8152 processes and then stopped and came back to the command
> prompt. This proves that my machine does have sufficient resources to
> create 8000 processes.
>
>    I found one more interesting thing on the same machine,
> which has the FC6 distribution and Linux kernel 2.6.18. I set "ulimit -u
> 100". After setting this limit I tried to execute the fork bombing program
> with the guest account.
>
> Expected result: the guest user should not be able to fork a single
> additional process once it reaches a count of 100 processes.
>
> Actual result: the kernel allowed me to create more processes without
> giving an error. Because of this I tried to execute the same fork bombing
> program on another terminal with the guest account, and this fork bombing
> attack killed the box completely and the machine needed a reboot.
>

I think if you want resource limiting per _UID_ (and not per _process_
as you did), you should use PAM module pam_limits.so. You can edit
those limits using the file /etc/security/limits.conf

Regards,
--
Ahmed S. Darwish
http://darwish-07.blogspot.com
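
For reference, the limit that "ulimit -u" and the nproc entries in limits.conf manipulate is RLIMIT_NPROC. Below is a minimal sketch of inspecting and lowering it from C with getrlimit(2) and setrlimit(2). The helper name cap_nproc_soft is hypothetical, and the sketch says nothing about why the limit did not hold in the report above; it only shows the interface involved.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Lower the soft RLIMIT_NPROC of the calling process to at most `want`.
 * The value is clamped to the hard limit, since an unprivileged process
 * cannot raise its soft limit above it.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int cap_nproc_soft(rlim_t want)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NPROC, &rl) != 0)
		return -1;

	if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
		want = rl.rlim_max;

	rl.rlim_cur = want;
	return setrlimit(RLIMIT_NPROC, &rl);
}
```

A limit set this way applies to the calling process and is inherited by its children, which matches the observation that "ulimit -u value" only lasts for the session in which it was typed.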


Re: [PATCH 0/5] make slab gfp fair

2007-05-18 Thread Paul Jackson

Peter wrote:
> cpusets are ignored when in dire straits for a kernel alloc.

No - most kernel allocations never ignore cpusets.

The ones marked NOFAIL or ATOMIC can ignore cpusets in dire straits
and the ones off interrupts lack an applicable cpuset context.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: [PATCH 0/5] make slab gfp fair

2007-05-18 Thread Christoph Lameter

On Fri, 18 May 2007, Peter Zijlstra wrote:

> On Thu, 2007-05-17 at 15:27 -0700, Christoph Lameter wrote:

> Isn't the zone mask the same for all allocations from a specific slab?
> If so, then the slab wide ->reserve_slab will still dtrt (barring
> cpusets).

All allocations from a single slab have the same set of allowed types of
zones. I.e. a DMA slab can access only ZONE_DMA, while a regular slab can
access ZONE_NORMAL, ZONE_DMA32 and ZONE_DMA.

> > On x86_64 systems you have the additional complication that there are 
> > even multiple DMA32 or NORMAL zones per node. Some will have DMA32 and 
> > NORMAL, others DMA32 alone or NORMAL alone. Which watermarks are we 
> > talking about?
> 
> Watermarks like used by the page allocator given the slabs zone mask.
> The page allocator will only fall back to ALLOC_NO_WATERMARKS when all
> target zones are exhausted.

That works if zones do not vary between slab requests. So on SMP (without 
extra gfp flags) we may be fine. But see other concerns below.

> > The use of ALLOC_NO_WATERMARKS depends on the constraints of the allocation 
> > in all cases. You can only compare the stresslevel (rank?) of allocations 
> > that have the same allocation constraints. The allocation constraints are
> > a result of gfp flags,
> 
> The gfp zone mask is constant per slab, no? It has to, because the zone
> mask is only used when the slab is extended, other allocations live off
> whatever was there before them.

The gfp zone mask is used to select the zones in an SMP config, but not in
a NUMA configuration, where the zones can come from multiple nodes.

Ok in an SMP configuration the zones are determined by the allocation 
flags. But then there are also the gfp flags that influence reclaim 
behavior. These also have an influence on the memory pressure.

These are

__GFP_IO
__GFP_FS
__GFP_NOMEMALLOC
__GFP_NOFAIL
__GFP_NORETRY
__GFP_REPEAT

An allocation that can call into a filesystem or do I/O will have much 
less memory pressure to contend with. Are the ranks for an allocation
with __GFP_IO|__GFP_FS really comparable with an allocation that does not 
have these set?

> >  cpuset configuration and memory policies in effect.
> 
> Yes, I see now that these might become an issue, I will have to think on
> this.

Note that we have not yet investigated what weird effect memory policy 
constraints can have on this. There are issues with memory policies only 
applying to certain zones.

[PATCH] Protect include/linux/console_struct.h against multiple inclusion.

2007-05-18 Thread Robert P. J. Day


Protect the header file include/linux/console_struct.h against
multiple inclusion, since not doing this causes one of the example
module programs in the Linux Kernel Module Programming Guide to fail
to build due to a bogus "redeclaration" of some structures.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h
index a461f76..e4c1f65 100644
--- a/include/linux/console_struct.h
+++ b/include/linux/console_struct.h
@@ -1,3 +1,5 @@
+#ifndef _LINUX_CONSOLE_STRUCT_H
+#define _LINUX_CONSOLE_STRUCT_H
 /*
  * console_struct.h
  *
@@ -130,3 +132,5 @@ extern void vc_SAK(struct work_struct *work);
 #define CUR_DEFAULT CUR_UNDERLINE

 #define CON_IS_VISIBLE(conp) (*conp->vc_display_fg == conp)
+
+#endif // _LINUX_CONSOLE_STRUCT_H
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page
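
The effect of the guard added by this patch can be sketched in a single file by repeating a guarded header body twice, which is what a double #include amounts to after preprocessing. The header name demo_struct.h, the guard macro and the struct are hypothetical and only serve the illustration.

```c
#include <assert.h>

/* First "inclusion" of a hypothetical header demo_struct.h. */
#ifndef DEMO_STRUCT_H
#define DEMO_STRUCT_H
struct demo_point {
	int x, y;
};
#endif /* DEMO_STRUCT_H */

/*
 * Second "inclusion": DEMO_STRUCT_H is already defined, so the
 * preprocessor skips the body and struct demo_point is not redefined.
 * Without the guard this would be a redefinition error.
 */
#ifndef DEMO_STRUCT_H
#define DEMO_STRUCT_H
struct demo_point {
	int x, y;
};
#endif /* DEMO_STRUCT_H */

static int demo_sum(struct demo_point p)
{
	return p.x + p.y;
}
```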


Re: [RFC][PATCH] Make prepare_namespace() wait for devices

2007-05-18 Thread Pierre Ossman

Hi there brave visitor from the future ;)

Andi Kleen wrote:
> Actually that's not correct. With panic=30 and lilo -R and a working 
> backup kernel a system can recover from this. With your endless loop it can't.
>
> Always add some kind of timeout.
>
>   

Did you check the second version? Is that sufficient or is a timeout a
must in your book?

Rgds

-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org


[2.6 patch] drivers/net/wireless/libertas/fw.c: fix use-before-check

2007-05-18 Thread Eugene Teo

NULL checks should be performed before the dereference.

Spotted by the Coverity checker.

Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>

diff --git a/drivers/net/wireless/libertas/fw.c b/drivers/net/wireless/libertas/fw.c
index 441123c..5c63c9b 100644
--- a/drivers/net/wireless/libertas/fw.c
+++ b/drivers/net/wireless/libertas/fw.c
@@ -333,18 +333,22 @@ static void command_timer_fn(unsigned long data)
unsigned long flags;

ptempnode = adapter->cur_cmd;
+   if (ptempnode == NULL) {
+   lbs_pr_debug(1, "PTempnode Empty\n");
+   return;
+   }
+
cmd = (struct cmd_ds_command *)ptempnode->bufvirtualaddr;
+   if (!cmd) {
+   lbs_pr_debug(1, "cmd is NULL\n");
+   return;
+   }

lbs_pr_info("command_timer_fn fired (%x)\n", cmd->command);

if (!adapter->fw_ready)
return;

-   if (ptempnode == NULL) {
-   lbs_pr_debug(1, "PTempnode Empty\n");
-   return;
-   }
-
spin_lock_irqsave(&adapter->driver_lock, flags);
adapter->cur_cmd = NULL;
spin_unlock_irqrestore(&adapter->driver_lock, flags);
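
The ordering problem fixed above can be sketched outside the driver as follows. The demo_* types and function are hypothetical, not the libertas structures; the point is only that each pointer in the chain is tested before the first dereference that depends on it.

```c
#include <assert.h>
#include <stddef.h>

struct demo_cmd {
	unsigned int command;
};

struct demo_node {
	struct demo_cmd *buf;
};

/*
 * Return the command id of the current node, or -1 if either the node
 * or its command buffer is NULL. Every check precedes the dereference
 * it protects, which is the ordering the patch restores.
 */
static long demo_current_command(const struct demo_node *node)
{
	const struct demo_cmd *cmd;

	if (node == NULL)
		return -1;

	cmd = node->buf;
	if (cmd == NULL)
		return -1;

	return (long)cmd->command;
}
```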

Re: [PATCH] 2.6.21-git15 - Kconfig Cleanup

2007-05-18 Thread Matt LaPlante

ping?

On Fri, 11 May 2007 22:05:01 -0400
Matt LaPlante <[EMAIL PROTECTED]> wrote:

> Fix misc small issues/typos/grammar in Kconfigs for 2.6.21-git15.
> 
> Signed-off-by: Matt LaPlante <[EMAIL PROTECTED]>
> --
> 
> diff -ru a/arch/arm/plat-s3c24xx/Kconfig b/arch/arm/plat-s3c24xx/Kconfig
> --- a/arch/arm/plat-s3c24xx/Kconfig   2007-04-25 23:08:32.0 -0400
> +++ b/arch/arm/plat-s3c24xx/Kconfig   2007-05-11 21:44:06.0 -0400
> @@ -70,7 +70,7 @@
>   help
> Set the chunksize in Kilobytes of the CRC for checking memory
> corruption over suspend and resume. A smaller value will mean that
> -   the CRC data block will take more memory, but wil identify any
> +   the CRC data block will take more memory, but will identify any
> faults with better precision.
>  
> See 
> diff -ru a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
> --- a/arch/blackfin/Kconfig   2007-05-11 20:32:24.0 -0400
> +++ b/arch/blackfin/Kconfig   2007-05-11 21:33:28.0 -0400
> @@ -435,100 +435,100 @@
>   default y
>   help
> If enabled interrupt entry code (STORE/RESTORE CONTEXT) is linked
> -   into L1 instruction memory.(less latency)
> +   into L1 instruction memory. (less latency)
>  
>  config EXCPT_IRQ_SYSC_L1
> - bool "Locate entire ASM lowlevel excepetion / interrupt - Syscall and CPLB handler code in L1 Memory"
> + bool "Locate entire ASM lowlevel exception / interrupt - Syscall and CPLB handler code in L1 Memory"
>   default y
>   help
> -   If enabled entire ASM lowlevel exception and interrupt entry code (STORE/RESTORE CONTEXT) is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the entire ASM lowlevel exception and interrupt entry code
> +   (STORE/RESTORE CONTEXT) is linked into L1 instruction memory. (less latency)
>  
>  config DO_IRQ_L1
>   bool "Locate frequently called do_irq dispatcher function in L1 Memory"
>   default y
>   help
> -   If enabled frequently called do_irq dispatcher function is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the frequently called do_irq dispatcher function is linked
> +   into L1 instruction memory. (less latency)
>  
>  config CORE_TIMER_IRQ_L1
>   bool "Locate frequently called timer_interrupt() function in L1 Memory"
>   default y
>   help
> -   If enabled frequently called timer_interrupt() function is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the frequently called timer_interrupt() function is linked
> +   into L1 instruction memory. (less latency)
>  
>  config IDLE_L1
>   bool "Locate frequently idle function in L1 Memory"
>   default y
>   help
> -   If enabled frequently called idle function is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the frequently called idle function is linked
> +   into L1 instruction memory. (less latency)
>  
>  config SCHEDULE_L1
>   bool "Locate kernel schedule function in L1 Memory"
>   default y
>   help
> -   If enabled frequently called kernel schedule is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the frequently called kernel schedule is linked
> +   into L1 instruction memory. (less latency)
>  
>  config ARITHMETIC_OPS_L1
>   bool "Locate kernel owned arithmetic functions in L1 Memory"
>   default y
>   help
> If enabled arithmetic functions are linked
> -   into L1 instruction memory.(less latency)
> +   into L1 instruction memory. (less latency)
>  
>  config ACCESS_OK_L1
>   bool "Locate access_ok function in L1 Memory"
>   default y
>   help
> -   If enabled access_ok function is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the access_ok function is linked
> +   into L1 instruction memory. (less latency)
>  
>  config MEMSET_L1
>   bool "Locate memset function in L1 Memory"
>   default y
>   help
> -   If enabled memset function is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the memset function is linked
> +   into L1 instruction memory. (less latency)
>  
>  config MEMCPY_L1
>   bool "Locate memcpy function in L1 Memory"
>   default y
>   help
> -   If enabled memcpy function is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the memcpy function is linked
> +   into L1 instruction memory. (less latency)
>  
>  config SYS_BFIN_SPINLOCK_L1
>   bool "Locate sys_bfin_spinlock function in L1 Memory"
>   default y
>   help
> -   If enabled sys_bfin_spinlock function is linked
> -   into L1 instruction memory.(less latency)
> +   If enabled, the sys_bfin_spinlock function is linked
> +   into L1 instruction memory. (less latency)
>  
>  config

ht CPU flag

2007-05-18 Thread eugene


Hello dear developers!

Here is my system configuration.

Linux ns 2.6.19-gentoo-r5 #5 SMP Thu May 3 00:45:12 AMST 2007 i686 
Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux


Gnu C                  4.1.1
Gnu make               3.81
binutils               2.16.1
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
reiserfsprogs          3.6.19
Linux C Library        > libc.2.5
Dynamic linker (ldd)   2.5
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.4
udev                   104
Modules Loaded         ipt_ULOG iptable_filter iptable_mangle iptable_nat
ip_nat ip_tables x_tables rtc nls_iso8859_5 ntfs ip_conntrack_ftp
ip_conntrack nfnetlink e1000 e100 intelfb i2c_algo_bit intel_agp agpgart
i2c_i801 i2c_core ehci_hcd pcspkr uhci_hcd usbcore sg

---

I have a Pentium D CPU, which many Windows utilities such as cpuz, wcpuid
and everest identify as a D 930 (dual core, 3GHz). From Intel's site I found
out that it has no HT feature, nor does Windows XP identify it as HT.


Why do I have "ht" flag in cpuinfo?
---

cat /proc/cpuinfo
-
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Pentium(R) D CPU 3.00GHz
stepping: 5
cpu MHz : 2992.732
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm

bogomips: 5990.31

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  : Intel(R) Pentium(R) D CPU 3.00GHz
stepping: 5
cpu MHz : 2992.732
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm 
constant_tsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm

bogomips: 5985.19
---

So, how should I choose kernel settings such as:

the number of CPUs?
HT support?

Best regards, Eugene.
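
Since the "ht" flag is, as noted elsewhere in this thread, also reported for multicore CPUs, one way to tell the two cases apart is to compare the "siblings" and "cpu cores" fields shown in the output above. The sketch below assumes that "siblings" counts logical CPUs per physical package and "cpu cores" counts cores per package, so that more siblings than cores indicates active hyperthreading. The helper names are hypothetical and the parser only handles the simple "key : value" layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the first line starting with `key` in /proc/cpuinfo-style text
 * and return the integer after the colon, or -1 if it is not found.
 */
static int cpuinfo_int(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0) {
			const char *p = line + klen;
			int v;

			/* skip the padding between the key and the colon */
			while (*p == ' ' || *p == '\t')
				p++;
			if (*p == ':' && sscanf(p + 1, "%d", &v) == 1)
				return v;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Returns 1 if there are more logical CPUs than cores per package,
 * 0 if the counts match, and -1 if either field is missing.
 */
static int smt_active(const char *text)
{
	int siblings = cpuinfo_int(text, "siblings");
	int cores = cpuinfo_int(text, "cpu cores");

	if (siblings < 0 || cores <= 0)
		return -1;
	return siblings > cores;
}
```

For the output quoted above (siblings 2, cpu cores 2) this returns 0, which is consistent with a dual-core part without hyperthreading.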

Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-18 Thread Jens Axboe

On Fri, May 18 2007, Badari Pulavarty wrote:
> On Fri, 2007-05-18 at 09:33 +0200, Jens Axboe wrote:
> > On Thu, May 17 2007, Badari Pulavarty wrote:
> > > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > > .. 
> > > > > > 
> > > > > > Ah ok, you need the updated patch series for ppc64 support. Builds
> > > > > > fine here on ppc64. See the #sglist branch of the block repo:
> > > > > > 
> > > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > > 
> > > > > > I can mail you an updated patch, if you want.
> > > > > 
> > > > > 
> > > > > Here is the whole panic stack..
> > > > 
> > > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > > and backout this patch and see if it boots?
> > > 
> > > Yes. It boots fine with following backed out.
> > > 
> > > Looking at the code ide_probe.c: hwif_init() is doing
> > > 
> > > > hwif->sg_table = kmalloc(sizeof(struct scatterlist)*hwif->sg_max_nents,
> > > >  GFP_KERNEL);
> > > 
> > > blk_rq_map_sg() is looking for the chaining info and going over end of the
> > > allocation.
> > 
> > Hmm, looks ok, I'm guessing it's just missing a memset (or just turn it
> > into a kzalloc())?
> > 
> 
> Even with backing out all the ide changes, I get this on boot
> once in a while.

Yep, I think the ide changes are fine as such, the problem is the
missing memset/kzalloc. Can you try that?

-- 
Jens Axboe
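
The difference between kmalloc() and kzalloc() that matters here can be sketched in userspace with calloc(), which zero-fills its result in the same way. The demo_sg structure and its is_chain field are hypothetical stand-ins for whatever marker the chaining code inspects. The point is that a walker which tests such a marker must never see uninitialised memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical scatterlist-like entry with a "this entry is a link" flag. */
struct demo_sg {
	void *page;
	unsigned int length;
	int is_chain;
};

/*
 * Allocate a zero-filled table, as kzalloc() would. With plain malloc()
 * (the kmalloc() analogue) is_chain would hold arbitrary garbage and a
 * walker could follow a bogus chain link past the end of the table.
 */
static struct demo_sg *demo_sg_alloc(size_t nents)
{
	return calloc(nents, sizeof(struct demo_sg));
}

/* Count the entries that are marked as chain links. */
static size_t demo_count_chain_links(const struct demo_sg *tbl, size_t nents)
{
	size_t i, links = 0;

	assert(tbl != NULL);
	for (i = 0; i < nents; i++)
		if (tbl[i].is_chain)
			links++;
	return links;
}
```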


Re: 2.6.22-rc1-mm1

2007-05-18 Thread Mel Gorman


On Fri, 18 May 2007, H. Peter Anvin wrote:

> young dave wrote:
> > Hi,
> >
> > After installation the new mm1 kernel, My system can not boot, the rc1
> > kernel works ok.
> >
> > The cursor just blinks after appearing "Bios data check successful"
> > message.
> >
> > what do you think about this?
>
> "Bios data check successful" is not a message that comes from Linux, nor
> from the boot loader.
>
> Since you have left absolutely zero details about your system or
> anything else, there isn't much anyone can do about it.

It sounds vaguely similar to the silent failure on elm3b132. I'm still
bisecting this on the side. It's taking an age because the target machine
is so slow and using a faster machine with a different compiler does not
reproduce the problem. I don't think it's git-newsetup that is the problem
though, for what that's worth.


--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab

Re: [RFC][PATCH] Make prepare_namespace() wait for devices

2007-05-18 Thread Andi Kleen

Pierre Ossman <[EMAIL PROTECTED]> writes:
> 
> If the device never shows up than we will hang in an infinite loop.
> Previously we panic:ed instead, so this behaviour should be no
> worse.


Actually that's not correct. With panic=30 and lilo -R and a working 
backup kernel a system can recover from this. With your endless loop it can't.

Always add some kind of timeout.
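
A sketch of a bounded wait, assuming a hypothetical device_ready() style probe and a caller-supplied poll budget; none of these names come from the patch under discussion. Once the budget is exhausted the function reports failure, so the caller can still take its old failure path instead of looping forever:

```c
#include <stdbool.h>

/* Hypothetical probe type: returns true once the device has shown up. */
typedef bool (*probe_fn)(void *ctx);

/*
 * Poll up to max_polls times. Returns 0 if the device appeared and
 * -1 on timeout, so the caller can fall back to its failure path.
 * A real implementation would sleep between polls.
 */
int wait_for_device(probe_fn device_ready, void *ctx, unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        if (device_ready(ctx))
            return 0;
    }
    return -1;
}

/* Example probe: reports ready once the counter in ctx reaches zero. */
bool ready_after_countdown(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

A caller that gets -1 back can panic as before, which keeps the panic=30 recovery described above working.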

-Andi



Re: [PATCH] x86-64 highres/dyntick support 2.6.22-rc1-v5

2007-05-18 Thread Christoph Lameter

On Thu, 17 May 2007, Frank Sorenson wrote:

> >> I've tracked down this hang to a kzalloc in the hpet code that never
> >> returns.  But only when using SLUB.  Using SLAB, the highres/dyntick
> >> patch boots without problem.
> >>
> >> ...adding Christoph to the CC list...
> > 
> > Please boot with slub_debug.
> 
> No debugging output at all.  Still hangs with only:
>   Kernel alive
>   Kernel direct mapping tables up to 1 @ 8000-d000

Is there some way you can get a stack trace?


Re: 2.6.22-rc1-mm1

2007-05-18 Thread H. Peter Anvin

young dave wrote:
> Hi,
> 
> After installation the new mm1 kernel, My system can not boot, the rc1
> kernel works ok.
> 
> The cursor just blinks after appearing "Bios data check successful"
> message.
> 
> what do you think about this?

"Bios data check successful" is not a message that comes from Linux, nor
from the boot loader.

Since you have left absolutely zero details about your system or
anything else, there isn't much anyone can do about it.

-hpa


Re: [PATCH 2.6.21-rt2] PowerPC: decrementer clockevent driver

2007-05-18 Thread Matt Sealey


Thomas Gleixner wrote:
> On Fri, 2007-05-18 at 11:31 -0500, Kumar Gala wrote:
>> I asked this earlier, but figured you might have a better insight.   
>> Is their value in having 'drivers' for more than one clock source?   
>> I'd say most (of not all) the PPC SoCs have timers on the system side  
>> that we could provide drivers for, I'm just not sure if that does  
>> anything for anyone.
> 
> Not necessarily for the tick/highres stuff, but clock events allows
> other users as well to utilize such facilities. We have no users yet,
> but there are drivers, which utilize special timer hardware with nice
> #ifdeffery to allow the driver to be shared. This might be a useful
> thing for such stuff.

*ahem*

Please indulge my laziness and recommend me one or two to look at? I'm
no good at guessing what to grep for to find an example (I wonder if
we have any candidates in the ppc tree mostly..)

-- 
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations

[2.6 patch] drivers/net/wireless/libertas/rx.c: fix use-after-free

2007-05-18 Thread Eugene Teo

libertas_upload_rx_packet() calls netif_rx() before returning, and it always
returns 0. libertas_upload_rx_packet() also initializes skb->protocol itself
anyway.

Spotted by the Coverity checker.
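
A userspace sketch of the ordering the patch moves to, assuming the hand-off function takes ownership of the buffer as the description implies. struct pkt, struct rx_stats and upload_packet() are hypothetical stand-ins, not the driver's types:

```c
#include <stdlib.h>

/* Hypothetical packet buffer; a stand-in for struct sk_buff. */
struct pkt {
    unsigned int len;
};

struct rx_stats {
    unsigned long rx_bytes;
    unsigned long rx_packets;
};

/* Stand-in for the hand-off: takes ownership and may free the buffer. */
int upload_packet(struct pkt *p)
{
    free(p);
    return 0;
}

/* Account for the packet first, then hand it off and never touch it again. */
int process_packet(struct rx_stats *stats, struct pkt *p)
{
    stats->rx_bytes += p->len;
    stats->rx_packets++;

    return upload_packet(p);
}
```

Reading p->len after upload_packet() returned would be the use-after-free; doing the accounting first avoids it without changing what gets counted.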

Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>

diff --git a/drivers/net/wireless/libertas/rx.c b/drivers/net/wireless/libertas/rx.c
index d17924f..1d8d5e4 100644
--- a/drivers/net/wireless/libertas/rx.c
+++ b/drivers/net/wireless/libertas/rx.c
@@ -269,15 +269,12 @@ int libertas_process_rxed_packet(wlan_private * priv, struct sk_buff *skb)
wlan_compute_rssi(priv, p_rx_pd);

lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len);
-   if (libertas_upload_rx_packet(priv, skb)) {
-   lbs_pr_debug(1, "RX error: libertas_upload_rx_packet"
-  " returns failure\n");
-   ret = -1;
-   goto done;
-   }
+
priv->stats.rx_bytes += skb->len;
priv->stats.rx_packets++;

+   libertas_upload_rx_packet(priv, skb);
+
ret = 0;
 done:
LEAVE();
@@ -439,21 +436,14 @@ static int process_rxed_802_11_packet(wlan_private * priv, struct sk_buff *skb)

lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len);

-   if (libertas_upload_rx_packet(priv, skb)) {
-   lbs_pr_debug(1, "RX error: libertas_upload_rx_packet "
-   "returns failure\n");
-   ret = -1;
-   goto done;
-   }
-
priv->stats.rx_bytes += skb->len;
priv->stats.rx_packets++;

+   libertas_upload_rx_packet(priv, skb);
+
ret = 0;
 done:
LEAVE();

-   skb->protocol = __constant_htons(0x0019);   /* ETH_P_80211_RAW */
-


Re: [PATCH 2.6.21-rt2] PowerPC: decrementer clockevent driver

2007-05-18 Thread Matt Sealey

Kumar Gala wrote:
> 
> On May 18, 2007, at 9:48 AM, Thomas Gleixner wrote:
> 
>> On Fri, 2007-05-18 at 15:28 +0100, Matt Sealey wrote:
>>>
>>> I think both the MPC52xx GPT0-7 and the SLT0-1 fulfil this fairly
>>> easily.
>>
>> There is some basic work for MPC5200 available:
>>
>> http://www.pengutronix.de/oselas/bsp/phytec/index_en.html#phyCORE-MPC5200B-tiny
>>
> 
> I asked this earlier, but figured you might have a better insight.  Is
> their value in having 'drivers' for more than one clock source?  I'd say
> most (of not all) the PPC SoCs have timers on the system side that we
> could provide drivers for, I'm just not sure if that does anything for
> anyone.

As I asked after, I'm also very intrigued as to what is going to end
up using these timers, but likewise, not much use writing a driver if
everyone can use the extremely high resolution decrementer all at
once..

As I said before too, at least Intel has decided there is a great need
for up to 256 high resolution timer sources on a system, but since this
is a fairly new concept to Linux (and hrtimers and dynticks too) it
only seems to be used in the case of i8254/RTC emulation, mostly on
x86-64.

I'm looking at it now and finding "users" of hrtimers is looking very
thin on the ground. Maybe it's justified on the basis that more is
better, and having support is preferable to not having it (even if
nobody really uses it) but it seems the entire gamut of timing
possibility in Linux can be handled through a simple, and single,
high resolution timer and a queue of events..

So do we need some more? :D

-- 
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations

Re: [PATCH 2.6.21-rt2] PowerPC: decrementer clockevent driver

2007-05-18 Thread Thomas Gleixner

On Fri, 2007-05-18 at 11:31 -0500, Kumar Gala wrote:
> I asked this earlier, but figured you might have a better insight.   
> Is their value in having 'drivers' for more than one clock source?   
> I'd say most (of not all) the PPC SoCs have timers on the system side  
> that we could provide drivers for, I'm just not sure if that does  
> anything for anyone.

Not necessarily for the tick/highres stuff, but clock events allows
other users as well to utilize such facilities. We have no users yet,
but there are drivers, which utilize special timer hardware with nice
#ifdeffery to allow the driver to be shared. This might be a useful
thing for such stuff.

tglx


Re: [PATCH - 1/1] Documentation/HOWTO

2007-05-18 Thread Bernd Eckenfels

In article <[EMAIL PROTECTED]> you wrote:
>> bugs is one of the best ways to get merits among other developers, because
>> not many people like wasting time fixing other people's bugs.
>   ^^^ 
> 
> you might want to find a less demeaning term for debugging than
> "wasting time."  just a thought.

and it is not even correct.

Gruss
Bernd

Re: [RFC] log out-of-virtual-memory events

2007-05-18 Thread Bernd Eckenfels

In article <[EMAIL PROTECTED]> you wrote:
> +   printk(KERN_INFO
> +  "out of virtual memory for process %d (%s): total_vm=%lu, 
> uid=%d\n",
> +   current->pid, current->comm, total_vm, current->uid);

And align this one with the print_fatal layout:

   printk(KERN_WARNING
          "%s/%d process cannot request more virtual memory: total_vm=%lu, uid=%d\n",
          current->comm, current->pid, total_vm, current->uid);

Greetings
Bernd

Re: [RFC] log out-of-virtual-memory events

2007-05-18 Thread Bernd Eckenfels

In article <[EMAIL PROTECTED]> you wrote:
>printk("%s/%d: potentially unexpected fatal signal %d.\n",
>current->comm, current->pid, signr);

can we have both KERN_WARNING please?

Gruss
Bernd

Re: [PATCH 2.6.21-rt2] PowerPC: decrementer clockevent driver

2007-05-18 Thread Kumar Gala



On May 18, 2007, at 9:48 AM, Thomas Gleixner wrote:

> On Fri, 2007-05-18 at 15:28 +0100, Matt Sealey wrote:
>>>> I guess the real question is, how high resolution does a high
>>>> resolution timer need to be,
>>>
>>> In the order of microseconds.
>>
>> I think both the MPC52xx GPT0-7 and the SLT0-1 fulfil this fairly
>> easily.
>
> There is some basic work for MPC5200 available:
>
> http://www.pengutronix.de/oselas/bsp/phytec/index_en.html#phyCORE-MPC5200B-tiny


I asked this earlier, but figured you might have a better insight.
Is there value in having 'drivers' for more than one clock source?
I'd say most (if not all) the PPC SoCs have timers on the system side
that we could provide drivers for, I'm just not sure if that does
anything for anyone.


- k

RE: 2.6.22-rc1-mm1 - s390 vs. md

2007-05-18 Thread Williams, Dan J

> From: Cornelia Huck [mailto:[EMAIL PROTECTED]
> Finer granularity is certainly better here, but I'm not quite sure if
> this solves our s390 problem (we don't have dma support). All those
> backends should also have a non-dma version...

In fact that is already there.  Here is the form of async_memcpy for
example:
... async_memcpy( ... )
{
        struct dma_chan *chan = async_tx_find_channel(depend_tx,
                                                      DMA_MEMCPY);
        struct dma_device *device = chan ? chan->device : NULL;
        int int_en = callback ? 1 : 0;
        struct dma_async_tx_descriptor *tx = device ?
                device->device_prep_dma_memcpy(chan, len,
                                               int_en) : NULL;

        if (tx) { /* run the memcpy asynchronously */

                ...

        } else { /* run the memcpy synchronously */

                ...
        }
}

When CONFIG_DMA_ENGINE=n async_tx_find_channel takes the form:
... async_tx_find_channel( ... )
{
        return NULL;
}

So in the S390 case the entire asynchronous path will be compiled away.
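
A self-contained sketch of the same shape, for illustration only. struct copy_chan, find_channel(), copy_maybe_async() and the HAVE_COPY_ENGINE switch are hypothetical names standing in for the async_tx pieces quoted above; they are not the kernel API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offload channel; NULL means no engine is available. */
struct copy_chan {
    int (*submit)(void *dst, const void *src, size_t len);
};

/* Build-time switch standing in for CONFIG_DMA_ENGINE. */
#ifndef HAVE_COPY_ENGINE
#define HAVE_COPY_ENGINE 0
#endif

#if HAVE_COPY_ENGINE
struct copy_chan *find_channel(void);   /* provided by the engine code */
#else
static inline struct copy_chan *find_channel(void)
{
    return NULL;    /* constant result lets the compiler drop the async branch */
}
#endif

/* Returns 1 if the copy was queued asynchronously, 0 if done synchronously. */
int copy_maybe_async(void *dst, const void *src, size_t len)
{
    struct copy_chan *chan = find_channel();

    if (chan && chan->submit(dst, src, len) == 0)
        return 1;

    memcpy(dst, src, len);
    return 0;
}
```

Built without the engine, find_channel() is a constant NULL, so only the memcpy() path can run and the caller needs no #ifdef of its own.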

--
Dan

Re: [PATCH/REDIFF] Cleanup libata HPA support

2007-05-18 Thread Ben Collins

On Fri, 2007-05-18 at 11:06 -0400, Ben Collins wrote:
> (Rediffed against latest git)

Added error check for ata_dev_read_id (Thanks tj)

Also, since hpa is disabled by default, print the native size, even when
HPA isn't asked for (so users and developers can know that it may need
to be used).

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d3ea7f5..3c8eb77 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -814,16 +814,19 @@ void ata_id_c_string(const u16 *id, unsigned char *s,
 
 static u64 ata_tf_to_lba48(struct ata_taskfile *tf)
 {
-   u64 sectors = 0;
+   u64 sectors;
+   u32 high, low;
 
-   sectors |= ((u64)(tf->hob_lbah & 0xff)) << 40;
-   sectors |= ((u64)(tf->hob_lbam & 0xff)) << 32;
-   sectors |= (tf->hob_lbal & 0xff) << 24;
-   sectors |= (tf->lbah & 0xff) << 16;
-   sectors |= (tf->lbam & 0xff) << 8;
-   sectors |= (tf->lbal & 0xff);
+   high = (tf->hob_lbah << 16) |
+  (tf->hob_lbam <<  8) |
+   tf->hob_lbal;
+   low  = (tf->lbah << 16) |
+  (tf->lbam <<  8) |
+   tf->lbal;
 
-   return ++sectors;
+   sectors = ((u64)high << 24) | low;
+
+   return sectors + 1;
 }
 
 static u64 ata_tf_to_lba(struct ata_taskfile *tf)
@@ -965,52 +968,107 @@ static u64 ata_set_native_max_address(struct ata_device *dev, u64 new_sectors)
 }
 
 /**
+ * ata_hpa_get_native_size -   Get the native size of a disk
+ * @dev: Device to get the size for
+ *
+ * Read the size of an LBA28 or LBA48 disk with HPA features and
+ * return the native size. Caller must check that the drive has HPA
+ * feature set enabled.
+ */
+static u64 ata_hpa_get_native_size(struct ata_device *dev)
+{
+   if (ata_id_has_lba48(dev->id))
+   return ata_read_native_max_address_ext(dev);
+   else
+   return ata_read_native_max_address(dev);
+}
+
+
+static u64 ata_hpa_set_native_size(struct ata_device *dev, u64 new_sectors)
+{
+   if (ata_id_has_lba48(dev->id))
+   return ata_set_native_max_address_ext(dev, new_sectors);
+   else
+   return ata_set_native_max_address(dev, new_sectors);
+}
+
+static unsigned long long sectors_to_MB(unsigned long long n)
+{
+   n <<= 9;/* make it bytes */
+   do_div(n, 100); /* make it MB */
+   return n;
+}
+
+static u64 ata_id_n_sectors(const u16 *id);
+
+/**
  * ata_hpa_resize  -   Resize a device with an HPA set
  * @dev: Device to resize
  *
  * Read the size of an LBA28 or LBA48 disk with HPA features and resize
- * it if required to the full size of the media. The caller must check
- * the drive has the HPA feature set enabled.
+ * it if required to the full size of the media.
  */
 
-static u64 ata_hpa_resize(struct ata_device *dev)
+static int ata_hpa_resize(struct ata_device *dev)
 {
-   u64 sectors = dev->n_sectors;
u64 hpa_sectors;
+
+   if (!ata_id_hpa_enabled(dev->id))
+   return 0;
+
+   hpa_sectors = ata_hpa_get_native_size(dev);

-   if (ata_id_has_lba48(dev->id))
-   hpa_sectors = ata_read_native_max_address_ext(dev);
-   else
-   hpa_sectors = ata_read_native_max_address(dev);
+   ata_dev_printk(dev, KERN_INFO, "%s: sectors = %lld, hpa_sectors = %lld\n",
+   __FUNCTION__, dev->n_sectors, hpa_sectors);
 
-   /* if no hpa, both should be equal */
-   ata_dev_printk(dev, KERN_INFO, "%s 1: sectors = %lld, "
-   "hpa_sectors = %lld\n",
-   __FUNCTION__, (long long)sectors, (long long)hpa_sectors);
+   if (ata_ignore_hpa && hpa_sectors > dev->n_sectors) {
+   u64 ret_sectors;
 
-   if (hpa_sectors > sectors) {
ata_dev_printk(dev, KERN_INFO,
-   "Host Protected Area detected:\n"
-   "\tcurrent size: %lld sectors\n"
-   "\tnative size: %lld sectors\n",
-   (long long)sectors, (long long)hpa_sectors);
-
-   if (ata_ignore_hpa) {
-   if (ata_id_has_lba48(dev->id))
-   hpa_sectors = ata_set_native_max_address_ext(dev, hpa_sectors);
-   else
-   hpa_sectors = ata_set_native_max_address(dev,
-   hpa_sectors);
-
-   if (hpa_sectors) {
-   ata_dev_printk(dev, KERN_INFO, "native size "
-   "increased to %lld sectors\n",
-   (long long)hpa_sectors);
-   return hpa_sectors;
+   "Host Protected Area detected.\n"
+   " current size : %llu sectors (%llu MB)\n"
+   " native  size :

Re: [RFC] log out-of-virtual-memory events

2007-05-18 Thread Andrea Righi

Andrea Righi wrote:
> Robin Holt wrote:
>> On Fri, May 18, 2007 at 09:50:03AM +0200, Andrea Righi wrote:
>>> Rik van Riel wrote:
>>>> Andrea Righi wrote:
>>>>> I'm looking for a way to keep track of the processes that fail to
>>>>> allocate new virtual memory. What do you think about the following
>>>>> approach (untested)?
>>>> Looks like an easy way for users to spam syslogd over and
>>>> over and over again.
>>>>
>>>> At the very least, shouldn't this be dependant on print_fatal_signals?
>>>>
>>> Anyway, with print-fatal-signals enabled a user could spam syslogd too, 
>>> simply
>>> with a (char *)0 = 0 program, but we could always identify the spam attempts
>>> logging the process uid...
>>>
>>> In any case, I agree, it should depend on that patch...
>>>
>>> What about adding a simple msleep_interruptible(SOME_MSECS) at the end of
>>> log_vm_enomem() or, at least, a might_sleep() to limit the potential 
>>> spam/second
>>> rate?
>> An msleep will slow down this process, but do nothing about slowing
>> down the amount of logging.  Simply fork a few more processes and all
>> you are doing with msleep is polluting the pid space.
>>
> 
> Very true.
> 
>> What about a throttling similar to what ia64 does for floating point
>> assist faults (handle_fpu_swa()).  There is a thread flag to not log
>> the events at all.  It is rate throttled globally, but uses per cpu
>> variables for early exits.  This algorithm scaled well to a thousand
>> cpus.
> 
> Actually using printk_ratelimit() should be enough... BTW 
> print_fatal_signals()
> should use it too.
> 

I mean, something like this...

---

Limit the rate of the printk()s in print_fatal_signal() to avoid potential DoS
problems.

Signed-off-by: Andrea Righi <[EMAIL PROTECTED]>

diff -urpN linux-2.6.22-rc1-mm1/kernel/signal.c linux-2.6.22-rc1-mm1-limit-print_fatal_signals-rate/kernel/signal.c
--- linux-2.6.22-rc1-mm1/kernel/signal.c 2007-05-18 17:48:55.0 +0200
+++ linux-2.6.22-rc1-mm1-limit-print_fatal_signals-rate/kernel/signal.c 2007-05-18 17:58:13.0 +0200
@@ -790,6 +790,9 @@ static void print_vmas(void)
 
 static void print_fatal_signal(struct pt_regs *regs, int signr)
 {
+   if (unlikely(!printk_ratelimit()))
+   return;
+
printk("%s/%d: potentially unexpected fatal signal %d.\n",
current->comm, current->pid, signr);
 

Re: [PATCH -mm] LZO: Further cleanup of the kernel LZO library headers

2007-05-18 Thread Heikki Orsila

On Fri, May 18, 2007 at 03:51:26PM +0100, Richard Purdie wrote:
> A further cleanup of the kernel LZO library headers which untangles and
> removes ~400 lines of defines. This doesn't change the core minilzo code
> so diffability is maintained.

You should just throw that away. Gupta's implementation is much cleaner;
work with that. Putting a few bounds checks into Gupta's version will
solve crashes and overruns by returning an error.

-- 
Heikki Orsila   Barbie's law:
[EMAIL PROTECTED]   "Math is hard, let's go shopping!"
http://www.iki.fi/shd

Re: [PATCH - 1/1] Documentation/HOWTO

2007-05-18 Thread Robert P. J. Day

On Fri, 18 May 2007, debian developer wrote:
...
> -Managing bug reports
> -
> -
> -One of the best ways to put into practice your hacking skills is by fixing
> -bugs reported by other people. Not only you will help to make the kernel
> -more stable, you'll learn to fix real world problems and you will improve
> -your skills, and other developers will be aware of your presence. Fixing
> bugs is one of the best ways to get merits among other developers, because
> not many people like wasting time fixing other people's bugs.
   ^^^ 

you might want to find a less demeaning term for debugging than
"wasting time."  just a thought.

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-18 Thread Badari Pulavarty

On Fri, 2007-05-18 at 09:33 +0200, Jens Axboe wrote:
> On Thu, May 17 2007, Badari Pulavarty wrote:
> > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > .. 
> > > > > 
> > > > > Ah ok, you need the updated patch series for ppc64 support. Builds 
> > > > > fine
> > > > > here on ppc64. See the #sglist branch of the block repo:
> > > > > 
> > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > 
> > > > > I can mail you an updated patch, if you want.
> > > > 
> > > > 
> > > > Here is the whole panic stack..
> > > 
> > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > and backout this patch and see if it boots?
> > 
> > Yes. It boots fine with following backed out.
> > 
> > Looking at the code ide_probe.c: hwif_init() is doing
> > 
> > hwif->sg_table = kmalloc(sizeof(struct 
> > scatterlist)*hwif->sg_max_nents,
> >  GFP_KERNEL);
> > 
> > blk_rq_map_sg() is looking for the chaining info and going over end of the
> > allocation.
> 
> Hmm, looks ok, I'm guessing it's just missing a memset (or just turn it
> into a kzalloc())?
> 

Even with backing out all the ide changes, I get this on boot
once in a while.

Thanks,
Badari

ReiserFS: hda2: checking transaction log (hda2)
Unable to handle kernel paging request at 005e5e66 RIP:
 [] blk_rq_map_sg+0x71/0x1b0
PGD 0
Oops:  [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.22-rc1-sg #7
RIP: 0010:[]  [] blk_rq_map_sg+0x71/0x1b0
RSP: :8101a024fcc8  EFLAGS: 00010287
RAX: 0001df33e000 RBX: 8101df2b5f70 RCX: 00019f352000
RDX:  RSI: 8101df228300 RDI: 001df33e
RBP: 8101a024fd28 R08: 04e2 R09: 
R10: 007f R11: 0001 R12: 005e5e46
R13: 1000 R14:  R15: 8101df2b5f60
FS:  () GS:8101c021f300() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 005e5e66 CR3: 00201000 CR4: 06e0
Process swapper (pid: 1, threadinfo 8101a0238000, task 810180238000)
Stack:  0003 810179c58000 00019f352000 810179c562c0
 8101df228e80 00170082 01ff81010001 8101df3207a8
 8078a500 810179c56000 8078a500 8101df3207a8
Call Trace:
   [] ide_map_sg+0x42/0xd0
 [] ide_build_sglist+0x2a/0x90
 [] ide_build_dmatable+0x2f/0x180
 [] ide_dma_setup+0x44/0xe0
 [] ide_do_rw_disk+0x349/0x510
 [] ide_do_request+0x622/0xb40
 [] ide_end_request+0x9d/0x160
 [] ide_dma_intr+0x0/0xd0
 [] ide_dma_intr+0x0/0xd0
 [] ide_intr+0x23f/0x250
 [] handle_IRQ_event+0x35/0x70
 [] handle_edge_irq+0xcc/0x150
 [] do_IRQ+0x80/0x100
 [] ret_from_intr+0x0/0xa
   [] kmem_cache_alloc+0x40/0x70
 [] mempool_alloc_slab+0x11/0x20
 [] mempool_alloc+0x42/0x110
 [] generic_make_request+0x198/0x240
 [] bio_alloc_bioset+0x2e/0x120
 [] bio_alloc+0x10/0x20
 [] submit_bh+0x6b/0x140
 [] ll_rw_block+0xd0/0xe0
 [] journal_read+0xb5e/0xec0
 [] zone_statistics+0x61/0xa0
 [] get_page_from_freelist+0x3c8/0x510
 [] __alloc_pages+0x6e/0x330
 [] alloc_page_interleave+0x8d/0xa0
 [] alloc_pages_current+0x86/0x90
 [] get_zeroed_page+0x20/0x40
 [] __pte_alloc_kernel+0x64/0x80
 [] map_vm_area+0x1dc/0x2e0
 [] __vmalloc_area_node+0x157/0x1a0
 [] journal_init+0x819/0x990
 [] __vmalloc_area_node+0x157/0x1a0
 [] __vmalloc_node+0x6f/0x80
 [] __vmalloc+0xe/0x10
 [] reiserfs_fill_super+0x2ba/0xc20
 [] vsnprintf+0x2e7/0x680
 [] snprintf+0x59/0x60
 [] __down_write_nested+0x17/0xc0
 [] strlcpy+0x4f/0x70
 [] test_bdev_super+0x0/0x20
 [] get_sb_bdev+0x13c/0x170
 [] reiserfs_fill_super+0x0/0xc20
 [] get_super_block+0x13/0x20
 [] vfs_kern_mount+0xd8/0x160
 [] do_kern_mount+0x4e/0x100
 [] do_mount+0x4e2/0x790
 [] __d_lookup+0x9c/0x130
 [] do_lookup+0x84/0x200
 [] do_lookup+0x84/0x200
 [] dput+0x24/0x140
 [] __link_path_walk+0x469/0xec0
 [] zone_statistics+0x7d/0xa0
 [] __alloc_pages+0x6e/0x330
 [] alloc_page_interleave+0x8d/0xa0
 [] alloc_pages_current+0x86/0x90
 [] __get_free_pages+0x1b/0x40
 [] copy_mount_options+0x52/0x180
 [] sys_mount+0x94/0xf0
 [] do_mount_root+0x21/0xa0
 [] mount_block_root+0x90/0x220
 [] sys_rmdir+0x11/0x20
 [] mount_root+0xe6/0xf0
 [] prepare_namespace+0xad/0x160
 [] kernel_init+0x23a/0x330
 [] child_rip+0xa/0x12
 [] kernel_init+0x0/0x330
 [] child_rip+0x0/0x12


Code: 49 8b 44 24 20 49 8d 4c 24 20 48 89 c2 48 83 e2 fe a8 01 48
RIP  [] blk_rq_map_sg+0x71/0x1b0
 RSP 
CR2: 005e5e66



Re: [PATCH] Kconfig powernow-k8 driver should depend on ACPI P-States driver

2007-05-18 Thread Dave Jones

On Fri, May 18, 2007 at 12:09:38AM -0400, Ed Sweetman wrote:
 > the previous post i keep referring you to has a patch that was mangled 
 > ...here is the non-mangled version
 > 
 > --- ./linux-backup/arch/x86_64/kernel/cpufreq/Kconfig   2007-02-04 
 > 13:44:54.0 -0500
 > +++ ./linux-2.6.21-rc5-mm2/arch/x86_64/kernel/cpufreq/Kconfig 
 > 2007-05-17 18:13:07.0 -0400
 > @@ -10,20 +10,27 @@
 > 
 >   comment "CPUFreq processor drivers"
 > 
 > -config X86_POWERNOW_K8
 > +config  X86_POWERNOW_K8
 >  tristate "AMD Opteron/Athlon64 PowerNow!"

still has unnecessary whitespace changes

 >  select CPU_FREQ_TABLE
 >  help
 >This adds the CPUFreq driver for mobile AMD Opteron/Athlon64 
 > processors.
 > + An acpi interface is available if acpi support has been selected.
 > + This is required for multi-socket and other systems but not 
 > necessarily required for UP single socket systems.

and still wordwrapped.
(also capitalise ACPI)

Dave

-- 
http://www.codemonkey.org.uk

Re: [patch] CFS scheduler, -v13

2007-05-18 Thread Ingo Molnar


* Michael Lothian <[EMAIL PROTECTED]> wrote:

> Just thought I'd let you know that CFS is working on the PS3

heh, an important milestone i think =B-)

Ingo

Re: [RFC] log out-of-virtual-memory events

2007-05-18 Thread Andrea Righi

Robin Holt wrote:
> On Fri, May 18, 2007 at 09:50:03AM +0200, Andrea Righi wrote:
>> Rik van Riel wrote:
>>> Andrea Righi wrote:
>>>> I'm looking for a way to keep track of the processes that fail to
>>>> allocate new virtual memory. What do you think about the following
>>>> approach (untested)?
>>> Looks like an easy way for users to spam syslogd over and
>>> over and over again.
>>>
>>> At the very least, shouldn't this be dependant on print_fatal_signals?
>>>
>> Anyway, with print-fatal-signals enabled a user could spam syslogd too, 
>> simply
>> with a (char *)0 = 0 program, but we could always identify the spam attempts
>> logging the process uid...
>>
>> In any case, I agree, it should depend on that patch...
>>
>> What about adding a simple msleep_interruptible(SOME_MSECS) at the end of
>> log_vm_enomem() or, at least, a might_sleep() to limit the potential 
>> spam/second
>> rate?
> 
> An msleep will slow down this process, but do nothing about slowing
> down the amount of logging.  Simply fork a few more processes and all
> you are doing with msleep is polluting the pid space.
> 

Very true.

> What about a throttling similar to what ia64 does for floating point
> assist faults (handle_fpu_swa()).  There is a thread flag to not log
> the events at all.  It is rate throttled globally, but uses per cpu
> variables for early exits.  This algorithm scaled well to a thousand
> cpus.

Actually using printk_ratelimit() should be enough... BTW print_fatal_signals()
should use it too.
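
To illustrate why one shared limiter holds up where a per-process msleep does not, here is a small token-bucket sketch. It is not the kernel's printk_ratelimit() implementation; struct ratelimit, ratelimit_init() and ratelimit_allow() are hypothetical names, and time is passed in as a plain tick count so the logic can be followed by hand:

```c
/* Hypothetical limiter state; times are in arbitrary ticks. */
struct ratelimit {
    unsigned long interval; /* ticks needed to earn one message */
    unsigned long burst;    /* maximum messages that can be saved up */
    unsigned long tokens;   /* current budget, in ticks */
    unsigned long last;     /* tick of the previous call */
};

void ratelimit_init(struct ratelimit *rl, unsigned long interval,
                    unsigned long burst, unsigned long now)
{
    rl->interval = interval;
    rl->burst = burst;
    rl->tokens = interval * burst;  /* start with a full burst */
    rl->last = now;
}

/* Returns 1 if the caller may log at time now, 0 if it should stay quiet. */
int ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
    unsigned long cap = rl->interval * rl->burst;

    /* Earn budget for the time elapsed since the last call, up to the cap. */
    rl->tokens += now - rl->last;
    rl->last = now;
    if (rl->tokens > cap)
        rl->tokens = cap;

    /* Spend one interval's worth of budget per message. */
    if (rl->tokens >= rl->interval) {
        rl->tokens -= rl->interval;
        return 1;
    }
    return 0;
}
```

Because the state is shared by every caller, forking more processes does not raise the total logging rate; it only changes which process uses up the budget.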

-Andrea

---

Signed-off-by: Andrea Righi <[EMAIL PROTECTED]>

diff -urpN linux-2.6.21/mm/mmap.c linux-2.6.21-vm-log-enomem/mm/mmap.c
--- linux-2.6.21/mm/mmap.c  2007-04-26 05:08:32.0 +0200
+++ linux-2.6.21-vm-log-enomem/mm/mmap.c 2007-05-18 17:17:32.0 +0200
@@ -77,6 +77,29 @@ int sysctl_max_map_count __read_mostly =
 atomic_t vm_committed_space = ATOMIC_INIT(0);
 
 /*
+ * Print current process informations when it fails to allocate new virtual
+ * memory.
+ */
+static inline void log_vm_enomem(void)
+{
+   unsigned long total_vm = 0;
+   struct mm_struct *mm;
+
+   if (unlikely(!printk_ratelimit()))
+   return;
+
+   task_lock(current);
+   mm = current->mm;
+   if (mm)
+   total_vm = mm->total_vm;
+   task_unlock(current);
+
+   printk(KERN_INFO
+  "out of virtual memory for process %d (%s): total_vm=%lu, uid=%d\n",
+   current->pid, current->comm, total_vm, current->uid);
+}
+
+/*
  * Check that a process has enough memory to allocate a new virtual
  * mapping. 0 means there is enough memory for the allocation to
  * succeed and -ENOMEM implies there is not.
@@ -175,6 +198,7 @@ int __vm_enough_memory(long pages, int c
return 0;
 error:
vm_unacct_memory(pages);
+   log_vm_enomem();
 
return -ENOMEM;
 }

Re: [PATCH 2.6.21-rt2] PowerPC: decrementer clockevent driver

2007-05-18 Thread Sergei Shtylyov


Hello.

Daniel Walker wrote:

>>    Well, the decrementer frequency may change, at least in theory (if the bus
>> clock changes).

> Does that happen very often?

   Never, I hope. :-)

> Daniel


WBR, Sergei

Increased ipw2200 power usage with dynticks

2007-05-18 Thread Björn Steinbrink

Hi,

while playing around with powertop, I noticed that my power usage wasn't
what it used to be. In total idle mode, everything was fine, but as soon
as I load the ipw2200 module and bring up the device, power usage
rises to about 16.8W, while kernels up to 2.6.20 used only about 15.3W. A
day's worth of bisecting later, I could put my finger on dynticks: I tested
2.6.22-rc1 without dynticks and power usage was at least down to about
15.5-15.6W, not perfect, but a lot better.

It seems that ipw2200 wants to do something about its power usage, but
dynticks stops it from doing so, as holding down the "Fn" key, and
thereby generating about 60 additional wakeups per second, brings the
power usage down to about the same as in the non-dynticks case.

Powertop also revealed that, with dynticks, my CPU spends 99% of its time
in C2 when the power usage is high, but about 50% in C3 when I hold down "Fn".

Let me know if you need further details.

Thanks,
Björn

Re: [rfc] increase struct page size?!

2007-05-18 Thread Hugh Dickins

On Fri, 18 May 2007, Nick Piggin wrote:
> 
> If we add 8 bytes to struct page on 64-bit machines, it becomes 64 bytes,
> which is quite a nice number for cache purposes.
> 
> However we don't have to let those 8 bytes go to waste: we can use them
> to store the virtual address of the page, which kind of makes sense for
> 64-bit, because they are likely to use complicated memory models.

Sooner rather than later, don't we need those 8 bytes to expand from
atomic_t to atomic64_t _count and _mapcount?  Not that we really need
all 64 bits of both, but I don't know how to work atomically with less.

(Why do I have this sneaking feeling that you're actually wanting
to stick something into the lower bits of page->virtual?)
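
To make the arithmetic concrete, here is a rough userspace sketch. The
structs are hypothetical 56-byte stand-ins, not the kernel's real struct
page; the field names, the 4096-byte page size and the pack/unpack helpers
are all assumptions made for illustration. It shows that either proposal
(an extra 8-byte virtual address, or widening both counters to 64 bits)
lands on 64 bytes, and why the low bits of a page-aligned address are
tempting as a place to stash flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 56-byte stand-in for a 64-bit struct page (NOT the real layout). */
struct fake_page {
	uint64_t flags;
	int32_t  count;        /* models a 32-bit atomic_t _count */
	int32_t  mapcount;     /* models a 32-bit atomic_t _mapcount */
	uint64_t private_data;
	uint64_t mapping;
	uint64_t index;
	uint64_t lru_next;
	uint64_t lru_prev;
};

/* Option A: append an 8-byte virtual address (56 + 8 = 64 bytes). */
struct fake_page_virtual {
	struct fake_page base;
	uint64_t virtual_addr;
};

/* Option B: widen both counters to 64 bits instead (56 + 4 + 4 = 64 bytes). */
struct fake_page_wide {
	uint64_t flags;
	int64_t  count;        /* models an atomic64_t _count */
	int64_t  mapcount;     /* models an atomic64_t _mapcount */
	uint64_t private_data;
	uint64_t mapping;
	uint64_t index;
	uint64_t lru_next;
	uint64_t lru_prev;
};

/* Assumed page size; a page-aligned address has its low 12 bits clear. */
#define FAKE_PAGE_SIZE UINT64_C(4096)
#define FAKE_TAG_MASK  (FAKE_PAGE_SIZE - 1)

/* Store a small tag in the otherwise-zero low bits of a page address. */
static inline uint64_t virtual_pack(uint64_t page_addr, uint64_t tag)
{
	return (page_addr & ~FAKE_TAG_MASK) | (tag & FAKE_TAG_MASK);
}

/* Recover the page-aligned address from a packed value. */
static inline uint64_t virtual_addr_of(uint64_t packed)
{
	return packed & ~FAKE_TAG_MASK;
}

/* Recover the tag from a packed value. */
static inline uint64_t virtual_tag_of(uint64_t packed)
{
	return packed & FAKE_TAG_MASK;
}
```

Both options consume the same 8 bytes, so in this model they are mutually
exclusive unless something else in the struct shrinks.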

Hugh

Re: [PATCH 2.6.21-rt2] PowerPC: decrementer clockevent driver

2007-05-18 Thread Daniel Walker

On Fri, 2007-05-18 at 19:06 +0400, Sergei Shtylyov wrote:
> Daniel Walker wrote:
> 
> >>I haven't looked at all the new clock/timer code, is there any  
> >>utility in having support for more than one clock source?
> 
> > There is if the main clocksource has some issues where it can't be used.
> 
> You mean, having more than one clocksource is *useful* in this case?

Yes.

> Well, the decrementer frequency may change, at least in theory (if the 
> bus 
> clock changes).

Does that happen very often?

Daniel


Re: [PATCH 2.6.21-rt2] PowerPC: decrementer clockevent driver

2007-05-18 Thread Matt Sealey

I already have that stuff, but it only implements the decrementer (in fact
it's the patch submitted at the beginning of this thread).

I got it because I was far more interested in the GPIO handling..

-- 
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations

Thomas Gleixner wrote:
> On Fri, 2007-05-18 at 15:28 +0100, Matt Sealey wrote:
>>>> I guess the real question is, how high resolution does a high resolution
>>>> timer need to be,
>>> In the order of microseconds.
>> I think both the MPC52xx GPT0-7 and the SLT0-1 fulfil this fairly
>> easily.
> 
> There is some basic work for MPC5200 available:
> 
> http://www.pengutronix.de/oselas/bsp/phytec/index_en.html#phyCORE-MPC5200B-tiny
> 
>   tglx
> 
> 

Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-18 Thread Andrew Morton

On Fri, 18 May 2007 15:17:36 +0300 Zilvinas Valinskas <[EMAIL PROTECTED]> wrote:

> Have found this in dmesg (well earlier because of initcall_debug) I've
> never noticed that during boot (scrolls away too fast). Anyway -
> 
> [7.841871] NetLabel: Initializing
> [7.841983] NetLabel:  domain hash size = 128
> [7.842095] NetLabel:  protocols = UNLABELED CIPSOv4
> [7.842219] NetLabel:  unlabeled traffic allowed by default
> [7.842338] BUG: at include/linux/slub_def.h:77 kmalloc_index()
> [7.842451] 
> [7.842452] Call Trace:
> [7.842677]  [] get_slab+0x1cc/0x260
> [7.842791]  [] __kmalloc+0xd/0x80
> [7.842907]  [] cache_k8_northbridges+0x7e/0x100
> [7.843024]  [] gart_iommu_init+0x33/0x5b0
> [7.843140]  [] netlbl_unlabel_acceptflg_set+0x86/0xf0
> [7.843255]  [] pci_iommu_init+0x9/0x20
> [7.843370]  [] kernel_init+0x157/0x330
> [7.843485]  [] child_rip+0xa/0x12
> [7.843601]  [] acpi_ds_init_one_object+0x0/0x7c
> [7.843715]  [] kernel_init+0x0/0x330
> [7.843829]  [] child_rip+0x0/0x12
> [7.843941] 
> [7.844056] PCI-GART: No AMD northbridge found.


yup, thanks - the below patch will be in this evening's batch -> Linus.



From: Ben Collins <[EMAIL PROTECTED]>

kmalloc for flush_words resulted in zero size allocation when no
k8_northbridges existed.  Short circuit the code path for this case.

Also remove unneeded zeroing of num_k8_northbridges just after checking if
it is zero.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Dave Jones <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/k8.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/k8.c~avoid-zero-size-allocation-in-cache_k8_northbridges arch/x86_64/kernel/k8.c
--- a/arch/x86_64/kernel/k8.c~avoid-zero-size-allocation-in-cache_k8_northbridges
+++ a/arch/x86_64/kernel/k8.c
@@ -39,10 +39,10 @@ int cache_k8_northbridges(void)
 {
 	int i;
 	struct pci_dev *dev;
+
 	if (num_k8_northbridges)
 		return 0;
 
-	num_k8_northbridges = 0;
 	dev = NULL;
 	while ((dev = next_k8_northbridge(dev)) != NULL)
 		num_k8_northbridges++;
@@ -52,6 +52,11 @@ int cache_k8_northbridges(void)
 	if (!k8_northbridges)
 		return -ENOMEM;
 
+	if (!num_k8_northbridges) {
+		k8_northbridges[0] = NULL;
+		return 0;
+	}
+
 	flush_words = kmalloc(num_k8_northbridges * sizeof(u32), GFP_KERNEL);
 	if (!flush_words) {
 		kfree(k8_northbridges);
_
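
For readers who want to poke at the control flow outside the kernel, the
short circuit in the patch above can be modelled roughly as follows. This is
a userspace sketch, not the kernel code: struct bridge_cache, cache_bridges()
and the `found` array are hypothetical names, malloc() stands in for
kmalloc(), and the "one extra slot for a NULL terminator" sizing is an
assumption inferred from the patch writing k8_northbridges[0] when the count
is zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of the cached northbridge state. */
struct bridge_cache {
	int num;                /* models num_k8_northbridges */
	void **bridges;         /* NULL-terminated, models k8_northbridges */
	uint32_t *flush_words;  /* one word per bridge, models flush_words */
};

/*
 * found/nfound model the result of the bus scan; nfound may be 0.
 * Returns 0 on success or -ENOMEM, mirroring the kernel convention.
 */
int cache_bridges(struct bridge_cache *c, void **found, int nfound)
{
	int i;

	if (c->num)
		return 0;       /* already cached */

	/* Assumption: nfound + 1 slots, so this request is never zero-sized. */
	c->bridges = malloc((size_t)(nfound + 1) * sizeof(void *));
	if (!c->bridges)
		return -ENOMEM;

	if (!nfound) {
		/* Short circuit: terminate the list, skip the zero-size request. */
		c->bridges[0] = NULL;
		return 0;
	}

	c->flush_words = malloc((size_t)nfound * sizeof(uint32_t));
	if (!c->flush_words) {
		free(c->bridges);
		c->bridges = NULL;
		return -ENOMEM;
	}

	for (i = 0; i < nfound; i++) {
		c->bridges[i] = found[i];
		c->flush_words[i] = 0;
	}
	c->bridges[nfound] = NULL;
	c->num = nfound;
	return 0;
}
```

With nfound == 0 the function returns before the second allocation, which is
the path that triggered the BUG in kmalloc_index() in the trace quoted above.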

> Does this backtrace look sane? Hmm, netlabel code mixes with
> acpi_ds_init_one_object() ... Strange.

Backtraces can be pretty messy nowadays.  CONFIG_FRAME_POINTER helps
improve them.


Re: [PATCH] x86_64: mcelog tolerant level cleanup

2007-05-18 Thread Tim Hockin

On 5/18/07, Andi Kleen <[EMAIL PROTECTED]> wrote:

>>  * If RIPV is set it is not safe to restart, so set the 'no way out'
>>    flag rather than the 'kill it' flag.
>
> Why? It is not PCC. We cannot return of course, but killing isn't returning.

My understanding is that the absence of RIPV indicates that it is not
safe to restart, period.  Not that the running *task* is not safe, but
that the IP on the stack is not valid to restart at all.

>>  * Don't panic() on correctable MCEs.
>
> The idea behind this was that if you get an exception it is always a bit risky
> because there are a few potential deadlocks that cannot be avoided.
> And normally non UC is just polled which will never cause a panic.
> So I don't quite see the value of this change.

It will still always panic when tolerant == 0, and of course you're
right correctable errors would skip over the panic() path anyway.  I
can roll back the "<0" part, though I don't see the difference now :)

>> This patch also calls nonseekable_open() in mce_open (as suggested by akpm).
>
> That should be a separate patch

Andrew already sucked it into -mm - do you want me to break it out
and re-submit?

>> + 0: always panic on uncorrected errors, log corrected errors
>> + 1: panic or SIGBUS on uncorrected errors, log corrected errors
>> + 2: SIGBUS or log uncorrected errors, log corrected errors
>
> Just saying SIGBUS is misleading because it isn't a catchable
> signal.

Should I change that to "kill"?

> Why did you remove the idle special case?

Because once the other tolerant rules are clarified, it's redundant
for tolerant < 2, and I think it's a bad special case for tolerant ==
2, and it's definitely wrong for tolerant == 3.

Shall I re-roll?

Re: [patch] CFS scheduler, -v13

2007-05-18 Thread Michael Lothian


Just thought I'd let you know that CFS is working on the PS3


neutrino boot # dmesg
Using PS3 machine description
Page orders: linear mapping = 24, virtual = 12, io = 12
Starting Linux PPC64 #1 SMP Fri May 18 09:26:38 UTC 2007
-
ppc64_pft_size= 0x14
physicalMemorySize= 0x800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0x1fff
-
Linux version 2.6.22-rc1-cfs-v13 ([EMAIL PROTECTED]) (gcc version 4.1.1
(Gentoo 4.1.1-r3)) #1 SMP Fri May 18 09:26:38 UTC 2007

It feels more responsive but I shall do more testing and see if there
are any real benefits

On 17/05/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:


> i'm pleased to announce release -v13 of the CFS scheduler patchset.
>
> The CFS patch against v2.6.22-rc1, v2.6.21.1 or v2.6.20.10 can be
> downloaded from the usual place:
>
>  http://people.redhat.com/mingo/cfs-scheduler/
>
> -v13 is a fixes-only release. It fixes a smaller accounting bug, so if
> you saw small lags during desktop use under certain workloads then
> please re-check that workload under -v13 too. It also tweaks SMP
> load-balancing a bit. (Note: the load-balancing artifact reported by
> Peter Williams is not a CFS-specific problem and he reproduced it in
> v2.6.21 too. Nevertheless -v13 should be less prone to such artifacts.)
>
> I know about no open CFS regression at the moment, so please re-test
> -v13 and if you still see any problem please re-report it. Thanks!
>
> Changes since -v12:
>
>  - small tweak: made the "fork flow" of reniced tasks zero-sum
>
>  - debugging update: /proc//sched is now seqfile based and echoing
>    0 to it clears the maximum-tracking counters.
>
>  - more debugging counters
>
>  - small rounding fix to make the statistical average of rounding errors
>    zero
>
>  - scale both the runtime limit and the granularity on SMP too, and make
>    it dependent on HZ
>
>  - misc cleanups
>
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,
>
> Ingo

Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-05-18 Thread Alan Cox

On Fri, 18 May 2007 10:34:35 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Alan Cox wrote:
> >> That shouldn't be a problem, libata default DMA mask is 32 bits (which 
> >> isn't overridden with this controller) and so the block layer will 
> >> bounce any data being read/written above that point with IOMMU or 
> >> swiotlb. The comment is a bit unnecessarily scary.
> > 
> > Adding a BUG_ON for this would be wise. It's trivial to check and a BUG
> > rather than corruption if this assumption ever changes would be far
> > preferable
> 
> The default DMA mask -everywhere- is 32 bits.
> 
> A lot of code will break if this assumption ever changes, not just libata.

Little lesson from history..

Over ten years ago someone (Eric Youngdale I guess) stuck a panic check
for DMA over 16MBytes in the AHA 1542 ISA SCSI driver. Last year it
triggered. The panic probably saved someone from corruption and meant the
bug could be fixed.
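
The kind of check being argued for can be sketched in a few lines. This is a
userspace illustration only: dma_range_ok(), check_dma_or_die() and the two
mask constants are hypothetical names, and abort() merely stands in for what
BUG_ON() would do in a driver. The comparison is written so that addr + len
cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative masks: 32-bit default, and the 16MB ISA limit of the story. */
#define DMA_MASK_32BIT    UINT64_C(0xffffffff)
#define DMA_MASK_ISA_16MB UINT64_C(0x00ffffff)

/* True when every byte of [addr, addr + len) is addressable under mask. */
static inline bool dma_range_ok(uint64_t addr, uint64_t len, uint64_t mask)
{
	if (len == 0)
		return true;
	/* Compare against the remaining room instead of computing addr + len. */
	return addr <= mask && len - 1 <= mask - addr;
}

/* Stand-in for BUG_ON(): stop loudly rather than DMA to a bad address. */
static inline void check_dma_or_die(uint64_t addr, uint64_t len, uint64_t mask)
{
	if (!dma_range_ok(addr, len, mask)) {
		fprintf(stderr, "DMA range %#llx+%#llx exceeds mask %#llx\n",
			(unsigned long long)addr,
			(unsigned long long)len,
			(unsigned long long)mask);
		abort();
	}
}
```

The check costs two comparisons per request, which is the sense in which it
is cheap insurance against an assumption changing years later.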

Alan
