Re: 2.6.20.6 vanilla doesn't boot

2007-04-12 Thread Денис Кирьянов

I showed the dmesg output from the currently running kernel. When booting
kernel 2.6.20.6, I see only the lines I described earlier.

--
Regards,
Denis
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] Apple SMC driver - standardize and sanitize sysfs tree + minor features addition

2007-04-12 Thread Nicolas Boichat
Hi again,

Jean Delvare wrote:
>>
>> However, I'm not really satisfied with the way sysfs files are created:
>> I use a lot of preprocessor macros to avoid repetition of code.
>> The files created with these macros in /sys/devices/platform/applesmc are
>> the following (on a Macbook Pro):
>> fan0_actual_speed
>> fan0_manual
>> fan0_maximum_speed
>> fan0_minimum_speed
>> fan0_safe_speed
>> fan0_target_speed
>> fan1_actual_speed
>> fan1_manual
>> fan1_maximum_speed
>> fan1_minimum_speed
>> fan1_safe_speed
>> fan1_target_speed
>> temperature_0
>> temperature_1
>> temperature_2
>> temperature_3
>> temperature_4
>> temperature_5
>> temperature_6
>> 
>
> First of all, please read Documentation/hwmon/sysfs-interface, and
> rename the entries to match the standard names whenever possible. Also
> make sure that you use the standard units. If you use the standard
> names and units and if you register your device with the hwmon class,
> standard monitoring applications will be able to support your driver.
>   

Fixed.

[snip]

>> Also, I never call any sysfs_remove_* function, as the files are
>> deleted when the module is unloaded. Is it safe to do so? Doesn't it
>> cause any memory leak?
>> 
>
> This is considered a bad practice, as in theory your driver shouldn't
> create the device by itself, and the files are associated to the device,
> not the driver. All hardware monitoring drivers have been fixed now, so
> please add the file removal calls in your driver too. You might find it
> easier to use file groups rather than individual files. Again, see for
> example the f71805f driver, and in particular the f71805f_attributes
> array and f71805f_group structure, and the sysfs_create_group() and
> sysfs_remove_group() calls.
>   

Fixed too.

I also added some sanity checks, and some minor features I discovered
using key enumeration (see next patch).

Best regards,

Nicolas

- Standardize applesmc to use sysfs filenames recommended by
  Documentation/hwmon/sysfs-interface, and register the device with the hwmon
  class.
- Use snprintf instead of sprintf in sysfs show handlers.
- Remove the sysfs files properly in case of initialisation problem, and when
  the driver is unloaded.
- Add data buffer length sanity checks.
- Improvements of SMC keys' comments (add data type reported by the device).
- Add temperature sensors to Macbook Pro.
- Add support for reading fan physical position (e.g. "Left Side")
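
As a rough userspace illustration of the snprintf point in the list above
(this is not the driver's code), a show-style handler can bound its output
to the size of the buffer it was handed. The names show_fan_speed and
SYSFS_BUF_SIZE are hypothetical; in the kernel, the buffer passed to a sysfs
show handler is PAGE_SIZE bytes.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the PAGE_SIZE buffer a sysfs show handler gets. */
#define SYSFS_BUF_SIZE 4096

/*
 * Format a fan speed into buf, never writing more than size bytes
 * (including the terminating NUL). Returns the number of characters
 * actually stored, excluding the NUL, or a negative value on error.
 */
int show_fan_speed(char *buf, size_t size, unsigned int rpm)
{
	int n = snprintf(buf, size, "%u\n", rpm);

	if (n < 0)
		return n;		/* encoding error */
	if ((size_t)n >= size)		/* output was truncated */
		return size ? (int)(size - 1) : 0;
	return n;
}
```

With sprintf there is no size argument at all, so a longer-than-expected
value would simply run past the end of the buffer.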

Signed-off-by: Nicolas Boichat <[EMAIL PROTECTED]>
---

 drivers/hwmon/applesmc.c |  280 --
 1 files changed, 192 insertions(+), 88 deletions(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index f7b59fc..531bc9a 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -37,40 +37,48 @@
 #include 
 #include 
 #include 
+#include 
 
-/* data port used by apple SMC */
+/* data port used by Apple SMC */
 #define APPLESMC_DATA_PORT 0x300
-/* command/status port used by apple SMC */
+/* command/status port used by Apple SMC */
 #define APPLESMC_CMD_PORT  0x304
 
-#define APPLESMC_NR_PORTS  5 /* 0x300-0x304 */
+#define APPLESMC_NR_PORTS  32 /* 0x300-0x31f */
+
+#define APPLESMC_MAX_DATA_LENGTH 32
 
 #define APPLESMC_STATUS_MASK   0x0f
 #define APPLESMC_READ_CMD  0x10
 #define APPLESMC_WRITE_CMD 0x11
 
-#define LIGHT_SENSOR_LEFT_KEY  "ALV0" /* r-o length 6 */
-#define LIGHT_SENSOR_RIGHT_KEY "ALV1" /* r-o length 6 */
-#define BACKLIGHT_KEY  "LKSB" /* w-o */
+#define LIGHT_SENSOR_LEFT_KEY  "ALV0" /* r-o {alv (6 bytes) */
+#define LIGHT_SENSOR_RIGHT_KEY "ALV1" /* r-o {alv (6 bytes) */
+#define BACKLIGHT_KEY  "LKSB" /* w-o {lkb (2 bytes) */
 
-#define CLAMSHELL_KEY  "MSLD" /* r-o length 1 (unused) */
+#define CLAMSHELL_KEY  "MSLD" /* r-o ui8 (unused) */
 
-#define MOTION_SENSOR_X_KEY    "MO_X" /* r-o length 2 */
-#define MOTION_SENSOR_Y_KEY    "MO_Y" /* r-o length 2 */
-#define MOTION_SENSOR_Z_KEY    "MO_Z" /* r-o length 2 */
-#define MOTION_SENSOR_KEY  "MOCN" /* r/w length 2 */
+#define MOTION_SENSOR_X_KEY    "MO_X" /* r-o sp78 (2 bytes) */
+#define MOTION_SENSOR_Y_KEY    "MO_Y" /* r-o sp78 (2 bytes) */
+#define MOTION_SENSOR_Z_KEY    "MO_Z" /* r-o sp78 (2 bytes) */
+#define MOTION_SENSOR_KEY  "MOCN" /* r/w ui16 */
 
-#define FANS_COUNT "FNum" /* r-o length 1 */
-#define FANS_MANUAL    "FS! " /* r-w length 2 */
-#define FAN_ACTUAL_SPEED   "F0Ac" /* r-o length 2 */
-#define FAN_MIN_SPEED  "F0Mn" /* r-o length 2 */
-#define FAN_MAX_SPEED  "F0Mx" /* r-o length 2 */
-#define FAN_SAFE_SPEED "F0Sf" /* r-o length 2 */
-#define FAN_TARGET_SPEED   "F0Tg" /* r-w length 2 */
+#define FANS_COUNT "FNum" /* r-o ui8 */
+#define FANS_MANUAL    "FS! " /* r-w ui16 */
+#define FAN_ACTUAL_SPEED   "F0Ac" /* r-o fpe2 (2 bytes) */
+#define FAN_MIN_SPEED  "F0Mn" /* r-o fpe2 (2 bytes) */
+#define FAN_MAX_SPEED  

RE: sched_yield proposals/rationale

2007-04-12 Thread Buytaert_Steven

> From: Bill Davidsen
>
> And having gotten same, are you going to code up what appears to be a
> solution, based on this feedback?

The feedback was helpful in verifying whether there are any arguments against 
my approach. The real proof is in the pudding.

I'm running a kernel with these changes as we speak. Overall system throughput
is up about 20%. By 'system throughput' I mean the measured performance of a
rather large (experimental) system. The patch isn't even 24h old... The
application latency has also improved.

Additional settings: my patch is running *also* with a kernel modified to have
only 8 default time slices at the 250Hz setting. And no, the overall number of
context switches per second hasn't blown up. The kernel was compiled with low
latency and in-kernel preemption enabled, BKL preemption enabled. I haven't
checked the patch on its own yet.

> I'm curious how well it would run poorly written programs, having
> recently worked with a company which seemed to have a whole part of
> purchasing dedicated to buying same. :-(

So first signs are positive; note that it requires much more run time and a 
slew of other tests/scrutiny before we can be really sure.

W.r.t. the remarks: I am most interested in possibilities of DoS attacks that
could exploit this change in sched_yield. Therefore the comments of Andi were
interesting, but I haven't heard back from him yet. I'm still not sure how a
task could juggle more slices from the system because of these changes.

Last remark, on the O(1)-ness being violated: I think it's a moot point.
sched_yield is executed on the CPU time of the yielder. Being O(1) is most
important for the scheduler proper at each timer tick (interrupt); that being
O(1) is crucial.

Steven Buytaert
--
Perfection is achieved not when there is nothing left to add, but when there
is nothing left to take away. (Antoine de Saint-Exupéry)


2.6.20.6 vanilla doesn't boot

2007-04-12 Thread Денис Кирьянов

Hi all!
I installed a new kernel, 2.6.20.6, and it is unable to boot. During
boot, I get some messages from the kernel, similar to the
following:
PCI: BIOS Bug: MCFG area at x is not E820-reserved
PCI: Not using MMCONFIG
udevplug: make_queue: unable to create /dev/.udev/queue: No such file
or directory
udevplug: make_queue: unable to create /dev/.udev/queue: File exists
sda: assuming drive cache: write through
Then booting stalls, and after about 2 minutes it drops to BusyBox.  Kernel
versions 2.6.18.8 and 2.6.16.44-rc2 boot properly.

the output of dmesg:
[17179569.184000] Linux version 2.6.15-26-686 ([EMAIL PROTECTED]) (gcc
version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)) #1 SMP PREEMPT Fri Sep 8
20:16:40 UTC 2006
[17179569.184000] BIOS-provided physical RAM map:
[17179569.184000]  BIOS-e820:  - 0009fc00 (usable)
[17179569.184000]  BIOS-e820: 0009fc00 - 000a (reserved)
[17179569.184000]  BIOS-e820: 000e4000 - 0010 (reserved)
[17179569.184000]  BIOS-e820: 0010 - 7ffa (usable)
[17179569.184000]  BIOS-e820: 7ffa - 7ffae000 (ACPI data)
[17179569.184000]  BIOS-e820: 7ffae000 - 7ffe (ACPI NVS)
[17179569.184000]  BIOS-e820: 7ffe - 8000 (reserved)
[17179569.184000]  BIOS-e820: ffb0 - 0001 (reserved)
[17179569.184000] 1151MB HIGHMEM available.
[17179569.184000] 896MB LOWMEM available.
[17179569.184000] found SMP MP-table at 000ff780
[17179569.184000] On node 0 totalpages: 524192
[17179569.184000]   DMA zone: 4096 pages, LIFO batch:0
[17179569.184000]   DMA32 zone: 0 pages, LIFO batch:0
[17179569.184000]   Normal zone: 225280 pages, LIFO batch:31
[17179569.184000]   HighMem zone: 294816 pages, LIFO batch:31
[17179569.184000] DMI 2.3 present.
[17179569.184000] ACPI: RSDP (v000 ACPIAM
 ) @ 0x000fad00
[17179569.184000] ACPI: RSDT (v001 A M I  OEMRSDT  0x05000504 MSFT
0x0097) @ 0x7ffa
[17179569.184000] ACPI: FADT (v001 A M I  OEMFACP  0x05000504 MSFT
0x0097) @ 0x7ffa0200
[17179569.184000] ACPI: MADT (v001 A M I  OEMAPIC  0x05000504 MSFT
0x0097) @ 0x7ffa0390
[17179569.184000] ACPI: OEMB (v001 A M I  AMI_OEM  0x05000504 MSFT
0x0097) @ 0x7ffae040
[17179569.184000]   >>> ERROR: Invalid checksum
[17179569.184000] ACPI: MCFG (v001 A M I  OEMMCFG  0x05000504 MSFT
0x0097) @ 0x7ffa8810
[17179569.184000] ACPI: DSDT (v001  A0229 A0229000 0x INTL
0x02002026) @ 0x
[17179569.184000] ACPI: PM-Timer IO Port: 0x808
[17179569.184000] ACPI: Local APIC address 0xfee0
[17179569.184000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[17179569.184000] Processor #0 15:4 APIC version 20
[17179569.184000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[17179569.184000] Processor #1 15:4 APIC version 20
[17179569.184000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
[17179569.184000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
[17179569.184000] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[17179569.184000] IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
[17179569.184000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[17179569.184000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[17179569.184000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[17179569.184000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[17179569.184000] ACPI: IRQ0 used by override.
[17179569.184000] ACPI: IRQ2 used by override.
[17179569.184000] ACPI: IRQ9 used by override.
[17179569.184000] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[17179569.184000] Using ACPI (MADT) for SMP configuration information
[17179569.184000] Allocating PCI resources starting at 8800 (gap:
8000:7fb0)
[17179569.184000] Built 1 zonelists
[17179569.184000] Kernel command line: root=/dev/sda2 ro quiet vga=795
[17179569.184000] mapped APIC to d000 (fee0)
[17179569.184000] mapped IOAPIC to c000 (fec0)
[17179569.184000] Initializing CPU#0
[17179569.184000] PID hash table entries: 4096 (order: 12, 65536 bytes)
[17179569.184000] Detected 3011.186 MHz processor.
[17179569.184000] Using pmtmr for high-res timesource
[17179569.184000] Console: colour dummy device 80x25
[17179572.764000] Dentry cache hash table entries: 131072 (order: 7,
524288 bytes)
[17179572.764000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[17179572.868000] Memory: 2065764k/2096768k available (2115k kernel
code, 29864k reserved, 595k data, 332k init, 1179264k highmem)
[17179572.868000] Checking if this processor honours the WP bit even
in supervisor mode... Ok.
[17179572.948000] Calibrating delay using timer specific routine..
6029.29 BogoMIPS (lpj=12058592)
[17179572.948000] Security Framework v1.0.0 initialized
[17179572.948000] SELinux:  Disabled at boot.
[17179572.948000] Mount-cache hash table entries: 512
[17179572.948000] CPU: After generic identify, caps: bfebfbff 2000

[PATCH]Fix parsing kernelcore boot option for ia64

2007-04-12 Thread Yasunori Goto
Hello.

cmdline_parse_kernelcore() should return a pointer to the rest of the boot
option string, as memparse() does. Otherwise, it causes an infinite loop on
ia64 boxes. This patch is for 2.6.21-rc6-mm1.
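
To illustrate why the caller needs the end pointer back, here is a minimal
userspace sketch under stated assumptions: parse_size() is a simplified
stand-in for memparse(), and find_kernelcore() is a hypothetical scanner
loosely modelled on an option loop, not the efi_init() code itself.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace sketch of a memparse()-like helper: parse a number with an
 * optional K/M/G suffix and store a pointer to the first unparsed
 * character in *retp.
 */
unsigned long long parse_size(const char *p, char **retp)
{
	unsigned long long v = strtoull(p, retp, 0);

	switch (**retp) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*retp)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Hypothetical option scanner. After a recognised option, cp is moved
 * to the end of the parsed value. If the parser did not report that
 * position, cp would never advance and the loop would spin forever.
 */
unsigned long long find_kernelcore(const char *cmdline)
{
	char *cp = (char *)cmdline;
	unsigned long long core = 0;

	while (*cp) {
		if (strncmp(cp, "kernelcore=", 11) == 0)
			core = parse_size(cp + 11, &cp);
		else
			cp++;	/* not an option we know: skip one character */
	}
	return core;
}
```

The only thing that moves cp past a "kernelcore=" match is the pointer
written back through the second argument, which is the role retp plays in
the patch.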

Signed-off-by: Yasunori Goto <[EMAIL PROTECTED]>



 arch/ia64/kernel/efi.c |2 +-
 include/linux/mm.h |2 +-
 mm/page_alloc.c|4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

Index: current_test/arch/ia64/kernel/efi.c
===
--- current_test.orig/arch/ia64/kernel/efi.c 2007-04-12 17:33:28.0 +0900
+++ current_test/arch/ia64/kernel/efi.c 2007-04-13 12:13:21.0 +0900
@@ -424,7 +424,7 @@ efi_init (void)
} else if (memcmp(cp, "max_addr=", 9) == 0) {
max_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
} else if (memcmp(cp, "kernelcore=",11) == 0) {
-   cmdline_parse_kernelcore(cp+11);
+   cmdline_parse_kernelcore(cp+11, &cp);
} else if (memcmp(cp, "min_addr=", 9) == 0) {
min_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
} else {
Index: current_test/mm/page_alloc.c
===
--- current_test.orig/mm/page_alloc.c   2007-04-12 18:25:37.0 +0900
+++ current_test/mm/page_alloc.c 2007-04-13 12:12:58.0 +0900
@@ -3736,13 +3736,13 @@ void __init free_area_init_nodes(unsigne
  * kernelcore=size sets the amount of memory for use for allocations that
  * cannot be reclaimed or migrated.
  */
-int __init cmdline_parse_kernelcore(char *p)
+int __init cmdline_parse_kernelcore(char *p, char **retp)
 {
unsigned long long coremem;
if (!p)
return -EINVAL;
 
-   coremem = memparse(p, &p);
+   coremem = memparse(p, retp);
required_kernelcore = coremem >> PAGE_SHIFT;
 
/* Paranoid check that UL is enough for required_kernelcore */
Index: current_test/include/linux/mm.h
===
--- current_test.orig/include/linux/mm.h 2007-04-11 14:15:33.0 +0900
+++ current_test/include/linux/mm.h 2007-04-13 12:12:20.0 +0900
@@ -1051,7 +1051,7 @@ extern unsigned long find_max_pfn_with_a
 extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
-extern int cmdline_parse_kernelcore(char *p);
+extern int cmdline_parse_kernelcore(char *p, char **retp);
 #ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
 extern int early_pfn_to_nid(unsigned long pfn);
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */

-- 
Yasunori Goto 




Re: [patch 05/10] add "permit user mounts in new namespace" clone flag

2007-04-12 Thread Eric W. Biederman
"Serge E. Hallyn" <[EMAIL PROTECTED]> writes:

> Quoting Miklos Szeredi ([EMAIL PROTECTED]):
>> From: Miklos Szeredi <[EMAIL PROTECTED]>
>> 
>> If CLONE_NEWNS and CLONE_NEWNS_USERMNT are given to clone(2) or
>> unshare(2), then allow user mounts within the new namespace.
>> 
>> This is not flexible enough, because user mounts can't be enabled for
>> the initial namespace.
>> 
>> The remaining clone bits are also getting dangerously few...
>> 
>> Alternatives are:
>> 
>>   - prctl() flag
>>   - setting through the containers filesystem
>
> Sorry, I know I had mentioned it, but this is definitely my least
> favorite approach.
>
> Curious whether there are any other suggestions/opinions from the
> containers list?

Given the existence of shared subtrees, allowing/denying this at the mount
namespace level is silly and wrong.

If we need more than just the filesystem permission checks, can we
make it a mount flag, settable with mount and remount, that gives
non-privileged users the ability to create mount points under it
in directories they have full read/write access to?

I don't like the use of clone flags for this purpose, but in this
case the shared subtrees are a much more fundamental reason for not
doing this at the namespace level.

Eric


[PATCH Trivial Resend] [DOC] Add webpages' URL and summarize 3 lines.

2007-04-12 Thread Miguel Ojeda

Trivial patch, against -rc6. Please apply, thanks.
---

CREDITS:
- Summarize 3 lines into one.
- Add webpage.

MAINTAINERS:
- Add auxdisplay drivers/tree webpages.

CREDITS |7 +++
MAINTAINERS |4 
2 files changed, 7 insertions(+), 4 deletions(-)

Signed-off-by: Miguel Ojeda Sandonis <[EMAIL PROTECTED]>
---
diff --git a/CREDITS b/CREDITS
index 6bd8ab8..f990730 100644
--- a/CREDITS
+++ b/CREDITS
@@ -2573,10 +2573,9 @@ S: Australia

N: Miguel Ojeda Sandonis
E: [EMAIL PROTECTED]
-D: Author: Auxiliary LCD Controller driver (ks0108)
-D: Author: Auxiliary LCD driver (cfag12864b)
-D: Author: Auxiliary LCD framebuffer driver (cfag12864bfb)
-D: Maintainer: Auxiliary display drivers tree (drivers/auxdisplay/*)
+W: http://maxextreme.googlepages.com/
+D: Author of the ks0108, cfag12864b and cfag12864bfb auxiliary display drivers.
+D: Maintainer of the auxiliary display drivers tree (drivers/auxdisplay/*)
S: C/ Mieses 20, 9-B
S: Valladolid 47009
S: Spain
diff --git a/MAINTAINERS b/MAINTAINERS
index 829407f..2a658ef 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -672,6 +672,7 @@ AUXILIARY DISPLAY DRIVERS
P:  Miguel Ojeda Sandonis
M:  [EMAIL PROTECTED]
L:  [EMAIL PROTECTED]
+W: http://auxdisplay.googlepages.com/
S:  Maintained

AVR32 ARCHITECTURE
@@ -884,12 +885,14 @@ CFAG12864B LCD DRIVER
P:  Miguel Ojeda Sandonis
M:  [EMAIL PROTECTED]
L:  [EMAIL PROTECTED]
+W: http://auxdisplay.googlepages.com/
S:  Maintained

CFAG12864BFB LCD FRAMEBUFFER DRIVER
P:  Miguel Ojeda Sandonis
M:  [EMAIL PROTECTED]
L:  [EMAIL PROTECTED]
+W: http://auxdisplay.googlepages.com/
S:  Maintained

COMMON INTERNET FILE SYSTEM (CIFS)
@@ -2020,6 +2023,7 @@ KS0108 LCD CONTROLLER DRIVER
P:  Miguel Ojeda Sandonis
M:  [EMAIL PROTECTED]
L:  [EMAIL PROTECTED]
+W: http://auxdisplay.googlepages.com/
S:  Maintained

LAPB module

--
Miguel Ojeda
http://maxextreme.googlepages.com/index.htm


Re: [KJ][PATCH 02/03]ROUND_UP|DOWN macro cleanup in arch/ia64,x86_64

2007-04-12 Thread Milind Arun Choudhary
On 14:13 Thu 12 Apr , Luck, Tony wrote:
> On Fri, Apr 13, 2007 at 02:01:40AM +0530, Milind Arun Choudhary wrote:
> > -   size = ROUNDUP(size, iovp_size);
> > +   size = ALIGN(size, iovp_size);
> 
> Why is "ALIGN" better than "ROUNDUP"?  I can't see any point
> to this change.
It's janitorial work. I'm trying to clean up
all the corners where ROUNDUP/DOWN and the like are defined.
kernel.h currently has macros like ALIGN, roundup and DIV_ROUND_UP.
In this patch series I've added ALIGN_DOWN and round_down
[waiting for comments on the same].

So, as the ALIGN macro does the same work as ROUNDUP,
  is in a common place
  and is accessible to everyone,
it should be used instead... I think.
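
As a small sketch of the equivalence claimed above: for a power-of-two
granule, a mask-based ALIGN gives the same result as a divide-and-multiply
round-up. The ALIGN below is a simplified form of the kernel macro, and the
ROUNDUP definition is an assumption written for comparison, not copied from
any particular driver; align_size() and roundup_size() are hypothetical
wrappers.

```c
/* Simplified kernel-style ALIGN: round x up to a multiple of a,
 * where a must be a power of two. */
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* A generic driver-local ROUNDUP, shown only for comparison
 * (an assumption, not taken from a specific file). */
#define ROUNDUP(x, y)	((((x) + (y) - 1) / (y)) * (y))

unsigned long align_size(unsigned long size, unsigned long granule)
{
	return ALIGN(size, granule);
}

unsigned long roundup_size(unsigned long size, unsigned long granule)
{
	return ROUNDUP(size, granule);
}
```

The two agree only when the granule is a power of two; the mask form gives
wrong answers for other granules, which is worth checking at each call site
being converted.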

-- 
Milind Arun Choudhary


Re: [Feature Request?] Inline compression of process core dumps

2007-04-12 Thread Jeff Dike
On Thu, Apr 12, 2007 at 10:57:37PM -0400, Christopher S. Aker wrote:
> The process is a UML instance (skas mode, so at least a kernel, 
> userspace, and io thread), which will generate a single, usable, core 
> file just fine with a non-pipe core_pattern...

Yeah, but can you get a core file without the .pid on the end?  I just
tried, with core_pattern == core and core_uses_pid == 0, and I still
got core.pid.

I can fix this on my end - just have to kill off a bunch of things
before aborting.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: [patch 05/10] add "permit user mounts in new namespace" clone flag

2007-04-12 Thread Herbert Poetzl
On Thu, Apr 12, 2007 at 03:32:08PM -0500, Serge E. Hallyn wrote:
> Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > From: Miklos Szeredi <[EMAIL PROTECTED]>
> > 
> > If CLONE_NEWNS and CLONE_NEWNS_USERMNT are given to clone(2) or
> > unshare(2), then allow user mounts within the new namespace.

> > This is not flexible enough, because user mounts can't be enabled for
> > the initial namespace.
> > 
> > The remaining clone bits are also getting dangerously few...

ATM I think we do not have that many CLONE flags
available, so that this feature will have to wait
for a clone2/64 or similar ...

> > Alternatives are:
> > 
> >   - prctl() flag
> >   - setting through the containers filesystem

> Sorry, I know I had mentioned it, but this is definitely my least
> favorite approach.
> 
> Curious whether there are any other suggestions/opinions from the
> containers list?

question: how is mounting filesystems (loopback,
fuse, etc.) secured in such a way that the user
cannot 'create' device nodes with 'unfortunate'
permissions?

TIA,
Herbert

> thanks,
> -serge
> 
> > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> > ---
> > 
> > Index: linux/fs/namespace.c
> > ===
> > --- linux.orig/fs/namespace.c   2007-04-12 13:46:19.0 +0200
> > +++ linux/fs/namespace.c2007-04-12 13:54:36.0 +0200
> > @@ -1617,6 +1617,8 @@ struct mnt_namespace *copy_mnt_ns(int fl
> > return ns;
> > 
> > new_ns = dup_mnt_ns(ns, new_fs);
> > +   if (new_ns && (flags & CLONE_NEWNS_USERMNT))
> > +   new_ns->flags |= MNT_NS_PERMIT_USERMOUNTS;
> > 
> > put_mnt_ns(ns);
> > return new_ns;
> > Index: linux/include/linux/sched.h
> > ===
> > --- linux.orig/include/linux/sched.h2007-04-12 13:26:48.0 
> > +0200
> > +++ linux/include/linux/sched.h 2007-04-12 13:54:36.0 +0200
> > @@ -26,6 +26,7 @@
> >  #define CLONE_STOPPED  0x0200  /* Start in stopped 
> > state */
> >  #define CLONE_NEWUTS   0x0400  /* New utsname group? */
> >  #define CLONE_NEWIPC   0x0800  /* New ipcs */
> > +#define CLONE_NEWNS_USERMNT0x1000  /* Allow user mounts in 
> > ns? */
> > 
> >  /*
> >   * Scheduling policies
> > Index: linux/kernel/fork.c
> > ===
> > --- linux.orig/kernel/fork.c2007-04-11 18:27:46.0 +0200
> > +++ linux/kernel/fork.c 2007-04-12 13:59:10.0 +0200
> > @@ -1586,7 +1586,7 @@ asmlinkage long sys_unshare(unsigned lon
> > err = -EINVAL;
> > if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
> > CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
> > -   CLONE_NEWUTS|CLONE_NEWIPC))
> > +   CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNS_USERMNT))
> > goto bad_unshare_out;
> > 
> > if ((err = unshare_thread(unshare_flags)))
> > 
> > --
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.linux-foundation.org/mailman/listinfo/containers


Re: xmon debugger doc?

2007-04-12 Thread Michael Ellerman
On Thu, 2007-04-12 at 22:48 -0500, Olof Johansson wrote:
> On Thu, Apr 12, 2007 at 03:44:07PM -0500, Steve Wise wrote:
> > Can someone please point me at ppc64 xmon debugger usage /
> > documentation? I've had little luck finding info on-line.
> 
> The help output from it is pretty much all there is.
> 
> You might have better luck asking on [EMAIL PROTECTED] though
> (adding as Cc).
> 
> There's also an old writeup at
> http://mbligh.org/linuxdocs/Kernel/DebuggingPPC64 for the very basics
> of digging through a crash. Some of it is likely out of date by now.

A good trick which the help output doesn't mention is that % and $ are
special in input, so you can do:

0:mon> di %pc

disassemble instructions at address pointed to by register PC

Other regs are eg: %lr, %r1, %r12.
And it works with di, d and other commands.

Also:

0:mon> di $.xmon_register_spus

disassemble instructions at address of symbol .xmon_register_spus.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Andrew Morton wrote:
> On Fri, 13 Apr 2007 12:18:56 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
>
>>> I guess one could generate an answer to the static question with systemtap,
>>> by accumulating running counts across the application lifetime and then
>>> snapshotting them.  Sounds hard though.
>>
>> Can't you just traverse arbitrary kernel data structures at a given point
>> in time, exactly like the /proc/ call is doing?
>
> Do a full pagetable walk, with all the associated locking from within
> a systemtap script?  I'd be surprised.  Maybe if it's mostly hand-coded
> in C, perhaps.  Then you just end up with the same thing, don't you?

And my problem isn't with the hardcoded pagetable walker. Yeah, we'd
probably still keep the pagetable callback walker thingy with Matt's
associated cleanups (and my subsequent ones to clean it up more and
move it to mm/): there are other in-kernel users for that anyway.

The point is the proc API, and exposing random little parts of deep
kernel internals that some people happen to find useful at the time.
(which is why we have an incredible proliferation of these things).

With systemtap scripts, you could walk pagetables and print *the exact
page information you want*, or you could walk pfns, or LRU, or page_tree,
or walk the page tree then the rmap structures. And you can selectively
cull out items you don't care about if you only care about a subset of
items, based on arbitrary criteria. And you can most likely do all that
more efficiently than with a conglomeration of various /proc files
(assuming they even provide what you want in the first place).

--
SUSE Labs, Novell Inc.


Re: xmon debugger doc?

2007-04-12 Thread Olof Johansson
On Thu, Apr 12, 2007 at 03:44:07PM -0500, Steve Wise wrote:
> Can someone please point me at ppc64 xmon debugger usage /
> documentation? I've had little luck finding info on-line.

The help output from it is pretty much all there is.

You might have better luck asking on [EMAIL PROTECTED] though
(adding as Cc).

There's also an old writeup at
http://mbligh.org/linuxdocs/Kernel/DebuggingPPC64 for the very basics
of digging through a crash. Some of it is likely out of date by now.


-Olof


Re: HPA patches

2007-04-12 Thread Kyle McMartin
On Fri, Mar 23, 2007 at 01:03:15PM -0700, Randy Dunlap wrote:
> > It's 0x40. Its a "command dependant bit" - no useful name.
> 
> dependent.  OK, thanks.
> 

Hi,

Pondering about this, it's ATA_LBA according to the docs, specifying
that the address is an LBA.
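
For illustration only, here is how bit 0x40 is typically used when building
the ATA device register for a 28-bit LBA command. ATA_LBA matches the value
discussed above; the helper names are hypothetical and this is not code from
the HPA patches.

```c
#include <stdint.h>

/* Bit 6 (0x40) of the ATA device register selects LBA addressing. */
#define ATA_LBA 0x40

/*
 * Build a device register value for a 28-bit LBA command: set the LBA
 * bit and place LBA bits 27:24 in the low nibble.
 */
uint8_t ata_device_reg_lba28(uint32_t lba)
{
	return (uint8_t)(ATA_LBA | ((lba >> 24) & 0x0f));
}

/* Return non-zero if the device register value has the LBA bit set. */
int ata_device_reg_is_lba(uint8_t device)
{
	return (device & ATA_LBA) != 0;
}
```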

Cheers,
Kyle


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Nick Piggin wrote:
> Andrew Morton wrote:
>
>> Then you just end up with the same thing, don't you?
>
> Well _you_ do, because that happens to be exactly what you want. Bill
> ends up with something that displays page_mapcount instead. And I
> end up with something that traverses LRU lists rather than pfns. And
> none of it goes in /proc/ or linux-2.6/.


Oh, and you get to change it without recompiling and rebooting your
kernel.

--
SUSE Labs, Novell Inc.


[PATCH] USB: BandRich BandLuxe HSDPA Data Card Driver

2007-04-12 Thread Leon Leong
This patch adds detection for the BandRich BandLuxe C100/C100S/C120
HSDPA Data Card.  With the vendor and product IDs set properly,
the data card is detected and works fine.
The patch is against kernel 2.6.20.1.
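
As a simplified userspace sketch of how an ID table like the one in this
patch is consulted: struct dev_id and id_table_match() are hypothetical
stand-ins for struct usb_device_id and the USB core's matching, and only the
three ID values are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* IDs taken from the patch. */
#define BANDRICH_VENDOR_ID		0x1A8D
#define BANDRICH_PRODUCT_C100_1		0x1002
#define BANDRICH_PRODUCT_C100_2		0x1003

/* Simplified, hypothetical stand-in for struct usb_device_id. */
struct dev_id {
	uint16_t vendor;
	uint16_t product;
};

static const struct dev_id id_table[] = {
	{ BANDRICH_VENDOR_ID, BANDRICH_PRODUCT_C100_1 },
	{ BANDRICH_VENDOR_ID, BANDRICH_PRODUCT_C100_2 },
	{ 0, 0 }	/* terminating entry */
};

/* Return 1 if (vendor, product) appears before the terminating entry. */
int id_table_match(uint16_t vendor, uint16_t product)
{
	const struct dev_id *id;

	for (id = id_table; id->vendor || id->product; id++)
		if (id->vendor == vendor && id->product == product)
			return 1;
	return 0;
}
```

A device whose IDs are not listed is never offered to the driver, which is
why adding the two entries is enough for the card to be detected.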

Signed-off-by: Leon Leong <[EMAIL PROTECTED]>

---

Index: drivers/usb/serial/option.c
===
--- linux-2.6.20.1/drivers/usb/serial/option.c.orig 2007-02-05 02:44:54.0 +0800
+++ linux-2.6.20.1/drivers/usb/serial/option.c  2007-04-13 10:36:33.0 +0800
@@ -72,6 +72,7 @@ static int  option_send_setup(struct usb
 #define AUDIOVOX_VENDOR_ID  0x0F3D
 #define NOVATELWIRELESS_VENDOR_ID   0x1410
 #define ANYDATA_VENDOR_ID   0x16d5
+#define BANDRICH_VENDOR_ID  0x1A8D
 
 #define OPTION_PRODUCT_OLD  0x5000
 #define OPTION_PRODUCT_FUSION   0x6000
@@ -84,6 +85,8 @@ static int  option_send_setup(struct usb
 #define AUDIOVOX_PRODUCT_AIRCARD    0x0112
 #define NOVATELWIRELESS_PRODUCT_U740    0x1400
 #define ANYDATA_PRODUCT_ID  0x6501
+#define BANDRICH_PRODUCT_C100_1 0x1002
+#define BANDRICH_PRODUCT_C100_2 0x1003
 
 static struct usb_device_id option_ids[] = {
{ USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_OLD) },
@@ -97,6 +100,8 @@ static struct usb_device_id option_ids[]
{ USB_DEVICE(AUDIOVOX_VENDOR_ID, AUDIOVOX_PRODUCT_AIRCARD) },

{ USB_DEVICE(NOVATELWIRELESS_VENDOR_ID,NOVATELWIRELESS_PRODUCT_U740) },
{ USB_DEVICE(ANYDATA_VENDOR_ID, ANYDATA_PRODUCT_ID) },
+   { USB_DEVICE(BANDRICH_VENDOR_ID, BANDRICH_PRODUCT_C100_1) },
+   { USB_DEVICE(BANDRICH_VENDOR_ID, BANDRICH_PRODUCT_C100_2) },
{ } /* Terminating entry */
 };
 
@@ -112,6 +117,8 @@ static struct usb_device_id option_ids1[
{ USB_DEVICE(AUDIOVOX_VENDOR_ID, AUDIOVOX_PRODUCT_AIRCARD) },

{ USB_DEVICE(NOVATELWIRELESS_VENDOR_ID,NOVATELWIRELESS_PRODUCT_U740) },
{ USB_DEVICE(ANYDATA_VENDOR_ID, ANYDATA_PRODUCT_ID) },
+   { USB_DEVICE(BANDRICH_VENDOR_ID, BANDRICH_PRODUCT_C100_1) },
+   { USB_DEVICE(BANDRICH_VENDOR_ID, BANDRICH_PRODUCT_C100_2) },
{ } /* Terminating entry */
 };

===

-
Leon Leong
[EMAIL PROTECTED]




Re: [Feature Request?] Inline compression of process core dumps

2007-04-12 Thread Christopher S. Aker

Randy Dunlap wrote:

On Thu, 12 Apr 2007 22:22:18 -0400 Christopher S. Aker wrote:


Alan Cox wrote:
 > Indeed. So useful that in current kernels you can set the core dump
 > path to be
 >
 >   "|application"

Cool stuff!  However, it's not working (2.6.20.6):

Core dump to |/home/caker/bin/dumper.pl.4442 pipe failed

even though...

# cat /proc/sys/kernel/core_uses_pid
0
# cat /proc/sys/kernel/core_pattern
|/home/caker/bin/dumper.pl

Looking at the code, it seems to me that format_corename() is appending 
.pid, regardless if !core_uses_pid and corename[0]=='|', in which case 
it creates an invalid path for call_usermodehelper_pipe().


Bug in the code, or bug in my methods?


What are you trying to dump?  is it a multi-thread group app,
not a "simple" app?  I ask because of this (I'm looking at 2.6.21-rc6)
 reference (not that I know what that is):

	if (!pid_in_pattern
	    && (core_uses_pid || atomic_read(&current->mm->mm_users) != 1)) {
		rc = snprintf(out_ptr, out_end - out_ptr,
			      ".%d", current->tgid);
		if (rc > out_end - out_ptr)
			goto out;
		out_ptr += rc;
	}


I saw that too, and unfortunately I don't know what that condition 
represents, either.  It's the only other element in that if statement 
that could make it take that path, so I'm assuming that's part of the 
problem.


The process is a UML instance (skas mode, so at least a kernel, 
userspace, and io thread), which will generate a single, usable, core 
file just fine with a non-pipe core_pattern...


-Chris



Re: [PATCH] Stop pmac_zilog from abusing 8250's device numbers.

2007-04-12 Thread David Lang


On Thu, 12 Apr 2007, Gerhard Mack wrote:


Sometimes it's not the speed, it's the cost. The best I've ever done is
5.5 interfaces per U, although with a better motherboard and case it might
have been different.


I have a bunch of servers from rackable, dual core cpu, 2G ram 2xgigE on the 
motherboard (1x100M on motherboard), 4x Intel E1000 quad port cards, 120G SATA 
drive, DVD burner, floppy


3u, 18 gig ports, just under $5k

if you have 36" deep racks you can put them back to back and have two of these 
in 3u (12 gig ports per u)


not necessarily the cheapest available, but they've been reliable, and there's 
plenty of CPU and RAM to handle firewall tasks.


besides, sometimes you don't want to trust the closed-source vlan 
implementations on the switches ;-)


David Lang


http://innerfire.net/pics/projects/21portfirewall_2.jpg
(assigns each port its IP range and blocks any address not assigned to
that port)


On Thu, 12 Apr 2007, Roland Dreier wrote:


Date: Thu, 12 Apr 2007 08:34:40 -0700
From: Roland Dreier <[EMAIL PROTECTED]>
To: Benny Amorsen <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [PATCH] Stop pmac_zilog from abusing 8250's device numbers.

> Indeed, port density is disappointingly poor in modern servers. Do you
> know any with more than 14 ports per U? (That's an MBX 1U server with
> 8 on-board and a 6-port expansion).

If you really need a ton of ports you could probably build a 1U server
with 2 * 2-port 10gig NICs, and use VLAN-capable switches with 10gig
and 1gig ports to fan out each 10gig link from your server to 10 1-gig
ports.  That would get you 40 ports of 1-gig from each server (plus
whatever the server has on board).

 - R.



--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.




Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Matt Mackall wrote:

On Fri, Apr 13, 2007 at 12:21:25PM +1000, Nick Piggin wrote:


Matt Mackall wrote:


On Fri, Apr 13, 2007 at 11:42:29AM +1000, Nick Piggin wrote:



If kprobes is simply crappy and doesn't work properly for this, then I
could accept that. I'm not someone trying to get this info. So why can't
it be used? (not just for kpagemap, but for clear_refs and all that gunk
too).



kprobes is good for looking at events, but bad for looking at state.
Especially metric shitloads of state.


Why? Why is a kprobes trap significantly more expensive than a read
syscall?



I guess I'm not clear on what you're proposing. From my understanding
of kprobes (admittedly not an expert), this is hard to do and not a
very good match.


But you have an idea that it is bad for exposing lots of data. Why?
(I'm not a kprobes expert either, these are not rhetorical questions)

From what it looks like, you can traverse data structures and copy data
back to userspace. Which is what makes me think it might be suitable
(or could be made suitable).



Maybe. How about LRU? Reclaim performance is bad, and you want to work out
which pages keep going off the end of it, or which pages keep getting
written out via it, or whose pages are on the active list, forcing mine
out.



Those are actually probably a good match for systemtap as they're all 
events.


Traverse the LRU? Which files do they belong to? What process maps them?



-ENOPARSE.


Basically, any "stuff" other than what you're exposing.

--
SUSE Labs, Novell Inc.


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Andrew Morton wrote:

On Fri, 13 Apr 2007 12:18:56 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:



I guess one could generate an answer to the static question with systemtap,
by accumulating running counts across the application lifetime and then
snapshotting them.  Sounds hard though.


Can't you just traverse arbitrary kernel data structures at a given point
in time, exactly like the /proc/ call is doing?



Do a full pagetable walk, with all the associated locking from within
a systemtap script?  I'd be surprised.  Maybe if it's mostly hand-coded
in C, perhaps.


It looks like you can traverse arbitrary data structures, yes.

It definitely seems like you can use some kernel functions, but the
ones I saw may just be systemtap facilities. But what is so surprising
about being able to call a kernel function when running in kernel
context? Perhaps there is some fundamental limitation of kprobes that
I don't understand.


 Then you just end up with the same thing, don't you?


Well _you_ do, because that happens to be exactly what you want. Bill
ends up with something that displays page_mapcount instead. And I
end up with something that traverses LRU lists rather than pfns. And
none of it goes in /proc/ or linux-2.6/.

So it isn't really the same thing at all.

--
SUSE Labs, Novell Inc.


Re: [PATCH] Stop pmac_zilog from abusing 8250's device numbers.

2007-04-12 Thread Gerhard Mack
Sometimes it's not the speed, it's the cost. The best I've ever done is 
5.5 interfaces per U, although with a better motherboard and case it might 
have been different.

http://innerfire.net/pics/projects/21portfirewall_2.jpg
(assigns each port its IP range and blocks any address not assigned to 
that port)


On Thu, 12 Apr 2007, Roland Dreier wrote:

> Date: Thu, 12 Apr 2007 08:34:40 -0700
> From: Roland Dreier <[EMAIL PROTECTED]>
> To: Benny Amorsen <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: [PATCH] Stop pmac_zilog from abusing 8250's device numbers.
> 
>  > Indeed, port density is disappointingly poor in modern servers. Do you
>  > know any with more than 14 ports per U? (That's an MBX 1U server with
>  > 8 on-board and a 6-port expansion).
> 
> If you really need a ton of ports you could probably build a 1U server
> with 2 * 2-port 10gig NICs, and use VLAN-capable switches with 10gig
> and 1gig ports to fan out each 10gig link from your server to 10 1-gig
> ports.  That would get you 40 ports of 1-gig from each server (plus
> whatever the server has on board).
> 
>  - R.

--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.


Re: Cheap lock for user mode processes release when process exits

2007-04-12 Thread Andrew Morton
On Fri, 13 Apr 2007 01:54:28 + (GMT) <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> Maybe someone here knows better.
> 
> I have several user-mode processes using shared mmap.  There can be several 
> reader processes and only one writer.  Readers access the shared region 
> frequently, writer seldom.
> 
> Naturally, a multi-reader/single-writer lock works best.  I tried this with 
> futex on 2.6.9-42.EL.  However, if one of the processes is killed/exits, the 
> lock doesn't get released.
> 
> I can trap signals to release the lock, but not all of them, like kill.
> 
> Is there any way I can achieve this without a potential deadlock?
> 

Robust futexes: http://lwn.net/Articles/172149/

But I don't know whether RH backported them into RHEL4.
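
For anyone on a kernel and C library that do have robust futexes, they are usually reached from userspace through the POSIX robust mutex calls (pthread_mutexattr_setrobust, pthread_mutex_consistent). The following is only a minimal sketch under that assumption: it uses a plain mutex rather than the reader/writer lock asked about, and the helper names shared_lock_create and shared_lock_acquire are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: place a robust, process-shared mutex in an
 * anonymous shared mapping so that forked children can use it too.
 * Returns NULL on failure. */
pthread_mutex_t *shared_lock_create(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (m == MAP_FAILED)
        return NULL;
    if (pthread_mutexattr_init(&attr) != 0)
        goto fail;
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0 ||
        pthread_mutex_init(m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        goto fail;
    }
    pthread_mutexattr_destroy(&attr);
    return m;
fail:
    munmap(m, sizeof(*m));
    return NULL;
}

/* Hypothetical helper: lock, recovering if the previous owner died.
 * Returns 0 on a normal acquire, 1 if the lock was taken over from a
 * dead owner (the protected data may need repair), -1 on error. */
int shared_lock_acquire(pthread_mutex_t *m)
{
    int rc;

    assert(m != NULL);
    rc = pthread_mutex_lock(m);
    if (rc == 0)
        return 0;
    if (rc == EOWNERDEAD) {
        /* The owner exited while holding the lock; mark it usable again. */
        if (pthread_mutex_consistent(m) != 0)
            return -1;
        return 1;
    }
    return -1;
}
```

A return value of 1 is the case the original question is about: the lock is held by the caller, but whatever the dead process was updating in the shared region should be checked before use.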


Re: [Kernel-discuss] Re: [PATCH 3/7] [RFC] Battery monitoring class

2007-04-12 Thread Anton Vorontsov
On Thu, Apr 12, 2007 at 10:34:06PM -0400, Shem Multinymous wrote:
> Hi,
> 
> On 4/12/07, Henrique de Moraes Holschuh <[EMAIL PROTECTED]> wrote:
> >On Fri, 13 Apr 2007, Anton Vorontsov wrote:
> >> * Yup, I've read last discussion regarding batteries, and I've seen
> >>   objections against "charge" term, quoting Shem Multinymous:
> >>
> >>   "And, for the reasons I explained earlier, I strongly suggest not using
> >>   the term "charge" except when referring to the action of charging.
> >>   Hence:
> >>   s/charge_rate/rate/;  s/charge/capacity/"
> >>
> >>   But let's think about it once again? We'll make things much cleaner
> >>   if we drop "capacity" altogether.
> >
> >I stand with Shem on this one.  The people behind the SBS specification
> >seem to agree... that specification is aimed at *engineers* and still
> >avoids the obvious trap of using "charge" due to its high potential for
> >confusion.
> >
> >I don't even want to know how much of a mess the people writing applets
> >would make of it...
> 
> With fixed-units files, having *_energy and *_capacity isn't too clear
> either... Nor is it consistent with SBS, since SBS uses "capacity" to
> refer to either energy or charge, depending on a units attribute.
> 
> As a compromise, how about using "energy" and "charge" for quantities,
> and "charging" (i.e., a verb) when referring to the operation?

It would be a great compromise! Please please please!

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2


Re: [Feature Request?] Inline compression of process core dumps

2007-04-12 Thread Randy Dunlap
On Thu, 12 Apr 2007 22:22:18 -0400 Christopher S. Aker wrote:

> Alan Cox wrote:
>  > Indeed. So useful that in current kernels you can set the core dump
>  > path to be
>  >
>  >"|application"
> 
> Cool stuff!  However, it's not working (2.6.20.6):
> 
>   Core dump to |/home/caker/bin/dumper.pl.4442 pipe failed
> 
> even though...
> 
>   # cat /proc/sys/kernel/core_uses_pid
>   0
>   # cat /proc/sys/kernel/core_pattern
>   |/home/caker/bin/dumper.pl
> 
> Looking at the code, it seems to me that format_corename() is appending 
> .pid, regardless if !core_uses_pid and corename[0]=='|', in which case 
> it creates an invalid path for call_usermodehelper_pipe().
> 
> Bug in the code, or bug in my methods?

What are you trying to dump?  is it a multi-thread group app,
not a "simple" app?  I ask because of this (I'm looking at 2.6.21-rc6)
 reference (not that I know what that is):

	if (!pid_in_pattern
	    && (core_uses_pid || atomic_read(&current->mm->mm_users) != 1)) {
		rc = snprintf(out_ptr, out_end - out_ptr,
			      ".%d", current->tgid);
		if (rc > out_end - out_ptr)
			goto out;
		out_ptr += rc;
	}
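
To make the condition above concrete, here is a minimal userspace model of the suffix decision. It is not the kernel's format_corename(): the function name append_pid_suffix, its parameters, and in particular the is_pipe guard are assumptions for illustration. The guard shows one possible way to leave a "|" pattern untouched so the helper path stays valid even when mm_users != 1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace model of the ".pid" suffix decision.
 *
 * pattern        core_pattern string, e.g. "core" or "|/path/to/helper"
 * pid_in_pattern non-zero if the pattern already contained %p
 * core_uses_pid  value of the core_uses_pid sysctl
 * mm_users       number of users of the dumping task's mm
 * tgid           thread group id that would be appended
 *
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int append_pid_suffix(char *out, size_t out_len, const char *pattern,
                      int pid_in_pattern, int core_uses_pid,
                      int mm_users, int tgid)
{
    int is_pipe;
    int rc;

    assert(out != NULL && pattern != NULL && out_len > 0);

    /* Assumed guard: never append a suffix to a piped pattern. */
    is_pipe = (pattern[0] == '|');

    if (!is_pipe && !pid_in_pattern && (core_uses_pid || mm_users != 1))
        rc = snprintf(out, out_len, "%s.%d", pattern, tgid);
    else
        rc = snprintf(out, out_len, "%s", pattern);

    return (rc < 0 || (size_t)rc >= out_len) ? -1 : 0;
}
```

With core_uses_pid set to 0 and mm_users greater than 1, a plain "core" pattern still gets the suffix, which matches the behaviour reported for the multi-threaded case; only the piped pattern is exempted by the assumed guard.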

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Matt Mackall
On Fri, Apr 13, 2007 at 12:21:25PM +1000, Nick Piggin wrote:
> Matt Mackall wrote:
> >On Fri, Apr 13, 2007 at 11:42:29AM +1000, Nick Piggin wrote:
> 
> >>If kprobes is simply crappy and doesn't work properly for this, then I
> >>could accept that. I'm not someone trying to get this info. So why can't
> >>it be used? (not just for kpagemap, but for clear_refs and all that gunk
> >>too).
> >
> >
> >kprobes is good for looking at events, but bad for looking at state.
> >Especially metric shitloads of state.
> 
> Why? Why is a kprobes trap significantly more expensive than a read
> syscall?

I guess I'm not clear on what you're proposing. From my understanding
of kprobes (admittedly not an expert), this is hard to do and not a
very good match.
 
> >>Maybe. How about LRU? Reclaim performance is bad, and you want to work out
> >>which pages keep going off the end of it, or which pages keep getting
> >>written out via it, or whose pages are on the active list, forcing mine
> >>out.
> >
> >
> >Those are actually probably a good match for systemtap as they're all 
> >events.
> 
> Traverse the LRU? Which files do they belong to? What process maps them?

-ENOPARSE.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [Kernel-discuss] Re: [PATCH 3/7] [RFC] Battery monitoring class

2007-04-12 Thread Shem Multinymous

Hi,

On 4/12/07, Henrique de Moraes Holschuh <[EMAIL PROTECTED]> wrote:

On Fri, 13 Apr 2007, Anton Vorontsov wrote:
> * Yup, I've read last discussion regarding batteries, and I've seen
>   objections against "charge" term, quoting Shem Multinymous:
>
>   "And, for the reasons I explained earlier, I strongly suggest not using
>   the term "charge" except when referring to the action of charging.
>   Hence:
>   s/charge_rate/rate/;  s/charge/capacity/"
>
>   But let's think about it once again? We'll make things much cleaner
>   if we drop "capacity" altogether.

I stand with Shem on this one.  The people behind the SBS specification
seem to agree... that specification is aimed at *engineers* and still
avoids the obvious trap of using "charge" due to its high potential for
confusion.

I don't even want to know how much of a mess the people writing applets
would make of it...


With fixed-units files, having *_energy and *_capacity isn't too clear
either... Nor is it consistent with SBS, since SBS uses "capacity" to
refer to either energy or charge, depending on a units attribute.

As a compromise, how about using "energy" and "charge" for quantities,
and "charging" (i.e., a verb) when referring to the operation?

BTW,  tp_smapi uses "charge" and "charging" interchangeably; that was
a  mistake.

 Shem


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Andrew Morton
On Fri, 13 Apr 2007 12:18:56 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:

> > I guess one could generate an answer to the static question with systemtap,
> > by accumulating running counts across the application lifetime and then
> > snapshotting them.  Sounds hard though.
> 
> Can't you just traverse arbitrary kernel data structures at a given point
> in time, exactly like the /proc/ call is doing?

Do a full pagetable walk, with all the associated locking from within
a systemtap script?  I'd be surprised.  Maybe if it's mostly hand-coded
in C, perhaps.  Then you just end up with the same thing, don't you?


D-Link DFE-580TX 4 port Server Adapter problem: only 2 of 4 ports

2007-04-12 Thread Pallai Roland

 I have a problem with my DFE-580TX cards since installing them in a new
server box. One of the cards definitely worked before in a test box; here
is a dmesg snippet from when everything was OK:

Apr  3 22:10:38 cyrax kernel: sundance.c:v1.2 11-Sep-2006  Written by Donald Becker
Apr  3 22:10:38 cyrax kernel:   http://www.scyld.com/network/sundance.html
Apr  3 22:10:38 cyrax kernel: ACPI: PCI Interrupt :02:04.0[A] -> GSI 21 (level, low) -> IRQ 23
Apr  3 22:10:38 cyrax kernel: eth2: D-Link DFE-580TX 4 port Server Adapter at 0001a800, 00:0d:88:cc:da:dc, IRQ 23.
Apr  3 22:10:38 cyrax kernel: eth2: MII PHY found at address 1, status 0x7809 advertising 01e1.
Apr  3 22:10:38 cyrax kernel: ACPI: PCI Interrupt :02:05.0[A] -> GSI 22 (level, low) -> IRQ 18
Apr  3 22:10:39 cyrax kernel: eth3: D-Link DFE-580TX 4 port Server Adapter at 0001b000, 00:0d:88:cc:da:dd, IRQ 18.
Apr  3 22:10:39 cyrax kernel: eth3: MII PHY found at address 1, status 0x7809 advertising 01e1.
Apr  3 22:10:39 cyrax kernel: ACPI: PCI Interrupt :02:06.0[A] -> GSI 23 (level, low) -> IRQ 19
Apr  3 22:10:39 cyrax kernel: eth4: D-Link DFE-580TX 4 port Server Adapter at 0001b400, 00:0d:88:cc:da:de, IRQ 19.
Apr  3 22:10:39 cyrax kernel: eth4: MII PHY found at address 1, status 0x7809 advertising 01e1.
Apr  3 22:10:39 cyrax kernel: ACPI: PCI Interrupt :02:07.0[A] -> GSI 20 (level, low) -> IRQ 20
Apr  3 22:10:39 cyrax kernel: eth5: D-Link DFE-580TX 4 port Server Adapter at 0001b800, 00:0d:88:cc:da:df, IRQ 20.
Apr  3 22:10:39 cyrax kernel: eth5: MII PHY found at address 1, status 0x7809 advertising 01e1.

 And the current dmesg from the new box, where I get only 2 of the 4 ports on
each card:

sundance.c:v1.2 11-Sep-2006  Written by Donald Becker
  http://www.scyld.com/network/sundance.html
ACPI: PCI Interrupt :05:04.0[A] -> GSI 21 (level, low) -> IRQ 22
eth2: D-Link DFE-580TX 4 port Server Adapter at 00012180, 00:00:00:00:00:00, IRQ 22.
eth2: No MII transceiver found, aborting.  ASIC status 
ACPI: PCI Interrupt :05:05.0[A] -> GSI 22 (level, low) -> IRQ 23
eth2: D-Link DFE-580TX 4 port Server Adapter at 00012100, 00:00:00:00:00:00, IRQ 23.
eth2: No MII transceiver found, aborting.  ASIC status 
ACPI: PCI Interrupt :05:06.0[A] -> GSI 23 (level, low) -> IRQ 19
eth2: D-Link DFE-580TX 4 port Server Adapter at 00012080, 00:0d:88:cc:da:ee, IRQ 19.
eth2: MII PHY found at address 1, status 0x7809 advertising 01e1.
ACPI: PCI Interrupt :05:07.0[A] -> GSI 20 (level, low) -> IRQ 21
eth3: D-Link DFE-580TX 4 port Server Adapter at 00012000, 00:0d:88:cc:da:ef, IRQ 21.
eth3: MII PHY found at address 1, status 0x7809 advertising 01e1.
ACPI: PCI Interrupt :06:04.0[A] -> GSI 22 (level, low) -> IRQ 23
eth4: D-Link DFE-580TX 4 port Server Adapter at 00011180, 00:00:00:00:00:00, IRQ 23.
eth4: No MII transceiver found, aborting.  ASIC status 
ACPI: PCI Interrupt :06:05.0[A] -> GSI 21 (level, low) -> IRQ 22
eth4: D-Link DFE-580TX 4 port Server Adapter at 00011100, 00:00:00:00:00:00, IRQ 22.
eth4: No MII transceiver found, aborting.  ASIC status 
ACPI: PCI Interrupt :06:06.0[A] -> GSI 20 (level, low) -> IRQ 21
eth4: D-Link DFE-580TX 4 port Server Adapter at 00011080, 00:0d:88:cc:da:de, IRQ 21.
eth4: MII PHY found at address 1, status 0x7809 advertising 01e1.
ACPI: PCI Interrupt :06:07.0[A] -> GSI 23 (level, low) -> IRQ 19
eth5: D-Link DFE-580TX 4 port Server Adapter at 00011000, 00:0d:88:cc:da:df, IRQ 19.
eth5: MII PHY found at address 1, status 0x7809 advertising 01e1.


 Kernel version is vanilla 2.6.20.3, with Sundance MMIO disabled in the config.
I can send lspci output, the full dmesg, and the .config if anyone is
interested. Maybe it's a BIOS problem?
What should I try?


thanks,
--
 d



Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Matt Mackall wrote:

On Thu, Apr 12, 2007 at 06:57:23PM -0700, Andrew Morton wrote:


I guess one could generate an answer to the static question with systemtap,
by accumulating running counts across the application lifetime and then
snapshotting them.  Sounds hard though.



You'd have to do it from boot onward to get a complete system image.
One way to look at it is that systemtap can give you the derivative of
the information, and you have to integrate it.


So everyone keeps saying.

Would you tell me why you can't just traverse the data structures
in the same way as your proc handler? From the systemtap example
scripts it seems like you can traverse arbitrary kernel data
structures.

--
SUSE Labs, Novell Inc.


Re: [PATCH 0/4] i386 - pte update optimizations

2007-04-12 Thread Zachary Amsden

H. Peter Anvin wrote:

Zachary Amsden wrote:

Some PTE optimizations for native and paravirt-ops kernels; this
provides a huge win for shadow mode hypervisors and gets rid of
some unnecessary atomic instructions in native kernels, saving
even more on UP by getting rid of implicit LOCK on xchg instruction.


You do know that P6 and higher don't do locked bus references as long 
as the value is in the cache, right?


Yes.  Even then, last time I clocked instructions, xchg was still slower 
than read / write, although I could be misremembering.  And it's not 
totally clear that they will always be in cached state, however, and for 
SMP, we still want to drop the implicit lock in cases where the 
processor might not know they are cached exclusive, but we know there 
are no other racing users.  And there are plenty of old processors out 
there to still make it worthwhile.


Zach


Re: [Feature Request?] Inline compression of process core dumps

2007-04-12 Thread Christopher S. Aker

Alan Cox wrote:
> Indeed. So useful that in current kernels you can set the core dump
> path to be
>
>"|application"

Cool stuff!  However, it's not working (2.6.20.6):

Core dump to |/home/caker/bin/dumper.pl.4442 pipe failed

even though...

# cat /proc/sys/kernel/core_uses_pid
0
# cat /proc/sys/kernel/core_pattern
|/home/caker/bin/dumper.pl

Looking at the code, it seems to me that format_corename() is appending 
.pid, regardless if !core_uses_pid and corename[0]=='|', in which case 
it creates an invalid path for call_usermodehelper_pipe().


Bug in the code, or bug in my methods?

-Chris


Re: [PATCH 2/3] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-12 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
>>Still leaks the device
> 
> 
> I explained this in a previous response, and you seemed to be ok with
> the explanation.  Can you elaborate if this is still an issue?


I'm OK with allocating subqueues even for single queue devices, not
with leaking memory on error.

>>I stand by my point, this needs to be explicitly enabled by 
>>the user since it changes the behaviour of prio on multiqueue 
>>capable device.
> 
> 
> The user can enable this by using TC filters. I do understand what you
> mean that PRIO's behavior somewhat changes because of how the queues
> turn off and on, but how is this different than today?  Today, if the
> queue on the NIC shuts down, all the PRIO queues are down.


Right. And if the queue is enabled again, bands continue to be dequeued
in strict priority order.

> This way
> it's actually helping get traffic out.  I don't see how the user can
> control which queue is shut down; that is a function of how congested
> the network is.  So if I can clarify what you're saying, are you asking
> that the user actually setup the band to queue mapping?  Because if so,
> I don't see how that would help since queues today don't have any
> priority, and you would have no control which one stops over another
> one.


No, I'm asking that the user explicitly state that he wants the driver
to control which bands are dequeued (by stopping and starting subqueues)
and not the established strict priority order. You assume everyone using
prio on e1000 wants to use multiple HW queues, which is not necessarily
true. Additionally the prio qdisc might be used as child of a classful
qdisc that assumes it can always dequeue packets as long as q.qlen > 0
(HFSC for example will complain if it can't since that is a
configuration error).

So I'm asking that you only enable this behaviour if the user does
something like this:

tc qdisc add dev eth0 root handle 1: prio bands N multiqueue

Ideally the band2queue mapping would be supplied by userspace as well,
but currently that doesn't seem to be possible in a clean way since
userspace has no way of finding out how many queues the HW supports.
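
The behavioural change discussed in this message can be sketched with a small userspace model. The structure and function names are hypothetical and do not match the kernel's prio qdisc. The model only shows that once stopped subqueues are skipped, dequeue order is no longer strictly by priority, and a dequeue can come back empty even though packets are still queued.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BANDS 16

/* Hypothetical model of a prio qdisc: band 0 is the highest priority. */
struct prio_model {
    size_t nbands;
    unsigned int qlen[MAX_BANDS];   /* packets queued per band */
    int stopped[MAX_BANDS];         /* non-zero: HW subqueue is stopped */
};

/* Established behaviour: always serve the first non-empty band.
 * Returns the band dequeued from, or -1 if every band is empty. */
int dequeue_strict(struct prio_model *q)
{
    size_t i;

    assert(q != NULL && q->nbands <= MAX_BANDS);
    for (i = 0; i < q->nbands; i++) {
        if (q->qlen[i] > 0) {
            q->qlen[i]--;
            return (int)i;
        }
    }
    return -1;
}

/* Multiqueue behaviour: skip bands whose subqueue is stopped, so a
 * lower-priority band may be served ahead of a higher-priority one.
 * Returns the band dequeued from, or -1 if nothing could be sent. */
int dequeue_multiqueue(struct prio_model *q)
{
    size_t i;

    assert(q != NULL && q->nbands <= MAX_BANDS);
    for (i = 0; i < q->nbands; i++) {
        if (q->stopped[i])
            continue;
        if (q->qlen[i] > 0) {
            q->qlen[i]--;
            return (int)i;
        }
    }
    return -1;
}
```

The -1 result with a non-zero queue length is the situation a parent qdisc such as HFSC would treat as a configuration error, which is why an explicit opt-in is being asked for.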



Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Matt Mackall wrote:

On Fri, Apr 13, 2007 at 11:42:29AM +1000, Nick Piggin wrote:



If kprobes is simply crappy and doesn't work properly for this, then I
could accept that. I'm not someone trying to get this info. So why can't
it be used? (not just for kpagemap, but for clear_refs and all that gunk
too).



kprobes is good for looking at events, but bad for looking at state.
Especially metric shitloads of state.


Why? Why is a kprobes trap significantly more expensive than a read
syscall?


Maybe. How about LRU? Reclaim performance is bad, and you want to work out
which pages keep going off the end of it, or which pages keep getting
written out via it, or whose pages are on the active list, forcing mine
out.



Those are actually probably a good match for systemtap as they're all events.


Traverse the LRU? Which files do they belong to? What process maps them?

--
SUSE Labs, Novell Inc.


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Andrew Morton wrote:

On Fri, 13 Apr 2007 11:42:29 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:



Maybe. How about LRU? Reclaim performance is bad, and you want to work out
which pages keep going off the end of it, or which pages keep getting
written out via it, or whose pages are on the active list, forcing mine
out.



I guess we have static analysis versus dynamic.  The interfaces which Matt
is proposing are suited to answering the question "what is my memory being
used for" (static).  They're unlikely to be useful for answering the question
"what's happening in the VM" (dynamic).  Systemtap is probably better for the
dynamic analysis.


"what is my memory being used for *now*" ;)



I guess one could generate an answer to the static question with systemtap,
by accumulating running counts across the application lifetime and then
snapshotting them.  Sounds hard though.


Can't you just traverse arbitrary kernel data structures at a given point
in time, exactly like the /proc/ call is doing?

--
SUSE Labs, Novell Inc.


Re: [Kernel-discuss] Re: [PATCH 3/7] [RFC] Battery monitoring class

2007-04-12 Thread Anton Vorontsov
On Thu, Apr 12, 2007 at 09:51:12PM -0300, Henrique de Moraes Holschuh wrote:
> On Fri, 13 Apr 2007, Anton Vorontsov wrote:
> > Let's name attributes with mWh units as {min_,max_,design_,}energy,
> > and attributes with mAh units as {min_,max_,design_,}charge.
> 
> [...]
> 
> > * Yup, I've read last discussion regarding batteries, and I've seen
> >   objections against "charge" term, quoting Shem Multinymous:
> > 
> >   "And, for the reasons I explained earlier, I strongly suggest not using
> >   the term "charge" except when referring to the action of charging.
> >   Hence:
> >   s/charge_rate/rate/;  s/charge/capacity/"
> > 
> >   But let's think about it once again? We'll make things much cleaner
> >   if we drop "capacity" altogether.
> 
> I stand with Shem on this one.  The people behind the SBS specification
> seem to agree... that specification is aimed at *engineers* and still
> avoids the obvious trap of using "charge" due to its high potential for
> confusion.
> 
> I don't even want to know how much of a mess the people writing applets
> would make of it...

:-(

Okay, the term "charge" is out of scope, I guess. But can we use "capacity"
for xAh, and "energy" for xWh? I'm just trying to separate these terms
somehow, and avoid the "_units" stuff.

> 
> > > That said, you may need to use uWh and uAh instead of mAh and mWh, though.
> > 
> > Not sure. Is there any existing chip that can report uAh/uWh? That is
> > great precision.
> 
> The way things are going, it should be feasible for small embedded systems
> quite soon.  Refer to the previous thread.

I see... is it also applicable to currents and voltages? I.e. should we
use uA and uV from the start?

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Matt Mackall
On Thu, Apr 12, 2007 at 06:57:23PM -0700, Andrew Morton wrote:
> I guess one could generate an answer to the static question with systemtap,
> by accumulating running counts across the application lifetime and then
> snapshotting them.  Sounds hard though.

You'd have to do it from boot onward to get a complete system image.
One way to look at it is that systemtap can give you the derivative of
the information, and you have to integrate it.
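
A minimal sketch of the analogy above, not taken from the thread: an event
trace records only changes, so the current state can be recovered only by
summing every change on top of a known starting state. The names
`struct mem_event` and `integrate_events` are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical event record: a positive delta is an allocation,
 * a negative delta is a free, both counted in pages. */
struct mem_event {
    long delta_pages;
};

/* "Integrate" a stream of events on top of a known starting state.
 * The result is only correct if initial_pages is correct, which in
 * practice means tracing from boot, where the initial state is zero. */
long integrate_events(long initial_pages,
                      const struct mem_event *ev, size_t n)
{
    long pages = initial_pages;

    for (size_t i = 0; i < n; i++)
        pages += ev[i].delta_pages;
    return pages;
}
```

If the tracer attaches after the first few events and assumes it started
from zero, the sum no longer matches the real state, which is the
"from boot onward" problem.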

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Matt Mackall wrote:
> On Fri, Apr 13, 2007 at 11:01:41AM +1000, Nick Piggin wrote:
>
> > > Basically: to show what the hell's going on in the VM.
> >
> > kprobes / systemtap isn't good enough?
>
> It's not really a good match to the kprobes model. I'm not interested
> in events, per se. I don't want to need to know about every single
> alloc/free of N different varieties integrated from boot onward to
> build up an image of the state of the system. Instead, I want to take
> snapshots of the state of the VM.

Systemtap can't output a large set of values?

Why can't you attach a kprobe to a dummy syscall, and from there
iterate over pgdat/zone/memmap and output what you want?

Actually I'm surprised that kind of data querying facility isn't
already in there (I haven't used it seriously though).

> The main goal here is to be able to answer the question "where's my
> memory going?". Currently you can't really give a good answer to that
> question from userspace because of shared mappings, etc.
>
> There are lots of secondary questions that follow on very quickly from
> that, like "what parts of my shared mappings are or aren't shared, and
> why?", "what's actually in my application's working set?" and "how much
> of this crap can I ditch?".

I understand roughly what you want, and that you can't easily get
it from /proc currently. My question at this point is just why can
we not use systemtap.

--
SUSE Labs, Novell Inc.


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Matt Mackall
On Fri, Apr 13, 2007 at 11:42:29AM +1000, Nick Piggin wrote:
> >Instead, one says "what pages are being used by my application", then, for
> 
> That includes unmapped pagecache being used by my application, doesn't it?
> Maybe that's too hard to do via /proc so we forget about it...

It'd be really nice to have a window into the pagecache too. But I for
one couldn't come up with a sensible scheme for it.

> >each of those pages "what is that page's state".  So the first step is to
> >collect all the pfns from /proc/$(pidof my-application)/pagemap and then to
> >use those pfns to look the individual pages up in /proc/kpagemap.
> 
> OK I realise you could do it that way, but systemtap can definitely be
> used as a tool for understanding application behaviour in the context of
> the kernel, I think? The purpose for it is so that various little bits
> of deep kernel internals do not have to be exposed on a case by case basis.
> 
> If kprobes is simply crappy and doesn't work properly for this, then I
> could accept that. I'm not someone trying to get this info. So why can't
> it be used? (not just for kpagemap, but for clear_refs and all that gunk
> too).

kprobes is good for looking at events, but bad for looking at state.
Especially metric shitloads of state.

> > If you really want to know "who is using page 123435" then you'd need to
> > search /proc/*/pagemap.  There are possibly legitimate reasons why an
> > application developer would want to at least partially perform such an
> > operation ("who am I sharing with"), but I doubt if it's the common case.
> 
> Maybe. How about LRU? Reclaim performance is bad, and you want to work out
> which pages keep going off the end of it, or which pages keep getting
> written out via it, or whose pages are on the active list, forcing mine
> out.

Those are actually probably a good match for systemtap as they're all events.

-- 
Mathematics is the supreme nostalgia of our time.


RE: [PATCH 2/3] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-12 Thread Waskiewicz Jr, Peter P
> -Original Message-
> From: Patrick McHardy [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, April 12, 2007 5:16 PM
> To: Waskiewicz Jr, Peter P
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
> [EMAIL PROTECTED]; [EMAIL PROTECTED]; cramerj; 
> Kok, Auke-jan H; Leech, Christopher
> Subject: Re: [PATCH 2/3] NET: [UPDATED] Multiqueue network 
> device support implementation.
> 
> Peter P Waskiewicz Jr wrote:
> > diff --git a/net/core/dev.c b/net/core/dev.c index 219a57f..3ce449e 
> > 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1471,6 +1471,8 @@ gso:
> > q = dev->qdisc;
> > if (q->enqueue) {
> > rc = q->enqueue(skb, q);
> > +   /* reset queue_mapping to zero */
> > +   skb->queue_mapping = 0;
> 
> 
> This must be done before enqueueing. At this point you don't 
> even have a valid reference to the skb anymore.

Agreed, this is a transcription error on my part between my dev box and
this tree.

> 
> > @@ -3326,12 +3330,23 @@ struct net_device *alloc_netdev(int 
> sizeof_priv, const char *name,
> > if (sizeof_priv)
> > dev->priv = netdev_priv(dev);
> >  
> > +   alloc_size = (sizeof(struct net_device_subqueue) * queue_count);
> > + 
> > +   p = kzalloc(alloc_size, GFP_KERNEL);
> > +   if (!p) {
> > +   printk(KERN_ERR "alloc_netdev: Unable to 
> allocate queues.\n");
> > +   return NULL;
> 
> 
> Still leaks the device

I explained this in a previous response, and you seemed to be ok with
the explanation.  Can you elaborate if this is still an issue?

> 
> > diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 
> > 5cfe60b..6a38905 100644
> > --- a/net/sched/sch_prio.c
> > +++ b/net/sched/sch_prio.c
> > @@ -144,11 +152,17 @@ prio_dequeue(struct Qdisc* sch)
> > struct Qdisc *qdisc;
> >  
> > for (prio = 0; prio < q->bands; prio++) {
> > -   qdisc = q->queues[prio];
> > -   skb = qdisc->dequeue(qdisc);
> > -   if (skb) {
> > -   sch->q.qlen--;
> > -   return skb;
> > +   /* Check if the target subqueue is available before
> > +* pulling an skb.  This way we avoid excessive requeues
> > +* for slower queues.
> > +*/
> > +   if (!netif_subqueue_stopped(sch->dev, 
> q->band2queue[prio])) {
> > +   qdisc = q->queues[prio];
> > +   skb = qdisc->dequeue(qdisc);
> > +   if (skb) {
> > +   sch->q.qlen--;
> > +   return skb;
> > +   }
> > }
> > }
> > return NULL;
> > @@ -200,6 +214,10 @@ static int prio_tune(struct Qdisc 
> *sch, struct rtattr *opt)
> > struct prio_sched_data *q = qdisc_priv(sch);
> > struct tc_prio_qopt *qopt = RTA_DATA(opt);
> > int i;
> > +   int queue;
> > +   int qmapoffset;
> > +   int offset;
> > +   int mod;
> >  
> > if (opt->rta_len < RTA_LENGTH(sizeof(*qopt)))
> > return -EINVAL;
> > @@ -242,6 +260,30 @@ static int prio_tune(struct Qdisc 
> *sch, struct rtattr *opt)
> > }
> > }
> > }
> > +   /* setup queue to band mapping */
> > +   if (q->bands < sch->dev->egress_subqueue_count) {
> > +   qmapoffset = 1;
> > +   mod = sch->dev->egress_subqueue_count;
> > +   } else {
> > +   mod = q->bands % sch->dev->egress_subqueue_count;
> > +   qmapoffset = q->bands / 
> sch->dev->egress_subqueue_count +
> > +   ((mod) ? 1 : 0);
> > +   }
> > +
> > +   queue = 0;
> > +   offset = 0;
> > +   for (i = 0; i < q->bands; i++) {
> > +   q->band2queue[i] = queue;
> > +   if ( ((i + 1) - offset) == qmapoffset) {
> > +   queue++;
> > +   offset += qmapoffset;
> > +   if (mod)
> > +   mod--;
> > +   qmapoffset = q->bands /
> > +   sch->dev->egress_subqueue_count +
> > +   ((mod) ? 1 : 0);
> > +   }
> > +   }
> > return 0;
> >  }
> 
> 
> I stand by my point, this needs to be explicitly enabled by 
> the user since it changes the behaviour of prio on multiqueue 
> capable device.
> 

The user can enable this by using TC filters.  I do understand what you
mean that PRIO's behavior somewhat changes because of how the queues
turn off and on, but how is this different than today?  Today, if the
queue on the NIC shuts down, all the PRIO queues are down.  This way
it's actually helping get traffic out.  I don't see how the user can
control which queue is shut down; that is a function of how congested
the network is.  So if I can clarify what you're saying, are you asking
that the user actually setup the band to queue mapping?  Because if so,
I don't see how that would help since queues today don't have any
priority, and you would have no control which one stops 
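
For reference, the band-to-queue mapping loop from the quoted prio_tune()
hunk can be lifted into a standalone function to see what it produces.
This is a sketch for experimentation only: `map_bands_to_queues` and its
parameters are hypothetical names, with `bands` standing in for q->bands
and `queues` for sch->dev->egress_subqueue_count.

```c
/* Fill band2queue[0..bands-1] with a subqueue index per band,
 * following the loop in the quoted prio_tune() hunk.
 * Assumes bands > 0 and queues > 0. */
void map_bands_to_queues(int bands, int queues, int *band2queue)
{
    int mod, qmapoffset;
    int queue = 0, offset = 0;

    if (bands < queues) {
        /* fewer bands than queues: one band per queue */
        qmapoffset = 1;
        mod = queues;
    } else {
        /* spread the remainder over the first (bands % queues) queues */
        mod = bands % queues;
        qmapoffset = bands / queues + (mod ? 1 : 0);
    }

    for (int i = 0; i < bands; i++) {
        band2queue[i] = queue;
        if ((i + 1) - offset == qmapoffset) {
            queue++;
            offset += qmapoffset;
            if (mod)
                mod--;
            qmapoffset = bands / queues + (mod ? 1 : 0);
        }
    }
}
```

With 5 bands and 2 queues this yields 0,0,0,1,1; with 3 bands and 4 queues
it yields 0,1,2, leaving the last queue unused.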

Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Andrew Morton
On Fri, 13 Apr 2007 11:42:29 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Fri, 13 Apr 2007 11:14:20 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
> > 
> > 
> >>Andrew Morton wrote:
> 
> >>>It *will* be viable.  If the application wants to know if a page is dirty,
> >>>it looks up "PG_dirty" in /proc/pg_foo-to-bitnumber and uses PG_dirty's
> >>>numerical offset when inspecting fields in /proc/kpagemap.  If correctly
> >>>designed, such a monitoring application will be able to report upon page
> >>>flags which we haven't even thought up yet.
> >>
> >>Ooh, you wanted a _runtime_ mapping of flags, yeah then I guess that works.
> >>Still seems like a basically hit and miss affair to just use flags. What if
> >>you want to know the process mapping a page? With systemtap or something you
> >>could walk the rmap structures. What if you want to look at pages along the
> >>LRU list rather than per-pfn? What about connecting pages to inodes?
> > 
> > 
> > Well hang on.  This isn't a tool for understanding kernel behaviour.  It's
> > a tool for understanding application behaviour.
> > 
> > So one doesn't ask "who is mapping that page" - that's a kernel developer
> > thing.
> > 
> > Instead, one says "what pages are being used by my application", then, for
> 
> That includes unmapped pagecache being used by my application, doesn't it?
> Maybe that's too hard to do via /proc so we forget about it...

Yes, harder.  I'm hoping that sampling of /proc/pid/io can be used to
determine pagecache use sufficiently accurately.  I know of one large
hosting company who are using it ("BTW, we are making great use of
taskstats!!  Its GREAT!")

> 
> > each of those pages "what is that page's state".  So the first step is to
> > collect all the pfns from /proc/$(pidof my-application)/pagemap and then to
> > use those pfns to look the individual pages up in /proc/kpagemap.
> 
> OK I realise you could do it that way, but systemtap can definitely be
> used as a tool for understanding application behaviour in the context of
> the kernel, I think? The purpose for it is so that various little bits
> of deep kernel internals do not have to be exposed on a case by case basis.
> 
> If kprobes is simply crappy and doesn't work properly for this, then I
> could accept that. I'm not someone trying to get this info. So why can't
> it be used? (not just for kpagemap, but for clear_refs and all that gunk
> too).
> 
>  > If you really want to know "who is using page 123435" then you'd need to
>  > search /proc/*/pagemap.  There are possibly legitimate reasons why an
>  > application developer would want to at least partially perform such an
>  > operation ("who am I sharing with"), but I doubt if it's the common case.
> 
> Maybe. How about LRU? Reclaim performance is bad, and you want to work out
> which pages keep going off the end of it, or which pages keep getting
> written out via it, or whose pages are on the active list, forcing mine
> out.

I guess we have static analysis versus dynamic.  The interfaces which Matt
is proposing are suited to answering the question "what is my memory being
used for" (static).  They're unlikely to be useful for answering the question
"what's happening in the VM" (dynamic).  Systemtap is probably better for the
dynamic analysis.

I guess one could generate an answer to the static question with systemtap,
by accumulating running counts across the application lifetime and then
snapshotting them.  Sounds hard though.



Cheap lock for user mode processes release when process exits

2007-04-12 Thread mchu
Hi all,

Maybe someone here knows better.

I have several user-mode processes using a shared mmap.  There can be several 
reader processes and only one writer.  Readers access the shared region 
frequently, the writer seldom.

Naturally, a multi-reader/single-writer lock works best.  I tried this with 
futex on 2.6.9-42.EL.  However, if one of the processes is killed or exits, the 
lock doesn't get released.

I can trap a signal to release the lock, but not every signal can be trapped,
such as kill.

Is there any way I can achieve this without a potential deadlock?

Thanks,
Michael


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Matt Mackall
On Fri, Apr 13, 2007 at 11:01:41AM +1000, Nick Piggin wrote:
> >Basically: to show what the hell's going on in the VM.
> 
> kprobes / systemtap isn't good enough?

It's not really a good match to the kprobes model. I'm not interested
in events, per se. I don't want to need to know about every single
alloc/free of N different varieties integrated from boot onward to
build up an image of the state of the system. Instead, I want to take
snapshots of the state of the VM.

The main goal here is to be able to answer the question "where's my
memory going?". Currently you can't really give a good answer to that
question from userspace because of shared mappings, etc.

There are lots of secondary questions that follow on very quickly from
that, like "what parts of my shared mappings are or aren't shared, and
why?", "what's actually in my application's working set?" and "how much
of this crap can I ditch?".

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Andrew Morton wrote:
> On Fri, 13 Apr 2007 11:14:20 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > Andrew Morton wrote:
> >
> > > It *will* be viable.  If the application wants to know if a page is dirty,
> > > it looks up "PG_dirty" in /proc/pg_foo-to-bitnumber and uses PG_dirty's
> > > numerical offset when inspecting fields in /proc/kpagemap.  If correctly
> > > designed, such a monitoring application will be able to report upon page
> > > flags which we haven't even thought up yet.
> >
> > Ooh, you wanted a _runtime_ mapping of flags, yeah then I guess that works.
> > Still seems like a basically hit and miss affair to just use flags. What if
> > you want to know the process mapping a page? With systemtap or something you
> > could walk the rmap structures. What if you want to look at pages along the
> > LRU list rather than per-pfn? What about connecting pages to inodes?
>
> Well hang on.  This isn't a tool for understanding kernel behaviour.  It's
> a tool for understanding application behaviour.
>
> So one doesn't ask "who is mapping that page" - that's a kernel developer
> thing.
>
> Instead, one says "what pages are being used by my application", then, for

That includes unmapped pagecache being used by my application, doesn't it?
Maybe that's too hard to do via /proc so we forget about it...

> each of those pages "what is that page's state".  So the first step is to
> collect all the pfns from /proc/$(pidof my-application)/pagemap and then to
> use those pfns to look the individual pages up in /proc/kpagemap.

OK I realise you could do it that way, but systemtap can definitely be
used as a tool for understanding application behaviour in the context of
the kernel, I think? The purpose for it is so that various little bits
of deep kernel internals do not have to be exposed on a case by case basis.

If kprobes is simply crappy and doesn't work properly for this, then I
could accept that. I'm not someone trying to get this info. So why can't
it be used? (not just for kpagemap, but for clear_refs and all that gunk
too).

> If you really want to know "who is using page 123435" then you'd need to
> search /proc/*/pagemap.  There are possibly legitimate reasons why an
> application developer would want to at least partially perform such an
> operation ("who am I sharing with"), but I doubt if it's the common case.

Maybe. How about LRU? Reclaim performance is bad, and you want to work out
which pages keep going off the end of it, or which pages keep getting
written out via it, or whose pages are on the active list, forcing mine
out.

--
SUSE Labs, Novell Inc.


Re: [PATCH 0/4] i386 - pte update optimizations

2007-04-12 Thread H. Peter Anvin

Zachary Amsden wrote:
> Some PTE optimizations for native and paravirt-ops kernels; this
> provides a huge win for shadow mode hypervisors and gets rid of
> some unnecessary atomic instructions in native kernels, saving
> even more on UP by getting rid of implicit LOCK on xchg instruction.


You do know that P6 and higher don't do locked bus references as long as 
the value is in the cache, right?


-hpa


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Andrew Morton
On Fri, 13 Apr 2007 11:14:20 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Fri, 13 Apr 2007 10:15:24 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> >>>>+   for (; i < 2 * chunk / KPMSIZE; i += 2, pfn++) {
> >>>>+   ppage = pfn_to_page(pfn);
> >>>>+   if (!ppage) {
> >>>>+   page[i] = 0;
> >>>>+   page[i + 1] = 0;
> >>>>+   } else {
> >>>>+   page[i] = ppage->flags;
> >>>>+   page[i + 1] = atomic_read(&ppage->_count);
> >>>>+   }
> >>>>+   }
> >>>
> >>>
> >>>Not a good idea to expose raw flags in this manner - it changes at the drop
> >>>of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
> >>>mapping to make this viable.
> >>
> >>I don't think it is viable because that makes the flags part of the
> >>userspace ABI.
> > 
> > 
> > It *will* be viable.  If the application wants to know if a page is dirty,
> > it looks up "PG_dirty" in /proc/pg_foo-to-bitnumber and uses PG_dirty's
> > numerical offset when inspecting fields in /proc/kpagemap.  If correctly
> > designed, such a monitoring application will be able to report upon page
> > flags which we haven't even thought up yet.
> 
> Ooh, you wanted a _runtime_ mapping of flags, yeah then I guess that works.
> Still seems like a basically hit and miss affair to just use flags. What if
> you want to know the process mapping a page? With systemtap or something you
> could walk the rmap structures. What if you want to look at pages along the
> LRU list rather than per-pfn? What about connecting pages to inodes?
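
The runtime flag lookup discussed above could look roughly like this from
the monitoring tool's side. This is only a sketch: the thread does not
define the format of /proc/pg_foo-to-bitnumber (the file name itself is a
placeholder), so the "NAME BIT" text layout assumed here, and the helper
names `flag_bit` and `page_flag_set`, are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Look up name in a table of "NAME BIT" lines (an assumed format)
 * and return its bit number, or -1 if the kernel does not export it. */
int flag_bit(const char *table, const char *name)
{
    char flag[32];
    int bit, used;

    while (sscanf(table, " %31s %d%n", flag, &bit, &used) == 2) {
        if (strcmp(flag, name) == 0)
            return bit;
        table += used;          /* advance past the line just parsed */
    }
    return -1;
}

/* Test a named flag in a raw flags word read from kpagemap.
 * Returns 1 if set, 0 if clear, -1 if the flag name is unknown. */
int page_flag_set(const char *table, const char *name, unsigned long flags)
{
    int bit = flag_bit(table, name);

    if (bit < 0 || bit >= (int)(8 * sizeof(flags)))
        return -1;
    return (int)((flags >> bit) & 1UL);
}
```

Because the tool never hard-codes a bit number, a kernel that renumbers
its flags only changes the table contents, and a flag the tool has never
heard of can still be reported by name.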

Well hang on.  This isn't a tool for understanding kernel behaviour.  It's
a tool for understanding application behaviour.

So one doesn't ask "who is mapping that page" - that's a kernel developer
thing.

Instead, one says "what pages are being used by my application", then, for
each of those pages "what is that page's state".  So the first step is to
collect all the pfns from /proc/$(pidof my-application)/pagemap and then to
use those pfns to look the individual pages up in /proc/kpagemap.

If you really want to know "who is using page 123435" then you'd need to
search /proc/*/pagemap.  There are possibly legitimate reasons why an
application developer would want to at least partially perform such an
operation ("who am I sharing with"), but I doubt if it's the common case.
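
As a rough illustration of the second step (looking a pfn up in a
/proc/kpagemap dump), here is a sketch of a decoder. The layout is an
assumption inferred from the read loop quoted in this thread: a leading
header record whose byte 0 is the endianness flag and byte 1 is
PAGE_SHIFT, followed by one (flags, count) pair of unsigned longs per
pfn. The struct and function names are hypothetical, and the per-process
pagemap format is not covered because it is not shown here.

```c
#include <stddef.h>

/* One decoded kpagemap record: raw page flags and reference count. */
struct kpm_entry {
    unsigned long flags;
    unsigned long count;
};

/* Header fields packed into the first bytes of the dump. */
struct kpm_header {
    int little_endian;   /* byte 0: (ntohl(1) != 1) on the dumping host */
    int page_shift;      /* byte 1: PAGE_SHIFT on the dumping host */
};

/* Read the header from a dump held in memory as native unsigned longs. */
struct kpm_header kpm_read_header(const unsigned long *dump)
{
    const unsigned char *bytes = (const unsigned char *)dump;
    struct kpm_header h = { bytes[0] != 0, bytes[1] };

    return h;
}

/* Fetch the record for pfn. Record 0 is the header, so the entry for
 * pfn N is assumed to live at record N + 1. Returns 0 on success, -1 if
 * the dump (nrecords records, header included) does not cover that pfn. */
int kpm_lookup(const unsigned long *dump, size_t nrecords,
               unsigned long pfn, struct kpm_entry *out)
{
    if (nrecords < 2 || pfn >= nrecords - 1)
        return -1;
    out->flags = dump[2 * (pfn + 1)];
    out->count = dump[2 * (pfn + 1) + 1];
    return 0;
}
```

A tool would call kpm_lookup() once for each pfn collected from the
application's pagemap, which keeps the common case proportional to the
application's own footprint rather than to all of memory.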

> 
> But I was going to say
> that satisfying an Oracle requirement is a good reason _not_ to merge it ;)
>

hm, yes, there's plenty of precedent for that.

> (I joke!)

I akpm!


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Andrew Morton wrote:
> On Fri, 13 Apr 2007 10:15:24 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > > > +   for (; i < 2 * chunk / KPMSIZE; i += 2, pfn++) {
> > > > +   ppage = pfn_to_page(pfn);
> > > > +   if (!ppage) {
> > > > +   page[i] = 0;
> > > > +   page[i + 1] = 0;
> > > > +   } else {
> > > > +   page[i] = ppage->flags;
> > > > +   page[i + 1] = atomic_read(&ppage->_count);
> > > > +   }
> > > > +   }
> > >
> > > Not a good idea to expose raw flags in this manner - it changes at the drop
> > > of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
> > > mapping to make this viable.
> >
> > I don't think it is viable because that makes the flags part of the
> > userspace ABI.
>
> It *will* be viable.  If the application wants to know if a page is dirty,
> it looks up "PG_dirty" in /proc/pg_foo-to-bitnumber and uses PG_dirty's
> numerical offset when inspecting fields in /proc/kpagemap.  If correctly
> designed, such a monitoring application will be able to report upon page
> flags which we haven't even thought up yet.

Ooh, you wanted a _runtime_ mapping of flags, yeah then I guess that works.
Still seems like a basically hit and miss affair to just use flags. What if
you want to know the process mapping a page? With systemtap or something you
could walk the rmap structures. What if you want to look at pages along the
LRU list rather than per-pfn? What about connecting pages to inodes?

I thought this type of deep poking was the whole reason the probes thingies
were merged. I'm saddened that they're no good for this. I thought it would
be an ideal usage :(

> > I wonder what they are needed for.
>
> Poking deeply into the kernel to provide information about kernel state.
>
> There are real-world needs for this, and the people who develop tools to
> process this information will have decent kernel understanding and will
> know that the file's contents may alter across kernel versions.  It sure
> beats poking around in /dev/kmem.
>
> I doubt if there's a sensible way in which we can prettify this interface
> without losing information.  But we should aim to make it as robust as
> possible against future kernel changes, of course.
>
> And we should satisfy ourselves that all the required information has been
> made available.  The fact that it will satisfy the Oracle requirement is
> encouraging.

Yeah it is close, they need page_mapcount I think. But I was going to say
that satisfying an Oracle requirement is a good reason _not_ to merge it ;)
(I joke!)

--
SUSE Labs, Novell Inc.


Re: [patch] convert aio event reap to use atomic-op instead of spin_lock

2007-04-12 Thread Ken Chen

On 4/12/07, Ken Chen <[EMAIL PROTECTED]> wrote:
> On 4/12/07, Jeff Moyer <[EMAIL PROTECTED]> wrote:
> > I didn't see any response to Zach's request for code that actually
> > tests out the shared ring buffer.  Do you have such code?
>
> Yes, I do.  I have been stress testing the code since last night.  After 20+
> hours of stress runs with fio and aio-stress, I'm now posting it with
> confidence.
>
> I modified libaio's io_getevents to take advantage of the new user-level
> reap function. The feature is exported via ring->compat_features.
> btw, is compat_features supposed to be a version number or a bit mask?
> I think a bitmask makes more sense and is more flexible.

Additional patch on the kernel side to export the new features.  On
top of the patch posted at:
http://marc.info/?l=linux-kernel&m=117636401818057&w=2

--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -138,8 +138,11 @@ #define init_sync_kiocb(x, filp)   \
init_wait((&(x)->ki_wait)); \
} while (0)

+#define AIO_RING_BASE  1
+#define AIO_RING_USER_REAP 2
+
#define AIO_RING_MAGIC  0xa10a10a1
-#define AIO_RING_COMPAT_FEATURES   1
+#define AIO_RING_COMPAT_FEATURES   (AIO_RING_BASE | AIO_RING_USER_REAP)
#define AIO_RING_INCOMPAT_FEATURES  0
struct aio_ring {
unsignedid; /* kernel internal index number */
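
On the version-number-versus-bitmask question, a small sketch of how
userspace might test the field if it is treated as a bitmask. The two
constants are the ones from the patch above; `ring_has_features` is a
hypothetical helper, not part of libaio or the kernel.

```c
#define AIO_RING_BASE       1u
#define AIO_RING_USER_REAP  2u

/* Return nonzero if every bit in wanted is set in features. */
int ring_has_features(unsigned features, unsigned wanted)
{
    return (features & wanted) == wanted;
}
```

One point in favour of the bitmask reading: the value 1 exported by
existing kernels decodes as "base only", so an updated library falls back
to the syscall on an old kernel without any special case.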


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Matt Mackall wrote:
> On Fri, Apr 13, 2007 at 10:15:24AM +1000, Nick Piggin wrote:
>
> > Andrew Morton wrote:
> >
> > > On Thu, 12 Apr 2007 16:10:50 -0700
> > > William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> > >
> > > > +   while (count > 0) {
> > > > +   chunk = min_t(size_t, count, PAGE_SIZE);
> > > > +   i = 0;
> > > > +
> > > > +   if (pfn == -1) {
> > > > +   page[0] = 0;
> > > > +   page[1] = 0;
> > > > +   ((char *)page)[0] = (ntohl(1) != 1);
> > >
> > > OK.
> > >
> > > > +   ((char *)page)[1] = PAGE_SHIFT;
> > >
> > > OK.
> >
> > Shouldn't we just expose page size and endianness by other means? (another
> > file or syscall).
>
> If I send you this file dumped from a random machine, you won't know
> what to make of it.

That's a good reason ;)

> I'm planning to write a trivial server to sit on, say, my embedded
> target and spew this over the wire to a client.
>
> > > Not a good idea to expose raw flags in this manner - it changes at the drop
> > > of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
> > > mapping to make this viable.
> >
> > I don't think it is viable because that makes the flags part of the
> > userspace ABI. I wonder what they are needed for.
>
> Basically: to show what the hell's going on in the VM.

kprobes / systemtap isn't good enough?

--
SUSE Labs, Novell Inc.


Re: [patch] convert aio event reap to use atomic-op instead of spin_lock

2007-04-12 Thread Ken Chen

On 4/12/07, Jeff Moyer <[EMAIL PROTECTED]> wrote:
> I didn't see any response to Zach's request for code that actually
> tests out the shared ring buffer.  Do you have such code?

Yes, I do.  I have been stress testing the code since last night.  After 20+
hours of stress runs with fio and aio-stress, I'm now posting it with
confidence.

I modified libaio's io_getevents to take advantage of the new user-level
reap function. The feature is exported via ring->compat_features.
btw, is compat_features supposed to be a version number or a bit mask?
I think a bitmask makes more sense and is more flexible.

(warning: some lines are extremely long in the patch and my email
client will probably mangle it badly).


diff -Nurp libaio-0.3.104/src/io_getevents.c
libaio-0.3.104-new/src/io_getevents.c
--- libaio-0.3.104/src/io_getevents.c   2003-06-18 12:58:21.0 -0700
+++ libaio-0.3.104-new/src/io_getevents.c   2007-04-12 17:35:06.0 
-0700
@@ -21,10 +21,13 @@
#include 
#include 
#include "syscall.h"
+#include 

io_syscall5(int, __io_getevents_0_4, io_getevents, io_context_t, ctx,
long, min_nr, long, nr, struct io_event *, events, struct timespec *,
timeout)

#define AIO_RING_MAGIC  0xa10a10a1
+#define AIO_RING_BASE  1
+#define AIO_RING_USER_REAP 2

/* Ben will hate me for this */
struct aio_ring {
@@ -41,7 +44,11 @@ struct aio_ring {

int io_getevents_0_4(io_context_t ctx, long min_nr, long nr, struct
io_event * events, struct timespec * timeout)
{
+   long i = 0, ret;
+   unsigned head;
+   struct io_event *evt_base;
struct aio_ring *ring;
+
ring = (struct aio_ring*)ctx;
if (ring==NULL || ring->magic != AIO_RING_MAGIC)
goto do_syscall;
@@ -49,9 +56,35 @@ int io_getevents_0_4(io_context_t ctx, l
if (ring->head == ring->tail)
return 0;
}
-   
+
+   if (!(ring->compat_features & AIO_RING_USER_REAP))
+   goto do_syscall;
+
+   if (min_nr > nr || min_nr < 0 || nr < 0)
+   return -EINVAL;
+
+   evt_base = (struct io_event *) (ring + 1);
+   while (i < nr) {
+   head = ring->head;
+   if (head == ring->tail)
+   break;
+
+   *events = evt_base[head & (ring->nr - 1)];
+   if (head == cmpxchg(&ring->head, head, head + 1)) {
+   events++;
+   i++;
+   }
+   }
+
+   if (i >= min_nr)
+   return i;
+
do_syscall: 
-   return __io_getevents_0_4(ctx, min_nr, nr, events, timeout);
+   ret = __io_getevents_0_4(ctx, min_nr - i, nr - i, events, timeout);
+   if (ret >= 0)
+   return i + ret;
+   else
+   return i ? i : ret;
}

DEFSYMVER(io_getevents_0_4, io_getevents, 0.4)
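
For readers who want to experiment with the reap loop outside libaio, the
same idea can be sketched in portable C11 atomics. This is not the patch
itself: `struct ring`, `struct event`, `RING_SLOTS` and `ring_reap` are
simplified stand-ins, head and tail are assumed to be free-running
counters, the slot count is assumed to be a power of two, and the default
sequentially consistent atomics stand in for whatever ordering the real
kernel/user ring needs.

```c
#include <stdatomic.h>

#define RING_SLOTS 8u   /* must be a power of two for the mask below */

struct event { long data; };

/* Simplified stand-in for struct aio_ring: free-running head and tail
 * counters, with slots addressed by (counter & (RING_SLOTS - 1)). */
struct ring {
    _Atomic unsigned head;      /* next slot to consume */
    _Atomic unsigned tail;      /* next slot the producer will fill */
    struct event slots[RING_SLOTS];
};

/* Copy up to nr completed events into out without taking a lock.
 * Each event is copied first and only counted if the compare-and-swap
 * on head succeeds, so a racing consumer cannot make us return a slot
 * that someone else already claimed. Returns the number reaped. */
long ring_reap(struct ring *r, struct event *out, long nr)
{
    long i = 0;

    while (i < nr) {
        unsigned head = atomic_load(&r->head);

        if (head == atomic_load(&r->tail))
            break;                              /* ring is empty */

        out[i] = r->slots[head & (RING_SLOTS - 1)];
        if (atomic_compare_exchange_strong(&r->head, &head, head + 1))
            i++;                                /* we own this event */
        /* on failure another consumer advanced head; loop and retry */
    }
    return i;
}
```

As in the patch, a caller that gets back fewer events than min_nr would
fall through to the io_getevents syscall for the remainder.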


Re: [Kernel-discuss] Re: [PATCH 3/7] [RFC] Battery monitoring class

2007-04-12 Thread Henrique de Moraes Holschuh
On Fri, 13 Apr 2007, Anton Vorontsov wrote:
> Let's name attributes with mWh units as {min_,max_,design_,}energy,
> and attributes with mAh units as {min_,max_,design_,}charge.

[...]

> * Yup, I've read last discussion regarding batteries, and I've seen
>   objections against "charge" term, quoting Shem Multinymous:
> 
>   "And, for the reasons I explained earlier, I strongly suggest not using
>   the term "charge" except when referring to the action of charging.
>   Hence:
>   s/charge_rate/rate/;  s/charge/capacity/"
> 
>   But let's think about it once again? We'll make things much cleaner
>   if we drop "capacity" altogether.

I stand with Shem on this one.  The people behind the SBS specification
seem to agree... that specification is aimed at *engineers* and still
avoids the obvious trap of using "charge" due to its high potential for
confusion.

I don't even want to know how much of a mess the people writing applets
would make of it...

> > That said, you may need to use uWh and uAh instead of mAh and mWh, though.
> 
> Not sure. Is there any existing chip that can report uAh/uWh? That is
> great precision.

The way things are going, it should be feasible for small embedded systems
quite soon.  Refer to the previous thread.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Andrew Morton
On Fri, 13 Apr 2007 10:15:24 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:

> >>+   ((char *)page)[1] = PAGE_SHIFT;
> > 
> > 
> > OK.
> 
> Shouldn't we just expose page size and endianness by other means? (another
> file or syscall).

I don't think so - this file exposes fairly deep kernel internals and
that's unavoidable, really - it's *supposed* to do that.  It is explicitly
designed for monitoring kernel behaviour.

So it needs special handling by userspace.  Keeping the number of files
which need such special handling to a minimum will keep the number of
applications which are exposed to kernel changes to a minimum.

> >>+   for (; i < 2 * chunk / KPMSIZE; i += 2, pfn++) {
> >>+   ppage = pfn_to_page(pfn);
> >>+   if (!ppage) {
> >>+   page[i] = 0;
> >>+   page[i + 1] = 0;
> >>+   } else {
> >>+   page[i] = ppage->flags;
> >>+   page[i + 1] = atomic_read(&ppage->_count);
> >>+   }
> >>+   }
> > 
> > 
> > Not a good idea to expose raw flags in this manner - it changes at the drop
> > of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
> > mapping to make this viable.
> 
> I don't think it is viable because that makes the flags part of the
> userspace ABI.

It *will* be viable.  If the application wants to know if a page is dirty,
it looks up "PG_dirty" in /proc/pg_foo-to-bitnumber and uses PG_dirty's
numerical offset when inspecting fields in /proc/kpagemap.  If correctly
designed, such a monitoring application will be able to report upon page
flags which we haven't even thought up yet.
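
The name-based lookup described above could look roughly like the following on the application side. This is only a sketch under stated assumptions: the table format ("NAME BITNUMBER", one entry per line) is invented for illustration, since no such file exists in the patch series, and demo_flag_bit and demo_flag_set are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up a flag name in a "NAME BITNUMBER" table, one entry per line.
 * Returns the bit number, or -1 if the name is not listed.
 * The table format is an assumption made for this sketch.
 */
static int demo_flag_bit(const char *table, const char *name)
{
    const char *line = table;

    while (*line) {
        char entry[64];
        int bit;

        if (sscanf(line, "%63s %d", entry, &bit) == 2 &&
            strcmp(entry, name) == 0)
            return bit;

        line = strchr(line, '\n');
        if (!line)
            break;
        line++;
    }
    return -1;
}

/* Test a named flag in a raw flags word: 1 if set, 0 if clear, -1 if unknown. */
static int demo_flag_set(const char *table, const char *name,
                         unsigned long flags)
{
    int bit = demo_flag_bit(table, name);

    if (bit < 0 || bit >= (int)(sizeof(flags) * 8))
        return -1;
    return (int)((flags >> bit) & 1UL);
}
```

Because the tool never hard-codes a bit number, a flag that moves between kernel versions, or one added later, only changes the table contents and not the tool.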

> I wonder what they are needed for.

Poking deeply into the kernel to provide information about kernel state. 

There are real-world needs for this, and the people who develop tools to
process this information will have decent kernel understanding and will
know that the file's contents may alter across kernel versions.  It sure
beats poking around in /dev/kmem.

I doubt if there's a sensible way in which we can prettify this interface
without losing information.  But we should aim to make it as robust as
possible against future kernel changes, of course.

And we should satisfy ourselves that all the required information has been
made available.  The fact that it will satisfy the Oracle requirement is
encouraging.

Matt, these changes make the new field in /proc/pid/smaps redundant, don't
they?



Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Matt Mackall
On Fri, Apr 13, 2007 at 10:15:24AM +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> >On Thu, 12 Apr 2007 16:10:50 -0700
> >William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> 
> >>+   while (count > 0) {
> >>+   chunk = min_t(size_t, count, PAGE_SIZE);
> >>+   i = 0;
> >>+
> >>+   if (pfn == -1) {
> >>+   page[0] = 0;
> >>+   page[1] = 0;
> >>+   ((char *)page)[0] = (ntohl(1) != 1);
> >
> >
> >OK.
> >
> >
> >>+   ((char *)page)[1] = PAGE_SHIFT;
> >
> >
> >OK.
> 
> Shouldn't we just expose page size and endianness by other means? (another
> file or syscall).

If I send you this file dumped from a random machine, you won't know
what to make of it.

I'm planning to write a trivial server to sit on, say, my embedded
target and spew this over the wire to a client. 
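
A client receiving such a dump could decode the two header bytes set in the quoted hunk (byte 0 from ntohl(1) != 1, byte 1 from PAGE_SHIFT) before touching anything else. The sketch below assumes exactly that layout and nothing more; demo_kpm_header and demo_kpm_parse_header are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

struct demo_kpm_header {
    int little_endian;        /* byte 0: 1 if the dumping host is little-endian */
    unsigned page_shift;      /* byte 1: log2 of the page size */
    unsigned long page_size;  /* derived: 1UL << page_shift */
};

/* Decode the first two bytes of a dump; returns 0 on success, -1 on bad input. */
static int demo_kpm_parse_header(const unsigned char *buf, size_t len,
                                 struct demo_kpm_header *hdr)
{
    if (len < 2 || buf[0] > 1 || buf[1] >= sizeof(unsigned long) * 8)
        return -1;

    hdr->little_endian = buf[0];
    hdr->page_shift = buf[1];
    hdr->page_size = 1UL << buf[1];
    return 0;
}
```

With the byte order and page size known up front, the rest of the file can be interpreted on a machine of a different architecture.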

> >Not a good idea to expose raw flags in this manner - it changes at the drop
> >of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
> >mapping to make this viable.
> 
> I don't think it is viable because that makes the flags part of the
> userspace ABI. I wonder what they are needed for.

Basically: to show what the hell's going on in the VM.

-- 
Mathematics is the supreme nostalgia of our time.


[PATCH][RFC] Kill off legacy power management stuff.

2007-04-12 Thread Robert P. J. Day

just something i threw together, not in final form, but it represents
tossing the legacy PM stuff.  at the moment, the menuconfig entry for
PM_LEGACY lists it as "DEPRECATED", while the help screen calls it
"obsolete."  that's a good sign that it's getting close to the time
for it to go, and the removal is fairly straightforward, but there's
no mention of its removal in the feature removal schedule file.

NOTE:  this is not a working patch as it will fail on a MIPS or FR-V
build, as i didn't remove the final vestiges from those two
architectures.  that would require simply killing off the remaining
calls to pm_send_all(), that's all.  (i think.)

anyway, this has been compile-tested on x86 with "make allyesconfig."


 Documentation/pm.txt |  123 ---
 arch/i386/kernel/apm.c   |   27 
 drivers/acpi/bus.c   |   14 --
 drivers/net/3c509.c  |1
 drivers/serial/68328serial.c |   59 -
 include/linux/pm.h   |   70 ---
 include/linux/pm_legacy.h|   41 --
 kernel/power/Kconfig |   10 -
 kernel/power/Makefile|1
 kernel/power/pm.c|  209 -
 10 files changed, 1 insertion(+), 554 deletions(-)


diff --git a/Documentation/pm.txt b/Documentation/pm.txt
index da8589a..d0fcfe2 100644
--- a/Documentation/pm.txt
+++ b/Documentation/pm.txt
@@ -36,93 +36,6 @@ system the associated daemon will exit gracefully.
   apmd:   http://worldvisions.ca/~apenwarr/apmd/
   acpid:  http://acpid.sf.net/

-Driver Interface -- OBSOLETE, DO NOT USE!
-*****************************************
-
-Note: pm_register(), pm_access(), pm_dev_idle() and friends are
-obsolete. Please do not use them. Instead you should properly hook
-your driver into the driver model, and use its suspend()/resume()
-callbacks to do this kind of stuff.
-
-If you are writing a new driver or maintaining an old driver, it
-should include power management support.  Without power management
-support, a single driver may prevent a system with power management
-capabilities from ever being able to suspend (safely).
-
-Overview:
-1) Register each instance of a device with "pm_register"
-2) Call "pm_access" before accessing the hardware.
-   (this will ensure that the hardware is awake and ready)
-3) Your "pm_callback" is called before going into a
-   suspend state (ACPI D1-D3) or after resuming (ACPI D0)
-   from a suspend.
-4) Call "pm_dev_idle" when the device is not being used
-   (optional but will improve device idle detection)
-5) When unloaded, unregister the device with "pm_unregister"
-
-/*
- * Description: Register a device with the power-management subsystem
- *
- * Parameters:
- *   type - device type (PCI device, system device, ...)
- *   id - instance number or unique identifier
- *   cback - request handler callback (suspend, resume, ...)
- *
- * Returns: Registered PM device or NULL on error
- *
- * Examples:
- *   dev = pm_register(PM_SYS_DEV, PM_SYS_VGA, vga_callback);
- *
- *   struct pci_dev *pci_dev = pci_find_dev(...);
- *   dev = pm_register(PM_PCI_DEV, PM_PCI_ID(pci_dev), callback);
- */
-struct pm_dev *pm_register(pm_dev_t type, unsigned long id, pm_callback cback);
-
-/*
- * Description: Unregister a device with the power management subsystem
- *
- * Parameters:
- *   dev - PM device previously returned from pm_register
- */
-void pm_unregister(struct pm_dev *dev);
-
-/*
- * Description: Unregister all devices with a matching callback function
- *
- * Parameters:
- *   cback - previously registered request callback
- *
- * Notes: Provided for easier porting from old APM interface
- */
-void pm_unregister_all(pm_callback cback);
-
-/*
- * Power management request callback
- *
- * Parameters:
- *   dev - PM device previously returned from pm_register
- *   rqst - request type
- *   data - data, if any, associated with the request
- *
- * Returns: 0 if the request is successful
- *  EINVAL if the request is not supported
- *  EBUSY if the device is now busy and cannot handle the request
- *  ENOMEM if the device was unable to handle the request due to memory
- *
- * Details: The device request callback will be called before the
- *  device/system enters a suspend state (ACPI D1-D3) or
- *  or after the device/system resumes from suspend (ACPI D0).
- *  For PM_SUSPEND, the ACPI D-state being entered is passed
- *  as the "data" argument to the callback.  The device
- *  driver should save (PM_SUSPEND) or restore (PM_RESUME)
- *  device context when the request callback is called.
- *
- *  Once a driver returns 0 (success) from a suspend
- *  request, it should not process any further requests or
- *  access the device hardware until a call to "pm_access" is made.
- */
-typedef int (*pm_callback)(struct pm_dev *dev, pm_request_t rqst, void *data);
-
 Driver Details
 --------------
 This is just a quick Q as a stopgap 

Re: [PATCH] make MADV_FREE lazily free memory

2007-04-12 Thread Nick Piggin

Rik van Riel wrote:
> Nick Piggin wrote:
>
>>> The lazy freeing is aimed at avoiding page faults on memory
>>> that is freed and later realloced, which is quite a common
>>> thing in many workloads.
>>
>> I would be interested to see how it performs and what these
>> workloads look like, although we do need to fix the basic glibc and
>> madvise locking problems first.
>
> The attached graph shows the results of running the MySQL sysbench
> workload on my quad core system.  As you can see, performance
> with #threads == #cpus (4) almost doubles from 1070 transactions
> per second to 2014 transactions/second.
>
> On the high end (16 threads on 4 cpus), performance increases
> from 778 transactions/second on vanilla to 1310 transactions/second.
>
> I have also benchmarked running Ulrich's changed glibc on a vanilla
> kernel, which gives results somewhere in-between, but much closer to
> just the vanilla kernel.


Looks like the idle time issue is still biting for those guys.

Hmm, maybe MySQL is actually _touching_ the memory inside a more
critical lock, so the faults get tangled up on mmap_sem there. I
wonder if making malloc call memset right afterwards would hide
that ;) Or the madvise exclusive mmap_sem avoidance.

Seems like with perfect scaling we should get to the 2400 mark.
It would be nice to be able to not degrade under load. Of course
some of that will be MySQL scaling issues.

--
SUSE Labs, Novell Inc.


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Matt Mackall
On Thu, Apr 12, 2007 at 04:32:35PM -0700, Andrew Morton wrote:
> On Thu, 12 Apr 2007 16:10:50 -0700
> William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, Apr 03, 2007 at 09:43:30PM -0500, Matt Mackall wrote:
> > > This patch series introduces /proc/pid/pagemap and /proc/kpagemap,
> > > which allow detailed run-time examination of process memory usage at a
> > > page granularity.
> > > The first several patches whip the page-walking code introduced for
> > > /proc/pid/smaps and clear_refs into a more generic form, the next
> > > couple make those interfaces optional, and the last two introduce the
> > > new interfaces, also optional.
> > 
> > This solves a real-life problem for Oracle system monitoring software
> > (specifically EM). Among the tasks it must carry out is determining
> > per-process memory footprint of a set of cooperating tasks (i.e. Oracle
> > processes). RSS is inadequate for this due to page sharing; this work
> > provides sufficient information to determine what EM needs.
> 
> I'm still dying to see what the human-readable output from this
> thing looks like.

Still a work-in-progress. It's a monstrous amount of data and it
basically requires a GUI to really get a handle on. Here's a couple
apps I've been tinkering with (aka My First GTK Apps):

http://selenic.com/Screenshot-pagemap.png

That's a snapshot of a live-updating image of memory usage for a
running process (Galeon). Each pixel is a page. Each 32x32 block is
4MB. Mappings are dark red. Pages that are actually faulted in are
bright red. You can poke around in the memory map with the mouse and
highlight mappings (blue). And pages that get faulted in flash green
(hard to capture in a screenshot).

http://selenic.com/Screenshot-kpagemap.png

And that's a live-updating image of system-wide memory usage. Bright
red are pages with a count of 1, dark red are pages with higher
counts. Next is to visualize slab/page cache/buddy/active/lru data as
well as highlight changing pages.

This isn't terribly interesting yet. It can tell you things about page
cache usage and fragmentation and readahead and so on.

But correlating across the two sources, we'll be able to show
information like "what pages in a process are actually
shared/active/lru/etc." You can take it even further by correlating
the above data with symbol info from nm, /proc/pid/clear_refs, etc.

Also, something I immediately noticed on looking at the raw data
(cat /proc/`pidof`/pagemap | hexdump -C | less):

002c8fd0  ff ff ff ff ff ff ff ff  ff ff ff ff 6d f8 03 00 |m...|
002c8fe0  6c f8 03 00 b9 f8 03 00  6b f8 03 00 6a f8 03 00 |l...k...j...|
002c8ff0  b8 f8 03 00 69 f8 03 00  68 f8 03 00 b7 f8 03 00 |i...h...|
002c9000  67 f8 03 00 66 f8 03 00  b6 f8 03 00 65 f8 03 00 |g...f...e...|
002c9010  64 f8 03 00 b5 f8 03 00  63 f8 03 00 62 f8 03 00 |d...c...b...|
002c9020  b4 f8 03 00 61 f8 03 00  60 f8 03 00 b3 f8 03 00 |a...`...|
002c9030  7f f8 03 00 7e f8 03 00  b2 f8 03 00 7d f8 03 00 |~...}...|
002c9040  7c f8 03 00 b1 f8 03 00  5f f8 03 00 5e f8 03 00 ||..._...^...|
002c9050  b0 f8 03 00 5d f8 03 00  5c f8 03 00 af f8 03 00 |]...\...|

Most of the consecutive page frames are allocated in descending order
(6d 6c 6b 6a ...). That's pessimal for physical merging of block I/O.
Given that we theoretically fixed this long-standing problem in 2.6
but it's obviously still happening, it's clear that a little more
visibility into the VM would be useful.
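
The descending-order observation can be checked mechanically instead of by eye. The sketch below assumes, as the hexdump suggests, that each entry is a 32-bit little-endian page frame number (6d f8 03 00 decoding to 0x3f86d); demo_le32 and demo_count_descending are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian 32-bit entry (6d f8 03 00 -> 0x3f86d). */
static uint32_t demo_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count adjacent entries where the next PFN is exactly one below the previous. */
static size_t demo_count_descending(const unsigned char *buf, size_t entries)
{
    size_t i, hits = 0;

    for (i = 1; i < entries; i++) {
        uint32_t prev = demo_le32(buf + 4 * (i - 1));
        uint32_t next = demo_le32(buf + 4 * i);

        /* prev != 0 guards against wrap-around in prev - 1. */
        if (prev != 0 && next == prev - 1)
            hits++;
    }
    return hits;
}
```

Comparing this count with the number of ascending neighbours over a whole mapping would put a number on how often allocation order runs backwards.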

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

William Lee Irwin III wrote:
> On Thu, 12 Apr 2007 16:10:50 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> This solves a real-life problem for Oracle system monitoring software
>>> (specifically EM). Among the tasks it must carry out is determining
>>> per-process memory footprint of a set of cooperating tasks (i.e. Oracle
>>> processes). RSS is inadequate for this due to page sharing; this work
>>> provides sufficient information to determine what EM needs.
>
> On Thu, Apr 12, 2007 at 04:32:35PM -0700, Andrew Morton wrote:
>> Not a good idea to expose raw flags in this manner - it changes at the drop
>> of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
>> mapping to make this viable.
>> Not a good idea to use page->_count: page_count() will be more stable.
>> Otherwise OK, I guess: the interpretation of the page refcount is unlikely
>> to change much over time.
>
> EM wants to determine page_mapcount() for the most part for the
> purposes of determining "uniquely attributable RSS" (my ca. 2004
> nomenclature) or "proportional RSS" (mpm's more recent nomenclature);
> as things now stand it will have to infer them by maintaining a table
> of pfn's and mappings thereof, but at least that can be done with it.


I don't know whether you can easily determine page_mapcount with
page_count and flags, though (count gives you an educated guess,
but mapcount is the real thing).

page_mapcount sounds very reasonable to export. It is directly
tied with the userspace concept of mapping pages. page_count doesn't
seem very useful (and if you must have it, please use page_count),
neither does page flags.
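
To show why a per-page map count is the useful quantity here, the sketch below computes a proportional RSS figure from an array of map counts. It assumes the usual definition, in which each resident page contributes its size divided by the number of mappings sharing it; demo_proportional_rss is a hypothetical helper and not an interface from the patch series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum page_size / mapcount over the resident pages of one process.
 * mapcounts[i] is the number of mappings of page i; entries of 0 are
 * treated as "not resident" and skipped.
 */
static double demo_proportional_rss(const unsigned *mapcounts, size_t n,
                                    size_t page_size)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        if (mapcounts[i] != 0)
            total += (double)page_size / mapcounts[i];
    return total;
}
```

For three 4096-byte pages mapped once, twice and four times, the process is charged 4096 + 2048 + 1024 = 7168 bytes, where plain RSS would report 12288.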

You could have a bit indicating whether the page is free or not (but
that doesn't tell you much that meminfo or zoneinfo or buddyinfo does
not). Dirty/writeback/referenced/uptodate maybe?... I'm stumped,
what's flags for?

--
SUSE Labs, Novell Inc.


Re: intermittant petabyte usage reported with broadcom nic

2007-04-12 Thread Andi Kleen
Roland Dreier <[EMAIL PROTECTED]> writes:

> [Adding Michael Chan, who seems to look after bnx2, to the cc list]
> 
>  > To clarify it's an Intel Dual Core Xeon (I just wound up as thinking of
>  > them all as amd64s). Network card driver in use is the one defined by
>  > CONFIG_BNX2. Kernel's monolithic.
> 
> From a quick look at bnx2.c, it seems that the driver gives the NIC
> (firmware?) a block of memory to DMA stats into, and just reads from
> that memory in its get_stats method.  So if you're seeing wonky stats
> from the NIC intermittently, my best guess would be that firmware is
> occasionally writing junk into the stats block.

If only the firmware writes to that area, it could be put into its
own page and then write-protected with change_page_attr().
That would catch any corruption coming from the rest of the kernel.

-Andi


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Nick Piggin

Andrew Morton wrote:
>On Thu, 12 Apr 2007 16:10:50 -0700
>William Lee Irwin III <[EMAIL PROTECTED]> wrote:

>>+   while (count > 0) {
>>+   chunk = min_t(size_t, count, PAGE_SIZE);
>>+   i = 0;
>>+
>>+   if (pfn == -1) {
>>+   page[0] = 0;
>>+   page[1] = 0;
>>+   ((char *)page)[0] = (ntohl(1) != 1);
>
>OK.
>
>>+   ((char *)page)[1] = PAGE_SHIFT;
>
>OK.

Shouldn't we just expose page size and endianness by other means? (another
file or syscall).

>>+   for (; i < 2 * chunk / KPMSIZE; i += 2, pfn++) {
>>+   ppage = pfn_to_page(pfn);
>>+   if (!ppage) {
>>+   page[i] = 0;
>>+   page[i + 1] = 0;
>>+   } else {
>>+   page[i] = ppage->flags;
>>+   page[i + 1] = atomic_read(&ppage->_count);
>>+   }
>>+   }
>
>Not a good idea to expose raw flags in this manner - it changes at the drop
>of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
>mapping to make this viable.

I don't think it is viable because that makes the flags part of the
userspace ABI. I wonder what they are needed for.

--
SUSE Labs, Novell Inc.


Re: [PATCH 2/3] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-12 Thread Patrick McHardy
Peter P Waskiewicz Jr wrote:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 219a57f..3ce449e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1471,6 +1471,8 @@ gso:
>   q = dev->qdisc;
>   if (q->enqueue) {
>   rc = q->enqueue(skb, q);
> + /* reset queue_mapping to zero */
> + skb->queue_mapping = 0;


This must be done before enqueueing. At this point you don't even have
a valid reference to the skb anymore.

> @@ -3326,12 +3330,23 @@ struct net_device *alloc_netdev(int sizeof_priv, const char *name,
>   if (sizeof_priv)
>   dev->priv = netdev_priv(dev);
>  
> + alloc_size = (sizeof(struct net_device_subqueue) * queue_count);
> + 
> + p = kzalloc(alloc_size, GFP_KERNEL);
> + if (!p) {
> + printk(KERN_ERR "alloc_netdev: Unable to allocate queues.\n");
> + return NULL;


Still leaks the device

> diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
> index 5cfe60b..6a38905 100644
> --- a/net/sched/sch_prio.c
> +++ b/net/sched/sch_prio.c
> @@ -144,11 +152,17 @@ prio_dequeue(struct Qdisc* sch)
>   struct Qdisc *qdisc;
>  
>   for (prio = 0; prio < q->bands; prio++) {
> - qdisc = q->queues[prio];
> - skb = qdisc->dequeue(qdisc);
> - if (skb) {
> - sch->q.qlen--;
> - return skb;
> + /* Check if the target subqueue is available before
> +  * pulling an skb.  This way we avoid excessive requeues
> +  * for slower queues.
> +  */
> + if (!netif_subqueue_stopped(sch->dev, q->band2queue[prio])) {
> + qdisc = q->queues[prio];
> + skb = qdisc->dequeue(qdisc);
> + if (skb) {
> + sch->q.qlen--;
> + return skb;
> + }
>   }
>   }
>   return NULL;
> @@ -200,6 +214,10 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
>   struct prio_sched_data *q = qdisc_priv(sch);
>   struct tc_prio_qopt *qopt = RTA_DATA(opt);
>   int i;
> + int queue;
> + int qmapoffset;
> + int offset;
> + int mod;
>  
>   if (opt->rta_len < RTA_LENGTH(sizeof(*qopt)))
>   return -EINVAL;
> @@ -242,6 +260,30 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
>   }
>   }
>   }
> + /* setup queue to band mapping */
> + if (q->bands < sch->dev->egress_subqueue_count) {
> + qmapoffset = 1;
> + mod = sch->dev->egress_subqueue_count;
> + } else {
> + mod = q->bands % sch->dev->egress_subqueue_count;
> + qmapoffset = q->bands / sch->dev->egress_subqueue_count +
> + ((mod) ? 1 : 0);
> + }
> +
> + queue = 0;
> + offset = 0;
> + for (i = 0; i < q->bands; i++) {
> + q->band2queue[i] = queue;
> + if ( ((i + 1) - offset) == qmapoffset) {
> + queue++;
> + offset += qmapoffset;
> + if (mod)
> + mod--;
> + qmapoffset = q->bands /
> + sch->dev->egress_subqueue_count +
> + ((mod) ? 1 : 0);
> + }
> + }
>   return 0;
>  }


I stand by my point: this needs to be explicitly enabled by the
user, since it changes the behaviour of prio on multiqueue-capable
devices.
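
To see what the quoted band-to-queue loop actually produces, its arithmetic can be lifted into a userspace function. The sketch below copies the logic of the quoted prio_tune() hunk and replaces the kernel structures with plain integers; demo_map_bands is a hypothetical name, and the guard against a zero queue count is an addition made for the sketch.

```c
#include <assert.h>

/*
 * Fill band2queue[0..bands-1] using the same arithmetic as the quoted
 * prio_tune() hunk.  queues must be greater than zero.
 */
static void demo_map_bands(int bands, int queues, int *band2queue)
{
    int i, queue = 0, offset = 0, qmapoffset, mod;

    assert(queues > 0);

    if (bands < queues) {
        qmapoffset = 1;
        mod = queues;
    } else {
        mod = bands % queues;
        qmapoffset = bands / queues + (mod ? 1 : 0);
    }

    for (i = 0; i < bands; i++) {
        band2queue[i] = queue;
        if ((i + 1) - offset == qmapoffset) {
            /* This queue has its share of bands; move to the next one. */
            queue++;
            offset += qmapoffset;
            if (mod)
                mod--;
            qmapoffset = bands / queues + (mod ? 1 : 0);
        }
    }
}
```

With 3 bands and 2 queues this arithmetic assigns bands 0 and 1 to queue 0 and band 2 to queue 1, so the queue a band transmits on depends on the device's queue count.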


[PATCH 1/3] NET: Multiqueue network device support documentation.

2007-04-12 Thread Peter P Waskiewicz Jr
From: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>

Adding documentation for the new multiqueue API.

Signed-off-by: Peter P. Waskiewicz Jr <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 Documentation/networking/multiqueue.txt |   97 +++
 1 files changed, 97 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000..c32ed83
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,97 @@
+
+   HOWTO for multiqueue network device support
+   ===========================================
+
+Section 1: Base driver requirements for implementing multiqueue support
+Section 2: Qdisc support for multiqueue devices
+Section 3: Brief howto using PRIO for multiqueue devices
+
+
+Intro: Kernel support for multiqueue devices
+--------------------------------------------
+
+Kernel support for multiqueue devices is only an API that is presented to the
+netdevice layer for base drivers to implement.  This feature is part of the
+core networking stack, and all network devices will be running on the
+multiqueue-aware stack.  If a base driver only has one queue, then these
+changes are transparent to that driver.
+
+
+Section 1: Base driver requirements for implementing multiqueue support
+-----------------------------------------------------------------------
+
+Base drivers are required to use the new alloc_etherdev_mq() or
+alloc_netdev_mq() functions to allocate the subqueues for the device.  The
+underlying kernel API will take care of the allocation and deallocation of
+the subqueue memory, as well as netdev configuration of where the queues
+exist in memory.
+
+The base driver will also need to manage the queues as it does the global
+netdev->queue_lock today.  Therefore base drivers should use the
+netif_{start|stop|wake}_subqueue() functions to manage each queue while the
+device is still operational.  netdev->queue_lock is still used when the device
+comes online or when it's completely shut down (unregister_netdev(), etc.).
+
+Finally, the base driver should indicate that it is a multiqueue device.  The
+feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
+bitmap on device initialization.  Below is an example from e1000:
+
+#ifdef CONFIG_E1000_MQ
+   if ( (adapter->hw.mac.type == e1000_82571) ||
+(adapter->hw.mac.type == e1000_82572) ||
+(adapter->hw.mac.type == e1000_80003es2lan))
+   netdev->features |= NETIF_F_MULTI_QUEUE;
+#endif
+
+
+Section 2: Qdisc support for multiqueue devices
+-----------------------------------------------
+
+Currently two qdiscs support multiqueue devices: the default qdisc, pfifo_fast,
+and the PRIO qdisc.  The qdisc is responsible for classifying the skb's to
+bands and queues, and will store the queue mapping into skb->queue_mapping.
+Use this field in the base driver to determine which queue to send the skb
+to.
+
+pfifo_fast, being the default qdisc when a device is brought online, will not
+assign a queue mapping, therefore the skb will have a value of zero.  We
+cannot assume anything about the device itself, how many queues it really has,
+etc.  Therefore sending all traffic to queue 0 is the safest thing to do here.
+
+The PRIO qdisc naturally plugs into a multiqueue device.  Upon load of the
+qdisc, PRIO will make a best-effort assignment of queue to PRIO band to evenly
+distribute traffic flows.  The algorithm can be found in prio_tune() in
+net/sched/sch_prio.c.  Once the association is made, any skb that is
+classified will have skb->queue_mapping set, which will allow the driver to
+properly queue skb's to multiple queues.
+
+
+Section 3: Brief howto using PRIO for multiqueue devices
+--------------------------------------------------------
+
+The userspace command 'tc,' part of the iproute2 package, is used to configure
+qdiscs.  To add the PRIO qdisc to your network device, assuming the device is
+called eth0, run the following command:
+
+# tc qdisc add dev eth0 root handle 1: prio
+
+This will create 3 bands, 0 being highest priority, and associate those bands
+to the queues on your NIC.  Assuming eth0 has 2 Tx queues, the band mapping
+would look like:
+
+band 0 => queue 0
+band 1 => queue 1
+band 2 => queue 1
+
+Traffic will begin flowing through each queue if your TOS values are assigning
+traffic across the various bands.  For example, ssh traffic will always try to
+go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
+so it will be sent out queue 0.  ICMP traffic (pings) falls into the "normal"
+traffic classification, which is band 1.  Therefore pings will be sent out
+queue 1 on the NIC.
+
+The behavior of tc filters remains the same: they override the TOS priority
+classification.
+
+
+Author: Peter P. Waskiewicz Jr. <[EMAIL PROTECTED]>


[PATCH 2/3] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-12 Thread Peter P Waskiewicz Jr
From: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>

Update: Removed unnecessary whitespace removals.  Reset skb->queue_mapping to
zero prior to enqueueing to a qdisc.  Fixed band2queue mapping algorithm for
bands less than queues.

Added an API and associated supporting routines for multiqueue network devices.
This allows network devices supporting multiple TX queues to configure each
queue within the netdevice and manage each queue independently.  Changes to the
PRIO Qdisc also allow a user to map multiple flows to individual TX queues,
taking advantage of each queue on the device.

Signed-off-by: Peter P. Waskiewicz Jr <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 include/linux/etherdevice.h |3 +-
 include/linux/netdevice.h   |   62 ++-
 include/linux/skbuff.h  |2 +
 net/core/dev.c  |   27 +++
 net/core/skbuff.c   |3 ++
 net/ethernet/eth.c  |9 +++---
 net/sched/sch_generic.c |3 +-
 net/sched/sch_prio.c|   54 +
 8 files changed, 144 insertions(+), 19 deletions(-)

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 745c988..446de39 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -39,7 +39,8 @@ extern void   eth_header_cache_update(struct hh_cache *hh, struct net_device *dev
 extern int eth_header_cache(struct neighbour *neigh,
 struct hh_cache *hh);
 
-extern struct net_device *alloc_etherdev(int sizeof_priv);
+extern struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count);
+#define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
 static inline void eth_copy_and_sum (struct sk_buff *dest, 
 const unsigned char *src, 
 int len, int base)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 71fc8ff..f00b94a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -106,6 +106,14 @@ struct netpoll_info;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
+struct net_device_subqueue
+{
+   /* Give a control state for each queue.  This struct may contain
+* per-queue locks in the future.
+*/
+   unsigned long   state;
+};
+
 /*
  * Network device statistics. Akin to the 2.0 ether stats but
  * with byte counters.
@@ -324,6 +332,7 @@ struct net_device
 #define NETIF_F_GSO2048/* Enable software GSO. */
 #define NETIF_F_LLTX   4096/* LockLess TX */
 #define NETIF_F_INTERNAL_STATS 8192/* Use stats structure in net_device */
+#define NETIF_F_MULTI_QUEUE16384   /* Has multiple TX/RX queues */
 
/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT  16
@@ -534,6 +543,10 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   /* The TX queue control structures */
+   struct net_device_subqueue  *egress_subqueue;
+   int egress_subqueue_count;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -675,6 +688,48 @@ static inline int netif_running(const struct net_device *dev)
        return test_bit(__LINK_STATE_START, &dev->state);
 }
 
+/*
+ * Routines to manage the subqueues on a device.  We only need start
+ * stop, and a check if it's stopped.  All other device management is
+ * done at the overall netdevice level.
+ * Also test the device if we're multiqueue.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, u16 queue_index)
+{
+   clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+ u16 queue_index)
+{
+   return test_bit(__LINK_STATE_XOFF,
+   &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   if (test_and_clear_bit(__LINK_STATE_XOFF,
+  &dev->egress_subqueue[queue_index].state))
+   __netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+   return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
 
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.
@@ -968,8 +1023,11 @@ static inline void 

[PATCH 3/3] NET: [e1000] Example implementation of multiqueue network device API

2007-04-12 Thread Peter P Waskiewicz Jr
From: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>

Please note that this patch is *not* intended to be integrated into any tree.
It fulfills a request to demonstrate the proposed multiqueue network device
API in a driver.  The necessary updates to the e1000 driver will come in a
more official release.  This is an as-is patch to this version of e1000, and
should be used for testing purposes only.

Signed-off-by: Peter P. Waskiewicz Jr <[EMAIL PROTECTED]>
---

 drivers/net/e1000/e1000.h |8 ++
 drivers/net/e1000/e1000_ethtool.c |   47 ++-
 drivers/net/e1000/e1000_main.c|  164 -
 3 files changed, 194 insertions(+), 25 deletions(-)

diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index dd4b728..15e484e 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -168,6 +168,10 @@ struct e1000_buffer {
uint16_t next_to_watch;
 };
 
+struct e1000_queue_stats {
+   u64 packets;
+   u64 bytes;
+};
 
 struct e1000_ps_page { struct page *ps_page[PS_PAGE_BUFFERS]; };
 struct e1000_ps_page_dma { uint64_t ps_page_dma[PS_PAGE_BUFFERS]; };
@@ -188,9 +192,11 @@ struct e1000_tx_ring {
/* array of buffer information structs */
struct e1000_buffer *buffer_info;
 
+   spinlock_t tx_queue_lock;
spinlock_t tx_lock;
uint16_t tdh;
uint16_t tdt;
+   struct e1000_queue_stats tx_stats;
boolean_t last_tx_tso;
 };
 
@@ -218,6 +224,7 @@ struct e1000_rx_ring {
 
uint16_t rdh;
uint16_t rdt;
+   struct e1000_queue_stats rx_stats;
 };
 
 #define E1000_DESC_UNUSED(R) \
@@ -271,6 +278,7 @@ struct e1000_adapter {
 
/* TX */
struct e1000_tx_ring *tx_ring;  /* One per active queue */
+   struct e1000_tx_ring **cpu_tx_ring;
unsigned int restart_queue;
unsigned long tx_queue_len;
uint32_t txd_cmd;
diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 6777887..fd466a1 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -105,7 +105,12 @@ static const struct e1000_stats e1000_gstrings_stats[] = {
{ "dropped_smbus", E1000_STAT(stats.mgpdc) },
 };
 
-#define E1000_QUEUE_STATS_LEN 0
+#define E1000_QUEUE_STATS_LEN \
+       (((((((struct e1000_adapter *)netdev->priv)->num_rx_queues > 1) ? \
+         ((struct e1000_adapter *)netdev->priv)->num_rx_queues : 0 ) + \
+        ((((struct e1000_adapter *)netdev->priv)->num_tx_queues > 1) ? \
+         ((struct e1000_adapter *)netdev->priv)->num_tx_queues : 0 ))) * \
+       (sizeof(struct e1000_queue_stats) / sizeof(u64)))
 #define E1000_GLOBAL_STATS_LEN \
sizeof(e1000_gstrings_stats) / sizeof(struct e1000_stats)
 #define E1000_STATS_LEN (E1000_GLOBAL_STATS_LEN + E1000_QUEUE_STATS_LEN)
@@ -693,8 +698,10 @@ e1000_set_ringparam(struct net_device *netdev,
E1000_MAX_TXD : E1000_MAX_82544_TXD));
E1000_ROUNDUP(txdr->count, REQ_TX_DESCRIPTOR_MULTIPLE);
 
-   for (i = 0; i < adapter->num_tx_queues; i++)
+   for (i = 0; i < adapter->num_tx_queues; i++) {
txdr[i].count = txdr->count;
+   spin_lock_init(&adapter->tx_ring[i].tx_queue_lock);
+   }
for (i = 0; i < adapter->num_rx_queues; i++)
rxdr[i].count = rxdr->count;
 
@@ -1909,6 +1916,9 @@ e1000_get_ethtool_stats(struct net_device *netdev,
struct ethtool_stats *stats, uint64_t *data)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
+u64 *queue_stat;
+int stat_count = sizeof(struct e1000_queue_stats) / sizeof(u64);
+int j, k;
int i;
 
e1000_update_stats(adapter);
@@ -1917,12 +1927,29 @@ e1000_get_ethtool_stats(struct net_device *netdev,
data[i] = (e1000_gstrings_stats[i].sizeof_stat ==
sizeof(uint64_t)) ? *(uint64_t *)p : *(uint32_t *)p;
}
+        if (adapter->num_tx_queues > 1) {
+                for (j = 0; j < adapter->num_tx_queues; j++) {
+                        queue_stat = (u64 *)&adapter->tx_ring[j].tx_stats;
+                        for (k = 0; k < stat_count; k++)
+                                data[i + k] = queue_stat[k];
+                        i += k;
+                }
+        }
+        if (adapter->num_rx_queues > 1) {
+                for (j = 0; j < adapter->num_rx_queues; j++) {
+                        queue_stat = (u64 *)&adapter->rx_ring[j].rx_stats;
+                        for (k = 0; k < stat_count; k++)
+                                data[i + k] = queue_stat[k];
+                        i += k;
+                }
+        }
 /* BUG_ON(i != E1000_STATS_LEN); */
 }
 
 static void
 e1000_get_strings(struct net_device *netdev, uint32_t stringset, uint8_t *data)
 {
+   struct e1000_adapter *adapter = netdev_priv(netdev);
uint8_t *p = data;
int i;
 
@@ -1937,6 +1964,22 @@ e1000_get_strings(struct net_device 

[PATCH 0/3] [UPDATED]: Multiqueue network device support

2007-04-12 Thread Peter P Waskiewicz Jr
This is a redesign and repost of the multiqueue network device support patches. 
The new API for base drivers allows multiqueue-capable devices to manage their
individual queues in the network stack.  The stack now handles both
non-multiqueue and multiqueue devices on the same codepath.  Also, allocation
and deallocation of the queues is handled by the kernel instead of the driver.

Fixes have been integrated into this patchset based on community feedback.  A
patched version of e1000 using the multiqueue API has also been included.  NOTE
that this version of e1000 is *only* for testing purposes, and is not intended
to be integrated into the kernel at this time.  It is only for demonstration
purposes.

The e1000 patch will only work with MAC types of 82571 and higher.

Documentation is also included describing in more detail how this works, as
well as how a base driver can use the API to implement multiple queues.
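
As a rough illustration of the per-queue flow control described above, here is
a user-space sketch, not kernel code.  It mimics the start/stop/stopped helpers
with one plain flag word per queue.  The sim_ names, the SIM_XOFF flag and the
struct layout are assumptions made up for this example and only loosely follow
the patch; there is no atomicity and no netpoll handling here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-queue XOFF state bit. */
#define SIM_XOFF 0x1UL

/* One flag word per TX queue, loosely modeled on the subqueue struct. */
struct sim_subqueue {
	unsigned long state;
};

struct sim_netdev {
	struct sim_subqueue *egress_subqueue;
	int egress_subqueue_count;
};

/* Mark one queue as unable to accept more packets. */
static inline void sim_stop_subqueue(struct sim_netdev *dev, unsigned short q)
{
	dev->egress_subqueue[q].state |= SIM_XOFF;
}

/* Allow the queue to transmit again. */
static inline void sim_start_subqueue(struct sim_netdev *dev, unsigned short q)
{
	dev->egress_subqueue[q].state &= ~SIM_XOFF;
}

static inline bool sim_subqueue_stopped(const struct sim_netdev *dev,
					unsigned short q)
{
	return (dev->egress_subqueue[q].state & SIM_XOFF) != 0;
}

/* Return the first queue that may transmit, or -1 if all are stopped. */
static inline int sim_first_ready_queue(const struct sim_netdev *dev)
{
	int i;

	for (i = 0; i < dev->egress_subqueue_count; i++)
		if (!sim_subqueue_stopped(dev, (unsigned short)i))
			return i;
	return -1;
}
```

The point of the per-queue bit is that a full ring on one queue does not have
to stall the others; only that queue's flag is set and later cleared.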

These patches can also be pulled from my git repository at:

git-pull git://lost.foo-projects.org/~ppwaskie/git/net-2.6.22 mq

--
Peter P. Waskiewicz Jr.
<[EMAIL PROTECTED]>


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> Yes, but then we should have seen it more frequently, shouldn't we? I always
> run with the stack overflow check enabled and I don't think I ever saw 
> warnings in inflate.
>   

I guess the window is just while decompressing the root filesystem.
Interrupts under Xen might be using a little more stack (~40-50 bytes?),
but it's not a qualitative difference.  It might have more to do with
different timing.

J


[PATCH] pnpbios_thread_init: don't use CLONE_SIGHAND

2007-04-12 Thread Oleg Nesterov
pnp_dock_thread() calls allow_signal() which plays with parent process's
->sighand.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5/drivers/pnp/pnpbios/core.c~3_pnp 2006-12-17 19:06:40.0 +0300
+++ 2.6.21-rc5/drivers/pnp/pnpbios/core.c   2007-04-13 03:44:34.0 +0400
@@ -589,7 +589,7 @@ static int __init pnpbios_thread_init(vo
return 0;
 #ifdef CONFIG_HOTPLUG
init_completion(&unload_sem);
-   if (kernel_thread(pnp_dock_thread, NULL, CLONE_KERNEL) > 0)
+   if (kernel_thread(pnp_dock_thread, NULL, CLONE_FS | CLONE_FILES) > 0)
unloading = 0;
 #endif
return 0;



[PATCH] nlmclnt_recovery: don't use CLONE_SIGHAND

2007-04-12 Thread Oleg Nesterov
reclaimer() calls allow_signal() which plays with parent process's ->sighand.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5/fs/lockd/clntlock.c~1_lockd  2007-04-05 12:04:07.0 +0400
+++ 2.6.21-rc5/fs/lockd/clntlock.c  2007-04-13 03:20:51.0 +0400
@@ -153,7 +153,7 @@ nlmclnt_recovery(struct nlm_host *host)
if (!host->h_reclaiming++) {
nlm_get_host(host);
__module_get(THIS_MODULE);
-   if (kernel_thread(reclaimer, host, CLONE_KERNEL) < 0)
+   if (kernel_thread(reclaimer, host, CLONE_FS | CLONE_FILES) < 0)
module_put(THIS_MODULE);
}
 }



[PATCH] usbatm_heavy_init: don't use CLONE_SIGHAND

2007-04-12 Thread Oleg Nesterov
usbatm_do_heavy_init() calls allow_signal() which plays with parent process's
->sighand.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 2.6.21-rc5/drivers/usb/atm/usbatm.c~usbatm  2006-11-27 21:19:30.0 +0300
+++ 2.6.21-rc5/drivers/usb/atm/usbatm.c 2007-04-13 03:34:56.0 +0400
@@ -1019,7 +1019,7 @@ static int usbatm_do_heavy_init(void *ar
 
 static int usbatm_heavy_init(struct usbatm_data *instance)
 {
-   int ret = kernel_thread(usbatm_do_heavy_init, instance, CLONE_KERNEL);
+   int ret = kernel_thread(usbatm_do_heavy_init, instance, CLONE_FS | CLONE_FILES);
 
if (ret < 0) {
usb_err(instance, "%s: failed to create kernel_thread (%d)!\n", __func__, ret);



Re: [PATCH] [KERNEL-DOC] fix tex error when building pdfdocs

2007-04-12 Thread Randy Dunlap
On Thu, 12 Apr 2007 22:38:42 +0200 Borislav Petkov wrote:

> When building pdfdocs, the db2pdf converter bails out because of a
> LaTeX-reserved token ('#') in the intermediary .tex file, which results in a
> conversion error with the following message:
> 
> 
> [15.0.32])
> ! Incomplete \iffalse; all text was ignored after line 8154.
> 
> \fi
> <*> kernel-hacking.tex
> 
> 
> This is a rather arbitrary fix, so suggest away.

Hi,

I don't have a problem with the change, but I don't get that tex error either.
Here is an extract from the .tex file:


{\def\Element%
{451}\def\ProcessingMode%
{title-sosofo-mode}}\#if\endNode{}\endSeq{}\endLink{}\Seq%
{}\Leader%
{}.\endLeader{}\Link%


> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> 
> Index: 21-rc6/Documentation/DocBook/kernel-hacking.tmpl
> ===
> --- 21-rc6.orig/Documentation/DocBook/kernel-hacking.tmpl
> +++ 21-rc6/Documentation/DocBook/kernel-hacking.tmpl
> @@ -1138,7 +1138,7 @@ static struct block_device_operations op
>
>  
>
> -   if
> +   Preprocessor Conditionals
> 
> 
>  It is generally considered cleaner to use macros in header files
> 
> -

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread William Lee Irwin III
On Thu, 12 Apr 2007 16:10:50 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> This solves a real-life problem for Oracle system monitoring software
>> (specifically EM). Among the tasks it must carry out is determining
>> per-process memory footprint of a set of cooperating tasks (i.e. Oracle
>> processes). RSS is inadequate for this due to page sharing; this work
>> provides sufficient information to determine what EM needs.

On Thu, Apr 12, 2007 at 04:32:35PM -0700, Andrew Morton wrote:
> Not a good idea to expose raw flags in this manner - it changes at the drop
> of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
> mapping to make this viable.
> Not a good idea to use page->_count: page_count() will be more stable. 
> Otherwise OK, I guess: the interpretation of the page refcount is unlikely
> to change much over time.

For the most part EM wants page_mapcount(), in order to determine
"uniquely attributable RSS" (my ca. 2004 nomenclature) or
"proportional RSS" (mpm's more recent nomenclature). As things now
stand it will have to infer the map counts by maintaining a table of
pfns and the mappings thereof, but at least that can be done with it.
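
To make the two quantities concrete, a user-space sketch follows. It
assumes the tool has already built a table with one entry per resident
page of the process, each carrying that page's map count; the struct
and function names are made up for this example. Unique RSS counts the
pages mapped exactly once, and proportional RSS charges each page as
1/mapcount of a page.

```c
#include <stddef.h>

/* Hypothetical record: one resident page of the process under study. */
struct page_ref {
	unsigned long pfn;      /* page frame number */
	unsigned int mapcount;  /* number of mappings that reference this pfn */
};

/* Pages mapped exactly once are attributable to this process alone. */
static size_t unique_rss_pages(const struct page_ref *p, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		if (p[i].mapcount == 1)
			total++;
	return total;
}

/* Each page is charged as 1/mapcount of a page; the result is in pages. */
static double proportional_rss_pages(const struct page_ref *p, size_t n)
{
	size_t i;
	double total = 0.0;

	for (i = 0; i < n; i++)
		if (p[i].mapcount > 0)
			total += 1.0 / p[i].mapcount;
	return total;
}
```

With three pages mapped once, twice and four times, the unique figure
is one page while the proportional figure is 1 + 1/2 + 1/4 pages.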


-- wli


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Andi Kleen
On Friday 13 April 2007 01:20:40 Jan Engelhardt wrote:
> 
> On Apr 12 2007 15:39, Jeremy Fitzhardinge wrote:
> >Andi Kleen wrote:
> >> Hmm, does Xen perhaps not use interrupt stacks? Normally 2.7k should be
> >> still green as long as there are not too many functions above/below it.
> >
> >That's a good point, I'll need to check that.  Still, nearly 3k of stack!
> 
> I bite. Would compressing the vmlinux binary with LZO or LZMA make an
> improvement to the bootstrap uncompress stack usage?

We don't care about the stack usage, as long as it doesn't overflow.
It's a very limited piece of code that doesn't run on top of or below
other subsystems.

-Andi


Re: [PATCH 0/30] Use menuconfig objects

2007-04-12 Thread Andrew Morton
On Fri, 13 Apr 2007 01:16:35 +0200 (MEST)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:

> On Apr 12 2007 15:50, Andrew Morton wrote:
> >On Tue, 10 Apr 2007 21:17:40 +0200 (MEST)
> >Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> >
> >> the following patch series turns some menus into menuconfigs, so they 
> >> can be disabled whilst "walking" through the parent menu
> >
> >So I merged the 23 of these which survived review and which do not
> >intersect with other outstanding work.
> >
> >I don't think I have an opinion on whether the change is actually an
> >
> >If we're going to make this change, we should ensure that it is done
> >kernel-wide, for UI consistency reasons.
> 
> If time permits, I'll go through the rest of the menus I find
> eligible for menuconfig-izing.

OK.  It's encouraging that Randy is on board.

> Does it help to base them on -mm to work better with outstanding work?

At this stage in the development cycle:

 5444 files changed, 530428 insertions(+), 179401 deletions(-)

yes, it helps quite a lot.


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Jeremy Fitzhardinge
Jan Engelhardt wrote:
> On Apr 12 2007 15:39, Jeremy Fitzhardinge wrote:
>   
>> Andi Kleen wrote:
>> 
>>> Hmm, does Xen perhaps not use interrupt stacks? Normally 2.7k should be
>>> still green as long as there are not too many functions above/below it.
>>>   
>> That's a good point, I'll need to check that.  Still, nearly 3k of stack!
>> 
>
> I bite. Would compressing the vmlinux binary with LZO or LZMA make an
> improvement to the bootstrap uncompress stack usage?
>   

Well, the thread started with my patch to fix inflate.  The stack usage
of LZO or LZMA decompressors will primarily depend on how they're
implemented rather than any inherent property of the algorithms.

J


Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread Andrew Morton
On Thu, 12 Apr 2007 16:10:50 -0700
William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> On Tue, Apr 03, 2007 at 09:43:30PM -0500, Matt Mackall wrote:
> > This patch series introduces /proc/pid/pagemap and /proc/kpagemap,
> > which allow detailed run-time examination of process memory usage at a
> > page granularity.
> > The first several patches whip the page-walking code introduced for
> > /proc/pid/smaps and clear_refs into a more generic form, the next
> > couple make those interfaces optional, and the last two introduce the
> > new interfaces, also optional.
> 
> This solves a real-life problem for Oracle system monitoring software
> (specifically EM). Among the tasks it must carry out is determining
> per-process memory footprint of a set of cooperating tasks (i.e. Oracle
> processes). RSS is inadequate for this due to page sharing; this work
> provides sufficient information to determine what EM needs.
> 
> 

I'm still dying to see what the human-readable output from this
thing looks like.



> + * Each entry is a pair of unsigned longs representing the
> + * corresponding physical page, the first containing the page flags
> + * and the second containing the page use count.
> + *
> + * The first 4 bytes of this file form a simple header:
> + *
> + * first byte:   0 for big endian, 1 for little
> + * second byte:  page shift (eg 12 for 4096 byte pages)
> + * third byte:   entry size in bytes (currently either 4 or 8)
> + * fourth byte:  header size
>
> ...
>
> +   while (count > 0) {
> +   chunk = min_t(size_t, count, PAGE_SIZE);
> +   i = 0;
> +
> +   if (pfn == -1) {
> +   page[0] = 0;
> +   page[1] = 0;
> +   ((char *)page)[0] = (ntohl(1) != 1);

OK.

> +   ((char *)page)[1] = PAGE_SHIFT;

OK.

> +   ((char *)page)[2] = sizeof(unsigned long);

OK.

> +   ((char *)page)[3] = KPMSIZE;

OK.

> +   i = 2;
> +   pfn++;
> +   }
> +
> +   for (; i < 2 * chunk / KPMSIZE; i += 2, pfn++) {
> +   ppage = pfn_to_page(pfn);
> +   if (!ppage) {
> +   page[i] = 0;
> +   page[i + 1] = 0;
> +   } else {
> +   page[i] = ppage->flags;
> +   page[i + 1] = atomic_read(>_count);
> +   }
> +   }

Not a good idea to expose raw flags in this manner - it changes at the drop
of a hat.  We'd need to also expose the kernel's PG_foo-to-bitnumber
mapping to make this viable.

Not a good idea to use page->_count: page_count() will be more stable. 
Otherwise OK, I guess: the interpretation of the page refcount is unlikely
to change much over time.
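
For anyone wanting to poke at the file from user space, the four header
bytes documented in the quoted comment could be decoded roughly like this.
It is only a sketch against the layout as described in this posting; the
struct and function names are invented here, and the format may well change
before anything is merged.

```c
#include <stddef.h>

/* Decoded form of the 4-byte header described in the quoted comment. */
struct kpm_header {
	int little_endian;        /* first byte: 0 = big endian, 1 = little */
	unsigned int page_shift;  /* second byte, e.g. 12 for 4096-byte pages */
	unsigned int entry_size;  /* third byte: 4 or 8 */
	unsigned int header_size; /* fourth byte */
};

/* Returns 0 on success, -1 if the buffer is short or a field is implausible. */
static int kpm_parse_header(const unsigned char *buf, size_t len,
			    struct kpm_header *h)
{
	if (len < 4)
		return -1;
	if (buf[0] > 1)                 /* endianness flag must be 0 or 1 */
		return -1;
	if (buf[2] != 4 && buf[2] != 8) /* documented entry sizes */
		return -1;

	h->little_endian = buf[0];
	h->page_shift = buf[1];
	h->entry_size = buf[2];
	h->header_size = buf[3];
	return 0;
}
```

A reader would check the endianness byte first, since it decides how every
later unsigned long in the file has to be interpreted.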



Re: [PATCH resend][CRYPTO]: RSA algorithm patch

2007-04-12 Thread Indan Zupancic
Hello,

Next time, please do a reply-all so CC's aren't dropped.

It seems you jumped in halfway and missed some background info, so I'll try
to clarify some things.

On Thu, April 12, 2007 23:28, David Wagner wrote:
> Yes, Satyam Sharma is 100% correct.  Unpadded RSA makes no sense.  RSA is
> not secure if you omit the padding.  If you have a good reason why RSA
> needs to be in the kernel for security reasons, then the padding has to be
> in the kernel, too.  Putting plain unpadded RSA in the kernel seems bogus.

He is correct. I only argued that it can still be named RSA (which Satyam
disputed), no matter what critical features are missing for a complete
infrastructure.

I don't know if you read the patch, but right now it's only a multi-precision
integer implementation, useful to implement RSA. The rest, including the
binary checking, is missing. We're pondering a bit about what, in the end,
would be useful to have in or around the kernel.


> I worry about the quality of this patch if it is using unpadded RSA.
> This is pretty elementary stuff.  No one should be implementing their
> own crypto code unless they have considerable competence and knowledge
> of cryptography.  This elementary error leaves reason to be concerned
> about whether the developer of this patch has the skills that are needed
> to write this kind of code and get it right.

As said above, the patch is only an MPI implementation, not RSA, and it lacks
the rest needed to make it useful, like a crypto API interface and padding.

So we can't really judge the developer's skills or crypto knowledge.

It does point out that having a hidden implementation can never foster
much trust, as no one can read the code and judge if it's good or not.


> People often take it personally when I tell them that they are not
> competent to write their own crypto code, but this is not a personal
> attack.  It takes very specialized knowledge and considerable study
> before one can write one's own crypto implementation from scratch and
> have a good chance that the result will be secure.  People without
> those skills shouldn't be writing their own crypto code, at least not
> if security is important, because it's too easy to get something wrong.

To a certain degree you're right, but the nice thing about open source is
that people who know better can spot errors, and if those are fixed, it
can happen that an "incompetent" person created something excellent. Maybe
not at first, but in the end. (I suspect that a good coder with no crypto
knowledge, but with feedback from experts, can implement something better
than one expert with mediocre coding skills.)

The code should be judged, not the people writing it. Besides, it isn't
always that hard to get something secure, if things are kept simple and
straightforward. E.g. writing a secure AES implementation isn't magic.
RSA is much more complex though. (Rather ironic, as the theory behind RSA
is simple, but the implementation hairy. With AES it's exactly the opposite.
The coder doesn't need to understand the algebra behind it, knowing that it
can be done with a simple table lookup is enough).

In general the tricky part is around the crypto implementation itself, how
it's used, key management, etc. (Though the border is vague, so maybe you
included all that when saying "crypto implementation".)


> (No, just reading Applied Cryptography is not good enough.)  My experience
> is that code that contains elementary errors like this is also likely
> to contain more subtle errors that are harder to spot.  In short, I'm
> not getting warm fuzzies here.

The code that was posted has no such errors, see above. Maybe the part that
wasn't posted does; who knows.


> And no, you can't just blithely push padding into user space and expect
> that to make the security issues go away.  If you are putting the
> RSA exponentiation in the kernel because you don't trust user space,
> then you have to put the padding in the kernel, too, otherwise you're
> vulnerable to attack from evil user space code.

True, but the code wasn't put into the kernel for security reasons. Why it
was remains a bit of a mystery, but it looks like it was for convenience.

Greetings,

Indan




Re: [PATCH 0/30] Use menuconfig objects

2007-04-12 Thread Randy Dunlap

Jan Engelhardt wrote:
> Hi,
>
> On Apr 12 2007 16:07, Randy Dunlap wrote:
>> On Thu, 12 Apr 2007 15:50:12 -0700 Andrew Morton wrote:
>>> So I merged the 23 of these which survived review and which do not
>>> intersect with other outstanding work.
>>>
>>> I don't think I have an opinion on whether the change is actually an
>>> improvement, and I don't get a clear sense of what others think.  Shrug.
>>
>> I like them, but then I have made & sent similar patches in the past.
>
> Would you like to go through remaining menus and make the patches?
> Just that efforts are not needlessly duplicated again. And of course
> for you to get your share if you desire so.

Hi,

I'm a bit too busy at the moment so please go ahead with it.

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: intermittant petabyte usage reported with broadcom nic

2007-04-12 Thread Roland Dreier
[Adding Michael Chan, who seems to look after bnx2, to the cc list]

 > To clarify it's an Intel Dual Core Xeon (I just wound up as thinking of
 > them all as amd64s). Network card driver in use is the one defined by
 > CONFIG_BNX2. Kernel's monolithic.

From a quick look at bnx2.c, it seems that the driver gives the NIC
(firmware?) a block of memory to DMA stats into, and just reads from
that memory in its get_stats method.  So if you're seeing wonky stats
from the NIC intermittently, my best guess would be that firmware is
occasionally writing junk into the stats block.

 - R.


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Jan Engelhardt

On Apr 12 2007 15:39, Jeremy Fitzhardinge wrote:
>Andi Kleen wrote:
>> Hmm, does Xen perhaps not use interrupt stacks? Normally 2.7k should be still
>> green as long as there are not too many functions above/below it.
>
>That's a good point, I'll need to check that.  Still, nearly 3k of stack!

I bite. Would compressing the vmlinux binary with LZO or LZMA make an
improvement to the bootstrap uncompress stack usage?


Jan
-- 


Re: intermittant petabyte usage reported with broadcom nic

2007-04-12 Thread CaT
On Thu, Apr 12, 2007 at 04:18:24PM -0700, Roland Dreier wrote:
> > > Apr 11 22:14:02 '  eth0:220898233988841368 66750274000 0  
> > > 0  86458738 52386430545 101089219 19931300 0  199313  
> > > 0 '
> 
> > > Apr 11 22:15:02 '  eth0:17227454818 81381144000 0 
> > >  0 0 33091307388 86658381000 0   0  0 
> > > '
> 
> > But in fact I think you're saying that the numbers go bad, and then stay 
> > bad.
> 
> Doesn't look like it -- one minute after the first hiccup the eth0 #s
> look reasonable again.

Yeah. Sorry for not making it clear. I included good values on either
side of the bad one.

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby


Re: [PATCH 0/30] Use menuconfig objects

2007-04-12 Thread Jan Engelhardt
Hi,

On Apr 12 2007 15:50, Andrew Morton wrote:
>On Tue, 10 Apr 2007 21:17:40 +0200 (MEST)
>Jan Engelhardt <[EMAIL PROTECTED]> wrote:
>
>> the following patch series turns some menus into menuconfigs, so they 
>> can be disabled whilst "walking" through the parent menu
>
>So I merged the 23 of these which survived review and which do not
>intersect with other outstanding work.
>
>I don't think I have an opinion on whether the change is actually an
>
>If we're going to make this change, we should ensure that it is done
>kernel-wide, for UI consistency reasons.

If time permits, I'll go through the rest of the menus I find
eligible for menuconfig-izing.

Does it help to base them on -mm to work better with outstanding work?


Thanks,
Jan
-- 


Re: intermittant petabyte usage reported with broadcom nic

2007-04-12 Thread Roland Dreier
 > > Apr 11 22:14:02 '  eth0:220898233988841368 66750274000 0   
 > >0  86458738 52386430545 101089219 19931300 0  199313
 > >   0 '

 > > Apr 11 22:15:02 '  eth0:17227454818 81381144000 0  
 > > 0 0 33091307388 86658381000 0   0  0 '

 > But in fact I think you're saying that the numbers go bad, and then stay bad.

Doesn't look like it -- one minute after the first hiccup the eth0 #s
look reasonable again.

 - R.


Re: [PATCH 0/30] Use menuconfig objects

2007-04-12 Thread Jan Engelhardt
Hi,

On Apr 12 2007 16:07, Randy Dunlap wrote:
>On Thu, 12 Apr 2007 15:50:12 -0700 Andrew Morton wrote:
>> 
>> So I merged the 23 of these which survived review and which do not
>> intersect with other outstanding work.
>> 
>> I don't think I have an opinion on whether the change is actually an
>> improvement, and I don't get a clear sense of what others think.  Shrug.
>
>I like them, but then I have made & sent similar patches in the past.

Would you like to go through remaining menus and make the patches?
Just that efforts are not needlessly duplicated again. And of course
for you to get your share if you desire so.


Thanks,
Jan
-- 


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-04-12 Thread David Miller
From: Paul Mackerras <[EMAIL PROTECTED]>
Date: Wed, 21 Mar 2007 11:03:14 +1100

> Linus Torvalds writes:
> 
> > We should just do this natively. There's been several tests over the years 
> > saying that it's much more efficient to do sti/cli as a simple store, and 
> > handling the "oops, we got an interrupt while interrupts were disabled" as 
> > a special case.
> > 
> > I have this dim memory that ARM has done it that way for a long time 
> > because it's so expensive to do a "real" cli/sti.
> > 
> > And I think -rt does it for other reasons. It's just more flexible.
> 
> 64-bit powerpc does this now as well.

I was curious about this so I had a look.

There appear to be three pieces of state used to manage this
on powerpc: PACASOFTIRQEN(r13), PACAHARDIRQEN(r13) and the
SOFTE() in the stackframe.

Plus there is all of this complicated logic on trap entry and
exit to manage these three values properly.

local_irq_restore() doesn't look like a simple piece of code
either.  Logically it should be simple: update the software
binary state, and if enabling, see if any interrupts came in
while we were disabled so we can run them.

Given all of that, is it really cheaper than just flipping the
bit in the cpu control register? :-/
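
For reference, the scheme under discussion boils down to something like
the following user-space model.  It is a sketch of the general
lazy-disable idea only, not the powerpc code: the names are made up, a
single pending flag stands in for real interrupt bookkeeping, and none
of the trap entry/exit handling is modeled.

```c
#include <stdbool.h>

/* Hypothetical model of software-managed interrupt masking. */
struct soft_irq_state {
	bool enabled;  /* software view of "interrupts enabled" */
	bool pending;  /* an interrupt arrived while soft-disabled */
	int handled;   /* number of times the handler has run */
};

static void model_run_handler(struct soft_irq_state *s)
{
	s->handled++;
}

/* Disabling is a plain store; no control register is touched. */
static void model_local_irq_disable(struct soft_irq_state *s)
{
	s->enabled = false;
}

/* Called when the hardware raises an interrupt. */
static void model_interrupt(struct soft_irq_state *s)
{
	if (!s->enabled) {
		s->pending = true;  /* the special case: remember it for later */
		return;
	}
	model_run_handler(s);
}

/* Update the software state and replay anything that came in meanwhile. */
static void model_local_irq_restore(struct soft_irq_state *s, bool enable)
{
	s->enabled = enable;
	if (enable && s->pending) {
		s->pending = false;
		model_run_handler(s);
	}
}
```

Even in this toy form the restore path carries a test and a possible
replay, which is where the cost question above comes from.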


Re: 2.6.21-rc6-mm1 USB related boot hang

2007-04-12 Thread Jiri Kosina
On Thu, 12 Apr 2007, Helge Hafting wrote:

> Are you sure this is the correct patch - against 2.6.21-rc6-mm1 ?
> Hunk 1 out of 1 failed . . .

Well I am pretty sure:

box:~/scratch # wget ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/2.6.21-rc6-mm1.bz2 >/dev/null 2>&1
box:~/scratch # wget ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2 >/dev/null 2>&1
box:~/scratch # wget ftp://ftp.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.21-rc6.bz2 >/dev/null 2>&1
box:~/scratch # tar xf linux-2.6.20.tar.bz2
box:~/scratch # cd linux-2.6.20/
box:~/scratch/linux-2.6.20 # mv ../patch-2.6.21-rc6.bz2 .
box:~/scratch/linux-2.6.20 # bunzip2 patch-2.6.21-rc6.bz2
box:~/scratch/linux-2.6.20 # patch -p1 < patch-2.6.21-rc6 >/dev/null 2>&1; echo 
$?
0
box:~/scratch/linux-2.6.20 # mv ../2.6.21-rc6-mm1.bz2 .
box:~/scratch/linux-2.6.20 # bunzip2 2.6.21-rc6-mm1.bz2
box:~/scratch/linux-2.6.20 # patch -p1 < 2.6.21-rc6-mm1 >/dev/null 2>&1; echo $?
0
box:~/scratch/linux-2.6.20 # cat tmp.patch
diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index 1ddca31..d930f62 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -1550,15 +1550,22 @@ static int __init hid_init(void)
 	retval = hiddev_init();
 	if (retval)
 		goto hiddev_init_fail;
+	printk(KERN_DEBUG "hid_init: before usb_register()\n");
 	retval = usb_register(&hid_driver);
+	printk(KERN_DEBUG "hid_init: after usb_register(), retuned %d\n", retval);
 	if (retval)
 		goto usb_register_fail;
 	info(DRIVER_VERSION ":" DRIVER_DESC);

+	printk(KERN_DEBUG "hid_init: returning 0\n");
+	dump_stack();
 	return 0;
 usb_register_fail:
+	printk(KERN_DEBUG "hid_init: calling hiddev_exit()\n");
 	hiddev_exit();
 hiddev_init_fail:
+	printk(KERN_DEBUG "hid_init: returning %d\n", retval);
+	dump_stack();
 	return retval;
 }
box:~/scratch/linux-2.6.20 # patch -p1 < tmp.patch
patching file drivers/hid/usbhid/hid-core.c
box:~/scratch/linux-2.6.20 #

So I guess you are operating on some broken version of the 2.6.21-rc6-mm1
codebase if you are getting rejects on this trivial patch.


Anyway, based on the information you have provided in your later messages, it
seems that this is probably not necessarily related to either USB or HID,
as you are getting hangs at different stages of boot, depending on your
local configuration/kernel version used.

Is vanilla 2.6.21-rc6 ok? If so, would you have time to bisect the 
offending patch?

Thanks,

-- 
Jiri Kosina


Re: intermittant petabyte usage reported with broadcom nic

2007-04-12 Thread Roland Dreier
 > Apr  9 06:19:04 '  eth0:14250798570591813804 2284720007938 1863800 
 > 18638  0  27375938 1556640980159 3345714490000 0 
 >   0  0 '

One odd thing is that the crazy number 14250798570591813804 is
c5c501cbc5c500ac in hex.  I dunno what the significance of the 0xc5 bit
pattern is though...

The other line has 220898233988841368, which is 0x310c9c6006a7f98, not
nearly so regular a pattern.

I don't think I'm helping much...
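
The conversions above are easy to reproduce. A small sketch, with helper names made up for this example, that prints a counter as 16 hex digits and counts how often a given byte value occurs in it:

```c
#include <stdint.h>
#include <stdio.h>

/* Count how many of the 8 bytes of v are equal to byte. */
static int count_byte(uint64_t v, uint8_t byte)
{
	int n = 0;

	for (int i = 0; i < 8; i++)
		if (((v >> (8 * i)) & 0xff) == byte)
			n++;
	return n;
}

/* Format a counter as 16 hex digits; buf must hold at least 17 bytes. */
static void counter_to_hex(uint64_t v, char buf[17])
{
	snprintf(buf, 17, "%016llx", (unsigned long long)v);
}
```

Fed the two rx_bytes values from the logs, the first yields c5c501cbc5c500ac with four 0xc5 bytes, and the second yields 0310c9c6006a7f98 with none.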


[PATCH] [DEBUG] sd-sched: monitor dynamic priority levels of a running task

2007-04-12 Thread Dmitry Adamushko

Hi,

[ just in case it is of some use to anybody ]

target : 2.6.21-rc6-mm1

a very simplified but quite funny "toy" that

[1]  allows monitoring all the dynamic priority levels (counts the
number of hits per level) on which a given task (configured via proc)
is running;

# echo "pid" > /proc/sd_pid  - to monitor a task with a given pid;
# cat /proc/sd_slots  - to dump statistics.

[2]  triggers a message when task's *prio* and *static_prio* are out
of sync, i.e. a current prio is not allowed by
prio_matrix[USER_PRIO(static_prio)].

[ example: --- [1857] static: 35, slot: 3  -  nice ]
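
A user-space model of what [1] and [2] do, as a rough sketch only. The struct, the function name and the value 40 for the priority range are assumptions for illustration; the blocked[] masks merely stand in for prio_matrix[], and a set bit is read the same way as in the check in the patch, i.e. "this slot should not be used with this static priority".

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PRIO_RANGE 40	/* assumed: one level per nice value */

/* Hypothetical stand-in for prio_matrix[] plus the hit counters. */
struct slot_monitor {
	uint64_t blocked[MODEL_PRIO_RANGE];	/* bit set => slot not allowed */
	unsigned long hits[MODEL_PRIO_RANGE];	/* [1]: hits per dynamic level */
};

/*
 * Record that the monitored task was seen running in a given slot.
 * Returns true when prio and static_prio are out of sync ([2]).
 */
static bool monitor_observe(struct slot_monitor *m, int static_prio, int slot)
{
	m->hits[slot]++;
	return (m->blocked[static_prio] >> slot) & 1;
}
```

With slot 3 marked as blocked for static priority 35, observing the task in slot 3 reports the same condition as the example message above.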

maybe Con has something similar.. but at least I haven't found
anything on his website.


e.g. for X all the following scenarios give different (obviously)
patterns: (1) just occasional cpu users ; (2) a cpu hog with the same
static_prio; (3) a niced cpu hog.

There are cases when [2] is triggered indeed. It's due to
set_user_nice(). Con, is it a "feature"?

 [ explanation ] -
In fact, all this "delta" calculation (delta = p->prio - old_prio)
stuff is useless in set_user_nice() as effective_prio() returns just
the old p->prio and, as a result, we have got p->prio = p->prio :)  It
makes sense to use delta = p->static_prio - old_static_prio;

The p->prio will be recalculated as a result of enqueue_task ->
__enqueue_task -> recalc_task_prio ..  _but_ if the task is currently
in the "active" array and its time_slice != 0 -- the old p->prio is
not changed

So the task is queued taking into account the old_prio, although this
slot can be prohibited by a new p->static_prio. It's only for the very
first slot so one may call it err.. a feature (?)


--
Best regards,
Dmitry Adamushko
--- linux-2.6.21-rc6-mm1/kernel/sched-orig3.c	2007-04-11 14:48:19.0 +0200
+++ linux-2.6.21-rc6-mm1/kernel/sched.c	2007-04-12 16:13:12.0 +0200
@@ -260,6 +260,164 @@ struct rq {
 static DEFINE_PER_CPU(struct rq, runqueues);
 static DEFINE_MUTEX(sched_hotcpu_mutex);
 
+#define DEBUG_SD_SLOTS
+#ifdef DEBUG_SD_SLOTS
+
+#include <linux/proc_fs.h>
+
+static int sd_monitor_pid, sd_monitor_idx;
+static unsigned long sd_slot_hits[PRIO_RANGE];
+static struct proc_dir_entry *sd_pid_dir, *sd_slots_dir;
+static int sd_debug_done;
+
+static void init_debug_slots(void);
+
+static void reset_slot_hits(void)
+{
+	int i = 0;
+
+	for ( ; i < PRIO_RANGE; i++)
+		sd_slot_hits[i] = 0;
+}
+
+static inline void debug_check_slot_validity(struct task_struct *p)
+{
+	int sprio = USER_PRIO(p->static_prio), uprio = USER_PRIO(p->prio);
+
+	/* SCHED_BATCH and rt tasks don't use prio_matrix so just skip them. */
+	if (p->policy == SCHED_BATCH || rt_task(p))
+		return;
+
+	if (unlikely(!sd_debug_done))
+		init_debug_slots();
+
+	if (sd_monitor_pid && p->pid == sd_monitor_pid)
+		++sd_slot_hits[uprio];
+
+	if (test_bit(uprio, prio_matrix[sprio]))
+		printk(KERN_EMERG "--- [%d] static: %d, slot: %d  -  %s\n",
+			p->pid, sprio, uprio, p->comm);
+}
+
+static int sd_pid_proc_read(char *page, char **start, off_t off,
+			int count, int *eof, void *data)
+{
+	char *p = page;
+	int len = 0;
+
+	p += sprintf(p, "pid: %d\n", sd_monitor_pid);
+
+	len = p - page - off;
+
+	if (len <= off + count)
+		*eof = 1;
+	*start = page + off;
+	if (len > count)
+		len = count;
+	if (len < 0)
+		len = 0;
+
+	return len;
+}
+
+static int sd_pid_proc_write(struct file *file, const char __user *buffer,
+			unsigned long count, void *data)
+{
+	struct task_struct *task;
+	char *end, buf[16];
+	long pid;
+	int n;
+
+	n = count > sizeof(buf) - 1 ? sizeof(buf) - 1 : count;
+
+	if (copy_from_user(buf, buffer, n))
+		return -EFAULT;
+
+	buf[n] = '\0';
+	pid = simple_strtol(buf, &end, 0);
+
+	/* Stop monitoring. */
+	if (!pid) {
+		sd_monitor_pid = 0;
+		goto out_exit;
+	}
+
+	read_lock(&tasklist_lock);
+	task = find_task_by_pid(pid);
+
+	if (!task || task->policy == SCHED_BATCH || rt_task(task)) {
+		read_unlock(&tasklist_lock);
+
+		printk(KERN_EMERG "*** don't monitor SCHED_BATCH or Real-Time tasks ***\n");
+		goto out_exit;
+	}
+
+	sd_monitor_idx = USER_PRIO(task->static_prio);
+	read_unlock(&tasklist_lock);
+
+	reset_slot_hits();
+	sd_monitor_pid = pid;
+
+out_exit:
+	return count;
+}
+
+static int sd_slots_proc_read(char *page, char **start, off_t off,
+			int count, int *eof, void *data)
+{
+	int len = 0, i = 0;
+	char *p = page;
+
+	if (!sd_monitor_pid)
+		goto out_exit;
+
+	p += sprintf(p, " slot  allowed hits\n\n");
+
+	for ( ; i < PRIO_RANGE; i++)
+		p += sprintf(p, "[ %d ] - %d : %lu \n",
+			i, !!test_bit(i, prio_matrix[sd_monitor_idx]), sd_slot_hits[i]);
+
+out_exit:
+	len = p - page - off;
+
+	if (len <= off + count)
+		*eof = 1;
+	*start = page + off;
+	if (len > count)
+		len = count;
+	if (len < 0)
+		len = 0;
+
+

Re: intermittant petabyte usage reported with broadcom nic

2007-04-12 Thread Andrew Morton
On Fri, 13 Apr 2007 08:52:49 +1000
CaT <[EMAIL PROTECTED]> wrote:

> On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> > On Mon, 2 Apr 2007 11:43:19 +1000 CaT <[EMAIL PROTECTED]> wrote:
> > 
> > > I take minute by minute snapshots of network traffic by sampling
> > > /proc/net/dev and most of the time everything works fine. Occasionally
> > > though I get petabyte byte traffic and corresponding packet traffic.
> > 
> > How frequently?
> > 
> > Are you able to provide some actual numbers (expected and actual values),
> > so we can look at the bit patterns?
> 
> I have some now. These are raw lines from /proc/net/dev. In this case it's
> eth0 at 22:14 that chucked a wee wibbly.
> 
> Apr 11 22:13:02 '  eth0:17227166357 81379716000 0  0  
>0 33090495625 86656584000 0   0  0 '
> Apr 11 22:13:02 '  eth1:30708022097 91219466000 0  0  
>0 122989582024 125073786000 0   0  0 '
> Apr 11 22:14:02 '  eth0:220898233988841368 66750274000 0  
> 0  86458738 52386430545 101089219 19931300 0  199313  
> 0 '

0x310_c9c6_006a_7f98

Not sure what to make of that.

> Apr 11 22:14:02 '  eth1:30708307787 91220183000 0  0  
>0 122989665004 125074344000 0   0  0 '
> Apr 11 22:15:02 '  eth0:17227454818 81381144000 0  0  
>0 33091307388 86658381000 0   0  0 '
> Apr 11 22:15:02 '  eth1:30708569308 91220742000 0  0  
>0 122989732601 125074712000 0   0  0 '
> 
> On another server (same hardware except for 2ru case, more ram and more hds):
> 
> Apr  9 06:18:05 '  eth0:1556640056941 3598105481000 0 
>  0 0 2281147324747 3318270401000 0   0  0 
> '
> Apr  9 06:18:05 '  eth1:912389249044 1190286687000 0  
> 0 0 642943095469 991257887000 0   0  0 '
> Apr  9 06:19:04 '  eth0:14250798570591813804 2284720007938 1863800 
> 18638  0  27375938 1556640980159 3345714490000 0  
>  0  0 '

0xc5c5_01cb_c5c5_00ac and 0x213_f3ec_ab02

The first one looks like trashed memory: it got overwritten by kernel
addresses.  Except they're x86-32 kernel addresses, and you're running
an x86_64 (64-bit) kernel.  hm.

I don't see any pattern here.
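
The "looks like x86-32 kernel addresses" observation can be written down as a quick check. A sketch, assuming the default 3G/1G split where x86-32 kernel addresses start at 0xC0000000 (other splits are configurable); the function names are made up for this example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Default x86-32 PAGE_OFFSET with the 3G/1G split (an assumption here). */
#define X86_32_PAGE_OFFSET 0xC0000000u

/* True if a 32-bit word falls in the x86-32 kernel address range. */
static bool looks_like_x86_32_kaddr(uint32_t w)
{
	return w >= X86_32_PAGE_OFFSET;
}

/* True if both 32-bit halves of a 64-bit counter look like such addresses. */
static bool both_halves_look_like_kaddrs(uint64_t v)
{
	return looks_like_x86_32_kaddr((uint32_t)(v >> 32)) &&
	       looks_like_x86_32_kaddr((uint32_t)v);
}
```

Of the three values above, only 0xc5c501cbc5c500ac passes: its halves are 0xc5c501cb and 0xc5c500ac. The other two have a small upper half and so do not fit the pattern.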

> Apr  9 06:19:04 '  eth1:912389281939 1190287072000 0  
> 0 0 642943219035 991258183000 0   0  0 '
> Apr  9 06:20:05 '  eth0:1556643514710 3598121584000 0 
>  0 0 2281154391794 3318284878000 0   0  0 
> '
> Apr  9 06:20:05 '  eth1:912389305767 1190287354000 0  
> 0 0 642943273879 991258351000 0   0  0 '
> 
> > > This happens on an AMD64, dual core smp box with Broadcom NetXtreme II
> > > nics.
> > 
> > What driver drives that?  b44.c?
> 
> To clarify it's an Intel Dual Core Xeon (I just wound up as thinking of
> them all as amd64s). Network card driver in use is the one defined by
> CONFIG_BNX2. Kernel's monolithic.
> 
> > We do perform racy 64-bit updates of some of the stats counters.  But
> > that'll only affect 32-bit kernels and I'm assuming you're running a 64-bit
> > kernel on that AMD64 box (are you?)
> 
> Yes. With 32bit compat for executables built in.

OK.  I was earlier assuming that you were seeing transient funny numbers. 
But in fact I think you're saying that the numbers go bad, and then stay
bad.



Re: [PATCH 0/13] maps: pagemap, kpagemap, and related cleanups

2007-04-12 Thread William Lee Irwin III
On Tue, Apr 03, 2007 at 09:43:30PM -0500, Matt Mackall wrote:
> This patch series introduces /proc/pid/pagemap and /proc/kpagemap,
> which allow detailed run-time examination of process memory usage at a
> page granularity.
> The first several patches whip the page-walking code introduced for
> /proc/pid/smaps and clear_refs into a more generic form, the next
> couple make those interfaces optional, and the last two introduce the
> new interfaces, also optional.

This solves a real-life problem for Oracle system monitoring software
(specifically EM). Among the tasks it must carry out is determining
per-process memory footprint of a set of cooperating tasks (i.e. Oracle
processes). RSS is inadequate for this due to page sharing; this work
provides sufficient information to determine what EM needs.
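
To make the page-sharing point concrete: once a tool knows, for each resident page of a process, how many processes map it, a shared page can be charged fractionally instead of in full to everyone. The sketch below only illustrates that arithmetic. It does not read /proc/pid/pagemap or /proc/kpagemap, and the mapcount array and function names are hypothetical inputs, not part of the patch series.

```c
#include <stddef.h>

/*
 * mapcount[i] is the number of processes mapping page i of this task
 * (hypothetical input; 0 means the page is not resident).
 */

/* Plain RSS: every resident page counts in full, shared or not. */
static size_t resident_pages(const unsigned *mapcount, size_t npages)
{
	size_t n = 0;

	for (size_t i = 0; i < npages; i++)
		if (mapcount[i] > 0)
			n++;
	return n;
}

/* Proportional footprint: each page is split among its mappers. */
static double proportional_pages(const unsigned *mapcount, size_t npages)
{
	double total = 0.0;

	for (size_t i = 0; i < npages; i++)
		if (mapcount[i] > 0)
			total += 1.0 / mapcount[i];
	return total;
}
```

For map counts {1, 2, 2, 0, 4} the RSS is 4 pages while the proportional figure is 2.25 pages, and summing the proportional figure over a set of cooperating processes does not count shared pages more than once.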


-- wli


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Matt Mackall
On Thu, Apr 12, 2007 at 03:57:48PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > On Thu, Apr 12, 2007 at 01:50:54PM -0700, Jeremy Fitzhardinge wrote:
> >   
> >> -#define HEAP_SIZE 0x3000
> >> +#define HEAP_SIZE 0x4000
> >> 
> >
> > There are a bunch more of these that'll need fixing.
> >   
> 
> Like this?

I'm not sure what the story is with the platforms that bump this to
0x1, but this does get the rest of them.

Acked-by: Matt Mackall <[EMAIL PROTECTED]>

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Chuck Ebbert
Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>>> (This was under Xen, but there's no reason it couldn't happen on bare
>>>   hardware.)
>>> 
>> Hmm, does Xen perhaps not use interrupt stacks?
> 
> Looks like that's all done in do_IRQ, so it should be independent of
> whether its Xen or not.  And the stack overflow check is performed on
> the main stack, before switching to the interrupt stack.
> 

Yeah, the do_IRQ thing is misleading because it makes you think the
interrupt caused an overflow when all it did was detect a
near-overflow condition. (The number printed is the amount of space
left.)
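
A user-space sketch of that kind of check, to show where the printed number comes from. The constants are placeholders chosen for the example (the real thread size, thread_info size and warning threshold depend on architecture and configuration), and the function names are invented here.

```c
#include <stdint.h>

#define MODEL_THREAD_SIZE 8192u	/* assumed; arch/config dependent */
#define MODEL_THREAD_INFO   64u	/* hypothetical sizeof(struct thread_info) */
#define MODEL_STACK_WARN  1024u	/* hypothetical warning threshold */

/*
 * Bytes left between sp and the thread_info at the bottom of the stack.
 * The stack grows down and is THREAD_SIZE-aligned, so the low bits of sp
 * are its offset from the base.
 */
static long stack_bytes_left(uintptr_t sp)
{
	return (long)(sp & (MODEL_THREAD_SIZE - 1)) - (long)MODEL_THREAD_INFO;
}

/* Returns 1 when a check of this kind would print its warning. */
static int stack_nearly_full(uintptr_t sp)
{
	return stack_bytes_left(sp) < (long)MODEL_STACK_WARN;
}
```

The warning fires on the stack that was in use when the interrupt arrived, and the value reported is the remaining space, not the size of an overflow.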


Re: [PATCH 0/30] Use menuconfig objects

2007-04-12 Thread Randy Dunlap
On Thu, 12 Apr 2007 15:50:12 -0700 Andrew Morton wrote:

> On Tue, 10 Apr 2007 21:17:40 +0200 (MEST)
> Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> 
> > the following patch series turns some menus into menuconfigs, so they 
> > can be disabled whilst "walking" through the parent menu
> 
> So I merged the 23 of these which survived review and which do not
> intersect with other outstanding work.
> 
> I don't think I have an opinion on whether the change is actually an
> improvement, and I don't get a clear sense of what others think.  Shrug.

I like them, but then I have made & sent similar patches in the past.


> If we're going to make this change, we should ensure that it is done
> kernel-wide, for UI consistency reasons.
> 
> If nothing else happens, I guess I'll spray these patches at the relevant
> maintainers in a couple of weeks time, see what sticks.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Andi Kleen
On Friday 13 April 2007 00:56:56 Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> >> (This was under Xen, but there's no reason it couldn't happen on bare
> >>   hardware.)
> >> 
> >
> > Hmm, does Xen perhaps not use interrupt stacks?
> 
> Looks like that's all done in do_IRQ, so it should be independent of
> whether its Xen or not.  And the stack overflow check is performed on
> the main stack, before switching to the interrupt stack.

Yes, but then we should have seen it more frequently, shouldn't we? I always
run with the stack overflow check enabled and I don't think I ever saw 
warnings in inflate.

Something must be different in the Xen setup. Dunno if it's a bug,
but such differences could cause more problems later.

-Andi





Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> On Thu, Apr 12, 2007 at 01:50:54PM -0700, Jeremy Fitzhardinge wrote:
>   
>> -#define HEAP_SIZE 0x3000
>> +#define HEAP_SIZE 0x4000
>> 
>
> There are a bunch more of these that'll need fixing.
>   

Like this?

diff -r 2ad8a0729f26 arch/alpha/boot/misc.c
--- a/arch/alpha/boot/misc.c	Thu Apr 12 13:44:02 2007 -0700
+++ b/arch/alpha/boot/misc.c	Thu Apr 12 15:48:43 2007 -0700
@@ -98,7 +98,7 @@ static ulg free_mem_ptr;
 static ulg free_mem_ptr;
 static ulg free_mem_ptr_end;
 
-#define HEAP_SIZE 0x2000
+#define HEAP_SIZE 0x3000
 
 #include "../../../lib/inflate.c"
 
diff -r 2ad8a0729f26 arch/arm/boot/compressed/misc.c
--- a/arch/arm/boot/compressed/misc.c   Thu Apr 12 13:44:02 2007 -0700
+++ b/arch/arm/boot/compressed/misc.c   Thu Apr 12 15:48:43 2007 -0700
@@ -239,7 +239,7 @@ static ulg free_mem_ptr;
 static ulg free_mem_ptr;
 static ulg free_mem_ptr_end;
 
-#define HEAP_SIZE 0x2000
+#define HEAP_SIZE 0x3000
 
 #include "../../../../lib/inflate.c"
 
diff -r 2ad8a0729f26 arch/arm26/boot/compressed/misc.c
--- a/arch/arm26/boot/compressed/misc.c Thu Apr 12 13:44:02 2007 -0700
+++ b/arch/arm26/boot/compressed/misc.c Thu Apr 12 15:48:43 2007 -0700
@@ -182,7 +182,7 @@ static ulg free_mem_ptr;
 static ulg free_mem_ptr;
 static ulg free_mem_ptr_end;
 
-#define HEAP_SIZE 0x2000
+#define HEAP_SIZE 0x3000
 
 #include "../../../../lib/inflate.c"
 
diff -r 2ad8a0729f26 arch/x86_64/boot/compressed/misc.c
--- a/arch/x86_64/boot/compressed/misc.c	Thu Apr 12 13:44:02 2007 -0700
+++ b/arch/x86_64/boot/compressed/misc.c	Thu Apr 12 15:48:43 2007 -0700
@@ -189,7 +189,7 @@ static long free_mem_ptr;
 static long free_mem_ptr;
 static long free_mem_end_ptr;
 
-#define HEAP_SIZE 0x6000
+#define HEAP_SIZE 0x7000
 
 static char *vidmem = (char *)0xb8000;
 static int vidport;



Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
>> (This was under Xen, but there's no reason it couldn't happen on bare
>>   hardware.)
>> 
>
> Hmm, does Xen perhaps not use interrupt stacks?

Looks like that's all done in do_IRQ, so it should be independent of
whether it's Xen or not.  And the stack overflow check is performed on
the main stack, before switching to the interrupt stack.

J


Re: [PATCH resend][CRYPTO]: RSA algorithm patch

2007-04-12 Thread Indan Zupancic
On Thu, April 12, 2007 23:13, Satyam Sharma wrote:
> But timing attacks are not exclusive to RSA / asymmetric
> cryptosystems. Such (side channel / timing / power measurement / bus
> access) attacks are possible against AES, etc too.

True, but those are often easier to protect, or are less vulnerable in
the first place. (E.g. it isn't very hard to make a constant time AES
implementation. The operations it does are independent of the key.)
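
As a small illustration of what "constant time" means in practice (this is not AES, just a simpler instance of the same idea that also matters when checking signatures), here is a sketch of a comparison whose sequence of operations does not depend on the data. The name ct_memcmp is made up for this example.

```c
#include <stddef.h>

/*
 * Returns 0 if the two buffers are equal, non-zero otherwise.  Every
 * byte is always examined and there is no early exit, so the amount of
 * work does not depend on where (or whether) the inputs differ.
 */
static int ct_memcmp(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;

	for (size_t i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];

	return diff;
}
```

An ordinary memcmp() may return at the first mismatching byte, which is exactly the kind of data-dependent timing that this discussion is about.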


> Of course, now we're really moving into a different realm -- I guess
> in security there is always a threshold, and you really needn't care
> beyond a particular threat perception level. I don't see how even the
> existing cryptoapi (or *any* security measure in the kernel for that
> matter) stands up to the kind of attacks we're talking about now.

True, and very specialized hardware is needed in such cases anyway, so
arguing that it's not the kernel's task to protect against such attacks
is valid. But these are interesting attacks, and people should be aware of
them, instead of blindly trusting any security measure (not implying
anyone here does, I mean in general).


>> > constant-time crypto implementations do take care of
>> > them, though I agree the GPG code too lacks that.
>>
>> That's because for side-channel attacks you need physical access to the
>> hardware, something for most machines means security is breached anyway.
>> But when this code is going to be used to sign things by embedded devices
>> (with a local, secret key), it can be important.
>>
>> For checking signatures the key is known and all this doesn't matter, but
>> we're talking about a common implementation. These are things to keep in mind.
>
> I think the original idea was to generate signatures at a centralized
> place (not on an embedded system) and only *verify* them using
> *public* keys on the embedded systems? For most common
> implementations, as I suggested, you only need bother yourself up to a
> certain security threshold.

Yes, but it depends on how the code is used. It is supposed to be generic code,
so whether someone wants to use it for signing or not is an open question. So
far it seems only signature checking is needed, and that simplifies a lot, but
if that isn't the case more questions pop up, like where the security threshold
should be.

The user with the tightest requirements more or less dictates the 
implementation.

All in all, to get anything merged at all in the kernel it seems at least the
following needs to happen:

- Future users speaking up and uniting.

- Figuring out their needs (so overlapping needs can make it into common
  code, and other decisions can be made, such as where the kernel/user
  space border should be.)

- Deciding on a commonly agreed security threshold, and making that explicit.

- Coding it all up and keeping it in sync with mainline.

I don't see this happening soon. But a good start would be if someone who
cares about this sets up a mailing list or website to collect all users
and information.

Good night,

Indan




Re: intermittant petabyte usage reported with broadcom nic

2007-04-12 Thread CaT
On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> On Mon, 2 Apr 2007 11:43:19 +1000 CaT <[EMAIL PROTECTED]> wrote:
> 
> > I take minute by minute snapshots of network traffic by sampling
> > /proc/net/dev and most of the time everything works fine. Occasionally
> > though I get petabyte byte traffic and corresponding packet traffic.
> 
> How frequently?
> 
> Are you able to provide some actual numbers (expected and actual values),
> so we can look at the bit patterns?

I have some now. These are raw lines from /proc/net/dev. In this case it's
eth0 at 22:14 that chucked a wee wibbly.

Apr 11 22:13:02 '  eth0:17227166357 81379716000 0  0
 0 33090495625 86656584000 0   0  0 '
Apr 11 22:13:02 '  eth1:30708022097 91219466000 0  0
 0 122989582024 125073786000 0   0  0 '
Apr 11 22:14:02 '  eth0:220898233988841368 66750274000 0
  0  86458738 52386430545 101089219 19931300 0  199313  0 '
Apr 11 22:14:02 '  eth1:30708307787 91220183000 0  0
 0 122989665004 125074344000 0   0  0 '
Apr 11 22:15:02 '  eth0:17227454818 81381144000 0  0
 0 33091307388 86658381000 0   0  0 '
Apr 11 22:15:02 '  eth1:30708569308 91220742000 0  0
 0 122989732601 125074712000 0   0  0 '

On another server (same hardware except for 2ru case, more ram and more hds):

Apr  9 06:18:05 '  eth0:1556640056941 3598105481000 0  
0 0 2281147324747 3318270401000 0   0  0 '
Apr  9 06:18:05 '  eth1:912389249044 1190286687000 0  0 
0 642943095469 991257887000 0   0  0 '
Apr  9 06:19:04 '  eth0:14250798570591813804 2284720007938 1863800 
18638  0  27375938 1556640980159 3345714490000 0   
0  0 '
Apr  9 06:19:04 '  eth1:912389281939 1190287072000 0  0 
0 642943219035 991258183000 0   0  0 '
Apr  9 06:20:05 '  eth0:1556643514710 3598121584000 0  
0 0 2281154391794 3318284878000 0   0  0 '
Apr  9 06:20:05 '  eth1:912389305767 1190287354000 0  0 
0 642943273879 991258351000 0   0  0 '

> > This happens on an AMD64, dual core smp box with Broadcom NetXtreme II
> > nics.
> 
> What driver drives that?  b44.c?

To clarify, it's an Intel Dual Core Xeon (I just wound up thinking of
them all as amd64s). The network card driver in use is the one defined by
CONFIG_BNX2. The kernel is monolithic.

> We do perform racy 64-bit updates of some of the stats counters.  But
> that'll only affect 32-bit kernels and I'm assuming you're running a 64-bit
> kernel on that AMD64 box (are you?)

Yes. With 32bit compat for executables built in.

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby


Re: [PATCH 0/30] Use menuconfig objects

2007-04-12 Thread Andrew Morton
On Tue, 10 Apr 2007 21:17:40 +0200 (MEST)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:

> the following patch series turns some menus into menuconfigs, so they 
> can be disabled whilst "walking" through the parent menu

So I merged the 23 of these which survived review and which do not
intersect with other outstanding work.

I don't think I have an opinion on whether the change is actually an
improvement, and I don't get a clear sense of what others think.  Shrug.

If we're going to make this change, we should ensure that it is done
kernel-wide, for UI consistency reasons.

If nothing else happens, I guess I'll spray these patches at the relevant
maintainers in a couple of weeks time, see what sticks.



Re: [patch] uninline remove/add_parent() APIs

2007-04-12 Thread Roland McGrath
I'm travelling this week (through Monday) and can't be of much immediate
help on improving the situation or explaining it in great detail.  Last
week before I left home I was deep in some strange debugging and didn't get
a chance to look up.  There will be more of that, but I'll try to make some
timely progress on answering all the backlog of correspondence about utrace
too.


Thanks,
Roland


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> Hmm, does Xen perhaps not use interrupt stacks? Normally 2.7k should be still
> green as long as there are not too many functions above/below it.
>   

That's a good point, I'll need to check that.  Still, nearly 3k of stack!

J


Re: sched_yield proposals/rationale

2007-04-12 Thread Bill Davidsen

[EMAIL PROTECTED] wrote:
>> Besides - but I guess you're aware of it - any randomized
>> algorithms tend to drive benchmarkers and performance analysts
>> crazy because their performance cannot be repeated. So it's usually
>> better to avoid them unless there is really no alternative.
>
> That could already solve your concern from above. Statistically
> speaking, it will give them (benchmarkers) the smoothest curve they've
> ever seen.
>
> Please be aware that I'm just exploring options/insight here. It is
> not something I intend to push inside the mainline kernel. I just want
> to find reasonable and logical criticism as you and some others have
> provided already. Thanks for that!

And having gotten same, are you going to code up what appears to be a 
solution, based on this feedback?


I'm curious how well it would run poorly written programs, having 
recently worked with a company which seemed to have a whole part of 
purchasing dedicated to buying same. :-(


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: [PATCH UPDATE] deflate stack usage in lib/inflate.c

2007-04-12 Thread Matt Mackall
On Thu, Apr 12, 2007 at 01:50:54PM -0700, Jeremy Fitzhardinge wrote:
> -#define HEAP_SIZE 0x3000
> +#define HEAP_SIZE 0x4000

There are a bunch more of these that'll need fixing.

-- 
Mathematics is the supreme nostalgia of our time.

