Re: gigabit ethernet power consumption

2007-10-08 Thread Oliver Neukum
On Tuesday, 09 October 2007, Pavel Machek wrote:
> Question is, how to implement it correctly? Daemon that would watch
> data rates and switch speeds using mii-tool would be simple, but is
> that enough?

Do you only want to affect true ethernet devices this way? It seems
to me that the savings for wireless devices could be larger and we don't
want a separate mechanism for each type of network device. So I think
you need to tell the kernel why you want to reduce the link speed.

Regards
Oliver

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Stefan Richter
> Sam Ravnborg wrote:
>> On Tue, Oct 09, 2007 at 08:11:53AM +0200, Stefan Richter wrote:
>>> The SCM changelog should contain _what_ a patch does and if
>>> necessary _why_ it does so.
>> The _why_ part is more important than _what_. The diff should hopefully
>> explain the _what_ part.
> 
> "What": fix lockup in this and that circumstances
> "Why": because lockups are annoying
> "How": the diff
> (That's what I meant with what and why.)

PS, example with non-trivial why:
What: add ABI which correlates bus cycle counter and local time
Why: apps need it to sync streams from different buses
How: the diff
-- 
Stefan Richter
-=-=-=== =-=- -=--=
http://arcgraph.de/sr/


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread Benjamin Herrenschmidt

On Mon, 2007-10-08 at 22:00 -0700, David Brownell wrote:
> > > > The old /etc/hotplug/usb.rc script made sure to load those modules
> > > > in the correct order:  EHCI first.
> > > 
> > > I expected to find something cute attempting to handle this under
> > > /etc/udev, I have failed so far :-)
> >
> > No, nothing cute in udev itself, but it seems that all distros that I
> > know of have a "load these modules now" type setting in their init
> > scripts that can be used here.
> >
> > I can't think of a way to enforce this load order on the modules
> > themselves due to the fact that OHCI might not even be needed for EHCI
> > devices on UHCI (Intel) based chipsets :(
> 
> Assuming PCI is present, /sys/bus/pci/devices/*/class can tell
> if EHCI is present (0x0c0320) ... if so, load that driver.
> Then repeat for OHCI (0x0c0310) and UHCI (0x0c0300).

That will not work for all of the non-PCI implementations though.

Ben.
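
The PCI class codes David lists can be turned into a small load-order table. What follows is a minimal sketch rather than anything posted in the thread: the function names parse_pci_class, usb_hcd_module and usb_hcd_load_rank are hypothetical, the module names assume the usual ehci-hcd/ohci-hcd/uhci-hcd naming, and, as Ben points out, it covers PCI controllers only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* PCI class codes quoted in the thread (serial bus / USB controllers). */
#define PCI_CLASS_USB_UHCI 0x0c0300UL
#define PCI_CLASS_USB_OHCI 0x0c0310UL
#define PCI_CLASS_USB_EHCI 0x0c0320UL

/* Parse the text of a sysfs "class" attribute, e.g. "0x0c0320\n".
 * strtoul() with base 16 accepts the "0x" prefix and stops at the newline. */
unsigned long parse_pci_class(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 16);
}

/* Map a class code to a host controller module name (assumed naming),
 * or NULL when the device is not one of the three USB controller types. */
const char *usb_hcd_module(unsigned long class_code)
{
    switch (class_code) {
    case PCI_CLASS_USB_EHCI: return "ehci-hcd";
    case PCI_CLASS_USB_OHCI: return "ohci-hcd";
    case PCI_CLASS_USB_UHCI: return "uhci-hcd";
    default:                 return NULL;
    }
}

/* Load rank: lower values load first, so EHCI always precedes its
 * companion controllers. Returns -1 for non-USB-controller classes. */
int usb_hcd_load_rank(unsigned long class_code)
{
    switch (class_code) {
    case PCI_CLASS_USB_EHCI: return 0;
    case PCI_CLASS_USB_OHCI: return 1;
    case PCI_CLASS_USB_UHCI: return 2;
    default:                 return -1;
    }
}
```

An init script could read each /sys/bus/pci/devices/*/class file, pass the text through parse_pci_class(), and load the modules sorted by usb_hcd_load_rank().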




[PATCH -mm -v4 3/3] i386/x86_64 boot: document for 32 bit boot protocol

2007-10-08 Thread Huang, Ying
This patch defines a 32-bit boot protocol and adds the corresponding
documentation. It is based on the proposal of Peter Anvin.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 boot.txt  |   70 +++
 zero-page.txt |  129 +-
 2 files changed, 99 insertions(+), 100 deletions(-)

Index: linux-2.6.23-rc6/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc6.orig/Documentation/i386/boot.txt   2007-09-19 16:45:23.0 +0800
+++ linux-2.6.23-rc6/Documentation/i386/boot.txt   2007-09-19 16:45:27.0 +0800
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin <[EMAIL PROTECTED]>
-   Last update 2007-05-23
+   Last update 2007-09-18
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
the boot command line
 
+Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical
+   pointer to single linked list of struct setup_data.
+   Added 32-bit boot protocol.
 
  MEMORY LAYOUT
 
@@ -168,6 +171,9 @@
 0234/1 2.05+   relocatable_kernel Whether kernel is relocatable or not
 0235/3 N/A pad2Unused
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
+023c/4 N/A pad3Unused
+0240/8 2.07+   setup_data  64-bit physical pointer to linked list
+   of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -480,6 +486,36 @@
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:setup_data
+Type:  write (obligatory)
+Offset/size:   0x240/8
+Protocol:  2.07+
+
+  The 64-bit physical pointer to a NULL-terminated singly linked list
+  of struct setup_data. This is used to define a more extensible boot
+  parameter passing mechanism. The definition of struct setup_data is
+  as follows:
+
+  struct setup_data {
+ u64 next;
+ u32 type;
+ u32 len;
+ u8  data[0];
+  } __attribute__((packed));
+
+  Where, the next is a 64-bit physical pointer to the next node of
+  linked list, the next field of the last node is 0; the type is used
+  to identify the contents of data; the len is the length of data
+  field; the data holds the real payload.
+
+  With this field, adding a new boot parameter written by the boot
+  loader does not require a new field in the real-mode header; adding a
+  new setup_data type is sufficient. Adding a new boot parameter read
+  by the boot loader still requires a new field.
+
+  TODO: Where is the safe place to place the linked list of struct
+   setup_data?
+
 
  THE KERNEL COMMAND LINE
 
@@ -753,3 +789,35 @@
After completing your hook, you should jump to the address
that was in this field before your boot loader overwrote it
(relocated, if appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machines with firmware other than a legacy BIOS, such as EFI or
+LinuxBIOS, and for kexec, the 16-bit real-mode setup code in the kernel,
+which depends on a legacy BIOS, cannot be used, so a 32-bit boot
+protocol needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. It is not necessary to load all of the
+real-mode code, however; only the first 4K bytes, traditionally known
+as the "zero page", are needed.
+
+In addition to reading, modifying and writing the kernel header in the
+zero page as in the 16-bit boot protocol, the boot loader should also
+fill in the additional zero-page fields described in zero-page.txt.
+
+After loading and setting up the zero page, the boot loader can load the
+32/64-bit kernel in the same way as in the 16-bit boot protocol.
+
+In 32-bit boot protocol, the kernel is started by jumping to the
+32-bit kernel entry point, which is the start address of loaded
+32/64-bit kernel.
+
+At entry, the CPU must be in 32-bit protected mode with paging
+disabled; the CS and DS must be 4G flat segments; %esi holds the base
+address of the "zero page"; %esp, %ebp, %edi should be zero.
Index: linux-2.6.23-rc6/Documentation/i386/zero-page.txt
===
--- linux-2.6.23-rc6.orig/Documentation/i386/zero-page.txt  2007-09-19 16:45:23.0 +0800
+++ linux-2.6.23-rc6/Documentation/i386/zero-page.txt   2007-09-19 16:45:27.0 +0800
@@ -1,99 +1,30 @@

-!!!WARNING
-The zero page 

Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Stefan Richter
Sam Ravnborg wrote:
> On Tue, Oct 09, 2007 at 08:11:53AM +0200, Stefan Richter wrote:
>> The SCM changelog should contain _what_ a patch does and if
>> necessary _why_ it does so.
> The _why_ part is more important than _what_. The diff should hopefully
> explain the _what_ part.

"What": fix lockup in this and that circumstances
"Why": because lockups are annoying
"How": the diff
(That's what I meant with what and why.)
-- 
Stefan Richter
-=-=-=== =-=- -=--=
http://arcgraph.de/sr/


[PATCH -mm -v4 1/3] i386/x86_64 boot: setup data

2007-10-08 Thread Huang, Ying
This patch adds a 64-bit physical pointer to a NULL-terminated
singly linked list of struct setup_data to the real-mode kernel
header. This is used as a more extensible boot parameter passing
mechanism.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 arch/i386/Kconfig|3 -
 arch/i386/boot/header.S  |8 +++
 arch/i386/kernel/setup.c |   92 +++
 arch/x86_64/kernel/setup.c   |   37 +
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h|7 +++
 include/linux/mm.h   |2 
 mm/memory.c  |   24 +++
 8 files changed, 184 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc8/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc8.orig/include/asm-i386/bootparam.h  2007-10-09 11:26:06.0 +0800
+++ linux-2.6.23-rc8/include/asm-i386/bootparam.h   2007-10-09 14:15:14.0 +0800
@@ -9,6 +9,17 @@
 #include 
 #include 
 
+/* setup data types */
+#define SETUP_NONE 0
+
+/* extensible setup data list node */
+struct setup_data {
+   u64 next;
+   u32 type;
+   u32 len;
+   u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
u8  setup_sects;
u16 root_flags;
@@ -41,6 +52,10 @@
u32 initrd_addr_max;
u32 kernel_alignment;
u8  relocatable_kernel;
+   u8  _pad2[3];
+   u32 cmdline_size;
+   u32 _pad3;
+   u64 setup_data;
 } __attribute__((packed));
 
 struct sys_desc_table {
Index: linux-2.6.23-rc8/arch/i386/boot/header.S
===
--- linux-2.6.23-rc8.orig/arch/i386/boot/header.S   2007-10-09 11:26:06.0 +0800
+++ linux-2.6.23-rc8/arch/i386/boot/header.S   2007-10-09 11:26:08.0 +0800
@@ -119,7 +119,7 @@
# Part 2 of the header, from the old setup.S
 
.ascii  "HdrS"  # header signature
-   .word   0x0206  # header version number (>= 0x0105)
+   .word   0x0207  # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
@@ -214,6 +214,12 @@
 #added with boot protocol
 #version 2.06
 
+pad4:  .long 0
+
+setup_data:.quad 0 # 64-bit physical pointer to
+   # single linked list of
+   # struct setup_data
+
 # End of setup header #
 
.section ".inittext", "ax"
Index: linux-2.6.23-rc8/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc8.orig/arch/x86_64/kernel/setup.c   2007-10-09 11:26:06.0 +0800
+++ linux-2.6.23-rc8/arch/x86_64/kernel/setup.c 2007-10-09 14:15:14.0 +0800
@@ -250,6 +250,40 @@
ebda_size = 64*1024;
 }
 
+static void __init parse_setup_data(void)
+{
+   struct setup_data *data;
+   unsigned long pa_data;
+
+   if (boot_params.hdr.version < 0x0207)
+   return;
+   pa_data = boot_params.hdr.setup_data;
+   while (pa_data) {
+   data = early_ioremap(pa_data, PAGE_SIZE);
+   switch (data->type) {
+   default:
+   break;
+   }
+   pa_data = data->next;
+   early_iounmap(data, PAGE_SIZE);
+   }
+}
+
+static void __init reserve_setup_data(void)
+{
+   struct setup_data *data;
+   unsigned long pa_data;
+
+   if (boot_params.hdr.version < 0x0207)
+   return;
+   pa_data = boot_params.hdr.setup_data;
+   while (pa_data) {
+   data = __va(pa_data);
+   reserve_bootmem_generic(pa_data, sizeof(*data)+data->len);
+   pa_data = data->next;
+   }
+}
+
 void __init setup_arch(char **cmdline_p)
 {
printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -285,6 +319,8 @@
strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
 
+   parse_setup_data();
+
parse_early_param();
 
finish_e820_parsing();
@@ -373,6 +409,7 @@
 */
acpi_reserve_bootmem();
 #endif
+   reserve_setup_data();
/*
 * Find and reserve possible boot-time SMP configuration:
 */
Index: linux-2.6.23-rc8/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc8.orig/arch/i386/kernel/setup.c  2007-10-09 11:26:06.0 +0800
+++ linux-2.6.23-rc

[PATCH -mm -v4 2/3] i386/x86_64 boot: boot parameters export via sysfs

2007-10-08 Thread Huang, Ying
This patch exports the boot parameters via sysfs. This can be used for
debugging and kexec.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 i386/kernel/Makefile|1 
 i386/kernel/ksysfs.c|  242 
 i386/kernel/setup.c |2 
 x86_64/kernel/Makefile  |1 
 x86_64/kernel/setup64.c |2 
 5 files changed, 246 insertions(+), 2 deletions(-)

Index: linux-2.6.23-rc8/arch/x86_64/kernel/Makefile
===
--- linux-2.6.23-rc8.orig/arch/x86_64/kernel/Makefile   2007-10-09 11:30:20.0 +0800
+++ linux-2.6.23-rc8/arch/x86_64/kernel/Makefile   2007-10-09 13:57:09.0 +0800
@@ -39,6 +39,7 @@
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit.o
 obj-$(CONFIG_STACK_UNWIND) += unwind.o
+obj-$(CONFIG_SYSFS)+= ../../i386/kernel/ksysfs.o
 
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_PCI)  += early-quirks.o
Index: linux-2.6.23-rc8/arch/x86_64/kernel/setup64.c
===
--- linux-2.6.23-rc8.orig/arch/x86_64/kernel/setup64.c  2007-10-09 11:30:20.0 +0800
+++ linux-2.6.23-rc8/arch/x86_64/kernel/setup64.c   2007-10-09 11:30:25.0 +0800
@@ -24,7 +24,7 @@
 #include 
 #include 
 
-struct boot_params __initdata boot_params;
+struct boot_params boot_params;
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
Index: linux-2.6.23-rc8/arch/i386/kernel/ksysfs.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc8/arch/i386/kernel/ksysfs.c  2007-10-09 13:58:58.0 +0800
@@ -0,0 +1,242 @@
+/*
+ * arch/i386/ksysfs.c - architecture specific sysfs attributes in /sys/kernel
+ *
+ * Copyright (C) 2007, Intel Corp.
+ *  Huang Ying <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+static ssize_t boot_params_version_show(struct kset *kset, char *page)
+{
+   return sprintf(page, "0x%x\n", boot_params.hdr.version);
+}
+
+static struct subsys_attribute boot_params_version_attr = {
+   .attr = {
+   .name = "version",
+   .mode = S_IRUGO,
+   },
+   .show = boot_params_version_show,
+};
+
+static struct attribute *boot_params_attrs[] = {
+   &boot_params_version_attr.attr,
+   NULL
+};
+
+static struct attribute_group boot_params_attr_group = {
+   .attrs = boot_params_attrs,
+};
+
+static ssize_t boot_params_data_read(struct kobject *kobj,
+struct bin_attribute *bin_attr,
+char *buf, loff_t off, size_t count)
+{
+   memcpy(buf, (void *)&boot_params + off, count);
+   return count;
+}
+
+static struct bin_attribute boot_params_data_attr = {
+   .attr = {
+   .name = "data",
+   .mode = S_IRUGO,
+   },
+   .read = boot_params_data_read,
+   .size = sizeof(boot_params),
+};
+
+struct setup_data_kobj
+{
+   struct kobject kobj;
+   unsigned long pa_setup_data;
+   struct bin_attribute *data_attr;
+};
+
+struct setup_data_attribute {
+   struct attribute attr;
+   ssize_t (*show) (struct setup_data_kobj *setup_data_kobj, char *buf);
+};
+
+static ssize_t setup_data_type_show(struct setup_data_kobj *setup_data_kobj,
+   char *page)
+{
+   struct setup_data data;
+   copy_from_phys(&data, setup_data_kobj->pa_setup_data, sizeof(data));
+   return sprintf(page, "0x%x\n", data.type);
+}
+
+static struct setup_data_attribute setup_data_type_attr = {
+   .attr = {
+   .name = "type",
+   .mode = S_IRUGO,
+   },
+   .show = setup_data_type_show,
+};
+
+static ssize_t setup_data_attr_show(struct kobject *kobj,
+   struct attribute *attr,
+   char *buf)
+{
+   struct setup_data_kobj *setup_data_kobj =
+   (struct setup_data_kobj *)kobj;
+   struct setup_data_attribute *setup_data_attr =
+   (struct setup_data_attribute *)attr;
+   ssize_t error = -EIO;
+
+   if (setup_data_attr->show)
+   error = setup_data_attr->show(setup_data_kobj, buf);
+   return error;
+}
+
+static void setup_data_kobj_release(struct kobject *kobj)
+{
+   struct setup_data_kobj *setup_data_kobj =
+   (struct setup_data_kobj *)kobj;
+   kfree(setup_data_kobj->data_attr);
+   kfree(setup_data_kobj);
+}
+
+static struct sysfs_ops setup_data_attr_ops = {
+   .show = setup_data_attr_show,
+};
+
+static struct attribute *setup_data_default_attrs[] = {
+   &setup_data_type_attr.attr,
+   NULL,
+};
+
+static struct kobj_type ktype_setup_data = {
+   .release = &setup_data_ko

[PATCH -mm -v4 0/3] i386/x86_64 boot: 32-bit boot protocol

2007-10-08 Thread Huang, Ying
This patchset defines a 32-bit boot protocol for the i386/x86_64 platform,
adds an extensible boot parameter passing mechanism, and exports the boot
parameters via sysfs.

The patchset has been tested against 2.6.23-rc8-mm2 kernel on x86_64
and i386.

This patchset is based on the proposal of Peter Anvin.


Known Issues:

- Where is it safe to place the linked list of setup_data?  Because the
  length of the linked list is variable, it cannot be copied into the
  kernel's BSS segment the way the "zero page" is. We must find a safe
  place for it, where it will not be overwritten by the kernel during
  boot. The i386 kernel will overwrite some pages after _end. The
  x86_64 kernel will overwrite some pages from 0x1000 on.

- The fields in the zero page are fairly complex (such as struct
  edd_info). Is it necessary to document every field nested inside the
  first-level fields, down to the primitive data types? Or is it
  sufficient to provide the C struct name only?

- Which fields of the boot parameters should be exported directly in
  sysfs? Exporting all of them is too complex and unnecessary, so
  which ones should be exported?


-v4

* Reserve setup_data and boot parameters for accessing during
  runtime.
* Export boot parameters via sysfs.

-v3

* Move hd0_info and hd1_info back to zero page for compatibility.

-v2

* Increase the boot protocol version number
* Check version number before parsing setup data.
* Revise zero page description according to the source code and move
  them to zero-page.txt.


Best Regards,
Huang Ying
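
The setup_data list from patch 1/3 can be modelled in user space as a quick check of its traversal rules. This is a sketch under stated assumptions, not code from the patchset: "physical memory" is a plain byte array, physical addresses are offsets into it, and the names setup_data_hdr, setup_data_put and setup_data_footprint are hypothetical. Like reserve_setup_data() in the patch, it sums sizeof(header) + len per node and ignores the list when the header version is below 0x0207. The bounds and cycle checks are additions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT_PROTO_SETUP_DATA 0x0207 /* first protocol version with setup_data */

/* Fixed part of a list node; the payload of 'len' bytes follows it. */
struct setup_data_hdr {
    uint64_t next; /* "physical" address of the next node, 0 ends the list */
    uint32_t type;
    uint32_t len;
} __attribute__((packed));

/* Test helper: write a node header at offset 'pa' of the fake memory. */
void setup_data_put(uint8_t *mem, uint64_t pa, uint64_t next,
                    uint32_t type, uint32_t len)
{
    struct setup_data_hdr hdr = { .next = next, .type = type, .len = len };

    assert(mem != NULL);
    memcpy(mem + pa, &hdr, sizeof(hdr));
}

/* Walk the list starting at 'pa' and return the number of bytes the nodes
 * occupy (header plus payload). Returns 0 for an old protocol version, an
 * empty list, a node that falls outside 'mem', or a list that loops. */
size_t setup_data_footprint(const uint8_t *mem, size_t mem_size,
                            uint16_t version, uint64_t pa)
{
    size_t total = 0;
    size_t steps = 0;

    assert(mem != NULL);
    if (version < BOOT_PROTO_SETUP_DATA)
        return 0;

    while (pa != 0) {
        struct setup_data_hdr hdr;

        if (pa > mem_size || mem_size - pa < sizeof(hdr))
            return 0; /* header would lie outside memory */
        if (++steps > mem_size / sizeof(hdr))
            return 0; /* more nodes than can fit: the list must loop */
        memcpy(&hdr, mem + pa, sizeof(hdr));
        if (hdr.len > mem_size - pa - sizeof(hdr))
            return 0; /* payload would lie outside memory */
        total += sizeof(hdr) + hdr.len;
        pa = hdr.next;
    }
    return total;
}
```

Because the total depends on every node's len, it is only known after a full walk, which is the variable-length problem the first known issue describes.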


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread Benjamin Herrenschmidt

On Mon, 2007-10-08 at 22:26 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> Date: Tue, 09 Oct 2007 15:13:36 +1000
> 
> > I'm not even sure module load order is 100% fault proof here since
> > khubd spawns as a thread...
> 
> I'm concerned about that as well, thanks for bringing it up.
> 
> My understanding, however, is that the critical thing is that the EHCI
> device reset being done by the EHCI driver probe occurs and completes
> first.  If that is true, then just making sure EHCI loads initially is
> a sufficient constraint to fix this problem.

Yup, that would be, though I hate that sort of load order
dependencies...

Ben.




Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Sam Ravnborg
On Tue, Oct 09, 2007 at 08:11:53AM +0200, Stefan Richter wrote:
> Steven Rostedt wrote:
> > But for those that run test suites, they should be smart enough to put
> > in more documentation into the change log to state how it was tested.
> 
> I disagree.  The SCM changelog should contain _what_ a patch does and if
> necessary _why_ it does so.
The _why_ part is more important than _what_. The diff should hopefully
explain the _what_ part.

Sam


Re: irq0 stops working

2007-10-08 Thread Thomas Gleixner
On Tue, 9 Oct 2007, Vasily Averin wrote:

> Jan Engelhardt wrote:
> > On Oct 9 2007 09:26, Vasily Averin wrote:
> >> On one of our servers timer interrupts (i.e. irq0) stop working. As a result
> >> kernel timers do not trigger and tasks waiting for signals from timers
> >> hang forever.
> > 
> > What kernel.. and tried CONFIG_NO_HZ=n?
> 
> Originally I noticed this issue on RHEL5 kernels, but then I reproduced it
> on the latest mainline kernels; in my last attempt it was 2.6.23-rc7.
> 
> Thank you for your tip about CONFIG_NO_HZ=n, I will try it.

You run a 64 bit kernel, where this option is not available yet.

tglx



[PATCH] Stop docproc segfaulting when SRCTREE isn't set.

2007-10-08 Thread Rob Landley
From: Rob Landley <[EMAIL PROTECTED]>

Prevent docproc from segfaulting when SRCTREE isn't set.

Signed-off-by: Rob Landley <[EMAIL PROTECTED]>
---

 scripts/basic/docproc.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff -r a26a53ed1101 scripts/basic/docproc.c
--- a/scripts/basic/docproc.c   Sun Oct 07 16:42:22 2007 -0700
+++ b/scripts/basic/docproc.c   Tue Oct 09 01:08:54 2007 -0500
@@ -64,12 +64,15 @@ FILELINE * entity_system;
 #define FUNCTION  "-function"
 #define NOFUNCTION"-nofunction"
 
+char *srctree;
+
 void usage (void)
 {
fprintf(stderr, "Usage: docproc {doc|depend} file\n");
fprintf(stderr, "Input is read from file.tmpl. Output is sent to stdout\n");
fprintf(stderr, "doc: frontend when generating kernel documentation\n");
fprintf(stderr, "depend: generate list of files referenced within file\n");
+   fprintf(stderr, "Environment variable SRCTREE: absolute path to kernel source tree.\n");
 }
 
 /*
@@ -88,7 +91,7 @@ void exec_kernel_doc(char **svec)
exit(1);
case  0:
memset(real_filename, 0, sizeof(real_filename));
-   strncat(real_filename, getenv("SRCTREE"), PATH_MAX);
+   strncat(real_filename, srctree, PATH_MAX);
strncat(real_filename, KERNELDOCPATH KERNELDOC,
PATH_MAX - strlen(real_filename));
execvp(real_filename, svec);
@@ -168,7 +171,7 @@ void find_export_symbols(char * filename
if (filename_exist(filename) == NULL) {
char real_filename[PATH_MAX + 1];
memset(real_filename, 0, sizeof(real_filename));
-   strncat(real_filename, getenv("SRCTREE"), PATH_MAX);
+   strncat(real_filename, srctree, PATH_MAX);
strncat(real_filename, filename,
PATH_MAX - strlen(real_filename));
sym = add_new_file(filename);
@@ -335,6 +338,9 @@ int main(int argc, char *argv[])
 int main(int argc, char *argv[])
 {
FILE * infile;
+
+   srctree = getenv("SRCTREE");
+   if (!srctree) srctree = getcwd(NULL,0);
if (argc != 3) {
usage();
exit(1);

-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.
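
The crash this patch addresses comes from passing getenv()'s NULL return straight to strncat() when SRCTREE is unset. The same fallback can be sketched with a caller-supplied buffer in place of the patch's getcwd(NULL, 0), which relies on the C library allocating the result. The function name pick_srctree and its parameters are hypothetical and not part of docproc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Choose the source tree path: the SRCTREE value when it is set, else the
 * current working directory. The result is written to 'buf'; NULL is
 * returned when it does not fit or getcwd() fails. */
char *pick_srctree(const char *env_value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    if (env_value != NULL) {
        if (strlen(env_value) >= size)
            return NULL; /* would not fit with its terminating NUL */
        strcpy(buf, env_value);
        return buf;
    }
    /* SRCTREE unset: fall back to the current directory, as the patch does. */
    return getcwd(buf, size);
}
```

A caller would pass getenv("SRCTREE") as env_value and check the result for NULL before building file names from it.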


Re: irq0 stops working

2007-10-08 Thread Vasily Averin
Jan Engelhardt wrote:
> On Oct 9 2007 09:26, Vasily Averin wrote:
>> On one of our servers timer interrupts (i.e. irq0) stop working. As a result
>> kernel timers do not trigger and tasks waiting for signals from timers
>> hang forever.
> 
> What kernel.. and tried CONFIG_NO_HZ=n?

Originally I noticed this issue on RHEL5 kernels, but then I reproduced it
on the latest mainline kernels; in my last attempt it was 2.6.23-rc7.

Thank you for your tip about CONFIG_NO_HZ=n, I will try it.

thank you,
Vasily Averin

OpenVZ Linux Kernel Team


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Stefan Richter
Steven Rostedt wrote:
> But for those that run test suites, they should be smart enough to put
> in more documentation into the change log to state how it was tested.

I disagree.  The SCM changelog should contain _what_ a patch does and if
necessary _why_ it does so.  The rest (e.g. the sign-off tag to state
that the licensing is alright, and any other tags) should have its
meaning sufficiently defined outside the changelog.

Remember what the SCM changelog is for, i.e. what we do with it after
commit.
-- 
Stefan Richter
-=-=-=== =-=- -=--=
http://arcgraph.de/sr/


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread Greg KH
On Mon, Oct 08, 2007 at 09:47:27PM -0700, David Miller wrote:
> From: Greg KH <[EMAIL PROTECTED]>
> Date: Mon, 8 Oct 2007 21:39:09 -0700
> 
> > No, nothing cute in udev itself, but it seems that all distros that I
> > know of have a "load these modules now" type setting in their init
> > scripts that can be used here.
> > 
> > I can't think of a way to enforce this load order on the modules
> > themselves due to the fact that OHCI might not even be needed for EHCI
> > devices on UHCI (Intel) based chipsets :(
> > 
> > Can anyone else?
> 
> The three modules perhaps should be a bundle of whatever ones you have
> enabled, and internally we can dispatch the initialization to occur in
> the correct order from a top-level module_init().
> 
> If the devices need to be initialized in a certain order in a
> situation like this, it really seems like it is the kernel's job to
> enforce it.

I agree.

Here's some information from Intel about where they have seen this
happen for UHCI controllers, so it's not just an OHCI issue :(

thanks,

greg k-h




We had a logic analyzer attached to the bus going to the ESB (ICH) which
has the USB controller in it. In the passing case we would see no
accesses to UHCI IO registers while EHCI initialized and sets its config
flag. The EHCI Port Status & Control registers were then read and then
we see a write to the EHCI Port Status & Control registers port owner
bit for the low speed devices (keyboard & mouse). This turns control
back over to the companion UHCI controller.

In our most prevalent failing case (#1 below) we never saw the write to
port owner bit on the ports with the low speed devices. In the passing
case we see the write to the port owner bit.

I do not see how this would have anything to do with flakey hardware
especially since we can reproduce this on all of our systems and the
same device (USB controller) is used on multiple products.

I really believe that this has to do with the UHCI and EHCI drivers
running on top of each other. This seems to be happening fairly often on
our systems. If the EHCI driver runs first then we do not see the
failure. If they are running at the same time then we see different
failure symptoms.

1) We see that the ports with low speed devices are still in EHCI mode
(port owner bit not written to in EHCI driver). In our analyzer captures
we see the reads from the Port Status & Control register and it is
indicating that there are low speed devices on the ports. Can you tell
us why the driver would not be doing the write to the port owner bit
when it sees that low speed devices are attached to that port? Is there
something specific that it looks for and decides not to do the write?

2) In other cases we see that the ports with the low speed devices are
back in UHCI mode but the ports are disabled. In this case we see from
the analyzer traces that the UHCI driver has completed setting up the
port. It has actually enabled that port in UHCI mode. We then see the
EHCI driver comes in and it resets everything. The driver then gives
control back to the UHCI controller (by setting the port owner bit)
but...since the UHCI driver has already setup this port once it seems
that it does not go back and set it up again. In this case we do not
think that the UHCI driver has completed running when the EHCI driver
comes in and does the reset. Can you tell us if the UHCI driver was
interrupted in the middle but after the ports with the low speed devices
had been enabled would the UHCI driver ever go back and reinitialize the
ports with the low speed devices?

3) In some cases we see errors in the DMESG log but it seems to recover.

So we really do believe that it has to do with the EHCI driver running
in the middle of the UHCI driver running. And then dependent upon when
the EHCI driver comes in, while the UHCI driver is running, we see the
different failures. And since by default these drivers are not forced to
run sequentially we are susceptible to the failure.



Re: [RFC/RFT] kbuild: save ARCH & CROSS_COMPILE

2007-10-08 Thread Sam Ravnborg
On Mon, Oct 08, 2007 at 11:12:56PM +0200, Adrian Bunk wrote:
> On Mon, Oct 08, 2007 at 10:02:55PM +0200, Sam Ravnborg wrote:
> >...
> > The settings are stored in the build directory in a file
> > named "Kbuild.config" (should it be a .dot file?).
> >...
> 
> A .dot file sounds better.
I will make it ".kbuild".
Dropping the .config part of the name will hopefully keep
people from messing with it manually.

> BTW: I'm currently trying without success to understand why the
>  drivers/infiniband/{hw/amso1100,ulp/srp}/Kbuild files are not
>  named "Makefile".
Giacomo explained this already..
But I have never done a global renaming - the
benefit/pain ratio seems too low.

Sam


Re: irq0 stops working

2007-10-08 Thread Thomas Gleixner
On Tue, 9 Oct 2007, Vasily Averin wrote:
> On one of our servers timer interrupts (i.e. irq0) stop working. As a result
> kernel timers do not trigger and tasks waiting for signals from timers
> hang forever.

Which kernel version ?
 
> The most noticeable effect of this situation is that any write operations to disk
> are stalled, and nobody can log in on the node.
> 
> At the same time all existing shells on the node keep working. I'm able to read
> interrupt statistics from the /proc/interrupts file, and it shows that all other
> interrupt counts change when the devices are accessed: disk on the SATA controller,
> network, cdrom on the IDE controller, keyboard, serial console, LOC interrupts.
> 
> Also I've found that disabling the irqbalance service on the node helps to
> work around this issue, though of course it fixes nothing.

Well, it's at least a hint. Can you try the patch below please ?

tglx

diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 6d48a4e..248987a 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -360,7 +360,7 @@ void stop_timer_interrupt(void)
 
 static struct irqaction irq0 = {
.handler= timer_interrupt,
-   .flags  = IRQF_DISABLED | IRQF_IRQPOLL,
+   .flags  = IRQF_DISABLED | IRQF_IRQPOLL | IRQF_NOBALANCING,
.mask   = CPU_MASK_NONE,
.name   = "timer"
 };
@@ -403,6 +403,7 @@ void __init time_init(void)
cpu_khz / 1000, cpu_khz % 1000);
init_tsc_clocksource();
 
+   irq0.mask = cpumask_of_cpu(0);
setup_irq(0, &irq0);
 }
 


Re: [RFC/RFT] kbuild: save ARCH & CROSS_COMPILE

2007-10-08 Thread Sam Ravnborg
> > 2) We need to share much more Kconfig* between the individual architectures
> >First step is to let all arch's use drivers/Kconfig
> 
> 2) isn't terribly difficult, just takes some time and willingness
> of $arch maintainers to some changes, but please explain a bit more
> why it is needed...?

A prerequisite for moving ARCH selection to Kconfig is that we
read in all Kconfig files for all architectures.
To do so efficiently we should avoid including the same Kconfig
file once for each architecture, which is obviously the case today.

The efficiency comes both with respect to reading the files but
also memory consumption. If we read in drivers/Kconfig only once
then we will avoid some duplication compared to reading drivers/Kconfig
once for each architecture.

The structure we should aim for is something like a top-level
Kconfig file that pulls in the relevant parts from the kernel tree,
and where the arch Kconfig only pulls in additional Kconfig files
from that arch.

When we get this far we will have a more logical structure
in the Kconfig files and their distribution.

But the showstopper is that a choice value cannot have more
than a single prompt, so when the same choice value is
used in two arch Kconfig files, kconfig will warn and the
choice will do the wrong thing.
I never took a deeper look at this - I seem to get distracted each
time I try to understand all the inner details of kconfig's
use of data structures.

Sam



Re: irq0 stops working

2007-10-08 Thread Jan Engelhardt

On Oct 9 2007 09:26, Vasily Averin wrote:
>
>On one of our servers the timer interrupt (i.e. irq0) stops working. As a result,
>kernel timers do not fire and tasks waiting for timer signals
>hang forever.

Which kernel? And have you tried CONFIG_NO_HZ=n?



Re: [RFC/RFT] kbuild: save ARCH & CROSS_COMPILE

2007-10-08 Thread Randy Dunlap
On Mon, 8 Oct 2007 21:53:16 -0700 Randy Dunlap wrote:

> On Tue, 9 Oct 2007 06:17:43 +0200 Sam Ravnborg wrote:
> 
> > > 
> > > What about, that this is the first ever prompt, that must be shown and
> > > written to the .config?
> > Two issues to fix before we can do this:
> > 1) choice values cannot have more than one prompt
> > 2) We need to share much more Kconfig* between the individual architectures
> >First step is to let all arch's use drivers/Kconfig
> 
> 2) isn't terribly difficult, just takes some time and willingness
> of $arch maintainers to some changes, but please explain a bit more
> why it is needed...?

Maybe I didn't read carefully:  "to add arch selection to kconfig"..

arch/cris using drivers/Kconfig: patch is below (maintainer is
cc-ed)

> > Let's get the two items above solved then we can revisit adding arch selection
> > to kconfig (where it belongs in the end).
> > And neither require a rewrite of kconfig...
> > 
> > > Also, i'd like to propose sequencing of config-enable-build-this-unit
> > > in config file(s), thus Makefile(s) (sometimes very small and stupid)
> > > will be not necessary. Additional link ordering can be supplied as
> > > meta-config information there. Shell scripting, very ugly in the view
> > > of make syntax, will be natural in config files. Extending build
> > > process to get hidden dependencies or right linking/other magic is
> > > part of particular configuration. Hm?
> > Discussed before but so far no patches has shown up.

---

From: Randy Dunlap <[EMAIL PROTECTED]>

Move arch/cris to using drivers/Kconfig for its drivers config list.
When all arches do this, Sam can make more interesting improvements
to .config files.

Using drivers/Kconfig adds these kconfig files to cris:
connector, misc, ata, message/fusion (not avail.), macintosh (not avail.),
i2c, spi, w1, power, hwmon, mfd, video, hid, mmc, leds,
infiniband (not avail.), edac (not avail.), rtc, dma, auxdisplay,
kvm (not avail.), uio, and lguest (not avail.).

Many of these are already enabled/disabled per arch., so adding that
for cris can be done as required.

"not avail." means that this menu is not valid for this arch.
and won't be presented to users when running 'make *config'.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 arch/cris/Kconfig |   40 +---
 1 file changed, 1 insertion(+), 39 deletions(-)

--- linux-2.6.23-rc9-git6.orig/arch/cris/Kconfig
+++ linux-2.6.23-rc9-git6/arch/cris/Kconfig
@@ -153,49 +153,11 @@ source arch/cris/arch-v10/drivers/Kconfi
 
 endmenu
 
-source "drivers/base/Kconfig"
-
 # standard linux drivers
-source "drivers/mtd/Kconfig"
-
-source "drivers/parport/Kconfig"
-
-source "drivers/pnp/Kconfig"
-
-source "drivers/block/Kconfig"
-
-source "drivers/md/Kconfig"
-
-source "drivers/ide/Kconfig"
-
-source "drivers/scsi/Kconfig"
-
-source "drivers/ieee1394/Kconfig"
-
-source "drivers/message/i2o/Kconfig"
-
-source "drivers/net/Kconfig"
-
-source "drivers/isdn/Kconfig"
-
-source "drivers/telephony/Kconfig"
-
-#
-# input before char - char/joystick depends on it. As does USB.
-#
-source "drivers/input/Kconfig"
-
-source "drivers/char/Kconfig"
-
-#source drivers/misc/Config.in
-source "drivers/media/Kconfig"
+source "drivers/Kconfig"
 
 source "fs/Kconfig"
 
-source "sound/Kconfig"
-
-source "drivers/usb/Kconfig"
-
 source "arch/cris/Kconfig.debug"
 
 source "security/Kconfig"


irq0 stops working

2007-10-08 Thread Vasily Averin
On one of our servers the timer interrupt (i.e. irq0) stops working. As a result,
kernel timers do not fire and tasks waiting for timer signals
hang forever.

The most noticeable effect is that all write operations to disk
are stalled, and nobody can log in to the node.

At the same time, all existing shells on the node keep working. I'm able to read
interrupt statistics from /proc/interrupts, and they show that all other
interrupt counts change when the devices are accessed: disk on the SATA controller,
network, cdrom on the IDE controller, keyboard, serial console, LOC interrupts.
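
One way to confirm which counters have stopped is to sample /proc/interrupts
twice and compare the totals. The following is a minimal Python sketch, not a
tested diagnostic tool: the function names are made up for illustration, and
it assumes the usual layout (a header row of CPU columns, then one
"label: counts... description" row per interrupt).

```python
import time


def parse_interrupts(text):
    """Return {irq_label: count summed over all CPU columns}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/description text
            total += int(field)
        totals[label.strip()] = total
    return totals


def irq_advanced(before, after, irq="0"):
    """True if the counter for `irq` grew between the two samples."""
    return after.get(irq, 0) > before.get(irq, 0)


def sample_twice(path="/proc/interrupts", delay=1.0):
    """Take two snapshots of the interrupt table, `delay` seconds apart."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(delay)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return first, second
```

Calling irq_advanced(*sample_twice()) on the affected node would be expected
to return False for irq "0" while the same check for "LOC" returns True. A
counter that does not advance is only a hint, since some configurations
legitimately stop the irq0 tick.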

Also, I've found that disabling the irqbalance service on the node works around
this issue; of course, that fixes nothing.

All details about the hardware, along with logs, can be found at
http://bugzilla.kernel.org/show_bug.cgi?id=8650

I'm able to reproduce the problem, but I have no idea how to
continue investigating it.

Could anybody please suggest new ways to investigate this issue?

Thank you,
Vasily Averin

OpenVZ Linux Kernel Team


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Miller
From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 15:13:36 +1000

> I'm not even sure module load order is 100% fault proof here since
> khubd spawns as a thread...

I'm concerned about that as well, thanks for bringing it up.

My understanding, however, is that the critical thing is that the EHCI
device reset being done by the EHCI driver probe occurs and completes
first.  If that is true, then just making sure EHCI loads initially is
a sufficient constraint to fix this problem.


Re: [13/18] x86_64: Allow fallback for the stack

2007-10-08 Thread Nick Piggin
On Tuesday 09 October 2007 03:36, Christoph Lameter wrote:
> On Sun, 7 Oct 2007, Nick Piggin wrote:
> > > The problem can become non-rare on special low memory machines doing
> > > wild swapping things though.
> >
> > But only your huge systems will be using huge stacks?
>
> I have no idea who else would be using such a feature. Relaxing the tight
> memory restrictions on stack use may allow placing larger structures on
> the stack in general.

The tight memory restrictions on stack usage do not come about because
of the difficulty in increasing the stack size :) It is because we want to
keep stack sizes small!

Increasing the stack size by 4K uses another 4MB of memory for every 1000
threads you have, right?
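
As a quick check of that figure, a small Python sketch, assuming one kernel
stack per thread and taking "4K" to mean 4096 bytes (the function name is
only illustrative):

```python
def extra_stack_bytes(threads, extra_per_thread=4 * 1024):
    """Additional memory consumed when every thread's stack grows."""
    return threads * extra_per_thread


extra = extra_stack_bytes(1000)
print(extra, "bytes =", extra / 2**20, "MiB")  # 4096000 bytes = 3.90625 MiB
```

So the "4MB per 1000 threads" estimate holds to within rounding, and the cost
scales linearly with both the thread count and the size of the increase.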

It would take a lot of good reason to move away from the general direction
we've been taking over the past years that 4/8K stacks are a good idea for
regular 32 and 64 bit builds in general.


> I have some concerns about the medium NUMA systems (a few dozen of nodes)
> also running out of stack since more data is placed on the stack through
> the policy layer and since we may end up with a couple of stacked
> filesystems. Most of the current NUMA systems on x86_64 are basically
> two nodes on one motherboard. The use of NUMA controls is likely
> limited there and the complexity of the filesystems is also not high.

The solution has until now always been to fix the problems so they don't
use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
systems, but I don't think you'd be able to make that assumption for most
normal systems.


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Miller
From: David Brownell <[EMAIL PROTECTED]>
Date: Mon, 08 Oct 2007 22:00:19 -0700

> Assuming PCI is present, /sys/bus/pci/devices/*/class can tell
> if EHCI is present (0x0c0320) ... if so, load that driver.
> Then repeat for OHCI (0x0c0310) and UHCI (0x0c0300).

These are facts all of us know very well, but implementing this in
userspace in a failsafe manner isn't practical.  That's what we're
discussing.

There are things that autoload USB drivers way before udev or similar
even get started.

For example, the first thing some distributions do is try to load the
correct keyboard maps.  Guess what that can do?  It triggers a load of
all of the modular USB host controller drivers in case we have a USB
keyboard.

The only real solution is in the kernel, because it is the only
clean place to trap all of the potential module load events.


Re: gigabit ethernet power consumption

2007-10-08 Thread Willy Tarreau
Hi Auke,

On Mon, Oct 08, 2007 at 03:31:51PM -0700, Kok, Auke wrote:
> Pavel Machek wrote:
> > Hi!
> > 
> > I've found that gbit vs. 100mbit power consumption difference is about
> > 1W -- pretty significant. (Maybe powertop should include it in the
> > tips section? :).
> > 
> > Energy Star people insist that machines should switch down to 100mbit
> > when network is idle, and I guess that makes a lot of sense -- you
> > save 1W locally and 1W on the router.
> > 
> > Question is, how to implement it correctly? Daemon that would watch
> > data rates and switch speeds using mii-tool would be simple, but is
> > that enough?
> 
> you most certainly want to do this in userspace I think.
> 
> One of the biggest problems is that link negotiation can take a significant amount
> of time, well over several seconds (1 to 3 seconds typical) with gigabit, and
> having your ethernet connection go offline for 3 seconds may not be the desired
> effect for when you want to get more bandwidth in the first place.
> 
> However, when a laptop is in battery mode, switching down from gigabit to 100mbit
> makes a lot more sense, so this is something I would recommend. This can be as
> easy as changing the advertisement mask of the interface and renegotiating the
> link. Userspace could handle that very easily.

If something does that, it must *only* be in userspace so that we can
easily disable it. It's amazing how many laptops consider that you
don't want any performance when you run off batteries. I've seen a
2GHz laptop falling back to 600 MHz when running on batteries, which
was very inconvenient when the laptop in question was used to go
sniffing gigabit traffic in datacenters... I would even go as far
as to say that my notebook runs lowpower only when it's plugged into
the wall because it's when I'm typing or doing low activity things.

In my opinion, battery != low power, battery == mobility. It is the user's
choice that must trigger low power, so this must be done with a dedicated
daemon.

Regards,
Willy



Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread Benjamin Herrenschmidt
> Yes, that's why I asked about EHCI.  My speculation would be that
> OHCI starts the reset, and EHCI claims the port before it completes;
> or contrariwise OHCI starts the reset right after EHCI claims it.
> 
> And there's some point in that process where a hardware race makes
> the trouble you've observed.  I believe there are plenty of other
> places where it's perfectly fine if EHCI grabs the port, or this
> little race would have shown up many times before.

Since we can't know which O/UHCI is paired with which EHCI, we can't
really have generic code to deal with that race, but maybe we can be
smart and basically mutex khubd activity such as port reset vs.
registration of any new HCD?

I'm not even sure module load order is 100% fault proof here since khubd
spawns as a thread...
 
Ben.




Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread Benjamin Herrenschmidt

On Mon, 2007-10-08 at 21:47 -0700, David Miller wrote:
> From: Greg KH <[EMAIL PROTECTED]>
> Date: Mon, 8 Oct 2007 21:39:09 -0700
> 
> > No, nothing cute in udev itself, but it seems that all distros that I
> > know of have a "load these modules now" type setting in their init
> > scripts that can be used here.
> > 
> > I can't think of a way to enforce this load order on the modules
> > themselves due to the fact that OHCI might not even be needed for EHCI
> > devices on UHCI (Intel) based chipsets :(
> > 
> > Can anyone else?
> 
> The three modules perhaps should be a bundle of whatever ones you have
> enabled, and internally we can dispatch the initialization to occur in
> the correct order from a top-level module_init().
> 
> If the devices need to be initialized in a certain order in a
> situation like this, it really seems like it is the kernel's job to
> enforce it.

Is the problem strictly an ordering problem or just a race? In the
latter case, maybe some better arbitration by the USB core, to make
sure things are quiescent or in a known state when letting a new HCD
register, might help?

Ben.




Re: gigabit ethernet power consumption

2007-10-08 Thread Chris Snook

Pavel Machek wrote:

> Hi!
> 
> I've found that gbit vs. 100mbit power consumption difference is about
> 1W -- pretty significant. (Maybe powertop should include it in the
> tips section? :).
> 
> Energy Star people insist that machines should switch down to 100mbit
> when network is idle, and I guess that makes a lot of sense -- you
> save 1W locally and 1W on the router.
> 
> Question is, how to implement it correctly? Daemon that would watch
> data rates and switch speeds using mii-tool would be simple, but is
> that enough?


I believe you misspelled "ethtool".

While you're at it, why stop at 100Mb?  I believe you save even more power at 
10Mb, which is why WOL puts the card in 10Mb mode.  In my experience, you 
generally want either the maximum setting or the minimum setting when going for 
power savings, because of the race-to-idle effect.  Workloads that have a 
sustained fractional utilization are rare.  Right now I'm at home, hooked up to 
a cable modem, so anything over 4Mb is wasted, unless I'm talking to the box 
across the room, which is rare.
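
For what it's worth, the userspace side of this is small. Below is a rough
Python sketch that builds the ethtool invocation restricting the advertised
modes. The bit values are the ones documented for "ethtool -s ... advertise"
(0x001/0x002 for 10baseT half/full, 0x004/0x008 for 100baseT half/full, 0x020
for 1000baseT full); the helper names and the idea of passing a speed ceiling
are illustrative assumptions, not an existing tool.

```python
# Advertised link-mode bits, as listed in the ethtool(8) man page.
ADVERTISE_BITS = {
    10: 0x001 | 0x002,    # 10baseT half + full duplex
    100: 0x004 | 0x008,   # 100baseT half + full duplex
    1000: 0x020,          # 1000baseT full duplex
}


def advertise_mask(max_speed_mbit):
    """OR together every mode at or below the requested ceiling."""
    mask = 0
    for speed, bits in ADVERTISE_BITS.items():
        if speed <= max_speed_mbit:
            mask |= bits
    return mask


def ethtool_command(iface, max_speed_mbit):
    """Return the argv list; the caller decides whether to execute it."""
    mask = advertise_mask(max_speed_mbit)
    return ["ethtool", "-s", iface, "advertise", "0x%03x" % mask]


print(" ".join(ethtool_command("eth0", 10)))
print(" ".join(ethtool_command("eth0", 1000)))
```

The sketch only prints the commands. Actually running one renegotiates the
link, so the outage of a few seconds mentioned earlier in the thread still
applies, and a daemon would want hysteresis before switching.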


Talk to the NetworkManager folks.  This is right up their alley.

-- Chris


Re: [Linux-fbdev-devel] [PATCH 0/6] Patch series to add of_platform binding to xilinxfb

2007-10-08 Thread Antonino A. Daplas
On Mon, 2007-10-08 at 22:43 -0600, Grant Likely wrote:
> On 10/2/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > On Mon, 2007-10-01 at 09:57 -0600, Grant Likely wrote:
> > > Assuming there are no major issues, I'd like to get this patch series
> > > queued up for inclusion in 2.6.24.
> >
> > Okay.
> >
> > Tony
> 
> BTW, what path do framebuffer patches take to get into Linus' tree?
> Does he pull your tree directly, or do they go through someone else's
> tree?

They all go to the -mm tree, unless it's a needed fix, then to Linus's.

Tony




Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Brownell
> > > The old /etc/hotplug/usb.rc script made sure to load those modules
> > > in the correct order:  EHCI first.
> > 
> > I expected to find something cute attempting to handle this under
> > /etc/udev, I have failed so far :-)
>
> No, nothing cute in udev itself, but it seems that all distros that I
> know of have a "load these modules now" type setting in their init
> scripts that can be used here.
>
> I can't think of a way to enforce this load order on the modules
> themselves due to the fact that OHCI might not even be needed for EHCI
> devices on UHCI (Intel) based chipsets :(

Assuming PCI is present, /sys/bus/pci/devices/*/class can tell
if EHCI is present (0x0c0320) ... if so, load that driver.
Then repeat for OHCI (0x0c0310) and UHCI (0x0c0300).
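
A rough userspace sketch of that scan, in Python for brevity. The class codes
are the ones given above; the module names (ehci_hcd, ohci_hcd, uhci_hcd) and
the helper names are assumptions for illustration, and the sketch only
reports the order rather than loading anything.

```python
import glob

# (PCI class code, driver module), in the order the modules should load.
HCD_LOAD_ORDER = [
    (0x0C0320, "ehci_hcd"),
    (0x0C0310, "ohci_hcd"),
    (0x0C0300, "uhci_hcd"),
]


def read_pci_classes(pattern="/sys/bus/pci/devices/*/class"):
    """Collect the class code of every PCI device visible in sysfs."""
    classes = set()
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                classes.add(int(f.read().strip(), 16))
        except (OSError, ValueError):
            continue  # device vanished or file not parseable
    return classes


def modules_to_load(classes):
    """Host controller modules present on this machine, EHCI first."""
    return [module for code, module in HCD_LOAD_ORDER if code in classes]


if __name__ == "__main__":
    for module in modules_to_load(read_pci_classes()):
        print(module)
```

This only covers the case where an init script runs before anything else
triggers a module load, and it says nothing about non-PCI controllers.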


Re: [RFC/RFT] kbuild: save ARCH & CROSS_COMPILE

2007-10-08 Thread Randy Dunlap
On Tue, 9 Oct 2007 06:17:43 +0200 Sam Ravnborg wrote:

> > 
> > What about, that this is the first ever prompt, that must be shown and
> > written to the .config?
> Two issues to fix before we can do this:
> 1) choice values cannot have more than one prompt
> 2) We need to share much more Kconfig* between the individual architectures
>First step is to let all arch's use drivers/Kconfig

2) isn't terribly difficult, it just takes some time and willingness
of $arch maintainers to make some changes, but please explain a bit more
why it is needed...?


> Let's get the two items above solved then we can revisit adding arch selection
> to kconfig (where it belongs in the end).
> And neither require a rewrite of kconfig...
> 
> > Also, i'd like to propose sequencing of config-enable-build-this-unit
> > in config file(s), thus Makefile(s) (sometimes very small and stupid)
> > will be not necessary. Additional link ordering can be supplied as
> > meta-config information there. Shell scripting, very ugly in the view
> > of make syntax, will be natural in config files. Extending build
> > process to get hidden dependencies or right linking/other magic is
> > part of particular configuration. Hm?
> Discussed before but so far no patches has shown up.


---
~Randy


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Miller
From: Greg KH <[EMAIL PROTECTED]>
Date: Mon, 8 Oct 2007 21:39:09 -0700

> No, nothing cute in udev itself, but it seems that all distros that I
> know of have a "load these modules now" type setting in their init
> scripts that can be used here.
> 
> I can't think of a way to enforce this load order on the modules
> themselves due to the fact that OHCI might not even be needed for EHCI
> devices on UHCI (Intel) based chipsets :(
> 
> Can anyone else?

The three modules perhaps should be a bundle of whatever ones you have
enabled, and internally we can dispatch the initialization to occur in
the correct order from a top-level module_init().

If the devices need to be initialized in a certain order in a
situation like this, it really seems like it is the kernel's job to
enforce it.


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Miller
From: David Brownell <[EMAIL PROTECTED]>
Date: Mon, 08 Oct 2007 21:36:43 -0700

> Don't need this "limit_1" timeout; "reset_done" handles all
> the timeout needed there.  The regs->fmnumber is essentially
> a millisecond counter.

If the hardware hangs and the register stops incrementing,
the entire kernel will hang.  That is unacceptable.

We do need it.

> 
> > +   int limit_2;
> > +
> > /* spin until any current reset finishes */
> > -   for (;;) {
> > +   limit_2 = PORT_RESET_MSEC * 2;
> 
> This is the loop that didn't terminate for you, right?
> PORT_RESET_HW_MSEC is the ceiling you should use here,
> not PORT_RESET_MSEC.

Ok, fixed.

> What values do you see for "portstat"?

0x111

> I suspect there will be some flag set which would allow a more
> immediate exit from that loop.  RH_PS_CCS might clear, for example.

Absolutely nothing clears in the register from its initial value.

Here is the patch with the limit_2 initial value fixed.

I kept limit_1 in there, it is necessary.  No kernel code should
hang in an endless loop because of malfunctioning hardware.
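
Reduced to its essentials, the pattern in the patch below is a poll loop with
an iteration ceiling. Here it is as a small Python model, not kernel code:
read_status is a hypothetical stand-in for ohci_readl() on the port status
register, and the 0x10 bit is assumed to correspond to RH_PS_PRS.

```python
def wait_for_bit_clear(read_status, busy_bit, max_polls):
    """Poll until busy_bit clears, giving up after max_polls reads.

    Returns (last_status, timed_out).
    """
    status = read_status()
    polls = 1
    while status & busy_bit and polls < max_polls:
        status = read_status()
        polls += 1
    return status, bool(status & busy_bit)


# A port whose status never changes, like the 0x111 reported above.
status, timed_out = wait_for_bit_clear(lambda: 0x111, 0x10, 20)
print(hex(status), timed_out)
```

With a register stuck at 0x111 the call returns after 20 reads with
timed_out set, where an unbounded loop would never return.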

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/drivers/usb/host/ohci-hub.c b/drivers/usb/host/ohci-hub.c
index bb9cc59..9149593 100644
--- a/drivers/usb/host/ohci-hub.c
+++ b/drivers/usb/host/ohci-hub.c
@@ -563,14 +563,19 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
 	u32	temp;
 	u16	now = ohci_readl(ohci, &ohci->regs->fmnumber);
 	u16	reset_done = now + PORT_RESET_MSEC;
+	int limit_1;
 
 	/* build a "continuous enough" reset signal, with up to
 	 * 3msec gap between pulses.  scheduler HZ==100 must work;
 	 * this might need to be deadline-scheduled.
 	 */
-	do {
+	limit_1 = 100;
+	while (--limit_1 >= 0) {
+		int limit_2;
+
 		/* spin until any current reset finishes */
-		for (;;) {
+		limit_2 = PORT_RESET_HW_MSEC * 2;
+		while (--limit_2 >= 0) {
 			temp = ohci_readl (ohci, portstat);
 			/* handle e.g. CardBus eject */
 			if (temp == ~(u32)0)
@@ -579,6 +584,10 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
 				break;
 			udelay (500);
 		}
+		if (limit_2 < 0) {
+			ohci_warn(ohci, "Root port inner-loop reset timeout, "
+				  "portstat[%08x]\n", temp);
+		}
 
 		if (!(temp & RH_PS_CCS))
 			break;
@@ -589,7 +598,14 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
 		ohci_writel (ohci, RH_PS_PRS, portstat);
 		msleep(PORT_RESET_HW_MSEC);
 		now = ohci_readl(ohci, &ohci->regs->fmnumber);
-	} while (tick_before(now, reset_done));
+		if (!tick_before(now, reset_done))
+			break;
+	}
+	if (limit_1 < 0) {
+		ohci_warn(ohci, "Root port outer-loop reset timeout, "
+			  "now[%04x] reset_done[%04x]\n",
+			  now, reset_done);
+	}
 	/* caller synchronizes using PRSC */
 
 	return 0;


Re: [Linux-fbdev-devel] [PATCH 0/6] Patch series to add of_platform binding to xilinxfb

2007-10-08 Thread Grant Likely
On 10/2/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-10-01 at 09:57 -0600, Grant Likely wrote:
> > Assuming there are no major issues, I'd like to get this patch series
> > queued up for inclusion in 2.6.24.
>
> Okay.
>
> Tony

BTW, what path do framebuffer patches take to get into Linus' tree?
Does he pull your tree directly, or do they go through someone else's
tree?

Thanks,
g.




-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
[EMAIL PROTECTED]
(403) 399-0195


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Brownell
> Regardless, here is a patch that hardens the OHCI reset handling
> loops so that they break out instead of hanging the entire system
> should this condition occur.  It's at least better than what the
> code does to a user right now which is hang the box completely:
>
> [USB] ohci: Do not hang the system if port reset does not complete.
>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
>
> diff --git a/drivers/usb/host/ohci-hub.c b/drivers/usb/host/ohci-hub.c
> index bb9cc59..77ae5b4 100644
> --- a/drivers/usb/host/ohci-hub.c
> +++ b/drivers/usb/host/ohci-hub.c
> @@ -563,14 +563,19 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
>   u32 temp;
>   u16 now = ohci_readl(ohci, &ohci->regs->fmnumber);
>   u16 reset_done = now + PORT_RESET_MSEC;
> + int limit_1;
>  
>   /* build a "continuous enough" reset signal, with up to
>* 3msec gap between pulses.  scheduler HZ==100 must work;
>* this might need to be deadline-scheduled.
>*/
> - do {
> + limit_1 = 100;
> + while (--limit_1 >= 0) {

Don't need this "limit_1" timeout; "reset_done" handles all
the timeout needed there.  The regs->fmnumber is essentially
a millisecond counter.
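
To illustrate why the frame counter can serve as the timeout, here is a small
Python model of a wrap-safe comparison on a 16-bit counter. It assumes
tick_before() is the usual signed 16-bit difference test; the value 50 for
PORT_RESET_MSEC is used purely as an example, and reset_deadline is a
hypothetical helper.

```python
def to_s16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def tick_before(t1, t2):
    """True if 16-bit tick t1 is earlier than t2, even across wraparound."""
    return to_s16(t1 - t2) < 0


def reset_deadline(now, msec):
    """Deadline `msec` ticks after `now`, truncated to 16 bits like a u16."""
    return (now + msec) & 0xFFFF


now = 0xFFF0                      # counter about to wrap
done = reset_deadline(now, 50)    # wraps around to 0x0022
print(hex(done), tick_before(now, done), tick_before(done, done))
```

The comparison only terminates the loop if the counter keeps advancing, which
is exactly the failure case the other timeout in this thread is meant to
cover.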


> + int limit_2;
> +
>   /* spin until any current reset finishes */
> - for (;;) {
> + limit_2 = PORT_RESET_MSEC * 2;

This is the loop that didn't terminate for you, right?
PORT_RESET_HW_MSEC is the ceiling you should use here,
not PORT_RESET_MSEC.


> + while (--limit_2 >= 0) {
>   temp = ohci_readl (ohci, portstat);
>   /* handle e.g. CardBus eject */
>   if (temp == ~(u32)0)
> @@ -579,6 +584,10 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
>   break;
>   udelay (500);
>   }
> + if (limit_2 < 0) {
> + ohci_warn(ohci, "Root port inner-loop reset timeout, "
> +   "portstat[%08x]\n", temp);
> + }

What values do you see for "portstat"?

I suspect there will be some flag set which would allow a more
immediate exit from that loop.  RH_PS_CCS might clear, for example.

And in any case, if that fails I don't see any reason not to just
break, and return immediately.

>  
>   if (!(temp & RH_PS_CCS))
>   break;
> @@ -589,7 +598,14 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
>   ohci_writel (ohci, RH_PS_PRS, portstat);
>   msleep(PORT_RESET_HW_MSEC);
>   now = ohci_readl(ohci, &ohci->regs->fmnumber);
> - } while (tick_before(now, reset_done));
> + if (!tick_before(now, reset_done))
> + break;
> + }
> + if (limit_1 < 0) {
> + ohci_warn(ohci, "Root port outer-loop reset timeout, "
> +   "now[%04x] reset_done[%04x]\n",
> +   now, reset_done);
> + }
>   /* caller synchronizes using PRSC */
>  
>   return 0;
>


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread Greg KH
On Mon, Oct 08, 2007 at 08:42:36PM -0700, David Miller wrote:
> From: David Brownell <[EMAIL PROTECTED]>
> Date: Mon, 08 Oct 2007 20:34:12 -0700
> 
> > > However, when both OHCI and EHCI are built as modules (or, similarly
> > > I guess, OHCI is built-in and EHCI is modular) there appears to be
> > > nothing in userspace which makes sure EHCI gets loaded first.
> > 
> > The old /etc/hotplug/usb.rc script made sure to load those modules
> > in the correct order:  EHCI first.
> 
> I expected to find something cute attempting to handle this under
> /etc/udev, I have failed so far :-)

No, nothing cute in udev itself, but it seems that all distros that I
know of have a "load these modules now" type setting in their init
scripts that can be used here.

I can't think of a way to enforce this load order on the modules
themselves due to the fact that OHCI might not even be needed for EHCI
devices on UHCI (Intel) based chipsets :(

Can anyone else?

thanks,

greg k-h


Re: sleepy linux 2.6.23-rc9

2007-10-08 Thread Antonino A. Daplas
On Tue, 2007-10-09 at 00:05 +0200, Pavel Machek wrote:
> Hi!
> 
> I played with powertop a bit, and found a fairly interesting failure
> mode. If I boot init=/bin/bash vga=1, I get ~2 wakeups a second, nice.
> 
> When I boot init=/bin/bash vga=791 (vesa framebuffer), most wakeups
> are caused by cursor painting (I should fix that some day, I
> guess). But... the cursor blinking does not even work properly!
> 
> It blinks at normal speed, then (randomly) it blinks slowly, then gets
> back to normal speed, then inserts longer delay.
> 
> The effect is so nice that I thought about youtube ;-). Thinkpad
> x60.. question is, how to debug it? 

The cursor blinking is done by software via a timer. It's in
drivers/video/console/fbcon.c.

With the latest -rc kernel you can turn off the blinking with

echo 0 > /sys/class/graphics/fbcon/cursor_blink

Tony




Re: [PATCH RT] fix rt-task scheduling issue

2007-10-08 Thread Gregory Haskins
Hi Guys,
  Nice find!  Comment inline..

(adding linux-rt-users)

For reference, see:

 http://lkml.org/lkml/2007/10/8/252

On Mon, 2007-10-08 at 22:46 -0400, Steven Rostedt wrote:
> Index: linux-2.6.23-rc9-rt2/kernel/sched.c
> ===================================================================
> --- linux-2.6.23-rc9-rt2.orig/kernel/sched.c
> +++ linux-2.6.23-rc9-rt2/kernel/sched.c
> @@ -2207,7 +2207,7 @@ static inline void finish_task_switch(st
>  	 * If we pushed an RT task off the runqueue,
>  	 * then kick other CPUs, they might run it:
>  	 */
> -	if (unlikely(rt_task(current) && prev->se.on_rq && rt_task(prev))) {
> +	if (unlikely(rt_task(current) && rq->rt_nr_running > 1)) {
>  		schedstat_inc(rq, rto_schedule);
>  		smp_send_reschedule_allbutself_cpumask(current->cpus_allowed);

I think the current->cpus_allowed probably should have been
"prev->cpus_allowed" in the original code?  However, in light of the new
findings with this bug Mike found, this should probably be sent to
allbutself() without the mask, since you don't know what could have been
queued behind you.

Unless I am missing something?

Regards,
-Greg




Re: [RFC/RFT] kbuild: save ARCH & CROSS_COMPILE

2007-10-08 Thread Sam Ravnborg
> 
> What about, that this is the first ever prompt, that must be shown and
> written to the .config?
Two issues to fix before we can do this:
1) choice values cannot have more than one prompt
2) We need to share much more Kconfig* between the individual architectures
   First step is to let all arch's use drivers/Kconfig

Let's get the two items above solved then we can revisit adding arch selection
to kconfig (where it belongs in the end).
And neither requires a rewrite of kconfig...

> Also, i'd like to propose sequencing of config-enable-build-this-unit
> in config file(s), thus Makefile(s) (sometimes very small and stupid)
> will be not necessary. Additional link ordering can be supplied as
> meta-config information there. Shell scripting, very ugly in the view
> of make syntax, will be natural in config files. Extending build
> process to get hidden dependencies or right linking/other magic is
> part of particular configuration. Hm?
Discussed before but so far no patches have shown up.

Sam


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Brownell
> To add some more information here, I think the EHCI idea might
> hold some water.
>
> What I have here are two NEC OHCI USB interfaces and one NEC EHCI
> USB interface on PCI.  Apparently they all go through a shared
> USB hub, mapped like this:
>
> HUB Port 1: OHCI #1, EHCI
> HUB Port 2: OHCI #2, EHCI
> HUB Port 3: OHCI #1, EHCI
> HUB Port 4: OHCI #2, EHCI
> HUB Port 5: OHCI #1, EHCI
>
> The OHCI ports go out to external USB connectors on the back panel of
> the machine, whereas the EHCI is connected up to an internal USB
> storage CDROM device and what appears to be another USB hub.

There's actually no such thing as an "EHCI port" or an "OHCI port".
Instead, there's a set of ports, each of which can be switched so
the USB differential data signals go up to either controller.

When EHCI starts, that switch points to EHCI so that devices can try
enumerating with high speed signaling.  When a device doesn't respond
to that "chirp", the EHCI root hub driver switches the port to the
companion controller.  (Which is OHCI here, UHCI on some PCs, etc.)
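
As a rough illustration of the port routing described above, here is a toy model in C. It is not kernel code: the types and the functions ehci_start() and ehci_enumerate() are invented for the example, and the chirp result is simply passed in as a flag.

```c
#include <stdbool.h>

/* Which controller a root-hub port's data lines are currently routed to. */
enum port_owner { OWNER_EHCI, OWNER_COMPANION };

struct root_port {
	enum port_owner owner;
};

/* When EHCI starts, every port is switched to point at EHCI. */
void ehci_start(struct root_port *ports, int nports)
{
	for (int i = 0; i < nports; i++)
		ports[i].owner = OWNER_EHCI;
}

/*
 * Enumerate the device on one port. If it answered the high-speed
 * "chirp", EHCI keeps the port; otherwise the port is switched to the
 * companion controller (OHCI here, UHCI on some PCs).
 */
enum port_owner ehci_enumerate(struct root_port *port, bool chirp_ok)
{
	if (!chirp_ok)
		port->owner = OWNER_COMPANION;
	return port->owner;
}
```

The point of the model is only that a port has one owner at a time and that ownership changes as a side effect of EHCI's enumeration, which is why the order in which the two drivers touch a port can matter.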


> The problem seems to be very strongly tied to timing.  For example
> simply adding "ignore_loglevel" to the kernel boot command line can
> make the problem go away.
>
> This got me thinking about your EHCI comment.
>
> If these controllers are going through the same HUB, things might go
> south if OHCI initialized first, then khubd et al. are asynchronously
> accessing the segments behind OHCI at the same time that the EHCI
> driver is initializing.  Perhaps, this is the kind of sequence of
> events which makes one of the root ports reset in such a way that the
> reset bit never clears.
>
> Given that this machine has 64 cpus, the likelihood of such parallel
> accesses is very high :-)
>
> Does this make any sense?

Yes, that's why I asked about EHCI.  My speculation would be that
OHCI starts the reset, and EHCI claims the port before it completes;
or contrariwise OHCI starts the reset right after EHCI claims it.

And there's some point in that process where a hardware race makes
the trouble you've observed.  I believe there are plenty of other
places where it's perfectly fine if EHCI grabs the port, or this
little race would have shown up many times before.

- Dave




Re: [PATCH] lockdep: Avoid /proc/lockdep & lock_stat infinite output

2007-10-08 Thread Tim Pepper
On Tue 09 Oct at 02:30:11 +0100 [EMAIL PROTECTED] said:
> On Mon, Oct 08, 2007 at 06:15:51PM -0700, Tim Pepper wrote:
> > 
> > -	if (&class->lock_entry == all_lock_classes.next)
> > +	if (*pos == 0)
> >  		seq_printf(m, "all lock classes:\n");
> 
> Do not generate output outside of ->show() and you won't have these
> problems.  That's where your infinite output crap comes from.
> 
> IOW, NAK - fix the underlying problem.

Aaah...OK.  Can we add something like the following then:




Document that output must only come from _show() and SEQ_START_TOKEN is how
a _start() indicates a header is to be printed.

Signed-off-by: Tim Pepper <[EMAIL PROTECTED]>
Cc: Al Viro <[EMAIL PROTECTED]>

---

--- linux-2.6.orig/include/linux/seq_file.h
+++ linux-2.6.23-rc9/include/linux/seq_file.h
@@ -36,9 +36,10 @@ ssize_t seq_read(struct file *, char __u
 loff_t seq_lseek(struct file *, loff_t, int);
 int seq_release(struct inode *, struct file *);
 int seq_escape(struct seq_file *, const char *, const char *);
+
+/* these may only be called from a (*show) function */
 int seq_putc(struct seq_file *m, char c);
 int seq_puts(struct seq_file *m, const char *s);
-
 int seq_printf(struct seq_file *, const char *, ...)
 	__attribute__ ((format (printf,2,3)));
 
@@ -48,6 +49,11 @@ int single_open(struct file *, int (*)(s
 int single_release(struct inode *, struct file *);
 int seq_release_private(struct inode *, struct file *);
 
+/*
+ * return SEQ_START_TOKEN in your (*start) function and test for
+ * (v == SEQ_START_TOKEN) in your (*show) function in order to
+ * print a header before your seq data
+ */
 #define SEQ_START_TOKEN ((void *)1)
 
 /*
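
For illustration, here is a userspace mock of the pattern the new comment describes. It is not the kernel seq_file API: demo_start(), demo_next(), demo_show() and demo_read() are hypothetical stand-ins, the records are made up, and only the SEQ_START_TOKEN value is taken from the header.

```c
#include <stdio.h>
#include <string.h>

#define SEQ_START_TOKEN ((void *)1)	/* same value as in seq_file.h */

static const char *names[] = { "alpha", "beta" };	/* made-up records */
#define NR_NAMES 2

/* Position 0 is the header; positions 1..NR_NAMES are the records. */
static void *record_at(long pos)
{
	if (pos < 1 || pos > NR_NAMES)
		return NULL;
	return (void *)&names[pos - 1];
}

/* start(): produces no output, it only says where iteration begins. */
void *demo_start(long *pos)
{
	return *pos == 0 ? SEQ_START_TOKEN : record_at(*pos);
}

void *demo_next(void *v, long *pos)
{
	(void)v;
	++*pos;
	return record_at(*pos);
}

/* show(): the only place that generates output, header included. */
int demo_show(char *buf, size_t len, void *v)
{
	size_t used = strlen(buf);

	if (v == SEQ_START_TOKEN)
		snprintf(buf + used, len - used, "all records:\n");
	else
		snprintf(buf + used, len - used, "%s\n", *(const char **)v);
	return 0;
}

/* Minimal driver loop standing in for what seq_read() does. */
void demo_read(char *buf, size_t len)
{
	long pos = 0;

	buf[0] = '\0';
	for (void *v = demo_start(&pos); v; v = demo_next(v, &pos))
		demo_show(buf, len, v);
}
```

Because the header is emitted from the show step when it sees the token, restarting the iteration at a later position never prints it a second time.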


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Miller
From: David Brownell <[EMAIL PROTECTED]>
Date: Mon, 08 Oct 2007 20:34:12 -0700

> > However, when both OHCI and EHCI are built as modules (or, similarly
> > I guess, OHCI is built-in and EHCI is modular) there appears to be
> > nothing in userspace which makes sure EHCI gets loaded first.
> 
> The old /etc/hotplug/usb.rc script made sure to load those modules
> in the correct order:  EHCI first.

I expected to find something cute attempting to handle this under
/etc/udev, I have failed so far :-)


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Stephen Hemminger
On Mon, 8 Oct 2007 16:06:03 -0700
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Mon, 08 Oct 2007 16:43:10 -0600 Jonathan Corbet wrote:
> 
> > Sam Ravnborg <[EMAIL PROTECTED]> wrote:
> > 
> > > > Or maybe we need something much less formal that explains the purpose of the
> > > four tags we use:
> > 
> > ...or maybe a combination?  How does the following patch look as a way
> > to describe how the tags are used and what Reviewed-by, in particular,
> > means?
> > 
> > Perhaps the DCO should move to this file as well?
> > 
> > jon
> 
> Just typos noted below...
> 
> > ---
> > 
> > Add a document on patch tags.
> > 
> > Signed-off-by: Jonathan Corbet <[EMAIL PROTECTED]>
> > 
> > diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
> > index 43e89b1..fa1518b 100644
> > --- a/Documentation/00-INDEX
> > +++ b/Documentation/00-INDEX
> > @@ -284,6 +284,8 @@ parport.txt
> > - how to use the parallel-port driver.
> >  parport-lowlevel.txt
> > - description and usage of the low level parallel port functions.
> > +patch-tags
> > +   - description of the tags which can be added to patches
> >  pci-error-recovery.txt
> > - info on PCI error recovery.
> >  pci.txt
> > diff --git a/Documentation/patch-tags b/Documentation/patch-tags
> > new file mode 100644
> > index 000..fb5f8e1
> > --- /dev/null
> > +++ b/Documentation/patch-tags
> > @@ -0,0 +1,66 @@
> > +Patches headed for the mainline may contain a variety of tags documenting
> > +who played a hand in (or was at least aware of) its progress.  All of these
> > +tags have the form:
> > +
> > +   Something-done-by: Full name <[EMAIL PROTECTED]>
> > +
> > +These tags are:
> > +
> > +Signed-off-by:  A person adding a Signed-off-by tag is attesting that the
> > +   patch is, to the best of his or her knowledge, legally able
> > +   to be merged into the mainline and distributed under the
> > +   terms of the GNU General Public License, version 2.
All changes are licensed under the terms of the file modified. 

(Some people seem not to understand that
if the file is dual licensed, then the changes are dual licensed. 
If file is GPL v2 only, then the changes are GPL v2 only, ...)

> >  See
> > +   the Developer's Certificate of Origin, found in
> > +   Documentation/SubmittingPatches, for the precise meaning of
> > +   Signed-off-by.


> > +Acked-by:  The person named (who should be an active developer in the
> > +   area addressed by the patch) is aware of the patch and has
> > +   no objection to its inclusion.  An Acked-by tag does not
> > +   imply any involvement in the development of the patch or
> > +   that a detailed review was done.
> > +
> > +Reviewed-by:   The patch has been reviewed and found acceptible 
> > according
> 
>   acceptable
> 
> > +   to the Reviewer's Statement as found at the bottom of this
> > +   file.  A Reviewed-by tag is a statement of opinion that the
> > +   patch is an appropriate modification of the kernel without
> > +   any remaining serious technical issues.  Any interested
> > +   reviewer (who has done the work) can offer a Reviewed-by
> > +   tag for a patch.
> > +
> > +Cc:The person named was given the opportunity to comment on
> > +   the patch.  This is the only tag which might be added
> > +   without an explicit action by the person it names.
> > +
> > +Tested-by: The patch has been successfully tested (in some
> > +   environment) by the person named.
> > +
>

IMHO the other tags actually are a poor substitute for providing a
more complete description of the reviewer's involvement. It would be better
to have more complete responses like "the patch should be merged as is for
2.6.X but the following should be fixed, ..." etc. The certificate of origin
has meaning for legal things that have a more concrete definition, but the
existing process is about people making good (or bad) decisions based on
feedback and other data. Trying to reduce the feedback down to 3 Acks and
1 Review seems like noise. The problem is getting good reviews of new code in
a timely manner, not the descriptions of the result.


-- 
Stephen Hemminger <[EMAIL PROTECTED]>



Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Brownell
> However, when both OHCI and EHCI are built as modules (or, similarly
> I guess, OHCI is built-in and EHCI is modular) there appears to be
> nothing in userspace which makes sure EHCI gets loaded first.

The old /etc/hotplug/usb.rc script made sure to load those modules
in the correct order:  EHCI first.



Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Miller
From: Greg KH <[EMAIL PROTECTED]>
Date: Mon, 8 Oct 2007 20:10:49 -0700

> Yes it does, I'm seeing reports from some hardware companies of the very
> same thing.  If you serialize and load the ehci driver first, and then
> the ohci driver, that should fix the problem.
> 
> Does that also work for you?  Or are these drivers built into the
> kernel?

As coincidence would have it I finally found a recipe for triggering
the issue, and it ties into what you're talking about here.

It happens only if I make sure OHCI gets loaded first and then EHCI
right afterwards.

It seems that indeed it is important for EHCI to get loaded first,
and in-kernel this is ensured by the link ordering.

However, when both OHCI and EHCI are built as modules (or, similarly
I guess, OHCI is built-in and EHCI is modular) there appears to be
nothing in userspace which makes sure EHCI gets loaded first.

When this triggers, in OHCI's root_port_reset(), the port status
register reads 0x111 in that inner-loop and the value never changes.
It stays like this forever.
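
For reference, 0x111 decodes to "device connected, port powered, reset still in progress". A small sketch in C of that decoding, plus a bounded variant of the polling idea; the bit values follow the HcRhPortStatus layout in the OHCI specification, while port_reset_pending() and first_reset_done() are names invented for this example and the samples array stands in for repeated register reads:

```c
#include <stdbool.h>
#include <stdint.h>

/* HcRhPortStatus bits, per the OHCI specification */
#define RH_PS_CCS 0x00000001u	/* current connect status */
#define RH_PS_PES 0x00000002u	/* port enable status */
#define RH_PS_PRS 0x00000010u	/* port reset status: reset in progress */
#define RH_PS_PPS 0x00000100u	/* port power status */

/* True while the controller still reports the reset as in progress. */
bool port_reset_pending(uint32_t status)
{
	return (status & RH_PS_PRS) != 0;
}

/*
 * Walk a series of status samples, as a polling loop would read them,
 * and return the index of the first sample with PRS clear. Returns -1
 * if the retry budget (n samples) is used up first, so a stuck port
 * cannot hang the caller.
 */
int first_reset_done(const uint32_t *samples, int n)
{
	for (int i = 0; i < n; i++) {
		if (!port_reset_pending(samples[i]))
			return i;
	}
	return -1;
}
```

With a status that stays at 0x111 the bounded loop gives up and reports failure instead of spinning.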


Re: [PATCH 6/6] Use one zonelist that is filtered by nodemask

2007-10-08 Thread Nishanth Aravamudan
On 08.10.2007 [18:56:05 -0700], Christoph Lameter wrote:
> On Mon, 8 Oct 2007, Nishanth Aravamudan wrote:
> 
> > >  struct page * fastcall
> > >  __alloc_pages(gfp_t gfp_mask, unsigned int order,
> > >  		struct zonelist *zonelist)
> > >  {
> > > +	/*
> > > +	 * Use a temporary nodemask for __GFP_THISNODE allocations. If the
> > > +	 * cost of allocating on the stack or the stack usage becomes
> > > +	 * noticeable, allocate the nodemasks per node at boot or compile time
> > > +	 */
> > > +	if (unlikely(gfp_mask & __GFP_THISNODE)) {
> > > +		nodemask_t nodemask;
> > > +
> > > +		return __alloc_pages_internal(gfp_mask, order,
> > > +				zonelist, nodemask_thisnode(&nodemask));
> > > +	}
> > > +
> > >  	return __alloc_pages_internal(gfp_mask, order, zonelist, NULL);
> > >  }
> > 
> > 
> > 
> > So alloc_pages_node() calls here and for THISNODE allocations, we go ask
> > nodemask_thisnode() for a nodemask...
> 
> H... nodemask_thisnode needs to be passed the zonelist.
> 
> > And nodemask_thisnode() always gives us a nodemask with only the node
> > the current process is running on set, I think?
> 
> Right.
> 
> 
> > That seems really wrong -- and would explain what Lee was seeing while
> > using my patches for the hugetlb pool allocator to use THISNODE
> > allocations. All the allocations would end up coming from whatever node
> > the process happened to be running on. This obviously messes up hugetlb
> > accounting, as I rely on THISNODE requests returning NULL if they go
> > off-node.
> > 
> > I'm not sure how this would be fixed, as __alloc_pages() no longer has
> > the nid to set in the mask.
> > 
> > Am I wrong in my analysis?
> 
> No you are right on target. The thisnode function must determine the
> node from the first zone of the zonelist.

It seems like I would use zonelist_node_idx() for this, along the lines of:

static nodemask_t *nodemask_thisnode(nodemask_t *nodemask,
					struct zonelist *zonelist)
{
	/* zonelist_node_idx() takes a zoneref, so pass the first entry */
	int nid = zonelist_node_idx(zonelist->_zonerefs);

	/* Build a nodemask for just this node */
	nodes_clear(*nodemask);
	node_set(nid, *nodemask);

	return nodemask;
}

But I think I need to check that zonelist->_zonerefs->zone is non-NULL, given
this definition of zonelist_node_idx():

static inline int zonelist_node_idx(struct zoneref *zoneref)
{
#ifdef CONFIG_NUMA
	/* zone_to_nid not available in this context */
	return zoneref->zone->node;
#else
	return 0;
#endif /* CONFIG_NUMA */
}

and this comment in __alloc_pages_internal():


	z = zonelist->_zonerefs;  /* the list of zones suitable for gfp_mask */

	if (unlikely(!z->zone)) {
		/*
		 * Happens if we have an empty zonelist as a result of
		 * GFP_THISNODE being used on a memoryless node
		 */
		return NULL;
	}
	...

It seems like zoneref->zone may be NULL in zonelist_node_idx()? Maybe
someone else should look into resolving this :)
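
One possible shape for the check, sketched in userspace C. The struct definitions below are simplified stand-ins rather than the real kernel types, the nodemask is reduced to a single word, and returning NULL for an empty zonelist is just one option:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct zone { int node; };
struct zoneref { struct zone *zone; };
struct zonelist { struct zoneref _zonerefs[4]; };
typedef struct { unsigned long bits; } nodemask_t;

/*
 * Build a nodemask holding only the node of the first zone in the
 * zonelist. Returns NULL for an empty zonelist (memoryless node)
 * instead of dereferencing a NULL zone pointer.
 */
nodemask_t *nodemask_thisnode(nodemask_t *nodemask, struct zonelist *zonelist)
{
	struct zoneref *z = zonelist->_zonerefs;

	if (!z->zone)
		return NULL;

	nodemask->bits = 1UL << z->zone->node;
	return nodemask;
}
```

Note that in the quoted patch a NULL nodemask passed to __alloc_pages_internal() means "no filtering", so a caller using this variant would have to test the return value and fail the allocation itself rather than pass the NULL straight through.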

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread Greg KH
On Mon, Oct 08, 2007 at 04:54:20PM -0700, David Miller wrote:
> From: David Miller <[EMAIL PROTECTED]>
> Date: Sun, 07 Oct 2007 00:51:56 -0700 (PDT)
> 
> > From: David Brownell <[EMAIL PROTECTED]>
> > Date: Sun, 07 Oct 2007 00:31:41 -0700
> > 
> > > Are the other ports still behaving?  Is EHCI maybe trying to switch
> > > ownership of that port?  Is maybe the (newish) autosuspend stuff
> > > kicking in?
> > 
> > I wouldn't know, the machine hangs and doesn't get any further.
> 
> To add some more information here, I think the EHCI idea might
> hold some water.
> 
> What I have here are two NEC OHCI USB interfaces and one NEC EHCI
> > USB interface on PCI.  Apparently they all go through a shared
> USB hub, mapped like this:
> 
> HUB Port 1: OHCI #1, EHCI
> HUB Port 2: OHCI #2, EHCI
> HUB Port 3: OHCI #1, EHCI
> HUB Port 4: OHCI #2, EHCI
> HUB Port 5: OHCI #1, EHCI
> 
> The OHCI ports go out to external USB connectors on the back panel of
> the machine, whereas the EHCI is connected up to an internal USB
> storage CDROM device and what appears to be another USB hub.
> 
> The problem seems to be very strongly tied to timing.  For example
> simply adding "ignore_loglevel" to the kernel boot command line can
> make the problem go away.
> 
> This got me thinking about your EHCI comment.
> 
> If these controllers are going through the same HUB, things might go
> south if OHCI initialized first, then khubd et al. are asynchronously
> accessing the segments behind OHCI at the same time that the EHCI
> driver is initializing.  Perhaps, this is the kind of sequence of
> > events which makes one of the root ports reset in such a way that the
> > reset bit never clears.
> 
> > Given that this machine has 64 cpus, the likelihood of such parallel
> > accesses is very high :-)
> 
> Does this make any sense?

Yes it does, I'm seeing reports from some hardware companies of the very
same thing.  If you serialize and load the ehci driver first, and then
the ohci driver, that should fix the problem.

Does that also work for you?  Or are these drivers built into the
kernel?

thanks,

greg k-h


Re: -rt more realtime scheduling issues

2007-10-08 Thread Steven Rostedt
On Mon, Oct 08, 2007 at 11:45:23AM -0700, Mike Kravetz wrote:
> On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote:
> > After applying the fix to try_to_wake_up() I was still seeing some large
> > latencies for realtime tasks.
> 
> I've been looking for places in the code where reschedule IPIs should
> be sent in the case of 'overload' to redistribute RealTime tasks based
> on priority.  However, an even more basic question to ask might be:  Is
> the use of reschedule IPIs reliable enough for this purpose?  In the
> code, there is the following comment:
> 
> /*
>  * this function sends a 'reschedule' IPI to another CPU.
>  * it goes straight through and wastes no time serializing
>  * anything. Worst case is that we lose a reschedule ...
>  */
> 
> After a quick read of the code, it does appear that reschedules can
> be lost if the IPI is sent at just the right time in schedule
> processing.  Can someone confirm this is actually the case?
> 
> The issue I see is that the 'rt_overload' mechanism depends on reschedule
> IPIs for RealTime scheduling semantics.  If this is not a reliable
> mechanism then this can lead to breakdowns in RealTime scheduling semantics.
> 
> Are these accurate statements?  I'll start working on a reliable delivery
> mechanism for RealTime scheduling.  But, I just want to make sure that
> is really necessary.

For i386 I don't think so. Seems that the interrupt handler will set the
current task to "need_resched" and on exit of the interrupt handler, the
schedule should take place. I don't see the race (that doesn't mean
there isn't one).

For x86_64 though, I don't think that we schedule. All the reschedule
vector does is return with a comment:

/*
 * Reschedule call back. Nothing to do,
 * all the work is done automatically when
 * we return from the interrupt.
 */
asmlinkage void smp_reschedule_interrupt(void)
{
	ack_APIC_irq();
}

I'm thinking that this was the case for i386 a while back, and we fixed
it for RT.

/me does a quick search...

http://lkml.org/lkml/2005/5/13/174

Yep!  This is a bug in x86_64. I'll fix this up tomorrow and send out a
patch.

-- Steve



[PATCH RT] fix rt-task scheduling issue

2007-10-08 Thread Steven Rostedt
Mike,

Can you attach your Signed-off-by to this patch, please.


On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote:
> Hi Ingo,
> 
> After applying the fix to try_to_wake_up() I was still seeing some large
> latencies for realtime tasks.  Some debug code pointed out two additional
> causes of these latencies.  I have put fixes into my 'old' kernel and the
> scheduler related latencies have gone away.  I'm pretty confident that
> one of these bugs still exists in the latest RT patch set.  Not so sure
> about the other.  But, I wanted to describe in detail so that you could
> address in the latest version of the code if applicable.
> 
> finish_task_switch() contains the following code:
> 
> #if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP)
> 	/*
> 	 * If we pushed an RT task off the runqueue,
> 	 * then kick other CPUs, they might run it:
> 	 */
> 	if (unlikely(rt_task(current) && prev->se.on_rq && rt_task(prev))) {
> 		schedstat_inc(rq, rto_schedule);
> 		smp_send_reschedule_allbutself_cpumask(current->cpus_allowed);
> 	}
> #endif
> 
> My debug code found instances where more than one realtime task got
> put on the runqueue before the __schedule() was invoked.  So, current
> would be a realtime task, but prev was not realtime.  And, there was
> another (lesser priority, or last in) realtime task on the queue.  I
> believe that in this case we would still want to send the IPIs.  In my
> kernel I changed the test to be:
> 
>   if (unlikely(rt_task(current) && rq->rt_nr_running > 1)) {
> 
> After this change, I definitely saw some long latencies go away.

I definitely agree with your analysis.
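
To make the difference concrete, here is a toy comparison of the two tests in plain C. The struct is a made-up snapshot of the values finish_task_switch() looks at, and it assumes rt_nr_running counts every RT task on the runqueue, including the one now running:

```c
#include <stdbool.h>

/* Toy snapshot of the state the test in finish_task_switch() examines. */
struct switch_state {
	bool current_is_rt;	/* rt_task(current) */
	bool prev_is_rt;	/* rt_task(prev) */
	bool prev_on_rq;	/* prev->se.on_rq */
	int rt_nr_running;	/* rq->rt_nr_running */
};

/* Old test: only fires when an RT task was itself pushed off the CPU. */
bool kick_old(const struct switch_state *s)
{
	return s->current_is_rt && s->prev_on_rq && s->prev_is_rt;
}

/* New test: fires whenever another RT task is left waiting on this rq. */
bool kick_new(const struct switch_state *s)
{
	return s->current_is_rt && s->rt_nr_running > 1;
}
```

In Mike's scenario (two RT tasks queued before __schedule() runs, prev not RT) the state is { true, false, true, 2 }: the old test says "no IPI" while the new one says "send it".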

> 
> The other place of concern is in the routine pull_task().  I was a
> little surprised to see realtime tasks moved around via normal load
> balancing.  But, my debug code did point this out.  In the code for
> my old kernel, the routines end with:
> 
> 	/*
> 	 * Note that idle threads have a prio of MAX_PRIO, for this test
> 	 * to be always true for them.
> 	 */
> 	if (TASK_PREEMPTS_CURR(p, this_rq))
> 		resched_task(this_rq->curr);
> 
> This reminded me very much of the situation/code in try_to_wake_up().
> If pull_task() pulled in a realtime task, then I think it should also
> deal with the case where TASK_PREEMPTS_CURR(p, this_rq) is false.  So
> I changed the code in my kernel to be:
> 
> 	/*
> 	 * Note that idle threads have a prio of MAX_PRIO, for this test
> 	 * to be always true for them.
> 	 */
> 	if (TASK_PREEMPTS_CURR(p, this_rq)) {
> 		resched_task(this_rq->curr);
> 
> 	} else if (unlikely(rt_task(p))) {
> 		/* no appropriate rt_overload counter goes here */
> 		smp_send_reschedule_allbutself();
> 	}

I'm thinking that the first change would actually make this one
obsolete. The checking at the time of scheduling should cover most cases
where multiple rt tasks are being queued on the same CPU.  When we see
that the rt tasks are bunching up on a queue we should handle it then.
Which I would think is at the time of schedule, and the time a task is
queued (try_to_wake_up). Hopefully this is enough.

> 
> To be perfectly honest, I don't know if this change helped eliminate
> any of the large latencies I was seeing.  I made this change first,
> and was still seeing some large latencies.  I then made the modification
> to finish_task_switch() and all my scheduler related latencies went
> away.  Entirely possible this change had no impact.  Also, the above

I'm thinking it may have had little to no effect. The first change seems
to be the culprit.

> code is replaced in the latest kernels with:
> 
>   check_preempt_curr(this_rq, p);
> 
> What check_preempt_curr() does is not immediately obvious to me. So,
> this may not apply at all.  Just something to think about.

I also don't want to send too many IPI reschedules when we see that we
have more than one rt task on queue. I can imagine an IPI scheduling
storm if we have one more rt task than CPUs. So sending the IPI when a
task switch actually occurs seems appropriate.

-- Steve

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>


Index: linux-2.6.23-rc9-rt2/kernel/sched.c
===================================================================
--- linux-2.6.23-rc9-rt2.orig/kernel/sched.c
+++ linux-2.6.23-rc9-rt2/kernel/sched.c
@@ -2207,7 +2207,7 @@ static inline void finish_task_switch(st
 	 * If we pushed an RT task off the runqueue,
 	 * then kick other CPUs, they might run it:
 	 */
-	if (unlikely(rt_task(current) && prev->se.on_rq && rt_task(prev))) {
+	if (unlikely(rt_task(current) && rq->rt_nr_running > 1)) {
 		schedstat_inc(rq, rto_schedule);
 		smp_send_reschedule_allbutself_cpumask(current->cpus_allowed);
 	}

Re: [PATCH] mm: set_page_dirty_balance() vs ->page_mkwrite()

2007-10-08 Thread Mark Fasheh
On Mon, Oct 08, 2007 at 05:47:52PM +1000, Nick Piggin wrote:
> > block_page_mkwrite() is just using generic interfaces to do this,
> > same as pretty much any write() system call. The idea was to make it
> > as similar to the write() call path as possible...
> >
> > However, unlike generic_file_buffered_write(), we are not calling
> > balance_dirty_pages_ratelimited(mapping) between
> > ->prepare/commit_write call pairs.  Perhaps this should be added to
> > block_page_mkwrite() after the page is unlocked
> 
> That sounds pretty sane, in terms of matching with
> generic_file_buffered_write.

I agree. We could also insert a call to balance_dirty_pages_ratelimited() in
__ocfs2_page_mkwrite.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Steven Rostedt
On Mon, Oct 08, 2007 at 10:16:26PM +0200, Rafael J. Wysocki wrote:
> 
> Tested-by: is sort of trivial for a fix patch, for example, if a bug reporter
> confirms that the proposed patch actually fixes the issue.  IMHO it wouldn't
> be practical to complicate that.
>

I see two types of Tested-by.

1) As you stated, a fix to a problem that the reporter has seen. So
someone could state a "fixes issue" in the change log and that
would simply mean that the tester has seen a problem, and the attached
patch fixes it.

2) Someone has a test suite for the area that the change affects. So if
someone has developed a networking test suite and a patch changes some
networking logic, the Tested-by could be that the tester actually ran
specific tests.  This should require a more detailed explanation of what
was done. Or at the very least, a pointer to a web page of the tests that
were run.

So for the user that sees an issue, then gets a patch, perhaps all they
need to do is add a "fixed problem" or "works now" in the change log to
denote that the patch actually fixes (or seems to fix) the problem that
they had previously seen. This shouldn't be too hard.

But for those that run test suites, they should be smart enough to put
in more documentation into the change log to state how it was tested.

Perhaps we need to add yet another sign-off tag.

"Verified-by", which could be for the user that saw an issue and the
patch now fixes it. That user could just add the "Verified-by" to the
patch to acknowledge (and record) that the patch did fix the issue.

The "Tested-by" can be used for patches that are run through a test
suite.

Just a thought.

-- Steve



Re: [PATCH 6/6] Use one zonelist that is filtered by nodemask

2007-10-08 Thread Christoph Lameter
On Mon, 8 Oct 2007, Nishanth Aravamudan wrote:

> >  struct page * fastcall
> >  __alloc_pages(gfp_t gfp_mask, unsigned int order,
> >  		struct zonelist *zonelist)
> >  {
> > +	/*
> > +	 * Use a temporary nodemask for __GFP_THISNODE allocations. If the
> > +	 * cost of allocating on the stack or the stack usage becomes
> > +	 * noticeable, allocate the nodemasks per node at boot or compile time
> > +	 */
> > +	if (unlikely(gfp_mask & __GFP_THISNODE)) {
> > +		nodemask_t nodemask;
> > +
> > +		return __alloc_pages_internal(gfp_mask, order,
> > +				zonelist, nodemask_thisnode(&nodemask));
> > +	}
> > +
> >  	return __alloc_pages_internal(gfp_mask, order, zonelist, NULL);
> >  }
> 
> 
> 
> So alloc_pages_node() calls here and for THISNODE allocations, we go ask
> nodemask_thisnode() for a nodemask...

H... nodemask_thisnode needs to be passed the zonelist.

> And nodemask_thisnode() always gives us a nodemask with only the node
> the current process is running on set, I think?

Right.

 
> That seems really wrong -- and would explain what Lee was seeing while
> using my patches for the hugetlb pool allocator to use THISNODE
> allocations. All the allocations would end up coming from whatever node
> the process happened to be running on. This obviously messes up hugetlb
> accounting, as I rely on THISNODE requests returning NULL if they go
> off-node.
> 
> I'm not sure how this would be fixed, as __alloc_pages() no longer has
> the nid to set in the mask.
> 
> Am I wrong in my analysis?

No you are right on target. The thisnode function must determine the node 
from the first zone of the zonelist.


Re: parallel networking

2007-10-08 Thread Jeff Garzik

David Miller wrote:
> From: Jeff Garzik <[EMAIL PROTECTED]>
> Date: Mon, 08 Oct 2007 10:22:28 -0400
>
>> In terms of overall parallelization, both for TX as well as RX, my gut
>> feeling is that we want to move towards an MSI-X, multi-core friendly
>> model where packets are LIKELY to be sent and received by the same set
>> of [cpus | cores | packages | nodes] that the [userland] processes
>> dealing with the data.
>
> The problem is that the packet schedulers want global guarantees
> on packet ordering, not flow centric ones.
>
> That is the issue Jamal is concerned about.


Oh, absolutely.

I think, fundamentally, any amount of cross-flow resource management 
done in software is an obstacle to concurrency.


That's not a value judgement, just a statement of fact.

"traffic cops" are intentional bottlenecks we add to the process, to 
enable features like priority flows, filtering, or even simple socket 
fairness guarantees.  Each of those bottlenecks serves a valid purpose, 
but at the end of the day, it's still a bottleneck.


So, improving concurrency may require turning off useful features that 
nonetheless hurt concurrency.




> The more I think about it, the more inevitable it seems that we really
> might need multiple qdiscs, one for each TX queue, to pull this full
> parallelization off.
>
> But the semantics of that don't smell so nice either.  If the user
> attaches a new qdisc to "ethN", does it go to all the TX queues, or
> what?
>
> All of the traffic shaping technology deals with the device as a unary
> object.  It doesn't fit to multi-queue at all.


Well the easy solutions to networking concurrency are

* use virtualization to carve up the machine into chunks

* use multiple net devices

Since new NIC hardware is actively trying to be friendly to 
multi-channel/virt scenarios, either of these is reasonably 
straightforward given the current state of the Linux net stack.  Using 
multiple net devices is especially attractive because it works very well 
with the existing packet scheduling.


Both unfortunately impose a burden on the developer and admin, to force 
their apps to distribute flows across multiple [VMs | net devs].



The third alternative is to use a single net device, with SMP-friendly 
packet scheduling.  Here you run into the problems you described ("device 
as a unary object", etc.) with the current infrastructure.


With multiple TX rings, consider that we are pushing the packet 
scheduling from software to hardware...  which implies

* hardware-specific packet scheduling
* some TC/shaping features not available, because hardware doesn't 
support it
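
To make the ordering trade-off concrete, here is a minimal sketch of
flow-centric TX queue selection. Everything in it is an assumption for
illustration: struct flow_key, the toy hash and pick_tx_queue() are
hypothetical and do not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow identifier, for illustration only. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Toy mixing function (an assumption; not the kernel's hash). */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t h = k->saddr * 2654435761u;

	h ^= k->daddr * 2246822519u;
	h ^= ((uint32_t)k->sport << 16) | k->dport;
	h ^= h >> 15;
	return h;
}

/*
 * All packets of one flow map to the same queue, so ordering is kept
 * within a flow.  Nothing orders packets across different queues.
 */
static unsigned int pick_tx_queue(const struct flow_key *k,
				  unsigned int nr_queues)
{
	assert(nr_queues > 0);
	return flow_hash(k) % nr_queues;
}
```

A scheme like this keeps each flow on one TX ring, but it gives only
per-flow ordering; a qdisc that wants a global view of the device would
still have to arbitrate across the rings.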


Jeff






Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Yan Zheng
2007/10/9, Andrew Morton <[EMAIL PROTECTED]>:
> Perhaps Yan Zheng can tell us what test was used to demonstrate this?

I found it by review. The only test I ran was to check that
remap_file_pages works when the VM_CAN_NONLINEAR flag is set.


Re: [PATCH] lockdep: Avoid /proc/lockdep & lock_stat infinite output

2007-10-08 Thread Al Viro
On Mon, Oct 08, 2007 at 06:15:51PM -0700, Tim Pepper wrote:
> 
> When a read() requests an amount of data smaller than the amount of data
> that the seq_file's foo_show() outputs, the output starts looping and
> outputs the "stuck" element's data infinitely.  There may be multiple
> sequential calls to foo_start(), foo_next()/foo_show(), and foo_stop()
> for a single open with sequential read of the file.  The _start() does not
> have to start with the 0th element and _show() might be called multiple
> times in a row for the same element for a given open/read of the seq_file.
>  
>  static void *l_start(struct seq_file *m, loff_t *pos)
>  {
> - struct lock_class *class = m->private;
> + struct lock_class *class;
> + loff_t i = 0;
>  
> - if (&class->lock_entry == all_lock_classes.next)
> + if (*pos == 0)
>   seq_printf(m, "all lock classes:\n");

Do not generate output outside of ->show() and you won't have these
problems.  That's where your infinite output crap comes from.

IOW, NAK - fix the underlying problem.
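
One common way to keep all output inside ->show() is to treat the header as
record zero, returned from ->start() as a special token. The following is a
userspace model of that pattern, not kernel code: START_TOKEN only plays the
role of the kernel's SEQ_START_TOKEN, and demo_start(), demo_show() and
demo_dump() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace model only; the kernel's seq_file API differs in detail. */
#define START_TOKEN ((void *)1)	/* plays the role of SEQ_START_TOKEN */

static const char *items[] = { "alpha", "beta", "gamma" };
#define NR_ITEMS (sizeof(items) / sizeof(items[0]))

/* Position 0 is the header; position n > 0 is items[n - 1]. */
static void *demo_start(long pos)
{
	if (pos == 0)
		return START_TOKEN;
	if (pos > 0 && (size_t)pos <= NR_ITEMS)
		return (void *)&items[pos - 1];
	return NULL;
}

/* The only place that produces output, header included. */
static int demo_show(char *buf, size_t len, void *v)
{
	if (v == START_TOKEN)
		return snprintf(buf, len, "all items:\n");
	return snprintf(buf, len, "%s\n", *(const char **)v);
}

/*
 * Render everything, calling demo_start() again for every record, which is
 * roughly what happens when the file is read in very small chunks.
 */
static void demo_dump(char *out, size_t len)
{
	char rec[64];
	void *v;

	assert(len > 0);
	out[0] = '\0';
	for (long pos = 0; (v = demo_start(pos)) != NULL; pos++) {
		demo_show(rec, sizeof(rec), v);
		strncat(out, rec, len - strlen(out) - 1);
	}
}
```

Because the header is just another record, restarting at any position can
neither repeat it nor lose it.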


Re: [PATCH] param_sysfs_builtin memchr argument fix

2007-10-08 Thread Dave Young
> > If memchr argument is longer than strlen(kp->name), there will be some
> > weird result.
> 
> Just to clarify:  this was causing duplicate filenames in sysfs ?
Yes, it will cause duplicate filenames in sysfs. For me, the "nousb"
parameter causes "usbcore" to be created twice.
> 
> 
> > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> >
> > ---
> > kernel/params.c |8 +++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff -upr linux/kernel/params.c linux.new/kernel/params.c
> > --- linux/kernel/params.c 2007-10-08 14:30:06.0 +0800
> > +++ linux.new/kernel/params.c 2007-10-08 15:13:04.0 +0800
> > @@ -592,11 +592,17 @@ static void __init param_sysfs_builtin(v
> >
> >   for (i=0; i < __stop___param - __start___param; i++) {
> >   char *dot;
> > + size_t kplen;
> >
> >   kp = &__start___param[i];
> > + kplen = strlen(kp->name);
> >
> >   /* We do not handle args without periods. */
> > - dot = memchr(kp->name, '.', MAX_KBUILD_MODNAME);
> > + if (kplen > MAX_KBUILD_MODNAME) {
> > + DEBUGP("kernel parameter %s is too long\n", kp->name);
> 
> how about
> kernel parameter name %s is too long
> or
> kernel parameter name is too long: %s
> 
> (primary is addition of "name")
Yes, "name" should be added, thanks.
> 
> > + continue;
> > + }
> > + dot = memchr(kp->name, '.', kplen);
> >   if (!dot) {
> >   DEBUGP("couldn't find period in %s\n", kp->name);
> >   continue;
> > -
> 

Regards
dave



Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
kernel/params.c |8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff -upr linux/kernel/params.c linux.new/kernel/params.c
--- linux/kernel/params.c   2007-10-08 14:30:06.0 +0800
+++ linux.new/kernel/params.c   2007-10-09 09:16:55.0 +0800
@@ -592,11 +592,17 @@ static void __init param_sysfs_builtin(v
 
for (i=0; i < __stop___param - __start___param; i++) {
char *dot;
+   size_t kplen;
 
kp = &__start___param[i];
+   kplen = strlen(kp->name);
 
/* We do not handle args without periods. */
-   dot = memchr(kp->name, '.', MAX_KBUILD_MODNAME);
+   if (kplen > MAX_KBUILD_MODNAME) {
+   DEBUGP("kernel parameter name is too long: %s\n", kp->name);
+   continue;
+   }
+   dot = memchr(kp->name, '.', kplen);
if (!dot) {
DEBUGP("couldn't find period in %s\n", kp->name);
continue;
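
The effect of bounding memchr() by strlen() can be shown in userspace. This
is a sketch under an assumption: the two names are laid out back to back in
one array purely for illustration, as one plausible way a search could run
past the end of "nousb" and find the period of a following name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two NUL-terminated names stored back to back (hypothetical layout). */
static const char names[] = "nousb\0usbcore.blinkenlights";

/* Unbounded search: scans up to 'max' bytes, crossing the NUL terminator. */
static const char *find_dot_unbounded(const char *name, size_t max)
{
	assert(name != NULL);
	return memchr(name, '.', max);
}

/* Bounded search: never looks past the end of the string itself. */
static const char *find_dot_bounded(const char *name)
{
	assert(name != NULL);
	return memchr(name, '.', strlen(name));
}
```

Searching "nousb" with the bounded variant finds no period, while the
unbounded variant returns the period that belongs to the next name.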


[PATCH] lockdep: Avoid /proc/lockdep & lock_stat infinite output

2007-10-08 Thread Tim Pepper

When a read() requests an amount of data smaller than the amount of data
that the seq_file's foo_show() outputs, the output starts looping and
outputs the "stuck" element's data infinitely.  There may be multiple
sequential calls to foo_start(), foo_next()/foo_show(), and foo_stop()
for a single open with sequential read of the file.  The _start() does not
have to start with the 0th element and _show() might be called multiple
times in a row for the same element for a given open/read of the seq_file.

Signed-off-by: Tim Pepper <[EMAIL PROTECTED]>
Cc: Peter Zijlstra <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>

---

Assuming people are fine with this, it should probably find its way
to stable.

If you haven't seen the infinite output, it's easy to trigger: for me a
simple 'cat /proc/lockdep' generally does it, as does a 'cat /proc/lock_stat'
piped to a file.  For either of them, a dd with the default bs=512 (or
smaller) should also do the job.

With this change to the lock_stat handler the data->iter member no longer
attempts to hold state across calls, so it could be taken out of the
lock_stat_seq struct and replaced by a local variable in each function
but that isn't a clear win to me so I just left it.
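
The idea of deriving the iterator from *pos alone, rather than from state
left behind by an earlier call, can be sketched in userspace as follows.
struct demo_seq and demo_start() are simplified, hypothetical stand-ins and
not the lock_stat code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the iterator state; not the kernel struct. */
struct demo_seq {
	const int *stats;	/* first element */
	size_t nr;		/* number of elements */
};

/* Recompute the element purely from pos; nothing is remembered between calls. */
static const int *demo_start(const struct demo_seq *d, long pos)
{
	assert(d != NULL);
	if (pos < 0 || (size_t)pos >= d->nr)
		return NULL;
	return d->stats + pos;
}
```

Calling it twice with the same position yields the same element, so a
restart in the middle of the file simply resumes where the reader left off.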

--- linux-2.6.23-rc9.orig/kernel/lockdep_proc.c
+++ linux-2.6.23-rc9/kernel/lockdep_proc.c
@@ -34,19 +34,23 @@ static void *l_next(struct seq_file *m, 
  lock_entry);
else
class = NULL;
-   m->private = class;
 
return class;
 }
 
 static void *l_start(struct seq_file *m, loff_t *pos)
 {
-   struct lock_class *class = m->private;
+   struct lock_class *class;
+   loff_t i = 0;
 
-   if (&class->lock_entry == all_lock_classes.next)
+   if (*pos == 0)
seq_printf(m, "all lock classes:\n");
 
+   list_for_each_entry(class, &all_lock_classes, lock_entry) {
+   if (i++ == *pos)
+   return class;
+   }
+   return NULL;
-   return class;
 }
 
 static void l_stop(struct seq_file *m, void *v)
@@ -101,7 +105,7 @@ static void print_name(struct seq_file *
 static int l_show(struct seq_file *m, void *v)
 {
unsigned long nr_forward_deps, nr_backward_deps;
-   struct lock_class *class = m->private;
+   struct lock_class *class = v;
struct lock_list *entry;
char c1, c2, c3, c4;
 
@@ -523,12 +527,15 @@ static void *ls_start(struct seq_file *m
 {
struct lock_stat_seq *data = m->private;
 
-   if (data->iter == data->stats)
-   seq_header(m);
+   data->iter = data->stats;
+   data->iter += *pos;
 
-   if (data->iter == data->iter_end)
+   if (data->iter >= data->iter_end)
data->iter = NULL;
 
+   if (data->iter == data->stats)
+   seq_header(m);
+
return data->iter;
 }
 


Re: [PATCH 6/6] Use one zonelist that is filtered by nodemask

2007-10-08 Thread Nishanth Aravamudan
On 28.09.2007 [15:25:27 +0100], Mel Gorman wrote:
> 
> Two zonelists exist so that GFP_THISNODE allocations will be guaranteed
> to use memory only from a node local to the CPU. As we can now filter the
> zonelist based on a nodemask, we filter the standard node zonelist for zones
> on the local node when GFP_THISNODE is specified.
> 
> When GFP_THISNODE is used, a temporary nodemask is created with only the
> node local to the CPU set. This allows us to eliminate the second zonelist.
> 
> Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
> Acked-by: Christoph Lameter <[EMAIL PROTECTED]>



> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff 
> linux-2.6.23-rc8-mm2-030_filter_nodemask/include/linux/gfp.h 
> linux-2.6.23-rc8-mm2-040_use_one_zonelist/include/linux/gfp.h
> --- linux-2.6.23-rc8-mm2-030_filter_nodemask/include/linux/gfp.h  
> 2007-09-28 15:49:57.0 +0100
> +++ linux-2.6.23-rc8-mm2-040_use_one_zonelist/include/linux/gfp.h 
> 2007-09-28 15:55:03.0 +0100

[Reordering the chunks to make my comments a little more logical]



> -static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
> +static inline struct zonelist *node_zonelist(int nid)
>  {
> - return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
> + return &NODE_DATA(nid)->node_zonelist;
>  }
> 
>  #ifndef HAVE_ARCH_FREE_PAGE
> @@ -198,7 +186,7 @@ static inline struct page *alloc_pages_n
>   if (nid < 0)
>   nid = numa_node_id();
> 
> - return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> + return __alloc_pages(gfp_mask, order, node_zonelist(nid));
>  }

This is alloc_pages_node(), and converting the nid to a zonelist means
that lower levels (specifically __alloc_pages() here) are not aware of
nids, as far as I can tell. This isn't a change, I just want to make
sure I understand...



>  struct page * fastcall
>  __alloc_pages(gfp_t gfp_mask, unsigned int order,
>   struct zonelist *zonelist)
>  {
> + /*
> +  * Use a temporary nodemask for __GFP_THISNODE allocations. If the
> +  * cost of allocating on the stack or the stack usage becomes
> +  * noticable, allocate the nodemasks per node at boot or compile time
> +  */
> + if (unlikely(gfp_mask & __GFP_THISNODE)) {
> + nodemask_t nodemask;
> +
> + return __alloc_pages_internal(gfp_mask, order,
> + zonelist, nodemask_thisnode(&nodemask));
> + }
> +
>   return __alloc_pages_internal(gfp_mask, order, zonelist, NULL);
>  }



So alloc_pages_node() calls here and for THISNODE allocations, we go ask
nodemask_thisnode() for a nodemask...

> +static nodemask_t *nodemask_thisnode(nodemask_t *nodemask)
> +{
> + /* Build a nodemask for just this node */
> + int nid = numa_node_id();
> +
> + nodes_clear(*nodemask);
> + node_set(nid, *nodemask);
> +
> + return nodemask;
> +}



And nodemask_thisnode() always gives us a nodemask with only the node
the current process is running on set, I think?

That seems really wrong -- and would explain what Lee was seeing while
using my patches for the hugetlb pool allocator to use THISNODE
allocations. All the allocations would end up coming from whatever node
the process happened to be running on. This obviously messes up hugetlb
accounting, as I rely on THISNODE requests returning NULL if they go
off-node.

I'm not sure how this would be fixed, as __alloc_pages() no longer has
the nid to set in the mask.

Am I wrong in my analysis?

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center


Re: sleepy linux 2.6.23-rc9

2007-10-08 Thread H. Peter Anvin

Clemens Koller wrote:
>> When I boot init=/bin/bash vga=791 (vesa framebuffer), most wakeups
>> are caused by cursor painting (I should fix that some day, I
>> guess). But... the cursor blinking does not even work properly!
>>
>> It blinks at normal speed, then (randomly) it blinks slowly, then gets
>> back to normal speed, then inserts longer delay.
>
> Is the effect a beat that has roughly the frequency of your notebook's
> screen refresh rate (60Hz)? (in German: Schwebung)
>
>> The effect is so nice that I thought about youtube ;-). Thinkpad
>> x60.. question is, how to debug it?
>
> No idea... check where the register of the HW cursor blink rate
> gets written? But as it seems to be so nice, please submit a patch
> which enables this for all platforms. ;-)



For the VESA framebuffer I would assume the cursor blinking is done in 
software (if done at all.)


-hpa


[PATCH 2/2] aic94xx: Use sas_request_addr() to provide SAS addr if the adapter lacks one

2007-10-08 Thread Darrick J. Wong
If the aic94xx chip doesn't have a SAS address in the chip's flash memory,
make libsas get one for us.  Also clean out some old code that had been
used to do this in the past.

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 drivers/scsi/aic94xx/aic94xx.h  |   16 
 drivers/scsi/aic94xx/aic94xx_hwi.c  |   21 ++---
 drivers/scsi/aic94xx/aic94xx_init.c |2 --
 3 files changed, 10 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx.h b/drivers/scsi/aic94xx/aic94xx.h
index 32f513b..aee235f 100644
--- a/drivers/scsi/aic94xx/aic94xx.h
+++ b/drivers/scsi/aic94xx/aic94xx.h
@@ -58,7 +58,6 @@
 
 extern struct kmem_cache *asd_dma_token_cache;
 extern struct kmem_cache *asd_ascb_cache;
-extern char sas_addr_str[2*SAS_ADDR_SIZE + 1];
 
 static inline void asd_stringify_sas_addr(char *p, const u8 *sas_addr)
 {
@@ -68,21 +67,6 @@ static inline void asd_stringify_sas_addr(char *p, const u8 
*sas_addr)
*p = '\0';
 }
 
-static inline void asd_destringify_sas_addr(u8 *sas_addr, const char *p)
-{
-   int i;
-   for (i = 0; i < SAS_ADDR_SIZE; i++) {
-   u8 h, l;
-   if (!*p)
-   break;
-   h = isdigit(*p) ? *p-'0' : *p-'A'+10;
-   p++;
-   l = isdigit(*p) ? *p-'0' : *p-'A'+10;
-   p++;
-   sas_addr[i] = (h<<4) | l;
-   }
-}
-
 struct asd_ha_struct;
 struct asd_ascb;
 
diff --git a/drivers/scsi/aic94xx/aic94xx_hwi.c 
b/drivers/scsi/aic94xx/aic94xx_hwi.c
index 0cd7eed..1dc5400 100644
--- a/drivers/scsi/aic94xx/aic94xx_hwi.c
+++ b/drivers/scsi/aic94xx/aic94xx_hwi.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "aic94xx.h"
 #include "aic94xx_reg.h"
@@ -38,16 +39,14 @@ u32 MBAR0_SWB_SIZE;
 
 /* -- Initialization -- */
 
-static void asd_get_user_sas_addr(struct asd_ha_struct *asd_ha)
+static int asd_get_user_sas_addr(struct asd_ha_struct *asd_ha)
 {
-   extern char sas_addr_str[];
-   /* If the user has specified a WWN it overrides other settings
-*/
-   if (sas_addr_str[0] != '\0')
-   asd_destringify_sas_addr(asd_ha->hw_prof.sas_addr,
-sas_addr_str);
-   else if (asd_ha->hw_prof.sas_addr[0] != 0)
-   asd_stringify_sas_addr(sas_addr_str, asd_ha->hw_prof.sas_addr);
+   /* adapter came with a sas address */
+   if (asd_ha->hw_prof.sas_addr[0])
+   return 0;
+
+   return sas_request_addr(asd_ha->sas_ha.core.shost,
+   asd_ha->hw_prof.sas_addr);
 }
 
 static void asd_propagate_sas_addr(struct asd_ha_struct *asd_ha)
@@ -657,8 +657,7 @@ int asd_init_hw(struct asd_ha_struct *asd_ha)
 
asd_init_ctxmem(asd_ha);
 
-   asd_get_user_sas_addr(asd_ha);
-   if (!asd_ha->hw_prof.sas_addr[0]) {
+   if (asd_get_user_sas_addr(asd_ha)) {
asd_printk("No SAS Address provided for %s\n",
   pci_name(asd_ha->pcidev));
err = -ENODEV;
diff --git a/drivers/scsi/aic94xx/aic94xx_init.c 
b/drivers/scsi/aic94xx/aic94xx_init.c
index b70d6e7..5c99f27 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -54,8 +54,6 @@ MODULE_PARM_DESC(collector, "\n"
"\tThe aic94xx SAS LLDD supports both modes.\n"
"\tDefault: 0 (Direct Mode).\n");
 
-char sas_addr_str[2*SAS_ADDR_SIZE + 1] = "";
-
 static struct scsi_transport_template *aic94xx_transport_template;
 static int asd_scan_finished(struct Scsi_Host *, unsigned long);
 static void asd_scan_start(struct Scsi_Host *);


[PATCH 1/2] libsas: Provide a transport-level facility to request SAS addrs

2007-10-08 Thread Darrick J. Wong
Use the request_firmware() interface to get a SAS address from userspace.
This way, there's no debate as to who or how an address gets generated;
it's up to the administrator to provide one if the driver can't find one
on its own.

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 drivers/scsi/libsas/sas_scsi_host.c |   41 +++
 include/scsi/libsas.h   |3 +++
 2 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 7663841..0fa0296 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -24,6 +24,8 @@
  */
 
 #include 
+#include 
+#include 
 
 #include "sas_internal.h"
 
@@ -1047,6 +1049,45 @@ void sas_target_destroy(struct scsi_target *starget)
return;
 }
 
+static void sas_parse_addr(u8 *sas_addr, const char *p)
+{
+   int i;
+   for (i = 0; i < SAS_ADDR_SIZE; i++) {
+   u8 h, l;
+   if (!*p)
+   break;
+   h = isdigit(*p) ? *p-'0' : toupper(*p)-'A'+10;
+   p++;
+   l = isdigit(*p) ? *p-'0' : toupper(*p)-'A'+10;
+   p++;
+   sas_addr[i] = (h<<4) | l;
+   }
+}
+
+#define SAS_STRING_ADDR_SIZE   16
+
+int sas_request_addr(struct Scsi_Host *shost, u8 *addr)
+{
+   int res;
+   const struct firmware *fw;
+
+   res = request_firmware(&fw, "sas_addr", &shost->shost_gendev);
+   if (res)
+   return res;
+
+   if (fw->size < SAS_STRING_ADDR_SIZE) {
+   res = -ENODEV;
+   goto out;
+   }
+
+   sas_parse_addr(addr, fw->data);
+
+out:
+   release_firmware(fw);
+   return res;
+}
+EXPORT_SYMBOL_GPL(sas_request_addr);
+
 EXPORT_SYMBOL_GPL(sas_queuecommand);
 EXPORT_SYMBOL_GPL(sas_target_alloc);
 EXPORT_SYMBOL_GPL(sas_slave_configure);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 8dda2d6..58aa2aa 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -676,4 +676,7 @@ extern int sas_ioctl(struct scsi_device *sdev, int cmd, 
void __user *arg);
 
 extern int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
   struct request *req);
+
+int sas_request_addr(struct Scsi_Host *shost, u8 *addr);
+
 #endif /* _SASLIB_H_ */
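
For comparison, a stricter userspace variant of the hex parsing step might
look like the sketch below. It is not part of the patch: ADDR_BYTES is
assumed to match SAS_ADDR_SIZE, and hex_val() and parse_addr() are
hypothetical names. Unlike sas_parse_addr() above, it rejects short input
and characters that are not hex digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_BYTES 8	/* assumed to match SAS_ADDR_SIZE */

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_val(char c)
{
	unsigned char u = (unsigned char)c;

	if (isdigit(u))
		return u - '0';
	if (isxdigit(u))
		return toupper(u) - 'A' + 10;
	return -1;
}

/* Parse 2 * ADDR_BYTES hex digits; returns 0 on success, -1 on bad input. */
static int parse_addr(uint8_t addr[ADDR_BYTES], const char *p, size_t len)
{
	assert(addr != NULL && p != NULL);
	if (len < 2 * ADDR_BYTES)
		return -1;
	for (size_t i = 0; i < ADDR_BYTES; i++) {
		int h = hex_val(p[2 * i]);
		int l = hex_val(p[2 * i + 1]);

		if (h < 0 || l < 0)
			return -1;
		addr[i] = (uint8_t)((h << 4) | l);
	}
	return 0;
}
```

Whether malformed input should be rejected outright or handled some other
way is a policy choice; the sketch only shows that the check is cheap.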


Re: lockdep: how to tell it multiple pte locks is OK?

2007-10-08 Thread Jeremy Fitzhardinge
Arjan van de Ven wrote:
> s/implemented/merged/ :)
>
> In fact shared pagetables are already there for hugepages.
> For small pages it's a patch at this point.
>   

Is it kept up to date?  Where does it live?

> no I'm not saying that. I'm just saying that I'm worried about the
> locking robustness of your trick in general.
>   

Hm, well I won't need to re-pin shared ptes anyway, so I think it's moot.

J


Re: [PATCH] mm: set_page_dirty_balance() vs ->page_mkwrite()

2007-10-08 Thread Nick Piggin
On Tuesday 09 October 2007 09:36, David Chinner wrote:
> On Mon, Oct 08, 2007 at 04:37:00PM +1000, Nick Piggin wrote:
> > On Tuesday 09 October 2007 02:54, Peter Zijlstra wrote:

> > > Force a balance call if ->page_mkwrite() was successful.
> >
> > Would it be better to just have the callers set_page_dirty_balance()?
>
> block_page_mkwrite() is just using generic interfaces to do this,
> same as pretty much any write() system call. The idea was to make it
> as similar to the write() call path as possible...
>
> However, unlike generic_file_buffered_write(), we are not calling
> balance_dirty_pages_ratelimited(mapping) between
> ->prepare/commit_write call pairs.  Perhaps this should be added to
> block_page_mkwrite() after the page is unlocked

That sounds pretty sane, in terms of matching with
generic_file_buffered_write.


Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Nick Piggin
On Tuesday 09 October 2007 03:51, Andrew Morton wrote:
> On Mon, 8 Oct 2007 10:28:43 -0700

> > I'll now add remap_file_pages soon.
> > Maybe those other 2 tests aren't strong enough (?).
> > Or maybe they don't return a non-0 exit status even when they fail...
> > (I'll check.)
>
> Perhaps Yan Zheng can tell us what test was used to demonstrate this?

Was probably found by review. Otherwise, you could probably reproduce
it by mmaping, say, drm device node, running remap_file_pages() on it
to create a nonlinear mapping, and then finding that you get the wrong
data.

> > > I'm surprised that LTP doesn't have any remap_file_pages() tests.
> >
> > quick grep didn't find any for me.
>
> Me either.  There are a few lying around the place which could be
> integrated.
>
> It would be good if LTP were to have some remap_file_pages() tests
> (please).  As we see here, it is something which we can easily break, and
> leave broken for some time.

Here is Ingo's old test, since cleaned up and fixed a bit by me.
I'm sure he would distribute it under the GPL, but I've cc'ed him because
I didn't find an explicit statement about that.

/*
 * Copyright (C) Ingo Molnar, 2002
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096
#define PAGE_WORDS (PAGE_SIZE/sizeof(int))

#define CACHE_PAGES 1024
#define CACHE_SIZE (CACHE_PAGES*PAGE_SIZE)

#define WINDOW_PAGES 16
#define WINDOW_SIZE (WINDOW_PAGES*PAGE_SIZE)

#define WINDOW_START 0x48000000

static char cache_contents [CACHE_SIZE];

static void test_nonlinear(int fd)
{
	char *data = NULL;
	int i, j, repeat = 2;

	for (i = 0; i < CACHE_PAGES; i++) {
		int *page = (int *) (cache_contents + i*PAGE_SIZE);

		for (j = 0; j < PAGE_WORDS; j++)
			page[j] = i;
	}

	if (write(fd, cache_contents, CACHE_SIZE) != CACHE_SIZE)
		perror("write"), exit(1);

	data = mmap((void *)WINDOW_START,
			WINDOW_SIZE,
			PROT_READ|PROT_WRITE, 
			MAP_FIXED | MAP_SHARED 
			, fd, 0);

	if (data == MAP_FAILED)
		perror("mmap"), exit(1);

again:
	for (i = 0; i < WINDOW_PAGES; i += 2) {
		char *page = data + i*PAGE_SIZE;

		if (remap_file_pages(page, PAGE_SIZE * 2, 0,
				(WINDOW_PAGES-i-2), 0) == -1)
			perror("remap_file_pages"), exit(1);
	}

	for (i = 0; i < WINDOW_PAGES; i++) {
		/*
		 * Double-check the correctness of the mapping:
		 */
		if (i & 1) {
			if (data[i*PAGE_SIZE] != WINDOW_PAGES-i) {
				printf("hm, mapped incorrect data!\n");
				exit(1);
			}
		} else {
			if (data[i*PAGE_SIZE] != WINDOW_PAGES-i-2) {
				printf("hm, mapped incorrect data!\n");
				exit(1);
			}
		}
	}

	if (--repeat)
		goto again;
}

int main(int argc, char **argv)
{
	int fd;

	fd = open("/dev/shm/cache", O_RDWR|O_CREAT|O_TRUNC,S_IRWXU);
	if (fd < 0)
		perror("open"), exit(1);
	test_nonlinear(fd);
	if (close(fd) == -1)
		perror("close"), exit(1);
	printf("nonlinear shm file OK\n");

	fd = open("/tmp/cache", O_RDWR|O_CREAT|O_TRUNC,S_IRWXU);
	if (fd < 0)
		perror("open"), exit(1);
	test_nonlinear(fd);
	if (close(fd) == -1)
		perror("close"), exit(1);
	printf("nonlinear /tmp/ file OK\n");

	exit(0);
}



Re: [PATCH] aic94xx: Use request_firmware() to provide SAS address if the adapter lacks one

2007-10-08 Thread Andrew Vasquez
On Mon, 08 Oct 2007, Darrick J. Wong wrote:

> On Mon, Oct 08, 2007 at 03:48:32PM -0700, Andrew Vasquez wrote:
> 
> > So how about factoring that out to a transport-level interface.  How
> > about something along the lines of the following patch, whereby the
> > software driver upon detecting no valid WWPN, makes an upcall to each
> > interface's 'request_wwn()'.  The data passed in from shost_gendev
> > > should be enough for some helper script to cull relevant device bits
> > and perhaps offer some level of persistence...  Off base?
> 
> Hrm... jejb made a remark that it might be better to pass the
> scsi_host's device into request_firmware() as your example does, so I'll
> pitch in a patch to do likewise with libsas--the scsi_host knows the
> actual device it's coming from, and userland can sort that all out later
> anyway via DEVPATH.
> 
> I suppose one could also have multiple scsi_hosts per PCI device, which
> means that my first patch would stumble horribly in more than a few
> cases.

This is done already in the FC case -- NPIV.  Though with that
interface, the administrator is already responsible for assigning
proper WWNN/WWPN during creation.

> > Darrick, forgive the FC example, I don't do SAS...
> 
> That's ok, I don't do FC. :)  Looks mostly good to me...

--
av


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Neil Brown
On Monday October 8, [EMAIL PROTECTED] wrote:

I find it is always good to know *why* we have the tags.  That
information is a useful complement to what they mean, and can guide
people in adding them.

So below I present some "Purposes", YetAnotherTag, and a comment on
the RSO.

(And I'd like to add a vote for "Blame-Shared-By:" rather than
"Reviewed-by:", however I don't think I'll get much support...)

> diff --git a/Documentation/patch-tags b/Documentation/patch-tags
> new file mode 100644
> index 000..fb5f8e1
> --- /dev/null
> +++ b/Documentation/patch-tags
> @@ -0,0 +1,66 @@
> +Patches headed for the mainline may contain a variety of tags documenting
> +who played a hand in (or was at least aware of) its progress.  All of these
> +tags have the form:
> +
> + Something-done-by: Full name <[EMAIL PROTECTED]>
> +
> +These tags are:

   From:The Author, Primary Author, or Authors of the patch.
Authors should also provide a Signed-off-by: tag.

Purpose: to give credit to authors
> +
> +Signed-off-by:  A person adding a Signed-off-by tag is attesting that the
> + patch is, to the best of his or her knowledge, legally able
> + to be merged into the mainline and distributed under the
> + terms of the GNU General Public License, version 2.  See
> + the Developer's Certificate of Origin, found in
> + Documentation/SubmittingPatches, for the precise meaning of
> + Signed-off-by.

Purpose: to allow subsequent review of the originality of 
the contribution should copyright questions arise.
> +
> +Acked-by:The person named (who should be an active developer in the
> + area addressed by the patch) is aware of the patch and has
> + no objection to its inclusion.  An Acked-by tag does not
> + imply any involvement in the development of the patch or
> + that a detailed review was done.

Purpose:  to inform upstream aggregators that
consensus was achieved for the change.  This is
particularly relevant for changes that affect multiple
Maintenance Domains.

> +
> +Reviewed-by: The patch has been reviewed and found acceptable according
> + to the Reviewer's Statement as found at the bottom of this
> + file.  A Reviewed-by tag is a statement of opinion that the
> + patch is an appropriate modification of the kernel without
> + any remaining serious technical issues.  Any interested
> + reviewer (who has done the work) can offer a Reviewed-by
> + tag for a patch.

Purpose: to inform upstream aggregators that due
diligence has been performed to ensure correctness of
the change.  Also to give credit to reviewers.

> +
> +Cc:  The person named was given the opportunity to comment on
> + the patch.  This is the only tag which might be added
> + without an explicit action by the person it names.

Purpose: to ensure that interested parties are
included in subsequent discussions of the change.

> +
> +Tested-by:   The patch has been successfully tested (in some
> + environment) by the person named.

Purpose: to give credit to testers.

> +
> +
> +
> +
> +Reviewer's statement of oversight, v0.02
> +
> +By offering my Reviewed-by: tag, I state that:
> +
> + (a) I have carried out a technical review of this patch to evaluate its
> + appropriateness and readiness for inclusion into the mainline kernel. 
> +
> + (b) Any problems, concerns, or questions relating to the patch have been
> + communicated back to the submitter.  I am satisfied with how the
> + submitter has responded to my comments.

This seems more detailed than necessary.  The process (communicated
back / responded) is not really relevant.  I would go for something
like:

(b) I have no outstanding problems, concerns, or questions about
this patch (except as noted in the above comments).

and in fact, given (c2), (b) might not be needed at all.

NeilBrown


> +
> + (c) While there may (or may not) be things which could be improved with
> + this submission, I believe that it is, at this time, (1) a worthwhile
> + modification to the kernel, and (2) free of known issues which would
> + argue against its inclusion.
> +
> + (d) While I have reviewed the patch and believe it to be sound, I can not
> + (unless explicitly stated elsewhere) make any warranties or guarantees
> + that it will achieve its stated purpose or function properly in any
> + given situation.
> +
> + (e) I understand and agree that this project and the contribution are
> + public and that a record of the contribution (including my Reviewed-by
> + tag and any associated public communica

Re: [ofa-general] Updated InfiniBand/RDMA merge plans for 2.6.24

2007-10-08 Thread Roland Dreier
 > No mention about the iwarp port space issue?

I don't think we're at a stage where I'm prepared to merge something--
we all agree the latest patch has serious drawbacks, and it commits us
to a suboptimal interface that is userspace-visible.

 > I'm at a loss as to how to proceed.

Could we try to do some cleanups to the net core to make the alias
stuff less painful?  eg is there any sane way to make it possible for
a device that creates 'eth0' to also create an 'iw0' alias without
assigning an address?

 - R.


Re: OHCI root_port_reset() deadly loop...

2007-10-08 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Sun, 07 Oct 2007 00:51:56 -0700 (PDT)

> From: David Brownell <[EMAIL PROTECTED]>
> Date: Sun, 07 Oct 2007 00:31:41 -0700
> 
> > Are the other ports still behaving?  Is EHCI maybe trying to switch
> > ownership of that port?  Is maybe the (newish) autosuspend stuff
> > kicking in?
> 
> I wouldn't know, the machine hangs and doesn't get any further.

To add some more information here, I think the EHCI idea might
hold some water.

What I have here are two NEC OHCI USB interfaces and one NEC EHCI
USB interface on PCI.  Apparently they all go through a shared
USB hub, mapped like this:

HUB Port 1: OHCI #1, EHCI
HUB Port 2: OHCI #2, EHCI
HUB Port 3: OHCI #1, EHCI
HUB Port 4: OHCI #2, EHCI
HUB Port 5: OHCI #1, EHCI

The OHCI ports go out to external USB connectors on the back panel of
the machine, whereas the EHCI is connected up to an internal USB
storage CDROM device and what appears to be another USB hub.

The problem seems to be very strongly tied to timing.  For example
simply adding "ignore_loglevel" to the kernel boot command line can
make the problem go away.

This got me thinking about your EHCI comment.

If these controllers are going through the same HUB, things might go
south if OHCI initialized first, then khubd et al. are asynchronously
accessing the segments behind OHCI at the same time that the EHCI
driver is initializing.  Perhaps, this is the kind of sequence of
events which makes one of the root ports reset in such a way that the
reset bit never clears.

Given that this machine has 64 cpus, the likelihood of such parallel
accesses is very high :-)

Does this make any sense?

Regardless, here is a patch that hardens the OHCI reset handling
loops so that they break out instead of hanging the entire system
should this condition occur.  It's at least better than what the
code does to a user right now which is hang the box completely:

[USB] ohci: Do not hang the system if port reset does not complete.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/drivers/usb/host/ohci-hub.c b/drivers/usb/host/ohci-hub.c
index bb9cc59..77ae5b4 100644
--- a/drivers/usb/host/ohci-hub.c
+++ b/drivers/usb/host/ohci-hub.c
@@ -563,14 +563,19 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
u32 temp;
u16 now = ohci_readl(ohci, &ohci->regs->fmnumber);
u16 reset_done = now + PORT_RESET_MSEC;
+   int limit_1;
 
/* build a "continuous enough" reset signal, with up to
 * 3msec gap between pulses.  scheduler HZ==100 must work;
 * this might need to be deadline-scheduled.
 */
-   do {
+   limit_1 = 100;
+   while (--limit_1 >= 0) {
+   int limit_2;
+
/* spin until any current reset finishes */
-   for (;;) {
+   limit_2 = PORT_RESET_MSEC * 2;
+   while (--limit_2 >= 0) {
temp = ohci_readl (ohci, portstat);
/* handle e.g. CardBus eject */
if (temp == ~(u32)0)
@@ -579,6 +584,10 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
break;
udelay (500);
}
+   if (limit_2 < 0) {
+   ohci_warn(ohci, "Root port inner-loop reset timeout, "
+ "portstat[%08x]\n", temp);
+   }
 
if (!(temp & RH_PS_CCS))
break;
@@ -589,7 +598,14 @@ static inline int root_port_reset (struct ohci_hcd *ohci, unsigned port)
ohci_writel (ohci, RH_PS_PRS, portstat);
msleep(PORT_RESET_HW_MSEC);
now = ohci_readl(ohci, &ohci->regs->fmnumber);
-   } while (tick_before(now, reset_done));
+   if (!tick_before(now, reset_done))
+   break;
+   }
+   if (limit_1 < 0) {
+   ohci_warn(ohci, "Root port outer-loop reset timeout, "
+ "now[%04x] reset_done[%04x]\n",
+ now, reset_done);
+   }
/* caller synchronizes using PRSC */
 
return 0;
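
The bounded-loop idiom used in the patch above can be tried outside the kernel. What follows is only a minimal user-space sketch: `fake_port`, `fake_read()`, `wait_not_busy()` and the `PORT_BUSY` bit are hypothetical stand-ins, not OHCI definitions. It shows how `while (--limit >= 0)` leaves `limit` negative only when the budget ran out, so a single check after the loop distinguishes success from timeout.

```c
#include <stdbool.h>
#include <stdint.h>

#define PORT_BUSY 0x10u  /* illustrative status bit, not a real register layout */

/* Hypothetical stand-in for a memory-mapped port status register. */
struct fake_port {
	uint32_t status;   /* current register value */
	int clears_after;  /* reads left before the busy bit clears; negative = never */
};

/* Simulated register read: the busy bit drops after a fixed number of reads. */
static uint32_t fake_read(struct fake_port *p)
{
	if (p->clears_after == 0)
		p->status &= ~PORT_BUSY;
	else if (p->clears_after > 0)
		p->clears_after--;
	return p->status;
}

/*
 * Poll until PORT_BUSY clears or the retry budget is used up.
 * Returns true on success and false on timeout, instead of spinning forever.
 */
static bool wait_not_busy(struct fake_port *p, int limit)
{
	while (--limit >= 0) {
		if (!(fake_read(p) & PORT_BUSY))
			break;
		/* a real driver would delay between reads here */
	}
	/* limit is negative only if the loop ran out of attempts */
	return limit >= 0;
}
```

A caller would log a warning and carry on when `wait_not_busy()` returns false, which is the same trade-off the patch makes: a noisy failure rather than a hung machine.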


Re: [PATCH 1/2] Colored kernel output (run3)

2007-10-08 Thread Antonino A. Daplas
On Tue, 2007-10-09 at 01:31 +0200, Jan Engelhardt wrote:
> On Oct 9 2007 07:12, Antonino A. Daplas wrote:
> >> 
> >> References: http://lkml.org/lkml/2007/4/1/162
> >>http://lkml.org/lkml/2007/10/5/199
> >
> >This is quite a long thread :-)
> 
> It was a patch series after all. But as Greg puts it, be persistent.
> 
> >> +config VT_PRINTK_COLOR
> >> +  hex "Colored kernel message output"
> >> +  range 0x00 0xFF
> >> +  depends on VT_CKO
> >> +  default 0x07
> >> +  ---help---
> >> +  This option defines with which color kernel messages will be
> >> +  printed to the console.
> >> +
> >> +  The value you need to enter here is the value is composed
> >
> >The more correct term for "The value" is probably "The attribute".
> 
> "The value for this kconfig entry" it should read in the minds.
> 
> >> +  (Foreground colors 0x08 to 0x0F do not work when a VGA
> >> +  console font with 512 glyphs is used.)
> >
> >You might have to include a warning that those values or attributes are 
> >interpreted differently depending on the driver used, and the above is
> >mostly true for 16-color console drivers only.
> 
> Are there any other drivers besides vgacon and fbcon that use vt.c?

All drivers under drivers/video/console. That would be:

vgacon
dummycon
fbcon
newport_con
sticon
promcon
mdacon

There are perhaps a few more drivers outside this directory, such as
sisusbcon or something.



> >You may want to leave out the blink attribute (0x80) from this part.
> >Otherwise setterm -blink on|off will produce the opposite effect. 
> 
> But 0x80 might be interpreted in a different fashion for some othercon, 
> yielding for example superbold rather than blinking.

That's right. But setting the blink attribute is done with an XOR (^).
So 'setterm -blink on' will unset the blink attribute (0x80 ^ 0x80).

> I'll have to try this, because usually, setterm operates on TTYs
> rather than VCs.

Yes, but if the tty driver type is a virtual console, then vt.c is still
affected. 

Well the blink attribute is ignored by most drivers, if I'm not
mistaken. So you generally won't see the effect :-). But with fbcon, the
blink attribute is interpreted as "change background color from black to
light gray".
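
The XOR effect described above can be modelled in a few lines. This is a simplified sketch, not the vt.c code: `apply_blink()` and `ATTR_BLINK` are hypothetical names, and the only assumption taken from the thread is that blink is applied by XOR-ing 0x80 into the base color.

```c
#include <stdint.h>

#define ATTR_BLINK 0x80u  /* blink bit in the attribute byte, as discussed in the thread */

/*
 * Simplified model: requesting blink XORs the bit into the base color.
 * If the base color already carries 0x80, "blink on" clears it instead.
 */
static uint8_t apply_blink(uint8_t base_color, int blink_on)
{
	uint8_t attr = base_color;

	if (blink_on)
		attr ^= ATTR_BLINK;
	return attr;
}
```

With a base color of 0x07 the request sets the bit (0x87); with a base color of 0x87 the same request yields 0x07, which is the inverted behaviour being warned about.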

Tony



Re: [PATCH] aic94xx: Use request_firmware() to provide SAS address if the adapter lacks one

2007-10-08 Thread Darrick J. Wong
On Mon, Oct 08, 2007 at 03:48:32PM -0700, Andrew Vasquez wrote:

> So how about factoring that out to a transport-level interface.  How
> about something along the lines of the following patch, whereby the
> software driver upon detecting no valid WWPN, makes an upcall to each
> interface's 'request_wwn()'.  The data passed in from shost_gendev
> should be enough for some helper script to cull relevant device bits
> and perhaps offer some level of persistence...  Off base?

Hrm... jejb made a remark that it might be better to pass the
scsi_host's device into request_firmware() as your example does, so I'll
pitch in a patch to do likewise with libsas--the scsi_host knows the
actual device it's coming from, and userland can sort that all out later
anyway via DEVPATH.

I suppose one could also have multiple scsi_hosts per PCI device, which
means that my first patch would stumble horribly in more than a few
cases.

> Darrick, forgive the FC example, I don't do SAS...

That's ok, I don't do FC. :)  Looks mostly good to me...

--D


Re: sleepy linux 2.6.23-rc9

2007-10-08 Thread Clemens Koller

Pavel Machek wrote:
> I played with powertop a bit, and found a fairly interesting failure
> mode. If I boot init=/bin/bash vga=1, I get ~2 wakeups a second, nice.
>
> When I boot init=/bin/bash vga=791 (vesa framebuffer), most wakeups
> are caused by cursor painting (I should fix that some day, I
> guess). But... the cursor blinking does not even work properly!
>
> It blinks at normal speed, then (randomly) it blinks slowly, then gets
> back to normal speed, then inserts longer delay.

Is the effect a beat with roughly the frequency of your notebook's
screen refresh rate (60 Hz)? (In German: Schwebung.)

> The effect is so nice that I thought about youtube ;-). Thinkpad
> x60.. question is, how to debug it?

No idea... check where the register of the HW cursor blink rate
gets written? But as it seems to be so nice, please submit a patch
which enables this for all platforms. ;-)

Regards,
--
Clemens Koller
___
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm-technology.com
Phone: +49-89-741518-50
Fax: +49-89-741518-19


Re: parallel networking

2007-10-08 Thread jamal
On Mon, 2007-08-10 at 15:33 -0700, David Miller wrote:

> Multiply whatever effect you think you might be able to measure due to
> that on your 2 or 4 way system, and multiply it up to 64 cpus or so
> for machines I am using.  This is where machines are going, and is
> going to become the norm.

Yes, i keep forgetting that ;-> I need to train my brain to remember
that.

cheers,
jamal





Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Stefan Richter
Jonathan Corbet wrote:
> All of these
> +tags have the form:
> +
> + Something-done-by: Full name <[EMAIL PROTECTED]>

To be precise:
Something-done-by: Full name <[EMAIL PROTECTED]> [optional random stuff]

"Some people also put extra tags at the end.  They'll just be ignored
for now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.", says
SubmittingPatches.  I actually do so on occasions.
-- 
Stefan Richter
-=-=-=== =-=- -=--=
http://arcgraph.de/sr/


Re: [PATCH] mm: set_page_dirty_balance() vs ->page_mkwrite()

2007-10-08 Thread David Chinner
On Mon, Oct 08, 2007 at 04:37:00PM +1000, Nick Piggin wrote:
> On Tuesday 09 October 2007 02:54, Peter Zijlstra wrote:
> > It seems that with the recent usage of ->page_mkwrite() a little detail
> > was overlooked.
> >
> > .22-rc1 merged OCFS2 usage of this hook
> > .23-rc1 merged XFS usage
> > .24-rc1 will most likely merge NFS usage
> >
> > Please consider this for .23 final and maybe even .22.x
> >
> > ---
> > Subject: mm: set_page_dirty_balance() vs ->page_mkwrite()
> >
> > All the current page_mkwrite() implementations also set the page dirty.
> > Which results in the set_page_dirty_balance() call to _not_ call balance,
> > because the page is already found dirty.
> >
> > This allows us to dirty a _lot_ of pages without ever hitting
> > balance_dirty_pages().  Not good (tm).
> >
> > Force a balance call if ->page_mkwrite() was successful.
> 
> Would it be better to just have the callers set_page_dirty_balance()?

block_page_mkwrite() is just using generic interfaces to do this,
same as pretty much any write() system call. The idea was to make it
as similar to the write() call path as possible...

However, unlike generic_file_buffered_write(), we are not calling
balance_dirty_pages_ratelimited(mapping) between
->prepare/commit_write call pairs.  Perhaps this should be added to
block_page_mkwrite() after the page is unlocked

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Nick Piggin
On Tuesday 09 October 2007 03:04, Andrew Morton wrote:
> On Mon, 8 Oct 2007 19:45:08 +0800 "Yan Zheng" <[EMAIL PROTECTED]> wrote:
> > Hi all
> >
> > The test for VM_CAN_NONLINEAR always fails
> >
> > Signed-off-by: Yan Zheng<[EMAIL PROTECTED]>
> > 
> > diff -ur linux-2.6.23-rc9/mm/fremap.c linux/mm/fremap.c
> > --- linux-2.6.23-rc9/mm/fremap.c2007-10-07 15:03:33.0 +0800
> > +++ linux/mm/fremap.c   2007-10-08 19:33:44.0 +0800
> > @@ -160,7 +160,7 @@
> > if (vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR))
> > goto out;
> >
> > -   if (!vma->vm_flags & VM_CAN_NONLINEAR)
> > +   if (!(vma->vm_flags & VM_CAN_NONLINEAR))
> > goto out;
> >
> > if (end <= start || start < vma->vm_start || end > vma->vm_end)
>
> Lovely.  From this we can deduce that nobody has run remap_file_pages()
> since 2.6.23-rc1 and that nobody (including the developer who made that
> change) ran it while that change was in -mm.

But you'd be wrong. remap_file_pages was tested both with my own tester
and Ingo's test program.

vm_flags != 0, !vm_flags = 0, 0 & x = 0, so the test always falls
through. Of course, what I _should_ have done is also test a driver which
does not have VM_CAN_NONLINEAR... but even I wouldn't rewrite half
the nonlinear mapping code without once testing it ;)
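
The precedence argument above can be checked with a small stand-alone sketch. The function names and the flag value are illustrative assumptions, not kernel definitions; the point is only that `!` binds tighter than `&`, so the unparenthesized test never fires.

```c
#include <stdbool.h>

#define CAN_NONLINEAR 0x08000000UL  /* illustrative flag bit for this sketch */

/*
 * What "!flags & CAN_NONLINEAR" computes: the ! is applied first and
 * yields 0 or 1, and neither value shares a bit with the mask.
 */
static bool buggy_rejects(unsigned long flags)
{
	return ((!flags) & CAN_NONLINEAR) != 0;
}

/* The intended test: reject when the flag bit is not set. */
static bool fixed_rejects(unsigned long flags)
{
	return !(flags & CAN_NONLINEAR);
}
```

`buggy_rejects()` returns false for every input, which is why a test program exercising only mappings that do support nonlinear pages would not notice the difference.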

FWIW, Oracle (maybe the sole real user of this) has been testing it, which
I'm very happy about (rather than testing after 2.6.23 is released).


[PATCH] device-mapper: fix bd_mount_sem corruption

2007-10-08 Thread Jun'ichi Nomura
Hi,

This patch fixes a bd_mount_sem counter corruption bug in device-mapper.

thaw_bdev() should be called only when freeze_bdev() was called for the
device.
Otherwise, thaw_bdev() will up bd_mount_sem and corrupt the semaphore counter.
struct block_device with the corrupted semaphore may remain in slab cache
and be reused later.

The attached patch fixes it by calling unlock_fs() instead.
unlock_fs() determines whether it should call thaw_bdev()
by checking whether the device is frozen.

Easy reproducer is:
  #!/bin/sh
  while [ 1 ]; do
 dmsetup --notable create a
 dmsetup --nolockfs suspend a
 dmsetup remove a
  done

It's not easy to see the effect of the corrupted semaphore.
So I tested by putting the printk below in bdev_alloc_inode():
if (atomic_read(&ei->bdev.bd_mount_sem.count) != 1)
printk(KERN_DEBUG "Incorrect semaphore count = %d (%p)\n",
atomic_read(&ei->bdev.bd_mount_sem.count),
&ei->bdev);

Without the patch, I saw something like:
 Incorrect semaphore count = 17 (f2ab91c0)

With the patch, the message didn't appear.
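
The counter drift can also be illustrated with a tiny user-space model. This is a sketch under stated assumptions: `model_dev` and the `model_*()` helpers are hypothetical and only mimic the idea of an unconditional "up" versus one guarded by a frozen flag; they are not the device-mapper structures.

```c
#include <stdbool.h>

/* Hypothetical model of a device with a mount semaphore; not kernel code. */
struct model_dev {
	int mount_sem;  /* semaphore count, 1 when idle */
	bool frozen;    /* set only when a freeze really happened */
};

/* Freezing takes the semaphore and records that it did so. */
static void model_freeze(struct model_dev *d)
{
	d->mount_sem--;
	d->frozen = true;
}

/* Unconditional release: corrupts the count if no freeze preceded it. */
static void model_thaw(struct model_dev *d)
{
	d->mount_sem++;
}

/* Guarded release: only thaws when the device was actually frozen. */
static void model_unlock(struct model_dev *d)
{
	if (!d->frozen)
		return;
	model_thaw(d);
	d->frozen = false;
}
```

Calling `model_thaw()` on a device that was never frozen pushes the count above 1, while `model_unlock()` leaves it untouched; repeated in a loop, the unguarded path gives the kind of drift the printk reported.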


Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2120155..998d450 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1064,12 +1064,14 @@ static struct mapped_device *alloc_dev(int minor)
return NULL;
 }
 
+static void unlock_fs(struct mapped_device *md);
+
 static void free_dev(struct mapped_device *md)
 {
int minor = md->disk->first_minor;
 
if (md->suspended_bdev) {
-   thaw_bdev(md->suspended_bdev, NULL);
+   unlock_fs(md);
bdput(md->suspended_bdev);
}
mempool_destroy(md->tio_pool);


Re: [PATCH 1/2] Colored kernel output (run3)

2007-10-08 Thread Jan Engelhardt

On Oct 9 2007 07:12, Antonino A. Daplas wrote:
>> 
>> References: http://lkml.org/lkml/2007/4/1/162
>>  http://lkml.org/lkml/2007/10/5/199
>
>This is quite a long thread :-)

It was a patch series after all. But as Greg puts it, be persistent.

>> +config VT_PRINTK_COLOR
>> +hex "Colored kernel message output"
>> +range 0x00 0xFF
>> +depends on VT_CKO
>> +default 0x07
>> +---help---
>> +This option defines with which color kernel messages will be
>> +printed to the console.
>> +
>> +The value you need to enter here is the value is composed
>
>The more correct term for "The value" is probably "The attribute".

"The value for this kconfig entry" it should read in the minds.

>> +(Foreground colors 0x08 to 0x0F do not work when a VGA
>> +console font with 512 glyphs is used.)
>
>You might have to include a warning that those values or attributes are 
>interpreted differently depending on the driver used, and the above is
>mostly true for 16-color console drivers only.

Are there any other drivers besides vgacon and fbcon that use vt.c?

>For 2-colors [...] With a 4-color fb console (4-level grayscale) [...]
>With an 8-color console, only the first 8 values are considered.
>With a 16-color console, that is also not consistent:[...]

I see. That probably means the explanation of values moves from Kconfig 
to Documentation/. Somehow I think we could do without doc and let 
interested users find out for themselves and learn a little about 
vgacon/fbcon. ;)

>With vgacon, it supports 16-color foreground (fg), 8-color
>background (bg) at 256 chars. Becomes 8 fg and 8 bg with 512 chars.
>
>With fbcon, it supports 16 fg and 16 bg at 256, 16 fg and 8 bg at
>512 chars.

And then there is fbiterm, which supports at least 16 fg/16 bg with ... 
the whole Unicode set of chars. :)

>And for drivers that have their own con_build_attr() hook, they will be
>interpreted differently again.

>> +Background:
>> +0x00 = black,   0x40 = blue,
>> +0x10 = red, 0x50 = magenta,
>> +0x20 = green,   0x60 = cyan,
>> +0x30 = brown,   0x70 = gray,
>> +
>> +For example, 0x1F would yield white on red.
>
>You may need to specify that the values here are the console default,
>ie, the default_blue|grn|red boot options are not filled up.

>> +static inline void vc_set_color(struct vc_data *vc, unsigned char color)
>> +{
>> +vc->vc_color = color_table[color & 0xF] |
>> +   (color_table[(color >> 4) & 0x7] << 4) |
>> +   (color & 0x80);
>
>You may want to leave out the blink attribute (0x80) from this part.
>Otherwise setterm -blink on|off will produce the opposite effect. 

But 0x80 might be interpreted in a different fashion for some othercon, 
yielding for example superbold rather than blinking.
I'll have to try this, because usually, setterm operates on TTYs
rather than VCs.


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread J. Bruce Fields
On Mon, Oct 08, 2007 at 04:43:10PM -0600, Jonathan Corbet wrote:
> + (e) I understand and agree that this project and the contribution are
> + public and that a record of the contribution (including my Reviewed-by
> + tag and any associated public communications) is maintained
> + indefinitely and may be redistributed consistent with this project or
> + the open source license(s) involved.

Is this paragraph really necessary?  (For example, is there some history
of problems that this is addressing?)

--b.


Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-08 Thread Bill Davidsen

Serge E. Hallyn wrote:
> (tongue-in-cheek)
>
> No no, everyone knows you don't build simpler things on top of more
> complicated ones, you go the other way around.  So what he was
> suggesting was that selinux be re-written on top of smack.

Having gone from proposing a simpler and easier to use security system 
as an alternative to SELinux, you now propose to change the one working 
security system we have. And yes, it's hard to use, but it works. Let's 
keep this a patch, people who want adventure can have one, and people 
who have gotten Linux accepted "if SELinux is enabled" will avoid one.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Nick Piggin
On Monday 08 October 2007 23:37, Hugh Dickins wrote:
> On Mon, 8 Oct 2007, Yan Zheng wrote:
> > The test for VM_CAN_NONLINEAR always fails
>
> Good catch indeed.  Though I was puzzled how we do nonlinear at all,
> until I realized it's "The test for not VM_CAN_NONLINEAR always fails".
>
> It's not as serious as it appears, since code further down has been
> added more recently to simulate nonlinear on non-RAM-backed filesystems,
> instead of going the real nonlinear way; so most filesystems are now not
> required to do what VM_CAN_NONLINEAR was put in to ensure they could do.

Well, I think all filesystems can do VM_CAN_NONLINEAR anyway. Device
drivers and "weird" things tend to have trouble...


> I'm confused as to where that leaves us: is this actually a fix that
> needs to go into 2.6.23?  or will it suddenly disable a system call
> which has been silently working fine on various filesystems which did
> not add VM_CAN_NONLINEAR?  could we just rip out VM_CAN_NONLINEAR?

We probably should keep VM_CAN_NONLINEAR for the moment, I think.
But now that we have the fallback path, we _could_ use that instead of
failing. I doubt anybody will be using nonlinear mappings on anything but
regular files for the time being, but as a trivial fix, I think this probably
should go into 2.6.23.

Thanks for spotting this problem
Acked-by: Nick Piggin <[EMAIL PROTECTED]>

> I hope Nick or Miklos is clearer on what the risks are.
>
> (Apologies for all the "not"s and "non"s here, I'm embarrassed
> after just criticizing Ingo's SCHED_NO_NO_OMIT_FRAME_POINTER!)
>
> Hugh
>
> > Signed-off-by: Yan Zheng<[EMAIL PROTECTED]>
> > 
> > diff -ur linux-2.6.23-rc9/mm/fremap.c linux/mm/fremap.c
> > --- linux-2.6.23-rc9/mm/fremap.c2007-10-07 15:03:33.0 +0800
> > +++ linux/mm/fremap.c   2007-10-08 19:33:44.0 +0800
> > @@ -160,7 +160,7 @@
> > if (vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR))
> > goto out;
> >
> > -   if (!vma->vm_flags & VM_CAN_NONLINEAR)
> > +   if (!(vma->vm_flags & VM_CAN_NONLINEAR))
> > goto out;
> >
> > if (end <= start || start < vma->vm_start || end > vma->vm_end)


Re: [PATCH] mm: set_page_dirty_balance() vs ->page_mkwrite()

2007-10-08 Thread Nick Piggin
On Tuesday 09 October 2007 02:54, Peter Zijlstra wrote:
> It seems that with the recent usage of ->page_mkwrite() a little detail
> was overlooked.
>
> .22-rc1 merged OCFS2 usage of this hook
> .23-rc1 merged XFS usage
> .24-rc1 will most likely merge NFS usage
>
> Please consider this for .23 final and maybe even .22.x
>
> ---
> Subject: mm: set_page_dirty_balance() vs ->page_mkwrite()
>
> All the current page_mkwrite() implementations also set the page dirty.
> Which results in the set_page_dirty_balance() call to _not_ call balance,
> because the page is already found dirty.
>
> This allows us to dirty a _lot_ of pages without ever hitting
> balance_dirty_pages().  Not good (tm).
>
> Force a balance call if ->page_mkwrite() was successful.

Would it be better to just have the callers set_page_dirty_balance()?
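
The problem in the quoted description can be modelled outside the kernel. The sketch below is a simplified illustration, assuming only what the description states: the dirtying helper reports a clean-to-dirty transition, and balancing is tied to that result. All `model_*` names and the `balance_calls` counter are hypothetical.

```c
#include <stdbool.h>

struct model_page {
	bool dirty;
};

/* Counts how often the (stand-in) balancing step would have run. */
static int balance_calls;

/* Returns true only when the page goes from clean to dirty. */
static bool model_set_page_dirty(struct model_page *pg)
{
	if (pg->dirty)
		return false;
	pg->dirty = true;
	return true;
}

/*
 * Balance when the page was newly dirtied, or when the caller says a
 * page_mkwrite-style hook ran (and so may have dirtied it already).
 */
static void model_set_page_dirty_balance(struct model_page *pg, bool page_mkwrite)
{
	if (model_set_page_dirty(pg) || page_mkwrite)
		balance_calls++;
}
```

If the hook dirties the page first, the call with `page_mkwrite` false never balances; passing true restores the balancing step, which is the effect the proposed extra argument aims for.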


> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  include/linux/writeback.h |2 +-
>  mm/memory.c   |9 +++--
>  mm/page-writeback.c   |4 ++--
>  3 files changed, 10 insertions(+), 5 deletions(-)
>
> Index: linux-2.6/include/linux/writeback.h
> ===
> --- linux-2.6.orig/include/linux/writeback.h
> +++ linux-2.6/include/linux/writeback.h
> @@ -137,7 +137,7 @@ int sync_page_range(struct inode *inode,
>   loff_t pos, loff_t count);
>  int sync_page_range_nolock(struct inode *inode, struct address_space
> *mapping, loff_t pos, loff_t count);
> -void set_page_dirty_balance(struct page *page);
> +void set_page_dirty_balance(struct page *page, int page_mkwrite);
>  void writeback_set_ratelimit(void);
>
>  /* pdflush.c */
> Index: linux-2.6/mm/memory.c
> ===
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -1559,6 +1559,7 @@ static int do_wp_page(struct mm_struct *
>   struct page *old_page, *new_page;
>   pte_t entry;
>   int reuse = 0, ret = 0;
> + int page_mkwrite = 0;
>   struct page *dirty_page = NULL;
>
>   old_page = vm_normal_page(vma, address, orig_pte);
> @@ -1607,6 +1608,8 @@ static int do_wp_page(struct mm_struct *
>   page_cache_release(old_page);
>   if (!pte_same(*page_table, orig_pte))
>   goto unlock;
> +
> + page_mkwrite = 1;
>   }
>   dirty_page = old_page;
>   get_page(dirty_page);
> @@ -1691,7 +1694,7 @@ unlock:
>* do_no_page is protected similarly.
>*/
>   wait_on_page_locked(dirty_page);
> - set_page_dirty_balance(dirty_page);
> + set_page_dirty_balance(dirty_page, page_mkwrite);
>   put_page(dirty_page);
>   }
>   return ret;
> @@ -2238,6 +2241,7 @@ static int __do_fault(struct mm_struct *
>   struct page *dirty_page = NULL;
>   struct vm_fault vmf;
>   int ret;
> + int page_mkwrite = 0;
>
>   vmf.virtual_address = (void __user *)(address & PAGE_MASK);
>   vmf.pgoff = pgoff;
> @@ -2315,6 +2319,7 @@ static int __do_fault(struct mm_struct *
>   anon = 1; /* no anon but release 
> vmf.page */
>   goto out;
>   }
> + page_mkwrite = 1;
>   }
>   }
>
> @@ -2375,7 +2380,7 @@ out_unlocked:
>   if (anon)
>   page_cache_release(vmf.page);
>   else if (dirty_page) {
> - set_page_dirty_balance(dirty_page);
> + set_page_dirty_balance(dirty_page, page_mkwrite);
>   put_page(dirty_page);
>   }
>
> Index: linux-2.6/mm/page-writeback.c
> ===
> --- linux-2.6.orig/mm/page-writeback.c
> +++ linux-2.6/mm/page-writeback.c
> @@ -460,9 +460,9 @@ static void balance_dirty_pages(struct a
>   pdflush_operation(background_writeout, 0);
>  }
>
> -void set_page_dirty_balance(struct page *page)
> +void set_page_dirty_balance(struct page *page, int page_mkwrite)
>  {
> - if (set_page_dirty(page)) {
> + if (set_page_dirty(page) || page_mkwrite) {
>   struct address_space *mapping = page_mapping(page);
>
>   if (mapping)


Re: [PATCH 1/2] Colored kernel output (run3)

2007-10-08 Thread Antonino A. Daplas
On Sat, 2007-10-06 at 22:09 +0200, Jan Engelhardt wrote: 
> Colored kernel message output (1/2)
> 
> This patch makes it possible to give kernel messages a selectable
> color. It can be chosen at compile time, overridden at boot time,
> and changed at run time.
> 
> References: http://lkml.org/lkml/2007/4/1/162
>   http://lkml.org/lkml/2007/10/5/199

This is quite a long thread :-)

> 
> Signed-off-by: Jan Engelhardt <[EMAIL PROTECTED]>
> 
> ---
>  drivers/char/Kconfig |   42 ++
>  drivers/char/vt.c|   23 +++
>  2 files changed, 65 insertions(+)
> 
> Index: linux-2.6.23/drivers/char/Kconfig
> ===
> --- linux-2.6.23.orig/drivers/char/Kconfig
> +++ linux-2.6.23/drivers/char/Kconfig
> @@ -58,6 +58,48 @@ config VT_CONSOLE
>  
> If unsure, say Y.
>  
> +config VT_CKO
> + bool "Colored kernel message output"
> + depends on VT_CONSOLE
> + ---help---
> + This option enables kernel messages to be emitted in
> + colors other than the default.
> +
> + If unsure, say N.
> +
> +config VT_PRINTK_COLOR
> + hex "Colored kernel message output"
> + range 0x00 0xFF
> + depends on VT_CKO
> + default 0x07
> + ---help---
> + This option defines with which color kernel messages will be
> + printed to the console.
> +
> + The value you need to enter here is the value is composed

The more correct term for "The value" is probably "The attribute".

> + (OR-ed) of a foreground and a background color.
> +
> + Foreground:
> + 0x00 = black,   0x08 = dark gray,
> + 0x01 = red, 0x09 = light red,
> + 0x02 = green,   0x0A = light green,
> + 0x03 = brown,   0x0B = yellow,
> + 0x04 = blue,0x0C = light blue,
> + 0x05 = magenta, 0x0D = light magenta,
> + 0x06 = cyan,0x0E = light cyan,
> + 0x07 = gray,0x0F = white,
> +
> + (Foreground colors 0x08 to 0x0F do not work when a VGA
> + console font with 512 glyphs is used.)

You might have to include a warning that those values or attributes are 
interpreted differently depending on the driver used, and the above is
mostly true for 16-color console drivers only.

For 2-colors (we still have quite a few of them) only bit 0 is true for
color (0x00 and 0x01). The rest of the bits are interpreted as
attributes:

0x02 - italic
0x04 - underline
0x08 - bold
0x80 - blink

The italic, underline and bold attributes will show up in a 2-color
framebuffer console. The blink attribute is ignored.
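
The 2-color interpretation listed above can be written down as a small decoder. This is an illustrative sketch only: `mono_attr` and `decode_mono()` are hypothetical names, and the bit meanings are taken directly from the list (bit 0 for color, 0x02 italic, 0x04 underline, 0x08 bold, blink ignored).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an attribute value on a 2-color console (sketch). */
struct mono_attr {
	bool fg_on;      /* bit 0: the only color bit */
	bool italic;     /* 0x02 */
	bool underline;  /* 0x04 */
	bool bold;       /* 0x08 */
};

static struct mono_attr decode_mono(uint8_t value)
{
	struct mono_attr a;

	a.fg_on = (value & 0x01) != 0;
	a.italic = (value & 0x02) != 0;
	a.underline = (value & 0x04) != 0;
	a.bold = (value & 0x08) != 0;
	/* 0x80 (blink) is deliberately not decoded: it is ignored here */
	return a;
}
```

A value such as 0x0D, which would be a light color on a 16-color console, decodes here to foreground on, underlined and bold.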

With a 4-color fb console (4-level grayscale), those values are again
interpreted differently.

0x00 - 0x00 : black
0x01 - 0x06 : white
0x07 - 0x08 : gray  
the rest: intense white

(If by mistake 0x0106 is used, it will produce a white on white display)
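
The 4-level mapping above can be expressed as a lookup. This is a sketch under one labelled assumption: the table is applied to the low 4 bits of the value (the foreground index). The enum and function names are hypothetical.

```c
#include <stdint.h>

enum gray4_level {
	GRAY4_BLACK,
	GRAY4_WHITE,
	GRAY4_GRAY,
	GRAY4_INTENSE_WHITE
};

/* Map a color index to one of four gray levels, following the table above. */
static enum gray4_level gray4_from_index(uint8_t value)
{
	uint8_t idx = value & 0x0F;  /* assumption: only the low nibble is used */

	if (idx == 0x00)
		return GRAY4_BLACK;
	if (idx <= 0x06)
		return GRAY4_WHITE;
	if (idx <= 0x08)
		return GRAY4_GRAY;
	return GRAY4_INTENSE_WHITE;
}
```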

With an 8-color console, only the first 8 values are considered.

With a 16-color console, that is also not consistent:

With vgacon, it supports 16-color foreground (fg), 8-color
background (bg) at 256 chars. Becomes 8 fg and 8 bg with 512 chars.

With fbcon, it supports 16 fg and 16 bg at 256, 16 fg and 8 bg at
512 chars.

And for drivers that have their own con_build_attr() hook, they will be
interpreted differently again.

> +
> + Background:
> + 0x00 = black,   0x40 = blue,
> + 0x10 = red, 0x50 = magenta,
> + 0x20 = green,   0x60 = cyan,
> + 0x30 = brown,   0x70 = gray,
> +
> + For example, 0x1F would yield white on red.
> +

You may need to specify that the values here are the console default,
ie, the default_blue|grn|red boot options are not filled up.

>  config HW_CONSOLE
>   bool
>   depends on VT && !S390 && !UML
> Index: linux-2.6.23/drivers/char/vt.c
> ===
> --- linux-2.6.23.orig/drivers/char/vt.c
> +++ linux-2.6.23/drivers/char/vt.c
> @@ -73,6 +73,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2344,6 +2345,23 @@ struct tty_driver *console_driver;
>  
>  #ifdef CONFIG_VT_CONSOLE
>  
> +static unsigned int printk_color __read_mostly = CONFIG_VT_PRINTK_COLOR;
> +#ifdef CONFIG_VT_CKO
> +module_param(printk_color, uint, S_IRUGO | S_IWUSR);
> +
> +static inline void vc_set_color(struct vc_data *vc, unsigned char color)
> +{
> + vc->vc_color = color_table[color & 0xF] |
> +(color_table[(color >> 4) & 0x7] << 4) |
> +(color & 0x80);

You may want to leave out the blink attribute (0x80) from this part.
Otherwise setterm -blink on|off will produce the opposite effect. 
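
To make the blink remark concrete, here is a small user-space sketch of the mapping done by the quoted vc_set_color(). It is only an illustration: the color_table values are what I believe vt.c uses for its ANSI-to-VGA translation and should be treated as an assumption, and attr_keep_blink() and attr_drop_blink() are names invented for this example.

```c
#include <stdint.h>

/* ANSI -> VGA color order; assumed to match color_table[] in vt.c. */
static const uint8_t color_table[16] = {
	0, 4, 2, 6, 1, 5, 3, 7,
	8, 12, 10, 14, 9, 13, 11, 15
};

/* Mapping as in the quoted patch: bit 7 (blink) is copied through. */
uint8_t attr_keep_blink(uint8_t color)
{
	return (uint8_t)(color_table[color & 0xF] |
			 (color_table[(color >> 4) & 0x7] << 4) |
			 (color & 0x80));
}

/* Variant suggested above: bit 7 is left out of the result. */
uint8_t attr_drop_blink(uint8_t color)
{
	return (uint8_t)(color_table[color & 0xF] |
			 (color_table[(color >> 4) & 0x7] << 4));
}
```

With the Kconfig example value 0x1F both helpers give 0x4F (white on red in VGA order). For 0x9F the first returns 0xCF and the second 0x4F, so only the second leaves the blink bit for setterm to control.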

Tony





Re: + fix-vm_can_nonlinear-check-in-sys_remap_file_pages.patch added to -mm tree

2007-10-08 Thread Ray Lee
On 10/8/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> On Mon, Oct 08, 2007 at 10:05:40AM -0700, [EMAIL PROTECTED] wrote:
> > --- a/mm/fremap.c~fix-vm_can_nonlinear-check-in-sys_remap_file_pages
> > +++ a/mm/fremap.c
> > @@ -160,7 +160,7 @@ asmlinkage long sys_remap_file_pages(uns
> >   if (vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR))
> >   goto out;
> >
> > - if (!vma->vm_flags & VM_CAN_NONLINEAR)
> > + if (!(vma->vm_flags & VM_CAN_NONLINEAR))
>
> Ick.

Perhaps a good candidate for checkpatch.pl? (Andy cc:d.)

Ray


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Randy Dunlap
On Mon, 08 Oct 2007 16:43:10 -0600 Jonathan Corbet wrote:

> Sam Ravnborg <[EMAIL PROTECTED]> wrote:
> 
> > Or maybe we need something much less formal that explain the purpose of the
> > four tags we use:
> 
> ...or maybe a combination?  How does the following patch look as a way
> to describe how the tags are used and what Reviewed-by, in particular,
> means?
> 
> Perhaps the DCO should move to this file as well?
> 
> jon

Just typos noted below...

> ---
> 
> Add a document on patch tags.
> 
> Signed-off-by: Jonathan Corbet <[EMAIL PROTECTED]>
> 
> diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
> index 43e89b1..fa1518b 100644
> --- a/Documentation/00-INDEX
> +++ b/Documentation/00-INDEX
> @@ -284,6 +284,8 @@ parport.txt
>   - how to use the parallel-port driver.
>  parport-lowlevel.txt
>   - description and usage of the low level parallel port functions.
> +patch-tags
> + - description of the tags which can be added to patches
>  pci-error-recovery.txt
>   - info on PCI error recovery.
>  pci.txt
> diff --git a/Documentation/patch-tags b/Documentation/patch-tags
> new file mode 100644
> index 000..fb5f8e1
> --- /dev/null
> +++ b/Documentation/patch-tags
> @@ -0,0 +1,66 @@
> +Patches headed for the mainline may contain a variety of tags documenting
> +who played a hand in (or was at least aware of) its progress.  All of these
> +tags have the form:
> +
> + Something-done-by: Full name <[EMAIL PROTECTED]>
> +
> +These tags are:
> +
> +Signed-off-by:  A person adding a Signed-off-by tag is attesting that the
> + patch is, to the best of his or her knowledge, legally able
> + to be merged into the mainline and distributed under the
> + terms of the GNU General Public License, version 2.  See
> + the Developer's Certificate of Origin, found in
> + Documentation/SubmittingPatches, for the precise meaning of
> + Signed-off-by.
> +
> +Acked-by:The person named (who should be an active developer in the
> + area addressed by the patch) is aware of the patch and has
> + no objection to its inclusion.  An Acked-by tag does not
> + imply any involvement in the development of the patch or
> + that a detailed review was done.
> +
> +Reviewed-by: The patch has been reviewed and found acceptible according

  acceptable

> + to the Reviewer's Statement as found at the bottom of this
> + file.  A Reviewed-by tag is a statement of opinion that the
> + patch is an appropriate modification of the kernel without
> + any remaining serious technical issues.  Any interested
> + reviewer (who has done the work) can offer a Reviewed-by
> + tag for a patch.
> +
> +Cc:  The person named was given the opportunity to comment on
> + the patch.  This is the only tag which might be added
> + without an explicit action by the person it names.
> +
> +Tested-by:   The patch has been successfully tested (in some
> + environment) by the person named.
> +
> +
> +
> +
> +Reviewer's statement of oversight, v0.02
> +
> +By offering my Reviewed-by: tag, I state that:
> +
> + (a) I have carried out a technical review of this patch to evaluate its
> + appropriateness and readiness for inclusion into the mainline kernel. 
> +
> + (b) Any problems, concerns, or questions relating to the patch have been
> + communicated back to the submitter.  I am satisfied with how the
> + submitter has responded to my comments.
> +
> + (c) While there may (or may not) be things which could be improved with
> + this submission, I believe that it is, at this time, (1) a worthwhile
> + modification to the kernel, and (2) free of known issues which would
> + argue against its inclusion.
> +
> + (d) While I have reviewed the patch and believe it to be sound, I can not

 cannot

> + (unless explicitly stated elsewhere) make any warranties or guarantees
> + that it will achieve its stated purpose or function properly in any
> + given situation.
> +
> + (e) I understand and agree that this project and the contribution are
> + public and that a record of the contribution (including my Reviewed-by
> + tag and any associated public communications) is maintained
> + indefinitely and may be redistributed consistent with this project or
> + the open source license(s) involved.
> -


---
~Randy


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Oleg Verych
* Mon, 8 Oct 2007 17:38:52 -0400
>
> On Mon, Oct 08, 2007 at 01:33:38PM -0700, H. Peter Anvin wrote:
>> Uhm, no.  There is no reason an "unimportant" person couldn't review a 
>> patch, and therefore perform a potentially highly valuable service to 
>> the maintainer.
>> 
>> None of these are indicative of the authority of the person acking, 
>> reviewing, testing, or nacking.  That's only as good as the trust in the 
>> person signing.
>
> I would tend to agree.  Right now I think the problem is that we are
> getting too little reviews, not enough.  And someone who reviews
> patches, even if unknown, could be building up expertise that
> eventually would make them a valued developer, even while they are
> doing us a service.   

Including the experience of convincing an experienced patch author that
some things in the patch are wrong :)

[]
> We could ask reviewers to include a URL to an LKML archive of their
> review, to make it easier to find a review of a patch so later on
> people can judge how effective they their review was.

I vote (again) for more short summaries in the `Subject'. Long, boring
threads where the whole threading part of the screen is empty because
every subject is the same aren't fun, when a few of the thousands of
messages may have interesting stuff inside.

That makes things easier not only for mailing list readers, but also for
archive readers and readers of www search results (whoever they may be):

google.com/search?q=reviewed+crashkernel

The first hit is a patch review I happened to make. I just thought
"hell, just string parsing, what can be simpler?", yet there was
productive discussion and bug fixing. After I saw convincing statements
about testing, I placed my review mark, though I'm really an
"unimportant" random hacker.
--
-o--=O`C
 #oo'L O
<___=E M


[PATCH] [NetLabel] Introduce a new kernel configuration API for NetLabel - for Smack Version 5

2007-10-08 Thread Casey Schaufler
From: Paul Moore <[EMAIL PROTECTED]>

Add a new set of configuration functions to the NetLabel/LSM API so that
LSMs can perform their own configuration of the NetLabel subsystem without
relying on assistance from userspace.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---

This update fixes a memory leak on error conditions.

 include/net/netlabel.h |   47 --
 net/ipv4/cipso_ipv4.c  |4 -
 net/netlabel/netlabel_cipso_v4.c   |2 
 net/netlabel/netlabel_cipso_v4.h   |3 +
 net/netlabel/netlabel_domainhash.h |1 
 net/netlabel/netlabel_kapi.c   |  177 
 6 files changed, 225 insertions(+), 9 deletions(-)

diff --git a/include/net/netlabel.h b/include/net/netlabel.h
index 2e5b2f6..facaf68 100644
--- a/include/net/netlabel.h
+++ b/include/net/netlabel.h
@@ -36,6 +36,8 @@
 #include 
 #include 
 
+struct cipso_v4_doi;
+
 /*
  * NetLabel - A management interface for maintaining network packet label
  *mapping tables for explicit packet labling protocols.
@@ -99,12 +101,6 @@ struct netlbl_audit {
uid_t loginuid;
 };
 
-/* Domain mapping definition struct */
-struct netlbl_dom_map;
-
-/* Domain mapping operations */
-int netlbl_domhsh_remove(const char *domain, struct netlbl_audit *audit_info);
-
 /* LSM security attributes */
 struct netlbl_lsm_cache {
atomic_t refcount;
@@ -285,6 +281,19 @@ static inline void netlbl_secattr_free(struct netlbl_lsm_secattr *secattr)
 
 #ifdef CONFIG_NETLABEL
 /*
+ * LSM configuration operations
+ */
+int netlbl_cfg_map_del(const char *domain, struct netlbl_audit *audit_info);
+int netlbl_cfg_unlbl_add_map(const char *domain,
+struct netlbl_audit *audit_info);
+int netlbl_cfg_cipsov4_add(struct cipso_v4_doi *doi_def,
+  struct netlbl_audit *audit_info);
+int netlbl_cfg_cipsov4_add_map(struct cipso_v4_doi *doi_def,
+  const char *domain,
+  struct netlbl_audit *audit_info);
+int netlbl_cfg_cipsov4_del(u32 doi, struct netlbl_audit *audit_info);
+
+/*
  * LSM security attribute operations
  */
 int netlbl_secattr_catmap_walk(struct netlbl_lsm_secattr_catmap *catmap,
@@ -318,6 +327,32 @@ void netlbl_cache_invalidate(void);
 int netlbl_cache_add(const struct sk_buff *skb,
 const struct netlbl_lsm_secattr *secattr);
 #else
+static inline int netlbl_cfg_map_del(const char *domain,
+struct netlbl_audit *audit_info)
+{
+   return -ENOSYS;
+}
+static inline int netlbl_cfg_unlbl_add_map(const char *domain,
+  struct netlbl_audit *audit_info)
+{
+   return -ENOSYS;
+}
+static inline int netlbl_cfg_cipsov4_add(struct cipso_v4_doi *doi_def,
+struct netlbl_audit *audit_info)
+{
+   return -ENOSYS;
+}
+static inline int netlbl_cfg_cipsov4_add_map(struct cipso_v4_doi *doi_def,
+const char *domain,
+struct netlbl_audit *audit_info)
+{
+   return -ENOSYS;
+}
+static inline int netlbl_cfg_cipsov4_del(u32 doi,
+struct netlbl_audit *audit_info)
+{
+   return -ENOSYS;
+}
 static inline int netlbl_secattr_catmap_walk(
  struct netlbl_lsm_secattr_catmap *catmap,
  u32 offset)
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index ab56a05..714461c 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -557,8 +557,8 @@ int cipso_v4_doi_remove(u32 doi,
spin_unlock(&cipso_v4_doi_list_lock);
list_for_each_entry_rcu(dom_iter, &doi_def->dom_list, list)
if (dom_iter->valid)
-   netlbl_domhsh_remove(dom_iter->domain,
-audit_info);
+   netlbl_cfg_map_del(dom_iter->domain,
+  audit_info);
cipso_v4_cache_invalidate();
rcu_read_unlock();
 
diff --git a/net/netlabel/netlabel_cipso_v4.c b/net/netlabel/netlabel_cipso_v4.c
index c060e3f..07f7fd4 100644
--- a/net/netlabel/netlabel_cipso_v4.c
+++ b/net/netlabel/netlabel_cipso_v4.c
@@ -89,7 +89,7 @@ static const struct nla_policy netlbl_cipsov4_genl_policy[NLBL_CIPSOV4_A_MAX + 1
  * safely.
  *
  */
-static void netlbl_cipsov4_doi_free(struct rcu_head *entry)
+void netlbl_cipsov4_doi_free(struct rcu_head *entry)
 {
struct cipso_v4_doi *ptr;
 
diff --git a/net/netlabel/netlabel_cipso_v4.h b/net/netlabel/netlabel_cipso_v4.h
index f03cf9b..220cb9d 100644
--- a/net/netlabel/netlabel_cipso_v4.h
+++ b/net/netlabel/netlabel_cipso_v4.h
@@ -163,4 +163,7 @@ enum {
 /* NetLabel protocol functions */
 int netlbl_cipsov4_genl_init(void);
 
+/* Free the me

possible recursive locking detected... in __wake_up

2007-10-08 Thread Stefan Richter
Hi list,

how could this ever happen?

>>   =
>>   [ INFO: possible recursive locking detected ]
>>   2.6.23-0.222.rc9.git4.fc8 #1
>>   -
>>   X/2522 is trying to acquire lock:
>>(&q->lock){++..}, at: [] __wake_up+0x15/0x42
>>
>>   but task is already holding lock:
>>(&q->lock){++..}, at: [] __wake_up+0x15/0x42
>>
>>   other info that might help us debug this:
>>   2 locks held by X/2522:
>>#0:  (&client->lock){.+..}, at: [] queue_event+0x2b/0x68 [firewire_core]
>>#1:  (&q->lock){++..}, at: [] __wake_up+0x15/0x42
>>
>>   stack backtrace:
>>[] show_trace_log_lvl+0x1a/0x2f
>>[] show_trace+0x12/0x14
>>[] dump_stack+0x16/0x18
>>[] __lock_acquire+0x189/0xc67
>>[] lock_acquire+0x7b/0x9e
>>[] _spin_lock_irqsave+0x4a/0x77
>>[] __wake_up+0x15/0x42
>>[] ep_poll_safewake+0x86/0xa8
>>[] ep_poll_callback+0x9f/0xaa
>>[] __wake_up_common+0x32/0x55
>>[] __wake_up+0x31/0x42
>>[] queue_event+0x57/0x68 [firewire_core]
>>[] handle_request+0xd8/0xe0 [firewire_core]
>>[] fw_core_handle_request+0x215/0x23c [firewire_core]
>>[] handle_ar_packet+0xd7/0xeb [firewire_ohci]
>>[] ar_context_tasklet+0xb6/0xc4 [firewire_ohci]
>>[] tasklet_action+0x68/0xd3
>>[] __do_softirq+0x78/0xff
>>[] do_softirq+0x74/0xf7
>>===
(from https://bugzilla.redhat.com/show_bug.cgi?id=323411)

We wake up the queue from a workqueue context (rarely) and from tasklet
context (frequently).  However, since __wake_up disables local IRQs, it
should be entirely impossible for __wake_up to take q->lock twice before
releasing it.  What's the deal?
-- 
Stefan Richter
-=-=-=== =-=- -=--=
http://arcgraph.de/sr/


Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Yan Zheng
2007/10/8, Hugh Dickins <[EMAIL PROTECTED]>:
> On Mon, 8 Oct 2007, Yan Zheng wrote:
> >
> > The test for VM_CAN_NONLINEAR always fails
> Good catch indeed.  Though I was puzzled how we do nonlinear at all,
> until I realized it's "The test for not VM_CAN_NONLINEAR always fails".
> It's not as serious as it appears, since code further down has been
> added more recently to simulate nonlinear on non-RAM-backed filesystems,
> instead of going the real nonlinear way; so most filesystems are now not
> required to do what VM_CAN_NONLINEAR was put in to ensure they could do.
> I'm confused as to where that leaves us: is this actually a fix that
> needs to go into 2.6.23?  or will it suddenly disable a system call
> which has been silently working fine on various filesystems which did
> not add VM_CAN_NONLINEAR?  could we just rip out VM_CAN_NONLINEAR?
> I hope Nick or Miklos is clearer on what the risks are.
> (Apologies for all the "not"s and "non"s here, I'm embarrassed
> after just criticizing Ingo's SCHED_NO_NO_OMIT_FRAME_POINTER!)
> Hugh

Yes, I mean "The test for not VM_CAN_NONLINEAR always fails".  please
forgive my poor English.
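
For anyone puzzled by the wording, the following user-space sketch shows why the old test could never fire. The flag value is an assumption picked for illustration (any bit other than bit 0 behaves the same), and both function names are made up here. The extra parentheses in the first function spell out how the unparenthesized original expression is parsed.

```c
/* Assumed flag value, for illustration only. */
#define VM_CAN_NONLINEAR 0x08000000UL

/*
 * Old test: "!vm_flags & VM_CAN_NONLINEAR" is parsed as below.
 * !vm_flags is 0 or 1, so ANDing it with a high bit always gives 0.
 */
int lacks_flag_old(unsigned long vm_flags)
{
	return ((!vm_flags) & VM_CAN_NONLINEAR) != 0;
}

/* Fixed test: true exactly when the flag is not set. */
int lacks_flag_fixed(unsigned long vm_flags)
{
	return !(vm_flags & VM_CAN_NONLINEAR);
}
```

The old form returns 0 for every input, so the "goto out" guarded by it was dead code, while the fixed form returns 1 whenever the flag is missing.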


Re: [PATCH] aic94xx: Use request_firmware() to provide SAS address if the adapter lacks one

2007-10-08 Thread Andrew Vasquez
On Mon, 08 Oct 2007, Darrick J. Wong wrote:

> If the aic94xx chip doesn't have a SAS address in the chip's flash memory,
> use the request_firmware() interface to get one from userspace.  This
> way, there's no debate as to who or how an address gets generated--it's
> totally up to the administrator to provide it if the card doesn't have one.

So how about factoring that out to a transport-level interface.  How
about something along the lines of the following patch, whereby the
software driver upon detecting no valid WWPN, makes an upcall to each
interface's 'request_wwn()'.  The data passed in from shost_gendev
should be enough for some helper script to cull relevant device bits
and perhaps offer some level of persistence...  Off base?

Darrick, forgive the FC example, I don't do SAS...

--
av

--

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 7a7cfe5..5e0d953 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "scsi_priv.h"
 #include "scsi_transport_fc_internal.h"
 
@@ -3251,6 +3252,30 @@ fc_vport_sched_delete(struct work_struct *work)
vport->channel, stat);
 }
 
+int
+fc_request_wwn(struct Scsi_Host *shost, u64 *wwn)
+{
+   const struct firmware *fw;
+   int stat;
+
+   stat = request_firmware(&fw, "fc_addr", &shost->shost_gendev);
+   if (stat)
+   return stat;
+
+   if (fw->size < 16) {
+   stat = -EINVAL;
+   goto out;
+   }
+
+   stat = fc_parse_wwn(fw->data, wwn);
+   if (stat)
+   goto out;
+
+out:
+   release_firmware(fw);
+   return stat;
+}
+EXPORT_SYMBOL(fc_request_wwn);
 
 /* Original Author:  Martin Hicks */
 MODULE_AUTHOR("James Smart");
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index e466d88..e80c36c 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -734,4 +734,6 @@ void fc_host_post_vendor_event(struct Scsi_Host *shost, u32 event_number,
 */
 int fc_vport_terminate(struct fc_vport *vport);
 
+int fc_request_wwn(struct Scsi_Host *, u64 *);
+
 #endif /* SCSI_TRANSPORT_FC_H */
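
As a rough idea of what the helper script would have to hand back, here is a user-space sketch that turns a 16-hex-digit blob into a 64-bit name. It assumes the "fc_addr" payload is plain ASCII hex with the most significant digit first, which matches the 16-byte size check in the patch but is otherwise a guess. parse_wwn_hex() is a hypothetical name, not the in-kernel fc_parse_wwn().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Parse the first 16 hex digits of buf into *wwn.
 * Returns 0 on success, -1 if the blob is too short or not hex.
 */
int parse_wwn_hex(const char *buf, size_t len, uint64_t *wwn)
{
	uint64_t value = 0;
	size_t i;

	if (len < 16)
		return -1;

	for (i = 0; i < 16; i++) {
		char c = buf[i];
		unsigned int digit;

		if (c >= '0' && c <= '9')
			digit = (unsigned int)(c - '0');
		else if (c >= 'a' && c <= 'f')
			digit = (unsigned int)(c - 'a') + 10;
		else if (c >= 'A' && c <= 'F')
			digit = (unsigned int)(c - 'A') + 10;
		else
			return -1;

		/* Shift in one nibble at a time, most significant first. */
		value = (value << 4) | digit;
	}

	*wwn = value;
	return 0;
}
```

Anything after the 16th character (a trailing newline from echo, say) is ignored, so a script can write the address without worrying about line endings.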


Re: RFC: reviewer's statement of oversight

2007-10-08 Thread Jonathan Corbet
Sam Ravnborg <[EMAIL PROTECTED]> wrote:

> Or maybe we need something much less formal that explain the purpose of the
> four tags we use:

...or maybe a combination?  How does the following patch look as a way
to describe how the tags are used and what Reviewed-by, in particular,
means?

Perhaps the DCO should move to this file as well?

jon

---

Add a document on patch tags.

Signed-off-by: Jonathan Corbet <[EMAIL PROTECTED]>

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 43e89b1..fa1518b 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -284,6 +284,8 @@ parport.txt
- how to use the parallel-port driver.
 parport-lowlevel.txt
- description and usage of the low level parallel port functions.
+patch-tags
+   - description of the tags which can be added to patches
 pci-error-recovery.txt
- info on PCI error recovery.
 pci.txt
diff --git a/Documentation/patch-tags b/Documentation/patch-tags
new file mode 100644
index 000..fb5f8e1
--- /dev/null
+++ b/Documentation/patch-tags
@@ -0,0 +1,66 @@
+Patches headed for the mainline may contain a variety of tags documenting
+who played a hand in (or was at least aware of) its progress.  All of these
+tags have the form:
+
+   Something-done-by: Full name <[EMAIL PROTECTED]>
+
+These tags are:
+
+Signed-off-by:  A person adding a Signed-off-by tag is attesting that the
+   patch is, to the best of his or her knowledge, legally able
+   to be merged into the mainline and distributed under the
+   terms of the GNU General Public License, version 2.  See
+   the Developer's Certificate of Origin, found in
+   Documentation/SubmittingPatches, for the precise meaning of
+   Signed-off-by.
+
+Acked-by:  The person named (who should be an active developer in the
+   area addressed by the patch) is aware of the patch and has
+   no objection to its inclusion.  An Acked-by tag does not
+   imply any involvement in the development of the patch or
+   that a detailed review was done.
+
+Reviewed-by:   The patch has been reviewed and found acceptible according
+   to the Reviewer's Statement as found at the bottom of this
+   file.  A Reviewed-by tag is a statement of opinion that the
+   patch is an appropriate modification of the kernel without
+   any remaining serious technical issues.  Any interested
+   reviewer (who has done the work) can offer a Reviewed-by
+   tag for a patch.
+
+Cc:The person named was given the opportunity to comment on
+   the patch.  This is the only tag which might be added
+   without an explicit action by the person it names.
+
+Tested-by: The patch has been successfully tested (in some
+   environment) by the person named.
+
+
+
+
+Reviewer's statement of oversight, v0.02
+
+By offering my Reviewed-by: tag, I state that:
+
+ (a) I have carried out a technical review of this patch to evaluate its
+ appropriateness and readiness for inclusion into the mainline kernel. 
+
+ (b) Any problems, concerns, or questions relating to the patch have been
+ communicated back to the submitter.  I am satisfied with how the
+ submitter has responded to my comments.
+
+ (c) While there may (or may not) be things which could be improved with
+ this submission, I believe that it is, at this time, (1) a worthwhile
+ modification to the kernel, and (2) free of known issues which would
+ argue against its inclusion.
+
+ (d) While I have reviewed the patch and believe it to be sound, I can not
+ (unless explicitly stated elsewhere) make any warranties or guarantees
+ that it will achieve its stated purpose or function properly in any
+ given situation.
+
+ (e) I understand and agree that this project and the contribution are
+ public and that a record of the contribution (including my Reviewed-by
+ tag and any associated public communications) is maintained
+ indefinitely and may be redistributed consistent with this project or
+ the open source license(s) involved.


[PATCH] Correct Makefile rule for generating custom keymap.

2007-10-08 Thread Maarten Bressers
When building a custom keymap, after setting GENERATE_KEYMAP := 1 in
drivers/char/Makefile, the kernel build fails like this:

  CC  drivers/char/vt.o
make[2]: *** No rule to make target `drivers/char/%.map', needed by
`drivers/char/defkeymap.c'.  Stop.
make[1]: *** [drivers/char] Error 2
make: *** [drivers] Error 2

This was caused by commit af8b128719f5248e542036ea994610a29d0642a6,
which deleted a necessary colon from the Makefile rule that generates
the keymap, since that rule contains both a target and a target-pattern.
The following patch puts the colon back:

Signed-off-by: Maarten Bressers <[EMAIL PROTECTED]>


--- a/drivers/char/Makefile 2007-10-08 23:46:47.0 +0200
+++ b/drivers/char/Makefile 2007-10-08 23:46:57.0 +0200
@@ -129,7 +129,7 @@ $(obj)/defkeymap.o:  $(obj)/defkeymap.c
 
 ifdef GENERATE_KEYMAP
 
-$(obj)/defkeymap.c $(obj)/%.c: $(src)/%.map
+$(obj)/defkeymap.c: $(obj)/%.c: $(src)/%.map
loadkeys --mktable $< > $@.tmp
sed -e 's/^static *//' $@.tmp > $@
rm $@.tmp


Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

2007-10-08 Thread Helge Hafting

Thomas Gleixner wrote:
> On Sat, 29 Sep 2007, Helge Hafting wrote:
>> Thomas Gleixner wrote:
>>>> I have gone back to 2.6.22rc4, which seems to work.
>>>>
>>>> This is a single opteron, although on a dual-slot board.
>>>
>>> Can you switch to serial console, so we can get some information out of
>>> that box? Sysrq-B is working, so we can get info from other sysrq
>>> functions as well.
>>
>> I didn't need the serial - it crashes during console work too.
>> I think a "make clean" was in progress at the time. There must be work going
>> on in order to crash.
>>
>> This time 2.6.22rc4 died on me with a general protection fault
>>
>> I got two reports, the first one scrolled partially off screen but
>> the whole trace was there:
>
> That's why I asked for a serial console. That way we can get all the
> information from the reports including the register dumps

I got another crash - with a full dump.  I have also discovered
files with lots of single-bit errors, so this is probably just some kind
of hw problem. :-(

Replace memory or the motherboard with everything on it . . . :-(

Helge Hafting



RE: parallel networking

2007-10-08 Thread Waskiewicz Jr, Peter P
> Multiply whatever effect you think you might be able to measure due to
> that on your 2 or 4 way system, and multiply it up to 64 cpus or so
> for machines I am using.  This is where machines are going, and is
> going to become the norm.

That, along with speeds going to 10 GbE with multiple Tx/Rx queues (with
40 and 100 GbE under discussion now), where multiple CPUs hitting the
driver are needed to push line rate without cratering the entire
machine.

-PJ Waskiewicz


Re: parallel networking

2007-10-08 Thread David Miller
From: jamal <[EMAIL PROTECTED]>
Date: Mon, 08 Oct 2007 18:30:18 -0400

> Very quickly there are no more packets for it to dequeue from the
> qdisc or the driver is stopped and it has to get out of there. If you
> don't have any interrupt tied to a specific cpu then you can have many
> cpus enter and leave that region all the time.

With the lock shuttling back and forth between those cpus, which is
what we're trying to avoid.

Multiply whatever effect you think you might be able to measure due to
that on your 2 or 4 way system, and multiply it up to 64 cpus or so
for machines I am using.  This is where machines are going, and is
going to become the norm.


Re: gigabit ethernet power consumption

2007-10-08 Thread Kok, Auke
Pavel Machek wrote:
> Hi!
> 
> I've found that gbit vs. 100mbit power consumption difference is about
> 1W -- pretty significant. (Maybe powertop should include it in the
> tips section? :).
> 
> Energy Star people insist that machines should switch down to 100mbit
> when network is idle, and I guess that makes a lot of sense -- you
> save 1W locally and 1W on the router.
> 
> Question is, how to implement it correctly? Daemon that would watch
> data rates and switch speeds using mii-tool would be simple, but is
> that enough?

You most certainly want to do this in userspace, I think.

One of the biggest problems is that link negotiation can take a significant
amount of time, several seconds (1 to 3 seconds typical) with gigabit, and
having your ethernet connection go offline for 3 seconds may not be the
desired effect when you want to get more bandwidth in the first place.

However, when a laptop is in battery mode, switching down from gigabit to
100mbit makes a lot more sense, so this is something I would recommend.
This can be as easy as changing the advertisement mask of the interface and
renegotiating the link. Userspace could handle that very easily.
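
A minimal sketch of the mask change, under the assumption that the ADV_* bits below mirror the ADVERTISED_* values in <linux/ethtool.h>. They are redefined locally so the snippet stands alone, and advertise_without_gigabit() is a made-up helper name.

```c
#include <stdint.h>

/* Assumed to mirror the ADVERTISED_* bits in <linux/ethtool.h>. */
#define ADV_10baseT_Half	(1u << 0)
#define ADV_10baseT_Full	(1u << 1)
#define ADV_100baseT_Half	(1u << 2)
#define ADV_100baseT_Full	(1u << 3)
#define ADV_1000baseT_Half	(1u << 4)
#define ADV_1000baseT_Full	(1u << 5)

/* Clear the gigabit modes and leave every other advertised bit alone. */
uint32_t advertise_without_gigabit(uint32_t advertising)
{
	return advertising & ~(ADV_1000baseT_Half | ADV_1000baseT_Full);
}
```

A daemon would presumably read the current mask with the ETHTOOL_GSET ioctl, write the reduced mask back with ETHTOOL_SSET with autonegotiation left enabled, and restore the saved mask when going back on AC power.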

Auke


Re: parallel networking

2007-10-08 Thread jamal
On Mon, 2007-08-10 at 14:11 -0700, David Miller wrote:

> The problem is that the packet schedulers want global guarantees
> on packet ordering, not flow centric ones.
> 
> That is the issue Jamal is concerned about.

indeed, thank you for giving it better wording. 

> The more I think about it, the more inevitable it seems that we really
> might need multiple qdiscs, one for each TX queue, to pull this full
> parallelization off.
> 
> But the semantics of that don't smell so nice either.  If the user
> attaches a new qdisc to "ethN", does it go to all the TX queues, or
> what?
> 
> All of the traffic shaping technology deals with the device as a unary
> object.  It doesn't fit to multi-queue at all.

If you let only one CPU at a time access the "xmit path" you solve all
the reordering. If you want to be more fine grained you make the
serialization point as low as possible in the stack - perhaps in the
driver.
But I think even what we have today with only one cpu entering the
dequeue/scheduler region, _for starters_, is not bad actually ;->  What
I am finding (and I can tell you I have been trying hard ;->) is that a
sufficiently fast cpu doesn't sit in the dequeue area for "too long" (and
batching reduces the time spent further). Very quickly there are no more
packets for it to dequeue from the qdisc or the driver is stopped and it
has to get out of there. If you don't have any interrupt tied to a
specific cpu then you can have many cpus enter and leave that region all
the time.

cheers,
jamal



Re: 2.6.23-rc9 compile error drivers/video/fbmon.c

2007-10-08 Thread Helge Hafting

Adrian Bunk wrote:
> On Tue, Oct 09, 2007 at 12:00:38AM +0200, Helge Hafting wrote:
>>  CC  drivers/video/fbmon.o
>> drivers/video/fbmon.c: In function ‘fb_parse_edid’:
>> drivers/video/fbmon.c:867: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
>> drivers/video/fbmon.c:867: error: ‘block’ undeclared (first use in this function)
>>
>> This line reads:
>>    unsigned char$*block;
>>
>> Source error, or is my tree simply corrupt?
>
> The $ is a space character in my copy of the tree, so it seems to be a
> corrupted tree with a bit error on your side ($ and the space character
> differ by only one bit).

I downloaded a new tree.
That file has quite a few single-bit errors in my old tree.
Mostly of the +16 variety.
Seems I have to look for bad memory :-( :-( :-(
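
The single-bit claim is easy to check: space is 0x20 and $ is 0x24, so
they differ only in bit 2. The sketch below compares a good and a
corrupted copy of the same bytes and lists every offset that is off by
exactly one bit. The function names are made up for illustration. An
error of the "+16" kind would show up as bit 4, assuming that means the
byte value is off by 16.

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where two byte values differ."""
    diff = a ^ b
    return [bit for bit in range(8) if diff & (1 << bit)]


def single_bit_flips(good, bad):
    """Return (offset, bit) for each byte that differs from the good copy by one bit."""
    flips = []
    for offset, (g, b) in enumerate(zip(good, bad)):
        bits = differing_bits(g, b)
        if len(bits) == 1:
            flips.append((offset, bits[0]))
    return flips


good = b"unsigned char *block;"
bad = b"unsigned char$*block;"
print(differing_bits(ord(" "), ord("$")))  # [2]
print(single_bit_flips(good, bad))         # [(13, 2)]
```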

Helge Hafting


Re: 2.6.23-rc9 compile error drivers/video/fbmon.c

2007-10-08 Thread Adrian Bunk
On Tue, Oct 09, 2007 at 12:00:38AM +0200, Helge Hafting wrote:
>  CC  drivers/video/fbmon.o
> drivers/video/fbmon.c: In function ‘fb_parse_edid’:
> drivers/video/fbmon.c:867: error: expected ‘=’, ‘,’, ‘;’, 
> ‘asm’ or ‘__attrib
> _’ before ‘*’ token
> drivers/video/fbmon.c:867: error: ‘block’ undeclared (first use in this 
> func
> )
>
> This line reads:
>    unsigned char$*block;
>
> Source error, or is my tree simply corrupt?

The $ is a space character in my copy of the tree, so it seems to be a 
corrupted tree with a bit error on your side ($ and the space character 
differ by only one bit).

> Helge Hafting

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: 2.6.23-rc9 compile error drivers/video/fbmon.c

2007-10-08 Thread Randy Dunlap
On Tue, 09 Oct 2007 00:00:38 +0200 Helge Hafting wrote:

>   CC  drivers/video/fbmon.o
> drivers/video/fbmon.c: In function ‘fb_parse_edid’:
> drivers/video/fbmon.c:867: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attrib
> _’ before ‘*’ token
> drivers/video/fbmon.c:867: error: ‘block’ undeclared (first use in this func
> )
> 
> This line reads:
> unsigned char$*block;
> 
> Source error, or is my tree simply corrupt?

Hi,
It's a space in my kernel source file.

---
~Randy


gigabit ethernet power consumption

2007-10-08 Thread Pavel Machek
Hi!

I've found that gbit vs. 100mbit power consumption difference is about
1W -- pretty significant. (Maybe powertop should include it in the
tips section? :).

Energy Star people insist that machines should switch down to 100mbit
when the network is idle, and I guess that makes a lot of sense -- you
save 1W locally and 1W on the router.

Question is, how to implement it correctly? A daemon that would watch
data rates and switch speeds using mii-tool would be simple, but is
that enough?
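
To make the daemon idea concrete, here is a rough sketch of just the
decision logic, under stated assumptions. SpeedGovernor, read_bytes and
all thresholds are invented for illustration, and the byte counters are
assumed to be readable from /sys/class/net/<iface>/statistics/. The
actual speed switch (mii-tool or similar) is deliberately left out. The
link drops to 100mbit only after several idle samples in a row, and
goes back up as soon as traffic gets near what 100mbit can carry.

```python
def read_bytes(iface):
    """Total rx+tx bytes for an interface (assumes the sysfs statistics files exist)."""
    total = 0
    for name in ("rx_bytes", "tx_bytes"):
        with open("/sys/class/net/%s/statistics/%s" % (iface, name)) as f:
            total += int(f.read())
    return total


class SpeedGovernor:
    """Pick 100 or 1000 (Mbit/s) from observed traffic, with hysteresis."""

    def __init__(self, low_bps=1_000_000, high_bps=50_000_000, idle_samples=3):
        self.speed = 1000
        self.low_bps = low_bps            # below this the link counts as idle
        self.high_bps = high_bps          # above this 100mbit is too tight
        self.idle_samples = idle_samples  # idle samples needed before slowing down
        self._idle_count = 0

    def update(self, rate_bps):
        """Feed one measured rate in bits per second; return the wanted speed."""
        if self.speed == 1000:
            if rate_bps < self.low_bps:
                self._idle_count += 1
            else:
                self._idle_count = 0      # any busy sample restarts the count
            if self._idle_count >= self.idle_samples:
                self.speed = 100
                self._idle_count = 0
        elif rate_bps > self.high_bps:
            self.speed = 1000             # speed up immediately when needed
        return self.speed
```

A polling loop would call read_bytes() once per interval, turn the
difference into bits per second, and renegotiate the link only when
update() returns a different speed than before. Renegotiation drops the
link for a moment, which is one reason for the hysteresis.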
Pavel 
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

