Re: [PATCH] powerpc: use is_init()

2006-12-19 Thread Akinobu Mita
On Wed, Dec 20, 2006 at 03:06:51PM +1100, Paul Mackerras wrote:
> Akinobu Mita writes:
> 
> > Use is_init() rather than hard coded pid comparison.
> 
> What's the context of this patch?  Why is this a good thing to do?
> 

This is just a minor cleanup patch.
is_init() is available in 2.6.20-rc1 (include/linux/sched.h):

/**
 * is_init - check if a task structure is init
 * @tsk: Task structure to be checked.
 *
 * Check if a task structure is the first user space task the kernel created.
 */
static inline int is_init(struct task_struct *tsk)
{
	return tsk->pid == 1;
}
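
To make the cleanup concrete, here is a small user-space sketch of the substitution. The struct below is only a stand-in for the kernel's task_struct, and describe_task() is a hypothetical caller invented for illustration; only is_init() itself comes from the header quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's task_struct; only the field used here. */
struct task_struct {
	int pid;
};

/* Same body as the helper quoted above from include/linux/sched.h. */
static inline int is_init(struct task_struct *tsk)
{
	return tsk->pid == 1;
}

/*
 * Hypothetical caller. Before the cleanup this would have read
 * "tsk->pid == 1"; after it, the intent is spelled out by name.
 */
static const char *describe_task(struct task_struct *tsk)
{
	assert(tsk != NULL);
	return is_init(tsk) ? "init" : "other";
}
```

The benefit is purely readability: if the definition of "init" ever changes, only the helper needs to be touched.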

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Changes to PM layer break userspace

2006-12-19 Thread Arjan van de Ven

> Seriously. How many pieces of userspace-visible functionality have 
> recently been removed without there being any sort of alternative?

There IS an alternative, you're using it for networking:
 
You *down the interface*.

If there's a NIC that doesn't support that let us (or preferably netdev)
know and it'll get fixed quickly I'm sure.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: problem with signal delivery SIGCHLD

2006-12-19 Thread Mike Galbraith
On Mon, 2006-12-18 at 20:05 +0100, Nicholas Mc Guire wrote:
> 
> Hi !
> 
>   I have a phenomenon that I don't quite understand. gdbserver forks and,
> after setting ptrace (PTRACE_TRACEME, 0, 0, 0); it then calls execv
> (program, allargs); when this child process hits ptrace_stop (breakpoint)
> it does the following in kernel space:
> 
> pid 1242 = child process
> pid 1241 = gdbserver
> pid 0= kernel
> pid -1   = interrupt
>  pid
>   1559  51242 ptrace_stop
>   3  6  21242 |  do_notify_parent_cldstop
>   4  3  21242 |  |  __group_send_sig_info
>   5  1  11242 |  |  |  handle_stop_signal
>   7  0  01242 |  |  |  sig_ignored
>   8  1  01242 |  |  __wake_up_sync
>   8  1  11242 |  |  |  __wake_up_common
>  105475411242 |  schedule
>  10  2  21242 |  |  profile_hit
>  13  1  11242 |  |  sched_clock
>  15  1  01242 |  |  deactivate_task
>  15  1  11242 |  |  |  dequeue_task
>  19  2  2   0 |  |  __switch_to
> ---  start --
>  24574574   0 default_idle
> ---  end 
> ---  start --
> 780 41 12   0 do_IRQ
> 780 29  2  -1 /  __do_IRQ
> ...
> 807  2  2  -1 /  /  /  enable_8259A_irq
> ---  end 
> ---  start --
> 810 11  0   0 do_softirq
> ...
> 820  0  0  -1 {  {  {  preempt_schedule
> ---  end 
> ---  start --
> 822358  1   0 preempt_schedule_irq
> ...
> 827  1  11241 %  %  __switch_to
> ---  end 
> 829  1  11241 (  (  (  del_timer
> ---  end 
> ---  start --
> 837  8  21241 sys_waitpid
> 
> So basically child signals -> delayed to next tick -> parent wakes up.

Hm.  What does the trace of gdbserver look like prior to the child doing
do_notify_parent_cldstop()?  Sleeping someplace other than wait4?
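
For anyone wanting to reproduce the parent side of this from user space, a minimal sketch follows. It deliberately leaves ptrace out and uses a plain SIGSTOP plus waitpid() with WUNTRACED, so it only exercises the "child stops, parent is woken in waitpid" path, not the ptrace_stop path from the trace. The function name observe_child_stop() is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that stops itself, wait for the stop to be reported,
 * resume the child and reap it.
 * Returns the child's exit status, or -1 on error.
 */
static int observe_child_stop(int exit_code)
{
	int status;
	pid_t pid;

	assert(exit_code >= 0 && exit_code <= 255);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		raise(SIGSTOP);		/* the parent is notified of the stop */
		_exit(exit_code);	/* runs only after SIGCONT */
	}

	/* WUNTRACED: also return for children that have stopped. */
	if (waitpid(pid, &status, WUNTRACED) != pid)
		return -1;
	if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGSTOP)
		goto fail;

	if (kill(pid, SIGCONT) < 0)
		goto fail;

	/* Without WUNTRACED this returns only when the child terminates. */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);

fail:
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return -1;
}
```

Timing the first waitpid() call in such a program would be one way to see whether the wakeup is immediate or delayed to the next tick.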

-Mike



[PATCH] kernel-doc: allow unnamed structs/unions

2006-12-19 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Make kernel-doc support unnamed (anonymous) structs and unions.
There is one (union) in include/linux/skbuff.h (inside struct sk_buff)
that is currently generating a kernel-doc warning, so this
fixes that warning.
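
For reference, this is the kind of declaration the change is meant to handle. The struct below is a hypothetical example written for this note, not the sk_buff definition; the point is that the union has no name, so kernel-doc has nothing to look up in the comment block for it, while its members are still documented individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/**
 * struct packet_buf - hypothetical buffer descriptor
 * @len: payload length in bytes
 * @csum: full checksum, shares storage with @csum_offset
 * @csum_offset: offset at which to store a checksum, shares storage with @csum
 * @priority: queueing priority
 */
struct packet_buf {
	unsigned int len;
	union {			/* unnamed (anonymous) union */
		uint32_t csum;
		uint32_t csum_offset;
	};
	uint32_t priority;
};

/* Members of the anonymous union are accessed as if they were direct fields. */
static void packet_buf_set_csum(struct packet_buf *pb, uint32_t csum)
{
	assert(pb != NULL);
	pb->csum = csum;
}
```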

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 scripts/kernel-doc |   17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- linux-2.6.20-rc1-git7.orig/scripts/kernel-doc
+++ linux-2.6.20-rc1-git7/scripts/kernel-doc
@@ -1469,6 +1469,7 @@ sub push_parameter($$$) {
my $param = shift;
my $type = shift;
my $file = shift;
+   my $anon = 0;
 
my $param_name = $param;
$param_name =~ s/\[.*//;
@@ -1484,9 +1485,20 @@ sub push_parameter($$$) {
$param="void";
$parameterdescs{void} = "no arguments";
}
+   elsif ($type eq "" && ($param eq "struct" or $param eq "union"))
+   # handle unnamed (anonymous) union or struct:
+   {
+   $type = $param;
+   $param = "{unnamed_" . $param. "}";
+   $parameterdescs{$param} = "anonymous\n";
+   $anon = 1;
+   }
+
# warn if parameter has no description
-   # (but ignore ones starting with # as these are no parameters
-   # but inline preprocessor statements
+   # (but ignore ones starting with # as these are not parameters
+   # but inline preprocessor statements);
+   # also ignore unnamed structs/unions;
+   if (!$anon) {
if (!defined $parameterdescs{$param_name} && $param_name !~ /^#/) {
 
$parameterdescs{$param_name} = $undescribed;
@@ -1500,6 +1512,7 @@ sub push_parameter($$$) {
 " No description found for parameter '$param'\n";
++$warnings;
 }
+}
 
push @parameterlist, $param;
$parametertypes{$param} = $type;


---


Re: BUG: wedged processes, test program supplied

2006-12-19 Thread Mike Galbraith
On Wed, 2006-12-20 at 01:05 -0500, Albert Cahalan wrote:
> On 12/20/06, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > On Tue, 2006-12-19 at 21:46 -0500, Albert Cahalan wrote:
> > > Somebody PLEASE try this...
> >
> > I was having enough fun with cloninator (which was whitespace munged
> > btw).
> 
> Anything stuck? Besides refusing to die, that beast slays debuggers
> left and right. I just need to add execve of /proc/self/exe and a massive
> storm of signals on the alternate stack.

Usually, I can kill the misbehaving strace or abandoned cloninators if
it decides to take a hike, but sometimes it leaves corpses lying around.

> Oh. I wanted to be sure you'd see the problem. Did you have
> some... difficulty? A plain old ^C should make things stop.
> The second test program is like the first, but missing SIGCHLD
> from the clone flags, and hopefully not whitespace-mangled.
> 
> Note that the test program is not normally a fork bomb.
> It self-limits itself to 42 tasks via a lock in shared memory.
> If things are working OK, you should see no more than
> about 60 tasks.

I didn't take any countermeasures.. had ~27000 zombies.

-Mike



Re: IO-APIC + timer doesn't work

2006-12-19 Thread Yinghai Lu

On 12/19/06, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

So the pin2 case should be tested right after the pin1 case as we do
currently.  On most new boards that will be a complete noop.

But it is better than our current blind guess at using ExtINT mode.

I figure after we try what the BIOS has told us about and that
has failed we should first try the common irq 0 apic mappings,
and then try the common ExtINT mappings.


Please check if this one is ok.
[PATCH] x86_64: check_timer with io apic setup before try_apic_pin

add io apic setup before try_apic_pin

cc: Andi Kleen <[EMAIL PROTECTED]>
cc: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 2a1dcd5..6d09fc0 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -273,10 +273,17 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 	struct irq_pin_list *entry = irq_2_pin + irq;
 
 	BUG_ON(irq >= NR_IRQS);
-	while (entry->next)
+	while (entry->next) {
+		if (entry->apic == apic && entry->pin == pin) 
+			return;
+		if (entry->pin == -1) 
+			break;
 		entry = irq_2_pin + entry->next;
+	}
 
 	if (entry->pin != -1) {
+		if (entry->apic == apic && entry->pin == pin) 
+			return;
 		entry->next = first_free_entry;
 		entry = irq_2_pin + entry->next;
 		if (++first_free_entry >= PIN_MAP_SIZE)
@@ -286,6 +293,24 @@ static void add_pin_to_irq(unsigned int irq, int apic, int pin)
 	entry->pin = pin;
 }
 
+static void remove_pin_to_irq(unsigned int irq, int apic, int pin)
+{
+	struct irq_pin_list *entry = irq_2_pin + irq;
+
+	BUG_ON(irq >= NR_IRQS);
+
+	while (entry) {
+		if (entry->apic == apic && entry->pin == pin) {
+			entry->apic = -1;
+			entry->pin = -1;
+			break;
+		}
+		if (entry->next) 
+			entry = irq_2_pin + entry->next;
+	}
+
+}
+
 
 #define DO_ACTION(name,R,ACTION, FINAL)	\
 	\
@@ -367,6 +392,34 @@ static int find_irq_entry(int apic, int pin, int type)
 	return -1;
 }
 
+static int add_irq_entry(int type, int irqflag, int bus, int irq, int apic, int pin)
+{
+	struct mpc_config_intsrc intsrc;
+	int idx;
+
+	intsrc.mpc_type = MP_INTSRC;
+	intsrc.mpc_irqflag = irqflag; /* conforming */
+	intsrc.mpc_srcbus = bus;
+	intsrc.mpc_dstapic = (apic != -1) ? mp_ioapics[apic].mpc_apicid : MP_APIC_ALL;
+
+	intsrc.mpc_irqtype = type;
+
+	intsrc.mpc_srcbusirq = irq;
+	intsrc.mpc_dstirq = pin;
+
+	mp_irqs[mp_irq_entries] = intsrc;
+	Dprintk("Int: type %d, pol %d, trig %d, bus %d,"
+		" IRQ %02x, APIC ID %x, APIC INT %02x\n",
+		intsrc.mpc_irqtype, intsrc.mpc_irqflag & 3,
+		(intsrc.mpc_irqflag >> 2) & 3, intsrc.mpc_srcbus,
+		intsrc.mpc_srcbusirq, intsrc.mpc_dstapic, intsrc.mpc_dstirq);
+	idx = mp_irq_entries;
+	if (++mp_irq_entries >= MAX_IRQ_SOURCES)
+		panic("Max # of irq sources exceeded!!\n");
+	return idx;
+
+}
+
 /*
  * Find the pin to which IRQ[irq] (ISA) is connected
  */
@@ -1570,6 +1658,22 @@ static inline void unlock_ExtINT_logic(void)
  * fanatically on his truly buggy board.
  */
 
+static void set_try_apic_pin(int apic, int pin, int type)
+{
+	int idx;
+	int irq = 0;
+	int bus = 0; /* MP_ISA_BUS */
+	int irqflag = 5; /* MP_IRQ_TRIGGER_EDGE|MP_IRQ_POLARITY_HIGH */
+
+	idx = find_irq_entry(apic,pin,type);
+
+	if (idx == -1) 
+		idx = add_irq_entry(type, irqflag, bus, irq, apic, pin);
+
+	add_pin_to_irq(irq, apic, pin);
+	setup_IO_APIC_irq(apic, pin, idx, irq);
+}
+
 static int try_apic_pin(int apic, int pin, char *msg)
 {
 	apic_printk(APIC_VERBOSE, KERN_INFO
@@ -1588,7 +1692,7 @@ static int try_apic_pin(int apic, int pin, char *msg)
 		}
 		return 1;
 	}
-	clear_IO_APIC_pin(apic, pin);
+
 	apic_printk(APIC_QUIET, KERN_ERR " .. failed\n");
 	return 0;
 }
@@ -1599,12 +1703,13 @@ static void check_timer(void)
 	int apic1, pin1, apic2, pin2;
 	int vector;
 	cpumask_t mask;
+	int i;
 
 	/*
 	 * get/set the timer IRQ vector:
 	 */
-	disable_8259A_irq(0);
 	vector = assign_irq_vector(0, TARGET_CPUS, &mask);
+	disable_8259A_irq(0);
 
 	/*
 	 * Subtle, code in do_timer_interrupt() expects an AEOI
@@ -1621,33 +1726,51 @@ static void check_timer(void)
 	pin2  = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;
 
-	/* Do this first, otherwise we get double interrupts on ATI boards */
-	if ((pin1 != -1) && try_apic_pin(apic1, pin1,"with 8259 IRQ0 disabled"))
-		return;
+	apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
+		vector, apic1, pin1, apic2, pin2);
 
-	/* Now try again with IRQ0 8259A enabled.
-	   Assumes timer is on IO-APIC 0 ?!? */
-	enable_8259A_irq(0);
-	unmask_IO_APIC_irq(0);
-	if (try_apic_pin(apic1, pin1, "with 8259 IRQ0 enabled"))
-		return;
-	disable_8259A_irq(0);
+	if (pin1 != -1) {
+		/* Do this first, otherwise we get double interrupts on ATI boards */
+		/* 

Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine

2006-12-19 Thread Chuck Ebbert
In-Reply-To: <[EMAIL PROTECTED]>

On Tue, 19 Dec 2006 17:29:00 -0800, Andrew Morton wrote:

> Quoting the bug report:

> general protection fault: 013b [1] PREEMPT 

That '013b' is critical information.

Bit 0: 1: exception source is external to the processor
Bit 1: 1: there is a problem with an interrupt descriptor in the IDT
Bit 2: n/a
Bits 15-3: index of the problem descriptor

So an external interrupt occurred, the system tried to use interrupt
descriptor #39 decimal (irq 7), but the descriptor was invalid.
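
The arithmetic can be checked with a few lines of C. This is only a sketch of the decoding described above; the struct and function names are invented here, and the vector-to-IRQ step assumes the legacy PC layout in which IRQ n is delivered on vector 0x20 + n.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a selector-style exception error code, as listed above. */
struct sel_error_code {
	unsigned int ext;	/* bit 0: source is external to the processor */
	unsigned int idt;	/* bit 1: index refers to the IDT */
	unsigned int ti;	/* bit 2: not applicable when idt is set */
	unsigned int index;	/* bits 15-3: index of the descriptor */
};

static struct sel_error_code decode_error_code(uint16_t code)
{
	struct sel_error_code e;

	e.ext = code & 1;
	e.idt = (code >> 1) & 1;
	e.ti = (code >> 2) & 1;
	e.index = code >> 3;
	return e;
}

/* Assumption: legacy mapping, IRQ n on vector 0x20 + n. Returns -1 otherwise. */
static int vector_to_legacy_irq(unsigned int vector)
{
	assert(vector < 256);
	if (vector >= 0x20 && vector < 0x30)
		return (int)vector - 0x20;
	return -1;
}
```

Feeding 0x013b through decode_error_code() gives ext = 1, idt = 1 and index = 39, and vector 39 maps to IRQ 7 under the stated assumption.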
-- 
MBTI: IXTP



Re: util-linux: orphan

2006-12-19 Thread Albert Cahalan

Karel Zak writes:


I originally thought about forking util-linux upstream,
but a fork is usually a bad step. So I'd like to start
some discussion before taking this step.

...

After a few weeks I'm pleased to announce a new "util-linux-ng"
project. This project is a fork of the original util-linux (2.13-pre7).


Aw damn, I missed it again. LKML gets about 300 posts/day. The last
time util-linux was offered, I missed out. Bummer.

Well, how about giving me a chunk of it? I'd like /bin/kill please.
I already ship a nicer one in procps anyway, so you can just delete
the files and call that done. (just today I was working on a Fedora
system and /bin/kill annoyed me)

VERY STRONG SUGGESTION: build a full test suite before you mess with
the source. This isn't some cute toy like xeyes or a silly game.
This is util-linux, which MUST work.


[PATCH] fdtable: Provide free_fdtable() wrapper.

2006-12-19 Thread Vadim Lobanov
Hi,

Christoph Hellwig has expressed concerns that the recent fdtable changes
expose the details of the RCU methodology used to release no-longer-used
fdtable structures to the rest of the kernel. The trivial patch below
addresses these concerns by introducing the appropriate free_fdtable() calls,
which simply wrap the release RCU usage. Since free_fdtable() is a one-liner,
it makes sense to promote it to an inline helper.
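
The shape of the change can be tried out in user space with the sketch below. Everything RCU-related in it is a mock written for this illustration: the call_rcu() here runs the callback immediately instead of after a grace period, the structures carry only the fields needed, and freed_tables is a counter added purely so the effect is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-ins; the real rcu_head and call_rcu() live in the kernel. */
struct rcu_head {
	void (*func)(struct rcu_head *);
};

struct fdtable {
	unsigned int max_fds;
	struct rcu_head rcu;
};

static int freed_tables;	/* demonstration only */

/* Mock: invokes the callback right away, with no grace period. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	head->func = func;
	func(head);
}

static void free_fdtable_rcu(struct rcu_head *rcu)
{
	/* Recover the enclosing fdtable from its embedded rcu_head. */
	struct fdtable *fdt =
		(struct fdtable *)((char *)rcu - offsetof(struct fdtable, rcu));

	free(fdt);
	freed_tables++;
}

/* The wrapper: callers no longer need to know that RCU is involved. */
static inline void free_fdtable(struct fdtable *fdt)
{
	assert(fdt != NULL);
	call_rcu(&fdt->rcu, free_fdtable_rcu);
}
```

With the wrapper in place, a later change to how fdtables are released only has to touch free_fdtable() and its callback.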

Please apply.

Signed-off-by: Vadim Lobanov <[EMAIL PROTECTED]>

diff -pru old/fs/file.c new/fs/file.c
--- old/fs/file.c   2006-12-19 19:54:23.0 -0800
+++ new/fs/file.c   2006-12-19 20:04:02.0 -0800
@@ -206,7 +206,7 @@ static int expand_fdtable(struct files_s
copy_fdtable(new_fdt, cur_fdt);
rcu_assign_pointer(files->fdt, new_fdt);
if (cur_fdt->max_fds > NR_OPEN_DEFAULT)
-   call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
+   free_fdtable(cur_fdt);
} else {
/* Somebody else expanded, so undo our attempt */
free_fdarr(new_fdt);
diff -pru old/include/linux/file.h new/include/linux/file.h
--- old/include/linux/file.h   2006-12-19 19:54:25.0 -0800
+++ new/include/linux/file.h   2006-12-19 20:03:19.0 -0800
@@ -80,6 +80,11 @@ extern int expand_files(struct files_str
 extern void free_fdtable_rcu(struct rcu_head *rcu);
 extern void __init files_defer_init(void);
 
+static inline void free_fdtable(struct fdtable *fdt)
+{
+   call_rcu(&fdt->rcu, free_fdtable_rcu);
+}
+
 static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
 {
struct file * file = NULL;
diff -pru old/kernel/exit.c new/kernel/exit.c
--- old/kernel/exit.c   2006-12-19 19:54:52.0 -0800
+++ new/kernel/exit.c   2006-12-19 20:04:20.0 -0800
@@ -466,7 +466,7 @@ void fastcall put_files_struct(struct fi
fdt = files_fdtable(files);
if (fdt != &files->fdtab)
kmem_cache_free(files_cachep, files);
-   call_rcu(&fdt->rcu, free_fdtable_rcu);
+   free_fdtable(fdt);
}
 }
 




downloading kernels w/ metalink (mirrors, checksums, signatures)

2006-12-19 Thread meta link

Hi,

This may not be as nice for kernels as for other downloads because of
how nicely organized the kernel mirrors are, but maybe some people
will be interested.

Metalink is a system which attempts to improve the download process by
increasing availability and guaranteeing integrity. It can give your
users a more reliable download by providing multiple links to the same
file, which can be switched to if one server is down or fails during
transmission. It can also make downloads faster by using multiple
resources at once.

Metalink lists mirrors with machine readable information on priority
and location so their efficient use can be automated by download
programs. It can list mirrors around the world, but will automatically
default to mirrors closer to you and by priority. The checksum
verification process, usually manual and arcane to most people, is
automated with Metalink, so the file you download is verified to be an
exact copy of the published one, free of errors. Metalinks can also contain
publisher information, Operating System and architecture, language,
file descriptions, multiple files (to be added to a download queue),
partial file checksums, and so on. All this extra information allows
download programs to do interesting things.

Linux Kernel Metalink downloads (All):
http://download.packages.ro/metalink/kernel/
More details..."Downloading bliss with Metalink":
http://www.linux.com/article.pl?sid=06/11/01/1641247

Partial example .metalink (the XML markup did not survive archiving; the
remaining values are listed in their original order):

http://www.metalinker.org/
origin="http://prog.infosnel.nl/metalinks/kernel.php/kernel/v2.6/linux-2.6.19.tar.bz2.metalink"
generator="http://prog.infosnel.nl/metalinks/kernel.php"

Kernel.org
http://kernel.org/

2.6.19
443c265b57e87eadc0c677c3acc37e20

http://www.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2
ftp://ftp.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2
http://www.aq.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2
http://www.ag.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2

http://www.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign
ftp://ftp.al.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign
http://www.aq.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign
http://www.ag.kernel.org/pub/linux/kernel/v2.6/linux-2.6.19.tar.bz2.sign


A real .metalink would list all mirrors.

Metalink is supported by download managers on Mac, Unix, and Windows.

aria2 ( http://aria2.sourceforge.net/ ) is a really nice command line
client. You can use command line options to default to mirrors in a
certain country (--metalink-location=XX) and other things.

The main users of metalink are OpenOffice.org, openSUSE, Arch Linux,
and other Linux distributions for ISO downloads.

(( Anthony Bryan
)) Metalink [ http://www.metalinker.org ]


Re: [-mm patch] make uio_irq_handler() static

2006-12-19 Thread Greg KH
On Sat, Dec 16, 2006 at 02:56:54PM +0100, Adrian Bunk wrote:
> On Thu, Dec 14, 2006 at 10:59:13PM -0800, Andrew Morton wrote:
> >...
> > Changes since 2.6.19-mm1:
> >...
> > +gregkh-driver-uio-irq.patch
> > 
> >  driver tree updates
> >...
> 
> This patch makes the needlessly global uio_irq_handler() static.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> --- linux-2.6.20-rc1-mm1/drivers/uio/uio_irq.c.old	2006-12-15 22:23:23.0 +0100
> +++ linux-2.6.20-rc1-mm1/drivers/uio/uio_irq.c	2006-12-15 22:33:40.0 +0100
> @@ -22,7 +22,7 @@
>  
>  static struct uio_device *uio_irq_idev;
>  
> -irqreturn_t uio_irq_handler(int irq, void *dev_id)
> +static irqreturn_t uio_irq_handler(int irq, void *dev_id)
>  {
>   return IRQ_HANDLED;
>  }

Thanks, I've applied this to my tree.

greg k-h


Re: BUG: wedged processes, test program supplied

2006-12-19 Thread Albert Cahalan

On 12/20/06, Mike Galbraith <[EMAIL PROTECTED]> wrote:

On Tue, 2006-12-19 at 21:46 -0500, Albert Cahalan wrote:
> Somebody PLEASE try this...

I was having enough fun with cloninator (which was whitespace munged
btw).


Anything stuck? Besides refusing to die, that beast slays debuggers
left and right. I just need to add execve of /proc/self/exe and a massive
storm of signals on the alternate stack.

In the original post, I also mangled the recommended ps command:
ps -Ccloninator
-mwostat,ppid,pid,tid,nlwp,pending,sigmask,sigignore,caught,wchan

Leave out pid,tid,nlwp if you need to save screen space, like so:
ps -Ccloninator -mwostat,ppid,pending,sigmask,sigignore,caught,wchan

(note: procps versions prior to 3.2.7 are mostly fine, but will mess
up the PENDING column for any single-threaded processes you get)

This is fun to look at:
watch ps -Ccloninator fostat,ppid,wchan:9,comm


> Normally, when a process dies it becomes a zombie.
> If the parent dies (before or after the child), the child
> is adopted by init. Init will reap the child.
>
> The program included below DOES NOT get reaped.

While true wasn't a great test recommendation :)


Oh. I wanted to be sure you'd see the problem. Did you have
some... difficulty? A plain old ^C should make things stop.
The second test program is like the first, but missing SIGCHLD
from the clone flags, and hopefully not whitespace-mangled.

Note that the test program is not normally a fork bomb.
It self-limits itself to 42 tasks via a lock in shared memory.
If things are working OK, you should see no more than
about 60 tasks.
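
For readers without the test program at hand, the self-limiting idea can be sketched roughly as follows. This is not the cloninator code: it swaps the lock for a C11 atomic counter in an anonymous shared mapping, and the names task_limit_create(), task_limit_acquire() and task_limit_release() are made up here. A task would call acquire before each clone/fork and release when a child goes away.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Lives in memory shared by every task that takes part. */
struct task_limit {
	atomic_int in_use;
	int max;
};

/* Returns NULL if the shared mapping cannot be created. */
static struct task_limit *task_limit_create(int max)
{
	struct task_limit *tl;

	assert(max > 0);
	tl = mmap(NULL, sizeof(*tl), PROT_READ | PROT_WRITE,
		  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (tl == MAP_FAILED)
		return NULL;
	atomic_init(&tl->in_use, 0);
	tl->max = max;
	return tl;
}

/* Returns 1 if a slot was taken, 0 if the limit has been reached. */
static int task_limit_acquire(struct task_limit *tl)
{
	int cur = atomic_load(&tl->in_use);

	while (cur < tl->max) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&tl->in_use, &cur, cur + 1))
			return 1;
	}
	return 0;
}

static void task_limit_release(struct task_limit *tl)
{
	atomic_fetch_sub(&tl->in_use, 1);
}
```

Because the mapping is MAP_SHARED, children created after task_limit_create() see the same counter as the parent.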


Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Jari Sundell

On 12/20/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:

On Tue, 19 Dec 2006, Linus Torvalds wrote:
>
>  here's a totally new tangent on this: it's possible that user code is
> simply BUGGY.

Btw, here's a simpler test-program that actually shows the difference
between 2.6.18 and 2.6.19 in action, and why it could explain why a
program like rtorrent might show corruption behaviour that it didn't show
before.


Kinda late to the discussion, but I guess I could summarize what
rtorrent actually does, or should be doing.

When downloading a new torrent, it will create the files and truncate
them to the final size. It will never call truncate after this and the
files will remain sparse until data is downloaded. A 'piece' is mapped
to memory using MAP_SHARED, which will be page aligned on single file
torrents but unlikely to be so on multi-file torrents.

So on multi-file torrents it'll often end up with two mappings
overlapping on one page, each of which only writes to its own part
of the page. These will then be sync'ed with MS_ASYNC, or MS_SYNC if low
on disk space. After that it might be unmapped, then mapped as
read-only.
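
A stripped-down sketch of that access pattern, in case it helps anyone build a test case. It is not rtorrent code: write_piece(), check_piece() and demo_shared_page() are names invented for this example, the piece sizes are tiny, and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the page-aligned range that covers [off, off + len). */
static char *map_piece(int fd, off_t off, size_t len, int prot, size_t *span)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t start = off - (off % page);
	char *map;

	*span = (size_t)(off - start) + len;
	map = mmap(NULL, *span, prot, MAP_SHARED, fd, start);
	return map == MAP_FAILED ? NULL : map + (off - start);
}

/* Unmap a region obtained from map_piece(). */
static int unmap_piece(char *piece, off_t off, size_t span, size_t len)
{
	(void)off;
	return munmap(piece - (span - len), span);
}

/* Write one piece through its own MAP_SHARED mapping, then MS_ASYNC it. */
static int write_piece(int fd, off_t off, const void *data, size_t len)
{
	size_t span;
	char *p;

	assert(len > 0);
	p = map_piece(fd, off, len, PROT_READ | PROT_WRITE, &span);
	if (!p)
		return -1;
	memcpy(p, data, len);	/* touches only this piece's bytes */
	if (msync(p - (span - len), span, MS_ASYNC) < 0) {
		unmap_piece(p, off, span, len);
		return -1;
	}
	return unmap_piece(p, off, span, len);
}

/* Re-map the piece read-only and compare it with the expected bytes. */
static int check_piece(int fd, off_t off, const void *expect, size_t len)
{
	size_t span;
	char *p = map_piece(fd, off, len, PROT_READ, &span);
	int same;

	if (!p)
		return -1;
	same = memcmp(p, expect, len) == 0;
	unmap_piece(p, off, span, len);
	return same ? 0 : -1;
}

/* Two pieces sharing one page of a sparse file, written separately. */
static int demo_shared_page(void)
{
	char path[] = "/tmp/pieceXXXXXX";
	int fd = mkstemp(path);
	int ok = -1;

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, 4096) == 0 &&		/* created at final size */
	    write_piece(fd, 0, "AAAA", 4) == 0 &&
	    write_piece(fd, 4, "BBBB", 4) == 0 &&
	    check_piece(fd, 0, "AAAABBBB", 8) == 0)
		ok = 0;
	close(fd);
	return ok;
}
```

On a kernel without the corruption, demo_shared_page() should return 0; a real reproducer would need larger pieces, memory pressure and many iterations.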

I haven't thought of asking if single file torrents are ok.

Rakshasa


Re: Changes to PM layer break userspace

2006-12-19 Thread Matthew Garrett
On Tue, Dec 19, 2006 at 09:34:17PM -0800, Greg KH wrote:

> I would be very interested to see any newer SuSE programs using that
> interface.  Just point them out to me and I'll quickly fix them.

As far as I can tell, powersaved still uses these.. I'm not quite sure 
how you can fix it without just removing the functionality from it...

> And yes, as a SuSE developer (and one of the people in charge of the
> SuSE kernels), I have no problem with these files just going away.
> Because, as David keeps repeating, they are broken and wrong.

In the common case, it works perfectly well for the management of 
individual PCI devices. Yes it's "wrong", in much the same way as (say) 
the IDE bus registration/unregistration code. But we keep that around 
because despite it being even more broken than devices/.../power/state, 
people are still actually using it and we haven't provided any sort of 
alternative.

Seriously. How many pieces of userspace-visible functionality have 
recently been removed without there being any sort of alternative?
-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: [PATCH] procfs: export context switch counts in /proc/*/stat

2006-12-19 Thread Albert Cahalan

David Wragg writes:

Benjamin LaHaise <[EMAIL PROTECTED]> writes:

On Mon, Dec 18, 2006 at 11:50:08PM +, David Wragg wrote:



This patch (against 2.6.19/2.6.19.1) adds the four context
switch values (voluntary context switches, involuntary
context switches, and the same values accumulated from
terminated child processes) to the end of /proc/*/stat,
similarly to min_flt, maj_flt and the time used values.


Hmmm, OK, do people have a use for these values?


Please put these into new files, as the stat files in /proc are
horribly overloaded and have always been somewhat problematic
when it comes to changing how things are reported due to internal
changes to the kernel.  Cheers,


No thanks. Yours truly, the maintainer of "ps", "top", "vmstat", etc.


The delay accounting value was added to the end of /proc/pid/stat back
in July without discussion, so I assumed this approach was still
considered satisfactory.


/proc/*/stat is the very best place in /proc for any per-process
data that will be commonly needed. Unlike /proc/*/status, few
people are tempted to screw with the formatting and/or spelling.
Unlike the /sys crap, it doesn't take 3 syscalls PER VALUE to
get at the data.

The things to ask are of course: will this really be used, and
does it really belong in /proc at all?


Putting just these four values into a new file would seem a little
odd, since they have a lot in common with the other getrusage values
that are already in /proc/pid/stat.  One possibility is to add
/proc/pid/rusage, mirroring the full struct rusage in text form, since
struct rusage is already part of the kernel ABI (though Linux doesn't
fill in half of the values).
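
For comparison, the existing getrusage() call already has fields for these counters. The sketch below shows how a program could read them for itself or its reaped children, assuming a kernel that fills in ru_nvcsw and ru_nivcsw; the wrapper struct and function name are invented for this example. Note that getrusage() cannot query an arbitrary pid, which is the gap the /proc proposal addresses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

struct ctx_switches {
	long voluntary;		/* from ru_nvcsw */
	long involuntary;	/* from ru_nivcsw */
};

/*
 * who is RUSAGE_SELF or RUSAGE_CHILDREN.
 * Returns 0 on success, -1 if getrusage() fails.
 */
static int read_ctx_switches(int who, struct ctx_switches *out)
{
	struct rusage ru;

	assert(out != NULL);
	if (getrusage(who, &ru) < 0)
		return -1;
	out->voluntary = ru.ru_nvcsw;
	out->involuntary = ru.ru_nivcsw;
	return 0;
}
```

RUSAGE_CHILDREN corresponds to the "accumulated from terminated child processes" values mentioned in the patch description.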


Since we already have a struct defined and all...

sys_get_rusage(int pid)


Or perhaps it makes sense to reorganize all the values from
/proc/pid/stat and its siblings into a sysfs-like one-value-per-file
structure, though that might introduce atomicity and efficiency issues
(calculating some of the values involves iterating over the threads in
the process; with everything in one file, these loops are folded
together).


Yeah, big time. Things are quite bad in /proc, but /sys is a joke.


Re: BUG: wedged processes, test program supplied

2006-12-19 Thread Mike Galbraith
On Tue, 2006-12-19 at 21:46 -0500, Albert Cahalan wrote:
> Somebody PLEASE try this...

I was having enough fun with cloninator (which was whitespace munged
btw).

> Normally, when a process dies it becomes a zombie.
> If the parent dies (before or after the child), the child
> is adopted by init. Init will reap the child.
> 
> The program included below DOES NOT get reaped.

While true wasn't a great test recommendation :)

-Mike



Re: [2.6 patch] powerpc: remove the broken Gemini support

2006-12-19 Thread Paul Mackerras
Roman Zippel writes:

> Well, there are still patches unmerged for over a year, they probably still 
> apply mostly.

Please rebase and repost them, if you want them to go in.

Paul.


Re: Changes to PM layer break userspace

2006-12-19 Thread Greg KH
On Tue, Dec 19, 2006 at 09:14:49PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 8:26 pm, Matthew Garrett wrote:
> > On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:
> > It's perfectly reasonable to  
> > refer to it as a flawed interface, or perhaps even a buggy one. But in 
> > itself, it's clearly not a bug.
> 
> This class of bug is also called a "design bug" or sometimes "mistake".

Exactly, those "power" files actually pre-date the actual tree of
devices itself.  They were just holders for what the original developer
thought was going to be needed, but was never properly implemented due
to some job changes (note, this was not myself...)

> > > In contrast, the /sys/devices/.../power/state API has never had many
> > > users beyond developers trying to test their drivers (without taking
> > > the whole system into a low power state, which probably didn't work
> > > in any case), and has *always* been problematic.  And the change you
> > > object to doesn't "break" anything fundamental, either.  Everything
> > > still works.
> > 
> > It's used on every Ubuntu and Suse system,
> 
> Odd how the relevant Suse developers didn't mention any issues with
> those files going away, any of the times problems with them were
> discussed on the PM list.  Also, I have a Suse system that doesn't
> use those files for anything ... maybe only newer releases use it.

I would be very interested to see any newer SuSE programs using that
interface.  Just point them out to me and I'll quickly fix them.

And yes, as a SuSE developer (and one of the people in charge of the
SuSE kernels), I have no problem with these files just going away.
Because, as David keeps repeating, they are broken and wrong.

thanks,

greg k-h


Re: Bug 7596 - Potential performance bottleneck for Linux TCP

2006-12-19 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 19 Dec 2006 21:11:24 -0800

> It was the realtime/normal comments that piqued my interest.
> Perhaps we should either tweak process priority or remove
> the comments.

I mentioned that to Linus once and he said the entire
idea was bogus.

With the recent tcp_recvmsg() preemption issue thread,
I agree with his sentiments even more than I did previously.

What needs to happen is to liberate the locking so that
input packet processing can occur in parallel with
tcp_recvmsg(), instead of doing this bogus backlog thing
which can wedge TCP ACK processing for an entire quantum
if we take a kernel preemption while the process has the
socket lock held.


Re: Changes to PM layer break userspace

2006-12-19 Thread David Brownell
On Tuesday 19 December 2006 8:26 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:

> The existence of the power/state interface wasn't a bug - it was a 
> deliberate decision to add it. It's the only reason the 
> dpm_runtime_suspend() interface exists. 

All that buggy infrastructure talks together, yes.  Those dpm_*()
calls are in the same "will remove" task item.


> It's perfectly reasonable to  
> refer to it as a flawed interface, or perhaps even a buggy one. But in 
> itself, it's clearly not a bug.

This class of bug is also called a "design bug" or sometimes "mistake".


> > In contrast, the /sys/devices/.../power/state API has never had many
> > users beyond developers trying to test their drivers (without taking
> > the whole system into a low power state, which probably didn't work
> > in any case), and has *always* been problematic.  And the change you
> > object to doesn't "break" anything fundamental, either.  Everything
> > still works.
> 
> It's used on every Ubuntu and Suse system,

Odd how the relevant Suse developers didn't mention any issues with
those files going away, any of the times problems with them were
discussed on the PM list.  Also, I have a Suse system that doesn't
use those files for anything ... maybe only newer releases use it.

I've got some Ubuntu going too, which hasn't (visibly) suffered from
any of these changes.

- dave


Re: Bug 7596 - Potential performance bottleneck for Linxu TCP

2006-12-19 Thread Stephen Hemminger
On Tue, 19 Dec 2006 18:55:25 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Wed, 20 Dec 2006 10:52:19 +1100
> 
> > Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > > I noticed this bit of discussion in tcp_recvmsg. It implies that a better
> > > queuing policy would be good. But it is confusing English (Alexey?) so
> > > not sure where to start.
> > 
> > Actually I think the comment says that the current code isn't the
> > most elegant but is more efficient.
> 
> It's just explaining the hierarchy of queues that need to
> be purged, and in what order, for correctness.
> 
> Alexey added that code when I mentioned to him, right after
> we added the prequeue, that it was possible to process the
> normal backlog before the prequeue, which is illegal.
> In fixing that bug, he added the comment we are discussing.

It was the realtime/normal comments that piqued my interest.
Perhaps we should either tweak process priority or remove
the comments.


[PATCH 2/2] Update feature-removal-schedule.txt

2006-12-19 Thread Matthew Garrett
Add pm_has_noirq_stage to feature-removal-schedule as part of the 
/sys/devices/.../power/state removal. Also note that this functionality 
won't be removed until alternative functionality is implemented, in 
order to avoid having this argument again in July.

Signed-off-by: Matthew Garrett <[EMAIL PROTECTED]>

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 30f3c8c..8a91689 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -9,7 +9,8 @@ be removed from this file.
 What:	/sys/devices/.../power/state
 	dev->power.power_state
 	dpm_runtime_{suspend,resume)()
-When:	July 2007
+	bus->pm_has_noirq_stage()
+When:	Once alternative functionality has been implemented
 Why:	Broken design for runtime control over driver power states, confusing
 	driver-internal runtime power management with:  mechanisms to support
 	system-wide sleep state transitions; event codes that distinguish

-- 
Matthew Garrett | [EMAIL PROTECTED]


[PATCH 1/2] Fix /sys/device/.../power/state

2006-12-19 Thread Matthew Garrett
Recent changes in the PM system made it impossible to perform runtime 
suspend of any PCI or platform devices. This patch restores the 
functionality for any devices that don't require any of their suspend or 
resume code to be run with interrupts disabled.

Signed-off-by: Matthew Garrett <[EMAIL PROTECTED]>

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..6bf1218 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -597,6 +597,16 @@ static int platform_resume(struct device * dev)
 	return ret;
 }
 
+static int platform_pm_has_noirq_stage(struct device * dev)
+{
+	int ret = 0;
+	struct platform_driver *drv = to_platform_driver(dev->driver);
+
+	if (dev->driver && (drv->resume_early || drv->suspend_late))
+		ret = 1;
+	return ret;
+}
+
 struct bus_type platform_bus_type = {
 	.name		= "platform",
 	.dev_attrs	= platform_dev_attrs,
@@ -606,6 +616,7 @@ struct bus_type platform_bus_type = {
 	.suspend_late	= platform_suspend_late,
 	.resume_early	= platform_resume_early,
 	.resume		= platform_resume,
+	.pm_has_noirq_stage = platform_pm_has_noirq_stage,
 };
 EXPORT_SYMBOL_GPL(platform_bus_type);
 
diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 2d47517..03d3f81 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
 	int error = -EINVAL;
 
 	/* disallow incomplete suspend sequences */
-	if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
+	if (dev->bus && dev->bus->pm_has_noirq_stage
+		&& dev->bus->pm_has_noirq_stage(dev))
 		return error;
 
 	state.event = PM_EVENT_SUSPEND;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e5ae3a0..c0e4e7a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -351,6 +351,17 @@ static int pci_device_resume(struct device * dev)
 	return error;
 }
 
+static int pci_device_pm_has_noirq_stage(struct device * dev)
+{
+	int error = 0;
+	struct pci_dev * pci_dev = to_pci_dev(dev);
+	struct pci_driver * drv = pci_dev->driver;
+
+	if (drv && (drv->resume_early || drv->suspend_late))
+		error = 1;
+	return error;
+}
+
 static int pci_device_resume_early(struct device * dev)
 {
 	int error = 0;
@@ -569,6 +580,7 @@ struct bus_type pci_bus_type = {
 	.suspend_late	= pci_device_suspend_late,
 	.resume_early	= pci_device_resume_early,
 	.resume		= pci_device_resume,
+	.pm_has_noirq_stage = pci_device_pm_has_noirq_stage,
 	.shutdown	= pci_device_shutdown,
 	.dev_attrs	= pci_dev_attrs,
 };
diff --git a/include/linux/device.h b/include/linux/device.h
index 49ab53c..1c663c4 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -59,6 +59,7 @@ struct bus_type {
 	int (*suspend)(struct device * dev, pm_message_t state);
 	int (*suspend_late)(struct device * dev, pm_message_t state);
 	int (*resume_early)(struct device * dev);
+	int (*pm_has_noirq_stage)(struct device * dev);
 	int (*resume)(struct device * dev);
 };
-- 
Matthew Garrett | [EMAIL PROTECTED]
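
The difference between the old and new tests in state_store() can be modelled in userspace roughly as follows. This is a sketch under simplifying assumptions: the toy_* structures and helpers are stand-ins invented for illustration, not the definitions in include/linux/device.h, and the callbacks take fewer arguments than the real ones.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Simplified stand-in for struct bus_type. */
struct toy_bus {
    int (*suspend_late)(struct toy_device *dev);
    int (*resume_early)(struct toy_device *dev);
    int (*pm_has_noirq_stage)(struct toy_device *dev);
};

/* Simplified stand-in for a bus-specific driver (pci_driver, platform_driver). */
struct toy_driver {
    int (*suspend_late)(struct toy_device *dev);
    int (*resume_early)(struct toy_device *dev);
};

struct toy_device {
    struct toy_bus *bus;
    struct toy_driver *driver;
};

/* Old test: refuse whenever the bus type provides the late/early hooks at all. */
static int old_check_blocks(struct toy_device *dev)
{
    assert(dev != NULL);
    return dev->bus && (dev->bus->suspend_late || dev->bus->resume_early);
}

/* New test: ask the bus whether this device's driver actually uses them. */
static int new_check_blocks(struct toy_device *dev)
{
    assert(dev != NULL);
    return dev->bus && dev->bus->pm_has_noirq_stage
        && dev->bus->pm_has_noirq_stage(dev);
}

/* Bus-side helper in the spirit of pci_device_pm_has_noirq_stage(). */
static int toy_bus_has_noirq_stage(struct toy_device *dev)
{
    struct toy_driver *drv = dev->driver;

    return drv && (drv->resume_early || drv->suspend_late);
}

/* Placeholder callback so the function pointers have something to point at. */
static int toy_noop(struct toy_device *dev)
{
    (void)dev;
    return 0;
}
```

With a bus that sets both hooks, old_check_blocks() rejects every device on it, while new_check_blocks() rejects only devices whose driver sets suspend_late or resume_early.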


Re: [PATCH] powerpc: use is_init()

2006-12-19 Thread Paul Mackerras
Akinobu Mita writes:

> Use is_init() rather than hard coded pid comparison.

What's the context of this patch?  Why is this a good thing to do?

Doing a git grep -w is_init on Linus' current git tree reveals an
is_init() in arch/parisc/kernel/module.c, which looks to be something
different, but no generic definition of an is_init() function or
macro.

Paul.


Re: [patch 0/2] more patches for removable drive bay

2006-12-19 Thread Len Brown
Thanks for removing the new procfs code Kristen.

applied.
-Len

On Saturday 16 December 2006 17:40, Kristen Carlson Accardi wrote:
> Hi Len,
> Here's a set of patches for changing the removable drive bay driver
> (drivers/acpi/bay) from using the old proc interface to using a sysfs
> interface instead.  I made the bay driver a platform driver, and 
> so its entries will now be located in /sys/devices/platform/bay.X.
> There are still 2 entries - one for checking whether the bay is
> present (present) that is read only, and one that is write only for
> ejecting the bay (eject).  Let me know if you would prefer me to fold
> these into the original bay driver patch.
> 
> Thanks,
> Kristen
> --


RE: GPL only modules [was Re: [GIT PATCH] more Driver core patches for 2.6.19]

2006-12-19 Thread Steven Rostedt
On Sun, 2006-12-17 at 11:11 +0100, Geert Uytterhoeven wrote:
> On Thu, 14 Dec 2006, David Schwartz wrote:

> > That makes it clear that it's not about giving us the fruits of years of
> > your own work but that it's about enabling us to do our own work. (I would
> > have no objection to also requiring them to provide a minimal open-source
> > driver. I'm not trying to work out the exact terms here, just get the idea
> > out.)
> 
> Since `works with' may sound a bit too vague, something like
> `LinuxFriendly(tm)', with a happy penguin logo?
> 

I've bought a couple of products lately that had the happy penguin logo
on them, just to find out that they only supported the bare minimum
functionality of the device on Linux. If you want more, you need to
plug it into a Windows box.

Funny, if you own a Mac, it had the same problem. It had a little more
functionality than the Linux port, but still far from what they give for
Windows.

I like the Open Hardware thing that Paolo mentioned.

-- Steve



Re: Changes to PM layer break userspace

2006-12-19 Thread Matthew Garrett
On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 4:25 pm, Matthew Garrett wrote:
> > 1) feature-removal-schedule.txt says that it'll be removed in July 2007. 
> > This isn't July 2007.
> 
> Which is why the functionality is still there.

Merely broken in the majority of cases...

> > 2) The functionality was disabled in 2.6.19. The addition to 
> > feature-removal-schedule.txt was in, uh, 2.6.19.
> 
> Please respond to the technical explanation I provided, and stop
> referring to the functionality ** which is still there and works **
> as being disabled.

The breakage is that devices that are happy to suspend with enabled 
interrupts can no longer be suspended from userspace. Refusing to 
suspend a single device on the basis that some other driver on the bus 
may, potentially, at some point require some suspend code to be run with 
disabled interrupts is not a sensible choice. Especially since I can't 
actually find a single driver in the kernel tree that currently uses 
this functionality.

> I can't help it if that schedule.txt patch took until 2.6.19 to get
> upstream; ISTR it was available before 2.6.18 shipped.  Maybe patches
> to that file should be accelerated, even into the stable series.

That would still not have provided anywhere near enough warning. 

> One of the missing steps in Linus' formulation there is that not all
> interfaces are equivalent in terms of support guarantee.  Bugs are
> interfaces, for example, and sometimes folk wrongly depend on them
> when they persist for a long time (like, cough, this one).

The existence of the power/state interface wasn't a bug - it was a 
deliberate decision to add it. It's the only reason the 
dpm_runtime_suspend() interface exists. It's perfectly reasonable to 
refer to it as a flawed interface, or perhaps even a buggy one. But in 
itself, it's clearly not a bug. And it's perfectly reasonable for 
userland to depend on interfaces that are deliberately exposed by the 
kernel.

> In contrast, the /sys/devices/.../power/state API has never had many
> users beyond developers trying to test their drivers (without taking
> the whole system into a low power state, which probably didn't work
> in any case), and has *always* been problematic.  And the change you
> object to doesn't "break" anything fundamental, either.  Everything
> still works.

It's used on every Ubuntu and Suse system, and the change means that 
certain functionality no longer works - it's now impossible to prevent 
my wireless hardware from drawing power when I'm not using it, for 
example. If the WE power operations were deliberately disabled, then 
that would also be a bug.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: [PATCH 3/4] Add driver for OHCI firewire host controllers.

2006-12-19 Thread Robert Hancock

Kristian Høgsberg wrote:

> Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]>
> ---
>  drivers/firewire/Kconfig   |   11
>  drivers/firewire/Makefile  |    1
>  drivers/firewire/fw-ohci.c | 1394
>  drivers/firewire/fw-ohci.h |  152
>  4 files changed, 1558 insertions(+), 0 deletions(-)

..

> +static struct pci_driver fw_ohci_pci_driver = {
> +	.name		= ohci_driver_name,
> +	.id_table	= pci_table,
> +	.probe		= pci_probe,
> +	.remove		= pci_remove,
> +};


How about suspend/resume support? Lots of laptops have OHCI 1394 and 
full suspend/resume support is something that the current ohci1394 
driver lacks.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Changes to PM layer break userspace

2006-12-19 Thread David Brownell
On Tuesday 19 December 2006 7:43 pm, Matthew Garrett wrote:

> > Do you have an alternate solution?
> 
> How about something like this? Entirely untested, but I think it shows 
> the basic idea.

Other than indentation/whitespace bugs, it seems to encapsulate the
layering violation needed to get those deprecated files working again
for PCI (and platform_bus).   I'd rename the new bus method though;
maybe "pm_has_noirq_stage()" or somesuch.  Your name is so generic that
it'd be a surprise if the answer were ever "no"!

You should also list this new call in the feature-removal.txt entry for
stuff that gets removed with /sys/devices/.../power/state files, since
it's another mechanism that only exists to prop up that broken API,
and should vanish at the same time that API does.

- Dave



Re: Changes to PM layer break userspace

2006-12-19 Thread David Brownell
On Tuesday 19 December 2006 4:25 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 01:34:49PM -0800, David Brownell wrote:
> 
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August, and the PM list has discussed how broken that model is numerous
> > times over the past several years.  (I'm pretty sure that discussion has
> > leaked out to LKML on occasion.)  It shouldn't be news today.
> 
> 1) feature-removal-schedule.txt says that it'll be removed in July 2007. 
> This isn't July 2007.

Which is why the functionality is still there.


> 2) The functionality was disabled in 2.6.19. The addition to 
> feature-removal-schedule.txt was in, uh, 2.6.19.

Please respond to the technical explanation I provided, and stop
referring to the functionality ** which is still there and works **
as being disabled.

The fact that PCI exposes a mechanism that conflicts with that is
a separate issue.

Whining does not help.

I can't help it if that schedule.txt patch took until 2.6.19 to get
upstream; ISTR it was available before 2.6.18 shipped.  Maybe patches
to that file should be accelerated, even into the stable series.

 
> 3) "The whole _point_ of a kernel is to act as a abstraction layer and 
> resource management between user programs and hardware/outside world. 
> That's why kernels _exist_. Breaking user-land API's is thus by 
> definition something totally idiotic.
> 
> If you need to break something, you create a new interface, and try to 
> translate between the two, and maybe you deprecate the old one so that 
> it can be removed once it's not in use any more. If you can't see that 
> this is how a kernel should work, you're missing the point of having a 
> kernel in the first place."
> 
> Linus, http://lkml.org/lkml/2006/10/4/327

So I'm amused that the problem you refer to is the direct consequence
of Linus' patch to add the suspend_late()/resume_early() mechanism
into the PCI driver framework.  (Again, see the technical explanation;
and please try to have a technical discussion, not a flamefest.)


One of the missing steps in Linus' formulation there is that not all
interfaces are equivalent in terms of support guarantee.  Bugs are
interfaces, for example, and sometimes folk wrongly depend on them
when they persist for a long time (like, cough, this one).

His comment was specifically about breaking a widely used API that
many people have been relying on since, oh, about 1996, and had been
well proven in that time.  And the change was a "system doesn't work"
level change.

In contrast, the /sys/devices/.../power/state API has never had many
users beyond developers trying to test their drivers (without taking
the whole system into a low power state, which probably didn't work
in any case), and has *always* been problematic.  And the change you
object to doesn't "break" anything fundamental, either.  Everything
still works.

In terms of any reasonable expectations about support, those two
changes aren't comparable.

- Dave


Re: [PATCH] Add pci class code for SATA

2006-12-19 Thread Randy Dunlap
On Wed, 20 Dec 2006 11:52:44 +0800 Conke Hu wrote:

> On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote:
> > On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote:
> > > On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> > > > Conke Hu wrote:
> > > > > Add pci class code 0x0106 for SATA to pci_ids.h
> > > > >
> > > > > signed-off-by: [EMAIL PROTECTED]
> > > > > 
> > > > > --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20
> > > > > 01:58:30.0 +0800
> > > > > +++ linux-2.6.20-rc1/include/linux/pci_ids.h  2006-12-20
> > > > > 01:59:07.0 +0800
> > > > > @@ -15,6 +15,7 @@
> > > > >  #define PCI_CLASS_STORAGE_FLOPPY 0x0102
> > > > >  #define PCI_CLASS_STORAGE_IPI0x0103
> > > > >  #define PCI_CLASS_STORAGE_RAID   0x0104
> > > > > +#define PCI_CLASS_STORAGE_SATA   0x0106
> > > > >  #define PCI_CLASS_STORAGE_SAS0x0107
> > > > >  #define PCI_CLASS_STORAGE_OTHER  0x0180
> > > >
> > > > Two comments:
> > > >
> > > > 1) I think "_SATA" is an inaccurate description.  It should be _AHCI 
> > > > AFAICS.
> > > >
> > > > 2) Typically we don't add constants unless they are used somewhere...
> > > >
> > > > Jeff
> > > >
> > >
> > > Hi Jeff,
> > > According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601
> > > means AHCI and 0x010600 means vendor specific SATA controller. Pls see
> > > the following table (PCI spec 3.0 P296):
> > >
> > > Base Class  Sub-Class  Interface  Meaning
> > > ----------  ---------  ---------  -------
> > > 01h         00h        00h        SCSI bus controller
> > >             01h        xxh        IDE controller
> > >             02h        00h        Floppy disk controller
> > >             03h        00h        IPI bus controller
> > >             04h        00h        RAID controller
> > >             05h        20h        ATA controller with ADMA interface
> > >                        30h        ATA controller with ADMA interface
> > >             06h        00h        Serial ATA controller, vendor specific interface
> > >                        01h        Serial ATA controller, AHCI 1.0 interface
> > >             07h        00h        Serial Attached SCSI (SAS) controller
> > >             80h        00h        Other mass storage controller
> > >
> > >
> > > So, I think, the following macro is correct:
> > > #define PCI_CLASS_STORAGE_SATA   0x0106
> > > If you would define AHCI class code, it should be 0x010601, not 0x0106:
> > > #define PCI_CLASS_STORAGE_SATA_AHCI   0x010601
> > >
> > > And, I think that PCI_CLASS_STORAGE_SATA had better be added to
> > > pci_ids.h since the class code 0x0106 is used more than once. e.g.
> > > ahci.c uses the magic number 0x0106 twice, and it might be used more
> > > in future.
> > >
> > > Best regards,
> > > Conke
> > >
> >
> >
> > Here is a patch to show more details:
> > ---
> > diff -Nur linux-2.6.20-rc1.orig/drivers/ata/ahci.c
> > linux-2.6.20-rc1/drivers/ata/ahci.c
> > --- linux-2.6.20-rc1.orig/drivers/ata/ahci.c2006-12-20 
> > 10:25:00.0 +0800
> > +++ linux-2.6.20-rc1/drivers/ata/ahci.c 2006-12-20 10:13:24.0 +0800
> > @@ -418,7 +418,7 @@
> >
> > /* Generic, PCI class code for AHCI */
> > { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
> > - 0x010601, 0xff, board_ahci },
> > + PCI_CLASS_STORAGE_SATA<<8|1, 0xff, board_ahci },
> >
> > { } /* terminate list */
> >  };
> > @@ -1586,11 +1586,11 @@
> > speed_s = "?";
> >
> > pci_read_config_word(pdev, 0x0a, );
> > -   if (cc == 0x0101)
> > +   if (cc == PCI_CLASS_STORAGE_IDE)
> > scc_s = "IDE";
> > -   else if (cc == 0x0106)
> > +   else 

Re: [PATCH] Add pci class code for SATA

2006-12-19 Thread Conke Hu

On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote:

On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote:
> On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> > Conke Hu wrote:
> > > Add pci class code 0x0106 for SATA to pci_ids.h
> > >
> > > signed-off-by: [EMAIL PROTECTED]
> > > 
> > > --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20
> > > 01:58:30.0 +0800
> > > +++ linux-2.6.20-rc1/include/linux/pci_ids.h  2006-12-20
> > > 01:59:07.0 +0800
> > > @@ -15,6 +15,7 @@
> > >  #define PCI_CLASS_STORAGE_FLOPPY 0x0102
> > >  #define PCI_CLASS_STORAGE_IPI0x0103
> > >  #define PCI_CLASS_STORAGE_RAID   0x0104
> > > +#define PCI_CLASS_STORAGE_SATA   0x0106
> > >  #define PCI_CLASS_STORAGE_SAS0x0107
> > >  #define PCI_CLASS_STORAGE_OTHER  0x0180
> >
> > Two comments:
> >
> > 1) I think "_SATA" is an inaccurate description.  It should be _AHCI AFAICS.
> >
> > 2) Typically we don't add constants unless they are used somewhere...
> >
> > Jeff
> >
>
> Hi Jeff,
> According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601
> means AHCI and 0x010600 means vendor specific SATA controller. Pls see
> the following table (PCI spec 3.0 P296):
>
> Base Class  Sub-Class  Interface  Meaning
> ----------  ---------  ---------  -------
> 01h         00h        00h        SCSI bus controller
>             01h        xxh        IDE controller
>             02h        00h        Floppy disk controller
>             03h        00h        IPI bus controller
>             04h        00h        RAID controller
>             05h        20h        ATA controller with ADMA interface
>                        30h        ATA controller with ADMA interface
>             06h        00h        Serial ATA controller, vendor specific interface
>                        01h        Serial ATA controller, AHCI 1.0 interface
>             07h        00h        Serial Attached SCSI (SAS) controller
>             80h        00h        Other mass storage controller
>
>
> So, I think, the following macro is correct:
> #define PCI_CLASS_STORAGE_SATA   0x0106
> If you would define AHCI class code, it should be 0x010601, not 0x0106:
> #define PCI_CLASS_STORAGE_SATA_AHCI   0x010601
>
> And, I think that PCI_CLASS_STORAGE_SATA had better be added to
> pci_ids.h since the class code 0x0106 is used more than once. e.g.
> ahci.c uses the magic number 0x0106 twice, and it might be used more
> in future.
>
> Best regards,
> Conke
>
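
The relationship between the 16-bit class code (base class plus sub-class) and the 24-bit value that also carries the programming interface can be sketched in plain C as below. The three PCI_CLASS_STORAGE_* values are the ones discussed in this thread; the helper function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CLASS_STORAGE_IDE   0x0101
#define PCI_CLASS_STORAGE_RAID  0x0104
#define PCI_CLASS_STORAGE_SATA  0x0106

/* Hypothetical helper: append a programming-interface byte to a 16-bit class. */
static uint32_t pci_class_with_progif(uint16_t class16, uint8_t prog_if)
{
    return ((uint32_t)class16 << 8) | prog_if;
}

/* Hypothetical helpers: split a 24-bit class value into its three bytes. */
static uint8_t pci_base_class(uint32_t class24)
{
    assert(class24 <= 0xffffff);
    return (class24 >> 16) & 0xff;
}

static uint8_t pci_sub_class(uint32_t class24)
{
    return (class24 >> 8) & 0xff;
}

static uint8_t pci_prog_if(uint32_t class24)
{
    return class24 & 0xff;
}

/* Same decision as the scc_s selection in the ahci.c hunk, with named constants. */
static const char *storage_class_name(uint16_t cc)
{
    switch (cc) {
    case PCI_CLASS_STORAGE_IDE:
        return "IDE";
    case PCI_CLASS_STORAGE_SATA:
        return "SATA";
    case PCI_CLASS_STORAGE_RAID:
        return "RAID";
    default:
        return "unknown";
    }
}
```

Here pci_class_with_progif(PCI_CLASS_STORAGE_SATA, 0x01) yields 0x010601, the AHCI value, which is the same thing the expression PCI_CLASS_STORAGE_SATA<<8|1 computes, since << binds tighter than |.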


Here is a patch to show more details:
---
diff -Nur linux-2.6.20-rc1.orig/drivers/ata/ahci.c
linux-2.6.20-rc1/drivers/ata/ahci.c
--- linux-2.6.20-rc1.orig/drivers/ata/ahci.c2006-12-20 10:25:00.0 
+0800
+++ linux-2.6.20-rc1/drivers/ata/ahci.c 2006-12-20 10:13:24.0 +0800
@@ -418,7 +418,7 @@

/* Generic, PCI class code for AHCI */
{ PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
- 0x010601, 0xff, board_ahci },
+ PCI_CLASS_STORAGE_SATA<<8|1, 0xff, board_ahci },

{ } /* terminate list */
 };
@@ -1586,11 +1586,11 @@
speed_s = "?";

pci_read_config_word(pdev, 0x0a, );
-   if (cc == 0x0101)
+   if (cc == PCI_CLASS_STORAGE_IDE)
scc_s = "IDE";
-   else if (cc == 0x0106)
+   else if (cc == PCI_CLASS_STORAGE_SATA)
scc_s = "SATA";
-   else if (cc == 0x0104)
+   else if (cc == PCI_CLASS_STORAGE_RAID)
scc_s = "RAID";
else
scc_s = "unknown";
diff -Nur linux-2.6.20-rc1.orig/include/linux/pci_ids.h
linux-2.6.20-rc1/include/linux/pci_ids.h
--- linux-2.6.20-rc1.orig/include/linux/pci_ids.h   2006-12-20
10:24:51.0 +0800
+++ linux-2.6.20-rc1/include/linux/pci_ids.h2006-12-20 10:08:15.0 
+0800
@@ -15,6 +15,7 @@
 

Re: Changes to sysfs PM layer break userspace

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 18:35:39 -0800
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Tue, 19 Dec 2006 18:15:24 -0800 Andrew Morton wrote:
> 
> > On Tue, 19 Dec 2006 13:34:49 -0800
> > David Brownell <[EMAIL PROTECTED]> wrote:
> > 
> > > Documentation/feature-removal-schedule.txt has warned about this since
> > > August
> > 
> > Nobody reads that.
> 
> Ugh, I read it.
> 
> > Please, wherever possible, put a nice printk("this is going away") in the 
> > code
> > when planning these things.
> 
> Can notices go in both places, or is in the source code (printk)
> now the preferred way?

I think printks grab a lot more attention.  It's not surprising that people
get surprised when the feature they're using goes away.

Plus they may not even know that they're using the feature.  A printk
fixes that.

> I think that we can point people to Doc/feature-removal-schedule.txt
> easier (and more effectively) than we can source code (or noisy kernel
> logs).

Hopefully developers who see the printk will think to look in
feature-removal-schedule.txt for more details.
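
A userspace sketch of the idea, assuming a hypothetical warn_deprecated_once() helper (no such helper is being proposed here, and fprintf stands in for printk): the handler complains the first time the doomed interface is touched, points at the schedule file, and then stays quiet so the log is not flooded.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how many warnings were actually emitted (for demonstration only). */
static int deprecated_warn_count;

/* Hypothetical helper: complain the first time a doomed interface is used. */
static void warn_deprecated_once(int *warned, const char *what, const char *when)
{
    assert(warned && what && when);
    if (*warned)
        return;
    *warned = 1;
    deprecated_warn_count++;
    fprintf(stderr, "%s is deprecated and will be removed (%s); "
            "see Documentation/feature-removal-schedule.txt\n", what, when);
}

/* Stand-in for a handler such as the power/state store routine. */
static int toy_state_store(void)
{
    static int warned;

    warn_deprecated_once(&warned, "/sys/devices/.../power/state", "July 2007");
    return 0;
}
```

Calling toy_state_store() repeatedly produces a single line of output, which is enough for a user or developer to notice and go read the schedule file.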



netif_poll_enable() & barrier

2006-12-19 Thread Benjamin Herrenschmidt
Hi !

I stumbled across what might be a bug on out-of-order architectures:

netif_poll_enable() only does a clear_bit(). However,
netif_poll_disable/enable pairs are often used as simili-spinlocks.

(netif_poll_disable() has pretty much spin_lock semantics except that it
schedules instead of looping).

Thus, shouldn't netif_poll_enable() do an smp_wmb(); before clearing
the bit to make sure that any stores done within the poll-disabled
section are properly visible to the rest of the system before clearing
the bit ?

Cheers,
Ben.
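
A minimal userspace sketch of the ordering in question, written with C11 atomics rather than the kernel's bitops and barriers. The struct toy_netdev type and the toy_* helpers are invented for illustration; release ordering is used here as the closest portable analogue and is somewhat stronger than a store-only barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: a poll flag plus state touched only while it is held. */
struct toy_netdev {
    atomic_flag rx_sched;   /* set while polling is disabled */
    int ring_head;          /* example of data protected by the flag */
};

/* One attempt at the "lock" side; a real caller would sleep and retry. */
static bool toy_poll_try_disable(struct toy_netdev *dev)
{
    assert(dev != NULL);
    return !atomic_flag_test_and_set_explicit(&dev->rx_sched,
                                              memory_order_acquire);
}

/* "Unlock" side: release ordering makes earlier stores visible before the clear. */
static void toy_poll_enable(struct toy_netdev *dev)
{
    assert(dev != NULL);
    atomic_flag_clear_explicit(&dev->rx_sched, memory_order_release);
}
```

A plain relaxed clear on the enable side would allow the store to ring_head to become visible after the flag is seen as free, which is the failure mode being asked about.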




Re: Changes to PM layer break userspace

2006-12-19 Thread Matthew Garrett
On Tue, Dec 19, 2006 at 07:19:36PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 4:09 pm, Matthew Garrett wrote:
> > I'm sorry, which bit of "Don't break userspace API without adequate 
> > prior warning and with a workable replacement" is difficult to 
> > understand?
> 
> What part of "it was already broken" do YOU not understand?  The
> whole notion is unsustainable.  It doesn't work cross-platform, or
> for multiple bus types.  It confuses system-wide suspend mechanisms
> with runtime mechanisms.  It breaks guaranteed parent/child ordering
> of suspend/resume calls.  (And more...)

Linux is utterly riddled with broken APIs. It's possible to see that as 
a downside of the "Release early, release often" model, but the 
advantage is that we get the opportunity to determine how these 
interfaces are broken. Based on that, we can either improve the existing 
interface or decide that it's broken beyond repair and design a new one.

What we don't do is decide that an interface is broken, deprecate it 
and in the same release break it even for the cases where it 
previously worked. That's just insane.

> Let us know when you get tired of whining and want to move on to
> getting a real solution to the set of problems here.  I've pointed
> out that reverting Linus' patch would be one option to get your
> short term issue resolved ... that would remove a capability from
> PCI drivers, but you could then use that deprecated mechanism.
> I've also pointed out that you could start working towards a real
> long term solution.

I could, and in the long run I intend to. On the other hand, I don't 
expect to have enough time to fix every single in-tree network driver 
before 2.6.20, so...

> Do you have an alternate solution?

How about something like this? Entirely untested, but I think it shows 
the basic idea.

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..4865918 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -597,6 +597,17 @@ static int platform_resume(struct device * dev)
return ret;
 }
 
+static int platform_requires_disabled_interrupts(struct device * dev)
+{
+   int ret = 0;
+
+   if (dev->driver && (dev->driver->resume_early 
+   || dev->driver->suspend_late))
+   ret = 1;
+
+   return ret;
+}
+
 struct bus_type platform_bus_type = {
.name   = "platform",
.dev_attrs  = platform_dev_attrs,
@@ -604,8 +615,9 @@ struct bus_type platform_bus_type = {
.uevent = platform_uevent,
.suspend= platform_suspend,
.suspend_late   = platform_suspend_late,
-   .resume_early   = platform_resume_early,
+   .resume_early   = platform_resume_early,
.resume = platform_resume,
+   .requires_disabled_interrupts = platform_requires_disabled_interrupts,
 };
 EXPORT_SYMBOL_GPL(platform_bus_type);
 
diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 2d47517..97c6d65 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
int error = -EINVAL;
 
/* disallow incomplete suspend sequences */
-   if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
+   if (dev->bus && dev->bus->requires_disabled_interrupts
+   && dev->bus->requires_disabled_interrupts(dev))
return error;
 
state.event = PM_EVENT_SUSPEND;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e5ae3a0..9808d42 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -351,6 +351,18 @@ static int pci_device_resume(struct device * dev)
return error;
 }
 
+static int pci_device_requires_disabled_interrupts(struct device * dev)
+{
+   int error = 0;
+   struct pci_dev * pci_dev = to_pci_dev(dev);
+   struct pci_driver * drv = pci_dev->driver;
+
+   if (drv && (drv->resume_early || drv->suspend_late))
+   error = 1;
+
+   return error;
+}
+
 static int pci_device_resume_early(struct device * dev)
 {
int error = 0;
@@ -569,6 +581,7 @@ struct bus_type pci_bus_type = {
.suspend_late   = pci_device_suspend_late,
.resume_early   = pci_device_resume_early,
.resume = pci_device_resume,
+   .requires_disabled_interrupts = pci_device_requires_disabled_interrupts,
.shutdown   = pci_device_shutdown,
.dev_attrs  = pci_dev_attrs,
 };
diff --git a/include/linux/device.h b/include/linux/device.h
index 49ab53c..0686234 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -59,6 +59,7 @@ struct bus_type {
int (*suspend)(struct device * dev, pm_message_t state);
int (*suspend_late)(struct device * dev, pm_message_t state);
int (*resume_early)(struct device * dev);
+   int 

Re: [PATCH] Add pci class code for SATA

2006-12-19 Thread Conke Hu

On 12/20/06, Conke Hu <[EMAIL PROTECTED]> wrote:

On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Conke Hu wrote:
> > Add pci class code 0x0106 for SATA to pci_ids.h
> >
> > signed-off-by: [EMAIL PROTECTED]
> > 
> > --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20
> > 01:58:30.0 +0800
> > +++ linux-2.6.20-rc1/include/linux/pci_ids.h  2006-12-20
> > 01:59:07.0 +0800
> > @@ -15,6 +15,7 @@
> >  #define PCI_CLASS_STORAGE_FLOPPY 0x0102
> >  #define PCI_CLASS_STORAGE_IPI0x0103
> >  #define PCI_CLASS_STORAGE_RAID   0x0104
> > +#define PCI_CLASS_STORAGE_SATA   0x0106
> >  #define PCI_CLASS_STORAGE_SAS0x0107
> >  #define PCI_CLASS_STORAGE_OTHER  0x0180
>
> Two comments:
>
> 1) I think "_SATA" is an inaccurate description.  It should be _AHCI AFAICS.
>
> 2) Typically we don't add constants unless they are used somewhere...
>
> Jeff
>

Hi Jeff,
According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601
means AHCI and 0x010600 means vendor specific SATA controller. Pls see
the following table (PCI spec 3.0 P296):

Base Class  Sub-Class  Interface  Meaning
01h         00h        00h        SCSI bus controller
            01h        xxh        IDE controller
            02h        00h        Floppy disk controller
            03h        00h        IPI bus controller
            04h        00h        RAID controller
            05h        20h        ATA controller with ADMA interface
                       30h        ATA controller with ADMA interface
            06h        00h        Serial ATA controller - vendor specific interface
                       01h        Serial ATA controller - AHCI 1.0 interface
            07h        00h        Serial Attached SCSI (SAS) controller
            80h        00h        Other mass storage controller


So, I think, the following macro is correct:
#define PCI_CLASS_STORAGE_SATA   0x0106
If you would define AHCI class code, it should be 0x010601, not 0x0106:
#define PCI_CLASS_STORAGE_SATA_AHCI   0x010601

And, I think that PCI_CLASS_STORAGE_SATA had better be added to
pci_ids.h since the class code 0x0106 is used more than once. e.g.
ahci.c uses the magic number 0x0106 twice, and it might be used more
in future.

Best regards,
Conke




Here is a patch to show more details:
---
diff -Nur linux-2.6.20-rc1.orig/drivers/ata/ahci.c linux-2.6.20-rc1/drivers/ata/ahci.c
--- linux-2.6.20-rc1.orig/drivers/ata/ahci.c    2006-12-20 10:25:00.0 +0800
+++ linux-2.6.20-rc1/drivers/ata/ahci.c 2006-12-20 10:13:24.0 +0800
@@ -418,7 +418,7 @@

/* Generic, PCI class code for AHCI */
{ PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
- 0x010601, 0xff, board_ahci },
+ PCI_CLASS_STORAGE_SATA<<8|1, 0xff, board_ahci },

{ } /* terminate list */
};
@@ -1586,11 +1586,11 @@
speed_s = "?";

pci_read_config_word(pdev, 0x0a, &cc);
-   if (cc == 0x0101)
+   if (cc == PCI_CLASS_STORAGE_IDE)
scc_s = "IDE";
-   else if (cc == 0x0106)
+   else if (cc == PCI_CLASS_STORAGE_SATA)
scc_s = "SATA";
-   else if (cc == 0x0104)
+   else if (cc == PCI_CLASS_STORAGE_RAID)
scc_s = "RAID";
else
scc_s = "unknown";
diff -Nur linux-2.6.20-rc1.orig/include/linux/pci_ids.h linux-2.6.20-rc1/include/linux/pci_ids.h
--- linux-2.6.20-rc1.orig/include/linux/pci_ids.h   2006-12-20 10:24:51.0 +0800
+++ linux-2.6.20-rc1/include/linux/pci_ids.h    2006-12-20 10:08:15.0 +0800
@@ -15,6 +15,7 @@
#define PCI_CLASS_STORAGE_FLOPPY0x0102
#define PCI_CLASS_STORAGE_IPI   0x0103
#define PCI_CLASS_STORAGE_RAID  0x0104
+#define PCI_CLASS_STORAGE_SATA 

[PATCH] add .mailmap for proper git-shortlog output

2006-12-19 Thread Nicolas Pitre
This list has been ripped out of the latest git-shortlog tool. It can be 
maintained separately so this is what this patch does. A couple more 
entries were added to the original list as well.

Signed-off-by: Nicolas Pitre <[EMAIL PROTECTED]>

---

diff --git a/.mailmap b/.mailmap
new file mode 100644
index 000..016b861
--- /dev/null
+++ b/.mailmap
@@ -0,0 +1,96 @@
+#
+# This list is used by git-shortlog to fix a few botched name translations
+# in the git archive, either because the author's full name was messed up
+# and/or not always written the same way, making contributions from the
+# same person appearing not to be so or badly displayed.
+#
+# repo-abbrev: /pub/scm/linux/kernel/git/
+#
+
+Aaron Durbin <[EMAIL PROTECTED]>
+Adam Oldham <[EMAIL PROTECTED]>
+Adam Radford <[EMAIL PROTECTED]>
+Adrian Bunk <[EMAIL PROTECTED]>
+Alan Cox <[EMAIL PROTECTED]>
+Alan Cox <[EMAIL PROTECTED]>
+Aleksey Gorelov <[EMAIL PROTECTED]>
+Al Viro <[EMAIL PROTECTED]>
+Al Viro <[EMAIL PROTECTED]>
+Andreas Herrmann <[EMAIL PROTECTED]>
+Andrew Morton <[EMAIL PROTECTED]>
+Andrew Vasquez <[EMAIL PROTECTED]>
+Andy Adamson <[EMAIL PROTECTED]>
+Arnaud Patard <[EMAIL PROTECTED]>
+Arnd Bergmann <[EMAIL PROTECTED]>
+Axel Dyks <[EMAIL PROTECTED]>
+Ben Gardner <[EMAIL PROTECTED]>
+Ben M Cahill <[EMAIL PROTECTED]>
+Björn Steinbrink <[EMAIL PROTECTED]>
+Brian Avery <[EMAIL PROTECTED]>
+Brian King <[EMAIL PROTECTED]>
+Christoph Hellwig <[EMAIL PROTECTED]>
+Corey Minyard <[EMAIL PROTECTED]>
+David Brownell <[EMAIL PROTECTED]>
+David Woodhouse <[EMAIL PROTECTED]>
+Domen Puncer <[EMAIL PROTECTED]>
+Douglas Gilbert <[EMAIL PROTECTED]>
+Ed L. Cashin <[EMAIL PROTECTED]>
+Evgeniy Polyakov <[EMAIL PROTECTED]>
+Felipe W Damasio <[EMAIL PROTECTED]>
+Felix Kuhling <[EMAIL PROTECTED]>
+Felix Moeller <[EMAIL PROTECTED]>
+Filipe Lautert <[EMAIL PROTECTED]>
+Franck Bui-Huu <[EMAIL PROTECTED]>
+Frank Zago <[EMAIL PROTECTED]>
+Greg Kroah-Hartman <[EMAIL PROTECTED](none)>
+Greg Kroah-Hartman <[EMAIL PROTECTED]>
+Greg Kroah-Hartman <[EMAIL PROTECTED]>
+Henk Vergonet <[EMAIL PROTECTED]>
+Henrik Kretzschmar <[EMAIL PROTECTED]>
+Herbert Xu <[EMAIL PROTECTED]>
+Jacob Shin <[EMAIL PROTECTED]>
+James Bottomley <[EMAIL PROTECTED](none)>
+James Bottomley <[EMAIL PROTECTED]>
+James E Wilson <[EMAIL PROTECTED]>
+James Ketrenos <[EMAIL PROTECTED](none)>
+Jean Tourrilhes <[EMAIL PROTECTED]>
+Jeff Garzik <[EMAIL PROTECTED]>
+Jens Axboe <[EMAIL PROTECTED]>
+Jens Osterkamp <[EMAIL PROTECTED]>
+John Stultz <[EMAIL PROTECTED]>
+Juha Yrjola 
+Juha Yrjola <[EMAIL PROTECTED]>
+Juha Yrjola <[EMAIL PROTECTED]>
+Kay Sievers <[EMAIL PROTECTED]>
+Kenneth W Chen <[EMAIL PROTECTED]>
+Koushik <[EMAIL PROTECTED]>
+Leonid I Ananiev <[EMAIL PROTECTED]>
+Linas Vepstas <[EMAIL PROTECTED]>
+Matthieu CASTET <[EMAIL PROTECTED]>
+Michel Dänzer <[EMAIL PROTECTED]>
+Mitesh shah <[EMAIL PROTECTED]>
+Morten Welinder <[EMAIL PROTECTED]>
+Morten Welinder <[EMAIL PROTECTED]>
+Morten Welinder <[EMAIL PROTECTED]>
+Morten Welinder <[EMAIL PROTECTED]>
+Nguyen Anh Quynh <[EMAIL PROTECTED]>
+Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
+Patrick Mochel <[EMAIL PROTECTED]>
+Peter A Jonsson <[EMAIL PROTECTED]>
+Praveen BP <[EMAIL PROTECTED]>
+Rajesh Shah <[EMAIL PROTECTED]>
+Ralf Baechle <[EMAIL PROTECTED]>
+Ralf Wildenhues <[EMAIL PROTECTED]>
+Rémi Denis-Courmont <[EMAIL PROTECTED]>
+Rudolf Marek <[EMAIL PROTECTED]>
+Rui Saraiva <[EMAIL PROTECTED]>
+Sachin P Sant <[EMAIL PROTECTED]>
+Sam Ravnborg <[EMAIL PROTECTED]>
+Simon Kelley <[EMAIL PROTECTED]>
+Stéphane Witzmann <[EMAIL PROTECTED]>
+Stephen Hemminger <[EMAIL PROTECTED]>
+Tejun Heo <[EMAIL PROTECTED]>
+Thomas Graf <[EMAIL PROTECTED]>
+Tony Luck <[EMAIL PROTECTED]>
+Tsuneo Yoshioka <[EMAIL PROTECTED]>
+Valdis Kletnieks <[EMAIL PROTECTED]>


Re: [2.6 patch] drivers/atm/fore200e.c: cleanups

2006-12-19 Thread David Miller
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Tue, 19 Dec 2006 05:12:58 +0100

> This patch contains the following transformations from custom functions 
> to standard kernel version:
> - fore200e_kmalloc() -> kzalloc()
> - fore200e_kfree() -> kfree()
> - fore200e_swap() -> cpu_to_be32()
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Looks good, applied, thanks Adrian.


Re: [2.6 patch] drivers/atm/Kconfig: remove dead ATM_TNETA1570 option

2006-12-19 Thread David Miller
From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Tue, 19 Dec 2006 05:13:00 +0100

> This patch removes the unconverted ATM_TNETA1570 option that also lacks 
> any code in the kernel.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Applied, thanks Adrian.


Re: schedule_timeout: wrong timeout value

2006-12-19 Thread kyle


- Original Message - 
From: "Robert Hancock" <[EMAIL PROTECTED]>

To: "kyle" <[EMAIL PROTECTED]>
Cc: 
Sent: Tuesday, December 19, 2006 10:34 AM
Subject: Re: schedule_timeout: wrong timeout value



kyle wrote:

Hi,

Recently my mysql server shows something like:
Dec 18 18:24:05 sql kernel: schedule_timeout: wrong timeout value 
 from c0284efd

Dec 18 18:24:36 sql last message repeated 19939 times
Dec 18 18:25:37 sql last message repeated 33392 times



The message means some code in the kernel or in some module passed a 
negative value to schedule_timeout which it shouldn't have. The c0284efd 
value is the address of the function that made the call - you may be able 
to look that up in your /proc/ksyms or the System.map file and figure out 
what function that is.
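
The lookup described above amounts to finding the last symbol whose start address is not greater than the reported address. A minimal user-space sketch of that search, assuming a table already parsed from System.map and sorted by address; the struct, the function name and the table entries are hypothetical and only illustrate the idea:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One System.map-style entry: start address and symbol name. */
struct ksym {
    unsigned long addr;
    const char *name;
};

/*
 * Return the entry with the greatest start address that is <= addr,
 * or NULL if addr lies below the first symbol. The table must be
 * sorted by ascending address, as System.map is.
 */
const struct ksym *ksym_lookup(const struct ksym *tab, size_t n,
                               unsigned long addr)
{
    size_t lo = 0, hi = n;   /* search window is [lo, hi) */

    assert(tab != NULL || n == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= addr)
            lo = mid + 1;    /* candidate found, look further right */
        else
            hi = mid;
    }
    return lo ? &tab[lo - 1] : NULL;
}

/* Hypothetical table; these addresses and names are made up. */
const struct ksym example_map[] = {
    { 0xc0284000UL, "example_func_a" },
    { 0xc0284e00UL, "example_func_b" },
    { 0xc0285200UL, "example_func_c" },
};
```

With a real System.map, the same search on the address from the log message would name the function that called schedule_timeout().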


There was no module loaded, and unfortunately I cannot find the System.map 
or /proc/ksyms file for the affected kernel!
Anyway, thank you for your explanation. I have upgraded the kernel to 
2.6.17.14 and hope that it fixes the problem. Thank you.


Kyle 




Re: Changes to sysfs PM layer break userspace

2006-12-19 Thread David Brownell
On Tuesday 19 December 2006 6:15 pm, Andrew Morton wrote:
> On Tue, 19 Dec 2006 13:34:49 -0800
> David Brownell <[EMAIL PROTECTED]> wrote:
> 
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August
> 
> Nobody reads that.
> 
> Please, wherever possible, put a nice printk("this is going away") in the code
> when planning these things.


Signed-off-by: David Brownell <[EMAIL PROTECTED]>

Index: g26/drivers/base/power/sysfs.c
===
--- g26.orig/drivers/base/power/sysfs.c 2006-09-27 16:19:00.0 -0700
+++ g26/drivers/base/power/sysfs.c  2006-12-19 19:27:25.0 -0800
@@ -42,9 +42,17 @@ static ssize_t state_show(struct device 
 
 static ssize_t state_store(struct device * dev, struct device_attribute *attr, 
const char * buf, size_t n)
 {
+   static int warned;
pm_message_t state;
int error = -EINVAL;
 
+   if (!warned) {
+   printk(KERN_WARNING
+   "*** WARNING *** sysfs devices/.../power/state files "
+   "are only for testing, and will be removed\n");
+   warned = error;
+   }
+
/* disallow incomplete suspend sequences */
if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
return error;


Re: Changes to PM layer break userspace

2006-12-19 Thread David Brownell
On Tuesday 19 December 2006 4:09 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 03:36:28PM -0800, David Brownell wrote:
> > On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> > > The fact that something is scheduled to be removed in July 2007 does 
> > > *not* mean it's acceptable to break it in 2006. We need to find a way to 
> > > fix this functionality in the meantime.
> > 
> > The disconnect here is analogous to:  I tell you the alleged perpetual
> > motion machine never worked, and can't ever work; and you push back and
> > say that you need a perpetual motion machine that works, NOW please,
> > because you need something that pushes those widgets around.  (There are
> > better ways to push widgets than side effects of a broken machine...)
> 
> But it *did* work. 

Having been on the other side ... I can testify that if you
think it actually worked, it's because you're ignoring all
the nasty failure modes.


> > I'd not be keen on reverting Linus' patch [1] myself, even though few
> > drivers have started to use that mechanism yet; that would be a step
> > backwards, and would perpetuate users of that broken sysfs file.
> 
> I'm sorry, which bit of "Don't break userspace API without adequate 
> prior warning and with a workable replacement" is difficult to 
> understand?

What part of "it was already broken" do YOU not understand?  The
whole notion is unsustainable.  It doesn't work cross-platform, or
for multiple bus types.  It confuses system-wide suspend mechanisms
with runtime mechanisms.  It breaks guaranteed parent/child ordering
of suspend/resume calls.  (And more...)


Let us know when you get tired of whining and want to move on to
getting a real solution to the set of problems here.  I've pointed
out that reverting Linus' patch would be one option to get your
short term issue resolved ... that would remove a capability from
PCI drivers, but you could then use that deprecated mechanism.
I've also pointed out that you could start working towards a real
long term solution.

Do you have an alternate solution?

- Dave



Re: [PATCH] Add pci class code for SATA

2006-12-19 Thread Conke Hu

On 12/20/06, Jeff Garzik <[EMAIL PROTECTED]> wrote:

Conke Hu wrote:
> Add pci class code 0x0106 for SATA to pci_ids.h
>
> signed-off-by: [EMAIL PROTECTED]
> 
> --- linux-2.6.20-rc1/include/linux/pci_ids.h.orig 2006-12-20
> 01:58:30.0 +0800
> +++ linux-2.6.20-rc1/include/linux/pci_ids.h  2006-12-20
> 01:59:07.0 +0800
> @@ -15,6 +15,7 @@
>  #define PCI_CLASS_STORAGE_FLOPPY 0x0102
>  #define PCI_CLASS_STORAGE_IPI0x0103
>  #define PCI_CLASS_STORAGE_RAID   0x0104
> +#define PCI_CLASS_STORAGE_SATA   0x0106
>  #define PCI_CLASS_STORAGE_SAS0x0107
>  #define PCI_CLASS_STORAGE_OTHER  0x0180

Two comments:

1) I think "_SATA" is an inaccurate description.  It should be _AHCI AFAICS.

2) Typically we don't add constants unless they are used somewhere...

Jeff



Hi Jeff,
   According to PCI spec 3.0, 0x0106 means SATA controller, 0x010601
means AHCI and 0x010600 means vendor specific SATA controller. Pls see
the following table (PCI spec 3.0 P296):

Base Class  Sub-Class  Interface  Meaning
01h         00h        00h        SCSI bus controller
            01h        xxh        IDE controller
            02h        00h        Floppy disk controller
            03h        00h        IPI bus controller
            04h        00h        RAID controller
            05h        20h        ATA controller with ADMA interface
                       30h        ATA controller with ADMA interface
            06h        00h        Serial ATA controller - vendor specific interface
                       01h        Serial ATA controller - AHCI 1.0 interface
            07h        00h        Serial Attached SCSI (SAS) controller
            80h        00h        Other mass storage controller


So, I think, the following macro is correct:
#define PCI_CLASS_STORAGE_SATA   0x0106
If you would define AHCI class code, it should be 0x010601, not 0x0106:
#define PCI_CLASS_STORAGE_SATA_AHCI   0x010601

And, I think that PCI_CLASS_STORAGE_SATA had better be added to
pci_ids.h since the class code 0x0106 is used more than once. e.g.
ahci.c uses the magic number 0x0106 twice, and it might be used more
in future.
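
The relation between the 16-bit class value and the 24-bit class code discussed above can be sketched as follows. This is only an illustration of the arithmetic: PCI_CLASS_STORAGE_SATA is the macro proposed in this thread, while PCI_PROG_IF_AHCI and both helper functions are hypothetical names, not existing kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Value proposed in this thread: base class 01h, sub-class 06h. */
#define PCI_CLASS_STORAGE_SATA  0x0106u
/* Hypothetical name for programming interface 01h (AHCI 1.0). */
#define PCI_PROG_IF_AHCI        0x01u

/* Build the 24-bit class code from the 16-bit class and the 8-bit interface. */
uint32_t pci_class_code(uint16_t class16, uint8_t prog_if)
{
    return ((uint32_t)class16 << 8) | prog_if;
}

/* Drop the interface byte to recover the 16-bit base class/sub-class. */
uint16_t pci_class_of(uint32_t class24)
{
    assert(class24 <= 0xffffffu);   /* only 24 bits are meaningful */
    return (uint16_t)(class24 >> 8);
}
```

Under these definitions, the SATA class with interface 01h gives 0x010601, and with interface 00h gives 0x010600, matching the two Serial ATA rows of the table.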

Best regards,
Conke


Re: Bug 7596 - Potential performance bottleneck for Linxu TCP

2006-12-19 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 20 Dec 2006 10:52:19 +1100

> Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > I noticed this bit of discussion in tcp_recvmsg. It implies that a better
> > queuing policy would be good. But it is confusing English (Alexey?) so
> > not sure where to start.
> 
> Actually I think the comment says that the current code isn't the
> most elegant but is more efficient.

It's just explaining the hierarchy of queues that need to
be purged, and in what order, for correctness.

Alexey added that code when I mentioned to him, right after
we added the prequeue, that it was possible to process the
normal backlog before the prequeue, which is illegal.
In fixing that bug, he added the comment we are discussing.


BUG: wedged processes, test program supplied

2006-12-19 Thread Albert Cahalan

Somebody PLEASE try this...

Normally, when a process dies it becomes a zombie.
If the parent dies (before or after the child), the child
is adopted by init. Init will reap the child.

The program included below DOES NOT get reaped.

Do like so:

gcc -m32 -O2 -std=gnu99 -o foo foo.c
while true; do killall -9 foo; ./foo; sleep 1; done

BTW, it gets even better if you start playing with ptrace.
Use the "strace" program (following children) and/or start
sending rapid-fire SIGKILL to all the various _threads_ in
the processes. You can get processes wedged in a wide
variety of interesting states. I've seen "X" state, processes
sitting around with pending SIGKILL, a process stuck in
"D" state supposedly core dumping despite ulimit 0 on
the core size, etc.
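
For contrast with the misbehaving case, the normal life cycle described at the top of this message (child exits, becomes a zombie, gets reaped) can be sketched with plain fork() and waitpid(). The function name is hypothetical and this is not part of the test program that follows:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately with `code`, then reap it.
 * Returns the child's exit status, or -1 on error.
 */
int spawn_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: stays a zombie until waited for */

    int status;
    if (waitpid(pid, &status, 0) != pid)   /* parent: reap the zombie */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the parent exited without calling waitpid(), init would normally inherit the child and do the reaping instead.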

/

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 

#include 

static void early_write(int fd, const void *buf, size_t count)
{
#if 0
   unsigned long eax = __NR_write;
   /* push and pop because -fPIC probably
  needs ebx for the GOT base pointer */
   __asm__ __volatile__(
   "push %%ebx ; "
   "push %1 ; pop %%ebx ; int $0x80"
   "; pop %%ebx"
   :"=a"(eax)
   :"r"(fd),"c"(buf),"d"(count),"0"(eax)
   :"memory"
   );
#endif
}

static void p_str(char *s)
{
   size_t count = strlen(s);
   early_write(STDERR_FILENO,s,count);
}

static void p_hex(unsigned long u)
{
   char buf[9];
   char x[] = "0123456789abcdef";
   char *s = buf;
   s[8] = '\0';
   int i = 8;
   while(i--)
   buf[7-i] = x[(u>>(i*4))&15];
   early_write(STDERR_FILENO,buf,8);
}

static void p_dec(unsigned long u)
{
   char buf[11];
   char *s = buf+10;
   *s-- = '\0';
   int count = 0;
   while(u || !count)
   {
   *s-- = u%10 + '0';
   u /= 10;
   count++;
   }
   early_write(STDERR_FILENO,s+1,count);
}

#define FUTEX_WAIT  0
#define FUTEX_WAKE  1


typedef int lock_t;

#define LOCK_INITIALIZER 0

static inline void init_lock(lock_t* l) { *l = 0; }

// lock_add performs an atomic add
// and returns the resulting value
static inline int lock_add(lock_t* l, int val)
{
   int result = val;
   __asm__ __volatile__ (
   "lock; xaddl %1, %0;"
   : "=m" (*l), "=r" (result)
   : "1" (result), "m" (*l)
   : "memory");
   return result + val;
   // Returns the value written to memory
}

// lock_bts_high_bit atomically tests and
// sets the high bit and returns
// true if the bit was clear initially
static inline bool lock_bts_high_bit(lock_t* l)
{
   bool result;
   __asm__ __volatile__ (
   "lock; btsl $31, %0;\n\t"
   "setnc %1;"
   : "=m" (*l), "=q" (result)
   : "m" (*l)
   : "memory");
   return result;
}

static int futex(int* uaddr, int op, int val,
const struct timespec*timeout, int*uaddr2, int val3)
{
   (void)timeout;
   (void)uaddr2;
   (void)val3;
   int eax = __NR_futex;
   __asm__ __volatile__(
   "push %%ebx ; push %1 ; pop %%ebx"
   " ; int $0x80; pop %%ebx"
   :"=a"(eax)
   :"r"(uaddr),"c"(op),"d"(val),"0"(eax)
   :"memory"
   );
   return eax;
}

// lock will wait for and lock a mutex
static void lock(lock_t* l)
{
   // Check the mutex and set held bit
   if (lock_bts_high_bit(l))
   {
   // Got the mutex
   return;
   }

   // Increment wait count
   lock_add(l, 1);

   while (true)
   {
   // Check the mutex and set held bit
   if (lock_bts_high_bit(l))
   {
   // Got mutex, decrement wait count
   lock_add(l, -1);
   return;
   }

   int val = *l;
   // Ensure mutex not given up since check
   if (!(val & 0x80000000))
   continue;

   // Wait for the mutex
   futex(l, FUTEX_WAIT, val, NULL, NULL, 0);
   }
}

// unlock will release a mutex
static void unlock(lock_t* l)
{
   // Turn off lock held bit and check for waiters
   if (lock_add(l, 0x80000000) == 0)
   {
   // No waiters
   return;
   }

   // Waiters found, wake up one of them
   futex(l, FUTEX_WAKE, 1, NULL, NULL, 0);
}

unsigned toomany = 42;

struct data {
   unsigned nprocs;
   lock_t lock;
   unsigned count;
};

struct data *data;

static struct data *get_shm(void)
{
   void *addr;
   int shmid;

   // create
   shmid = shmget(IPC_PRIVATE,42,IPC_CREAT|0666);
   // attach
   addr = shmat(shmid, NULL, 0);
   // don't want it to 

Re: [Alsa-devel] HDA Intel sound driver fails on Acer notebook

2006-12-19 Thread D. Hazelton
On Tuesday 19 December 2006 20:48, tony mancill wrote:
> FWIW, using pci=noacpi seems to break the USB controller on this laptop.
> I get "device not accepting address xx, error -110."

Strange. I'm using an Acer Aspire 1640Z and the sound works perfectly. Of 
course Kubuntu was the only distro I could find that did OOB, but that's 
beside the point. In a quick look through /etc on my laptop I wasn't able to 
see how they do this. But after doing a quick check on Google the reports 
vary from this being a patched bug in ALSA to being easily solved by ensuring 
that the needed sound modules are loaded in the proper order.

An alternate solution to this is to load the snd-hda-intel module with the 
parameter "model=laptop"

> In addition, neither the onboard nor the wireless NIC work anymore with
> this option.  For the onboard, you see that the link is up, but then
> get "NETDEV WATCHDOG: eth0: transmit timed out."
>
> acpi=off is worse - the boot hangs trying to load acpi/thermal.ko.

From personal experience I can say that ACPI is needed for Acer notebooks with 
the Centrino chipset to function properly.

> I've tested with both 1.0.13 and 1.0.14rc1.  I don't get exactly
> the same kernel logging (I'm using a Debian 2.6.18 kernel), but kern.log
> contains:

I had the same problem when I tried Debian on this laptop. I don't recommend 
it for laptops, since there are several common pieces of hardware found on 
laptops that need firmware not shipped by Debian. This includes the ipw2200 
firmware - which most Acer laptops need, because they ship with that wireless 
card.

> Dec 19 17:39:43 maus kernel: : hda_codec: invalid dep_range_val 0:7fff
> Dec 19 17:39:43 maus kernel: ALSA
> /home/tony/alsa-driver-1.0.14rc1/pci/hda/hda_codec.c:216: hda_codec:
> invalid dep_range_val 0:7fff Dec 19 17:39:43 maus last message repeated 279
> times
> Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd
> Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9
> Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd
> Dec 19 17:39:43 maus last message repeated 20 times
> Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9
>
> Thanks in advance for any assistance.  I hope you enjoyed your
> vacation.
>
> Thanks,
> tony
>
> Takashi Iwai wrote:
> > Hi,
> >
> > sorry for the late reply since I've been on vacation.
> >
> > At Sun, 3 Dec 2006 02:30:34 -0500,
> >
> > Chuck Ebbert wrote:
> >> The HDA Intel sound driver still fails to load on my Acer Aspire 5102
> >> notebook (Turion64 X2, ATI chipset):
> >>
> >> Here is the PCI info while running x86_64.  I tried i386 and x86_64 and
> >> it fails on both:
> >>
> >> 00:14.2 Audio device: ATI Technologies Inc Unknown device 437b (rev 01)
> >> Subsystem: Acer Incorporated [ALI] Unknown device 009f
> >> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >> ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B-
> >> ParErr- DEVSEL=slow >TAbort- SERR-  >> 64, Cache Line Size 08
> >> Interrupt: pin ? routed to IRQ 16
> >> Region 0: Memory at c000 (64-bit, non-prefetchable)
> >> [size=16K] Capabilities: [50] Power Management version 2
> >> Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
> >> PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0
> >> PME-
> >> Capabilities: [60] Message Signalled Interrupts: 64bit+
> >> Queue=0/0 Enable- Address:   Data: 
> >> 00: 02 10 7b 43 06 00 10 04 01 00 03 04 08 40 00 00
> >> 10: 04 00 00 c0 00 00 00 00 00 00 00 00 00 00 00 00
> >> 20: 00 00 00 00 00 00 00 00 00 00 00 00 25 10 9f 00
> >> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 00 00 00
> >> 40: 00 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00
> >> 50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00
> >> 60: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>
> >> On i386 I get this after doing
> >> insmod snd-hda-codec.ko ;  insmod snd-hda-intel.ko
> >>
> >> Dec  1 17:38:29 ac kernel: ACPI: PCI Interrupt :00:14.2[A] -> GSI 16
> >> (level, low) -> IRQ 18 Dec  1 17:38:29 ac kernel: codec_mask = 0xb
> >> Dec  1 17:38:30 ac kernel: hda_codec: PCI 1025:9f, codec config 5 is
> >> selected Dec  1 17:38:31 ac kernel: hda_intel: azx_get_response timeout,
> >> switching to polling mode... Dec  1 17:38:32 ac kernel: hda_intel:
> >> azx_get_response timeout, switching to single_cmd mode...
> >
> > These messages are scary.  It 

Re: SATA DMA problem (sata_uli)

2006-12-19 Thread Tejun Heo
Jeff Garzik wrote:
> Tejun Heo wrote:
>> Jeff Garzik wrote:
>>> Alan wrote:
>>>>> I tracked it down to one of the drives being forced into PIO4 mode
>>>>> rather than UDMA mode; dmesg bits:
>>>>> ata4.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth
>>>>> 0/32)
>>>>> ata4.00: ata4: dev 0 multi count 16
>>>>> ata4.00: simplex DMA is claimed by other device, disabling DMA
>>>> Your ULi controller is reporting that it supports UDMA upon only one
>>>> channel at a time. The kernel is honouring this information. The older
>>>> ULi (was ALi) PATA devices report simplex but let you turn it off so
>>>> see if the following does the trick. Test carefully as always with
>>>> disk driver
>>>> changes.
>>>>
>>>> (Jeff probably best to check the docs before merging this but I believe
>>>> it is sane)
>>>>
>>>> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
>>> My Uli SATA docs do not appear to cover the bmdma registers :(  Only the
>>> PCI config registers.
>>>
>>> But regardless, I think the better fix is to never set ATA_HOST_SIMPLEX
>>> if ATA_FLAG_NO_LEGACY is set.
>>>
>>> None of the SATA controllers I've ever encountered has been simplex.
>>
>> Just another data point.  The same problem is reported by bug #7590.
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=7590
>>
>> Is somebody brewing a patch?
> 
> Not to my knowledge.  Did you just volunteer?  ;-)
> 
> /me runs...

I'm just gonna ack Alan's patch.

* ATA_FLAG_NO_LEGACY is not really used widely (and thus LLDs don't set
it rigorously).  I think it should be removed once we get initialization
model right.

* I'm really reluctant to add more LLD-specific knowledge into libata
core.  We're already carrying too much due to the current init model
(libata should initialize host according to probe_ent, so many
weirdities should be represented in probe_ent in a form libata core
understands).

* The idea of clearing simplex for unknown controllers scares the hell
out of me.  where's mummy...

So, I'll ask bug reporter of #7590 to test it.

-- 
tejun


Re: Changes to sysfs PM layer break userspace

2006-12-19 Thread Randy Dunlap
On Tue, 19 Dec 2006 18:15:24 -0800 Andrew Morton wrote:

> On Tue, 19 Dec 2006 13:34:49 -0800
> David Brownell <[EMAIL PROTECTED]> wrote:
> 
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August
> 
> Nobody reads that.

Ugh, I read it.

> Please, wherever possible, put a nice printk("this is going away") in the code
> when planning these things.

Can notices go in both places, or is the source code (printk)
now the preferred place?

I think that we can point people to Doc/feature-removal-schedule.txt
easier (and more effectively) than we can source code (or noisy kernel
logs).

---
~Randy


Re: [RFC] [PATCH] Fix kmalloc flags used in ext3 with an active journal handle

2006-12-19 Thread Suzuki


Andrew Morton wrote:

On Tue, 19 Dec 2006 17:58:12 -0800
Suzuki <[EMAIL PROTECTED]> wrote:



* Fix the kmalloc flags used from within ext3, when we have an active journal 
handle

If we do a kmalloc with GFP_KERNEL on system running low on memory, 
with an active journal handle, we might end up in cleaning up the fs cache 
flushing dirty inodes for some other filesystem. This would cause hitting a 
J_ASSERT() in :



The change might be needed (haven't looked at it yet).  But I'd like to see
the full BUG trace, please.  To see the callchain.


Here is the call trace which was hit by one of our test teams. This was 
from fs/ext3/xattr.c. While looking for similar calls I found the others 
described in the patch.


Assertion failure in journal_start() at fs/jbd/transaction.c:274: "handle->h_transaction->t_journal == journal"
kernel BUG at fs/jbd/transaction.c:274!
illegal operation: 0001 [#1]
CPU:0Not tainted (2.6.5-7.282-s390x SLES9_SP3_BRANCH-20061031152356)
Process dbench (pid: 14070, task: 025617f0, ksp: 01057630)
Krnl PSW : 07018000 08837b38 (journal_start+0x90/0x15c 
[jbd])
Krnl GPRS:  00507fc0 002b 
01056d80
   08837b36 2885 08841da6 

   001bfaa0 03483d08 0002 
07a8bda0
   08833000 088a7d08 08837b36 
01056e80

Krnl Code: 00 00 58 10 b0 0c a7 1a 00 01 b9 04 00 2b 50 10 b0 0c e3 40
Call Trace:
 [<088a30fc>] ext3_journal_start+0x8c/0xa4 [ext3]
 [<08896822>] ext3_dirty_inode+0x3a/0xe0 [ext3]
 [<001ca362>] __mark_inode_dirty+0x1ae/0x1c8
 [<001bfaa0>] iput+0xbc/0xf0
 [<001bdcca>] prune_dcache+0x29e/0x584
 [<001bdfe4>] shrink_dcache_memory+0x34/0x54
 [<0017b100>] shrink_slab+0x15c/0x250
 [<0017b6e4>] try_to_free_pages+0x1c0/0x2a4
 [<00170276>] __alloc_pages+0x2ba/0x4e0
 [<0017059a>] __get_free_pages+0x4e/0x8c
 [<00174ea2>] cache_alloc_refill+0x2a6/0x868
 [<00175540>] __kmalloc+0xdc/0xe0
 [<088a4e62>] ext3_xattr_set_handle+0x114a/0x174c [ext3]
 [<088a54e4>] ext3_xattr_set+0x80/0xd0 [ext3]
 [<088a6312>] ext3_xattr_user_set+0xce/0xe4 [ext3]
 [<088a5f1e>] ext3_setxattr+0x17e/0x18c [ext3]
 [<001c88e6>] setxattr+0x14a/0x234
 [<001c8a80>] sys_fsetxattr+0xb0/0x110
 [<0011fc10>] sysc_noemu+0x10/0x16
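
Read bottom-up, the trace shows __kmalloc() entering reclaim, reclaim pruning the dcache, and iput() then starting a second journal handle while the first one is still active. The following is a minimal userspace sketch of that nesting rule, not the jbd code: the struct names, model_journal_start() and model_journal_stop() are hypothetical stand-ins, and the function returns -1 where the real journal_start() would trip J_ASSERT().

```c
#include <stddef.h>

/* Hypothetical stand-ins for journal_t and handle_t; not the jbd types. */
struct journal { const char *name; };
struct handle  { struct journal *journal; int refcount; };

/* Models the per-task "current handle" that jbd keeps for each task. */
static struct handle *current_handle;

/*
 * Simplified journal_start(): nesting is allowed only on the same journal.
 * Returns 0 on success and -1 where the real code would hit J_ASSERT().
 */
static int model_journal_start(struct journal *j, struct handle *storage)
{
	if (current_handle) {
		if (current_handle->journal != j)
			return -1;	/* the assertion seen in the trace */
		current_handle->refcount++;
		return 0;
	}
	storage->journal = j;
	storage->refcount = 1;
	current_handle = storage;
	return 0;
}

/* Drop one reference; the handle stops being current when the last one goes. */
static void model_journal_stop(void)
{
	if (current_handle && --current_handle->refcount == 0)
		current_handle = NULL;
}
```

In this model, starting a handle on one journal and then calling model_journal_start() for a different journal before model_journal_stop() fails, which corresponds to reclaim dirtying an inode of another filesystem in the middle of an ext3 transaction.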


> Always include the trace...


Will take care of it from now onwards.


Thanks.



Re: [Alsa-devel] HDA Intel sound driver fails on Acer notebook

2006-12-19 Thread tony mancill
FWIW, using pci=noacpi seems to break the USB controller on this laptop.
I get "device not accepting address xx, error -110".

In addition, neither the onboard nor the wireless NIC work anymore with
this option.  For the onboard, you see that the link is up, but then
get "NETDEV WATCHDOG: eth0: transmit timed out."

acpi=off is worse - the boot hangs trying to load acpi/thermal.ko.

I've tested with both 1.0.13 and 1.0.14rc1.  I don't get exactly
the same kernel logging (I'm using a Debian 2.6.18 kernel), but kern.log
contains:

Dec 19 17:39:43 maus kernel: : hda_codec: invalid dep_range_val 0:7fff
Dec 19 17:39:43 maus kernel: ALSA /home/tony/alsa-driver-1.0.14rc1/pci/hda/hda_codec.c:216: hda_codec: invalid dep_range_val 0:7fff
Dec 19 17:39:43 maus last message repeated 279 times
Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd
Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9
Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0xd
Dec 19 17:39:43 maus last message repeated 20 times
Dec 19 17:39:43 maus kernel: hda_codec: num_steps = 0 for NID=0x9

Thanks in advance for any assistance.  I hope you enjoyed your
vacation.

Thanks,
tony

Takashi Iwai wrote:
> Hi,
> 
> sorry for the late reply since I've been on vacation.
> 
> At Sun, 3 Dec 2006 02:30:34 -0500,
> Chuck Ebbert wrote:
>> The HDA Intel sound driver still fails to load on my Acer Aspire 5102
>> notebook (Turion64 X2, ATI chipset):
>>
>> Here is the PCI info while running x86_64.  I tried i386 and x86_64 and it
>> fails on both:
>>
>> 00:14.2 Audio device: ATI Technologies Inc Unknown device 437b (rev 01)
>> Subsystem: Acer Incorporated [ALI] Unknown device 009f
>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
>> Stepping- SERR- FastB2B-
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- 
>> SERR- > Latency: 64, Cache Line Size 08
>> Interrupt: pin ? routed to IRQ 16
>> Region 0: Memory at c000 (64-bit, non-prefetchable) [size=16K]
>> Capabilities: [50] Power Management version 2
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA 
>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/0 
>> Enable-
>> Address:   Data: 
>> 00: 02 10 7b 43 06 00 10 04 01 00 03 04 08 40 00 00
>> 10: 04 00 00 c0 00 00 00 00 00 00 00 00 00 00 00 00
>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 25 10 9f 00
>> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 00 00 00
>> 40: 00 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00
>> 50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00
>> 60: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> On i386 I get this after doing
>> insmod snd-hda-codec.ko ;  insmod snd-hda-intel.ko
>>
>> Dec  1 17:38:29 ac kernel: ACPI: PCI Interrupt :00:14.2[A] -> GSI 16 (level, low) -> IRQ 18
>> Dec  1 17:38:29 ac kernel: codec_mask = 0xb
>> Dec  1 17:38:30 ac kernel: hda_codec: PCI 1025:9f, codec config 5 is selected
>> Dec  1 17:38:31 ac kernel: hda_intel: azx_get_response timeout, switching to polling mode...
>> Dec  1 17:38:32 ac kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode...
> 
> These messages are scary.  They mean that communication between the
> controller chip and the codec chip doesn't work, usually because of
> incorrect IRQ handling, and often due to broken BIOS or ACPI support.
> Any change if you pass the pci=noacpi or acpi=off boot option?
> 
> Anyway, you can try the alsa-git patch in the -mm tree.  It has better
> support code for Acer laptops, and this might work slightly differently.
> 
> 
> Takashi


Re: Changes to sysfs PM layer break userspace

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 13:34:49 -0800
David Brownell <[EMAIL PROTECTED]> wrote:

> Documentation/feature-removal-schedule.txt has warned about this since
> August

Nobody reads that.

Please, wherever possible, put a nice printk("this is going away") in the code
when planning these things.
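
As a rough illustration of the suggestion above, here is a userspace sketch of a warn-once deprecation notice. It is an assumption about what such a message could look like, not code from any driver: warn_deprecated_once() is a hypothetical helper, and fprintf() stands in for printk(). Printing only once is one way to address the "noisy kernel logs" concern raised elsewhere in the thread.

```c
#include <stdio.h>

/*
 * Hypothetical helper: emit a deprecation notice the first time a
 * to-be-removed interface is used, then stay quiet.
 * Returns 1 if the notice was printed, 0 if it was suppressed.
 */
static int warn_deprecated_once(const char *what, const char *doc)
{
	static int warned;	/* shared by all callers of this helper */

	if (warned)
		return 0;
	warned = 1;
	fprintf(stderr, "%s is deprecated and will be removed; see %s\n",
		what, doc);
	return 1;
}
```

A caller would invoke it at the entry point of the interface being retired, passing the interface name and a pointer to Documentation/feature-removal-schedule.txt, so the log and the schedule file back each other up.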


Re: [RFC] [PATCH] Fix kmalloc flags used in ext3 with an active journal handle

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 17:58:12 -0800
Suzuki <[EMAIL PROTECTED]> wrote:

> * Fix the kmalloc flags used from within ext3, when we have an active journal 
> handle
> 
>   If we do a kmalloc with GFP_KERNEL on system running low on memory, 
> with an active journal handle, we might end up in cleaning up the fs cache 
> flushing dirty inodes for some other filesystem. This would cause hitting a 
> J_ASSERT() in :

The change might be needed (haven't looked at it yet).  But I'd like to see
the full BUG trace, please.  To see the callchain.

Always include the trace...

Thanks.


Re: schedule_timeout: wrong timeout value

2006-12-19 Thread Andrew Morton
On Mon, 18 Dec 2006 20:34:43 -0600
Robert Hancock <[EMAIL PROTECTED]> wrote:

> kyle wrote:
> > Hi,
> > 
> > Recently my mysql server shows something like:
> > Dec 18 18:24:05 sql kernel: schedule_timeout: wrong timeout value 
> >  from c0284efd
> > Dec 18 18:24:36 sql last message repeated 19939 times
> > Dec 18 18:25:37 sql last message repeated 33392 times
> > 
> > from syslog every 1 or 2 days. Whenever the messages show, the mysql server
> > stops accepting new connections from the same network, and I need to restart
> > the mysql service; then it keeps running well for 1-2 days until
> > the messages show up again.
> > 
> > The server has been running for over 1 year without any problem; the problem
> > started showing up around 2 weeks ago. It's running kernel 2.6.12 and
> > mysql server, nothing else. Hardware is a Pentium 4 2.8GHz with
> > hyperthreading enabled.
> > 
> > What does the kernel message mean, and why does it make mysql stop accepting
> > new connections? Is it a hardware problem, or might upgrading the kernel help?
> > Please CC me if possible. Thank you
> 
> The message means some code in the kernel or in some module passed a 
> negative value to schedule_timeout which it shouldn't have. The c0284efd 
> value is the address of the function that made the call - you may be 
> able to look that up in your /proc/ksyms or the System.map file and 
> figure out what function that is.
> 

I queued this:


From: Andrew Morton <[EMAIL PROTECTED]>

Kyle is hitting this warning, and we don't have a clue what it's caused by. 
Add the obligatory dump_stack().

Cc: kyle <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 kernel/timer.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff -puN kernel/timer.c~schedule_timeout-improve-warning-message kernel/timer.c
--- a/kernel/timer.c~schedule_timeout-improve-warning-message
+++ a/kernel/timer.c
@@ -1344,11 +1344,10 @@ fastcall signed long __sched schedule_ti
 * should never happens anyway). You just have the printk()
 * that will tell you if something is gone wrong and where.
 */
-   if (timeout < 0)
-   {
+   if (timeout < 0) {
printk(KERN_ERR "schedule_timeout: wrong timeout "
-   "value %lx from %p\n", timeout,
-   __builtin_return_address(0));
+   "value %lx\n", timeout);
+   dump_stack();
current->state = TASK_RUNNING;
goto out;
}
_
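
For readers unfamiliar with the check being patched, here is a small userspace sketch of how a timeout argument is classified. It is a model under stated assumptions: MODEL_MAX_TIMEOUT stands in for MAX_SCHEDULE_TIMEOUT, and classify_timeout() and ticks_until() are hypothetical names. The second helper shows one plausible way a caller could end up with a negative value (subtracting from a deadline that has already passed); nothing in the thread says this is what Kyle's system is doing.

```c
#include <limits.h>

#define MODEL_MAX_TIMEOUT LONG_MAX	/* stands in for MAX_SCHEDULE_TIMEOUT */

enum timeout_kind { TIMEOUT_FOREVER, TIMEOUT_FINITE, TIMEOUT_INVALID };

/* Mirrors the three cases the warning distinguishes. */
static enum timeout_kind classify_timeout(long timeout)
{
	if (timeout == MODEL_MAX_TIMEOUT)
		return TIMEOUT_FOREVER;		/* sleep until woken */
	if (timeout < 0)
		return TIMEOUT_INVALID;		/* the "wrong timeout value" case */
	return TIMEOUT_FINITE;
}

/* Hypothetical caller: ticks left until a deadline; negative once it has passed. */
static long ticks_until(long deadline, long now)
{
	return deadline - now;
}
```

With the dump_stack() added by the patch, the caller that produced the negative value shows up in the log directly, instead of having to be looked up from a return address.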



[RFC] [PATCH] Fix kmalloc flags used in ext3 with an active journal handle

2006-12-19 Thread Suzuki

Hi,

The attached patch converts the GFP mask for kmallocs within ext3 to 
GFP_NOFS whenever they are called with an active journal handle.


More description in the patch.

Comments ?

Thanks,

Suzuki
Linux Technology Center
IBM Systems & Technology Labs.

* Fix the kmalloc flags used from within ext3, when we have an active journal handle

	If we do a kmalloc with GFP_KERNEL on a system running low on memory, with an active journal handle, we might end up cleaning up the fs cache and flushing dirty inodes for some other filesystem. This would hit a J_ASSERT() in:

handle_t *journal_start(journal_t *journal, int nblocks)
{
	handle_t *handle = journal_current_handle();
	int err;
[...]

	if (handle) {
		J_ASSERT(handle->h_transaction->t_journal == journal);


Here are the places where we do kmalloc or may end up doing kmalloc, with __GFP_FS (through GFP_KERNEL) from ext3, while holding a journal handle.

1) fs/ext3/xattr.c :: ext3_xattr_block_set() : 2 occurrences

2) fs/ext3/resize.c :: reserve_backup_gdb()
3) fs/ext3/resize.c :: add_new_gdb()

4) fs/ext3/acl.c :: ext3_init_acl() :
There are quite a few points where we may end up calling kmalloc() from ext3_init_acl(), which is called with a handle from ext3_new_inode():

 a) Called directly within ext3_init_acl() as:
	clone = posix_acl_clone(acl, GFP_KERNEL);
 b) With the following code path:
	ext3_init_acl() -> ext3_get_acl() -> ext3_acl_from_disk() -> posix_acl_alloc(GFP_KERNEL)
 c) Also ext3_init_acl() -> ext3_get_acl() might call kmalloc() directly.

5) fs/ext3/acl.c :: ext3_acl_to_disk() which is called from ext3_set_acl().

Among these, 4.b and 4.c may be called either with or without a handle.

A similar issue was reported early this year:

http://lkml.org/lkml/2006/1/31/54

The attached patch fixes all the above invocations to use GFP_NOFS instead of GFP_KERNEL.


Signed-off-by: Suzuki K P <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc1/fs/ext3/xattr.c
===
--- linux-2.6.20-rc1.orig/fs/ext3/xattr.c	2006-12-13 17:14:23.0 -0800
+++ linux-2.6.20-rc1/fs/ext3/xattr.c	2006-12-19 11:41:35.0 -0800
@@ -718,7 +718,7 @@
 ce = NULL;
 			}
 			ea_bdebug(bs->bh, "cloning");
-			s->base = kmalloc(bs->bh->b_size, GFP_KERNEL);
+			s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
 			error = -ENOMEM;
 			if (s->base == NULL)
 goto cleanup;
@@ -730,7 +730,7 @@
 		}
 	} else {
 		/* Allocate a buffer where we construct the new block. */
-		s->base = kmalloc(sb->s_blocksize, GFP_KERNEL);
+		s->base = kmalloc(sb->s_blocksize, GFP_NOFS);
 		/* assert(header == s->base) */
 		error = -ENOMEM;
 		if (s->base == NULL)
Index: linux-2.6.20-rc1/fs/ext3/resize.c
===
--- linux-2.6.20-rc1.orig/fs/ext3/resize.c	2006-12-13 17:14:23.0 -0800
+++ linux-2.6.20-rc1/fs/ext3/resize.c	2006-12-19 11:42:39.0 -0800
@@ -440,7 +440,7 @@
 		goto exit_dindj;
 
 	n_group_desc = kmalloc((gdb_num + 1) * sizeof(struct buffer_head *),
-			GFP_KERNEL);
+			GFP_NOFS);
 	if (!n_group_desc) {
 		err = -ENOMEM;
 		ext3_warning (sb, __FUNCTION__,
@@ -524,7 +524,7 @@
 	int res, i;
 	int err;
 
-	primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_KERNEL);
+	primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_NOFS);
 	if (!primary)
 		return -ENOMEM;
 
Index: linux-2.6.20-rc1/fs/ext3/acl.c
===
--- linux-2.6.20-rc1.orig/fs/ext3/acl.c	2006-12-13 17:14:23.0 -0800
+++ linux-2.6.20-rc1/fs/ext3/acl.c	2006-12-19 11:45:35.0 -0800
@@ -37,7 +37,7 @@
 		return ERR_PTR(-EINVAL);
 	if (count == 0)
 		return NULL;
-	acl = posix_acl_alloc(count, GFP_KERNEL);
+	acl = posix_acl_alloc(count, GFP_NOFS);
 	if (!acl)
 		return ERR_PTR(-ENOMEM);
 	for (n=0; n < count; n++) {
@@ -91,7 +91,7 @@
 
 	*size = ext3_acl_size(acl->a_count);
 	ext_acl = kmalloc(sizeof(ext3_acl_header) + acl->a_count *
-			sizeof(ext3_acl_entry), GFP_KERNEL);
+			sizeof(ext3_acl_entry), GFP_NOFS);
 	if (!ext_acl)
 		return ERR_PTR(-ENOMEM);
 	ext_acl->a_version = cpu_to_le32(EXT3_ACL_VERSION);
@@ -187,7 +187,7 @@
 	}
 	retval = ext3_xattr_get(inode, name_index, "", NULL, 0);
 	if (retval > 0) {
-		value = kmalloc(retval, GFP_KERNEL);
+		value = kmalloc(retval, GFP_NOFS);
 		if (!value)
 			return ERR_PTR(-ENOMEM);
 		retval = ext3_xattr_get(inode, name_index, "", value, retval);
@@ -335,7 +335,7 @@
 			if (error)
 goto cleanup;
 		}
-		clone = posix_acl_clone(acl, GFP_KERNEL);
+		clone = posix_acl_clone(acl, GFP_NOFS);
 		error = -ENOMEM;
 		if (!clone)
 			goto cleanup;
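
The difference between the two allocation masks used in the patch can be sketched as a bit test. This is an illustrative model only: the MODEL_* names and bit values are assumptions made for the example (the real definitions live in include/linux/gfp.h), and reclaim_may_enter_fs() is a hypothetical helper standing in for the decision the page reclaim code makes.

```c
/* Illustrative bit values; see include/linux/gfp.h for the real ones. */
#define MODEL_GFP_WAIT	0x10u	/* allocation may sleep */
#define MODEL_GFP_IO	0x40u	/* reclaim may start physical I/O */
#define MODEL_GFP_FS	0x80u	/* reclaim may call back into filesystem code */

#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS	 (MODEL_GFP_WAIT | MODEL_GFP_IO)

/*
 * Hypothetical helper: would memory reclaim triggered by this allocation
 * be allowed to re-enter a filesystem (and so start another transaction)?
 */
static int reclaim_may_enter_fs(unsigned int gfp_mask)
{
	return (gfp_mask & MODEL_GFP_FS) != 0;
}
```

Under this model an allocation made with MODEL_GFP_NOFS can still sleep and do I/O, but reclaim on its behalf will not call into filesystem code, which is the property needed while a journal handle is held.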


Re: [RFC] HZ free ntp

2006-12-19 Thread john stultz
On Tue, 2006-12-19 at 17:32 -0800, john stultz wrote:
> On Wed, 2006-12-13 at 21:40 +0100, Roman Zippel wrote:
> > On Wed, 13 Dec 2006, john stultz wrote:
> > > > You don't have to introduce anything new, it's tick_length that changes
> > > > and HZ that becomes a variable in this function.
> > >
> > > So, forgive me for rehashing this, but it seems we're cross talking
> > > again. The context here is the dynticks code. Where HZ doesn't change,
> > > but we get interrupts at much reduced rates.
> > 
> > I know and all you have to change in the ntp and some related code is to
> > replace HZ there with a variable, thus make it changeable, so you can
> > increase the update interval (i.e. it becomes 1s/hz instead of 1s/HZ).
> 
> Untested patch below. Does this vibe better with what you are suggesting?

And here would be the follow on patch (again *untested*) for
CONFIG_NO_HZ slowing the time accumulation down to once per second.

thanks
-john


diff --git a/include/linux/timex.h b/include/linux/timex.h
index 8241e6e..3beb539 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -286,7 +286,11 @@ #endif /* !CONFIG_TIME_INTERPOLATION */
 
 #define TICK_LENGTH_SHIFT  32
 
+#ifdef CONFIG_NO_HZ
+#define NTP_INTERVAL_FREQ  (1)
+#else
 #define NTP_INTERVAL_FREQ  (HZ)
+#endif
 #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
 
 /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d0ba190..53979a9 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -127,12 +127,14 @@ EXPORT_SYMBOL_GPL(ktime_get_ts);
  */
 static void hrtimer_get_softirq_time(struct hrtimer_base *base)
 {
+	struct timespec ts;
 	ktime_t xtim, tomono;
 	unsigned long seq;
 
 	do {
 		seq = read_seqbegin(&xtime_lock);
-		xtim = timespec_to_ktime(xtime);
+		getnstimeofday(&ts);
+		xtim = timespec_to_ktime(ts);
 		tomono = timespec_to_ktime(wall_to_monotonic);
 
 	} while (read_seqretry(&xtime_lock, seq));
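
To make the effect of NTP_INTERVAL_FREQ concrete, here is a small arithmetic sketch of the interval length it implies. The helper names are hypothetical and the frequencies used in the note below (250 and 1000) are just example HZ settings; only the NSEC_PER_SEC / NTP_INTERVAL_FREQ relationship and the 32-bit TICK_LENGTH_SHIFT come from the patch.

```c
#include <stdint.h>

#define NSEC_PER_SEC		1000000000ULL
#define TICK_LENGTH_SHIFT	32

/* Length of one NTP accumulation interval in ns for a given frequency. */
static uint64_t ntp_interval_length(unsigned int interval_freq)
{
	return NSEC_PER_SEC / interval_freq;
}

/*
 * Same division on a value scaled by 2^TICK_LENGTH_SHIFT, the fixed-point
 * form used for tick_length_base. second_ns must stay below 2^32 so the
 * shift fits in 64 bits.
 */
static uint64_t scaled_interval_length(uint64_t second_ns, unsigned int freq)
{
	return (second_ns << TICK_LENGTH_SHIFT) / freq;
}
```

With a frequency of 250 the interval is 4,000,000 ns; with the CONFIG_NO_HZ value of 1 it is the full 1,000,000,000 ns, i.e. time is accumulated once per second as the message describes.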




Re: [PATCH] powerpc iseries link error in allmodconfig

2006-12-19 Thread Stephen Rothwell
On Tue, 19 Dec 2006 15:57:19 + David Woodhouse <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2006-11-08 at 09:34 -0800, Judith Lebzelter wrote:
> > Choose rpa_vscsi.c over iseries_vscsi.c when building both
> > pseries and iseries.
>
> Would it not be better to make them both work instead?

The maintainer's take on this is that no one installs onto vscsi disks on
legacy iSeries.

> Untested-but-otherwise-Signed-off-by: David Woodhouse <[EMAIL PROTECTED]>

And that will, unfortunately, never get into 2.6.20.  I suggest that we
put the simpler patch into 2.6.20 and maybe revisit this afterwards if
we think it is worth the effort.

--
Cheers,
Stephen Rothwell    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: [RFC] HZ free ntp

2006-12-19 Thread john stultz
On Wed, 2006-12-13 at 21:40 +0100, Roman Zippel wrote:
> On Wed, 13 Dec 2006, john stultz wrote:
> > > You cannot choose arbitrary intervals otherwise you get other problems,
> > > e.g. with your patch time_offset handling is broken.
> >
> > I'm not seeing this yet. Any more details?
> 
> time_offset is scaled to HZ in do_adjtimex, which needs to be changed as
> well.

Ah, thanks! Fixed.

> > > You don't have to introduce anything new, it's tick_length that changes
> > > and HZ that becomes a variable in this function.
> >
> > So, forgive me for rehashing this, but it seems we're cross talking
> > again. The context here is the dynticks code. Where HZ doesn't change,
> > but we get interrupts at much reduced rates.
> 
> I know and all you have to change in the ntp and some related code is to
> replace HZ there with a variable, thus make it changeable, so you can
> increase the update interval (i.e. it becomes 1s/hz instead of 1s/HZ).

Untested patch below. Does this vibe better with what you are suggesting?

Any other suggestions or feedback?

thanks
-john


diff --git a/include/linux/timex.h b/include/linux/timex.h
index db501dc..8241e6e 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -286,6 +286,9 @@ #endif /* !CONFIG_TIME_INTERPOLATION */
 
 #define TICK_LENGTH_SHIFT  32
 
+#define NTP_INTERVAL_FREQ  (HZ)
+#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
+
 /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
 extern u64 current_tick_length(void);
 
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 3afeaa3..eb12509 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -24,7 +24,7 @@ static u64 tick_length, tick_length_base
 
 #define MAX_TICKADJ		500		/* microsecs */
 #define MAX_TICKADJ_SCALED (((u64)(MAX_TICKADJ * NSEC_PER_USEC) << \
- TICK_LENGTH_SHIFT) / HZ)
+ TICK_LENGTH_SHIFT) / NTP_INTERVAL_FREQ)
 
 /*
  * phase-lock loop variables
@@ -46,13 +46,17 @@ #define CLOCK_TICK_ADJUST   (((s64)CLOCK_T
 
 static void ntp_update_frequency(void)
 {
-   tick_length_base = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT;
-   tick_length_base += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
-   tick_length_base += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
+   u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
+   << TICK_LENGTH_SHIFT;
+   second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
+   second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
 
-   do_div(tick_length_base, HZ);
+   tick_length_base = second_length;
 
-   tick_nsec = tick_length_base >> TICK_LENGTH_SHIFT;
+   do_div(second_length, HZ);
+   tick_nsec = second_length >> TICK_LENGTH_SHIFT;
+
+   do_div(tick_length_base, NTP_INTERVAL_FREQ);
 }
 
 /**
@@ -162,7 +166,7 @@ void second_overflow(void)
tick_length -= MAX_TICKADJ_SCALED;
} else {
tick_length += (s64)(time_adjust * NSEC_PER_USEC /
-HZ) << TICK_LENGTH_SHIFT;
+   NTP_INTERVAL_FREQ) << TICK_LENGTH_SHIFT;
time_adjust = 0;
}
}
@@ -239,7 +243,8 @@ #endif
result = -EINVAL;
goto leave;
}
-   time_freq = ((s64)txc->freq * NSEC_PER_USEC) >> (SHIFT_USEC - SHIFT_NSEC);
+   time_freq = ((s64)txc->freq * NSEC_PER_USEC)
+   >> (SHIFT_USEC - SHIFT_NSEC);
}
 
if (txc->modes & ADJ_MAXERROR) {
@@ -309,7 +314,8 @@ #endif
freq_adj += time_freq;
freq_adj = min(freq_adj, (s64)MAXFREQ_NSEC);
time_freq = max(freq_adj, (s64)-MAXFREQ_NSEC);
-   time_offset = (time_offset / HZ) << SHIFT_UPDATE;
+   time_offset = (time_offset / NTP_INTERVAL_FREQ)
+   << SHIFT_UPDATE;
} /* STA_PLL */
} /* txc->modes & ADJ_OFFSET */
if (txc->modes & ADJ_TICK)
@@ -324,8 +330,10 @@ leave: if ((time_status & (STA_UNSYNC|ST
if ((txc->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
txc->offset= save_adjust;
else
-   txc->offset= shift_right(time_offset, SHIFT_UPDATE) * HZ / 1000;
-   txc->freq  = (time_freq / NSEC_PER_USEC) << (SHIFT_USEC - SHIFT_NSEC);
+   txc->offset= shift_right(time_offset, SHIFT_UPDATE)
+   * NTP_INTERVAL_FREQ / 1000;
+   txc->freq  = (time_freq / NSEC_PER_USEC)
+   << (SHIFT_USEC - SHIFT_NSEC);
txc->maxerror  = time_maxerror;
txc->esterror  = time_esterror;
txc->status   

[patch 1/4] Add

2006-12-19 Thread Vincent Legoll

Hello,

what about something along the lines of the following,
on top of your patch ?

Or should the kernel-doc be put on another function
instead of that one ?

--
Vincent Legoll
Add do_syslog() kernel-doc

---
commit 95b0721d8b4b46ddf83113fe49492810d7d92060
tree e2715a8cf7eb0d71b3bee2185a5cf98639d79d90
parent de794d2dfd6dd0c38dd552020ac00c46e1df5293
author Vincent Legoll <[EMAIL PROTECTED]> Wed, 20 Dec 2006 01:29:34 +0100
committer Vincent Legoll <[EMAIL PROTECTED]> Wed, 20 Dec 2006 01:29:34 +0100

 kernel/printk.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 232467e..5416d07 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -164,7 +164,16 @@ out:
 
 __setup("log_buf_len=", log_buf_len_setup);
 
-/* See linux/klog.h for the command numbers passed as the first argument.  */
+/**
+ * do_syslog - operate on kernel messages log
+ * @type: operation to perform
+ * @buf: user-space buffer to copy data into
+ * @len: length of data to copy from log into @buf
+ *
+ * See include/linux/klog.h for the command numbers passed as @type.
+ * Parameters @buf & @len are only used for operations of type %KLOG_READ,
+ * %KLOG_READ_HIST and %KLOG_READ_CLEAR_HIST.
+ */
 int do_syslog(int type, char __user *buf, int len)
 {
unsigned long i, j, limit, count;


Re: [patch] hrtimers: add state tracking, fix

2006-12-19 Thread Tilman Schmidt
On 19.12.2006 20:56, Ingo Molnar wrote:
> thanks for the report - this made me review the hrtimer state engine 
> logic, and bingo, it indeed has a nasty typo! Could you try the fix 
> below, does it fix your problem? It might explain the crash you are 
> seeing, because the typo means we'd ignore HRTIMER_STATE_PENDING state 
> (which is rare but possible).

Ok, the machine has been running for a couple of hours with that patch
and so far hasn't frozen again. I'll watch it some more but it looks
like your patch did indeed fix my problem.

Thanks
Tilman

> -->
> Subject: [patch] hrtimers: add state tracking, fix
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> fix bug in hrtimer_is_queued(), introduced by a cleanup during
> the recent refactoring.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  kernel/hrtimer.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux/kernel/hrtimer.c
> ===
> --- linux.orig/kernel/hrtimer.c
> +++ linux/kernel/hrtimer.c
> @@ -157,7 +157,7 @@ static void hrtimer_get_softirq_time(str
>  static inline int hrtimer_is_queued(struct hrtimer *timer)
>  {
>   return timer->state &
> - (HRTIMER_STATE_ENQUEUED || HRTIMER_STATE_PENDING);
> + (HRTIMER_STATE_ENQUEUED | HRTIMER_STATE_PENDING);
>  }
>  
>  /*

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
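
The one-character fix quoted above is easy to misread, so here is a minimal sketch of why `||` in a mask test drops a flag. The STATE_* values are assumptions chosen for illustration (the thread does not show the real HRTIMER_STATE_* definitions), and both function names are hypothetical.

```c
/* Assumed flag values, for illustration only. */
#define STATE_ENQUEUED	0x01
#define STATE_PENDING	0x04

/* Buggy form: (0x01 || 0x04) is the boolean value 1, so only bit 0 is tested. */
static int is_queued_buggy(unsigned long state)
{
	return (state & (STATE_ENQUEUED || STATE_PENDING)) != 0;
}

/* Fixed form: bitwise OR builds the mask 0x05, so both flags are tested. */
static int is_queued_fixed(unsigned long state)
{
	return (state & (STATE_ENQUEUED | STATE_PENDING)) != 0;
}
```

With these assumed values, a timer whose state is only STATE_PENDING is reported as not queued by the buggy form and as queued by the fixed form, which matches the description of the pending state being ignored.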





Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine

2006-12-19 Thread Andrew Morton
On Mon, 18 Dec 2006 09:48:01 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> [EMAIL PROTECTED] writes:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=7505
> >
> > --- Additional Comments From [EMAIL PROTECTED]  2006-12-18 07:39 ---
> > OK, fixed.
> 
> 
> Greg.
> 
> It appears commit d71374dafbba7ec3f67371d3b7e9f6310a588808 which
> replaced the pci bus spinlock with a semaphore causes some systems not
> to boot.  I haven't a clue why.   
> 
> So I figure I would toss the ball over to your court to see if you can
> look and see what needs to happen to resolve this problem.
> 
> There appears to be at least one positive confirmation that reverting
> this patch fixes the problems.
> 

That's weird.

Quoting the bug report:


Here is the output from the kernel with the 'earlyprintk' option enabled.

Linux version 2.6.19-rc5 ([EMAIL PROTECTED]) (gcc version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)) #2 PREEMPT Sat Nov 11 16:04:00 MSK 2006
Command line: BOOT_IMAGE=Linux-bug ro root=303 video=radeonfb:mode:[EMAIL PROTECTED] idebus=66 earlyprintk=serial,ttyS0,9600,keep
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 1fff (usable)
 BIOS-e820: 1fff - 1fff3000 (ACPI NVS)
 BIOS-e820: 1fff3000 - 2000 (ACPI data)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
end_pfn_map = 1048576
kernel direct mapping tables up to 1 @ 8000-d000
DMI 2.2 present.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   131056
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Nosave address range: 0009f000 - 000a
Nosave address range: 000a - 000f
Nosave address range: 000f - 0010
Allocating PCI resources starting at 3000 (gap: 2000:c000)
Built 1 zonelists.  Total pages: 128336
Kernel command line: BOOT_IMAGE=Linux-bug ro root=303 video=radeonfb:mode:[EMAIL PROTECTED] idebus=66 earlyprintk=serial,ttyS0,9600,keep
ide_setup: idebus=66
Initializing CPU#0
general protection fault: 013b [1] PREEMPT 
CPU 0 
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.19-rc5 #2
RIP: 0010:[]  [] init_8259A+0xb6/0xf0
RSP: 0018:803cdf68  EFLAGS: 00010246
RAX: 00ff RBX: 0246 RCX: b4fcb55f
RDX: 0011 RSI: 8013cf40 RDI: 0199
RBP:  R08:  R09: 
R10: 0001 R11: 0070 R12: 
R13:  R14:  R15: 
FS:  () GS:803c() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00f0aed9 CR3: 00101000 CR4: 06a0
Process swapper (pid: 0, threadinfo 803cc000, task 80360360)
Stack:   803d3a46 800089360a40206f 0009
 0008e000 803d3ab9  803ddd99
 0009 803cf65a  0009
Call Trace:
 [] init_ISA_irqs+0x16/0x80
 [] init_IRQ+0x9/0x1e0
 [] rcu_cpu_notify+0x49/0x60
 [] start_kernel+0xda/0x1f0
 [] _sinittext+0x146/0x150


I assume we went splat in start_kernel->trap_init->cpu_init.  We shouldn't
have touched pci_bus_lock that early?  Perhaps acpi does PCI things very
early..

Conceivably an accidental early local_irq_enable could cause bad things,
but that rwsem should be 100% uncontended.

Could the reporters please determine whether disabling the various
CONFIG_DEBUG_* options prevents this?  Such as CONFIG_DEBUG_LOCKDEP,
CONFIG_DEBUG_LOCK_ALLOC, CONFIG_PROVE_LOCKING, etc?

Also, some additional oops traces would be nice, if we can get them.

(Please do reply-to-all via email from now on, rather than using the
bugzilla UI).


[PATCH] NFS: Kill the obsolete NFS_PARANOIA

2006-12-19 Thread Jesper Juhl

Linus,

This patch has been both compile and run-time tested.
It has been in -mm for quite a while without problems.
Trond & Andrew have both signed off on it.

Please apply.


Remove obsolete NFS_PARANOIA.

Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Acked-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/dir.c  |   17 ++---
 fs/nfs/getroot.c  |1 -
 fs/nfs/inode.c|3 ---
 fs/nfs/nfs2xdr.c  |1 -
 fs/nfs/pagelist.c |7 ---
 5 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index dee3d6c..8b71075 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -38,7 +38,6 @@ #include "nfs4_fs.h"
 #include "delegation.h"
 #include "iostat.h"
 
-#define NFS_PARANOIA 1
 /* #define NFS_DEBUG_VERBOSE 1 */
 
 static int nfs_opendir(struct inode *, struct file *);
@@ -1322,11 +1321,6 @@ static int nfs_sillyrename(struct inode 
atomic_read(&dentry->d_count));
nfs_inc_stats(dir, NFSIOS_SILLYRENAME);
 
-#ifdef NFS_PARANOIA
-if (!dentry->d_inode)
-printk("NFS: silly-renaming %s/%s, negative dentry??\n",
-dentry->d_parent->d_name.name, dentry->d_name.name);
-#endif
/*
 * We don't allow a dentry to be silly-renamed twice.
 */
@@ -1641,16 +1635,9 @@ static int nfs_rename(struct inode *old_
new_inode = NULL;
/* instantiate the replacement target */
d_instantiate(new_dentry, NULL);
-   } else if (atomic_read(&new_dentry->d_count) > 1) {
-   /* dentry still busy? */
-#ifdef NFS_PARANOIA
-   printk("nfs_rename: target %s/%s busy, d_count=%d\n",
-  new_dentry->d_parent->d_name.name,
-  new_dentry->d_name.name,
-  atomic_read(&new_dentry->d_count));
-#endif
+   } else if (atomic_read(&new_dentry->d_count) > 1)
+   /* dentry still busy? */
goto out;
-   }
} else
drop_nlink(new_inode);
 
diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c
index 8391bd7..4dc193f 100644
--- a/fs/nfs/getroot.c
+++ b/fs/nfs/getroot.c
@@ -42,7 +42,6 @@ #include "delegation.h"
 #include "internal.h"
 
 #define NFSDBG_FACILITY NFSDBG_CLIENT
-#define NFS_PARANOIA 1
 
 /*
  * get an NFS2/NFS3 root dentry from the root filehandle
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 63e4702..d29dfe0 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -48,7 +48,6 @@ #include "iostat.h"
 #include "internal.h"
 
 #define NFSDBG_FACILITY NFSDBG_VFS
-#define NFS_PARANOIA 1
 
 static void nfs_invalidate_inode(struct inode *);
 static int nfs_update_inode(struct inode *, struct nfs_fattr *);
@@ -1022,10 +1021,8 @@ static int nfs_update_inode(struct inode
/*
 * Big trouble! The inode has become a different object.
 */
-#ifdef NFS_PARANOIA
printk(KERN_DEBUG "%s: inode %ld mode changed, %07o to %07o\n",
__FUNCTION__, inode->i_ino, inode->i_mode, fattr->mode);
-#endif
  out_err:
/*
 * No need to worry about unhashing the dentry, as the
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 3be4e72..1fc757b 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -26,7 +26,6 @@ #include 
 #include "internal.h"
 
 #define NFSDBG_FACILITY NFSDBG_XDR
-/* #define NFS_PARANOIA 1 */
 
 /* Mapping from NFS error code to "errno" error code. */
 #define errno_NFSERR_IO EIO
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ca4b1d4..7e32bf3 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -19,8 +19,6 @@ #include 
 #include 
 #include 
 
-#define NFS_PARANOIA 1
-
 static struct kmem_cache *nfs_page_cachep;
 
 static inline struct nfs_page *
@@ -172,11 +170,6 @@ nfs_release_request(struct nfs_page *req
if (!atomic_dec_and_test(&req->wb_count))
return;
 
-#ifdef NFS_PARANOIA
-   BUG_ON (!list_empty(&req->wb_list));
-   BUG_ON (NFS_WBACK_BUSY(req));
-#endif
-
/* Release struct file or cached credential */
nfs_clear_request(req);
put_nfs_open_context(req->wb_context);




[PATCH 5/5][time][x86_64] Re-enable vsyscall support for x86_64

2006-12-19 Thread john stultz
Cleanup and re-enable vsyscall gettimeofday using the generic 
clocksource infrastructure.

Signed-off-by: John Stultz <[EMAIL PROTECTED]>

 arch/x86_64/Kconfig  |4 +
 arch/x86_64/kernel/hpet.c|6 +
 arch/x86_64/kernel/time.c|6 -
 arch/x86_64/kernel/tsc.c |7 ++
 arch/x86_64/kernel/vmlinux.lds.S |   28 +++--
 arch/x86_64/kernel/vsyscall.c|  121 +++
 include/asm-x86_64/proto.h   |2 
 include/asm-x86_64/timex.h   |1 
 include/asm-x86_64/vsyscall.h|   33 +-
 9 files changed, 105 insertions(+), 103 deletions(-)

linux-2.6.20-rc1_timeofday-arch-x86-64-vsyscall-reenablement_C7.patch

diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index e1d044c..98b11c6 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -28,6 +28,10 @@ config GENERIC_TIME
bool
default y
 
+config GENERIC_TIME_VSYSCALL
+   bool
+   default y
+
 config ZONE_DMA32
bool
default y
diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c
index 74d95d0..cd834cc 100644
--- a/arch/x86_64/kernel/hpet.c
+++ b/arch/x86_64/kernel/hpet.c
@@ -442,6 +442,11 @@ static cycle_t read_hpet(void)
return (cycle_t)readl(hpet_ptr);
 }
 
+static cycle_t __vsyscall_fn vread_hpet(void)
+{
+   return (cycle_t)readl((void *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
+}
+
 struct clocksource clocksource_hpet = {
.name   = "hpet",
.rating = 250,
@@ -450,6 +455,7 @@ struct clocksource clocksource_hpet = {
.mult   = 0, /* set below */
.shift  = HPET_SHIFT,
.is_continuous  = 1,
+   .vread  = vread_hpet,
 };
 
 static int __init init_hpet_clocksource(void)
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 4bc737c..17bb7de 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -53,13 +53,7 @@ DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL(rtc_lock);
 DEFINE_SPINLOCK(i8253_lock);
 
-unsigned long vxtime_hz = PIT_TICK_RATE;
-
-struct vxtime_data __vxtime __section_vxtime;  /* for vsyscalls */
-
 volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES;
-struct timespec __xtime __section_xtime;
-struct timezone __sys_tz __section_sys_tz;
 
 unsigned long profile_pc(struct pt_regs *regs)
 {
diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c
index 958ec0a..f16733e 100644
--- a/arch/x86_64/kernel/tsc.c
+++ b/arch/x86_64/kernel/tsc.c
@@ -185,6 +185,12 @@ static cycle_t read_tsc(void)
return ret;
 }
 
+static cycle_t __vsyscall_fn vread_tsc(void)
+{
+   cycle_t ret = (cycle_t)get_cycles_sync();
+   return ret;
+}
+
 static struct clocksource clocksource_tsc = {
.name   = "tsc",
.rating = 300,
@@ -194,6 +200,7 @@ static struct clocksource clocksource_ts
.shift  = 22,
.update_callback= tsc_update_callback,
.is_continuous  = 1,
+   .vread  = vread_tsc,
 };
 
 static int tsc_update_callback(void)
diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S
index 1e54ddf..adb4263 100644
--- a/arch/x86_64/kernel/vmlinux.lds.S
+++ b/arch/x86_64/kernel/vmlinux.lds.S
@@ -88,31 +88,25 @@ #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET
   __vsyscall_0 = VSYSCALL_VIRT_ADDR;
 
   . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
-  .xtime_lock : AT(VLOAD(.xtime_lock)) { *(.xtime_lock) }
-  xtime_lock = VVIRT(.xtime_lock);
-
-  .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) }
-  vxtime = VVIRT(.vxtime);
+  .vsyscall_fn : AT(VLOAD(.vsyscall_fn)) { *(.vsyscall_fn) }
+  . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+  .vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data))
+   { *(.vsyscall_gtod_data) }
+  vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data);
 
   .vgetcpu_mode : AT(VLOAD(.vgetcpu_mode)) { *(.vgetcpu_mode) }
   vgetcpu_mode = VVIRT(.vgetcpu_mode);
 
-  .sys_tz : AT(VLOAD(.sys_tz)) { *(.sys_tz) }
-  sys_tz = VVIRT(.sys_tz);
-
-  .sysctl_vsyscall : AT(VLOAD(.sysctl_vsyscall)) { *(.sysctl_vsyscall) }
-  sysctl_vsyscall = VVIRT(.sysctl_vsyscall);
-
-  .xtime : AT(VLOAD(.xtime)) { *(.xtime) }
-  xtime = VVIRT(.xtime);
-
   . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
   .jiffies : AT(VLOAD(.jiffies)) { *(.jiffies) }
   jiffies = VVIRT(.jiffies);
 
-  .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) { *(.vsyscall_1) }
-  .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) { *(.vsyscall_2) }
-  .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) { *(.vsyscall_3) }
+  .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1))
+   { *(.vsyscall_1) }
+  .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2))
+   { *(.vsyscall_2) }
+  .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3))
+   { *(.vsyscall_3) }
 
   . 

Re: 2.6.18 mmap hangs unrelated apps

2006-12-19 Thread Trond Myklebust
On Tue, 2006-12-19 at 19:17 -0500, Trond Myklebust wrote:
> Ack, I'll add one in. If PagePrivate() is set during the call to
> try_to_release_page(), then the page should never be freeable.

OK. This one actually compiles, and eliminates a few logic bugs. Note
that I renamed the callback to ->launder_page() for clarity (and for
histerical reasons).

Cheers
  Trond


commit 85a5b844c56706a5e3f47cde8b82109d325ad609
Author: Trond Myklebust <[EMAIL PROTECTED]>
Date:   Tue Dec 19 20:18:55 2006 -0500

NFS: Fix race in nfs_release_page()

invalidate_inode_pages2() may find the dirty bit has been set on a page
owing to the fact that the page may still be mapped after it was locked.
Only after the call to unmap_mapping_range() are we sure that the page
can no longer be dirtied.
In order to fix this, NFS has hooked the releasepage() method and tries
to write the page out between the call to unmap_mapping_range() and the
call to remove_mapping(). This, however, leads to deadlocks in the page
reclaim code, where the page may be locked without holding a reference
to the inode or dentry.

The fix is to add a new address_space_operation, launder_page(), which will
attempt to write out a dirty page without releasing the page lock.
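
As a rough illustration of the ordering described above, here is a small
user-space model. It is not kernel code: model_page, model_aops,
model_writeback and invalidate_one_page are hypothetical stand-ins, and the
error handling is simplified. Only the order of steps (page lock held
throughout, unmap first, launder if still dirty, then remove) is taken from
the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct page. */
struct model_page {
	bool locked;    /* page lock held by the caller */
	bool mapped;    /* still mapped into some address space */
	bool dirty;     /* needs writing out */
	bool in_cache;  /* still present in the page cache */
};

/* Hypothetical stand-in for address_space_operations. */
struct model_aops {
	/* Returns 0 if the page was cleaned, nonzero on error. */
	int (*launder_page)(struct model_page *page);
};

/* Toy writeback: pretend the write succeeded and clear the dirty bit. */
static int model_writeback(struct model_page *page)
{
	page->dirty = false;
	return 0;
}

/*
 * Invalidate one page. The caller holds the page lock and it is never
 * dropped here, so the page cannot be mapped back in and redirtied
 * between the laundering step and the removal step.
 */
static int invalidate_one_page(const struct model_aops *aops,
			       struct model_page *page)
{
	if (!page->locked)
		return -1;

	page->mapped = false;          /* models unmap_mapping_range() */

	if (page->dirty && aops->launder_page != NULL) {
		if (aops->launder_page(page) != 0)
			return -1;     /* could not clean the page */
	}
	if (page->dirty)
		return -1;             /* still dirty: refuse to drop it */

	page->in_cache = false;        /* models remove_mapping() */
	return 0;
}
```

In this model a filesystem that supplies no launder_page hook simply fails
to invalidate a dirty page, which is a simplification chosen for the sketch.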

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---
 Documentation/filesystems/Locking |8 
 fs/nfs/file.c |   16 
 include/linux/fs.h|1 +
 mm/truncate.c |   23 ++-
 4 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 790ef6f..28bfea7 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -171,6 +171,7 @@ prototypes:
int (*releasepage) (struct page *, int);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
loff_t offset, unsigned long nr_segs);
+   int (*launder_page) (struct page *);
 
 locking rules:
All except set_page_dirty may block
@@ -188,6 +189,7 @@ bmap:   yes
 invalidatepage:no  yes
 releasepage:   no  yes
 direct_IO: no
+launder_page:  no  yes
 
->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
 may be called from the request handler (/dev/loop).
@@ -281,6 +283,12 @@ buffers from the page in preparation for
 indicate that the buffers are (or may be) freeable.  If ->releasepage is zero,
 the kernel assumes that the fs has no private interest in the buffers.
 
+   ->launder_page() may be called prior to releasing a page if
+it is still found to be dirty. It returns zero if the page was successfully
+cleaned, or an error value if not. Note that in order to prevent the page
+getting mapped back in and redirtied, it needs to be kept locked
+across the entire operation.
+
Note: currently almost all instances of address_space methods are
 using BKL for internal serialization and that's one of the worst sources
 of contention. Normally they are calling library functions (in fs/buffer.c)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 0dd6be3..fab20d0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -315,14 +315,13 @@ static void nfs_invalidate_page(struct p
 
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
-   /*
-* Avoid deadlock on nfs_wait_on_request().
-*/
-   if (!(gfp & __GFP_FS))
-   return 0;
-   /* Hack... Force nfs_wb_page() to write out the page */
-   SetPageDirty(page);
-   return !nfs_wb_page(page->mapping->host, page);
+   /* If PagePrivate() is set, then the page is not freeable */
+   return 0;
+}
+
+static int nfs_launder_page(struct page *page)
+{
+   return nfs_wb_page(page->mapping->host, page);
 }
 
 const struct address_space_operations nfs_file_aops = {
@@ -338,6 +337,7 @@ const struct address_space_operations nf
 #ifdef CONFIG_NFS_DIRECTIO
.direct_IO = nfs_direct_IO,
 #endif
+   .launder_page = nfs_launder_page,
 };
 
 static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 186da81..14a337c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -426,6 +426,7 @@ struct address_space_operations {
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct address_space *,
struct page *, struct page *);
+   int (*launder_page) (struct page *);
 };
 
 struct backing_dev_info;
diff --git a/mm/truncate.c b/mm/truncate.c
index 9bfb8e8..d4811dc 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -321,6 +321,16 @@ failed:
return 0;
 }
 
+static int
+do_launder_page(struct address_space *mapping, struct 

[PATCH 2/5][time][x86_64] hpet_address cleanup

2006-12-19 Thread john stultz
In preparation for supporting generic timekeeping, this patch cleans up 
x86-64's use of vxtime.hpet_address, changing it to just hpet_address 
as is also used in i386. This is necessary since the vxtime structure 
will be going away.
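
For readers skimming the diff, the address itself is assembled from the two
32-bit halves of the ACPI table entry, with the high half used only on
x86_64. A minimal stand-alone sketch of that step follows; the function and
parameter names are hypothetical, not taken from the patch.

```c
#include <stdint.h>

/*
 * Combine the low and high 32-bit halves of an address into one 64-bit
 * value. Hypothetical helper for illustration only.
 */
static uint64_t hpet_base_from_halves(uint32_t addr_lo, uint32_t addr_hi)
{
	uint64_t base = addr_lo;

	/*
	 * Widen before shifting: shifting a 32-bit value left by 32
	 * bits is undefined behaviour in C.
	 */
	base |= (uint64_t)addr_hi << 32;
	return base;
}
```

With a zero high half the result is just the low half, which matches what
a 32-bit build would see.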

Signed-off-by: John Stultz <[EMAIL PROTECTED]>


 arch/i386/kernel/acpi/boot.c |   23 ++-
 arch/x86_64/kernel/apic.c|3 ++-
 arch/x86_64/kernel/time.c|   36 +++-
 include/asm-x86_64/hpet.h|1 +
 4 files changed, 28 insertions(+), 35 deletions(-)

linux-2.6.20-rc1_timeofday-arch-x86-64-hpet-address-cleanup_C7.patch

diff --git a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c
index c8f96cf..464f95b 100644
--- a/arch/i386/kernel/acpi/boot.c
+++ b/arch/i386/kernel/acpi/boot.c
@@ -638,6 +638,7 @@ static int __init acpi_parse_sbf(unsigne
 }
 
 #ifdef CONFIG_HPET_TIMER
+#include 
 
 static int __init acpi_parse_hpet(unsigned long phys, unsigned long size)
 {
@@ -671,32 +672,20 @@ #define HPET_RESOURCE_NAME_SIZE 9
hpet_res->end = (1 * 1024) - 1;
}
 
+   hpet_address = hpet_tbl->addr.addrl;
 #ifdef CONFIG_X86_64
-   vxtime.hpet_address = hpet_tbl->addr.addrl |
-   ((long)hpet_tbl->addr.addrh << 32);
-
+   hpet_address |= ((long)hpet_tbl->addr.addrh << 32);
+#endif
printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
-  hpet_tbl->id, vxtime.hpet_address);
-
-   res_start = vxtime.hpet_address;
-#else  /* X86 */
-   {
-   extern unsigned long hpet_address;
+  hpet_tbl->id, hpet_address);
 
-   hpet_address = hpet_tbl->addr.addrl;
-   printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
-  hpet_tbl->id, hpet_address);
-
-   res_start = hpet_address;
-   }
-#endif /* X86 */
+   res_start = hpet_address;
 
if (hpet_res) {
hpet_res->start = res_start;
hpet_res->end += res_start;
insert_resource(&iomem_resource, hpet_res);
}
-
return 0;
 }
 #else
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 124b2d2..7ce7797 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -37,6 +37,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 int apic_mapped;
@@ -763,7 +764,7 @@ static void setup_APIC_timer(unsigned in
local_irq_save(flags);
 
/* wait for irq slice */
-   if (vxtime.hpet_address && hpet_use_timer) {
+   if (hpet_address && hpet_use_timer) {
int trigger = hpet_readl(HPET_T0_CMP);
while (hpet_readl(HPET_COUNTER) >= trigger)
/* do nothing */ ;
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 9f05bc9..af9b072 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -67,6 +67,7 @@ #define US_SCALE  32 /* 2^32, arbitralril
 
 unsigned int cpu_khz;  /* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
+unsigned long hpet_address;
 static unsigned long hpet_period;  /* fsecs / HPET clock */
 unsigned long hpet_tick;   /* HPET clocks / interrupt */
 int hpet_use_timer;/* Use counter of hpet for time keeping, otherwise PIT */
@@ -316,7 +317,7 @@ static noinline void handle_lost_ticks(i
   KERN_WARNING "Your time source seems to be instable or "
"some driver is hogging interupts\n");
print_symbol("rip %s\n", get_irq_regs()->rip);
-   if (vxtime.mode == VXTIME_TSC && vxtime.hpet_address) {
+   if (vxtime.mode == VXTIME_TSC && hpet_address) {
printk(KERN_WARNING "Falling back to HPET\n");
if (hpet_use_timer)
vxtime.last = hpet_readl(HPET_T0_CMP) - 
@@ -324,6 +325,7 @@ static noinline void handle_lost_ticks(i
else
vxtime.last = hpet_readl(HPET_COUNTER);
vxtime.mode = VXTIME_HPET;
+   vxtime.hpet_address = hpet_address;
do_gettimeoffset = do_gettimeoffset_hpet;
}
/* else should fall back to PIT, but code missing. */
@@ -354,7 +356,7 @@ void main_timer_handler(void)
 
write_seqlock(&xtime_lock);
 
-   if (vxtime.hpet_address)
+   if (hpet_address)
offset = hpet_readl(HPET_COUNTER);
 
if (hpet_use_timer) {
@@ -717,7 +719,7 @@ static __init int late_hpet_init(void)
struct hpet_data hd;
unsigned int ntimer;
 
-   if (!vxtime.hpet_address)
+   if (!hpet_address)
return 0;
 
memset(, 0, 

[PATCH 4/5][time][x86_64] Convert x86_64 to use GENERIC_TIME

2006-12-19 Thread john stultz
This patch converts x86_64 to use the GENERIC_TIME infrastructure and 
adds clocksource structures for both TSC and HPET (ACPI PM is shared w/ 
i386).
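
A clocksource converts raw cycles to nanoseconds with a multiply and a
shift, ns = (cycles * mult) >> shift, so that no division is needed on the
read path. The sketch below shows how mult can be derived from an HPET
period given in femtoseconds, assuming 10^6 fs per ns and a shift of 22.
The SKETCH_* names and both helpers are hypothetical, and the period used
in the note afterwards is only an illustrative value.

```c
#include <stdint.h>

#define SKETCH_FSEC_PER_NSEC 1000000ULL /* 10^-9 s / 10^-15 s */
#define SKETCH_SHIFT         22

/* mult = (period_fs << shift) / FSEC_PER_NSEC, i.e. ns/cycle scaled by 2^shift */
static uint32_t period_fs_to_mult(uint32_t period_fs)
{
	uint64_t tmp = (uint64_t)period_fs << SKETCH_SHIFT;

	return (uint32_t)(tmp / SKETCH_FSEC_PER_NSEC);
}

/* ns = (cycles * mult) >> shift; the caller must keep cycles * mult below 2^64 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult)
{
	return (cycles * mult) >> SKETCH_SHIFT;
}
```

For a period of 69841279 fs (a 14.31818 MHz counter), 14318180 cycles
should come out very close to one second.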

Signed-off-by: John Stultz <[EMAIL PROTECTED]>

 arch/x86_64/Kconfig|4 
 arch/x86_64/kernel/apic.c  |2 
 arch/x86_64/kernel/hpet.c  |   65 
 arch/x86_64/kernel/pmtimer.c   |   58 ---
 arch/x86_64/kernel/smpboot.c   |1 
 arch/x86_64/kernel/time.c  |  301 -
 arch/x86_64/kernel/tsc.c   |  108 --
 drivers/char/hangcheck-timer.c |2 
 include/asm-x86_64/proto.h |2 
 include/asm-x86_64/timex.h |5 
 10 files changed, 137 insertions(+), 411 deletions(-)

linux-2.6.20-rc1_timeofday-arch-x86-64-generic-time-conversion_C7.patch

diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index d427553..e1d044c 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -24,6 +24,10 @@ config X86
bool
default y
 
+config GENERIC_TIME
+   bool
+   default y
+
 config ZONE_DMA32
bool
default y
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 7ce7797..723417d 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -786,7 +786,7 @@ static void setup_APIC_timer(unsigned in
/* Turn off PIT interrupt if we use APIC timer as main timer.
   Only works with the PM timer right now
   TBD fix it for HPET too. */
-   if (vxtime.mode == VXTIME_PMTMR &&
+   if ((pmtmr_ioport != 0) &&
smp_processor_id() == boot_cpu_id &&
apic_runs_main_timer == 1 &&
!cpu_isset(boot_cpu_id, timer_interrupt_broadcast_ipi_mask)) {
diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c
index ad67c6b..74d95d0 100644
--- a/arch/x86_64/kernel/hpet.c
+++ b/arch/x86_64/kernel/hpet.c
@@ -21,12 +21,6 @@ unsigned long hpet_tick; /* HPET clocks 
 int hpet_use_timer;/* Use counter of hpet for time keeping,
 * otherwise PIT
 */
-unsigned int do_gettimeoffset_hpet(void)
-{
-   /* cap counter read to one tick to avoid inconsistencies */
-   unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
-   return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;
-}
 
 #ifdef CONFIG_HPET
 static __init int late_hpet_init(void)
@@ -435,3 +429,62 @@ static int __init nohpet_setup(char *s)
 
 __setup("nohpet", nohpet_setup);
 
+#define HPET_MASK  0x
+#define HPET_SHIFT 22
+
+/* FSEC = 10^-15 NSEC = 10^-9 */
+#define FSEC_PER_NSEC  1000000
+
+static void *hpet_ptr;
+
+static cycle_t read_hpet(void)
+{
+   return (cycle_t)readl(hpet_ptr);
+}
+
+struct clocksource clocksource_hpet = {
+   .name   = "hpet",
+   .rating = 250,
+   .read   = read_hpet,
+   .mask   = (cycle_t)HPET_MASK,
+   .mult   = 0, /* set below */
+   .shift  = HPET_SHIFT,
+   .is_continuous  = 1,
+};
+
+static int __init init_hpet_clocksource(void)
+{
+   unsigned long hpet_period;
+   void __iomem *hpet_base;
+   u64 tmp;
+
+   if (!hpet_address)
+   return -ENODEV;
+
+   /* calculate the hpet address: */
+   hpet_base =
+   (void __iomem*)ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
+   hpet_ptr = hpet_base + HPET_COUNTER;
+
+   /* calculate the frequency: */
+   hpet_period = readl(hpet_base + HPET_PERIOD);
+
+   /*
+* hpet period is in femto seconds per cycle
+* so we need to convert this to ns/cyc units
+* approximated by mult/2^shift
+*
+*  fsec/cyc * 1nsec/1000000fsec = nsec/cyc = mult/2^shift
+*  fsec/cyc * 1ns/1000000fsec * 2^shift = mult
+*  fsec/cyc * 2^shift * 1nsec/1000000fsec = mult
+*  (fsec/cyc << shift)/1000000 = mult
+*  (hpet_period << shift)/FSEC_PER_NSEC = mult
+*/
+   tmp = (u64)hpet_period << HPET_SHIFT;
+   do_div(tmp, FSEC_PER_NSEC);
+   clocksource_hpet.mult = (u32)tmp;
+
+   return clocksource_register(&clocksource_hpet);
+}
+
+module_init(init_hpet_clocksource);
diff --git a/arch/x86_64/kernel/pmtimer.c b/arch/x86_64/kernel/pmtimer.c
index 7554458..ae8f912 100644
--- a/arch/x86_64/kernel/pmtimer.c
+++ b/arch/x86_64/kernel/pmtimer.c
@@ -24,15 +24,6 @@ #include 
 #include 
 #include 
 
-/* The I/O port the PMTMR resides at.
- * The location is detected during setup_arch(),
- * in arch/i386/kernel/acpi/boot.c */
-u32 pmtmr_ioport __read_mostly;
-
-/* value of the Power timer at last timer interrupt */
-static u32 offset_delay;
-static u32 last_pmtmr_tick;
-
 #define ACPI_PM_MASK 0xFF /* limit it to 24 bits */
 
 static inline u32 cyc2us(u32 cycles)
@@ -48,38 +39,6 @@ static inline u32 cyc2us(u32 cycles)
return (cycles >> 10);
 }
 
-int 

[PATCH 3/5][time][x86_64] Split x86_64/kernel/time.c up

2006-12-19 Thread john stultz
In preparation for the x86_64 generic time conversion, this patch 
splits out TSC and HPET related code from arch/x86_64/kernel/time.c 
into respective hpet.c and tsc.c files.

Signed-off-by: John Stultz <[EMAIL PROTECTED]>

 arch/x86_64/kernel/Makefile |2 
 arch/x86_64/kernel/hpet.c   |  437 ++
 arch/x86_64/kernel/time.c   |  628 
 arch/x86_64/kernel/tsc.c|  201 ++
 include/asm-x86_64/hpet.h   |6 
 include/asm-x86_64/timex.h  |   11 
 6 files changed, 660 insertions(+), 625 deletions(-)

linux-2.6.20-rc1_timeofday-arch-x86-64-split-hpet-tsc-time_C7.patch

diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 3c7cbff..e68a87e 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y   := process.o signal.o entry.o trap
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
-   pci-dma.o pci-nommu.o alternative.o
+   pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-$(CONFIG_X86_MCE)  += mce.o therm_throt.o
diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c
new file mode 100644
index 000..ad67c6b
--- /dev/null
+++ b/arch/x86_64/kernel/hpet.c
@@ -0,0 +1,437 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int nohpet __initdata = 0;
+
+unsigned long hpet_address;
+unsigned long hpet_period; /* fsecs / HPET clock */
+unsigned long hpet_tick;   /* HPET clocks / interrupt */
+
+int hpet_use_timer;/* Use counter of hpet for time keeping,
+* otherwise PIT
+*/
+unsigned int do_gettimeoffset_hpet(void)
+{
+   /* cap counter read to one tick to avoid inconsistencies */
+   unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
+   return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;
+}
+
+#ifdef CONFIG_HPET
+static __init int late_hpet_init(void)
+{
+   struct hpet_data hd;
+   unsigned int ntimer;
+
+   if (!hpet_address)
+   return 0;
+
+   memset(&hd, 0, sizeof (hd));
+
+   ntimer = hpet_readl(HPET_ID);
+   ntimer = (ntimer & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT;
+   ntimer++;
+
+   /*
+* Register with driver.
+* Timer0 and Timer1 is used by platform.
+*/
+   hd.hd_phys_address = hpet_address;
+   hd.hd_address = (void __iomem *)fix_to_virt(FIX_HPET_BASE);
+   hd.hd_nirqs = ntimer;
+   hd.hd_flags = HPET_DATA_PLATFORM;
+   hpet_reserve_timer(&hd, 0);
+#ifdef CONFIG_HPET_EMULATE_RTC
+   hpet_reserve_timer(&hd, 1);
+#endif
+   hd.hd_irq[0] = HPET_LEGACY_8254;
+   hd.hd_irq[1] = HPET_LEGACY_RTC;
+   if (ntimer > 2) {
+   struct hpet *hpet;
+   struct hpet_timer   *timer;
+   int i;
+
+   hpet = (struct hpet *) fix_to_virt(FIX_HPET_BASE);
+   timer = &hpet->hpet_timers[2];
+   for (i = 2; i < ntimer; timer++, i++)
+   hd.hd_irq[i] = (timer->hpet_config &
+   Tn_INT_ROUTE_CNF_MASK) >>
+   Tn_INT_ROUTE_CNF_SHIFT;
+
+   }
+
+   hpet_alloc(&hd);
+   return 0;
+}
+fs_initcall(late_hpet_init);
+#endif
+
+int hpet_timer_stop_set_go(unsigned long tick)
+{
+   unsigned int cfg;
+
+/*
+ * Stop the timers and reset the main counter.
+ */
+
+   cfg = hpet_readl(HPET_CFG);
+   cfg &= ~(HPET_CFG_ENABLE | HPET_CFG_LEGACY);
+   hpet_writel(cfg, HPET_CFG);
+   hpet_writel(0, HPET_COUNTER);
+   hpet_writel(0, HPET_COUNTER + 4);
+
+/*
+ * Set up timer 0, as periodic with first interrupt to happen at hpet_tick,
+ * and period also hpet_tick.
+ */
+   if (hpet_use_timer) {
+   hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
+   HPET_TN_32BIT, HPET_T0_CFG);
+   hpet_writel(hpet_tick, HPET_T0_CMP); /* next interrupt */
+   hpet_writel(hpet_tick, HPET_T0_CMP); /* period */
+   cfg |= HPET_CFG_LEGACY;
+   }
+/*
+ * Go!
+ */
+
+   cfg |= HPET_CFG_ENABLE;
+   hpet_writel(cfg, HPET_CFG);
+
+   return 0;
+}
+
+int hpet_arch_init(void)
+{
+   unsigned int id;
+
+   if (!hpet_address)
+   return -1;
+   set_fixmap_nocache(FIX_HPET_BASE, hpet_address);
+   __set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
+
+/*
+ * Read the period, compute tick and quotient.
+ */
+
+   id = hpet_readl(HPET_ID);
+
+   

[PATCH 0/5][time][x86_64] GENERIC_TIME patchset for x86_64

2006-12-19 Thread john stultz
Andrew, Andi,

I didn't hear any objections (or really, any comments) on my 
last release, so as I mentioned then, I want to go ahead and push this 
to Andrew for a bit of testing in -mm. Hopefully targeting for 
inclusion in 2.6.21 or 2.6.22.

Here's the performance data from the last release:

Vanilla TSC:
149 nsecs per gtod call
367 nsecs per CLOCK_MONOTONIC call
288 nsecs per CLOCK_REALTIME call
Vanilla ACPI PM:
1272 nsecs per gtod call
1335 nsecs per CLOCK_MONOTONIC call
1273 nsecs per CLOCK_REALTIME call

GENERIC_TIME TSC:
149 nsecs per gtod call
304 nsecs per CLOCK_MONOTONIC call
275 nsecs per CLOCK_REALTIME call
GENERIC_TIME ACPI PM:
1273 nsecs per gtod call
1275 nsecs per CLOCK_MONOTONIC call
1273 nsecs per CLOCK_REALTIME call

So almost no performance change.

New in the current C8 release:
o Synced up w/ 2.6.20-rc1
o Added a few small cleanups from Ingo

Let me know if you have any thoughts or comments!

thanks again!
-john


[PATCH 1/5][time][generic] vsyscall-gtod support for GENERIC_TIME

2006-12-19 Thread john stultz
Provides generic infrastructure for vsyscall-gtod.
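
The idea behind the new vread hook, as I read the patch, is that a
clocksource may optionally supply a read function that is safe to call from
the vsyscall page. The user-space sketch below models that as a function
pointer that may be NULL; the struct, the fake counter and the helper are
hypothetical, and the fallback behaviour for a missing hook is an
assumption rather than something shown in this patch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cycle_t;

/* Trimmed-down, hypothetical model of struct clocksource. */
struct sketch_clocksource {
	const char *name;
	cycle_t (*read)(void);   /* normal in-kernel read */
	cycle_t (*vread)(void);  /* optional vsyscall-safe read, may be NULL */
};

/* Stand-in counter so the sketch can run outside the kernel. */
static cycle_t fake_counter(void)
{
	return 42;
}

/*
 * Returns 1 and stores the counter value if the fast path is usable,
 * or 0 if the caller has to fall back to the slow path.
 */
static int sketch_fast_read(const struct sketch_clocksource *cs, cycle_t *out)
{
	if (cs->vread == NULL)
		return 0;
	*out = cs->vread();
	return 1;
}
```

A clocksource that leaves vread unset still works through read; it just
cannot be used on the fast path in this model.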

Signed-off-by: John Stultz <[EMAIL PROTECTED]>

 include/linux/clocksource.h |8 
 kernel/timer.c  |1 +
 2 files changed, 9 insertions(+)

linux-2.6.20-rc1_timeofday-vsyscall-support_C7.patch

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 1622d23..6899ef3 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -46,6 +46,7 @@ typedef u64 cycle_t;
  * @shift: cycle to nanosecond divisor (power of two)
  * @update_callback:   called when safe to alter clocksource values
  * @is_continuous: defines if clocksource is free-running.
+ * @vread: vsyscall based read
  * @cycle_interval:Used internally by timekeeping core, please ignore.
  * @xtime_interval:Used internally by timekeeping core, please ignore.
  */
@@ -59,6 +60,7 @@ struct clocksource {
u32 shift;
int (*update_callback)(void);
int is_continuous;
+   cycle_t (*vread)(void);
 
/* timekeeping specific data, ignore */
cycle_t cycle_last, cycle_interval;
@@ -182,4 +184,10 @@ int clocksource_register(struct clocksou
 void clocksource_reselect(void);
 struct clocksource* clocksource_get_next(void);
 
+#ifdef CONFIG_GENERIC_TIME_VSYSCALL
+extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
+#else
+#define update_vsyscall(now, c) do { } while(0)
+#endif
+
 #endif /* _LINUX_CLOCKSOURCE_H */
diff --git a/kernel/timer.c b/kernel/timer.c
index feddf81..d7a41e7 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1094,6 +1094,7 @@ #endif
clock->xtime_nsec = 0;
clocksource_calculate_interval(clock, tick_nsec);
}
+   update_vsyscall(&xtime, clock);
 }
 
 /*


Re: GPL only modules

2006-12-19 Thread Alexandre Oliva
On Dec 19, 2006, "Horst H. von Brand" <[EMAIL PROTECTED]> wrote:

> Sanjoy Mahajan <[EMAIL PROTECTED]> wrote:

>> This License acknowledges your rights of "fair use" or other
>> equivalent, as provided by copyright law. 

>> By choosing 'acknowledges' as the verb, the licensee says explicitly
>> that fair-use rights are already yours, not that they are being given
>> to you.

> Pure noise, a license can't take them away in any case.

Yeah, that's merely informative, indeed.  Point is to ensure people
know their rights, while at the same time avoiding giving impressions
such as the one Linus somehow got.

> [That is my pet peeve with GPL: It has a bit of legally binding text, and
>  lots of "explanation" and "philosophy" that don't add anything but
>  confusion. A clear-cut license plus an explanation/comment would have been
>  better. IMHO, IANAL. HAND.]

This bit would probably fit better in the spirit (preamble) than in
the letter.  That's why I filed the comment about it in the preamble.

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}


Re: GPL only modules

2006-12-19 Thread Alexandre Oliva
On Dec 19, 2006, "D. Hazelton" <[EMAIL PROTECTED]> wrote:

> However I have a feeling that the lawyers in the employ of the
> companies that ship BLOB drivers say that all they need to do to
> comply with the GPL is to ship the glue-code in source form.

> And I have to admit that this does seem to comply with the GPL - to the 
> letter, if not the spirit.

I don't see that it does comply even with the letter.  Consider this:

  These requirements apply to the modified work as a whole.  If
  identifiable sections of that work are not derived from the Program,
  and can be reasonably considered independent and separate works in
  themselves, then this License, and its terms, do not apply to those
  sections when you distribute them as separate works.  But when you
  distribute the same sections as part of a whole which is a work
  based on the Program, the distribution of the whole must be on the
  terms of this License, whose permissions for other licensees extend
  to the entire whole, and thus to each and every part regardless of
  who wrote it.

The work, in this case, is the GPLed glue code, in source form, and
the binary blob, without sources.  See that, even though the binary
blob is an independent and separate work in itself, and so it can
indeed be distributed separately under a different license, when it's
distributed as part of a whole, then the whole must be on the terms of
the GPL.

So the question becomes whether the copyright holder of the glue code
is bound by these GPL terms.

(a) If the glue code can be shown to be a derived work from Linux,
even in source form, then the copyright holder *is* bound by these
terms, and thus the whole could only be distributed under the GPL, so
including the binary blob would be in violation of the license.

(b) Now, if the glue code is *not* a derived work from Linux, then the
copyright holder is entitled to use whatever terms she likes.  It
could be any license whatsoever, that permits the distribution of the
whole or of the parts with whatever constraints copyright law
permitted.  Why would they choose the GPL in this case, then?


Let's assume they're not intentionally violating the GPL, but rather
that they believe they're entitled to do what they're doing, i.e.,
that they believe (b) their glue code is not a derived work from
Linux.

In this case, they *can* distribute the glue source code under the GPL
along with their binary blob.  But can anyone else?

Methinks anyone else would be entitled to pass the same whole along
under the GPL, per section 1, but wouldn't be entitled to distribute
modified versions, because this would require the derived work to be
licensed under the GPL, and nobody else is able to provide the source
code to the binary blob.

And then, who'd be entitled to complain?  Only the copyright holder of
the glue code and the binary blob.

Would you like to be on the wrong end of a copyright infringement
lawsuit by one of these binary blob distributors for distributing a
patched version of their glue code + binary blob?  More to the point,
do you think they would actually bring suit, just to make it clear
that the whole point is for them to keep a monopoly on the rights to
modify and then distribute the combined work, in spite of using the
GPL for (part of) the work?


It gets trickier for binaries, since they are quite possibly derived
works from the kernel, licensed under the GPL.  If they are, they
can't be distributed at all, not even by the copyright holder of the
glue code + binary blob.  If they aren't, then the copyright holder
can distribute them, but nobody else can because that would be a
violation of the GPL, as in the discussion above.  So, the copyright
holder would be keeping a monopoly on the rights to distribute
binaries, and anyone else could be sued by them.


Sure enough, one might think of praising them for distributing the
glue code under the GPL.  Then others could take this glue code and
use it for something else that is useful, right?

Well...  Not quite.  For one, even if enabling others to distribute
glue code + binary blobs were a good thing, using somebody else's glue
code means you're bound by the GPL requirements, so you can't ship the
combination of the glue code with your binary blob.

And then, if you intend to use the glue code to plug in some other
code that is GPL-compatible in the kernel, perhaps you'd be better off
not using the glue code at all, but rather modifying the
GPL-compatible code to fit.

So, even if condoning binary blobs were morally acceptable, we still
wouldn't be gaining anything from this relationship, we'd only be
enabling vendors to sell us their undocumented hardware while denying
us our freedoms.

Why should we do this?

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}

[PATCH 2/4] Add device probing and sysfs integration.

2006-12-19 Thread Kristian Høgsberg
Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]>
---
 drivers/firewire/Makefile |3 
 drivers/firewire/fw-card.c|   56 +++
 drivers/firewire/fw-device-cdev.c |  617 +
 drivers/firewire/fw-device-cdev.h |  146 +
 drivers/firewire/fw-device.c  |  613 +
 drivers/firewire/fw-device.h  |  127 
 drivers/firewire/fw-iso.c |1 
 drivers/firewire/fw-topology.c|   10 -
 drivers/firewire/fw-transaction.c |5 
 drivers/firewire/fw-transaction.h |4 
 10 files changed, 1573 insertions(+), 9 deletions(-)

diff --git a/drivers/firewire/Makefile b/drivers/firewire/Makefile
index db7020d..da77bc0 100644
--- a/drivers/firewire/Makefile
+++ b/drivers/firewire/Makefile
@@ -2,6 +2,7 @@ #
 # Makefile for the Linux IEEE 1394 implementation
 #
 
-fw-core-objs := fw-card.o fw-topology.o fw-transaction.o fw-iso.o
+fw-core-objs := fw-card.o fw-topology.o fw-transaction.o fw-iso.o \
+   fw-device.o fw-device-cdev.o
 
 obj-$(CONFIG_FW) += fw-core.o
diff --git a/drivers/firewire/fw-card.c b/drivers/firewire/fw-card.c
index d8abd70..7977390 100644
--- a/drivers/firewire/fw-card.c
+++ b/drivers/firewire/fw-card.c
@@ -24,6 +24,7 @@ #include 
 #include 
 #include "fw-transaction.h"
 #include "fw-topology.h"
+#include "fw-device.h"
 
 /* The lib/crc16.c implementation uses the standard (0x8005)
  * polynomial, but we need the ITU-T (or CCITT) polynomial (0x1021).
@@ -186,6 +187,59 @@ fw_core_remove_descriptor (struct fw_des
 EXPORT_SYMBOL(fw_core_remove_descriptor);
 
 static void
+fw_card_irm_work(struct work_struct *work)
+{
+   struct fw_card *card =
+   container_of(work, struct fw_card, work.work);
+   struct fw_device *root;
+   unsigned long flags;
+   int new_irm_id, generation;
+
+   /* FIXME: This simple bus management unconditionally picks a
+* cycle master if the current root can't do it.  We need to
+* not do this if there is a bus manager already.  Also, some
+* hubs set the contender bit, which is bogus, so we should
+* probably do a little sanity check on the IRM (like, read
+* the bandwidth register) if it's not us. */
+
+   spin_lock_irqsave(&card->lock, flags);
+
+   generation = card->generation;
+   root = card->root_node->data;
+
+   if (root == NULL)
+   /* Either link_on is false, or we failed to read the
+* config rom.  In either case, pick another root. */
+   new_irm_id = card->local_node->node_id;
+   else if (root->state != FW_DEVICE_RUNNING)
+   /* If we haven't probed this device yet, bail out now
+* and let's try again once that's done. */
+   new_irm_id = -1;
+   else if (root->config_rom[2] & bib_cmc)
+   /* FIXME: I suppose we should set the cmstr bit in the
+* STATE_CLEAR register of this node, as described in
+* 1394-1995, 8.4.2.6.  Also, send out a force root
+* packet for this node. */
+   new_irm_id = -1;
+   else
+   /* Current root has an active link layer and we
+* successfully read the config rom, but it's not
+* cycle master capable. */
+   new_irm_id = card->local_node->node_id;
+
+   if (card->irm_retries++ > 5)
+   new_irm_id = -1;
+
+   spin_unlock_irqrestore(&card->lock, flags);
+
+   if (new_irm_id > 0) {
+   fw_notify("Trying to become root (card %d)\n", card->index);
+   fw_send_force_root(card, new_irm_id, generation);
+   fw_core_initiate_bus_reset(card, 1);
+   }
+}
+
+static void
 release_card(struct device *device)
 {
struct fw_card *card =
@@ -222,6 +276,8 @@ fw_card_initialize(struct fw_card *card,
 
card->local_node = NULL;
 
+   INIT_DELAYED_WORK(&card->work, fw_card_irm_work);
+
card->card_device.bus = &fw_bus_type;
card->card_device.release = release_card;
card->card_device.parent  = card->device;
diff --git a/drivers/firewire/fw-device-cdev.c b/drivers/firewire/fw-device-cdev.c
new file mode 100644
index 000..c10e332
--- /dev/null
+++ b/drivers/firewire/fw-device-cdev.c
@@ -0,0 +1,617 @@
+/* -*- c-basic-offset: 8 -*-
+ *
+ * fw-device-cdev.c - Char device for device raw access
+ *
+ * Copyright (C) 2005-2006  Kristian Hoegsberg <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 

[PATCH 3/4] Add driver for OHCI firewire host controllers.

2006-12-19 Thread Kristian Høgsberg
Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]>
---
 drivers/firewire/Kconfig   |   11 
 drivers/firewire/Makefile  |1 
 drivers/firewire/fw-ohci.c | 1394 
 drivers/firewire/fw-ohci.h |  152 +
 4 files changed, 1558 insertions(+), 0 deletions(-)

diff --git a/drivers/firewire/Kconfig b/drivers/firewire/Kconfig
index bdd6303..b386334 100644
--- a/drivers/firewire/Kconfig
+++ b/drivers/firewire/Kconfig
@@ -20,4 +20,15 @@ config FW
  To compile this driver as a module, say M here: the
  module will be called fw-core.
 
+config FW_OHCI
+   tristate "Support for OHCI firewire host controllers"
+   depends on PCI && FW
+   help
+ Enable this driver if you have a firewire controller based
+ on the OHCI specification.  For all practical purposes, this
+ is the only chipset in use, so say Y here.
+
+ To compile this driver as a module, say M here: the
+ module will be called fw-ohci.
+
 endmenu
diff --git a/drivers/firewire/Makefile b/drivers/firewire/Makefile
index da77bc0..add3b98 100644
--- a/drivers/firewire/Makefile
+++ b/drivers/firewire/Makefile
@@ -6,3 +6,4 @@ fw-core-objs := fw-card.o fw-topology.o 
    fw-device.o fw-device-cdev.o
 
 obj-$(CONFIG_FW) += fw-core.o
+obj-$(CONFIG_FW_OHCI) += fw-ohci.o
diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c
new file mode 100644
index 000..5392a2b
--- /dev/null
+++ b/drivers/firewire/fw-ohci.c
@@ -0,0 +1,1394 @@
+/* -*- c-basic-offset: 8 -*-
+ *
+ * fw-ohci.c - Driver for OHCI 1394 boards
+ * Copyright (C) 2003-2006 Kristian Hoegsberg <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "fw-transaction.h"
+#include "fw-ohci.h"
+
+#define descriptor_output_more 0
+#define descriptor_output_last (1 << 12)
+#define descriptor_input_more  (2 << 12)
+#define descriptor_input_last  (3 << 12)
+#define descriptor_status  (1 << 11)
+#define descriptor_key_immediate   (2 << 8)
+#define descriptor_ping(1 << 7)
+#define descriptor_yy  (1 << 6)
+#define descriptor_no_irq  (0 << 4)
+#define descriptor_irq_error   (1 << 4)
+#define descriptor_irq_always  (3 << 4)
+#define descriptor_branch_always   (3 << 2)
+
+struct descriptor {
+   __le16 req_count;
+   __le16 control;
+   __le32 data_address;
+   __le32 branch_address;
+   __le16 res_count;
+   __le16 transfer_status;
+} __attribute__((aligned(16)));
+
+struct ar_context {
+   struct fw_ohci *ohci;
+   struct descriptor descriptor;
+   __le32 buffer[512];
+   dma_addr_t descriptor_bus;
+   dma_addr_t buffer_bus;
+
+   u32 command_ptr;
+   u32 control_set;
+   u32 control_clear;
+
+   struct tasklet_struct tasklet;
+};
+
+struct at_context {
+   struct fw_ohci *ohci;
+   dma_addr_t descriptor_bus;
+   dma_addr_t buffer_bus;
+
+   struct list_head list;
+
+   struct {
+   struct descriptor more;
+   __le32 header[4];
+   struct descriptor last;
+   } d;
+
+   u32 command_ptr;
+   u32 control_set;
+   u32 control_clear;
+
+   struct tasklet_struct tasklet;
+};
+
+#define it_header_sy(v)  ((v) <<  0)
+#define it_header_tcode(v)   ((v) <<  4)
+#define it_header_channel(v) ((v) <<  8)
+#define it_header_tag(v) ((v) << 14)
+#define it_header_speed(v)   ((v) << 16)
+#define it_header_data_length(v) ((v) << 16)
+
+struct iso_context {
+   struct fw_iso_context base;
+   struct tasklet_struct tasklet;
+   u32 control_set;
+   u32 control_clear;
+   u32 command_ptr;
+   u32 context_match;
+
+   struct descriptor *buffer;
+   dma_addr_t buffer_bus;
+   struct descriptor *head_descriptor;
+   struct descriptor *tail_descriptor;
+   struct descriptor *tail_descriptor_last;
+   struct descriptor *prev_descriptor;
+};
+
+#define CONFIG_ROM_SIZE 1024
+
+struct fw_ohci {
+   struct fw_card card;
+
+   __iomem char 

[PATCH 4/4] Add SBP-2 protocol driver for storage devices.

2006-12-19 Thread Kristian Høgsberg
Signed-off-by: Kristian Hoegsberg <[EMAIL PROTECTED]>
---
 drivers/firewire/Kconfig   |   12 
 drivers/firewire/Makefile  |1 
 drivers/firewire/fw-sbp2.c | 1073 
 3 files changed, 1086 insertions(+), 0 deletions(-)

diff --git a/drivers/firewire/Kconfig b/drivers/firewire/Kconfig
index b386334..bfab4b3 100644
--- a/drivers/firewire/Kconfig
+++ b/drivers/firewire/Kconfig
@@ -31,4 +31,16 @@ config FW_OHCI
  To compile this driver as a module, say M here: the
  module will be called fw-ohci.
 
+config FW_SBP2
+   tristate "Support for storage devices (SBP-2 protocol driver)"
+   depends on FW && SCSI
+   help
+ This option enables you to use SBP-2 devices connected to a
+ firewire bus.  SBP-2 devices include storage devices like
+ hard disks and DVD drives, also some other FireWire devices
+ like scanners.
+
+ You should also enable support for disks, CD-ROMs, etc. in the SCSI
+ configuration section.
+
 endmenu
diff --git a/drivers/firewire/Makefile b/drivers/firewire/Makefile
index add3b98..b955c99 100644
--- a/drivers/firewire/Makefile
+++ b/drivers/firewire/Makefile
@@ -7,3 +7,4 @@ fw-core-objs := fw-card.o fw-topology.o 
 
 obj-$(CONFIG_FW) += fw-core.o
 obj-$(CONFIG_FW_OHCI) += fw-ohci.o
+obj-$(CONFIG_FW_SBP2) += fw-sbp2.o
\ No newline at end of file
diff --git a/drivers/firewire/fw-sbp2.c b/drivers/firewire/fw-sbp2.c
new file mode 100644
index 000..2756e0c
--- /dev/null
+++ b/drivers/firewire/fw-sbp2.c
@@ -0,0 +1,1073 @@
+/* -*- c-basic-offset: 8 -*-
+ * fw-sbp2.c -- SBP2 driver (SCSI over IEEE1394)
+ *
+ * Copyright (C) 2005-2006  Kristian Hoegsberg <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "fw-transaction.h"
+#include "fw-topology.h"
+#include "fw-device.h"
+
+/* I don't know why the SCSI stack doesn't define something like this... */
+typedef void (*scsi_done_fn_t) (struct scsi_cmnd *);
+
+static const char sbp2_driver_name[] = "sbp2";
+
+struct sbp2_device {
+   struct fw_unit *unit;
+   struct fw_address_handler address_handler;
+   struct list_head orb_list;
+   u64 management_agent_address;
+   u64 command_block_agent_address;
+   u32 workarounds;
+   int login_id;
+
+   /* We cache these addresses and only update them once we've
+* logged in or reconnected to the sbp2 device.  That way, any
+* IO to the device will automatically fail and get retried if
+* it happens in a window where the device is not ready to
+* handle it (e.g. after a bus reset but before we reconnect). */
+   int node_id;
+   int address_high;
+   int generation;
+
+   struct work_struct work;
+   struct Scsi_Host *scsi_host;
+};
+
+#define SBP2_MAX_SG_ELEMENT_LENGTH 0xf000
+#define SBP2_MAX_SECTORS   255 /* Max sectors supported */
+#define SBP2_MAX_CMDS  8   /* This should be safe */
+
+#define SBP2_ORB_NULL  0x8000
+
+#define SBP2_DIRECTION_TO_MEDIA0x0
+#define SBP2_DIRECTION_FROM_MEDIA  0x1
+
+/* Unit directory keys */
+#define SBP2_COMMAND_SET_SPECIFIER 0x38
+#define SBP2_COMMAND_SET   0x39
+#define SBP2_COMMAND_SET_REVISION  0x3b
+#define SBP2_FIRMWARE_REVISION 0x3c
+
+/* Flags for detected oddities and brokenness */
+#define SBP2_WORKAROUND_128K_MAX_TRANS 0x1
+#define SBP2_WORKAROUND_INQUIRY_36 0x2
+#define SBP2_WORKAROUND_MODE_SENSE_8   0x4
+#define SBP2_WORKAROUND_FIX_CAPACITY   0x8
+#define SBP2_WORKAROUND_OVERRIDE   0x100
+
+/* Management orb opcodes */
+#define SBP2_LOGIN_REQUEST 0x0
+#define SBP2_QUERY_LOGINS_REQUEST  0x1
+#define SBP2_RECONNECT_REQUEST 0x3
+#define SBP2_SET_PASSWORD_REQUEST  0x4
+#define SBP2_LOGOUT_REQUEST0x7
+#define SBP2_ABORT_TASK_REQUEST0xb
+#define SBP2_ABORT_TASK_SET0xc
+#define SBP2_LOGICAL_UNIT_RESET0xe
+#define SBP2_TARGET_RESET_REQUEST  0xf
+
+/* Offsets for command block agent registers */
+#define SBP2_AGENT_STATE 

[PATCH 0/4] New firewire stack - updated patches

2006-12-19 Thread Kristian Høgsberg
Hi,

Here's a new set of patches for the new firewire stack.  The changes
since the last set of patches address the issues that were raised on
the list and can be reviewed in detail here:

  http://gitweb.freedesktop.org/?p=users/krh/juju.git

but to sum up the changes:

 - Got rid of bitfields.

 - Tested on ppc, ppc64, x86-64 and x86.

 - ioctl interface tested on 32-bit userspace / 64-bit kernels.

 - ASCIIfied sources.

 - Incorporated Jeff Garzik's comments.

 - Updated to work with the new workqueue API changes.

 - Moved subsystem to drivers/firewire from drivers/fw.

plus a number of bug fixes.
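
To illustrate the "got rid of bitfields" item above: the sketch below shows the general shift-and-mask style that can replace C bitfields, whose layout depends on the compiler and on endianness. The it_header_*() macros mirror the ones visible in the fw-ohci.c patch of this series; pack_it_header() and put_le32() are hypothetical helpers invented for this example, and the field values used with them are arbitrary. This is not the code from the patches, only a minimal sketch of the idea.

```c
#include <stdint.h>

/* Shift macros in the style of the it_header_*() helpers in fw-ohci.c. */
#define it_header_sy(v)      ((uint32_t)(v) <<  0)
#define it_header_tcode(v)   ((uint32_t)(v) <<  4)
#define it_header_channel(v) ((uint32_t)(v) <<  8)
#define it_header_tag(v)     ((uint32_t)(v) << 14)
#define it_header_speed(v)   ((uint32_t)(v) << 16)

/*
 * Hypothetical helper (not in the patch): build one header quadlet in
 * CPU byte order.  The result is the same number on every host, which
 * is what a bitfield struct cannot guarantee.
 */
static uint32_t pack_it_header(unsigned int sy, unsigned int tcode,
			       unsigned int channel, unsigned int tag,
			       unsigned int speed)
{
	return it_header_sy(sy) |
	       it_header_tcode(tcode) |
	       it_header_channel(channel) |
	       it_header_tag(tag) |
	       it_header_speed(speed);
}

/*
 * Hypothetical helper (not in the patch): store a 32-bit value as
 * little-endian bytes regardless of host endianness.  In the kernel
 * this role is played by cpu_to_le32() and the __le32 type.
 */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Because the value is assembled arithmetically and only converted to wire order at the last step, the same source should behave identically on ppc and x86.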

As mentioned last time, the stack still lacks isochronous receive
functionality to be on par with the old stack, feature-wise.  This is
the one remaining piece of feature work kernel-side.  When that is
done, I have a couple of TODO items in user space:

 - Make a libraw1394 compatibility library

 - Port libdv1394 to new isochronous API.

which will allow us to move most user space applications to the new
stack.  That is, even if the new stack provides a new interface for
asynchronous and isochronous IO, a lot of applications can still work
since the changes are isolated to a couple of libraries.  This is
still in development and is being discussed on the linux1394-devel
list.  It will likely require a few changes kernel side in the stack
as we figure out how to do this.

It is still work in progress, but at least now it should work across
all architectures and endiannesses.

Happy Holidays,
Kristian

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG on 2.6.20-rc1 when using gdb

2006-12-19 Thread Dave Airlie

On 12/20/06, Andrew Morton <[EMAIL PROTECTED]> wrote:

> When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
> trace as the program was running (e.g. I had typed 'run' in gdb):
>
> WARNING at kernel/softirq.c:137 local_bh_enable()
>  [] dump_trace+0x68/0x1d9
>  [] show_trace_log_lvl+0x18/0x2c
>  [] show_trace+0xf/0x11
>  [] dump_stack+0x12/0x14
>  [] local_bh_enable+0x44/0x94
>  [] unix_release_sock+0x6e/0x1fe
>  [] unix_stream_connect+0x3b4/0x3cf
>  [] sys_connect+0x82/0xad
>  [] sys_socketcall+0xac/0x261
>  [] syscall_call+0x7/0xb
>  [] 0xb7f70822
>  ===
> [ cut here ]
> kernel BUG at fs/buffer.c:1235!
> invalid opcode:  [#1]
> PREEMPT
> Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
> exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
> battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
> cpufreq_ondemand cpufreq_performance cpufreq_powersave
> speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
> snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
> sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
> crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
> serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
> pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
> ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
> ide_core ehci_hcd uhci_hcd usbcore
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010046   (2.6.20-rc1 #1)
> EIP is at __find_get_block+0x1c/0x16f
> eax: 0086   ebx:    ecx:    edx: 0088a800
> esi: 0088a800   edi:    ebp: dfffd040   esp: cad2dd30
> ds: 007b   es: 007b   ss: 0068
> Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0
> task.ti=cad2c000)
> Stack: cad2dd58 c02caa0b 0002 000e 000b 0001 e8836580
> 0088a800
>  e8836610  c01793dc 1000 c03ab3e0
> f3cadd80
>0086 c90d41b0 0088a800  dfffd040 8000 
> 0002
> Call Trace:
>  [] __getblk+0x23/0x268
>  [] ext3_getblk+0x10b/0x244 [ext3]
>  [] ext3_bread+0x19/0x70 [ext3]
>  [] dx_probe+0x43/0x2c9 [ext3]
>  [] ext3_htree_fill_tree+0x99/0x1ba [ext3]
>  [] ext3_readdir+0x1d4/0x5ed [ext3]
>  [] vfs_readdir+0x63/0x8d
>  [] sys_getdents64+0x63/0xa5
>  [] syscall_call+0x7/0xb
>  [] 0xb7f70822
>  ===
> Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
> 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
> eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
> EIP: [] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30
>
> This happens on 2.6.20-rc1 but not 2.6.19.
>

> And it's repeatable, yes?
>
> And you're sure that use of gdb triggers it?
>
> Something is forgetting to reenable local interrupts.


I've managed to get nearly the same thing on a test system I built
yesterday, my app when running under gdb would also blow up in
__find_get_block.

I was using close to Linus's git head...

Dave.


Re: BUG on 2.6.20-rc1 when using gdb

2006-12-19 Thread Dave Airlie

On 12/20/06, Dave Airlie <[EMAIL PROTECTED]> wrote:

> On 12/20/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
> > trace as the program was running (e.g. I had typed 'run' in gdb):
> >
> > WARNING at kernel/softirq.c:137 local_bh_enable()
> >  [] dump_trace+0x68/0x1d9
> >  [] show_trace_log_lvl+0x18/0x2c
> >  [] show_trace+0xf/0x11
> >  [] dump_stack+0x12/0x14
> >  [] local_bh_enable+0x44/0x94
> >  [] unix_release_sock+0x6e/0x1fe
> >  [] unix_stream_connect+0x3b4/0x3cf
> >  [] sys_connect+0x82/0xad
> >  [] sys_socketcall+0xac/0x261
> >  [] syscall_call+0x7/0xb
> >  [] 0xb7f70822
> >  ===
> > [ cut here ]
> > kernel BUG at fs/buffer.c:1235!
> > invalid opcode:  [#1]
> > PREEMPT
> > Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
> > exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
> > battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
> > cpufreq_ondemand cpufreq_performance cpufreq_powersave
> > speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
> > snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
> > sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
> > crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
> > serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
> > pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
> > ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
> > ide_core ehci_hcd uhci_hcd usbcore
> > CPU:0
> > EIP:0060:[]Not tainted VLI
> > EFLAGS: 00010046   (2.6.20-rc1 #1)
> > EIP is at __find_get_block+0x1c/0x16f
> > eax: 0086   ebx:    ecx:    edx: 0088a800
> > esi: 0088a800   edi:    ebp: dfffd040   esp: cad2dd30
> > ds: 007b   es: 007b   ss: 0068
> > Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0
> > task.ti=cad2c000)
> > Stack: cad2dd58 c02caa0b 0002 000e 000b 0001 e8836580
> > 0088a800
> >  e8836610  c01793dc 1000 c03ab3e0
> > f3cadd80
> >0086 c90d41b0 0088a800  dfffd040 8000 
> > 0002
> > Call Trace:
> >  [] __getblk+0x23/0x268
> >  [] ext3_getblk+0x10b/0x244 [ext3]
> >  [] ext3_bread+0x19/0x70 [ext3]
> >  [] dx_probe+0x43/0x2c9 [ext3]
> >  [] ext3_htree_fill_tree+0x99/0x1ba [ext3]
> >  [] ext3_readdir+0x1d4/0x5ed [ext3]
> >  [] vfs_readdir+0x63/0x8d
> >  [] sys_getdents64+0x63/0xa5
> >  [] syscall_call+0x7/0xb
> >  [] 0xb7f70822
> >  ===
> > Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
> > 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
> > eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
> > EIP: [] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30
> >
> > This happens on 2.6.20-rc1 but not 2.6.19.
> >
>
> And it's repeatable, yes?
>
> And you're sure that use of gdb triggers it?
>
> Something is forgetting to reenable local interrupts.

> I've managed to get nearly the same thing on a test system I built
> yesterday, my app when running under gdb would also blow up in
> __find_get_block.
>
> I was using close to Linus's git head...


And of course it was on a fresh 32-bit x86 with FC6 on it.

Dave.


Re: [RFC] Patch: dynticks: idle load balancing

2006-12-19 Thread Steven Rostedt
On Mon, 2006-12-11 at 15:53 -0800, Siddha, Suresh B wrote:

> 
> Comments and review feedback welcome. Minimal testing done on couple of
> i386 platforms. Perf testing yet to be done.

Nice work!

> 
> thanks,
> suresh
> ---


> diff -pNru linux-2.6.19-mm1/include/linux/sched.h linux/include/linux/sched.h
> --- linux-2.6.19-mm1/include/linux/sched.h   2006-12-12 06:39:22.0 -0800
> +++ linux/include/linux/sched.h   2006-12-12 06:51:03.0 -0800
> @@ -195,6 +195,14 @@ extern void sched_init_smp(void);
>  extern void init_idle(struct task_struct *idle, int cpu);
>  
>  extern cpumask_t nohz_cpu_mask;
> +#ifdef CONFIG_SMP
> +extern int select_notick_load_balancer(int cpu);
> +#else
> +static inline int select_notick_load_balancer(int cpu)

Later on in the actual code, the parameter is named stop_tick, which
makes sense. You should change the name here too so it's not confusing
when looking later on at the code.

> +{
> + return 0;
> +}
> +#endif

[...]

> +
> +/*
> + * This routine will try to nominate the ilb (idle load balancing)
> + * owner among the cpus whose ticks are stopped. ilb owner will do the idle
> + * load balancing on behalf of all those cpus. If all the cpus in the system
> + * go into this tickless mode, then there will be no ilb owner (as there is
> + * no need for one) and all the cpus will sleep till the next wakeup event
> + * arrives...
> + *
> + * For the ilb owner, tick is not stopped. And this tick will be used
> + * for idle load balancing. ilb owner will still be part of
> + * notick.cpu_mask..
> + *
> + * While stopping the tick, this cpu will become the ilb owner if there
> + * is no other owner. And will be the owner till that cpu becomes busy
> + * or if all cpus in the system stop their ticks at which point
> + * there is no need for ilb owner.
> + *
> + * When the ilb owner becomes busy, it nominates another owner, during the
> + * schedule()
> + */
> +int select_notick_load_balancer(int stop_tick)
> +{
> + int cpu = smp_processor_id();
> +

[...]

> +#ifdef CONFIG_NO_HZ
> + if (idle_cpu(local_cpu) && notick.load_balancer == local_cpu &&
> + !cpus_empty(cpus))
> + goto restart;
> +#endif
>  }
>  #else
>  /*
> @@ -3562,6 +3669,21 @@ switch_tasks:
>   ++*switch_count;
>  
>   prepare_task_switch(rq, next);
> +#if defined(CONFIG_HZ) && defined(CONFIG_SMP)

Ah! so this is where the CONFIG_NO_HZ mistake came in ;)
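
One possible reading of the remark above, as a sketch: the earlier hunk is guarded by CONFIG_NO_HZ while this one tests CONFIG_HZ, so the two blocks would not be compiled in or out together. The small example below only demonstrates that effect with the preprocessor. It assumes CONFIG_HZ is defined (here with an arbitrary value of 250) and that CONFIG_NO_HZ is not. The two function names are hypothetical and exist only for this illustration.

```c
/* Assumed configuration for the illustration only. */
#define CONFIG_HZ  250	/* tick frequency, assumed to be defined */
#define CONFIG_SMP 1
/* CONFIG_NO_HZ is deliberately left undefined. */

/* Hypothetical stand-in for the block guarded by CONFIG_NO_HZ. */
static int guarded_by_no_hz(void)
{
#ifdef CONFIG_NO_HZ
	return 1;	/* compiled in */
#else
	return 0;	/* compiled out */
#endif
}

/* Hypothetical stand-in for the block guarded by CONFIG_HZ && CONFIG_SMP. */
static int guarded_by_hz_and_smp(void)
{
#if defined(CONFIG_HZ) && defined(CONFIG_SMP)
	return 1;	/* compiled in */
#else
	return 0;	/* compiled out */
#endif
}
```

Under those assumptions the second block is built while the first is not, which is the kind of mismatch a consistent guard name would avoid.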


> + if (prev == rq->idle && notick.load_balancer == -1) {
> + /*
> +  * simple selection for now: Nominate the first cpu in
> +  * the notick list to be the next ilb owner.
> +  *
> +  * TBD: Traverse the sched domains and nominate
> +  * the nearest cpu in the notick.cpu_mask.
> +  */
> + int ilb = first_cpu(notick.cpu_mask);
> +
> + if (ilb != NR_CPUS)
> + resched_cpu(ilb);
> + }
> +#endif
>   prev = context_switch(rq, prev, next);


-- Steve



Re: BUG on 2.6.20-rc1 when using gdb

2006-12-19 Thread Andrew Morton
On Sun, 17 Dec 2006 20:55:18 -0500
"Andrew J. Barr" <[EMAIL PROTECTED]> wrote:

> When I was using gdb to debug xchat-gnome, I got a kernel BUG and stack
> trace as the program was running (e.g. I had typed 'run' in gdb):
> 
> WARNING at kernel/softirq.c:137 local_bh_enable()
>  [] dump_trace+0x68/0x1d9
>  [] show_trace_log_lvl+0x18/0x2c
>  [] show_trace+0xf/0x11
>  [] dump_stack+0x12/0x14
>  [] local_bh_enable+0x44/0x94
>  [] unix_release_sock+0x6e/0x1fe
>  [] unix_stream_connect+0x3b4/0x3cf
>  [] sys_connect+0x82/0xad
>  [] sys_socketcall+0xac/0x261
>  [] syscall_call+0x7/0xb
>  [] 0xb7f70822
>  ===
> [ cut here ]
> kernel BUG at fs/buffer.c:1235!
> invalid opcode:  [#1]
> PREEMPT
> Modules linked in: binfmt_misc rfcomm l2cap i915 drm bluetooth nfs nfsd
> exportfs lockd nfs_acl sunrpc nvram uinput ipv6 ppdev lp button ac
> battery dm_crypt dm_snapshot dm_mirror dm_mod fuse cpufreq_conservative
> cpufreq_ondemand cpufreq_performance cpufreq_powersave
> speedstep_centrino freq_table ibm_acpi loop snd_intel8x0m snd_pcm_oss
> snd_mixer_oss snd_intel8x0 snd_ac97_codec pcmcia ac97_bus irtty_sir
> sir_dev ipw2200 snd_pcm snd_timer irda ieee80211 ieee80211_crypt
> crc_ccitt rtc parport_pc parport 8250_pnp snd soundcore 8250_pci 8250
> serial_core firmware_class i2c_i801 yenta_socket rsrc_nonstatic
> pcmcia_core snd_page_alloc i2c_core intel_agp agpgart evdev tsdev joydev
> ext3 jbd mbcache ide_cd cdrom ide_disk ide_generic e100 mii generic piix
> ide_core ehci_hcd uhci_hcd usbcore
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010046   (2.6.20-rc1 #1)
> EIP is at __find_get_block+0x1c/0x16f
> eax: 0086   ebx:    ecx:    edx: 0088a800
> esi: 0088a800   edi:    ebp: dfffd040   esp: cad2dd30
> ds: 007b   es: 007b   ss: 0068
> Process xchat-gnome (pid: 4322, ti=cad2c000 task=d0cd3ab0
> task.ti=cad2c000)
> Stack: cad2dd58 c02caa0b 0002 000e 000b 0001 e8836580
> 0088a800
>  e8836610  c01793dc 1000 c03ab3e0
> f3cadd80
>0086 c90d41b0 0088a800  dfffd040 8000 
> 0002
> Call Trace:
>  [] __getblk+0x23/0x268
>  [] ext3_getblk+0x10b/0x244 [ext3]
>  [] ext3_bread+0x19/0x70 [ext3]
>  [] dx_probe+0x43/0x2c9 [ext3]
>  [] ext3_htree_fill_tree+0x99/0x1ba [ext3]
>  [] ext3_readdir+0x1d4/0x5ed [ext3]
>  [] vfs_readdir+0x63/0x8d
>  [] sys_getdents64+0x63/0xa5
>  [] syscall_call+0x7/0xb
>  [] 0xb7f70822
>  ===
> Code: 8b 40 08 a8 08 74 05 e8 02 2f 11 00 5b 5e c3 55 89 c5 57 89 cf 56
> 89 d6 53 83 ec 20 9c 58 90 8d b4 26 00 00 00 00 f6 c4 02 75 04 <0f> 0b
> eb fe 89 e0 25 00 e0 ff ff ff 40 14 31 c9 8b 1c 8d a0 74
> EIP: [] __find_get_block+0x1c/0x16f SS:ESP 0068:cad2dd30
> 
> This happens on 2.6.20-rc1 but not 2.6.19.
> 

And it's repeatable, yes?

And you're sure that use of gdb triggers it?

Something is forgetting to reenable local interrupts.


Re: SATA DMA problem (sata_uli)

2006-12-19 Thread Jeff Garzik

Tejun Heo wrote:
> Jeff Garzik wrote:
>> Alan wrote:
>>>> I tracked it down to one of the drives being forced into PIO4 mode
>>>> rather than UDMA mode; dmesg bits:
>>>> ata4.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth 0/32)
>>>> ata4.00: ata4: dev 0 multi count 16
>>>> ata4.00: simplex DMA is claimed by other device, disabling DMA
>>>
>>> Your ULi controller is reporting that it supports UDMA upon only one
>>> channel at a time. The kernel is honouring this information. The older
>>> ULi (was ALi) PATA devices report simplex but let you turn it off so
>>> see if the following does the trick. Test carefully as always with
>>> disk driver changes.
>>>
>>> (Jeff probably best to check the docs before merging this but I believe
>>> it is sane)
>>>
>>> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
>>
>> My Uli SATA docs do not appear to cover the bmdma registers :(  Only the
>> PCI config registers.
>>
>> But regardless, I think the better fix is to never set ATA_HOST_SIMPLEX
>> if ATA_FLAG_NO_LEGACY is set.
>>
>> None of the SATA controllers I've ever encountered has been simplex.
>
> Just another data point.  The same problem is reported by bug #7590.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=7590
>
> Is somebody brewing a patch?

Not to my knowledge.  Did you just volunteer?  ;-)

/me runs...

Jeff





Re: SATA DMA problem (sata_uli)

2006-12-19 Thread Tejun Heo
Jeff Garzik wrote:
> Alan wrote:
>>> I tracked it down to one of the drives being forced into PIO4 mode
>>> rather than UDMA mode; dmesg bits:
>>> ata4.00: ATA-7, max UDMA/133, 586072368 sectors: LBA48 NCQ (depth 0/32)
>>> ata4.00: ata4: dev 0 multi count 16
>>> ata4.00: simplex DMA is claimed by other device, disabling DMA
>>
>> Your ULi controller is reporting that it supports UDMA upon only one
>> channel at a time. The kernel is honouring this information. The older
>> ULi (was ALi) PATA devices report simplex but let you turn it off so
>> see if the following does the trick. Test carefully as always with
>> disk driver
>> changes.
>>
>> (Jeff probably best to check the docs before merging this but I believe
>> it is sane)
>>
>> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> 
> My Uli SATA docs do not appear to cover the bmdma registers :(  Only the
> PCI config registers.
> 
> But regardless, I think the better fix is to never set ATA_HOST_SIMPLEX
> if ATA_FLAG_NO_LEGACY is set.
> 
> None of the SATA controllers I've ever encountered has been simplex.

Just another data point.  The same problem is reported by bug #7590.

http://bugzilla.kernel.org/show_bug.cgi?id=7590

Is somebody brewing a patch?
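
For reference, the rule Jeff suggests in the quoted text (never set
ATA_HOST_SIMPLEX when ATA_FLAG_NO_LEGACY is set) could be sketched roughly
as below. This is a user-space model only, not libata code: the SKETCH_*
bit values and the helper sketch_host_flags() are made up for illustration,
and the real definitions live in the libata headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_FLAG_NO_LEGACY  (1UL << 2)
#define SKETCH_HOST_SIMPLEX    (1UL << 0)

/*
 * Derive host flags from the port flags and from what the controller
 * claims about simplex operation.
 */
static unsigned long sketch_host_flags(unsigned long port_flags,
                                       bool hw_reports_simplex)
{
        unsigned long host_flags = 0;

        /* Trust the simplex report only on legacy (PATA-style) hosts. */
        if (hw_reports_simplex && !(port_flags & SKETCH_FLAG_NO_LEGACY))
                host_flags |= SKETCH_HOST_SIMPLEX;

        return host_flags;
}

/* Usage: a non-legacy host ignores the report, a legacy host keeps it. */
void sketch_simplex_demo(void)
{
        assert(sketch_host_flags(SKETCH_FLAG_NO_LEGACY, true) == 0);
        assert(sketch_host_flags(0, true) == SKETCH_HOST_SIMPLEX);
        assert(sketch_host_flags(0, false) == 0);
}
```

With a check of this shape, a drive behind a non-legacy controller would not
lose DMA because of a bogus simplex report.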

-- 
tejun


Re: Changes to sysfs PM layer break userspace

2006-12-19 Thread Matthew Garrett
On Tue, Dec 19, 2006 at 01:34:49PM -0800, David Brownell wrote:

> Documentation/feature-removal-schedule.txt has warned about this since
> August, and the PM list has discussed how broken that model is numerous
> times over the past several years.  (I'm pretty sure that discussion has
> leaked out to LKML on occasion.)  It shouldn't be news today.

1) feature-removal-schedule.txt says that it'll be removed in July 2007. 
This isn't July 2007.

2) The functionality was disabled in 2.6.19. The addition to 
feature-removal-schedule.txt was in, uh, 2.6.19.

3) "The whole _point_ of a kernel is to act as a abstraction layer and 
resource management between user programs and hardware/outside world. 
That's why kernels _exist_. Breaking user-land API's is thus by 
definition something totally idiotic.

If you need to break something, you create a new interface, and try to 
translate between the two, and maybe you deprecate the old one so that 
it can be removed once it's not in use any more. If you can't see that 
this is how a kernel should work, you're missing the point of having a 
kernel in the first place."

Linus, http://lkml.org/lkml/2006/10/4/327

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds


On Wed, 20 Dec 2006, Peter Zijlstra wrote:
> On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:
> > OR:
> > 
> >  - page_mkclean_one() is simply buggy.
> 
> GOLD!

Ok. I was looking at that, and I wondered..

However, if that works, then I _think_ the correct sequence is the 
following..

The rule should be:
 - we flush the tlb _after_ we have cleared it, but _before_ we insert the 
   new entry.

But I dunno. These things are damn subtle. Does this patch fix it for you?

I actually suspect we should do this as an arch-specific macro, and 
totally replace the current "ptep_clear_flush_dirty()" with one that does 
"ptep_clear_flush_dirty_and_set_wp()".

Because what I'd _really_ prefer to do on x86 (and probably on most other 
sane architectures) is to do

 - atomically replace the pte with the EXACT SAME ONE, but one that 
   has the writable bit clear.

bit_clear(_PAGE_BIT_RW, &(ptep)->pte_low);

 - flush the TLB, making sure that all CPU's will no longer write to it:

flush_tlb_page(vma, address);

 - finally, just fetch-and-clear the dirty bit (and since it's no longer 
   writable, nobody should be setting it any more)

ret = bit_clear(_PAGE_BIT_DIRTY, &(ptep)->pte_low);

and now we should be all done.

But the "ptep_get_and_clear() + flush_tlb_page()" sequence should 
hopefully also work.
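
As a user-space illustration of the three steps listed above (write-protect,
flush, then fetch-and-clear dirty), here is a minimal sketch using C11
atomics. It is not kernel code: the toy_* names are hypothetical, the bit
positions are an assumption chosen to resemble x86, and the TLB flush is
only a counter stub.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit positions for a toy page-table entry. */
#define TOY_PTE_RW     (1UL << 1)
#define TOY_PTE_DIRTY  (1UL << 6)

typedef _Atomic unsigned long toy_pte_t;

/* Stand-in for flush_tlb_page(); a real flush is architecture-specific. */
static unsigned int toy_tlb_flushes;

static void toy_flush_tlb_page(void)
{
        toy_tlb_flushes++;
}

/*
 * Write-protect first, flush, then fetch-and-clear the dirty bit.
 * Returns true if the entry was dirty.
 */
bool toy_clear_dirty_and_wrprotect(toy_pte_t *ptep)
{
        /* 1. Atomically drop the writable bit; all other bits are kept. */
        atomic_fetch_and(ptep, ~TOY_PTE_RW);

        /* 2. Make sure no CPU keeps a stale writable translation. */
        toy_flush_tlb_page();

        /* 3. Nobody can set the dirty bit any more: collect and clear it. */
        return (atomic_fetch_and(ptep, ~TOY_PTE_DIRTY) & TOY_PTE_DIRTY) != 0;
}

/* Usage: a dirty, writable entry ends up clean and read-only. */
void toy_pte_demo(void)
{
        toy_pte_t pte = TOY_PTE_RW | TOY_PTE_DIRTY | 1UL;

        assert(toy_clear_dirty_and_wrprotect(&pte));
        assert(atomic_load(&pte) == 1UL);
        assert(!toy_clear_dirty_and_wrprotect(&pte));
}
```

The point of the ordering is that the dirty bit is only sampled after the
entry has stopped being writable everywhere, so a late write cannot set it
behind our back.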

Pls test.

Linus


diff --git a/mm/rmap.c b/mm/rmap.c
index d8a842a..eec8706 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -448,9 +448,10 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
goto unlock;
 
entry = ptep_get_and_clear(mm, address, pte);
+   flush_tlb_page(vma, address);
entry = pte_mkclean(entry);
entry = pte_wrprotect(entry);
-   ptep_establish(vma, address, pte, entry);
+   set_pte_at(mm, address, pte, entry);
lazy_mmu_prot_update(entry);
ret = 1;
 



Re: Bug 7596 - Potential performance bottleneck for Linxu TCP

2006-12-19 Thread Herbert Xu
Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> I noticed this bit of discussion in tcp_recvmsg. It implies that a better
> queuing policy would be good. But it is confusing English (Alexey?) so
> not sure where to start.

Actually I think the comment says that the current code isn't the
most elegant but is more efficient.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: 2.6.18 mmap hangs unrelated apps

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 19:17:43 -0500
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > (We were supposed to stop doing that about four years ago - change it so
> > that all a_ops must implement ->releasepage, but nobody got around to it).
> 
> Would you still be interested in seeing this done?

Sure, when things calm down.  It's just a cleanup.

There are various places where we got lazy and did this.  ->set_page_dirty,
->page_mkwrite, many others.  With varying degrees of consequential ugliness.


Re: GPL only modules

2006-12-19 Thread Alexandre Oliva
On Dec 18, 2006, "David Schwartz" <[EMAIL PROTECTED]> wrote:

> It makes no difference whether the "mere aggregation" paragraph kicks in
> because the "mere aggregation" paragraph is *explaining* the *law*. What
> matters is what the law actually *says*.

You mean "mere aggregation" is defined in copyright law?  I don't
think so, otherwise the term 'aggregate' probably wouldn't have
been used in GPLv3.

AFAIK it's perfectly legitimate (even if immoral) for a copyright
license to prohibit the distribution of the software governed by the
license with anything else the author establishes.  E.g., some Java
virtual machine's license used to establish that you couldn't ship it
along with other implementations of Java that didn't pass some
conformance test.

Now, the GPL doesn't do this.  It doesn't say you can't distribute
GPLed software along with any other software.  It only says that, when
you distribute together works that don't constitute mere aggregation
(providing its own definition of mere aggregation), then the whole
must be licensed under the GPL.

> The GPL could say that if you ever see the source code to a GPL'd work,
> every work you ever write must be placed under the GPL. But that wouldn't
> make it true, because that would be a requirement outside the GPL's scope.

It is indeed possible that this would fall outside the scope of
copyright law in the US, and it would not be morally acceptable for
the GPL to impose such a condition.  But then, since nobody can be
forced to see the source code of a GPLed work, or any work for that
matter, acceptance is voluntary, and one shouldn't enter an agreement
one's not willing to abide by.

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}


Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 16:03:49 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Wed, 20 Dec 2006, Peter Zijlstra wrote:
> 
> > On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote:
> > 
> > > Well... we'd need to see (corruption && this-not-triggering) to be sure.
> > > 
> > > Peter, have you been able to trigger the corruption?
> > 
> > Yes; however the mail I sent describing that seems to be lost in space.
> 
> Btw, can somebody actually explain the mess that is ext3 "dirtying".
> 
> Ext3 does NOT use __set_page_dirty_buffers. It does
> 
>   static int ext3_journalled_set_page_dirty(struct page *page)
>   {
>   SetPageChecked(page);
>   return __set_page_dirty_nobuffers(page);
>   }
> 
> and uses that "Checked" bit as a "whole page is dirty" bit (which it tests 
> in "writepage()").

This is purely for data=journal, which is rarely used.

In journalled-data mode, write(), write-fault, etc are not allowed to dirty
the pages and buffers, because the data has to be written to the journal
first.  After the data has been written to the journal we only then mark
buffers (and hence pages) dirty as far as the VFS is concerned.  For
checkpointing the data back to its real place on the disk.


For MAP_SHARED pages ext3 cheats madly and doesn't journal the data at all.
In all journalling modes, MAP_SHARED data follows the regular ext2-style
handling.  Which is a bit of a wart.



Re: 2.6.18 mmap hangs unrelated apps

2006-12-19 Thread Trond Myklebust
On Tue, 2006-12-19 at 16:03 -0800, Andrew Morton wrote:
> On Tue, 19 Dec 2006 18:19:38 -0500
> Trond Myklebust <[EMAIL PROTECTED]> wrote:
> 
> > NFS: Fix race in nfs_release_page()
> > 
> > invalidate_inode_pages2() may set the dirty bit on a page owing to the call
> > to unmap_mapping_range() after the page was locked. In order to fix this,
> > NFS has hooked the releasepage() method. This, however leads to deadlocks
> > in other parts of the VM.
> 
> hmm, subtle.
> 
> > Fix is to add a new callback: flushpage(), which will write out a dirty
> > page that is under the page lock.
> > 
> 
> I guess this might permit us to clean up some of the nasties in
> invalidate_inode_pages2() - if the page comes dirty again, write it again. 
> But the requirement that the page remain locked makes it hard.  Need to
> think about it some more.

This was one of the reasons why I had to introduce
nfs_writepage_locked() for 2.6.20 (the other reason being readpage()).

The problem is that you can only protect against redirtying of the page
by holding the page lock across the call to unmap_mapping_range(), the
page writeout and the page removal.

> Are you sure this is the cause of the NFS problem?
> 
> > .prepare_write = nfs_prepare_write,
> > .commit_write = nfs_commit_write,
> > .invalidatepage = nfs_invalidate_page,
> > -   .releasepage = nfs_release_page,
> 
> A NULL ->releasepage means that try_to_release_page() will call
> try_to_free_buffers() if PagePrivate().  I suspect you'll need a stub to
> prevent this.

Ack, I'll add one in. If PagePrivate() is set during the call to
try_to_release_page(), then the page should never be freeable.
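
To make the NULL-hook fallback and the proposed stub concrete, here is a
small user-space model. It is only a sketch: every toy_* name is
hypothetical, has_private stands in for PagePrivate(), and returning "may
be freed" when nothing private is attached is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
        bool has_private;   /* stands in for PagePrivate() */
};

struct toy_aops {
        /* Optional hook, may be NULL. Returns true if the page may be freed. */
        bool (*releasepage)(struct toy_page *page);
};

/* Generic fallback, only meaningful for buffer-head based filesystems. */
static int toy_fallback_calls;

static bool toy_try_to_free_buffers(struct toy_page *page)
{
        (void)page;
        toy_fallback_calls++;
        return true;
}

bool toy_try_to_release_page(const struct toy_aops *aops, struct toy_page *page)
{
        if (!page->has_private)
                return true;    /* assumption: nothing private attached */
        if (aops->releasepage)
                return aops->releasepage(page);
        return toy_try_to_free_buffers(page);   /* the NULL-hook case */
}

/* Stub: private data present means the page is never freeable here. */
bool toy_stub_releasepage(struct toy_page *page)
{
        (void)page;
        return false;
}

/* Usage: without a hook the fallback runs, with the stub it does not. */
void toy_release_demo(void)
{
        struct toy_page page = { true };
        struct toy_aops no_hook = { NULL };
        struct toy_aops with_stub = { toy_stub_releasepage };

        assert(toy_try_to_release_page(&no_hook, &page));
        assert(!toy_try_to_release_page(&with_stub, &page));
}
```

The stub exists only to keep the generic buffer code away from private data
it does not understand.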

> (We were supposed to stop doing that about four years ago - change it so
> that all a_ops must implement ->releasepage, but nobody got around to it).

Would you still be interested in seeing this done?




Re: Changes to PM layer break userspace

2006-12-19 Thread Matthew Garrett
On Tue, Dec 19, 2006 at 03:36:28PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> > The fact that something is scheduled to be removed in July 2007 does 
> > *not* mean it's acceptable to break it in 2006. We need to find a way to 
> > fix this functionality in the meantime.
> 
> The disconnect here is analogous to:  I tell you the alleged perpetual
> motion machine never worked, and can't ever work; and you push back and
> say that you need a perpetual motion machine that works, NOW please,
> because you need something that pushes those widgets around.  (There are
> better ways to push widgets than side effects of a broken machine...)

But it *did* work. Userspace could ask the device to suspend, and (in 
general) that would result in the device going into a lower power state. 
You've broken that without providing an alternative.

> Given that your examples are network adapters, you should really look
> more at why "ifdown eth0" (etc) having drivers put the device into a
> low power state (like PCI D3hot, or maybe D2) wouldn't work in any
> particular case.  If you actually have such cases, then maybe those
> specific drivers need to drive new power management interfaces.

We seem to be arguing at cross purposes here. I've absolutely no 
objection to this approach in the long run, just as I've got no 
objection to flying cars or food pills or moon pods. When these things 
exist, the world will indeed be a glorious place. But that doesn't 
justify me slashing your tyres, poisoning your crops or setting fire to 
whatever the real-world analogue of a moon pod is. I had something that 
worked. Now I don't, but instead have the promise that at some point 
I'll have something better. Understand why I'm a touch irritated?

> That's a workable approach to resolving the underlying problem in the
> long term.  In the short term, notice the system still works correctly
> if you don't try writing those files.

Well, except I'm now burning an extra couple of watts of power. I 
consider that pretty broken.

> I'd not be keen on reverting Linus' patch [1] myself, even though few
> drivers have started to use that mechanism yet; that would be a step
> backwards, and would perpetuate users of that broken sysfs file.

I'm sorry, which bit of "Don't break userspace API without adequate 
prior warning and with a workable replacement" is difficult to 
understand?

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: GPL only modules

2006-12-19 Thread Alexandre Oliva
On Dec 18, 2006, "David Schwartz" <[EMAIL PROTECTED]> wrote:

> I don't see why you can't distribute a single DVD that combines the contents
> of the two you bought, so long as you destroy the originals.

Because, for example, per Brazilian law since 1998, fair use only
grants you the right to copy small portions of copyrighted works for
personal use.   http://www.petitiononline.com/netlivre

Remember that the GPL is not only about US copyright law or US
courts.

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}


Re: sata badness in 2.6.20-rc1? [Was: Re: md patches in -mm]

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 15:26:00 -0800 (PST)
Luben Tuikov <[EMAIL PROTECTED]> wrote:

> The reason was that my dev tree was tainted by this bug:
> 
> if (good_bytes &&
> -   scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
> +   scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
> return;
> 
> in scsi_io_completion().  I had there !!result which is wrong, and when
> I diffed against master, it produced a bad patch.
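
For anyone skimming the one-line diff above: the two expressions are exact
logical opposites, which is why swapping them inverts the flag passed as the
last argument. A tiny stand-alone sketch (the helper names are made up for
illustration):

```c
#include <assert.h>

/* "!!result" normalises any non-zero value to 1 and keeps 0 as 0. */
int flag_from_double_negation(int result)
{
        return !!result;
}

/* "result == 0" is 1 only when result is zero. */
int flag_from_zero_test(int result)
{
        return result == 0;
}

/* Usage: for every input the two helpers disagree. */
void flag_demo(void)
{
        assert(flag_from_double_negation(0) == 0 && flag_from_zero_test(0) == 1);
        assert(flag_from_double_negation(5) == 1 && flag_from_zero_test(5) == 0);
        assert(flag_from_double_negation(-1) == 1 && flag_from_zero_test(-1) == 0);
}
```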

Oh.  I thought that got sorted out.  It's a shame this wasn't made clear to
me..

> As James mentioned one of the chunks is good and can go in.

Please send a new patch, not referential to any previous patch or email,
including full changelogging.



Re: GPL only modules

2006-12-19 Thread Alexandre Oliva
On Dec 18, 2006, "David Schwartz" <[EMAIL PROTECTED]> wrote:

> No automated, mechanical process can create a derivative work of software.
> (With a few exceptions not relevant here.)

Can you explain what mechanisms are involved in copyright monopolies
over object code, then?
(there's a hint at http://www.fsfla.org/?q=en/node/128#1 )

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}


Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds


On Wed, 20 Dec 2006, Peter Zijlstra wrote:

> On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote:
> 
> > Well... we'd need to see (corruption && this-not-triggering) to be sure.
> > 
> > Peter, have you been able to trigger the corruption?
> 
> Yes; however the mail I sent describing that seems to be lost in space.

Btw, can somebody actually explain the mess that is ext3 "dirtying".

Ext3 does NOT use __set_page_dirty_buffers. It does

static int ext3_journalled_set_page_dirty(struct page *page)
{
SetPageChecked(page);
return __set_page_dirty_nobuffers(page);
}

and uses that "Checked" bit as a "whole page is dirty" bit (which it tests 
in "writepage()").

You realize what this all means? It means that ANYTHING that actually 
clears the _real_ dirty bit won't actually be doing anything at all for 
ext3, because the Checked bit will still stay set, and any IO down the 
line on that page would totally ignore the dirty bits on the buffer heads 
and just write out everything.

That is "The Mess(tm)".

It also basically means that anything that clears the dirty bit without 
just calling "writepage()" had _better_ call "invalidatepage()" for the 
whole page, because otherwise the PageChecked bit will never be cleared as 
far as I can see. Happily, at least ext3 seems to _test_ for that case in 
the release_page() function, so it appears that we do do this.

But this seems to just strengthen my argument: you can NEVER clean a page, 
unless you (a) do IO on it immediately afterwards (writeback) or (b) 
invalidate it entirely (truncate).

I'd really like to see just those two functions exist. Preferably in a 
form where you can see easily that we actually follow those rules. Rather 
than having a confusing set of "clear_page_dirty()" and
"test_and_clear_page_dirty()" functions that are called from random 
places.

IOW, I think the "clear_page_dirty_for_io()" is fine (it's case (a)) 
above, and then we should probably have a "cancel_dirty_page()" function 
that does all the current clear_page_dirty() but also makes sure that we 
actually call the invalidate_page() function itself. 
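
A toy user-space model of the two cases, to show why clearing only the real
dirty bit leaves the Checked marker behind. All toy_* names are
hypothetical, and the model assumes (as described above) that writepage and
invalidatepage are the only things that clear the marker.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
        bool dirty;     /* the "real" dirty bit */
        bool checked;   /* "whole page is dirty" marker */
};

/* Same shape as the ext3_journalled_set_page_dirty() quoted above. */
void toy_set_page_dirty(struct toy_page *page)
{
        page->checked = true;
        page->dirty = true;
}

/* Toy invalidate hook: drops the filesystem-private marker. */
static void toy_invalidatepage(struct toy_page *page)
{
        page->checked = false;
}

/* Toy writepage: consumes the marker that says "write everything". */
void toy_writepage(struct toy_page *page)
{
        page->checked = false;
}

/* Case (a): clean the page because IO is about to start on it. */
bool toy_clear_page_dirty_for_io(struct toy_page *page)
{
        bool was_dirty = page->dirty;

        page->dirty = false;
        return was_dirty;       /* caller is expected to call writepage */
}

/* Case (b): the page is being thrown away, so tell the filesystem too. */
void toy_cancel_dirty_page(struct toy_page *page)
{
        page->dirty = false;
        toy_invalidatepage(page);
}

void toy_dirty_demo(void)
{
        struct toy_page page = { false, false };

        toy_set_page_dirty(&page);
        page.dirty = false;             /* a bare "clear the dirty bit" */
        assert(page.checked);           /* the marker is still set */

        toy_set_page_dirty(&page);
        toy_cancel_dirty_page(&page);
        assert(!page.dirty && !page.checked);
}
```

With only these two entry points, every path that cleans a page is either
followed by IO or visibly invalidates the filesystem's own state.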

Hmm?

Linus


Re: 2.6.18 mmap hangs unrelated apps

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 18:19:38 -0500
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> NFS: Fix race in nfs_release_page()
> 
> invalidate_inode_pages2() may set the dirty bit on a page owing to the call
> to unmap_mapping_range() after the page was locked. In order to fix this,
> NFS has hooked the releasepage() method. This, however leads to deadlocks
> in other parts of the VM.

hmm, subtle.

> Fix is to add a new callback: flushpage(), which will write out a dirty
> page that is under the page lock.
> 

I guess this might permit us to clean up some of the nasties in
invalidate_inode_pages2() - if the page comes dirty again, write it again. 
But the requirement that the page remain locked makes it hard.  Need to
think about it some more.

Are you sure this is the cause of the NFS problem?

>   .prepare_write = nfs_prepare_write,
>   .commit_write = nfs_commit_write,
>   .invalidatepage = nfs_invalidate_page,
> - .releasepage = nfs_release_page,

A NULL ->releasepage means that try_to_release_page() will call
try_to_free_buffers() if PagePrivate().  I suspect you'll need a stub to
prevent this.

(We were supposed to stop doing that about four years ago - change it so
that all a_ops must implement ->releasepage, but nobody got around to it).



Re: 2.6.19.1, sata_sil: sata dvd writer doesn't work

2006-12-19 Thread Tejun Heo
* dmesg is truncated, please post the content of file /var/log/boot.msg.

* Please post the result of 'lspci -nnvvv'

* Please try the attached patch and see if it makes any difference and
post the result of 'dmesg' after trying to play a problematic dvd.

-- 
tejun
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 02b2b27..bbbec75 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1433,16 +1433,47 @@ static void ata_eh_report(struct ata_port *ap)
}
 
for (tag = 0; tag < ATA_MAX_QUEUE; tag++) {
+   static const char *dma_str[] = {
+   [DMA_BIDIRECTIONAL] = "bidi",
+   [DMA_TO_DEVICE] = "out",
+   [DMA_FROM_DEVICE]   = "in",
+   [DMA_NONE]  = "",
+   };
struct ata_queued_cmd *qc = __ata_qc_from_tag(ap, tag);
+   struct ata_taskfile *cmd = &qc->tf, *res = &qc->result_tf;
+   const u8 *c = qc->cdb;
+   unsigned int nbytes;
 
if (!(qc->flags & ATA_QCFLAG_FAILED) || !qc->err_mask)
continue;
 
-   ata_dev_printk(qc->dev, KERN_ERR, "tag %d cmd 0x%x "
-  "Emask 0x%x stat 0x%x err 0x%x (%s)\n",
-  qc->tag, qc->tf.command, qc->err_mask,
-  qc->result_tf.command, qc->result_tf.feature,
-  ata_err_string(qc->err_mask));
+   nbytes = qc->nbytes;
+   if (!nbytes)
+   nbytes = qc->nsect << 9;
+
+   ata_dev_printk(qc->dev, KERN_ERR,
+   "cmd %02x/%02x:%02x:%02x:%02x:%02x/%02x:%02x:%02x:%02x:%02x/%02x "
+   "tag %d cdb 0x%x data %u %s\n "
+   "res %02x/%02x:%02x:%02x:%02x:%02x/%02x:%02x:%02x:%02x:%02x/%02x "
+   "Emask 0x%x (%s)\n",
+   cmd->command, cmd->feature, cmd->nsect,
+   cmd->lbal, cmd->lbam, cmd->lbah,
+   cmd->hob_feature, cmd->hob_nsect,
+   cmd->hob_lbal, cmd->hob_lbam, cmd->hob_lbah,
+   cmd->device, qc->tag, qc->cdb[0], nbytes,
+   dma_str[qc->dma_dir],
+   res->command, res->feature, res->nsect,
+   res->lbal, res->lbam, res->lbah,
+   res->hob_feature, res->hob_nsect,
+   res->hob_lbal, res->hob_lbam, res->hob_lbah,
+   res->device, qc->err_mask, ata_err_string(qc->err_mask));
+
+   ata_dev_printk(qc->dev, KERN_ERR,
+  "CDB: %02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x "
+  "%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x p=%d\n",
+  c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7],
+  c[8], c[9], c[10], c[11], c[12], c[13], c[14], c[15],
+  cmd->protocol);
}
 }
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3ac4890..f018e49 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -191,6 +191,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
goto out;
 
req->cmd_len = COMMAND_SIZE(cmd[0]);
+   memset(req->cmd, 0, BLK_MAX_CDB); /* ATAPI hates garbage after CDB */
memcpy(req->cmd, cmd, req->cmd_len);
req->sense = sense;
req->sense_len = 0;
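
The idea behind the scsi_lib.c hunk above (zero the whole fixed-size command
buffer before copying a shorter command into it) can be shown in plain
user-space C. This is a sketch only: toy_fill_cdb() and TOY_MAX_CDB are
hypothetical names, and the buffer size of 16 is an assumption.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_CDB 16   /* assumed size of the fixed command buffer */

/* Copy a cmd_len-byte command into a fixed buffer, zero-padding the tail. */
void toy_fill_cdb(unsigned char dst[TOY_MAX_CDB],
                  const unsigned char *cmd, size_t cmd_len)
{
        assert(cmd_len <= TOY_MAX_CDB);
        memset(dst, 0, TOY_MAX_CDB);    /* no stale bytes after the command */
        memcpy(dst, cmd, cmd_len);
}
```

Without the memset, whatever a previous request left in the tail of the
buffer would be sent along with a short command.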


Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:

> OR:
> 
>  - page_mkclean_one() is simply buggy.

GOLD!

it seems to work with all this (full diff against current git).

/me rebuilds full kernel to make sure...
reboot...
test...  pff the tension...
yay, still good!

Andrei; would you please verify.

The magic seems to be in the extra tlb flush after clearing the dirty
bit. Just too bad ptep_clear_flush_dirty() needs ptep not entry.

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 5e7cd45..2b8893b 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -135,8 +135,7 @@ static int cn_call_callback(struct cn_msg *msg, void 
(*destruct_data)(void *), v
spin_lock_bh(&dev->cbdev->queue_lock);
list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
if (cn_cb_equal(&__cbq->id.id, &msg->id)) {
-   if (likely(!test_bit(WORK_STRUCT_PENDING,
-&__cbq->work.work.management) &&
+   if (likely(!delayed_work_pending(&__cbq->work) &&
__cbq->data.ddata == NULL)) {
__cbq->data.callback_priv = msg;
 
diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/mm/memory.c b/mm/memory.c
index c00bac6..60e0945 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+   pgoff_t index;
+   unsigned int offset;
+   struct page *page;
+
+   if (!mapping)
+   return;
+   offset = size & ~PAGE_MASK;
+   if (!offset)
+   return;
+   index = size >> PAGE_SHIFT;
+   page = find_lock_page(mapping, index);
+   if (page) {
+   unsigned int check = 0;
+   unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+   do {
+   check += kaddr[offset++];
+   } while (offset < PAGE_SIZE);
+   kunmap_atomic(kaddr, KM_USER0);
+   unlock_page(page);
+   page_cache_release(page);
+   if (check)
+   printk(KERN_ERR "%s: BADNESS: truncate check %u\n", current->comm, check);
+   }
+}
+
 /**
  * vmtruncate - unmap mappings "freed" by truncate() syscall
  * @inode: inode of the file used
@@ -1875,6 +1902,7 @@ do_expand:
goto out_sig;
if (offset > inode->i_sb->s_maxbytes)
goto out_big;
+   check_last_page(mapping, inode->i_size);
i_size_write(inode, offset);
 
 out_truncate:
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 237107c..f561e72 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -957,7 +957,7 @@ int test_set_page_writeback(struct page *page)
 EXPORT_SYMBOL(test_set_page_writeback);
 
 /*
- * Return true if any of the pages in the mapping are marged with the
+ * Return true if any of the pages in the mapping are marked with the
  * passed tag.
  */
 int mapping_tagged(struct address_space *mapping, int tag)
diff --git a/mm/rmap.c b/mm/rmap.c
index d8a842a..900229a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -432,7 +432,7 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
 {
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
-   pte_t *pte, 

ok, maybe i misread that whole "kmem_cache_alloc()" thing

2006-12-19 Thread Robert P. J. Day

  all right, i may have misread what's going on with
kmem_cache_alloc() and kmem_cache_zalloc(), and my earlier submission
may be entirely nonsense, since it involved transformations like this:

 * it with privilege level 3 because the IVE uses non-privileged accesses to these
 * tables.  IA-32 segmentation is used to protect against IA-32 accesses to them.
 */
-   vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+   vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
if (vma) {
-   memset(vma, 0, sizeof(*vma));
vma->vm_mm = current->mm;
vma->vm_start = IA32_GDT_OFFSET;
vma->vm_end = vma->vm_start + PAGE_SIZE;


can someone briefly tell me if what i did makes sense?
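
A user-space analogy of the transformation, for comparison: malloc() plus
memset() against calloc(). This is only an analogy with hypothetical toy_*
names and says nothing about the slab allocator itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_vma {
        unsigned long vm_start;
        unsigned long vm_end;
};

/* Before: allocate, then clear by hand. */
struct toy_vma *toy_alloc_then_clear(void)
{
        struct toy_vma *vma = malloc(sizeof(*vma));

        if (vma)
                memset(vma, 0, sizeof(*vma));
        return vma;
}

/* After: one call that hands back zeroed memory. */
struct toy_vma *toy_zalloc(void)
{
        return calloc(1, sizeof(struct toy_vma));
}

/* Usage: both paths yield byte-for-byte identical objects. */
void toy_zalloc_demo(void)
{
        struct toy_vma *a = toy_alloc_then_clear();
        struct toy_vma *b = toy_zalloc();

        if (a && b)
                assert(memcmp(a, b, sizeof(*a)) == 0);
        free(a);
        free(b);
}
```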

rday


Re: Changes to PM layer break userspace

2006-12-19 Thread David Brownell
On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 01:22:12PM -0800, David Brownell wrote:

> > As a generic mechanism, that interface has *ALWAYS* been "broken
> > by design"; I'd call it unfixable.  It's deprecated, and scheduled
> > to vanish; see Documentation/feature-removal-schedule.txt ...
> 
> The fact that something is scheduled to be removed in July 2007 does 
> *not* mean it's acceptable to break it in 2006. We need to find a way to 
> fix this functionality in the meantime.

The disconnect here is analogous to:  I tell you the alleged perpetual
motion machine never worked, and can't ever work; and you push back and
say that you need a perpetual motion machine that works, NOW please,
because you need something that pushes those widgets around.  (There are
better ways to push widgets than side effects of a broken machine...)


Given that your examples are network adapters, you should really look
more at why "ifdown eth0" (etc) having drivers put the device into a
low power state (like PCI D3hot, or maybe D2) wouldn't work in any
particular case.  If you actually have such cases, then maybe those
specific drivers need to drive new power management interfaces.

That's a workable approach to resolving the underlying problem in the
long term.  In the short term, notice the system still works correctly
if you don't try writing those files.

I'd not be keen on reverting Linus' patch [1] myself, even though few
drivers have started to use that mechanism yet; that would be a step
backwards, and would perpetuate users of that broken sysfs file.

- Dave

[1] cbd69dbbf1adfce6e048f15afc8629901ca9dae5



Re: 2.6.20-rc1-mm1

2006-12-19 Thread Luben Tuikov
--- Damien Wyart <[EMAIL PROTECTED]> wrote:
> > > > The reiser4 failure is unexpected. Could you please see if you can
> > > > capture a trace, let the people at [EMAIL PROTECTED] know?
> 
> > > Ok, I've handwritten the messages, here they are :
> 
> > > reiser4 panicked cowardly : reiser4[umount(2451)] : commit_current_atom 
> > > (fs/reiser4/txmngr.c:1087) (zam-597)
> > > write log failed (-5)
> 
> > > [ got 2 copies of them because I have 2 reiser4 fs ]
> 
> > > I got them mainly when I try to reboot or halt the machine, and the
> > > process doesn't finish, the computer gets stuck after the reiser4
> > > messages. This is only with 2.6.20-mm1, not 2.6.19-rc6-mm2.
> 
> * Laurent Riffard <[EMAIL PROTECTED]> [2006-12-18 09:03]:
> > fix-sense-key-medium-error-processing-and-retry.patch seems to be the
> > culprit.
> 
> > Reverting it fix those reiser4 panics for me. Damien, could you confirm 
> > please ?
> 
> Yes, this fixes it too on my side. Thanks for this tracking !

I had a bug in my dev tree which got picked up by the patch
when I diffed against master:

-   scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
+   scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
return;

As James explained, the other chunk of the patch is still good.

Luben



Re: [PATCH 2.6.20-git] sata_svw: Check for errors from ata_device_add()

2006-12-19 Thread Ben Collins
On Tue, 2006-12-19 at 17:59 -0500, Ben Collins wrote:
> Without this patch, G5 oopses on boot. I've had this in Ubuntu since
> 2.6.17, but I forgot it was in there. Still required with 2.6.20.
> 
> Signed-off-by: Ben Collins <[EMAIL PROTECTED]>

Ignore this patch for now; BenH and I are discussing the issue further.


Re: sata badness in 2.6.20-rc1? [Was: Re: md patches in -mm]

2006-12-19 Thread Luben Tuikov
--- [EMAIL PROTECTED] wrote:
> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Sun, Dec 17, 2006 at 03:05:39AM -0800
> > On Sun, 17 Dec 2006 12:00:12 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > 
> > > Okay, I have identified the patch that causes the problem to appear, 
> > > which is
> > > 
> > > fix-sense-key-medium-error-processing-and-retry.patch
> > > 
> > > With this patch reverted -rc1-mm1 is happily running on my test box.
> > 
> > That was rather unexpected.   Thanks.
> >
> I can confirm that 2.6.20-rc1-mm1 with this patch reverted mounts my
> raid6 partition without problems. This is x86_64 with SMP.
> 

The reason was that my dev tree was tainted by this bug:

if (good_bytes &&
-   scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
+   scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
return;

in scsi_io_completion().  I had there !!result which is wrong, and when
I diffed against master, it produced a bad patch.

As James mentioned one of the chunks is good and can go in.
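
To make the difference concrete: the two expressions are exact logical opposites, so the flag passed as the fourth argument was inverted. A minimal sketch, assuming the usual convention that a result of 0 means the command succeeded; the helper names are hypothetical, and only the two expressions come from the diff above:

```c
/*
 * Hypothetical helpers for illustration only; they are not part of
 * scsi_io_completion(). Each returns the value that would be passed
 * as the fourth argument of scsi_end_request().
 */

/* Wrong form: yields 1 when result is non-zero, i.e. on failure. */
int flag_buggy(int result)
{
	return !!result;
}

/* Correct form: yields 1 only when result is 0, i.e. on success. */
int flag_fixed(int result)
{
	return result == 0;
}
```

For any value of result the two helpers disagree, which would explain why commands that completed cleanly were handled as if they had not.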

Luben


