Re: [PATCH -mm] x86_64 UP needs smp_call_function_single

2006-11-29 Thread Andrew Morton
On Thu, 30 Nov 2006 08:00:00 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> On Wed, 2006-11-29 at 17:45 -0800, Andrew Morton wrote:
> > No, I think this patch is right - the declaration of the CONFIG_SMP
> > smp_call_function_single() is in linux/smp.h so the !CONFIG_SMP
> > declaration
> > or definition should be there too.
> > 
> > It's still buggy though.  It should disable local interrupts around
> > the
> > call to match the SMP version.  I'll fix that separately. 
> 
> hm, didnt i send an updated patch for that already? See the patch below,
> from many days ago. I sent it after the tsc-sync-rewrite patch.
> 

Might have got lost.

> --->
> Subject: x86_64: build fixes
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> x86_64 does not build cleanly on UP:
> 
> arch/x86_64/kernel/vsyscall.c: In function 'cpu_vsyscall_notifier':
> arch/x86_64/kernel/vsyscall.c:282: warning: implicit declaration of
> function 'smp_call_function_single'
> arch/x86_64/kernel/vsyscall.c: At top level:
> arch/x86_64/kernel/vsyscall.c:279: warning: 'cpu_vsyscall_notifier'
> defined but not used
> 
> this patch fixes it by making smp_call_function_single() globally
> available.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  include/asm-x86_64/smp.h |   11 ++-
>  include/linux/smp.h  |   10 +++---
>  kernel/sched.c   |   19 +++
>  3 files changed, 28 insertions(+), 12 deletions(-)
> 
> Index: linux/include/asm-x86_64/smp.h
> ===
> --- linux.orig/include/asm-x86_64/smp.h
> +++ linux/include/asm-x86_64/smp.h
> @@ -115,16 +115,9 @@ static __inline int logical_smp_processo
>  }
>  
>  #ifdef CONFIG_SMP
> -#define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu]
> +# define cpu_physical_id(cpu)		x86_cpu_to_apicid[cpu]
>  #else
> -#define cpu_physical_id(cpu) boot_cpu_id
> -static inline int smp_call_function_single(int cpuid, void (*func)
> (void *info),

congratulations-your-first-wordwrapped-patch ;)

> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -1110,6 +1110,25 @@ repeat:
>   task_rq_unlock(rq, &flags);
>  }
>  
> +#ifndef CONFIG_SMP
> +/*
> + * Call a function on a specific CPU (on UP the function gets executed
> + * on the current CPU, immediately):
> + */
> +int smp_call_function_single(int cpuid, void (*func) (void *info), void
> *info,
> +  int retry, int wait)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + func(info);
> + local_irq_restore(flags);
> +
> + return 0;
> +}

yes, but a) calling the SMP version with local interrupts disabled is a
bug, so we can use bare local_irq_disable() here and b) only two
architectures call or use this function, so all the others don't want a copy
of it.

So I did:

--- a/include/linux/smp.h~up-smp_call_function_single-should-disable-interrupts
+++ a/include/linux/smp.h
@@ -15,6 +15,7 @@ extern void cpu_idle(void);
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -102,8 +103,9 @@ static inline void smp_send_reschedule(i
 static inline int smp_call_function_single(int cpuid, void (*func) (void 
*info),
void *info, int retry, int wait)
 {
-   /* Disable interrupts here? */
+   local_irq_disable();/* Match the SMP call environment */
func(info);
+   local_irq_enable();
return 0;
 }
 
_

which is somewhat unpleasant.  I added a WARN_ON(irqs_disabled()) to the
out-of-line SMP version.
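
For readers following the local_irq_disable() versus local_irq_save()
point, here is a minimal userspace sketch of the UP behaviour. Everything
in it is a stand-in: the irqs_enabled flag, the stub primitives, the
warn_count counter and the name up_call_function_single() are assumptions
made for illustration, not kernel code. The warning check is placed in the
UP model only to make the calling rule visible; the mail above adds the
WARN_ON to the out-of-line SMP version.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPU interrupt flag; true means interrupts are enabled. */
static bool irqs_enabled = true;

/* Userspace stubs for the kernel primitives (assumptions, not kernel code). */
static void local_irq_disable(void) { irqs_enabled = false; }
static void local_irq_enable(void)  { irqs_enabled = true; }
static bool irqs_disabled(void)     { return !irqs_enabled; }

/* Counts callers that entered with interrupts already off (models WARN_ON). */
static int warn_count;

/* UP model: run func on the only CPU with interrupts masked. */
static int up_call_function_single(int cpuid, void (*func)(void *info),
				   void *info)
{
	(void)cpuid;		/* only one CPU exists on UP */
	if (irqs_disabled())
		warn_count++;
	local_irq_disable();
	func(info);
	local_irq_enable();	/* unconditional, so entry state is not preserved */
	return 0;
}

/* Example callback: records whether it ran with interrupts masked. */
static void record_irq_state(void *info)
{
	*(bool *)info = irqs_disabled();
}
```

Because the enable on the way out is unconditional, a caller that enters
with interrupts already disabled gets them back enabled. That is the
trade-off of the bare disable/enable pair compared with
local_irq_save()/local_irq_restore(), and it is only safe under the rule
that callers must not have interrupts disabled.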


btw, does anyone know why the SMP versions of this function use
spin_lock_bh(&call_lock)?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH] Mount problem with the GFS2 code

2006-11-29 Thread Srinivasa Ds

Hi all,
While mounting a gfs2 filesystem, our test team hit a problem and got
this error message.

===

GFS2: fsid=: Trying to join cluster "lock_nolock", "dasde1"
GFS2: fsid=dasde1.0: Joined cluster. Now mounting FS...
GFS2: not a GFS2 filesystem
GFS2: fsid=dasde1.0: can't read superblock: -22

==
On debugging further, we found that the problem occurs while reading the
superblock (gfs2_read_super) and comparing the magic number in it.
When I replaced the submit_bio() call in gfs2_read_super with
sb_getblk() and ll_rw_block(), the mount operation succeeded.
On further analysis we found that, before calling submit_bio(),
bio->bi_sector was set to the "sector" variable. This "sector" variable
holds the same value as bh->b_blocknr (a block number). Hence this value
needs to be multiplied by (blocksize >> 9) to convert it into 512-byte
sectors (9 because the sector size is 2^9; the same thing happens in
ll_rw_block before it calls submit_bio()).
So I have developed a patch which solves this problem. Please let me
know your comments.



Signed-off-by: Srinivasa DS <[EMAIL PROTECTED]>


 super.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.19-rc6/fs/gfs2/super.c
===
--- linux-2.6.19-rc6.orig/fs/gfs2/super.c
+++ linux-2.6.19-rc6/fs/gfs2/super.c
@@ -199,7 +199,7 @@ struct page *gfs2_read_super(struct supe
return NULL;
}
 
-   bio->bi_sector = sector;
+   bio->bi_sector = sector * (sb->s_blocksize >> 9);
bio->bi_bdev = sb->s_bdev;
bio_add_page(bio, page, PAGE_SIZE, 0);
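
The unit conversion behind this one-line fix can be sketched in isolation.
The helper name block_to_sector() and the SECTOR_SHIFT macro are
hypothetical, introduced only for this example; the arithmetic is the
sector * (blocksize >> 9) expression from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, i.e. 2^9 bytes */

/* Convert a filesystem block number into a 512-byte sector number. */
static uint64_t block_to_sector(uint64_t block, uint32_t blocksize)
{
	return block * (blocksize >> SECTOR_SHIFT);
}
```

With a 4096-byte block size, block 16 maps to sector 128. With a 512-byte
block size the factor is 1, so the unscaled assignment would give the same
result, which is presumably why the problem only appears on devices with
larger block sizes.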
 


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Ingo Molnar

* David Miller <[EMAIL PROTECTED]> wrote:

> > furthermore, the tweak allows the shifting of processing from a 
> > prioritized process context into a highest-priority softirq context. 
> > (it's not proven that there is any significant /net win/ of 
> > performance: all that was proven is that if we shift TCP processing 
> > from process context into softirq context then TCP throughput of 
> > that otherwise penalized process context increases.)
> 
> If we preempt with any packets in the backlog, we send no ACKs and the 
> sender cannot send thus the pipe empties.  That's the problem, this 
> has nothing to do with scheduler priorities or stuff like that IMHO. 
> The argument goes that if the reschedule is delayed long enough, the 
> ACKs will exceed the round trip time and trigger retransmits which 
> will absolutely kill performance.

yes, but i disagree a bit about the characterisation of the problem. The 
question in my opinion is: how is TCP processing prioritized for this 
particular socket, which is attached to the process context which was 
preempted.

normally quite a bit of TCP processing happens in a softirq 
context (in fact most of it happens there), and softirq contexts have no 
fairness whatsoever - they preempt whatever processing is going on, 
regardless of any priority preferences of the user!

what was observed here were the effects of completely throttling TCP 
processing for a given socket. I think such throttling can in fact be 
desirable: there is a /reason/ why the process context was preempted: in 
that load scenario there was 10 times more processing requested from the 
CPU than it can possibly service. It's a serious overload situation and 
it's the scheduler's task to prioritize between workloads!

normally such kind of "throttling" of the TCP stack for this particular 
socket does not happen. Note that there's no performance lost: we dont 
do TCP processing because there are /9 other tasks for this CPU to run/, 
and the scheduler has a tough choice.

Now i agree that there are more intelligent ways to throttle and less 
intelligent ways to throttle, but the notion to allow a given workload 
'steal' CPU time from other workloads by allowing it to push its 
processing into a softirq is i think unfair. (and this issue is 
partially addressed by my softirq threading patches in -rt :-)

Ingo


Re: [PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)

2006-11-29 Thread Paul Jackson
I got a chance to build and test this patch set, to see if it behaved
like I expected cpusets to behave, on an ia64 SN2 Altix system.

Two details - otherwise looked good.  I continue to like this
approach.

The two details are (1) /proc/<pid>/cpuset not configured by
default if CPUSETS configured, and (2) a locking bug wedging
tasks trying to rmdir a cpuset off the notify_on_release hook.


1) I had to enable CONFIG_PROC_PID_CPUSET.  I used the following
   one line change to do this.  I am willing to consider, in due
   time, phasing out such legacy cpuset support.  But so long as it
   is small stuff that is not getting in anyone's way, I think we
   should take our sweet time about doing so -- as in a year or two
   after marking it deprecated or some such.  No sense deciding that
   matter now; keep the current cpuset API working throughout any
   transition to container based cpusets, then revisit the question
   of whether to deprecate and eventually remove these kernel API
   details, later on, after the major reconstruction dust settles.
   In general, we try to avoid removing kernel API's, especially if
   they are happily being used and working and not causing anyone
   grief.

============ begin ============
--- 2.6.19-rc5.orig/init/Kconfig2006-11-29 21:14:48.071114833 -0800
+++ 2.6.19-rc5/init/Kconfig 2006-11-29 22:19:02.015166048 -0800
@@ -268,6 +268,7 @@ config CPUSETS
 config PROC_PID_CPUSET
bool "Include legacy /proc/<pid>/cpuset file"
depends on CPUSETS
+   default y if CPUSETS
 
 config CONTAINER_CPUACCT
bool "Simple CPU accounting container subsystem"
============= end =============


2) I wedged the kernel on the container_lock, doing a removal of a cpuset
   using notify_on_release.

   Right now, that test system has the following two tasks, wedged:

============ begin ============
F S UID   PID PPID C PRI NI ADDR SZ  WCHAN  STIME TTY  TIME CMD
0 S root 4992   34 0  71 -5 -   380   wait   22:51 ?   00:00:00 /bin/sh 
/sbin/cpuset_release_agent /cpuset_test_tree
0 D root 4994 4992 0  72 -5 -   200 contai   22:51 ?   00:00:00 rmdir 
/dev/cpuset//cpuset_test_tree
============= end =============

   I had a cpuset called /cpuset_test_tree, and some sub-cpusets
   below it.  I marked it 'notify_on_release' and then removed all
   tasks from it, and then removed the child cpusets that it had.
   Removing that last child cpuset presumably triggered the above
   callout to /sbin/cpuset_release_agent, which called rmdir.

   That wait address (from /proc/4994/stat) in hex is a001000f1060,
   and my System.map has the two lines:

a001000f1040 T container_lock
a001000f1360 T container_manage_unlock

   So it is wedged in container_lock.

   I have subsequently also wedged an 'ls' command trying to scan this
   /dev/cpuset directory, waiting in the kernel routine vfs_readdir
   (not surprising, given that I'm in the middle of doing a rmdir on
   that directory.)

   If you don't immediately see the problem, I can go back and get a
   kernel stack trace or whatever else you need.

   This lockup occurred the first, and thus far only, time that I tried
   to use notify_on_release to rmdir a cpuset.  So I presume it is an
   easy failure for me to reproduce.
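
   The address-to-symbol step used above (wait address against the
   System.map entries) can be sketched as a small lookup. The struct
   ksym type and the resolve() helper are hypothetical names for this
   example; the addresses are copied exactly as they appear in this
   message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ksym {
	uint64_t addr;		/* start address of the symbol */
	const char *name;
};

/* The two System.map entries quoted above, in ascending address order. */
static const struct ksym symtab[] = {
	{ 0xa001000f1040ULL, "container_lock" },
	{ 0xa001000f1360ULL, "container_manage_unlock" },
};

/* Return the last symbol starting at or below addr, or NULL if none does. */
static const struct ksym *resolve(uint64_t addr)
{
	const struct ksym *best = NULL;
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (symtab[i].addr <= addr)
			best = &symtab[i];
	return best;
}
```

   Resolving the wait address this way gives container_lock plus an
   offset of 0x20, which matches the conclusion that the rmdir task is
   sleeping inside container_lock.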

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [patch/rfc 2.6.19-rc5] arch-neutral GPIO calls

2006-11-29 Thread pHilipp Zabel

On 11/30/06, pHilipp Zabel <[EMAIL PROTECTED]> wrote:

> Effectively, yes.  I counted quite a few implementations in the current
> tree which can trivially (#defines) map to that API.


Or so I thought, sorry.

regards
Philipp
Index: linux-2.6/include/asm-arm/arch-pxa/gpio.h
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ linux-2.6/include/asm-arm/arch-pxa/gpio.h	2006-11-30 07:39:59.0 +0100
@@ -0,0 +1,65 @@
+/*
+ * linux/include/asm-arm/arch-pxa/gpio.h
+ *
+ * PXA GPIO wrappers for arch-neutral GPIO calls
+ *
+ * Written by Philipp Zabel <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#ifndef __ASM_ARCH_PXA_GPIO_H
+#define __ASM_ARCH_PXA_GPIO_H
+
+#include 
+#include 
+#include 
+
+#include 
+
+static inline int gpio_request(unsigned gpio, const char *label)
+{
+	return 0;
+}
+
+static inline void gpio_free(unsigned gpio)
+{
+	return;
+}
+
+static inline int gpio_direction_input(unsigned gpio)
+{
+	if (gpio > PXA_LAST_GPIO)
+		return -EINVAL;
+	pxa_gpio_mode(gpio | GPIO_IN);
+}
+
+static inline int gpio_direction_output(unsigned gpio)
+{
+	if (gpio > PXA_LAST_GPIO)
+		return -EINVAL;
+	pxa_gpio_mode(gpio | GPIO_OUT);
+}
+
+#define gpio_get_value(gpio)	(GPLR(gpio) & GPIO_bit(gpio))
+#define gpio_set_value(gpio,value) \
+	((value)? (GPSR(gpio) = GPIO_bit(gpio)):(GPCR(gpio) = GPIO_bit(gpio)))
+
+#define gpio_to_irq(gpio)	IRQ_GPIO(gpio)
+#define irq_to_gpio(irq)	IRQ_TO_GPIO(irq)
+
+
+#endif
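
One detail in the wrappers above: gpio_direction_input() and
gpio_direction_output() are declared to return int, but the success path
falls off the end without returning a value. A minimal sketch of the two
helpers with an explicit return 0 follows. The PXA definitions are
replaced by hypothetical stand-ins (PXA_LAST_GPIO, GPIO_IN, GPIO_OUT and a
recording pxa_gpio_mode() stub with illustrative values) so the fragment
compiles outside the kernel tree.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the PXA definitions; values are illustrative. */
#define PXA_LAST_GPIO	84
#define GPIO_IN		0x000
#define GPIO_OUT	0x080

/* Stub that records the last requested mode instead of touching hardware. */
static int last_mode = -1;
static void pxa_gpio_mode(int mode) { last_mode = mode; }

static inline int gpio_direction_input(unsigned gpio)
{
	if (gpio > PXA_LAST_GPIO)
		return -EINVAL;
	pxa_gpio_mode(gpio | GPIO_IN);
	return 0;		/* explicit success value */
}

static inline int gpio_direction_output(unsigned gpio)
{
	if (gpio > PXA_LAST_GPIO)
		return -EINVAL;
	pxa_gpio_mode(gpio | GPIO_OUT);
	return 0;		/* explicit success value */
}
```

Callers can then test the result for a negative value in the usual way.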


[patch 0/3] more buffered write fixes

2006-11-29 Thread Nick Piggin
Sorry, I should give some background.

The following patches attempt to fix the problems people have identified
with buffered write deadlock patches. Against 2.6.19 + the previous patchset
dropped from -mm.

Comments?



[patch 3/3] fs: fix cont vs deadlock patches

2006-11-29 Thread Nick Piggin

Rework the cont filesystem helpers so that generic_cont_expand does the
actual work of expanding the file. cont_prepare_write then calls this
routine if expanding is needed, and retries. Also solves the problem
where cont_prepare_write would previously hold the target page locked
while doing not-very-nice things like locking other pages.

Means that zero-length prepare/commit_write pairs are no longer needed
as an overloaded directive to extend the file, thus cont should operate
better within the new deadlock-free buffered write code.

Converts fat over to the new cont scheme.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>


Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -2004,19 +2004,20 @@ int block_read_full_page(struct page *pa
return 0;
 }
 
-/* utility function for filesystems that need to do work on expanding
- * truncates.  Uses prepare/commit_write to allow the filesystem to
- * deal with the hole.  
+/*
+ * Utility function for filesystems that need to do work on expanding
+ * truncates. For moronic filesystems that do not allow holes in file.
  */
-static int __generic_cont_expand(struct inode *inode, loff_t size,
-pgoff_t index, unsigned int offset)
+int generic_cont_expand(struct inode *inode, loff_t size, loff_t *bytes,
+   get_block_t *get_block)
 {
struct address_space *mapping = inode->i_mapping;
+   unsigned long blocksize = 1 << inode->i_blkbits;
struct page *page;
unsigned long limit;
-   int err;
+   int status;
 
-   err = -EFBIG;
+   status = -EFBIG;
 limit = current->signal->rlim[RLIMIT_FSIZE].rlim_cur;
if (limit != RLIM_INFINITY && size > (loff_t)limit) {
send_sig(SIGXFSZ, current, 0);
@@ -2025,146 +2026,83 @@ static int __generic_cont_expand(struct 
if (size > inode->i_sb->s_maxbytes)
goto out;
 
-   err = -ENOMEM;
-   page = grab_cache_page(mapping, index);
-   if (!page)
-   goto out;
-   err = mapping->a_ops->prepare_write(NULL, page, offset, offset);
-   if (err) {
-   /*
-* ->prepare_write() may have instantiated a few blocks
-* outside i_size.  Trim these off again.
-*/
-   unlock_page(page);
-   page_cache_release(page);
-   vmtruncate(inode, inode->i_size);
-   goto out;
-   }
+   status = 0;
 
-   err = mapping->a_ops->commit_write(NULL, page, offset, offset);
+   while (*bytes < size) {
+   unsigned int zerofrom;
+   unsigned int zeroto;
+   void *kaddr;
+   pgoff_t pgpos;
+
+   pgpos = *bytes >> PAGE_CACHE_SHIFT;
+   page = grab_cache_page(mapping, pgpos);
+   if (!page) {
+   status = -ENOMEM;
+   break;
+   }
+   /* we might sleep */
+   if (*bytes >> PAGE_CACHE_SHIFT != pgpos)
+   goto unlock;
 
-   unlock_page(page);
-   page_cache_release(page);
-   if (err > 0)
-   err = 0;
-out:
-   return err;
-}
+   zerofrom = *bytes & ~PAGE_CACHE_MASK;
+   if (zerofrom & (blocksize-1))
+   *bytes = (*bytes + blocksize-1) & (blocksize-1);
 
-int generic_cont_expand(struct inode *inode, loff_t size)
-{
-   pgoff_t index;
-   unsigned int offset;
+   zeroto = PAGE_CACHE_SIZE;
 
-   offset = (size & (PAGE_CACHE_SIZE - 1)); /* Within page */
+   status = __block_prepare_write(inode, page, zerofrom,
+   zeroto, get_block);
+   if (status)
+   goto unlock;
+   kaddr = kmap_atomic(page, KM_USER0);
+   memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
+   flush_dcache_page(page);
+   kunmap_atomic(kaddr, KM_USER0);
+   status = __block_commit_write(inode, page, zerofrom, zeroto);
 
-   /* ugh.  in prepare/commit_write, if from==to==start of block, we
-   ** skip the prepare.  make sure we never send an offset for the start
-   ** of a block
-   */
-   if ((offset & (inode->i_sb->s_blocksize - 1)) == 0) {
-   /* caller must handle this extra byte. */
-   offset++;
+unlock:
+   unlock_page(page);
+   page_cache_release(page);
+   if (status) {
+   BUG_ON(status == AOP_TRUNCATED_PAGE);
+   break;
+   }
}
-   index = size >> PAGE_CACHE_SHIFT;
 
-   return __generic_cont_expand(inode, size, index, offset);
-}
-
-int generic_cont_expand_simple(struct inode *inode, loff_t size)
-{
-   

[patch 1/3] mm: pagecache write deadlocks zerolength fix

2006-11-29 Thread Nick Piggin

writev with a zero-length segment is a noop, and we shouldn't return EFAULT.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/pagemap.h
===
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -198,6 +198,9 @@ static inline int fault_in_pages_writeab
 {
int ret;
 
+   if (unlikely(size == 0))
+   return 0;
+
/*
 * Writing zeroes into userspace here is OK, because we know that if
 * the zero gets there, we'll be overwriting it.
@@ -222,6 +225,9 @@ static inline int fault_in_pages_readabl
volatile char c;
int ret;
 
+   if (unlikely(size == 0))
+   return 0;
+
ret = __get_user(c, uaddr);
if (ret == 0) {
const char __user *end = uaddr + size - 1;
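
Why the early return matters can be shown with a small userspace model.
All names here are assumptions for illustration: map_start and map_end
describe a made-up readable range, probe_byte() stands in for
__get_user(), and fault_in_readable_sketch() is a simplified version of
the helper that probes the last byte whenever it differs from the first,
rather than only when it lies on another page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user mapping: addresses in [map_start, map_end) are readable. */
static const uintptr_t map_start = 0x1000, map_end = 0x3000;

/* Stand-in for __get_user(): 0 if the byte is readable, -EFAULT otherwise. */
static int probe_byte(uintptr_t addr)
{
	return (addr >= map_start && addr < map_end) ? 0 : -EFAULT;
}

static int fault_in_readable_sketch(uintptr_t uaddr, size_t size)
{
	int ret;

	if (size == 0)
		return 0;	/* nothing to read, so nothing can fault */

	ret = probe_byte(uaddr);
	if (ret == 0) {
		uintptr_t end = uaddr + size - 1;	/* last byte of range */

		if (end != uaddr)
			ret = probe_byte(end);
	}
	return ret;
}
```

Without the size check, a zero-length segment would still probe uaddr, and
uaddr + size - 1 would point one byte before the buffer, so a noop segment
with an unmapped pointer could be reported as a fault.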


Re: [PATCH -rt] 2.6.19-4c6-rt9 build problem

2006-11-29 Thread Ingo Molnar

* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> > > thanks, applied. Have you tried to boot the resulting kernel as 
> > > well?
> 
> And with these two changes, it does boot!

great! Applied both of them.

Ingo


[patch 2/3] mm: pagecache write deadlocks stale holes fix

2006-11-29 Thread Nick Piggin

The data copy within a prepare_write can potentially allocate blocks
to fill holes, so if the page copy fails then the new blocks must be zeroed
so that uninitialised data cannot be exposed by a subsequent read.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1951,7 +1951,14 @@ retry_noprogress:
bytes);
dec_preempt_count();
 
-   if (!PageUptodate(page)) {
+   if (unlikely(copied != bytes)) {
+   /*
+* Must zero out new buffers here so that we do end
+* up properly filling holes rather than leaving stale
+* data in them that might be read in future.
+*/
+   page_zero_new_buffers(page);
+
/*
 * If the page is not uptodate, we cannot allow a
 * partial commit_write because when we unlock the
@@ -1965,10 +1972,10 @@ retry_noprogress:
 * Abort the operation entirely with a zero length
 * commit_write. Retry.  We will enter the
 * single-segment path below, which should get the
-* filesystem to bring the page uputodate for us next
+* filesystem to bring the page uptodate for us next
 * time.
 */
-   if (unlikely(copied != bytes))
+   if (!PageUptodate(page))
copied = 0;
}
 
Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1491,6 +1491,39 @@ out:
 }
 EXPORT_SYMBOL(block_invalidatepage);
 
+void page_zero_new_buffers(struct page *page)
+{
+   unsigned int block_start, block_end;
+   struct buffer_head *head, *bh;
+
+   BUG_ON(!PageLocked(page));
+   if (!page_has_buffers(page))
+   return;
+
+   bh = head = page_buffers(page);
+   block_start = 0;
+   do {
+   block_end = block_start + bh->b_size;
+
+   if (buffer_new(bh)) {
+   void *kaddr;
+
+   if (!PageUptodate(page)) {
+   kaddr = kmap_atomic(page, KM_USER0);
+   memset(kaddr+block_start, 0, bh->b_size);
+   flush_dcache_page(page);
+   kunmap_atomic(kaddr, KM_USER0);
+   }
+   clear_buffer_new(bh);
+   set_buffer_uptodate(bh);
+   mark_buffer_dirty(bh);
+   }
+
+   block_start = block_end;
+   bh = bh->b_this_page;
+   } while (bh != head);
+}
+
 /*
  * We attach and possibly dirty the buffers atomically wrt
  * __set_page_dirty_buffers() via private_lock.  try_to_free_buffers
@@ -1784,36 +1817,33 @@ static int __block_prepare_write(struct 
}
continue;
}
-   if (buffer_new(bh))
-   clear_buffer_new(bh);
if (!buffer_mapped(bh)) {
WARN_ON(bh->b_size != blocksize);
err = get_block(inode, block, bh, 1);
if (err)
break;
-   if (buffer_new(bh)) {
-   unmap_underlying_metadata(bh->b_bdev,
-   bh->b_blocknr);
-   if (PageUptodate(page)) {
-   set_buffer_uptodate(bh);
-   continue;
-   }
-   if (block_end > to || block_start < from) {
-   void *kaddr;
-
-   kaddr = kmap_atomic(page, KM_USER0);
-   if (block_end > to)
-   memset(kaddr+to, 0,
-   block_end-to);
-   if (block_start < from)
-   memset(kaddr+block_start,
-   0, from-block_start);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
+   }
+   if (buffer_new(bh)) {
+   unmap_underlying_metadata(bh->b_bdev, 

redboot partition combind fis / config problem

2006-11-29 Thread Yoshinori Sato
The FIS directory cannot be parsed correctly when RedBoot is built with
CYGSEM_REDBOOT_FLASH_COMBINED_FIS_AND_CONFIG.

Signed-off-by: Yoshinori Sato <[EMAIL PROTECTED]>

diff --git a/drivers/mtd/redboot.c b/drivers/mtd/redboot.c
index 5b58523..0204cb9 100644
--- a/drivers/mtd/redboot.c
+++ b/drivers/mtd/redboot.c
@@ -110,6 +110,9 @@ #endif
}
}
break;
+   } else {
+   /* re-calculate of real numslots */
+   numslots = buf[i].size / sizeof(struct fis_image_desc);
}
}
if (i == numslots) {

-- 
Yoshinori Sato
<[EMAIL PROTECTED]>
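
The slot-count arithmetic in the patch can be sketched on its own. The
descriptor layout below is an assumption modelled on a 256-byte FIS entry
and is named fis_image_desc_sketch to make clear it is not copied from the
driver; fis_numslots() is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 256-byte directory entry; field layout is illustrative only. */
struct fis_image_desc_sketch {
	unsigned char name[16];
	uint32_t flash_base;
	uint32_t mem_base;
	uint32_t size;
	uint32_t entry_point;
	uint32_t data_length;
	unsigned char pad[256 - (16 + 7 * sizeof(uint32_t))];
	uint32_t desc_cksum;
	uint32_t file_cksum;
};

/* Number of directory slots that fit in directory_bytes. */
static size_t fis_numslots(size_t directory_bytes)
{
	return directory_bytes / sizeof(struct fis_image_desc_sketch);
}
```

If the directory occupies less than a full erase block, as it would when
it shares the block with the configuration data, a slot count derived from
the erase size overshoots, and dividing the recorded size by the
descriptor size gives the smaller, real count.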


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread David Miller
From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Thu, 30 Nov 2006 07:47:58 +0100

> furthermore, the tweak allows the shifting of processing from a 
> prioritized process context into a highest-priority softirq context. 
> (it's not proven that there is any significant /net win/ of performance: 
> all that was proven is that if we shift TCP processing from process 
> context into softirq context then TCP throughput of that otherwise 
> penalized process context increases.)

If we preempt with any packets in the backlog, we send no ACKs and the
sender cannot send thus the pipe empties.  That's the problem, this
has nothing to do with scheduler priorities or stuff like that IMHO.
The argument goes that if the reschedule is delayed long enough, the
ACKs will exceed the round trip time and trigger retransmits which
will absolutely kill performance.

The only reason we block input packet processing while we hold this
lock is because we don't want the receive queue changing from
underneath us while we're copying data to userspace.

Furthermore once you preempt in this particular way, no input
packet processing occurs in that socket still, exacerbating the
situation.

Anyways, even if we somehow unlocked the socket and ran the backlog at
preemption points, by hand, since we've thus deferred the whole work
of processing whatever is in the backlog until the preemption point,
we've lost our quantum already, so it's perhaps not legal to do the
deferred processing as the preemption signalling point from a fairness
perspective.

It would be different if we really did the packet processing at the
original moment (where we had to queue to the socket backlog because
it was locked, in softirq) because then we'd return from the softirq
and hit the preemption point earlier or whatever.

Therefore, perhaps the best would be to see if there is a way we can
still allow input packet processing even while running the majority of
TCP's recvmsg().  It won't be easy :)
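
The backlog behaviour described in this message can be modelled in a few
lines. This is a toy sketch, not the kernel's socket code: struct
sock_sketch, sk_receive(), sk_lock() and sk_release() are hypothetical
names, and "processed" simply counts packets that got far enough for an
ACK to be generated.

```c
#include <assert.h>
#include <stdbool.h>

#define BACKLOG_MAX 16

struct sock_sketch {
	bool owned_by_user;		/* process context holds the socket */
	int backlog[BACKLOG_MAX];	/* packets deferred while it is held */
	int backlog_len;
	int processed;			/* packets processed, ACK possible */
};

/* Softirq-side receive: process now if unlocked, otherwise defer. */
static void sk_receive(struct sock_sketch *sk, int pkt)
{
	if (!sk->owned_by_user) {
		sk->processed++;
		return;
	}
	if (sk->backlog_len < BACKLOG_MAX)
		sk->backlog[sk->backlog_len++] = pkt;
}

static void sk_lock(struct sock_sketch *sk)
{
	sk->owned_by_user = true;
}

/* Release: drain the backlog; only now are the deferred packets handled. */
static void sk_release(struct sock_sketch *sk)
{
	sk->processed += sk->backlog_len;
	sk->backlog_len = 0;
	sk->owned_by_user = false;
}
```

In this model, if the task holding the socket is preempted between
sk_lock() and sk_release(), every packet that arrives in the meantime sits
in the backlog and no ACK goes out until the task runs again.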


[PATCH] rtc: ds1743 support

2006-11-29 Thread Torsten Ertbjerg Rasmussen
Hello, 

The real time clocks ds1742 and ds1743 differ only in the size of the nvram.
This patch changes the existing ds1742 driver to also support the ds1743. The
main change is that the nvram size is determined from the resource attached
to the device.

This patch applies to and has been tested with 2.6.19-rc5 and 2.6.19-rc6.

The patch has benefited from suggestions from Atsushi Nemoto, who is the
author of the ds1742 driver.

Please cc: me on any comments

Regards,
Signed-off-by: Torsten Rasmussen
---
diff -uprN -X linux-2.6.19-rc5-vanilla/Documentation/dontdiff 
linux-2.6.19-rc5-vanilla/drivers/rtc/Kconfig 
linux-2.6.19-rc5/drivers/rtc/Kconfig
--- linux-2.6.19-rc5-vanilla/drivers/rtc/Kconfig2006-11-08 
03:24:20.0 
+0100
+++ linux-2.6.19-rc5/drivers/rtc/Kconfig2006-11-23 11:07:20.157388499 
+0100
@@ -154,11 +154,11 @@ config RTC_DRV_DS1672
  will be called rtc-ds1672.
 
 config RTC_DRV_DS1742
-   tristate "Dallas DS1742"
+   tristate "Dallas DS1742/1743"
depends on RTC_CLASS
help
  If you say yes here you get support for the
- Dallas DS1742 timekeeping chip.
+ Dallas DS1742/1743 timekeeping chip.
 
  This driver can also be built as a module. If so, the module
  will be called rtc-ds1742.
diff -uprN -X linux-2.6.19-rc5-vanilla/Documentation/dontdiff 
linux-2.6.19-rc5-vanilla/drivers/rtc/rtc-ds1742.c 
linux-2.6.19-rc5/drivers/rtc/rtc-ds1742.c
--- linux-2.6.19-rc5-vanilla/drivers/rtc/rtc-ds1742.c   2006-11-08 
03:24:20.0 +0100
+++ linux-2.6.19-rc5/drivers/rtc/rtc-ds1742.c   2006-11-23 11:04:19.977903832 
+0100
@@ -6,6 +6,10 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
+ * 
+ * Copyright (C) 2006 Torsten Ertbjerg Rasmussen <[EMAIL PROTECTED]>
+ *  - nvram size determined from resource
+ *  - this ds1742 driver now supports ds1743. 
  */
 
 #include 
@@ -17,20 +21,19 @@
 #include 
 #include 
 
-#define DRV_VERSION "0.2"
+#define DRV_VERSION "0.3"
 
-#define RTC_REG_SIZE   0x800
-#define RTC_OFFSET 0x7f8
+#define RTC_SIZE   8
 
-#define RTC_CONTROL	(RTC_OFFSET + 0)
-#define RTC_CENTURY	(RTC_OFFSET + 0)
-#define RTC_SECONDS	(RTC_OFFSET + 1)
-#define RTC_MINUTES	(RTC_OFFSET + 2)
-#define RTC_HOURS	(RTC_OFFSET + 3)
-#define RTC_DAY		(RTC_OFFSET + 4)
-#define RTC_DATE	(RTC_OFFSET + 5)
-#define RTC_MONTH	(RTC_OFFSET + 6)
-#define RTC_YEAR	(RTC_OFFSET + 7)
+#define RTC_CONTROL	0
+#define RTC_CENTURY	0
+#define RTC_SECONDS	1
+#define RTC_MINUTES	2
+#define RTC_HOURS	3
+#define RTC_DAY		4
+#define RTC_DATE	5
+#define RTC_MONTH	6
+#define RTC_YEAR	7
 
 #define RTC_CENTURY_MASK   0x3f
 #define RTC_SECONDS_MASK   0x7f
@@ -48,7 +51,10 @@
 
 struct rtc_plat_data {
struct rtc_device *rtc;
-   void __iomem *ioaddr;
+   void __iomem *ioaddr_nvram;  
+   void __iomem *ioaddr_rtc;
+   size_t size_nvram;
+   size_t size;
unsigned long baseaddr;
unsigned long last_jiffies;
 };
@@ -57,7 +63,7 @@ static int ds1742_rtc_set_time(struct de
 {
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-   void __iomem *ioaddr = pdata->ioaddr;
+   void __iomem *ioaddr = pdata->ioaddr_rtc;
u8 century;
 
century = BIN2BCD((tm->tm_year + 1900) / 100);
@@ -82,7 +88,7 @@ static int ds1742_rtc_read_time(struct d
 {
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-   void __iomem *ioaddr = pdata->ioaddr;
+   void __iomem *ioaddr = pdata->ioaddr_rtc;
unsigned int year, month, day, hour, minute, second, week;
unsigned int century;
 
@@ -127,10 +133,10 @@ static ssize_t ds1742_nvram_read(struct 
struct platform_device *pdev =
to_platform_device(container_of(kobj, struct device, kobj));
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-   void __iomem *ioaddr = pdata->ioaddr;
+   void __iomem *ioaddr = pdata->ioaddr_nvram;
ssize_t count;
 
-   for (count = 0; size > 0 && pos < RTC_OFFSET; count++, size--)
+   for (count = 0; size > 0 && pos < pdata->size_nvram; count++, size--)
*buf++ = readb(ioaddr + pos++);
return count;
 }
@@ -141,10 +147,10 @@ static ssize_t ds1742_nvram_write(struct
struct platform_device *pdev =
to_platform_device(container_of(kobj, struct device, kobj));
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-   void __iomem 
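
The patch text is cut off at this point in the archive. The idea stated in
the description, deriving the nvram size from the resource, can still be
sketched. The struct ds174x_layout type and the helper
ds174x_layout_from_size() are hypothetical names for this example, and the
assumption that the clock registers occupy the last RTC_SIZE bytes of the
mapped region is inferred from the old RTC_REG_SIZE (0x800) and RTC_OFFSET
(0x7f8) values removed above.

```c
#include <assert.h>
#include <stddef.h>

#define RTC_SIZE 8	/* clock registers: assumed to be the last 8 bytes */

struct ds174x_layout {
	size_t size;		/* total size of the mapped resource */
	size_t size_nvram;	/* bytes usable as nvram */
	size_t rtc_offset;	/* offset of the first clock register */
};

/* Split a mapped region into nvram followed by the clock registers. */
static struct ds174x_layout ds174x_layout_from_size(size_t resource_size)
{
	struct ds174x_layout l;

	l.size = resource_size;
	l.size_nvram = resource_size - RTC_SIZE;
	l.rtc_offset = l.size_nvram;
	return l;
}
```

For a 0x800-byte resource this reproduces the old fixed offset of 0x7f8,
and a larger resource (0x2000 bytes is assumed here for the ds1743) moves
the registers to 0x1ff8 with no further code changes.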

Re: [RFC: 2.6 patch] remove the broken MTD_PCMCIA driver

2006-11-29 Thread Adrian Bunk
On Tue, Nov 28, 2006 at 10:16:27PM +, David Woodhouse wrote:
> On Sat, 2006-11-18 at 22:40 +0100, Adrian Bunk wrote:
> > The MTD_PCMCIA driver has:
> > - already been marked as BROKEN in 2.6.0 three years ago and
> > - is still marked as BROKEN.
> > 
> > Drivers that had been marked as BROKEN for such a long time seem to be
> > unlikely to be revived in the foreseeable future.
> 
> Actually, there's hardware currently on its way to me, and I plan to fix
> this driver fairly soon.

OK.

> > But if anyone wants to ever revive this driver, the code is still
> > present in the older kernel releases.
> 
> I'm unconvinced by that argument in the general case. People don't go
> looking back through git history, do they? Drivers such as this don't
> really do any harm as they are, and they're _much_ easier to find when
> someone does want to fix them up.

If there is an already merged driver that is marked as broken for a long 
time, there are usually two possible cases:
- it is really unused
- patches to fix it are pending or floating around

A patch to remove a driver is usually the best way of finding out 
which case a driver falls into (a good example might be 
the zr36120 driver that seems to have found a new maintainer due to my 
removal patch).

And if there's no reaction, the usefulness of very outdated and 
usually non-compiling code is quite questionable.

> dwmw2

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm] x86_64 UP needs smp_call_function_single

2006-11-29 Thread Ingo Molnar
On Wed, 2006-11-29 at 17:45 -0800, Andrew Morton wrote:
> No, I think this patch is right - the declaration of the CONFIG_SMP
> smp_call_function_single() is in linux/smp.h so the !CONFIG_SMP
> declaration
> or definition should be there too.
> 
> It's still buggy though.  It should disable local interrupts around
> the
> call to match the SMP version.  I'll fix that separately. 

hm, didnt i send an updated patch for that already? See the patch below,
from many days ago. I sent it after the tsc-sync-rewrite patch.

Ingo

--->
Subject: x86_64: build fixes
From: Ingo Molnar <[EMAIL PROTECTED]>

x86_64 does not build cleanly on UP:

arch/x86_64/kernel/vsyscall.c: In function 'cpu_vsyscall_notifier':
arch/x86_64/kernel/vsyscall.c:282: warning: implicit declaration of
function 'smp_call_function_single'
arch/x86_64/kernel/vsyscall.c: At top level:
arch/x86_64/kernel/vsyscall.c:279: warning: 'cpu_vsyscall_notifier'
defined but not used

this patch fixes it by making smp_call_function_single() globally
available.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/asm-x86_64/smp.h |   11 ++-
 include/linux/smp.h  |   10 +++---
 kernel/sched.c   |   19 +++
 3 files changed, 28 insertions(+), 12 deletions(-)

Index: linux/include/asm-x86_64/smp.h
===
--- linux.orig/include/asm-x86_64/smp.h
+++ linux/include/asm-x86_64/smp.h
@@ -115,16 +115,9 @@ static __inline int logical_smp_processo
 }
 
 #ifdef CONFIG_SMP
-#define cpu_physical_id(cpu)   x86_cpu_to_apicid[cpu]
+# define cpu_physical_id(cpu)  x86_cpu_to_apicid[cpu]
 #else
-#define cpu_physical_id(cpu)   boot_cpu_id
-static inline int smp_call_function_single(int cpuid, void (*func)
(void *info),
-   void *info, int retry, int wait)
-{
-   /* Disable interrupts here? */
-   func(info);
-   return 0;
-}
+# define cpu_physical_id(cpu)  boot_cpu_id
 #endif /* !CONFIG_SMP */
 #endif
 
Index: linux/include/linux/smp.h
===
--- linux.orig/include/linux/smp.h
+++ linux/include/linux/smp.h
@@ -53,9 +53,6 @@ extern void smp_cpus_done(unsigned int m
  */
 int smp_call_function(void(*func)(void *info), void *info, int retry,
int wait);
 
-int smp_call_function_single(int cpuid, void (*func) (void *info), void
*info,
-   int retry, int wait);
-
 /*
  * Call a function on all processors
  */
@@ -103,6 +100,13 @@ static inline void smp_send_reschedule(i
 #endif /* !SMP */
 
 /*
+ * Call a function on a specific CPU (on UP the function gets executed
+ * on the current CPU, immediately):
+ */
+int smp_call_function_single(int cpuid, void (*func) (void *info), void
*info,
+int retry, int wait);
+
+/*
  * smp_processor_id(): get the current CPU ID.
  *
  * if DEBUG_PREEMPT is enabled the we check whether it is
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1110,6 +1110,25 @@ repeat:
         task_rq_unlock(rq, &flags);
 }
 
+#ifndef CONFIG_SMP
+/*
+ * Call a function on a specific CPU (on UP the function gets executed
+ * on the current CPU, immediately):
+ */
+int smp_call_function_single(int cpuid, void (*func) (void *info), void
*info,
+int retry, int wait)
+{
+   unsigned long flags;
+
+   local_irq_save(flags);
+   func(info);
+   local_irq_restore(flags);
+
+   return 0;
+}
+EXPORT_SYMBOL(smp_call_function_single);
+#endif
+
 /***
  * kick_process - kick a running thread to enter/exit the kernel
  * @p: the to-be-kicked thread
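
As a rough illustration of the UP fallback in the patch above, here is a minimal user-space sketch. The `mock_local_irq_save()`/`mock_local_irq_restore()` helpers and the `up_call_function_single()` name are hypothetical stand-ins (a plain flag instead of the real interrupt state), so this only models the ordering: save and disable, call the function on the only CPU, restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool mock_irqs_enabled = true;

/* Mock of local_irq_save(): remember the old state, then "disable" IRQs. */
static void mock_local_irq_save(unsigned long *flags)
{
    *flags = mock_irqs_enabled ? 1UL : 0UL;
    mock_irqs_enabled = false;
}

/* Mock of local_irq_restore(): put back whatever state was saved. */
static void mock_local_irq_restore(unsigned long flags)
{
    mock_irqs_enabled = (flags != 0);
}

/*
 * UP variant: there is only one CPU, so the function simply runs here,
 * immediately, with (mock) interrupts disabled around the call.
 */
static int up_call_function_single(int cpuid, void (*func)(void *info),
                                   void *info, int retry, int wait)
{
    unsigned long flags;

    (void)cpuid; (void)retry; (void)wait;   /* meaningless on UP */
    assert(func != NULL);

    mock_local_irq_save(&flags);
    func(info);
    mock_local_irq_restore(flags);

    return 0;
}

/* Example callback: records whether it ran with (mock) IRQs off. */
static void record_irq_state(void *info)
{
    *(bool *)info = !mock_irqs_enabled;
}
```

Calling `up_call_function_single(0, record_irq_state, &seen, 0, 1)` should leave `seen` true and the mock interrupt state restored afterwards, which is the property the SMP version provides to its callers.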




Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Ingo Molnar

* David Miller <[EMAIL PROTECTED]> wrote:

> This is why my suggestion is to preempt_disable() as soon as we grab 
> the socket lock, [...]

independently of the issue at hand, in general the explicit use of 
preempt_disable() in non-infrastructure code is quite a heavy tool. Its 
effects are heavy and global: it disables /all/ preemption (even on 
PREEMPT_RT). Furthermore, when preempt_disable() is used for per-CPU 
data structures then [unlike, for example, a spin-lock] the connection 
between the 'data' and the 'lock' is not explicit - causing all kinds of 
grief when trying to convert such code to a different preemption model. 
(such as PREEMPT_RT :-)

So my plan is to remove all "open-coded" use of preempt_disable() [and 
raw use of local_irq_save/restore] from the kernel and replace it with 
some facility that connects data and lock. (Note that this will not 
result in any actual changes on the instruction level because internally 
every such facility still maps to preempt_disable() on non-PREEMPT_RT 
kernels, so on non-PREEMPT_RT kernels such code will still be the same 
as before.)
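
To make the "facility that connects data and lock" idea concrete, here is a minimal user-space sketch. Everything in it is hypothetical (`struct locked_counter`, the `locked_counter_*()` helpers, and the `mock_preempt_count` counter that stands in for preempt_disable()/preempt_enable()); it is not an existing kernel API, only one way such a wrapper could tie the protection to the data it guards.

```c
#include <assert.h>

/* Mock counter standing in for preempt_disable()/preempt_enable(). */
static int mock_preempt_count;

/*
 * Hypothetical "protected per-CPU variable": the data and the marker
 * saying who protects it live in one object, so the relationship is
 * explicit instead of implied by a bare preempt_disable().
 */
struct locked_counter {
    int  held;    /* nonzero while a caller holds the protection */
    long value;   /* the data being protected */
};

/* Acquire the protection and hand out a pointer to the data. */
static long *locked_counter_get(struct locked_counter *lc)
{
    mock_preempt_count++;       /* all the "lock" needs on a non-RT build */
    assert(!lc->held);          /* nested acquisition would be a bug */
    lc->held = 1;
    return &lc->value;
}

/* Release the protection taken by locked_counter_get(). */
static void locked_counter_put(struct locked_counter *lc)
{
    assert(lc->held);
    lc->held = 0;
    mock_preempt_count--;
}

/* Typical user: no open-coded preempt_disable() at the call site. */
static void locked_counter_add(struct locked_counter *lc, long delta)
{
    long *v = locked_counter_get(lc);

    *v += delta;
    locked_counter_put(lc);
}
```

Because callers only see the get/put pair, a different preemption model could change what those two helpers do (for example take a sleeping lock) without touching the users.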

Ingo


Re: [patch/rfc 2.6.19-rc5] arch-neutral GPIO calls

2006-11-29 Thread pHilipp Zabel

Hi,

On 11/23/06, David Brownell <[EMAIL PROTECTED]> wrote:
> On Tuesday 21 November 2006 7:57 am, Bill Gatliff wrote:
>
> > Once you're hiding the GPIO number behind an enumeration, you can create
> > a bitmap with more information than a single integer.  That extra
> > information could be used--- in my implementations, if any ever come
> > about--- to store routing information.
>
> But none of the existing GPIO users do that.  The goal wasn't to define
> a new notion of GPIO; it was collecting the existing ones under a single
> arch-neutral umbrella.
>
> > > >It'd also be a big (and needless) disruption to code that's been working
> > > >fine for several years now ...
> > >
> > > ... all of which is using the current GPIO API, you mean?  :)
>
> Effectively, yes.  I counted quite a few implementations in the current
> tree which can trivially (#defines) map to that API.

I tried to do that for pxa, the patch is attached.
So what is the state of this discussion, now that 2.6.19 is here?

I just submitted an input driver for GPIO buttons to linux-input that
we use in the handhelds.org kernel for sa1100, pxa and s3c2410 archs.
It needs some ugly #ifdefs currently, but with common GPIO calls they
all could go away.

regards
Philipp


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Ingo Molnar

* David Miller <[EMAIL PROTECTED]> wrote:

> > yeah, i like this one. If the problem is "too long locked section", 
> > then the most natural solution is to "break up the lock", not to 
> > "boost the priority of the lock-holding task" (which is what the 
> > proposed patch does).
> 
> Ingo you've mis-read the problem :-)

yeah, the problem isnt too long locked section but "too much time spent 
holding a lock" and hence opening up ourselves to possible negative 
side-effects of the scheduler's fairness algorithm when it forces a 
preemption of that process context with that lock held (and forcing all 
subsequent packets to be backlogged).

but please read my last mail - i think i'm slowly starting to wake up 
;-) I dont think there is any real problem: a tweak to the scheduler 
that in essence gives TCP-using tasks a preference changes the balance 
of workloads. Such an explicit tweak is possible already.

furthermore, the tweak allows the shifting of processing from a 
prioritized process context into a highest-priority softirq context. 
(it's not proven that there is any significant /net win/ of performance: 
all that was proven is that if we shift TCP processing from process 
context into softirq context then TCP throughput of that otherwise 
penalized process context increases.)

Ingo


Fix for OpenSUSE kernel bug (was Re: [Opps] Invalid opcode)

2006-11-29 Thread Zachary Amsden

S.Çağlar Onur wrote:
> On Sun, 05 Nov 2006 at 18:40, Andi Kleen wrote:
> > How do you know this?
>
> Just guessing; if I'm not wrong, the panics occur after the SMP
> alternative switching code has done its job.
>
> > And does it still happen in 2.6.19-rc4?
>
> Will try
>
> > > in VmWare and Microsoft Virtual
> > > PC and in order to confirm this bug is not our distro specific i
> > > downloaded and tried latest OpenSuse also [1]  and [2] are screens
> > > captured by vmware but exact same panic occurs in Virtual PC as reported
> > > to us in [3].
> >
> > Always the same BUG()?
>
> Yes, same bug
>
> > There is just some rolling Turkish text there.
>
> Ah, I'm sorry, here are the correct links :(
>
> [1] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_opensuse.png
> [2] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_pardus.png
>
> Cheers


I'm proposing this as a fix for your bug. Having tasklets scheduled 
before softirqd gets to run might be somewhat backwards, but there is 
nothing I can find wrong about it from a correctness point of view. 
Better to boot the kernel even when compiled with bug checking on, I think.


This bug started becoming apparent in 2.6.18 because of some rework with 
the CPU hotplug code, but in theory, it exists at least all the way back 
to 2.6.10, which is as far as I looked backwards in time.


Zach

It is possible to have tasklets get scheduled before softirqd has had
a chance to spawn on all CPUs.  This is totally harmless; after success
during action CPU_UP_PREPARE, action CPU_ONLINE will be called, which
immediately wakes softirqd on the appropriate CPU to process the already
pending tasklets.  So there is no danger of having a missed wakeup for
any tasklets that were already pending.

In particular, i386 is affected by this during startup, and is visible when
using a very large initrd; during the time it takes for the initrd to be
decompressed, a timer IRQ can come in and schedule RCU callbacks.  It is also
possible that resending of a hardware IRQ via a softirq triggers the same bug.

Because of different timing conditions, this shows up in all emulators
and virtual machines tested, including Xen, VMware, Virtual PC, and Qemu.
It is also possible to trigger on native hardware with a large enough initrd,
although I don't have a reliable case demonstrating that.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

Index: linux-2.6.18/kernel/softirq.c
===
--- linux-2.6.18.orig/kernel/softirq.c  2006-11-10 14:44:39.0 -0800
+++ linux-2.6.18/kernel/softirq.c   2006-11-29 22:19:36.0 -0800
@@ -574,8 +574,6 @@ static int __cpuinit cpu_callback(struct
 
switch (action) {
case CPU_UP_PREPARE:
-   BUG_ON(per_cpu(tasklet_vec, hotcpu).list);
-   BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list);
p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);
if (IS_ERR(p)) {
printk("ksoftirqd for %i failed\n", hotcpu);


Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency

2006-11-29 Thread Gautham R Shenoy
On Thu, Nov 30, 2006 at 09:58:07AM +0530, Gautham R Shenoy wrote:
> 
> So can we ignore this circular-dep warning as a false positive?
> Or is there a way to exploit this circular dependency ?
> 
> At the moment, I cannot think of a way to exploit this circular dependency
> unless we do something like try destroying the created workqueue when the
> cpu is dead, i.e make the cpufreq governors cpu-hotplug-aware.
> (eeks! that doesn't look good)

Ok, I see that we are already doing it :(. So we can end up in a
deadlock.

Here's the culprit callpath:

_cpu_down()
  !
  !-> raw_notifier_call_chain(CPU_LOCK_ACQUIRE)
  !     !
  !     !-> workqueue_cpu_mutex(CPU_LOCK_ACQUIRE) [*]
  !
  !-> raw_notifier_call_chain(CPU_DEAD)
        !
        !-> cpufreq_cpu_callback (CPU_DEAD)
              !
              !-> cpufreq_remove_dev
                    !
                    !-> __cpufreq_governor(data, GOVERNOR_STOP)
                          !
                          !-> policy->governor->governor()
                                !
                                !-> cpufreq_governor_dbs(GOVERNOR_STOP)
                                      !
                                      !-> destroy_workqueue() [*]

[*] indicates function takes workqueue_mutex.

So a deadlock!
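
The shape of this self-deadlock can be reproduced in user space. The sketch below is only an analogy, under the assumption that a POSIX error-checking mutex is an acceptable stand-in for workqueue_mutex: the `*_sim` names are hypothetical, and `PTHREAD_MUTEX_ERRORCHECK` is used so that the second lock attempt reports `EDEADLK` instead of hanging the way a normal mutex (or the kernel mutex) would.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* User-space stand-in for workqueue_mutex. */
static pthread_mutex_t workqueue_mutex_sim;

/* Stand-in for destroy_workqueue(): it wants the mutex for itself. */
static int destroy_workqueue_sim(void)
{
    int err = pthread_mutex_lock(&workqueue_mutex_sim);

    if (err == 0)
        pthread_mutex_unlock(&workqueue_mutex_sim);
    return err;   /* EDEADLK when the caller already holds the mutex */
}

/*
 * Stand-in for _cpu_down(): take the mutex (CPU_LOCK_ACQUIRE), then walk
 * down the CPU_DEAD path, which ends in destroy_workqueue().
 */
static int cpu_down_sim(void)
{
    pthread_mutexattr_t attr;
    int err;

    err = pthread_mutexattr_init(&attr);
    assert(err == 0);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&workqueue_mutex_sim, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&workqueue_mutex_sim);      /* CPU_LOCK_ACQUIRE */
    err = destroy_workqueue_sim();                 /* CPU_DEAD -> governor stop */
    pthread_mutex_unlock(&workqueue_mutex_sim);    /* CPU_LOCK_RELEASE */

    pthread_mutex_destroy(&workqueue_mutex_sim);
    return err;
}
```

With a default (non-checking) mutex the inner lock call would simply never return, which corresponds to the hang described above.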

I wasn't able to observe this because I'm running a Xeon SMP box on which
you cannot offline cpu0. And cpufreq data is created only for cpu0,
while all other cpus' cpufreq_data just point to cpu0's cpufreq_data.

So the mentioned callpath within  cpufreq_remove_dev is never reached
during the normal cpu offline cycle.

However, if there are architectures which allow the first-booted-cpu
(or to be precise, the cpu for which cpufreq_data is *actually* created) 
to be offlined and we are running Ondemand governor during the offline,
we will see this deadlock.

regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


Re: Bug 7596 - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Attached is the detailed description of the problem and one possible 
> > solution.
> 
> Thanks.  The attachment will be too large for the mailing-list servers 
> so I uploaded a copy to 
> http://userweb.kernel.org/~akpm/Linux-TCP-Bottleneck-Analysis-Report.pdf
> 
> From a quick peek it appears that you're getting around 10% 
> improvement in TCP throughput, best case.

Wenji, have you tried to renice the receiving task (to say nice -20) and 
see how much TCP throughput you get in "background load of 10.0"? 
(similarly, you could also renice the background load tasks to nice +19 
and/or set their scheduling policy to SCHED_BATCH)

as far as i can see, the numbers in the paper and the patch prove the 
following two points:

 - a task doing TCP receive with 10 other tasks running on the CPU will
   see lower TCP throughput than if it had the CPU for itself alone.

 - a patch that tweaks the scheduler to give the receiving task more
   timeslices (i.e. raises its nice level in essence) results in ...
   more timeslices, which results in higher receive numbers ...

so the most important thing to check would be, before any scheduler and 
TCP code change is considered: if you give the task higher priority 
/explicitly/, via nice -20, do the numbers improve? Similarly, if all 
the other "background load" tasks are reniced to nice +19 (or their 
policy is set to SCHED_BATCH), do you get a similar improvement?
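
For the "background load" side of that experiment, a task can also lower its own priority programmatically. The following is a minimal sketch using the POSIX `setpriority()`/`getpriority()` calls; the `renice_self()` helper name is an assumption for illustration. Raising the nice value to +19 needs no privileges, whereas going to nice -20 for the receiver normally requires root (or the `nice`/`renice` tools run with sufficient privileges).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Set the calling process's nice level and report the level that is in
 * effect afterwards.  Returns 0 on success, -1 on failure (errno set).
 */
static int renice_self(int nice_level, int *resulting)
{
    assert(resulting != NULL);

    if (setpriority(PRIO_PROCESS, 0, nice_level) != 0)
        return -1;

    /* getpriority() may legitimately return -1, so errno must be checked. */
    errno = 0;
    *resulting = getpriority(PRIO_PROCESS, 0);
    if (*resulting == -1 && errno != 0)
        return -1;

    return 0;
}
```

A background-load generator would call `renice_self(19, &now)` once at startup before entering its busy loop; the SCHED_BATCH variant of the test would use `sched_setscheduler()` instead and is not shown here.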

Ingo


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread David Miller
From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Thu, 30 Nov 2006 07:17:58 +0100

> 
> * David Miller <[EMAIL PROTECTED]> wrote:
> 
> > > We can make explicit preemption checks in the main loop of 
> > tcp_recvmsg(), and release the socket and run the backlog if 
> > need_resched() is TRUE.
> > 
> > This is the simplest and most elegant solution to this problem.
> 
> yeah, i like this one. If the problem is "too long locked section", then
> the most natural solution is to "break up the lock", not to "boost the 
> priority of the lock-holding task" (which is what the proposed patch 
> does).

Ingo you've mis-read the problem :-)

The issue is that we actually don't hold any locks that prevent
preemption, so we can take preemption points which the TCP code
wasn't designed with in-mind.

Normally, we control the sleep point very carefully in the TCP
sendmsg/recvmsg code, such that when we sleep we drop the socket
lock and process the backlog packets that accumulated while the
socket was locked.

With pre-emption we can't control that properly.

The problem is that we really do need to run the backlog any time
we give up the cpu in the sendmsg/recvmsg path, or things get real
erratic.  ACKs don't go out as early as we'd like them to, etc.

It isn't easy to do generically, perhaps, because we can only
drop the socket lock at certain points and we need to do that to
run the backlog.

This is why my suggestion is to preempt_disable() as soon as we
grab the socket lock, and explicitly test need_resched() at places
where it is absolutely safe, like this:

        if (need_resched()) {
                /* Run packet backlog... */
                release_sock(sk);
                schedule();
                lock_sock(sk);
        }

The socket lock is just a by-hand binary semaphore, so it doesn't
block pre-emption.  We have to be able to sleep while holding it.
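
The interaction between the socket lock and the backlog described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's implementation: `struct sock_sim` and all `*_sim` functions are hypothetical, packets are plain integers, and the call to schedule() is reduced to a comment. It shows why releasing the lock at an explicit preemption point is what gets queued packets processed.

```c
#include <assert.h>
#include <stdbool.h>

#define BACKLOG_MAX 16

struct sock_sim {
    bool owned;                  /* "socket lock" held by process context */
    int  backlog[BACKLOG_MAX];   /* packets queued while the lock was held */
    int  backlog_len;
    int  processed;              /* packets fully handled so far */
};

static void lock_sock_sim(struct sock_sim *sk)
{
    assert(!sk->owned);
    sk->owned = true;
}

/* Dropping the lock is the point where the backlog gets run. */
static void release_sock_sim(struct sock_sim *sk)
{
    assert(sk->owned);
    sk->processed += sk->backlog_len;
    sk->backlog_len = 0;
    sk->owned = false;
}

/*
 * Receive side: handle the packet directly if nobody owns the socket,
 * otherwise park it on the backlog.  Returns false if it had to be dropped.
 */
static bool packet_arrives_sim(struct sock_sim *sk, int pkt)
{
    if (!sk->owned) {
        sk->processed++;
        return true;
    }
    if (sk->backlog_len == BACKLOG_MAX)
        return false;
    sk->backlog[sk->backlog_len++] = pkt;
    return true;
}

/* The explicit preemption point: run the backlog before giving up the CPU. */
static void resched_point_sim(struct sock_sim *sk, bool need_resched)
{
    if (need_resched) {
        release_sock_sim(sk);
        /* schedule() would be called here in the kernel */
        lock_sock_sim(sk);
    }
}
```

If the lock owner is preempted without passing through such a point, the packets simply stay in `backlog` until it runs again, which is the delayed-ACK behaviour complained about in this thread.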


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Ingo Molnar

* Wenji Wu <[EMAIL PROTECTED]> wrote:

> > That yield() will need to be removed - yield()'s behaviour is truly 
> > awful if the system is otherwise busy.  What is it there for?
> 
> Please read the uploaded paper, which has detailed description.

do you have any URL for that?

Ingo


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Ingo Molnar

* David Miller <[EMAIL PROTECTED]> wrote:

> We can make explicit preemption checks in the main loop of 
> tcp_recvmsg(), and release the socket and run the backlog if 
> need_resched() is TRUE.
> 
> This is the simplest and most elegant solution to this problem.

yeah, i like this one. If the problem is "too long locked section", then
the most natural solution is to "break up the lock", not to "boost the 
priority of the lock-holding task" (which is what the proposed patch 
does).

[ Also note that "sprinkle the code with preempt_disable()" kind of
  solutions, besides hurting interactivity, are also a pain to resolve 
  in something like PREEMPT_RT. (unlike say a spinlock, 
  preempt_disable() is quite opaque in what data structure it protects, 
  etc., making it hard to convert it to a preemptible primitive) ]

> The one suggested in your patch and paper are way overkill, there is 
> no reason to solve a TCP specific problem inside of the generic 
> scheduler.

agreed.

What we could also add is a /reverse/ mechanism to the scheduler: a task 
could query whether it has just a small amount of time left in its 
timeslice, and could in that case voluntarily drop its current lock and 
yield, and thus give up its current timeslice and wait for a new, full 
timeslice, instead of being forcibly preempted due to lack of timeslices 
with a possibly critical lock still held.

But the suggested solution here, to "prolong the running of this task 
just a little bit longer" only starts a perpetual arms race between 
users of such a facility and other kernel subsystems. (besides not being 
adequate anyway, there can always be /so/ long lock-hold times that the 
scheduler would have no other option but to preempt the task)

Ingo


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Mike Galbraith
On Wed, 2006-11-29 at 17:08 -0800, Andrew Morton wrote:
> +    if (p->backlog_flag == 0) {
> +        if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
> +            enqueue_task(p, rq->expired);
> +            if (p->static_prio < rq->best_expired_prio)
> +                rq->best_expired_prio = p->static_prio;
> +        } else
> +            enqueue_task(p, rq->active);
> +    } else {
> +        if (expired_starving(rq)) {
> +            enqueue_task(p,rq->expired);
> +            if (p->static_prio < rq->best_expired_prio)
> +                rq->best_expired_prio = p->static_prio;
> +        } else {
> +            if (!TASK_INTERACTIVE(p))
> +                p->extrarun_flag = 1;
> +            enqueue_task(p,rq->active);
> +        }
> +    }

(oh my, doing that to the scheduler upsets my tummy, but that aside...)

I don't see how that can really solve anything.  "Interactive" tasks
starting to use cpu heftily can still preempt and keep the special cased
cpu hog off the cpu for ages.  It also only takes one task in the
expired array to trigger the forced array switch with a fully loaded
cpu, and once any task hits the expired array, a stream of wakeups can
prevent the switch from completing for as long as you can keep wakeups
happening.

-Mike



Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c (kernel 2.6.18.1)

2006-11-29 Thread Jesper Juhl

On 30/11/06, David Chinner <[EMAIL PROTECTED]> wrote:
> On Wed, Nov 29, 2006 at 10:17:25AM +0100, Jesper Juhl wrote:
> > On 29/11/06, David Chinner <[EMAIL PROTECTED]> wrote:
> > >On Tue, Nov 28, 2006 at 04:49:00PM +0100, Jesper Juhl wrote:
> > >> Filesystem "dm-1": XFS internal error xfs_trans_cancel at line 1138 of
> > >> file fs/xfs/xfs_trans.c.  Caller 0x8034b47e
> > >>
> > >> Call Trace:
> > >> [] show_trace+0xb2/0x380
> > >> [] dump_stack+0x15/0x20
> > >> [] xfs_error_report+0x3c/0x50
> > >> [] xfs_trans_cancel+0x6e/0x130
> > >> [] xfs_create+0x5ee/0x6a0
> > >> [] xfs_vn_mknod+0x156/0x2e0
> > >> [] xfs_vn_create+0xb/0x10
> > >> [] vfs_create+0x8c/0xd0
> > >> [] nfsd_create_v3+0x31a/0x560
> > >> [] nfsd3_proc_create+0x148/0x170
> > >> [] nfsd_dispatch+0xf9/0x1e0
> > >> [] svc_process+0x437/0x6e0
> > >> [] nfsd+0x1cd/0x360
> > >> [] child_rip+0xa/0x12
> > >> xfs_force_shutdown(dm-1,0x8) called from line 1139 of file
> > >> fs/xfs/xfs_trans.c.  Return address = 0x80359daa
> > >
> > >We shut down the filesystem because we cancelled a dirty transaction.
> > >Once we start to dirty the incore objects, we can't roll back to
> > >an unchanged state if a subsequent fatal error occurs during the
> > >transaction and we have to abort it.
> > >
> > So you are saying that there's nothing I can do to prevent this from
> > happening in the future?
>
> Pretty much - we need to work out what is going wrong and
> we can't from the shutdown message above - the error has
> occurred in a path that doesn't have error report traps
> in it.
>
> Is this reproducible?

Not on demand, no. It has happened only this once as far as I know and
for unknown reasons.

> > >If I understand historic occurrences of this correctly, there is
> > >a possibility that it can be triggered in ENOMEM situations. Was your
> > >machine running out of memory when this occurred?
> > >
> > Not really. I just checked my monitoring software and, at the time
> > this happened, the box had ~5.9G RAM free (of 8G total) and no swap
> > used (but 11G available).
>
> Ok. Sounds like we need more error reporting points inserted
> into that code so we dump an error earlier and hence have some
> hope of working out what went wrong next time.
>
> OOC, there weren't any I/O errors reported before this shutdown?

No. I looked but found none.

Let me know if there's anything I can do to help.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: PM-Timer clock source is slow. Try something else: How slow? What other source(s)?

2006-11-29 Thread Srinivasa Ds

john stultz wrote:
> On Wed, 2006-11-29 at 16:56 -0800, Linda Walsh wrote:
> > I recently noticed this message in my bootup that I don't remember
> > from before:
> >
> > PCI: Probing PCI hardware (bus 00)
> > * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
> > * this clock source is slow. Consider trying other clock sources
>
> This basically means that your chipset has a bug which requires the ACPI
> PM timer to be read three times in order to get a valid reading.
>
> This will cause gettimeofday/clock_gettime to take longer to execute,
> which is what is meant by "slow" (rather than the counter's frequency
> being incorrect).
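
To illustrate why such a workaround makes every time query slower, here is a minimal user-space sketch of a "read three times until the samples agree" loop. It is written from the description above, not copied from the kernel: the `read_pmtmr_safe()` name, the callback-based fake timer, and the exact acceptance test (three samples in increasing order, allowing for counter wraparound) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMTMR_MASK 0xFFFFFFu   /* treat the counter as 24 bits wide */

/* A raw (possibly glitchy) timer read, abstracted so it can be faked. */
typedef uint32_t (*pmtmr_read_fn)(void *ctx);

/*
 * Read the counter three times and retry until the samples are in
 * increasing order (allowing for wraparound).  A single bogus sample
 * breaks the ordering, so the whole triple is read again.
 */
static uint32_t read_pmtmr_safe(pmtmr_read_fn read_raw, void *ctx)
{
    uint32_t v1, v2, v3;

    assert(read_raw != NULL);
    do {
        v1 = read_raw(ctx) & PMTMR_MASK;
        v2 = read_raw(ctx) & PMTMR_MASK;
        v3 = read_raw(ctx) & PMTMR_MASK;
    } while ((v1 > v2 && v1 < v3) ||
             (v2 > v3 && v2 < v1) ||
             (v3 > v1 && v3 < v2));

    return v2;
}

/* Fake hardware for experimentation: replays a fixed list of samples. */
struct fake_timer {
    const uint32_t *samples;
    int pos;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_timer *t = ctx;

    return t->samples[t->pos++];
}
```

Each successful call costs at least three port reads instead of one, and more when a glitch forces a retry, which is the overhead that gettimeofday() callers end up paying.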

  

> > How would this affect my clock?  It says to try another
> > clock source, what type of clock source would it be suggesting I
> > use? Another chip already in the computer?

Yes.

> > It is an Intel 440BX
> > chipset; on a Dell motherboard. Would that be likely to have
> > another chip source that is compensating?

You can change the clock source using the "clock=" kernel parameter. Please
refer to the Documentation/kernel-parameters.txt file in the kernel source.

> > I don't notice a significant clock slowdown, but I'm running NTP,
> > so that could be masking the problem.
>
> Unless you're running performance critical programs that utilize
> gettimeofday/clock_gettime, you probably won't notice anything. Time
> should still function properly.  If you are having performance issues,
> you can try using a different clocksource (the TSC is probably safe, but
> not necessarily).
>
> thanks
> -john

Thanks
Srinivasa DS


Re: [PATCH] autofs: fix error code path in autofs_fill_sb()

2006-11-29 Thread Ian Kent
On Thu, 2006-11-30 at 01:26 +0100, Jiri Kosina wrote:
> [PATCH] autofs: fix error code path in autofs_fill_sb()
> 
> When kernel is compiled with old version of autofs (CONFIG_AUTOFS_FS), and 
> new (observed at least with 5.x.x) automount daemon is started, kernel 
> correctly reports incompatible version of kernel and userland daemon, but 
> then screws things up instead of correct handling of the error:
> 
>  autofs: kernel does not match daemon version
>  =
>  [ BUG: bad unlock balance detected! ]
>  -
>  automount/4199 is trying to release lock (&type->s_umount_key) at:
>  [] get_sb_nodev+0x76/0xa4
>  but there are no more locks to release!
> 
>  other info that might help us debug this:
>  no locks held by automount/4199.
> 
>  stack backtrace:
>   [] dump_trace+0x68/0x1b2
>   [] show_trace_log_lvl+0x18/0x2c
>   [] show_trace+0xf/0x11
>   [] dump_stack+0x12/0x14
>   [] print_unlock_inbalance_bug+0xe7/0xf3
>   [] lock_release+0x8d/0x164
>   [] up_write+0x14/0x27
>   [] get_sb_nodev+0x76/0xa4
>   [] vfs_kern_mount+0x83/0xf6
>   [] do_kern_mount+0x2d/0x3e
>   [] do_mount+0x607/0x67a
>   [] sys_mount+0x72/0xa4
>   [] sysenter_past_esp+0x5f/0x99
>  DWARF2 unwinder stuck at sysenter_past_esp+0x5f/0x99
>  Leftover inexact backtrace:
>   ===
> 
> and then deadlock comes.
> 
> The problem: autofs_fill_super() returns EINVAL to get_sb_nodev(), but before
> that, it calls kill_anon_super() to destroy the superblock which won't be 
> needed. This is however way too soon to call kill_anon_super(), because 
> get_sb_nodev() has to perform its own cleanup of the superblock first
> (deactivate_super(), etc.). The correct time to call kill_anon_super() is in
> the autofs_kill_sb() callback, which is called by deactivate_super() at proper
> time, when the superblock is ready to be killed.
> 
> I can see the same faulty codepath also in autofs4. This patch solves the issue
> in both filesystems in the same way - it postpones the kill_anon_super() until
> the proper time is signaled by deactivate_super() calling the kill_sb()
> callback.
> 
> Patch against 2.6.19-rc6-mm2.
> 
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
Acked-by: Ian Kent <[EMAIL PROTECTED]>

It looks so obvious now.
Updating the comment above would be a good idea also, see attached.
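
The ordering problem is easier to see in a toy model. The sketch below is a heavily simplified user-space simulation, not the VFS code: all `*_sim` names are hypothetical, the superblock lock is a plain counter, and it assumes (as the lockdep report suggests) that tearing down the superblock releases s_umount. With `fixed` set to false it follows the old error path, with `fixed` set to true it follows the patched one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sb_sim {
    int lock_depth;    /* 1 while the simulated s_umount is held */
    int bad_unlocks;   /* unlock attempts made with nothing held */
    int kills;         /* how many times the superblock was torn down */
};

static void down_write_sim(struct sb_sim *sb) { sb->lock_depth++; }

static void up_write_sim(struct sb_sim *sb)
{
    if (sb->lock_depth == 0)
        sb->bad_unlocks++;     /* "bad unlock balance detected" */
    else
        sb->lock_depth--;
}

/* Assumption: teardown ends by dropping the superblock lock. */
static void kill_anon_super_sim(struct sb_sim *sb)
{
    sb->kills++;
    up_write_sim(sb);
}

static int fill_super_sim(struct sb_sim *sb, bool fixed)
{
    if (!fixed)
        kill_anon_super_sim(sb);   /* old error path: teardown too early */
    return -EINVAL;                /* version mismatch */
}

static void kill_sb_sim(struct sb_sim *sb, bool fixed)
{
    if (!fixed)
        return;                    /* old callback: "just exit" without sb info */
    kill_anon_super_sim(sb);       /* patched callback: teardown happens here */
}

static int mount_sim(struct sb_sim *sb, bool fixed)
{
    int err;

    assert(sb != NULL);
    down_write_sim(sb);            /* new superblock starts out locked */
    err = fill_super_sim(sb, fixed);
    if (err) {
        up_write_sim(sb);          /* caller's own cleanup */
        down_write_sim(sb);        /* deactivate step re-takes the lock ... */
        kill_sb_sim(sb, fixed);    /* ... and calls the kill_sb callback */
    }
    return err;
}
```

In the old flow the caller's own unlock finds nothing to release; in the patched flow every unlock has a matching lock and the superblock is torn down exactly once.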

> 
> --- 
> 
>  fs/autofs/inode.c|4 ++--
>  fs/autofs4/inode.c   |4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/autofs/inode.c b/fs/autofs/inode.c
> index 38ede5c..61e04ab 100644
> --- a/fs/autofs/inode.c
> +++ b/fs/autofs/inode.c
> @@ -31,7 +31,7 @@ void autofs_kill_sb(struct super_block *
>* just exit when we are called from deactivate_super.
>*/
>   if (!sbi)
> - return;
> + goto out_kill_sb;
>  
>   if ( !sbi->catatonic )
>   autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */
> @@ -44,6 +44,7 @@ void autofs_kill_sb(struct super_block *
>  
>   kfree(sb->s_fs_info);
>  
> +out_kill_sb:
>   DPRINTK(("autofs: shutting down\n"));
>   kill_anon_super(sb);
>  }
> @@ -209,7 +210,6 @@ fail_iput:
>  fail_free:
>   kfree(sbi);
>   s->s_fs_info = NULL;
> - kill_anon_super(s);
>  fail_unlock:
>   return -EINVAL;
>  }
> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
> index ce7c0f1..be14200 100644
> --- a/fs/autofs4/inode.c
> +++ b/fs/autofs4/inode.c
> @@ -155,7 +155,7 @@ void autofs4_kill_sb(struct super_block
>* just exit when we are called from deactivate_super.
>*/
>   if (!sbi)
> - return;
> + goto out_kill_sb;
>  
>   sb->s_fs_info = NULL;
>  
> @@ -167,6 +167,7 @@ void autofs4_kill_sb(struct super_block
>  
>   kfree(sbi);
>  
> +out_kill_sb:
>   DPRINTK("shutting down");
>   kill_anon_super(sb);
>  }
> @@ -426,7 +427,6 @@ fail_ino:
>  fail_free:
>   kfree(sbi);
>   s->s_fs_info = NULL;
> - kill_anon_super(s);
>  fail_unlock:
>   return -EINVAL;
>  }
> 
> 

Update descriptive comment also.

Signed-off-by: Ian Kent <[EMAIL PROTECTED]>

---
--- linux-2.6.19-rc5-mm1/fs/autofs4/inode.c.fix-error-in-autofs_fill_sb-comment 
2006-11-30 13:05:13.0 +0800
+++ linux-2.6.19-rc5-mm1/fs/autofs4/inode.c 2006-11-30 13:09:27.0 
+0800
@@ -152,7 +152,8 @@ void autofs4_kill_sb(struct super_block 
/*
 * In the event of a failure in get_sb_nodev the superblock
 * info is not present so nothing else has been setup, so
-* just exit when we are called from deactivate_super.
+* just call kill_anon_super when we are called from
+* deactivate_super.
 */
if (!sbi)
goto out_kill_sb;
--- linux-2.6.19-rc5-mm1/fs/autofs/inode.c.fix-error-in-autofs_fill_sb-comment  
2006-11-30 13:05:02.0 +0800
+++ linux-2.6.19-rc5-mm1/fs/autofs/inode.c  2006-11-30 13:09:00.0 
+0800
@@ 

Re: failed 'ljmp' in linear addressing mode

2006-11-29 Thread Jun Sun
On Tue, Nov 28, 2006 at 05:40:56PM -0800, Jun Sun wrote:
> 
> Can you elaborate more why this last ljmp will fail?  I thought at this point
> the paging is turned off, and 0x1000- would simply mean a physical
> address - which is a valid physical address in RAM, btw.
>


I finally got it working, even though I don't understand it at all. :)

I realized that after paging mode is turned off, 0x1000- is actually
in the same flat 4G code segment as the caller code.  So I tried to just
"call" and that worked.

Here is the excerpt of the related code in case someone else needs to
do the same:

In arch/i386/kernel/machine_kexec.c:

extern void do_os_switching(void);
void os_switch(void)
{
	void (*foo)(void);

	/* absolutely no irq */
	local_irq_disable();

	/* create identity mapping */
	foo = virt_to_phys(do_os_switching);
	identity_map_page((unsigned long)foo);

	/* jump to the real address */
	load_segments();
	set_gdt(phys_to_virt(0), 0);
	set_idt(phys_to_virt(0), 0);
	foo();
}

In arch/i386/kernel/acpi/wakeup.S:

.align  4096
ENTRY(do_os_switching)
	/* JSUN, 0x11 was the boot up value for cr0. */
	movl	$0x11, %eax
	movl	%eax, %cr0

	/* clear cr4 */
	movl	$0, %eax
	movl	%eax, %cr4

	/* clear cr3, flush TLB */
	movl	$0, %eax
	movl	%eax, %cr3

	movl	$0x1000,%eax
	call	*%eax

I have a second Linux kernel loaded at 0x1000-.  Now the only matter
remaining is to figure out why the tsc timer stopped working ... :)
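
For anyone following the address arithmetic: the code above relies on
virt_to_phys() to find where do_os_switching really lives in RAM, so
that the page can be identity mapped and keeps executing once paging is
off.  Below is a minimal user-space sketch of that translation.  It
assumes the default i386 3G/1G split (direct map starting at
0xC0000000); the names ending in _sketch are hypothetical stand-ins, not
the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default i386 3G/1G split, so the direct map starts here. */
#define PAGE_OFFSET_SKETCH 0xC0000000u

/* Hypothetical stand-in for virt_to_phys(): strip the direct-map offset. */
static uint32_t virt_to_phys_sketch(uint32_t vaddr)
{
	assert(vaddr >= PAGE_OFFSET_SKETCH); /* only direct-mapped addresses */
	return vaddr - PAGE_OFFSET_SKETCH;
}

/* Hypothetical stand-in for phys_to_virt(): add the offset back. */
static uint32_t phys_to_virt_sketch(uint32_t paddr)
{
	return paddr + PAGE_OFFSET_SKETCH;
}
```

With paging disabled there is no translation at all, so only the value
returned by the first helper is usable as a jump target.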

Cheers.

Jun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] dynsched - different cpu schedulers per cpuset

2006-11-29 Thread Paul Jackson
pj wrote:
> See Paul Menage's most recent patch proposal at:
>   http://lkml.org/lkml/2006/11/17/217
>   Subject: [PATCH 0/6] Multi-hierarchy Process Containers
>   Date:Fri, 17 Nov 2006 11:11:59 -0800

I'm behind the times.  Paul Menage's most recent proposal is at:
http://lkml.org/lkml/2006/11/23/95
Subject: [PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)
Date:Thu, 23 Nov 2006 04:08:48 -0800

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Core file size?

2006-11-29 Thread linux err
Does anyone know what determines the size of a core
dump? I have a process running out of memory (it
allocates about 3GB) - but the size of core varies
(between 2-3GB) depending on how much the process
wrote on the allocated memory.

Also, the time it takes to write the core (same size)
varies??

I briefly looked at elf_core_dump and get_user_pages()
in binfmt_elf.c. Is there any documentation on this?
Or does anyone know how it works?

TIA


 



Re: mass-storage problems with Archos AV500

2006-11-29 Thread Robert Hancock

David Weinehall wrote:
> I've got an Archos AV500 here (running the very latest firmware), pretty
> much acting as a doorstop, since I cannot get it to be recognized
> properly by Linux.

..

> [  118.144000] SCSI device sdb: 58074975 512-byte hdwr sectors (29734 MB)
> [  118.144000] sdb: Write Protect is off
> [  118.144000] sdb: Mode Sense: 33 00 00 00
> [  118.144000] sdb: assuming drive cache: write through
> [  118.144000]  sdb: unknown partition table
> [  118.452000] sd 4:0:0:0: Attached scsi removable disk sdb
> [  118.452000] usb-storage: device scan complete
>
> This is with linux-image-2.6.19-7-generic 2.6.19-7.10 from Ubuntu edgy.
> I get similar results with a home-brew 2.6.18-rc4.
>
> Any mass storage quirk needed that might be missing?


That all seems normal, other than the unknown partition table, but the 
device might be all one unpartitioned disk.. at what point is it failing?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: 2.6.19 panic on boot -- i386

2006-11-29 Thread Chris Wright
* David Miller ([EMAIL PROTECTED]) wrote:
> Check [EMAIL PROTECTED]'s inbox, I just sent it in :)

Ooh, nice timing!

thanks,
-chris


[PATCH 3/4] [ATM] Add CPPFLAGS to byteorder.h check.

2006-11-29 Thread Ben Collins
O= builds produced errors in the byteorder.h shell command because headers
could not be found.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 drivers/atm/Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/atm/Makefile b/drivers/atm/Makefile
index b5077ce..1b16f81 100644
--- a/drivers/atm/Makefile
+++ b/drivers/atm/Makefile
@@ -41,7 +41,7 @@ ifeq ($(CONFIG_ATM_FORE200E_PCA),y)
   # guess the target endianess to choose the right PCA-200E firmware image
   ifeq ($(CONFIG_ATM_FORE200E_PCA_DEFAULT_FW),y)
 byteorder.h:= include$(if $(patsubst $(srctree),,$(objtree)),2)/asm/byteorder.h
-CONFIG_ATM_FORE200E_PCA_FW := $(obj)/pca200e$(if $(shell $(CC) -E -dM $(byteorder.h) | grep ' __LITTLE_ENDIAN '),.bin,_ecd.bin2)
+CONFIG_ATM_FORE200E_PCA_FW := $(obj)/pca200e$(if $(shell $(CC) $(CPPFLAGS) -E -dM $(byteorder.h) | grep ' __LITTLE_ENDIAN '),.bin,_ecd.bin2)
   endif
 endif
 
-- 
1.4.1



[PATCH 2/4] [APIC] Allow disabling of UP APIC/IO-APIC by default, with command line option to turn it on.

2006-11-29 Thread Ben Collins
Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 arch/i386/Kconfig  |   13 +
 arch/i386/kernel/apic.c|   13 +++--
 arch/i386/kernel/io_apic.c |   10 +-
 include/asm-i386/apic.h|6 ++
 include/asm-i386/io_apic.h |5 +
 5 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index b4a2461..ef2f2db 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -285,6 +285,19 @@ config X86_UP_IOAPIC
  to use it. If you say Y here even though your machine doesn't have
  an IO-APIC, then the kernel will still run with no slowdown at all.
 
+config X86_UP_APIC_DEFAULT_OFF
+   bool "APIC support on uniprocessors defaults to off"
+   depends on X86_UP_APIC
+   default n
+   help
+ Some older systems have flaky APICs.  Say Y to turn off APIC
+ support by default, while still allowing it to be enabled by the
+ "lapic" and "apic" command line options.
+
+ Usually this is only necessary for distro installer kernels that
+ must work with everything.  Everyone else can safely say N here
+ and configure APIC support in or out as needed.
+
 config X86_LOCAL_APIC
bool
depends on X86_UP_APIC || ((X86_VISWS || SMP) && !X86_VOYAGER) || X86_GENERICARCH
diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c
index 2fd4b7d..2f2eb83 100644
--- a/arch/i386/kernel/apic.c
+++ b/arch/i386/kernel/apic.c
@@ -51,8 +51,9 @@ static cpumask_t timer_bcast_ipi;
 
 /*
  * Knob to control our willingness to enable the local APIC.
+ * -2=default-disable, -1=force-disable, 1=force-enable, 0=automatic
  */
-static int enable_local_apic __initdata = 0; /* -1=force-disable, +1=force-enable */
+static int enable_local_apic __initdata = (X86_APIC_DEFAULT_OFF ? -2 : 0);
 
 static inline void lapic_disable(void)
 {
@@ -801,7 +802,7 @@ static int __init detect_init_APIC (void
 * APIC only if "lapic" specified.
 */
if (enable_local_apic <= 0) {
-   printk("Local APIC disabled by BIOS -- "
+   printk("Local APIC disabled by BIOS (or by default) -- "
   "you can enable it with \"lapic\"\n");
return -1;
}
@@ -1350,6 +1351,14 @@ int __init APIC_init_uniprocessor (void)
if (!smp_found_config && !cpu_has_apic)
return -1;
 
+   /* If local apic is off due to config_x86_apic_off option, jump
+* out here. */
+   if (enable_local_apic < -1) {
+   printk(KERN_INFO "Local APIC disabled by default -- "
+  "use 'lapic' to enable it.\n");
+   return -1;
+   }
+
/*
 * Complain if the BIOS pretends there is one.
 */
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 3b7a63e..0122dba 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -767,7 +767,7 @@ #endif /* !CONFIG_SMP */
 #define MAX_PIRQS 8
 static int pirq_entries [MAX_PIRQS];
 static int pirqs_enabled;
-int skip_ioapic_setup;
+int skip_ioapic_setup = X86_APIC_DEFAULT_OFF;
 
 static int __init ioapic_setup(char *str)
 {
@@ -2887,3 +2887,11 @@ static int __init parse_noapic(char *arg
return 0;
 }
 early_param("noapic", parse_noapic);
+
+static int __init parse_apic(char *arg)
+{
+   /* enable IO-APIC */
+   enable_ioapic_setup();
+   return 0;
+}
+early_param("apic", parse_apic);
diff --git a/include/asm-i386/apic.h b/include/asm-i386/apic.h
index b952957..a06ca3f 100644
--- a/include/asm-i386/apic.h
+++ b/include/asm-i386/apic.h
@@ -71,6 +71,12 @@ # define apic_read_around(x) apic_read(x
 # define apic_write_around(x,y) apic_write_atomic((x),(y))
 #endif
 
+#ifdef CONFIG_X86_UP_APIC_DEFAULT_OFF
+# define X86_APIC_DEFAULT_OFF 1
+#else
+# define X86_APIC_DEFAULT_OFF 0
+#endif
+
 static inline void ack_APIC_irq(void)
 {
/*
diff --git a/include/asm-i386/io_apic.h b/include/asm-i386/io_apic.h
index 059a9ff..ddedeec 100644
--- a/include/asm-i386/io_apic.h
+++ b/include/asm-i386/io_apic.h
@@ -126,6 +126,11 @@ static inline void disable_ioapic_setup(
skip_ioapic_setup = 1;
 }
 
+static inline void enable_ioapic_setup(void)
+{
+   skip_ioapic_setup = 0;
+}
+
 static inline int ioapic_setup_disabled(void)
 {
return skip_ioapic_setup;
-- 
1.4.1



Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency

2006-11-29 Thread Gautham R Shenoy
On Wed, Nov 29, 2006 at 01:05:56PM -0800, Andrew Morton wrote:
> On Wed, 29 Nov 2006 20:54:04 +0530
> Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
> 
> > Ok, so to cut the long story short,
> > - While changing governor from anything to
> > ondemand, locks are taken in the following order
> >
> > policy->lock ===> dbs_mutex ===> workqueue_mutex.

> >
> > - While offlining a cpu, locks are taken in the following order
> >
> > cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex
> > ==> cache_chain_mutex ==> policy->lock.
> 
> What functions are taking all these locks?  (ie: the callpath?)

While changing cpufreq governor to ondemand, the locks taken are:
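--------------------------------------------------------------------------
lock              function                 file
--------------------------------------------------------------------------
policy->lock      store_scaling_governor   drivers/cpufreq/cpufreq.c

dbs_mutex         cpufreq_governor_dbs     drivers/cpufreq/cpufreq_ondemand.c

workqueue_mutex   __create_workqueue       kernel/workqueue.c
--------------------------------------------------------------------------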

The complete callpath would be

store_scaling_governor [*]
|
__cpufreq_set_policy
|
__cpufreq_governor(data, CPUFREQ_GOV_START)
|
policy->governor->governor => cpufreq_governor_dbs(data, CPUFREQ_GOV_START) [*]
|
create_workqueue #defined as __create_workqueue [*]

where [*] = locks taken.

While offlining a cpu, locks are taken in the following order:

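--------------------------------------------------------------------------
lock                  function                 file
--------------------------------------------------------------------------
cpu_add_remove_lock   cpu_down                 kernel/cpu.c

sched_hotcpu_mutex    migration_call           kernel/sched.c

workqueue_mutex       workqueue_cpu_callback   kernel/workqueue.c

cache_chain_mutex     cpuup_callback           mm/slab.c

policy->lock          cpufreq_driver_target    drivers/cpufreq/cpufreq.c
--------------------------------------------------------------------------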

Please note that in the above,
- sched_hotcpu_mutex, workqueue_mutex, cache_chain_mutex are taken 
  while handling CPU_LOCK_ACQUIRE events in the respective subsystems'
  cpu_callback functions.

- policy->lock is taken while handling CPU_DOWN_PREPARE in 
  cpufreq_cpu_callback which calls cpufreq_driver_target.
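
Put together, the two orderings above close a loop:
policy->lock ==> dbs_mutex ==> workqueue_mutex ==> cache_chain_mutex
==> policy->lock.  A minimal user-space sketch of how such a loop can be
found follows.  It records "held A while taking B" edges and searches
for a path back; this is only in the spirit of what a lock validator
does, and every name in it is hypothetical.

```c
#include <assert.h>
#include <string.h>

enum lock_id {
	POLICY_LOCK, DBS_MUTEX, WORKQUEUE_MUTEX,
	CPU_ADD_REMOVE_LOCK, SCHED_HOTCPU_MUTEX, CACHE_CHAIN_MUTEX,
	NLOCKS
};

/* held_before[a][b] != 0 means "a was held while b was acquired". */
static unsigned char held_before[NLOCKS][NLOCKS];

static void reset_orders(void)
{
	memset(held_before, 0, sizeof(held_before));
}

/* Record one observed acquisition chain, e.g. A ==> B ==> C. */
static void record_chain(const enum lock_id *chain, int n)
{
	for (int i = 0; i + 1 < n; i++) {
		assert(chain[i] < NLOCKS && chain[i + 1] < NLOCKS);
		held_before[chain[i]][chain[i + 1]] = 1;
	}
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static int reachable(int from, int to, unsigned char *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int next = 0; next < NLOCKS; next++)
		if (held_before[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return 1;
	return 0;
}

/* A cycle exists if some edge a -> b has a return path b -> ... -> a. */
static int has_circular_dependency(void)
{
	for (int a = 0; a < NLOCKS; a++)
		for (int b = 0; b < NLOCKS; b++) {
			unsigned char seen[NLOCKS] = { 0 };

			if (held_before[a][b] && reachable(b, a, seen))
				return 1;
		}
	return 0;
}
```

Feeding it only the governor chain reports no cycle; adding the cpu
offline chain makes it report one, even though the two paths may never
actually race in practice.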

It's perfectly clear that in the cpu offline callpath, cpufreq
does not have to do anything with the workqueue. 

So can we ignore this circular-dep warning as a false positive?
Or is there a way to exploit this circular dependency ?

At the moment, I cannot think of a way to exploit this circular dependency
unless we do something like try destroying the created workqueue when the
cpu is dead, i.e. make the cpufreq governors cpu-hotplug-aware.
(eeks! that doesn't look good)

I'm working on fixing this. Let me see if I can come up with something.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


[PATCH 4/4] [HVCS] Select HVC_CONSOLE if HVCS is enabled.

2006-11-29 Thread Ben Collins
HVC_CONSOLE provides symbols that HVCS requires.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 drivers/char/Kconfig |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 2af12fc..c94ecdc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -598,6 +598,7 @@ config HVC_RTAS
 config HVCS
tristate "IBM Hypervisor Virtual Console Server support"
depends on PPC_PSERIES
+   select HVC_CONSOLE
help
  Partitionable IBM Power5 ppc64 machines allow hosting of
  firmware virtual consoles from one Linux partition by
-- 
1.4.1



Re: 2.6.19 panic on boot -- i386

2006-11-29 Thread David Miller
From: Chris Wright <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 20:27:59 -0800

> * David Miller ([EMAIL PROTECTED]) wrote:
> > From: Pete Clements <[EMAIL PROTECTED]>
> > Date: Wed, 29 Nov 2006 22:13:09 -0500 (EST)
> > 
> > > 2.6.19 panics at boot. Good up through rc6-git11.
> > > Hand copied screen below.
> > 
> > Here is the fix, which was posted in response to a separate
> > report of this problem here:
> 
> looks like 2.6.19.1 material ;-)

Check [EMAIL PROTECTED]'s inbox, I just sent it in :)


Ubuntu patch sync for 2.6.20

2006-11-29 Thread Ben Collins
This is a set of patches from the Ubuntu tree that seemed suitable for
upstream sync.

[PATCH 1/4] [x86] Add command line option to enable/disable hyper-threading.

[PATCH 2/4] [APIC] Allow disabling of UP APIC/IO-APIC by default, with command 
line option to turn it on.

[PATCH 3/4] [ATM] Add CPPFLAGS to byteorder.h check.

[PATCH 4/4] [HVCS] Select HVC_CONSOLE if HVCS is enabled.


 arch/i386/Kconfig   |   13 +
 Documentation/kernel-parameters.txt |3 +++
 arch/i386/Kconfig   |5 +
 arch/i386/kernel/apic.c |   13 +++--
 arch/i386/kernel/cpu/common.c   |   30 +-
 arch/i386/kernel/io_apic.c  |   10 +-
 drivers/atm/Makefile|3 +--
 drivers/char/Kconfig|2 +-
 include/asm-i386/apic.h |6 ++
 include/asm-i386/io_apic.h  |6 +-
 10 files changed, 83 insertions(+), 8 deletions(-)



[PATCH 1/4] [x86] Add command line option to enable/disable hyper-threading.

2006-11-29 Thread Ben Collins
This patch adds a config option to allow disabling hyper-threading by
default, and a kernel command line option to change this default at
boot time.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |3 +++
 arch/i386/Kconfig   |5 +
 arch/i386/kernel/cpu/common.c   |   29 +
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6747384..2b68d6e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -600,6 +600,9 @@ and is between 256 and 4096 characters. 
hisax=  [HW,ISDN]
See Documentation/isdn/README.HiSax.
 
+   ht= [HW,IA-32,SMP] Enable or disable hyper-threading.
+   Format: 
+
hugepages=  [HW,IA-32,IA-64] Maximal number of HugeTLB pages.
 
noirqbalance[IA-32,SMP,KNL] Disable kernel irq balancing
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 8ff1c6f..b4a2461 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -1185,6 +1185,11 @@ config X86_HT
depends on SMP && !(X86_VISWS || X86_VOYAGER)
default y
 
+config X86_HT_DISABLE
+   bool "Disable Hyper-Threading by default"
+   depends on X86_HT
+   default n
+
 config X86_BIOS_REBOOT
bool
depends on !(X86_VISWS || X86_VOYAGER)
diff --git a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
index d9f3e3c..42d2361 100644
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -482,6 +482,29 @@ void __cpuinit identify_cpu(struct cpuin
 }
 
 #ifdef CONFIG_X86_HT
+
+#ifdef CONFIG_X86_HT_DISABLE
+static int disable_ht __cpuinitdata = 1;
+#else
+static int disable_ht __cpuinitdata;
+#endif
+
+static int __init parse_ht(char *arg)
+{
+   if (!arg)
+   return -EINVAL;
+
+   if (!memcmp(arg, "on", 2))
+   disable_ht = 0;
+   else if (!memcmp(arg, "off", 3))
+   disable_ht = 1;
+   else
+   return -EINVAL;
+
+   return 0;
+}
+early_param("ht", parse_ht);
+
 void __cpuinit detect_ht(struct cpuinfo_x86 *c)
 {
u32 eax, ebx, ecx, edx;
@@ -492,6 +515,12 @@ void __cpuinit detect_ht(struct cpuinfo_
if (!cpu_has(c, X86_FEATURE_HT) || cpu_has(c, X86_FEATURE_CMP_LEGACY))
return;
 
+   if (disable_ht) {
+   printk(KERN_INFO  "CPU: Hyper-Threading disabled by default. Enable with ht=on\n");
+   smp_num_siblings = 1;
+   return;
+   }
+
smp_num_siblings = (ebx & 0xff0000) >> 16;
 
if (smp_num_siblings == 1) {
-- 
1.4.1
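
One detail of parse_ht() worth noting: memcmp(arg, "on", 2) only
compares the first two bytes, so a value such as "only" would also be
accepted as "on".  Below is a minimal user-space sketch of a stricter
variant using strcmp().  The names ending in _sketch are hypothetical,
and whether exact matching is actually wanted here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the patch's disable_ht flag. */
static int disable_ht_sketch;

/* Accept exactly "on" or "off"; reject anything else, including NULL. */
static int parse_ht_sketch(const char *arg)
{
	if (!arg)
		return -EINVAL;

	if (!strcmp(arg, "on"))
		disable_ht_sketch = 0;
	else if (!strcmp(arg, "off"))
		disable_ht_sketch = 1;
	else
		return -EINVAL;

	return 0;
}
```

Unrecognized values leave the flag untouched, so the build-time default
still applies when the command line is mistyped.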



Re: 2.6.19 panic on boot -- i386

2006-11-29 Thread Chris Wright
* David Miller ([EMAIL PROTECTED]) wrote:
> From: Pete Clements <[EMAIL PROTECTED]>
> Date: Wed, 29 Nov 2006 22:13:09 -0500 (EST)
> 
> > 2.6.19 panics at boot. Good up through rc6-git11.
> > Hand copied screen below.
> 
> Here is the fix, which was posted in response to a separate
> report of this problem here:

looks like 2.6.19.1 material ;-)


Re: [RFC] dynsched - different cpu schedulers per cpuset

2006-11-29 Thread Paul Jackson
Felix wrote:
> The cpu<->scheduler mapping is controlled via cpusets. Thus you
> can switch the scheduler for a cpuset containing multiple cpus and
> keep the rest untouched.

I don't have comments on the main focus of this work - schedulers are
not my expertise.

I just noticed this lkml post because of my interest in cpusets.

You should take a look at the work of Paul Menage (added to the
cc list), who is splitting the cpuset code into:
 1) a generic "container" mechanism,
 2) separate CPU and Memory "controllers", and
 3) various other additional "controllers".

See Paul Menage's most recent patch proposal at:
  http://lkml.org/lkml/2006/11/17/217
  Subject: [PATCH 0/6] Multi-hierarchy Process Containers
  Date:Fri, 17 Nov 2006 11:11:59 -0800

The container mechanism uses a virtual file system derived from
the cpuset code to provide a file system style (hierarchical names
and classic Unix style file and directory permissions) naming of a
partitioning of the tasks on a system.

By partitioning here, I mean a division of the tasks into several
subsets, aka partition elements, which are non-overlapping and covering.

That is, each task is in one and only one of the partition elements,
these partition elements are named by the directories in the container
file system, and the regular files in the container file system provide
per-element attributes.
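
To make "non-overlapping and covering" concrete, here is a minimal
user-space sketch that checks both properties.  It assumes at most 64
tasks, each represented by one bit; is_partition() and this bitmask
representation are hypothetical and have nothing to do with how the
container code stores membership.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Each element is a bitmask of task indices (bit i set = task i is a
 * member).  Returns 1 if the elements are pairwise disjoint and their
 * union is exactly all_tasks, 0 otherwise.
 */
static int is_partition(const uint64_t *elements, int n, uint64_t all_tasks)
{
	uint64_t seen = 0;

	assert(elements || n == 0);

	for (int i = 0; i < n; i++) {
		if (elements[i] & seen)
			return 0;	/* overlap: a task is in two elements */
		seen |= elements[i];
	}
	return seen == all_tasks;	/* covering: no task left out */
}
```

For six tasks, the elements {0,1}, {2,3}, {4,5} pass, while {0,1},
{1,2} fails on overlap and {0}, {2} fails on coverage.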

Then kernel facilities that can be considered as providing attributes
for and control of subsets of tasks are represented as controllers,
and attached to such a container.

Your dynamic scheduler mechanisms appear (from what I can tell after a
brief glance) to be a candidate for being such a controller.

The upshot of this is that, if your work should proceed and eventually
be considered for inclusion in the kernel (I have --no-- idea if that
would be a good idea, either for the purposes of your student group,
or for the kernel itself) then it would likely (if Menage's work
is accepted) need to be recast as a "controller" in Menage's terms,
not as an extension to cpusets.

If Menage succeeds, that should not actually be that big of a change,
either semantically, or in coding details.

Good luck.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [rfc patch] Re: [patch] PM: suspend/resume debugging should depend on SOFTWARE_SUSPEND

2006-11-29 Thread Mike Galbraith
On Wed, 2006-11-29 at 11:49 -0800, Andrew Morton wrote:

> > +#ifdef CONFIG_PM
> > +static int serial_pnp_suspend(struct pnp_dev *dev, pm_message_t state)
> > +{
> > +   long line = (long)pnp_get_drvdata(dev);
> 
> Please avoid adding long lines.  (heh, I kill me)

Ok.  I also changed the place I got it from.

> We'd usually do
> 
> #else
> #define serial_pnp_suspend NULL
> #define serial_pnp_resume NULL
> 
> here
> 
> > +
> > +#endif /* CONFIG_PM */
> > +
> >  static struct pnp_driver serial_pnp_driver = {
> > .name   = "serial",
> > -   .id_table   = pnp_dev_table,
> > .probe  = serial_pnp_probe,
> > .remove = __devexit_p(serial_pnp_remove),
> > +#ifdef CONFIG_PM
> > +   .suspend= serial_pnp_suspend,
> > +   .resume = serial_pnp_resume,
> > +#endif
> 
> and hence omit the ifdefs here.

New patch.

Add suspend/resume methods to drivers/serial/8250_pnp.c.  Tested on a
P4/HT 16550A box, ttyS0 login survives across suspend to ram.

Signed-off-by: Mike Galbraith <[EMAIL PROTECTED]>

--- linux-2.6.19-rc6-mm2/drivers/serial/8250_pnp.c.org  2006-11-29 07:14:15.0 +0100
+++ linux-2.6.19-rc6-mm2/drivers/serial/8250_pnp.c  2006-11-29 20:49:33.0 +0100
@@ -459,16 +459,43 @@ serial_pnp_probe(struct pnp_dev *dev, co
 
 static void __devexit serial_pnp_remove(struct pnp_dev *dev)
 {
-   long line = (long)pnp_get_drvdata(dev);
+   int line = (long)pnp_get_drvdata(dev);
if (line)
serial8250_unregister_port(line - 1);
 }
 
+#ifdef CONFIG_PM
+static int serial_pnp_suspend(struct pnp_dev *dev, pm_message_t state)
+{
+   int line = (int)pnp_get_drvdata(dev);
+
+   if (!line)
+   return -ENODEV;
+   serial8250_suspend_port(line - 1);
+   return 0;
+}
+
+static int serial_pnp_resume(struct pnp_dev *dev)
+{
+   int line = (int)pnp_get_drvdata(dev);
+
+   if (!line)
+   return -ENODEV;
+   serial8250_resume_port(line - 1);
+   return 0;
+}
+#else
+#define serial_pnp_suspend NULL
+#define serial_pnp_resume NULL
+#endif /* CONFIG_PM */
+
 static struct pnp_driver serial_pnp_driver = {
.name   = "serial",
-   .id_table   = pnp_dev_table,
.probe  = serial_pnp_probe,
.remove = __devexit_p(serial_pnp_remove),
+   .suspend= serial_pnp_suspend,
+   .resume = serial_pnp_resume,
+   .id_table   = pnp_dev_table,
 };
 
 static int __init serial8250_pnp_init(void)
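
The "#define ... NULL" idiom used above keeps the struct initializer
free of ifdefs.  A minimal user-space sketch of the same pattern
follows.  All names in it (CONFIG_PM_SKETCH, driver_sketch and so on)
are hypothetical, and it assumes the core code checks a callback for
NULL before calling it.

```c
#include <assert.h>
#include <stddef.h>

struct dev_sketch {
	int suspended;
};

struct driver_sketch {
	const char *name;
	int (*suspend)(struct dev_sketch *dev);
	int (*resume)(struct dev_sketch *dev);
};

#ifdef CONFIG_PM_SKETCH
static int sketch_suspend(struct dev_sketch *dev)
{
	dev->suspended = 1;
	return 0;
}

static int sketch_resume(struct dev_sketch *dev)
{
	dev->suspended = 0;
	return 0;
}
#else
/* Without power management the callbacks collapse to NULL pointers. */
#define sketch_suspend NULL
#define sketch_resume NULL
#endif

/* No ifdef needed here: the names above always expand to something valid. */
static struct driver_sketch sketch_driver = {
	.name    = "sketch",
	.suspend = sketch_suspend,
	.resume  = sketch_resume,
};

/* Core code must tolerate a missing callback. */
static int core_suspend(struct driver_sketch *drv, struct dev_sketch *dev)
{
	assert(drv && dev);
	return drv->suspend ? drv->suspend(dev) : 0;
}
```

Built without CONFIG_PM_SKETCH, both members are NULL and core_suspend()
becomes a no-op that returns 0.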




Re: 2.6.19 panic on boot -- i386

2006-11-29 Thread David Miller
From: Pete Clements <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 22:13:09 -0500 (EST)

> 2.6.19 panics at boot. Good up through rc6-git11.
> Hand copied screen below.

Here is the fix, which was posted in response to a separate
report of this problem here:

commit c28728decc37fe52c8cdf48b3e0c0cf9b0c2fefb
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Wed Nov 29 18:14:47 2006 -0800

[IPV6] NDISC: Calculate packet length correctly for allocation.

MAX_HEADER does not include the ipv6 header length in it,
so we need to add it in explicitly.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 73eb8c3..c42d4c2 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -441,7 +441,8 @@ static void ndisc_send_na(struct net_dev
 struct sk_buff *skb;
int err;
 
-   len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr);
+   len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) +
+   sizeof(struct in6_addr);
 
/* for anycast or proxy, solicited_addr != src_addr */
ifp = ipv6_get_ifaddr(solicited_addr, dev, 1);
@@ -556,7 +557,8 @@ void ndisc_send_ns(struct net_device *de
if (err < 0)
return;
 
-   len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr);
+   len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) +
+   sizeof(struct in6_addr);
send_llinfo = dev->addr_len && !ipv6_addr_any(saddr);
if (send_llinfo)
len += ndisc_opt_addr_space(dev);
@@ -632,7 +634,7 @@ void ndisc_send_rs(struct net_device *de
if (err < 0)
return;
 
-   len = sizeof(struct icmp6hdr);
+   len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr);
if (dev->addr_len)
len += ndisc_opt_addr_space(dev);
 
@@ -1381,7 +1383,8 @@ void ndisc_send_redirect(struct sk_buff 
 struct in6_addr *target)
 {
struct sock *sk = ndisc_socket->sk;
-   int len = sizeof(struct icmp6hdr) + 2 * sizeof(struct in6_addr);
+   int len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) +
+   2 * sizeof(struct in6_addr);
struct sk_buff *buff;
struct icmp6hdr *icmph;
struct in6_addr saddr_buf;
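
To see how far off the old length was, here is a minimal user-space
sketch of the neighbour advertisement case.  The three structs are
hypothetical stand-ins that only reproduce the on-wire sizes (40-byte
fixed IPv6 header, 8-byte ICMPv6 header, 16-byte address); they are not
the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins with the on-wire sizes only; not the kernel's structs. */
struct ipv6hdr_sketch  { uint8_t bytes[40]; };
struct icmp6hdr_sketch { uint8_t bytes[8];  };
struct in6_addr_sketch { uint8_t bytes[16]; };

/* Length as computed before the fix: the IPv6 header is not counted. */
static size_t na_len_before(void)
{
	return sizeof(struct icmp6hdr_sketch) + sizeof(struct in6_addr_sketch);
}

/* Length as computed after the fix: IPv6 header added explicitly. */
static size_t na_len_after(void)
{
	return sizeof(struct ipv6hdr_sketch) +
	       sizeof(struct icmp6hdr_sketch) +
	       sizeof(struct in6_addr_sketch);
}
```

The difference between the two is exactly the size of the IPv6 header,
which is the space the allocation was missing.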


Re: 2.6.19-rc6-mm2

2006-11-29 Thread Randy Dunlap
On Wed, 29 Nov 2006 22:42:20 -0500 Ed Tomlinson wrote:

> On Tuesday 28 November 2006 05:02, Andrew Morton wrote:
> 
> > Will appear eventually at
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc6/2.6.19-rc6-mm2/
> 
> This kernel does not boot here.  It does not get far enough to post anything 
> to my serial console.

Have you tried using "earlyprintk=..." to see if it produces any
more output?

> The last booted kernel here is 19-rc5-mm2.  Grub is used to boot; here is
> the starting log of rc5-mm2.  The build is UP AMD64:
> 
> [0.00] Linux version 2.6.19-rc5-mm2 ([EMAIL PROTECTED]) (gcc version 
> 4.1.1 (Gentoo 4.1.1-r1)) #1 PREEM6
> [0.00] Command line: root=/dev/sda3 vga=0x318 
> video=vesafb:ywrap,mtrr:3 console=tty0 console=tty1
> [0.00] BIOS-provided physical RAM map:
> [0.00]  BIOS-e820:  - 0009f800 (usable)
> [0.00]  BIOS-e820: 0009f800 - 000a (reserved)
> [0.00]  BIOS-e820: 000f - 0010 (reserved)
> [0.00]  BIOS-e820: 0010 - 3fff (usable)
> [0.00]  BIOS-e820: 3fff - 3fff3000 (ACPI NVS)
> [0.00]  BIOS-e820: 3fff3000 - 4000 (ACPI data)
> [0.00]  BIOS-e820: fec0 - fec01000 (reserved)
> [0.00]  BIOS-e820: fee0 - fef0 (reserved)
> [0.00]  BIOS-e820: fefffc00 - ff00 (reserved)
> [0.00]  BIOS-e820:  - 0001 (reserved)
> [0.00] end_pfn_map = 1048576
> [0.00] DMI 2.2 present.
> [0.00] Zone PFN ranges:
> [0.00]   DMA 0 -> 4096
> [0.00]   DMA324096 ->  1048576
> [0.00]   Normal1048576 ->  1048576
> [0.00] early_node_map[2] active PFN ranges
> [0.00] 0:0 ->  159
> [0.00] 0:  256 ->   262128
> [0.00] Nvidia board detected. Ignoring ACPI timer override.
> [0.00] ACPI: PM-Timer IO Port: 0x4008
> [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [0.00] Processor #0 (Bootup-CPU)
> [0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
> [0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
> [0.00] IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [0.00] ACPI: BIOS IRQ0 pin2 override ignored.
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
> [0.00] Setting APIC routing to flat
> [0.00] Using ACPI (MADT) for SMP configuration information
> [0.00] Nosave address range: 0009f000 - 000a
> [0.00] Nosave address range: 000a - 000f
> [0.00] Nosave address range: 000f - 0010
> [0.00] Allocating PCI resources starting at 5000 (gap: 
> 4000:bec0)
> [0.00] Built 1 zonelists.  Total pages: 257320
> [0.00] Kernel command line: root=/dev/sda3 vga=0x318 
> video=vesafb:ywrap,mtrr:3 console=tty0 cons1
> [0.00] Initializing CPU#0
> [0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)
> 
> Any ideas what I should try or suggestions on patches to remove/try.
> 
> Thanks
> Ed

---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


man-pages-2.43 is released

2006-11-29 Thread Michael Kerrisk
Gidday,

I just released man-pages-2.43.

This release is now available for download at:

ftp://ftp.kernel.org/pub/linux/docs/manpages
or mirrors: ftp://ftp.XX.kernel.org/pub/linux/docs/manpages

and soon at:

ftp://ftp.win.tue.nl/pub/linux-local/manpages

Changes in this release that may be of interest to readers
of this list include the following:

Changes to individual pages
---

rtc.4
David Brownell

Update the RTC man page to reflect the new RTC class framework:

- Generalize ... it's not just for PC/AT style RTCs, and there
  may be more than one RTC per system.

- Not all RTCs expose the same feature set as PC/AT ones; most
  of these ioctls will be rejected by some RTCs.

- Be explicit about when {A,P}IE_{ON,OFF} calls are needed.

- Describe the parameter to the get/set epoch request; correct
  the description of the get/set frequency parameter.

- Document RTC_WKALM_{RD,SET}, which don't need AIE_{ON,OFF} and
  which support longer alarm periods.

- Hey, not all system clock implementations count timer irqs any
  more now that the new RT-derived clock support is merging.

raw.7
udp.7
Andi Kleen
Describe the correct default for UDP/RAW path MTU discovery.

==

Cheers,

Michael

-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?  Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.



Re: 2.6.19 panic on boot -- i386

2006-11-29 Thread Pete Clements
Quoting Randy Dunlap
  > > 2.6.19 panics at boot. Good up through rc6-git11.
  > > Hand copied screen below.
  > 
  > Try the patch that DaveM recently posted:
  >   http://lkml.org/lkml/2006/11/29/335
  > 
  > ---
  > ~Randy
  > 
That fixed it.

-- 
Pete Clements 


Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation

2006-11-29 Thread Oleg Nesterov
On 11/30, Oleg Nesterov wrote:
>
> On 11/29, Paul E. McKenney wrote:
> >
> > Hmmm...  Now I am wondering if the memory barriers inherent in the
> > __wait_event() suffice for this last barrier...  :-/  Thoughts?
> > 
> > > + smp_mb();
> 
> Fastpath skips __wait_event(), and it is possible that the reader does
> lock/unlock between the first 'mb()' and 'if (atomic_read() == 1)'.

In fact, the slow path needs it too (I think). We can have an unrelated
wakeup, and then the reader does unlock() before we check !atomic_read()
in the __wait_event()'s loop. The reader removes us from ->wq, in that
case finish_wait() does nothing.

Oleg.



Re: A commit between 2.6.16.4 and 2.6.16.5 failed crashme

2006-11-29 Thread Zhao Forrest


Thanks for your report.

A git-bisect might be a bit of overkill considering that there were only
two patches applied between 2.6.16.4 and 2.6.16.5:

Andi Kleen (2):
  x86_64: Clean up execve
  x86_64: When user could have changed RIP always force IRET (CVE-2006-0744)

I've attached both patches.



Hi Andi,

I found that this patch is also in 2.6.18.3, but crashme doesn't
trigger kernel panic for 2.6.18.3..weird.

Thanks,
Forrest


Re: 2.6.19-rc6-mm2

2006-11-29 Thread Ed Tomlinson
On Tuesday 28 November 2006 05:02, Andrew Morton wrote:

> Will appear eventually at
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc6/2.6.19-rc6-mm2/

This kernel does not boot here.  It does not get far enough to post anything
to my serial console.
The last kernel that booted here is 19-rc5-mm2.  Grub is used to boot; here is
the start of the rc5-mm2 log (the build is UP AMD64):

[0.00] Linux version 2.6.19-rc5-mm2 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r1)) #1 PREEM6
[0.00] Command line: root=/dev/sda3 vga=0x318 video=vesafb:ywrap,mtrr:3 console=tty0 console=tty1
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009f800 (usable)
[0.00]  BIOS-e820: 0009f800 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 3fff (usable)
[0.00]  BIOS-e820: 3fff - 3fff3000 (ACPI NVS)
[0.00]  BIOS-e820: 3fff3000 - 4000 (ACPI data)
[0.00]  BIOS-e820: fec0 - fec01000 (reserved)
[0.00]  BIOS-e820: fee0 - fef0 (reserved)
[0.00]  BIOS-e820: fefffc00 - ff00 (reserved)
[0.00]  BIOS-e820:  - 0001 (reserved)
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.2 present.
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   DMA324096 ->  1048576
[0.00]   Normal1048576 ->  1048576
[0.00] early_node_map[2] active PFN ranges
[0.00] 0:0 ->  159
[0.00] 0:  256 ->   262128
[0.00] Nvidia board detected. Ignoring ACPI timer override.
[0.00] ACPI: PM-Timer IO Port: 0x4008
[0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: BIOS IRQ0 pin2 override ignored.
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Nosave address range: 0009f000 - 000a
[0.00] Nosave address range: 000a - 000f
[0.00] Nosave address range: 000f - 0010
[0.00] Allocating PCI resources starting at 5000 (gap: 4000:bec0)
[0.00] Built 1 zonelists.  Total pages: 257320
[0.00] Kernel command line: root=/dev/sda3 vga=0x318 video=vesafb:ywrap,mtrr:3 console=tty0 cons1
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)

Any ideas on what I should try, or suggestions on patches to remove/try?

Thanks
Ed



[rfc patch] optimize o_direct on block device

2006-11-29 Thread Chen, Kenneth W
I've been complaining about O_DIRECT I/O processing being exceedingly
complex and slow since March 2005, see posting below:
http://marc.theaimsgroup.com/?l=linux-kernel&m=111033309732261&w=2

At that time, a patch was written for raw device to demonstrate that
a large performance headroom is achievable (~20% speedup for a
micro-benchmark and ~2% for a db transaction processing benchmark) with
a tight I/O submission processing loop.

Since raw device is being slowly phased out, I've rewritten the patch
for block device.  O_DIRECT on block device is much simpler than O_DIRECT
on file system. Part of the reason that direct_io_worker is so complex
is O_DIRECT on file system, where it needs to perform block allocation,
hole detection, extending the file on write, and tons of other corner
cases. The end result is that it takes tons of CPU time to submit an I/O.

For block device, the block allocation is much simpler and I can write
a really tight double loop to iterate each iovec and each page within
the iovec in order to construct/prepare bio structure and then subsequently
submit it to the block layer.
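
The double loop described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not the code from the patch:
count_page_chunks() and SKETCH_PAGE_SIZE are hypothetical names, and where
the real code would attach each page to a bio, the sketch just counts the
per-page chunks it would submit.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumed page size for the sketch; the kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: walk every iovec, and every page inside each
 * iovec, returning how many per-page chunks would be submitted.
 */
size_t count_page_chunks(const struct iovec *iov, int nr_segs)
{
	size_t chunks = 0;

	for (int seg = 0; seg < nr_segs; seg++) {	/* outer loop: iovecs */
		uintptr_t addr = (uintptr_t)iov[seg].iov_base;
		size_t left = iov[seg].iov_len;

		while (left > 0) {			/* inner loop: pages */
			size_t offset = addr % SKETCH_PAGE_SIZE;
			size_t len = SKETCH_PAGE_SIZE - offset;

			if (len > left)
				len = left;
			/* Real code would add (page, offset, len) to a bio here. */
			chunks++;
			addr += len;
			left -= len;
		}
	}
	return chunks;
}
```

A buffer that starts near the end of a page and spills into the next one
yields two chunks, which is why the inner loop works in page-sized steps
rather than once per iovec.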

So here it goes, posted here for comments.

A few notes on the patch:

(1) I need a vector structure similar to pagevec, however, pagevec doesn't
have everything that I need, i.e., an iterator variable.  So I create a
new struct pvec.  Maybe something can be worked out with pagevec?

(2) there is some inconsistency for synchronous I/O: the condition to update
ppos and the condition to wait on sync_kiocb are incompatible.  Call chain
looks like the following:

do_sync_read
   generic_file_aio_read
 ...
   blkdev_direct_IO

do_sync_read will wait for I/O completion only if lower function returned
-EIOCBQUEUED. Updating ppos is done via generic_file_aio_read, but only
if the lower function returned positive value. So I either have to construct
my own wait_on_sync_kiocb, or hack around the ppos update.

(3) I/O length setup in kiocb is inconsistent between normal read and vector
read or aio_read.  One passes the length in kiocb->ki_left, while the others
pass the total length in kiocb->ki_nbytes.  I've made them consistent in the
read path (note to self: I need to add the same thing in do_sync_write).



Signed-off-by: Ken Chen <[EMAIL PROTECTED]>


--- ./fs/block_dev.c.orig   2006-11-29 14:52:20.0 -0800
+++ ./fs/block_dev.c2006-11-29 16:45:36.0 -0800
@@ -129,43 +129,147 @@ blkdev_get_block(struct inode *inode, se
return 0;
 }
 
-static int
-blkdev_get_blocks(struct inode *inode, sector_t iblock,
-   struct buffer_head *bh, int create)
+int blk_end_aio(struct bio *bio, unsigned int bytes_done, int error)
 {
-   sector_t end_block = max_block(I_BDEV(inode));
-   unsigned long max_blocks = bh->b_size >> inode->i_blkbits;
+   struct kiocb* iocb = bio->bi_private;
+   atomic_t* bio_count = (atomic_t*) &iocb->private;
+   long res;
+
+   if ((bio->bi_rw & 1) == READ)
+   bio_check_pages_dirty(bio);
+   else {
+   bio_release_pages(bio);
+   bio_put(bio);
+   }
 
-   if ((iblock + max_blocks) > end_block) {
-   max_blocks = end_block - iblock;
-   if ((long)max_blocks <= 0) {
-   if (create)
-   return -EIO;/* write fully beyond EOF */
-   /*
-* It is a read which is fully beyond EOF.  We return
-* a !buffer_mapped buffer
-*/
-   max_blocks = 0;
-   }
+   if (error)
+   iocb->ki_left = -EIO;
+
+   if (atomic_dec_and_test(bio_count)) {
+   res = (iocb->ki_left < 0) ? iocb->ki_left : iocb->ki_nbytes;
+   aio_complete(iocb, res, 0);
}
 
-   bh->b_bdev = I_BDEV(inode);
-   bh->b_blocknr = iblock;
-   bh->b_size = max_blocks << inode->i_blkbits;
-   if (max_blocks)
-   set_buffer_mapped(bh);
return 0;
 }
 
+#define VEC_SIZE   16
+struct pvec {
+   unsigned short nr;
+   unsigned short idx;
+   struct page *page[VEC_SIZE];
+};
+
+
+struct page *blk_get_page(unsigned long addr, size_t count, int rw,
+ struct pvec *pvec)
+{
+   int ret, nr_pages;
+   if (pvec->idx == pvec->nr) {
+   nr_pages = (addr + count + PAGE_SIZE - 1) / PAGE_SIZE -
+   addr / PAGE_SIZE;
+   nr_pages = min(nr_pages, VEC_SIZE);
+   down_read(&current->mm->mmap_sem);
+   ret = get_user_pages(current, current->mm, addr, nr_pages,
+rw==READ, 0, pvec->page, NULL);
+   up_read(&current->mm->mmap_sem);
+   if (ret < 0)
+   return ERR_PTR(ret);
+   pvec->nr = ret;
+   pvec->idx = 0;
+   }
+   return pvec->page[pvec->idx++];
+}
+
 

Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation

2006-11-29 Thread Oleg Nesterov
On 11/29, Paul E. McKenney wrote:
>
> On Thu, Nov 30, 2006 at 04:57:14AM +0300, Oleg Nesterov wrote:
> > (the same patch + comments from Paul)
> > 
> With the addition of a comment for the smp_mb() at the beginning of
> synchronize_qrcu(), shown below:
> 
> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

Thanks!

>   /*
>* The following memory barrier is needed to ensure that
>* any subsequent freeing of data elements previously
>* removed is seen by other CPUs after the wait completes.
>*/

I think we have another reason for mb(), but I can't suggest a clear
comment.

struct data {
...
int in_use;
...
}

void free_data(struct data *p)
{
BUG_ON(p->in_use);
kfree(p);
}

struct data *DATA;

Reader:

qrcu_read_lock();
data = rcu_dereference(DATA);

data->in_use = 1;
do_something(data);
data->in_use = 0;

qrcu_read_unlock();

Writer:

old = DATA;
DATA = alloc_new_data();

synchronize_qrcu();
free_data(old);

qrcu_read_unlock() does (implicit) mb() on reader's side, but we must pair
it on our side, otherwise we can't be sure (of course, _only_ in theory) we
are seeing all the changes (->in_use == 0) made by the reader.
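
The pairing can be modelled in userspace with C11 atomics. This is a
deliberately simplified sketch, not the qrcu code from the patch: it uses a
single counter instead of ctr[2], skips the atomic_inc_not_zero() retry
loop, and every sketch_* name is hypothetical. It only shows where the
writer-side barrier sits relative to the counter check and the read of
->in_use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not the kernel API. */
struct sketch_qrcu {
	atomic_int ctr;		/* starts at 1, so 1 means "no readers" */
};

struct sketch_data {
	int in_use;		/* plain field written by the reader */
};

void sketch_init(struct sketch_qrcu *qp)
{
	atomic_init(&qp->ctr, 1);
}

void sketch_read_lock(struct sketch_qrcu *qp)
{
	atomic_fetch_add(&qp->ctr, 1);
}

void sketch_read_unlock(struct sketch_qrcu *qp)
{
	/*
	 * Sequentially consistent read-modify-write: the reader's earlier
	 * "in_use = 0" store is ordered before this decrement.
	 */
	atomic_fetch_sub(&qp->ctr, 1);
}

/* Writer fast path: true only if no reader is inside and old is idle. */
bool sketch_writer_may_free(struct sketch_qrcu *qp,
			    const struct sketch_data *old)
{
	if (atomic_load_explicit(&qp->ctr, memory_order_relaxed) != 1)
		return false;	/* a reader is still inside; would have to wait */

	/*
	 * Plays the role of the smp_mb() under discussion: it pairs with
	 * the barrier in sketch_read_unlock(), so the read of in_use below
	 * cannot be satisfied before the counter was observed.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	return old->in_use == 0;
}
```

Without the fence, the model would permit the read of in_use to be
satisfied before the counter load, which is the (theoretical) stale
->in_use value described above.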

> Hmmm...  Now I am wondering if the memory barriers inherent in the
> __wait_event() suffice for this last barrier...  :-/  Thoughts?
> 
> > +   smp_mb();

Fastpath skips __wait_event(), and it is possible that the reader does
lock/unlock between the first 'mb()' and 'if (atomic_read() == 1)'.

Oleg.



Re: [patch] genapic: default to physical mode on hotplug CPU kernels

2006-11-29 Thread Siddha, Suresh B
On Wed, Nov 29, 2006 at 09:08:34AM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > hm - indeed. Then we can indeed do the patch below. Nice simplification!
> 
> forgot to convert a few more places - full patch below.

Acked-by: Suresh Siddha <[EMAIL PROTECTED]>


Re: 2.6.19 panic on boot -- i386

2006-11-29 Thread Randy Dunlap
On Wed, 29 Nov 2006 22:13:09 -0500 (EST) Pete Clements wrote:

> 2.6.19 panics at boot. Good up through rc6-git11.
> Hand copied screen below.

Try the patch that DaveM recently posted:
  http://lkml.org/lkml/2006/11/29/335

---
~Randy


2.6.19 panic on boot -- i386

2006-11-29 Thread Pete Clements
2.6.19 panics at boot. Good up through rc6-git11.
Hand copied screen below.
-- 
Pete Clements 


Call Trace:
[] ndisc_send_rs+0x420/0x460 [ipv6]
[] ndisc_send_rs+0x42c/0x460 [ipv6]
[] ndisc_send_rs+0x420/0x460 [ipv6]
[] addrconf_dad_completed+0x93/0xe0 [ipv6]
[] addrconf_dad_timer+0x119/0x120 [ipv6]
[] rebalance_tick+0x131/0x350
[] addrconf_dad_timer+0x0/0x120 [ipv6]
[] run_timer_softirq+0x113/0x190
[] __do_softirq+0x75/0xf0
[] do_softirq+0x3b/0x50
[] smp_apic_timer_interrupt+0xa5/0xc0
[] apic_timer_interrupt+0x1f/0x24
[] default_idle+0x0/0x60
[] default_idle+0x31/0x60
[] cpu_idle+0x6c/0x90
[] start_kernel+0x34e/0x3d0
[] unknown_bootoption+0x0/0x290

Code: 8c 00 00 00 89 44 24 10 8b 44 24 2c 89 44 24 0c 8b 41 60 c7 04 24 e4 ac 36
 c0 89 44 24 08 8b 44 24 30 89 44 24 04 e8 9d 51 e6 ff <0f> 0b 5d 00 1a 84 36 c0
 83 c4 24 c3 90 55 57 56 53 83 ec 2c 8b
EIP: [] skb_over_panic+0x63/0x70 SS:ESP 0068:c03cfe08
 <0>Kernel panic - not syncing: Fatal exception in interrupt


Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation

2006-11-29 Thread Paul E. McKenney
On Thu, Nov 30, 2006 at 04:57:14AM +0300, Oleg Nesterov wrote:
> (the same patch + comments from Paul)
> 
> [RFC, PATCH 1/2] qrcu: "quick" srcu implementation
> 
> Very much based on ideas, corrections, and patient explanations from
> Alan and Paul.
> 
> The current srcu implementation is very good for readers, lock/unlock
> are extremely cheap. But for that reason it is not possible to avoid
> synchronize_sched() and polling in synchronize_srcu().
> 
> Jens Axboe wrote:
> >
> > It works for me, but the overhead is still large. Before it would take
> > 8-12 jiffies for a synchronize_srcu() to complete without there actually
> > being any reader locks active, now it takes 2-3 jiffies. So it's
> > definitely faster, and as suspected the loss of two of three
> > synchronize_sched() cut down the overhead to a third.
> 
> 'qrcu' behaves the same as srcu but optimized for writers. The fast path
> for synchronize_qrcu() is mutex_lock() + atomic_read() + mutex_unlock().
> The slow path is __wait_event(), no polling. However, the reader does
> atomic inc/dec on lock/unlock, and the counters are not per-cpu.
> 
> Also, unlike srcu, qrcu read lock/unlock can be used in interrupt context,
> and 'qrcu_struct' can be compile-time initialized.
> 
> See also (a long) discussion:
>   http://marc.theaimsgroup.com/?t=11637085763

With the addition of a comment for the smp_mb() at the beginning of
synchronize_qrcu(), shown below:

Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

> Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>
> 
> --- 19-rc6/include/linux/srcu.h~1_qrcu2006-10-22 18:24:03.0 +0400
> +++ 19-rc6/include/linux/srcu.h   2006-11-30 04:32:42.0 +0300
> @@ -27,6 +27,8 @@
>  #ifndef _LINUX_SRCU_H
>  #define _LINUX_SRCU_H
> 
> +#include 
> +
>  struct srcu_struct_array {
>   int c[2];
>  };
> @@ -50,4 +52,32 @@ void srcu_read_unlock(struct srcu_struct
>  void synchronize_srcu(struct srcu_struct *sp);
>  long srcu_batches_completed(struct srcu_struct *sp);
> 
> +/*
> + * fully compatible with srcu, but optimized for writers.
> + */
> +
> +struct qrcu_struct {
> + int completed;
> + atomic_t ctr[2];
> + wait_queue_head_t wq;
> + struct mutex mutex;
> +};
> +
> +int init_qrcu_struct(struct qrcu_struct *qp);
> +int qrcu_read_lock(struct qrcu_struct *qp);
> +void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
> +void synchronize_qrcu(struct qrcu_struct *qp);
> +
> +/**
> + * cleanup_qrcu_struct - deconstruct a quick-RCU structure
> + * @qp: structure to clean up.
> + *
> + * Must invoke this after you are finished using a given qrcu_struct that
> + * was initialized via init_qrcu_struct().  We reserve the right to
> + * leak memory should you fail to do this!
> + */
> +static inline void cleanup_qrcu_struct(struct qrcu_struct *qp)
> +{
> +}
> +
>  #endif
> --- 19-rc6/kernel/srcu.c~1_qrcu   2006-10-22 18:24:03.0 +0400
> +++ 19-rc6/kernel/srcu.c  2006-11-30 04:39:53.0 +0300
> @@ -256,3 +256,94 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock);
>  EXPORT_SYMBOL_GPL(synchronize_srcu);
>  EXPORT_SYMBOL_GPL(srcu_batches_completed);
>  EXPORT_SYMBOL_GPL(srcu_readers_active);
> +
> +/**
> + * init_qrcu_struct - initialize a quick-RCU structure.
> + * @qp: structure to initialize.
> + *
> + * Must invoke this on a given qrcu_struct before passing that qrcu_struct
> + * to any other function.  Each qrcu_struct represents a separate domain
> + * of QRCU protection.
> + */
> +int init_qrcu_struct(struct qrcu_struct *qp)
> +{
> + qp->completed = 0;
> + atomic_set(qp->ctr + 0, 1);
> + atomic_set(qp->ctr + 1, 0);
> + init_waitqueue_head(&qp->wq);
> + mutex_init(&qp->mutex);
> +
> + return 0;
> +}
> +
> +/**
> + * qrcu_read_lock - register a new reader for an QRCU-protected structure.
> + * @qp: qrcu_struct in which to register the new reader.
> + *
> + * Counts the new reader in the appropriate element of the qrcu_struct.
> + * Returns an index that must be passed to the matching qrcu_read_unlock().
> + */
> +int qrcu_read_lock(struct qrcu_struct *qp)
> +{
> + for (;;) {
> + int idx = qp->completed & 0x1;
> + if (likely(atomic_inc_not_zero(qp->ctr + idx)))
> + return idx;
> + }
> +}
> +
> +/**
> + * qrcu_read_unlock - unregister a old reader from an QRCU-protected structure.
> + * @qp: qrcu_struct in which to unregister the old reader.
> + * @idx: return value from corresponding qrcu_read_lock().
> + *
> + * Removes the count for the old reader from the appropriate element of
> + * the qrcu_struct.
> + */
> +void qrcu_read_unlock(struct qrcu_struct *qp, int idx)
> +{
> + if (atomic_dec_and_test(qp->ctr + idx))
> + wake_up(&qp->wq);
> +}
> +
> +/**
> + * synchronize_qrcu - wait for prior QRCU read-side critical-section completion
> + * @qp: qrcu_struct with which to synchronize.
> + *
> + * Flip the completed counter, and wait for the old count to drain to zero.
> + * As with 

Re: Linux 2.6.19

2006-11-29 Thread Greg Norris
On Wed, Nov 29, 2006 at 05:08:15PM -0800, Randy Dunlap wrote:
> On Wed, 29 Nov 2006 18:56:31 -0600 Greg Norris wrote:
> > On a similar vein, it'd be nice if http://www.kernel.org/kdist/version.html 
> > would break the entries into separate lines.
> 
> I prefer to use
> http://www.kernel.org/kdist/finger_banner
> for that.

I use that in some cases as well, but the browser on my PDA insists upon 
trying to download that file rather than simply displaying it.  So I 
sometimes need to use version.html instead, even though it renders 
poorly under every browser I've tried.


Re: 2.6.19-rc6-mm2: uli526x only works after reload

2006-11-29 Thread Andrew Morton
On Thu, 30 Nov 2006 02:04:15 +0100
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> > > 
> > > git-netdev-all.patch
> > > git-netdev-all-fixup.patch
> > > libphy-dont-do-that.patch
> > 
> > Are you able to eliminate libphy-dont-do-that.patch?
> > 
> > > Is a broken-out version of git-netdev-all.patch available from somewhere?
> > 
> > Nope, and my few fumbling attempts to generate the sort of patch series
> > which you want didn't work out too well.  One has to downgrade to
> > git-bisect :(
> > 
> > What does "doesn't work" mean, btw?
> 
> Well, it turns out not to be 100% reproducible.  I can only reproduce it after
> a soft reboot (eg. shutdown -r now).
> 
> Then, while configuring network interfaces the system says the interface name
> is ethxx0, but it should be eth1 (eth0 is an RTL-8139, which is not used).
> Now if I run ifconfig, it says:
> 
> eth0: error fetching interface information: Device not found
> 
> and that's all (normally, ifconfig would show the information for lo and eth1,
> without eth0).  Moreover, 'ifconfig eth1' says:
> 
> eth1: error fetching interface information: Device not found
> 
> Next, I run 'rmmod uli526x' and 'modprobe uli526x' and then 'ifconfig' is
> still saying the above (about eth0), but 'ifconfig eth1' seems to work as
> it should.  However, the interface often fails to transfer anything after
> that.

Lovely.  Sounds like some startup race, perhaps against userspace.

Is CONFIG_PCI_MULTITHREAD_PROBE set?  (err, we meant to disable that for
2.6.19 but forgot).



Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread David Miller
From: Wenji Wu <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 19:56:58 -0600

> >We could also pepper tcp_recvmsg() with some very carefully placed
> >preemption disable/enable calls to deal with this even with
> >CONFIG_PREEMPT enabled.
>
> I also think about this approach. But since the "problem" happens in
> the 2.6 Desktop and Low-latency Desktop (not server), system
> responsiveness is a key feature, simply placing preemption
> disabled/enable call might not work.  If you want to place
> preemption disable/enable calls within tcp_recvmsg, you have to put
> them in the very beginning and end of the call. Disabling preemption
> would degrade system responsiveness.

We can make explicit preemption checks in the main loop of
tcp_recvmsg(), and release the socket and run the backlog if
need_resched() is TRUE.

This is the simplest and most elegant solution to this problem.

The one suggested in your patch and paper is way overkill; there is
no reason to solve a TCP specific problem inside of the generic
scheduler.
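
A rough userspace model of that loop, as a sketch only: struct sketch_sock
and the sketch_* functions are hypothetical stand-ins, not the real
tcp_recvmsg(), lock_sock() or release_sock(). The assumption modelled here
is that packets arriving while the socket is owned pile up on a backlog,
and that releasing the socket moves the backlog onto the receive queue.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the socket state relevant to the loop. */
struct sketch_sock {
	int receive_queue;	/* packets ready to copy to the user */
	int backlog;		/* packets queued while the socket was owned */
	bool owned;		/* true while the receiver holds the socket */
	int resched_points;	/* times the loop gave the socket up early */
};

static void sketch_lock_sock(struct sketch_sock *sk)
{
	sk->owned = true;
}

/* Releasing the socket runs the backlog into the receive queue. */
static void sketch_release_sock(struct sketch_sock *sk)
{
	sk->receive_queue += sk->backlog;
	sk->backlog = 0;
	sk->owned = false;
}

/* Two trivial predicates standing in for need_resched(). */
bool sketch_never_resched(void) { return false; }
bool sketch_always_resched(void) { return true; }

int sketch_recv_loop(struct sketch_sock *sk, int want,
		     bool (*need_resched)(void))
{
	int copied = 0;

	sketch_lock_sock(sk);
	while (copied < want) {
		if (need_resched()) {
			/* Explicit preemption check: drop the socket first. */
			sketch_release_sock(sk);
			sk->resched_points++;
			/* The kernel would let the scheduler run here. */
			sketch_lock_sock(sk);
		}
		if (sk->receive_queue > 0) {
			sk->receive_queue--;
			copied++;
		} else if (sk->backlog > 0) {
			/* Nothing ready: release to process the backlog. */
			sketch_release_sock(sk);
			sketch_lock_sock(sk);
		} else {
			break;
		}
	}
	sketch_release_sock(sk);
	return copied;
}
```

The point of the shape is that the backlog is always drained before the
task can be scheduled away, so no change to the generic scheduler is
involved.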


[PATCH -mm] sata_nv: add suspend/resume support

2006-11-29 Thread Robert Hancock
The attached patch is against 2.6.19-rc6-mm1, to be applied on top of 
the patch "sata_nv: fix ATAPI in ADMA mode" which Andrew and Jeff 
already have in their trees. I've only been able to test this myself by 
doing an aborted suspend and immediate resume and verifying it doesn't 
blow up in that case (suspend-to-RAM is broken on my box and something 
isn't configured properly for suspend-to-disk to work). However, since 
resume will definitely not work on some of these controllers without 
this patch, I think it's an improvement in any case.


---

This patch adds the necessary callbacks to support suspend/resume 
properly in sata_nv. Most of the controllers don't need any specific 
handling but CK804/MCP04 controllers, whether ADMA is enabled or not, 
need some additional setup on resume.


As well as the additional storage of the controller type needed for 
proper resume handling, this also removes the inline helper functions 
for getting ADMA register locations by storing the pointers so we don't 
have to keep calculating them all the time.


Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

---
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


--- linux-2.6.19-rc6-mm1-admafixnoresume/drivers/ata/sata_nv.c  2006-11-26 00:53:44.0 -0600
+++ linux-2.6.19-rc6-mm1-admafix/drivers/ata/sata_nv.c  2006-11-29 18:42:17.0 -0600
@@ -49,7 +49,7 @@
 #include 
 
 #define DRV_NAME   "sata_nv"
-#define DRV_VERSION"3.2"
+#define DRV_VERSION"3.3"
 
 #define NV_ADMA_DMA_BOUNDARY   0xUL
 
@@ -213,12 +213,21 @@ struct nv_adma_port_priv {
dma_addr_t  cpb_dma;
struct nv_adma_prd  *aprd;
dma_addr_t  aprd_dma;
+   void __iomem *  ctl_block;
+   void __iomem *  gen_block;
+   void __iomem *  notifier_clear_block;
u8  flags;
 };
 
+struct nv_host_priv {
+   unsigned long   type;
+};
+
 #define NV_ADMA_CHECK_INTR(GCTL, PORT) ((GCTL) & ( 1 << (19 + (12 * (PORT)))))
 
 static int nv_init_one (struct pci_dev *pdev, const struct pci_device_id *ent);
+static void nv_remove_one (struct pci_dev *pdev);
+static int nv_pci_device_resume(struct pci_dev *pdev);
 static void nv_ck804_host_stop(struct ata_host *host);
 static irqreturn_t nv_generic_interrupt(int irq, void *dev_instance);
 static irqreturn_t nv_nf2_interrupt(int irq, void *dev_instance);
@@ -239,6 +248,8 @@ static irqreturn_t nv_adma_interrupt(int
 static void nv_adma_irq_clear(struct ata_port *ap);
 static int nv_adma_port_start(struct ata_port *ap);
 static void nv_adma_port_stop(struct ata_port *ap);
+static int nv_adma_port_suspend(struct ata_port *ap, pm_message_t mesg);
+static int nv_adma_port_resume(struct ata_port *ap);
 static void nv_adma_error_handler(struct ata_port *ap);
 static void nv_adma_host_stop(struct ata_host *host);
 static void nv_adma_bmdma_setup(struct ata_queued_cmd *qc);
@@ -292,7 +303,9 @@ static struct pci_driver nv_pci_driver =
.name   = DRV_NAME,
.id_table   = nv_pci_tbl,
.probe  = nv_init_one,
-   .remove = ata_pci_remove_one,
+   .suspend= ata_pci_device_suspend,
+   .resume = nv_pci_device_resume,
+   .remove = nv_remove_one,
 };
 
 static struct scsi_host_template nv_sht = {
@@ -311,6 +324,8 @@ static struct scsi_host_template nv_sht 
.slave_configure= ata_scsi_slave_config,
.slave_destroy  = ata_scsi_slave_destroy,
.bios_param = ata_std_bios_param,
+   .suspend= ata_scsi_device_suspend,
+   .resume = ata_scsi_device_resume,
 };
 
 static struct scsi_host_template nv_adma_sht = {
@@ -330,6 +345,8 @@ static struct scsi_host_template nv_adma
.slave_configure= nv_adma_slave_config,
.slave_destroy  = ata_scsi_slave_destroy,
.bios_param = ata_std_bios_param,
+   .suspend= ata_scsi_device_suspend,
+   .resume = ata_scsi_device_resume,
 };
 
 static const struct ata_port_operations nv_generic_ops = {
@@ -438,6 +455,8 @@ static const struct ata_port_operations 
.scr_write  = nv_scr_write,
.port_start = nv_adma_port_start,
.port_stop  = nv_adma_port_stop,
+   .port_suspend   = nv_adma_port_suspend,
+   .port_resume= nv_adma_port_resume,
.host_stop  = nv_adma_host_stop,
 };
 
@@ -476,6 +495,7 @@ static struct ata_port_info nv_port_info
{
.sht= &nv_adma_sht,
.flags  = ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY |
+ ATA_FLAG_HRST_TO_RESUME 

Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c (kernel 2.6.18.1)

2006-11-29 Thread David Chinner
On Wed, Nov 29, 2006 at 10:17:25AM +0100, Jesper Juhl wrote:
> On 29/11/06, David Chinner <[EMAIL PROTECTED]> wrote:
> >On Tue, Nov 28, 2006 at 04:49:00PM +0100, Jesper Juhl wrote:
> >> Filesystem "dm-1": XFS internal error xfs_trans_cancel at line 1138 of
> >> file fs/xfs/xfs_trans.c.  Caller 0x8034b47e
> >>
> >> Call Trace:
> >> [] show_trace+0xb2/0x380
> >> [] dump_stack+0x15/0x20
> >> [] xfs_error_report+0x3c/0x50
> >> [] xfs_trans_cancel+0x6e/0x130
> >> [] xfs_create+0x5ee/0x6a0
> >> [] xfs_vn_mknod+0x156/0x2e0
> >> [] xfs_vn_create+0xb/0x10
> >> [] vfs_create+0x8c/0xd0
> >> [] nfsd_create_v3+0x31a/0x560
> >> [] nfsd3_proc_create+0x148/0x170
> >> [] nfsd_dispatch+0xf9/0x1e0
> >> [] svc_process+0x437/0x6e0
> >> [] nfsd+0x1cd/0x360
> >> [] child_rip+0xa/0x12
> >> xfs_force_shutdown(dm-1,0x8) called from line 1139 of file
> >> fs/xfs/xfs_trans.c.  Return address = 0x80359daa
> >
> >We shut down the filesystem because we cancelled a dirty transaction.
> >Once we start to dirty the incore objects, we can't roll back to
> >an unchanged state if a subsequent fatal error occurs during the
> >transaction and we have to abort it.
> >
> So you are saying that there's nothing I can do to prevent this from
> happening in the future?

Pretty much - we need to work out what is going wrong and
we can't from the shutdown message above - the error has
occurred in a path that doesn't have error report traps
in it.

Is this reproducible?

> >If I understand historic occurrences of this correctly, there is
> >a possibility that it can be triggered in ENOMEM situations. Was your
> >machine running out of memory when this occurred?
> >
> Not really. I just checked my monitoring software and, at the time
> this happened, the box had ~5.9G RAM free (of 8G total) and no swap
> used (but 11G available).

Ok. Sounds like we need more error reporting points inserted
into that code so we dump an error earlier and hence have some
hope of working out what went wrong next time.

OOC, there weren't any I/O errors reported before this shutdown?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


2.6.18-3.rt10.0001 report

2006-11-29 Thread Sergio Monteiro Basto
Hi,
As I wrote for rt7, I have booted successfully without notsc,
but not every time.

The same happens with rt10: I get one of three results when I boot without
notsc (with notsc I don't see any problem):
1st: boots without errors (dmesg at
http://bugzilla.kernel.org/show_bug.cgi?id=6419#c59 )
2nd: hangs on boot, with the last message being
input: ImPS/2 Generic Wheel Mouse as /class/input/input1
3rd: boots but gives a long oops
(dmesg at http://bugzilla.kernel.org/show_bug.cgi?id=6419#c60 )

Thanks,
-- 
Sérgio M.B.



Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Wenji Wu
> That yield() will need to be removed - yield()'s behaviour is truly awful
> if the system is otherwise busy.  What is it there for?

Please read the uploaded paper, which has a detailed description.

thanks,

wenji

- Original Message -
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Wednesday, November 29, 2006 7:08 pm
Subject: Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

> On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> David Miller <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Please, it is very difficult to review your work the way you have
> > submitted this patch as a set of 4 patches.  These patches have not
> > been split up "logically", but rather they have been split up "per
> > file" with the same exact changelog message in each patch posting.
> > This is very clumsy, and impossible to review, and wastes a lot of
> > mailing list bandwidth.
> > 
> > We have an excellent file, called Documentation/SubmittingPatches, in
> > the kernel source tree, which explains exactly how to do this
> > correctly.
> > 
> > By splitting your patch into 4 patches, one for each file touched,
> > it is impossible to review your patch as a logical whole.
> > 
> > Please also provide your patch inline so people can just hit reply
> > in their mail reader client to quote your patch and comment on it.
> > This is impossible with the attachments you've used.
> > 
> 
> Here you go - joined up, cleaned up, ported to mainline and test-compiled.
> 
> That yield() will need to be removed - yield()'s behaviour is truly awful
> if the system is otherwise busy.  What is it there for?
> 
> 
> 
> From: Wenji Wu <[EMAIL PROTECTED]>
> 
> For Linux TCP, when the network application makes a system call to move data
> from the socket's receive buffer to user space by calling tcp_recvmsg(), the
> socket will be locked.  During this period, all the incoming packets for the
> TCP socket will go to the backlog queue without being TCP processed.
> 
> Since Linux 2.6 can be interrupted mid-task, if the network application
> expires and is moved to the expired array with the socket locked, all the
> packets within the backlog queue will not be TCP processed till the network
> application resumes its execution.  If the system is heavily loaded, TCP can
> easily RTO on the sender side.
> 
> 
> 
> include/linux/sched.h |2 ++
> kernel/fork.c |3 +++
> kernel/sched.c|   24 ++--
> net/ipv4/tcp.c|9 +
> 4 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff -puN net/ipv4/tcp.c~tcp-speedup net/ipv4/tcp.c
> --- a/net/ipv4/tcp.c~tcp-speedup
> +++ a/net/ipv4/tcp.c
> @@ -1109,6 +1109,8 @@ int tcp_recvmsg(struct kiocb *iocb, stru
>   struct task_struct *user_recv = NULL;
>   int copied_early = 0;
> 
> + current->backlog_flag = 1;
> +
>   lock_sock(sk);
> 
>   TCP_CHECK_TIMER(sk);
> @@ -1468,6 +1470,13 @@ skip_copy:
> 
>   TCP_CHECK_TIMER(sk);
>   release_sock(sk);
> +
> + current->backlog_flag = 0;
> + if (current->extrarun_flag == 1){
> + current->extrarun_flag = 0;
> + yield();
> + }
> +
>   return copied;
> 
> out:
> diff -puN include/linux/sched.h~tcp-speedup include/linux/sched.h
> --- a/include/linux/sched.h~tcp-speedup
> +++ a/include/linux/sched.h
> @@ -1023,6 +1023,8 @@ struct task_struct {
> #ifdef CONFIG_TASK_DELAY_ACCT
>   struct task_delay_info *delays;
> #endif
> + int backlog_flag;   /* packets wait in tcp backlog queue flag */
> + int extrarun_flag;  /* extra run flag for TCP performance */
> };
> 
> static inline pid_t process_group(struct task_struct *tsk)
> diff -puN kernel/sched.c~tcp-speedup kernel/sched.c
> --- a/kernel/sched.c~tcp-speedup
> +++ a/kernel/sched.c
> @@ -3099,12 +3099,24 @@ void scheduler_tick(void)
> 
>   if (!rq->expired_timestamp)
>   rq->expired_timestamp = jiffies;
> - if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
> - enqueue_task(p, rq->expired);
> - if (p->static_prio < rq->best_expired_prio)
> - rq->best_expired_prio = p->static_prio;
> - } else
> - enqueue_task(p, rq->active);
> + if (p->backlog_flag == 0) {
> + if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
> + enqueue_task(p, rq->expired);
> + if (p->static_prio < rq->best_expired_prio)
> + rq->best_expired_prio = p->static_prio;
> + } else
> + enqueue_task(p, rq->active);
> + } else {
> + if (expired_starving(rq)) {
> + enqueue_task(p,rq->expired);
> + if (p->static_prio < rq->best_expired_prio)
> + rq->best_expired_prio = p-
> 

Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation

2006-11-29 Thread Oleg Nesterov
(the same patch + comments from Paul)

[RFC, PATCH 1/2] qrcu: "quick" srcu implementation

Very much based on ideas, corrections, and patient explanations from
Alan and Paul.

The current srcu implementation is very good for readers, lock/unlock
are extremely cheap. But for that reason it is not possible to avoid
synchronize_sched() and polling in synchronize_srcu().

Jens Axboe wrote:
>
> It works for me, but the overhead is still large. Before it would take
> 8-12 jiffies for a synchronize_srcu() to complete without there actually
> being any reader locks active, now it takes 2-3 jiffies. So it's
> definitely faster, and as suspected the loss of two of three
> synchronize_sched() cut down the overhead to a third.

'qrcu' behaves the same as srcu but optimized for writers. The fast path
for synchronize_qrcu() is mutex_lock() + atomic_read() + mutex_unlock().
The slow path is __wait_event(), no polling. However, the reader does
atomic inc/dec on lock/unlock, and the counters are not per-cpu.

Also, unlike srcu, qrcu read lock/unlock can be used in interrupt context,
and 'qrcu_struct' can be compile-time initialized.
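
The reader side and the writer's fast-path check described above can be modelled in userspace with C11 atomics. This is only a sketch: the `qrcu_model_*` names are made up for illustration, the wait queue and mutex are left out, and treating "counter == 1" as "no readers" is an assumption read off the bias that init_qrcu_struct() puts in ctr[0].

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of the qrcu counters (illustrative, not the kernel code). */
struct qrcu_model {
	atomic_int completed;
	atomic_int ctr[2];
};

void qrcu_model_init(struct qrcu_model *qp)
{
	atomic_init(&qp->completed, 0);
	atomic_init(&qp->ctr[0], 1);	/* bias: the current counter idles at 1 */
	atomic_init(&qp->ctr[1], 0);
}

/* Stand-in for the kernel's atomic_inc_not_zero(). */
bool qrcu_model_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Reader entry: retry until the current counter could be incremented. */
int qrcu_model_read_lock(struct qrcu_model *qp)
{
	for (;;) {
		int idx = atomic_load(&qp->completed) & 0x1;

		if (qrcu_model_inc_not_zero(&qp->ctr[idx]))
			return idx;
	}
}

/* Reader exit: returns true when the counter hit zero, which is the
 * point where the kernel version would call wake_up(). */
bool qrcu_model_read_unlock(struct qrcu_model *qp, int idx)
{
	return atomic_fetch_sub(&qp->ctr[idx], 1) == 1;
}

/* The check a writer's fast path would make (assumed: bias value only). */
bool qrcu_model_no_readers(struct qrcu_model *qp)
{
	int idx = atomic_load(&qp->completed) & 0x1;

	return atomic_load(&qp->ctr[idx]) == 1;
}
```

With no reader inside, the writer sees only the bias and can return without waiting; an active reader raises the counter to 2, so the check fails until the matching unlock.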

See also (a long) discussion:
http://marc.theaimsgroup.com/?t=11637085763

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 19-rc6/include/linux/srcu.h~1_qrcu  2006-10-22 18:24:03.0 +0400
+++ 19-rc6/include/linux/srcu.h 2006-11-30 04:32:42.0 +0300
@@ -27,6 +27,8 @@
 #ifndef _LINUX_SRCU_H
 #define _LINUX_SRCU_H
 
+#include 
+
 struct srcu_struct_array {
int c[2];
 };
@@ -50,4 +52,32 @@ void srcu_read_unlock(struct srcu_struct
 void synchronize_srcu(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
 
+/*
+ * fully compatible with srcu, but optimized for writers.
+ */
+
+struct qrcu_struct {
+   int completed;
+   atomic_t ctr[2];
+   wait_queue_head_t wq;
+   struct mutex mutex;
+};
+
+int init_qrcu_struct(struct qrcu_struct *qp);
+int qrcu_read_lock(struct qrcu_struct *qp);
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
+void synchronize_qrcu(struct qrcu_struct *qp);
+
+/**
+ * cleanup_qrcu_struct - deconstruct a quick-RCU structure
+ * @qp: structure to clean up.
+ *
+ * Must invoke this after you are finished using a given qrcu_struct that
+ * was initialized via init_qrcu_struct().  We reserve the right to
+ * leak memory should you fail to do this!
+ */
+static inline void cleanup_qrcu_struct(struct qrcu_struct *qp)
+{
+}
+
 #endif
--- 19-rc6/kernel/srcu.c~1_qrcu 2006-10-22 18:24:03.0 +0400
+++ 19-rc6/kernel/srcu.c2006-11-30 04:39:53.0 +0300
@@ -256,3 +256,94 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock);
 EXPORT_SYMBOL_GPL(synchronize_srcu);
 EXPORT_SYMBOL_GPL(srcu_batches_completed);
 EXPORT_SYMBOL_GPL(srcu_readers_active);
+
+/**
+ * init_qrcu_struct - initialize a quick-RCU structure.
+ * @qp: structure to initialize.
+ *
+ * Must invoke this on a given qrcu_struct before passing that qrcu_struct
+ * to any other function.  Each qrcu_struct represents a separate domain
+ * of QRCU protection.
+ */
+int init_qrcu_struct(struct qrcu_struct *qp)
+{
+   qp->completed = 0;
+   atomic_set(qp->ctr + 0, 1);
+   atomic_set(qp->ctr + 1, 0);
+   init_waitqueue_head(&qp->wq);
+   mutex_init(&qp->mutex);
+
+   return 0;
+}
+
+/**
+ * qrcu_read_lock - register a new reader for a QRCU-protected structure.
+ * @qp: qrcu_struct in which to register the new reader.
+ *
+ * Counts the new reader in the appropriate element of the qrcu_struct.
+ * Returns an index that must be passed to the matching qrcu_read_unlock().
+ */
+int qrcu_read_lock(struct qrcu_struct *qp)
+{
+   for (;;) {
+   int idx = qp->completed & 0x1;
+   if (likely(atomic_inc_not_zero(qp->ctr + idx)))
+   return idx;
+   }
+}
+
+/**
+ * qrcu_read_unlock - unregister an old reader from a QRCU-protected structure.
+ * @qp: qrcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding qrcu_read_lock().
+ *
+ * Removes the count for the old reader from the appropriate element of
+ * the qrcu_struct.
+ */
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx)
+{
+   if (atomic_dec_and_test(qp->ctr + idx))
+   wake_up(&qp->wq);
+}
+
+/**
+ * synchronize_qrcu - wait for prior QRCU read-side critical-section completion
+ * @qp: qrcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_qrcu() from the corresponding
+ * QRCU read-side critical section; doing so will result in deadlock.
+ * However, it is perfectly legal to call synchronize_qrcu() on one
+ * qrcu_struct from some other qrcu_struct's read-side critical section.
+ */
+void 

Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Wenji Wu
Yes, when CONFIG_PREEMPT is disabled, the "problem" won't happen. That is why I 
put "for 2.6 desktop, low-latency desktop" in the uploaded paper. This 
"problem" happens in the 2.6 Desktop and Low-latency Desktop.

>We could also pepper tcp_recvmsg() with some very carefully placed preemption 
>disable/enable calls to deal with this even with CONFIG_PREEMPT enabled.

I also thought about this approach. But since the "problem" happens in the 2.6
Desktop and Low-latency Desktop configurations (not server), where system
responsiveness is a key feature, simply placing preemption disable/enable calls
might not work.  If you want to place preemption disable/enable calls within
tcp_recvmsg(), you have to put them at the very beginning and end of the call.
Disabling preemption would degrade system responsiveness.

wenji



- Original Message -
From: David Miller <[EMAIL PROTECTED]>
Date: Wednesday, November 29, 2006 7:13 pm
Subject: Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Wed, 29 Nov 2006 17:08:35 -0800
> 
> > On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> > David Miller <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > Please, it is very difficult to review your work the way you have
> > > submitted this patch as a set of 4 patches.  These patches have not
> > > been split up "logically", but rather they have been split up "per
> > > file" with the same exact changelog message in each patch posting.
> > > This is very clumsy, and impossible to review, and wastes a lot of
> > > mailing list bandwidth.
> > > 
> > > We have an excellent file, called Documentation/SubmittingPatches, in
> > > the kernel source tree, which explains exactly how to do this
> > > correctly.
> > > 
> > > By splitting your patch into 4 patches, one for each file touched,
> > > it is impossible to review your patch as a logical whole.
> > > 
> > > Please also provide your patch inline so people can just hit reply
> > > in their mail reader client to quote your patch and comment on it.
> > > This is impossible with the attachments you've used.
> > > 
> > 
> > Here you go - joined up, cleaned up, ported to mainline and test-compiled.
> > 
> > That yield() will need to be removed - yield()'s behaviour is truly awful
> > if the system is otherwise busy.  What is it there for?
> 
> What about simply turning off CONFIG_PREEMPT to fix this "problem"?
> 
> We always properly run the backlog (by doing a release_sock()) before
> going to sleep otherwise except for the specific case of taking a page
> fault during the copy to userspace.  It is only CONFIG_PREEMPT that
> can cause this situation to occur in other circumstances as far as I
> can see.
> 
> We could also pepper tcp_recvmsg() with some very carefully placed
> preemption disable/enable calls to deal with this even with
> CONFIG_PREEMPT enabled.
> 



RE: isochronous receives?

2006-11-29 Thread Keith Curtis
Hi Robert,

I never resolved the problem. I turned on the excessive debugging output, but
it didn't print out info about receiving packets or interrupts. My test
app claimed there were no packets received although the bus analyzer
showed lots of packets going by.

If I can help out, let me know, but I'm not sure where to start at this point.

Keith


-Original Message-
From: Robert Crocombe [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 28, 2006 4:59 PM
To: Keith Curtis; linux1394-devel; linux-kernel
Subject: isochronous receives?


Keith, et al.,

I am having problems with isochronous receives, and remembered just as
I was getting ready to dig into the source that there was a message
about this stuff.  Lo and behold your message to linux1394-user from
September 7:

> I'm trying to receive isochronous streams (using libraw1394 1.2.0), and
> I've noticed that if data is transmitted on channel 63, then my app tends
> to work fine. If the stream is on a different channel, then I don't see
> any isochronous packets at all.  I'm using 2.4.29, I've also tried 2.6.15
> with similar results, can't seem to receive channels < 63.

Did you ultimately have any success getting this going?  Funnily
enough, when I tested isochronous stuff in July, I just did iso
transmit since I figured receives *must* be working since everyone has
camcorders and whatnot.  Currently my iso xmit stuff does appear to
be working, but iso receives are not.

I have a Firespy and no reason not to trust it, so I can see the junk
I'm spewing out.  I've tried transmitting on channels 4 and 63 (per
your advice), but neither works for me.  I suppose it could be my
stuff... nah.

-- 
Robert Crocombe
[EMAIL PROTECTED]



Re: Infinite retries reading the partition table

2006-11-29 Thread Luben Tuikov
--- Luben Tuikov <[EMAIL PROTECTED]> wrote:

> Suppose reading sector 0 always reports an error,
> sense key HARDWARE ERROR.
> 
> What I'm observing is that the request to read sector 0,
> reading partition information, is retried forever, ad infinitum.
> 
> Does anyone have a patch to resolve this? (2.6.19-rc6)

Actually the device sends SK: MEDIUM ERROR, ASC: UNRECOVERED READ ERR,
but SCSI Core seems to retry reading the partition table (sector 0)
forever.

Anyone seen this and/or has a patch in their tree for it?

   Luben
P.S.  This is fairly straightforward to inject/test.



Re: Linux 2.6.19

2006-11-29 Thread Phil Oester
Getting an oops on boot here, caused by commit
e81c73596704793e73e6dbb478f41686f15a4b34 titled
"[NET]: Fix MAX_HEADER setting".

Reverting that patch fixes things up for me.  Dave?

Phil



Bringing up interface eth0:  
skb_over_panic: text:c02af809 len:56 put:16 head:d7e213c0
data:d7e213d0 tail:d7e21408 end:d7e21400 dev:eth0
[ cut here ]
kernel BUG at net/core/skbuff.c:93!
invalid opcode:  [#1]
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010296   (2.6.19 #1)
EIP is at skb_over_panic+0x59/0x70
eax: 006f   ebx: d7e213c0   ecx:    edx: c03102c0
esi: d7e4f000   edi: d7e213f8   ebp: d7e4f000   esp: c037aec4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=c037a000 task=c03023e0 task.ti=c0347000)
Stack: c02fbb9c c02af809 0038 0010 d7e213c0 d7e213d0 d7e21408 d7e21400
   d7e4f000 0010 d6e84520 c02af80e d6c718a0 c037af6c 003a 0010
   c037af6c d6c718a0 d6c09920 d79749c0 0001  02ff 
Call Trace:
 [] ndisc_send_rs+0x399/0x3e0
 [] ndisc_send_rs+0x39e/0x3e0
 [] addrconf_dad_completed+0x82/0xc0
 [] addrconf_dad_timer+0xe5/0xf0
 [] e100_poll+0x259/0x420
 [] it_real_fn+0x0/0x60
 [] cascade+0x3f/0x60
 [] addrconf_dad_timer+0x0/0xf0
 [] run_timer_softirq+0xab/0x170
 [] __do_softirq+0x42/0xa0
 [] do_softirq+0x60/0xb0
 [] handle_edge_irq+0x0/0x110
 [] do_IRQ+0x85/0xe0
 [] schedule+0x29e/0x580
 [] common_interrupt+0x1a/0x20
 [] default_idle+0x32/0x60
 [] cpu_idle+0x42/0x60
 [] start_kernel+0x283/0x330
 [] unknown_bootoption+0x0/0x260
 ===
Code: 00 00 89 5c 24 14 8b 98 8c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 60 89 4c 24 04 c7 04 24 9c bb 2f c0 89 44 24 08 e8 47 07 ed ff <0f> 0b 5d 00 a4 91 2f c0 83 c4 24 5b 5e c3 89 f6 8d bc 27 00 00
EIP: [] skb_over_panic+0x59/0x70 SS:ESP 0068:c037aec4
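
The numbers in the skb_over_panic line are self-consistent: data + len = 0xd7e213d0 + 56 = 0xd7e21408 = tail, which is 8 bytes past end (0xd7e21400). A small userspace sketch of the bounds check that trips here, assuming (as the values suggest) that len and tail are reported after the 16-byte put; `struct skb_model` and `skb_model_put_overrun()` are illustrative names, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Just the fields printed in the oops, as 32-bit addresses. */
struct skb_model {
	uint32_t head;
	uint32_t data;
	uint32_t tail;
	uint32_t end;
	uint32_t len;
};

/* Advance tail/len the way a put would, then report how many bytes
 * the new tail lies beyond end (0 means the data still fits). */
uint32_t skb_model_put_overrun(struct skb_model *skb, uint32_t put)
{
	skb->tail += put;
	skb->len += put;
	return skb->tail > skb->end ? skb->tail - skb->end : 0;
}
```

Fed with the state just before the failing put (tail 0xd7e213f8, len 40), a put of 16 reproduces the reported tail and len and shows an overrun of 8 bytes: the 64-byte buffer between head and end had 16 bytes of headroom and 40 bytes of payload, leaving only 8 bytes of tailroom.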


Re: [PATCH -mm] x86_64 UP needs smp_call_function_single

2006-11-29 Thread Andrew Morton
On Wed, 29 Nov 2006 17:01:11 -0800
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> smp_call_function_single() needs to be visible in non-SMP builds, to fix:
> 
> arch/x86_64/kernel/vsyscall.c:283: warning: implicit declaration of function 'smp_call_function_single'
> 
> The (other/trivial) fix (instead of this one) is to add:
> #include 
> to linux-2.6.19-rc6-mm2/arch/x86_64/kernel/vsyscall.c
> 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
> ---
>  include/asm-x86_64/smp.h |7 ---
>  include/linux/smp.h  |7 +++
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> --- linux-2.6.19-rc6-mm2.orig/include/asm-x86_64/smp.h
> +++ linux-2.6.19-rc6-mm2/include/asm-x86_64/smp.h
> @@ -113,13 +113,6 @@ static __inline int logical_smp_processo
>  #define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu]
>  #else
>  #define cpu_physical_id(cpu) boot_cpu_id
> -static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
> - void *info, int retry, int wait)
> -{
> - /* Disable interrupts here? */
> - func(info);
> - return 0;
> -}
>  #endif /* !CONFIG_SMP */
>  #endif
>  
> --- linux-2.6.19-rc6-mm2.orig/include/linux/smp.h
> +++ linux-2.6.19-rc6-mm2/include/linux/smp.h
> @@ -99,6 +99,13 @@ static inline int up_smp_call_function(v
>  static inline void smp_send_reschedule(int cpu) { }
>  #define num_booting_cpus()   1
>  #define smp_prepare_boot_cpu()   do {} while (0)
> +static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
> + void *info, int retry, int wait)
> +{
> + /* Disable interrupts here? */
> + func(info);
> + return 0;
> +}
>  
>  #endif /* !SMP */
>  

No, I think this patch is right - the declaration of the CONFIG_SMP
smp_call_function_single() is in linux/smp.h so the !CONFIG_SMP declaration
or definition should be there too.

It's still buggy though.  It should disable local interrupts around the
call to match the SMP version.  I'll fix that separately.
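
For illustration only, here is a userspace model of what "disable local interrupts around the call" means for the UP stub. The interrupt state is faked with a flag (`model_irqs_enabled` is a hypothetical stand-in; kernel code would use local_irq_disable()/local_irq_enable() or the save/restore variants), and this is not the fix that was merged:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPU's interrupt-enable state. */
bool model_irqs_enabled = true;

/* UP stand-in for smp_call_function_single(): run func on the only CPU,
 * with "interrupts" off for the duration of the call. */
int up_call_function_single(int cpuid, void (*func)(void *info),
			    void *info, int retry, int wait)
{
	bool saved = model_irqs_enabled;

	(void)cpuid;	/* there is only one CPU */
	(void)retry;
	(void)wait;

	model_irqs_enabled = false;	/* kernel: local_irq_disable() */
	func(info);
	model_irqs_enabled = saved;	/* kernel: restore previous state */
	return 0;
}

/* Example callback: records the interrupt state it observed. */
void record_irq_state(void *info)
{
	*(bool *)info = model_irqs_enabled;
}
```

The callback then sees the same environment on UP as it would when invoked from the IPI handler on SMP, which is the mismatch the "Disable interrupts here?" comment was pointing at.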



Re: PM-Timer clock source is slow. Try something else: How slow? What other source(s)?

2006-11-29 Thread john stultz
On Wed, 2006-11-29 at 16:56 -0800, Linda Walsh wrote:
> I recently noticed this message in my bootup that I don't remember
> from before:
> 
> PCI: Probing PCI hardware (bus 00)
> * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
> * this clock source is slow. Consider trying other clock sources

This basically means that your chipset has a bug which requires the ACPI
PM timer to be read three times in order to get a valid reading.

This will cause gettimeofday/clock_gettime to take longer to execute,
which is what is meant by "slow" (rather than the counter's frequency
being incorrect).
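
A rough sketch of such a triple read, for illustration: the register access is abstracted behind a caller-supplied function (`pmtmr_read_fn` and the replay helper are hypothetical, so the sketch runs in userspace), and the 24-bit mask is an assumption about the counter width. The loop re-reads until the three samples are in an order a forward-running counter could produce, with at most one wrap-around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PMTMR_MASK 0xffffffu	/* assumed 24-bit counter */

typedef uint32_t (*pmtmr_read_fn)(void *ctx);

/* Read three times; retry while the samples are in an impossible order. */
uint32_t pmtmr_read_verified(pmtmr_read_fn rd, void *ctx)
{
	uint32_t v1, v2, v3;

	do {
		v1 = rd(ctx);
		v2 = rd(ctx);
		v3 = rd(ctx);
	} while ((v1 > v2 && v1 < v3) ||
		 (v2 > v3 && v2 < v1) ||
		 (v3 > v1 && v3 < v2));

	return v2 & MODEL_PMTMR_MASK;
}

/* Test helper: replays a fixed list of samples, then repeats the last. */
struct pmtmr_replay {
	const uint32_t *vals;
	size_t n;
	size_t pos;
};

uint32_t pmtmr_replay_read(void *ctx)
{
	struct pmtmr_replay *r = ctx;
	uint32_t v = r->vals[r->pos];

	if (r->pos + 1 < r->n)
		r->pos++;
	return v;
}
```

Every call costs at least three port reads instead of one, which is the extra latency in gettimeofday()/clock_gettime() that the boot message warns about.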

> How would this affect my clock?  It says to try another
> clock source, what type of clock source would it be suggesting I
> use? Another chip already in the computer? It is an Intel 440BX
> chipset; on a Dell motherboard. Would that be likely to have
> another chip source that is compensating?
> 
> I don't notice a significant clock slowdown, but I'm running NTP,
> so that could be masking the problem.

Unless you're running performance critical programs that utilize
gettimeofday/clock_gettime, you probably won't notice anything. Time
should still function properly.  If you are having performance issues,
you can try using a different clocksource (the TSC is probably safe, but
not necessarily).

thanks
-john





Re: [Bulk] Re: [patch 2.6.19-rc6] fix hotplug for legacy platform drivers

2006-11-29 Thread David Brownell
On Wednesday 29 November 2006 3:02 pm, Greg KH wrote:
> > 
> > Here's my fix.  ...
> > ... I audited all the drivers using the relevant APIs, and I can't 
> > see many (if any!) folk hitting problems from this.
> 
> But this still can cause the problem that your 'modalias' file in sysfs
> contains exactly the same name as the module itself, right?

Not a problem if folk stick to the original design.  Hotplug will at
most "modprobe $MODALIAS" (iff the device needs a driver) before doing
a udevsend ... and only coldplug uses "modprobe $(cat modalias)".

The two were provided to address distinct problems.  And the issue that
was described to me was _only_ relevant on the hotplug paths; coldplug
scripts, using /sys/devices/.../modalias files, see no problems.


I could update the patch so that attribute turns into a null string,
but that would have a **negative effect** since it would break coldplug
for all the platform init code which doesn't use platform_add_devices()
or maybe platform_device_register().


> That's not good, it should be an alias, not the "real name".

Well, adding unjustified complexity _after the fact_ isn't good either,
and that's what I see going on here.

How many years has KMOD been around?  It's worked just fine without that
sort of bizarre (and un-needed) rule.  Aliases were provided just to give
*additional* names to modules ... not to make one needlessly "special".
Kernel request_module() calls make no distinction between which type of
name they use ... and when the filesystem name changes, they still work
when the old name is properly aliased.

That whole "give a module an alias of itself" model just seems ludicrous
to me.  We _know_ that "A" means "A", so there's no point in aliasing it
as itself.


... plus it's a distraction from the real problem, namely that certain
"legacy" drivers, primarily stuffed onto the "platform" bus, can't ever
hotplug.  (While normal platform drivers do so just fine, without needing
strange rules to make some names "more equal than others".)

 
> That will ensure that userspace tools do not get confused, 

I don't observe any confusion on my systems.  Platform device hotplug
works just fine.  Udev creates their /dev nodes.  If there are some
tools getting confused, that seems best characterized as tool bugs.

- Dave



Re: 2.6.16.32 stuck in generic_file_aio_write()

2006-11-29 Thread erich

Dear Igmar Palsenberg,

If you are running arcmsr 1.20.00.13 from the official kernel, that is
the latest version.
Could you check your RAID controller events and tell me what you find?
You can check "MBIOS"=>"Physical Drive Information"=>"View Drive 
Information"=>"Select The Drive"=>"Timeout Count".
It can tell you which disk misbehaved and caused your RAID volume to go 
offline.
The message dump from arcmsr says that something went wrong with your RAID 
volume and it was kicked out of the system.

What is your RAID config?
Areca has released new firmware (1.42).
If you are using the "sg" device with the SCSI passthrough ioctl method to 
feed data into Areca's RAID volume, you need to limit each transfer to under 
512 blocks (256K).
The new firmware will raise that limit to 4096 blocks (2M) per transfer.
Firmware version 1.42 is in the release process but has not yet been put on 
the Areca ftp site.

If you need it, please let me know.
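
As a worked example of the transfer limit mentioned above: 512 blocks being 256K implies 512-byte blocks, and 4096 such blocks are 2M. The helpers below are a hypothetical sketch of how a caller might split a large request into pieces that respect the cap; the names are made up, and treating 512 blocks as the largest allowed transfer (inclusive) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BLOCK_SIZE 512u	/* bytes per block, from 512 blocks = 256K */
#define MODEL_MAX_BLOCKS 512u	/* per-transfer cap with the older firmware */

/* Number of transfers needed to move total_blocks under the cap. */
size_t xfer_count(size_t total_blocks, size_t max_blocks)
{
	return (total_blocks + max_blocks - 1) / max_blocks;
}

/* Size of the next transfer, given how many blocks are still pending. */
size_t next_xfer_blocks(size_t remaining_blocks, size_t max_blocks)
{
	return remaining_blocks < max_blocks ? remaining_blocks : max_blocks;
}
```

Under these assumptions, a 2M request (4096 blocks) has to go out as 8 separate transfers of 512 blocks each.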

Best Regards
Erich Chen


- Original Message - 
From: "Igmar Palsenberg" <[EMAIL PROTECTED]>

To: 
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, November 29, 2006 8:41 PM
Subject: 2.6.16.32 stuck in generic_file_aio_write()




Hi,

I've got a machine which occasionally locks up. I can still sysrq it from
a serial console, so it's not entirely dead.

A sysrq-t shows me that it's got a large number of httpd processes stuck
in D state :

httpd D F7619440  2160 11635   2057 11636   (NOTLB)
dbb7ae14 cc9b0550 c33224a0 f7619440 de187604  00b3 0001
  00b3   d374a550 c33224a0 0005b8d8 f04af800
000f75e7
  d374a550 cc9b0550 cc9b0678 ef7d33ec ef7d33e8 cc9b0550 ef7d33fc
c041bf70
Call Trace:
[] __mutex_lock_slowpath+0x92/0x43e
[] generic_file_aio_write+0x5c/0xfa
[] generic_file_aio_write+0x5c/0xfa
[] generic_file_aio_write+0x5c/0xfa
[] permission+0xad/0xcb
[] ext3_file_write+0x3b/0xb0
[] do_sync_write+0xd5/0x130
[] _spin_unlock+0xb/0xf
[] autoremove_wake_function+0x0/0x4b
[] vfs_write+0x1a3/0x1a8
[] sys_write+0x4b/0x74
[] sysenter_past_esp+0x54/0x75

After this, the machine is rendered useless (probably due to the fact that
disk IO isn't working anymore).

The lock debugging gives me this :

D   httpd:11635 [cc9b0550, 116] blocked on mutex: [ef7d33e8]
{inode_init_once}
.. held by: httpd:  506 [d67e1000, 121]
... acquired at:   generic_file_aio_write+0x5c/0xfa


I see similar things as mentioned in http://lkml.org/lkml/2006/1/10/64,
with the difference that I'm not running software RAID or SATA (it's an
Areca ARC-1110).

So far I can't reproduce it, it 'just' happens. Can someone give me a
pointer where to start looking ?

Erich, I've CC-ed you since the machine is running an Areca RAID config.
It's also the only used disk subsystem in this machine.


Regards,


Igmar





Re: [RCU] adds a prefetch() in rcu_do_batch()

2006-11-29 Thread Paul E. McKenney
On Wed, Nov 22, 2006 at 04:02:29PM +0100, Eric Dumazet wrote:
> On some workloads (for example when a lot of close() syscalls are done), RCU
> qlen can be quite large, and RCU heads are no longer in cpu cache when
> rcu_do_batch() is called.
> 
> This patch adds a prefetch() in rcu_do_batch() to give the CPU a hint to bring
> back cache lines containing 'struct rcu_head's.
> 
> Most list manipulation macros include prefetch(), but not open coded ones (at
> least with current C compilers :) )
> 
> I got a nice speedup on a trivial benchmark  (3.48 us per iteration instead of
> 3.95 us on a 1.6 GHz Pentium-M)
> while (1) { pipe(fd); close(fd[0]); close(fd[1]);}

Interesting!  How much of the speedup was due to the prefetch() and how
much to removing the extra store to rdp->donelist?

Thanx, Paul

> Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

> --- linux-2.6.19-rc6/kernel/rcupdate.c 2006-11-16 05:03:40.0 +0100
> +++ linux-2.6.19-rc6-ed/kernel/rcupdate.c 2006-11-22 15:12:09.0 +0100
> @@ -235,12 +235,14 @@ static void rcu_do_batch(struct rcu_data
> 
>   list = rdp->donelist;
>   while (list) {
> - next = rdp->donelist = list->next;
> + next = list->next;
> + prefetch(next);
>   list->func(list);
>   list = next;
>   if (++count >= rdp->blimit)
>   break;
>   }
> + rdp->donelist = list;
> 
>   local_irq_disable();
>   rdp->qlen -= count;
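
The loop shape in the quoted patch can be tried outside the kernel. The sketch below is a userspace model, not the kernel function: `struct cb_head`, `run_batch()` and the counting callback are illustrative names, and the GCC/Clang builtin `__builtin_prefetch()` stands in for the kernel's prefetch(). It keeps the two properties of the patch: the next node is fetched before the current callback runs, and the remaining list is stored once after the loop rather than on every iteration.

```c
#include <assert.h>
#include <stddef.h>

struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Invoke up to blimit callbacks; return the unprocessed remainder and
 * report the number processed through *done. */
struct cb_head *run_batch(struct cb_head *list, int blimit, int *done)
{
	struct cb_head *next;
	int count = 0;

	while (list) {
		next = list->next;
		__builtin_prefetch(next);	/* hint only; NULL is harmless */
		list->func(list);		/* may free 'list' */
		list = next;
		if (++count >= blimit)
			break;
	}
	*done = count;
	return list;	/* caller stores this once, like rdp->donelist */
}

/* Example callback wrapper: bumps a shared counter. */
struct counted_cb {
	struct cb_head head;	/* must stay the first member */
	int *counter;
};

void count_cb(struct cb_head *head)
{
	struct counted_cb *c = (struct counted_cb *)head;

	(*c->counter)++;
}
```

Timing this loop with and without the prefetch line, and separately with the per-iteration store put back, would be one way to split the two effects asked about above.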



Infinite retries reading the partition table

2006-11-29 Thread Luben Tuikov
Suppose reading sector 0 always reports an error,
sense key HARDWARE ERROR.

What I'm observing is that the request to read sector 0,
reading partition information, is retried forever, ad infinitum.

Does anyone have a patch to resolve this? (2.6.19-rc6)

Thanks,
Luben



Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread David Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 17:08:35 -0800

> On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> David Miller <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Please, it is very difficult to review your work the way you have
> > submitted this patch as a set of 4 patches.  These patches have not
> > been split up "logically", but rather they have been split up "per
> > file" with the same exact changelog message in each patch posting.
> > This is very clumsy, and impossible to review, and wastes a lot of
> > mailing list bandwith.
> > 
> > We have an excellent file, called Documentation/SubmittingPatches, in
> > the kernel source tree, which explains exactly how to do this
> > correctly.
> > 
> > By splitting your patch into 4 patches, one for each file touched,
> > it is impossible to review your patch as a logical whole.
> > 
> > Please also provide your patch inline so people can just hit reply
> > in their mail reader client to quote your patch and comment on it.
> > This is impossible with the attachments you've used.
> > 
> 
> Here you go - joined up, cleaned up, ported to mainline and test-compiled.
> 
> That yield() will need to be removed - yield()'s behaviour is truly awful
> if the system is otherwise busy.  What is it there for?

What about simply turning off CONFIG_PREEMPT to fix this "problem"?

We always properly run the backlog (by doing a release_sock()) before
going to sleep otherwise except for the specific case of taking a page
fault during the copy to userspace.  It is only CONFIG_PREEMPT that
can cause this situation to occur in other circumstances as far as I
can see.

We could also pepper tcp_recvmsg() with some very carefully placed
preemption disable/enable calls to deal with this even with
CONFIG_PREEMPT enabled.


Re: 2.6.19-rc6-mm2: uli526x only works after reload

2006-11-29 Thread Rafael J. Wysocki
On Thursday, 30 November 2006 00:26, Andrew Morton wrote:
> On Thu, 30 Nov 2006 00:08:21 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > On Wednesday, 29 November 2006 22:31, Rafael J. Wysocki wrote:
> > > On Wednesday, 29 November 2006 22:30, Andrew Morton wrote:
> > > > On Wed, 29 Nov 2006 21:08:00 +0100
> > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > On Wednesday, 29 November 2006 20:54, Rafael J. Wysocki wrote:
> > > > > > On Tuesday, 28 November 2006 11:02, Andrew Morton wrote:
> > > > > > > 
> > > > > > > Temporarily at
> > > > > > > 
> > > > > > > http://userweb.kernel.org/~akpm/2.6.19-rc6-mm2/
> > > > > > > 
> > > > > > > Will appear eventually at
> > > > > > > 
> > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc6/2.6.19-rc6-mm2/
> > > > > > 
> > > > > > A minor issue: on one of my (x86-64) test boxes the uli526x driver 
> > > > > > doesn't
> > > > > > work when it's first loaded.  I have to rmmod and modprobe it to 
> > > > > > make it work.
> > > > 
> > > > That isn't a minor issue.
> > > > 
> > > > > > It worked just fine on -mm1, so something must have happened to it 
> > > > > > recently.
> > > > > 
> > > > > Sorry, I was wrong.  The driver doesn't work at all, even after 
> > > > > reload.
> > > > > 
> > > > 
> > > > tulip-dmfe-carrier-detection-fix.patch was added in rc6-mm2.  But you're
> > > > > not using that (correct?)
> > > > 
> > > > git-netdev-all changes drivers/net/tulip/de2104x.c, but you're not using
> > > > that either.
> > > > 
> > > > git-powerpc(!) alters drivers/net/tulip/de4x5.c, but you're not using 
> > > > that.
> > > > 
> > > > Beats me, sorry.  Perhaps it's due to changes in networking core.  It's
> > > > presumably a showstopper for statically-linked-uli526x users.  If you 
> > > > could
> > > > bisect it, please?  I'd start with git-netdev-all, then tulip-*.
> > > 
> > > OK, but it'll take some time.
> > 
> > OK, done.
> > 
> > It's one of these (the first one alone doesn't compile):
> > 
> > git-netdev-all.patch
> > git-netdev-all-fixup.patch
> > libphy-dont-do-that.patch
> 
> Are you able to eliminate libphy-dont-do-that.patch?
> 
> > Is a broken-out version of git-netdev-all.patch available from somewhere?
> 
> Nope, and my few fumbling attempts to generate the sort of patch series
> which you want didn't work out too well.  One has to downgrade to
> git-bisect :(
> 
> What does "doesn't work" mean, btw?

Well, it turns out not to be 100% reproducible.  I can only reproduce it after
a soft reboot (eg. shutdown -r now).

Then, while configuring network interfaces the system says the interface name
is ethxx0, but it should be eth1 (eth0 is an RTL-8139, which is not used).  Now
if I run ifconfig, it says:

eth0: error fetching interface information: Device not found

and that's all (normally, ifconfig would show the information for lo and eth1,
without eth0).  Moreover, 'ifconfig eth1' says:

eth1: error fetching interface information: Device not found

Next, I run 'rmmod uli526x' and 'modprobe uli526x' and then 'ifconfig' is
still saying the above (about eth0), but 'ifconfig eth1' seems to work as
it should.  However, the interface often fails to transfer anything after
that.

Greetings,
Rafael


-- 
You never change things by fighting the existing reality.
R. Buckminster Fuller


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread Andrew Morton
On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> 
> Please, it is very difficult to review your work the way you have
> submitted this patch as a set of 4 patches.  These patches have not
> been split up "logically", but rather they have been split up "per
> file" with the same exact changelog message in each patch posting.
> This is very clumsy, and impossible to review, and wastes a lot of
> mailing list bandwidth.
> 
> We have an excellent file, called Documentation/SubmittingPatches, in
> the kernel source tree, which explains exactly how to do this
> correctly.
> 
> By splitting your patch into 4 patches, one for each file touched,
> it is impossible to review your patch as a logical whole.
> 
> Please also provide your patch inline so people can just hit reply
> in their mail reader client to quote your patch and comment on it.
> This is impossible with the attachments you've used.
> 

Here you go - joined up, cleaned up, ported to mainline and test-compiled.

That yield() will need to be removed - yield()'s behaviour is truly awful
if the system is otherwise busy.  What is it there for?



From: Wenji Wu <[EMAIL PROTECTED]>

For Linux TCP, when the network application makes a system call to move data
from the socket's receive buffer to user space by calling tcp_recvmsg(), the
socket is locked.  During this period, all incoming packets for the TCP
socket go to the backlog queue without being TCP processed.

Since Linux 2.6 can be interrupted mid-task, if the network application's
timeslice expires and it is moved to the expired array with the socket locked,
none of the packets in the backlog queue will be TCP processed until the
network application resumes execution.  If the system is heavily loaded, TCP
can easily RTO on the sender side.
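
To make the queueing behaviour described above concrete, here is a small
user-space model in C.  It is only a sketch: struct sock_model,
deliver_packet(), lock_sock_model() and release_sock_model() are made-up
names, and plain counters stand in for the real receive and backlog queues.
It assumes only what the changelog states, namely that packets arriving
while the socket is locked are parked on the backlog and are processed when
the owner releases the lock.

```c
/* User-space model of the socket backlog; all names are hypothetical. */
struct sock_model {
	int owned_by_user;	/* set while the application holds the socket lock */
	int processed;		/* packets that went through TCP processing */
	int backlog;		/* packets parked until the lock is released */
};

static void lock_sock_model(struct sock_model *sk)
{
	sk->owned_by_user = 1;
}

/* Called once for each incoming packet. */
static void deliver_packet(struct sock_model *sk)
{
	if (sk->owned_by_user)
		sk->backlog++;		/* deferred: not TCP processed yet */
	else
		sk->processed++;	/* processed immediately */
}

/* Releasing the lock is what finally runs the backlog. */
static void release_sock_model(struct sock_model *sk)
{
	sk->processed += sk->backlog;
	sk->backlog = 0;
	sk->owned_by_user = 0;
}
```

In this model, if the task holding the lock does not run between
lock_sock_model() and release_sock_model(), every packet delivered in the
meantime stays in backlog until the task runs again.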



 include/linux/sched.h |2 ++
 kernel/fork.c |3 +++
 kernel/sched.c|   24 ++--
 net/ipv4/tcp.c|9 +
 4 files changed, 32 insertions(+), 6 deletions(-)

diff -puN net/ipv4/tcp.c~tcp-speedup net/ipv4/tcp.c
--- a/net/ipv4/tcp.c~tcp-speedup
+++ a/net/ipv4/tcp.c
@@ -1109,6 +1109,8 @@ int tcp_recvmsg(struct kiocb *iocb, stru
struct task_struct *user_recv = NULL;
int copied_early = 0;
 
+   current->backlog_flag = 1;
+
lock_sock(sk);
 
TCP_CHECK_TIMER(sk);
@@ -1468,6 +1470,13 @@ skip_copy:
 
TCP_CHECK_TIMER(sk);
release_sock(sk);
+
+   current->backlog_flag = 0;
+   if (current->extrarun_flag == 1){
+   current->extrarun_flag = 0;
+   yield();
+   }
+
return copied;
 
 out:
diff -puN include/linux/sched.h~tcp-speedup include/linux/sched.h
--- a/include/linux/sched.h~tcp-speedup
+++ a/include/linux/sched.h
@@ -1023,6 +1023,8 @@ struct task_struct {
 #ifdef CONFIG_TASK_DELAY_ACCT
struct task_delay_info *delays;
 #endif
+   int backlog_flag;   /* packets wait in tcp backlog queue flag */
+   int extrarun_flag;  /* extra run flag for TCP performance */
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
diff -puN kernel/sched.c~tcp-speedup kernel/sched.c
--- a/kernel/sched.c~tcp-speedup
+++ a/kernel/sched.c
@@ -3099,12 +3099,24 @@ void scheduler_tick(void)
 
if (!rq->expired_timestamp)
rq->expired_timestamp = jiffies;
-   if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
-   enqueue_task(p, rq->expired);
-   if (p->static_prio < rq->best_expired_prio)
-   rq->best_expired_prio = p->static_prio;
-   } else
-   enqueue_task(p, rq->active);
+   if (p->backlog_flag == 0) {
+   if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
+   enqueue_task(p, rq->expired);
+   if (p->static_prio < rq->best_expired_prio)
+   rq->best_expired_prio = p->static_prio;
+   } else
+   enqueue_task(p, rq->active);
+   } else {
+   if (expired_starving(rq)) {
+   enqueue_task(p,rq->expired);
+   if (p->static_prio < rq->best_expired_prio)
+   rq->best_expired_prio = p->static_prio;
+   } else {
+   if (!TASK_INTERACTIVE(p))
+   p->extrarun_flag = 1;
+   enqueue_task(p,rq->active);
+   }
+   }
} else {
/*
 * Prevent a too long timeslice allowing a task to monopolize
diff -puN kernel/fork.c~tcp-speedup kernel/fork.c
--- a/kernel/fork.c~tcp-speedup
+++ a/kernel/fork.c
@@ -1032,6 +1032,9 @@ static struct task_struct *copy_process(

Re: Linux 2.6.19

2006-11-29 Thread Randy Dunlap
On Wed, 29 Nov 2006 18:56:31 -0600 Greg Norris wrote:

> On Wed, Nov 29, 2006 at 03:11:11PM -0800, Randy Dunlap wrote:
> > What would it take to have the kernel.org web page and finger banner
> > give the correct version information?  (yessir, not your problem)
> 
> On a similar vein, it'd be nice if http://www.kernel.org/kdist/version.html 
> would break the entries into separate lines.


I prefer to use
http://www.kernel.org/kdist/finger_banner
for that.  And script it so that I can just type:

$ kcurrent
to see it.

---
~Randy


RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-29 Thread David Schwartz

Ask yourself this question: Can an assignment to a non-volatile variable be
optimized out? Then ask yourself this question: Does casting away volatile
make it not volatile any more?

> The volatile'ness does not simply disappear the moment you
> assign the result to some local variable which is not volatile.

Yes, it does. That's what a cast does, it tells the compiler to, in all
respects, pretend that a variable is of a different type than it 'actually
is', such that it actually isn't anymore.

> Half of our drivers would break if this were true.

On the contrary, they'd break if it was true. If casting away volatile
didn't make it go away, then casting in volatile wouldn't have to make it
appear. A cast causes the compiler to act as if a variable really was the
type you cast it to. If you cast volatile away, that has the reverse of the
same effect casting to volatile has.

The 'readl' function should actually assign the value to a volatile
variable. Assignments to volatiles cannot be cast away, but casts can and
assignments to non-volatile variables can be optimized out.
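
For reference, the pattern under discussion looks roughly like the sketch
below.  readl_model() and wait_for_change() are hypothetical stand-ins, not
the kernel's readl() or wait_hpet_tick().  The read goes through a
volatile-qualified lvalue, which the C standard treats as a side effect that
has to be performed; the non-volatile local that receives the value carries
no such guarantee of its own.

```c
#include <stdint.h>

/* Hypothetical accessor: the cast makes this one access volatile. */
static inline uint32_t readl_model(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/*
 * Poll a register until it differs from `start`, giving up after
 * `max_polls` reads.  Returns the number of reads performed, or -1.
 */
static int wait_for_change(const volatile void *reg, uint32_t start,
			   int max_polls)
{
	int polls = 0;

	while (polls < max_polls) {
		uint32_t now = readl_model(reg);	/* non-volatile local */

		polls++;
		if (now != start)
			return polls;
	}
	return -1;
}
```

Whether a given compiler version honours this in every case is exactly what
the thread is arguing about; the sketch only shows where the volatile
qualifier sits.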

DS




Re: Bug 7596 - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread David Miller

The delays dealt with in your paper might actually help a highly
loaded server with lots of sockets and threads trying to communicate.

The packet processing delays caused by the scheduling delay pace the
TCP sender by controlling the rate at which ACKs go back to that
sender.  Those ACKs will go out paced to the rate at which the
sleeping TCP receiver gets back onto the cpu, and this will cause the
TCP sender to naturally adjust to the overall processing rate of the
receiver system, on a per-connection basis.

Perhaps try a system with hundreds of processes and potentially
hundreds of thousands of TCP sockets, with thousands of unique sender
sites, and see what happens.

This is a topic similar to TSO, where we are trying to balance the
gains from batching work against the losses from gaps in the communication
stream.


Re: [v4l-dvb-maintainer] [2.6 patch] remove DVB_AV7110_FIRMWARE

2006-11-29 Thread Oliver Endriss
Adrian Bunk wrote:
> On Tue, Nov 28, 2006 at 08:45:56PM -0800, Trent Piepho wrote:
> > On Wed, 29 Nov 2006, Adrian Bunk wrote:
> > > On Tue, Nov 28, 2006 at 01:06:02PM -0800, Trent Piepho wrote:
> > > > On Sun, 26 Nov 2006, Adrian Bunk wrote:
> > > > > DVB_AV7110_FIRMWARE was (except for some OSS drivers) the only option
> > > > > that was still compiling a binary-only user-supplied firmware file at
> > > > > build-time into the kernel.
> > > > >
> > > > > This patch changes the driver to always use the standard
> > > > > request_firmware() way for firmware by removing DVB_AV7110_FIRMWARE.
> > > >
> > > > Doesn't this also prevent the AV7110 module from getting compiled
> > > > into the kernel?  Shouldn't the Kconfig file be adjusted so
> > > > that 'y' can't be selected anymore and it depends on MODULES?
> > >
> > > No.
> > > No.
> > >
> > > request_firmware() works fine for built-in drivers.
> > 
> > Wouldn't that require loading the firmware file before the filesystems are
> > mounted?
> 
> Sure.

And you have to create an initrd for the firmware!

As I wrote before: 
I NAK any attempt to remove this option.

The option _is_ useful because it allows a user to build an av7110 driver
without hotplug, initrd etc.

Nobody has to use this option, but it should be possible to do so.

CU
Oliver

-- 

VDR Remote Plugin 0.3.8 available at
http://www.escape-edv.de/endriss/vdr/




[PATCH -mm] x86_64 UP needs smp_call_function_single

2006-11-29 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

smp_call_function_single() needs to be visible in non-SMP builds, to fix:

arch/x86_64/kernel/vsyscall.c:283: warning: implicit declaration of function 'smp_call_function_single'

The (other/trivial) fix (instead of this one) is to add:
#include 
to linux-2.6.19-rc6-mm2/arch/x86_64/kernel/vsyscall.c

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 include/asm-x86_64/smp.h |7 ---
 include/linux/smp.h  |7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

--- linux-2.6.19-rc6-mm2.orig/include/asm-x86_64/smp.h
+++ linux-2.6.19-rc6-mm2/include/asm-x86_64/smp.h
@@ -113,13 +113,6 @@ static __inline int logical_smp_processo
 #define cpu_physical_id(cpu)   x86_cpu_to_apicid[cpu]
 #else
 #define cpu_physical_id(cpu)   boot_cpu_id
-static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
-   void *info, int retry, int wait)
-{
-   /* Disable interrupts here? */
-   func(info);
-   return 0;
-}
 #endif /* !CONFIG_SMP */
 #endif
 
--- linux-2.6.19-rc6-mm2.orig/include/linux/smp.h
+++ linux-2.6.19-rc6-mm2/include/linux/smp.h
@@ -99,6 +99,13 @@ static inline int up_smp_call_function(v
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus() 1
 #define smp_prepare_boot_cpu() do {} while (0)
+static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
+   void *info, int retry, int wait)
+{
+   /* Disable interrupts here? */
+   func(info);
+   return 0;
+}
 
 #endif /* !SMP */
 


---
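
The open question in the comment ("Disable interrupts here?") can be
illustrated with a user-space model.  This is a sketch only: irqs_enabled,
local_irq_disable_model() and local_irq_enable_model() are made-up stand-ins
for the real interrupt primitives, and up_call_function_single() is a
hypothetical name.  It assumes the goal is for func to run with local
interrupts off, to match the SMP version.

```c
/* User-space model; the flag and helpers below are hypothetical stand-ins. */
static int irqs_enabled = 1;

static void local_irq_disable_model(void) { irqs_enabled = 0; }
static void local_irq_enable_model(void)  { irqs_enabled = 1; }

/* UP variant: there is only one CPU, so just run func locally. */
static int up_call_function_single(int cpuid, void (*func)(void *info),
				   void *info, int retry, int wait)
{
	(void)cpuid;	/* unused on UP */
	(void)retry;
	(void)wait;

	local_irq_disable_model();
	func(info);
	local_irq_enable_model();
	return 0;
}

/* Example callback: records whether it ran with interrupts off. */
static void record_irq_state(void *info)
{
	*(int *)info = irqs_enabled;
}
```

Whether to use an unconditional disable/enable pair or to save and restore
the previous state is a separate decision that the sketch does not settle.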


PM-Timer clock source is slow. Try something else: How slow? What other source(s)?

2006-11-29 Thread Linda Walsh

I recently noticed this message in my bootup that I don't remember
from before:

PCI: Probing PCI hardware (bus 00)
* Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
* this clock source is slow. Consider trying other clock sources
--
   How would this affect my clock?  It says to try another
clock source, what type of clock source would it be suggesting I
use? Another chip already in the computer? It is an Intel 440BX
chipset; on an Dell motherboard. Would that be likely to have
another chip source that is compensating?

I don't notice a significant clock slowdown, but I'm running NTP,
so that could be masking the problem.

NTP values appear to indicate smallish values for clock variance, but I'm
not sure what is "standardly" considered good or bad, so I don't have
anything to compare to.

Relevant ntp time vars show:
leap indicator:   00
stratum:  2
precision:-20
root distance:0.01445 s
root dispersion:  0.01372 s
jitter:   0.002335 s
stability:58.565 ppm
broadcastdelay:   0.003998 s
---
 maximum error 130449 us, estimated error 1923 us
ntp_adjtime() returns code 0 (OK)
 offset 1384.000 us, frequency 74.584 ppm, interval 1 s,
 maximum error 130449 us, estimated error 1923 us,
 status 0x1 (PLL),
 time constant 3, precision 1.000 us, tolerance 512 ppm,


   It seems the estimated error is 1.923ms, with a precision
of 1us.
   Is the clock "slowness" indicated by the
"offset 1384us, 74.584ppm @ interval 1s"?  I.e. do I read that
as the clock is off by 74.584ppm, or ~75us/sec, or do I look
at the offset of 1384us/sec, meaning off by 1.384ms/s (wouldn't
that be 1384ppm?).  Seems the stability is fairly low, on the
order of 58.565ppm, or about .058ms/s?
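
A quick way to sanity-check the units: 1 ppm is one part in a million, so a
clock that is 1 ppm fast or slow drifts by 1 us per second.  The helpers
below are a hypothetical illustration in C, not part of any NTP tool.  They
also assume the usual reading of the ntp_adjtime() fields, where "frequency"
is a rate in ppm and "offset" is a one-time difference in microseconds
rather than a rate.

```c
/*
 * Drift in microseconds accumulated over `seconds` at a rate of `ppm`.
 * 1 ppm == 1 us of drift per second.
 */
static double drift_us(double ppm, double seconds)
{
	return ppm * seconds;
}

/* The same rate expressed as milliseconds of drift per day. */
static double drift_ms_per_day(double ppm)
{
	return drift_us(ppm, 86400.0) / 1000.0;
}
```

With the frequency value quoted above, drift_us(74.584, 1.0) is about 75 us
per second, and drift_ms_per_day(74.584) is roughly 6.4 seconds per day if
left uncorrected.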

   Seems like fewer questions are being answered these days than
in days past.  Is this because of a change in the list focus (maybe
all the patches being submitted),
- or change in list membership, i.e. fewer people up-to-speed on
older HW,
- or increased specialization in specific kernel areas with fewer
having knowledge outside their specific domain, or
what?

   It is an ugly tradeoff between development time spent
and answering questions that might increase understanding of
people on the list (or maybe it's such common knowledge that
no one bothers to answer...  dunno...)

but thanks for any ideas...especially on the original issue.

Linda W.



Re: Linux 2.6.19

2006-11-29 Thread Greg Norris
On Wed, Nov 29, 2006 at 03:11:11PM -0800, Randy Dunlap wrote:
> What would it take to have the kernel.org web page and finger banner
> give the correct version information?  (yessir, not your problem)

On a similar vein, it'd be nice if http://www.kernel.org/kdist/version.html 
would break the entries into separate lines.


Re: [PATCH] mips tx4927 missing brace fix

2006-11-29 Thread Atsushi Nemoto
On Wed, 29 Nov 2006 19:43:46 +, Ralf Baechle <[EMAIL PROTECTED]> wrote:
> On Wed, Nov 29, 2006 at 08:30:35PM +0100, Mariusz Kozlowski wrote:
> 
> > This patch adds missing brace at the end of 
> > toshiba_rbtx4927_irq_isa_init().
> 
> Thanks Mariusz!  Applied,

Oh, that was my fault.  Thank you.  I see the fix was folded into
linux-queue tree.  Thanks.

---
Atsushi Nemoto


Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-29 Thread David Miller

Please, it is very difficult to review your work the way you have
submitted this patch as a set of 4 patches.  These patches have not
been split up "logically", but rather they have been split up "per
file" with the same exact changelog message in each patch posting.
This is very clumsy, and impossible to review, and wastes a lot of
mailing list bandwidth.

We have an excellent file, called Documentation/SubmittingPatches, in
the kernel source tree, which explains exactly how to do this
correctly.

By splitting your patch into 4 patches, one for each file touched,
it is impossible to review your patch as a logical whole.

Please also provide your patch inline so people can just hit reply
in their mail reader client to quote your patch and comment on it.
This is impossible with the attachments you've used.

Thanks.


Re: [PATCH -mm 1/5][AIO] - Rework compat_sys_io_submit

2006-11-29 Thread Zach Brown

On Nov 29, 2006, at 2:32 AM, Sébastien Dugué wrote:


 compat_sys_io_submit() cleanup

  Cleanup compat_sys_io_submit by duplicating some of the native syscall
logic in the compat layer and directly calling io_submit_one() instead
of fooling the syscall into thinking it is called from a native 64-bit
caller.

  This is needed for the completion notification patch to avoid having
to rewrite each iocb on the caller stack for sys_io_submit() to find the
sigevents.


You could explicitly mention that this eliminates:

 - the overhead of copying nr pointers on the userspace caller's stack

 - the arbitrary PAGE_SIZE/(sizeof(void *)) limit on the number of iocbs
   that can be submitted


Those alone make this worth merging.


+   if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
+   return -EFAULT;


I'm glad you got that right :)  I no doubt would have initially hoisted
these little checks into a shared helper function and missed that detail
of getting the size of the access_ok() right in the compat case.
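
The size detail can be shown with a small sketch.  The functions below are
hypothetical; they only compute how many bytes of the user's pointer array
must be readable.  A 32-bit caller passes an array of 32-bit pointers, so
the compat path checks nr * sizeof(u32), while the native path would check
nr * sizeof(void *).

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit user pointer, as seen by the compat path (hypothetical name). */
typedef uint32_t compat_uptr_model;

/* Bytes of the iocb pointer array that must pass the access check. */
static size_t iocb_array_bytes(size_t nr, int is_compat)
{
	return nr * (is_compat ? sizeof(compat_uptr_model) : sizeof(void *));
}

/* The old limit mentioned earlier: native pointers that fit in one page. */
static size_t old_submit_limit(size_t page_size)
{
	return page_size / sizeof(void *);
}
```

A shared helper that always used sizeof(void *) would check twice as many
bytes as a 32-bit caller actually supplied on a 64-bit kernel.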



+   put_ioctx(ctx);
+
+   return i? i: ret;


sys_io_getevents() reads:

put_ioctx(ctx);
return i ? i : ret;

So while this compat_sys_io_submit() logic seems fine and I would be
comfortable with it landing as-is, I'd also appreciate it if we didn't
introduce differences between the two functions when it seems just as
easy to make them the same.  (That chunk is just one example.  There's
whitespace, missing unlikely()s, etc).


- z


hrtimer.h

2006-11-29 Thread Ariel Chávez Lorenzo
Hi,

Since the kernel 2.6.18 has incorporated the high
resolution timer itself, I'm trying to test it, but on
my GNU/Debian I can't figure out how to include
hrtimer.h, that is on /usr/src/linux/include/, the
headers.

I use the following command to try to compile it.

gcc -D__KERNEL__ -I /usr/src/linux/include ex.c


ex.c is just the inclusion of hrtimer.h

#include <linux/hrtimer.h>
int main()
{
 return 0;
}


and I get this:



In file included from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/asm/thread_info.h:16,
 from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/thread_info.h:21,
 from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/preempt.h:9,
 from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/spinlock.h:49,
 from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/seqlock.h:29,
 from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/time.h:7,
 from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/ktime.h:24,
 from
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/hrtimer.h:19,
 from ex.c:1:
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/asm/processor.h:80:
error: ‘CONFIG_X86_L1_CACHE_SHIFT’ undeclared here
(not in a function)
/usr/src/linux-headers-2.6.18sbr-24-11-06/include/asm/processor.h:80:
error: requested alignment is not a constant


I will appreciate any hint.
Thanks in advance..

Ariel






[PATCH] doc: atomic_add_unless() doesn't imply mb() on failure

2006-11-29 Thread Oleg Nesterov
Most implementations of atomic_add_unless() can fail (return 0) after the first
atomic_read() (before cmpxchg). In that case we have a compiler barrier only.
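
The typical shape of such an implementation can be modelled with C11
atomics.  This is a user-space sketch, not the kernel's code:
atomic_add_unless_model() is a made-up name and the memory orders are an
assumption.  The point is the early exit: if the first read already sees u,
the function returns 0 without ever reaching the compare-and-exchange.

```c
#include <stdatomic.h>

/* Add `a` to `*v` unless `*v` equals `u`; returns 1 on success, 0 otherwise. */
static int atomic_add_unless_model(atomic_int *v, int a, int u)
{
	/* Plain read first; stands in for atomic_read(). */
	int c = atomic_load_explicit(v, memory_order_relaxed);

	while (c != u) {
		/* On failure, c is reloaded with the value currently in *v. */
		if (atomic_compare_exchange_strong(v, &c, c + a))
			return 1;
	}
	return 0;	/* failure path: only the relaxed read was executed */
}
```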

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

 Documentation/atomic_ops.txt  |3 ++-
 Documentation/memory-barriers.txt |2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

--- 19-rc6/Documentation/memory-barriers.txt~doc 2006-11-27 21:20:20.0 +0300
+++ 19-rc6/Documentation/memory-barriers.txt 2006-11-30 03:32:06.0 +0300
@@ -1492,7 +1492,7 @@ about the state (old or new) implies an 
atomic_dec_and_test();
atomic_sub_and_test();
atomic_add_negative();
-   atomic_add_unless();
+   atomic_add_unless();/* when succeeds (returns 1) */
test_and_set_bit();
test_and_clear_bit();
test_and_change_bit();
--- 19-rc6/Documentation/atomic_ops.txt~doc 2006-07-29 05:05:33.0 +0400
+++ 19-rc6/Documentation/atomic_ops.txt 2006-11-30 03:22:58.0 +0300
@@ -137,7 +137,8 @@ If the atomic value v is not equal to u,
 returns non zero. If v is equal to u then it returns zero. This is done as
 an atomic operation.
 
-atomic_add_unless requires explicit memory barriers around the operation.
+atomic_add_unless requires explicit memory barriers around the operation
+unless it fails (returns 0).
 
 atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
 



[RFC PATCH -rt] RCU priority boosting that survives mild testing

2006-11-29 Thread Paul E. McKenney
This patch boosts the priority of RCU read-side critical sections when
they block to prevent them from being preempted by other non-realtime
threads.  This patch allows transitive boosting (e.g., to processes
holding locks waited on by the RCU read-side critical section) and
actually survives light testing, in contrast with its rather large
number of predecessors.  (All of which are preserved for posterity at
http://rdrop.com/users/paulmck/patches -- nothing to hide, so there!!!)

The trick is to provide a per-task mutex that is acquired when a task
enters the scheduler while in an RCU read-side critical section.  This
mutex is released by the outermost rcu_read_unlock().  This works even
if rcu_read_unlock() is invoked by (say) a hardware irq handler, since
the critical section cannot be preempted in that case.  One remaining
case not handled is the following:

rcu_read_lock();
/* code that might be preempted. */
local_irq_save(oldirq);
rcu_read_unlock();
local_irq_restore(oldirq);

If this case is important to you, please don't keep it a secret!!!

A separate task (not yet implemented, but in process) can then acquire
a given task's mutex, boosting its priority for the duration of the
RCU read-side critical section, as needed to expedite a given RCU
grace period.  The formerly painful races with rcu_read_unlock() are now
harmless -- the boosting task simply needlessly acquires and immediately
releases the mutex in that case.
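
The bookkeeping described above can be sketched as a single-threaded
user-space model.  All names here are hypothetical and a plain flag stands
in for the per-task mutex; the sketch only shows when the mutex would be
taken and dropped, not the priority inheritance itself.

```c
/* Single-threaded model; every name below is hypothetical. */
struct task_model {
	int rcu_nesting;	/* depth of rcu_read_lock() nesting */
	int boost_mutex_held;	/* stands in for the per-task mutex */
};

static void rcu_read_lock_model(struct task_model *t)
{
	t->rcu_nesting++;
}

/* Entering the scheduler inside a read-side critical section takes the mutex. */
static void schedule_model(struct task_model *t)
{
	if (t->rcu_nesting > 0 && !t->boost_mutex_held)
		t->boost_mutex_held = 1;
}

/* Only the outermost unlock releases it. */
static void rcu_read_unlock_model(struct task_model *t)
{
	if (--t->rcu_nesting == 0 && t->boost_mutex_held)
		t->boost_mutex_held = 0;
}
```

A booster task that grabs the mutex after the outermost unlock has already
run would, in this model, find it free, which matches the "needlessly
acquires and immediately releases" case above.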

There is a new CONFIG_PREEMPT_RCU_BOOST that enables the boosting,
defaulting to "n" because this code is quite new and because people
writing realtime applications that carefully avoid realtime-priority CPU
hogs may not want the degradation in scheduling latency that comes with
this patch.  This config variable should also greatly reduce the risk
that this patch might otherwise pose to innocent bystanders.

Some questions:

o   I currently unconditionally boost to the highest non-realtime
priority when a task blocks in an RCU read-side critical section.
This is to aid in testing, but I am thinking in terms of
removing it.  It degrades scheduling latency, and if there
is a real problem, the TBD booster task should kick it later.
Plus, getting rid of this would significantly reduce the size
and intrusiveness of the patch.  Does this approach make sense?

o   I believe I can acquire a mutex with impunity near the beginning
of __schedule().  I have a flag that prevents more than one
level of recursion in face of nested preemptions (e.g., due to
getting a scheduling-clock interrupt just as one was starting
__schedule() anyway).  Any gotchas I am missing?

o   Is the code snippet above likely to show up?  If it is, I would
check for interrupts disabled in rcu_read_unlock(), IPI myself if
so, and clean up in preempt_schedule_irq().  I would like to avoid
this due to the extra test on the preemption path.  Thoughts?

I am in the process of testing on 2.6.19-rc6-rt10.

Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
---

 include/linux/init_task.h  |   12 
 include/linux/rcupreempt.h |4 
 include/linux/sched.h  |   12 
 kernel/Kconfig.preempt |   11 +++
 kernel/rcupreempt.c|   23 ---
 kernel/rtmutex.c   |9 ++---
 kernel/sched.c |   17 +
 kernel/softirq.c   |1 +
 8 files changed, 83 insertions(+), 6 deletions(-)

diff -urpNa -X dontdiff linux-2.6.18-rt3/include/linux/init_task.h linux-2.6.18-rt3-rcubp/include/linux/init_task.h
--- linux-2.6.18-rt3/include/linux/init_task.h  2006-10-09 17:27:12.0 -0700
+++ linux-2.6.18-rt3-rcubp/include/linux/init_task.h  2006-11-27 11:04:03.0 -0800
@@ -91,6 +91,7 @@ extern struct group_info init_groups;
.prio   = MAX_PRIO-20,  \
.static_prio= MAX_PRIO-20,  \
.normal_prio= MAX_PRIO-20,  \
+   INIT_RCU_PRIO   \
.policy = SCHED_NORMAL, \
.cpus_allowed   = CPU_MASK_ALL, \
.mm = NULL, \
@@ -98,6 +99,7 @@ extern struct group_info init_groups;
.run_list   = LIST_HEAD_INIT(tsk.run_list), \
.ioprio = 0,\
.time_slice = HZ,   \
+   INIT_RCU_BOOST  \
.tasks  = LIST_HEAD_INIT(tsk.tasks),\
.ptrace_children= LIST_HEAD_INIT(tsk.ptrace_children),  \
.ptrace_list= 

Re: [PATCH] alternatives/paravirt: use NULL for pointers

2006-11-29 Thread Andi Kleen
On Wednesday 29 November 2006 22:17, Randy Dunlap wrote:
> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> Use NULL instead of 0 for pointers.
> 
> arch/x86_64/kernel/../../i386/kernel/alternative.c:432:18: warning: Using 
> plain integer as NULL pointer
> arch/x86_64/kernel/../../i386/kernel/alternative.c:432:44: warning: Using 
> plain integer as NULL pointer

I fixed it in the original patch thanks

-Andi


[PATCH] autofs: fix error code path in autofs_fill_sb()

2006-11-29 Thread Jiri Kosina
[PATCH] autofs: fix error code path in autofs_fill_sb()

When the kernel is compiled with the old version of autofs (CONFIG_AUTOFS_FS)
and a new automount daemon (observed at least with 5.x.x) is started, the
kernel correctly reports the version mismatch between kernel and userland
daemon, but then screws things up instead of handling the error correctly:

 autofs: kernel does not match daemon version
 =
 [ BUG: bad unlock balance detected! ]
 -
 automount/4199 is trying to release lock (>s_umount_key) at:
 [] get_sb_nodev+0x76/0xa4
 but there are no more locks to release!

 other info that might help us debug this:
 no locks held by automount/4199.

 stack backtrace:
  [] dump_trace+0x68/0x1b2
  [] show_trace_log_lvl+0x18/0x2c
  [] show_trace+0xf/0x11
  [] dump_stack+0x12/0x14
  [] print_unlock_inbalance_bug+0xe7/0xf3
  [] lock_release+0x8d/0x164
  [] up_write+0x14/0x27
  [] get_sb_nodev+0x76/0xa4
  [] vfs_kern_mount+0x83/0xf6
  [] do_kern_mount+0x2d/0x3e
  [] do_mount+0x607/0x67a
  [] sys_mount+0x72/0xa4
  [] sysenter_past_esp+0x5f/0x99
 DWARF2 unwinder stuck at sysenter_past_esp+0x5f/0x99
 Leftover inexact backtrace:
  ===

and then deadlock comes.

The problem: autofs_fill_super() returns EINVAL to get_sb_nodev(), but before
that, it calls kill_anon_super() to destroy the superblock which won't be 
needed. This is however way too soon to call kill_anon_super(), because 
get_sb_nodev() has to perform its own cleanup of the superblock first
(deactivate_super(), etc.). The correct time to call kill_anon_super() is in
the autofs_kill_sb() callback, which is called by deactivate_super() at proper
time, when the superblock is ready to be killed.

I can see the same faulty code path in autofs4 as well. This patch solves the
issue in both filesystems in the same way: it postpones kill_anon_super() until
deactivate_super() signals the proper time by calling the kill_sb() callback.
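
The ordering problem described above can be sketched as a small userspace
model. This is only an illustration, not kernel code: every *_model name is
hypothetical, and it assumes, as a simplification, that the superblock is
handed to fill_super() with s_umount held, that kill_anon_super() ends by
releasing s_umount, and that deactivate_super() retakes s_umount before
calling kill_sb().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy superblock: tracks only what matters for the lock-balance argument. */
struct sb_model {
    int umount_held;  /* 1 while the s_umount stand-in is write-held */
    int bad_unlocks;  /* unlock attempts made while the lock was not held */
    int kills;        /* number of kill_anon_super() stand-in calls */
    void *fs_info;    /* stand-in for sb->s_fs_info */
};

static void down_write_model(struct sb_model *sb) { sb->umount_held = 1; }

static void up_write_model(struct sb_model *sb)
{
    if (sb->umount_held)
        sb->umount_held = 0;
    else
        sb->bad_unlocks++;  /* the "bad unlock balance" case */
}

/* Assumed behaviour of kill_anon_super(): tear down, then drop s_umount. */
static void kill_anon_super_model(struct sb_model *sb)
{
    sb->kills++;
    up_write_model(sb);
}

/* Before the patch: the error path kills the superblock itself ... */
static int fill_super_old(struct sb_model *sb)
{
    sb->fs_info = NULL;
    kill_anon_super_model(sb);
    return -EINVAL;
}

/* ... and kill_sb() just returns when there is no private info. */
static void kill_sb_old(struct sb_model *sb)
{
    if (!sb->fs_info)
        return;
    kill_anon_super_model(sb);
}

/* After the patch: the error path only reports the error ... */
static int fill_super_new(struct sb_model *sb)
{
    sb->fs_info = NULL;
    return -EINVAL;
}

/* ... and kill_sb() always ends in kill_anon_super(). */
static void kill_sb_new(struct sb_model *sb)
{
    kill_anon_super_model(sb);
}

/* Simplified error path of get_sb_nodev() followed by deactivate_super(). */
static int get_sb_model(struct sb_model *sb,
                        int (*fill_super)(struct sb_model *),
                        void (*kill_sb)(struct sb_model *))
{
    int error;

    assert(fill_super != NULL && kill_sb != NULL);
    down_write_model(sb);       /* superblock arrives locked */
    error = fill_super(sb);
    if (error) {
        up_write_model(sb);     /* caller drops the lock ... */
        down_write_model(sb);   /* ... deactivate_super() retakes it */
        kill_sb(sb);
    }
    return error;
}
```

In this model the old pair produces one unbalanced unlock and leaves the lock
held afterwards, while the new pair releases it exactly once.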

Patch against 2.6.19-rc6-mm2.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

--- 

 fs/autofs/inode.c|4 ++--
 fs/autofs4/inode.c   |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/autofs/inode.c b/fs/autofs/inode.c
index 38ede5c..61e04ab 100644
--- a/fs/autofs/inode.c
+++ b/fs/autofs/inode.c
@@ -31,7 +31,7 @@ void autofs_kill_sb(struct super_block *
 * just exit when we are called from deactivate_super.
 */
if (!sbi)
-   return;
+   goto out_kill_sb;
 
if ( !sbi->catatonic )
autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */
@@ -44,6 +44,7 @@ void autofs_kill_sb(struct super_block *
 
kfree(sb->s_fs_info);
 
+out_kill_sb:
DPRINTK(("autofs: shutting down\n"));
kill_anon_super(sb);
 }
@@ -209,7 +210,6 @@ fail_iput:
 fail_free:
kfree(sbi);
s->s_fs_info = NULL;
-   kill_anon_super(s);
 fail_unlock:
return -EINVAL;
 }
diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index ce7c0f1..be14200 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -155,7 +155,7 @@ void autofs4_kill_sb(struct super_block
 * just exit when we are called from deactivate_super.
 */
if (!sbi)
-   return;
+   goto out_kill_sb;
 
sb->s_fs_info = NULL;
 
@@ -167,6 +167,7 @@ void autofs4_kill_sb(struct super_block
 
kfree(sbi);
 
+out_kill_sb:
DPRINTK("shutting down");
kill_anon_super(sb);
 }
@@ -426,7 +427,6 @@ fail_ino:
 fail_free:
kfree(sbi);
s->s_fs_info = NULL;
-   kill_anon_super(s);
 fail_unlock:
return -EINVAL;
 }


-- 
Jiri Kosina
SUSE Labs



Re: Linux 2.6.19

2006-11-29 Thread Udo A. Steinberg
On Wed, 29 Nov 2006 14:21:21 -0800 (PST) Linus Torvalds (LT) wrote:

LT> So go get it. It's one of those rare "perfect" kernels. So if it doesn't 
LT> happen to compile with your config (or it does compile, but then does 
LT> unspeakable acts of perversion with your pet dachshund), you can rest easy 
LT> knowing that it's all your own d*mn fault, and you should just fix your 
LT> evil ways.

Ok, so 2.6.18 used to get along fine with cryptoloop and 2.6.19 refuses to
cooperate. An strace of "losetup -e aes /dev/loop0 /dev/hda7" without all the
terminal interaction shows:

open("/dev/hda7", O_RDWR|O_LARGEFILE)   = 3
open("/dev/loop0", O_RDWR|O_LARGEFILE)  = 4
mlockall(MCL_CURRENT|MCL_FUTURE)= 0
...
munmap(0xb7fc8000, 4096)= 0
ioctl(4, 0x4c00, 0x3)   = 0
close(3)= 0
ioctl(4, 0x4c04, 0xbfc21670)= -1 ENOENT (No such file or directory)
ioctl(4, 0x4c02, 0xbfc215e0)= -1 ENOENT (No such file or directory)
dup(2)  = 3
fcntl64(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat64(3, {st_mode=S_IFCHR|0720, st_rdev=makedev(4, 1), ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fc8000
_llseek(3, 0, 0xbfc21040, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(3, "ioctl: LOOP_SET_STATUS: No such "..., 50ioctl: LOOP_SET_STATUS: No such file or directory) = 50
close(3)= 0
munmap(0xb7fc8000, 4096)= 0
ioctl(4, 0x4c01, 0) = 0
close(4)= 0
exit_group(1)   = ?

Linux 2.6.18 does not fail at

ioctl(4, 0x4c04, ...)

I know that dm-crypt is now the preferred method of doing such things, but as
long as cryptoloop exists in the kernel I'd expect it to work.
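
For reading the trace above, the raw request numbers can be mapped back to
names. The sketch below hard-codes the values as I understand them from the
kernel's <linux/loop.h>; the helper names loop_ioctl_name() and
loop_ioctl_demo() are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Map a loop-device ioctl request number to its symbolic name.
 * Values copied from <linux/loop.h>; the function itself is hypothetical. */
static const char *loop_ioctl_name(unsigned long request)
{
    switch (request) {
    case 0x4C00: return "LOOP_SET_FD";
    case 0x4C01: return "LOOP_CLR_FD";
    case 0x4C02: return "LOOP_SET_STATUS";
    case 0x4C03: return "LOOP_GET_STATUS";
    case 0x4C04: return "LOOP_SET_STATUS64";
    case 0x4C05: return "LOOP_GET_STATUS64";
    default:     return "unknown";
    }
}

/* Usage example: the two requests that fail with ENOENT in the trace. */
static void loop_ioctl_demo(void)
{
    assert(strcmp(loop_ioctl_name(0x4c04), "LOOP_SET_STATUS64") == 0);
    assert(strcmp(loop_ioctl_name(0x4c02), "LOOP_SET_STATUS") == 0);
}
```

Read that way, the sequence is LOOP_SET_FD (succeeds), LOOP_SET_STATUS64 and
then LOOP_SET_STATUS (both ENOENT), and finally LOOP_CLR_FD to undo the
binding.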

Cheers,
- Udo



Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

2006-11-29 Thread Oleg Nesterov
On 11/29, Paul E. McKenney wrote:
>
> On Wed, Nov 29, 2006 at 11:16:46PM +0300, Oleg Nesterov wrote:
> > 
> > Hmm... SRCU can't be used from irq, yes. But I think that both versions
> > (spinlock needs _irqsave) can ?
> 
> I didn't think you could call wait_event() from irq.

Ah, sorry for the confusion, I was talking only about read lock/unlock, of course.

Just in case: it is not safe to do srcu_read_{,un}lock() from irq, because of
the plain increment

per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++

we would need local_t for that.

> For the locked version, you would also need spin_lock_irqsave() or some
> such to avoid self-deadlock.
> 
> For the atomic version, the fact that synchronize_qrcu() increments
> the new counter before decrementing the old one should mean that
> qrcu_read_lock() and qrcu_read_unlock() can be called from irq.

Yes, exactly! There is another reason: suppose we did

qp->completed++;
atomic_inc(qp->ctr + (idx ^ 0x1));

In that case the reader could be stalled (it would spin on the new,
still-zero counter) if synchronize_qrcu() is preempted in between.
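
The window can be shown with a single-threaded userspace sketch that stops the
flip halfway and lets a reader make one attempt. It uses C11 atomics as
stand-ins for the kernel's atomic_t helpers; struct qrcu_sketch and all
function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct qrcu_sketch {
    int completed;
    atomic_int ctr[2];
};

/* Userspace stand-in for atomic_inc_not_zero(). */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* One pass of the reader loop: the index on success, -1 if it must retry. */
static int read_lock_once(struct qrcu_sketch *qp)
{
    int idx = qp->completed & 0x1;

    assert(idx == 0 || idx == 1);
    return inc_not_zero(&qp->ctr[idx]) ? idx : -1;
}

/* State with one reader inside: bias (1) + reader (1) on ctr[0]. */
static void init_with_reader(struct qrcu_sketch *qp)
{
    qp->completed = 0;
    atomic_init(&qp->ctr[0], 2);
    atomic_init(&qp->ctr[1], 0);
}

/* First half of the flip, new counter incremented first. */
static void half_flip_inc_first(struct qrcu_sketch *qp)
{
    int idx = qp->completed & 0x1;

    atomic_fetch_add(&qp->ctr[idx ^ 0x1], 1);
}

/* First half of the flip in the reversed order discussed above. */
static void half_flip_completed_first(struct qrcu_sketch *qp)
{
    qp->completed++;
}
```

With the new counter incremented first, a reader arriving in the window still
gets index 0; with the reversed order it sees index 1 with a zero counter and
has to retry until the writer runs again.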

> But synchronize_qrcu() must be called from process context, since it
> can block.

Surely.

Oleg.



Re: Linux 2.6.19

2006-11-29 Thread Linus Torvalds


On Wed, 29 Nov 2006, Randy Dunlap wrote:

> On Wed, 29 Nov 2006 23:21:12 + Alan wrote:
> 
> > On Wed, 29 Nov 2006 15:11:11 -0800
> > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > 
> > > What would it take to have the kernel.org web page and finger banner
> > > give the correct version information?  
> > 
> > Patience 8)
> 
> OK.  How many days?

It _should_ update automatically once everything has mirrored out. 

Linus


[RFC, PATCH 1/2] qrcu: "quick" srcu implementation

2006-11-29 Thread Oleg Nesterov
Very much based on ideas, corrections, and patient explanations from
Alan and Paul.

The current srcu implementation is very good for readers: lock/unlock
are extremely cheap. But for that reason it is not possible to avoid
synchronize_sched() and polling in synchronize_srcu().

Jens Axboe wrote:
>
> It works for me, but the overhead is still large. Before it would take
> 8-12 jiffies for a synchronize_srcu() to complete without there actually
> being any reader locks active, now it takes 2-3 jiffies. So it's
> definitely faster, and as suspected the loss of two of three
> synchronize_sched() cut down the overhead to a third.

'qrcu' behaves the same as srcu but is optimized for writers. The fast path
for synchronize_qrcu() is mutex_lock() + atomic_read() + mutex_unlock().
The slow path is __wait_event(), with no polling. However, the reader does
an atomic inc/dec on lock/unlock, and the counters are not per-cpu.

Also, unlike srcu, qrcu read lock/unlock can be used in interrupt context,
and 'qrcu_struct' can be compile-time initialized.

See also the (long) discussion:
http://marc.theaimsgroup.com/?t=11637085763

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 19-rc6/include/linux/srcu.h~1_qrcu  2006-11-17 19:42:31.0 +0300
+++ 19-rc6/include/linux/srcu.h 2006-11-29 20:22:37.0 +0300
@@ -27,6 +27,8 @@
 #ifndef _LINUX_SRCU_H
 #define _LINUX_SRCU_H
 
+#include 
+
 struct srcu_struct_array {
int c[2];
 };
@@ -50,4 +52,24 @@ void srcu_read_unlock(struct srcu_struct
 void synchronize_srcu(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
 
+/*
+ * fully compatible with srcu, but optimized for writers.
+ */
+
+struct qrcu_struct {
+   int completed;
+   atomic_t ctr[2];
+   wait_queue_head_t wq;
+   struct mutex mutex;
+};
+
+int init_qrcu_struct(struct qrcu_struct *qp);
+int qrcu_read_lock(struct qrcu_struct *qp);
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
+void synchronize_qrcu(struct qrcu_struct *qp);
+
+static inline void cleanup_qrcu_struct(struct qrcu_struct *qp)
+{
+}
+
 #endif
--- 19-rc6/kernel/srcu.c~1_qrcu 2006-11-17 19:42:31.0 +0300
+++ 19-rc6/kernel/srcu.c2006-11-29 20:09:49.0 +0300
@@ -256,3 +256,55 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock);
 EXPORT_SYMBOL_GPL(synchronize_srcu);
 EXPORT_SYMBOL_GPL(srcu_batches_completed);
 EXPORT_SYMBOL_GPL(srcu_readers_active);
+
+int init_qrcu_struct(struct qrcu_struct *qp)
+{
+   qp->completed = 0;
+   atomic_set(qp->ctr + 0, 1);
+   atomic_set(qp->ctr + 1, 0);
+   init_waitqueue_head(&qp->wq);
+   mutex_init(&qp->mutex);
+
+   return 0;
+}
+
+int qrcu_read_lock(struct qrcu_struct *qp)
+{
+   for (;;) {
+   int idx = qp->completed & 0x1;
+   if (likely(atomic_inc_not_zero(qp->ctr + idx)))
+   return idx;
+   }
+}
+
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx)
+{
+   if (atomic_dec_and_test(qp->ctr + idx))
+   wake_up(&qp->wq);
+}
+
+void synchronize_qrcu(struct qrcu_struct *qp)
+{
+   int idx;
+
+   smp_mb();
+   mutex_lock(&qp->mutex);
+
+   idx = qp->completed & 0x1;
+   if (atomic_read(qp->ctr + idx) == 1)
+   goto out;
+
+   atomic_inc(qp->ctr + (idx ^ 0x1));
+   qp->completed++;
+
+   atomic_dec(qp->ctr + idx);
+   __wait_event(qp->wq, !atomic_read(qp->ctr + idx));
+out:
+   mutex_unlock(&qp->mutex);
+   smp_mb();
+}
+
+EXPORT_SYMBOL_GPL(init_qrcu_struct);
+EXPORT_SYMBOL_GPL(qrcu_read_lock);
+EXPORT_SYMBOL_GPL(qrcu_read_unlock);
+EXPORT_SYMBOL_GPL(synchronize_qrcu);
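
The counter scheme in the patch above can be followed step by step in a
single-threaded userspace model. This is a sketch under stated assumptions,
not the kernel code: C11 atomics stand in for atomic_t, the mutex and wait
queue are left out, and the writer is cut at the point where it would sleep.
All qrcu_model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct qrcu_model {
    int completed;
    atomic_int ctr[2];
};

static void qrcu_model_init(struct qrcu_model *qp)
{
    qp->completed = 0;
    atomic_init(&qp->ctr[0], 1);  /* the active counter carries a bias of 1 */
    atomic_init(&qp->ctr[1], 0);
}

/* Userspace stand-in for atomic_inc_not_zero(). */
static bool model_inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

static int qrcu_model_read_lock(struct qrcu_model *qp)
{
    for (;;) {
        int idx = qp->completed & 0x1;

        if (model_inc_not_zero(&qp->ctr[idx]))
            return idx;
    }
}

/* Returns true when the counter hit zero, where the patch calls wake_up(). */
static bool qrcu_model_read_unlock(struct qrcu_model *qp, int idx)
{
    assert(idx == 0 || idx == 1);
    return atomic_fetch_sub(&qp->ctr[idx], 1) == 1;
}

/*
 * Writer side up to the point where it would sleep.
 * Returns -1 on the fast path (no readers), otherwise the old index
 * whose counter the writer would wait on until it reaches zero.
 */
static int qrcu_model_flip(struct qrcu_model *qp)
{
    int idx = qp->completed & 0x1;

    if (atomic_load(&qp->ctr[idx]) == 1)
        return -1;

    atomic_fetch_add(&qp->ctr[idx ^ 0x1], 1);  /* bias the new counter */
    qp->completed++;                           /* new readers switch over */
    atomic_fetch_sub(&qp->ctr[idx], 1);        /* drop the old bias */
    return idx;
}
```

With no readers the flip takes the fast path. With one reader inside, the old
counter drops to 1 (the reader alone), later readers land on the other index,
and the old reader's unlock is the one that would wake the writer.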



[RFC, PATCH 2/2] qrcu: add rcutorture test

2006-11-29 Thread Oleg Nesterov
Add rcutorture test for qrcu.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 19-rc6/kernel/__rcutorture.c2006-11-17 19:42:31.0 +0300
+++ 19-rc6/kernel/rcutorture.c  2006-11-29 20:05:23.0 +0300
@@ -465,6 +465,73 @@ static struct rcu_torture_ops srcu_ops =
 };
 
 /*
+ * Definitions for qrcu torture testing.
+ */
+
+static struct qrcu_struct qrcu_ctl;
+
+static void qrcu_torture_init(void)
+{
+   init_qrcu_struct(&qrcu_ctl);
+   rcu_sync_torture_init();
+}
+
+static void qrcu_torture_cleanup(void)
+{
+   synchronize_qrcu(&qrcu_ctl);
+   cleanup_qrcu_struct(&qrcu_ctl);
+}
+
+static int qrcu_torture_read_lock(void)
+{
+   return qrcu_read_lock(&qrcu_ctl);
+}
+
+static void qrcu_torture_read_unlock(int idx)
+{
+   qrcu_read_unlock(&qrcu_ctl, idx);
+}
+
+static int qrcu_torture_completed(void)
+{
+   return qrcu_ctl.completed;
+}
+
+static void qrcu_torture_synchronize(void)
+{
+   synchronize_qrcu(&qrcu_ctl);
+}
+
+static int qrcu_torture_stats(char *page)
+{
+   int cnt = 0;
+   int idx = qrcu_ctl.completed & 0x1;
+
+   cnt += sprintf(&page[cnt], "%s%s per-CPU(idx=%d):",
+   torture_type, TORTURE_FLAG, idx);
+
+   cnt += sprintf(&page[cnt], " (%d,%d)",
+   atomic_read(qrcu_ctl.ctr + 0),
+   atomic_read(qrcu_ctl.ctr + 1));
+
+   cnt += sprintf(&page[cnt], "\n");
+   return cnt;
+}
+
+static struct rcu_torture_ops qrcu_ops = {
+   .init = qrcu_torture_init,
+   .cleanup = qrcu_torture_cleanup,
+   .readlock = qrcu_torture_read_lock,
+   .readdelay = srcu_read_delay,
+   .readunlock = qrcu_torture_read_unlock,
+   .completed = qrcu_torture_completed,
+   .deferredfree = rcu_sync_torture_deferred_free,
+   .sync = qrcu_torture_synchronize,
+   .stats = qrcu_torture_stats,
+   .name = "qrcu"
+};
+
+/*
  * Definitions for sched torture testing.
  */
 
@@ -503,8 +570,8 @@ static struct rcu_torture_ops sched_ops 
 };
 
 static struct rcu_torture_ops *torture_ops[] =
-   { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops, &srcu_ops,
- &sched_ops, NULL };
+   { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
+ &srcu_ops, &qrcu_ops, &sched_ops, NULL };
 
 /*
  * RCU torture writer kthread.  Repeatedly substitutes a new structure



[PATCH] netfilter: remove broken macro

2006-11-29 Thread Mariusz Kozlowski
Hello,

This patch removes the broken and unused HOOKNAME() macro (its parentheses
are unbalanced).

Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>

 net/ipv4/netfilter/ip_nat_standalone.c |6 --
 1 file changed, 6 deletions(-)

--- linux-2.6.19-rc6-mm2-a/net/ipv4/netfilter/ip_nat_standalone.c   2006-11-16 05:03:40.0 +0100
+++ linux-2.6.19-rc6-mm2-b/net/ipv4/netfilter/ip_nat_standalone.c   2006-11-29 15:31:37.0 +0100
@@ -44,12 +44,6 @@
 #define DEBUGP(format, args...)
 #endif
 
-#define HOOKNAME(hooknum) ((hooknum) == NF_IP_POST_ROUTING ? "POST_ROUTING"  \
-  : ((hooknum) == NF_IP_PRE_ROUTING ? "PRE_ROUTING" \
- : ((hooknum) == NF_IP_LOCAL_OUT ? "LOCAL_OUT"  \
-: ((hooknum) == NF_IP_LOCAL_IN ? "LOCAL_IN"  \
-   : "*ERROR*")))
-
 #ifdef CONFIG_XFRM
 static void nat_decode_session(struct sk_buff *skb, struct flowi *fl)
 {
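
For reference, the removed macro opens four nested conditionals but closes
only three, so any use would have failed to compile. A balanced version would
look roughly like the sketch below. The enum merely mirrors the hook numbers
I believe <linux/netfilter_ipv4.h> defines, so that the example builds outside
the kernel tree; hook_name() and hookname_demo() are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the kernel's IPv4 netfilter hook numbers. */
enum {
    NF_IP_PRE_ROUTING  = 0,
    NF_IP_LOCAL_IN     = 1,
    NF_IP_FORWARD      = 2,
    NF_IP_LOCAL_OUT    = 3,
    NF_IP_POST_ROUTING = 4
};

/* Same mapping as the removed macro, with the missing ')' added. */
#define HOOKNAME(hooknum)                                          \
    ((hooknum) == NF_IP_POST_ROUTING ? "POST_ROUTING"              \
     : ((hooknum) == NF_IP_PRE_ROUTING ? "PRE_ROUTING"             \
        : ((hooknum) == NF_IP_LOCAL_OUT ? "LOCAL_OUT"              \
           : ((hooknum) == NF_IP_LOCAL_IN ? "LOCAL_IN"             \
              : "*ERROR*"))))

/* Thin wrapper so the macro can be exercised as an ordinary function. */
static const char *hook_name(unsigned int hooknum)
{
    return HOOKNAME(hooknum);
}

/* Usage example: a hook the macro never handled falls through to *ERROR*. */
static void hookname_demo(void)
{
    assert(strcmp(hook_name(NF_IP_LOCAL_OUT), "LOCAL_OUT") == 0);
    assert(strcmp(hook_name(NF_IP_FORWARD), "*ERROR*") == 0);
}
```

Since nothing uses the macro, removing it as the patch does is the simpler
fix; the balanced form is shown only to make the defect visible.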


-- 
Regards,

Mariusz Kozlowski

