date:20070920

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-20 Thread Dipankar Sarma

On Fri, Sep 21, 2007 at 12:17:21AM -0400, Steven Rostedt wrote:
> [ continued here from comment on patch 1]
> 
> On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
> >  /* softirq mask and active fields moved to irq_cpustat_t in
> > diff -urpNa -X dontdiff 
> > linux-2.6.22-b-fixbarriers/include/linux/rcuclassic.h 
> > linux-2.6.22-c-preemptrcu/include/linux/rcuclassic.h
> > --- linux-2.6.22-b-fixbarriers/include/linux/rcuclassic.h   2007-08-22 
> > 14:42:23.0 -0700
> > +++ linux-2.6.22-c-preemptrcu/include/linux/rcuclassic.h2007-08-22 
> > 15:21:06.0 -0700
> > @@ -142,8 +142,6 @@ extern int rcu_needs_cpu(int cpu);
> >  #define RCU_HEAD_INIT  { .next = NULL, .func = NULL }
> >  #define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
> > @@ -218,10 +222,13 @@ extern void FASTCALL(call_rcu_bh(struct 
> >  /* Exported common interfaces */
> >  extern void synchronize_rcu(void);
> >  extern void rcu_barrier(void);
> > +extern long rcu_batches_completed(void);
> > +extern long rcu_batches_completed_bh(void);
> >
> 
> And here we put back rcu_batches_completed and rcu_batches_completed_bh
> from rcuclassic.h to rcupdate.h ;-)

Good questions :) I can't remember why I did this - probably because
I was breaking up into classic and preemptible RCU in incremental
patches with the goal that the break-up patch can be merged
before the rcu-preempt patches. IIRC, I had to make *batches_completed*()
a common RCU API later on.

Thanks
Dipankar
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-20 Thread Paul E. McKenney

On Fri, Sep 21, 2007 at 12:17:21AM -0400, Steven Rostedt wrote:
> [ continued here from comment on patch 1]
> 
> On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
> >  /* softirq mask and active fields moved to irq_cpustat_t in
> > diff -urpNa -X dontdiff 
> > linux-2.6.22-b-fixbarriers/include/linux/rcuclassic.h 
> > linux-2.6.22-c-preemptrcu/include/linux/rcuclassic.h
> > --- linux-2.6.22-b-fixbarriers/include/linux/rcuclassic.h   2007-08-22 
> > 14:42:23.0 -0700
> > +++ linux-2.6.22-c-preemptrcu/include/linux/rcuclassic.h2007-08-22 
> > 15:21:06.0 -0700
> > @@ -142,8 +142,6 @@ extern int rcu_needs_cpu(int cpu);
> >  extern void __rcu_init(void);
> >  extern void rcu_check_callbacks(int cpu, int user);
> >  extern void rcu_restart_cpu(int cpu);
> > -extern long rcu_batches_completed(void);
> > -extern long rcu_batches_completed_bh(void);
> >  
> >  #endif /* __KERNEL__ */
> >  #endif /* __LINUX_RCUCLASSIC_H */
> > diff -urpNa -X dontdiff linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h 
> > linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h
> > --- linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h 2007-07-19 
> > 14:02:36.0 -0700
> > +++ linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h  2007-08-22 
> > 15:21:06.0 -0700
> > @@ -52,7 +52,11 @@ struct rcu_head {
> > void (*func)(struct rcu_head *head);
> >  };
> >  
> > +#ifdef CONFIG_CLASSIC_RCU
> >  #include 
> > +#else /* #ifdef CONFIG_CLASSIC_RCU */
> > +#include 
> > +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
> >  
> >  #define RCU_HEAD_INIT  { .next = NULL, .func = NULL }
> >  #define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
> > @@ -218,10 +222,13 @@ extern void FASTCALL(call_rcu_bh(struct 
> >  /* Exported common interfaces */
> >  extern void synchronize_rcu(void);
> >  extern void rcu_barrier(void);
> > +extern long rcu_batches_completed(void);
> > +extern long rcu_batches_completed_bh(void);
> >
> 
> And here we put back rcu_batches_completed and rcu_batches_completed_bh
> from rcuclassic.h to rcupdate.h ;-)

Hmmm...  Good point!!!  I guess it would be OK to just leave them
in rcupdate.h throughout.  ;-)

Will fix.  And good eyes!

Thanx, Paul

[PATCH] ahci: enable GHC.AE bit before set GHC.HR

2007-09-20 Thread Peer Chen

According to sections 5.2.2.1 and 10.1.2 of the AHCI specification
rev1_1/rev1_2, GHC.HR shall only be set to '1' by software when
GHC.AE is set to '1'.

Signed-off-by: Peer Chen <[EMAIL PROTECTED]>
---
--- linux-2.6.23-rc7/drivers/ata/ahci.c.orig2007-09-20 11:01:55.0 
-0400
+++ linux-2.6.23-rc7/drivers/ata/ahci.c 2007-09-20 11:07:31.0 -0400
@@ -834,6 +834,10 @@ static int ahci_reset_controller(struct 
void __iomem *mmio = host->iomap[AHCI_PCI_BAR];
u32 tmp;
 
+/* turn on AHCI mode before controller reset*/
+writel(HOST_AHCI_EN, mmio + HOST_CTL);
+(void) readl(mmio + HOST_CTL);  /* flush */
+
/* global controller reset */
tmp = readl(mmio + HOST_CTL);
if ((tmp & HOST_RESET) == 0) {
-

--
Peer Chen
2007-09-21
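
The ordering rule in this patch can be illustrated outside the kernel. The following is a minimal user-space sketch, not driver code: `struct fake_hba`, `fake_read_ghc()`, `fake_write_ghc()` and `reset_controller()` are hypothetical names, and the model assumes a controller that simply ignores a GHC.HR write while GHC.AE is clear (the specification only says software shall not do this, so real hardware may behave differently). Only the bit positions (HR is bit 0, AE is bit 31 of GHC) come from the AHCI specification.

```c
#include <stdint.h>

#define GHC_HR (1u << 0)   /* HBA reset */
#define GHC_AE (1u << 31)  /* AHCI enable */

/* Hypothetical stand-in for the memory-mapped GHC register. */
struct fake_hba {
    uint32_t ghc;
    int reset_accepted;
};

static uint32_t fake_read_ghc(const struct fake_hba *hba)
{
    return hba->ghc;
}

/* Assumed model: a reset request is honoured only if AE was already set. */
static void fake_write_ghc(struct fake_hba *hba, uint32_t val)
{
    if ((val & GHC_HR) && (hba->ghc & GHC_AE)) {
        hba->reset_accepted = 1;
        val &= ~GHC_HR;     /* HR clears itself once the reset is done */
    }
    hba->ghc = val;
}

/* Same order as the patch: enable AHCI mode, flush, then request reset. */
static void reset_controller(struct fake_hba *hba)
{
    uint32_t tmp;

    fake_write_ghc(hba, GHC_AE);
    (void)fake_read_ghc(hba);           /* flush */

    tmp = fake_read_ghc(hba);
    if ((tmp & GHC_HR) == 0)
        fake_write_ghc(hba, tmp | GHC_HR);
}
```

In this model, calling `reset_controller()` on a zeroed `struct fake_hba` sets `reset_accepted`, while writing `GHC_HR` alone does not.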


[PATCH] Introduce ext4_find_next_bit

2007-09-20 Thread Aneesh Kumar K.V

Also add generic_find_next_le_bit

This gets used by the ext4 multi block allocator patches.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 include/asm-generic/bitops/ext2-non-atomic.h |2 +
 include/asm-generic/bitops/le.h  |4 ++
 include/asm-powerpc/bitops.h |4 ++
 include/linux/ext4_fs.h  |1 +
 lib/find_next_bit.c  |   44 ++
 5 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/bitops/ext2-non-atomic.h 
b/include/asm-generic/bitops/ext2-non-atomic.h
index 1697404..63cf822 100644
--- a/include/asm-generic/bitops/ext2-non-atomic.h
+++ b/include/asm-generic/bitops/ext2-non-atomic.h
@@ -14,5 +14,7 @@
generic_find_first_zero_le_bit((unsigned long *)(addr), (size))
 #define ext2_find_next_zero_bit(addr, size, off) \
generic_find_next_zero_le_bit((unsigned long *)(addr), (size), (off))
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 
 #endif /* _ASM_GENERIC_BITOPS_EXT2_NON_ATOMIC_H_ */
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index b9c7e5d..80e3bf1 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -20,6 +20,8 @@
 #define generic___test_and_clear_le_bit(nr, addr) __test_and_clear_bit(nr, 
addr)
 
 #define generic_find_next_zero_le_bit(addr, size, offset) 
find_next_zero_bit(addr, size, offset)
+#define generic_find_next_le_bit(addr, size, offset) \
+   find_next_bit(addr, size, offset)
 
 #elif defined(__BIG_ENDIAN)
 
@@ -42,6 +44,8 @@
 
 extern unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset);
+extern unsigned long generic_find_next_le_bit(const unsigned long *addr,
+   unsigned long size, unsigned long offset);
 
 #else
 #error "Please fix "
diff --git a/include/asm-powerpc/bitops.h b/include/asm-powerpc/bitops.h
index 8144a27..60652a3 100644
--- a/include/asm-powerpc/bitops.h
+++ b/include/asm-powerpc/bitops.h
@@ -310,6 +310,8 @@ static __inline__ int test_le_bit(unsigned long nr,
 unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset);
 
+unsigned long generic_find_next_le_bit(const unsigned long *addr,
+   unsigned long size, unsigned long offset);
 /* Bitmap functions for the ext2 filesystem */
 
 #define ext2_set_bit(nr,addr) \
@@ -329,6 +331,8 @@ unsigned long generic_find_next_zero_le_bit(const unsigned 
long *addr,
 #define ext2_find_next_zero_bit(addr, size, off) \
generic_find_next_zero_le_bit((unsigned long*)addr, size, off)
 
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)addr, size, off)
 /* Bitmap functions for the minix filesystem.  */
 
 #define minix_test_and_set_bit(nr,addr) \
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index cdee7aa..c7b9bb2 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -502,6 +502,7 @@ do {
   \
 #define ext4_test_bit  ext2_test_bit
 #define ext4_find_first_zero_bit   ext2_find_first_zero_bit
 #define ext4_find_next_zero_bitext2_find_next_zero_bit
+#define ext4_find_next_bit ext2_find_next_bit
 
 /*
  * Maximal mount counts between two filesystem checks
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index bda0d71..0306c04 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -178,4 +178,48 @@ found_middle_swap:
 
 EXPORT_SYMBOL(generic_find_next_zero_le_bit);
 
+unsigned long generic_find_next_le_bit(const unsigned long *addr, unsigned
+   long size, unsigned long offset)
+{
+   const unsigned long *p = addr + BITOP_WORD(offset);
+   unsigned long result = offset & ~(BITS_PER_LONG - 1);
+   unsigned long tmp;
+
+   if (offset >= size)
+   return size;
+   size -= result;
+   offset &= (BITS_PER_LONG - 1UL);
+   if (offset) {
+   tmp = ext2_swabp(p++);
+   tmp &= (~0UL << offset);
+   if (size < BITS_PER_LONG)
+   goto found_first;
+   if (tmp)
+   goto found_middle;
+   size -= BITS_PER_LONG;
+   result += BITS_PER_LONG;
+   }
+
+   while (size & ~(BITS_PER_LONG - 1)) {
+   tmp = *(p++);
+   if (tmp)
+   goto found_middle_swap;
+   result += BITS_PER_LONG;
+   size -= BITS_PER_LONG;
+   }
+   if (!size)
+   return result;
+   tmp = ext2_swabp(p);
+found_first:
+   tmp &= (~0UL >> (BITS_PER_LONG - size));
+   if (tmp == 0UL)
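
The quoted patch is cut off at this point in the archive. As a rough illustration of what "find next set bit in a little-endian bitmap" means, here is a byte-at-a-time sketch. It is not the word-at-a-time kernel implementation above; `find_next_le_bit_bytes()` is a hypothetical name, and it assumes the ext2-style layout in which bit n lives in byte n / 8 at position n % 8, independent of host endianness.

```c
#include <stddef.h>

/*
 * Return the index of the first set bit at or after 'offset' in a
 * little-endian bitmap of 'size' bits, or 'size' if there is none.
 * Simple and slow on purpose: one bit per iteration.
 */
static unsigned long find_next_le_bit_bytes(const unsigned char *map,
                                            unsigned long size,
                                            unsigned long offset)
{
    unsigned long bit;

    for (bit = offset; bit < size; bit++) {
        if (map[bit / 8] & (1u << (bit % 8)))
            return bit;
    }
    return size;
}
```

Like the kernel helpers, the sketch returns `size` when no bit is found, so callers can loop with `while ((bit = find(...)) < size)`.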

[PATCH] ext4: Fix sparse warnings

2007-09-20 Thread Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/inode.c |6 --
 include/linux/ext4_fs.h |   16 
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a4848e0..307e240 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3177,12 +3177,14 @@ int ext4_mark_inode_dirty(handle_t *handle, struct 
inode *inode)
  iloc, handle);
if (ret) {
EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
-   if (mnt_count != sbi->s_es->s_mnt_count) {
+   if (mnt_count !=
+   le16_to_cpu(sbi->s_es->s_mnt_count)) {
ext4_warning(inode->i_sb, __FUNCTION__,
"Unable to expand inode %lu. Delete"
" some EAs or run e2fsck.",
inode->i_ino);
-   mnt_count = sbi->s_es->s_mnt_count;
+   mnt_count =
+   le16_to_cpu(sbi->s_es->s_mnt_count);
}
}
}
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index c7b9bb2..ab7edaa 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -129,7 +129,7 @@ struct ext4_group_desc
__le16  bg_free_blocks_count;   /* Free blocks count */
__le16  bg_free_inodes_count;   /* Free inodes count */
__le16  bg_used_dirs_count; /* Directories count */
-   __u16   bg_flags;
+   __le16  bg_flags;
__u32   bg_reserved[3];
__le32  bg_block_bitmap_hi; /* Blocks bitmap block MSB */
__le32  bg_inode_bitmap_hi; /* Inodes bitmap block MSB */
@@ -596,13 +596,13 @@ struct ext4_super_block {
 /*150*/__le32  s_blocks_count_hi;  /* Blocks count */
__le32  s_r_blocks_count_hi;/* Reserved blocks count */
__le32  s_free_blocks_count_hi; /* Free blocks count */
-   __u16   s_min_extra_isize;  /* All inodes have at least # bytes */
-   __u16   s_want_extra_isize; /* New inodes should reserve # bytes */
-   __u32   s_flags;/* Miscellaneous flags */
-   __u16   s_raid_stride;  /* RAID stride */
-   __u16   s_mmp_interval; /* # seconds to wait in MMP checking */
-   __u64   s_mmp_block;/* Block for multi-mount protection */
-   __u32   s_raid_stripe_width;/* blocks on all data disks (N*stride)*/
+   __le16  s_min_extra_isize;  /* All inodes have at least # bytes */
+   __le16  s_want_extra_isize; /* New inodes should reserve # bytes */
+   __le32  s_flags;/* Miscellaneous flags */
+   __le16  s_raid_stride;  /* RAID stride */
+   __le16  s_mmp_interval; /* # seconds to wait in MMP checking */
+   __le64  s_mmp_block;/* Block for multi-mount protection */
+   __le32  s_raid_stripe_width;/* blocks on all data disks (N*stride)*/
__u32   s_reserved[163];/* Padding to the end of the block */
 };
 
-- 
1.5.3.1.91.gd3392-dirty
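
The point of the `__le16`/`__le32` annotations and the added `le16_to_cpu()` calls is that on-disk fields are little-endian and must be converted before being compared with CPU-order values. A minimal user-space sketch of the same idea follows; `le16_decode()`, `le32_decode()` and `mnt_count_changed()` are hypothetical helpers working on raw bytes, not the kernel macros.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian on-disk field, whatever the host order. */
static uint16_t le16_decode(const unsigned char b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Decode a 32-bit little-endian on-disk field. */
static uint32_t le32_decode(const unsigned char b[4])
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/*
 * Compare a cached CPU-order mount count with the on-disk value.
 * Comparing against the raw bytes without decoding would give the
 * wrong answer on a big-endian host.
 */
static int mnt_count_changed(uint16_t cached_cpu,
                             const unsigned char on_disk[2])
{
    return cached_cpu != le16_decode(on_disk);
}
```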


Re: [PATCH 2.6.23-rc6-mm1] - Panic in blk_rq_map_sg() from CCISS driver

2007-09-20 Thread Jens Axboe

On Thu, Sep 20 2007, Lee Schermerhorn wrote:
> PATCH 2.6.23-rc6-mm1 - Panic in blk_rq_map_sg() from CCISS driver
> 
> New scatter/gather list chaining [sg_next()] treats 'page' member of
> struct scatterlist with low bit set [0x01] as a chain pointer to
> another struct scatterlist [array].  The CCISS driver request function
> passes an uninitialized, temporary, on-stack scatterlist array to 
> blk_rq_map_sg().  sg_next() interprets random data on the stack as a
> chain pointer and eventually tries to de-reference an invalid pointer,
> resulting in:
> 
> [] blk_rq_map_sg+0x70/0x170
> PGD 6090c3067 PUD 0
> Oops:  [1] SMP
> last sysfs file: /block/cciss!c0d0/cciss!c0d0p1/dev
> CPU 6
> Modules linked in: ehci_hcd ohci_hcd uhci_hcd
> Pid: 1, comm: init Not tainted 2.6.23-rc6-mm1 #3
> RIP: 0010:[] [] blk_rq_map_sg+0x70/0x170
> RSP: 0018:81060901f768 EFLAGS: 00010206
> RAX: 00040b161000 RBX: 81060901f7d8 RCX: 00040b162c00
> RDX:  RSI: 81060b13a260 RDI: 81060b139600
> RBP: 1400 R08: fffe R09: 0400
> R10:  R11: 00040b163000 R12: 810102fe
> R13: 0001 R14: 0001 R15: 1e00
> FS: 026108f0(0063) GS:810409000b80() knlGS:
> CS: 0010 DS:  ES:  CR0: 8005003b
> CR2: 0001001e CR3: 0006090c6000 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process init (pid: 1, threadinfo 81060901e000, task 810409020800)
> last branch before last exception/interrupt
> from [] blk_rq_map_sg+0x10a/0x170
> to [] blk_rq_map_sg+0x70/0x170
> Stack: 00018068ea00 810102fe  81001140
> 0002  81040b172000 803acd3d
> 3ec1 8106090d5000 8106090d5000 810102fe
> Call Trace:
> [] do_cciss_request+0x15d/0x4c0
> [] new_slab+0x1c8/0x270
> [] __slab_alloc+0x22d/0x470
> [] mempool_alloc+0x4b/0x130
> [] cfq_set_request+0xee/0x380
> [] mempool_alloc+0x4b/0x130
> [] get_request+0x168/0x360
> [] rb_insert_color+0x8d/0x110
> [] elv_rb_add+0x58/0x60
> [] cfq_add_rq_rb+0x69/0xa0
> [] elv_merged_request+0x5b/0x60
> [] __make_request+0x23d/0x650
> [] __slab_alloc+0x22d/0x470
> [] generic_write_checks+0x140/0x190
> [] generic_make_request+0x1c2/0x3a0
> 
> Kernel panic - not syncing: Attempted to kill init!
> 
> This patch initializes the tmp_sg array to zeroes.  Perhaps not the ultimate
> fix, but an effective work-around.  I can now boot 23-rc6-mm1 on an HP
> Proliant x86_64 with CCISS boot disk.
> 
> Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>
> 
>  drivers/block/cciss.c |1 +
>  1 file changed, 1 insertion(+)
> 
> Index: Linux/drivers/block/cciss.c
> ===
> --- Linux.orig/drivers/block/cciss.c  2007-09-20 14:59:29.0 -0400
> +++ Linux/drivers/block/cciss.c   2007-09-20 15:06:39.0 -0400
> @@ -2611,6 +2611,7 @@ static void do_cciss_request(struct requ
>  (int)creq->nr_sectors);
>  #endif   /* CCISS_DEBUG */
>  
> + memset(tmp_sg, 0, sizeof(tmp_sg));
>   seg = blk_rq_map_sg(q, creq, tmp_sg);
>  
>   /* get the DMA records for the setup */
> 

Thanks Lee, applied.

-- 
Jens Axboe
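
The failure mode Lee describes can be sketched in user space. This is a simplified model, assuming only what the report states: an entry whose 'page' field has the low bit set is treated as a chain pointer. `struct fake_sg` and the `fake_*` helpers are hypothetical and do not match the kernel's struct scatterlist layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified scatterlist entry. */
struct fake_sg {
    uintptr_t page;        /* low bit set => treated as a chain pointer */
    unsigned int offset;
    unsigned int length;
};

static int fake_sg_is_chain(const struct fake_sg *sg)
{
    return (sg->page & 0x01) != 0;
}

/* Count entries that a chain-aware walker would try to follow. */
static size_t fake_count_chain_entries(const struct fake_sg *sg, size_t n)
{
    size_t i, chains = 0;

    for (i = 0; i < n; i++)
        chains += fake_sg_is_chain(&sg[i]);
    return chains;
}

/* The work-around from the patch: zero the array before it is used. */
static void fake_sg_prepare(struct fake_sg *sg, size_t n)
{
    memset(sg, 0, n * sizeof(*sg));
}
```

After `fake_sg_prepare()`, no entry looks like a chain pointer; a single leftover value with the low bit set (for example 0x1001) is enough to make the walker follow a bogus pointer.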


Re: [PATCH] pktcdvd: don't rely on bio_init() preserving bio->bi_destructor

2007-09-20 Thread Jens Axboe

On Thu, Sep 20 2007, Laurent Riffard wrote:
> On 14.09.2007 21:04, Laurent Riffard wrote:
> > On 14.09.2007 13:06, Jens Axboe wrote:
> >> On Fri, Sep 14 2007, Jens Axboe wrote:
> >>> On Fri, Sep 14 2007, Laurent Riffard wrote:
>  On 10.09.2007 22:19, Laurent Riffard wrote:
> > Jens,
> >
> > git-block.patch broke pktcdvd, I've got an Oops while syncing:
> >
> > [snip]
>  I dig through git-block.patch and the culprit seems to be commit
>  c94f1c4ac87862675c8d70941973bc3a69aff5d8 "bio: use memset() in
>  bio_init()".
> 
>  Maybe the real bug is a bad bio initialization in pktcdvd driver,
>  which is revealed by this commit ?
> >>> At least pktcdvd doesn't expect bio->bi_io_vec[] to be cleared, that's
> >>> why it's oopsing now. I'll revert this bit for now, thanks for the
> >>> report.
> >> Rethinking this, I think bio_init() is doing the right thing, only
> >> pktcdvd seems to rely on it preserving some members. So I'd rather fixup
> >> pktcdvd instead.
> >>
> >> Does this work for you?
> > 
> > Well, it's better: I was able to mount the DVD-RW, sync, and write data,
> > but kernel oopsed when I unmounted the drive:
> 
> Jens,
> 
> this patch, applied on top of your previous patch, solved it.
> 
> 
> 
> pktcdvd: don't rely on bio_init() preserving bio->bi_destructor

Ah great, thanks for following up on this! Applied.

-- 
Jens Axboe
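
The patch body is not quoted in this message, so the following is only a guess at the general pattern the subject line describes: once the init routine clears the whole structure with memset(), a caller that reuses the object has to set its destructor again after every init. `struct fake_bio` and the `fake_bio_*` functions are hypothetical and are not the block layer API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down bio with just the fields the sketch needs. */
struct fake_bio {
    unsigned long flags;
    void (*destructor)(struct fake_bio *bio);
};

/* Models an init routine that clears everything, destructor included. */
static void fake_bio_init(struct fake_bio *bio)
{
    memset(bio, 0, sizeof(*bio));
}

static void fake_bio_destructor(struct fake_bio *bio)
{
    bio->flags = 0;
}

/* Caller-side fix: do not assume init preserved the destructor. */
static void fake_bio_reinit(struct fake_bio *bio)
{
    fake_bio_init(bio);
    bio->destructor = fake_bio_destructor;
}
```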


Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

2007-09-20 Thread Paul Mackerras

Linus Torvalds writes:

> It would indeed be nice if we could just take CPU's down early (while 
> everything is working), and run the whole suspend code with just one CPU, 
> rather than having to worry about the ordering between CPU and device 
> takedown.

That is certainly what we want to do on powerpc.

Paul.

Re: Test harness in the kernel for new syscalls? [Was: Trace code and documentation (updated)]

2007-09-20 Thread Randy Dunlap

On Wed, 19 Sep 2007 20:01:15 +0200 Sam Ravnborg wrote:

> On Wed, Sep 19, 2007 at 06:51:09PM +0100, Christoph Hellwig wrote:
> > On Wed, Sep 19, 2007 at 07:48:45PM +0200, Sam Ravnborg wrote:
> > > > Well, this is kernel code - so util-linux is not the solution here
> > > > obviously :)

so kernel sample code goes in the new samples/ directory,
and userspace sample code gets pushed to util-linux ?

> > > Can you sketch what you have in mind.
> > > We right now have said we wanted to:
> > > 1) include a framework for executing simple new-syscall-test-stubs
> > > 2) have a nice place for kernel example code
> > > 
> > > I could come up with something but I expect you already have something
> > > in your mind where to put stuff.
> > > If I have a rough idea I can start looking into the kbuild bits of it.
> > > Not that I will have it ready within the next two weeks but nice buffer
> > > when I anyway drop sleeping..
> > 
> > I think for samples we just want a samples/ toplevel directory with
> > normal Kbuild and Kconfig files.  Not any different from drivers or
> > filesystems, just a new hierarchy.
> 
> OK - anyone can do this. So I will not worry.
> 
> 
> > tests stuff was rather disliked by Linus, so I wonder whether we should
> > go ahead with it.
> I heard it like "Ok for new syscalls".
> 
> And it is reasonable for new syscalls because:
>   o Make the test of the syscall public
>   o Is a nice example of the usage of the syscalls (both good and bad cases)
>   o Is available for other platforms that plan to implement the same syscall
>   o We (at least a few sufficiently skilled ones) will then review not only
> the syscall but also the use of the syscall

That's a good idea/plan.

> >  We'd need a test driver like expect to drive the
> > testcases.
> OK - may give it a spin one day.
> But I hope someone who has done similar stuff can come
> with some example code we can adapt to the kernel.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman

Nigel Cunningham <[EMAIL PROTECTED]> writes:
>
> Sounds doable, as long as you can cope with long command lines (which 
> shouldn't be a biggie). (If you've got a swapfile or parts of a swap 
> partition already in use, it can be quite fragmented).

Hmm.  This is an interesting problem.  Sharing a swap file or a swap
partition with the actual swap of user space pages does seem to be
a limitation of this approach.

Although the fact that it is simple to write to a separate file may
be a reasonable compensation.

> Andrew, you're seeing that it really doesn't mean the removal of all 
> hibernation code from the kernel being suspended, aren't you? (And if the 
> kexec'd kernel is the same binary, then there's more code again).

More binary size, yes, but not more code to maintain.

As for the rest, the current implementation is small enough and allows
for enough beyond hibernation that I think it makes sense to eventually
merge it, assuming a good clean implementation can be achieved.

Eric

Re: [PATCH RESEND] 2.6.22.6 networking [ipv4]: fix wrong destination when reply packets

2007-09-20 Thread David Stevens

I'm not sure why it's using rt_src here, but there are relevant cases that
your description doesn't cover. For example, what happens if the source
is not set in the original packet?  Does NAT affect this?

You quote RFC text for ICMP echo and the case where the receiving machine
is the final destination, but you're modifying code that is used for all
ICMP types and used for ICMP errors generated when acting as an
intermediate router.

In ordinary cases, and certainly with ICMP echo when the source is set in
the original packet and no rewriting is going on (and the address is not
spoofed), using the original source as the destination is fine. But have
you tested or considered the other cases?

+-DLS


Re: [PATCH] Remove broken netfilter binary sysctls from bridging code

2007-09-20 Thread Eric W. Biederman

[EMAIL PROTECTED] (Joseph Fannin) writes:

> The netfilter sysctls in the bridging code don't set strategy routines:
>
>  sysctl table check failed: /net/bridge/bridge-nf-call-arptables .3.10.1 Missing strategy
>  sysctl table check failed: /net/bridge/bridge-nf-call-iptables .3.10.2 Missing strategy
>  sysctl table check failed: /net/bridge/bridge-nf-call-ip6tables .3.10.3 Missing strategy
>  sysctl table check failed: /net/bridge/bridge-nf-filter-vlan-tagged .3.10.4 Missing strategy
>  sysctl table check failed: /net/bridge/bridge-nf-filter-pppoe-tagged .3.10.5 Missing strategy
>
> These binary sysctls can't work. The binary sysctl numbers of
> other netfilter sysctls with this problem are being removed.  These
> need to go as well.
>
> Signed-off-by: Joseph Fannin <[EMAIL PROTECTED]>

Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>

> ---
>
>This *really* needs to be reviewed by someone who knows what this
>is all about.  I've simply extended the removal of netfilter binary
>sysctl numbers so I could load bridge.ko.  I don't particularly
>care if I get attributed for this fix or any of that.
>
>It Works For Me.

Hmm.  This is an interesting case.  The proc method is forcing
the integer to be either 0 or 1 in a racy fashion.  But none of the
users appear to depend upon that.

So this is the least broken set of binary sysctls I have seen caught
by my check.

A really good fix would be to remove the binary side and then to
modify brnf_sysctl_call_tables to allocate a temporary ctl_table and
integer on the stack and only set ctl->data after we have normalized
the written value.  But since in practice nothing cares about
the race a better fix probably isn't worth it.

Eric
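
The "really good fix" outlined above (normalize into a temporary, then store once) can be sketched in plain C. This is only an illustration of the ordering, assuming a single int flag; `store_flag_racy()` and `store_flag_normalized()` are hypothetical names, and the real handler would go through the sysctl proc machinery rather than take the written value as an argument.

```c
/*
 * Racy shape: the raw value is visible in *flag before it is clamped,
 * so a concurrent reader can briefly observe something other than 0 or 1.
 */
static void store_flag_racy(int *flag, int written)
{
    *flag = written;
    if (*flag)
        *flag = 1;
}

/*
 * Suggested shape: normalize in a local temporary, then publish with a
 * single store, so readers only ever see 0 or 1.
 */
static void store_flag_normalized(int *flag, int written)
{
    int tmp = written ? 1 : 0;

    *flag = tmp;
}
```

Both functions leave the same final value; they differ only in what another CPU could observe in between, which is why the thread concludes the extra work is probably not worth it when no reader depends on strict 0/1 values.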

Re: [PATCH RFC 3/9] RCU: Preemptible RCU

2007-09-20 Thread Steven Rostedt

[ continued here from comment on patch 1]

On Mon, Sep 10, 2007 at 11:34:12AM -0700, Paul E. McKenney wrote:
>  /* softirq mask and active fields moved to irq_cpustat_t in
> diff -urpNa -X dontdiff linux-2.6.22-b-fixbarriers/include/linux/rcuclassic.h 
> linux-2.6.22-c-preemptrcu/include/linux/rcuclassic.h
> --- linux-2.6.22-b-fixbarriers/include/linux/rcuclassic.h 2007-08-22 
> 14:42:23.0 -0700
> +++ linux-2.6.22-c-preemptrcu/include/linux/rcuclassic.h  2007-08-22 
> 15:21:06.0 -0700
> @@ -142,8 +142,6 @@ extern int rcu_needs_cpu(int cpu);
>  extern void __rcu_init(void);
>  extern void rcu_check_callbacks(int cpu, int user);
>  extern void rcu_restart_cpu(int cpu);
> -extern long rcu_batches_completed(void);
> -extern long rcu_batches_completed_bh(void);
>  
>  #endif /* __KERNEL__ */
>  #endif /* __LINUX_RCUCLASSIC_H */
> diff -urpNa -X dontdiff linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h 
> linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h
> --- linux-2.6.22-b-fixbarriers/include/linux/rcupdate.h   2007-07-19 
> 14:02:36.0 -0700
> +++ linux-2.6.22-c-preemptrcu/include/linux/rcupdate.h2007-08-22 
> 15:21:06.0 -0700
> @@ -52,7 +52,11 @@ struct rcu_head {
>   void (*func)(struct rcu_head *head);
>  };
>  
> +#ifdef CONFIG_CLASSIC_RCU
>  #include 
> +#else /* #ifdef CONFIG_CLASSIC_RCU */
> +#include 
> +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
>  
>  #define RCU_HEAD_INIT{ .next = NULL, .func = NULL }
>  #define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
> @@ -218,10 +222,13 @@ extern void FASTCALL(call_rcu_bh(struct 
>  /* Exported common interfaces */
>  extern void synchronize_rcu(void);
>  extern void rcu_barrier(void);
> +extern long rcu_batches_completed(void);
> +extern long rcu_batches_completed_bh(void);
>

And here we put back rcu_batches_completed and rcu_batches_completed_bh
from rcuclassic.h to rcupdate.h ;-)

-- Steve


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Andrew Morton

On Fri, 21 Sep 2007 11:57:26 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:

> Hi.
> 
> On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Hi Andrew.
> > > > > 
> > > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > > Seems like good enough for -mm to me.
> > > > > > 
> > > > > > 
> > > > > > Pavel
> > > > > 
> > > > > Andrew, if I recall correctly, you said a while ago that you didn't
> > > > > want another hibernation implementation in the vanilla kernel. If
> > > > > you're going to consider merging this kexec code, will you also
> > > > > please consider merging TuxOnIce?
> > > > > 
> > > > 
> > > > The theory is that kexec-based hibernation will mainly use preexisting
> > > > kexec code and will permit us to delete the existing hibernation
> > > > implementation.
> > > > 
> > > > That's different from replacing it.
> > > 
> > > TuxOnIce doesn't remove the existing implementation either. It can 
> > > transparently replace it, but you can enable/disable that at compile time.
> > 
> > Right.  So we end up with two implementations in-tree.  Whereas
> > kexec-based-hibernation leads us to having zero implementations in-tree.
> > 
> > See, it's different.
> 
> That's not true. Kexec will itself be an implementation, otherwise you'd end 
> up with people screaming about no hibernation support. And it won't result in 
> the complete removal of the existing hibernation code from the kernel. At the 
> very least, it's going to want the kernel being hibernated to have an 
> interface by which it can find out which pages need to be saved. I wouldn't 
> be surprised if it also ends up with an interface in which the kernel being 
> hibernated tells it what bdev/sectors in which to save the image as well 
> (otherwise you're going to need a dedicated, otherwise untouched partition 
> exclusively for the kexec'd kernel to use), or what network settings to use 
> if it wants to try to save the image to a network storage device. On top of 
> that, there are all the issues related to device reinitialisation and so on, 
> and it looks like there's greatly increased pain for users wanting to 
> configure this new implementation. Kexec is by no means proven to be the 
> panacea for all the issues.
> 

Maybe, maybe not, dunno.  That's why we haven't merged it yet.  If it ends
up being no good, we won't merge it!

Re: [PATCH RFC 1/9] RCU: Split API to permit multiple RCU implementations

2007-09-20 Thread Steven Rostedt

On Mon, Sep 10, 2007 at 11:32:08AM -0700, Paul E. McKenney wrote:

[nitpick and two part mail ]

> 
> diff -urpNa -X dontdiff linux-2.6.22/include/linux/rcuclassic.h 
> linux-2.6.22-a-splitclassic/include/linux/rcuclassic.h
> --- linux-2.6.22/include/linux/rcuclassic.h   1969-12-31 16:00:00.0 
> -0800
> +++ linux-2.6.22-a-splitclassic/include/linux/rcuclassic.h2007-08-22 
> 14:42:23.0 -0700
> @@ -0,0 +1,149 @@

[snip]

> + local_bh_enable(); \
> + } while (0)
> +
> +#define __synchronize_sched() synchronize_rcu()
> +
> +extern void __rcu_init(void);
> +extern void rcu_check_callbacks(int cpu, int user);
> +extern void rcu_restart_cpu(int cpu);
> +extern long rcu_batches_completed(void);
> +extern long rcu_batches_completed_bh(void);
> +

> +#endif /* __KERNEL__ */
> +#endif /* __LINUX_RCUCLASSIC_H */
> diff -urpNa -X dontdiff linux-2.6.22/include/linux/rcupdate.h 
> linux-2.6.22-a-splitclassic/include/linux/rcupdate.h
> --- linux-2.6.22/include/linux/rcupdate.h 2007-07-08 16:32:17.0 
> -0700
> +++ linux-2.6.22-a-splitclassic/include/linux/rcupdate.h  2007-07-19 
> 14:02:36.0 -0700

[snip]

>   */
> -#define synchronize_sched() synchronize_rcu()
> +#define synchronize_sched() __synchronize_sched()
>  
> -extern void rcu_init(void);
> -extern void rcu_check_callbacks(int cpu, int user);
> -extern void rcu_restart_cpu(int cpu);
> -extern long rcu_batches_completed(void);
> -extern long rcu_batches_completed_bh(void);

Why are rcu_batches_completed and rcu_batches_completed_bh moved from
rcupdate.h to rcuclassic.h?

[ continued ...]

-- Steve


Re: [Bugme-new] [Bug 9043] New: tty not printed to screen

2007-09-20 Thread Ray Lee

On 9/20/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> (Please reply via emailed reply-to-all, not via the bugzilla web interface)
>
> On Thu, 20 Sep 2007 05:46:34 -0700 (PDT) [EMAIL PROTECTED] wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=9043
> >
> >Summary: tty not printed to screen
> >Product: Other
> >Version: 2.5
> >  KernelVersion: 2.6.23-rc7
> >   Platform: All
> > OS/Version: Linux
> >   Tree: Mainline
> > Status: NEW
> >   Severity: normal
> >   Priority: P1
> >  Component: Other
> > AssignedTo: [EMAIL PROTECTED]
> > ReportedBy: [EMAIL PROTECTED]
> >
> >
> > Most recent kernel where this bug did not occur: 2.6.23-rc6
> > Distribution: Centos 4.5 (Final)  (Careless Network V3)
> > Hardware Environment: NEC PowerMate VL260
> >  Output of "lspci":
> > 00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub
> > (rev 02)
> > 00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated
> > Graphics Controller (rev 02)
> > > 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1
> > > (rev 01)
> > 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> > Controller #1 (rev 01)
> > 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> > Controller #2 (rev 01)
> > 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> > Controller #3 (rev 01)
> > 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> > Controller #4 (rev 01)
> > 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
> > Controller (rev 01)
> > 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
> > 00:1e.2 Multimedia audio controller: Intel Corporation 82801G (ICH7 Family)
> > AC'97 Audio Controller (rev 01)
> > 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface
> > Bridge (rev 01)
> > 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller
> > (rev 01)
> > > 00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE
> > > Controller (rev 01)
> > > 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
> > > 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI
> > > Express Gigabit Ethernet controller (rev 01)
> >
> >
> >
> > Software Environment:
> >   Output of ver_linux :
> > > Linux careless2 2.6.23-rc7 #1 SMP Thu Sep 20 10:58:53 CEST 2007 i686 i686 i386
> > > GNU/Linux
> > > Gnu C                  3.4.6
> > > Gnu make               3.80
> > > binutils               2.15.92.0.2
> > > util-linux             2.12a
> > > mount                  2.12a
> > > module-init-tools      3.1-pre5
> > > e2fsprogs              1.35
> > > quota-tools            3.12.
> > > PPP                    2.4.2
> > > isdn4k-utils           3.3
> > > nfs-utils              1.0.6
> > > Linux C Library        3.4
> > > Dynamic linker (ldd)   2.3.4
> > > Procps                 3.2.3
> > > Net-tools              1.60
> > > Kbd                    1.12
> > > Sh-utils               5.2.1
> > > udev                   039
> > > Modules Loaded         thermal processor fan button uhci_hcd intelfb
> > > i2c_algo_bit rng_core i2c_i801 i2c_core r8169 dm_snapshot dm_zero dm_mirror
> > > dm_mod ata_piix libata sd_mod scsi_mod
> >
> >
> >
> > Problem Description:
> > > When booting "init 3", the screen prints "ENTERING SLEEPING MODE"
> > > and I can't access any console, even by pressing keys.
> > > When booting "init 5", Xorg starts nicely and works perfectly, but when
> > > pressing CTRL-ALT-F[1-6] the same happens: the screen prints "Entering
> > > sleeping mode" and I am not able to see anything.
> > > Everything else works perfectly, so I can access the box over ssh and I
> > > can see a new user with the "who" command when logging in on tty1.
> > >
> > >
> > > Please let me know how to post attachments; I think my .config and
> > > others like "lspci -vvv" may be needed.
> >
>
> I don't understand this report much, but it sounds like a very recent
> regression.
>
> You're not actually trying to suspend the machine at the time, are you?

I'm pretty sure he's not.

> And it doesn't sound like the keyboard has malfunctioned?

Correct. The 'tty' bit is misleading. He's not getting any visible
text console when he hits ctrl-alt-f1 .. f6.

> Does anyone know where this "ENTERING SLEEPING MODE" message is coming
> from?  A bit of googling makes me suspect that it is actually coming from
> your monitor, which perhaps indicates that the kernel is sending incorrect
> DPMS signalling to the monitor, or something like that?

Almost certainly it's the monitor not being able to sync to the output
when he's outside of X. Checking his lspci above, he has an intel
chipset. Looking at the module loaded list, he does indeed have
intelfb loaded. Checking the log from 2.6.23-rc6 to -rc7 shows four
patches (at least)

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> Index: linux-2.6.23-rc6/include/linux/kexec.h
> ===
> --- linux-2.6.23-rc6.orig/include/linux/kexec.h 2007-09-20 11:24:25.0 +0800
> +++ linux-2.6.23-rc6/include/linux/kexec.h 2007-09-20 11:26:03.0 +0800
> @@ -83,6 +83,7 @@
>  
>   unsigned long start;
>   struct page *control_code_page;
> + struct page *swap_page;
>  
>   unsigned long nr_segments;
>   struct kexec_segment segment[KEXEC_SEGMENT_MAX];
> @@ -194,4 +195,12 @@
>  static inline void crash_kexec(struct pt_regs *regs) { }
>  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
>  #endif /* CONFIG_KEXEC */
> +
> +#ifdef CONFIG_KEXEC_JUMP
> +extern int machine_kexec_jump(struct kimage *image);
> +extern unsigned long kexec_jump_back_entry;
> +extern int kexec_jump(void);
> +#else /* !CONFIG_KEXEC_JUMP */
> +static inline int kexec_jump(void) { return 0; }
> +#endif /* CONFIG_KEXEC_JUMP */
>  #endif /* LINUX_KEXEC_H */

Please let the kexec_jump code just be triggered off of a flag in
struct kimage.  We just need to define an extra flag to sys_kexec_load,
say KEXEC_RETURNS.  Ideally, in the long term, we would not have to
do anything except accept the flag.  Adding a flag makes
a nice feature test if you want to see if your kernel supports
the extended version of kexec.

Until we get the hibernation methods sorted out, storing the flag in
struct kimage and making the methods that we call conditional feels
like a more maintainable interface, especially since we have to
know at kexec image load time what we are going to do with the
kexec image.
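
To make the flag idea concrete, here is a minimal user-space sketch, not
kernel code.  Only the name KEXEC_RETURNS comes from the text above; the
flag value, struct kimage_sketch and both helpers are hypothetical and
exist purely for illustration.

```c
/* Hypothetical flag value; the thread only proposes the name KEXEC_RETURNS. */
#define KEXEC_RETURNS_SKETCH 0x1UL

/* Stand-in for struct kimage: the flag is recorded at image load time. */
struct kimage_sketch {
	unsigned long flags;
};

/*
 * Load-time check: accept the flags we know about and reject the rest.
 * A caller that passes KEXEC_RETURNS_SKETCH and gets an error back has
 * learned that the running kernel lacks the extended kexec.
 */
static int sketch_load(struct kimage_sketch *image, unsigned long flags)
{
	if (flags & ~KEXEC_RETURNS_SKETCH)
		return -1;	/* unknown flag */
	image->flags = flags;
	return 0;
}

/* Jump-time check: only images loaded with the flag may return. */
static int sketch_image_returns(const struct kimage_sketch *image)
{
	return (image->flags & KEXEC_RETURNS_SKETCH) != 0;
}
```

The point of the sketch is that the decision is made once, at load time,
and the jump path only consults the stored flag.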

> +#ifdef CONFIG_KEXEC_JUMP
> +unsigned long kexec_jump_back_entry;
> +
> +int kexec_jump(void)
> +{
> + int error;
> +
> + if (!kexec_image)
> + return -EINVAL;

I understand where you are coming from with this implementation of
kexec_jump but it looks like this is one of the big parts of this
patch that have not reached their final form.

The line above is racy with sys_kexec_load.

> + pm_prepare_console();
> + suspend_console();
> + error = device_suspend(PMSG_FREEZE);
> + if (error)
> + goto Resume_console;

This, as everyone knows, needs to be device_shutdown or a better
hibernation replacement.

> + error = disable_nonboot_cpus();
> + if (error)
> + goto Resume_devices;

Can't we just catch the nonboot CPUs in a mutex?
disable_nonboot_cpus is actually impossible to implement 100% reliably
with current hardware.  But something like smp_call_function, so we trap
them at a specific location, and then the equivalent when we come back,
should be simple.  I guess the tricky part is bringing the CPUs back up
again.

Using the broken-by-design version of cpu hotplug really annoys me here.

> + local_irq_disable();
> + /* At this point, device_suspend() has been called, but *not*
> +  * device_power_down(). We *must* device_power_down() now.
> +  * Otherwise, drivers for some devices (e.g. interrupt controllers)
> +  * become desynchronized with the actual state of the hardware
> +  * at resume time, and evil weirdness ensues.
> +  */
> + error = device_power_down(PMSG_FREEZE);
> + if (error)
> + goto Enable_irqs;

This of course should go away when we have the proper methods.

> + save_processor_state();
This line might even be reasonable.
> + error = machine_kexec_jump(kexec_image);
> + restore_processor_state();
>
> + /* NOTE:  device_power_up() is just a resume() for devices
> +  * that suspended with irqs off ... no overall powerup.
> +  */
> + device_power_up();
Yep this can go away.
> + Enable_irqs:
> + local_irq_enable();
> + enable_nonboot_cpus();

I haven't looked at the cpu start up code yet to see if it
is generally implementable.  I would think so, but I guess
we need to be careful with our data structures.

> + Resume_devices:
> + device_resume();
This of course should change.
> + Resume_console:
> + resume_console();
> + pm_restore_console();

Odd.  I'm a little surprised that the console is the last
thing we restore.  But it does make sense to treat it specially.

> + return error;
> +}
> +#endif /* CONFIG_KEXEC_JUMP */

Eric

[RFC][PATCH] make mis-configured cpu hotplug safer

2007-09-20 Thread KAMEZAWA Hiroyuki

When we want to hot-add a new cpu, it should be in cpu_possible_map.
On some archs, we have to specify the additional_cpus= boot option.
(x86_64, ia64 and s390 seem to need this. Others?)

If a user enables a cpu which is not counted as a possible cpu, the system
will panic. This patch refuses to register the cpu control if the cpu is
not in cpu_possible_map.

Works as expected in the ia64/cpu-hotplug-by-ACPI case.

Consideration:
handling this issue in cpu_up() is another way.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>


---
 drivers/base/cpu.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

Index: linux-2.6.23-rc6-mm1/drivers/base/cpu.c
===
--- linux-2.6.23-rc6-mm1.orig/drivers/base/cpu.c
+++ linux-2.6.23-rc6-mm1/drivers/base/cpu.c
@@ -113,6 +113,17 @@ static SYSDEV_ATTR(crash_notes, 0400, sh
 int __devinit register_cpu(struct cpu *cpu, int num)
 {
int error;
+
+#ifdef CONFIG_HOTPLUG_CPU
+   if (!cpu_isset(num, cpu_possible_map)) {
+   printk("Newly added cpu is not configured "
+   "as hot-add-candidate at boot time\n");
+#if defined(CONFIG_X86_64) || defined(CONFIG_IA64) || defined(CONFIG_S390)
+   printk("please check additional_cpus= boot option\n");
+#endif
+   return -EINVAL;
+   }
+#endif
cpu->node_id = cpu_to_node(num);
cpu->sysdev.id = num;
cpu->sysdev.cls = &cpu_sysdev_class;


Re: [PATCH 2.6.23-rc7 1/3] async_tx: usage documentation and developer notes

2007-09-20 Thread Randy Dunlap

On Thu, 20 Sep 2007 18:27:40 -0700 Dan Williams wrote:

> Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
> ---

Hi Dan,

Looks pretty good and informative.  Thanks.

(nits below :)


>  Documentation/crypto/async-tx-api.txt |  217 +
>  1 files changed, 217 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/crypto/async-tx-api.txt 
> b/Documentation/crypto/async-tx-api.txt
> new file mode 100644
> index 000..48d685a
> --- /dev/null
> +++ b/Documentation/crypto/async-tx-api.txt
> @@ -0,0 +1,217 @@
> +  Asynchronous Transfers/Transforms API
> +
> +1 INTRODUCTION
> +
> +2 GENEALOGY
> +
> +3 USAGE
> +3.1 General format of the API
> +3.2 Supported operations
> +3.2 Descriptor management

duplicate 3.2

> +3.3 When does the operation execute?
> +3.4 When does the operation complete?
> +3.5 Constraints
> +3.6 Example
> +
> +4 DRIVER DEVELOPER NOTES
> +4.1 Conformance points
> +4.2 "My application needs finer control of hardware channels"
> +
> +5 SOURCE
> +
> +---
> +
> +1 INTRODUCTION
> +
> +The async_tx api provides methods for describing a chain of asynchronous
> +bulk memory transfers/transforms with support for inter-transactional
> +dependencies.  It is implemented as a dmaengine client that smooths over
> +the details of different hardware offload engine implementations.  Code
> +that is written to the api can optimize for asynchronous operation and
> +the api will fit the chain of operations to the available offload
> +resources.
> +

I would s/api/API/g .

> +2 GENEALOGY
> +
[snip]

> +
> +3 USAGE
> +
> +3.1 General format of the API:
> +struct dma_async_tx_descriptor *
> +async_<operation>(<op specific parameters>,
> +   enum async_tx_flags flags,
> +   struct dma_async_tx_descriptor *dependency,
> +   dma_async_tx_callback callback_routine,
> +   void *callback_parameter);
> +
> +3.2 Supported operations:
> +memcpy       - memory copy between a source and a destination buffer
> +memset       - fill a destination buffer with a byte value
> +xor          - xor a series of source buffers and write the result to a
> +               destination buffer
> +xor_zero_sum - xor a series of source buffers and set a flag if the
> +               result is zero.  The implementation attempts to prevent
> +               writes to memory
> +
> +3.2 Descriptor management:

duplicate 3.2

> +The return value is non-NULL and points to a 'descriptor' when the operation
> +has been queued to execute asynchronously.  Descriptors are recycled
> +resources, under control of the offload engine driver, to be reused as
> +operations complete.  When an application needs to submit a chain of
> +operations it must guarantee that the descriptor is not automatically recycled
> +before the dependency is submitted.  This requires that all descriptors be
> +acknowledged by the application before the offload engine driver is allowed to
> +recycle (or free) the descriptor.  A descriptor can be acked by:

can be acked by any of:   (?)

> +1/ setting the ASYNC_TX_ACK flag if no operations are to be submitted
> +2/ setting the ASYNC_TX_DEP_ACK flag to acknowledge the parent
> +   descriptor of a new operation.
> +3/ calling async_tx_ack() on the descriptor.
> +
> +3.3 When does the operation execute?:

Drop ':'

> +Operations do not immediately issue after return from the
> +async_<operation> call.  Offload engine drivers batch operations to
> +improve performance by reducing the number of mmio cycles needed to
> +manage the channel.  Once a driver specific threshold is met the driver

   driver-specific

> +automatically issues pending operations.  An application can force this
> +event by calling async_tx_issue_pending_all().  This operates on all
> +channels since the application has no knowledge of channel to operation
> +mapping.
> +
> +3.4 When does the operation complete?:

drop ':'

> +There are two methods for an application to learn about the completion
> +of an operation.
> +1/ Call dma_wait_for_async_tx().  This call causes the cpu to spin while

s/cpu/CPU/g

> +   it polls for the completion of the operation.  It handles dependency
> +   chains and issuing pending operations.
> +2/ Specify a completion callback.  The callback routine runs in tasklet
> +   context if the offload engine driver supports interrupts, or it is
> +   called in application context if the operation is carried out
> +   synchronously in software.  The callback can be set in the call to
> +   async_<operation>, or when the application needs to submit a chain of
> +   unknown length it can use the async_trigger_callback() routine to set a
> +   completion interrupt/callback at the end of the chain.
> +
> +3.5 Constraints:
> +1/ Calls to async_<operation> are not permitted in irq context.  Other

s/irq/IRQ/g

> +   contexts are permitted provided constraint #2 is not violated.
> +2/ Completion callback routines can not submit new operations.  This

   cannot

>

[Fwd: [BUG:] forcedeth: MCP55 not allowing DHCP]

2007-09-20 Thread Casey Dahlin

Apparently I posted this in the middle of an unrelated thread by 
mistake. If this is the third message you are getting in regard to this 
issue, sorry :( Just trying to get it in the right place.

-------- Original Message --------
Subject: [BUG:] forcedeth: MCP55 not allowing DHCP
Date:    Tue, 11 Sep 2007 18:05:15 -0400
From:    Casey Dahlin <[EMAIL PROTECTED]>
To:      Linux Kernel

I have an Asus Striker Extreme motherboard with two built in MCP55 GigE 
interfaces. When I build with the original Fedora 7 release kernel ( 
ftp://ftp.belnet.be/linux/fedora/linux/releases/7/Fedora/i386/os/Fedora/kernel-2.6.21-1.3194.fc7.i686.rpm 
) everything works fine. However, when I boot with any updated kernels 
or any other kernel (have tried building from several points in the 
linus git tree between 2.6.20 and .23-rc3, and 2.6.21.2 in -stable) I 
cannot get an IP address via dhcp. There is no error in dmesg. The card 
shows a link and otherwise appears to be working, but it is as if the 
dhcp server has been removed from the network.

On a running system there is no indication that this is a kernel bug at 
all, however by varying only the kernel the bug appears and disappears. 
I've run all these tests repeatedly with no intervening updates of any 
other packages.

As I said I attempted to build 2.6.21.2 ( the point of divergence 
between the Fedora kernel in question and -stable ) and still the card 
did not work. I will next attempt to manually build the rpm for the 
release kernel. If this works I will try experimenting with the included 
patches to narrow it down, but at this point I'm at a complete loss.

-Casey Dahlin

Re: MAX_RT_PRIO - 1 Highest prio?

2007-09-20 Thread Steven Rostedt

On Wed, Sep 19, 2007 at 05:20:27PM -0600, Chris Rigg wrote:
> Hello,

Hi Chris,

>
> First, I'm assuming that if I want my task to have the HIGHEST priority in 
> the system (i.e. preempt any other task whenever it is put into the ready 
> queue (assuming I have preemption turned on/configured)), I use 
> sched_setscheduler (...) and use the sched_priority in sched_param for 
> MAX_RT_PRIO -1. Is this correct?

Actually it's probably best to use sched_get_priority_max for the max
prio.
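
A minimal user-space sketch of that suggestion follows.  It assumes the
SCHED_FIFO policy (the original question does not say which real-time
policy is used), and the helper names highest_rt_prio() and
make_self_top_prio() are made up for this example.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: ask the kernel for the highest priority the
 * policy allows instead of hard-coding MAX_RT_PRIO - 1.
 * Returns -1 with errno set if the policy is invalid.
 */
static int highest_rt_prio(int policy)
{
	return sched_get_priority_max(policy);
}

/*
 * Hypothetical helper: move the calling process to the top priority of
 * the given policy.  Returns 0 on success or a negative errno value.
 */
static int make_self_top_prio(int policy)
{
	struct sched_param sp;
	int prio = highest_rt_prio(policy);

	if (prio < 0)
		return -errno;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;

	/* pid 0 means the calling process; this usually needs privileges. */
	if (sched_setscheduler(0, policy, &sp) < 0)
		return -errno;
	return 0;
}
```

Calling make_self_top_prio(SCHED_FIFO) from an unprivileged process is
expected to fail with a permission error, so the return value should be
checked.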

>
> Second, assuming that MAX_RT_PRIO-1 is the highest, would it be bad on an 
> SMP/Hyperthreading system (that's using the migration thread balancing in 
> 2.6.20.7) to set a task's priority to MAX_RT_PRIO -1 given the fact that 
> the migration threads are already set to MAX_RT_PRIO -1? Should I be 
> setting my task's prio to MAX_RT_PRIO-2 to not interfere with the load 
> balancing?

The migrate task of a given CPU isn't the only one that will take
tasks off its CPU to push them to others (although it does do that).
But there's other load balancing work going on in the scheduler
(looking at the 2.6.20 sched.c).

Although it would be interesting to see what the result would be if you
had N+1 tasks running on N CPUs, all doing busy loops, with one of
the tasks given a prio of MAX_RT_PRIO-1, and see if we have one task
that is starved and never scheduled. But I'm sure this should be fixed
(if it was ever broken) by the latest scheduling work that's being
done in the most recent kernels.

-- Steve


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman

Nigel Cunningham <[EMAIL PROTECTED]> writes:
>
> That's not true. Kexec will itself be an implementation, otherwise you'd end 
> up with people screaming about no hibernation support. 

Yes, there needs to be an implementation of hibernation based on kexec
with return.

> And it won't result in 
> the complete removal of the existing hibernation code from the kernel. At the 
> very least, it's going to want the kernel being hibernated to have an 
> interface by which it can find out which pages need to be saved.

That interface should be running kernel -> user space -> target kernel.
Not direct kernel to kernel.

> I wouldn't 
> be surprised if it also ends up with an interface in which the kernel being 
> hibernated tells it what bdev/sectors in which to save the image as well 
> (otherwise you're going to need a dedicated, otherwise untouched partition 
> exclusively for the kexec'd kernel to use), or what network settings to use 
> if it wants to try to save the image to a network storage device. 

initramfs.  We already seem to have that interface.  And distros
seem to do a pretty decent job of using it to configure systems.

> On top of 
> that, there are all the issues related to device reinitialisation and so on, 

Yes.

> and it looks like there's greatly increased pain for users wanting to 
> configure this new implementation. 

Not to be callous but that really is a user space and distro issue.

> Kexec is by no means proven to be the panacea for all the issues.

I agree.  I'm still not quite convinced it will do a satisfactory job.
But I think it does make sense to implement a general kexec with
return and see if that can reasonably be used for handling hibernation
issues.  If done cleanly and with care the implementation won't be
hibernation specific.

Frankly this looks like the best way I can see to implement a general
mechanism for calling silly firmware/BIOS/EFI services after we
have a kernel up and running.  It's a little bit like allowing
X to call iopl(3) and do inb/outb directly.

The configuration issues you raise pretty much exist for kexec on
panic, and they seem to be being resolved for that case in a
reasonable way.  I do agree that the current kexec+return effort seems
to be one of those unfortunate cases where we give every mechanism in
the kernel to do something in user space and then no one actually
implements the user space.  That doesn't do anyone any good.

For hibernation we don't have the absolute need to step outside of the
current kernel that we do in the kexec on panic approach.  However we
have this practical fight about mechanism and policy, and kexec with
return has this seductive allure that it appears to be the minimal
necessary mechanism in the kernel.

No one has yet attacked the hard problem of coming up with separate
hibernate methods for drivers.  This should be the hard part of the
puzzle, and the recurring work from a kernel maintenance point of
view.  There is some reason to hope that maintenance
will be a little simpler because you can get at all of the distinct
pieces of the puzzle.

Currently kexec with return appears to require the minimal amount of
mechanism in the kernel and leaves the policy to someplace else, plus
the code is not hibernation specific.  We could use it to make runtime
EFI calls, or to implement cooperative multitasking between kernels.

My current opinion is that the patches are starting to get close
enough that it isn't a waste of my time reviewing them.  But there
is still a fair amount to be done before this code is in shape for
us to merge it into the kernel.

At 500 or so lines I don't feel bad about pushing back until all of
the core user interface issues are resolved, and we have the code
calling the proper driver methods.

Eric

Re: [PATCH] binfmt_flat: minimum support for the Blackfin relocations

2007-09-20 Thread David McCullough


Jivin Robin Getz lays it down ...
> On Thu 20 Sep 2007 11:03, David McCullough pondered:
> > I would say that (a) is definitely not the case.  I am sure the BF guys
> > will say they have been banging us on the head with changes for a long
> > time and getting nowhere as we considered the changes too severe or out
> > of line.
> 
> I don't think we have been "banging heads" with you (unless that is your 
> feeling?) - how about "working together, but diverting to satisfy different 
> needs" :)

No head banging feelings here,  but I would understand if you guys felt
that way occasionally ;-)  I obviously forgot the happy face on that
statement.  It was meant as a good thing.

> I think that we have had more issues in the uClinux-dist (userspace and build 
> environment), but for kernel code, we have moved from some non-standard 
> (stupid) things we were doing early on to what we have today - which is as 
> common/standard with other archs as we can be.
> 
> Although this is slightly off topic - on the uClinux distribution side - most 
> of our changes are based on requirements/desires from being able to support 
> fdpic elf and flat formats, and to attempt to make things easier for end 
> users/us to use/maintain. Where we do make changes - we always send the patch 
> upstream and have the conversation with you (not everyone else does this), 
> and some/most times rework things so they are more acceptable to you. We 
> don't always come to an agreement - but we always have the discussion, and 
> are willing to move if we can make things better that still meets both our 
> needs/desires.
> 
> > This particular patch was trivial in comparison to others I've seen,
> 
> That is what we thought.
> 
> > it fixed all the existing arches (not something that is always done) and
> > seemed a reasonable start to finally get the BF guys up and running.
> > Still, happy to make it better of course ;-)
> 
> As always - we are more than happy to explore/review alternative patches if 
> people want to write/submit them.

Cheers,
Davidm

-- 
David McCullough,  [EMAIL PROTECTED],   Ph:+61 734352815
Secure Computing - SnapGear  http://www.uCdot.org http://www.cyberguard.com

Re: [git] CFS-devel, group scheduler, fixes

2007-09-20 Thread Willy Tarreau

On Fri, Sep 21, 2007 at 04:40:55AM +0200, Mike Galbraith wrote:
> On Thu, 2007-09-20 at 21:48 +0200, Willy Tarreau wrote:
> 
> > I don't know if this is relevant, but 4294966399 in nr_uninterruptible
> > for cpu#0 equals -897, exactly the negation of cpu1.nr_uninterruptible.
> > I don't know if this rings a bell for someone or if it's a completely
> > useless comment, but just in case...
> 
> A task can block on one cpu, and wake up on another, which isn't
> tracked, hence the fishy looking numbers.  The true nr_uninterruptible
> is the sum of all.

OK, thanks Mike for this clarification.
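
The arithmetic in the exchange above can be checked with a small
sketch.  It assumes the counter is 32 bits wide (as an unsigned long is
on i686) and that cpu1 reports 897; the helper names are made up for
this example.

```c
#include <stdint.h>

/*
 * Reinterpret a 32-bit unsigned counter as signed.  The conversion is
 * implementation-defined in C, but on the usual two's-complement
 * compilers 4294966399 comes out as -897.
 */
static int32_t as_signed32(uint32_t raw)
{
	return (int32_t)raw;
}

/*
 * A task may block on one CPU and wake up on another, so only the sum
 * over all CPUs is meaningful.
 */
static long total_uninterruptible(const uint32_t *per_cpu, int ncpus)
{
	long sum = 0;
	int i;

	for (i = 0; i < ncpus; i++)
		sum += as_signed32(per_cpu[i]);
	return sum;
}
```

With per-CPU values of 4294966399 and 897 the total is 0, which matches
Mike's explanation that the individual numbers only look fishy.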

Willy


Re: NFS on loopback locks up entire system(2.6.23-rc6)?

2007-09-20 Thread Chakri n

Thanks Trond, for clarifying this for me.

I have seen similar behavior when a remote NFS server is not
available. Many processes end up waiting in nfs_release_page. So
what will happen if the remote server is not available? Will
nfs_release_page be unable to free the memory, since it waits on an
rpc request that never completes, so that processes wait in there
forever?

And unfortunately in my case, I cannot use "mount --bind". I want to
use the same file system from two different nodes, and I want file &
record locking to be consistent. The only way to make sure locking is
consistent is to use loopback NFS on one host and NFS mount the same
file system on the other nodes, so that the NFS server keeps file &
record locking consistent. Is there any alternative to this?
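
For reference, this is a minimal sketch of the kind of record lock in
question, assuming "file & record locking" means POSIX byte-range locks
taken with fcntl().  The helper name set_record_lock() is made up for
this example.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set or clear a POSIX record lock on the byte
 * range [start, start + len) of fd.  type is F_RDLCK, F_WRLCK or
 * F_UNLCK.  Returns 0 on success, -1 with errno set on failure
 * (for example when another process holds a conflicting lock).
 */
static int set_record_lock(int fd, short type, off_t start, off_t len)
{
	struct flock fl = { 0 };

	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;

	/* F_SETLK does not block; F_SETLKW would wait for the lock. */
	return fcntl(fd, F_SETLK, &fl);
}
```

Whether two nodes see each other's locks then depends on every node
going through the same lock manager, which is what the loopback NFS
mount is being used for here.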

Is it possible, or are there any efforts, to integrate the locking of
ext3 or other local file systems with network file system locking, so
that a user can use "mount --bind" on the local host and an NFS mount on
remote nodes, with file & record locking consistent between the nodes?

Thanks
--Chakri

On 9/20/07, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-09-20 at 17:22 -0700, Chakri n wrote:
> > Hi,
> >
> > In my testing, NFS on loopback locks up the entire system with the 2.6.23-rc6 kernel.
> >
> > I have mounted a local ext3 partition using loopback NFS (version 3)
> > and started my test program. The test program forks 20 threads
> > allocates 10MB for each thread, writes & reads a file on the loopback
> > NFS mount. After running for about 5 min, I cannot even login to the
> > machine. Commands like ps etc, hang in a live session.
> >
> > The machine is a DELL 1950 with 4Gig of RAM, so there is plenty of RAM
> > & CPU to play around and no other io/heavy processes are running on
> > the system.
> >
> > vmstat output shows no buffers are actually getting transferred in or
> > out and iowait is 100%.
> >
> > [EMAIL PROTECTED] ~]# vmstat 1
> > procs -----------memory----------- ---swap-- -----io---- --system-- ------cpu------
> >  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id  wa st
> >  0 24    116 110080  11132 3045664    0    0     0     0   28  345  0  1  0  99  0
> >  0 24    116 110080  11132 3045664    0    0     0     0    5  329  0  0  0 100  0
> >  0 24    116 110080  11132 3045664    0    0     0     0   26  336  0  0  0 100  0
> >  0 24    116 110080  11132 3045664    0    0     0     0    8  335  0  0  0 100  0
> >  0 24    116 110080  11132 3045664    0    0     0     0   26  352  0  0  0 100  0
> >  0 24    116 110080  11132 3045664    0    0     0     0    8  351  0  0  0 100  0
> >  0 24    116 110080  11132 3045664    0    0     0     0   23  358  0  1  0  99  0
> >  0 24    116 110080  11132 3045664    0    0     0     0   10  350  0  0  0 100  0
> >  0 24    116 110080  11132 3045664    0    0     0     0   26  363  0  0  0 100  0
> >  0 24    116 110080  11132 3045664    0    0     0     0    8  346  0  1  0  99  0
> >  0 24    116 110080  11132 3045664    0    0     0     0   26  360  0  0  0 100  0
> >  0 24    116 110080  11140 3045656    0    0     8     0   11  345  0  0  0 100  0
> >  0 24    116 110080  11140 3045664    0    0     0     0   27  355  0  0  2  97  0
> >  0 24    116 110080  11140 3045664    0    0     0     0    9  330  0  0  0 100  0
> >  0 24    116 110080  11140 3045664    0    0     0     0   26  358  0  0  0 100  0
> >
> >
> > The following is the backtrace of
> > 1. one of the threads of my test program
> > 2. nfsd daemon and
> > 3. a generic command like pstree, after the machine hangs:
> > -
> > crash> bt 3252
> > PID: 3252   TASK: f6f3c610  CPU: 0   COMMAND: "test"
> >  #0 [f6bdcc10] schedule at c0624a34
> >  #1 [f6bdcc84] schedule_timeout at c06250ee
> >  #2 [f6bdccc8] io_schedule_timeout at c0624c15
> >  #3 [f6bdccdc] congestion_wait at c045eb7d
> >  #4 [f6bdcd00] balance_dirty_pages_ratelimited_nr at c045ab91
> >  #5 [f6bdcd54] generic_file_buffered_write at c0457148
> >  #6 [f6bdcde8] __generic_file_aio_write_nolock at c04576e5
> >  #7 [f6bdce40] try_to_wake_up at c042342b
> >  #8 [f6bdce5c] generic_file_aio_write at c0457799
> >  #9 [f6bdce8c] nfs_file_write at f8c25cee
> > #10 [f6bdced0] do_sync_write at c0472e27
> > #11 [f6bdcf7c] vfs_write at c0473689
> > #12 [f6bdcf98] sys_write at c0473c95
> > #13 [f6bdcfb4] sysenter_entry at c0404ddf
> > EAX: 0004  EBX: 0013  ECX: a4966008  EDX: 0098
> > DS:  007b  ESI: 0098  ES:  007b  EDI: a4966008
> > SS:  007b  ESP: a5ae6ec0  EBP: a5ae6ef0
> > CS:  0073  EIP: b7eed410  ERR: 0004  EFLAGS: 0246
> > crash> bt 3188
> > PID: 3188   TASK: f74c4000  CPU: 1   COMMAND: "nfsd"
> >  #0 [f6836c7c] schedule at c0624a34
> >  #1 [f6836cf0] __mutex_lock_slowpath at c062543d
> >  #2 [f6836d0c] mutex_lock at c0625326
> >  #3 [f6836d18] generic_file_aio_write at c0457784
> >

Re: Don't cross the (tty) streams

2007-09-20 Thread Matthew Wilcox

On Thu, Sep 20, 2007 at 11:29:31PM +0200, Andreas Schwab wrote:
> Read the thread starting here:
> .

Thanks, Andreas.  I tested it with /bin/echo instead of the built-in
echo and the problem disappeared.  Both machines were running
bash 3.1.17(1)-release.

"Not all bugs are kernel bugs" x10.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham

Hi.

On Friday 21 September 2007 12:45:57 Huang, Ying wrote:
> On Fri, 2007-09-21 at 12:25 +1000, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Friday 21 September 2007 12:18:57 Huang, Ying wrote:
> > > > That's not true. Kexec will itself be an implementation, otherwise
> > > > you'd end up with people screaming about no hibernation support.
> > > > And it won't result in the complete removal of the existing
> > > > hibernation code from the kernel. At the very least, it's going to
> > > > want the kernel being hibernated to have an interface by which it
> > > > can find out which pages need to be saved. I wouldn't
> > > 
> > > This has been done by kexec/kdump guys. There is a makedumpfile utility
> > > and vmcoreinfo kernel mechanism to implement this. We can just reuse the
> > > work of kexec/kdump.
> > 
> > You've already said that you are currently saving all pages. How are you
> > going to avoid saving free pages if you don't get the information from
> > the kernel being saved? This will require more than just code reuse.
> 
> I have not tried "makedumpfile". It avoids saving free pages by
> checking the "mem_map" of the original kernel. I think nothing
> prevents it from being used for kexec-based hibernation image
> writing.
> 
> This is an example of duplicated effort between kexec/kdump and the
> original hibernation implementation. Both kexec/kdump and hibernation
> need to save a memory image without saving the free pages. This can be
> done once instead of twice.

Ok.

> > > > be surprised if it also ends up with an interface in which the kernel
> > > > being hibernated tells it what bdev/sectors in which to save the
> > > > image as well (otherwise you're going to need a dedicated, otherwise
> > > > untouched partition exclusively for the kexec'd kernel to use), or
> > > > what network settings to use if it wants to try to save the image to
> > > > a network storage device. On top of
> > > 
> > > These can be done in user space. The image writing will be done in user
> > > space for kexec base hibernation.
> > 
> > That only complicates things more. Now you need to get the information on
> > where to save the image from the kernel being saved, then transfer it to
> > userspace after switching to the kexec kernel. That's more kernel code,
> > not less.
> 
> This is fairly simple in fact. For example, you can specify the
> bdev/sectors on the kernel command line when doing the kexec load
> ("kexec -l <...> --append='...'"); the image writing system can then
> get it through "cat /proc/cmdline".

Sounds doable, as long as you can cope with long command lines (which 
shouldn't be a biggie). (If you've got a swapfile or parts of a swap 
partition already in use, it can be quite fragmented).

Andrew, you're seeing that it really doesn't mean the removal of all 
hibernation code from the kernel being suspended, aren't you? (And if the 
kexec'd kernel is the same binary, then there's more code again).

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman

"Huang, Ying" <[EMAIL PROTECTED]> writes:

> This patch implements the functionality of jumping between the kexeced
> kernel and the original kernel.
>
> A new reboot command named LINUX_REBOOT_CMD_KJUMP is defined to
> trigger the jumping to (executing) the new kernel and jumping back to
> the original kernel.
>
> To support jumping between two kernels, before jumping to (executing)
> the new kernel and jumping back to the original kernel, the devices
> are put into quiescent state (to be fully implemented),

Well, we already have an implementation of this (it's called shutdown).
Or does that method not do enough to meet the requirements of hibernation?
If at all possible I would like to keep reboot, kexec and kexec+return
all using the same device driver methods.

> and the state of devices and CPU is saved. 

Makes a reasonable amount of sense.  We do need to save whatever
state we cannot recover just by reprogramming the hardware.
As long as the drivers are built so this is a good place for a
hot remove to happen we should be in good shape.

> After jumping back from kexeced kernel
> and jumping to the new kernel, the state of devices and CPU are
> restored accordingly. The devices/CPU state save/restore code of
> software suspend is called to implement corresponding function.

At least for now that sounds like a reasonable work around.

I don't think we want to merge this code until we have agreed upon
how the new device_detach and device_reattach (or whatever we call the
device methods for hibernate) are to be implemented.

> To support jumping without preserving memory. One shadow backup page
> is allocated for each page used by new (kexeced) kernel. 

That does not sound correct.  The current implementation of kexec_load
does allocate a source page and give it a destination page and usually
those two pages are different.  But if our memory allocations happen
to return a destination page there we use it directly, making no
copy necessary.

I think we are talking about the same thing but I'm not certain
you have thought about the case where your shadow backup page happens
to be the same as current page.

> When do
> kexec_load, the image of new kernel is loaded into shadow pages, 

Ok.  This sounds like the existing implementation.  Except that,
depending on your destination, it may force the address.

> and
> before executing, the original pages and the shadow pages are swapped,
> so the contents of original pages are backuped.

Yes.  Unless we happen to have everything allocated on the same page.
Does your code handle that case?  I know the generic kexec code will
pass lists like that in the proper circumstances.  Especially for
the kexec on panic case.
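
[A minimal sketch of the corner case raised above, not code from the
patch. It assumes pages are exchanged through a scratch buffer;
swap_page() and SKETCH_PAGE_SIZE are hypothetical names chosen for
illustration, and the real swap is done in relocate_kernel.S. The only
point shown is that a source page which is also its own destination
needs no copy at all.]

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* hypothetical; stands in for PAGE_SIZE */

/*
 * Exchange the contents of two pages through a scratch page.
 * If both pointers name the same page there is nothing to exchange,
 * so return early instead of copying the page onto itself.
 */
void swap_page(unsigned char *a, unsigned char *b, unsigned char *scratch)
{
	assert(a && b && scratch);

	if (a == b)
		return;

	memcpy(scratch, a, SKETCH_PAGE_SIZE);
	memcpy(a, b, SKETCH_PAGE_SIZE);
	memcpy(b, scratch, SKETCH_PAGE_SIZE);
}
```

[Calling swap_page() twice on the same pair restores the original
contents, which is the property the jump and jump-back paths rely on.]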

> Before jumping to the
> new (kexeced) kernel and after jumping back to the original kernel,
> the original pages and the shadow pages are swapped too.

Yes.   That sounds right.

> A jump back protocol is defined and documented.

Bleh.  We do need to document the requirements but we don't need a
versioned monster.  And we don't need to be exposing implementation
details in that documentation.

In the kexec world /sbin/kexec or another user space caller is
responsible for passing information to our callers.

To be polite we need to document more but the jump back protocol
really should be as if the entry point kexec handed control to did
a subroutine return.

> Known issues
>
> - A field is added to Linux kernel real-mode header. This is
>   temporary, and should be replaced after the 32-bit boot protocol and
>   setup data patches are accepted.

It shouldn't be needed.

> - The suspend method of device is used to put device in quiescent
>   state. But if the ACPI is enabled this will also put devices into
>   low power state, which prevent the new kernel from booting. So, the
>   ACPI must be disabled both in original kernel and kexeced
>   kernel. This is planed to be resolved after the suspend method and
>   hibernate method is separated for device as proposed earlier in the
>   LKML.

Reasonable.

> - The NX (none executable) bit should be turned off for the control
>   page if available.

Why don't we have a problem with this in the normal kexec case?

More comments below.

> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
>
> ---
>
>  Documentation/i386/jump_back_protocol.txt |   81
>  arch/i386/Kconfig                         |    7 +
>  arch/i386/boot/header.S                   |    2
>  arch/i386/kernel/machine_kexec.c          |   77 +---
>  arch/i386/kernel/relocate_kernel.S        |  187 ++
>  arch/i386/kernel/setup.c                  |    3
>  include/asm-i386/bootparam.h              |    3
>  include/asm-i386/kexec.h                  |   48 ++-
>  include/linux/kexec.h                     |    9 +
>  include/linux/reboot.h                    |    2
>  kernel/kexec.c                            |   59 +
>  kernel/ksysfs.c                           |   17 ++
>  kernel/power/Kconfig

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Huang, Ying

On Fri, 2007-09-21 at 12:25 +1000, Nigel Cunningham wrote:
> Hi.
> 
> On Friday 21 September 2007 12:18:57 Huang, Ying wrote:
> > > That's not true. Kexec will itself be an implementation, otherwise
> > > you'd end up with people screaming about no hibernation support. And
> > > it won't result in the complete removal of the existing hibernation
> > > code from the kernel. At the very least, it's going to want the kernel
> > > being hibernated to have an interface by which it can find out which
> > > pages need to be saved. I wouldn't
> > 
> > This has been done by kexec/kdump guys. There is a makedumpfile utility
> > and vmcoreinfo kernel mechanism to implement this. We can just reuse the
> > work of kexec/kdump.
> 
> You've already said that you are currently saving all pages. How are you
> going to avoid saving free pages if you don't get the information from the
> kernel being saved? This will require more than just code reuse.

I have not tried "makedumpfile". It avoids saving free pages by
checking the "mem_map" of the original kernel. I think nothing
prevents it from being used for kexec-based hibernation image
writing.

This is an example of duplicated effort between kexec/kdump and the
original hibernation implementation. Both kexec/kdump and hibernation
need to save a memory image without saving the free pages. This can be
done once instead of twice.

> > > be surprised if it also ends up with an interface in which the kernel
> > > being hibernated tells it what bdev/sectors in which to save the image
> > > as well (otherwise you're going to need a dedicated, otherwise untouched
> > > partition exclusively for the kexec'd kernel to use), or what network
> > > settings to use if it wants to try to save the image to a network
> > > storage device. On top of
> > 
> > These can be done in user space. The image writing will be done in user
> > space for kexec base hibernation.
> 
> That only complicates things more. Now you need to get the information on
> where to save the image from the kernel being saved, then transfer it to
> userspace after switching to the kexec kernel. That's more kernel code, not
> less.

This is fairly simple in fact. For example, you can specify the
bdev/sectors on the kernel command line when doing the kexec load
("kexec -l <...> --append='...'"); the image writing system can then
get it through "cat /proc/cmdline".
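
[A minimal sketch of the user-space side of this idea, under stated
assumptions: the parameter name "hib_image" and its value format are
invented for illustration and are not defined by the patch, and
cmdline_get() is a hypothetical helper. It looks up one key=value token
in a string such as the contents of /proc/cmdline. Quoted values that
contain spaces are not handled.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of "key=" from a space-separated kernel command line
 * into out. Returns 0 on success, -1 if the key is absent or the value
 * does not fit in outlen bytes (including the terminating NUL).
 */
int cmdline_get(const char *cmdline, const char *key, char *out, size_t outlen)
{
	size_t klen;
	const char *p;

	assert(cmdline && key && out);
	klen = strlen(key);
	p = cmdline;

	while (*p) {
		size_t toklen;

		/* Skip separators between tokens. */
		while (*p == ' ' || *p == '\n')
			p++;

		toklen = strcspn(p, " \n");
		if (toklen > klen && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=') {
			size_t vlen = toklen - klen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p += toklen;
	}
	return -1;
}
```

[An image writer could read /proc/cmdline into a buffer and call
cmdline_get(buf, "hib_image", value, sizeof value). A fragmented swap
file would need many extents in the value, which is where command line
length becomes the limit.]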

> > > that, there are all the issues related to device reinitialisation and so
> > > on,
> > 
> > Yes. Device reinitialisation is needed. But all in all, kexec based
> > hibernation can be much simpler on the kernel side.
> 
> Sorry, but I'm yet to be convinced. I'm not unwilling, I'm just not there yet.
>  
> > > and it looks like there's greatly increased pain for users wanting to 
> > > configure this new implementation. Kexec is by no means proven to be the 
> > > panacea for all the issues.
> > 
> > Configuration is a problem, we will work on it.
> > 
> > But, because it is based on kexec/kdump instead of starting from
> > scratch, the duplicated part between hibernation and kexec/kdump can be
> > eliminated.
> 

Best Regards,
Huang Ying

Re: [git] CFS-devel, group scheduler, fixes

2007-09-20 Thread Mike Galbraith

On Thu, 2007-09-20 at 21:48 +0200, Willy Tarreau wrote:

> I don't know if this is relevant, but 4294966399 in nr_uninterruptible
> for cpu#0 equals -897, exactly the negation of cpu1.nr_uninterruptible.
> I don't know if this rings a bell for someone or if it's a completely
> useless comment, but just in case...

A task can block on one cpu, and wake up on another, which isn't
tracked, hence the fishy looking numbers.  The true nr_uninterruptible
is the sum of all.

-Mike
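
[A minimal sketch of the arithmetic behind the two numbers, assuming
32-bit unsigned per-CPU counters as the quoted value 4294966399
suggests; total_uninterruptible() is a hypothetical name, not a kernel
function. Each counter can wrap below zero on its own, but unsigned
addition is modular, so the wrapped values cancel in the sum.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum per-CPU counters that may individually have wrapped below zero.
 * A task that blocks on one CPU (+1 there) and wakes on another
 * (-1 there) leaves the total unchanged even though neither per-CPU
 * value is meaningful by itself.
 */
uint32_t total_uninterruptible(const uint32_t *per_cpu, size_t ncpus)
{
	uint32_t sum = 0;
	size_t i;

	assert(per_cpu != NULL || ncpus == 0);

	for (i = 0; i < ncpus; i++)
		sum += per_cpu[i];	/* wraps modulo 2^32 */
	return sum;
}
```

[With the values from the report, 4294966399 on cpu#0 and 897 on cpu#1,
the sum is 2^32 modulo 2^32, i.e. zero tasks in uninterruptible sleep.]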


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham

Hi.

On Friday 21 September 2007 12:18:57 Huang, Ying wrote:
> > That's not true. Kexec will itself be an implementation, otherwise you'd
> > end up with people screaming about no hibernation support. And it won't
> > result in the complete removal of the existing hibernation code from the
> > kernel. At the very least, it's going to want the kernel being hibernated
> > to have an interface by which it can find out which pages need to be
> > saved. I wouldn't
> 
> This has been done by kexec/kdump guys. There is a makedumpfile utility
> and vmcoreinfo kernel mechanism to implement this. We can just reuse the
> work of kexec/kdump.

You've already said that you are currently saving all pages. How are you going 
to avoid saving free pages if you don't get the information from the kernel 
being saved? This will require more than just code reuse.

> > be surprised if it also ends up with an interface in which the kernel
> > being hibernated tells it what bdev/sectors in which to save the image as
> > well (otherwise you're going to need a dedicated, otherwise untouched
> > partition exclusively for the kexec'd kernel to use), or what network
> > settings to use if it wants to try to save the image to a network storage
> > device. On top of
> 
> These can be done in user space. The image writing will be done in user
> space for kexec base hibernation.

That only complicates things more. Now you need to get the information on 
where to save the image from the kernel being saved, then transfer it to 
userspace after switching to the kexec kernel. That's more kernel code, not 
less.

> > that, there are all the issues related to device reinitialisation and so
> > on,
> 
> Yes. Device reinitialisation is needed. But all in all, kexec based
> hibernation can be much simpler on the kernel side.

Sorry, but I'm yet to be convinced. I'm not unwilling, I'm just not there yet.
 
> > and it looks like there's greatly increased pain for users wanting to 
> > configure this new implementation. Kexec is by no means proven to be the 
> > panacea for all the issues.
> 
> Configuration is a problem, we will work on it.
> 
> But, because it is based on kexec/kdump instead of starting from
> scratch, the duplicated part between hibernation and kexec/kdump can be
> eliminated.

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[PATCH RESEND] 2.6.22.6 networking [ipv4]: fix wrong destination when replying to packets

2007-09-20 Thread lepton

Hi,
  This is a resend of this patch with more details. I hope it can be
  accepted this time.
  The problem: in the icmp_reply and ip_send_reply functions, we
  currently use rt->rt_src as the destination when sending out packets.
  For packets received on the loopback device, this is sometimes wrong.
  Here is an example (NOTE: the eth0 address is set to 10.10.10.1/24):

  #tcpdump -n -i lo icmp &
  [1] 3155
  ...

  # hping3 --icmp --spoof 10.10.10.3 10.10.10.1
  ...
  09:53:49.508449 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq 0
  09:53:49.508482 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 0
  09:53:50.525560 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq 256
  09:53:50.525589 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 256

  The same thing happens for TCP:

  # hping3 --syn --destport 1234 --spoof 10.10.10.3 10.10.10.1
  (NOTE: no service is listening on port 1234)
  ...
  10:02:59.395715 IP 10.10.10.3.2787 > 10.10.10.1.1234: S 72057069:72057069(0) win 512
  10:02:59.395746 IP 10.10.10.1.1234 > 10.10.10.1.2787: R 0:0(0) ack 72057070 win 0

  As you can see, the destination addresses are all wrong.
  The problem comes from the fact that route selection for packets
  travelling on the loopback device happens only once. When we send out
  packets on the loopback device, skb->dst is assigned in ip_route_output.
  It is not reassigned in the packet receive path, so rt->rt_src does not
  equal ip_hdr(skb)->saddr.
  I don't know why we must use rt->rt_src as the destination address. At
  least for ICMP reply packets, I think we should use ip_hdr(skb)->saddr
  as the destination address. This is according to RFC 792:
   ...
   Addresses

  The address of the source in an echo message will be the
  destination of the echo reply message.  To form an echo reply
  message, the source and destination addresses are simply reversed,
  the type code changed to 0, and the checksum recomputed.
  
  A possible fix is to call ip_route_input in ip_rcv_finish for packets
  received on the loopback device. But I think simply using
  ip_hdr(skb)->saddr instead of rt->rt_src as the destination for reply
  packets is a simpler fix.
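
  [A minimal user-space sketch of the RFC 792 rule quoted above, not
  kernel code: make_echo_reply() and inet_checksum() are hypothetical
  helpers, addresses are plain 32-bit values, and the ICMP message is a
  raw byte buffer. The reply destination is taken from the request's own
  source address, which is what using ip_hdr(skb)->saddr achieves.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over len bytes, read as big-endian
 * 16-bit words. The 32-bit accumulator is ample for ICMP-sized input.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries */
	return (uint16_t)~sum;
}

/*
 * Turn an echo request into an echo reply in place: reverse source and
 * destination, set the type to 0, and recompute the checksum.
 */
void make_echo_reply(uint32_t *saddr, uint32_t *daddr,
		     uint8_t *icmp, size_t icmp_len)
{
	uint32_t tmp;
	uint16_t csum;

	assert(saddr && daddr && icmp && icmp_len >= 8);

	tmp = *saddr;
	*saddr = *daddr;
	*daddr = tmp;

	icmp[0] = 0;			/* type 0: echo reply */
	icmp[2] = 0;			/* checksum field is zero while summing */
	icmp[3] = 0;
	csum = inet_checksum(icmp, icmp_len);
	icmp[2] = (uint8_t)(csum >> 8);
	icmp[3] = (uint8_t)(csum & 0xff);
}
```

  [For the spoofed request in the example, this yields a reply addressed
  to 10.10.10.3 rather than to 10.10.10.1.]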
 
  Thanks to Kenan Kalajdzic <[EMAIL PROTECTED]> for helping me with more
  details about this problem.
  
Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/icmp.c linux-2.6.22.6-lepton/net/ipv4/icmp.c
--- linux-2.6.22.6/net/ipv4/icmp.c  2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/icmp.c   2007-09-18 09:57:30.0 +0800
@@ -382,6 +382,7 @@ static void icmp_reply(struct icmp_bxm *
 	struct ipcm_cookie ipc;
 	struct rtable *rt = (struct rtable *)skb->dst;
 	__be32 daddr;
+	struct iphdr *ip = ip_hdr(skb);
 
 	if (ip_options_echo(&icmp_param->replyopts, skb))
 		return;
@@ -393,7 +394,7 @@ static void icmp_reply(struct icmp_bxm *
 	icmp_out_count(icmp_param->data.icmph.type);
 
 	inet->tos = ip_hdr(skb)->tos;
-	daddr = ipc.addr = rt->rt_src;
+	daddr = ipc.addr = ip->saddr;
 	ipc.opt = NULL;
 	if (icmp_param->replyopts.optlen) {
 		ipc.opt = &icmp_param->replyopts;
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/ip_output.c linux-2.6.22.6-lepton/net/ipv4/ip_output.c
--- linux-2.6.22.6/net/ipv4/ip_output.c 2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/ip_output.c  2007-09-18 09:57:13.0 +0800
@@ -1337,11 +1337,12 @@ void ip_send_reply(struct sock *sk, stru
 	struct ipcm_cookie ipc;
 	__be32 daddr;
 	struct rtable *rt = (struct rtable*)skb->dst;
+	struct iphdr *ip = ip_hdr(skb);
 
 	if (ip_options_echo(&replyopts.opt, skb))
 		return;
 
-	daddr = ipc.addr = rt->rt_src;
+	daddr = ipc.addr = ip->saddr;
 	ipc.opt = NULL;
 
 	if (replyopts.opt.optlen) {



Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Huang, Ying

On Fri, 2007-09-21 at 11:57 +1000, Nigel Cunningham wrote:
> Hi.
> 
> On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham
> > > > <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Hi Andrew.
> > > > > 
> > > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > > Seems like good enough for -mm to me.
> > > > > > 
> > > > > > 
> > > > > > Pavel
> > > > > 
> > > > > Andrew, if I recall correctly, you said a while ago that you didn't
> > > > > want another hibernation implementation in the vanilla kernel. If
> > > > > you're going to consider merging this kexec code, will you also
> > > > > please consider merging TuxOnIce?
> > > > > 
> > > > 
> > > > The theory is that kexec-based hibernation will mainly use preexisting
> > > > kexec code and will permit us to delete the existing hibernation
> > > > implementation.
> > > > 
> > > > That's different from replacing it.
> > > 
> > > TuxOnIce doesn't remove the existing implementation either. It can 
> > > transparently replace it, but you can enable/disable that at compile time.
> > 
> > Right.  So we end up with two implementations in-tree.  Whereas
> > kexec-based-hibernation leads us to having zero implementations in-tree.
> > 
> > See, it's different.
> 
> That's not true. Kexec will itself be an implementation, otherwise you'd end 
> up with people screaming about no hibernation support. And it won't result in 
> the complete removal of the existing hibernation code from the kernel. At the 
> very least, it's going to want the kernel being hibernated to have an 
> interface by which it can find out which pages need to be saved. I wouldn't 

This has been done by kexec/kdump guys. There is a makedumpfile utility
and vmcoreinfo kernel mechanism to implement this. We can just reuse the
work of kexec/kdump.

> be surprised if it also ends up with an interface in which the kernel being 
> hibernated tells it what bdev/sectors in which to save the image as well 
> (otherwise you're going to need a dedicated, otherwise untouched partition 
> exclusively for the kexec'd kernel to use), or what network settings to use 
> if it wants to try to save the image to a network storage device. On top of

These can be done in user space. The image writing will be done in user
space for kexec base hibernation.

> that, there are all the issues related to device reinitialisation and so on, 

Yes. Device reinitialisation is needed. But all in all, kexec based
hibernation can be much simpler on the kernel side.

> and it looks like there's greatly increased pain for users wanting to 
> configure this new implementation. Kexec is by no means proven to be the 
> panacea for all the issues.

Configuration is a problem, we will work on it.

But, because it is based on kexec/kdump instead of starting from
scratch, the duplicated part between hibernation and kexec/kdump can be
eliminated.

Best Regards,
Huang Ying

[PATCH] Remove broken netfilter binary sysctls from bridging code

2007-09-20 Thread Joseph Fannin

The netfilter sysctls in the bridging code don't set strategy routines:

 sysctl table check failed: /net/bridge/bridge-nf-call-arptables .3.10.1 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-call-iptables .3.10.2 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-call-ip6tables .3.10.3 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-filter-vlan-tagged .3.10.4 Missing strategy
 sysctl table check failed: /net/bridge/bridge-nf-filter-pppoe-tagged .3.10.5 Missing strategy

These binary sysctls can't work. The binary sysctl numbers of
other netfilter sysctls with this problem are being removed.  These
need to go as well.

Signed-off-by: Joseph Fannin <[EMAIL PROTECTED]>

---

   This *really* needs to be reviewed by someone who knows what this
   is all about.  I've simply extended the removal of netfilter binary
   sysctl numbers so I could load bridge.ko.  I don't particularly
   care if I get attributed for this fix or any of that.

   It Works For Me.

diff -ru linux-2.6.23-rc6-mm1.orig/net/bridge/br_netfilter.c linux-2.6.23-rc6-mm1/net/bridge/br_netfilter.c
--- linux-2.6.23-rc6-mm1.orig/net/bridge/br_netfilter.c 2007-09-19 02:40:49.0 -0400
+++ linux-2.6.23-rc6-mm1/net/bridge/br_netfilter.c  2007-09-20 20:31:41.0 -0400
@@ -904,7 +904,6 @@
 
 static ctl_table brnf_table[] = {
 	{
-		.ctl_name	= NET_BRIDGE_NF_CALL_ARPTABLES,
 		.procname	= "bridge-nf-call-arptables",
 		.data		= &brnf_call_arptables,
 		.maxlen		= sizeof(int),
@@ -912,7 +911,6 @@
 		.proc_handler	= &brnf_sysctl_call_tables,
 	},
 	{
-		.ctl_name	= NET_BRIDGE_NF_CALL_IPTABLES,
 		.procname	= "bridge-nf-call-iptables",
 		.data		= &brnf_call_iptables,
 		.maxlen		= sizeof(int),
@@ -920,7 +918,6 @@
 		.proc_handler	= &brnf_sysctl_call_tables,
 	},
 	{
-		.ctl_name	= NET_BRIDGE_NF_CALL_IP6TABLES,
 		.procname	= "bridge-nf-call-ip6tables",
 		.data		= &brnf_call_ip6tables,
 		.maxlen		= sizeof(int),
@@ -928,7 +925,6 @@
 		.proc_handler	= &brnf_sysctl_call_tables,
 	},
 	{
-		.ctl_name	= NET_BRIDGE_NF_FILTER_VLAN_TAGGED,
 		.procname	= "bridge-nf-filter-vlan-tagged",
 		.data		= &brnf_filter_vlan_tagged,
 		.maxlen		= sizeof(int),
@@ -936,7 +932,6 @@
 		.proc_handler	= &brnf_sysctl_call_tables,
 	},
 	{
-		.ctl_name	= NET_BRIDGE_NF_FILTER_PPPOE_TAGGED,
 		.procname	= "bridge-nf-filter-pppoe-tagged",
 		.data		= &brnf_filter_pppoe_tagged,
 		.maxlen		= sizeof(int),

--
Joseph Fannin
[EMAIL PROTECTED]

[PATCH] make mv643xx_eth.c build again

2007-09-20 Thread Joseph Fannin


The changeset to "Make NAPI polling independent of struct net_device
objects" forgot to change a function prototype in mv643xx_eth.c, and
also introduced a typo that caused the driver not to build.

Signed-off-by: Joseph Fannin <[EMAIL PROTECTED]>

---

This is build-tested only.

diff -ru linux-2.6.23-rc6-mm1.orig/drivers/net/mv643xx_eth.c linux-2.6.23-rc6-mm1/drivers/net/mv643xx_eth.c
--- linux-2.6.23-rc6-mm1.orig/drivers/net/mv643xx_eth.c 2007-09-20 18:17:41.0 -0400
+++ linux-2.6.23-rc6-mm1/drivers/net/mv643xx_eth.c  2007-09-20 18:17:11.0 -0400
@@ -65,7 +65,7 @@
 static int mv643xx_eth_change_mtu(struct net_device *, int);
 static void eth_port_init_mac_tables(unsigned int eth_port_num);
 #ifdef MV643XX_NAPI
-static int mv643xx_poll(struct net_device *dev, int budget);
+static int mv643xx_poll(struct napi_struct *napi, int budget);
 #endif
 static int ethernet_phy_get(unsigned int eth_port_num);
 static void ethernet_phy_set(unsigned int eth_port_num, int phy_addr);
@@ -561,7 +561,7 @@
 		/* wait for previous write to complete */
 		mv_read(MV643XX_ETH_INTERRUPT_MASK_REG(port_num));
 
-		netif_rx_schedule(dev, &bp->napi);
+		netif_rx_schedule(dev, &mp->napi);
 	}
 #else
 	if (eth_int_cause & ETH_INT_CAUSE_RX)


--
Joseph Fannin
[EMAIL PROTECTED]


Re: 2.6.23-rc6-mm1

2007-09-20 Thread Joseph Fannin

On Thu, Sep 20, 2007 at 09:42:44PM +0530, Kamalesh Babulal wrote:
> On 9/20/07, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> > On 9/20/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > >
> > > On Wed, 19 Sep 2007 19:58:28 -0400
> > > [EMAIL PROTECTED] (Joseph Fannin) wrote:
> > >
> > > > On Tue, Sep 18, 2007 at 01:18:41AM -0700, Andrew Morton wrote:
> > >
> > > ---
> > > a/include/asm-powerpc/smp.h~convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix-2
> > >
> > > +++ a/include/asm-powerpc/smp.h
> > > @@ -25,8 +25,8 @@
> > >
> > > #ifdef CONFIG_PPC64
> > > #include 
> > > -#include 
> > > #endif
> > > +#include 
> > >
> > > extern int boot_cpuid;


> > Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
> > ---
> > --- linux-2.6.23-rc6/drivers/net/mace.c 2007-09-20 17:16:50.0 +0530
> > +++ linux-2.6.23-rc6/drivers/net/~mace.c 2007-09-20 17:12:47.0 +0530
> > @@ -633,7 +633,7 @@ static void mace_set_multicast(struct ne
> >  spin_unlock_irqrestore(&mp->lock, flags);
> >  }
> >
> > -static void mace_handle_misc_intrs(struct mace_data *mp, int intr)
> > +static void mace_handle_misc_intrs(struct mace_data *mp, int intr, struct net_device *dev)
> >  {
> >  volatile struct mace __iomem *mb = mp->mace;
> >  static int mace_babbles, mace_jabbers;
> > @@ -669,7 +669,7 @@ static irqreturn_t mace_interrupt(int ir
> >  spin_lock_irqsave(&mp->lock, flags);
> >  intr = in_8(&mb->ir);  /* read interrupt register */
> >  in_8(&mb->xmtrc);  /* get retries */
> > -mace_handle_misc_intrs(mp, intr);
> > +mace_handle_misc_intrs(mp, intr, dev);
> >
> >  i = mp->tx_empty;
> >  while (in_8(&mb->pr) & XMTSV) {
> > @@ -682,7 +682,7 @@ static irqreturn_t mace_interrupt(int ir
> >  */
> > intr = in_8(&mb->ir);
> > if (intr != 0)
> > -   mace_handle_misc_intrs(mp, intr);
> > +   mace_handle_misc_intrs(mp, intr, dev);
> > if (mp->tx_bad_runt) {
> > fs = in_8(&mb->xmtfs);
> > mp->tx_bad_runt = 0;
> > @@ -817,7 +817,7 @@ static void mace_tx_timeout(unsigned lon
> > goto out;
> >
> >  /* update various counters */
> > -mace_handle_misc_intrs(mp, in_8(&mb->ir));
> > +mace_handle_misc_intrs(mp, in_8(&mb->ir), dev);
> >
> >  cp = mp->tx_cmds + NCMDS_TX * mp->tx_empty;

Both these patches have built and booted for me.

I will send a patch for the following error separately, in what
will hopefully be canonical patch submission format, in case that's of
any use.

Thanks.

> drivers/net/mv643xx_eth.c: In function 'mv643xx_eth_int_handler':
> drivers/net/mv643xx_eth.c:564: error: 'bp' undeclared (first use in this function)
> drivers/net/mv643xx_eth.c:564: error: (Each undeclared identifier is reported only once
> drivers/net/mv643xx_eth.c:564: error: for each function it appears in.)
> drivers/net/mv643xx_eth.c: At top level:
> drivers/net/mv643xx_eth.c:1010: error: conflicting types for 'mv643xx_poll'
> drivers/net/mv643xx_eth.c:68: error: previous declaration of 'mv643xx_poll' was here
> make[2]: *** [drivers/net/mv643xx_eth.o] Error 1
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2


--
Joseph Fannin
[EMAIL PROTECTED]


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham

Hi.

On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham
> > > <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Hi Andrew.
> > > > 
> > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > Seems like good enough for -mm to me.
> > > > > 
> > > > >   
> > > > > Pavel
> > > > 
> > > > Andrew, if I recall correctly, you said a while ago that you didn't
> > > > want another hibernation implementation in the vanilla kernel. If
> > > > you're going to consider merging this kexec code, will you also
> > > > please consider merging TuxOnIce?
> > > > 
> > > 
> > > The theory is that kexec-based hibernation will mainly use preexisting
> > > kexec code and will permit us to delete the existing hibernation
> > > implementation.
> > > 
> > > That's different from replacing it.
> > 
> > TuxOnIce doesn't remove the existing implementation either. It can 
> > transparently replace it, but you can enable/disable that at compile time.
> 
> Right.  So we end up with two implementations in-tree.  Whereas
> kexec-based-hibernation leads us to having zero implementations in-tree.
> 
> See, it's different.

That's not true. Kexec will itself be an implementation, otherwise you'd end 
up with people screaming about no hibernation support. And it won't result in 
the complete removal of the existing hibernation code from the kernel. At the 
very least, it's going to want the kernel being hibernated to have an 
interface by which it can find out which pages need to be saved. I wouldn't 
be surprised if it also ends up with an interface in which the kernel being 
hibernated tells it what bdev/sectors in which to save the image as well 
(otherwise you're going to need a dedicated, otherwise untouched partition 
exclusively for the kexec'd kernel to use), or what network settings to use 
if it wants to try to save the image to a network storage device. On top of 
that, there are all the issues related to device reinitialisation and so on, 
and it looks like there's greatly increased pain for users wanting to 
configure this new implementation. Kexec is by no means proven to be the 
panacea for all the issues.

Regards,

Nigel
-- 
Nigel Cunningham
Pastor
Christian Reformed Church of Cobden
Victoria, Australia
+61 3 5595 1185

Re: [PATCH -mm] Don't truncate /proc/PID/environ at 4096 characters

2007-09-20 Thread Arvin Moezzi

2007/9/19, James Pearson <[EMAIL PROTECTED]>:
> +   while (count > 0) {
> +   int this_len, retval;
> +
> +   this_len = mm->env_end - (mm->env_start + src);
> +
> +   if (this_len <= 0)
> +   break;
> +
> +   if (this_len > max_len)
> +   this_len = max_len;
> +
> +   retval = access_process_vm(task, (mm->env_start + src),
> +   page, this_len, 0);
> +
> +   if (retval <= 0) {
> +   ret = retval;
> +   break;
> +   }
> +
> +   if (copy_to_user(buf, page, retval)) {

shouldn't you only copy min(count,retval) bytes? otherwise you could
write beyond the users buffer "buf", right?
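
A minimal userspace sketch of the clamp being asked about, assuming a
hypothetical helper read_env_clamped() in which memcpy() stands in for both
access_process_vm() and copy_to_user(). It limits each chunk to the remaining
count before reading, which has the same effect as copying only
min(count, retval) bytes afterwards. It is an illustration, not the patch
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the read loop under discussion. "env" stands in for
 * the target's environment area; all names here are hypothetical.
 */
size_t read_env_clamped(char *buf, size_t count,
                        const char *env, size_t env_len,
                        size_t src, size_t max_len)
{
    char page[64];              /* bounce buffer, like the kernel's page */
    size_t copied = 0;

    assert(max_len <= sizeof(page));

    while (count > 0 && src < env_len) {
        size_t this_len = env_len - src;

        if (this_len > max_len)
            this_len = max_len;
        /* the clamp in question: never move more than the caller asked for */
        if (this_len > count)
            this_len = count;

        memcpy(page, env + src, this_len);      /* models access_process_vm() */
        memcpy(buf + copied, page, this_len);   /* models copy_to_user() */

        copied += this_len;
        src += this_len;
        count -= this_len;
    }
    return copied;
}
```

With the clamp in place, a request for 10 bytes leaves byte 10 of the
caller's buffer untouched even when more environment data is available.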

Arvin

Re: [PATCH] binfmt_flat: minimum support for the Blackfin relocations

2007-09-20 Thread Robin Getz

On Thu 20 Sep 2007 11:03, David McCullough pondered:
> I would say that (a) is definitely not the case.  I am sure the BF guys
> will say they have been banging us on the head with changes for a long
> time and getting nowhere as we considered the changes too severe or out
> of line.

I don't think we have been "banging heads" with you (unless that is your 
feeling?) - how about "working together, but diverting to satisfy different 
needs" :)

I think that we have had more issues in the uClinux-dist (userspace and build 
environment), but for kernel code, we have moved from some non-standard 
(stupid) things we were doing early on to what we have today - which is as 
common/standard with other archs as we can be.

Although this is slightly off topic - on the uClinux distribution side - most 
of our changes are based on requirements/desires from being able to support 
fdpic elf and flat formats, and to attempt to make things easier for end 
users/us to use/maintain. Where we do make changes - we always send the patch 
upstream and have the conversation with you (not everyone else does this), 
and some/most times rework things so they are more acceptable to you. We 
don't always come to an agreement - but we always have the discussion, and 
are willing to move if we can make things better while still meeting both our 
needs/desires.

> This particular patch was trivial in comparison to others I've seen,

That is what we thought.

> it fixed all the existing arches (not something that is always done) and
> seemed a reasonable start to finally get the BF guys up and running.
> Still, happy to make it better of course ;-)

As always - we are more than happy to explore/review alternative patches if 
people want to write/submit them.

-Robin

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Andrew Morton

On Fri, 21 Sep 2007 11:19:59 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:

> Hi.
> 
> On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi Andrew.
> > > 
> > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > Seems like good enough for -mm to me.
> > > > 
> > > > 
> > > > Pavel
> > > 
> > > Andrew, if I recall correctly, you said a while ago that you didn't want
> > > another hibernation implementation in the vanilla kernel. If you're going to
> > > consider merging this kexec code, will you also please consider merging
> > > TuxOnIce?
> > > 
> > 
> > The theory is that kexec-based hibernation will mainly use preexisting
> > kexec code and will permit us to delete the existing hibernation
> > implementation.
> > 
> > That's different from replacing it.
> 
> TuxOnIce doesn't remove the existing implementation either. It can 
> transparently replace it, but you can enable/disable that at compile time.

Right.  So we end up with two implementations in-tree.  Whereas
kexec-based-hibernation leads us to having zero implementations in-tree.

See, it's different.

Re: bnx2 driver's firmware images

2007-09-20 Thread Michael Chan

On Thu, 2007-09-20 at 15:49 +0100, Denys Vlasenko wrote:
> 
> 
> Please test these two patches.
> I updated them according to your comments.
> 
> 

I've only tested patch #1.  It worked after some minor modifications
below.

> [attachment: linux-2.6.23-rc6.bnx2-1.patch, plain text document]
> 
> @@ -2767,89 +2769,44 @@ bnx2_set_rx_mode(struct net_device *dev)
> spin_unlock_bh(&bp->phy_lock);
>  }
>  
> -#define FW_BUF_SIZE 0x8000
> -
> +/* To be moved to generic lib/ */
>  static int
> -bnx2_gunzip_init(struct bnx2 *bp)
> +bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len, void **outbuf)

outbuf is no longer needed.

> +   rc = zlib_inflateInit2(strm, -MAX_WBITS);
> +   if (rc == Z_OK) {
> +   rc = zlib_inflate(strm, Z_FINISH);
> +   if (rc == Z_OK)

rc will always be Z_STREAM_END in this case since we provide a big
enough gunzip_buf for the whole blob.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Zhenyu Wang

On 2007.09.21 00:10:26 +, Jiri Slaby wrote:
> > Could you try current xf86-video-intel driver? just do
> > git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel
> 
> It works! 

yep, I also pushed a fix for G33 in xf86-video-intel when fixing the intel agp.
So for G33 user, you should upgrade both to be able to work correctly.

> 3d problem, but it has maybe nothing to do with kernel:
> $ glxinfo
> name of display: :0.0
> Unrecognized deviceID 29c2
> X Error of failed request:  GLXBadContext
> ...

It looks you have an old version of mesa, that i915 dri driver doesn't know
your chipset. Try mesa-7.X.

I have also seen X exit broken with 2.6.23-rc6-mm1, will follow this thread
and try Dave's patch.

Thanks for testing!

[PATCH 2.6.23-rc7 2/3] async_tx: fix dma_wait_for_async_tx

2007-09-20 Thread Dan Williams

Fix dma_wait_for_async_tx to not loop forever in the case where a
dependency chain is longer than two entries.  This condition will not
happen with current in-kernel drivers, but fix it for future drivers.

Found-by: Saeed Bishara <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 crypto/async_tx/async_tx.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c
index 0350071..bc18cbb 100644
--- a/crypto/async_tx/async_tx.c
+++ b/crypto/async_tx/async_tx.c
@@ -80,6 +80,7 @@ dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx)
 {
enum dma_status status;
struct dma_async_tx_descriptor *iter;
+   struct dma_async_tx_descriptor *parent;
 
if (!tx)
return DMA_SUCCESS;
@@ -87,8 +88,15 @@ dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx)
/* poll through the dependency chain, return when tx is complete */
do {
iter = tx;
-   while (iter->cookie == -EBUSY)
-   iter = iter->parent;
+
+   /* find the root of the unsubmitted dependency chain */
+   while (iter->cookie == -EBUSY) {
+   parent = iter->parent;
+   if (parent && parent->cookie == -EBUSY)
+   iter = iter->parent;
+   else
+   break;
+   }
 
status = dma_sync_wait(iter->chan, iter->cookie);
} while (status == DMA_IN_PROGRESS || (iter != tx));
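
The chain walk in the new loop can be modelled in userspace as follows. This
is only a sketch: struct desc and find_unsubmitted_root() are hypothetical
stand-ins for struct dma_async_tx_descriptor and the inner while loop, and
nothing here touches real DMA channels. It shows that the walk stops at the
oldest descriptor that is still unsubmitted, and that a missing parent ends
the walk instead of being dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_async_tx_descriptor. */
struct desc {
    int cookie;             /* -EBUSY while not yet submitted */
    struct desc *parent;    /* operation this one depends on, or NULL */
};

/*
 * Walk up from tx to the oldest descriptor that is still unsubmitted,
 * stopping before a parent that is missing or already submitted.
 */
struct desc *find_unsubmitted_root(struct desc *tx)
{
    struct desc *iter = tx;

    assert(tx != NULL);

    while (iter->cookie == -EBUSY) {
        struct desc *parent = iter->parent;

        if (parent && parent->cookie == -EBUSY)
            iter = parent;
        else
            break;
    }
    return iter;
}
```

For a chain d -> c -> b -> a in which only a has been submitted, the walk
from d ends at b, the descriptor that becomes runnable once a completes.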

[PATCH 2.6.23-rc7 3/3] raid5: fix ops_complete_biofill

2007-09-20 Thread Dan Williams

ops_complete_biofill tried to avoid calling handle_stripe since all the
state necessary to return read completions is available.  However the
process of determining whether more read requests are pending requires
locking the stripe (to block add_stripe_bio from updating dev->toread).
ops_complete_biofill can run in tasklet context, so rather than upgrading
all the stripe locks from spin_lock to spin_lock_bh this patch just moves
read completion handling back into handle_stripe.

Found-by: Yuri Tikhonov <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   90 +++-
 1 files changed, 46 insertions(+), 44 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4d63773..38c8893 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -512,54 +512,12 @@ async_copy_data(int frombio, struct bio *bio, struct page *page,
 static void ops_complete_biofill(void *stripe_head_ref)
 {
struct stripe_head *sh = stripe_head_ref;
-   struct bio *return_bi = NULL;
-   raid5_conf_t *conf = sh->raid_conf;
-   int i, more_to_read = 0;
 
pr_debug("%s: stripe %llu\n", __FUNCTION__,
(unsigned long long)sh->sector);
 
-   /* clear completed biofills */
-   for (i = sh->disks; i--; ) {
-   struct r5dev *dev = &sh->dev[i];
-   /* check if this stripe has new incoming reads */
-   if (dev->toread)
-   more_to_read++;
-
-   /* acknowledge completion of a biofill operation */
-   /* and check if we need to reply to a read request
-   */
-   if (test_bit(R5_Wantfill, &dev->flags) && !dev->toread) {
-   struct bio *rbi, *rbi2;
-   clear_bit(R5_Wantfill, &dev->flags);
-
-   /* The access to dev->read is outside of the
-* spin_lock_irq(&conf->device_lock), but is protected
-* by the STRIPE_OP_BIOFILL pending bit
-*/
-   BUG_ON(!dev->read);
-   rbi = dev->read;
-   dev->read = NULL;
-   while (rbi && rbi->bi_sector <
-   dev->sector + STRIPE_SECTORS) {
-   rbi2 = r5_next_bio(rbi, dev->sector);
-   spin_lock_irq(&conf->device_lock);
-   if (--rbi->bi_phys_segments == 0) {
-   rbi->bi_next = return_bi;
-   return_bi = rbi;
-   }
-   spin_unlock_irq(&conf->device_lock);
-   rbi = rbi2;
-   }
-   }
-   }
-   clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
-   clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
-
-   return_io(return_bi);
-
-   if (more_to_read)
-   set_bit(STRIPE_HANDLE, &sh->state);
+   set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+   set_bit(STRIPE_HANDLE, &sh->state);
release_stripe(sh);
 }
 
@@ -2112,6 +2070,42 @@ static void handle_issuing_new_read_requests6(struct stripe_head *sh,
 }
 
 
+/* handle_completed_read_requests - return completion for reads and allow
+ * new read operations to be submitted to the stripe.
+ */
+static void handle_completed_read_requests(raid5_conf_t *conf,
+   struct stripe_head *sh,
+   struct bio **return_bi)
+{
+   int i;
+
+   pr_debug("%s: stripe %llu\n", __FUNCTION__,
+   (unsigned long long)sh->sector);
+
+   /* check if we need to reply to a read request */
+   for (i = sh->disks; i--; ) {
+   struct r5dev *dev = &sh->dev[i];
+
+   if (test_and_clear_bit(R5_Wantfill, &dev->flags)) {
+   struct bio *rbi, *rbi2;
+
+   rbi = dev->read;
+   dev->read = NULL;
+   while (rbi && rbi->bi_sector <
+   dev->sector + STRIPE_SECTORS) {
+   rbi2 = r5_next_bio(rbi, dev->sector);
+   spin_lock_irq(&conf->device_lock);
+   if (--rbi->bi_phys_segments == 0) {
+   rbi->bi_next = *return_bi;
+   *return_bi = rbi;
+   }
+   spin_unlock_irq(&conf->device_lock);
+   rbi = rbi2;
+   }
+   }
+   }
+}
+
 /* handle_completed_write_requests
  * any written block on an uptodate or failed drive can be returned.
  * Note that if we 'wrote' to a failed drive, it will be UPTODATE, but
@@ -2633,6 +2627,14 @@ static void handle_stripe5(struct stripe_head *sh)
s.expanded =

[PATCH 2.6.23-rc7 1/3] async_tx: usage documentation and developer notes

2007-09-20 Thread Dan Williams

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 Documentation/crypto/async-tx-api.txt |  217 +
 1 files changed, 217 insertions(+), 0 deletions(-)

diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt
new file mode 100644
index 000..48d685a
--- /dev/null
+++ b/Documentation/crypto/async-tx-api.txt
@@ -0,0 +1,217 @@
+Asynchronous Transfers/Transforms API
+
+1 INTRODUCTION
+
+2 GENEALOGY
+
+3 USAGE
+3.1 General format of the API
+3.2 Supported operations
+3.2 Descriptor management
+3.3 When does the operation execute?
+3.4 When does the operation complete?
+3.5 Constraints
+3.6 Example
+
+4 DRIVER DEVELOPER NOTES
+4.1 Conformance points
+4.2 "My application needs finer control of hardware channels"
+
+5 SOURCE
+
+---
+
+1 INTRODUCTION
+
+The async_tx api provides methods for describing a chain of asynchronous
+bulk memory transfers/transforms with support for inter-transactional
+dependencies.  It is implemented as a dmaengine client that smooths over
+the details of different hardware offload engine implementations.  Code
+that is written to the api can optimize for asynchronous operation and
+the api will fit the chain of operations to the available offload
+resources.
+
+2 GENEALOGY
+
+The api was initially designed to offload the memory copy and
+xor-parity-calculations of the md-raid5 driver using the offload engines
+present in the Intel(R) Xscale series of I/O processors.  It also built
+on the 'dmaengine' layer developed for offloading memory copies in the
+network stack using Intel(R) I/OAT engines.  The following design
+features surfaced as a result:
+1/ implicit synchronous path: users of the API do not need to know if
+   the platform they are running on has offload capabilities.  The
+   operation will be offloaded when an engine is available and carried out
+   in software otherwise.
+2/ cross channel dependency chains: the API allows a chain of dependent
+   operations to be submitted, like xor->copy->xor in the raid5 case.  The
+   API automatically handles cases where the transition from one operation
+   to another implies a hardware channel switch.
+3/ dmaengine extensions to support multiple clients and operation types
+   beyond 'memcpy'
+
+3 USAGE
+
+3.1 General format of the API:
+struct dma_async_tx_descriptor *
+async_<operation>(<op specific parameters>,
+ enum async_tx_flags flags,
+ struct dma_async_tx_descriptor *dependency,
+ dma_async_tx_callback callback_routine,
+ void *callback_parameter);
+
+3.2 Supported operations:
+memcpy   - memory copy between a source and a destination buffer
+memset   - fill a destination buffer with a byte value
+xor - xor a series of source buffers and write the result to a
+  destination buffer
+xor_zero_sum - xor a series of source buffers and set a flag if the
+  result is zero.  The implementation attempts to prevent
+  writes to memory
+
+3.2 Descriptor management:
+The return value is non-NULL and points to a 'descriptor' when the operation
+has been queued to execute asynchronously.  Descriptors are recycled
+resources, under control of the offload engine driver, to be reused as
+operations complete.  When an application needs to submit a chain of
+operations it must guarantee that the descriptor is not automatically recycled
+before the dependency is submitted.  This requires that all descriptors be
+acknowledged by the application before the offload engine driver is allowed to
+recycle (or free) the descriptor.  A descriptor can be acked by:
+1/ setting the ASYNC_TX_ACK flag if no operations are to be submitted
+2/ setting the ASYNC_TX_DEP_ACK flag to acknowledge the parent
+   descriptor of a new operation.
+3/ calling async_tx_ack() on the descriptor.
+
+3.3 When does the operation execute?:
+Operations do not immediately issue after return from the
+async_<operation> call.  Offload engine drivers batch operations to
+improve performance by reducing the number of mmio cycles needed to
+manage the channel.  Once a driver specific threshold is met the driver
+automatically issues pending operations.  An application can force this
+event by calling async_tx_issue_pending_all().  This operates on all
+channels since the application has no knowledge of channel to operation
+mapping.
+
+3.4 When does the operation complete?:
+There are two methods for an application to learn about the completion
+of an operation.
+1/ Call dma_wait_for_async_tx().  This call causes the cpu to spin while
+   it polls for the completion of the operation.  It handles dependency
+   chains and issuing pending operations.
+2/ Specify a completion callback.  The callback routine runs in tasklet
+   context if the offload engine driver supports interrupts, or it is
+   called in application context if the operation is carried out
+   synchronously in software.  The callback can be set in the call to
+

[PATCH 2.6.23-rc7 0/3] async_tx and md-accel fixes for 2.6.23

2007-09-20 Thread Dan Williams

Fix a couple bugs and provide documentation for the async_tx api.

Neil, please 'ack' patch #3.

git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus

Dan Williams (3):
  async_tx: usage documentation and developer notes
  async_tx: fix dma_wait_for_async_tx
  raid5: fix ops_complete_biofill

Documentation/crypto/async-tx-api.txt |  217 +
crypto/async_tx/async_tx.c|   12 ++-
drivers/md/raid5.c|   90 +++---
3 files changed, 273 insertions(+), 46 deletions(-)

--
Dan

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham

Hi.

On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> 
> > Hi Andrew.
> > 
> > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > Seems like good enough for -mm to me.
> > > 
> > >   Pavel
> > 
> > Andrew, if I recall correctly, you said a while ago that you didn't want
> > another hibernation implementation in the vanilla kernel. If you're going to
> > consider merging this kexec code, will you also please consider merging
> > TuxOnIce?
> > 
> 
> The theory is that kexec-based hibernation will mainly use preexisting
> kexec code and will permit us to delete the existing hibernation
> implementation.
> 
> That's different from replacing it.

TuxOnIce doesn't remove the existing implementation either. It can 
transparently replace it, but you can enable/disable that at compile time.

Regards,

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Cobden
103 Curdie Street, Cobden 3266, Victoria, Australia
Ph. +61 3 5595 1185 / +61 417 100 574
Communal Worship: 11 am Sunday.

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Andrew Morton

On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:

> Hi Andrew.
> 
> On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > Seems like good enough for -mm to me.
> > 
> > Pavel
> 
> Andrew, if I recall correctly, you said a while ago that you didn't want 
> another hibernation implementation in the vanilla kernel. If you're going to 
> consider merging this kexec code, will you also please consider merging 
> TuxOnIce?
> 

The theory is that kexec-based hibernation will mainly use preexisting
kexec code and will permit us to delete the existing hibernation
implementation.

That's different from replacing it.

Re: [patch 4/4] Port of blktrace to the Linux Kernel Markers.

2007-09-20 Thread Steven Rostedt

On Tue, Sep 18, 2007 at 05:13:28PM -0400, Mathieu Desnoyers wrote:
> +void blk_probe_disarm(void)
> +{
> + int i, err;
> +
> + for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
> + err = marker_disarm(probe_array[i].name);
> + BUG_ON(err);
> + err = IS_ERR(marker_probe_unregister(probe_array[i].name));
> + BUG_ON(err);
> + }
> +}

As well as changing these to WARN_ON.

-- Steve


Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6

2007-09-20 Thread Yucheng Low

Hi all,

Thanks all. After lots of testing, I isolated the problem to one of the
memory modules.

Thought it might have been a kernel problem as I thought memtest should
be exhaustive enough considering I ran it for so long, but apparently not...
Even now, the bad module still does not show any errors in memtest...

Thanks,
Yucheng

Ray Lee wrote:
> On 9/19/07, Low Yucheng <[EMAIL PROTECTED]> wrote:
>   
>> [1.] Summary
>> System Freeze on Particular workload with kernel 2.6.22.6
>>
>> [2.] Description
>> System freezes on repeated application of the following command
>> for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done
>>
>> Problem is consistent and repeatable.
>> Problem persists when running on a different drive, and also in pure console
>> (no X).
>>
>> One time, the following error logged in syslog:
>> Sep 19 04:22:11 mossnew kernel: [  301.883919] VM: killing process convert
>> Sep 19 04:22:11 mossnew kernel: [  301.884382] swap_free: Unused swap offset entry ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884421] swap_free: Unused swap offset entry 0300
>> Sep 19 04:22:11 mossnew kernel: [  301.884456] swap_free: Unused swap offset entry 0200
>> Sep 19 04:22:11 mossnew kernel: [  301.884491] swap_free: Unused swap offset entry ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884527] swap_free: Unused swap offset entry ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884562] swap_free: Unused swap offset entry 0100
>>
>> Should not be a RAM problem. RAM has survived 12 hrs of Memtest with no errors.
>> Should not be a CPU problem either. I have been running CPU intensive tasks
>> for days.
>> 
>
> The "Unused swap offset entry" is almost always a sign of bad memory,
> if google can be trusted. Your workload is *extremely* CPU and memory
> intensive (and even hits the disk!), so this looks like bad RAM, bad
> cooling, or a marginal power supply that is failing under load.
>
> memtest86+ doesn't stress the CPU nearly as much, so it often doesn't
> show all the problems.
>
> Take your RAM down to one stick and try again (looks like you have 2G
> installed?). If that still fails, try different RAM. If that still
> fails, then swap out the power supply for another if you can, and try
> again.
>
> Ray
>
>   


Re: [patch 1/4] Linux Kernel Markers - Architecture Independent Code

2007-09-20 Thread Steven Rostedt

On Tue, Sep 18, 2007 at 05:13:25PM -0400, Mathieu Desnoyers wrote:
> +/*
> + * Sets the probe callback corresponding to one marker.
> + */
> +static int set_marker(struct marker_entry **entry,
> + struct __mark_marker *elem)
> +{
> + int ret;
> + BUG_ON(strcmp((*entry)->name, elem->name) != 0);

Can you switch this at least to WARN_ON?  Killing a system with X
running where the user just sees a freeze is not that nice. But a nasty
message in dmesg is very noticeable.
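
A rough userspace sketch of the warn-and-return pattern being requested,
assuming hypothetical names warn_on_sketch(), WARN_ON_SKETCH() and
check_marker_name(); the real WARN_ON() lives in the kernel and prints a
backtrace, which this model does not attempt. The point is only that the
check reports the mismatch and lets the caller fail the operation instead of
halting the machine.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Report a failed check and hand the condition back to the caller. */
int warn_on_sketch(int failed, const char *what, const char *file, int line)
{
    if (failed)
        fprintf(stderr, "WARNING: %s at %s:%d\n", what, file, line);
    return failed;
}

/* Hypothetical userspace stand-in for the kernel's WARN_ON(). */
#define WARN_ON_SKETCH(cond) \
    warn_on_sketch(!!(cond), #cond, __FILE__, __LINE__)

/*
 * Models the name check at the top of set_marker(): on a mismatch, log a
 * warning and return an error rather than bringing the system down.
 */
int check_marker_name(const char *entry_name, const char *elem_name)
{
    assert(entry_name != NULL && elem_name != NULL);

    if (WARN_ON_SKETCH(strcmp(entry_name, elem_name) != 0))
        return -EINVAL;
    return 0;
}
```

The choice of -EINVAL as the error code is an assumption made for the
sketch; the thread does not say what the function should return.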

-- Steve

> +
> + if ((*entry)->format) {
> + if (strcmp((*entry)->format, elem->format) != 0) {
> + printk(KERN_NOTICE
> + "Format mismatch for probe %s "
> + "(%s), marker (%s)\n",
> + (*entry)->name,
> + (*entry)->format,
> + elem->format);
> + return -EPERM;
> + }
> + } else {
> + ret = marker_set_format(entry, elem->format);
> + if (ret)
> + return ret;
> + }
> + elem->call = (*entry)->probe;
> + elem->pdata = (*entry)->pdata;
> + _immediate_set(elem->state, 1);
> + return 0;
> +}

Re: 2.6.23-rc6-mm1

2007-09-20 Thread Tilman Schmidt

On 20.09.2007 22:25, Andrew Morton wrote:
> There was a locking imbalance in the IPC code.  Do you have the fixes in
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc6/2.6.23-rc6-mm1/hot-fixes/
> applied?

I hadn't. Now that I have, all the troubles are gone. X comes up
fine, and all of the segfault/invalid context/scheduling while atomic
messages have disappeared.

Thanks,
Tilman

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse)




Re: [RFC][PATCH] page->mapping clarification [1/3] base functions

2007-09-20 Thread KAMEZAWA Hiroyuki

On Thu, 20 Sep 2007 11:26:47 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Wed, 19 Sep 2007, KAMEZAWA Hiroyuki wrote:
> 
> > Any comments are welcome.
> 
> I am still a bit confused as to what the benefit of this is.
> 
Honestly, I have 3 purposes, 2 for readability/clarification and 1 for my trial.

1. Clarify page cache <-> inode relationship before *new concept of page cache*,
   yours or someone else's is introduced.

2. There are some places using PAGE_MAPPING_ANON directly. I don't want to see
   the following line in .c files.
   ==
   anon_vma = (struct anon_vma *)(mapping - PAGE_MAPPING_ANON);
   ==

3. I want to *try* page->mapping overriding... store the memory resource
   controller's information in page->mapping. This way, the memory controller
   doesn't enlarge sizeof(struct page). (Works well in my small test.)
   Before doing that, I have to hide page->mapping from direct access.
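
As a rough illustration of points 2 and 3, here is a userspace sketch of
hiding the tagged pointer behind accessors. It is a simplified model, not the
patch: struct page here holds the mapping as a uintptr_t so the bit
arithmetic is well defined, and page_is_anon(), page_set_anon_vma() and
page_anon_vma() are names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON ((uintptr_t)1)

/* Simplified stand-ins for the kernel structures. */
struct address_space { int dummy; };
struct anon_vma { int dummy; };
struct page { uintptr_t mapping; };

/* True when the low bit marks the mapping as an anon_vma pointer. */
bool page_is_anon(const struct page *page)
{
    return (page->mapping & PAGE_MAPPING_ANON) != 0;
}

/* Store an anon_vma pointer with the tag bit set. */
void page_set_anon_vma(struct page *page, struct anon_vma *av)
{
    assert(((uintptr_t)av & PAGE_MAPPING_ANON) == 0);
    page->mapping = (uintptr_t)av | PAGE_MAPPING_ANON;
}

/* The only place that strips the tag, so callers never do the arithmetic. */
struct anon_vma *page_anon_vma(const struct page *page)
{
    if (!page_is_anon(page))
        return NULL;
    return (struct anon_vma *)(page->mapping - PAGE_MAPPING_ANON);
}

/* Page-cache view of the field: NULL for anonymous or unmapped pages. */
struct address_space *page_mapping_cache(const struct page *page)
{
    if (page_is_anon(page))
        return NULL;
    return (struct address_space *)page->mapping;
}
```

Once every caller goes through helpers like these, the field's encoding can
change without touching the .c files that use it.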


> > +/*
> > + * On an anonymous page mapped into a user virtual memory area,
> > + * page->mapping points to its anon_vma, not to a struct address_space;
> > + * with the PAGE_MAPPING_ANON bit set to distinguish it.
> > + *
> > + * Please note that, confusingly, "page_mapping" refers to the inode
> > + * address_space which maps the page from disk; whereas "page_mapped"
> > + * refers to user virtual address space into which the page is mapped.
> > + */
> > +#define PAGE_MAPPING_ANON   1
> > +
> > +static inline bool PageAnon(struct page *page)
> 
> bool??? That is unusual?

This is my first experience of using bool in Linux kernel.. :)

I know bool is not very widely used in Linux now but I tried it because 
this function obviously returns yes or no, and the C language supports bool as
_Bool now. If it's messy, I'll avoid using it this time.


> 
> > +static inline struct address_space *page_mapping_cache(struct page *page)
> > +{
> > +   if (!page->mapping || PageAnon(page))
> > +   return NULL;
> > +   return page->mapping;
> > +}
> 
> That is confusing.
> 
> if (PageAnon(page))
>   return NULL;
> return page->mapping;
ok,

> > +static inline struct address_space *page_mapping(struct page *page)
> > +{
> > +   struct address_space *mapping = page->mapping;
> > +
> > +   VM_BUG_ON(PageSlab(page));
> > +   if (unlikely(PageSwapCache(page)))
> > +   mapping = &swapper_space;
> > +#ifdef CONFIG_SLUB
> > +   else if (unlikely(PageSlab(page)))
> > +   mapping = NULL;
> > +#endif
> 
> The #ifdef does not exist in rc6-mm1. No need to reintroduce it.
> 
ok, thanks.

> > +static inline bool
> > +is_page_consistent(struct page *page, struct address_space *mapping)
> > +{
> > +   struct address_space *check = page_mapping_cache(page);
> > +   return (check == mapping);
> > +}
> 
> Why do we need a special function? Why is it safer?
> 
To clarify the meaning of comparing page_mapping_cache() with mapping.
Does this reduce readability?

Thank you for comments.

Regards,
-Kame






Re: NFS on loopback locks up entire system(2.6.23-rc6)?

2007-09-20 Thread Trond Myklebust

On Thu, 2007-09-20 at 17:22 -0700, Chakri n wrote:
> Hi,
> 
> I am testing NFS on loopback locks up entire system with 2.6.23-rc6 kernel.
> 
> I have mounted a local ext3 partition using loopback NFS (version 3)
> and started my test program. The test program forks 20 threads
> allocates 10MB for each thread, writes & reads a file on the loopback
> NFS mount. After running for about 5 min, I cannot even login to the
> machine. Commands like ps etc, hang in a live session.
> 
> The machine is a DELL 1950 with 4Gig of RAM, so there is plenty of RAM
> & CPU to play around and no other io/heavy processes are running on
> the system.
> 
> vmstat output shows no buffers are actually getting transferred in or
> out and iowait is 100%.
> 
> [EMAIL PROTECTED] ~]# vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id  wa st
>  0 24    116 110080  11132 3045664    0    0     0     0   28  345  0  1  0  99  0
>  0 24    116 110080  11132 3045664    0    0     0     0    5  329  0  0  0 100  0
>  0 24    116 110080  11132 3045664    0    0     0     0   26  336  0  0  0 100  0
>  0 24    116 110080  11132 3045664    0    0     0     0    8  335  0  0  0 100  0
>  0 24    116 110080  11132 3045664    0    0     0     0   26  352  0  0  0 100  0
>  0 24    116 110080  11132 3045664    0    0     0     0    8  351  0  0  0 100  0
>  0 24    116 110080  11132 3045664    0    0     0     0   23  358  0  1  0  99  0
>  0 24    116 110080  11132 3045664    0    0     0     0   10  350  0  0  0 100  0
>  0 24    116 110080  11132 3045664    0    0     0     0   26  363  0  0  0 100  0
>  0 24    116 110080  11132 3045664    0    0     0     0    8  346  0  1  0  99  0
>  0 24    116 110080  11132 3045664    0    0     0     0   26  360  0  0  0 100  0
>  0 24    116 110080  11140 3045656    0    0     8     0   11  345  0  0  0 100  0
>  0 24    116 110080  11140 3045664    0    0     0     0   27  355  0  0  2  97  0
>  0 24    116 110080  11140 3045664    0    0     0     0    9  330  0  0  0 100  0
>  0 24    116 110080  11140 3045664    0    0     0     0   26  358  0  0  0 100  0
> 
> 
> The following is the backtrace of
> 1. one of the threads of my test program
> 2. nfsd daemon and
> 3. a generic command like pstree, after the machine hangs:
> -
> crash> bt 3252
> PID: 3252   TASK: f6f3c610  CPU: 0   COMMAND: "test"
>  #0 [f6bdcc10] schedule at c0624a34
>  #1 [f6bdcc84] schedule_timeout at c06250ee
>  #2 [f6bdccc8] io_schedule_timeout at c0624c15
>  #3 [f6bdccdc] congestion_wait at c045eb7d
>  #4 [f6bdcd00] balance_dirty_pages_ratelimited_nr at c045ab91
>  #5 [f6bdcd54] generic_file_buffered_write at c0457148
>  #6 [f6bdcde8] __generic_file_aio_write_nolock at c04576e5
>  #7 [f6bdce40] try_to_wake_up at c042342b
>  #8 [f6bdce5c] generic_file_aio_write at c0457799
>  #9 [f6bdce8c] nfs_file_write at f8c25cee
> #10 [f6bdced0] do_sync_write at c0472e27
> #11 [f6bdcf7c] vfs_write at c0473689
> #12 [f6bdcf98] sys_write at c0473c95
> #13 [f6bdcfb4] sysenter_entry at c0404ddf
> EAX: 0004  EBX: 0013  ECX: a4966008  EDX: 0098
> DS:  007b  ESI: 0098  ES:  007b  EDI: a4966008
> SS:  007b  ESP: a5ae6ec0  EBP: a5ae6ef0
> CS:  0073  EIP: b7eed410  ERR: 0004  EFLAGS: 0246
> crash> bt 3188
> PID: 3188   TASK: f74c4000  CPU: 1   COMMAND: "nfsd"
>  #0 [f6836c7c] schedule at c0624a34
>  #1 [f6836cf0] __mutex_lock_slowpath at c062543d
>  #2 [f6836d0c] mutex_lock at c0625326
>  #3 [f6836d18] generic_file_aio_write at c0457784
>  #4 [f6836d48] ext3_file_write at ffd7
>  #5 [f6836d64] do_sync_readv_writev at c0472d1f
>  #6 [f6836e08] do_readv_writev at c0473486
>  #7 [f6836e6c] vfs_writev at c047358e
>  #8 [f6836e7c] nfsd_vfs_write at f8e7f8d7
>  #9 [f6836ee0] nfsd_write at f8e80139
> #10 [f6836f10] nfsd3_proc_write at f8e86afd
> #11 [f6836f44] nfsd_dispatch at f8e7c20c
> #12 [f6836f6c] svc_process at f89c18e0
> #13 [f6836fbc] nfsd at f8e7c794
> #14 [f6836fe4] kernel_thread_helper at c0405a35
> crash> ps|grep ps
>     234      2   3  cb194000  IN   0.0      0      0  [khpsbpkt]
>     520      2   0  f7e18c20  IN   0.0      0      0  [kpsmoused]
>    2859      1   2  f7f3cc20  IN   0.1   9600   2040  cupsd
>    3340   3310   0  f4a0f840  UN   0.0   4360    816  pstree
>    3343   3284   2  f4a0f230  UN   0.0   4212    944  ps
> crash> bt 3340
> PID: 3340   TASK: f4a0f840  CPU: 0   COMMAND: "pstree"
>  #0 [e856be30] schedule at c0624a34
>  #1 [e856bea4] rwsem_down_failed_common at c04df6c0
>  #2 [e856bec4] rwsem_down_read_failed at c0625c2a
>  #3 [e856bedc] call_rwsem_down_read_failed at c0625c96
>  #4 [e856bee8] down_read at c043c21a
>  #5 [e856bef0] access_process_vm at c0462039
>  #6 [e856bf38] proc_pid_cmdline at c04a1bbb
>  #7 [e856bf58] proc_info_read at c04a2f41
>  #8

Message codes (Re: [Announce] Linux-tiny project revival)

2007-09-20 Thread Oleg Verych

* Thu, 20 Sep 2007 15:15:47 -0700
* X-MimeOLE: Produced By Microsoft Exchange V6.5
[]
>>*Shrug*.
>>
>>My problem is that switching off printk is the single biggest bloat cutter
>>in the kernel, yet it makes the resulting system very hard to support.  It
>>combines a big upside with a big downside, and I'd like something in
>>between.
>
> What about getting even more hard core? 
>
> Use compiler tricks to remove ALL the static printk strings from the
> kernel and replace the printk with something that outputs a decimal
> index followed by a tuple of zero to N hex strings on a single line.

Not all of them: critical info must, of course, stay in human-readable
form.

> Then have the syslogd or some other utility take this cryptic output and
> convolve it with a table (created at compile time) to re-create what
> would have been dumped to the sys-log ring buffer.  This way you strip
> out most of the static text from the kernel and yet can still re-create
> the kernlog output.
>
> At least as a post processing operation

Sure, but a small problem is that many kernel developers write C (mostly)
and Perl (occasionally), i.e. very few do non-trivial userspace work (and
even userspace people sometimes do too much C and Perl [: )

> Is this an old idea?  I'm guessing this has been at least proposed
> before

Seriously. On Windows there are only messages like:

"Error (Code:0x2012)".

On Linux (well, on embedded targets) this can be turned into a very
efficient thing by means of userspace. On other setups it can be a nice
and pleasant one, with yet another L10N project, as recently promoted by
the README translations. But... see the problem above.
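
The scheme quoted above (emit a decimal index plus hex arguments, then
rebuild the text in userspace from a build-time table) can be sketched in
plain C. This is only an illustration: the table contents and the names
emit_coded() and decode_coded() are hypothetical, and a real
implementation would generate the table at compile time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message table; imagined as generated at build time. */
static const char *const msg_table[] = {
    "eth%lu: link up, speed %lu Mb/s",
    "sda: %lu sectors",
};
#define MSG_TABLE_LEN (sizeof(msg_table) / sizeof(msg_table[0]))

/* "Kernel side": write only the decimal index and the hex arguments. */
static int emit_coded(char *out, size_t outlen, unsigned idx,
                      const unsigned long *args, int nargs)
{
    int n = snprintf(out, outlen, "%u", idx);

    for (int i = 0; i < nargs && (size_t)n < outlen; i++)
        n += snprintf(out + n, outlen - (size_t)n, " %lx", args[i]);
    return n;
}

/* "Userspace side": look the index up and re-create the full text.
 * This sketch supports at most two arguments per message. */
static int decode_coded(const char *line, char *out, size_t outlen)
{
    unsigned idx;
    unsigned long a[2] = { 0, 0 };
    int consumed = 0;

    if (sscanf(line, "%u%n", &idx, &consumed) != 1 || idx >= MSG_TABLE_LEN)
        return -1;
    sscanf(line + consumed, "%lx %lx", &a[0], &a[1]);
    return snprintf(out, outlen, msg_table[idx], a[0], a[1]);
}
```

The kernel image then carries only the call sites and indices, while the
format strings live in the table used by the decoder.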


Re: Disk I/O degraded performance

2007-09-20 Thread David Chinner

On Wed, Sep 19, 2007 at 10:09:12PM +0200, Ramon Chimeno wrote:
> Hi all
> 
> I migrated one of my servers from kernel 2.6.18 to the latest 2.6.22
> and I experienced lower disk performance for processes that open files
> with the O_DIRECT flag.
> 
> I wrote a very simple test program that opens two files with the
> O_DIRECT flag and reads them to the end. I monitored the time spent
> reading the files and I see a ~40% difference between 2.6.18 and 2.6.22.
> 
> For information, the files are stored on a XFS partition which is part
> of a software raid-5 block device (the raid-5 is made with 3 SATA
> drives).

Start by eliminating the filesystem. i.e. run the same test using
different offsets on the raw device (e.g. seek one fd a few gigabytes
further into the disk than the other then start reading).

Also, you might want to check that you are comparing apples to apples;
are the two files you used in each test the same? If not, are
they all unfragmented, in the same AGs (i.e. all on the same area of
disk, as there's a 2x speed difference between the inner and outer edges),
etc.
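
A minimal sketch of the raw-device test suggested above, assuming a
4096-byte alignment requirement for O_DIRECT (the real requirement
depends on the device). The helper name read_direct() is hypothetical;
it reads one region, so the comparison would call it once per offset
and time each call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* fallback: plain buffered reads */
#endif

#define DIO_ALIGN 4096         /* assumed alignment of buffer, offset, length */

/* Read `len` bytes starting at `offset`; returns bytes read, or -1. */
static ssize_t read_direct(const char *path, off_t offset, size_t len)
{
    ssize_t total = 0;
    void *buf;
    int fd;

    /* O_DIRECT transfers generally need aligned offsets and lengths. */
    if (offset % DIO_ALIGN != 0 || len % DIO_ALIGN != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    buf = aligned_alloc(DIO_ALIGN, DIO_ALIGN);
    if (buf == NULL || lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        free(buf);
        close(fd);
        return -1;
    }

    while ((size_t)total < len) {
        ssize_t n = read(fd, buf, DIO_ALIGN);

        if (n < 0) {
            total = -1;
            break;
        }
        total += n;
        if (n < DIO_ALIGN)     /* short read: end of file or device */
            break;
    }

    free(buf);
    close(fd);
    return total;
}
```

Calling it twice on the same block device, with the second offset a few
gigabytes beyond the first, takes the filesystem out of the picture.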

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: Lossy interrupts on x86_64

2007-09-20 Thread Jesse Barnes

On Thursday, September 20, 2007, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 12:22 -0700, Jesse Barnes wrote:
> > > Eeek, that sounds scary. Can you add "highres=off" as well ?
> >
> > FWIW I just tried your linux-2.6-hires tree with the attached
> > config and still see the problem.  It doesn't look like NO_HZ is
> > even an option in that tree...
>
> Right, that's a 2.6-hrt update tree for Linus to pull. The 64 bit
> parts are not in there. It's basically Linus + some fixes.

Arg, looks like this is actually a DRM problem, but it doesn't exist in 
the DRM upstream tree, only the upstream kernel tree.  I've only seen 
it on 965 chips though, and they have other vblank related problems, so 
I won't worry about it for 2.6.23 proper.

Thanks,
Jesse

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham

Hi Andrew.

On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> Seems like good enough for -mm to me.
> 
>   Pavel

Andrew, if I recall correctly, you said a while ago that you didn't want 
another hibernation implementation in the vanilla kernel. If you're going to 
consider merging this kexec code, will you also please consider merging 
TuxOnIce?

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.

NFS on loopback locks up entire system(2.6.23-rc6)?

2007-09-20 Thread Chakri n

Hi,

While testing NFS on loopback, the entire system locks up with the
2.6.23-rc6 kernel.

I have mounted a local ext3 partition using loopback NFS (version 3)
and started my test program. The test program forks 20 threads,
allocates 10MB for each thread, and writes and reads a file on the
loopback NFS mount. After running for about 5 minutes, I cannot even
log in to the machine. Commands like ps hang in a live session.

The machine is a DELL 1950 with 4Gig of RAM, so there is plenty of RAM
& CPU to play around with, and no other I/O-heavy processes are running
on the system.

vmstat output shows no buffers are actually getting transferred in or
out and iowait is 100%.

[EMAIL PROTECTED] ~]# vmstat 1
procs -----------memory----------- ---swap-- -----io---- --system-- ------cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id  wa st
 0 24    116 110080  11132 3045664    0    0     0     0   28  345  0  1  0  99  0
 0 24    116 110080  11132 3045664    0    0     0     0    5  329  0  0  0 100  0
 0 24    116 110080  11132 3045664    0    0     0     0   26  336  0  0  0 100  0
 0 24    116 110080  11132 3045664    0    0     0     0    8  335  0  0  0 100  0
 0 24    116 110080  11132 3045664    0    0     0     0   26  352  0  0  0 100  0
 0 24    116 110080  11132 3045664    0    0     0     0    8  351  0  0  0 100  0
 0 24    116 110080  11132 3045664    0    0     0     0   23  358  0  1  0  99  0
 0 24    116 110080  11132 3045664    0    0     0     0   10  350  0  0  0 100  0
 0 24    116 110080  11132 3045664    0    0     0     0   26  363  0  0  0 100  0
 0 24    116 110080  11132 3045664    0    0     0     0    8  346  0  1  0  99  0
 0 24    116 110080  11132 3045664    0    0     0     0   26  360  0  0  0 100  0
 0 24    116 110080  11140 3045656    0    0     8     0   11  345  0  0  0 100  0
 0 24    116 110080  11140 3045664    0    0     0     0   27  355  0  0  2  97  0
 0 24    116 110080  11140 3045664    0    0     0     0    9  330  0  0  0 100  0
 0 24    116 110080  11140 3045664    0    0     0     0   26  358  0  0  0 100  0


The following is the backtrace of
1. one of the threads of my test program
2. nfsd daemon and
3. a generic command like pstree, after the machine hangs:
-
crash> bt 3252
PID: 3252   TASK: f6f3c610  CPU: 0   COMMAND: "test"
 #0 [f6bdcc10] schedule at c0624a34
 #1 [f6bdcc84] schedule_timeout at c06250ee
 #2 [f6bdccc8] io_schedule_timeout at c0624c15
 #3 [f6bdccdc] congestion_wait at c045eb7d
 #4 [f6bdcd00] balance_dirty_pages_ratelimited_nr at c045ab91
 #5 [f6bdcd54] generic_file_buffered_write at c0457148
 #6 [f6bdcde8] __generic_file_aio_write_nolock at c04576e5
 #7 [f6bdce40] try_to_wake_up at c042342b
 #8 [f6bdce5c] generic_file_aio_write at c0457799
 #9 [f6bdce8c] nfs_file_write at f8c25cee
#10 [f6bdced0] do_sync_write at c0472e27
#11 [f6bdcf7c] vfs_write at c0473689
#12 [f6bdcf98] sys_write at c0473c95
#13 [f6bdcfb4] sysenter_entry at c0404ddf
EAX: 0004  EBX: 0013  ECX: a4966008  EDX: 0098
DS:  007b  ESI: 0098  ES:  007b  EDI: a4966008
SS:  007b  ESP: a5ae6ec0  EBP: a5ae6ef0
CS:  0073  EIP: b7eed410  ERR: 0004  EFLAGS: 0246
crash> bt 3188
PID: 3188   TASK: f74c4000  CPU: 1   COMMAND: "nfsd"
 #0 [f6836c7c] schedule at c0624a34
 #1 [f6836cf0] __mutex_lock_slowpath at c062543d
 #2 [f6836d0c] mutex_lock at c0625326
 #3 [f6836d18] generic_file_aio_write at c0457784
 #4 [f6836d48] ext3_file_write at ffd7
 #5 [f6836d64] do_sync_readv_writev at c0472d1f
 #6 [f6836e08] do_readv_writev at c0473486
 #7 [f6836e6c] vfs_writev at c047358e
 #8 [f6836e7c] nfsd_vfs_write at f8e7f8d7
 #9 [f6836ee0] nfsd_write at f8e80139
#10 [f6836f10] nfsd3_proc_write at f8e86afd
#11 [f6836f44] nfsd_dispatch at f8e7c20c
#12 [f6836f6c] svc_process at f89c18e0
#13 [f6836fbc] nfsd at f8e7c794
#14 [f6836fe4] kernel_thread_helper at c0405a35
crash> ps|grep ps
    234      2   3  cb194000  IN   0.0      0      0  [khpsbpkt]
    520      2   0  f7e18c20  IN   0.0      0      0  [kpsmoused]
   2859      1   2  f7f3cc20  IN   0.1   9600   2040  cupsd
   3340   3310   0  f4a0f840  UN   0.0   4360    816  pstree
   3343   3284   2  f4a0f230  UN   0.0   4212    944  ps
crash> bt 3340
PID: 3340   TASK: f4a0f840  CPU: 0   COMMAND: "pstree"
 #0 [e856be30] schedule at c0624a34
 #1 [e856bea4] rwsem_down_failed_common at c04df6c0
 #2 [e856bec4] rwsem_down_read_failed at c0625c2a
 #3 [e856bedc] call_rwsem_down_read_failed at c0625c96
 #4 [e856bee8] down_read at c043c21a
 #5 [e856bef0] access_process_vm at c0462039
 #6 [e856bf38] proc_pid_cmdline at c04a1bbb
 #7 [e856bf58] proc_info_read at c04a2f41
 #8 [e856bf7c] vfs_read at c04737db
 #9 [e856bf98] sys_read at c0473c2e
#10 [e856bfb4] sysenter_entry at c0404ddf
EAX: 0003  EBX: 0005  ECX: 0804dc58  EDX: 0062
DS:  007b  ESI: 0cba  ES:  007b  EDI: 0804e0e0
SS:  007b  ESP: bfa3afe8  EBP: bfa3d4f8

Re: 2.6.20 (XFS? related) crash after uptime of > 180 days during apt-get dist-upgrade on Debian Testing

2007-09-20 Thread David Chinner

On Wed, Sep 19, 2007 at 04:47:38AM -0400, Justin Piszcz wrote:
> On Mon, 17 Sep 2007, Justin Piszcz wrote:
> 
> >Including the XFS mailing list in here too because it may be an XFS bug 
> >looking at the call trace.
> >
> >System: Debian Testing
> >Kernel: 2.6.20
> >Config: Attached
> >
> >I was running apt-get dist-upgrade as I always do to get the latest 
> >packages upgraded and the kernel OOPS'd when it was upgrading 'tzdata' and 
> >the process went into D-state and I had to reboot.
> >
> >The config file is from 2.6.20 but it had been moved to a 2.6.22 directory 
> >for an upgrade, but all of the options have been left unchanged.
> >
> >Here is the *OOPS I captured via dmesg before I rebooted:
> >
> >
> 
> Also,
> 
> Not sure if this helps, but when this happened, any file that was open()
> for read/write seems to have also been corrupted.

Is that all files, or just ones that were being changed?

> $ /usr/sbin/xfs_bmap -v myconfig.txt.orig
> myconfig.txt.orig:
>  EXT: FILE-OFFSET      BLOCK-RANGE          AG AG-OFFSET        TOTAL
>    0: [0..7]:          64601112..64601119   14 (52040..52047)       8
> $ /usr/sbin/xfs_bmap -v myconfig.txt
> myconfig.txt:
>  EXT: FILE-OFFSET      BLOCK-RANGE          AG AG-OFFSET        TOTAL
>    0: [0..7]:          64625720..64625727   14 (76648..76655)       8
> $ md5sum myconfig*
> db8c50ca2c86d2e757ecef1d6b3fcc69  myconfig.txt
> 09fb630623b3ae614511cef4c7a21063  myconfig.txt.orig
> $ file myconfig.txt myconfig.txt.orig
> myconfig.txt:  ASCII text
> myconfig.txt.orig: data
> $
> 
> $ strings -a myconfig.txt.orig
> $
> 
> $ od -c myconfig.txt.orig
> 0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
> *
> 0003500  \0  \0  \0  \0  \0  \0
> 0003506
> 
> Seems like it was NULL'd out?

A single block of zeros - it's possible that the crash occurred between
the allocation transaction and the data write - the allocation gets
replayed (along with the new file size), but the data write does
not (it is not journalled). This is one of the rarer "NULL files on crash"
failure modes fixed in 2.6.22.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Joe Perches

On Thu, 2007-09-20 at 19:28 -0500, Rob Landley wrote:
> You convert printk(KERN_INFO, blah) to pr_INFO(blah)?

more or less.
printk(KERN_INFO foo) to pr_info(foo)
printk(KERN_EMERG foo) to pr_emerg(foo)
etc.
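
As a rough illustration of that conversion, here is a userspace sketch
in which printk() is a stand-in that captures the formatted text. The
stand-in and the last_msg buffer are assumptions made for the example;
the "<0>" and "<6>" prefixes are the KERN_EMERG and KERN_INFO level
strings of this kernel generation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define KERN_EMERG "<0>"
#define KERN_INFO  "<6>"

/* Hypothetical capture buffer so the result can be inspected. */
static char last_msg[256];

/* Userspace stand-in for the kernel's printk(). */
static int printk(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
    va_end(ap);
    return n;
}

/* The level string is pasted onto the format at compile time.
 * ##__VA_ARGS__ is a GNU extension accepted by gcc and clang. */
#define pr_emerg(fmt, ...) printk(KERN_EMERG fmt, ##__VA_ARGS__)
#define pr_info(fmt, ...)  printk(KERN_INFO fmt, ##__VA_ARGS__)
```

With wrappers like these, printk(KERN_INFO "eth%d: link up\n", 0)
becomes pr_info("eth%d: link up\n", 0) and produces the same string.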

> I'm not finding pr_INFO with a grep on the files in 
> 2.6.23-rc7.

I haven't submitted them.

There's a lot of sensible resistance to what appears
to be churn.  Some of the resistance is historical
(merging large changes used to be much more painful
pre-git); some is just simple resistance to change.

I started by submitting an addition of pr_err to kernel.h,
which Andrew Morton picked up for a while, but dropped.

I've got a local tree with those changes.

for example:

KERN_EMERG  -> pr_emerg  is  ~100KB
KERN_ALERT  -> pr_alert  is   ~80KB
KERN_CRIT   -> pr_crit   is  ~200KB
KERN_NOTICE -> pr_notice is  ~400KB
KERN_INFO   -> pr_info   is ~2500KB

I intended to strip the "\n" trailer from the messages.

Back to the scripts:

In this case, there are multiple files.

A script that finds all the files that contain a search string, 
and a perl script that effectively s/search/replace/g on those files.

sed didn't work as well as perl here because I wanted to
play with perl a bit and many printk(KERN_ foo)
function calls are split across multiple lines.

I've still got to show a real use for this change set.

I believe controlling the interleaving of log messages
by having multiple statement printks have a start and end,
choosing specific message levels in compiled code,
and choosing to print file/function/line per compiled
code block might do, but more utility to the changes
is probably necessary before it could be applied.

cheers, Joe


Re: [RFC][PATCH 2/6] lockdep: validate rcu_dereference() vs rcu_read_lock()

2007-09-20 Thread Paul E. McKenney

On Thu, Sep 20, 2007 at 01:31:35PM -0400, Dmitry Torokhov wrote:
> On 9/19/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Wed, 19 Sep 2007 17:29:09 -0400 "Dmitry Torokhov"
> > <[EMAIL PROTECTED]> wrote:
> >
> > > On 9/19/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > > On Wed, 19 Sep 2007 16:41:04 -0400 "Dmitry Torokhov"
> > > > <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > If the IRQ handler does rcu_read_lock(),unlock() and the 
> > > > > > i8042_stop()
> > > > > > function does sync_rcu() instead of _sched(), it should be good 
> > > > > > again.
> > > > > > It will not affect anything else than the task that calls _stop(). 
> > > > > > And
> > > > > > even there the only change is that the sleep might be a tad longer.
> > > > >
> > > > > And the IRQ handler needs to do some extra job... Anyway, it looks -rt
> > > > > breaks synchronize_sched() and needs to have it fixed:
> > > > >
> > > > > "/**
> > > > >  * synchronize_sched - block until all CPUs have exited any 
> > > > > non-preemptive
> > > > >  * kernel code sequences.
> > > > >  *
> > > > >  * This means that all preempt_disable code sequences, including NMI 
> > > > > and
> > > > >  * hardware-interrupt handlers, in progress on entry will have 
> > > > > completed
> > > > >  * before this primitive returns."
> > > >
> > > > That still does as it says in -rt. Its just that the interrupt handler
> > > > will be preemptible so the guarantees it gives are useless.
> > >
> > > Please note "... including NMI and hardware-interrupt handlers ..."
> >
> > -rt doesn't run interrupt handlers in hardware irq context anymore.
> 
> OK, then what is the purpose of synchronize_sched() in -rt?

To wait for all preempt-disable, irq-disable, hard-irq, and SMI/NMI code
sequences to complete.

> You really need to provide users with a replacement. There are several
> drivers that use it and for example r8169 is not what you'd call a
> 'low performer'.

I did look at making a synchronize_all_irq() some time back, and all
the approaches I came up with at the time were busted.

But I just took another look, and I think I see a way to handle it.
Either that, or I simply forgot the way in which this approach is
broken...

I will stare at it some more.

> I guess I can switch i8042 to use synchronize_irq(). That still works
> in -rt, doesn't it? That still leaves atkbd...

Yep, looks that way to me.  The only difference that I can see is that
in -rt, concurrent synchronize_irq() calls on the same descriptor mean
that the guy that gets there second has to wait for the next interrupt
to happen.

> > > > > > I find it curious that a driver that is 'low performant' and does 
> > > > > > not
> > > > > > suffer lock contention pioneers locking schemes. I agree with
> > > > > > optimizing, but this is not the place to push the envelope.
> > > > >
> > > > > Please realize that every microsecond wasted on a 'low performant'
> > > > > driver is taken from high performers and if we can help it why
> > > > > shouldn't we?
> > > >
> > > > sure, but the cache eviction caused by running the driver will have
> > > > more impact than the added rcu_read_{,un}lock() calls.
> > >
> > > Are you saying that adding rcu_read_{,un}lock() will help with cache
> > > eviction? How?
> >
> > No, I'm saying that its noise compared to the cache eviction overhead
> > it causes for others.
> 
> What about udelay(10)? It is probably also noise, but we should not
> go and sprinkle it through drivers, should we? ;)

Agreed!

On the other hand, udelay(10) is more than two orders of magnitude
slower than an rcu_read_lock() / rcu_read_unlock() round trip in -rt,
and a full three orders of magnitude slower in CONFIG_PREEMPT.
As for non-CONFIG_PREEMPT, well, "free is a very good price".  ;-)

Thanx, Paul

[PATCH][VER 4] mspec: handle shrinking virtual memory areas

2007-09-20 Thread Cliff Wickman


Stress testing revealed the need for (yet more) revision. Sorry.

This is a revision of Andrew's mspec-handle-shrinking-virtual-memory-areas.patch

Version 4: clear/release fetchop pages only when vma_data is no longer shared

The vma_data structure may be shared by vma's from multiple tasks, with
no way of knowing which areas are shared or not shared, so release/clear
pages only when the refcount (of vma's) goes to zero.

Diffed against 2.6.23-rc7

Signed-off-by: Cliff Wickman <[EMAIL PROTECTED]>
---
 drivers/char/mspec.c |   26 --
 1 file changed, 8 insertions(+), 18 deletions(-)

Index: linus.070920/drivers/char/mspec.c
===
--- linus.070920.orig/drivers/char/mspec.c
+++ linus.070920/drivers/char/mspec.c
@@ -155,23 +155,22 @@ mspec_open(struct vm_area_struct *vma)
  * mspec_close
  *
  * Called when unmapping a device mapping. Frees all mspec pages
- * belonging to the vma.
+ * belonging to all the vma's sharing this vma_data structure.
  */
 static void
 mspec_close(struct vm_area_struct *vma)
 {
struct vma_data *vdata;
-   int index, last_index, result;
+   int index, last_index;
unsigned long my_page;
 
vdata = vma->vm_private_data;
 
-   BUG_ON(vma->vm_start < vdata->vm_start || vma->vm_end > vdata->vm_end);
+   if (!atomic_dec_and_test(&vdata->refcnt))
+   return;
 
-   spin_lock(&vdata->lock);
-   index = (vma->vm_start - vdata->vm_start) >> PAGE_SHIFT;
-   last_index = (vma->vm_end - vdata->vm_start) >> PAGE_SHIFT;
-   for (; index < last_index; index++) {
+   last_index = (vdata->vm_end - vdata->vm_start) >> PAGE_SHIFT;
+   for (index=0; index < last_index; index++) {
if (vdata->maddr[index] == 0)
continue;
/*
@@ -180,20 +179,12 @@ mspec_close(struct vm_area_struct *vma)
 */
my_page = vdata->maddr[index];
vdata->maddr[index] = 0;
-   spin_unlock(&vdata->lock);
-   result = mspec_zero_block(my_page, PAGE_SIZE);
-   if (!result)
+   if (!mspec_zero_block(my_page, PAGE_SIZE))
uncached_free_page(my_page);
else
printk(KERN_WARNING "mspec_close(): "
-  "failed to zero page %i\n",
-  result);
-   spin_lock(&vdata->lock);
+  "failed to zero page %ld\n", my_page);
}
-   spin_unlock(&vdata->lock);
-
-   if (!atomic_dec_and_test(&vdata->refcnt))
-   return;
 
if (vdata->flags & VMD_VMALLOCED)
vfree(vdata);
@@ -201,7 +192,6 @@ mspec_close(struct vm_area_struct *vma)
kfree(vdata);
 }
 
-
 /*
  * mspec_nopfn
  *
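
The rule from the changelog above (release pages only when the count of
vma's sharing the structure drops to zero) can be sketched in userspace
with C11 atomics. The struct and function names are hypothetical, and
the page release is reduced to a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NPAGES 4               /* arbitrary size for the sketch */

/* Hypothetical stand-in for the driver's shared per-mapping data. */
struct vma_data_sketch {
    atomic_int refcnt;             /* number of vma's sharing this data */
    unsigned long maddr[NPAGES];   /* 0 means "no page allocated" */
};

static int pages_released;

/* Returns true only for the caller that dropped the last reference. */
static bool close_mapping(struct vma_data_sketch *vd)
{
    /* atomic_fetch_sub() returns the old value; 1 means we reached zero. */
    if (atomic_fetch_sub(&vd->refcnt, 1) != 1)
        return false;

    /* Last user: no lock is needed, nobody else can see the pages. */
    for (int i = 0; i < NPAGES; i++) {
        if (vd->maddr[i] == 0)
            continue;
        vd->maddr[i] = 0;
        pages_released++;
    }
    return true;
}
```

Because only the final close walks the page array, the walk can cover
the whole range without taking the spinlock, which matches the lock
removal in the diff.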

Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

2007-09-20 Thread Andrew Morton

On Wed, 19 Sep 2007 15:38:19 +0100 (BST)
"Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote:

>  Keep track of disable_irq_nosync() invocations and call enable_irq() the 
> right number of times if work has been cancelled that would include them.
> 
> Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
> ---
>  Now that the call to flush_work_keventd() (problematic because of 
> rtnl_mutex being held) has been replaced by cancel_work_sync() another 
> issue has arisen and been left unresolved.  As the MDIO bus cannot be 
> accessed from the interrupt context the PHY interrupt handler uses 
> disable_irq_nosync() to prevent from looping and schedules some work to be 
> done as a softirq, which, apart from handling the state change of the 
> originating PHY, is responsible for reenabling the interrupt.  Now if the 
> interrupt line is shared by another device and a call to the softirq 
> handler has been cancelled, that call to enable_irq() never happens and 
> the other device cannot use its interrupt anymore as its stuck disabled.
> 
>  I decided to use a counter rather than a flag because there may be more 
> than one call to phy_change() cancelled in the queue -- a real one and a 
> fake one triggered by free_irq() if DEBUG_SHIRQ is used, if nothing else.  
> Therefore because of its nesting property enable_irq() has to be called 
> the right number of times to match the number disable_irq_nosync() was 
> called and restore the original state.  This DEBUG_SHIRQ feature is also 
> the reason why free_irq() has to be called before cancel_work_sync().
> 
>  While at it I updated the comment about phy_stop_interrupts() being 
> called from `keventd' -- this is no longer relevant as the use of 
> cancel_work_sync() makes such an approach unnecessary.  OTOH a similar 
> comment referring to flush_scheduled_work() in phy_stop() still applies as 
> using cancel_work_sync() there would be dangerous.
> 
>  Checked with checkpatch.pl and at the run time (with and without 
> DEBUG_SHIRQ).

You always put boring, crappy, insufficient text in the for-the-changelog
section and interesting, useful, sufficient text in the not-for-the-changelog
section.

But you can't fool me!  I have an editor and I fix it up.
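
The counting argument in the quoted patch description can be illustrated
with a small userspace model. Everything here is hypothetical (the
*_sketch names, the depth counter standing in for the IRQ core's nesting
count); it only shows why a counter, rather than a flag, restores a
shared line when several queued work items are cancelled.

```c
#include <assert.h>

/* Model of a shared interrupt line with nested disable/enable. */
struct irq_sketch {
    int disable_depth;         /* 0 means the line is enabled */
};

static void disable_irq_nosync_sketch(struct irq_sketch *irq)
{
    irq->disable_depth++;
}

static void enable_irq_sketch(struct irq_sketch *irq)
{
    if (irq->disable_depth > 0)
        irq->disable_depth--;
}

static int irq_enabled(const struct irq_sketch *irq)
{
    return irq->disable_depth == 0;
}

struct phy_sketch {
    struct irq_sketch *irq;
    int irq_disable;           /* disables not yet balanced by this PHY */
};

/* Interrupt handler: disable the line and defer the real work. */
static void phy_interrupt_sketch(struct phy_sketch *phy)
{
    disable_irq_nosync_sketch(phy->irq);
    phy->irq_disable++;
}

/* Deferred work: handle the event, then re-enable the line once. */
static void phy_change_sketch(struct phy_sketch *phy)
{
    phy->irq_disable--;
    enable_irq_sketch(phy->irq);
}

/* Teardown: pending work was cancelled, so balance what is left. */
static void phy_stop_interrupts_sketch(struct phy_sketch *phy)
{
    while (phy->irq_disable > 0) {
        phy->irq_disable--;
        enable_irq_sketch(phy->irq);
    }
}
```

A boolean flag would re-enable the line only once here, leaving the
other device on the shared line stuck disabled after two cancelled
work items.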

Re: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-20 Thread Jordan Crouse

Chuck Ebbert wrote:

> On 09/20/2007 08:32 AM, Joerg Pommnitz wrote:
>> Hello all,
>> yesterday I tried to boot a kernel built from the current wireless-dev git
>> tree (ath5k branch)
>> on a MSEP800/A board (see http://www.milesie.co.uk/pdf/MSEP800.pdf). The
>> board
>> contains an AMD Geode LX800 CPU.
>> The wireless-dev tree is up to date with Linus kernel 2.6.23-rc6.
>> 
>> Attached is a photographic screen shot. The EIP value of c0378dd6 seems to
>> correspond with the
>> reserve_bootmem_core from System.map:
>> 
>> c0378d51 t free_bootmem_core
>> c0378da7 T free_bootmem
>> c0378db2 T free_bootmem_node
>> c0378dba t reserve_bootmem_core
>> c0378e14 T reserve_bootmem
>> c0378e1f T reserve_bootmem_node
>>

> Can you post disassembled code for that function?

It's hitting a bug - specifically (from bootmem.c:125):
BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
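
For reference, a small sketch of what that check computes, assuming
4 KiB pages (PAGE_SHIFT of 12, as on i386). The helper name and the
example values used with it are hypothetical; they are not taken from
the failing board.

```c
#include <assert.h>

#define PAGE_SHIFT 12                        /* 4 KiB pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)      /* address -> page frame number */

/* Nonzero if reserving at `addr` would trip the BUG_ON: the address
 * lies at or beyond the last low-memory page frame of the node. */
static int reserve_would_bug(unsigned long addr, unsigned long node_low_pfn)
{
    return PFN_DOWN(addr) >= node_low_pfn;
}
```

With 256 MiB of low memory (node_low_pfn of 0x10000), a reservation at
0x0ffff000 passes while one at 0x10000000 trips the check, which is the
kind of result a BIOS reporting a too-small memory map could produce.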

I hit this problem on a db800 last week.  It went away with a newer
version of the BIOS, which doesn't help Joerg any, since it's a different
board (though I think it is the same BIOS vendor).  Other BIOSes work
just fine with the same kernel image (including known troublemakers like
LinuxBIOS).  I believe that 2.6.22 was good, so some change must
have come along in 2.6.23-pre to cause the pain.  Or, it may have exposed
old breakage in the BIOS that was later repaired.

I'll do the math to figure out what's happening - and I'll check the release
notes to see what changed in the BIOS between the failing and working
versions.  If anybody familiar with arch/i386 can think of something
new in the kernel that may have precipitated this, do let me know. :)

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


Re: [PATCH -mm] Don't truncate /proc/PID/environ at 4096 characters

2007-09-20 Thread Andrew Morton

On Wed, 19 Sep 2007 14:35:29 +0100
"James Pearson" <[EMAIL PROTECTED]> wrote:

> 
> From: James Pearson <[EMAIL PROTECTED]>
> 
> /proc/PID/environ currently truncates at 4096 characters, patch based on 
> the /proc/PID/mem code.

patch needs to be carefully reviewed from the security POV (ie: permissions)
as well as for correctness.  Does anyone have time to do that?

> Signed-off-by: James Pearson <[EMAIL PROTECTED]>
> 
> --- ./fs/proc/base.c.dist 2007-09-19 12:29:46.244929651 +0100
> +++ ./fs/proc/base.c  2007-09-19 12:36:18.155648760 +0100
> @@ -202,27 +202,6 @@ static int proc_root_link(struct inode *
>(task->state == TASK_STOPPED || task->state == TASK_TRACED) && \
>security_ptrace(current,task) == 0))
>  
> -static int proc_pid_environ(struct task_struct *task, char * buffer)
> -{
> - int res = 0;
> - struct mm_struct *mm = get_task_mm(task);
> - if (mm) {
> - unsigned int len;
> -
> - res = -ESRCH;
> - if (!ptrace_may_attach(task))
> - goto out;
> -
> - len  = mm->env_end - mm->env_start;
> - if (len > PAGE_SIZE)
> - len = PAGE_SIZE;
> - res = access_process_vm(task, mm->env_start, buffer, len, 0);
> -out:
> - mmput(mm);
> - }
> - return res;
> -}
> -
>  static int proc_pid_cmdline(struct task_struct *task, char * buffer)
>  {
>   int res = 0;
> @@ -740,6 +719,79 @@ static const struct file_operations proc
>   .open   = mem_open,
>  };
>  
> +static ssize_t environ_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
> + char *page;
> + unsigned long src = *ppos;
> + int ret = -ESRCH;
> + struct mm_struct *mm;
> + size_t max_len;
> +
> + if (!task)
> + goto out_no_task;
> +
> + if (!ptrace_may_attach(task))
> + goto out;
> +
> + ret = -ENOMEM;
> + page = (char *)__get_free_page(GFP_TEMPORARY);

Now I wonder what inspired you to reach for GFP_TEMPORARY?  Perhaps the
fact that it is crappily named and undocumented.

This should be GFP_KERNEL - the page you're allocating here is not
reclaimable by the VM.

> + if (!page)
> + goto out;
> +
> + ret = 0;
> +
> + mm = get_task_mm(task);
> + if (!mm)
> + goto out_free;
> +
> + max_len = (count > PAGE_SIZE) ? PAGE_SIZE : count;
> +
> + while (count > 0) {
> + int this_len, retval;
> +
> + this_len = mm->env_end - (mm->env_start + src);
> +
> + if (this_len <= 0)
> + break;
> +
> + if (this_len > max_len)
> + this_len = max_len;
> +
> + retval = access_process_vm(task, (mm->env_start + src),
> + page, this_len, 0);
> +
> + if (retval <= 0) {
> + ret = retval;
> + break;
> + }
> +
> + if (copy_to_user(buf, page, retval)) {
> + ret = -EFAULT;
> + break;
> + }
> +
> + ret += retval;
> + src += retval;
> + buf += retval;
> + count -= retval;
> + }

Now that's a funky loop.  Someone please convince me that there is no way
in which `count - retval' can ever go negative (ie: huge positive).
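
For what it's worth, a user-space sketch of just the arithmetic (not the
kernel code; next_chunk() and simulate_read() are made-up names) suggests the
worry is justified as posted: max_len is computed once from the initial count,
so a later pass can apparently be asked for more than the count that remains,
and the size_t subtraction would then wrap.  Clamping every pass to the
remaining count avoids that:

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes one pass of the loop may copy.
 * The length is clamped to the remaining environment, to one page,
 * and to the caller's remaining count. */
static size_t next_chunk(size_t env_left, size_t count, size_t page_size)
{
	size_t len = env_left;

	if (len > page_size)
		len = page_size;
	if (len > count)
		len = count;
	return len;
}

/* Models the read loop with plain integers: env_len is the size of the
 * environment area, src the starting offset, count the user buffer size.
 * Returns the total number of bytes that would be copied. */
static size_t simulate_read(size_t env_len, size_t src, size_t count,
			    size_t page_size)
{
	size_t total = 0;

	while (count > 0 && src < env_len) {
		size_t len = next_chunk(env_len - src, count, page_size);

		total += len;
		src += len;
		count -= len;	/* len <= count, so this cannot wrap */
	}
	return total;
}
```

With a 10000-byte environment, a 5000-byte buffer and 4096-byte pages, the
second pass is limited to the 904 bytes still wanted rather than a full page.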


> + *ppos = src;
> +
> + mmput(mm);
> +out_free:
> + free_page((unsigned long) page);
> +out:
> + put_task_struct(task);
> +out_no_task:
> + return ret;
> +}
> +
> +static const struct file_operations proc_environ_operations = {
> + .read   = environ_read,
> +};
> +
>  static ssize_t oom_adjust_read(struct file *file, char __user *buf,
>   size_t count, loff_t *ppos)
>  {
> @@ -2092,7 +2144,7 @@ static const struct pid_entry tgid_base_
>   DIR("task",   S_IRUGO|S_IXUGO, task),
>   DIR("fd", S_IRUSR|S_IXUSR, fd),
>   DIR("fdinfo", S_IRUSR|S_IXUSR, fdinfo),
> - INF("environ",S_IRUSR, pid_environ),
> + REG("environ",S_IRUSR, environ),
>   INF("auxv",   S_IRUSR, pid_auxv),
>   INF("status", S_IRUGO, pid_status),
>   INF("limits", S_IRUSR, pid_limits),
> @@ -2421,7 +2473,7 @@ out_no_task:
>  static const struct pid_entry tid_base_stuff[] = {
>   DIR("fd",S_IRUSR|S_IXUSR, fd),
>   DIR("fdinfo",S_IRUSR|S_IXUSR, fdinfo),
> - INF("environ",   S_IRUSR, pid_environ),
> + REG("environ",   S_IRUSR, environ),
>   INF("auxv",  S_IRUSR, pid_auxv),
>   INF("status",S_IRUGO, pid_status),
>   INF("limits",S_IRUSR, pid_limits),

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Andi Kleen

On Thu, Sep 20, 2007 at 06:31:14PM -0500, Matt Mackall wrote:
> On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > > It's broken for me.
> > > 
> > > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> > >   -rc4-mm1: solid lock on X shutdown, random solid locks about
> > > once every four hours
> > >   -rc6-mm1: solid lock on X startup
> > >+your patch: screen goes black, turns off and on a few times during
> > > startup, can reboot with sysrq-b
> > 
> > Does it work with my simple dumb patch instead of Dave's ? 
> 
> Sorry, forgot to mention: your one-liner flush also doesn't work (same
> behavior).
> 
> I suspect I'm tripping two things and the flushing thing fixes one but
> not the other.

Full bisect needed then, I guess. OK, as a shortcut you could perhaps
drop the cpa-* patches first (you might need to drop some later dependent
patches too), then the drm and agp trees.

-Andi

Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

2007-09-20 Thread Len Brown

On Thursday 20 September 2007 17:55, Linus Torvalds wrote:
> 
> On Thu, 20 Sep 2007, Linus Torvalds wrote:
> > 
> > (Btw, the above commit message points to just my response with a testing 
> > patch to the real email: the actual explanation of the INSANE ordering is 
> > from Len Brown in
> > 
> > 
> > https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
> > 
> > and there Len claims that we *must* wake up CPU's early).
> 
> ..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in 
> turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651 
> 
> However, it seems that bugzilla entry may just be bogus. It talks about 
> "it appears that some firmware in the future may depend on that sequence 
> for correction operation"
> 
> Len, Shaohua, what are the real issues here? 

Intel's reference BIOS for Core Duo performs some re-initialization
in _WAK that will get blown away if INIT follows _WAK.
IIRC, it is related to re-initializing the thermal sensors.
I opened bug 5651 when the BIOS team informed me of this issue.

Yes, bringing a processor offline and then online again w/o
an intervening suspend or reset would not evaluate _WAK,
and thus may still run into the issue.

I don't know if this is a widespread issue and a commonly
used BIOS hook, or if it is specific to certain processors.

-Len

> It would indeed be nice if we could just take CPU's down early (while 
> everything is working), and run the whole suspend code with just one CPU, 
> rather than having to worry about the ordering between CPU and device 
> takedown.
> 
> That said, at least with STR, the situation is:
> 
>  1) suspend_console
>  2)   device_suspend(PMSG_SUSPEND)  (==   ->suspend)
>  3) disable_nonboot_cpus()
>  4)   device_power_down(PMSG_SUSPEND) (==   ->suspend_late)
>  5) pm_ops->enter()
>  6)   device_power_up() (==   ->resume_early)
>  7) enable_nonboot_cpus()
>  8) pm_finish()
>  9)   device_resume()   (==   ->resume)
> 10) resume_console
> 
> So if we agree that things like timers etc should *never* be suspended by 
> the early suspend, and *always* use "suspend_late/resume_early", then at 
> least STR should be ok.
> 
> And I think that's a damn reasonable thing to agree on: timers (and 
> anything else that CPU shutdown/bringup could *possibly* care about) 
> should be considered core enough that they had better be on the 
> suspend_late/resume_early list.
> 
> Thomas, Rafael, can you verify that at least STR is ok in this respect?
> 
>   Linus

Re: [PATCH 4/11] eCryptfs: Replace encrypt, decrypt, and inode size write

2007-09-20 Thread Erez Zadok

In message <[EMAIL PROTECTED]>, Michael Halcrow writes:
> On Wed, Sep 19, 2007 at 10:46:26PM -0700, Andrew Morton wrote:
> (from ecryptfs_encrypt_page()):
> > > + enc_extent_virt = kmalloc(PAGE_CACHE_SIZE, GFP_USER);
> > 
> > I'd have thought that alloc_page() would be nicer.  After all, we _are_
> > > treating it as a page, and not as a random piece of memory.
> >
> > > + if (!enc_extent_virt) {
> > > + rc = -ENOMEM;
> > > + ecryptfs_printk(KERN_ERR, "Error allocating memory for "
> > > + "encrypted extent\n");
> > > + goto out;
> > > + }
> > > + enc_extent_page = virt_to_page(enc_extent_virt);
> > 
> > And then we don't need this.
> 
> If neither kmap() nor kmap_atomic() can be safely used to get a
> virtual address to pass to vfs_write(), then I do not know what my
> other options are here.

kmap_atomic is intended for short-lived code sections where you may not
sleep, so you can't do a vfs_read/write between kmap_atomic and kunmap_atomic.

Erez.

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Matt Mackall

On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > It's broken for me.
> > 
> > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> >   -rc4-mm1: solid lock on X shutdown, random solid locks about
> > once every four hours
> >   -rc6-mm1: solid lock on X startup
> >+your patch: screen goes black, turns off and on a few times during
> > startup, can reboot with sysrq-b
> 
> Does it work with my simple dumb patch instead of Dave's ? 

Sorry, forgot to mention: your one-liner flush also doesn't work (same
behavior).

I suspect I'm tripping two things and the flushing thing fixes one but
not the other.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Rob Landley

On Thursday 20 September 2007 5:14:25 pm Joe Perches wrote:
> On Thu, 2007-09-20 at 14:58 -0700, Tim Bird wrote:
> > Given that there are about 60,000 printks in the kernel (and that's
> > not counting wrappers like dprintk() and other locally-defined
> > functions and macros) it would be a huge task to examine the code
> > and differentiate strings that really start a new log message
> > (and thus should have an attached log level) and strings
> > that don't.
>
> I've converted most all of that treewide.
>
> printk(KERN_<level> to pr_<level>(
>
> It's pretty automated.

Perl, being a write-only language, does not help my poor little brain 
understand what's going on.  You convert printk(KERN_INFO, blah) to 
pr_INFO(blah)?  I'm not finding pr_INFO with a grep on the files in 
2.6.23-rc7.  Is this something you added?

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.

Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-20 Thread Andrew Morton

On Tue, 18 Sep 2007 17:25:31 -0700
Avantika Mathur <[EMAIL PROTECTED]> wrote:

> In pass1 of e2fsck, every inode table in the filesystem is scanned and
> checked, regardless of whether it is in use.  This is the most time
> consuming part of the filesystem check.  The uninitialized block group
> feature can greatly reduce e2fsck time by eliminating checking of
> uninitialized inodes.
> 
> With this feature, there is a high water mark of used inodes for each block 
> group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
> group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
> of each group descriptor is used to ensure that corruption in the group
> descriptor's bit flags does not cause incorrect operation.

This needed a few fixups due to conflicts with
ext2-ext3-ext4-add-block-bitmap-validation.patch but they were pretty
straightforward.  Please check that the result is OK.



Re: [patch 00/11] Text Edit Lock for 2.6.23-rc6-mm1

2007-09-20 Thread Andrew Morton

On Tue, 18 Sep 2007 17:06:01 -0400
Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:

> Here are the text edit lock patches ported to 2.6.23-rc6-mm1.

I think I'll duck these one more time.  There was a bit of followup
and for now I'd prefer to concentrate on obviously-safe stuff and
stabilisation of the current 2.6.24 queue.

Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Indan Zupancic

On Fri, September 21, 2007 01:18, Rob Landley wrote:
> On Thursday 20 September 2007 4:26:13 pm Indan Zupancic wrote:
>> A quick scroll through a vmlinux binary shows that there are quite a
>> lot areas consisting only of some repeated pattern. Mostly 0x00, but
>> also 0x90 and ".GCC: (GNU) 4.2.1.". Getting rid of those would save
>> something between 50 and 100KB.
>
> Worse, if you feed an absolute path to O= when you build the kernel out of
> tree, then it uses absolute paths for all the __FILE__ strings and that makes
> kernel BIG.  (Did that by accident a while back.)  Too bad there's no way
> to keep the __FILE__ strings compressed at runtime and gunzip them as needed
> like busybox does with help messages... :)

I suspect that can be fixed by changing the build system. How can using O=
change the source file path anyway? That seems unnecessary.

It seems to be worse, full pathnames are also used when giving a relative path.
(I'm using O=../obj/).

On the other hand, it doesn't seem to cause that much bloat here:

$ strings vmlinux | grep /home/ | wc
    119     181    6400

CC'ing Sam Ravnborg, perhaps he has some ideas.

Greetings,

Indan



Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Andi Kleen

> It's broken for me.
> 
> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>   -rc4-mm1: solid lock on X shutdown, random solid locks about
> once every four hours
>   -rc6-mm1: solid lock on X startup
>+your patch: screen goes black, turns off and on a few times during
> startup, can reboot with sysrq-b

Does it work with my simple dumb patch instead of Dave's ? 

-Andi

Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Nagendra Tomar

--- Davide Libenzi <[EMAIL PROTECTED]> wrote:

> That's not what POLLOUT means in the Unix meaning. POLLOUT indicates the 
> ability to write, and it is not meant as to signal every time a packet 
> (skb) is sent on the wire (and the buffer released).

Aren't they both the same? Every time an incoming ACK frees up a buffer
from the retransmit queue, the writability condition is freshly asserted,
much the same way as the readability condition is asserted every time
new data is queued in the socket receive queue (irrespective of
whether there was data already waiting to be read in the receive queue).

This difference in the meaning of POLLOUT only arises in the ET case, which
was not what traditional Unix poll referred to.

Since it's a new game, the rules can be modified (of course based on the
merits, i.e. usability).

Thanx,
Tomar


Re: Processes spinning forever, apparently in lock_timer_base()?

2007-09-20 Thread Chuck Ebbert

On 09/20/2007 06:36 PM, Andrew Morton wrote:
> 
> So the question is, why do we have large amounts of dirty pages for one
> disk which appear to be sitting there not getting written?
> 
> Do we know if there's any writeout at all happening when the system is in
> this state?
> 
> I guess it's possible that the dirty inodes on the "other" disk got
> themselves onto the wrong per-sb inode list, or are on the correct list,
> but in the wrong place.  If so, these:
> 
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
> writeback-fix-comment-use-helper-function.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
> writeback-fix-periodic-superblock-dirty-inode-flushing.patch
> 
> from 2.6.23-rc6-mm1 should help.

Yikes! Simple fixes would be better.

A patch that is confirmed to fix the problem for this user is below, but
that one could cause other problems. I was looking for some band-aid that
could be shown to be harmless...

http://lkml.org/lkml/2007/8/2/89:

--
--- linux-2.6.22.1/mm/page-writeback.c.orig 2007-07-30 16:36:09.0 +0100
+++ linux-2.6.22.1/mm/page-writeback.c  2007-07-31 16:26:43.0 +0100
@@ -250,6 +250,8 @@ static void balance_dirty_pages(struct a
 			pages_written += write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
+			if (!wbc.encountered_congestion && wbc.nr_to_write > 0)
+				break;		/* didn't find enough to do */
 		}
 		congestion_wait(WRITE, HZ/10);
 	}
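
Read as plain logic, the patched loop gives up once a writeback pass neither
used up its quota nor ran into congestion.  A stand-alone sketch of just that
decision follows; struct wb_pass and should_leave_loop() are illustrative
names, not kernel API, and the two fields merely mirror wbc.nr_to_write and
wbc.encountered_congestion from the hunk:

```c
#include <stdbool.h>

/* What one writeback pass reports back (illustrative, not a kernel type). */
struct wb_pass {
	long nr_to_write;		/* quota left unused after the pass */
	bool encountered_congestion;	/* did the pass hit a congested queue? */
};

/* Returns true when the caller should stop looping.  pages_written is
 * accumulated across passes, as in the hunk. */
static bool should_leave_loop(long *pages_written, long write_chunk,
			      const struct wb_pass *wbc)
{
	*pages_written += write_chunk - wbc->nr_to_write;

	if (*pages_written >= write_chunk)
		return true;	/* wrote the full chunk */
	if (!wbc->encountered_congestion && wbc->nr_to_write > 0)
		return true;	/* nothing more to find, and no congestion */
	return false;		/* congested: wait and retry */
}
```

The second test is the band-aid: with no dirty pages on this backing device
and no congestion, the loop ends instead of polling forever.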

 
> 
> Did anyone try running /bin/sync when the system is in this state?
> 

Reporter is in the CC: list.

Re: Processes spinning forever, apparently in lock_timer_base()?

2007-09-20 Thread Andrew Morton

On Thu, 20 Sep 2007 18:04:38 -0400
Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> > 
> >> Can we get some kind of band-aid, like making the endless 'for' loop in
> >> balance_dirty_pages() terminate after some number of iterations? Clearly
> >> if we haven't written "write_chunk" pages after a few tries, *and* we
> >> haven't encountered congestion, there's no point in trying forever...
> > 
> > Did my above questions get looked at?
> > 
> > Is anyone able to reproduce this?
> > 
> > Do we have a clue what's happening?
> 
> There are a ton of dirty pages for one disk, and zero or close to zero dirty
> for a different one. Kernel spins forever trying to write some arbitrary
> minimum amount of data ("write_chunk" pages) to the second disk...

That should be OK.  The caller will sit in that loop, sleeping in
congestion_wait(), polling the correct backing-dev occasionally and waiting
until the dirty limits subside to an acceptable limit, at which stage this:

	if (nr_reclaimable +
			global_page_state(NR_WRITEBACK)
				<= dirty_thresh)
		break;

will happen and we leave balance_dirty_pages().

That's all a bit crappy if the wrong races happen and some other task is
somehow exceeding the dirty limits each time this task polls them.  Seems
unlikely that such a condition would persist forever.

So the question is, why do we have large amounts of dirty pages for one
disk which appear to be sitting there not getting written?

Do we know if there's any writeout at all happening when the system is in
this state?

I guess it's possible that the dirty inodes on the "other" disk got
themselves onto the wrong per-sb inode list, or are on the correct list,
but in the wrong place.  If so, these:

writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
writeback-fix-comment-use-helper-function.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
writeback-fix-periodic-superblock-dirty-inode-flushing.patch

from 2.6.23-rc6-mm1 should help.

Did anyone try running /bin/sync when the system is in this state?

Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Davide Libenzi

On Thu, 20 Sep 2007, Nagendra Tomar wrote:

> 
> --- Davide Libenzi <[EMAIL PROTECTED]> wrote:
> 
> > Looking back at it, I think the current TCP code is right, once you look 
> > at the "event" to be a output buffer full->with_space transition.
> > If you drop an fd inside epoll with EPOLLOUT|EPOLLET and you get an event 
> > (free space on the output buffer), if you do not consume it (say a 
> > tcp_sendmsg that re-fill the buffer), you can't see other OUT event 
> > anymore since they happen on the full->with_space transition.
> > > Yes, I know, the read side (EPOLLIN) works differently and you get an 
> > event for every packet you receive. And yes, I do not like asymmetric 
> > things. But that does not make the EPOLLOUT|EPOLLET wrong IMO.
> > 
> 
> I agree that ET means the event should happen at the transition
> from nospace->space condition, but isn't the other case (event is 
> delivered every time the event actually happens) more usable.
> Also the epoll man page says so
> 
> "... Edge Triggered event distribution delivers events only when 
> events happens on the  monitored file."
> 
> This serves the purpose of ET (reducing the number of poll events) and
> at the same time makes userspace coding easier. My userspace code
> has the liberty of deciding when it can write to the socket. f.e. the
> sendfile buffer management example that I quoted in my earlier post
> will be difficult with the current ET|POLLOUT behaviour. I cannot 
> write in full-buffer units. I'll have to write partial buffers just to 
> fill the TCP writeq which is needed to trigger the event.

That's not what POLLOUT means in the Unix meaning. POLLOUT indicates the 
ability to write, and it is not meant as to signal every time a packet 
(skb) is sent on the wire (and the buffer released).
In your particular application, you could simply split the sendfile into 
appropriately sized chunks, and handle the buffer release in there.
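
As a rough illustration of the full->with_space model (a sketch only, not
taken from any real application; fill_until_eagain() and
make_nonblocking_pair() are hypothetical helpers, and a Unix socketpair stands
in for the TCP socket), a writer relying on EPOLLOUT|EPOLLET would keep
writing until EAGAIN, since only that leaves the buffer in the "full" state
from which the next event can fire:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a connected socket pair with both ends set non-blocking.
 * Returns 0 on success, -1 on failure. */
static int make_nonblocking_pair(int sv[2])
{
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	for (int i = 0; i < 2; i++) {
		int flags = fcntl(sv[i], F_GETFL, 0);

		if (flags < 0 || fcntl(sv[i], F_SETFL, flags | O_NONBLOCK) < 0)
			return -1;
	}
	return 0;
}

/* Write zero-filled chunks until the kernel reports EAGAIN, i.e. until
 * the output buffer is full.  Returns the number of bytes queued, or -1
 * on any other error. */
static long fill_until_eagain(int fd)
{
	char buf[4096] = { 0 };
	long total = 0;

	for (;;) {
		ssize_t n = write(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;	/* buffer full: safe to wait for EPOLLOUT */
		return -1;
	}
}
```

A writer that stops before EAGAIN has not consumed the event, which is the
case Nagendra describes.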



- Davide



Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Nagendra Tomar


--- Davide Libenzi <[EMAIL PROTECTED]> wrote:

> 
> Unfortunately f_op->poll() does not let the caller specify the events
> it's interested in, that would allow to split send/receive wait queues and
> better detect read/write cases.
> The detection of a waitqueue_active(->sk_wr_sleep) would work fine in
> detecting if someone is actually waiting for a write, w/out the false
> positives triggered by the read-waiters.
> That would be a very sane thing to do, but would require a big change 
> to all the ->poll around (that could be automated by a script - devices 
> not caring about the events hint can just continue to use the single queue 
> like they currently do), and a more critical and gradual change of all the 
> devices that wants to take advantage of it.
> That way, no more magic bits are needed, and a simple waitqueue_active() 
> would tell you if someone is waiting for write-space events.
> 

I like this.






Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Dave Airlie

> > But now I'm talking about another issue -- a regression since rc4-mm1,
> > where the X server is unable to bind agp memory (those x logs above).
> > The clflush issue was recently solved by Andi in
> > http://lkml.org/lkml/2007/9/19/334
>
> Tried that, my laptop still bricks the instant X starts up and the NVidia
> driver tries to initialize.  Not even sysrq-foo works. Time to power-cycle.
>

I'd expect the binary to be doing something stupid with its flushing
and relying on the kernel to do something it no longer does.. so this
is most likely a case of not fixable..

Dave.

Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

2007-09-20 Thread Rafael J. Wysocki

On Friday, 21 September 2007 00:05, Thomas Gleixner wrote:
> Linus,
> 
> On Thu, 2007-09-20 at 14:55 -0700, Linus Torvalds wrote:
> > And I think that's a damn reasonable thing to agree on: timers (and 
> > anything else that CPU shutdown/bringup could *possibly* care about) 
> > should be considered core enough that they had better be on the 
> > suspend_late/resume_early list.
> > 
> > Thomas, Rafael, can you verify that at least STR is ok in this respect?
> 
> -ETOOTIRED led me to a wrong conclusion, but still it is a valuable
> hint that this change is making things work again.

Yes, it is.

> I need to go down into the details of the swsusp_suspend() code path to
> figure out, what's the root cause. 

If you need any help from me with that, please let me know.

Greetings,
Rafael

Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

2007-09-20 Thread Rafael J. Wysocki

Thomas,

On Thursday, 20 September 2007 23:53, Thomas Gleixner wrote:
> Rafael,
> 
> On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote:
> > > We disable everything in device_suspend()
> > 
> > No, we don't.  sysdevs are _not_ suspended in device_suspend().
> > They are suspended in device_power_down(), which is called
> > _after_ disable_nonboot_cpus() (from swsusp_suspend()).
> > 
> > > including timekeeping,
> > 
> > > No, the timekeeping is suspended in device_power_down() (or at least it
> > > should be).
> 
> Damn, you are right. Reading through 30 different logs confused me.
> 
> > >   enable_nonboot_cpus();
> > 
> > Actually, we can't do this here, because of ACPI and some interrupt handling
> > related problems.  Unfortunately, platform_finish() needs to go _after_
> > > enable_nonboot_cpus() and device_resume() needs to go after
> > > platform_finish().
> > Analogously, disable_nonboot_cpus() has to go after platform_prepare().
> >
> > Otherwise, some systems will break.
> 
> Well, I don't buy this one. The system would break in the same way, when
> I take CPU#1 offline before I initiate the suspend.

I was referring to the resume part.  If we call enable_nonboot_cpus(), which
executes the _INI ACPI control method, after platform_finish(), which executes
the _WAK global ACPI control method, things will break.  That already happened
in the past, when the code ordering was different, AFAICS.

> > > and non-surprisingly the "my VAIO needs help from keyboard" problem went
> > > away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not
> > > work at all on my VAIO due to some yet not identified wreckage)
> > 
> > > Hm, I really don't know why it helps, but that's not because of the
> > > timekeeping suspend, IMO.
> 
> It is related. We rely on some subtle thing which is not up when we
> resume the non boot cpu.

Yes, it looks so.

> > > I did not yet look into the suspend to ram code, but I guess that there
> > > is an equivalent problem.
> > 
> > Yes, the code ordering is the same, but it's not totally wrong, IMHO.
> > 
> > > > But I have no idea why this affects Andrew's jinxed VAIO (UP machine),
> > > though I suspect that we have more timekeeping/timer depending code
> > > somewhere waiting to bite us.
> > 
> > That's possible.
> > 
> > > Also I still need to debug why the HIBERNATION_TEST code path (which has
> > > a msleep(5000) in it) does not fail,
> > 
> > See above. :-)
> 
> Yes. It makes sense. When I change the TEST code path to:
> 
> - printk("swsusp debug: Waiting for 5 seconds.\n");
> - msleep(5000);
> + printk("swsusp debug: before swsusp_suspend\n");
> + error = swsusp_suspend();
> 
> then I have the same effect as I get from real hibernation. And we
> actually shut down time keeping somewhere in that code path.
> 
> ACPI: PCI interrupt for device :00:1b.0 disabled
> swsusp debug: before swsusp_suspend
> Suspend timekeeping

Exactly.  timekeeping_suspend() is called from device_power_down(), which is
called from swsusp_suspend() (after disabling interrupts).

> swsusp: critical section: 
> swsusp: Need to copy 112429 pages
> swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
> swsusp: critical section: done (112429 pages copied)
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> Resume timekeeping
> ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> -> works fine
> 
> This is with my patch applied. Without that I get:
> 
> CPU1 is down
> swsusp debug: before swsusp_suspend
> Suspend timekeeping
> swsusp: critical section: 
> swsusp: Need to copy 112429 pages
> swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
> swsusp: critical section: done (112429 pages copied)
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> Resume timekeeping
> Enabling non-boot CPUs
> --> Waits for ever until a key is pressed

Well, perhaps there's something else that we should suspend late and resume
early, but we don't?

Greetings,
Rafael

Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Rob Landley

On Thursday 20 September 2007 4:26:13 pm Indan Zupancic wrote:
> On Thu, September 20, 2007 22:38, Rob Landley wrote:
> > I've been playing with an idea for a while to improve the printk()
> > situation, but it's a more intrusive change than I've had time to bang
> > on.
> >
> > Right now, the first argument to printk() is a loglevel, but it's handled
> > via string concatenation.  I'd like to change that to be an integer, and
> > make it an actual comma-separated first argument.  (Mandatory, not
> > optional.)
> >
> > So instead of:
> >   printk(KERN_NOTICE "Fruit=%d\n", banana);
> > It would now be:
> >   printk(KERN_NOTICE, "Fruit=%d\n", banana);
> >
> > Change the header from:
> >   #define KERN_NOTICE "<5>"
> > to:
> >   #define KERN_NOTICE 5
>
> You have to jump through less hoops if you do:
>
> #define KERN_NOTICE 5,

Less change to the source, but the result is less obvious about what it's 
doing.  I'd personally rather have the churn than wind up with magic 
syntax...
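
To make the difference concrete, here is a small user-space sketch (not kernel
code; parse_loglevel() is a made-up helper, and DEFAULT_LEVEL is an assumed
value for messages without a prefix).  Today the level rides along inside the
format string thanks to literal concatenation, so it has to be parsed back out
at run time:

```c
#include <stddef.h>

#define KERN_NOTICE	"<5>"	/* current style: a string prefix */
#define DEFAULT_LEVEL	4	/* assumption: level used when no prefix is given */

/* Parse a leading "<N>" marker.  Returns the level and points *body at
 * the text after the marker; without a marker, returns DEFAULT_LEVEL and
 * leaves *body at the start of the message. */
static int parse_loglevel(const char *msg, const char **body)
{
	if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
		*body = msg + 3;
		return msg[1] - '0';
	}
	*body = msg;
	return DEFAULT_LEVEL;
}
```

With an integer first argument none of this parsing would be needed, and a
call whose level is compiled out could be dropped along with its string.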

> But the problem remains that there are printk's which don't have
> a KERN_* as the first argument. Those are also impossible to get
> rid off in this way, as the loglevel is unknown (and you don't want
> partially printed messages).
>
> So adding the comma is really needed and in addition all printk's
> without a loglevel should get one. Which clutters the code and may
> increase codesize.

It's ok to _explicitly_ not have a loglevel, and thus take a known default.  
The problem is printing out less than a full line, continuing it later, and 
not making obvious at compile time what the level of this chunk is.

> A quick scroll through a vmlinux binary shows that there are quite a
> lot areas consisting only of some repeated pattern. Mostly 0x00, but
> also 0x90 and ".GCC: (GNU) 4.2.1.". Getting rid of those would save
> something between 50 and 100KB.

Worse, if you feed an absolute path to O= when you build the kernel out of 
tree, then it uses absolute paths for all the __FILE__ strings and that makes 
the kernel BIG.  (Did that by accident a while back.)  Too bad there's no way 
to keep the __FILE__ strings compressed at runtime and gunzip them as needed 
like busybox does with help messages... :)

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.

Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Joe Perches

On Thu, 2007-09-20 at 14:58 -0700, Tim Bird wrote:
> Given that there are about 60,000 printks in the kernel (and that's
> not counting wrappers like dprintk() and other locally-defined
> functions and macros) it would be a huge task to examine the code
> and differentiate strings that really start a new log message
> (and thus should have an attached log level) and strings
> that don't.

I've converted most all of that treewide.

printk(KERN_ to pr_(

It's pretty automated.

$ cat pr_alert.sh
#!/bin/sh
egrep -r -w --include=*.[ch] -l "printk[[:space:]]*\([[:space:]]*KERN_ALERT" * | \
 xargs perl ../cvt_pr.pl KERN_ALERT pr_alert

$ cat cvt_pr.pl
# $#ARGV is the index of the last argument, so 2 means "one file given".
if ($#ARGV < 2) {
    print "usage: KERN_ pr_ files...\n";
    exit;
}

# Arguments 0 and 1 are the search and replacement tokens; the rest are files.
for ($i = 2; $i <= $#ARGV; $i++) {
    PrintkSearchReplace($ARGV[$i], $ARGV[0], $ARGV[1]);
}

sub PrintkSearchReplace {
    my ($file, $search, $replace) = @_;

    my $content = "";
    local( $/ );    # slurp mode: read the whole file at once
    open( my $fh, $file ) or die "File not found '$file'\n";
    $content = <$fh>;
    close($fh);
    my $orig = $content;

    # Rewrite printk(KERN_FOO "text\n" as pr_foo("text", dropping the trailing \n.
    $content =~ s/\bprintk[[:space:]]*\([[:space:]]*${search}[[:space:]]*([^\"]*)\"([^\\]*)\\n\"/${replace}\($1 \"$2\"/mgs;
    $content =~ s/\b${replace}\( /${replace}\(/mgs;

    # Only rewrite files that actually changed.
    if ($orig ne $content) {
        open(my $fh, ">${file}") or die "Could not open '$file'\n";
        print $fh $content;
        close($fh);
    }
}



RE: [Celinux-dev] Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Gross, Mark



>-Original Message-
>From: [EMAIL PROTECTED] [mailto:celinux-dev-
>[EMAIL PROTECTED] On Behalf Of Rob Landley
>Sent: Thursday, September 20, 2007 3:02 PM
>To: Alexey Dobriyan
>Cc: Michael Opdenacker; [EMAIL PROTECTED]; CE Linux Developers List;
>linux kernel
>Subject: [Celinux-dev] Re: [Announce] Linux-tiny project revival
>
>On Thursday 20 September 2007 2:58:44 pm Alexey Dobriyan wrote:
>> On Thu, Sep 20, 2007 at 03:38:42PM -0500, Rob Landley wrote:
>> > I've been playing with an idea for a while to improve the printk()
>> > situation, but it's a more intrusive change than I've had time to
>> > bang on.
>> >
>> > Right now, the first argument to printk() is a loglevel, but it's
>> > handled via string concatenation.  I'd like to change that to be an
>> > integer, and make it an actual comma-separated first argument.
>> > (Mandatory, not optional.)
>> >
>> > So instead of:
>> >   printk(KERN_NOTICE "Fruit=%d\n", banana);
>> > It would now be:
>> >   printk(KERN_NOTICE, "Fruit=%d\n", banana);
>> >
>> > Change the header from:
>> >   #define KERN_NOTICE "<5>"
>> > to:
>> >   #define KERN_NOTICE 5
>> >
>> > Then you can change the printk guts to do something vaguely like
>> > (untested):
>> > #define printk(arg1, arg2, ...) actual_printk("<" #arg1 ">" arg2, __VA_ARGS__)
>> >
>> > And so far no behavior has changed.  But now the _fun_ part is, you
>> > can add a config symbol for "what is the minimum loglevel I care
>> > about?"
>>
>> Given that
>> a) there're plenty of printks without any KERN_* bloat,
>
>> b) there're printks that SHOULD NOT have KERN_* bloat,
>
>So define a level 0 that doesn't prepend any level to the string, and
>have the macro filter that out at the same default level it counts as
>now.  (KERN_INFO, I think?)  The tests are all on constants which should
>resolve at compile time and the dead code eliminator should zap it, even
>if the macro gets more complicated it shouldn't result in a bigger
>binary.
>
>> c) debugging-by-printk method is widely used and this will force
>>additional typing, head-scratching  and swear words
>
>Because we never change kernel internal APIs.  Oh yeah.  Never happens.
>
>> d) time wasted on pointless discussions whether some particular
>>printk ALERT or CRIT
>
>Let me get this straight: you're objecting to actually making the
>printk levels useful enough that developers start to care what they're
>set to, because then they might be motivated to want some of them
>changed?
>
>Make it useful, people might care, thus they might talk about it...
>
>Sorry, I'm still missing the downside here.
>
>> e) flag day for printk,
>
>That's the main reason I haven't played with it so far, although it
>would be easy to define a new symbol (dprintk or some such, although I
>note several drivers are already using that) and transition gradually.
>
>> I think that this idea is not worth it.
>
>*Shrug*.
>
>My problem is that switching off printk is the single biggest bloat
>cutter in the kernel, yet it makes the resulting system very hard to
>support.  It combines a big upside with a big downside, and I'd like
>something in between.

What about getting even more hard core? 

Use compiler tricks to remove ALL the static printk strings from the
kernel and replace the printk with something that outputs a decimal
index followed by a tuple of zero to N hex strings on a single line.
Then have the syslogd or some other utility take this cryptic output and
convolve it with a table (created at compile time) to re-create what
would have been dumped to the sys-log ring buffer.  This way you strip
out most of the static text from the kernel and yet can still re-create
the kernlog output.
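
A toy version of that round trip, as a sketch only.  The table, the
function names and the two-argument limit are all invented for
illustration; in a real setup the table would be generated at build time
and kept out of the kernel image, and the decoder would live in syslogd
or a similar tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table a build step would generate; it lives outside the kernel. */
static const char *const fmt_table[] = {
    "Fruit=%lu\n",            /* index 0 */
    "irq %lu on cpu %lu\n",   /* index 1 */
};
#define FMT_COUNT (sizeof(fmt_table) / sizeof(fmt_table[0]))

/* "Kernel" side: decimal index, then each argument in hex, on one line. */
int emit_compact(char *buf, size_t cap, unsigned idx,
                 const unsigned long *args, int nargs)
{
    int n = snprintf(buf, cap, "%u", idx);

    for (int i = 0; i < nargs; i++) {
        assert(n > 0 && (size_t)n < cap);
        n += snprintf(buf + n, cap - (size_t)n, " %lx", args[i]);
    }
    assert(n > 0 && (size_t)n < cap);
    n += snprintf(buf + n, cap - (size_t)n, "\n");
    return n;
}

/* Userspace side: look the index up and rebuild the text (at most two args). */
int decode_compact(const char *line, char *out, size_t cap)
{
    unsigned idx;
    unsigned long a[2] = { 0, 0 };

    if (sscanf(line, "%u %lx %lx", &idx, &a[0], &a[1]) < 1 || idx >= FMT_COUNT)
        return -1;
    /* Unused trailing arguments are ignored by snprintf(). */
    return snprintf(out, cap, fmt_table[idx], a[0], a[1]);
}
```

With index 1 and arguments 16 and 2, the compact line is "1 10 2" and the
decoder turns it back into "irq 16 on cpu 2".  Only integer arguments are
handled here; %s arguments would need their own encoding.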

At least as a post-processing operation.

Is this an old idea?  I'm guessing this has at least been proposed
before.

--mgross

>
>> > #define printk(level, str, ...) \
>> >   do { \
>> > if (level < CONFIG_PRINTK_DOICARE) \
>> >   actual_printk("<" #level ">" str, __VA_ARGS__); \
>> >   } while(0);
>> >
>> > Opinions?
>>
>> Ick.
>>
>>  Alexey "ignore_loglevel" Dobriyan
>
>But ignore_loglevel doesn't decrease the size of the _binary_.  That's
>what we're talking about here with the -tiny tree.  Embedded developers
>want to squeeze more code onto smaller flash/rom chips.  Setting
>ignore_loglevel does prevent these messages from ever being emitted, but
>they're still in the kernel image as dead weight.  It saves noise but
>doesn't save _space_.
>
>I'm proposing allowing an ignore_loglevel to remove the unused messages
>at compile time so they don't take up space.  Doing that requires the
>levels to be integers so they can be compared with < or >, and the
>remaining changes follow logically.  (To me, anyway...)
>
>Rob
>--
>"One of my most productive days was throwing away 1000 lines of code."
>  - Ken Thompson.
>___
>Celinux-dev mailing list
>[EMAIL PROTECTED]
>http://tree.celinuxforum.org/mailman/listinfo/celinux-dev

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Jiri Slaby

On 09/20/2007 11:24 AM, Zhenyu Wang wrote:
> On 2007.09.20 17:33:45 +, Dave Airlie wrote:
>>> Maybe you are rather interested in these dmesg lines:
>>> Linux agpgart interface v0.102
>>> agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
>>> on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
>>> agpgart: Detected an Intel G33 Chipset.
>>> agpgart: Detected 8192K stolen memory.
>>> agpgart: AGP aperture is 256M @ 0xd000
>>> [drm] Initialized drm 1.1.0 20060810
>>> ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
>>> [drm] Initialized i915 1.6.0 20060119 on minor 0
>>> ...
>>> set status page addr 0x00033000
>>> agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
>>> agpgart: Trying to insert into local/stolen memory
>>>
>>> So the problem is, that X passes too low start.
>>>
>>> The X log:
>>> http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old
> 
> Could you try current xf86-video-intel driver? just do
> git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

It works! There is still a 3D problem, but it may have nothing to do with the kernel:
$ glxinfo
name of display: :0.0
Unrecognized deviceID 29c2
X Error of failed request:  GLXBadContext
...

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University

Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Rob Landley

On Thursday 20 September 2007 4:58:54 pm Tim Bird wrote:
> Rob Landley wrote:
> > So instead of:
> >   printk(KERN_NOTICE "Fruit=%d\n", banana);
> > It would now be:
> >   printk(KERN_NOTICE, "Fruit=%d\n", banana);
> >
> > Change the header from:
> >   #define KERN_NOTICE "<5>"
> > to:
> >   #define KERN_NOTICE 5
> >
> > Then you can change the printk guts to do something vaguely like
> > (untested): #define printk(arg1, arg2, ...) actual_printk("<" #arg1 ">"
> > arg2, __VA_ARGS__)
>
> ...
>
> > [then] the
> > compiler's dead code eliminator zaps the printks you don't care about so
> > they don't bloat the kernel image.
>
> I agree in principle with the idea, but there are some major
> practical wrinkles that would have to be worked through.
>
> First, not all printks that are missing a log level should have one.
> People do stuff like this:
>
> printk(KERN_INFO "interesting info follows:");
> ...
> printk("var5: %d\n", var5);
>
> Or even things that evaluate to:
> printk("");
>
> The code inside printk currently has to examine the
> strings, looking for line feeds and inserting log levels.
>
> Given that there are about 60,000 printks in the kernel (and that's
> not counting wrappers like dprintk() and other locally-defined
> functions and macros) it would be a huge task to examine the code
> and differentiate strings that really start a new log message
> (and thus should have an attached log level) and strings
> that don't.

Hmmm.  The hard part isn't making printk(0,blah) mean the same as not having a 
log level message now, because the current logic already handles it.  The 
problem is that filtering continuations of previous messages involves knowing 
what log level the previous message was so you know whether or not to filter 
it.

Yeah, that would take some doing to untangle.  An incremental switchover (easy 
printks first, I.E. the ones that currently specify a loglevel) seems more 
strongly indicated...
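
To illustrate why continuations are the awkward part, here is a small
userspace model.  It is a sketch under assumptions: log_piece(), LVL_CONT,
LVL_DEFAULT and LVL_CUTOFF are invented, and the filter runs at runtime
because a continuation's fate depends on state the compiler cannot see.

```c
#include <assert.h>
#include <string.h>

#define LVL_CONT     0   /* hypothetical: no level given */
#define LVL_DEFAULT  4   /* assumed level for an unlabelled new line */
#define LVL_CUTOFF   5   /* hypothetical: keep levels below this */

struct log_state {
    int last_level;      /* level of the line currently being assembled */
    int at_line_start;   /* nonzero if the previous piece ended in '\n' */
    size_t len;
    char out[128];
};

void log_init(struct log_state *s)
{
    s->last_level = LVL_DEFAULT;
    s->at_line_start = 1;
    s->len = 0;
    s->out[0] = '\0';
}

void log_piece(struct log_state *s, int level, const char *text)
{
    size_t n, room;

    assert(s != NULL && text != NULL);
    n = strlen(text);
    room = sizeof(s->out) - s->len - 1;

    if (level != LVL_CONT)
        s->last_level = level;          /* a new message names its level */
    else if (s->at_line_start)
        s->last_level = LVL_DEFAULT;    /* unlabelled new line */
    /* otherwise this is a continuation and inherits last_level */

    if (n > 0)
        s->at_line_start = (text[n - 1] == '\n');
    if (s->last_level >= LVL_CUTOFF)
        return;                         /* filtered, including continuations */

    if (n > room)
        n = room;
    memcpy(s->out + s->len, text, n);
    s->len += n;
    s->out[s->len] = '\0';
}
```

A filtered KERN_INFO-style opener followed by an unlabelled continuation
drops both pieces, while a kept opener keeps its continuation.  Since the
decision is made at runtime, the strings stay in the binary, which is
exactly why the easy printks look like the better place to start.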

That said, I started this by noting I haven't personally had time to bang on 
this since I thought of it.  You did ask for ideas. :)

>  -- Tim

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.

Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Nagendra Tomar

--- Davide Libenzi <[EMAIL PROTECTED]> wrote:

> Looking back at it, I think the current TCP code is right, once you look 
> at the "event" to be a output buffer full->with_space transition.
> If you drop an fd inside epoll with EPOLLOUT|EPOLLET and you get an event 
> (free space on the output buffer), if you do not consume it (say a 
> tcp_sendmsg that re-fill the buffer), you can't see other OUT event 
> anymore since they happen on the full->with_space transition.
> Yes, I know, the read side (EPOLLIN) works differently and you get an 
> event for every packet you receive. And yes, I do not like asymmetric 
> things. But that does not make the EPOLLOUT|EPOLLET wrong IMO.
> 

I agree that ET means the event should happen at the transition
from the nospace->space condition, but isn't the other case (the event
is delivered every time it actually happens) more usable?
Also, the epoll man page says so:

"... Edge Triggered event distribution delivers events only when 
events happens on the  monitored file."

This serves the purpose of ET (reducing the number of poll events) and
at the same time makes userspace coding easier. My userspace code
has the liberty of deciding when it can write to the socket. For instance,
the sendfile buffer management example that I quoted in my earlier post
will be difficult with the current ET|POLLOUT behaviour. I cannot
write in full-buffer units. I'll have to write partial buffers just to
fill the TCP writeq which is needed to trigger the event.
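
The full->with_space behaviour can be observed from userspace with a small
test.  This is a sketch with loud caveats: it uses an AF_UNIX socketpair
rather than TCP so that it is self-contained, it is Linux-specific, and
et_epollout_demo() and ready_count() are names made up here.  It shows
only the part both sides of this thread agree on, not the TCP-specific
case under discussion.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Number of ready events seen within timeout_ms (0 means "just poll"). */
static int ready_count(int ep, int timeout_ms)
{
    struct epoll_event ev;

    return epoll_wait(ep, &ev, 1, timeout_ms);
}

/*
 * results[0]: right after registering EPOLLOUT|EPOLLET
 * results[1]: asked again with no activity in between
 * results[2]: after filling the send buffer and draining the peer
 * Returns 0 on success, -1 if the setup failed.
 */
int et_epollout_demo(int results[3])
{
    int sv[2], ep, rc = -1;
    struct epoll_event ev;
    char chunk[4096];

    assert(results != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0)
        goto out_sock;
    if (fcntl(sv[0], F_SETFL, O_NONBLOCK) < 0 ||
        fcntl(sv[1], F_SETFL, O_NONBLOCK) < 0)
        goto out_ep;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLOUT | EPOLLET;
    ev.data.fd = sv[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto out_ep;

    results[0] = ready_count(ep, 0);    /* socket starts out writable */
    results[1] = ready_count(ep, 0);    /* still writable, but no new edge */

    memset(chunk, 'x', sizeof(chunk));
    while (write(sv[0], chunk, sizeof(chunk)) > 0)
        ;                               /* fill until the write would block */
    while (read(sv[1], chunk, sizeof(chunk)) > 0)
        ;                               /* peer drains: full -> with_space */
    results[2] = ready_count(ep, 1000);
    rc = 0;

out_ep:
    close(ep);
out_sock:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The second wait coming back empty is the situation described above: the
socket is writable, but code that waits for EPOLLOUT before each
full-buffer write will not be told so again until the buffer has filled
and drained.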

Thanx,
Tomar


Re: Processes spinning forever, apparently in lock_timer_base()?

2007-09-20 Thread Chuck Ebbert

On 09/20/2007 05:29 PM, Andrew Morton wrote:
> On Thu, 20 Sep 2007 17:07:15 -0400
> Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> 
>> On 08/09/2007 12:55 PM, Andrew Morton wrote:
>>> On Thu, 9 Aug 2007 11:59:43 +0200 Matthias Hensler <[EMAIL PROTECTED]> wrote:
>>>
>>>> On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
>>>>> On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
>>>>> [...]
>>>>> I am also willing to try the patch posted by Richard.
>>>> I want to give some update here:
>>>>
>>>> 1. We finally hit the problem on a third system, with a total different
>>>>    setup and hardware. However, again high I/O load caused the problem
>>>>    and the affected filesystems were mounted with noatime.
>>>>
>>>> 2. I installed a recompiled kernel with just the two line patch from
>>>>    Richard Kennedy (http://lkml.org/lkml/2007/8/2/89). That system has 5
>>>>    days uptime now and counting. I believe the patch fixed the problem.
>>>>    However, I will continue running "vmstat 1" and the endless loop of
>>>>    "cat /proc/meminfo", just in case I am wrong.
>>>
>>> Did we ever see the /proc/meminfo and /proc/vmstat output during the stall?
>>>
>>> If Richard's patch has indeed fixed it then this confirms that we're seeing
>>> contention over the dirty-memory limits.  Richard's patch isn't really the
>>> right one because it allows unlimited dirty-memory windup in some situations
>>> (large number of disks with small writes, or when we perform queue congestion
>>> avoidance).
>>>
>>> As you're seeing this happening when multiple disks are being written to it is
>>> possible that the per-device-dirty-threshold patches which recently went into
>>> -mm (and which appear to have a bug) will fix it.
>>>
>>> But I worry that the stall appears to persist *forever*.  That would indicate
>>> that we have a dirty-memory accounting leak, or that for some reason the
>>> system has decided to stop doing writeback to one or more queues (might be
>>> caused by an error in a lower-level driver's queue congestion state management).
>>>
>>> If it is the latter, then it could be that running "sync" will clear the
>>> problem.  Temporarily, at least.  Because sync will ignore the queue congestion
>>> state.
>>>
>> This is still a problem for people, and no fix is in sight until 2.6.24.
> 
> Any bugzilla urls or anything like that?

https://bugzilla.redhat.com/show_bug.cgi?id=249563

> 
>> Can we get some kind of band-aid, like making the endless 'for' loop in
>> balance_dirty_pages() terminate after some number of iterations? Clearly
>> if we haven't written "write_chunk" pages after a few tries, *and* we
>> haven't encountered congestion, there's no point in trying forever...
> 
> Did my above questions get looked at?
> 
> Is anyone able to reproduce this?
> 
> Do we have a clue what's happening?

There are a ton of dirty pages for one disk, and zero or close to zero dirty
for a different one. Kernel spins forever trying to write some arbitrary
minimum amount of data ("write_chunk" pages) to the second disk...

> 
> Is that function just spinning around, failing to start writeout against
> any pages at all?  If so, how come?

Yes, it spins forever. Just removing the "noatime" mount option for the
second disk generates enough dirty data to keep the system functional.

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Matt Mackall

On Thu, Sep 20, 2007 at 11:42:29AM +1000, Dave Airlie wrote:
> > The code is broken anyways. If you free pages without flushing
> > them first some other innocent user allocating them will end up
> > with possible uncached pages for some time.
> >
> > Does this simple patch help?
> >
> 
> I've attached a more complicated patch that does a 2 stage effort to
> unmapping and freeing pages. My kernel no longer hangs with this
> patch...
> 
> Jiri can you confirm?

It's broken for me.

2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
      -rc4-mm1: solid lock on X shutdown, random solid locks about
                once every four hours
      -rc6-mm1: solid lock on X startup
   +your patch: screen goes black, turns off and on a few times during
                startup, can reboot with sysrq-b

Video is:

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250
[Mobility FireGL 9000] (rev 02)


-- 
Mathematics is the supreme nostalgia of our time.

Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

2007-09-20 Thread Thomas Gleixner

Linus,

On Thu, 2007-09-20 at 14:55 -0700, Linus Torvalds wrote:
> And I think that's a damn reasonable thing to agree on: timers (and 
> anything else that CPU shutdown/bringup could *possibly* care about) 
> should be considered core enough that they had better be on the 
> suspend_late/resume_early list.
> 
> Thomas, Rafael, can you verify that at least STR is ok in this respect?

-ETOOTIRED led me to a wrong conclusion, but still it is a valuable
hint that this change is making things work again. I need to go down
into the details of the swsusp_suspend() code path to figure out, what's
the root cause. 

Sorry for the noise, but I'm zooming in.

tglx



Re: [PATCH 8/11] eCryptfs: Convert mmap functions to use persistent file

2007-09-20 Thread Michael Halcrow

On Wed, Sep 19, 2007 at 10:50:57PM -0700, Andrew Morton wrote:
> On Mon, 17 Sep 2007 16:50:16 -0500 Michael Halcrow <[EMAIL PROTECTED]> wrote:
> > +ecryptfs_copy_up_encrypted_with_header(struct page *page,
> > +  struct ecryptfs_crypt_stat *crypt_stat)
> > +{
...
> > +   flush_dcache_page(page);
> > +   if (rc) {
> > +   ClearPageUptodate(page);
> > +   printk(KERN_ERR "%s: Error reading xattr "
> > +  "region; rc = [%d]\n", __FUNCTION__, rc);
> > +   goto out;
> > +   }
> > +   SetPageUptodate(page);
> 
> I don't know what sort of page `page' refers to here, but normally we only
> manipulate the page uptodate status under lock_page().

This is the page that eCryptfs gets via
ecryptfs_aops->ecryptfs_readpage(), so this should be okay. The
comment should make the fact that the page is locked explicit.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>

---
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 04103ff..c6a8a33 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -111,7 +111,7 @@ static void set_header_info(char *page_virt,
  * ecryptfs_copy_up_encrypted_with_header
  * @page: Sort of a ``virtual'' representation of the encrypted lower
  *file. The actual lower file does not have the metadata in
- *the header.
+ *the header. This is locked.
  * @crypt_stat: The eCryptfs inode's cryptographic context
  *
  * The ``view'' is the version of the file that userspace winds up

Re: [Celinux-dev] [Announce] Linux-tiny project revival

2007-09-20 Thread Rob Landley

On Wednesday 19 September 2007 4:28:05 pm Andrew Morton wrote:
> On Wed, 19 Sep 2007 11:03:09 -0700
>
> Tim Bird <[EMAIL PROTECTED]> wrote:
> > Recently, the CE Linux forum has been working to revive the
> > Linux-tiny project.  At OLS, I asked for interested parties
> > to volunteer to become the new maintainer for the Linux-tiny patchset.
>
> I volunteer!  Send patches to me, cc linux-kernel and celinux-dev.
>
> Seriously, putting this stuff into some private patch collection should
> be a complete last resort - you should only do this with patches which
> you (and the rest of us) agree have no hope of ever getting into mainline.

History!

The -tiny tree started out as a separate patch kit of Matt Mackall's, which he 
stopped updating circa 2.6.14 because he didn't think keeping them out of 
tree was helping attract other developers, nor was it helping to get them 
inline.  He decided to focus on pushing the existing patches into mainline, 
and stop maintaining the out of tree patchset for new releases.  His last 
post on the subject (to the linux-tiny mailing list) was a year ago:
http://selenic.com/pipermail/linux-tiny/2006-March/000314.html

But what happened is that most of the abandoned patches stopped applying to 
new kernels yet still weren't available in mainline a year later, so Tim and 
Michael have stepped in to revive the -tiny tree.  (Tim talked about this a 
bit at the CELF BOF at OLS, which is more acronyms than should really show up 
immediately after one another in any conversation, FYI.)

So yay new tree.  Tried without it, didn't work.  Broken up to make merging 
easier, but mainline will probably never _fully_ catch up, any more than 
it'll catch up with any of the other special-interest development trees.  
Making -tiny an .hg tree would be really really nice, though... :)

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.

Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

2007-09-20 Thread Thomas Gleixner

Rafael,

On Thu, 2007-09-20 at 23:54 +0200, Rafael J. Wysocki wrote:
> > Hmm. This is close to the ordering we have in STR too.
> > 
> > I have some dim memory of there being some ACPI reason why it had to be 
> > done that way.
> 
> Yes.  We're executing _INI from the CPU initialization code and that shouldn't
> be done after _WAK, which is called from platform_finish().

If I tear down CPU#1 right before I tell the kernel to hibernate, then
the box must explode in the same way. It does not. On none of 4 tested
laptops. 

Of course only the jinxed VAIO one exposes the "please press a key
problem".

I need to follow down the swsusp_suspend() code path to figure out, why
this breaks the box.

tglx


Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Indan Zupancic

On Thu, September 20, 2007 22:38, Rob Landley wrote:
> I've been playing with an idea for a while to improve the printk() situation,
> but it's a more intrusive change than I've had time to bang on.
>
> Right now, the first argument to printk() is a loglevel, but it's handled via
> string concatenation.  I'd like to change that to be an integer, and make it
> an actual comma-separated first argument.  (Mandatory, not optional.)
>
> So instead of:
>   printk(KERN_NOTICE "Fruit=%d\n", banana);
> It would now be:
>   printk(KERN_NOTICE, "Fruit=%d\n", banana);
>
> Change the header from:
>   #define KERN_NOTICE "<5>"
> to:
>   #define KERN_NOTICE 5

You have to jump through fewer hoops if you do:

#define KERN_NOTICE 5,

But the problem remains that there are printk's which don't have
a KERN_* as the first argument. Those are also impossible to get
rid of in this way, as the loglevel is unknown (and you don't want
partially printed messages).

So adding the comma is really needed and in addition all printk's
without a loglevel should get one. Which clutters the code and may
increase codesize.

A quick scroll through a vmlinux binary shows that there are quite a
lot areas consisting only of some repeated pattern. Mostly 0x00, but
also 0x90 and ".GCC: (GNU) 4.2.1.". Getting rid of those would save
something between 50 and 100KB.

Greetings,

Indan


Re: [Announce] Linux-tiny project revival

2007-09-20 Thread Tim Bird

Rob Landley wrote:
> So instead of:
>   printk(KERN_NOTICE "Fruit=%d\n", banana);
> It would now be:
>   printk(KERN_NOTICE, "Fruit=%d\n", banana);
> 
> Change the header from:
>   #define KERN_NOTICE "<5>"
> to:
>   #define KERN_NOTICE 5
> 
> Then you can change the printk guts to do something vaguely like (untested):
> #define printk(arg1, arg2, ...) actual_printk("<" #arg1 ">" arg2, __VA_ARGS__)
...
> [then] the
> compiler's dead code eliminator zaps the printks you don't care about so they
> don't bloat the kernel image.

I agree in principle with the idea, but there are some major
practical wrinkles that would have to be worked through.

First, not all printks that are missing a log level should have one.
People do stuff like this:

printk(KERN_INFO "interesting info follows:");
...
printk("var5: %d\n", var5);

Or even things that evaluate to:
printk("");

The code inside printk currently has to examine the
strings, looking for line feeds and inserting log levels.

Given that there are about 60,000 printks in the kernel (and that's
not counting wrappers like dprintk() and other locally-defined
functions and macros) it would be a huge task to examine the code
and differentiate strings that really start a new log message
(and thus should have an attached log level) and strings
that don't.
 -- Tim


=
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=


Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

2007-09-20 Thread Linus Torvalds

On Thu, 20 Sep 2007, Linus Torvalds wrote:
> 
> (Btw, the above commit message points to just my response with a testing 
> patch to the real email: the actual explanation of the INSANE ordering is 
> from Len Brown in
> 
>   
> https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
> 
> and there Len claims that we *must* wake up CPU's early).

..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in 
turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651 

However, it seems that bugzilla entry may just be bogus. It talks about 
"it appears that some firmware in the future may depend on that sequence 
for correction operation"

Len, Shaohua, what are the real issues here? 

It would indeed be nice if we could just take CPU's down early (while 
everything is working), and run the whole suspend code with just one CPU, 
rather than having to worry about the ordering between CPU and device 
takedown.

That said, at least with STR, the situation is:

 1) suspend_console
 2)   device_suspend(PMSG_SUSPEND)        (== ->suspend)
 3) disable_nonboot_cpus()
 4)   device_power_down(PMSG_SUSPEND)     (== ->suspend_late)
 5) pm_ops->enter()
 6)   device_power_up()                   (== ->resume_early)
 7) enable_nonboot_cpus()
 8) pm_finish()
 9)   device_resume()                     (== ->resume)
10) resume_console

So if we agree that things like timers etc should *never* be suspended by 
the early suspend, and *always* use "suspend_late/resume_early", then at 
least STR should be ok.

And I think that's a damn reasonable thing to agree on: timers (and 
anything else that CPU shutdown/bringup could *possibly* care about) 
should be considered core enough that they had better be on the 
suspend_late/resume_early list.

Thomas, Rafael, can you verify that at least STR is ok in this respect?

Linus

