softdog.c kernel 2.4.29

2005-03-15 Thread Jacques Basson
Hi

There is a bug in the softdog.c (v 0.05) in the 2.4 kernel series
(certainly in 2.4.29 and there are no references to it in the latest
Changelog) that won't reboot the machine if /dev/watchdog is closed
unexpectedly and nowayout is not set. The softdog.c (v 0.07) in 2.6.11
is not affected, but I have been informed by the vendor of analog output
cards that we use (ICP DAS) that they currently have no plans to port
their driver to the 2.6 series.

Anyway, here is a simple patch that does the job. I hope that it is of
use to someone:

diff -Naur softdog.c.orig softdog.c
--- softdog.c.orig  2003-11-28 20:26:20.0 +0200
+++ softdog.c   2005-03-16 09:12:34.0 +0200
@@ -124,7 +124,7 @@
 *  Shut off the timer.
 *  Lock it in if it's a module and we set nowayout
 */
-   if (expect_close || nowayout == 0) {
+   if (expect_close && nowayout == 0) {
        del_timer(&watchdog_ticktock);
    } else {
        printk(KERN_CRIT "SOFTDOG: WDT device closed unexpectedly.  WDT will not stop!\n");

Jacques
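
The effect of the one-operator change can be checked with a small
userspace truth table. This is only a sketch: stop_timer_old() and
stop_timer_new() are hypothetical helper names that mirror the two
conditions in the diff; they are not functions in softdog.c.

```c
#include <stdbool.h>

/* Condition before the patch: the timer is stopped whenever nowayout is
 * clear, even if the device was closed without the expected close. */
static bool stop_timer_old(bool expect_close, int nowayout)
{
    return expect_close || nowayout == 0;
}

/* Condition after the patch: the timer is stopped only on an expected
 * close with nowayout clear. */
static bool stop_timer_new(bool expect_close, int nowayout)
{
    return expect_close && nowayout == 0;
}
```

For an unexpected close with nowayout clear (expect_close false,
nowayout 0) the old condition is true, so the timer is deleted and the
machine never reboots; the new condition is false, so the timer keeps
running.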

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01

2005-03-15 Thread Steven Rostedt


On Tue, 15 Mar 2005, Lee Revell wrote:

> On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> > Damn! The answer was right there in front of my eyes! Here's the cleanest
> > solution. I forgot about wait_on_bit_lock.  I've converted all the locks
> > to use this instead.  We probably need to get priority inheritance working
> > on this too someday, but for now it's better than wasting memory or
> > getting into deadlocks.
> >
>
> I am still not clear on why this did not hit with earlier kernels +
> PREEMPT_DESKTOP.  Were the bitlocks introduced recently?  Or was another
> lock-break patch dropped?
>

When did you start seeing this? This code has been there as far back as
2.6.7 (the earliest 2.6 kernel I still have lying around) and as far
back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
start picking this up till later, or you were just lucky that no
contention was happening on that lock.

-- Steve



Re: Intel Ethernet PRO 100

2005-03-15 Thread Meelis Roos
>>>Where we can find specs for writing driver for Intel PRO 100 card.

RD> You can find a developer's manual for the 8255x NIC at
RD> http://sourceforge.net/project/showfiles.php?group_id=42302

I'm not sure what NIC the original poster meant, but:

PRO 100 is actually an older card than e100 supports. e100 supports Pro
100+ and other newer cards. Original Pro 100 used 82556 but this doc
(and e100) is about 82557 and newer chips.

The PCI ID of 82556-based pro 100 smart was just removed from eepro100
driver too because it didn't actually work. I have two of these cards
and tested myself that they did not work.

-- 
Meelis Roos


Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01

2005-03-15 Thread Steven Rostedt


On Tue, 15 Mar 2005, Steven Rostedt wrote:

>
>
> On Tue, 15 Mar 2005, Ingo Molnar wrote:
> >
> > i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> > would simplify things, besides making PREEMPT_RT simpler as well. The
> > memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> > on x86)
> >
>
> Hi Ingo,
>
> Damn! The answer was right there in front of my eyes! Here's the cleanest
> solution. I forgot about wait_on_bit_lock.  I've converted all the locks
> to use this instead.  We probably need to get priority inheritance working
> on this too someday, but for now it's better than wasting memory or
> getting into deadlocks.
>

One bit of caution on these. Without PREEMPT_RT, don't the spinlocks on
SMP act the same as normal spinlocks, so that we must not schedule while
holding one? I believe that some of these locks are taken while holding
spin_locks. So this isn't the right solution for anything other than
PREEMPT_RT. I also forgot to add might_sleep in the locking calls. Here's
the patch with the might_sleep added.  What should we do for
non-PREEMPT_RT?  Maybe put the bit_spinlocks back in for that case?

-- Steve
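
The trylock/unlock half of a bit lock like the one in the patch below
can be sketched in userspace with C11 atomics. This is an illustration
under stated assumptions, not the kernel code: bit_trylock(),
bit_is_locked() and bit_unlock() are hypothetical names, and nothing
here sleeps or wakes waiters the way wait_on_bit_lock() and
wake_up_bit() do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take the lock held in bit `nr` of `*word`.
 * Returns true if the bit was clear and is now owned by the caller. */
static bool bit_trylock(atomic_ulong *word, unsigned nr)
{
    unsigned long mask = 1UL << nr;

    /* fetch_or returns the previous value; the lock was free only if
     * the bit was not already set. */
    return (atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask) == 0;
}

/* Report whether bit `nr` is currently set (lock held by someone). */
static bool bit_is_locked(atomic_ulong *word, unsigned nr)
{
    return (atomic_load_explicit(word, memory_order_relaxed) >> nr) & 1UL;
}

/* Release the lock by clearing the bit with release ordering. */
static void bit_unlock(atomic_ulong *word, unsigned nr)
{
    atomic_fetch_and_explicit(word, ~(1UL << nr), memory_order_release);
}
```

Each bit of the word is an independent lock, which is why the patch can
keep both BH_State and BH_JournalHead in the same b_state word.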

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c  2005-03-15 11:58:14.0 -0500
@@ -82,6 +82,17 @@

 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+   schedule();
+   return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h  2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h   2005-03-16 02:25:31.881251828 -0500
@@ -324,34 +324,65 @@
    return bh->b_private;
 }

+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-   bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+   might_sleep();
+   wait_on_bit_lock(&bh->b_state,BH_State,jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+   __acquire(bitlock);
 }

 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-   return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+   if (test_and_set_bit(BH_State, &bh->b_state))
+       return 0;
+#endif
+   __acquire(bitlock);
+   return 1;
 }

 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-   return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+   return test_bit(BH_State, &bh->b_state);
+#else
+   return 1;
+#endif
 }

 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-   bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+   clear_bit(BH_State, &bh->b_state);
+   smp_mb__after_clear_bit();
+   wake_up_bit(&bh->b_state, BH_State);
+#endif
+   __release(bitlock);
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-   bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+   might_sleep();
+   wait_on_bit_lock(&bh->b_state,BH_JournalHead,jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+   __acquire(bitlock);
 }

 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-   bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+   clear_bit(BH_JournalHead, &bh->b_state);
+   smp_mb__after_clear_bit();
+   wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+   __release(bitlock);
 }

 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h 2005-03-14 06:00:54.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h  2005-03-15 12:19:11.0 -0500
@@ -774,67 +774,6 @@
 }))


-/*
- *  bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- 

Re: [PATCH][1/2] SquashFS

2005-03-15 Thread Paul Jackson
>  the King Penguin used these two constructs with consistency:

Nice distinction - thanks.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: [PATCH] Reading deterministic cache parameters and exporting it in /sysfs

2005-03-15 Thread Andrew Morton
Venkatesh Pallipadi <[EMAIL PROTECTED]> wrote:
>
>  The attached patch adds support for using cpuid(4) instead of cpuid(2), to
>  get CPU cache information in a deterministic way for Intel CPUs, whenever
>  supported.

- find_num_cache_leaves can be marked __init

- Please look for other __init opportunities.  That's quite a lot of code.

- Some functions have a space before the ( and some don't:

+static ssize_t show_size (struct _cpuid4_info *this_leaf, char *buf)

  omitting the space is preferred.

- Don't cast the return value of kmalloc:

+   cpuid4_info[cpu] = (struct _cpuid4_info *)kmalloc(
+   sizeof(struct _cpuid4_info) * num_cache_leaves, GFP_KERNEL);

- Sometimes there's a space after an `if', sometimes not.

+   if(cpuid4_info[i])

  a space is preferred.

- kfree(NULL) is permitted:

+   if(cpuid4_info[i])
+   kfree(cpuid4_info[i]);
+   if(cache_kobject[i])
+   kfree(cache_kobject[i]);
+   if(index_kobject[i])
+   kfree(index_kobject[i]);

  (in several places)


Once you've worked through the design issues with davej, please upissue the
patch, thanks.
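
Two of the review points above (no cast on the allocator's return value,
no NULL check before freeing) have direct userspace analogues, sketched
here with malloc()/free() rather than kmalloc()/kfree(). The struct and
function names are hypothetical stand-ins, not code from the patch under
review.

```c
#include <stdlib.h>

struct leaf_tables {        /* hypothetical stand-in for the per-CPU arrays */
    int *info;
    int *cache;
    int *index;
};

/* Release all three tables. free(NULL) is defined to do nothing, so no
 * "if (ptr)" guards are needed, just as with kfree(NULL). */
static void leaf_tables_free(struct leaf_tables *t)
{
    free(t->info);
    free(t->cache);
    free(t->index);
    t->info = t->cache = t->index = NULL;
}

/* Allocate all three tables; on any failure release whatever was
 * allocated. Returns 0 on success, -1 on failure. */
static int leaf_tables_alloc(struct leaf_tables *t, size_t n)
{
    /* void * converts implicitly in C, so the result is not cast */
    t->info  = malloc(n * sizeof *t->info);
    t->cache = malloc(n * sizeof *t->cache);
    t->index = malloc(n * sizeof *t->index);
    if (!t->info || !t->cache || !t->index) {
        leaf_tables_free(t);
        return -1;
    }
    return 0;
}
```

Because the free routine tolerates NULL members, the same function
serves both the error path and normal teardown.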



Re: 2.6.11-mm3: BUG: atomic counter underflow at: rpcauth_destroy

2005-03-15 Thread Borislav Petkov
On Wednesday 16 March 2005 00:52, Trond Myklebust wrote:
> ty den 15.03.2005 Klokka 23:21 (+0100) skreiv Borislav Petkov:
> > After some rookie debugging I think I've found the evildoer:
> >
> > rpcauth_create used to have a line that inits rpc_auth->au_count to one
> > atomically. This line is now missing so when you release the rpc
> > authentication handle, the au_count underflows. Here's a fix:
> >
> > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> >
> > --- net/sunrpc/auth.c.orig 2005-03-15 22:34:58.0 +0100
> > +++ net/sunrpc/auth.c 2005-03-15 22:36:23.0 +0100
> > @@ -70,6 +70,7 @@ rpcauth_create(rpc_authflavor_t pseudofl
> >   auth = ops->create(clnt, pseudoflavor);
> >   if (!auth)
> >return NULL;
> > + atomic_set(&auth->au_count, 1);
> >   if (clnt->cl_auth)
> >rpcauth_destroy(clnt->cl_auth);
> >   clnt->cl_auth = auth;
>
> The correct fix for this has already been committed to Linus' bitkeeper
> repository. See
>
> http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]
>nav=index.html|[EMAIL PROTECTED]
>
> Cheers,
>   Trond

Please, excuse the noise :) Just saw ChangeSet 1.2009.4.29 :
RPC: struct rpc_auth initialization and destruction code cleanup. Now I get 
it, thanks.

Regards,
Boris.


Re: 2.6.11 USB broken on VIA computer (not just ACPI)

2005-03-15 Thread Robert W. Fuller
Andrew Morton wrote:
> "Robert W. Fuller" <[EMAIL PROTECTED]> wrote:
>
> Nobody's going to fix that machine while you persist in top-posting ;)

OK OK.  No more top posting.  It's Mozilla's fault, you know.  It
steers you in the wrong direction by leaving a few lines at the top.
Yes, I'm ashamed to admit I remember when the default behavior of mail
clients was to put the cursor at the bottom.

> How old is it, anyway?

Hmm.  I think I built it in 2000.  Wow, time flies when you're having fun!

>> Of course, I don't know how well video capture is going to work without
>> the apic programming.  So I guess I'm reduced to rebooting when I want
>> to switch between USB peripherals and video capture?
>
> hm, you didn't mention video capture before.  It should work OK?

I've only ever used it with the APIC enabled.  We'll see what happens
without?

> Are you running the latest BIOS?

The manufacturer, Tyan, didn't produce more than a handful of BIOSes
within a matter of months after they started producing the board.  They
haven't released an update since 2000.

> You may be able to set the thing up by hand with the help of
> Documentation/i386/IO-APIC.txt.

I'll check it out.  Thanks!


tty->driver_data is NULL

2005-03-15 Thread Tomita, Haruo
I obtained the following oops.

Modules Linked in: parport_pc lp parport autofs4 i2c_dev i2c_core xxxnrpc dm_mod
button battery ac md5 ipv6 uhci_hcd ehci_hcd e1000 floppy ext3 jbd ata_piix
libata sd_mod scsi_mod
CPU:0
EIP:0060:[<021f39cd>]   Not tainted VLI
EFLAGS: 00010216(2.6.9-5.0.3.ELhugeme)
EIP is at vt_ioctl+0x1d/0x17b7
eax:  ebx: 4b3b ecx: 4b3b edx: 02dcea80
esi: feed6f60 edi: feed6f60 ebp: 11145000 esp: 08b0fe88
ds: 007b es: 007b ss: 0068
Process gpm (pid: 2190, threadinfo=08b0f000 task=0813ab30)
Stack:  0001  11ed3e80 11949344    
    11145000   
0246  0246 0246 5315 02120dbc 0007 11145000
Call Trace:
[<02120dbc>] release_console_sem+0x75/0xa9
[<021fbe9c>] con_open+0x88/0x8e
[<021eecd0>] tty_open+0x189/0x2a0
[<0215c132>] chrdev_open+0x171/0x187
[<02154058>] dentry_open+0xf0/0x1a5
[<02153f62>] filp_open+0x36/0x3c
[<021efad4>] tty_ioctl+0x33e/0x38d
[<0216415a>] sys_ioctl+0x211/0x253
Code: Bad EIP value.

tty->driver_data is NULL.
I made the following patch. Is it correct?

diff -urN linux-2.6.11.3orig/drivers/char/vt_ioctl.c linux-2.6.11.3/drivers/char/vt_ioctl.c
--- linux-2.6.11.3orig/drivers/char/vt_ioctl.c  2005-03-13 15:44:51.0 +0900
+++ linux-2.6.11.3/drivers/char/vt_ioctl.c  2005-03-16 15:08:49.0 +0900
@@ -366,7 +366,7 @@
         unsigned int cmd, unsigned long arg)
 {
    struct vt_struct *vt = (struct vt_struct *)tty->driver_data;
-   struct vc_data *vc = vc_cons[vt->vc_num].d;
+   struct vc_data *vc;
    struct console_font_op op;  /* used in multiple places here */
    struct kbd_struct * kbd;
    unsigned int console;
@@ -374,7 +374,14 @@
    void __user *up = (void __user *)arg;
    int i, perm;

+   acquire_console_sem();
+   if (vt == NULL) {
+       release_console_sem();
+       return -EINVAL;
+   }
+   vc = vc_cons[vt->vc_num].d;
    console = vt->vc_num;
+   release_console_sem();

    if (!vc_cons_allocated(console))    /* impossible? */
        return -ENOIOCTLCMD;

--
Haruo

Re: [PATCH][2.6.11] generic_serial.h gcc4 fix

2005-03-15 Thread Rogier Wolff
On Tue, Mar 15, 2005 at 06:39:46PM +0100, Adrian Bunk wrote:
> > @@ -91,6 +91,4 @@ int  gs_setserial(struct gs_port *port, 
> >  int  gs_getserial(struct gs_port *port, struct serial_struct __user *sp);
> >  void gs_got_break(struct gs_port *port);
> >  
> > -extern int gs_debug;
> > -
> >  #endif
> 
> This patch is already in -mm for ages.
> 
> When doing such patches, -mm is usually a better basis than Linus' tree.

Note that the original reason for doing "extern int gs_debug" was that
sx.c used to have an ioctl to fiddle with it "live". Apparently
someone removed that piece of useful, but(t) ugly code, as it is no
longer there.

Roger. 


-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. - Adapted from lxrbot FAQ


Re: [PATCH] ppc32: Update 8260_io/fcc_enet.c to function again

2005-03-15 Thread Andrew Morton
Tom Rini <[EMAIL PROTECTED]> wrote:
>
> There's too many things in here that've sat too long (I'd been hoping to
>  just delete the driver, but that hasn't happened yet, so).  A cobbled
>  together list of changes is:
> 
>  - Update MDIO support for workqueues.
>  - Make use of 
>  - Add RPX6 support.
>  - Comment out set_multicast_list (broken).
>  - Rework tx_ring stuff so we have tx_free, not tx_Full/n_pkts.
>  - Other PHY updates/fixes.
>  - Leo Li: Rework FCC clock configuration, make it easier.
>  - 2.4 : VLAN header room, other misc bits.
>  - Kill MII_REG_NNN in favor of defines from 
>  - DM9161 PHY support (2.4, Myself & [EMAIL PROTECTED])
>  - PQ2ADS and PQ2FADS support bits (Myself & [EMAIL PROTECTED]
> 
>  From: Leo Li <[EMAIL PROTECTED]>
>  Signed-off-by: Tom Rini <[EMAIL PROTECTED]>
>  Signed-off-by: Alexandre Bastos <[EMAIL PROTECTED]>

That's unfortunate - Oray sent a patch in just a few days ago which also
fixes up this driver.  See

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/ppc-8260-fcc-ethernet-driver-cannot-read-lxt971-phy-id.patch

What should we do?


linux 2.4.20 internationalization question

2005-03-15 Thread Sheshadrivasan B
Does the Linux kernel 2.4.20 support internationalization?
If not, what patches are needed specifically to support
internationalization for CJK locales on Linux terminals and serial
consoles?

I have been through Markus Kuhn's UTF-8 & Linux article, which points to
some patches for the keyboard driver. It also points to the linux console
tools project, which I understand is a rearchitecture of the tty and
keyboard drivers. But I am not sure whether the Linux kernel 2.4.20 uses
these.

I would appreciate it if someone could help me work out what needs to be
done, if anything.

Of course, I am a newbie to the kernel and am not familiar with it.
Thanks,
Shesh.


Re: 2.6.11-mm3: BUG: atomic counter underflow at: rpcauth_destroy

2005-03-15 Thread Borislav Petkov
On Wednesday 16 March 2005 00:52, Trond Myklebust wrote:
> ty den 15.03.2005 Klokka 23:21 (+0100) skreiv Borislav Petkov:
> > After some rookie debugging I think I've found the evildoer:
> >
> > rpcauth_create used to have a line that inits rpc_auth->au_count to one
> > atomically. This line is now missing so when you release the rpc
> > authentication handle, the au_count underflows. Here's a fix:
> >
> > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> >
> > --- net/sunrpc/auth.c.orig 2005-03-15 22:34:58.0 +0100
> > +++ net/sunrpc/auth.c 2005-03-15 22:36:23.0 +0100
> > @@ -70,6 +70,7 @@ rpcauth_create(rpc_authflavor_t pseudofl
> >   auth = ops->create(clnt, pseudoflavor);
> >   if (!auth)
> >return NULL;
> > + atomic_set(&auth->au_count, 1);
> >   if (clnt->cl_auth)
> >rpcauth_destroy(clnt->cl_auth);
> >   clnt->cl_auth = auth;
>
> The correct fix for this has already been committed to Linus' bitkeeper
> repository. See
>
> http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]
>nav=index.html|[EMAIL PROTECTED]
>
> Cheers,
>   Trond

Yeah, this is fixed, but, the atomic_dec_and_test(&auth->au_count) appears in 
net/sunrpc/auth.c too. And the respective create function which creates the 
authentication handles doesn't do atomic_set of this variable. So we should 
either remove the atomic_dec_and_test() there too or reintroduce the 
atomic_set() in the rpcauth_create function otherwise we are still going to 
experience the same underflow BUG messages. Or am I missing something :)?
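
The underflow being described can be reproduced in userspace with C11
atomics. This is a sketch under stated assumptions: struct handle,
handle_init(), handle_put() and handle_count() are hypothetical
stand-ins for struct rpc_auth and its create/destroy paths, not the
sunrpc code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct handle {               /* hypothetical stand-in for struct rpc_auth */
    atomic_int count;
};

/* The creator sets the count to 1 to account for its own reference. */
static void handle_init(struct handle *h)
{
    atomic_init(&h->count, 1);
}

/* Drop one reference; returns true when the last reference went away. */
static bool handle_put(struct handle *h)
{
    /* fetch_sub returns the old value, so old == 1 means we hit zero. */
    return atomic_fetch_sub(&h->count, 1) == 1;
}

/* Read the current count (for inspection only). */
static int handle_count(struct handle *h)
{
    return atomic_load(&h->count);
}
```

If the create path leaves the count at 0, the first put drives it to -1
and never reports "last reference", which is the pattern an underflow
check would flag.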

Regards,
Boris.


Re: [PATCH 1/2] No-exec support for ppc64

2005-03-15 Thread Paul Mackerras
Jake Moilanen writes:

> It does not work w/o the sys_mprotect.  It will hang in one of the first
> few binaries.

Hmmm, what distro is this with?  I just tried a kernel with the patch
below on a SLES9 install and a Debian install and it came up and ran
just fine in both cases.

Paul.

diff -urN linux-2.5/arch/ppc64/kernel/head.S test/arch/ppc64/kernel/head.S
--- linux-2.5/arch/ppc64/kernel/head.S  2005-03-07 10:46:38.0 +1100
+++ test/arch/ppc64/kernel/head.S   2005-03-15 17:14:44.0 +1100
@@ -950,11 +950,12 @@
 * accessing a userspace segment (even from the kernel). We assume
 * kernel addresses always have the high bit set.
 */
-   rlwinm  r4,r4,32-23,29,29   /* DSISR_STORE -> _PAGE_RW */
+   rlwinm  r4,r4,32-25+9,31-9,31-9 /* DSISR_STORE -> _PAGE_RW */
rotldi  r0,r3,15/* Move high bit into MSR_PR posn */
orc r0,r12,r0   /* MSR_PR | ~high_bit */
rlwimi  r4,r0,32-13,30,30   /* becomes _PAGE_USER access bit */
ori r4,r4,1 /* add _PAGE_PRESENT */
+   rlwimi  r4,r5,22+2,31-2,31-2/* Set _PAGE_EXEC if trap is 0x400 */
 
/*
 * On iSeries, we soft-disable interrupts here, then
diff -urN linux-2.5/arch/ppc64/kernel/iSeries_htab.c test/arch/ppc64/kernel/iSeries_htab.c
--- linux-2.5/arch/ppc64/kernel/iSeries_htab.c  2004-09-21 17:22:33.0 +1000
+++ test/arch/ppc64/kernel/iSeries_htab.c   2005-03-15 17:15:36.0 +1100
@@ -144,6 +144,10 @@
 
HvCallHpt_get(&hpte, slot);
if ((hpte.dw0.dw0.avpn == avpn) && (hpte.dw0.dw0.v)) {
+   /*
+* Hypervisor expects bits as NPPP, which is
+* different from how they are mapped in our PP.
+*/
HvCallHpt_setPp(slot, (newpp & 0x3) | ((newpp & 0x4) << 1));
iSeries_hunlock(slot);
return 0;
diff -urN linux-2.5/arch/ppc64/kernel/iSeries_setup.c test/arch/ppc64/kernel/iSeries_setup.c
--- linux-2.5/arch/ppc64/kernel/iSeries_setup.c 2005-03-07 10:46:38.0 +1100
+++ test/arch/ppc64/kernel/iSeries_setup.c  2005-03-15 16:55:05.0 +1100
@@ -633,6 +633,10 @@
unsigned long vpn = va >> PAGE_SHIFT;
unsigned long slot = HvCallHpt_findValid(&hpte, vpn);
 
+   /* Make non-kernel text non-executable */
+   if (!in_kernel_text(ea))
+   mode_rw |= HW_NO_EXEC;
+
if (hpte.dw0.dw0.v) {
/* HPTE exists, so just bolt it */
HvCallHpt_setSwBits(slot, 0x10, 0);
diff -urN linux-2.5/arch/ppc64/kernel/module.c test/arch/ppc64/kernel/module.c
--- linux-2.5/arch/ppc64/kernel/module.c    2004-05-10 21:25:58.0 +1000
+++ test/arch/ppc64/kernel/module.c 2005-03-15 16:55:05.0 +1100
@@ -102,7 +102,8 @@
 {
if (size == 0)
return NULL;
-   return vmalloc(size);
+
+   return vmalloc_exec(size);
 }
 
 /* Free memory returned from module_alloc */
diff -urN linux-2.5/arch/ppc64/kernel/pSeries_lpar.c test/arch/ppc64/kernel/pSeries_lpar.c
--- linux-2.5/arch/ppc64/kernel/pSeries_lpar.c  2005-03-07 10:46:38.0 +1100
+++ test/arch/ppc64/kernel/pSeries_lpar.c   2005-03-15 16:55:02.0 +1100
@@ -470,7 +470,7 @@
slot = pSeries_lpar_hpte_find(vpn);
BUG_ON(slot == -1);
 
-   flags = newpp & 3;
+   flags = newpp & 7;
lpar_rc = plpar_pte_protect(flags, slot, 0);
 
BUG_ON(lpar_rc != H_Success);
diff -urN linux-2.5/arch/ppc64/mm/fault.c test/arch/ppc64/mm/fault.c
--- linux-2.5/arch/ppc64/mm/fault.c 2005-01-04 10:49:20.0 +1100
+++ test/arch/ppc64/mm/fault.c  2005-03-15 17:13:05.0 +1100
@@ -91,8 +91,9 @@
struct mm_struct *mm = current->mm;
siginfo_t info;
unsigned long code = SEGV_MAPERR;
-   unsigned long is_write = error_code & 0x0200;
+   unsigned long is_write = error_code & DSISR_ISSTORE;
unsigned long trap = TRAP(regs);
+   unsigned long is_exec = trap == 0x400;
 
BUG_ON((trap == 0x380) || (trap == 0x480));
 
@@ -109,7 +110,7 @@
if (!user_mode(regs) && (address >= TASK_SIZE))
return SIGSEGV;
 
-   if (error_code & 0x0040) {
+   if (error_code & DSISR_DABRMATCH) {
if (notify_die(DIE_DABR_MATCH, "dabr_match", regs, error_code,
11, SIGSEGV) == NOTIFY_STOP)
return 0;
@@ -199,16 +200,19 @@
 good_area:
code = SEGV_ACCERR;
 
+   if (is_exec) {
+   /* protection fault */
+   if (error_code & DSISR_PROTFAULT)
+   goto bad_area;
+   if (!(vma->vm_flags & VM_EXEC))
+   goto bad_area;
/* a write */
-   if (is_write) {
+   } else if (is_write) {
if (!(vma->vm_flags & VM_WRITE))

Re: 2.6.11 USB broken on VIA computer (not just ACPI)

2005-03-15 Thread Andrew Morton
"Robert W. Fuller" <[EMAIL PROTECTED]> wrote:
>
> I suppose you have to have your priorities.  It may be old to you, but 
> it's current to me!  That used to be the hallmark of Linux, the fact 
> that it would run on lesser hardware.

Nobody's going to fix that machine while you persist in top-posting ;)

How old is it, anyway?

> Of course, I don't know how well video capture is going to work without 
> the apic programming.  So I guess I'm reduced to rebooting when I want 
> to switch between USB peripherals and video capture?

hm, you didn't mention video capture before.  It should work OK?

Are you running the latest BIOS?

You may be able to set the thing up by hand with the help of
Documentation/i386/IO-APIC.txt.


Where is Margit Schubert-While?

2005-03-15 Thread Luis R. Rodriguez

Anyone heard of Margit Schubert recently? I have stopped hearing from
her. She was actively working on prism54 and all of a sudden
disappeared. IIRC her husband last told me she was sick...

Luis

-- 
GnuPG Key fingerprint = 113F B290 C6D2 0251 4D84  A34A 6ADD 4937 E20A 525E




[PATCH] kprobes: incorrect spin_unlock_irqrestore() call in register_kprobe()

2005-03-15 Thread Prasanna S Panchamukhi
Hi,

register_kprobe() routine was calling spin_unlock_irqrestore() 
wrongly. 
This patch removes unwanted spin_unlock_irqrestore() call in 
register_kprobe() routine.

Signed-off-by: Prasanna S Panchamukhi <[EMAIL PROTECTED]>

---



---

 linux-2.6.11-prasanna/kernel/kprobes.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff -puN kernel/kprobes.c~kprobes-incorrect-returnval kernel/kprobes.c
--- linux-2.6.11/kernel/kprobes.c~kprobes-incorrect-returnval   2005-03-16 11:03:42.0 +0530
+++ linux-2.6.11-prasanna/kernel/kprobes.c  2005-03-16 11:03:42.0 +0530
@@ -79,7 +79,7 @@ int register_kprobe(struct kprobe *p)
    unsigned long flags = 0;
 
    if ((ret = arch_prepare_kprobe(p)) != 0) {
-       goto out;
+       goto rm_kprobe;
    }
    spin_lock_irqsave(&kprobe_lock, flags);
    INIT_HLIST_NODE(&p->hlist);
@@ -96,8 +96,9 @@ int register_kprobe(struct kprobe *p)
    *p->addr = BREAKPOINT_INSTRUCTION;
    flush_icache_range((unsigned long) p->addr,
               (unsigned long) p->addr + sizeof(kprobe_opcode_t));
-  out:
+out:
    spin_unlock_irqrestore(&kprobe_lock, flags);
+rm_kprobe:
if (ret == -EEXIST)
arch_remove_kprobe(p);
return ret;

_
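
The shape of the fix (a second exit label so that an early failure
skips the unlock) can be sketched in userspace with a mock lock. This is
an illustration only: mock_lock(), mock_unlock(), register_item() and
the no_lock label are hypothetical names, and the mock merely counts
acquisitions so that an unbalanced unlock trips an assertion.

```c
#include <assert.h>
#include <errno.h>

static int lock_depth;   /* mock lock: acquisitions minus releases */

static void mock_lock(void)   { lock_depth++; }
static void mock_unlock(void) { assert(lock_depth > 0); lock_depth--; }

/* Hypothetical registration routine: prepare_err simulates the result of
 * the preparation step, already_present simulates a duplicate entry. */
static int register_item(int prepare_err, int already_present)
{
    int ret = prepare_err;

    if (ret != 0)
        goto no_lock;        /* lock not taken yet: must skip the unlock */

    mock_lock();
    if (already_present) {
        ret = -EEXIST;
        goto out;            /* lock is held: must go through the unlock */
    }
    /* ... insert the item while holding the lock ... */
out:
    mock_unlock();
no_lock:
    return ret;
}
```

Jumping to out on the early-failure path instead would call
mock_unlock() with lock_depth at 0 and trip the assertion, which mirrors
the stray spin_unlock_irqrestore() the patch removes.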

Thanks
Prasanna
-- 

Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Ph: 91-80-25044636
<[EMAIL PROTECTED]>


Re: 2.6.11 USB broken on VIA computer (not just ACPI)

2005-03-15 Thread Robert W. Fuller
I suppose you have to have your priorities.  It may be old to you, but 
it's current to me!  That used to be the hallmark of Linux, the fact 
that it would run on lesser hardware.

Of course, I don't know how well video capture is going to work without 
the apic programming.  So I guess I'm reduced to rebooting when I want 
to switch between USB peripherals and video capture?

Maybe I should have lied and said it worked :-)

Andrew Morton wrote:
> "Robert W. Fuller" <[EMAIL PROTECTED]> wrote:
>> I never actually saw it work until I added the noapic option to the
>> 2.6.11.2 boot.  Now I can usually use my USB mouse!  Of course the
>> downside to specifying noapic is only one CPU is servicing interrupts
>> on my SMP system.
>
> Oh, OK.  I was just wondering whether this was an actual regression.  I
> guess as it's an old machine and you have a workaround, we have other
> things to be working on.
>
> It would be nice to fix though.
>
>> It certainly doesn't work under 2.4.28, but I haven't tried specifying
>> noapic to that kernel.  Would that be useful information?
>
> Probably not.


Re: Taking strlen of buffers copied from userspace

2005-03-15 Thread Randy.Dunlap
Robert Hancock wrote:
> Randy.Dunlap wrote:
>> The latter one does (before the listed code):
>>
>>     memset(line, 0, LINE_SIZE);
>>     if (len > LINE_SIZE)
>>         len = LINE_SIZE;
>>     if (copy_from_user(line, buf, len - 1))
>>         return -EFAULT;
>>
>> so isn't line[LINE_SIZE - 1] always 0 ?
>
> In that case, yes (I hadn't looked at the surrounding code). Rather an
> odd way of doing it, but shouldn't have that problem. Could still be
> subject to problems if buf contains a null at the first character,
> unless they're somehow preventing that too..

Yes, that's still a problem.
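
Both points can be checked with a userspace imitation of the quoted
sequence. This is a sketch under stated assumptions: memcpy() stands in
for copy_from_user(), LINE_SIZE is given an arbitrary value, fill_line()
and chomp() are hypothetical names, and the trailing-newline strip is
only an assumed example of code that indexes line[strlen(line) - 1].

```c
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 80   /* arbitrary value for the sketch */

/* Imitation of the quoted sequence. The caller must pass len >= 1. */
static void fill_line(char line[LINE_SIZE], const char *buf, size_t len)
{
    memset(line, 0, LINE_SIZE);
    if (len > LINE_SIZE)
        len = LINE_SIZE;
    memcpy(line, buf, len - 1);   /* at most LINE_SIZE - 1 bytes copied */
}

/* Strip one trailing newline, guarding against an empty string so that
 * line[strlen(line) - 1] never indexes before the buffer. */
static void chomp(char *line)
{
    size_t n = strlen(line);

    if (n > 0 && line[n - 1] == '\n')
        line[n - 1] = '\0';
}
```

The last byte of line is never written by the copy, so the buffer is
always terminated; a leading NUL in buf still yields strlen(line) == 0,
which is the case the n > 0 guard covers.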
--
~Randy


Re: 2.6.11 USB broken on VIA computer (not just ACPI)

2005-03-15 Thread Robert W. Fuller
I never actually saw it work until I added the noapic option to the 
2.6.11.2 boot.  Now I can usually use my USB mouse!  Of course the downside 
to specifying noapic is only one CPU is servicing interrupts on my SMP 
system.

It certainly doesn't work under 2.4.28, but I haven't tried specifying 
noapic to that kernel.  Would that be useful information?

Andrew Morton wrote:
> "Robert W. Fuller" <[EMAIL PROTECTED]> wrote:
>> This isn't limited to the ACPI case.  My BIOS is old enough that ACPI is
>> not supported because the kernel can't find RSDP.  I found that the USB
>> works if I boot with "noapic."  This is probably sub-optimal on an SMP
>> machine.  If I don't boot with "noapic" I get the following errors:
>
> Did it work OK under previous kernels?  If so, which versions?


Re: 2.6.11 USB broken on VIA computer (not just ACPI)

2005-03-15 Thread Andrew Morton
"Robert W. Fuller" <[EMAIL PROTECTED]> wrote:
>
>  I never actually saw it work until I added the noapic option to the 
>  2.6.11.2 boot.  Now I can usually use my USB mouse!  Of course the downside
>  to specifying noapic is only one CPU is servicing interrupts on my SMP 
>  system.

Oh, OK.  I was just wondering whether this was an actual regression.  I
guess as it's an old machine and you have a workaround, we have other
things to be working on.

It would be nice to fix though.

>  It certainly doesn't work under 2.4.28, but I haven't tried specifying 
>  noapic to that kernel.  Would that be useful information?

Probably not.


Re: [PATCH] Replace zone padding with a definition in cache.h

2005-03-15 Thread Christoph Lameter
On Tue, 15 Mar 2005, Andrew Morton wrote:

> > If the struct is named then there may be
> > conflicts if it's used repeatedly.
>
> Hence the "hack" which you just deleted ;)

Ok, Master, I see the light


Re: Taking strlen of buffers copied from userspace

2005-03-15 Thread Robert Hancock
Randy.Dunlap wrote:
> The latter one does (before the listed code):
> 	memset(line, 0, LINE_SIZE);
> 	if (len > LINE_SIZE)
> 		len = LINE_SIZE;
> 	if (copy_from_user(line, buf, len - 1))
> 		return -EFAULT;
> so isn't line[LINE_SIZE - 1] always 0 ?

In that case, yes (I hadn't looked at the surrounding code). Rather an
odd way of doing it, but shouldn't have that problem. Could still be
subject to problems if buf contains a null at the first character,
unless they're somehow preventing that too..
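
The leading-null case can be reproduced in user space. What follows is only a sketch: `last_char_index_unchecked` and `trim_newline` are hypothetical helpers, not kernel functions, and an ordinary `char` array stands in for data that has already been copied from user space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Index that the pattern buf[strlen(buf) - 1] would use.
 * For an empty string strlen() is 0, so the size_t subtraction wraps
 * around and the index no longer refers to anything inside the buffer.
 */
size_t last_char_index_unchecked(const char *buf)
{
	assert(buf != NULL);
	return strlen(buf) - 1;
}

/*
 * Strip one trailing newline, checking the length first so that a
 * buffer starting with '\0' is left untouched.
 * Returns the resulting string length.
 */
size_t trim_newline(char *buf)
{
	size_t len;

	assert(buf != NULL);
	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';
	return len;
}
```

The only difference between the two is the `len > 0` test, which is the check the discussed snippets appear to lack.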

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH] Replace zone padding with a definition in cache.h

2005-03-15 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> On Tue, 15 Mar 2005, Andrew Morton wrote:
> 
> > Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > >
> > >  +#ifndef cacheline_pad_in_smp
> > >  +#if defined(CONFIG_SMP)
> > >  +#define cacheline_pad_in_smp struct { char  x; } cacheline_maxaligned_in_smp
> > >  +#else
> > >  +#define cacheline_pad_in_smp
> > >  +#endif
> > >  +#endif
> >
> > That's going to spit a warning with older gcc's.  "warning: unnamed
> > struct/union that defines no instances".
> >
> Is it really that important?

Well, it makes gcc-2.95.x unusable, and a number of people like to use it.

It has not proven too burdensome to support.  And we know that if it works
on 2.95.x, it will work on 3.1, 3.2, 3.3, etc.

> If the struct is named then there may be
> conflicts if it's used repeatedly.

Hence the "hack" which you just deleted ;)
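
The naming point can be illustrated outside the kernel. This is a sketch under assumptions: `CACHELINE`, `struct line_pad`, `LINE_PADDING` and `struct counters` are made-up names that mirror the shape of the removed `ZONE_PADDING(name)` macro, and the zero-length array relies on the GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed line size for this sketch */

/* A named, zero-sized type that only carries alignment ... */
struct line_pad {
	char x[0];
} __attribute__((aligned(CACHELINE)));

/* ... and a macro that gives every use its own member name. */
#define LINE_PADDING(name) struct line_pad name;

struct counters {
	long alloc_side;
	LINE_PADDING(_pad1_)
	long reclaim_side;
	LINE_PADDING(_pad2_)
	long read_mostly;
};

/* Each group of fields starts on its own cache line. */
static_assert(offsetof(struct counters, reclaim_side) == CACHELINE,
	      "second group must start on the next line");
```

Because the struct type is named and each member gets a distinct name, the macro can be used repeatedly in one structure without member-name conflicts and without an unnamed struct.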


Re: [PATCH] Replace zone padding with a definition in cache.h

2005-03-15 Thread Christoph Lameter
On Wed, 16 Mar 2005, Nick Piggin wrote:

> >
> > +#ifndef cacheline_pad_in_smp
> > +#if defined(CONFIG_SMP)
> > +#define cacheline_pad_in_smp struct { char  x; } cacheline_maxaligned_in_smp
>  ^^^
>
> Doesn't this add a redundant cacheline if the padding is
> previously perfect? Because of the extra byte you're adding?
>
> IIRC, the char x[0]; trick does the job correctly.

Good idea.

This patch removes the zone padding hack and establishes definitions
in include/linux/cache.h to define the padding within struct zone.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>

Index: linux-2.6.11/include/linux/cache.h
===
--- linux-2.6.11.orig/include/linux/cache.h 2005-03-08 18:40:15.0 -0800
+++ linux-2.6.11/include/linux/cache.h  2005-03-14 10:33:45.247701040 -0800
@@ -48,4 +48,12 @@
 #endif
 #endif

+#ifndef cacheline_pad_in_smp
+#if defined(CONFIG_SMP)
+#define cacheline_pad_in_smp struct { char  x[0]; } cacheline_maxaligned_in_smp
+#else
+#define cacheline_pad_in_smp
+#endif
+#endif
+
 #endif /* __LINUX_CACHE_H */
Index: linux-2.6.11/include/linux/mmzone.h
===
--- linux-2.6.11.orig/include/linux/mmzone.h 2005-03-14 10:33:01.037422024 -0800
+++ linux-2.6.11/include/linux/mmzone.h 2005-03-14 10:33:45.248700888 -0800
@@ -28,21 +28,6 @@ struct free_area {

 struct pglist_data;

-/*
- * zone->lock and zone->lru_lock are two of the hottest locks in the kernel.
- * So add a wild amount of padding here to ensure that they fall into separate
- * cachelines.  There are very few zone structures in the machine, so space
- * consumption is not a concern here.
- */
-#if defined(CONFIG_SMP)
-struct zone_padding {
-   char x[0];
-} cacheline_maxaligned_in_smp;
-#define ZONE_PADDING(name) struct zone_padding name;
-#else
-#define ZONE_PADDING(name)
-#endif
-
 struct per_cpu_pages {
int count;  /* number of pages in the list */
int low;/* low watermark, refill needed */
@@ -131,7 +116,14 @@ struct zone {
struct free_areafree_area[MAX_ORDER];


-   ZONE_PADDING(_pad1_)
+   /*
+    * zone->lock and zone->lru_lock are two of the hottest locks in the kernel.
+    * So add a wild amount of padding here to ensure that they fall into separate
+    * cachelines.  There are very few zone structures in the machine, so space
+    * consumption is not a concern here.
+    */
+
+   cacheline_pad_in_smp;

/* Fields commonly accessed by the page reclaim scanner */
spinlock_t  lru_lock;
@@ -164,7 +156,7 @@ struct zone {
int prev_priority;


-   ZONE_PADDING(_pad2_)
+   cacheline_pad_in_smp;
/* Rarely used or read-mostly fields */

/*


Re: [PATCH] Replace zone padding with a definition in cache.h

2005-03-15 Thread Christoph Lameter
On Tue, 15 Mar 2005, Andrew Morton wrote:

> Christoph Lameter <[EMAIL PROTECTED]> wrote:
> >
> >  +#ifndef cacheline_pad_in_smp
> >  +#if defined(CONFIG_SMP)
> >  +#define cacheline_pad_in_smp struct { char  x; } cacheline_maxaligned_in_smp
> >  +#else
> >  +#define cacheline_pad_in_smp
> >  +#endif
> >  +#endif
>
> That's going to spit a warning with older gcc's.  "warning: unnamed
> struct/union that defines no instances".
>
Is it really that important? If the struct is named then there may be
conflicts if it's used repeatedly.



Re: [PATCH][RFC] /proc umask and gid [was: Make /proc/ chmod'able]

2005-03-15 Thread Albert Cahalan
Better interface:

/sbin/sysctl -w proc.maps=0440
/sbin/sysctl -w proc.cmdline=0444
/sbin/sysctl -w proc.status=0444

The /etc/sysctl.conf file can be used to set these
at boot time.




Re: [11/many] acrypto: crypto_main.c

2005-03-15 Thread Evgeniy Polyakov
On Tue, 2005-03-15 at 08:24 -0800, Randy.Dunlap wrote:
> Evgeniy Polyakov wrote:
> > --- /tmp/empty/crypto_main.c 1970-01-01 03:00:00.0 +0300
> > +++ ./acrypto/crypto_main.c 2005-03-07 20:35:36.0 +0300
> > @@ -0,0 +1,374 @@
> > +/*
> > + * crypto_main.c
> > + *
> > + * Copyright (c) 2004 Evgeniy Polyakov <[EMAIL PROTECTED]>
> > + * 
> > + */
> 
> > +struct crypto_session *crypto_session_alloc(struct crypto_session_initializer *ci, struct crypto_data *d)
> > +{
> > +   struct crypto_session *s;
> > +
> > +   s = crypto_session_create(ci, d);
> > +   if (!s)
> > +   return NULL;
> > +
> > +   crypto_session_add(s);
> > +
> > +   return s;
> > +}
> > +
> > +
> 
> > +EXPORT_SYMBOL(crypto_session_alloc);
> Why is this one not _GPL ??  It calls _create() and _add().

Non-GPL callers are not allowed to control the _create() and _add()
methods separately, only to call them "atomically"
(without a gap between the two functions in which a new route can be created).
So I export only that one function as non-GPL-only, for anyone
who wants to use asynchronous crypto in simple mode.
More powerful control requires GPL.

> > +EXPORT_SYMBOL_GPL(crypto_session_create);
> > +EXPORT_SYMBOL_GPL(crypto_session_add);
> > +EXPORT_SYMBOL_GPL(crypto_session_dequeue_route);
> 
> 
-- 
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski


Re: [PATCH][RFC] /proc umask and gid [was: Make /proc/ chmod'able]

2005-03-15 Thread Albert Cahalan
On Wed, 2005-03-16 at 03:39 +0100, Rene Scharfe wrote:
> So, I gather from the feedback I've got that chmod'able /proc/
> would be a bit over the top. 8-)  While providing the easiest and most
> intuitive user interface for changing the permissions on those
> directories, it is overkill.  Paul is right when he says that such a
> feature should be turned on or off for all sessions at once, and that's
> it.
> 
> My patch had at least one other problem: the contents of each
> /proc/ directory became chmod'able, too, which was not intended.
> 
> Instead of fixing it up I took two steps back, dusted off the umask
> kernel parameter patch and added the "special gid" feature I mentioned.
> 
> Without the new kernel parameters behaviour is unchanged.  Add
> proc.umask=077 and all /proc/ will get a permission mode of 500.
> This breaks pstree (no output), as Bodo already noted, because this
> program needs access to /proc/1.  It also breaks w -- it shows the
> correct number of users but it lists X even for sessions owned
> by the user running it.
> 
> Use proc.umask=007 and proc.gid=50 instead and all /proc/ dirs
> will have a mode of 550 and their group attribute will be set to 50
> (that's "staff" on my Debian system).  Pstree will work for all members
> of that special group (just like top, ps and w -- which also show
> everything in that case).  Normal users will still have a restricted
> view.
> 
> Albert, would you take fixes for w even though you despise the feature
> that makes them necessary?

I will take patches if they are not too messy and they do not
cause tools to report garbage output. For example, I do not
wish to have tools reporting -1, 0, or uninitialized data in
place of correct data.

Distinct controls for the various files could be useful.
I might want to make /proc/*/cmdline be public, or make
/proc/*/maps be private. This is particularly helpful if
a low-security file is added for bare-bones ps operation.

You might make a special exception for built-in kernel tasks
and init.
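
The mode arithmetic in the quoted proposal can be checked with a few lines of C. This is a sketch, not the patch's code: `PROC_DIR_BASE_MODE` and `proc_dir_mode` are hypothetical names, and the base mode of 0555 is an assumption chosen because it reproduces the 500 and 550 results quoted above.

```c
#include <assert.h>

/* Assumed default mode of a per-process /proc directory. */
#define PROC_DIR_BASE_MODE 0555u

/*
 * Apply a umask-style value to the base mode: every bit set in 'mask'
 * is cleared from the resulting permission mode.
 */
unsigned int proc_dir_mode(unsigned int mask)
{
	assert((mask & ~0777u) == 0);	/* only permission bits are allowed */
	return PROC_DIR_BASE_MODE & ~mask;
}
```

With these assumptions, a mask of 077 gives 0500 and a mask of 007 gives 0550, matching the two configurations described in the quoted message.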




2.6.11-tiny1 released

2005-03-15 Thread Matt Mackall
This is a resync of the -tiny tree against 2.6.11.

The latest patch can be found at:

 http://selenic.com/tiny/2.6.11-tiny1.patch.bz2
 http://selenic.com/tiny/2.6.11-tiny1-broken-out.tar.bz2

There's a mailing list for linux-tiny development at:
 
 linux-tiny at selenic.com
 http://selenic.com/mailman/listinfo/linux-tiny

Webpage for your bookmarking pleasure:

 http://selenic.com/tiny-about/

-- 
Mathematics is the supreme nostalgia of our time.


Fw: [PATCH ide-dev-2.6] sata_sil: Mod15Write workaround

2005-03-15 Thread Tejun Heo
 Oops, I forgot to cc lkml.  Please cc to linux-ide@vger.kernel.org
and [EMAIL PROTECTED] when replying.  Sorry.

- Forwarded message from Tejun Heo <[EMAIL PROTECTED]> -

From: Tejun Heo <[EMAIL PROTECTED]>
To: Jeff Garzik <[EMAIL PROTECTED]>
Cc: linux-ide@vger.kernel.org, [EMAIL PROTECTED],
[EMAIL PROTECTED]
Subject: [PATCH ide-dev-2.6] sata_sil: Mod15Write workaround

 Hello, Jeff.

 I've finished the sata_sil workaround.  It turned out that libata
already has all the hooks needed.  Although I had to twist things a
bit, the workaround is completely contained inside sata_sil driver.

 The new work-around doesn't limit max sectors to 15.  All read requests
and write requests <= 15 sectors are processed as-is.  Write requests
larger than 15 sectors are iterated inside the sata_sil driver using
the ops->qc_prep and qc->complete_fn hooks.  The work-around doesn't
map/unmap on each iteration, it just manipulates mapped sg table and
thus the PRD entries.
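
 The splitting rule described above can be sketched in plain C.  This is
not the driver code: the real work-around runs inside the ops->qc_prep
and qc->complete_fn hooks and adjusts the mapped sg table, whereas the
hypothetical `split_write` below only computes which pieces a large
write would be broken into.  `struct chunk` and `M15W_MAX_SECTORS` are
made-up names.

```c
#include <assert.h>
#include <stddef.h>

#define M15W_MAX_SECTORS 15u	/* chunk limit taken from the description */

struct chunk {
	unsigned long long lba;	/* first sector of this piece */
	unsigned int nsect;	/* number of sectors in this piece */
};

/*
 * Split a write of 'nsect' sectors starting at 'lba' into pieces of at
 * most 15 sectors.  Stores up to 'max_out' pieces in 'out' and returns
 * the total number of pieces the request needs.
 */
size_t split_write(unsigned long long lba, unsigned int nsect,
		   struct chunk *out, size_t max_out)
{
	size_t n = 0;

	assert(out != NULL || max_out == 0);
	while (nsect > 0) {
		unsigned int len = nsect < M15W_MAX_SECTORS ?
				   nsect : M15W_MAX_SECTORS;

		if (n < max_out) {
			out[n].lba = lba;
			out[n].nsect = len;
		}
		lba += len;
		nsect -= len;
		n++;
	}
	return n;
}
```

 A 40-sector write, for example, comes out as pieces of 15, 15 and 10
sectors, while a write of 15 sectors or fewer stays a single piece.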

 I've been running tests (repeated mke2fs and bonnie) several hours
from yesterday and it hasn't caused any problem yet.  Read performance
is now unhampered.  Write performance doesn't look very good, but it's
still much better.  I'm having a difficult time remembering results but
on ext2, I think the write performance was better (compared to other
controllers, in ratio).  If you have a siimage controller and seagate
drives with this problem, please don't hesitate to benchmark.

 Also, I think it would be very helpful if we can find out what the
Windows driver is doing to work around Mod15Write.  As now we can
split write requests at will without affecting upper layers, we can
easily replicate how they perform writes if we only know it.  So,
here are things I think might help.

 * Benchmarking new workaround.  I think there should be tools better
   suited for this purpose than bonnie.
 * Benchmarking Mod15Write affected drives' read/write performance on
   affected siimage controllers and on other controllers on Windows.
 * Finding out how Windows splits write requests on affected drives.
   The best way would be Silicon Image coming out of the closet and
   telling us what they did with their Windows driver, but that doesn't
   seem likely.  So, if somebody has the right equipment and time,
   please come forward and shed some light here.

 These sil3112/3114 controllers are way too common and so are 7200.7
Seagate drives.  I was shopping for a sata add-in card last week and
couldn't find any product which matches the price point of these sil
controllers and ended up buying one, even knowing about the Mod15Write
problem.  So, I think it would be great if we can get this thing to
work as fast as on Windows.  So, some inputs, please.  :-)

 Bonnie benchmark results follow and then the patch.  Per-char results
on P3 800 are capped by cpu, ignore them.

 The first one is the original sata_sil driver with max_sectors==15
work-around.  The second one is with the new work-around, and the last
one is on another machine with via controller.  I got confused about
the mount point so I'm not sure if it was a 3120026 or 3200822, but
either way, you can see the write performance is way better.

libata-dev-2.6  P3 800, Sil3112 rev 02, ST3120026AS
===
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
                 2G  9633  95 15101  24  6135  10  9975  87 14536  12 215.8   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   590  99 +++++ +++ 27052  88   604  99 +++++ +++  1949  95

libata-dev-2.6 w/ workaround    P3 800, Sil3112 rev 02, ST3120026AS
===
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
                 2G 10606  95 23736  34 14695  19 12581  90 52786  31 218.5   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   599  99 +++++ +++ 30161  99   579  98 +++++ +++  1971  97

linux-2.6.11    A64 3000+, VT6420, ST3120026AS or ST3200822AS

Re: 2.6.11 USB broken on VIA computer (not just ACPI)

2005-03-15 Thread Andrew Morton
"Robert W. Fuller" <[EMAIL PROTECTED]> wrote:
>
> This isn't limited to the ACPI case.  My BIOS is old enough that ACPI is 
>  not supported because the kernel can't find RSDP.  I found that the USB 
>  works if I boot with "noapic."  This is probably sub-optimal on an SMP 
>  machine.  If I don't boot with "noapic" I get the following errors:

Did it work OK under previous kernels?  If so, which versions?


Re: [PATCH] Replace zone padding with a definition in cache.h

2005-03-15 Thread Nick Piggin
On Tue, 2005-03-15 at 20:12 -0800, Christoph Lameter wrote:
> This patch removes the zone padding hack and establishes definitions
> in include/linux/cache.h to define the padding within struct zone.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.11/include/linux/cache.h
> ===
> --- linux-2.6.11.orig/include/linux/cache.h   2005-03-08 18:40:15.0 
> -0800
> +++ linux-2.6.11/include/linux/cache.h2005-03-14 10:33:45.247701040 
> -0800
> @@ -48,4 +48,12 @@
>  #endif
>  #endif
> 
> +#ifndef cacheline_pad_in_smp
> +#if defined(CONFIG_SMP)
> +#define cacheline_pad_in_smp struct { char  x; } cacheline_maxaligned_in_smp
 ^^^

Doesn't this add a redundant cacheline if the padding is
previously perfect? Because of the extra byte you're adding?

IIRC, the char x[0]; trick does the job correctly.
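
The size difference can be checked in user space. The following is a sketch under assumptions: `CACHELINE` is fixed at 64 here, all struct names are invented for illustration, and the zero-length array is a GNU C extension, so this needs gcc or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed line size; the kernel derives its own */

/* One-byte member: the aligned struct is rounded up to a full line. */
struct pad_byte {
	char x;
} __attribute__((aligned(CACHELINE)));

/* Zero-length array: the struct carries alignment but no size. */
struct pad_zero {
	char x[0];
} __attribute__((aligned(CACHELINE)));

static_assert(sizeof(struct pad_zero) == 0,
	      "zero-length pad must not add any bytes");

/* 'hot_a' already fills exactly one line, i.e. the padding is perfect. */
struct with_byte_pad {
	char hot_a[CACHELINE];
	struct pad_byte pad;	/* occupies a whole extra line */
	int hot_b;
};

struct with_zero_pad {
	char hot_a[CACHELINE];
	struct pad_zero pad;	/* occupies nothing */
	int hot_b;
};
```

With the one-byte member, `hot_b` is pushed to offset 128, one full line later than necessary; with the zero-length array it lands at offset 64.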







Re: [PATCH] Replace zone padding with a definition in cache.h

2005-03-15 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
>  +#ifndef cacheline_pad_in_smp
>  +#if defined(CONFIG_SMP)
>  +#define cacheline_pad_in_smp struct { char  x; } cacheline_maxaligned_in_smp
>  +#else
>  +#define cacheline_pad_in_smp
>  +#endif
>  +#endif

That's going to spit a warning with older gcc's.  "warning: unnamed
struct/union that defines no instances".


Re: Taking strlen of buffers copied from userspace

2005-03-15 Thread Randy.Dunlap
Robert Hancock wrote:
> Artem Frolov wrote:
> > Hello,
> >
> > I am in the process of testing a static defect analyzer on the Linux
> > kernel source code (see disclosure below).
> > I found some potential array bounds violations. The pattern is as
> > follows: bytes are copied from user space and then the buffer is
> > accessed at index strlen(buf)-1. This is a defect if the user data starts
> > with 0. So the question is: can we make any assumptions about what data
> > may be received from the user, or could it be arbitrary?
>
> In general I don't think any such assumptions should be made. In the
> case of the two below I'm assuming that root access is required to write
> those files, preventing any serious security hole, but it shouldn't
> really be permitted to corrupt kernel memory like this, as would likely
> happen if somebody wrote some data that contained a null as the first
> character.
>
> > For example, in ./drivers/block/cciss.c, function cciss_proc_write
> > (line numbers are taken from 2.6.11.3):
> >
> >    293  if (count > sizeof(cmd)-1) return -EINVAL;
> >    294  if (copy_from_user(cmd, buffer, count)) return -EFAULT;
> >    295  cmd[count] = '\0';
> >    296  len = strlen(cmd);  // above 3 lines ensure safety
> >    297  if (cmd[len-1] == '\n')
> >    298  cmd[--len] = '\0';
> >    .
> > Another example is arch/i386/kernel/cpu/mtrr/if.c, function mtrr_write:
> >
> >    107  if (copy_from_user(line, buf, len - 1))
> >    108  return -EFAULT;
> >    109  ptr = line + strlen(line) - 1;
> >    110  if (*ptr == '\n')
> >    111  *ptr = '\0';
>
> This one is also unsafe if somebody writes some data which is not
> null-terminated (assuming that that's possible), since strlen will run
> off the end of the buffer. The first example doesn't have that problem.

The latter one does (before the listed code):
	memset(line, 0, LINE_SIZE);
	if (len > LINE_SIZE)
		len = LINE_SIZE;
	if (copy_from_user(line, buf, len - 1))
		return -EFAULT;
so isn't line[LINE_SIZE - 1] always 0 ?
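
A user-space model of that sequence supports the claim. This is a sketch only: `fill_line` is a hypothetical function, `memcpy` stands in for `copy_from_user`, and the value 80 for `LINE_SIZE` is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 80	/* assumed value, for illustration only */

/*
 * Zero the buffer, clamp len to LINE_SIZE, then copy len - 1 bytes.
 * At most LINE_SIZE - 1 bytes are ever written, so the last byte of
 * 'line' keeps the zero that memset() put there.
 * Returns the number of bytes copied.
 */
size_t fill_line(char line[LINE_SIZE], const char *buf, size_t len)
{
	assert(len > 0);	/* len == 0 would make len - 1 wrap around */

	memset(line, 0, LINE_SIZE);
	if (len > LINE_SIZE)
		len = LINE_SIZE;
	memcpy(line, buf, len - 1);
	return len - 1;
}
```

Even when the source holds no null byte at all, strlen() on the result stops at or before index LINE_SIZE - 1. The sketch says nothing about the separate case where the first copied byte is 0.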
--
~Randy


Re: [PATCH][2/2] SquashFS

2005-03-15 Thread Matt Mackall
On Tue, Mar 15, 2005 at 05:04:32PM -0800, Matt Mackall wrote:
> On Tue, Mar 15, 2005 at 11:25:07PM +, Phillip Lougher wrote:
> > > >>+ unsigned int s_major:16;
> > > >>+ unsigned int s_minor:16;
> > >
> > >What's going on here? s_minor's not big enough for modern minor
> > >numbers.
> > 
> > What is the modern size then?
> 
> Minors are 22 bits, majors are 10. May grow to 32 each at some point.

Both akpm and I remembered wrong, fyi. It's 12 major bits, 20 minor.
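
To make the 12/20 split concrete, here is a user-space sketch. The names `pack_dev`, `dev_major`, `dev_minor` and `minor_in_16_bits` are made up for this example and are not the kernel's own macros; only the bit widths come from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define MINOR_BITS 20u
#define MINOR_MASK ((1u << MINOR_BITS) - 1u)
#define MAJOR_BITS 12u
#define MAJOR_MASK ((1u << MAJOR_BITS) - 1u)

/* Pack a major/minor pair into 32 bits: major in the top 12 bits. */
uint32_t pack_dev(uint32_t major, uint32_t minor)
{
	assert(major <= MAJOR_MASK && minor <= MINOR_MASK);
	return (major << MINOR_BITS) | minor;
}

uint32_t dev_major(uint32_t dev)
{
	return dev >> MINOR_BITS;
}

uint32_t dev_minor(uint32_t dev)
{
	return dev & MINOR_MASK;
}

/* What is left of a minor number after storing it in a 16-bit field. */
uint32_t minor_in_16_bits(uint32_t minor)
{
	return minor & 0xffffu;
}
```

A minor number such as 70000 fits in 20 bits but not in 16, where it would be truncated to 4464, which is the concern raised about the 16-bit s_minor field.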

-- 
Mathematics is the supreme nostalgia of our time.


[PATCH] cacheline alignment for cpu maps

2005-03-15 Thread Christoph Lameter
Add cacheline alignment to some critical SMP management maps.
These are in particular important for NUMA systems to avoid false
sharing.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>
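
The effect of such an annotation can be sketched in user space. This is an approximation under assumptions: the plain `aligned` attribute below only models the alignment part of the kernel's `__cacheline_aligned`, `CACHELINE` is fixed at 64, and the variable and helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed line size for this sketch */
#define ALIGNED_TO_LINE __attribute__((aligned(CACHELINE)))

/* Read-mostly map, placed at the start of a cache line. */
unsigned long online_map ALIGNED_TO_LINE;

/* Frequently written counter, also starting a line of its own. */
unsigned long hot_counter ALIGNED_TO_LINE;

/* Nonzero if 'p' sits on a cache-line boundary. */
int starts_cache_line(const void *p)
{
	assert(p != NULL);
	return ((uintptr_t)p % CACHELINE) == 0;
}

/* Nonzero if both addresses fall into the same cache line. */
int same_cache_line(const void *a, const void *b)
{
	assert(a != NULL && b != NULL);
	return ((uintptr_t)a / CACHELINE) == ((uintptr_t)b / CACHELINE);
}
```

Since both objects start on a line boundary, writes to one cannot invalidate the line holding the other, which is the false-sharing case the description refers to.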

Index: linux-2.6.11/arch/i386/kernel/smpboot.c
===
--- linux-2.6.11.orig/arch/i386/kernel/smpboot.c 2005-03-14 10:32:53.349590752 -0800
+++ linux-2.6.11/arch/i386/kernel/smpboot.c 2005-03-14 10:33:48.592192600 -0800
@@ -64,7 +64,7 @@ int phys_proc_id[NR_CPUS]; /* Package ID
 EXPORT_SYMBOL(phys_proc_id);

 /* bitmap of online cpus */
-cpumask_t cpu_online_map;
+cpumask_t cpu_online_map __cacheline_aligned;

 cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
@@ -472,10 +472,10 @@ extern struct {
 #ifdef CONFIG_NUMA

 /* which logical CPUs are on which nodes */
-cpumask_t node_2_cpu_mask[MAX_NUMNODES] =
+cpumask_t node_2_cpu_mask[MAX_NUMNODES] __cacheline_aligned =
{ [0 ... MAX_NUMNODES-1] = CPU_MASK_NONE };
 /* which node each logical CPU is on */
-int cpu_2_node[NR_CPUS] = { [0 ... NR_CPUS-1] = 0 };
+int cpu_2_node[NR_CPUS] __cacheline_aligned = { [0 ... NR_CPUS-1] = 0 };
 EXPORT_SYMBOL(cpu_2_node);

 /* set up a mapping between cpu and node. */
@@ -503,7 +503,8 @@ static inline void unmap_cpu_to_node(int

 #endif /* CONFIG_NUMA */

-u8 cpu_2_logical_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = BAD_APICID };
+u8 cpu_2_logical_apicid[NR_CPUS] __cacheline_aligned =
+   { [0 ... NR_CPUS-1] = BAD_APICID };

 static void map_cpu_to_logical_apicid(void)
 {


[PATCH] Replace zone padding with a definition in cache.h

2005-03-15 Thread Christoph Lameter
This patch removes the zone padding hack and establishes definitions
in include/linux/cache.h to define the padding within struct zone.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>

Index: linux-2.6.11/include/linux/cache.h
===
--- linux-2.6.11.orig/include/linux/cache.h 2005-03-08 18:40:15.0 -0800
+++ linux-2.6.11/include/linux/cache.h  2005-03-14 10:33:45.247701040 -0800
@@ -48,4 +48,12 @@
 #endif
 #endif

+#ifndef cacheline_pad_in_smp
+#if defined(CONFIG_SMP)
+#define cacheline_pad_in_smp struct { char  x; } cacheline_maxaligned_in_smp
+#else
+#define cacheline_pad_in_smp
+#endif
+#endif
+
 #endif /* __LINUX_CACHE_H */
Index: linux-2.6.11/include/linux/mmzone.h
===
--- linux-2.6.11.orig/include/linux/mmzone.h 2005-03-14 10:33:01.037422024 -0800
+++ linux-2.6.11/include/linux/mmzone.h 2005-03-14 10:33:45.248700888 -0800
@@ -28,21 +28,6 @@ struct free_area {

 struct pglist_data;

-/*
- * zone->lock and zone->lru_lock are two of the hottest locks in the kernel.
- * So add a wild amount of padding here to ensure that they fall into separate
- * cachelines.  There are very few zone structures in the machine, so space
- * consumption is not a concern here.
- */
-#if defined(CONFIG_SMP)
-struct zone_padding {
-   char x[0];
-} cacheline_maxaligned_in_smp;
-#define ZONE_PADDING(name) struct zone_padding name;
-#else
-#define ZONE_PADDING(name)
-#endif
-
 struct per_cpu_pages {
int count;  /* number of pages in the list */
int low;/* low watermark, refill needed */
@@ -131,7 +116,14 @@ struct zone {
struct free_areafree_area[MAX_ORDER];


-   ZONE_PADDING(_pad1_)
+   /*
+    * zone->lock and zone->lru_lock are two of the hottest locks in the kernel.
+    * So add a wild amount of padding here to ensure that they fall into separate
+    * cachelines.  There are very few zone structures in the machine, so space
+    * consumption is not a concern here.
+    */
+
+   cacheline_pad_in_smp;

/* Fields commonly accessed by the page reclaim scanner */
spinlock_t  lru_lock;
@@ -164,7 +156,7 @@ struct zone {
int prev_priority;


-   ZONE_PADDING(_pad2_)
+   cacheline_pad_in_smp;
/* Rarely used or read-mostly fields */

/*


Re: Can no longer build ipv6 built-in (2.6.11, today's BK head)

2005-03-15 Thread David S. Miller
On Wed, 16 Mar 2005 14:53:29 +1100
Peter Chubb <[EMAIL PROTECTED]> wrote:

> A simple fix is to delete the __exit from the various functions now that
> they're called other than at module_exit.
> 
> Signed-off-by: Peter Chubb <[EMAIL PROTECTED]>

Applied, thanks Peter.


[PATCH] Localize Pagesets in a NUMA system using the NUMA slab allocator

2005-03-15 Thread Christoph Lameter
[obviously depends on the NUMA slab allocator]

This patch modifies the way pagesets in struct zone are managed. It relocates
the pagesets for each cpu to the node that is nearest to the cpu using
the NUMA slab allocator. This means that the operations to manage pages
on remote zone can be done with information available locally.

Signed-off-by: Shobhit Dayal <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>
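
The data-structure change can be pictured with a small user-space model. Everything here is a simplified stand-in: the two structs keep the kernel's names but only one field each, `NR_CPUS` is a toy value, and `demo_cpu_to_node` and `demo_alloc_on_node` are hypothetical helpers (the latter just calls calloc() and ignores the node).

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4	/* small illustrative value, not a kernel setting */

/* Simplified stand-ins for the kernel structures named in the patch. */
struct per_cpu_pageset {
	unsigned long numa_hit;
};

struct zone {
	struct per_cpu_pageset *pageset[NR_CPUS]; /* was an embedded array */
};

/* Hypothetical CPU-to-node mapping: two CPUs per node. */
static int demo_cpu_to_node(int cpu)
{
	return cpu / 2;
}

/* Hypothetical node-aware allocator; calloc() ignores 'node'. */
static void *demo_alloc_on_node(size_t size, int node)
{
	(void)node;
	return calloc(1, size);
}

/* Release every pageset that has been allocated so far. */
void zone_free_pagesets(struct zone *z)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		free(z->pageset[cpu]);
		z->pageset[cpu] = NULL;
	}
}

/* Give each CPU its own pageset, requested for that CPU's node. */
int zone_setup_pagesets(struct zone *z)
{
	assert(z != NULL);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		z->pageset[cpu] = NULL;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		z->pageset[cpu] = demo_alloc_on_node(sizeof(*z->pageset[cpu]),
						     demo_cpu_to_node(cpu));
		if (!z->pageset[cpu]) {
			zone_free_pagesets(z);
			return -1;
		}
	}
	return 0;
}
```

Accesses then change from `z->pageset[cpu].numa_hit` to `z->pageset[cpu]->numa_hit`, which is the mechanical part of the diff that follows.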

Index: linux-2.6.11/drivers/base/node.c
===
--- linux-2.6.11.orig/drivers/base/node.c 2005-03-15 10:59:44.158914024 -0800
+++ linux-2.6.11/drivers/base/node.c2005-03-15 10:59:47.308435224 -0800
@@ -87,7 +87,7 @@ static ssize_t node_read_numastat(struct
for (i = 0; i < MAX_NR_ZONES; i++) {
struct zone *z = >node_zones[i];
for (cpu = 0; cpu < NR_CPUS; cpu++) {
-   struct per_cpu_pageset *ps = &z->pageset[cpu];
+   struct per_cpu_pageset *ps = z->pageset[cpu];
numa_hit += ps->numa_hit;
numa_miss += ps->numa_miss;
numa_foreign += ps->numa_foreign;
Index: linux-2.6.11/include/linux/mm.h
===
--- linux-2.6.11.orig/include/linux/mm.h 2005-03-15 10:31:51.217239816 -0800
+++ linux-2.6.11/include/linux/mm.h 2005-03-15 10:59:47.309435072 -0800
@@ -691,6 +691,7 @@ extern void mem_init(void);
 extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
+extern void setup_per_cpu_pageset(void);

 /* prio_tree.c */
 void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
Index: linux-2.6.11/include/linux/mmzone.h
===
--- linux-2.6.11.orig/include/linux/mmzone.h 2005-03-15 10:59:44.158914024 -0800
+++ linux-2.6.11/include/linux/mmzone.h 2005-03-15 10:59:47.309435072 -0800
@@ -107,7 +107,7 @@ struct zone {
 */
unsigned long   lowmem_reserve[MAX_NR_ZONES];

-   struct per_cpu_pageset  pageset[NR_CPUS];
+   struct per_cpu_pageset  *pageset[NR_CPUS];

/*
 * free areas of different sizes
Index: linux-2.6.11/init/main.c
===
--- linux-2.6.11.orig/init/main.c   2005-03-15 10:31:52.09128 -0800
+++ linux-2.6.11/init/main.c2005-03-15 10:59:47.310434920 -0800
@@ -490,6 +490,7 @@ asmlinkage void __init start_kernel(void
vfs_caches_init_early();
mem_init();
kmem_cache_init();
+   setup_per_cpu_pageset();
numa_policy_init();
if (late_time_init)
late_time_init();
Index: linux-2.6.11/mm/mempolicy.c
===
--- linux-2.6.11.orig/mm/mempolicy.c2005-03-15 10:59:44.159913872 -0800
+++ linux-2.6.11/mm/mempolicy.c 2005-03-15 10:59:47.310434920 -0800
@@ -721,7 +721,7 @@ static struct page *alloc_page_interleav
zl = NODE_DATA(nid)->node_zonelists + (gfp & GFP_ZONEMASK);
page = __alloc_pages(gfp, order, zl);
if (page && page_zone(page) == zl->zones[0]) {
-   zl->zones[0]->pageset[get_cpu()].interleave_hit++;
+   zl->zones[0]->pageset[get_cpu()]->interleave_hit++;
put_cpu();
}
return page;
Index: linux-2.6.11/mm/page_alloc.c
===
--- linux-2.6.11.orig/mm/page_alloc.c   2005-03-15 10:59:44.159913872 -0800
+++ linux-2.6.11/mm/page_alloc.c2005-03-15 10:59:47.312434616 -0800
@@ -68,6 +68,7 @@ EXPORT_SYMBOL(nr_swap_pages);
  */
 struct zone *zone_table[1 << (ZONES_SHIFT + NODES_SHIFT)];
 EXPORT_SYMBOL(zone_table);
+struct per_cpu_pageset pageset_table[MAX_NR_ZONES*MAX_NUMNODES*NR_CPUS] __initdata;

 static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };
 int min_free_kbytes = 1024;
@@ -518,7 +519,7 @@ static void __drain_pages(unsigned int c
for_each_zone(zone) {
struct per_cpu_pageset *pset;

-   pset = &zone->pageset[cpu];
+   pset = zone->pageset[cpu];
for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
struct per_cpu_pages *pcp;

@@ -581,12 +582,12 @@ static void zone_statistics(struct zonel

local_irq_save(flags);
cpu = smp_processor_id();
-   p = &z->pageset[cpu];
+   p = z->pageset[cpu];
if (pg == orig) {
-   z->pageset[cpu].numa_hit++;
+   z->pageset[cpu]->numa_hit++;
} else {
p->numa_miss++;
-   zonelist->zones[0]->pageset[cpu].numa_foreign++;
+   zonelist->zones[0]->pageset[cpu]->numa_foreign++;
}
if (pg == 

[PATCH] NUMA Slab Allocator

2005-03-15 Thread Christoph Lameter
This is a NUMA slab allocator. It creates slabs on multiple nodes and
manages slabs in such a way that locality of allocations is optimized.
Each node has its own list of partial, free and full slabs. All object
allocations for a node occur from node specific slab lists.

Signed-off-by: Alok N Kataria <[EMAIL PROTECTED]>
Signed-off-by: Shobhit Dayal <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>

Index: linux-2.6.11/include/linux/slab.h
===
--- linux-2.6.11.orig/include/linux/slab.h  2005-03-15 14:47:12.567040608 -0800
+++ linux-2.6.11/include/linux/slab.h   2005-03-15 14:47:19.290018560 -0800
@@ -63,11 +63,11 @@ extern int kmem_cache_destroy(kmem_cache
 extern int kmem_cache_shrink(kmem_cache_t *);
 extern void *kmem_cache_alloc(kmem_cache_t *, int);
 #ifdef CONFIG_NUMA
-extern void *kmem_cache_alloc_node(kmem_cache_t *, int);
+extern void *kmem_cache_alloc_node(kmem_cache_t *, int, int);
 #else
-static inline void *kmem_cache_alloc_node(kmem_cache_t *cachep, int node)
+static inline void *kmem_cache_alloc_node(kmem_cache_t *cachep, int node, int gfp_mask)
 {
-   return kmem_cache_alloc(cachep, GFP_KERNEL);
+   return kmem_cache_alloc(cachep, gfp_mask);
 }
 #endif
 extern void kmem_cache_free(kmem_cache_t *, void *);
@@ -81,6 +81,33 @@ struct cache_sizes {
 };
 extern struct cache_sizes malloc_sizes[];
 extern void *__kmalloc(size_t, int);
+extern void *__kmalloc_node(size_t, int, int);
+
+/*
+ * A new interface to allow allocating memory on a specific node.
+ */
+static inline void *kmalloc_node(size_t size, int node, int flags)
+{
+   if (__builtin_constant_p(size)) {
+   int i = 0;
+#define CACHE(x) \
+   if (size <= x) \
+   goto found; \
+   else \
+   i++;
+#include "kmalloc_sizes.h"
+#undef CACHE
+   {
+   extern void __you_cannot_kmalloc_that_much(void);
+   __you_cannot_kmalloc_that_much();
+   }
+found:
+   return kmem_cache_alloc_node((flags & GFP_DMA) ?
+   malloc_sizes[i].cs_dmacachep :
+   malloc_sizes[i].cs_cachep, node, flags);
+   }
+   return __kmalloc_node(size, node, flags);
+}

 static inline void *kmalloc(size_t size, int flags)
 {
Index: linux-2.6.11/mm/slab.c
===
--- linux-2.6.11.orig/mm/slab.c 2005-03-15 14:47:12.567040608 -0800
+++ linux-2.6.11/mm/slab.c  2005-03-15 16:17:27.242884760 -0800
@@ -75,6 +75,15 @@
  *
  * At present, each engine can be growing a cache.  This should be blocked.
  *
+ * 15 March 2005. NUMA slab allocator.
+ * Shobhit Dayal <[EMAIL PROTECTED]>
+ * Alok N Kataria <[EMAIL PROTECTED]>
+ *
+ * Modified the slab allocator to be node aware on NUMA systems.
+ * Each node has its own list of partial, free and full slabs.
+ * All object allocations for a node occur from node specific slab lists.
+ * Created a new interface called kmalloc_node() for allocating memory from
+ * a specific node.
  */

 #include   
@@ -92,7 +101,7 @@
 #include   
 #include   
 #include   
-
+#include   
 #include   
 #include   
 #include   
@@ -210,6 +219,7 @@ struct slab {
void*s_mem; /* including colour offset */
unsigned intinuse;  /* num of objs active in slab */
kmem_bufctl_t   free;
+   unsigned short  nodeid;
 };

 /*
@@ -278,21 +288,58 @@ struct kmem_list3 {
int free_touched;
unsigned long   next_reap;
struct array_cache  *shared;
+   spinlock_t  list_lock;
+   unsigned intfree_limit;
 };

+/*
+ * Need this for bootstrapping a per node allocator.
+ */
+#define NUM_INIT_LISTS 3
+struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
+struct kmem_list3 __initdata kmem64_list3[MAX_NUMNODES];
+
 #define LIST3_INIT(parent) \
-   { \
-   .slabs_full = LIST_HEAD_INIT(parent.slabs_full), \
-   .slabs_partial  = LIST_HEAD_INIT(parent.slabs_partial), \
-   .slabs_free = LIST_HEAD_INIT(parent.slabs_free) \
-   }
+   do {\
+   INIT_LIST_HEAD(&(parent)->slabs_full);  \
+   INIT_LIST_HEAD(&(parent)->slabs_partial);   \
+   INIT_LIST_HEAD(&(parent)->slabs_free);  \
+   (parent)->shared = NULL; \
+   (parent)->list_lock = SPIN_LOCK_UNLOCKED;   \
+   (parent)->free_objects = 0; \
+   (parent)->free_touched = 0; \
+   } while(0)
+
+#define MAKE_LIST(cachep, listp, slab, nodeid) \
+   do {\
+   if(list_empty(&(cachep->nodelists[nodeid]->slab)))  \
+   

Re: [topic change] jiffies as a time value

2005-03-15 Thread john stultz
George,
I'm still digesting your mail. For now I'll just answer the easy bits,
and I'll owe you a better reply once I get all of this absorbed. 

On Tue, 2005-03-15 at 15:01 -0800, George Anzinger wrote:
> We also need, IMNSHO to recognize that, at least with some hardware, that
> interrupt IS in fact the clock and is the only reasonable way we have of
> reading it.  This is true, for example, on the x86.  The TSC we use as a
> fill in for between interrupts is not stable in the long term and should
> only be used to interpolate over 1 to 10 ticks or so.

Yep, the TSC is a terrible time source, but everyone still loves it! It's
so fast! However, since not every timesource is so bad, I don't feel we
need to punish everyone with the bugs interpolation can cause.

So my plan is an "interpolated timesource", which will fit into my
current framework without any changes. Basically it will work as the
current tsc/tick code does, but just in its own timesource driver, so
the core code stays pretty and sane. It will still preserve some of the
issues we see now with the interpolated time code, but since we're in a
more flexible environment, we might be able to try new workarounds
more easily.
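
To make the idea concrete, here is a rough userspace sketch of what an
interpolated timesource does: the tick supplies the base time, and a fast
counter such as the TSC only fills in the gap since the last tick. All
names below (interp_timesource, interp_tick, interp_read_ns) are invented
for illustration and are not part of the posted patches; clamping the
offset to one tick is likewise just one possible workaround, not
necessarily what the real code does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state captured at the most recent timer tick. */
struct interp_timesource {
	uint64_t ns_at_tick;      /* time (ns) recorded at the last tick */
	uint64_t cycles_at_tick;  /* fast counter (e.g. TSC) value at that tick */
	uint64_t cycles_per_tick; /* calibrated counter cycles in one tick */
	uint64_t ns_per_tick;     /* length of one tick in nanoseconds */
};

/* Called from the (hypothetical) tick handler: advance the base time. */
static void interp_tick(struct interp_timesource *ts, uint64_t cycles_now)
{
	ts->ns_at_tick += ts->ns_per_tick;
	ts->cycles_at_tick = cycles_now;
}

/* Read the time: tick base plus a counter-based offset since that tick. */
static uint64_t interp_read_ns(const struct interp_timesource *ts,
			       uint64_t cycles_now)
{
	uint64_t delta = cycles_now - ts->cycles_at_tick;

	/* Assumed workaround: never interpolate further than one tick ahead. */
	if (delta > ts->cycles_per_tick)
		delta = ts->cycles_per_tick;

	return ts->ns_at_tick + delta * ts->ns_per_tick / ts->cycles_per_tick;
}
```

Keeping this arithmetic inside its own driver means the core code only
ever asks for "the time in ns" and never sees the tick/counter split.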

thanks
-john



Re: [RFC] Changes to the driver model class code.

2005-03-15 Thread Dmitry Torokhov
On Tuesday 15 March 2005 17:14, Greg KH wrote:
> > Ease-of-use, maybe. However, it also means
> > ease-of-getting-reference-counting-wrong. And reference counting trumps it
> > all :)
> 
> It will not make the reference counting logic easier to get wrong, or
> easier to get right.  It totally takes it away from the user, and makes
> them implement it themselves if they so wish (like the USB HCD patch
> does.)

Exactly, _IF_ they wish. And as practice shows proper reference counting
is not the very first thing people are concerned about.

I see the new class interface as useful in the following scenario:

 - We have a proper subsystem that already does proper refcounting and
   one might not want to consolidate core reference counting with
   subsystems as it is too invasive.

If you consider the following scenario I do not think we want to
encourage it:

 - We have "bad" system and user says "ah, I'll just use the new model
   so I don't have to think about lifetime rules at the moment, I don't
   have time/have something more interesting to do/I'll do that later
   when I have time".

There is a third scenario:

 - We have a completely new system or a system undergoing overhaul: the
   coder tries to do it right and does consider all lifetime rules and
   makes sure that all objects are properly accounted for. In this case
   old interface is much more clear and easier to use than the new one.

I am also not quite sure why a bus with its devices and drivers can be
implemented correctly (I believe we have a bunch of them now - PCI, USB,
serio, gameport) but class interface cannot be tamed?
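
For readers who have not fought with lifetime rules yet, this is roughly
what "proper reference counting" boils down to, as a plain userspace
sketch. The demo_* names are made up for illustration; the driver core
uses kobjects and atomic counters, which this deliberately does not
reproduce (the counter here is a plain int and not safe against
concurrent use).

```c
#include <assert.h>
#include <stdlib.h>

struct demo_obj {
	int refcount;                           /* number of live references */
	void (*release)(struct demo_obj *obj);  /* runs when the last one is dropped */
};

static int demo_released; /* counts release calls so the sketch can be checked */

static void demo_release(struct demo_obj *obj)
{
	demo_released++;
	free(obj);
}

/* The creator owns the first reference. */
static struct demo_obj *demo_create(void)
{
	struct demo_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;
	obj->release = demo_release;
	return obj;
}

/* Every additional user takes its own reference... */
static struct demo_obj *demo_get(struct demo_obj *obj)
{
	obj->refcount++;
	return obj;
}

/* ...and drops it when done; the object is freed only on the last put. */
static void demo_put(struct demo_obj *obj)
{
	if (--obj->refcount == 0)
		obj->release(obj);
}
```

The point of the release callback is that nobody frees the object
directly; it goes away only when the last holder lets go.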

-- 
Dmitry


2.6.11 USB broken on VIA computer (not just ACPI)

2005-03-15 Thread Robert W. Fuller
This isn't limited to the ACPI case.  My BIOS is old enough that ACPI is 
not supported because the kernel can't find RSDP.  I found that the USB 
works if I boot with "noapic."  This is probably sub-optimal on an SMP 
machine.  If I don't boot with "noapic" I get the following errors:

Mar 15 21:30:17 falcon USB Universal Host Controller Interface driver v2.2
Mar 15 21:30:17 falcon uhci_hcd :00:07.2: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller
Mar 15 21:30:17 falcon uhci_hcd :00:07.2: irq 19, io base 0xa400
Mar 15 21:30:17 falcon uhci_hcd :00:07.2: new USB bus registered, assigned bus number 1
Mar 15 21:30:17 falcon hub 1-0:1.0: USB hub found
Mar 15 21:30:17 falcon hub 1-0:1.0: 2 ports detected
Mar 15 21:30:17 falcon usb 1-2: new low speed USB device using uhci_hcd and address 2
Mar 15 21:30:18 falcon uhci_hcd :00:07.2: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
Mar 15 21:30:18 falcon usb 1-2: khubd timed out on ep0in
Mar 15 21:30:24 falcon usb 1-2: khubd timed out on ep0out
Mar 15 21:30:29 falcon usb 1-2: khubd timed out on ep0out
Mar 15 21:30:29 falcon usb 1-2: device not accepting address 2, error -110

Here's my lspci:
:00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo 
PRO133x] (rev c4)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- 
	Latency: 0
	Region 0: Memory at d000 (32-bit, prefetchable)
	Capabilities: [a0] AGP version 2.0
		Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- 
FW+ AGP3- Rate=x1,x2,x4
		Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x2
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

:00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo 
MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- 
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 9000-9fff
	Memory behind bridge: e000-e1ff
	Prefetchable memory behind bridge: d800-dfff
	Expansion ROM at 9000 [disabled] [size=4K]
	BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

:00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Mobile 
South] (rev 23)
	Subsystem: VIA Technologies, Inc. VT82C596/A/B PCI to ISA Bridge
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping+ SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- 
	Latency: 0

:00:07.1 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 10) 
(prog-if 8a [Master SecP PriP])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 
	Latency: 32
	Region 4: I/O ports at a000 [size=16]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

:00:07.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 
1.1 Controller (rev 11) (prog-if 00 [UHCI])
	Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- 
	Latency: 32, cache line size 08
	Interrupt: pin D routed to IRQ 11
	Region 4: I/O ports at a400 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

:00:07.3 Host bridge: VIA Technologies, Inc. VT82C596 Power 
Management (rev 30)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 

:00:0f.0 Unknown mass storage controller: Promise Technology, Inc. 
20269 (rev 02) (prog-if 85)
	Subsystem: Promise Technology, Inc. Ultra133TX2
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- SERR- 
	Latency: 32 (1000ns min, 4500ns max), cache line size 08
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at a800
	Region 1: I/O ports at ac00 [size=4]
	Region 2: I/O ports at b000 [size=8]
	Region 3: I/O ports at b400 [size=4]
	Region 4: I/O ports at 

Re: a problem with linux 2.6.11 and sa

2005-03-15 Thread George Georgalis
On Wed, Mar 09, 2005 at 06:28:35PM -0500, Paul Jarc wrote:
>"George Georgalis" <[EMAIL PROTECTED]> wrote:
>> It (Gerrit Pape's technique) very definitely stopped working a few revs
>> back (2.6.7?). I'm seeing a similar failed read from /dev/rtc and
>> mplayer with 2.6.10, now too.
>
>The /proc/kmsg problem happens because the kernel now checks for
>permission at read() instead of open().  The /dev/rtc problem seems to
>be a different beast.

Thanks for the kmsg clarification, Paul.

>> while read file; do mplayer $file ; done >
>> Failed to open /dev/rtc: Permission denied
>>
>> for file in `cat mediafiles.txt`; do mplayer $file ; done
>>
>> works.
>
>To simplify, what about these two:
>mplayer foo.mpg
>mplayer foo.mpg < mediafiles.txt
>
>You might try strace'ing both cases and see how they compare.

The particular host does not have X support so mpg is out.
I'm not sure that that test would work as mplayer requires filenames
as command arguments not stdin (exclusively, I think); my guess
is mplayer would try to decode stdin.

this works fine
mplayer `cat zz.mtest `

Then I tried
mplayer /dev/stdin  /proc/sys/dev/rtc/max-user-freq" to your system startup 
scripts.

the file almost played though...
Playing 
/usr/nfs/sandbox/media/audio/_the-party-has-just-begun/Lebanese_Blonde.ogg.
Ogg file format detected.
...

But it seemed to take keyboard commands from the binary
No bind found for key _ 
A:   0.1 (00.1) ??,?%   
No bind found for key R 
A:   0.8 (00.8)  4.2%   

and quit.  I tried the sysctl suggestion, no change, whenever a file list
is redirected to stdin, and a filename argument is given to mplayer, eg

while read file; do mplayer "$file" ; done http://galis.org/george/ cell:646-331-2027 mailto:[EMAIL PROTECTED]


Re: Bogus buffer length check in linux-2.6.11 read()

2005-03-15 Thread Tom Felker
On Tuesday 15 March 2005 11:59 am, linux-os wrote:
> The attached file shows that the kernel thinks it's doing
> something helpful by checking the length of the input
> buffer for a read(). It will return "Bad Address" until
> the length is 1632 bytes.  Apparently the kernel thinks
> 1632 is a good length!
>
> Did anybody consider the overhead necessary to do this
> and the fact that the kernel has no way of knowing if
> the pointer to the buffer is valid until it actually
> does the write. What was wrong with copy_to_user()?
> Why is there the additional bogus check?

I don't think that's what's happening.  The kernel is perfectly happy to read 
data into any virtual address range that your process can legally write to - 
this includes any part of the heap and any part of the stack.  The kernel 
can't check whether writing to the given address would clobber the stack or 
heap - it's your memory, you manage it.  The kernel's notion of an "invalid 
address" is very simple, and doesn't include every address that you would 
consider invalid from a C perspective.

So what's probably happening is that your stack is (1632+256) bytes tall, 
including the buffer you allocated.  (Stack grows downward on i386.)  So 
ideally you read less than 256 bytes.  If you read more than 256 but less 
than 1888 bytes, the read would damage other elements on the stack, but it is 
OK as far as the kernel is concerned.  But if you read more than that, you're 
asking the kernel to write to an address that is higher than the highest 
address of the stack (the address of the bottom element), and this address 
isn't mapped into your process, so you get EFAULT ("Bad address").

If you were to type more than 256 (but less than 1888) characters before 
pressing enter, the read would silently overflow the buffer, thus clobbering 
the stack, including the return address of main().  So when main tried to 
return, you'd get a segfault.  Somebody with assembly skills could probably 
craft a string which, when your program reads it, would take control of the 
program.
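
A small test program shows the distinction. This is only a sketch, and
the helper names are mine: it pushes a few bytes through a pipe and
read()s them back, once into an ordinary buffer and once into a page
mapped with PROT_NONE, i.e. an address the process may not write to. On
Linux I would expect the second read() to fail with EFAULT ("Bad
address"), which is the only kind of "invalid" the kernel checks for.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes into a pipe and read() them back into dst.
 * Returns 0 on success, the errno from read() on failure,
 * or -1 if the setup itself went wrong.
 */
static int read_pipe_into(void *dst, size_t len)
{
	static const char payload[64] = "data";
	int fds[2];
	int err = 0;

	if (len > sizeof(payload) || pipe(fds) != 0)
		return -1;
	if (write(fds[1], payload, len) != (ssize_t)len)
		err = -1;
	else if (read(fds[0], dst, len) < 0)
		err = errno; /* saved before close() can overwrite it */
	close(fds[0]);
	close(fds[1]);
	return err;
}

/* Map one page with no access rights and ask the kernel to read() into it. */
static int read_into_unwritable_page(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	void *page = mmap(NULL, (size_t)pagesz, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int err;

	if (page == MAP_FAILED)
		return -1;
	err = read_pipe_into(page, 16);
	munmap(page, (size_t)pagesz);
	return err;
}
```

Note that nothing here can detect a length that merely overruns a buffer
into other writable memory; that case succeeds as far as the kernel is
concerned.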

-- 
Tom Felker, <[EMAIL PROTECTED]>
 - Stop fiddling with the volume knob.

No army can withstand the strength of an idea whose time has come.


Can no longer build ipv6 built-in (2.6.11, today's BK head)

2005-03-15 Thread Peter Chubb


Changeset 
  [EMAIL PROTECTED]|ChangeSet|20050310043957|06845
added cleanup to ipv6_init(), which calls ip6_route_cleanup()

ip6_route_cleanup() is marked __exit so cannot be called from an
__init section -- it's discarded by the linker from the image
(although it'll be retained in a module).

You get errors like this:
ip6_route_cleanup: discarded in section `.exit.text' from
net/built-in.o 
xfrm6_fini: discarded in section `.exit.text' from net/built-in.o
fib6_gc_cleanup: discarded in section `.exit.text' from net/built-in.o
ipv6_packet_cleanup: discarded in section `.exit.text' from
net/built-in.o


A simple fix is to delete the __exit from the various functions now that
they're called other than at module_exit.
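
The shape of the problem, as a userspace sketch with made-up names
(demo_*; this is not the actual ipv6_init() code): once an init routine
unwinds on failure, the cleanup helpers are reachable from the init path
and so can no longer live in a section that is thrown away for built-in
code.

```c
#include <assert.h>

static int routes_up;  /* stands in for the routing state */
static int packets_up; /* stands in for the packet handler state */

static int demo_route_init(void)
{
	routes_up = 1;
	return 0;
}

/* Must not carry an exit-only marker: the init error path below calls it. */
static void demo_route_cleanup(void)
{
	routes_up = 0;
}

/* fail != 0 simulates a later init step going wrong. */
static int demo_packet_init(int fail)
{
	if (fail)
		return -1;
	packets_up = 1;
	return 0;
}

/* Init with unwinding: a late failure undoes the earlier steps. */
static int demo_ipv6_init(int fail_late)
{
	int err;

	err = demo_route_init();
	if (err)
		goto out;
	err = demo_packet_init(fail_late);
	if (err)
		goto route_fail;
	return 0;

route_fail:
	demo_route_cleanup();
out:
	return err;
}
```

In a module the exit section is kept, which is why the problem shows up
only when ipv6 is built in.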

Signed-off-by: Peter Chubb <[EMAIL PROTECTED]>

Index: linux-2.5-import/net/ipv6/route.c
===
--- linux-2.5-import.orig/net/ipv6/route.c  2005-03-16 10:12:44.742595387 +1100
+++ linux-2.5-import/net/ipv6/route.c   2005-03-16 13:01:50.246678866 +1100
@@ -2116,7 +2116,7 @@
 #endif
 }
 
-void __exit ip6_route_cleanup(void)
+void ip6_route_cleanup(void)
 {
 #ifdef CONFIG_PROC_FS
proc_net_remove("ipv6_route");
Index: linux-2.5-import/net/ipv6/ipv6_sockglue.c
===
--- linux-2.5-import.orig/net/ipv6/ipv6_sockglue.c  2005-03-16 10:12:44.736736056 +1100
+++ linux-2.5-import/net/ipv6/ipv6_sockglue.c   2005-03-16 13:24:19.095793200 +1100
@@ -698,7 +698,7 @@
dev_add_pack(&ipv6_packet_type);
 }
 
-void __exit ipv6_packet_cleanup(void)
+void ipv6_packet_cleanup(void)
 {
dev_remove_pack(&ipv6_packet_type);
 }
Index: linux-2.5-import/net/ipv6/ip6_fib.c
===
--- linux-2.5-import.orig/net/ipv6/ip6_fib.c    2005-03-15 12:28:44.819748921 +1100
+++ linux-2.5-import/net/ipv6/ip6_fib.c 2005-03-16 13:27:46.423351526 +1100
@@ -1218,7 +1218,7 @@
panic("cannot create fib6_nodes cache");
 }
 
-void __exit fib6_gc_cleanup(void)
+void fib6_gc_cleanup(void)
 {
del_timer(&ip6_fib_timer);
kmem_cache_destroy(fib6_node_kmem);
Index: linux-2.5-import/net/ipv6/xfrm6_policy.c
===
--- linux-2.5-import.orig/net/ipv6/xfrm6_policy.c   2005-03-15 12:28:44.853928319 +1100
+++ linux-2.5-import/net/ipv6/xfrm6_policy.c    2005-03-16 13:53:28.890552848 +1100
@@ -276,7 +276,7 @@
xfrm_policy_register_afinfo(&xfrm6_policy_afinfo);
 }
 
-static void __exit xfrm6_policy_fini(void)
+static void xfrm6_policy_fini(void)
 {
xfrm_policy_unregister_afinfo(&xfrm6_policy_afinfo);
 }
@@ -287,7 +287,7 @@
xfrm6_state_init();
 }
 
-void __exit xfrm6_fini(void)
+void xfrm6_fini(void)
 {
//xfrm6_input_fini();
xfrm6_policy_fini();
Index: linux-2.5-import/net/ipv6/xfrm6_state.c
===
--- linux-2.5-import.orig/net/ipv6/xfrm6_state.c    2005-03-15 12:28:44.854904874 +1100
+++ linux-2.5-import/net/ipv6/xfrm6_state.c 2005-03-16 13:29:30.183337361 +1100
@@ -129,7 +129,7 @@
xfrm_state_register_afinfo(&xfrm6_state_afinfo);
 }
 
-void __exit xfrm6_state_fini(void)
+void xfrm6_state_fini(void)
 {
xfrm_state_unregister_afinfo(&xfrm6_state_afinfo);
 }



-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
The technical we do immediately,  the political takes *forever*


Re: NFS client bug in 2.6.8-2.6.11

2005-03-15 Thread Bernardo Innocenti
Neil Conway wrote:
> 766 -> 770 sounds like a "small" (ish) number of patches to check, if
> we're lucky.  Did you wade through 'em all yet?  Any smoking guns?

The RPM changelog doesn't contain anything relevant
between 766 and 770:
---CUT---
* Thu Feb 24 2005 Dave Jones <[EMAIL PROTECTED]>
- Use old scheme first when probing USB. (#145273)
* Wed Feb 23 2005 Dave Jones <[EMAIL PROTECTED]>
- Try as you may, there's no escape from crap SCSI hardware. (#149402)
* Mon Feb 21 2005 Dave Jones <[EMAIL PROTECTED]>
- Disable some experimental USB EHCI features.
* Tue Feb 15 2005 Dave Jones <[EMAIL PROTECTED]>
- Fix bio leak in md layer.
---CUT---
Perhaps the changelog is incomplete.  I don't have the
two SRPMs at hand to make a comparison.
By the way, it seems upgrading to 2.6.10-1.770_FC3 just made
the bug much harder to trigger: I've definitely seen it once
again when I had left a shell sitting in an NFS directory
overnight.  I couldn't reproduce it a second time.

> PS: oh bugger, just remembered that I also reproduced my bug with a
> 2.6.8 kernel on the server; admittedly though it was an FC2 kernel so
> who knows what extra patches it had.

You can easily find out by downloading the SRPM.  Now that
Fedora provides a public CVS, perhaps it could be used to
make such investigations directly with the cvsweb interface
without downloading and unpacking a 40MB file.
--
 // Bernardo Innocenti - Develer S.r.l., R dept.
\X/  http://www.develer.com/


Re: [PATCH 0/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Greg KH
On Tue, Mar 15, 2005 at 07:12:07PM -0700, Grant Grundler wrote:
> On Tue, Mar 15, 2005 at 01:11:39PM -0700, Grant Grundler wrote:
> > Tom,
> > A co-worker made the following observation (I'm paraphrasing):
> > ...this proposal does not deal with the Error Reporting ECN.
> > For example, they do not show the advisory non-fatal bit in
> > the correctable error status register.
> > 
> > I believe he is referring to the "Error Clarifications ECN":
> > 
> > 
> > http://www.pcisig.com/specifications/pciexpress/ECN_-_Error_Clarifications.pdf
> 
> Tom,
> Sorry - I got this wrong.
> He was referring to an unpublished draft "Error Reporting ECN".
> You'll have to talk to Intel's PCI-SIG representative to get a copy.
> [ Ugh. And everyone else is SOL - sorry ]

Then we have no obligation to be compliant with an unpublished spec :)

greg k-h


Re: [PATCH][RFC] /proc umask and gid [was: Make /proc/ chmod'able]

2005-03-15 Thread Rene Scharfe
So, I gather from the feedback I've got that chmod'able /proc/
would be a bit over the top. 8-)  While providing the easiest and most
intuitive user interface for changing the permissions on those
directories, it is overkill.  Paul is right when he says that such a
feature should be turned on or off for all sessions at once, and that's
it.

My patch had at least one other problem: the contents of each
/proc/ directory became chmod'able, too, which was not intended.

Instead of fixing it up I took two steps back, dusted off the umask
kernel parameter patch and added the "special gid" feature I mentioned.

Without the new kernel parameters behaviour is unchanged.  Add
proc.umask=077 and all /proc/ will get a permission mode of 500.
This breaks pstree (no output), as Bodo already noted, because this
program needs access to /proc/1.  It also breaks w -- it shows the
correct number of users but it lists X even for sessions owned
by the user running it.

Use proc.umask=007 and proc.gid=50 instead and all /proc/ dirs
will have a mode of 550 and their group attribute will be set to 50
(that's "staff" on my Debian system).  Pstree will work for all members
of that special group (just like top, ps and w -- which also show
everything in that case).  Normal users will still have a restricted
view.
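
The mode arithmetic behind those two examples, as a standalone sketch
(userspace C; S_IRUGO and S_IXUGO are kernel-side shorthands, so they are
spelled out here under DEMO_ names, and proc_pid_dir_perms() is a helper
invented for illustration):

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Userspace spellings of the kernel's S_IRUGO / S_IXUGO shorthands. */
#define DEMO_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH) /* 0444 */
#define DEMO_S_IXUGO (S_IXUSR | S_IXGRP | S_IXOTH) /* 0111 */

/* Permission bits of a per-process directory for a given proc.umask. */
static mode_t proc_pid_dir_perms(mode_t proc_umask)
{
	return (DEMO_S_IRUGO | DEMO_S_IXUGO) & ~proc_umask;
}
```

So the default of 0 leaves the usual 555, umask 077 clears the group and
other bits (500), and umask 007 clears only the other bits (550), which
is what lets the proc.gid group keep its access.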

Albert, would you take fixes for w even though you despise the feature
that makes them necessary?

Is this less scary?  Still useful?

Thanks,
Rene



diff -urp linux-2.6.11-mm3/Documentation/kernel-parameters.txt l5/Documentation/kernel-parameters.txt
--- linux-2.6.11-mm3/Documentation/kernel-parameters.txt    2005-03-12 19:23:30.0 +0100
+++ l5/Documentation/kernel-parameters.txt  2005-03-16 01:14:05.0 +0100
@@ -1095,16 +1095,22 @@ running once the system is up.
[ISAPNP] Exclude memory regions for the autoconfiguration
Ranges are in pairs (memory base and size).
 
+   processor.max_cstate=   [HW, ACPI]
+   Limit processor to maximum C-state
+   max_cstate=9 overrides any DMI blacklist limit.
+
+   proc.gid=   [KNL] If non-zero all /proc/ directories will
+   have their group attribute set to that value.
+
+   proc.umask= [KNL] Restrict permissions of process specific
+   entries in /proc (i.e. the numerical directories).
+
profile=[KNL] Enable kernel profiling via /proc/profile
{ schedule |  }
(param: schedule - profile schedule points}
(param: profile step/bucket size as a power of 2 for
statistical time based profiling)
 
-   processor.max_cstate=   [HW, ACPI]
-   Limit processor to maximum C-state
-   max_cstate=9 overrides any DMI blacklist limit.
-
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
See Documentation/ramdisk.txt.
diff -urp linux-2.6.11-mm3/fs/proc/base.c l5/fs/proc/base.c
--- linux-2.6.11-mm3/fs/proc/base.c 2005-03-12 19:23:36.0 +0100
+++ l5/fs/proc/base.c   2005-03-16 01:54:52.0 +0100
@@ -35,8 +35,18 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "internal.h"
 
+static umode_t proc_umask;
+module_param_named(umask, proc_umask, ushort, 0);
+MODULE_PARM_DESC(umask, "umask for all /proc/ entries");
+
+static gid_t proc_gid;
+module_param_named(gid, proc_gid, uint, 0);
+MODULE_PARM_DESC(gid, "group attribute of all /proc/ entries");
+
 /*
  * For hysterical raisins we keep the same inumbers as in the old procfs.
  * Feel free to change the macro below - just keep the range distinct from
@@ -1149,6 +1159,8 @@ static struct inode *proc_pid_make_inode
inode->i_uid = task->euid;
inode->i_gid = task->egid;
}
+   if ((ino == PROC_TGID_INO || ino == PROC_TID_INO) && proc_gid)
+   inode->i_gid = proc_gid;
security_task_to_inode(task, inode);
 
 out:
@@ -1182,6 +1194,8 @@ static int pid_revalidate(struct dentry 
inode->i_uid = 0;
inode->i_gid = 0;
}
+   if ((proc_type(inode) == PROC_TGID_INO || proc_type(inode) == PROC_TID_INO) && proc_gid)
+   inode->i_gid = proc_gid;
security_task_to_inode(task, inode);
return 1;
}
@@ -1797,7 +1811,7 @@ struct dentry *proc_pid_lookup(struct in
put_task_struct(task);
goto out;
}
-   inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO;
+   inode->i_mode = S_IFDIR | ((S_IRUGO|S_IXUGO) & ~proc_umask);
inode->i_op = &proc_tgid_base_inode_operations;
inode->i_fop = &proc_tgid_base_operations;
inode->i_nlink = 3;
@@ -1852,7 +1866,7 @@ static struct dentry *proc_task_lookup(s
 

console/fbdev/DRM rearchitecture progress?

2005-03-15 Thread Adam
Back on the second of August Jon Smirl posted (http://tinyurl.com/5w2nt) a
synopsis of the plan created at OLS for the rearchitecture of the console, fbdev
and DRM subsystems.  Has any more thought gone into this major rework of the
kernel?

--adam



Re: [PATCH 0/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Grant Grundler
On Tue, Mar 15, 2005 at 01:11:39PM -0700, Grant Grundler wrote:
> Tom,
> A co-worker made the following observation (I'm paraphrasing):
>   ...this proposal does not deal with the Error Reporting ECN.
>   For example, they do not show the advisory non-fatal bit in
>   the correctable error status register.
> 
> I believe he is referring to the "Error Clarifications ECN":
> 
>   
> http://www.pcisig.com/specifications/pciexpress/ECN_-_Error_Clarifications.pdf

Tom,
Sorry - I got this wrong.
He was referring to an unpublished draft "Error Reporting ECN".
You'll have to talk to Intel's PCI-SIG representative to get a copy.
[ Ugh. And everyone else is SOL - sorry ]

I'm annoyed he wanted me to raise this in a public forum without
having a public document to point at. And I'm annoyed at myself
for being lazy and not verifying that beforehand...

sorry,
grant


BK Snapshots (Re: where did 2.6.11-bkx go?)

2005-03-15 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Tue, 15 Mar 2005 13:28:26 -0500), sean 
<[EMAIL PROTECTED]> says:

> pub/mirrors/linux/kernel/linux/kernel/v2.6/snapshots
> 
> Now there are just the 2.6.11.x snapshots.
> 
> For instance where is bk10?

Now 2.6.11.3-bk1 has come up...

The bk-snap script seems to be screwed up by the v2.6.11.3 tag.
It'd be very nice to see 2.6.11-bk11 instead.
Current naming is very confusing; e.g. is this patch against 2.6.11.3?

--yoshfuji


Re: [PATCH][1/2] SquashFS

2005-03-15 Thread Junio C Hamano
> "PJ" == Paul Jackson <[EMAIL PROTECTED]> writes:

PJ> There is not a consensus (nor a King Penguin dictate) between the
PJ> "while(1)" and "for(;;)" style to document.

FWIW, linux-0.01 has four uses of "while (1)" and two uses of
"for (;;)" ;-).

./fs/inode.c:   while (1) {
./fs/namei.c:   while (1) {
./fs/namei.c:   while (1) {
./kernel/sched.c:   while (1) {

./init/main.c:  for(;;) pause();
./kernel/panic.c:   for(;;);

What is interesting here is that the King Penguin used these two
constructs with consistency.  The "while (1)" form was used with
normal exit routes with "if (...) break" inside; while the
"for(;;)" form was used only in unusual "the thread of control
should get stuck here forever" cases.

So, Phillip's decision to go back to his original while(1) style
seems to be in line with the style used in the original Linux
kernel ;-).
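
Side by side, the two idioms look like this. It is a toy sketch written
for this mail, not code taken from linux-0.01; demo_strlen() and
demo_halt() are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* "while (1)": an ordinary loop whose normal exit is a break inside. */
static size_t demo_strlen(const char *s)
{
	size_t n = 0;

	while (1) {
		if (s[n] == '\0')
			break;
		n++;
	}
	return n;
}

/* "for (;;)": control is meant to get stuck here and never come back. */
static void demo_halt(void)
{
	for (;;)
		;
}
```

Read that way, the spelling tells you at a glance whether to go looking
for a break.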



Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

2005-03-15 Thread john stultz
On Wed, 2005-03-16 at 00:44 +0100, Pavel Machek wrote:
> On Út 15-03-05 15:42:09, john stultz wrote:
> > On Tue, 2005-03-15 at 23:59 +0100, Pavel Machek wrote:
> > > > diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> > > > --- a/arch/i386/kernel/apm.c2005-03-11 17:02:30 -08:00
> > > > +++ b/arch/i386/kernel/apm.c2005-03-11 17:02:30 -08:00
> > > > @@ -224,6 +224,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  
> > > >  #include 
> > > >  #include 
> > > > @@ -1204,6 +1205,7 @@
> > > > device_suspend(PMSG_SUSPEND);
> > > > device_power_down(PMSG_SUSPEND);
> > > >  
> > > > +   timeofday_suspend_hook();
> > > > /* serialize with the timer interrupt */
> > > > write_seqlock_irq(&xtime_lock);
> > > >  
> > > 
> > > Could you just register timeofday subsystem as a system device? Then
> > > device_power_down will call you automagically. And you'll not have
> > > to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...
> > 
> > That may very well be the right way to go. At the moment I'm just very
> > hesitant of making any user-visible changes.
> > 
> > What is the impact if a new system device name is created and then I
> > later change it? How stable is that interface supposed to be?
> 
> Changing its name is okay... your device probably will not have any
> user-accessible controls, right?

Well, at some point I want to have some way for the user to be able to
select which timesource they want to be used. Similar to the current
"clock=" boot option override, there would be some sort of sysfs
timesource entry that users could "echo tsc" or whatever into in order
to force the system to use the tsc timesource at runtime.

This, however, would be separate from the timeofday suspend/resume hooks,
so it's probably not an issue. Let me know if I'm wrong.

thanks!
-john




[Patch] x86, x86_64: Intel dual-core detection

2005-03-15 Thread Siddha, Suresh B
The appended patch adds support for Intel dual-core detection and displays
the core-related information in /proc/cpuinfo.

It adds two new fields, "core id" and "cpu cores", to x86 /proc/cpuinfo
and the "core id" field to x86_64 (the "cpu cores" field is already present
in x86_64).

The number of processor cores on a die is detected using cpuid(4), which
is documented in the IA-32 Intel Architecture Software Developer's Manual (vol 2a)
(http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)

This patch also adds cpu_core_map similar to cpu_sibling_map.
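
As a rough user-space sketch of the two calculations the patch relies on
(the function names here are invented for illustration; the decoding rule
is the one num_cpu_cores() uses below, and the bit-width helper computes
the same ceil(log2(n)) value that the index_msb loop in detect_ht()
arrives at):

```c
/*
 * Decode the core count from the EAX value returned by cpuid(4),
 * using the same rule as num_cpu_cores() in the patch: if the low
 * five bits are non-zero, bits 31:26 hold "cores minus one".
 */
unsigned int cores_from_cpuid4_eax(unsigned int eax)
{
	if (eax & 0x1f)
		return (eax >> 26) + 1;
	return 1;
}

/*
 * Number of ID bits needed to number n items, i.e. ceil(log2(n)).
 * Assumes 1 <= n <= 2^31 so the shift below stays in range.
 */
unsigned int id_bits_for(unsigned int n)
{
	unsigned int bits = 0;

	while ((1u << bits) < n)
		bits++;
	return bits;
}
```

For example, three siblings need two ID bits, the same as four, which is
why the patch bumps index_msb when the count is not a power of two.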

Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>

diff -Nru linux-2.6.11/arch/i386/kernel/cpu/amd.c linux-mc/arch/i386/kernel/cpu/amd.c
--- linux-2.6.11/arch/i386/kernel/cpu/amd.c 2005-03-01 23:38:26.0 -0800
+++ linux-mc/arch/i386/kernel/cpu/amd.c 2004-11-01 16:13:46.141256624 -0800
@@ -188,6 +188,13 @@
}
 
display_cacheinfo(c);
+
+   if (cpuid_eax(0x80000000) >= 0x80000008) {
+   c->x86_num_cores = (cpuid_ecx(0x80000008) & 0xff) + 1;
+   if (c->x86_num_cores & (c->x86_num_cores - 1))
+   c->x86_num_cores = 1;
+   }
+
detect_ht(c);
 
 #ifdef CONFIG_X86_HT
@@ -199,12 +206,6 @@
if (cpu_has(c, X86_FEATURE_CMP_LEGACY))
smp_num_siblings = 1;
 #endif
-
-   if (cpuid_eax(0x80000000) >= 0x80000008) {
-   c->x86_num_cores = (cpuid_ecx(0x80000008) & 0xff) + 1;
-   if (c->x86_num_cores & (c->x86_num_cores - 1))
-   c->x86_num_cores = 1;
-   }
 }
 
 static unsigned int amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
diff -Nru linux-2.6.11/arch/i386/kernel/cpu/common.c linux-mc/arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c  2005-03-01 23:37:47.0 -0800
+++ linux-mc/arch/i386/kernel/cpu/common.c  2004-11-07 11:34:10.237802664 -0800
@@ -441,7 +441,7 @@
 void __init detect_ht(struct cpuinfo_x86 *c)
 {
u32 eax, ebx, ecx, edx;
-   int index_lsb, index_msb, tmp;
+   int index_msb, tmp;
int cpu = smp_processor_id();
 
if (!cpu_has(c, X86_FEATURE_HT))
@@ -453,7 +453,6 @@
if (smp_num_siblings == 1) {
printk(KERN_INFO  "CPU: Hyper-Threading is disabled\n");
} else if (smp_num_siblings > 1 ) {
-   index_lsb = 0;
index_msb = 31;
 
if (smp_num_siblings > NR_CPUS) {
@@ -462,21 +461,34 @@
return;
}
tmp = smp_num_siblings;
-   while ((tmp & 1) == 0) {
-   tmp >>=1 ;
-   index_lsb++;
-   }
-   tmp = smp_num_siblings;
while ((tmp & 0x80000000 ) == 0) {
tmp <<=1 ;
index_msb--;
}
-   if (index_lsb != index_msb )
+   if (smp_num_siblings & (smp_num_siblings - 1))
index_msb++;
phys_proc_id[cpu] = phys_pkg_id((ebx >> 24) & 0xFF, index_msb);
 
printk(KERN_INFO  "CPU: Physical Processor ID: %d\n",
   phys_proc_id[cpu]);
+   
+   smp_num_siblings = smp_num_siblings / c->x86_num_cores;
+
+   tmp = smp_num_siblings;
+   index_msb = 31;
+   while ((tmp & 0x80000000) == 0) {
+   tmp <<=1 ;
+   index_msb--;
+   }
+
+   if (smp_num_siblings & (smp_num_siblings - 1))
+   index_msb++;
+
+   cpu_core_id[cpu] = phys_pkg_id((ebx >> 24) & 0xFF, index_msb);
+
+   if (c->x86_num_cores > 1)
+   printk(KERN_INFO  "CPU: Processor Core ID: %d\n",
+  cpu_core_id[cpu]);
}
 }
 #endif
diff -Nru linux-2.6.11/arch/i386/kernel/cpu/intel.c linux-mc/arch/i386/kernel/cpu/intel.c
--- linux-2.6.11/arch/i386/kernel/cpu/intel.c   2005-03-01 23:37:52.0 -0800
+++ linux-mc/arch/i386/kernel/cpu/intel.c   2004-11-01 16:13:46.187249632 -0800
@@ -77,6 +77,27 @@
 }
 
 
+/*
+ * find out the number of processor cores on the die
+ */
+static int __init num_cpu_cores(struct cpuinfo_x86 *c)
+{
+   unsigned int eax;
+
+   if (c->cpuid_level < 4)
+   return 1;
+
+   __asm__("cpuid"
+   : "=a" (eax)
+   : "0" (4), "c" (0)
+   : "bx", "dx");
+
+   if (eax & 0x1f)
+   return ((eax >> 26) + 1);
+   else
+   return 1;
+}
+
 static void __init init_intel(struct cpuinfo_x86 *c)
 {
unsigned int l2 = 0;
@@ -139,6 +160,8 @@
if ( p )
strcpy(c->x86_model_id, p);

+   c->x86_num_cores = num_cpu_cores(c);
+
detect_ht(c);
 
/* Work around errata */
diff -Nru linux-2.6.11/arch/i386/kernel/cpu/proc.c 

Re: [PATCH] Add freezer call in

2005-03-15 Thread Nigel Cunningham
Hi.

On Wed, 2005-03-16 at 10:37, Pavel Machek wrote:
> Hi!
> 
> > This patch adds a freezer call to the slow path in __alloc_pages. It
> > thus avoids freezing failures in low memory situations. Like the other
> > patches, it has been in Suspend2 for longer than I can remember.
> 
> This one seems wrong.
> 
> What if someone does
> 
>   down(_lock_needed_during_suspend);
>   kmalloc()
> 
> ? If you freeze him during that allocation, you'll deadlock later...

I suppose you're right. I'll see if I can look into this situation some
more. (Longer todo!).

Nigel

> > Signed-off-by: Nigel Cunningham <[EMAIL PROTECTED]>
> > 
> > diff -ruNp 213-missing-refrigerator-calls-old/mm/page_alloc.c 213-missing-refrigerator-calls-new/mm/page_alloc.c
> > --- 213-missing-refrigerator-calls-old/mm/page_alloc.c  2005-02-03 22:33:50.0 +1100
> > +++ 213-missing-refrigerator-calls-new/mm/page_alloc.c  2005-03-16 09:01:28.0 +1100
> > @@ -838,6 +838,7 @@ rebalance:
> > do_retry = 1;
> > }
> > if (do_retry) {
> > +   try_to_freeze(0);
> > blk_congestion_wait(WRITE, HZ/50);
> > goto rebalance;
> > }
-- 
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028;  Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net



Re: [PATCH][2/2] SquashFS

2005-03-15 Thread Matt Mackall
On Tue, Mar 15, 2005 at 11:25:07PM +, Phillip Lougher wrote:
> Matt Mackall wrote:
> >
> >>+config SQUASHFS_1_0_COMPATIBILITY
> >>+   bool "Include support for mounting SquashFS 1.x filesystems"
> >
> >How common are these? It would be nice not to bring in legacy code.
> 
> Squashfs 1.x filesystems were the previous file format.  Embedded 
> systems tend to be conservative, and so there are quite a few systems 
> out there using 1.x filesystems.  I've also heard of quite a few cases 
> where Squashfs is used as an archival filesystem, and so there's 
> probably quite a few 1.x filesystems around for this reason.
>
> One issue which I'm aware of here is deciding what getting squashfs 
> support into the kernel is meant to answer.  I'm asking for it to be put 
> into the kernel because developers out there are asking me to put it in 
> the kernel - because they don't want to continually (re)patch their kernels.

My suggestion would be to break out the 1.x code into a separate patch
and encourage everyone to convert to 2.x. No one has ever created a
1.x fs with the expectation it'll work on an unpatched kernel, so they
don't lose anything. And no one should be creating such any more, right?

> >>+   unsigned int s_major:16;
> >>+   unsigned int s_minor:16;
> >
> >What's going on here? s_minor's not big enough for modern minor
> >numbers.
> 
> What is the modern size then?

Minors are 22 bits, majors are 10. May grow to 32 each at some point.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [patch 2.6.11] bonding: avoid tx balance for IGMP (alb/tlb mode)

2005-03-15 Thread Jay Vosburgh
Rick Jones <[EMAIL PROTECTED]> wrote:

> treats IGMP packets the same as all other non-broadcast traffic (i.e. it
>will attempt to load balance). This switch behavior seems rather odd in an
>aggregated case, given the fact that most traffic (except broadcast packets)
>will be load balanced by the partner device. In addition, the switch (in
>theory) is supposed to treat the aggregated switch ports as 1 logical port
>and therefore it should allow IGMP packets to be received back on any port
>in the logical aggregation.
>
>IMO, the switch behavior in this case seems questionable.

This patch only applies to the bonding balance-alb/tlb modes,
which do not require the switch to be configured for aggregation.  Since
the switch has no explicit knowledge that the links are being
aggregated, it seems reasonable for it to be confused by what it gets in
the described case.

I haven't tested the patch, but conceptually the problem John
described in his original mail sounds plausible, as does the fix for it.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]



Re: [RFC] Changes to the driver model class code.

2005-03-15 Thread Dominik Brodowski
On Tue, Mar 15, 2005 at 02:14:31PM -0800, Greg KH wrote:
> > So this means every device will have yet another reference count, and you
> > need to be aware of _each_ lifetime to write correct code. And the 
> > _reference counting_ is the hard thing to get right, so we should make 
> > _that_ easier. The existing class API was a step towards this direction, and
> > with the changes you're suggesting here we'd do two jumps backwards.
> 
> You are correct, it was a step forward in this direction.
> 
> But we now have a kref to handle the reference counting for the device,
> which make things a whole lot easier than ever before.

Is it really easier if you have to be aware that _both_ the class reference
and the kref device reference may already have reached zero? Using your
approach, you need to take _two_ lifetimes into account instead of one. Think
of class device attributes being opened, or still being accessed, when the
kref device reference reaches zero... you need to check for that in code now,
AFAICS, while you could rely on "we still have a reference to the _device_"
in the "historic" class device attribute access paths.

> But the both of you are correct, there is a real need for the class code
> to support trees of devices that are presented to userspace (which is
> what the class code is for).  I'm not taking that away, just trying to
> make the interface to that code simpler.

The interface may get simpler, but we lose the advantages. And I prefer an
interface which reduces the chances of doing things wrongly; and at least
the existing warnings on empty release functions force you to _think_ about
what you do.

> I'm also not saying that I'm going to go off and delete those functions
> from the kernel today, or tomorrow. 
...
> Anyway, don't worry, the code isn't going away anytime soon,

That's totally beside the point. If the decision were indeed made to do this
transition, I'd be all for doing it fast. Even if the "old" code were gone
within two weeks, my concern would not be the short period, but the
functionality being lost:

> I will not be removing any functionality, don't worry :)

the "functionality" of the device core to teach, encourage, and force people
to think about reference counting is being lost by this approach, independent
of the question of whether the transition will take two weeks or two years.

> It will not make the reference counting logic easier to get wrong, or
> easier to get right.  It totally takes it away from the user, and makes
> them implement it themselves if they so wish (like the USB HCD patch
> does.).

Keeping the chance to do the "new"/class_simple way is a good thing -- so
that anybody who _knows_ _exactly_ what he does can shoot himself in his
foot^W^W^W^W^W do what is best for the affected code.

> we just
> need to make it easier to use.  Any suggestions that any of you have to
> make this that way (as you are the ones who had to use it to start with)
> would be greatly appreciated.

drivers/base/class_simple.c:


printk("are you really sure you don't want not to have reference 
counting for free by using struct class instead of struct class_simple *?\n");


:)

Thanks,
Dominik


Re: [PATCH][2/2] SquashFS

2005-03-15 Thread Andrew Morton
Phillip Lougher <[EMAIL PROTECTED]> wrote:
>
> >>+   unsigned int s_major:16;
>  >>+  unsigned int s_minor:16;
>  > 
>  > 
>  > What's going on here? s_minor's not big enough for modern minor
>  > numbers.
>  > 
> 
>  What is the modern size then?

10 bits of major, 20 bits of minor.

As this is an on-disk thing, you're kinda stuck.  A number of filesystems
have this problem.  We used tricks in the inode to support it in ext2 and
ext3.
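
To make the size mismatch concrete, here is a small sketch that packs a
device number with a 20-bit minor in the low bits and checks whether a
given minor would survive being stored in a 16-bit field such as
s_minor:16. All names are hypothetical and the layout is only an
assumption for illustration; it is not the SquashFS on-disk format.

```c
#include <stdint.h>

#define SKETCH_MINORBITS 20	/* minor width mentioned above */
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1)

/* Pack major/minor into one 32-bit value, minor in the low 20 bits. */
uint32_t sketch_mkdev(uint32_t major, uint32_t minor)
{
	return (major << SKETCH_MINORBITS) | (minor & SKETCH_MINORMASK);
}

uint32_t sketch_major(uint32_t dev)
{
	return dev >> SKETCH_MINORBITS;
}

uint32_t sketch_minor(uint32_t dev)
{
	return dev & SKETCH_MINORMASK;
}

/* Returns 1 if the minor is unchanged after truncation to 16 bits. */
int minor_fits_16_bits(uint32_t minor)
{
	return (minor & 0xffffu) == minor;
}
```

A minor such as 70000 is valid in 20 bits but would be silently
truncated by a 16-bit field, which is the problem being pointed out.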


Re: Linux 2.6.11.4

2005-03-15 Thread Hacksaw
+   while (dlen >= 2 && dlen >= data[1] && data[1] >= 2) {

Not that it matters much to me, since I don't have to maintain it, but 
couldn't this be:

while (data[1] >= 2 && dlen >= data[1]) {

I think this captures the relationship and priority.
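
For comparing the two conditions, here is a user-space sketch of a
code/length option walk that uses the condition from the patch. The
helper name and the idea of counting options are invented for
illustration; the loop advances by data[1] each time, so a length of
zero would never make progress without the "data[1] >= 2" test. Note
that the shortened form above reads data[1] before anything has
confirmed that dlen >= 2, so it depends on that byte being readable.

```c
/*
 * Count well-formed options in a buffer where data[0] is the option
 * code and data[1] is the option length (including the 2-byte header).
 * Hypothetical helper, not the ppp_async.c code.
 */
int count_options(const unsigned char *data, int dlen)
{
	int n = 0;

	while (dlen >= 2 && dlen >= data[1] && data[1] >= 2) {
		n++;
		dlen -= data[1];	/* consume this option */
		data += data[1];
	}
	return n;
}
```

With a one-byte buffer the first test fails and data[1] is never read;
with a zero-length option the walk stops instead of spinning.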
-- 
http://www.hacksaw.org -- http://www.privatecircus.com -- KB1FVD




[BK PATCH] SCSI updates for 2.6.11

2005-03-15 Thread James Bottomley
This is my current tranche of patches that were waiting for the transition
from -rc to released (sorry it's late ... I've been on holiday).

The patch is available here:

bk://linux-scsi.bkbits.net/scsi-for-linus-2.6

The short log is:

Adrian Bunk:
  o SCSI NCR_D700.c: make some code static
  o SCSI sim710.c: make some code static

Alan Stern:
  o Add a NOREPORTLUN blacklist flag
  o Retry supposedly "unrecoverable" hardware errors

Andi Kleen:
  o Fix selection of command serial numbers and pids
  o Add compat_ioctl to mptctl
  o Convert megaraid2 to compat_ioctl
  o Add compat_ioctl to SG
  o Convert aacraid to compat_ioctl
  o Add compat_ioctl to osst
  o Add comment for compat_ioctl to SR
  o Add compat_ioctl to st
  o Add compat_ioctl to SD

Andrew Morton:
  o st msleep warning fix

Andrew Vasquez:
  o target code updates to support scanned targets

Brian King:
  o ipr: Handle new RAID 6 errors
  o ipr: Bump driver version to 2.0.13
  o ipr: Send uevent change notifications
  o ipr: Sparse fixes
  o ipr: Use bitwise types
  o ipr: Remove tcq_active flag from resource entry
  o ipr: Remove resource qdepth field
  o ipr: Remove tcq_enable device attribute
  o ipr: Use change queue type API
  o ipr: Fast failure module options
  o ipr: Support dynamic IDs
  o ipr: Setup max_sectors based on device type
  o ipr: Device remove cleanup
  o ipr: New adapter support
  o ipr: PCI ID table update
  o PCI: update ipr PCI ids

Christoph Hellwig:
  o mark qlogicisp broken
  o mark eata_pio broken
  o qla1280: update changelog
  o qla1280: use pci_map_single
  o qla1280: remove qla1280_proc_info
  o drop some attibutes from the FC transport class

Dave Jones:
  o blacklist microtek scanmaker III

Eric Moore:
  o mptfusion: delete watchdogs timers from mptctl and mptscsih

Guennadi Liakhovetski:
  o dc395x: Fix support for highmem

James Bottomley:
  o FC Remote Port Patch
  o SCSI: dc395x.c add missing #include 
  o SCSI: fix transport statistics mismerge
  o Add statistics to generic transport class
  o SCSI: revamp target scanning routines
  o SCSI: fix io statistics compile warnings
  o SCSI: Add device io statistics
  o SCSI: fix compat_ioctl compile warnings

James Smart:
  o add per scsi-host workqueues for defered processing

Kai Mäkisara:
  o SCSI tape security: require CAP_ADMIN for SG_IO etc
  o SCSI tape fixes: remove f_pos handling
  o SCSI tape fixes (new version): sense descriptor
  o SCSI tape fixes: sense descriptor init, bsf->weof, blkno,
  o SCSI tape: write filemark before rewind etc. when writing
  o SCSI tape descriptor based sense data support

Kenn Humborg:
  o NCR5380 delayed work fix and locking fix

Mark Haverkamp:
  o aacraid: adapter naming fix

Matthew Wilcox:
  o sym2 version 2.2.0
  o Use spi_display_xfer_agreement() in 53c700
  o Display SPI transfer agreement in common code
  o scsi: remove device_request_lock

Mike Anderson:
  o SCSI: Add TASK_ABORTED to status_byte macro

And the diffstat is:

 b/Documentation/kernel-parameters.txt   |3 
 b/Documentation/scsi/st.txt |5 
 b/Documentation/scsi/sym53c8xx_2.txt|2 
 b/drivers/base/transport_class.c|   24 
 b/drivers/message/fusion/mptbase.h  |   10 
 b/drivers/message/fusion/mptctl.c   |  630 +-
 b/drivers/message/fusion/mptscsih.c |  185 +---
 b/drivers/pci/pci.ids   |7 
 b/drivers/scsi/53c700.c |   16 
 b/drivers/scsi/Kconfig  |4 
 b/drivers/scsi/NCR5380.c|   15 
 b/drivers/scsi/NCR_D700.c   |4 
 b/drivers/scsi/aacraid/linit.c  |  115 +-
 b/drivers/scsi/dc395x.c |   49 -
 b/drivers/scsi/hosts.c  |   41 
 b/drivers/scsi/ipr.c|  248 +++--
 b/drivers/scsi/ipr.h|  201 ++--
 b/drivers/scsi/megaraid/megaraid_mm.c   |   26 
 b/drivers/scsi/osst.c   |   19 
 b/drivers/scsi/qla1280.c|  146 ---
 b/drivers/scsi/scsi.c   |  105 ++
 b/drivers/scsi/scsi_devinfo.c   |4 
 b/drivers/scsi/scsi_error.c |   13 
 b/drivers/scsi/scsi_lib.c   |   61 +
 b/drivers/scsi/scsi_scan.c  |  266 --
 b/drivers/scsi/scsi_sysfs.c |  188 +---
 b/drivers/scsi/scsi_transport_fc.c  | 1092 -
 b/drivers/scsi/scsi_transport_iscsi.c   |   30 
 b/drivers/scsi/scsi_transport_spi.c |  213 +++--
 b/drivers/scsi/sd.c |   39 
 b/drivers/scsi/sg.c |   26 
 b/drivers/scsi/sim710.c |6 
 b/drivers/scsi/sr.c |4 
 b/drivers/scsi/st.c |  426 +-
 b/drivers/scsi/st.h |   19 
 b/drivers/scsi/sym53c8xx_2/Makefile |2 
 b/drivers/scsi/sym53c8xx_2/sym53c8xx.h  |   60 +
 b/drivers/scsi/sym53c8xx_2/sym_defs.h   |4 
 b/drivers/scsi/sym53c8xx_2/sym_fw.c |2 
 

Re: [patch 2.6.11] bonding: avoid tx balance for IGMP (alb/tlb mode)

2005-03-15 Thread Rick Jones
Is that switch behaviour "normal" or "correct?"  I know next to nothing about 
what stuff like LACP should do, but asked some internal folks and they had this 
to say:


 treats IGMP packets the same as all other non-broadcast traffic (i.e. it
will attempt to load balance). This switch behavior seems rather odd in an
aggregated case, given the fact that most traffic (except broadcast packets)
will be load balanced by the partner device. In addition, the switch (in
theory) is supposed to treat the aggregated switch ports as 1 logical port
and therefore it should allow IGMP packets to be received back on any port
in the logical aggregation.
IMO, the switch behavior in this case seems questionable.

FWIW,
rick jones


Re: [PATCH][2/2] SquashFS

2005-03-15 Thread Phillip Lougher
Matt Mackall wrote:
> On Mon, Mar 14, 2005 at 04:30:33PM +, Phillip Lougher wrote:
>
> > +config SQUASHFS_1_0_COMPATIBILITY
> > +   bool "Include support for mounting SquashFS 1.x filesystems"
>
> How common are these? It would be nice not to bring in legacy code.

Squashfs 1.x filesystems were the previous file format.  Embedded 
systems tend to be conservative, and so there are quite a few systems 
out there using 1.x filesystems.  I've also heard of quite a few cases 
where Squashfs is used as an archival filesystem, and so there's 
probably quite a few 1.x filesystems around for this reason.

One issue which I'm aware of here is deciding what getting squashfs 
support into the kernel is meant to answer.  I'm asking for it to be put 
into the kernel because developers out there are asking me to put it in 
the kernel - because they don't want to continually (re)patch their kernels.

If I drop too much support from the kernel patch, then the kernel 
squashfs support will not be adequate, and the developers will still 
have to patch their kernels with my third-party patches.

Before I submitted this patch I factored out the Squashfs 1.x code into 
a separate file only built if this option is selected.  Obviously this 
reduces the built kernel size (by 6K - 8K depending on architecture), 
but doesn't address the issue of "legacy" code in the kernel.

If people don't want support for 1.x filesystems in the patch, then I 
will drop it...  Opinions?

> > +#define SERROR(s, args...) do { \
> > +   if (!silent) \
> > +   printk(KERN_ERR "SQUASHFS error: "s, ## args);\
> > +   } while(0)
>
> Why would we ever want to be silent about something of KERN_ERR
> severity? Isn't that a better job for klogd?

Silent is a parameter passed into the superblock read routine at mount 
time.  It appears to be intended to ensure the filesystem is silent 
about failed mounts, which is what I use it for.

The macro is only used by the superblock read routine and so I'll 
replace it with direct printks.


> > +#define SQUASHFS_MAGIC 0x73717368
> > +#define SQUASHFS_MAGIC_SWAP 0x68737173
>
> Again, what's the story here? Is this purely endian conversion or do
> filesystems of both endian persuasions exist? If the latter, let's not
> keep that legacy. Pick an order, and use endian conversion functions
> unconditionally everywhere.

This is _certainly_ not legacy code.  Squashfs deliberately supports 
filesystems of both endian persuasions for efficiency in embedded 
systems.  Swapping data structures is an unnecessary overhead which can 
be avoided if the filesystem is in the native byte order - embedded 
systems often need all the performance optimisations possible, 
especially in the filesystem to reduce initial 'turn-on' start up delay.

Picking an order will impose unnecessary overhead on the losing 
architecture.  When Linux was almost exclusively running on little 
endian machines, having little endian only filesystems probably didn't 
matter (but still not nice in my view), however, Linux now runs on lots 
of different architectures.  In the embedded market the PowerPC (big 
endian) makes up a large percentage of the machines running Linux.

In short SquashFS will always be a dual endian filesystem.
Incidentally, cramfs is also a dual endian filesystem (not by design, but 
by virtue of the fact it writes filesystems in the host byte order). 
No-one seems to be complaining there.
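
A minimal sketch of how a dual-endian reader can decide whether to swap,
assuming only the two magic constants quoted above (the function names
are made up for illustration and are not the SquashFS code): the magic is
read in native byte order, and seeing the swapped constant means the
filesystem was written in the other byte order.

```c
#include <stdint.h>

#define SQUASHFS_MAGIC      0x73717368u
#define SQUASHFS_MAGIC_SWAP 0x68737173u

/* Reverse the byte order of a 32-bit value. */
uint32_t sketch_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}

/*
 * Classify a magic value read in native byte order:
 *   0  native order, no swapping needed
 *   1  opposite order, every field must be swapped on read
 *  -1  not a SquashFS magic at all
 */
int sketch_needs_swap(uint32_t magic_as_read)
{
	if (magic_as_read == SQUASHFS_MAGIC)
		return 0;
	if (magic_as_read == SQUASHFS_MAGIC_SWAP)
		return 1;
	return -1;
}
```

The check costs one comparison at mount time; the per-field swapping is
then only paid on filesystems written in the non-native order.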


> > +#define SQUASHFS_COMPRESSED_SIZE_BLOCK(B)  (((B) & \
> > +   ~SQUASHFS_COMPRESSED_BIT_BLOCK) ? (B) & \
> > +   ~SQUASHFS_COMPRESSED_BIT_BLOCK : SQUASHFS_COMPRESSED_BIT_BLOCK)
>
> Shortening all these macro names would be nice..
>
> > +typedef unsigned int   squashfs_block;
> > +typedef long long  squashfs_inode;
>
> Eh? Seems we can have many more inodes than blocks? What sorts of
> volume limits do we have here?

For efficiency Squashfs encodes the location of inode data on disk 
within the inode number, this means the inode can be directly read 
without an intermediate inode to disk block lookup.  Because SquashFS 
compresses metadata the inode data location consists of a tuple: the 
location of the compressed block the inode is within, and the offset 
within the uncompressed block of the inode data itself.

The filesystem can be 4GB in size which requires 32 bits for the block 
location.  An uncompressed metadata block is 8KB, which requires 13 bits 
for the block offset.  A Squashfs inode is consequently 45 bits in size.
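
The arithmetic above can be sketched as follows. This is only an
illustration of a 32-bit block location plus a 13-bit offset packed into
one 64-bit value; the names and the exact bit layout are assumptions and
may differ from what the SquashFS patch actually does.

```c
#include <stdint.h>

#define SKETCH_OFFSET_BITS 13	/* 8KB uncompressed metadata block */
#define SKETCH_OFFSET_MASK ((1u << SKETCH_OFFSET_BITS) - 1)

/* Combine compressed-block location and in-block offset into one number. */
uint64_t sketch_mkinode(uint32_t block, uint32_t offset)
{
	return ((uint64_t)block << SKETCH_OFFSET_BITS) |
	       (offset & SKETCH_OFFSET_MASK);
}

/* Location of the compressed metadata block holding the inode. */
uint32_t sketch_inode_block(uint64_t ino)
{
	return (uint32_t)(ino >> SKETCH_OFFSET_BITS);
}

/* Offset of the inode within the uncompressed block. */
uint32_t sketch_inode_offset(uint64_t ino)
{
	return (uint32_t)(ino & SKETCH_OFFSET_MASK);
}
```

The largest value this produces is 2^45 - 1, which is why a 32-bit type
is not enough for the inode number while it is enough for a block.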

> > +   unsigned int s_major:16;
> > +   unsigned int s_minor:16;
>
> What's going on here? s_minor's not big enough for modern minor
> numbers.

What is the modern size then?

> > +typedef struct {
> > +   unsigned int index:27;
> > +   unsigned int start_block:29;
> > +   unsigned char   size;
>
> Eep. Not sure how bit-fields handle crossing word boundaries, would be
> surprised if this were very portable.

It is.  

Re: Capabilities across execve

2005-03-15 Thread Alexander Nyberg
> > > It was meant to work with capabilities in the filesystem like setuid bits.
> > > So the patches that have floated around from myself, Andy Lutomirski
> > > and Alex Nyberg are attempts to make something half-way sane out of the
> > > mess.  The trouble is then convincing yourself that it's not some way to
> > > leak capabilities (esp. since some programs use the interface already,
> > > like bind9).
> > 
> > Anyone who uses the current interfaces should not play with the
> > inheritable flag, the text I looked at said it was specifically for
> > execve. Thus if the application doesn't modify the inheritable mask
> > things will look like it has always done. And it really should not mess
> > with inheritable mask if it doesn't intend to, that would be a security
> > bug.
> > We really should be safe doing this.
> 
> That's one of the points.  Latent bugs getting triggered is what makes
> the change deserving of being conservative.

bind9 actually sets inheritable, but I don't see it doing any exec in
the whole package, so it should be safe. I'll look for other large
common packages using capabilities.
I don't think this necessarily is 2.7 material, but otoh if it has
waited this long there doesn't appear to be that kind of rush to get it
in.

> > > All I can say is work is underway.  There's 3 different patches that
> > > will get you to your goal.  I understand that it's a real pain right now.
> > > One of the authors of the withdrawn draft has told me that the notion of
> > > capabilities w/out filesystem support was considered effectively useless.
> > > So, we're in uncharted territory.  BTW, thanks for reminding me of
> > > scripts, I had been testing just C programs.
> > 
> > I wouldn't call it useless, retaining capabilities across execve +
> > pam_cap is a very useful thing, on my machine I can give myself a few
> > capabilities that have always annoyed me (iirc the database that wanted
> > mlock as regular user would have been solved as well).
> 
> Yes, that's useful, but having 3 sets and complicated rules for
> combining task and file based sets is not really necessary for that.

However, I never saw any really clean solution for the problem, and I would
call this better and more general for this kind of problem.

> > Regarding fs attributes:
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0211.0/0171.html
> > 
> > I can see useful scenarios of having the possibility of capabilities per
> > inode (it appears the xattr way wins somewhat in the previous
> > discussion).
> 
> It's how it should be done.
> 
> > Chris, have you seen any capabilities+xattr patches around?
> 
> http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.4-fcap/

Thanks, I'll have a look at this.



Linux 2.6.11.4

2005-03-15 Thread Greg KH
I've released 2.6.11.4 with two security fixes in it.  It can be found at
the normal kernel.org places.

The diffstat and short summary of the fixes are below.  

I'll also be replying to this message with a copy of the patch between
2.6.11.3 and 2.6.11.4, as it is small enough to do so.

thanks,
 
greg k-h

--
 Makefile|2 +-
 drivers/net/ppp_async.c |2 +-
 fs/exec.c   |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


Summary of changes from v2.6.11.3 to v2.6.11.4
==

Greg Kroah-Hartman:
  o Linux 2.6.11.4

Paul Mackerras:
  o CAN-2005-0384: Remote Linux DoS on ppp servers

Prasanna Meda:
  o use strncpy in get_task_comm



Re: Linux 2.6.11.4

2005-03-15 Thread Greg KH
diff -Nru a/Makefile b/Makefile
--- a/Makefile  2005-03-15 16:09:59 -08:00
+++ b/Makefile  2005-03-15 16:09:59 -08:00
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION = .3
+EXTRAVERSION = .4
 NAME=Woozy Numbat
 
 # *DOCUMENTATION*
diff -Nru a/drivers/net/ppp_async.c b/drivers/net/ppp_async.c
--- a/drivers/net/ppp_async.c   2005-03-15 16:09:59 -08:00
+++ b/drivers/net/ppp_async.c   2005-03-15 16:09:59 -08:00
@@ -1000,7 +1000,7 @@
data += 4;
dlen -= 4;
/* data[0] is code, data[1] is length */
-   while (dlen >= 2 && dlen >= data[1]) {
+   while (dlen >= 2 && dlen >= data[1] && data[1] >= 2) {
switch (data[0]) {
case LCP_MRU:
val = (data[2] << 8) + data[3];
diff -Nru a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c 2005-03-15 16:09:59 -08:00
+++ b/fs/exec.c 2005-03-15 16:09:59 -08:00
@@ -814,7 +814,7 @@
 {
/* buf must be at least sizeof(tsk->comm) in size */
task_lock(tsk);
-   memcpy(buf, tsk->comm, sizeof(tsk->comm));
+   strncpy(buf, tsk->comm, sizeof(tsk->comm));
task_unlock(tsk);
 }
 


Re: OOM problems with 2.6.11-rc4

2005-03-15 Thread Andrea Arcangeli
On Tue, Mar 15, 2005 at 03:44:13PM -0500, Noah Meyerhans wrote:
> Hello.  We have a server, currently running 2.6.11-rc4, that is
> experiencing similar OOM problems to those described at
> http://groups-beta.google.com/group/fa.linux.kernel/msg/9633559fea029f6e
> and discussed further by several developers here (the summary is at
> http://www.kerneltraffic.org/kernel-traffic/kt20050212_296.html#6)  We
> are running 2.6.11-rc4 because it contains the patches that Andrea
> mentioned in the kerneltraffic link.  The problem was present in 2.6.10
> as well.  We can try newer 2.6 kernels if it helps.

Thanks for testing the new code, but unfortunately the problem you're
facing is a different one. It's still definitely another VM bug though.

While looking into your bug I identified for sure a bug in how the VM
sets all_unreclaimable: the VM is setting all_unreclaimable on the
normal zone without any regard for the progress we're making at freeing
the slab. Once all_unreclaimable is set, it's pretty much too late to
try not to go OOM. all_unreclaimable truly means OOM, so we must be
extremely careful when we set it (for sure the slab progress must be
taken into account).

We also want kswapd to help us in freeing the slab in the background
instead of erroneously giving it up if some slab cache is still
freeable.

Once all_unreclaimable is set, then shrink_caches will stop calling
shrink_zone for anything but the lowest prio, and this will lead to
sc.nr_scanned to be small, and this will lead to shrink_slab to get a
small parameter too.


In short I think we can start by trying this fix (which has some risk,
since now it might become harder to detect an oom condition, but I don't
see many other ways in order to keep the slab progress into account
without major changes). perhaps another way would be to check for
total_reclaimed < SWAP_CLUSTER_MAX, but the one I used in the patch is
much safer for your purposes (even if less safe in terms of not running
into live locks).

Beware this is absolutely untested and it may not be enough.  Perhaps there
are more bugs in the same area (shrink_slab itself seems overly
complicated for no good reason, and different methods return inconsistent
values: dcache returns a percentage of the free entries, while dquot
returns the allocated in-use entries too, which makes the whole API
look unreliable).

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

--- x/mm/vmscan.c.~1~   2005-03-14 05:02:17.0 +0100
+++ x/mm/vmscan.c   2005-03-16 01:28:16.0 +0100
@@ -1074,8 +1074,9 @@ scan:
total_scanned += sc.nr_scanned;
if (zone->all_unreclaimable)
continue;
-   if (zone->pages_scanned >= (zone->nr_active +
-   zone->nr_inactive) * 4)
+   if (!reclaim_state->reclaimed_slab &&
+   zone->pages_scanned >= (zone->nr_active +
+   zone->nr_inactive) * 4)
zone->all_unreclaimable = 1;
/*
 * If we've done a decent amount of scanning and
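
Read in isolation, the condition introduced by the hunk above can be sketched
as a small predicate. This is only an illustration under stated assumptions:
struct zone_counts and should_mark_unreclaimable() are made-up names, and the
reclaimed_slab argument stands in for reclaim_state->reclaimed_slab from the
patch. The real code operates on struct zone inside mm/vmscan.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few struct zone fields the test reads. */
struct zone_counts {
    unsigned long nr_active;
    unsigned long nr_inactive;
    unsigned long pages_scanned;
};

/*
 * Mirror of the patched condition: a zone is only declared
 * all_unreclaimable when no slab was reclaimed in this pass AND the
 * zone has been scanned at least four times over.
 */
bool should_mark_unreclaimable(const struct zone_counts *z,
                               unsigned long reclaimed_slab)
{
    if (reclaimed_slab)
        return false;   /* slab is still making progress */

    return z->pages_scanned >= (z->nr_active + z->nr_inactive) * 4;
}
```

With the unpatched condition, the second argument would simply be ignored,
which is the behaviour the message describes as setting all_unreclaimable
"without any care" for slab progress.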

This below is an untested attempt at bringing dquot a bit more in line
with the API, to make the whole thing a bit more consistent, though I
doubt you're using quotas, so it's only the above one that's going to be
interesting for you to test.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

--- x/fs/dquot.c.~1~   2005-03-08 01:02:13.0 +0100
+++ x/fs/dquot.c   2005-03-16 01:18:19.0 +0100
@@ -510,7 +510,7 @@ static int shrink_dqcache_memory(int nr,
spin_lock(&dq_list_lock);
if (nr)
prune_dqcache(nr);
-   ret = dqstats.allocated_dquots;
+   ret = (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
spin_unlock(&dq_list_lock);
return ret;
 }
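
The return value the dquot hunk above switches to (free entries scaled by
sysctl_vfs_cache_pressure) might be sketched in plain C as follows. The
function name shrinker_estimate() and the unsigned long types are assumptions
made for this sketch only; the patch itself assigns the expression to ret
inside shrink_dqcache_memory().

```c
/*
 * Hypothetical stand-alone version of the expression used in the patch.
 * The operation order is kept as in the patch: divide by 100 first,
 * then scale by the cache pressure value.
 */
unsigned long shrinker_estimate(unsigned long free_entries,
                                unsigned long cache_pressure)
{
    return (free_entries / 100) * cache_pressure;
}
```

One consequence of dividing before multiplying is that the estimate
truncates to 0 whenever fewer than 100 entries are free, whatever the
pressure setting.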

Let us know if this helps in any way or not. Thanks!


Re: [PATCH][RFC] Make /proc/ chmod'able

2005-03-15 Thread Kyle Moffett
On Mar 15, 2005, at 16:18, Rene Scharfe wrote:
> It's easily visible in the style of public toilets: in some countries
> you have one big room with no walls in between where all men or women
> merrily shit together, in other countries (like mine) every person can
> lock himself into a private closet.  Both ways work, there's nothing
> too special about using a toilet, but I'm simply used to the privacy
> provided by those thin walls.  I assure you, I don't do anything evil
> in there. :]

Just as long as our lab's "bathrooms" don't mysteriously get a
bazillion walls all over the place on kernel upgrade, we're ok.
I don't mind adding new options for advanced security, as long
as you don't change the defaults.  It's hard enough managing
a boatload of workstations under ideal conditions.  When the
default settings change every month it gets really annoying
really quickly. :-D

Cheers,
Kyle Moffett
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$
L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r  
!y?(-)
--END GEEK CODE BLOCK--



Re: Capabilities across execve

2005-03-15 Thread Albert Cahalan
Russell King, the latest person to notice defects, writes:

> However, the way the kernel is setup today, this seems
> impossible to achieve, which tends to make the whole
> idea of capabilities completely and utterly useless.
>
> How is this stuff supposed to work?  Are my ideas of
> what's supposed to be achievable completely wrong,
> although they look completely reasonable to me.
>
> Don't get me wrong - the capability system seems great at
> permanently revoking capabilities via /proc/sys/kernel/cap-bound,
> and dropping them within an application provided it remains UID0.
> Apart from that, capabilities seem completely useless.
...
> it seems to be something of a lost cause.
...
> my goal of running the script with minimal capabilities
> was completely *impossible* to achieve.

Uh huh. First, some history.

Capability bits were implemented in DG-UX and IRIX.
The two systems did not agree on operation. The draft
POSIX standard, withdrawn for good reason, greatly
changed between draft 16 and draft 17. Settings that
work for one draft are horribly insecure on the other.
Linux capabilities were partly done by the IRIX crew,
working from draft 16. Everyone else had draft 17 or
even draft 13. (and DG-UX had a better system anyway)

Tytso put things well when he wrote: "A lot of innocent
bits have been deforested  while trying work out the
differences between what Linux is doing (which is basically
following Draft 17), and what Trusted Irix is doing (which 
apparently is following Draft 16)."

Then along comes a sendmail exploit. An emergency fix
was produced, breaking an already-defective capability
design.

Note that, unlike DG-UX, our IRIX-inspired design did
not reserve any capability bits for non-kernel use.
This causes an inconsistent security model, with things
like the X server relying on UID. Inconsistency is bad.

OK, so that's how we got into this mess.

Now, how do we get out?

We will always have to deal with old-style apps. Those
few apps that handle capabilities can handle the bad
system we have now, and can handle a system without the
capability syscalls. (for old kernels) These apps can
not handle a changed setup though; to change things we
must make the old syscalls return failure. ANYTHING ELSE
IS VERY UNSAFE.

There is exactly one capability system in popular use.
That would be the one that comes with Solaris. Moving
toward that, via a kernel config option, appears to be
a sane way to get ourselves unstuck from this big mess.
An added advantage is that the Solaris-style method
instantly becomes the standard, especially if Linux is
strongly compatible. This helps with admin training and
portable software.

See if you can find any holes:
http://docs.sun.com/app/docs/doc/816-5175/6mbba7f39?a=view




swsusp: Remove arch-specific references from generic code

2005-03-15 Thread Pavel Machek
Hi!

This is a fix for "swsusp_restore crap": we had some i386-specific code
referenced from generic code. This fixes it by inlining the global TLB
flush into the assembly.

Please apply,
Pavel

From: Rafael J. Wysocki <[EMAIL PROTECTED]>
Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>



diff -Nrup linux-2.6.11-bk10-a/arch/i386/power/swsusp.S linux-2.6.11-bk10-b/arch/i386/power/swsusp.S
--- linux-2.6.11-bk10-a/arch/i386/power/swsusp.S   2005-03-15 09:20:53.0 +0100
+++ linux-2.6.11-bk10-b/arch/i386/power/swsusp.S   2005-03-15 15:37:25.0 +0100
@@ -51,6 +51,15 @@ copy_loop:
.p2align 4,,7
 
 done:
+   /* Flush TLB, including "global" things (vmalloc) */
+   movl mmu_cr4_features, %eax
+   movl %eax, %edx
+   andl $~(1<<7), %edx;  # PGE
+   movl %edx, %cr4;  # turn off PGE
+   movl %cr3, %ecx;  # flush TLB
+   movl %ecx, %cr3
+   movl %eax, %cr4;  # turn PGE back on
+
movl saved_context_esp, %esp
movl saved_context_ebp, %ebp
movl saved_context_ebx, %ebx
@@ -58,5 +67,5 @@ done:
movl saved_context_edi, %edi
 
pushl saved_context_eflags ; popfl
-   call swsusp_restore
+
ret
diff -Nrup linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S
--- linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S   2005-03-15 09:20:53.0 +0100
+++ linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S   2005-03-16 00:56:53.0 +0100
@@ -69,6 +69,15 @@ loop:
movq pbe_next(%rdx), %rdx
jmp loop
 done:
+   /* Flush TLB, including "global" things (vmalloc) */
+   movq mmu_cr4_features(%rip), %rax
+   movq %rax, %rdx
+   andq $~(1<<7), %rdx;  # PGE
+   movq %rdx, %cr4;  # turn off PGE
+   movq %cr3, %rcx;  # flush TLB
+   movq %rcx, %cr3
+   movq %rax, %cr4;  # turn PGE back on
+
movl $24, %eax
movl %eax, %ds
 
@@ -89,5 +98,5 @@ done:
movq saved_context_r14(%rip), %r14
movq saved_context_r15(%rip), %r15
pushq saved_context_eflags(%rip) ; popfq
-   call swsusp_restore
+
ret
diff -Nrup linux-2.6.11-bk10-a/kernel/power/swsusp.c linux-2.6.11-bk10-b/kernel/power/swsusp.c
--- linux-2.6.11-bk10-a/kernel/power/swsusp.c   2005-03-15 09:21:23.0 +0100
+++ linux-2.6.11-bk10-b/kernel/power/swsusp.c   2005-03-15 15:35:44.0 +0100
@@ -900,22 +900,13 @@ int swsusp_suspend(void)
error = swsusp_arch_suspend();
/* Restore control flow magically appears here */
restore_processor_state();
+   BUG_ON (nr_copy_pages_check != nr_copy_pages);
restore_highmem();
device_power_up();
local_irq_enable();
return error;
 }
 
-
-asmlinkage int swsusp_restore(void)
-{
-   BUG_ON (nr_copy_pages_check != nr_copy_pages);
-   
-   /* Even mappings of "global" things (vmalloc) need to be fixed */
-   __flush_tlb_global();
-   return 0;
-}
-
 int swsusp_resume(void)
 {
int error;

-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

- End forwarded message -

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


Re: [PATCH] Reading deterministic cache parameters and exporting it in /sysfs

2005-03-15 Thread Venkatesh Pallipadi
On Tue, Mar 15, 2005 at 06:36:20PM -0500, Dave Jones wrote:
> On Tue, Mar 15, 2005 at 03:24:48PM -0800, Venkatesh Pallipadi wrote:
>  >  
>  > The attached patch adds support for using cpuid(4) instead of cpuid(2), to get
>  > CPU cache information in a deterministic way for Intel CPUs, whenever
>  > supported. The details of cpuid(4) can be found here
>  >
>  > IA-32 Intel Architecture Software Developer's Manual (vol 2a)
>  > (http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)
>  > and
>  > Prescott New Instructions (PNI) Technology: Software Developer's Guide
>  > (http://www.intel.com/cd/ids/developer/asmo-na/eng/events/43988.htm)
>  >
>  > The advantage of using the cpuid(4) ('Deterministic Cache Parameters Leaf') are:
>  > * It provides more information than the descriptors provided by cpuid(2)
>  > * It is not table based as cpuid(2). So, we will not need changes to the
>  >   kernel to support new cache descriptors in the descriptor table (as is the
>  >   case with cpuid(2)).
>  >
>  > The patch also adds a bunch of interfaces under 
>  > /sys/devices/system/cpu/cpuX/cache, showing various information about the
>  > caches.
> 
> Why does this need to be in kernel-space ? 

Currently, the CPU cache information is printed as a part of kernel bootup
messages and /proc/cpuinfo using cpuid(2). This patch is trying to use cpuid(4)
to print the messages in these places. I think this part of the patch is
required. Otherwise, we may end up printing 0 cache sizes on some CPUs.
It will also reduce the zero_cache_size_complaints on lkml :-).

> Is there some reason that prevents
> you from enhancing x86info for example ?  I really want to live to see the
> death of /proc/cpuinfo one day, and reinventing it in sysfs seems pointless
> if it can all be done in userspace.
> Given that the most useful field is of limited use to a majority of users,
> and those that are interested can read this from userspace, this has me very 
> puzzled.

Agreed. Exporting it in /sysfs is debatable, and some of the information, such
as which CPUs are sharing which caches, may not be useful today. But with
HT, multiple cores, and combinations of the two sharing different caches,
having this information will be useful inside the kernel as well, in the
scheduler for example. We can set up some of the scheduler domain parameters
based on whether L2 is shared or not.
Also, we felt that exporting this information to userspace in a consistent way
will help userspace apps do various things, like binding to specific CPUs or
sizing the working set based on cache size, to optimize performance.
Again, this can be done in userspace as well. But if the kernel is already doing
it, it may be better to export it from kernel space.

Thanks,
Venki



Re: swsusp_restore crap

2005-03-15 Thread Rafael J. Wysocki
Hi,

On Wednesday, 16 of March 2005 00:39, Pavel Machek wrote:
> Hi!
> 
> > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > 
> > > > diff -Nrup linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S
> > > > --- linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S   2005-03-15 09:20:53.0 +0100
> > > > +++ linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S   2005-03-15 15:36:29.0 +0100
> > > > @@ -69,6 +69,14 @@ loop:
> > > > movq pbe_next(%rdx), %rdx
> > > > jmp loop
> > > >  done:
> > > > +   /* Flush TLB, including "global" things (vmalloc) */
> > > > +   movq %rax, %rdx;  # mmu_cr4_features(%rip)
> > > 
> > > I somehow don't think %rax contains mmu_cr4_features at this
> > > point. Otherwise it seems to look ok.
> > 
> > Yes, it does, because on x86-64 the TLBs are flushed before the loop,
> > right after %cr3 is loaded with init_level4_pgt.  %rax is not touched
> > afterwards, so it contains the right value.  Here's the relevant code
> > from suspend_asm.S (with the patch applied):
> 
> Well, it is mmu_cr4_features from "old" kernel, while you are flushing
> tlb in "new" kernel. It is probably same anyway, but %rax is
> commonly-used scratch register, and memory load is not that
> expensive. Can you just load it from memory?

Sure, revised patch follows.

Greets,
Rafael


Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

diff -Nrup linux-2.6.11-bk10-a/arch/i386/power/swsusp.S linux-2.6.11-bk10-b/arch/i386/power/swsusp.S
--- linux-2.6.11-bk10-a/arch/i386/power/swsusp.S   2005-03-15 09:20:53.0 +0100
+++ linux-2.6.11-bk10-b/arch/i386/power/swsusp.S   2005-03-15 15:37:25.0 +0100
@@ -51,6 +51,15 @@ copy_loop:
.p2align 4,,7
 
 done:
+   /* Flush TLB, including "global" things (vmalloc) */
+   movl mmu_cr4_features, %eax
+   movl %eax, %edx
+   andl $~(1<<7), %edx;  # PGE
+   movl %edx, %cr4;  # turn off PGE
+   movl %cr3, %ecx;  # flush TLB
+   movl %ecx, %cr3
+   movl %eax, %cr4;  # turn PGE back on
+
movl saved_context_esp, %esp
movl saved_context_ebp, %ebp
movl saved_context_ebx, %ebx
@@ -58,5 +67,5 @@ done:
movl saved_context_edi, %edi
 
pushl saved_context_eflags ; popfl
-   call swsusp_restore
+
ret
diff -Nrup linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S
--- linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S   2005-03-15 09:20:53.0 +0100
+++ linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S   2005-03-16 00:56:53.0 +0100
@@ -69,6 +69,15 @@ loop:
movq pbe_next(%rdx), %rdx
jmp loop
 done:
+   /* Flush TLB, including "global" things (vmalloc) */
+   movq mmu_cr4_features(%rip), %rax
+   movq %rax, %rdx
+   andq $~(1<<7), %rdx;  # PGE
+   movq %rdx, %cr4;  # turn off PGE
+   movq %cr3, %rcx;  # flush TLB
+   movq %rcx, %cr3
+   movq %rax, %cr4;  # turn PGE back on
+
movl $24, %eax
movl %eax, %ds
 
@@ -89,5 +98,5 @@ done:
movq saved_context_r14(%rip), %r14
movq saved_context_r15(%rip), %r15
pushq saved_context_eflags(%rip) ; popfq
-   call swsusp_restore
+
ret
diff -Nrup linux-2.6.11-bk10-a/kernel/power/swsusp.c linux-2.6.11-bk10-b/kernel/power/swsusp.c
--- linux-2.6.11-bk10-a/kernel/power/swsusp.c   2005-03-15 09:21:23.0 +0100
+++ linux-2.6.11-bk10-b/kernel/power/swsusp.c   2005-03-15 15:35:44.0 +0100
@@ -900,22 +900,13 @@ int swsusp_suspend(void)
error = swsusp_arch_suspend();
/* Restore control flow magically appears here */
restore_processor_state();
+   BUG_ON (nr_copy_pages_check != nr_copy_pages);
restore_highmem();
device_power_up();
local_irq_enable();
return error;
 }
 
-
-asmlinkage int swsusp_restore(void)
-{
-   BUG_ON (nr_copy_pages_check != nr_copy_pages);
-   
-   /* Even mappings of "global" things (vmalloc) need to be fixed */
-   __flush_tlb_global();
-   return 0;
-}
-
 int swsusp_resume(void)
 {
int error;

-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"


Re: [PATCH] CONFIG_PM for ppc64, to allow sysrq o

2005-03-15 Thread Benjamin Herrenschmidt
On Tue, 2005-03-15 at 22:26 +0100, Olaf Hering wrote:
> For some weird reason, sysrq o is hidden behind CONFIG_PM.
> Why? One can power off just fine without that. Can pm_sysrq_init be
> moved to a better place? I think it used to be in sysrq.c in 2.4.
> 
> Too bad, with this patch radeonfb fails to compile.

Hehe :)

ppc64 isn't yet ready for CONFIG_PM, though I have some
hacks-in-progress ...

Ben.




RE: [PATCH 1/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Nguyen, Tom L
Tuesday, March 15, 2005 2:51 PM Linas Vepstas wrote:
>> +void hw_aer_unregister(void)
>> +{
>> +struct pci_dev *dev = (struct pci_dev*)host->dev;
>> +unsigned short id;
>> +
>> +id = (dev->bus->number << 8) | dev->devfn;
>> +
>> +/* Unregister with AER Root driver */
>> +pcie_aer_unregister(id);
>> +}
>
>I don't understand how this can work on a system with 
>more than one domain.  On any midrange/high-end system, 
>you'll have a number of devices with identical values
>for (bus->number << 8) | devfn)

Good catch! I forgot to account for multiple segments. However, based on
LKML feedback asking for a common interface in the pci_driver data
structure, it appears that pcie_aer_register and pcie_aer_unregister are
no longer required.
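
The collision Linas describes, and one way an identifier could avoid it, can
be sketched as follows. This is an illustration only: legacy_aer_id() mirrors
the (bus->number << 8) | devfn expression from the quoted patch, while
aer_device_id() and its 16/8/8-bit layout are hypothetical and not part of
any posted patch.

```c
#include <stdint.h>

/* Same packing as the quoted hw_aer_unregister(): bus and devfn only. */
uint16_t legacy_aer_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((unsigned int)bus << 8) | devfn);
}

/*
 * Hypothetical wider identifier: the PCI domain (segment) number goes
 * into the upper 16 bits, so devices with the same bus/devfn in
 * different domains no longer map to the same id.
 */
uint32_t aer_device_id(uint16_t domain, uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)domain << 16) | ((uint32_t)bus << 8) | devfn;
}
```

Two devices at bus 1, devfn 0x10 in domains 0 and 1 get the same legacy id
(0x0110) but distinct wide ids (0x00000110 and 0x00010110).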

Thanks,
Long


Re: Capabilities across execve

2005-03-15 Thread Chris Wright
* Alexander Nyberg ([EMAIL PROTECTED]) wrote:
> tis 2005-03-15 klockan 14:42 -0800 skrev Chris Wright:
> > It was meant to work with capabilities in the filesystem like setuid bits.
> > So the patches that have floated around from myself, Andy Lutomirski
> > and Alex Nyberg are attempts to make something half-way sane out of the
> > mess.  The trouble is then convincing yourself that it's not some way to
> > leak capabilities (esp. since some programs use the interface already,
> > like bind9).
> 
> Anyone who uses the current interfaces should not play with the
> inheritable flag, the text I looked at said it was specifically for
> execve. Thus if the application doesn't modify the inheritable mask
> things will look like it has always done. And it really should not mess
> with inheritable mask if it doesn't intend to, that would be a security
> bug.
> We really should be safe doing this.

That's one of the points.  Latent bugs getting triggered is what makes
the change deserving of being conservative.

> > All I can say is work is underway.  There's 3 different patches that
> > will get you to your goal.  I understand that it's a real pain right now.
> > One of the authors of the withdrawn draft has told me that the notion of
> > capabilities w/out filesystem support was considered effectively useless.
> > So, we're in uncharted territory.  BTW, thanks for reminding me of
> > scripts, I had been testing just C programs.
> 
> I wouldn't call it useless, retaining capabilities across execve +
> pam_cap is a very useful thing, on my machine I can give myself a few
> capabilities that have always annoyed me (iirc the database that wanted
> mlock as regular user would have been solved as well).

Yes, that's useful, but having 3 sets and complicated rules for
combining task and file based sets is not really necessary for that.

> Regarding fs attributes:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0211.0/0171.html
> 
> I can see useful scenarios of having the possibility of capabilities per
> inode (it appears the xattr way wins somewhat in the previous
> discussion).

It's how it should be done.

> Chris, have you seen any capabilities+xattr patches around?

http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.4-fcap/

thanks,
-chris
-- 
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net


Re: Bogus buffer length check in linux-2.6.11 read()

2005-03-15 Thread Robert Hancock
linux-os wrote:
> The attached file shows that the kernel thinks it's doing
> something helpful by checking the length of the input
> buffer for a read(). It will return "Bad Address" until
> the length is 1632 bytes.  Apparently the kernel thinks
> 1632 is a good length!

Likely because only 1632 bytes of memory are accessible after the start
of the buf buffer, and trying to read in more than that results in
copy_to_user failing to write some data.

> Did anybody consider the overhead necessary to do this
> and the fact that the kernel has no way of knowing if
> the pointer to the buffer is valid until it actually
> does the write. What was wrong with copy_to_user()?
> Why is there the additional bogus check?

What additional check?

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH] Add TPM hardware enablement driver

2005-03-15 Thread Kylene Jo Hall
Thanks for the helpful comments. I am working on a patch to address your
concerns, but I have a couple of questions.

On Wed, 2005-03-09 at 22:51 -0500, Jeff Garzik wrote:


> > +   down(&chip->timer_manipulation_mutex);
> > +   chip->time_expired = 0;
> > +   init_timer(&chip->device_timer);
> > +   chip->device_timer.function = tpm_time_expired;
> > +   chip->device_timer.expires = jiffies + 2 * 60 * HZ;
> > +   chip->device_timer.data = (unsigned long) &chip->time_expired;
> > +   add_timer(&chip->device_timer);
> 
> very wrong.  you init_timer() when you initialize 'chip'... once.  then 
> during the device lifetime you add/mod/del the timer.
> 
> calling init_timer() could lead to corruption of state.
> 
> > +   up(&chip->timer_manipulation_mutex);

When calling mod_timer and an occasional del_singleshot_timer_sync, is it
necessary to protect this with a mutex like I was doing, or not?


> > +   pci_dev_get(chip->pci_dev);
> > +
> > +   spin_unlock(_lock);
> > +
> > +   chip->data_buffer = kmalloc(TPM_BUFSIZE * sizeof(u8), GFP_KERNEL);
> > +   if (chip->data_buffer == NULL) {
> > +   chip->num_opens--;
> > +   pci_dev_put(chip->pci_dev);
> > +   return -ENOMEM;
> > +   }
> 
> what is the purpose of this pci_dev_get/put?  attempting to prevent 
> hotplug or something?

We were doing reference counting since there is a pci_dev in the chip
structure, which is set as the file->private_data pointer. Is that not correct?

> 
> > +   }
> > +
> > +   down(&chip->buffer_mutex);
> > +
> > +   if (in_size > TPM_BUFSIZE)
> > +   in_size = TPM_BUFSIZE;
> > +
> > +   if (copy_from_user
> > +   (chip->data_buffer, (void __user *) buf, in_size)) {
> > +   up(&chip->buffer_mutex);
> > +   return -EFAULT;
> > +   }
> > +
> > +   /* atomic tpm command send and result receive */
> > +   out_size = tpm_transmit(chip, chip->data_buffer, TPM_BUFSIZE);
> 
> major bug?  in_size may be smaller than TPM_BUFSIZE
> 
Yes, in_size might be smaller, but chip->data_buffer will always be this size
since it is kmalloc'd in open.  The operation needs to be atomic, so the
tpm_transmit function sends the command to the device and receives the result,
which might be bigger than in_size.  The result is reported back to userspace
from this buffer on a read.

> > +
> > +ssize_t tpm_read(struct file * file, char __user * buf,
> > +size_t size, loff_t * off)
> > +{
> > +   struct tpm_chip *chip = file->private_data;
> > +   int ret_size = -ENODATA;
> > +
> > +   if (atomic_read(&chip->data_pending) != 0) {/* Result available */
> > +   down(&chip->timer_manipulation_mutex);
> > +   del_singleshot_timer_sync(&chip->user_read_timer);
> > +   up(&chip->timer_manipulation_mutex);
> > +
> > +   down(&chip->buffer_mutex);
> > +
> > +   ret_size = atomic_read(&chip->data_pending);
> > +   atomic_set(&chip->data_pending, 0);
> > +
> > +   if (ret_size == 0)  /* timeout just occurred */
> > +   ret_size = -ETIME;
> > +   else if (ret_size > 0) {/* relay data */
> > +   if (size < ret_size)
> > +   ret_size = size;
> > +
> > +   if (copy_to_user((void __user *) buf,
> > +chip->data_buffer, ret_size)) {
> > +   ret_size = -EFAULT;
> > +   }
> > +   }
> > +   up(&chip->buffer_mutex);
> > +   }
> > +
> > +   return ret_size;
> 
> POSIX violation -- when there is no data available, returning a 
> non-standard error is silly
> 
So read should just return 0 if no data available?


Thanks,
Kylie



Re: Taking strlen of buffers copied from userspace

2005-03-15 Thread Robert Hancock
Artem Frolov wrote:
> Hello,
>
> I am in the process of testing a static defect analyzer on the Linux
> kernel source code (see disclosure below).
>
> I found some potential array bounds violations. The pattern is as
> follows: bytes are copied from the user space and then buffer is
> accessed on index strlen(buf)-1. This is a defect if user data start
> from 0. So the question is: can we make any assumptions what data may
> be received from the user or it could be arbitrary?

In general I don't think any such assumptions should be made. In the
case of the two below I'm assuming that root access is required to write
those files, preventing any serious security hole, but it shouldn't
really be permitted to corrupt kernel memory like this, as would likely
happen if somebody wrote some data that contained a null as the first
character.

> For example, in ./drivers/block/cciss.c, function cciss_proc_write
> (line numbers are taken from 2.6.11.3):
>
>    293  if (count > sizeof(cmd)-1) return -EINVAL;
>    294  if (copy_from_user(cmd, buffer, count)) return -EFAULT;
>    295  cmd[count] = '\0';
>    296  len = strlen(cmd);  // above 3 lines ensure safety
>    297  if (cmd[len-1] == '\n')
>    298  cmd[--len] = '\0';
>    .
> Another example is arch/i386/kernel/cpu/mtrr/if.c, function mtrr_write:
>
>    107  if (copy_from_user(line, buf, len - 1))
>    108  return -EFAULT;
>    109  ptr = line + strlen(line) - 1;
>    110  if (*ptr == '\n')
>    111  *ptr = '\0';

This one is also unsafe if somebody writes some data which is not
null-terminated (assuming that that's possible), since strlen will run
off the end of the buffer. The first example doesn't have that problem.
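
A defensive version of the "terminate, then strip a trailing newline" pattern
from the two quoted snippets might look like the sketch below. It is plain
userspace C written for illustration, not code from either driver:
terminate_and_chomp() is a hypothetical helper, and buf is assumed to have
already been filled with count bytes of untrusted data (the step that
copy_from_user() performs in the kernel).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: NUL-terminate a buffer holding `count` bytes of
 * untrusted data and strip one trailing newline.  Returns the resulting
 * string length, or (size_t)-1 if the data leaves no room for the
 * terminator.
 */
size_t terminate_and_chomp(char *buf, size_t bufsize, size_t count)
{
    size_t len;

    if (bufsize == 0 || count > bufsize - 1)
        return (size_t)-1;

    buf[count] = '\0';      /* strlen() can no longer run off the end */
    len = strlen(buf);      /* 0 if the data starts with a NUL byte */

    /* Only index buf[len - 1] when len is non-zero. */
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';

    return len;
}
```

The explicit terminator covers the unterminated-input case raised for
mtrr_write, and the len > 0 check covers input whose first byte is NUL, where
buf[len - 1] would otherwise read one byte before the buffer.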

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: 2.6.11-mm3: BUG: atomic counter underflow at: rpcauth_destroy

2005-03-15 Thread Trond Myklebust
ty den 15.03.2005 Klokka 23:21 (+0100) skreiv Borislav Petkov:

> After some rookie debugging I think I've found the evildoer:
> 
> rpcauth_create used to have a line that inits rpc_auth->au_count to one
> atomically. This line is now missing so when you release the rpc
> authentication handle, the au_count underflows. Here's a fix:
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> 
> --- net/sunrpc/auth.c.orig 2005-03-15 22:34:58.0 +0100
> +++ net/sunrpc/auth.c 2005-03-15 22:36:23.0 +0100
> @@ -70,6 +70,7 @@ rpcauth_create(rpc_authflavor_t pseudofl
>   auth = ops->create(clnt, pseudoflavor);
>   if (!auth)
>return NULL;
> + atomic_set(&auth->au_count, 1);
>   if (clnt->cl_auth)
>rpcauth_destroy(clnt->cl_auth);
>   clnt->cl_auth = auth;

The correct fix for this has already been committed to Linus' bitkeeper
repository. See

http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]|[EMAIL PROTECTED]

Cheers,
  Trond
-- 
Trond Myklebust <[EMAIL PROTECTED]>



Re: Capabilities across execve

2005-03-15 Thread Alexander Nyberg
tis 2005-03-15 klockan 14:42 -0800 skrev Chris Wright:
> * Russell King ([EMAIL PROTECTED]) wrote:
> > At some point, I decided I'd like to run a certain program non-root
> > with certain capabilities only.  I looked at the above two programs
> > and stupidly thought they'd actually allow me to do this.
> > 
> > However, the way the kernel is setup today, this seems impossible to
> > achieve, which tends to make the whole idea of capabilities completely
> > and utterly useless.
> 
> Yes, the only value of capabilities right now is for a single program
> that starts off as root with full caps to drop uid and caps.
> 
> > How is this stuff supposed to work?  Are my ideas of what's supposed
> > to be achievable completely wrong, although they look completely
> > reasonable to me.
> 
> It was meant to work with capabilities in the filesystem like setuid bits.
> So the patches that have floated around from myself, Andy Lutomirski
> and Alex Nyberg are attempts to make something half-way sane out of the
> mess.  The trouble is then convincing yourself that it's not some way to
> leak capabilities (esp. since some programs use the interface already,
> like bind9).

Anyone who uses the current interfaces should not play with the
inheritable flag; the text I looked at said it was specifically for
execve. Thus if the application doesn't modify the inheritable mask,
things will behave as they always have. And it really should not mess
with the inheritable mask if it doesn't intend to; that would be a
security bug.
We really should be safe doing this.

> > Don't get me wrong - the capability system seems great at permanently
> > revoking capabilities via /proc/sys/kernel/cap-bound, and dropping
> > them within an application provided it remains UID0.  Apart from that,
> > capabilities seem completely useless.
> 
> It doesn't have to remain uid0.  That's what the prctl PR_SET_KEEPCAPS does.
> But it's not a nice interface, nor simple to use (for example, it'll
> drop the effective set, so you have to reinstate them).
>
>
> > Don't get me wrong - the current behaviour is secure.  But it's so
> > secure that it gets in the way of things which should appear to work.
> > 
> > I forget precisely what I wanted to achieve with this, and why I
> > couldn't just make the program do it itself...  It may have been a
> > script running from cron periodically which needed just one or two
> > capabilities in order to operate, rather than the whole truck load
> > you get by running it as root.  What I do remember is that my goal
> > of running the script with minimal capabilities was completely
> > *impossible* to achieve.
> 
> All I can say is work is underway.  There's 3 different patches that
> will get you to your goal.  I understand that it's a real pain right now.
> One of the authors of the withdrawn draft has told me that the notion of
> capabilities w/out filesystem support was considered effectively useless.
> So, we're in uncharted territory.  BTW, thanks for reminding me of
> scripts, I had been testing just C programs.

I wouldn't call it useless. Retaining capabilities across execve +
pam_cap is a very useful thing: on my machine I can give myself a few
capabilities that have always annoyed me (iirc the database that wanted
mlock as a regular user would have been solved as well).

Regarding fs attributes:
http://www.ussg.iu.edu/hypermail/linux/kernel/0211.0/0171.html

I can see useful scenarios for having the possibility of capabilities per
inode (it appears the xattr way wins somewhat in the previous
discussion).

Chris, have you seen any capabilities+xattr patches around?
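
For reference, the PR_SET_KEEPCAPS sequence Chris describes above (keep the
permitted set across the uid change, then reinstate the effective set by
hand) can be sketched roughly as follows. This is only a sketch against
current kernel headers, not code from any of the patches discussed here.
The helper names enable_keepcaps() and drop_root_keep_cap() are made up for
illustration, and group IDs and supplementary groups are deliberately left
alone.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Set the "keep capabilities" flag and read it back.
 * Returns the flag value (1 on success) or -1 on error. */
int enable_keepcaps(void)
{
    if (prctl(PR_SET_KEEPCAPS, 1L, 0L, 0L, 0L) != 0)
        return -1;
    return prctl(PR_GET_KEEPCAPS, 0L, 0L, 0L, 0L);
}

/* Hypothetical helper: start as root, end up as `uid` holding only `cap`.
 * Returns 0 on success, -1 on failure (for example when not run as root). */
int drop_root_keep_cap(uid_t uid, int cap)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[2];

    if (cap < 0 || cap >= 64)
        return -1;
    if (enable_keepcaps() != 1)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* The uid change kept the permitted set but cleared the effective set,
     * so raise the one wanted capability again and drop the rest. */
    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0; /* the calling thread */
    data[cap / 32].permitted = 1u << (cap % 32);
    data[cap / 32].effective = 1u << (cap % 32);
    return (int)syscall(SYS_capset, &hdr, data);
}
```

A daemon would call something like drop_root_keep_cap(uid, CAP_NET_BIND_SERVICE)
once at startup. Note that none of this survives an execve, which is the
limitation this thread is about.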



Re: OOM problems with 2.6.11-rc4

2005-03-15 Thread Andrew Morton
Noah Meyerhans <[EMAIL PROTECTED]> wrote:
>
> Active:12382 inactive:280459 dirty:214 writeback:0 unstable:0 free:2299 
> slab:220221 mapped:12256 pagetables:122

Vast amounts of slab - presumably inode and dentries.

What sort of local filesystems are in use?

Can you take a copy of /proc/slabinfo when the backup has run for a while
and send it?

It's useful to run `watch -n1 cat /proc/meminfo', see what the various
caches are doing during the operation.

Also, run slabtop if you have it.  Or bloatmeter
(http://www.zip.com.au/~akpm/linux/patches/stuff/bloatmon and
http://www.zip.com.au/~akpm/linux/patches/stuff/bloatmeter).  The thing to
watch for here is the internal fragmentation of the slab caches:

dentry_cache:    76505KB    82373KB   92.87

93% is good.  Sometimes it gets much worse - very regular directory
patterns can trigger high fragmentation levels.
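
The percentage in that line is simple arithmetic. A small sketch, assuming
the first column is memory occupied by live objects and the second is memory
held by the cache (the function name is made up for illustration):

```c
/* Percentage of a slab cache's memory that is occupied by live objects.
 * A low value means heavy internal fragmentation. */
double slab_utilisation_pct(unsigned long used_kb, unsigned long allocated_kb)
{
    if (allocated_kb == 0)
        return 0.0;
    return 100.0 * (double)used_kb / (double)allocated_kb;
}
```

With the dentry_cache numbers above, slab_utilisation_pct(76505, 82373)
comes out just under 92.9, in line with the figure bloatmeter printed.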

Does increasing /proc/sys/vm/vfs_cache_pressure help?  If you're watching
/proc/meminfo you should be able to observe the effect of that upon the
Slab: figure.
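
If you would rather log the Slab: figure than watch it by eye, a sketch like
the following pulls it out of /proc/meminfo-style text. The function name is
hypothetical, and it assumes the usual "Slab:  <number> kB" line format.

```c
#include <stdio.h>
#include <string.h>

/* Return the "Slab:" value in kB from /proc/meminfo-style text,
 * or -1 if no such line is present. */
long slab_kb_from_meminfo(const char *text)
{
    const char *p = text;

    while (p != NULL && *p != '\0') {
        long kb;

        /* Only a line that starts with "Slab:" can match this format. */
        if (sscanf(p, "Slab: %ld kB", &kb) == 1)
            return kb;
        p = strchr(p, '\n');
        if (p != NULL)
            p++; /* step past the newline to the next line */
    }
    return -1;
}
```

Read /proc/meminfo into a buffer before and after changing
vfs_cache_pressure and compare the two values.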



Re: NFS client bug in 2.6.8-2.6.11

2005-03-15 Thread Neil Conway
Hi Bernardo (et al).  Apologies - I've not been reading my account for
a wee while.  Then again, I probably don't have much useful to add to
the debate right now ;-)

--- Bernardo Innocenti <[EMAIL PROTECTED]> wrote:
> Anders Saaby wrote:
> > Anyways if your server has only run with 2.6.10 - try 2.6.11.
> 
> Thank you, I've finally nailed it down by upgrading the
> *server* kernel from 2.6.10-1.770_FC3 to 2.6.10-1.770_FC3.

Hmm, I will infer from a previous email you sent that you mean 766_FC3
for the "from" kernel.

> The latter is basically 2.6.10-ac12 plus a bunch of vendor
> specific patches.

766 -> 770 sounds like a "small" (ish) number of patches to check, if
we're lucky.  Did you wade through 'em all yet?  Any smoking guns?

Regards,
Neil
PS: oh bugger, just remembered that I also reproduced my bug with a
2.6.8 kernel on the server; admittedly though it was an FC2 kernel so
who knows what extra patches it had.






Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

2005-03-15 Thread Pavel Machek
On Tue 15-03-05 15:42:09, john stultz wrote:
> On Tue, 2005-03-15 at 23:59 +0100, Pavel Machek wrote:
> > > diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> > > --- a/arch/i386/kernel/apm.c  2005-03-11 17:02:30 -08:00
> > > +++ b/arch/i386/kernel/apm.c  2005-03-11 17:02:30 -08:00
> > > @@ -224,6 +224,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  #include 
> > >  #include 
> > > @@ -1204,6 +1205,7 @@
> > >   device_suspend(PMSG_SUSPEND);
> > >   device_power_down(PMSG_SUSPEND);
> > >  
> > > + timeofday_suspend_hook();
> > >   /* serialize with the timer interrupt */
> > >   write_seqlock_irq(_lock);
> > >  
> > 
> > Could you just register timeofday subsystem as a system device? Then
> > device_power_down will call you automagically. And you'll not have
> > to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...
> 
> That may very well be the right way to go. At the moment I'm just very
> hesitant of making any user-visible changes.
> 
> What is the impact if a new system device name is created and then I
> later change it? How stable is that interface supposed to be?

Changing its name is okay... your device probably will not have any
user-accessible controls, right?
Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

2005-03-15 Thread john stultz
On Tue, 2005-03-15 at 23:59 +0100, Pavel Machek wrote:
> > diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> > --- a/arch/i386/kernel/apm.c  2005-03-11 17:02:30 -08:00
> > +++ b/arch/i386/kernel/apm.c  2005-03-11 17:02:30 -08:00
> > @@ -224,6 +224,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -1204,6 +1205,7 @@
> > device_suspend(PMSG_SUSPEND);
> > device_power_down(PMSG_SUSPEND);
> >  
> > +   timeofday_suspend_hook();
> > /* serialize with the timer interrupt */
> > write_seqlock_irq(_lock);
> >  
> 
> Could you just register timeofday subsystem as a system device? Then
> device_power_down will call you automagically. And you'll not have
> to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...

That may very well be the right way to go. At the moment I'm just very
hesitant of making any user-visible changes.

What is the impact if a new system device name is created and then I
later change it? How stable is that interface supposed to be?

thanks
-john



Re: [BUG] 2.6.11- sym53c8xx Broken on pp64

2005-03-15 Thread Benjamin Herrenschmidt
On Tue, 2005-03-15 at 09:54 -0600, Omkhar Arasaratnam wrote:
> Benjamin Herrenschmidt wrote:

> The 2.6.11.3 kernel with the 2.6.10 driver seems to fail with the same 
> sym2 driver error - so I suppose it goes deeper than the driver itself.
> 

Let's move that to linuxppc64-dev and drop the CC-list. Last message on
this thread.

Ben.




Re: [PATCH] Add freezer call in

2005-03-15 Thread Pavel Machek
Hi!

> This patch adds a freezer call to the slow path in __alloc_pages. It
> thus avoids freezing failures in low memory situations. Like the other
> patches, it has been in Suspend2 for longer than I can remember.

This one seems wrong.

What if someone does

down(_lock_needed_during_suspend);
kmalloc()

? If you freeze him during that allocation, you'll deadlock later...

Pavel


> Signed-of-by: Nigel Cunningham <[EMAIL PROTECTED]>
> 
> diff -ruNp 213-missing-refrigerator-calls-old/mm/page_alloc.c 213-missing-refrigerator-calls-new/mm/page_alloc.c
> --- 213-missing-refrigerator-calls-old/mm/page_alloc.c  2005-02-03 22:33:50.0 +1100
> +++ 213-missing-refrigerator-calls-new/mm/page_alloc.c  2005-03-16 09:01:28.0 +1100
> @@ -838,6 +838,7 @@ rebalance:
>   do_retry = 1;
>   }
>   if (do_retry) {
> + try_to_freeze(0);
>   blk_congestion_wait(WRITE, HZ/50);
>   goto rebalance;
>   }


-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


Re: swsusp_restore crap

2005-03-15 Thread Pavel Machek
Hi!

> > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > 
> > > diff -Nrup linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S
> > > --- linux-2.6.11-bk10-a/arch/x86_64/kernel/suspend_asm.S  2005-03-15 09:20:53.0 +0100
> > > +++ linux-2.6.11-bk10-b/arch/x86_64/kernel/suspend_asm.S  2005-03-15 15:36:29.0 +0100
> > > @@ -69,6 +69,14 @@ loop:
> > >   movq    pbe_next(%rdx), %rdx
> > >   jmp     loop
> > >  done:
> > > + /* Flush TLB, including "global" things (vmalloc) */
> > > + movq    %rax, %rdx;  # mmu_cr4_features(%rip)
> > 
> > I somehow don't think %rax contains mmu_cr4_features at this
> > point. Otherwise it seems to look ok.
> 
> Yes, it does, because on x86-64 the TLBs are flushed before the loop,
> right after %cr3 is loaded with init_level4_pgt.  %rax is not touched
> afterwards, so it contains the right value.  Here's the relevant code
> from suspend_asm.S (with the patch applied):

Well, it is mmu_cr4_features from "old" kernel, while you are flushing
tlb in "new" kernel. It is probably same anyway, but %rax is
commonly-used scratch register, and memory load is not that
expensive. Can you just load it from memory?
Pavel

> ENTRY(swsusp_arch_resume)
>   /* set up cr3 */
>   leaq    init_level4_pgt(%rip),%rax
>   subq    $__START_KERNEL_map,%rax
>   movq    %rax,%cr3
> 
>   movq    mmu_cr4_features(%rip), %rax
>   movq    %rax, %rdx
>   andq    $~(1<<7), %rdx  # PGE
>   movq    %rdx, %cr4;  # turn off PGE
>   movq    %cr3, %rcx;  # flush TLB
>   movq    %rcx, %cr3;
>   movq    %rax, %cr4;  # turn PGE back on
> 
>   movq    pagedir_nosave(%rip), %rdx
> loop:
>   testq   %rdx, %rdx
>   jz      done
> 
>   /* get addresses from the pbe and copy the page */
>   movq    pbe_address(%rdx), %rsi
>   movq    pbe_orig_address(%rdx), %rdi
>   movq    $512, %rcx
>   rep
>   movsq
> 
>   /* progress to the next pbe */
>   movq    pbe_next(%rdx), %rdx
>   jmp     loop
> done:
>   /* Flush TLB, including "global" things (vmalloc) */
>   movq    %rax, %rdx;  # mmu_cr4_features(%rip)
>   andq    $~(1<<7), %rdx;  # PGE
>   movq    %rdx, %cr4;  # turn off PGE
>   movq    %cr3, %rcx;  # flush TLB
>   movq    %rcx, %cr3
>   movq    %rax, %cr4;  # turn PGE back on
> 
> 
> Greets,
> Rafael
> 
> 

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


Re: [PATCH] Reading deterministic cache parameters and exporting it in /sysfs

2005-03-15 Thread Dave Jones
On Tue, Mar 15, 2005 at 03:24:48PM -0800, Venkatesh Pallipadi wrote:
 >  
 > The attached patch adds support for using cpuid(4) instead of cpuid(2), to
 > get CPU cache information in a deterministic way for Intel CPUs, whenever
 > supported. The details of cpuid(4) can be found here
 > 
 > IA-32 Intel Architecture Software Developer's Manual (vol 2a)
 > (http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)
 > and
 > Prescott New Instructions (PNI) Technology: Software Developer's Guide
 > (http://www.intel.com/cd/ids/developer/asmo-na/eng/events/43988.htm)
 >  
 > The advantage of using the cpuid(4) ('Deterministic Cache Parameters Leaf')
 > are:
 > * It provides more information than the descriptors provided by cpuid(2)
 > * It is not table based as cpuid(2). So, we will not need changes to the
 >   kernel to support new cache descriptors in the descriptor table (as is the
 >   case with cpuid(2)).
 >  
 > The patch also adds a bunch of interfaces under 
 > /sys/devices/system/cpu/cpuX/cache, showing various information about the
 > caches.

Why does this need to be in kernel-space ? Is there some reason that prevents
you from enhancing x86info for example ?  I really want to live to see the
death of /proc/cpuinfo one day, and reinventing it in sysfs seems pointless
if it can all be done in userspace.
 
 > Most useful field being shared_cpu_map, which says what caches are 
 > shared among which logical cpus. 

Given that the most useful field is of limited use to a majority of users,
and those that are interested can read this from userspace, this has me very 
puzzled.

Dave



Re: [PATCH] Make md thread NO_FREEZE.

2005-03-15 Thread Pavel Machek
Hi!

> The md driver is currently frozen during suspend. I'm told this
> doesn't help much if you're seeking to suspend to RAID :>

Hmm, and does suspend actually work on md with this patch applied?

Pavel

> diff -ruNp 213-missing-refrigerator-calls-old/drivers/md/md.c 213-missing-refrigerator-calls-new/drivers/md/md.c
> --- 213-missing-refrigerator-calls-old/drivers/md/md.c  2005-02-14 09:05:26.0 +1100
> +++ 213-missing-refrigerator-calls-new/drivers/md/md.c  2005-03-11 09:35:15.0 +1100
> @@ -2763,6 +2762,7 @@ int md_thread(void * arg)
>*/
>  
>   daemonize(thread->name, mdname(thread->mddev));
> + current->flags |= PF_NOFREEZE;
>  
>   current->exit_signal = SIGCHLD;
>   allow_signal(SIGKILL);
> @@ -2787,8 +2787,6 @@ int md_thread(void * arg)
>  
>   wait_event_interruptible(thread->wqueue,
>test_bit(THREAD_WAKEUP, 
> >flags));
> - if (current->flags & PF_FREEZE)
> - refrigerator(PF_FREEZE);
>  
>   clear_bit(THREAD_WAKEUP, >flags);
>  
>  
> 

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


swsusp: Add missing refrigerator calls

2005-03-15 Thread Pavel Machek
Hi!

This adds a few more places where it is possible to freeze kernel
threads. Please apply,
Pavel

From: Nigel Cunningham <[EMAIL PROTECTED]>
Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>

diff -ruNp 213-missing-refrigerator-calls-old/drivers/media/video/msp3400.c 
213-missing-refrigerator-calls-new/drivers/media/video/msp3400.c
--- 213-missing-refrigerator-calls-old/drivers/media/video/msp3400.c
2005-03-16 10:10:49.0 +1100
+++ 213-missing-refrigerator-calls-new/drivers/media/video/msp3400.c
2005-03-16 10:10:53.0 +1100
@@ -734,6 +734,7 @@ static int msp34xx_sleep(struct msp3400c
 {
DECLARE_WAITQUEUE(wait, current);
 
+again:
add_wait_queue(>wq, );
if (!kthread_should_stop()) {
if (timeout < 0) {
@@ -749,9 +750,12 @@ static int msp34xx_sleep(struct msp3400c
 #endif
}
}
-   if (current->flags & PF_FREEZE)
-   refrigerator(PF_FREEZE);
+
remove_wait_queue(>wq, );
+   
+   if (try_to_freeze(PF_FREEZE))
+   goto again;
+
return msp->restart;
 }
 
diff -ruNp 213-missing-refrigerator-calls-old/drivers/media/video/tvaudio.c 
213-missing-refrigerator-calls-new/drivers/media/video/tvaudio.c
--- 213-missing-refrigerator-calls-old/drivers/media/video/tvaudio.c
2005-02-03 22:33:29.0 +1100
+++ 213-missing-refrigerator-calls-new/drivers/media/video/tvaudio.c
2005-03-11 09:35:15.0 +1100
@@ -286,6 +286,7 @@ static int chip_thread(void *data)
schedule();
}
remove_wait_queue(>wq, );
+   try_to_freeze(PF_FREEZE);
if (chip->done || signal_pending(current))
break;
dprintk("%s: thread wakeup\n", i2c_clientname(>c));
diff -ruNp 213-missing-refrigerator-calls-old/drivers/pnp/pnpbios/core.c 
213-missing-refrigerator-calls-new/drivers/pnp/pnpbios/core.c
--- 213-missing-refrigerator-calls-old/drivers/pnp/pnpbios/core.c   
2005-02-14 09:05:26.0 +1100
+++ 213-missing-refrigerator-calls-new/drivers/pnp/pnpbios/core.c   
2005-03-11 09:35:15.0 +1100
@@ -180,8 +180,12 @@ static int pnp_dock_thread(void * unused
 * Poll every 2 seconds
 */
msleep_interruptible(2000);
-   if(signal_pending(current))
+   
+   if(signal_pending(current)) {
+   if (try_to_freeze(PF_FREEZE))
+   continue;
break;
+   }
 
status = pnp_bios_dock_station_info();
 
diff -ruNp 213-missing-refrigerator-calls-old/fs/afs/kafsasyncd.c 
213-missing-refrigerator-calls-new/fs/afs/kafsasyncd.c
--- 213-missing-refrigerator-calls-old/fs/afs/kafsasyncd.c  2005-02-03 
22:33:40.0 +1100
+++ 213-missing-refrigerator-calls-new/fs/afs/kafsasyncd.c  2005-03-11 
09:35:15.0 +1100
@@ -116,6 +116,8 @@ static int kafsasyncd(void *arg)
remove_wait_queue(_sleepq, );
set_current_state(TASK_RUNNING);
 
+   try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
afs_discard_my_signals();
 
diff -ruNp 213-missing-refrigerator-calls-old/fs/afs/kafstimod.c 
213-missing-refrigerator-calls-new/fs/afs/kafstimod.c
--- 213-missing-refrigerator-calls-old/fs/afs/kafstimod.c   2005-02-03 
22:33:40.0 +1100
+++ 213-missing-refrigerator-calls-new/fs/afs/kafstimod.c   2005-03-11 
09:35:15.0 +1100
@@ -91,6 +91,8 @@ static int kafstimod(void *arg)
complete_and_exit(_dead, 0);
}
 
+   try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
afs_discard_my_signals();
 
diff -ruNp 213-missing-refrigerator-calls-old/fs/lockd/clntproc.c 
213-missing-refrigerator-calls-new/fs/lockd/clntproc.c
--- 213-missing-refrigerator-calls-old/fs/lockd/clntproc.c  2004-12-10 
14:27:10.0 +1100
+++ 213-missing-refrigerator-calls-new/fs/lockd/clntproc.c  2005-03-11 
09:35:15.0 +1100
@@ -312,6 +312,7 @@ static int nlm_wait_on_grace(wait_queue_
prepare_to_wait(queue, , TASK_INTERRUPTIBLE);
if (!signalled ()) {
schedule_timeout(NLMCLNT_GRACE_WAIT);
+   try_to_freeze(PF_FREEZE);
if (!signalled ())
status = 0;
}
diff -ruNp 213-missing-refrigerator-calls-old/kernel/signal.c 
213-missing-refrigerator-calls-new/kernel/signal.c
--- 213-missing-refrigerator-calls-old/kernel/signal.c  2005-03-16 
10:10:48.0 +1100
+++ 213-missing-refrigerator-calls-new/kernel/signal.c  2005-03-16 
10:10:41.0 +1100
@@ -2201,6 +2201,8 @@ sys_rt_sigtimedwait(const sigset_t __use
current->state = TASK_INTERRUPTIBLE;
timeout = 

Re: drm lockups since 2.6.11-bk2

2005-03-15 Thread Andrew Morton
Jesse Barnes <[EMAIL PROTECTED]> wrote:
>
> > We're hoping that davem's fix (committed yesterday) fixed that.
> >
> >
> > ChangeSet 1.2181.1.2, 2005/03/14 21:16:17-08:00, [EMAIL PROTECTED]
> >
> >  [MM]: Restore pgd_index() iteration to clear_page_range().
> 
> Yep, seems to have worked (at least my system boots).

It causes ppc64 to oops unpleasantly so we're not quite there yet.


[PATCH] Reading deterministic cache parameters and exporting it in /sysfs

2005-03-15 Thread Venkatesh Pallipadi
 
The attached patch adds support for using cpuid(4) instead of cpuid(2), to get 
CPU cache information in a deterministic way for Intel CPUs, whenever 
supported. The details of cpuid(4) can be found here

IA-32 Intel Architecture Software Developer's Manual (vol 2a)
(http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)
and
Prescott New Instructions (PNI) Technology: Software Developer's Guide
(http://www.intel.com/cd/ids/developer/asmo-na/eng/events/43988.htm)
 
The advantage of using the cpuid(4) ('Deterministic Cache Parameters Leaf') are:
* It provides more information than the descriptors provided by cpuid(2)
* It is not table based as cpuid(2). So, we will not need changes to the 
  kernel to support new cache descriptors in the descriptor table (as is the 
  case with cpuid(2)).
 
The patch also adds a bunch of interfaces under 
/sys/devices/system/cpu/cpuX/cache, showing various information about the
caches. Most useful field being shared_cpu_map, which says what caches are 
shared among which logical cpus. 

The patch adds support for both i386 and x86-64.
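
As a rough illustration of what the cpuid(4) leaf gives you, the sketch
below decodes the EBX and ECX values into a cache geometry and derives the
size from them. It uses the same field widths as the unions in the patch
(12, 10 and 10 bits in EBX) and the convention that each field is stored as
"value minus one". The struct and function names here are made up for this
example and are not the ones used in the patch.

```c
#include <stdint.h>

/* Cache geometry decoded from CPUID leaf 4 (illustrative names). */
struct cache_geometry {
    uint32_t line_size;   /* coherency line size in bytes */
    uint32_t partitions;  /* physical line partitions */
    uint32_t ways;        /* ways of associativity */
    uint64_t sets;        /* number of sets */
};

/* Every field is reported as (value - 1), so add one back. */
struct cache_geometry decode_cpuid4(uint32_t ebx, uint32_t ecx)
{
    struct cache_geometry g;

    g.line_size  = (ebx & 0xfff) + 1;          /* EBX bits 11:0  */
    g.partitions = ((ebx >> 12) & 0x3ff) + 1;  /* EBX bits 21:12 */
    g.ways       = ((ebx >> 22) & 0x3ff) + 1;  /* EBX bits 31:22 */
    g.sets       = (uint64_t)ecx + 1;          /* ECX bits 31:0  */
    return g;
}

/* Total size is the product of the four decoded values. */
uint64_t cache_size_bytes(const struct cache_geometry *g)
{
    return (uint64_t)g->line_size * g->partitions * g->ways * g->sets;
}
```

For example, EBX = 0x01C0003F with ECX = 63 decodes to 64-byte lines,
1 partition, 8 ways and 64 sets, i.e. a 32 KB cache.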

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>


--- linux-2.6.11/include/asm-i386/processor.h.org   2005-03-14 13:27:34.0 -0800
+++ linux-2.6.11/include/asm-i386/processor.h   2005-03-14 20:33:39.0 -0800
@@ -147,6 +147,18 @@ static inline void cpuid(int op, int *ea
: "0" (op), "c"(0));
 }
 
+/* Some CPUID calls want 'count' to be placed in ecx */
+static inline void cpuid_count(int op, int count, int *eax, int *ebx, int *ecx,
+   int *edx)
+{
+   __asm__("cpuid"
+   : "=a" (*eax),
+ "=b" (*ebx),
+ "=c" (*ecx),
+ "=d" (*edx)
+   : "0" (op), "c" (count));
+}
+
 /*
  * CPUID functions returning a single datum
  */
--- linux-2.6.11/include/asm-x86_64/msr.h.org   2005-03-14 13:27:47.0 -0800
+++ linux-2.6.11/include/asm-x86_64/msr.h   2005-03-14 20:33:39.0 -0800
@@ -78,6 +78,18 @@ extern inline void cpuid(int op, unsigne
: "0" (op));
 }
 
+/* Some CPUID calls want 'count' to be placed in ecx */
+static inline void cpuid_count(int op, int count, int *eax, int *ebx, int *ecx,
+   int *edx)
+{
+   __asm__("cpuid"
+   : "=a" (*eax),
+ "=b" (*ebx),
+ "=c" (*ecx),
+ "=d" (*edx)
+   : "0" (op), "c" (count));
+}
+
 /*
  * CPUID functions returning a single datum
  */
--- linux-2.6.11/arch/i386/kernel/cpu/intel_cacheinfo.c.org 2005-03-14 13:27:20.0 -0800
+++ linux-2.6.11/arch/i386/kernel/cpu/intel_cacheinfo.c 2005-03-15 13:57:30.0 -0800
@@ -1,5 +1,17 @@
+/*
+ *  Routines to indentify caches on Intel CPU.
+ *
+ *  Changes:
+ *  Venkatesh Pallipadi: Adding cache identification through cpuid(4)
+ */
+
 #include 
+#include 
+#include 
+#include 
+
 #include 
+#include 
 
 #define LVL_1_INST 1
 #define LVL_1_DATA 2
@@ -58,10 +70,142 @@ static struct _cache_table cache_table[]
{ 0x00, 0, 0}
 };
 
+
+enum _cache_type
+{
+   CACHE_TYPE_NULL = 0,
+   CACHE_TYPE_DATA = 1,
+   CACHE_TYPE_INST = 2,
+   CACHE_TYPE_UNIFIED = 3
+};
+
+union _cpuid4_leaf_eax {
+   struct {
+       enum _cache_type    type:5;
+       unsigned int        level:3;
+       unsigned int        is_self_initializing:1;
+       unsigned int        is_fully_associative:1;
+       unsigned int        reserved:4;
+       unsigned int        num_threads_sharing:12;
+       unsigned int        num_cores_on_die:6;
+   } split;
+   u32 full;
+};
+
+union _cpuid4_leaf_ebx {
+   struct {
+       unsigned int        coherency_line_size:12;
+       unsigned int        physical_line_partition:10;
+       unsigned int        ways_of_associativity:10;
+   } split;
+   u32 full;
+};
+
+union _cpuid4_leaf_ecx {
+   struct {
+       unsigned int        number_of_sets:32;
+   } split;
+   u32 full;
+};
+
+struct _cpuid4_info {
+   union _cpuid4_leaf_eax eax;
+   union _cpuid4_leaf_ebx ebx;
+   union _cpuid4_leaf_ecx ecx;
+   unsigned long size;
+   cpumask_t shared_cpu_map;
+};
+
+#define MAX_CACHE_LEAVES   4
+static unsigned short  num_cache_leaves;
+
+static int cpuid4_cache_lookup(int index, struct _cpuid4_info *this_leaf)
+{
+   unsigned int            eax, ebx, ecx, edx;
+   union _cpuid4_leaf_eax  cache_eax;
+
+   cpuid_count(4, index, , , , );
+   cache_eax.full = eax;
+   if (cache_eax.split.type == CACHE_TYPE_NULL)
+   return -1;
+
+   this_leaf->eax.full = eax;
+   this_leaf->ebx.full = ebx;
+   this_leaf->ecx.full = ecx;
+   this_leaf->size = (this_leaf->ecx.split.number_of_sets + 1) *
+

Re: [PATCH 1/2] No-exec support for ppc64

2005-03-15 Thread Jake Moilanen
On Wed, 16 Mar 2005 09:18:36 +1030
Alan Modra <[EMAIL PROTECTED]> wrote:

> On Tue, Mar 15, 2005 at 03:51:35PM -0600, Jake Moilanen wrote:
> > I believe the problem is that the last PT_LOAD entry does not have the
> > correct size, and we only mmap up to the sbss.  The .sbss, .plt, and
> > .bss do not get mmapped with the section.
> 
> Huh?  .sbss, .plt and .bss have no file contents, so of course p_filesz
> doesn't cover them.

You're right, those shouldn't be mmapped.

set_brk() is called on .sbss, .plt and .bss.  There needs to be some
method to set execute permission on those pieces as well.  Currently it
has no concept of what permission should be set.

Jake   


Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

2005-03-15 Thread Pavel Machek
Hi!

> diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> --- a/arch/i386/kernel/apm.c  2005-03-11 17:02:30 -08:00
> +++ b/arch/i386/kernel/apm.c  2005-03-11 17:02:30 -08:00
> @@ -224,6 +224,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1204,6 +1205,7 @@
>   device_suspend(PMSG_SUSPEND);
>   device_power_down(PMSG_SUSPEND);
>  
> + timeofday_suspend_hook();
>   /* serialize with the timer interrupt */
>   write_seqlock_irq(_lock);
>  

Could you just register timeofday subsystem as a system device? Then
device_power_down will call you automagically. And you'll not have
to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...

Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


[PATCH] (Revised) Add missing refrigerator calls

2005-03-15 Thread Nigel Cunningham
Hi again.

Okay. I'll leave mtd_blkdevs as NO_FREEZE and remove the superfluous
bluetooth addition.

Here's a revised version:

diff -ruNp 213-missing-refrigerator-calls-old/drivers/media/video/msp3400.c 
213-missing-refrigerator-calls-new/drivers/media/video/msp3400.c
--- 213-missing-refrigerator-calls-old/drivers/media/video/msp3400.c
2005-03-16 10:10:49.0 +1100
+++ 213-missing-refrigerator-calls-new/drivers/media/video/msp3400.c
2005-03-16 10:10:53.0 +1100
@@ -734,6 +734,7 @@ static int msp34xx_sleep(struct msp3400c
 {
DECLARE_WAITQUEUE(wait, current);
 
+again:
add_wait_queue(>wq, );
if (!kthread_should_stop()) {
if (timeout < 0) {
@@ -749,9 +750,12 @@ static int msp34xx_sleep(struct msp3400c
 #endif
}
}
-   if (current->flags & PF_FREEZE)
-   refrigerator(PF_FREEZE);
+
remove_wait_queue(>wq, );
+   
+   if (try_to_freeze(PF_FREEZE))
+   goto again;
+
return msp->restart;
 }
 
diff -ruNp 213-missing-refrigerator-calls-old/drivers/media/video/tvaudio.c 
213-missing-refrigerator-calls-new/drivers/media/video/tvaudio.c
--- 213-missing-refrigerator-calls-old/drivers/media/video/tvaudio.c
2005-02-03 22:33:29.0 +1100
+++ 213-missing-refrigerator-calls-new/drivers/media/video/tvaudio.c
2005-03-11 09:35:15.0 +1100
@@ -286,6 +286,7 @@ static int chip_thread(void *data)
schedule();
}
remove_wait_queue(>wq, );
+   try_to_freeze(PF_FREEZE);
if (chip->done || signal_pending(current))
break;
dprintk("%s: thread wakeup\n", i2c_clientname(>c));
diff -ruNp 213-missing-refrigerator-calls-old/drivers/pnp/pnpbios/core.c 
213-missing-refrigerator-calls-new/drivers/pnp/pnpbios/core.c
--- 213-missing-refrigerator-calls-old/drivers/pnp/pnpbios/core.c   
2005-02-14 09:05:26.0 +1100
+++ 213-missing-refrigerator-calls-new/drivers/pnp/pnpbios/core.c   
2005-03-11 09:35:15.0 +1100
@@ -180,8 +180,12 @@ static int pnp_dock_thread(void * unused
 * Poll every 2 seconds
 */
msleep_interruptible(2000);
-   if(signal_pending(current))
+   
+   if(signal_pending(current)) {
+   if (try_to_freeze(PF_FREEZE))
+   continue;
break;
+   }
 
status = pnp_bios_dock_station_info();
 
diff -ruNp 213-missing-refrigerator-calls-old/fs/afs/kafsasyncd.c 
213-missing-refrigerator-calls-new/fs/afs/kafsasyncd.c
--- 213-missing-refrigerator-calls-old/fs/afs/kafsasyncd.c  2005-02-03 
22:33:40.0 +1100
+++ 213-missing-refrigerator-calls-new/fs/afs/kafsasyncd.c  2005-03-11 
09:35:15.0 +1100
@@ -116,6 +116,8 @@ static int kafsasyncd(void *arg)
remove_wait_queue(_sleepq, );
set_current_state(TASK_RUNNING);
 
+   try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
afs_discard_my_signals();
 
diff -ruNp 213-missing-refrigerator-calls-old/fs/afs/kafstimod.c 
213-missing-refrigerator-calls-new/fs/afs/kafstimod.c
--- 213-missing-refrigerator-calls-old/fs/afs/kafstimod.c   2005-02-03 
22:33:40.0 +1100
+++ 213-missing-refrigerator-calls-new/fs/afs/kafstimod.c   2005-03-11 
09:35:15.0 +1100
@@ -91,6 +91,8 @@ static int kafstimod(void *arg)
complete_and_exit(_dead, 0);
}
 
+   try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
afs_discard_my_signals();
 
diff -ruNp 213-missing-refrigerator-calls-old/fs/lockd/clntproc.c 213-missing-refrigerator-calls-new/fs/lockd/clntproc.c
--- 213-missing-refrigerator-calls-old/fs/lockd/clntproc.c  2004-12-10 14:27:10.0 +1100
+++ 213-missing-refrigerator-calls-new/fs/lockd/clntproc.c  2005-03-11 09:35:15.0 +1100
@@ -312,6 +312,7 @@ static int nlm_wait_on_grace(wait_queue_
prepare_to_wait(queue, , TASK_INTERRUPTIBLE);
if (!signalled ()) {
schedule_timeout(NLMCLNT_GRACE_WAIT);
+   try_to_freeze(PF_FREEZE);
if (!signalled ())
status = 0;
}
diff -ruNp 213-missing-refrigerator-calls-old/kernel/signal.c 213-missing-refrigerator-calls-new/kernel/signal.c
--- 213-missing-refrigerator-calls-old/kernel/signal.c  2005-03-16 10:10:48.0 +1100
+++ 213-missing-refrigerator-calls-new/kernel/signal.c  2005-03-16 10:10:41.0 +1100
@@ -2201,6 +2201,8 @@ sys_rt_sigtimedwait(const sigset_t __use
current->state = TASK_INTERRUPTIBLE;
timeout = schedule_timeout(timeout);
 
+   if (current->flags & PF_FREEZE)
+   

Re: [PATCH 1/6] PCI Express Advanced Error Reporting Driver

2005-03-15 Thread Grant Grundler
On Tue, Mar 15, 2005 at 04:51:01PM -0600, Linas Vepstas wrote:
> Hi,
> 
> On Fri, Mar 11, 2005 at 04:12:18PM -0800, long was heard to remark:
> 
> > +void hw_aer_unregister(void)
> > +{
> > +   struct pci_dev *dev = (struct pci_dev*)host->dev;

I'm more nervous about "host" being defined as a global
instead of being passed in. I've not reviewed the
other code and don't know if that's safe.

> > +   unsigned short id;
> > +
> > +   id = (dev->bus->number << 8) | dev->devfn;
> > +   
> > +   /* Unregister with AER Root driver */
> > +   pcie_aer_unregister(id);
> > +}
> 
> I don't understand how this can work on a system with 
> more than one domain.  On any midrange/high-end system, 
> you'll have a number of devices with identical values
> for (bus->number << 8) | devfn)

Yes - this is an error reported within a particular domain.
I'm expecting host-> to refer to a particular domain.
Maybe it doesn't?

[ example deleted ]

> Or am I being stupid/dense/all-of-the-above?

Probably not.

grant

> 
> --linas


Re: [topic change] jiffies as a time value

2005-03-15 Thread George Anzinger
john stultz wrote:
On Mon, 2005-03-14 at 15:40 -0800, George Anzinger wrote:
john stultz wrote:
On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
+   /* finally, update legacy time values */
+   write_seqlock_irqsave(_lock, x_flags);
+   xtime = ns2timespec(system_time + wall_time_offset);
+   wall_to_monotonic = ns2timespec(wall_time_offset);
+   wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
+   wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
+   /* XXX - should jiffies be updated here? */
Excellent question. 
Indeed.  Currently jiffies is used as both an interrupt counter and a
time unit, and I'm trying to make it just the former. If I emulate it then
it stops functioning as an interrupt counter, and if I don't then I'll
probably break assumptions about jiffies being a time unit. So I'm not
sure which is the easiest path to go until all the users of jiffies are
audited for intent. 
Really?  Who counts interrupts???  The timer code treats jiffies as a unit of 
time.  You will need to rewrite that to make it otherwise.  

Ug. I'm thin on time this week, so I was hoping to save this discussion
for later, but I guess we can get into it now.
Well, assuming timer interrupts actually occur HZ times a second, yes
one could (and in current practice, one does) implicitly interpret jiffies
as being a valid notion of time.  However, with SMIs, bad drivers that
disable interrupts for too long, and virtualization, the reality is that
that assumption doesn't hold. 

We do have the lost-ticks compensation code that tries to help this, but
that conflicts with some virtualization implementations. Suspend/resume
tries to compensate jiffies for ticks missed over time suspended, but
I'm not sure how accurate it really is (additionally, looking at it now,
it assumes jiffies is only 32bits).
Add to that the whole "jiffies doesn't really increment at HZ, but at
ACTHZ" confusion and bad drivers that assume HZ=100, and we get a fair
amount of trouble stemming from folks using jiffies as a time value.
Because in reality, it is just an interrupt counter.
Well, currently, in x86 systems it causes wall clock to advance a very well 
defined amount.  That it is not exactly 1/HZ is something we need to live with...
So now, if new timeofday code emulates jiffies, we have to decide if it
emulates jiffies at HZ or ACTHZ? Also there could be issues with jiffies
possibly jittering from it being incremented every tick and then set to
the proper time when the timekeeping code runs. 
I think you're overlooking timers.  We have a given resolution for timers and some 
code, at least, expects timers to run with that resolution.  This REQUIRES 
interrupts at resolution frequency.  We can argue about what that interrupt 
event is called (currently a jiffies interrupt) and disparage the fact that 
hardware can not give us "nice" numbers for the resolution, but we do need the 
interrupts.  That there are bad places in the code where interrupts are delayed 
is not really important in this discussion.  For what it's worth, the RT patch 
Ingo is working on is getting latencies down in the 10s of microseconds region.

We also need, IMNSHO, to recognize that, at least with some hardware, that 
interrupt IS in fact the clock and is the only reasonable way we have of reading 
it.  This is true, for example, on the x86.  The TSC we use as a fill in for 
between interrupts is not stable in the long term and should only be used to 
interpolate over 1 to 10 ticks or so.
I'm not sure which is the best way to go, but it sounds like emulating
it is probably the easiest. I just deferred the question with a comment
until now because it's not completely obvious. Any suggestions on the
above questions? (I'm guessing the answers are: use ACTHZ, and the jitter
won't hurt that bad.) 


But then you have another problem.  To function correctly, timers need to
expire on time (hey, how 'bout that), not some time later.  To do this we
need an interrupt source.  To this point in time, the jiffies interrupt has
been the indication that one or more timers may have expired.  While we
don't need to "count" the interrupts, we DO need them to expire the timers
AND they need to be on time.

Well, something Nish Aravamudan has been working on is converting the
common users of jiffies (drivers) to start using human time units. These
very well understood units (which avoid HZ/ACTHZ/HZ=100 assumptions) can
then be accurately changed to jiffies (or possibly some other time unit)
internally. It would even be possible for soft-timers to expire based
upon the actual high-res time value, rather than the low-res tick
counter (which is something else Nish has been playing with). When that
occurs we can easily start doing other interesting things that I believe
you've already been working on in your HRT code, such as changing the
timer interrupt frequency dynamically, or working with multiple timer
interrupt sources. 
This is also what is done in things like posix 
