Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Ingo Oeser
Hi Linus,

On Friday 18 January 2008, Linus Torvalds wrote:
> On Fri, 18 Jan 2008, Miklos Szeredi wrote:
> > 
> > What I'm saying is that the times could be left un-updated for a long
> > time if program doesn't do munmap() or msync(MS_SYNC) for a long time.
> 
> Sure.
> 
> But in those circumstances, the programmer cannot depend on the mtime 
> *anyway* (because there is no synchronization), so what's the downside?

Can we get "if the write to the page hits the disk, the mtime has hit the disk
already no less than SOME_GRANULARITY before"? 

That is very important for computer forensics. Esp. in saving your ass!

Ok, now back again to making that fast :-)


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] x86: Add config variables for SMP_MAX

2008-01-18 Thread Ingo Oeser
Hi Mike,

On Friday 18 January 2008, [EMAIL PROTECTED] wrote:
> +config THREAD_ORDER
> + int "Kernel stack size (in page order)"
> + range 1 3
> + depends on X86_64_SMP
> + default "3" if X86_SMP_MAX
> + default "1"
> + help
> +   Increases kernel stack size.
> +

Could you please elaborate, why this is needed and put more info about
this requirement into this patch description?

People worked hard to push data allocation from stack to heap to make 
THREAD_ORDER of 0 and 1 possible. So why increase it again and why does this
help scalability?

Many thanks and Best Regards

Ingo Oeser, puzzled a bit :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Ingo Oeser
Hi Linus,

On Friday 18 January 2008, Linus Torvalds wrote:
 On Fri, 18 Jan 2008, Miklos Szeredi wrote:
  
  What I'm saying is that the times could be left un-updated for a long
  time if program doesn't do munmap() or msync(MS_SYNC) for a long time.
 
 Sure.
 
 But in those circumstances, the programmer cannot depend on the mtime 
 *anyway* (because there is no synchronization), so what's the downside?

Can we get if the write to the page hits the disk, the mtime has hit the disk
already no less than SOME_GRANULARITY before? 

That is very important for computer forensics. Esp. in saving your ass!

Ok, now back again to making that fast :-)


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] x86: Add config variables for SMP_MAX

2008-01-18 Thread Ingo Oeser
Hi Mike,

On Friday 18 January 2008, [EMAIL PROTECTED] wrote:
 +config THREAD_ORDER
 + int Kernel stack size (in page order)
 + range 1 3
 + depends on X86_64_SMP
 + default 3 if X86_SMP_MAX
 + default 1
 + help
 +   Increases kernel stack size.
 +

Could you please elaborate, why this is needed and put more info about
this requirement into this patch description?

People worked hard to push data allocation from stack to heap to make 
THREAD_ORDER of 0 and 1 possible. So why increase it again and why does this
help scalability?

Many thanks and Best Regards

Ingo Oeser, puzzled a bit :-)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kinit (was: sleep before boot panic)

2008-01-10 Thread Ingo Oeser
On Wednesday 09 January 2008, H. Peter Anvin wrote:
> Pavel Machek wrote:
> > 
> >> Of course, if we'd been using kinit, "soft panic" would 
> >> have been done exclusively in userspace...
> > 
> > What's the status of kinit, btw?
> > Pavel
> 
> It's bitrotted a bit since it was first rejected.  It wouldn't take too 
> much work to bring it back up to speed, however.  klibc, and some of the 
> kinit components, are used for the initramfs in Debian.

Yes, and I like most of it. The only thing really missing for me 
is LVM support. Debian (?) did a evil hack to make it work.
Maybe one day this itches soo much, I'll even scratch it :-)

Then I'll be able to test kernels on a standard LVM installation again.

So please keep up the good work!


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kinit (was: sleep before boot panic)

2008-01-10 Thread Ingo Oeser
On Wednesday 09 January 2008, H. Peter Anvin wrote:
 Pavel Machek wrote:
  
  Of course, if we'd been using kinit, soft panic would 
  have been done exclusively in userspace...
  
  What's the status of kinit, btw?
  Pavel
 
 It's bitrotted a bit since it was first rejected.  It wouldn't take too 
 much work to bring it back up to speed, however.  klibc, and some of the 
 kinit components, are used for the initramfs in Debian.

Yes, and I like most of it. The only thing really missing for me 
is LVM support. Debian (?) did a evil hack to make it work.
Maybe one day this itches soo much, I'll even scratch it :-)

Then I'll be able to test kernels on a standard LVM installation again.

So please keep up the good work!


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sleep before boot panic

2008-01-07 Thread Ingo Oeser
On Monday 07 January 2008, Andi Kleen wrote:
> Bernd Schubert <[EMAIL PROTECTED]> writes:
> 
> > Hi,
> >
> > I just switched to libata (pata) on my laptop and the immediate panic made 
> > it 
> > impossible to figure out why my boot partition wasn't available.
> > After applying this little patch I could check boot printk output and then 
> > saw 
> > everything was properly recognized and only scsi-disk support was missing.
> 
> The correct fix would be to make scroll back (and sysrq) still work
> after panic.  It's a little more complicated, but possible (essentially
> it needs a polled keyboard handler)

Customer: "This system could not find the root fs."
Support: "Oh, yeah, just connect a (USB-) keyboard and scroll back."

Hmm, device detection works after panic?

I really like the "soft" panic better, where you still can operate the 
kernel debugging features, but just have no user space supporting it.

One better hopes, that keyboards never need external firmware to be loaded
at this stage :-)

Best Regards

Ingo Oeser, who just hit the same problem yesterday...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sleep before boot panic

2008-01-07 Thread Ingo Oeser
On Monday 07 January 2008, Andi Kleen wrote:
 Bernd Schubert [EMAIL PROTECTED] writes:
 
  Hi,
 
  I just switched to libata (pata) on my laptop and the immediate panic made 
  it 
  impossible to figure out why my boot partition wasn't available.
  After applying this little patch I could check boot printk output and then 
  saw 
  everything was properly recognized and only scsi-disk support was missing.
 
 The correct fix would be to make scroll back (and sysrq) still work
 after panic.  It's a little more complicated, but possible (essentially
 it needs a polled keyboard handler)

Customer: This system could not find the root fs.
Support: Oh, yeah, just connect a (USB-) keyboard and scroll back.

Hmm, device detection works after panic?

I really like the soft panic better, where you still can operate the 
kernel debugging features, but just have no user space supporting it.

One better hopes, that keyboards never need external firmware to be loaded
at this stage :-)

Best Regards

Ingo Oeser, who just hit the same problem yesterday...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sleep before boot panic

2008-01-06 Thread Ingo Oeser
Hi Bernd,

CC'ed hpa, since I'm sure he can give useful advise on that :-)

On Sunday 06 January 2008, Bernd Schubert wrote:
> On Sunday 06 January 2008, Ingo Oeser wrote:
> > Hi Bernd,
> >
> > On Sunday 06 January 2008, you wrote:
> > > Index: zd1211rw.git.beno/init/do_mounts.c
> > > ===
> > > --- zd1211rw.git.beno.orig/init/do_mounts.c   2008-01-06 
> > > 18:44:23.0
> > > +0100
> > > +++ zd1211rw.git.beno/init/do_mounts.c2008-01-06 18:45:44.0
> > > +0100 @@ -330,6 +330,7 @@
> > >   printk("Please append a correct \"root=\" boot option; here are 
> > > the
> > > available partitions:\n");
> > >
> > >   printk_all_partitions();
> > > + msleep(60 * 1000);
> >
> > ssleep(60);
> 
> feel free to replace it replace it :)

Not that urgent, but if you resubmit please do it :-)

> There is no dump_stack() here, but disc detection is relatively early in boot 
> process and on all these information are already scrolled off screen when the 
> panic is done. For this and any other panic it would be optimal if scrolling 
> still would work, but scrolling also requires kernel code, so I see there's a 
> reason not to this for all panics. However, for this boot problem I tend to 
> say there's no need to panic at all...

But the kernel cannot continue from that position. You would need a "soft" 
panic,
which allows behavior of panic=X, but let the kernel continue.

Even better is to continue with the init in the builtin ramfs. That should 
always
be available and can implement any behavior desired (like droping into a dash).

> Btw, not all stack straces are useless, *most* of them are actually very 
> useful.

I didn't say that. Just if you cannot continue due to admin error, 
but the kernel is in a perfect valid state otherwise,
dumping stack is next to useless.


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sleep before boot panic

2008-01-06 Thread Ingo Oeser
Hi Bernd,

On Sunday 06 January 2008, you wrote:
> Index: zd1211rw.git.beno/init/do_mounts.c
> ===
> --- zd1211rw.git.beno.orig/init/do_mounts.c   2008-01-06 18:44:23.0 
> +0100
> +++ zd1211rw.git.beno/init/do_mounts.c2008-01-06 18:45:44.0 
> +0100
> @@ -330,6 +330,7 @@
>   printk("Please append a correct \"root=\" boot option; here are 
> the 
> available partitions:\n");
>  
>   printk_all_partitions();
> + msleep(60 * 1000);

ssleep(60);

>   panic("VFS: Unable to mount root fs on %s", b);
>   }

Better would be for this and similiar panic()s
(fatal user/admin errors on boot) to NOT print a stack trace+registers,
since it is useless and actually hides useful information.


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sleep before boot panic

2008-01-06 Thread Ingo Oeser
Hi Bernd,

On Sunday 06 January 2008, you wrote:
 Index: zd1211rw.git.beno/init/do_mounts.c
 ===
 --- zd1211rw.git.beno.orig/init/do_mounts.c   2008-01-06 18:44:23.0 
 +0100
 +++ zd1211rw.git.beno/init/do_mounts.c2008-01-06 18:45:44.0 
 +0100
 @@ -330,6 +330,7 @@
   printk(Please append a correct \root=\ boot option; here are 
 the 
 available partitions:\n);
  
   printk_all_partitions();
 + msleep(60 * 1000);

ssleep(60);

   panic(VFS: Unable to mount root fs on %s, b);
   }

Better would be for this and similiar panic()s
(fatal user/admin errors on boot) to NOT print a stack trace+registers,
since it is useless and actually hides useful information.


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sleep before boot panic

2008-01-06 Thread Ingo Oeser
Hi Bernd,

CC'ed hpa, since I'm sure he can give useful advise on that :-)

On Sunday 06 January 2008, Bernd Schubert wrote:
 On Sunday 06 January 2008, Ingo Oeser wrote:
  Hi Bernd,
 
  On Sunday 06 January 2008, you wrote:
   Index: zd1211rw.git.beno/init/do_mounts.c
   ===
   --- zd1211rw.git.beno.orig/init/do_mounts.c   2008-01-06 
   18:44:23.0
   +0100
   +++ zd1211rw.git.beno/init/do_mounts.c2008-01-06 18:45:44.0
   +0100 @@ -330,6 +330,7 @@
 printk(Please append a correct \root=\ boot option; here are 
   the
   available partitions:\n);
  
 printk_all_partitions();
   + msleep(60 * 1000);
 
  ssleep(60);
 
 feel free to replace it replace it :)

Not that urgent, but if you resubmit please do it :-)

 There is no dump_stack() here, but disc detection is relatively early in boot 
 process and on all these information are already scrolled off screen when the 
 panic is done. For this and any other panic it would be optimal if scrolling 
 still would work, but scrolling also requires kernel code, so I see there's a 
 reason not to this for all panics. However, for this boot problem I tend to 
 say there's no need to panic at all...

But the kernel cannot continue from that position. You would need a soft 
panic,
which allows behavior of panic=X, but let the kernel continue.

Even better is to continue with the init in the builtin ramfs. That should 
always
be available and can implement any behavior desired (like droping into a dash).

 Btw, not all stack straces are useless, *most* of them are actually very 
 useful.

I didn't say that. Just if you cannot continue due to admin error, 
but the kernel is in a perfect valid state otherwise,
dumping stack is next to useless.


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] add task handling notifier

2007-12-20 Thread Ingo Oeser
Hi Jan,

I like and support your idea!

On Thursday 20 December 2007, Jan Beulich wrote:
> With more and more sub-systems/sub-components leaving their footprint
> in task handling functions, it seems reasonable to add notifiers that
> these components can use instead of having them all patch themselves
> directly into core files.

Yes, but why export variables? Wouldn't it be better to export 
an API? 

That simplifies the callers (they all pass "current" as task 
and "task_notifier_list" as arguments).

It also prevents exposing internal variables (notifier lists 
ARE internal variables) to modules.

What do you think?


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] add task handling notifier

2007-12-20 Thread Ingo Oeser
Hi Jan,

I like and support your idea!

On Thursday 20 December 2007, Jan Beulich wrote:
 With more and more sub-systems/sub-components leaving their footprint
 in task handling functions, it seems reasonable to add notifiers that
 these components can use instead of having them all patch themselves
 directly into core files.

Yes, but why export variables? Wouldn't it be better to export 
an API? 

That simplifies the callers (they all pass current as task 
and task_notifier_list as arguments).

It also prevents exposing internal variables (notifier lists 
ARE internal variables) to modules.

What do you think?


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] dynamic pipe resizing

2007-12-15 Thread Ingo Oeser
On Saturday 15 December 2007, Jan Engelhardt wrote:
> On Aug 24 2007 10:52, Jens Axboe wrote:
> >Subject: [PATCH][RFC] dynamic pipe resizing
> >Like with my original splice patches from 2005, I used fcntl()
> >F_GETPIPE_SZ and F_SETPIPE_SZ to change the size of the pipe. I'm not
> >particularly fond of that interface, so suggestions on how to improve it
> >would be appreciated. Even if fcntl() should be the preferred approach,
> >I think it would be better to pass in a byte based value instead of a
> >number of pages.
> >
> Could this patch still make it in?
> Yes, I think its set() and get() parts should use bytes and convert 
> to/from pages.
> 
> Perhaps just round up, and mention the rounding in the manpage, so that 
> noone gets a shock when the pipe is not exactly as small as requested.

Yes, but document only that it is rounding and make the unit arbitrary.
That reduces the ABI requirements.


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] dynamic pipe resizing

2007-12-15 Thread Ingo Oeser
On Saturday 15 December 2007, Jan Engelhardt wrote:
 On Aug 24 2007 10:52, Jens Axboe wrote:
 Subject: [PATCH][RFC] dynamic pipe resizing
 Like with my original splice patches from 2005, I used fcntl()
 F_GETPIPE_SZ and F_SETPIPE_SZ to change the size of the pipe. I'm not
 particularly fond of that interface, so suggestions on how to improve it
 would be appreciated. Even if fcntl() should be the preferred approach,
 I think it would be better to pass in a byte based value instead of a
 number of pages.
 
 Could this patch still make it in?
 Yes, I think its set() and get() parts should use bytes and convert 
 to/from pages.
 
 Perhaps just round up, and mention the rounding in the manpage, so that 
 noone gets a shock when the pipe is not exactly as small as requested.

Yes, but document only that it is rounding and make the unit arbitrary.
That reduces the ABI requirements.


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-01 Thread Ingo Oeser
On Saturday 01 December 2007, Ingo Molnar wrote:
> maybe, but we'd have to see how often this gets triggered. An OOM is 
> something that could happen in any overloaded system - while a hung task 
> is likely due to a kernel bug.

What about a client using hard mounted NFS shares here? That shouldn't be
killed by the OOM killer in that situation, should it?

Am I missing sth.?


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-01 Thread Ingo Oeser
On Saturday 01 December 2007, Ingo Molnar wrote:
 maybe, but we'd have to see how often this gets triggered. An OOM is 
 something that could happen in any overloaded system - while a hung task 
 is likely due to a kernel bug.

What about a client using hard mounted NFS shares here? That shouldn't be
killed by the OOM killer in that situation, should it?

Am I missing sth.?


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ata: ahci: Enable enclosure management via LED (resend)

2007-10-25 Thread Ingo Oeser
On Thursday 25 October 2007, Kristen Carlson Accardi wrote:
> I did look into using the LED class for this, but it didn't appropriate
> as I wanted the leds to be associated with a particular disk, and not
> with the platform as a whole.  It seemed to me that the led_class was 
> a bit of overkill for what we needed to do here, since we just need 
> on/off and nothing else.

Maybe. But didn't you want mdadm to control it? Then it would make sense.

But you have a point in the LED API missing the ability to associate 
a LED to a specific device (e.g. where it is installed :-).

So I'm fine either way, since I see you point.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ata: ahci: Enable enclosure management via LED (resend)

2007-10-25 Thread Ingo Oeser
Hi Kristen,

On Thursday 25 October 2007, Kristen Carlson Accardi wrote:
> Enable enclosure management via LED
> 
> As described in the AHCI spec, some AHCI controllers may support 
> Enclosure management via a variety of protocols.  This patch
> adds support for the LED message type that is specified in 
> AHCI 1.1 and higher.

Linux has a LED subsystem for that. May I suggest, that you just register
these leds and let userspace handle them via that via the LED API?

The LED userspace API is described in Documentation/leds-class.txt
and the headers for registering LEDs is linux/leds.h under include/

Since you explicitly WANT user space to control these, that should
be the right API. 

Richard, what do YOU think?


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ata: ahci: Enable enclosure management via LED (resend)

2007-10-25 Thread Ingo Oeser
Hi Kristen,

On Thursday 25 October 2007, Kristen Carlson Accardi wrote:
 Enable enclosure management via LED
 
 As described in the AHCI spec, some AHCI controllers may support 
 Enclosure management via a variety of protocols.  This patch
 adds support for the LED message type that is specified in 
 AHCI 1.1 and higher.

Linux has a LED subsystem for that. May I suggest, that you just register
these leds and let userspace handle them via that via the LED API?

The LED userspace API is described in Documentation/leds-class.txt
and the headers for registering LEDs is linux/leds.h under include/

Since you explicitly WANT user space to control these, that should
be the right API. 

Richard, what do YOU think?


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ata: ahci: Enable enclosure management via LED (resend)

2007-10-25 Thread Ingo Oeser
On Thursday 25 October 2007, Kristen Carlson Accardi wrote:
 I did look into using the LED class for this, but it didn't appropriate
 as I wanted the leds to be associated with a particular disk, and not
 with the platform as a whole.  It seemed to me that the led_class was 
 a bit of overkill for what we needed to do here, since we just need 
 on/off and nothing else.

Maybe. But didn't you want mdadm to control it? Then it would make sense.

But you have a point in the LED API missing the ability to associate 
a LED to a specific device (e.g. where it is installed :-).

So I'm fine either way, since I see you point.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 13/13] RT: Cache cpus_allowed weight for optimizing migration

2007-10-23 Thread Ingo Oeser
Hi Gregory,

On Tuesday 23 October 2007, Gregory Haskins wrote:
> Calculating the weight is probably relatively expensive, so it is only
> done when the cpus_allowed mask is updated (which should be relatively
> infrequent, especially compared to scheduling frequency) and cached in
> the task_struct.

Why not make it a task flag, since according to your code, you are only 
interested whether this is <= 1 or > 1. Since !(x <= 1) <=> (x > 1)
for any given unsigned integer x, the required data structure is
a "boolean" or a flag.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 13/13] RT: Cache cpus_allowed weight for optimizing migration

2007-10-23 Thread Ingo Oeser
Hi Gregory,

On Tuesday 23 October 2007, Gregory Haskins wrote:
 Calculating the weight is probably relatively expensive, so it is only
 done when the cpus_allowed mask is updated (which should be relatively
 infrequent, especially compared to scheduling frequency) and cached in
 the task_struct.

Why not make it a task flag, since according to your code, you are only 
interested whether this is = 1 or  1. Since !(x = 1) = (x  1)
for any given unsigned integer x, the required data structure is
a boolean or a flag.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Extending kbuild syntax

2007-09-30 Thread Ingo Oeser
Hi Sam,

On Saturday 29 September 2007, Sam Ravnborg wrote:
> Introducing the following new variable could make this a oneliner:
> ccflags-y
> 
> ccflags-$(DEBUG) := -DDEBUG
> 
> grep -r -C 1 -B 1 EXTRA_CFLAGS shows that the above is a 
> very common pattern especially in drivers/

ACK. Also ACK for asflags, if done the same way :-)

> The second is the more controversial suggestion.

Yes, but please bear in mind, what the developers are trying
to express in these cases.

> In several Makefile we have simple if expression of the variants:
> if ($(CONFIG_FOO),y)
>   obj-$(CONFIG_BAR) += fubar.o
> endif

This is "feature FOO of module BAR" where the feature itself
cannot be a module. The composition scheme described in 
section 3.3 is at least equally useful. And that is used today.
Maybe the documentation of that scheme is not prominent enough :-)

> obj-y-ifn-

This is the only one needed, because it is cumbersome to express
negative rules in kbuild to include stubs (e.g. nommu stuff). 
But again this can be done with composition rules right now, but
is order dependent. If we could get rid of this requirement,
I would be happy already. So kbuild is just lacking an "else" 
clause here.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Extending kbuild syntax

2007-09-30 Thread Ingo Oeser
Hi Sam,

On Saturday 29 September 2007, Sam Ravnborg wrote:
 Introducing the following new variable could make this a oneliner:
 ccflags-y
 
 ccflags-$(DEBUG) := -DDEBUG
 
 grep -r -C 1 -B 1 EXTRA_CFLAGS shows that the above is a 
 very common pattern especially in drivers/

ACK. Also ACK for asflags, if done the same way :-)

 The second is the more controversial suggestion.

Yes, but please bear in mind, what the developers are trying
to express in these cases.

 In several Makefile we have simple if expression of the variants:
 if ($(CONFIG_FOO),y)
   obj-$(CONFIG_BAR) += fubar.o
 endif

This is feature FOO of module BAR where the feature itself
cannot be a module. The composition scheme described in 
section 3.3 is at least equally useful. And that is used today.
Maybe the documentation of that scheme is not prominent enough :-)

 obj-y-ifn-

This is the only one needed, because it is cumbersome to express
negative rules in kbuild to include stubs (e.g. nommu stuff). 
But again this can be done with composition rules right now, but
is order dependent. If we could get rid of this requirement,
I would be happy already. So kbuild is just lacking an else 
clause here.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Combine path_put and path_put_conditional

2007-09-29 Thread Ingo Oeser
Hi Andreas,

On Friday 28 September 2007, Andreas Gruenbacher wrote:
> The name path_put_conditional (formerly, dput_path) is a little unclear.
> Replace (path_put_conditional + path_put) with path_walk_put_both,
> "put a pair of paths after a path_walk" (see the kerneldoc).
 ^

So why not name it path_walk_put_pair() then?

Rationale: "_both" is just counting, "_pair" 
means they are related somehow.


Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[OT] kbuild syntax extension for ccflags and asflags (was: [PATCH 1/3] CodingStyle updates)

2007-09-29 Thread Ingo Oeser
Hi Sam,

On Saturday 29 September 2007, Sam Ravnborg wrote:
> Lately I have considered extending the kbuild syntax a bit.
> 
> Introducing
> ccflags-y
> asflags-y
> 
> [with same functionality as the EXTRA_CFLAGS, EXTRA_AFLAGS]
> would allow us to do:
> 
> ccflags-$(CONFIG_WHATEVER_DEBUG) := -DDEBUG

Please do!

That is very useful for testing and developing new modules.
I learnt a lot from you in this regard and used that kind
of syntax to the extreme in some other non-kernel project
of mine. There it included also ccflags, asflags and so on.

I further split that into -debug-y and -optimize-y flags,
but that was just for my own convenience.


Best Regards

Ingo Oeser

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[OT] kbuild syntax extension for ccflags and asflags (was: [PATCH 1/3] CodingStyle updates)

2007-09-29 Thread Ingo Oeser
Hi Sam,

On Saturday 29 September 2007, Sam Ravnborg wrote:
 Lately I have considered extending the kbuild syntax a bit.
 
 Introducing
 ccflags-y
 asflags-y
 
 [with same functionality as the EXTRA_CFLAGS, EXTRA_AFLAGS]
 would allow us to do:
 
 ccflags-$(CONFIG_WHATEVER_DEBUG) := -DDEBUG

Please do!

That is very useful for testing and developing new modules.
I learnt a lot from you in this regard and used that kind
of syntax to the extreme in some other non-kernel project
of mine. There it included also ccflags, asflags and so on.

I further split that into -debug-y and -optimize-y flags,
but that was just for my own convenience.


Best Regards

Ingo Oeser

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Combine path_put and path_put_conditional

2007-09-29 Thread Ingo Oeser
Hi Andreas,

On Friday 28 September 2007, Andreas Gruenbacher wrote:
 The name path_put_conditional (formerly, dput_path) is a little unclear.
 Replace (path_put_conditional + path_put) with path_walk_put_both,
 put a pair of paths after a path_walk (see the kerneldoc).
 ^

So why not name it path_walk_put_pair() then?

Rationale: _both is just counting, _pair 
means they are related somehow.


Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] Fix coding style

2007-09-25 Thread Ingo Oeser
On Tuesday 25 September 2007, Srivatsa Vaddagiri wrote:
> Index: current/kernel/sched_debug.c
> ===
> --- current.orig/kernel/sched_debug.c
> +++ current/kernel/sched_debug.c
> @@ -239,11 +239,7 @@ static int
>  root_user_share_read_proc(char *page, char **start, off_t off, int count,
>int *eof, void *data)
>  {
> - int len;
> -
> - len = sprintf(page, "%d\n", init_task_grp_load);
> -
> - return len;
> + return sprintf(page, "%d\n", init_task_grp_load);
>  }
>  
>  static int
> @@ -297,7 +293,7 @@ static int __init init_sched_debug_procf
>   pe->proc_fops = _debug_fops;
>  
>  #ifdef CONFIG_FAIR_USER_SCHED
> - pe = create_proc_entry("root_user_share", 0644, NULL);
> + pe = create_proc_entry("root_user_cpu_share", 0644, NULL);
>   if (!pe)
>   return -ENOMEM;

What about moving this debug stuff under debugfs?
Please consider using the functions in  .
They compile into nothing, if DEBUGFS is not compiled in
and have already useful functions for reading/writing integers
and booleans.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] Fix coding style

2007-09-25 Thread Ingo Oeser
On Tuesday 25 September 2007, Srivatsa Vaddagiri wrote:
 Index: current/kernel/sched_debug.c
 ===
 --- current.orig/kernel/sched_debug.c
 +++ current/kernel/sched_debug.c
 @@ -239,11 +239,7 @@ static int
  root_user_share_read_proc(char *page, char **start, off_t off, int count,
int *eof, void *data)
  {
 - int len;
 -
 - len = sprintf(page, %d\n, init_task_grp_load);
 -
 - return len;
 + return sprintf(page, %d\n, init_task_grp_load);
  }
  
  static int
 @@ -297,7 +293,7 @@ static int __init init_sched_debug_procf
   pe-proc_fops = sched_debug_fops;
  
  #ifdef CONFIG_FAIR_USER_SCHED
 - pe = create_proc_entry(root_user_share, 0644, NULL);
 + pe = create_proc_entry(root_user_cpu_share, 0644, NULL);
   if (!pe)
   return -ENOMEM;

What about moving this debug stuff under debugfs?
Please consider using the functions in linux/debugfs.h .
They compile into nothing, if DEBUGFS is not compiled in
and have already useful functions for reading/writing integers
and booleans.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why do so many machines need "noapic"?

2007-09-15 Thread Ingo Oeser
On Saturday 15 September 2007, Andrew Morton wrote:
> There are 48 bugs in bugzilla which mention "noapic"
> 
> http://bugzilla.kernel.org/buglist.cgi?query_format=advanced_desc_type=allwordssubstr_desc=_desc_type=substring_desc=noapic_version_type=allwordssubstr_version=_status=NEW_status=REOPENED_status=ASSIGNED_to1=1=substring=_to2=1=1=1=substring==include_id===Now==both=doit=Reuse+same+sort+as+last+time=noop=noop=
> 
> And there are 173,000 on the internet ;)
> http://www.google.com/search?hl=en=linux+noapic=Google+Search
> 
> We screwed this pooch a long time ago - years.  Perhaps if some of the many
> noapic users could run a bisection search to work out when it broke we
> could start fixing things.  But they all have a workaround so there's no
> motivation.

I have 2 SMP-Boards and both need noapic. One is from 2001 (AUSUS CUR-DLS),
one is from June 2006 (Gigabyte M57SLI-S4).

There are many reasons:

1. Bugs which have such a simple workaround don't get much attention.

2. Usually SMP boards are used for machines, which just HAVE to work,
   since they have been expensive. These are not consumer boards.

3. I usually had only USB problems (no IRQ), if ommiting noapic.
   USB technology is a cosumer grade technology and enterprise
   grade developers don't have much interest in it (until now?).

4. IRQ routing setup is often a BIOS issue. You might be able
   to fix that by upgrading your BIOS. That often needs a Windows
   tool. Linux people not always (want to) have access to Windows :-)

I reported the all the problems (starting 2001), no developer 
seemed interested.

I can report them against the latest RC6 kernel tomorrow and put them
into bugzilla, if we now REALLY care.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why do so many machines need noapic?

2007-09-15 Thread Ingo Oeser
On Saturday 15 September 2007, Andrew Morton wrote:
 There are 48 bugs in bugzilla which mention noapic
 
 http://bugzilla.kernel.org/buglist.cgi?query_format=advancedshort_desc_type=allwordssubstrshort_desc=long_desc_type=substringlong_desc=noapickernel_version_type=allwordssubstrkernel_version=bug_status=NEWbug_status=REOPENEDbug_status=ASSIGNEDemailassigned_to1=1emailtype1=substringemail1=emailassigned_to2=1emailreporter2=1emailcc2=1emailtype2=substringemail2=bugidtype=includebug_id=chfieldfrom=chfieldto=Nowchfieldvalue=regression=bothcmdtype=doitorder=Reuse+same+sort+as+last+timefield0-0-0=nooptype0-0-0=noopvalue0-0-0=
 
 And there are 173,000 on the internet ;)
 http://www.google.com/search?hl=enq=linux+noapicbtnG=Google+Search
 
 We screwed this pooch a long time ago - years.  Perhaps if some of the many
 noapic users could run a bisection search to work out when it broke we
 could start fixing things.  But they all have a workaround so there's no
 motivation.

I have 2 SMP-Boards and both need noapic. One is from 2001 (AUSUS CUR-DLS),
one is from June 2006 (Gigabyte M57SLI-S4).

There are many reasons:

1. Bugs which have such a simple workaround don't get much attention.

2. Usually SMP boards are used for machines, which just HAVE to work,
   since they have been expensive. These are not consumer boards.

3. I usually had only USB problems (no IRQ), if ommiting noapic.
   USB technology is a cosumer grade technology and enterprise
   grade developers don't have much interest in it (until now?).

4. IRQ routing setup is often a BIOS issue. You might be able
   to fix that by upgrading your BIOS. That often needs a Windows
   tool. Linux people not always (want to) have access to Windows :-)

I reported the all the problems (starting 2001), no developer 
seemed interested.

I can report them against the latest RC6 kernel tomorrow and put them
into bugzilla, if we now REALLY care.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] crypto: cleanup: Use max() in blkcipher_get_spot() to state the intention.

2007-09-11 Thread Ingo Oeser
[PATCH] crypto: cleanup: Use max() in blkcipher_get_spot() to state the 
intention.

Signed-off-by: Ingo Oeser <[EMAIL PROTECTED]>
---
Hi Herbert,

here is the requested patch against Linus' latest tree. 
It at least compiles.

Best Regards

Ingo Oeser

 crypto/blkcipher.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index d8f8ec3..1c99d92 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -65,7 +65,7 @@ static inline void blkcipher_unmap_dst(struct blkcipher_walk 
*walk)
 static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len)
 {
u8 *end_page = (u8 *)(((unsigned long)(start + len - 1)) & PAGE_MASK);
-   return start > end_page ? start : end_page;
+   return max(start, end_page);
 }
 
 static inline unsigned int blkcipher_done_slow(struct crypto_blkcipher *tfm,
-- 
1.5.2.5



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] crypto: cleanup: Use max() in blkcipher_get_spot() to state the intention.

2007-09-11 Thread Ingo Oeser
[PATCH] crypto: cleanup: Use max() in blkcipher_get_spot() to state the 
intention.

Signed-off-by: Ingo Oeser [EMAIL PROTECTED]
---
Hi Herbert,

here is the requested patch against Linus' latest tree. 
It at least compiles.

Best Regards

Ingo Oeser

 crypto/blkcipher.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index d8f8ec3..1c99d92 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -65,7 +65,7 @@ static inline void blkcipher_unmap_dst(struct blkcipher_walk 
*walk)
 static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len)
 {
u8 *end_page = (u8 *)(((unsigned long)(start + len - 1))  PAGE_MASK);
-   return start  end_page ? start : end_page;
+   return max(start, end_page);
 }
 
 static inline unsigned int blkcipher_done_slow(struct crypto_blkcipher *tfm,
-- 
1.5.2.5



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] crypto: blkcipher_get_spot() handling of buffer at end of page

2007-09-10 Thread Ingo Oeser
Hi Herbert,

On Monday 10 September 2007, Herbert Xu wrote:
> On Sat, Sep 08, 2007 at 12:14:23PM +0800, Herbert Xu wrote:
> > 
> > [CRYPTO] blkcipher: Fix handling of kmalloc page straddling
> 
> As Bob correctly noted, I had the boolean test inverted.
> Here is the correction:
> 
> [CRYPTO] blkcipher: Fix inverted test in blkcipher_get_spot
> 
> The previous patch had the conditional inverted.  This patch fixes it
> so that we return the original position if it does not straddle a page.

What about using max() for this to make your intention obvious?

static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len)
{
u8 *end_page = (u8 *)(((unsigned long)(start + len - 1)) & PAGE_MASK); 
return max(start, end_page);
}


Best Regards

Ingo Oeser 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFD] new syscalls: suspend_other/resume_other?

2007-09-02 Thread Ingo Oeser
Hi there,

at the moment implementing a mark and sweep garbage collection 
subsystem is quite a hack, because you always have to use up
some signals for suspend/resume all threads to implement this.

For runtime environments (like D system libraries or JVMs) this is 
a hack, since you take away flexibility from the application.

A possible solution would be a syscall or a PTRACE extension
to realize the suspend/resume. I best describe the possible
syscall manpages here, so you get an idea.

NAME 
suspend_other - suspends execution of all but the calling thread

SYNOPSIS
long suspend_other(void);

RETURN VALUE
Positive count suspended threads on success. 
If 0, then suspend_other was a no-op and there is nothing to resume, 
but the
call should still considered successful.
If the number is -1, the errno has to be checked for possible error 
values.

ERRORS
EDEADLK We run already a suspend_other() and the calling thread 
has just been resumed.

EPERM The calling thread is not allowed to do this. 
(optional case due to security)

DESCRIPTION
After sucessful return of this call, the affected process 
is single threaded and only the calling thread runs 
in this process (==MM struct).

The thread, which calls this is responsible for resuming 
all the suspended threads. One can iterate through 
"/proc/self/task/", to verify for sure that one knows all threads,
if the returned count doesn't match the expected value.

Any per thread queued signals are deferred until resume_other()
or process destruction.

NOTES
This call might be restricted to the main thread.



NAME 
resume_other - resume execution of foreign thread in this process

SYNOPSIS
long resume_other(pid_t tid)

RETURN VALUE
Returns 0, if successful.
Otherwise -1 is returned, the errno has to be checked for 
possible error values and the call has no effect at all.

Any non-blocked signals of that thread which happend during 
suspend/resume are deliverd now.

ERRORS
EINVAL The thread is running already. 
(this is a severe caller BUG).
ESRCH The thread with tid does not exist.
(or doesn't belong to this process).
EPERM The calling thread is not allowed to do this. 
(optional case due to security)

DESCRIPTION
After sucessful return of this call, the affected thread 
is the the state it was before it was suspended 
by calling suspend_other().

NOTES
The value -1 for tid is reserved for future extension 
(e.g. meaning ALL other threads).

This call might be restricted to the main thread.

-

Any opinions?


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFD] new syscalls: suspend_other/resume_other?

2007-09-02 Thread Ingo Oeser
Hi there,

at the moment implementing a mark and sweep garbage collection 
subsystem is quite a hack, because you always have to use up
some signals for suspend/resume all threads to implement this.

For runtime environments (like D system libraries or JVMs) this is 
a hack, since you take away flexibility from the application.

A possible solution would be a syscall or a PTRACE extension
to realize the suspend/resume. I best describe the possible
syscall manpages here, so you get an idea.

NAME 
suspend_other - suspends execution of all but the calling thread

SYNOPSIS
long suspend_other(void);

RETURN VALUE
Positive count suspended threads on success. 
If 0, then suspend_other was a no-op and there is nothing to resume, 
but the
call should still considered successful.
If the number is -1, the errno has to be checked for possible error 
values.

ERRORS
EDEADLK We run already a suspend_other() and the calling thread 
has just been resumed.

EPERM The calling thread is not allowed to do this. 
(optional case due to security)

DESCRIPTION
After sucessful return of this call, the affected process 
is single threaded and only the calling thread runs 
in this process (==MM struct).

The thread, which calls this is responsible for resuming 
all the suspended threads. One can iterate through 
/proc/self/task/, to verify for sure that one knows all threads,
if the returned count doesn't match the expected value.

Any per thread queued signals are deferred until resume_other()
or process destruction.

NOTES
This call might be restricted to the main thread.



NAME 
resume_other - resume execution of foreign thread in this process

SYNOPSIS
long resume_other(pid_t tid)

RETURN VALUE
Returns 0, if successful.
Otherwise -1 is returned, the errno has to be checked for 
possible error values and the call has no effect at all.

Any non-blocked signals of that thread which happend during 
suspend/resume are deliverd now.

ERRORS
EINVAL The thread is running already. 
(this is a severe caller BUG).
ESRCH The thread with tid does not exist.
(or doesn't belong to this process).
EPERM The calling thread is not allowed to do this. 
(optional case due to security)

DESCRIPTION
After sucessful return of this call, the affected thread 
is the the state it was before it was suspended 
by calling suspend_other().

NOTES
The value -1 for tid is reserved for future extension 
(e.g. meaning ALL other threads).

This call might be restricted to the main thread.

-

Any opinions?


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] memchr (trivial) optimization

2007-08-22 Thread Ingo Oeser
On Wednesday 22 August 2007, lode leroy wrote:
> While profiling something completely unrelated, I noticed
> that on the workloads I used memchr for, I saw a 30%-40% improvement
> in performance, with the following trivial changes...
> (basically, it saves 3 operations for each call)

Yes, but then you could be a bit more explicit to the compiler
on what you are doing here:
 
void *memchr(const void *s, int c, size_t n)
{
const unsigned char *p = s;

for (; n != 0; n--, p++) {
   if ((unsigned char)c == *p) {
   return (void *)p;
}
return NULL;
}

Now the compiler should see the loop more clearly.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] memchr (trivial) optimization

2007-08-22 Thread Ingo Oeser
On Wednesday 22 August 2007, lode leroy wrote:
 While profiling something completely unrelated, I noticed
 that on the workloads I used memchr for, I saw a 30%-40% improvement
 in performance, with the following trivial changes...
 (basically, it saves 3 operations for each call)

Yes, but then you could be a bit more explicit to the compiler
on what you are doing here:
 
void *memchr(const void *s, int c, size_t n)
{
const unsigned char *p = s;

for (; n != 0; n--, p++) {
   if ((unsigned char)c == *p) {
   return (void *)p;
}
return NULL;
}

Now the compiler should see the loop more clearly.


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]: proc: export a processes resource limits via proc/

2007-08-13 Thread Ingo Oeser
Hi Neil,

> +static struct limit_names lnames[RLIM_NLIMITS] = {
static const ...

may be better here.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]: proc: export a processes resource limits via proc/pid

2007-08-13 Thread Ingo Oeser
Hi Neil,

 +static struct limit_names lnames[RLIM_NLIMITS] = {
static const ...

may be better here.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Driver-level memory management

2007-08-12 Thread Ingo Oeser
Hi Michael,

On Sunday 12 August 2007, Michael Bourgeous wrote:
> I'm working on a driver for older HDTV cards based on the TL880 chip.
> These cards typically have 16MB of their own memory, which is
> available to me over the PCI bus.  Various functions of the card
> require me to manage this memory, allocating and freeing chunks of it
> as necessary.  I can easily include my own allocation and management
> code, 

Ok.

> but I'm sure this is a problem that has been solved before. 

Yes!

in your Kconfig

select GENERIC_ALLOCATOR

in your driver.c

#include 

Code is in lib/genalloc.c, if you like to take a look.

Memory for MANAGING free/allocated space is NOT taken 
from your on-card memory! That allocator is explicitly developed
for such use cases.

Happy hacking!


Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Driver-level memory management

2007-08-12 Thread Ingo Oeser
Hi Michael,

On Sunday 12 August 2007, Michael Bourgeous wrote:
 I'm working on a driver for older HDTV cards based on the TL880 chip.
 These cards typically have 16MB of their own memory, which is
 available to me over the PCI bus.  Various functions of the card
 require me to manage this memory, allocating and freeing chunks of it
 as necessary.  I can easily include my own allocation and management
 code, 

Ok.

 but I'm sure this is a problem that has been solved before. 

Yes!

in your Kconfig

select GENERIC_ALLOCATOR

in your driver.c

#include linux/genalloc.h

Code is in lib/genalloc.c, if you like to take a look.

Memory for MANAGING free/allocated space is NOT taken 
from your on-card memory! That allocator is explicitly developed
for such use cases.

Happy hacking!


Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 03/11] fuse: add reference counting to fuse_file

2007-08-04 Thread Ingo Oeser
Hi Miklos,

On Friday 03 August 2007, Miklos Szeredi wrote:
> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Make lifetime of 'struct fuse_file' independent from 'struct file' by
> adding a reference counter and destructor.

What about using krefs to implement that?

see include/linux/kref.h
and lib/kref.c

Just embed that "struct kref" inside your struct fuse_file

struct fuse_file {
...
struct kref ref;
...
}

init in struct fuse_file *fuse_file_alloc(void) where you added the counter.
...
kref_init(>ref);
...

and implement the release function like:

static void fuse_file_release(struct kref *ff_ref)
{
struct fuse_file *ff = container_of(ff_ref, struct fuse_file, ref);
struct fuse_req *req = ff->reserved_req;
struct fuse_conn *fc = get_fuse_conn(req->dentry->d_inode);
request_send_background(fc, req);
kfree(ff);
}

This will also fix the missing smp_barriers, is very simple, saves code,
makes your life easier and is a well known known kernel infrastructure :-)

BTW: FUSE rocks! :-)

You can add my "Signed-off-by: Ingo Oeser <[EMAIL PROTECTED]>",
if you want to use that suggestion.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 03/11] fuse: add reference counting to fuse_file

2007-08-04 Thread Ingo Oeser
Hi Miklos,

On Friday 03 August 2007, Miklos Szeredi wrote:
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Make lifetime of 'struct fuse_file' independent from 'struct file' by
 adding a reference counter and destructor.

What about using krefs to implement that?

see include/linux/kref.h
and lib/kref.c

Just embed that struct kref inside your struct fuse_file

struct fuse_file {
...
struct kref ref;
...
}

init in struct fuse_file *fuse_file_alloc(void) where you added the counter.
...
kref_init(ff-ref);
...

and implement the release function like:

static void fuse_file_release(struct kref *ff_ref)
{
struct fuse_file *ff = container_of(ff_ref, struct fuse_file, ref);
struct fuse_req *req = ff-reserved_req;
struct fuse_conn *fc = get_fuse_conn(req-dentry-d_inode);
request_send_background(fc, req);
kfree(ff);
}

This will also fix the missing smp_barriers, is very simple, saves code,
makes your life easier and is a well known known kernel infrastructure :-)

BTW: FUSE rocks! :-)

You can add my Signed-off-by: Ingo Oeser [EMAIL PROTECTED],
if you want to use that suggestion.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-22 Thread Ingo Oeser
On Tuesday 17 July 2007, Kalpak Shah wrote:
> Index: linux-2.6.22/include/linux/ext4_fs.h
> ===
> --- linux-2.6.22.orig/include/linux/ext4_fs.h
> +++ linux-2.6.22/include/linux/ext4_fs.h
> @@ -797,12 +797,18 @@ struct ext4_dir_entry_2 {
>#define is_dx(dir) (EXT4_HAS_COMPAT_FEATURE(dir->i_sb, \
>   EXT4_FEATURE_COMPAT_DIR_INDEX) 
> && \
>   (EXT4_I(dir)->i_flags & EXT4_INDEX_FL))
> -#define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= 
> EXT4_LINK_MAX)
> -#define EXT4_DIR_LINK_EMPTY(dir) ((dir)->i_nlink == 2 || (dir)->i_nlink == 1)
> +static inline int ext4_dir_link_max(struct inode *dir)
> +{
> +   return (!is_dx(dir) && (dir)->i_nlink >= EXT4_LINK_MAX);
> +}
> +static inline int ext4_dir_link_empty(struct inode *dir)
> +{
> +   return ((dir)->i_nlink == 2 || (dir)->i_nlink == 1);
> +}

even better:

static inline bool ext4_is_dx(const struct inode *dir)
{
#ifdef FOOBAR
return EXT4_HAS_COMPAT_FEATURE(dir->i_sb,   
EXT4_FEATURE_COMPAT_DIR_INDEX) 
   && (EXT4_I(dir)->i_flags & EXT4_INDEX_FL));
#else
return false;
#endif
}

static inline bool ext4_dir_link_max(const struct inode *dir)
{
return !ext4_is_dx(dir) && (dir->i_nlink >= EXT4_LINK_MAX);
}

static inline bool ext4_dir_link_empty(const struct inode *dir)
{
#ifdef FOOBAR
return dir->i_nlink == 2 || dir->i_nlink == 1;
#else
return dir->i_nlink == 2;
#endif
}


FOOBAR is the define, which enables ext4_is_dx(). 
That is not in the patch, so left as an exercise to the reader :-)

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-22 Thread Ingo Oeser
On Tuesday 17 July 2007, Kalpak Shah wrote:
 Index: linux-2.6.22/include/linux/ext4_fs.h
 ===
 --- linux-2.6.22.orig/include/linux/ext4_fs.h
 +++ linux-2.6.22/include/linux/ext4_fs.h
 @@ -797,12 +797,18 @@ struct ext4_dir_entry_2 {
#define is_dx(dir) (EXT4_HAS_COMPAT_FEATURE(dir-i_sb, \
   EXT4_FEATURE_COMPAT_DIR_INDEX) 
  \
   (EXT4_I(dir)-i_flags  EXT4_INDEX_FL))
 -#define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir)  (dir)-i_nlink = 
 EXT4_LINK_MAX)
 -#define EXT4_DIR_LINK_EMPTY(dir) ((dir)-i_nlink == 2 || (dir)-i_nlink == 1)
 +static inline int ext4_dir_link_max(struct inode *dir)
 +{
 +   return (!is_dx(dir)  (dir)-i_nlink = EXT4_LINK_MAX);
 +}
 +static inline int ext4_dir_link_empty(struct inode *dir)
 +{
 +   return ((dir)-i_nlink == 2 || (dir)-i_nlink == 1);
 +}

even better:

static inline bool ext4_is_dx(const struct inode *dir)
{
#ifdef FOOBAR
return EXT4_HAS_COMPAT_FEATURE(dir-i_sb,   
EXT4_FEATURE_COMPAT_DIR_INDEX) 
(EXT4_I(dir)-i_flags  EXT4_INDEX_FL));
#else
return false;
#endif
}

static inline bool ext4_dir_link_max(const struct inode *dir)
{
return !ext4_is_dx(dir)  (dir-i_nlink = EXT4_LINK_MAX);
}

static inline bool ext4_dir_link_empty(const struct inode *dir)
{
#ifdef FOOBAR
return dir-i_nlink == 2 || dir-i_nlink == 1;
#else
return dir-i_nlink == 2;
#endif
}


FOOBAR is the define, which enables ext4_is_dx(). 
That is not in the patch, so left as an exercise to the reader :-)

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] intel-rng: Undo mess made by an 80 column extremist

2007-06-08 Thread Ingo Oeser
On Friday 08 June 2007, John Stoffel wrote:
> Jeff> On Thu, Jun 07, 2007 at 09:56:06PM -0400, John Stoffel wrote:
> >> Thinking about it more, I wonder if Krysztof is bitching more about
> >> the tab width of 8 characters?   I know that it ticks me off,
> 
> Jeff> Even if he is, _that_ is definitely not getting changed.
> 
> Oh sure... I know that part is written in stone.  

Yes, and as a person doing Linux code review for 12 years now,
I'm really thankful for it. 8 char tab, 80 column rule and 25-50 lines
of code per function actually enable effective review of code snippets.
Because you can see more code flow per patch.

And enables high code reuse. If you can get within 1-5min, 
what a functions does and match it with your actually 
written down last 20 code lines, you just reuse it more often.

If you have more to choose from, you reuse naturally. 
Personally I find best candidates by code position in tree and 
function signature.

> Jeff> If code starts creeping way right due to indentation levels,
> Jeff> create a new function.
> 
> Sure... compilers are good, us humans haven't gotten much better, make
> it easier on us and harder on the computer.   

Yes, let compile remove all the abstraction overhead. 
GCC does a pretty good job there, I think.

I recently analyzed some code and it took much, much longer (factor 2-3), 
because of laxer coding rules similiar to the ones you suggest.

I even asked the developers, who wrote that code and to ones who work 
daily with that code base and they had the same problems. They all couldn't 
explain the "Why?" only the "How?". Not to mention, that this was a core 
component.

After refactoring some big messy parts into smaller functions,
identical, missing, unhandled cases became visible, inappropriate usages
were identified and even some loops could be removed.

Now try to find such problems within Linux. They should be a small percentage 
and not within core components.

So a big THANKS to all the code cops here: You actually make the 
damn fast change rate of Linux possible by keeping the base clean 
and neat.

Best Regards

Ingo oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/6] Storing ipcs into radix trees

2007-06-08 Thread Ingo Oeser
Hi,

On Friday 08 June 2007, Nadia Derbey wrote:
> Ingo Oeser wrote:
> > ... together with this means 4*256 -> 1k of precious stack space used.
> > Please consider either lowering IPCS_MAX_SCAN_ENTRIES or kmalloc() that.
> You're completely right, but trying to lower the extraction size, I'm 
> afraid this will have an impact on performances.
> 
> Here are the results of a small test I did: I have run ctxbench on both 
> the 256 and and 16 entries versions
> 
> 1) 256 entries:
> 42523679 itterations in 300.005423 seconds = 141743/sec
> 2) 16 entries:
> 41774255 itterations in 300.005334 seconds = 139245/sec

So that is around 1.8% in a benchmark. 

Not bad, if one considers, that this is an expensive syncronisation primitive 
anyway (and thus shouldn't dominate any real workload). At least _much_
better than possible stack underflow :-)

BTW: You forgot to include measurements with the unmodified code 
as it is in Linus' tree now. They woule be a nice data point here.

> Will try with a dynamic allocation.

But than you have an additional error path 
or have to sleep until memory becomes available.

Maybe try doubling IPCS_MAX_SCAN_ENTRIES - until the performance impact
is in the noise - is simpler. Up to 64 seems acceptable.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/6] Storing ipcs into radix trees

2007-06-08 Thread Ingo Oeser
Hi,

On Friday 08 June 2007, Nadia Derbey wrote:
 Ingo Oeser wrote:
  ... together with this means 4*256 - 1k of precious stack space used.
  Please consider either lowering IPCS_MAX_SCAN_ENTRIES or kmalloc() that.
 You're completely right, but trying to lower the extraction size, I'm 
 afraid this will have an impact on performances.
 
 Here are the results of a small test I did: I have run ctxbench on both 
 the 256 and and 16 entries versions
 
 1) 256 entries:
 42523679 itterations in 300.005423 seconds = 141743/sec
 2) 16 entries:
 41774255 itterations in 300.005334 seconds = 139245/sec

So that is around 1.8% in a benchmark. 

Not bad, if one considers, that this is an expensive syncronisation primitive 
anyway (and thus shouldn't dominate any real workload). At least _much_
better than possible stack underflow :-)

BTW: You forgot to include measurements with the unmodified code 
as it is in Linus' tree now. They woule be a nice data point here.

 Will try with a dynamic allocation.

But than you have an additional error path 
or have to sleep until memory becomes available.

Maybe try doubling IPCS_MAX_SCAN_ENTRIES - until the performance impact
is in the noise - is simpler. Up to 64 seems acceptable.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] intel-rng: Undo mess made by an 80 column extremist

2007-06-08 Thread Ingo Oeser
On Friday 08 June 2007, John Stoffel wrote:
 Jeff On Thu, Jun 07, 2007 at 09:56:06PM -0400, John Stoffel wrote:
  Thinking about it more, I wonder if Krysztof is bitching more about
  the tab width of 8 characters?   I know that it ticks me off,
 
 Jeff Even if he is, _that_ is definitely not getting changed.
 
 Oh sure... I know that part is written in stone.  

Yes, and as a person doing Linux code review for 12 years now,
I'm really thankful for it. 8 char tab, 80 column rule and 25-50 lines
of code per function actually enable effective review of code snippets.
Because you can see more code flow per patch.

And enables high code reuse. If you can get within 1-5min, 
what a functions does and match it with your actually 
written down last 20 code lines, you just reuse it more often.

If you have more to choose from, you reuse naturally. 
Personally I find best candidates by code position in tree and 
function signature.

 Jeff If code starts creeping way right due to indentation levels,
 Jeff create a new function.
 
 Sure... compilers are good, us humans haven't gotten much better, make
 it easier on us and harder on the computer.   

Yes, let compile remove all the abstraction overhead. 
GCC does a pretty good job there, I think.

I recently analyzed some code and it took much, much longer (factor 2-3), 
because of laxer coding rules similiar to the ones you suggest.

I even asked the developers, who wrote that code and to ones who work 
daily with that code base and they had the same problems. They all couldn't 
explain the Why? only the How?. Not to mention, that this was a core 
component.

After refactoring some big messy parts into smaller functions,
identical, missing, unhandled cases became visible, inappropriate usages
were identified and even some loops could be removed.

Now try to find such problems within Linux. They should be a small percentage 
and not within core components.

So a big THANKS to all the code cops here: You actually make the 
damn fast change rate of Linux possible by keeping the base clean 
and neat.

Best Regards

Ingo oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/6] Storing ipcs into radix trees

2007-06-07 Thread Ingo Oeser
Hi Nadia,

good to see someone is pounding this old beast again :-)

On Thursday 07 June 2007, [EMAIL PROTECTED] wrote:
> Index: linux-2.6.21/ipc/util.h
> ===
> --- linux-2.6.21.orig/ipc/util.h  2007-06-07 11:00:30.0 +0200
> +++ linux-2.6.21/ipc/util.h   2007-06-07 11:07:22.0 +0200
> @@ -13,6 +13,8 @@
>  #define USHRT_MAX 0x
>  #define SEQ_MULTIPLIER   (IPCMNI)
>  
> +#define IPCS_MAX_SCAN_ENTRIES 256

That ...

> Index: linux-2.6.21/ipc/util.c
> ===
> --- linux-2.6.21.orig/ipc/util.c  2007-06-07 11:00:30.0 +0200
> +++ linux-2.6.21/ipc/util.c   2007-06-07 11:29:43.0 +0200
> @@ -252,72 +241,94 @@ void __init ipc_init_proc_interface(cons
>   *   @key: The key to find
>   *   
>   *   Requires ipc_ids.mutex locked.
> - *   Returns the identifier if found or -1 if not.
> + *   Returns the LOCKED pointer to the ipc structure if found or NULL
> + *   if not.
> + *  If key is found ipc contains its ipc structure
>   */
>   
> -int ipc_findkey(struct ipc_ids* ids, key_t key)
> +struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
>  {
> - int id;
> - struct kern_ipc_perm* p;
> - int max_id = ids->max_id;
> + struct kern_ipc_perm *ipc;
> + struct kern_ipc_perm *ipcs[IPCS_MAX_SCAN_ENTRIES];

... together with this means 4*256 -> 1k of precious stack space used.
Please consider either lowering IPCS_MAX_SCAN_ENTRIES or kmalloc() that.

Same problem with your third patch called 
"Changing the loops on a single ipcid into radix_tree_gang_lookup() calls"

If you cannot sleep, try to lower that constant (e.g. 16-32). 
The current users use much smaller numbers.

If you can sleep and performance goes down after lowering that constant,
try to kmalloc these arrays (since kmalloc() of that small amount 
should succeed easily).



Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/6] Storing ipcs into radix trees

2007-06-07 Thread Ingo Oeser
Hi Nadia,

good to see someone is pounding this old beast again :-)

On Thursday 07 June 2007, [EMAIL PROTECTED] wrote:
 Index: linux-2.6.21/ipc/util.h
 ===
 --- linux-2.6.21.orig/ipc/util.h  2007-06-07 11:00:30.0 +0200
 +++ linux-2.6.21/ipc/util.h   2007-06-07 11:07:22.0 +0200
 @@ -13,6 +13,8 @@
  #define USHRT_MAX 0x
  #define SEQ_MULTIPLIER   (IPCMNI)
  
 +#define IPCS_MAX_SCAN_ENTRIES 256

That ...

 Index: linux-2.6.21/ipc/util.c
 ===
 --- linux-2.6.21.orig/ipc/util.c  2007-06-07 11:00:30.0 +0200
 +++ linux-2.6.21/ipc/util.c   2007-06-07 11:29:43.0 +0200
 @@ -252,72 +241,94 @@ void __init ipc_init_proc_interface(cons
   *   @key: The key to find
   *   
   *   Requires ipc_ids.mutex locked.
 - *   Returns the identifier if found or -1 if not.
 + *   Returns the LOCKED pointer to the ipc structure if found or NULL
 + *   if not.
 + *  If key is found ipc contains its ipc structure
   */
   
 -int ipc_findkey(struct ipc_ids* ids, key_t key)
 +struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
  {
 - int id;
 - struct kern_ipc_perm* p;
 - int max_id = ids-max_id;
 + struct kern_ipc_perm *ipc;
 + struct kern_ipc_perm *ipcs[IPCS_MAX_SCAN_ENTRIES];

... together with this means 4*256 - 1k of precious stack space used.
Please consider either lowering IPCS_MAX_SCAN_ENTRIES or kmalloc() that.

Same problem with your third patch called 
Changing the loops on a single ipcid into radix_tree_gang_lookup() calls

If you cannot sleep, try to lower that constant (e.g. 16-32). 
The current users use much smaller numbers.

If you can sleep and performance goes down after lowering that constant,
try to kmalloc these arrays (since kmalloc() of that small amount 
should succeed easily).



Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bridge] [BUG] Dropping fragmented IP packets within VLAN frames on bridge

2007-05-26 Thread Ingo Oeser
On Saturday 26 May 2007, Patrick McHardy wrote:
> Adam Osuchowski wrote:
> > if (((skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu) ||
> > (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)) &&
> > !skb_is_gso(skb))
> > return ip_fragment ...
> 
> 
> net/8021q ignores the VLAN header overhead, so we should probably do the
> same here for consistency. Using IS_VLAN_IP (and IS_PPPOE_IP for current
> -rc) looks fine, additionally we should probably also check for
> skb->nfct != NULL to make sure that at least without connection tracking
> the bridge doesn't perform fragmentation.

And could we separe the conditions for that into a static helper function
explaining each of these conditions? e.g. sth. like that:

static bool br_nf_need_fragment(struct sk_buff *skb)
{
/* Plain IP packet does not fit in MTU */
if (!(skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu))
return true;

/* VLAN encapsulated IP packet does not fit in MTU */
if (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)
return true;

/* PPPoE encapsulated IP packet does not fit in MTU */
if (IS_PPPOE_IP(skb) && skb->len > skb->dev->mtu - PPPOE_SES_HLEN)
return true;

return false;
}

and then br_nf_dev_queue_xmit() becomes:

static int br_nf_dev_queue_xmit(struct sk_buff *skb)
{
if (br_nf_need_fragment(skb) &&  !skb_is_gso(skb))
return ip_fragment(skb, br_dev_queue_push_xmit);
else
return br_dev_queue_push_xmit(skb);
}

which is much more readable, more documented and doesn't contain a condition 
monster :-)

@Patrick: Could you check, wether the PPPoE case is correct?

What do you think? Should I submit a patch for that?


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: epoll,threading

2007-05-26 Thread Ingo Oeser
Hi Arunachalam,

On Saturday 26 May 2007, Arunachalam wrote:
> I want to know in detail about , what the events (epoll or /dev/poll or
> select ) achieve in contrast to  thread per client.
> 
> i can have a thread per client and use send and recv system call directly
> right? Why do i go for these event mechanisms?

Try 30.000 clients or more on a x86 32bit box. 
That will show you the difference quite nicely :-)

More seriously: Thread per client scales only to a certain amount of clients
per RAM. If you like to scale beyond that to like to minimize your 
state 
per client. If you have a thread then you have a task structure as 
unswappable memory in kernel, a per-thread stack, which is reducing 
your virtual memory per process (you have only around 3GB of virtual 
memory per process in Linux x86 32bit).

So one uses a process or thread pool to scale beyond that. 
Pool size is typically related to the amount of CPU cores in the system.

Regards

Ingo Oeser

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: epoll,threading

2007-05-26 Thread Ingo Oeser
Hi Arunachalam,

On Saturday 26 May 2007, Arunachalam wrote:
 I want to know in detail about , what the events (epoll or /dev/poll or
 select ) achieve in contrast to  thread per client.
 
 i can have a thread per client and use send and recv system call directly
 right? Why do i go for these event mechanisms?

Try 30.000 clients or more on a x86 32bit box. 
That will show you the difference quite nicely :-)

More seriously: Thread per client scales only to a certain amount of clients
per RAM. If you like to scale beyond that to like to minimize your 
state 
per client. If you have a thread then you have a task structure as 
unswappable memory in kernel, a per-thread stack, which is reducing 
your virtual memory per process (you have only around 3GB of virtual 
memory per process in Linux x86 32bit).

So one uses a process or thread pool to scale beyond that. 
Pool size is typically related to the amount of CPU cores in the system.

Regards

Ingo Oeser

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bridge] [BUG] Dropping fragmented IP packets within VLAN frames on bridge

2007-05-26 Thread Ingo Oeser
On Saturday 26 May 2007, Patrick McHardy wrote:
 Adam Osuchowski wrote:
  if (((skb-protocol == htons(ETH_P_IP)  skb-len  skb-dev-mtu) ||
  (IS_VLAN_IP(skb)  skb-len  skb-dev-mtu - VLAN_HLEN)) 
  !skb_is_gso(skb))
  return ip_fragment ...
 
 
 net/8021q ignores the VLAN header overhead, so we should probably do the
 same here for consistency. Using IS_VLAN_IP (and IS_PPPOE_IP for current
 -rc) looks fine, additionally we should probably also check for
 skb-nfct != NULL to make sure that at least without connection tracking
 the bridge doesn't perform fragmentation.

And could we separe the conditions for that into a static helper function
explaining each of these conditions? e.g. sth. like that:

static bool br_nf_need_fragment(struct sk_buff *skb)
{
/* Plain IP packet does not fit in MTU */
if (!(skb-protocol == htons(ETH_P_IP)  skb-len  skb-dev-mtu))
return true;

/* VLAN encapsulated IP packet does not fit in MTU */
if (IS_VLAN_IP(skb)  skb-len  skb-dev-mtu - VLAN_HLEN)
return true;

/* PPPoE encapsulated IP packet does not fit in MTU */
if (IS_PPPOE_IP(skb)  skb-len  skb-dev-mtu - PPPOE_SES_HLEN)
return true;

return false;
}

and then br_nf_dev_queue_xmit() becomes:

static int br_nf_dev_queue_xmit(struct sk_buff *skb)
{
if (br_nf_need_fragment(skb)   !skb_is_gso(skb))
return ip_fragment(skb, br_dev_queue_push_xmit);
else
return br_dev_queue_push_xmit(skb);
}

which is much more readable, more documented and doesn't contain a condition 
monster :-)

@Patrick: Could you check, wether the PPPoE case is correct?

What do you think? Should I submit a patch for that?


Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Introduce boot based time

2007-05-10 Thread Ingo Oeser
Hi John and Tomas, 

On Thursday 10 May 2007, john stultz wrote:
> I'm not sure I follow this. 
> 
> total_sleep_time stores seconds. So on 32bit systems that's 130some
> years, so it shouldn't be an issue.
> 
> Is the reason you want it to be a ktime is because you want a way to
> keep sub-second sleep granularity?

No, I'm just overworked and getting sloppy :-/ 

Sorry for the noise...

Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Introduce boot based time

2007-05-10 Thread Ingo Oeser
Hi Tomas,

On Thursday 10 May 2007, Tomas Janousek wrote:
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 8997b61..06f3eaf 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -116,6 +116,8 @@ extern int do_setitimer(int which, struct itimerval 
> *value,
>  extern unsigned int alarm_setitimer(unsigned int seconds);
>  extern int do_getitimer(int which, struct itimerval *value);
>  extern void getnstimeofday(struct timespec *tv);
> +extern void getboottime(struct timespec *ts);
> +extern void monotonic_to_bootbased(struct timespec *ts);
>  
>  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>  extern int timekeeping_is_continuous(void);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index f9217bf..dd9647a 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -36,9 +36,17 @@ EXPORT_SYMBOL(xtime_lock);
>   * at zero at system boot time, so wall_to_monotonic will be negative,
>   * however, we will ALWAYS keep the tv_nsec part positive so we can use
>   * the usual normalization.
> + *
> + * wall_to_monotonic is moved after resume from suspend for the monotonic
> + * time not to jump. We need to add total_sleep_time to wall_to_monotonic
> + * to get the real boot based time offset.
> + *
> + * - wall_to_monotonic is no longer the boot time, getboottime must be
> + * used instead.
>   */
>  struct timespec xtime __attribute__ ((aligned (16)));
>  struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
> +static unsigned long total_sleep_time;

Could you make that a ktime_t (or struct ktime)?
There are machines, which sleep more than they are awake. 
Just imagine a surveillance camera triggered by door entrance.

Yes, these things might run Linux (e.g. on "cris" architecture).

Or your VCR.
Yes, these devices might sleep more than they are awake, 
if you are not a TV junkie :-)


Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Introduce boot based time

2007-05-10 Thread Ingo Oeser
Hi Tomas,

On Thursday 10 May 2007, Tomas Janousek wrote:
 diff --git a/include/linux/time.h b/include/linux/time.h
 index 8997b61..06f3eaf 100644
 --- a/include/linux/time.h
 +++ b/include/linux/time.h
 @@ -116,6 +116,8 @@ extern int do_setitimer(int which, struct itimerval 
 *value,
  extern unsigned int alarm_setitimer(unsigned int seconds);
  extern int do_getitimer(int which, struct itimerval *value);
  extern void getnstimeofday(struct timespec *tv);
 +extern void getboottime(struct timespec *ts);
 +extern void monotonic_to_bootbased(struct timespec *ts);
  
  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
  extern int timekeeping_is_continuous(void);
 diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
 index f9217bf..dd9647a 100644
 --- a/kernel/time/timekeeping.c
 +++ b/kernel/time/timekeeping.c
 @@ -36,9 +36,17 @@ EXPORT_SYMBOL(xtime_lock);
   * at zero at system boot time, so wall_to_monotonic will be negative,
   * however, we will ALWAYS keep the tv_nsec part positive so we can use
   * the usual normalization.
 + *
 + * wall_to_monotonic is moved after resume from suspend for the monotonic
 + * time not to jump. We need to add total_sleep_time to wall_to_monotonic
 + * to get the real boot based time offset.
 + *
 + * - wall_to_monotonic is no longer the boot time, getboottime must be
 + * used instead.
   */
  struct timespec xtime __attribute__ ((aligned (16)));
  struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
 +static unsigned long total_sleep_time;

Could you make that a ktime_t (or struct ktime)?
There are machines, which sleep more than they are awake. 
Just imagine a surveillance camera triggered by door entrance.

Yes, these things might run Linux (e.g. on cris architecture).

Or your VCR.
Yes, these devices might sleep more than they are awake, 
if you are not a TV junkie :-)


Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Introduce boot based time

2007-05-10 Thread Ingo Oeser
Hi John and Tomas, 

On Thursday 10 May 2007, john stultz wrote:
 I'm not sure I follow this. 
 
 total_sleep_time stores seconds. So on 32bit systems that's 130some
 years, so it shouldn't be an issue.
 
 Is the reason you want it to be a ktime is because you want a way to
 keep sub-second sleep granularity?

No, I'm just overworked and getting sloppy :-/ 

Sorry for the noise...

Best regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] LogFS proper

2007-05-08 Thread Ingo Oeser
On Tuesday 08 May 2007, Thomas Gleixner wrote:
> On Tue, 2007-05-08 at 00:00 +0200, Jörn Engel wrote:
> > +#define packed __attribute__((__packed__))
> 
> Please use the __attribute__((__packed__)) on your structs instead of
> creating some extra "needs lookup" magic.

Don't worry, we have __packed predefined for this. 
Just look in include/linux/compiler-gcc.h

I love it, because I always forget at least one brace or undescore level :-)




Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] LogFS proper

2007-05-08 Thread Ingo Oeser
On Tuesday 08 May 2007, Thomas Gleixner wrote:
 On Tue, 2007-05-08 at 00:00 +0200, Jörn Engel wrote:
  +#define packed __attribute__((__packed__))
 
 Please use the __attribute__((__packed__)) on your structs instead of
 creating some extra needs lookup magic.

Don't worry, we have __packed predefined for this. 
Just look in include/linux/compiler-gcc.h

I love it, because I always forget at least one brace or undescore level :-)




Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-01 Thread Ingo Oeser
On Tuesday 01 May 2007, Dave Airlie wrote:
> >   - what's with the /proc interface?  Don't add new proc code for
> > non-process related things.  This should all go into sysfs
> > somewhere.  And yes, I know /proc/dri/ is there today, but don't add
> > new stuff please.
> 
> Well we should move all that stuff to sysfs, but we have all the
> infrastructure for publishing this stuff under /proc/dri and adding
> new files doesn't take a major amount, as much as I appreciate sysfs,
> it isn't suitable for this sort of information dump, the whole one
> value per file is quite useless to provide this sort of information
> which is uni-directional for users to send to us for debugging without
> have to install some special tool to join all the values into one
> place.. and I don't think drmfs is the answer either... or maybe it
> is

Ok, what about debugfs then? If it is just for debugging blobs -> debugfs,
if it is crucial for operation -> sysfs and representation of one value 
per file.


Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-01 Thread Ingo Oeser
On Tuesday 01 May 2007, Dave Airlie wrote:
- what's with the /proc interface?  Don't add new proc code for
  non-process related things.  This should all go into sysfs
  somewhere.  And yes, I know /proc/dri/ is there today, but don't add
  new stuff please.
 
 Well we should move all that stuff to sysfs, but we have all the
 infrastructure for publishing this stuff under /proc/dri and adding
 new files doesn't take a major amount, as much as I appreciate sysfs,
 it isn't suitable for this sort of information dump, the whole one
 value per file is quite useless to provide this sort of information
 which is uni-directional for users to send to us for debugging without
 have to install some special tool to join all the values into one
 place.. and I don't think drmfs is the answer either... or maybe it
 is

Ok, what about debugfs then? If it is just for debugging blobs - debugfs,
if it is crucial for operation - sysfs and representation of one value 
per file.


Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] crypto: Use padlock.ko only as a module

2007-04-29 Thread Ingo Oeser
Hi Scott,

On Sunday 29 April 2007, Simon Arlott wrote:
> Ideally I'd just remove that module completely, all it does is 
> trigger the loading of the other two modules when modules are 
> used - so I'll submit a patch for that instead.

That's much better! 

When you force a feature to be a module on a kernel without 
module support, it will effectivly be disabled.

And if it is so simple to do the same in userspace like you suggest,
than that's much better.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] crypto: Use padlock.ko only as a module

2007-04-29 Thread Ingo Oeser
Hi Scott,

On Sunday 29 April 2007, Simon Arlott wrote:
 Ideally I'd just remove that module completely, all it does is 
 trigger the loading of the other two modules when modules are 
 used - so I'll submit a patch for that instead.

That's much better! 

When you force a feature to be a module on a kernel without 
module support, it will effectivly be disabled.

And if it is so simple to do the same in userspace like you suggest,
than that's much better.

Best Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: init's children list is long and slows reaping children.

2007-04-10 Thread Ingo Oeser
On Tuesday 10 April 2007, Jeff Garzik wrote:
> Thus, rather than forcing authors to make their code more complex, we 
> should find another solution.

What about sth. like the "pre-forking" concept? So just have a thread creator 
thread,
which checks the amount of unused threads and keeps them within certain limits.

So that anything which needs a thread now simply queues up the work and
specifies, that it wants a new thread, if possible.

One problem seems to be, that a thread is nothing else but a statement
on what other tasks I can wait before doing my current one (e.g. I don't want 
to 
mlseep() twice on the same reset timeout). 
But we usually use locking to order that.

Do I miss anything fundamental here?

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: init's children list is long and slows reaping children.

2007-04-10 Thread Ingo Oeser
On Tuesday 10 April 2007, Jeff Garzik wrote:
> That's why I feel thread creation -- cheap under Linux -- is quite 
> appropriate for many of these situations.

Maybe that (thread creation) can be done at open(), socket-creation, 
service request, syscall or whatever event triggers a driver/subsystem 
to actually queue work into a thread.

And when there is a close(), socket-destruction, service completion
or whatever these threads can be marked for destruction and destroyed
by a timer or even immediately.

Regards

Ingo Oeser

-- 
If something is getting cheap, it is getting wasted just because it is cheap.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: init's children list is long and slows reaping children.

2007-04-10 Thread Ingo Oeser
On Tuesday 10 April 2007, Jeff Garzik wrote:
 That's why I feel thread creation -- cheap under Linux -- is quite 
 appropriate for many of these situations.

Maybe that (thread creation) can be done at open(), socket-creation, 
service request, syscall or whatever event triggers a driver/subsystem 
to actually queue work into a thread.

And when there is a close(), socket-destruction, service completion
or whatever these threads can be marked for destruction and destroyed
by a timer or even immediately.

Regards

Ingo Oeser

-- 
If something is getting cheap, it is getting wasted just because it is cheap.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: init's children list is long and slows reaping children.

2007-04-10 Thread Ingo Oeser
On Tuesday 10 April 2007, Jeff Garzik wrote:
 Thus, rather than forcing authors to make their code more complex, we 
 should find another solution.

What about sth. like the pre-forking concept? So just have a thread creator 
thread,
which checks the amount of unused threads and keeps them within certain limits.

So that anything which needs a thread now simply queues up the work and
specifies, that it wants a new thread, if possible.

One problem seems to be, that a thread is nothing else but a statement
on what other tasks I can wait before doing my current one (e.g. I don't want 
to 
mlseep() twice on the same reset timeout). 
But we usually use locking to order that.

Do I miss anything fundamental here?

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: getting processor numbers

2007-04-03 Thread Ingo Oeser
Hi Ulrich,

On Tuesday 03 April 2007, Ulrich Drepper wrote:
> So, anybody else has a proposal?  This is a pressing issue and cannot
> wait until someday in the distant future NUMA topology information is
> easily and speedily accessible.

Since for now you just need a fast and dirty hack, which will be replaced 
with better interfaces, I suggest creating a directory with some files in it.
These should just contain, what you need to handle your most pressing cases.

I propose /sys/devices/system/topology_counters/ for that.
These can contain "online_cpu", "proped_cpu", "max_cpu"
and maybe the same for nodes. All that as a simple file with an integer
value.

Since sysfs-attribute files are pollable (if the owners notifies sysfs 
on changes), you also have the notification system you need 
(select, poll, epoll etc.).

If you promise to just keep the slow code around, than one day when the shiny 
NUMA topology stuff is ready, this directory can be completely removed and
glibc (plus all their users) keeps working. It will then even work better with 
a 
new glibc version, which supports the shiny new NUMA topology stuff.

The kernel can create these counters quiete easy, since most of them are 
the hamming weight (or population count) of some bitmaps.

Does this sound like a proper hacky solution? :-)

Regards

Ingo Oeser


pgpeUyaLE4v0G.pgp
Description: PGP signature


Re: getting processor numbers

2007-04-03 Thread Ingo Oeser
Hi Ulrich,

On Tuesday 03 April 2007, Ulrich Drepper wrote:
 So, anybody else has a proposal?  This is a pressing issue and cannot
 wait until someday in the distant future NUMA topology information is
 easily and speedily accessible.

Since for now you just need a fast and dirty hack, which will be replaced 
with better interfaces, I suggest creating a directory with some files in it.
These should just contain, what you need to handle your most pressing cases.

I propose /sys/devices/system/topology_counters/ for that.
These can contain online_cpu, proped_cpu, max_cpu
and maybe the same for nodes. All that as a simple file with an integer
value.

Since sysfs-attribute files are pollable (if the owners notifies sysfs 
on changes), you also have the notification system you need 
(select, poll, epoll etc.).

If you promise to just keep the slow code around, than one day when the shiny 
NUMA topology stuff is ready, this directory can be completely removed and
glibc (plus all their users) keeps working. It will then even work better with 
a 
new glibc version, which supports the shiny new NUMA topology stuff.

The kernel can create these counters quiete easy, since most of them are 
the hamming weight (or population count) of some bitmaps.

Does this sound like a proper hacky solution? :-)

Regards

Ingo Oeser


pgpeUyaLE4v0G.pgp
Description: PGP signature


Re: [PATCH] sysctl: vfs_cache_divisor

2007-03-20 Thread Ingo Oeser
Hi Randy,

On Monday 19 March 2007, Randy Dunlap wrote:
> Were there any patches written after this?  If so, I missed them.
> If not, does this patch help any?

How is division by zero avoided? Maybe one can avoid setting it to zero.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sysctl: vfs_cache_divisor

2007-03-20 Thread Ingo Oeser
Hi Randy,

On Monday 19 March 2007, Randy Dunlap wrote:
 Were there any patches written after this?  If so, I missed them.
 If not, does this patch help any?

How is division by zero avoided? Maybe one can avoid setting it to zero.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] semaphores: add down_interruptible_timeout() and asm-generic/semaphore.h

2007-02-27 Thread Ingo Oeser
Hi Inaky,

On Tuesday, 27. February 2007, Inaky Perez-Gonzalez wrote:
> On Monday 26 February 2007 18:18, Alan wrote:
> > > Yeah, I need semaphore. This is a hw register that says when the hw
> > > is ready to accept a new command. Code that wants to send commands has
> > > to down the semaphore and then send it. When hw is ready to get a new
> > > command, it sends and IRQ and the IRQ up()s the semaphore.
> >
> > So you need a mutex not a semaphore
> 
> Theoretically I could use a mutex. Practically it would trigger ugly
> complications. Only the owner can unlock a mutex (for example), so
> I could not unlock from an IRQ handler -- not to mention that the
> semantic rules outlined in Documentation/mutex-design.txt explicitly
> forbid IRQ usage. 
> 
> And then, this is what semaphores where designed for, as gates :)
> for once that I get to use a semaphore properly...

But they are not required for that :-)

I would suggest to use an irq-safe spinlock for the hardware access
and a status indicator (ready for command), if this is really just a command 
register. 
If the status indicator is updated (in IRQ) and read under spinlock, that is 
safe.

If that command sending is speed critical, please try a FIFO and batch that 
stuff.

Timeout based locking mechanisms are flawed, because they introduce the hard to 
find
timing sensitive bugs.

Please try sth. different (e.g. like suggested above).

Semaphores aren't good "busy/ready flags", as you might have already noticed.

Many Thanks and Best Regards

Ingo Oeser, the down{_interruptible,}_timeout() implementation of Linux :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] semaphores: add down_interruptible_timeout() and asm-generic/semaphore.h

2007-02-27 Thread Ingo Oeser
Hi Inaky,

On Tuesday, 27. February 2007, Inaky Perez-Gonzalez wrote:
 On Monday 26 February 2007 18:18, Alan wrote:
   Yeah, I need semaphore. This is a hw register that says when the hw
   is ready to accept a new command. Code that wants to send commands has
   to down the semaphore and then send it. When hw is ready to get a new
   command, it sends and IRQ and the IRQ up()s the semaphore.
 
  So you need a mutex not a semaphore
 
 Theoretically I could use a mutex. Practically it would trigger ugly
 complications. Only the owner can unlock a mutex (for example), so
 I could not unlock from an IRQ handler -- not to mention that the
 semantic rules outlined in Documentation/mutex-design.txt explicitly
 forbid IRQ usage. 
 
 And then, this is what semaphores where designed for, as gates :)
 for once that I get to use a semaphore properly...

But they are not required for that :-)

I would suggest to use an irq-safe spinlock for the hardware access
and a status indicator (ready for command), if this is really just a command 
register. 
If the status indicator is updated (in IRQ) and read under spinlock, that is 
safe.

If that command sending is speed critical, please try a FIFO and batch that 
stuff.

Timeout based locking mechanisms are flawed, because they introduce the hard to 
find
timing sensitive bugs.

Please try sth. different (e.g. like suggested above).

Semaphores aren't good busy/ready flags, as you might have already noticed.

Many Thanks and Best Regards

Ingo Oeser, the down{_interruptible,}_timeout() implementation of Linux :-)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/1] MM: detach_vmas_to_be_unmapped fix

2007-02-21 Thread Ingo Oeser
Hi,

On Wednesday, 21. February 2007, [EMAIL PROTECTED] wrote:
> 
> ---
> 
>  mm/mmap.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff -puN mm/mmap.c~Avoiding-mmap-fragmentation_fixup mm/mmap.c
> --- linux-2.6_clean/mm/mmap.c~Avoiding-mmap-fragmentation_fixup   
> 2007-02-21 09:49:32.0 -0800
> +++ linux-2.6_clean-akuster/mm/mmap.c 2007-02-21 09:51:26.0 -0800
> @@ -1720,9 +1720,9 @@ detach_vmas_to_be_unmapped(struct mm_str
>   *insertion_point = vma;
>   tail_vma->vm_next = NULL;
>   if (mm->unmap_area == arch_unmap_area)
> - addr = prev ? prev->vm_end : mm->mmap_base;
> + addr = prev ? prev->vm_start : mm->mmap_base;
>   else
> - addr = vma ?  vma->vm_start : mm->mmap_base;
> + addr = vma ?  vma->vm_end : mm->mmap_base;
>   mm->unmap_area(mm, addr);
>   mm->mmap_cache = NULL;  /* Kill the cache. */
>  }

Please comment, why you think this is necessary.


Thanks & Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/1] MM: detach_vmas_to_be_unmapped fix

2007-02-21 Thread Ingo Oeser
Hi,

On Wednesday, 21. February 2007, [EMAIL PROTECTED] wrote:
 
 ---
 
  mm/mmap.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff -puN mm/mmap.c~Avoiding-mmap-fragmentation_fixup mm/mmap.c
 --- linux-2.6_clean/mm/mmap.c~Avoiding-mmap-fragmentation_fixup   
 2007-02-21 09:49:32.0 -0800
 +++ linux-2.6_clean-akuster/mm/mmap.c 2007-02-21 09:51:26.0 -0800
 @@ -1720,9 +1720,9 @@ detach_vmas_to_be_unmapped(struct mm_str
   *insertion_point = vma;
   tail_vma-vm_next = NULL;
   if (mm-unmap_area == arch_unmap_area)
 - addr = prev ? prev-vm_end : mm-mmap_base;
 + addr = prev ? prev-vm_start : mm-mmap_base;
   else
 - addr = vma ?  vma-vm_start : mm-mmap_base;
 + addr = vma ?  vma-vm_end : mm-mmap_base;
   mm-unmap_area(mm, addr);
   mm-mmap_cache = NULL;  /* Kill the cache. */
  }

Please comment, why you think this is necessary.


Thanks  Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] MTD: fix DOC2000/2001/2001PLUS build error

2007-02-05 Thread Ingo Oeser
On Monday, 5. February 2007, Linus Torvalds wrote:
> So thank God for the few selects we have, and we should add a whole lot 
> more!

But "select" is not fine grained enough.

I would like to have "require", "recommend", "suggest" for feature A.

require X
does not work without X, but X is way down the tree 
e.g. ext3 and block device or how select currently is intended

recommend X
it is usable but uncomfortable without X, enabled per default
e.g. firewalling recommends connection tracking support 
or NAT recommends all NAT helpers

suggest X
many people use A together with X, 
so you might be interested in enabling it, but I disabled it
per default unless you said "featuritis mode" before.
e.g. highmem and SMP or a network driver and NAPI.

That is what the Debian/Ubuntu package management does and maybe other too.
And this also gives us new keywords to replace select with, 
so migration is doable :-)

This would also make "EMBEDDED" superflous, because it would just mean 
"disable anything not required". 

And this would enable an individual tree for the users current configuration 
problem instead of a global one.

Regards

Ingo "and tomorrow we change the world" Oeser :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] MTD: fix DOC2000/2001/2001PLUS build error

2007-02-05 Thread Ingo Oeser
On Monday, 5. February 2007, Linus Torvalds wrote:
 So thank God for the few selects we have, and we should add a whole lot 
 more!

But select is not fine grained enough.

I would like to have require, recommend, suggest for feature A.

require X
does not work without X, but X is way down the tree 
e.g. ext3 and block device or how select currently is intended

recommend X
it is usable but uncomfortable without X, enabled per default
e.g. firewalling recommends connection tracking support 
or NAT recommends all NAT helpers

suggest X
many people use A together with X, 
so you might be interested in enabling it, but I disabled it
per default unless you said featuritis mode before.
e.g. highmem and SMP or a network driver and NAPI.

That is what the Debian/Ubuntu package management does and maybe other too.
And this also gives us new keywords to replace select with, 
so migration is doable :-)

This would also make EMBEDDED superflous, because it would just mean 
disable anything not required. 

And this would enable an individual tree for the users current configuration 
problem instead of a global one.

Regards

Ingo and tomorrow we change the world Oeser :-)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

2007-01-16 Thread Ingo Oeser
On Tuesday, 16. January 2007 06:37, Nate Diller wrote:
> On 1/15/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> > On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> > > Convert code using iocb->ki_left to use the more generic iov_length() 
> > > call.
> >
> > No way.  We need to reduce the numer of iovec traversals, not adding
> > more of them.
> 
> ok, I can work on a version of this that uses struct iodesc.  Maybe
> something like this?
> 
> struct iodesc {
> struct iovec *iov;
> unsigned long nr_segs;
> size_t nbytes;
> };
> 
> I suppose it's worth doing the iodesc thing along with this patchset
> anyway, since it'll avoid an extra round of interface churn.

What about this instead

struct iodesc {
struct iovec *iov;
unsigned long nr_segs;
unsigned long seg_limit;
size_t nr_bytes;
};

That will enable resizeable iodescs with partial completion state and
will enable successive filling of an iodesc with iovs.

This will be needed anyway. I built an complete short userspace 
module for that already. I can post and GPLv2 it somewhere, if people
are interested.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 31/59] sysctl: C99 convert the ctl_tables in arch/mips/au1000/common/power.c

2007-01-16 Thread Ingo Oeser
Hi Eric,

On Tuesday, 16. January 2007 17:39, Eric W. Biederman wrote:
> diff --git a/arch/mips/au1000/common/power.c b/arch/mips/au1000/common/power.c
> index b531ab7..31256b8 100644
> --- a/arch/mips/au1000/common/power.c
> +++ b/arch/mips/au1000/common/power.c
> @@ -419,15 +419,41 @@ static int pm_do_freq(ctl_table * ctl, int write, 
> struct file *file,

> + {
> + .ctl_name   = CTL_UNNUMBERED,
> + .procname   = "suspend",
> + .data   = NULL,
> + .maxlen = 0,
> + .mode   = 0600,
> + .proc_handler   = _do_suspend
> + },

No need for zero initialization for maxlen.

> + {
> + .ctl_name   = CTL_UNNUMBERED,
> + .procname   = "sleep",
> + .data   = NULL,
> + .maxlen = 0,
> + .mode   = 0600,
> + .proc_handler   = _do_sleep
> + },

dito

> + {
> + .ctl_name   = CTL_UNNUMBERED,
> + .procname   = "freq",
> + .data   = NULL,
> + .maxlen = 0,
> +     .mode   = 0600,
> + .proc_handler   = _do_freq
> + },
> + {}
>  };

dito

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 45/59] sysctl: C99 convert ctl_tables in drivers/parport/procfs.c

2007-01-16 Thread Ingo Oeser
Hi Eric,

On Tuesday, 16. January 2007 17:39, Eric W. Biederman wrote:
> diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
> index 2e744a2..5337789 100644
> --- a/drivers/parport/procfs.c
> +++ b/drivers/parport/procfs.c
> @@ -263,50 +263,118 @@ struct parport_sysctl_table {
> + {
> + .ctl_name   = DEV_PARPORT_BASE_ADDR,
> + .procname   = "base-addr",
> + .data   = NULL,
> + .maxlen = 0,
> + .mode   = 0444,
> + .proc_handler   = _hardware_base_addr
> + },

No need to initialize to zero or NULL. Just list any variable, which is NOT 
zero or NULL.

> + {
> + .ctl_name   = DEV_PARPORT_AUTOPROBE + 1,
> + .procname   = "autoprobe0",
> + .data   = NULL,
> + .maxlen = 0,
> + .maxlen = 0444,
> + .proc_handler   =  _autoprobe
> + },

Typo here? .mode = 0444 makes mor sense.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

2007-01-16 Thread Ingo Oeser
On Tuesday, 16. January 2007 06:37, Nate Diller wrote:
 On 1/15/07, Christoph Hellwig [EMAIL PROTECTED] wrote:
  On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
   Convert code using iocb-ki_left to use the more generic iov_length() 
   call.
 
  No way.  We need to reduce the numer of iovec traversals, not adding
  more of them.
 
 ok, I can work on a version of this that uses struct iodesc.  Maybe
 something like this?
 
 struct iodesc {
 struct iovec *iov;
 unsigned long nr_segs;
 size_t nbytes;
 };
 
 I suppose it's worth doing the iodesc thing along with this patchset
 anyway, since it'll avoid an extra round of interface churn.

What about this instead

struct iodesc {
struct iovec *iov;
unsigned long nr_segs;
unsigned long seg_limit;
size_t nr_bytes;
};

That will enable resizeable iodescs with partial completion state and
will enable successive filling of an iodesc with iovs.

This will be needed anyway. I built an complete short userspace 
module for that already. I can post and GPLv2 it somewhere, if people
are interested.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 45/59] sysctl: C99 convert ctl_tables in drivers/parport/procfs.c

2007-01-16 Thread Ingo Oeser
Hi Eric,

On Tuesday, 16. January 2007 17:39, Eric W. Biederman wrote:
 diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
 index 2e744a2..5337789 100644
 --- a/drivers/parport/procfs.c
 +++ b/drivers/parport/procfs.c
 @@ -263,50 +263,118 @@ struct parport_sysctl_table {
 + {
 + .ctl_name   = DEV_PARPORT_BASE_ADDR,
 + .procname   = base-addr,
 + .data   = NULL,
 + .maxlen = 0,
 + .mode   = 0444,
 + .proc_handler   = do_hardware_base_addr
 + },

No need to initialize to zero or NULL. Just list any variable, which is NOT 
zero or NULL.

 + {
 + .ctl_name   = DEV_PARPORT_AUTOPROBE + 1,
 + .procname   = autoprobe0,
 + .data   = NULL,
 + .maxlen = 0,
 + .maxlen = 0444,
 + .proc_handler   =  do_autoprobe
 + },

Typo here? .mode = 0444 makes mor sense.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 31/59] sysctl: C99 convert the ctl_tables in arch/mips/au1000/common/power.c

2007-01-16 Thread Ingo Oeser
Hi Eric,

On Tuesday, 16. January 2007 17:39, Eric W. Biederman wrote:
 diff --git a/arch/mips/au1000/common/power.c b/arch/mips/au1000/common/power.c
 index b531ab7..31256b8 100644
 --- a/arch/mips/au1000/common/power.c
 +++ b/arch/mips/au1000/common/power.c
 @@ -419,15 +419,41 @@ static int pm_do_freq(ctl_table * ctl, int write, 
 struct file *file,

 + {
 + .ctl_name   = CTL_UNNUMBERED,
 + .procname   = suspend,
 + .data   = NULL,
 + .maxlen = 0,
 + .mode   = 0600,
 + .proc_handler   = pm_do_suspend
 + },

No need for zero initialization for maxlen.

 + {
 + .ctl_name   = CTL_UNNUMBERED,
 + .procname   = sleep,
 + .data   = NULL,
 + .maxlen = 0,
 + .mode   = 0600,
 + .proc_handler   = pm_do_sleep
 + },

dito

 + {
 + .ctl_name   = CTL_UNNUMBERED,
 + .procname   = freq,
 + .data   = NULL,
 + .maxlen = 0,
 + .mode   = 0600,
 + .proc_handler   = pm_do_freq
 + },
 + {}
  };

dito

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.

2007-01-01 Thread Ingo Oeser
On Monday, 1. January 2007 17:25, Andreas Schwab wrote:
> Ingo Oeser <[EMAIL PROTECTED]> writes:
> > Then this works, because the side effect (+20) is evaluated only once. 
> 
> It's not a side effect, it's a non-lvalue, and you can't take the address
> of a non-lvalue.

Just verified this. So If we cannot make it work in all cases, it will
cause more problems then it will solve.

So we are left with a function, which will 
a) only be used by janitors to provide "kfree(x); x = NULL;" 
with an macro KFREE(x) in all the simple cases.

b) be used by developers, who are aware of the fact that reusable
pointer values should set to NULL after kfree().

Doing a) and b) is "running into open doors", so doesn't prevent any
error, obfuscates code more and works only sometimes.

I give up here and would vote for dropping that idea then.


Regards

Ingo Oeser


pgpcCSfafJsC7.pgp
Description: PGP signature


Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.

2007-01-01 Thread Ingo Oeser
Hi,

On Monday, 1. January 2007 07:37, Amit Choudhary wrote:
> --- Ingo Oeser <[EMAIL PROTECTED]> wrote:
> > #define kfree_nullify(x) do { \
> > if (__builtin_constant_p(x)) { \
> > kfree(x); \
> > } else { \
> > typeof(x) *__addr_x =  \

Ok, I should change that line to 
typeof(x) *__addr_x = &(x); \

> > kfree(*__addr_x); \
> > *__addr_x = NULL; \
> > } \
> > } while (0)
> > 
> > Regards
> > 
> > Ingo Oeser
> > 
> 
> This is a nice approach but what if someone does kfree_nullify(x+20).

Then this works, because the side effect (+20) is evaluated only once. 
AFAIK __builtin_constant_p() and typeof() are both free of side effects.

 
> I decided to keep it simple. If someone is calling kfree_nullify() with 
> anything other than a
> simple variable, then they should call kfree().

kfree_nullify() has to replace kfree() to be of any use one day. So this is not 
an option.

Anybody thinking of "Hey, this must be NULL afterwards!", will set it to NULL 
himself.
Anybody else doesn't know or care about it, which is the case we like to catch.

> But definitely an approach that takes care of all 
> situations is the best but I cannot think of a macro that can handle all 
> situations. The simple
> macro that I sent earlier will catch all the other usage at compile time. 

The problems I see are:
1. parameter to kfree is a value not a pointer 
-> solved by using a macro instead of function, 
 but generate new (the other) problems
-> take the address of the value there.
2. possible side effects of macro parameter usage 
   -> solved by assigning once only and using typeof
3. Constants don't have an address 
   -> need to check for constant

So apart from missing braces before taking the address, I don't see
any problem with my solution :-)

Should I send a patch?

> Please let me know if I have missed something.

I reviewed it and you missed side effects (kfree(x); x = NULL).

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.

2007-01-01 Thread Ingo Oeser
Hi,

On Monday, 1. January 2007 07:37, Amit Choudhary wrote:
 --- Ingo Oeser [EMAIL PROTECTED] wrote:
  #define kfree_nullify(x) do { \
  if (__builtin_constant_p(x)) { \
  kfree(x); \
  } else { \
  typeof(x) *__addr_x = x; \

Ok, I should change that line to 
typeof(x) *__addr_x = (x); \

  kfree(*__addr_x); \
  *__addr_x = NULL; \
  } \
  } while (0)
  
  Regards
  
  Ingo Oeser
  
 
 This is a nice approach but what if someone does kfree_nullify(x+20).

Then this works, because the side effect (+20) is evaluated only once. 
AFAIK __builtin_constant_p() and typeof() are both free of side effects.

 
 I decided to keep it simple. If someone is calling kfree_nullify() with 
 anything other than a
 simple variable, then they should call kfree().

kfree_nullify() has to replace kfree() to be of any use one day. So this is not 
an option.

Anybody thinking of Hey, this must be NULL afterwards!, will set it to NULL 
himself.
Anybody else doesn't know or care about it, which is the case we like to catch.

 But definitely an approach that takes care of all 
 situations is the best but I cannot think of a macro that can handle all 
 situations. The simple
 macro that I sent earlier will catch all the other usage at compile time. 

The problems I see are:
1. parameter to kfree is a value not a pointer 
- solved by using a macro instead of function, 
 but generate new (the other) problems
- take the address of the value there.
2. possible side effects of macro parameter usage 
   - solved by assigning once only and using typeof
3. Constants don't have an address 
   - need to check for constant

So apart from missing braces before taking the address, I don't see
any problem with my solution :-)

Should I send a patch?

 Please let me know if I have missed something.

I reviewed it and you missed side effects (kfree(x); x = NULL).

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.

2007-01-01 Thread Ingo Oeser
On Monday, 1. January 2007 17:25, Andreas Schwab wrote:
 Ingo Oeser [EMAIL PROTECTED] writes:
  Then this works, because the side effect (+20) is evaluated only once. 
 
 It's not a side effect, it's a non-lvalue, and you can't take the address
 of a non-lvalue.

Just verified this. So If we cannot make it work in all cases, it will
cause more problems then it will solve.

So we are left with a function, which will 
a) only be used by janitors to provide kfree(x); x = NULL; 
with an macro KFREE(x) in all the simple cases.

b) be used by developers, who are aware of the fact that reusable
pointer values should set to NULL after kfree().

Doing a) and b) is running into open doors, so doesn't prevent any
error, obfuscates code more and works only sometimes.

I give up here and would vote for dropping that idea then.


Regards

Ingo Oeser


pgpcCSfafJsC7.pgp
Description: PGP signature


Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.

2006-12-31 Thread Ingo Oeser
On Sunday, 31. December 2006 14:38, Bernd Petrovitsch wrote:
> That depends on the decision/definition if (so called) "double free" is
> an error or not (and "free(NULL)" must work in POSIX-compliant
> environments).

A double free of non-NULL is certainly an error.
So the idea of setting it to NULL is ok, since then you can
kfree the variable over and over again without any harm.

It is just complicated to do this side effect free.

Maybe one should check for builtin-constant and take the address,
if this is not an builtin-constant.

sth, like this

#define kfree_nullify(x) do { \
if (__builtin_constant_p(x)) { \
kfree(x); \
} else { \
typeof(x) *__addr_x =  \
kfree(*__addr_x); \
*__addr_x = NULL; \
    } \
} while (0)

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/8] KVM: Implement a few system configuration msrs

2006-12-31 Thread Ingo Oeser
Hi,

On Thursday, 28. December 2006 11:11, Avi Kivity wrote:
> Index: linux-2.6/drivers/kvm/svm.c
> ===
> --- linux-2.6.orig/drivers/kvm/svm.c
> +++ linux-2.6/drivers/kvm/svm.c
> @@ -1068,6 +1068,9 @@ static int emulate_on_interception(struc
>  static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
>  {
>   switch (ecx) {
> + case 0xc0010010: /* SYSCFG */
> + case 0xc0010015: /* HWCR */
> + case MSR_IA32_PLATFORM_ID:
>   case MSR_IA32_P5_MC_ADDR:
>   case MSR_IA32_P5_MC_TYPE:
>   case MSR_IA32_MC0_CTL:

What about just defining constants for these?
Then you can rip out these comments.

Same for linux-2.6/drivers/kvm/vmx.c


Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/8] KVM: Implement a few system configuration msrs

2006-12-31 Thread Ingo Oeser
Hi,

On Thursday, 28. December 2006 11:11, Avi Kivity wrote:
 Index: linux-2.6/drivers/kvm/svm.c
 ===
 --- linux-2.6.orig/drivers/kvm/svm.c
 +++ linux-2.6/drivers/kvm/svm.c
 @@ -1068,6 +1068,9 @@ static int emulate_on_interception(struc
  static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
  {
   switch (ecx) {
 + case 0xc0010010: /* SYSCFG */
 + case 0xc0010015: /* HWCR */
 + case MSR_IA32_PLATFORM_ID:
   case MSR_IA32_P5_MC_ADDR:
   case MSR_IA32_P5_MC_TYPE:
   case MSR_IA32_MC0_CTL:

What about just defining constants for these?
Then you can rip out these comments.

Same for linux-2.6/drivers/kvm/vmx.c


Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [DISCUSS] Make the variable NULL after freeing it.

2006-12-31 Thread Ingo Oeser
On Sunday, 31. December 2006 14:38, Bernd Petrovitsch wrote:
 That depends on the decision/definition if (so called) double free is
 an error or not (and free(NULL) must work in POSIX-compliant
 environments).

A double free of non-NULL is certainly an error.
So the idea of setting it to NULL is ok, since then you can
kfree the variable over and over again without any harm.

It is just complicated to do this side effect free.

Maybe one should check for builtin-constant and take the address,
if this is not an builtin-constant.

sth, like this

#define kfree_nullify(x) do { \
if (__builtin_constant_p(x)) { \
kfree(x); \
} else { \
typeof(x) *__addr_x = x; \
kfree(*__addr_x); \
*__addr_x = NULL; \
} \
} while (0)

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 009 of 14] knfsd: SUNRPC: teach svc_sendto() to deal with IPv6 addresses

2006-12-17 Thread Ingo Oeser
On Wednesday, 13. December 2006 00:59, NeilBrown wrote:
> diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
> --- .prev/net/sunrpc/svcsock.c2006-12-13 10:31:39.0 +1100
> +++ ./net/sunrpc/svcsock.c2006-12-13 10:32:15.0 +1100
> @@ -438,6 +439,47 @@ svc_wake_up(struct svc_serv *serv)
>   }
>  }
>  
> +union svc_pktinfo_u {
> + struct in_pktinfo pkti;
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> + struct in6_pktinfo pkti6;
> +#endif
> +};
> +
> +static void svc_set_cmsg_data(struct svc_rqst *rqstp, struct cmsghdr *cmh)
> +{
> + switch (rqstp->rq_sock->sk_sk->sk_family) {
> + case AF_INET:
> + do {
> + struct in_pktinfo *pki =
> + (struct in_pktinfo *) CMSG_DATA(cmh);

struct in_pktinfo *pki = CMSG_DATA(cmh);

Ugly casting not needed here, since CMSG_DATA should return "void *",
which can be casted to any pointer.

> +
> + cmh->cmsg_level = SOL_IP;
> + cmh->cmsg_type = IP_PKTINFO;
> + pki->ipi_ifindex = 0;
> + pki->ipi_spec_dst.s_addr = rqstp->rq_daddr.addr.s_addr;
> + cmh->cmsg_len = CMSG_LEN(sizeof(*pki));
> + } while (0);
> + break;
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> + case AF_INET6:
> + do {
> + struct in6_pktinfo *pki =
> + (struct in6_pktinfo *) CMSG_DATA(cmh);
> +

No casting needed, so:

struct in6_pktinfo *pki = CMSG_DATA(cmh);


Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.19] e1000: replace kmalloc with kzalloc

2006-12-17 Thread Ingo Oeser
On Tuesday, 12. December 2006 18:34, Pekka Enberg wrote:
> On 12/12/06, Yan Burman <[EMAIL PROTECTED]> wrote:
> > size = txdr->count * sizeof(struct e1000_buffer);
> > -   if (!(txdr->buffer_info = kmalloc(size, GFP_KERNEL))) {
> > +   if (!(txdr->buffer_info = kzalloc(size, GFP_KERNEL))) {
> > ret_val = 1;
> > goto err_nomem;
> > }
> > -   memset(txdr->buffer_info, 0, size);
> 
> No one seems to be using size elsewhere so why not convert to
> kcalloc() and get rid of it? (Seems to apply to other places as well.)

Because if done properly that often exceeds the 80 column limit.
The intermediate variable should be optimized away from the compiler.

But kcalloc() is better for another reason: Overflow checking.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.19] e1000: replace kmalloc with kzalloc

2006-12-17 Thread Ingo Oeser
On Tuesday, 12. December 2006 18:34, Pekka Enberg wrote:
 On 12/12/06, Yan Burman [EMAIL PROTECTED] wrote:
  size = txdr-count * sizeof(struct e1000_buffer);
  -   if (!(txdr-buffer_info = kmalloc(size, GFP_KERNEL))) {
  +   if (!(txdr-buffer_info = kzalloc(size, GFP_KERNEL))) {
  ret_val = 1;
  goto err_nomem;
  }
  -   memset(txdr-buffer_info, 0, size);
 
 No one seems to be using size elsewhere so why not convert to
 kcalloc() and get rid of it? (Seems to apply to other places as well.)

Because if done properly that often exceeds the 80 column limit.
The intermediate variable should be optimized away from the compiler.

But kcalloc() is better for another reason: Overflow checking.

Regards

Ingo Oeser
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   >