
Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-15 Thread Nadia Derbey


Eric W. Biederman wrote:
> Nadia Derbey <[EMAIL PROTECTED]> writes:
>
>> But, what do you do with Oracle that's asking maxfiles to be set to 0x1,
>> while the default value might be enough for a system that's not running Oracle.
>> I'm afraid that giving boot time values to the max_* tunables we will lose all
>> the benefits from /proc (or /sys): it is impossible to anticipate what an OS
>> will be used for. So allowing such things to be changed without having to reboot
>> the machine is in my mind quite a powerful feature we should keep taking
>> advantage of.
>
> I'm not saying remove user space's ability to set the
> denial-of-service limits.  I'm saying if they need to be frequently
> changed we need to update the default so they are higher by default.
>
> There really is no cost in moving those values up and down; it is just
> an arbitrary integer used in comparisons.  But if we can make a good
> guess that still catches runaway programs before they kill the machine
> but also allows more programs to work out of the box we are in better
> shape.

OK, happy to see we are on the same wavelength (and sorry for
misunderstanding what you were saying ;-) )


Regards,
Nadia
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-14 Thread Eric W. Biederman

Nadia Derbey <[EMAIL PROTECTED]> writes:

> But, what do you do with Oracle that's asking maxfiles to be set to 0x1,
> while the default value might be enough for a system that's not running
> Oracle.
> I'm afraid that giving boot time values to the max_* tunables we will lose
> all the benefits from /proc (or /sys): it is impossible to anticipate what
> an OS will be used for. So allowing such things to be changed without having
> to reboot the machine is in my mind quite a powerful feature we should keep
> taking advantage of.

I'm not saying remove user space's ability to set the
denial-of-service limits.  I'm saying if they need to be frequently
changed we need to update the default so they are higher by default.

There really is no cost in moving those values up and down; it is just
an arbitrary integer used in comparisons.  But if we can make a good
guess that still catches runaway programs before they kill the machine
but also allows more programs to work out of the box we are in better
shape.

Eric

Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-14 Thread Nadia Derbey


Eric W. Biederman wrote:
> Nadia Derbey <[EMAIL PROTECTED]> writes:
>
>> So, should I understand from this that automatic tuning and the AKT framework
>> itself would make sense, given that I find the right tunables it should be
>> applied to?
>
> Sort of.  The concept of things tuning themselves automatically makes
> a lot of sense.
>
> I'm not at all certain about tunables being exported just to be hidden
> again.  Ideally you don't even want the fact that these things are
> varying visible to the user.
>
> So I think that if you can find a good example that cannot be solved
> better another way, you can build a case for your framework.
> Currently I doubt you can find such a case.
>
>> Actually, don't know if you had the opportunity to read all the patches, but
>> there are 2 other tunables AKT is proposed to be applied to:
>> . max_threads, the tunable limit on nr_threads
>> . max_files, the tunable limit on nr_files
>
> At a quick glance max_threads and max_files appear even more to be
> DOS limits and not tunables and even less applicable to needing any
> tuning at all.  My gut feel is at worst these values may need a little
> better boot time defaults but otherwise they should be good.

But, what do you do with Oracle that's asking maxfiles to be set to
0x1, while the default value might be enough for a system that's not
running Oracle.
I'm afraid that giving boot time values to the max_* tunables we will
lose all the benefits from /proc (or /sys): it is impossible to
anticipate what an OS will be used for. So allowing such things to be
changed without having to reboot the machine is in my mind quite a
powerful feature we should keep taking advantage of.
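
As a rough illustration of "changed without having to reboot", here is a minimal userspace sketch that reads and writes a single integer tunable through a /proc-style file. The helper names read_tunable() and write_tunable() are hypothetical and not part of the patch series; on a running Linux system the max_files limit is normally exposed as /proc/sys/fs/file-max, and writing it requires root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer tunable from a /proc-style file.
 * Hypothetical helper: returns 0 on success, -1 on failure.
 */
int read_tunable(const char *path, long *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Write a new value to the tunable file.
 * Hypothetical helper: returns 0 on success, -1 on failure.
 */
int write_tunable(const char *path, long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = (fprintf(f, "%ld\n", value) > 0);
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling read_tunable("/proc/sys/fs/file-max", &v) would report the current limit; the same pair of calls is what an administrator or a monitoring daemon would use to adjust it at run time.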


Regards,
Nadia

Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-14 Thread Al Boldi

ebiederm wrote:
> At a quick glance max_threads and max_files appear even more to be
> DOS limits and not tunables and even less applicable to needing any
> tuning at all.  My gut feel is at worst these values may need a little
> better boot time defaults but otherwise they should be good.

Autotuning max_threads and max_files by using some sort of rate-limiter could 
possibly be more useful than any kind of fixed default.


Thanks!

--
Al


Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-13 Thread Eric W. Biederman

Nadia Derbey <[EMAIL PROTECTED]> writes:

> So, should I understand from this that automatic tuning and the AKT framework
> itself would make sense, given that I find the right tunables it should be
> applied to?

Sort of.  The concept of things tuning themselves automatically makes
a lot of sense.

I'm not at all certain about tunables being exported just to be hidden
again.  Ideally you don't even want the fact that these things are
varying visible to the user.

So I think that if you can find a good example that cannot be solved
better another way, you can build a case for your framework.
Currently I doubt you can find such a case.

> Actually, don't know if you had the opportunity to read all the patches, but
> there are 2 other tunables AKT is proposed to be applied to:
> . max_threads, the tunable limit on nr_threads
> . max_files, the tunable limit on nr_files

At a quick glance max_threads and max_files appear even more to be
DOS limits and not tunables and even less applicable to needing any
tuning at all.  My gut feel is at worst these values may need a little
better boot time defaults but otherwise they should be good.

Eric

Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-13 Thread Nadia Derbey


Eric W. Biederman wrote:
> Nadia Derbey <[EMAIL PROTECTED]> writes:
>
>> I do not fully agree with you:
>> It is true that some ipc tunables play the role of DoS limits.
>> But IMHO the *mni ones (semmni, msgmni, shmmni) are used by the ipc subsystem to
>> adapt its data structures sizes to what is being asked for through the tunable
>> value. I think this is how they manage to take into account a new tunable value
>> without a need for rebooting the system: reallocate some more memory on demand.
>
> Yes, they do.  However if you are constantly having to play with shmmni or
> the others that is the problem and the array should be replaced with
> a hash table or some form of radix tree, so it changes its size to fit
> the need.  Once that is done, shmmni does become a simple DOS limit.
>
> So what I'm asking is please fix the problem at the source, don't plaster over
> it.
>
>> Now, what the akt framework does, is that it takes advantage of this concept of
>> "on demand memory allocation" to replace a user (or a daemon) that would
>> periodically check its ipcs consumptions and manually adjust the ipcs tunables:
>> Doing this from the user space would imply a latency that makes it difficult to
>> react fast enough to resources running out.
>
> There may be some sense in this but you haven't found something that inherently
> needs tuning.  You have found something that has a poor data structure,
> and can more easily be fixed by simply fixing the data structure.

So, should I understand from this that automatic tuning and the AKT
framework itself would make sense, given that I find the right tunables
it should be applied to?
Actually, don't know if you had the opportunity to read all the patches,
but there are 2 other tunables AKT is proposed to be applied to:

. max_threads, the tunable limit on nr_threads
. max_files, the tunable limit on nr_files

Regards,
Nadia


Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-09 Thread Eric W. Biederman

Nadia Derbey <[EMAIL PROTECTED]> writes:

> I do not fully agree with you:
> It is true that some ipc tunables play the role of DoS limits.
> But IMHO the *mni ones (semmni, msgmni, shmmni) are used by the ipc subsystem
> to adapt its data structures sizes to what is being asked for through the
> tunable value. I think this is how they manage to take into account a new
> tunable value without a need for rebooting the system: reallocate some more
> memory on demand.

Yes, they do.  However if you are constantly having to play with shmmni or
the others that is the problem and the array should be replaced with
a hash table or some form of radix tree, so it changes its size to fit
the need.  Once that is done, shmmni does become a simple DOS limit.

So what I'm asking is please fix the problem at the source don't plaster over
it.

> Now, what the akt framework does, is that it takes advantage of this concept
> of "on demand memory allocation" to replace a user (or a daemon) that would
> periodically check its ipcs consumptions and manually adjust the ipcs
> tunables: Doing this from the user space would imply a latency that makes it
> difficult to react fast enough to resources running out.

There may be some sense in this but you haven't found something that inherently
needs tuning.  You have found something that has a poor data structure,
and can more easily be fixed by simply fixing the data structure.
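
To make the "fix the data structure" argument concrete, here is a minimal userspace sketch, not kernel code and not taken from ipc/. It uses a plain doubling array as a simpler stand-in for the hash table or radix tree suggested above; struct id_table and its functions are hypothetical names. The point is that once storage grows on demand, the limit is only an integer that gets compared, so raising or lowering it triggers no allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical id table: storage grows on demand, limit is only compared. */
struct id_table {
	void **slots;     /* backing storage, reallocated as needed */
	size_t capacity;  /* number of allocated slots */
	size_t used;      /* number of slots in use */
	size_t limit;     /* DoS limit, plays the role of shmmni */
};

/* Changing the limit is just an assignment: nothing is reallocated here. */
void id_table_set_limit(struct id_table *t, size_t limit)
{
	assert(t != NULL);
	t->limit = limit;
}

/* Returns the new id, or -1 if the limit is reached or memory runs out. */
long id_table_add(struct id_table *t, void *obj)
{
	assert(t != NULL);

	if (t->used >= t->limit)
		return -1;                /* the only place the limit is used */

	if (t->used == t->capacity) {
		size_t ncap = t->capacity ? t->capacity * 2 : 4;
		void **n = realloc(t->slots, ncap * sizeof(*n));

		if (!n)
			return -1;
		t->slots = n;
		t->capacity = ncap;
	}
	t->slots[t->used] = obj;
	return (long)t->used++;
}

void id_table_free(struct id_table *t)
{
	free(t->slots);
	t->slots = NULL;
	t->capacity = 0;
	t->used = 0;
}
```

With this shape, the memory footprint follows actual usage rather than the configured maximum, which is the property the autotuning proposal was trying to obtain by moving the limit itself.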

I'm guessing that we have a disconnect somewhere with kernel developers thinking
shm is an old legacy api and doing minimal maintenance, expecting serious users
to use tmpfs or hugetlbfs and users not used to the old stuff using the SYSV
apis.

If we have serious users it makes sense to fix these things properly, in a
backwards compatible way, so existing users and applications don't need to be
changed.

> Now, talking about DoS limits, akt implements them in a sense: each tunable
> managed by akt has 3 attributes exported to sysfs:
> . autotune: enable / disable auto-tuning
> . min: min value the tunable can ever reach
> . max: max value the tunable can ever reach
>
> Enabling a sysadmin to play with these min and max values makes it possible to
> refine the dynamic adjustment, and avoid that the tunable reaches really huge
> values.

This just shifts the location where you have your DOS limit and could
be done transparently under the covers with shmmni being the maximum.
If we can't get users to switch to something that doesn't need tuning
that has been available for years, I doubt even more user tunables
that tune the tunables will make the situation any better.  I suspect
your changes would just confuse the landscape even more and give us
more weird legacy cases to support that we can never get rid of.

Eric

Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-09 Thread Nadia Derbey


Eric W. Biederman wrote:
> Nadia Derbey <[EMAIL PROTECTED]> writes:
>
>> 2) why autotuning:
>> There are at least 3 cases where it can be useful
>> . for workloads that are known to need a big amount of a given resource type
>> (say shared memories), but we don't know what the maximum amount needed will be
>> . to solve the case of multiple applications running on a single system, and
>> that need the same tunable to be adjusted to fit their needs
>> . to make a system correctly react to eventual peak loads for a given resource
>> usage, i.e. make it tune up *and down* as needed.
>>
>> In all these cases, the akt framework will enable the kernel to adapt to
>> increasing / decreasing resource consumption:
>> 1) avoid allocating "a priori" a big amount of memory that will be used only in
>> extreme cases. This is the effect of doing an "echo  > /proc/sys/kernel/shmmni"
>> 2) the system will come back to the default values as soon as the peak load is
>> over.
>
> At least the ipc ones are supposed to be DOS limits not behavior
> modifiers.  I do admit from looking at the code that there are some
> consequences of increasing things like shmmni.  However I think we
> would be better off with  better data structures and implementations
> that remove these consequences than this autotuning of
> denial-of-service limits.



I do not fully agree with you:
It is true that some ipc tunables play the role of DoS limits.
But IMHO the *mni ones (semmni, msgmni, shmmni) are used by the ipc 
subsystem to adapt its data structures sizes to what is being asked for 
through the tunable value. I think this is how they manage to take into 
account a new tunable value without a need for rebooting the system: 
reallocate some more memory on demand.


Now, what the akt framework does, is that it takes advantage of this 
concept of "on demand memory allocation" to replace a user (or a daemon) 
that would periodically check its ipcs consumptions and manually adjust 
the ipcs tunables: Doing this from the user space would imply a latency 
that makes it difficult to react fast enough to resources running out.


Now, talking about DoS limits, akt implements them in a sense: each 
tunable managed by akt has 3 attributes exported to sysfs:

. autotune: enable / disable auto-tuning
. min: min value the tunable can ever reach
. max: max value the tunable can ever reach

Enabling a sysadmin to play with these min and max values makes it 
possible to refine the dynamic adjustment, and avoid that the tunable 
reaches really huge values.


Regards,
Nadia


Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-02-07 Thread Eric W. Biederman

Nadia Derbey <[EMAIL PROTECTED]> writes:

> 2) why autotuning:
> There are at least 3 cases where it can be useful
> . for workloads that are known to need a big amount of a given resource type
> (say shared memories), but we don't know what the maximum amount needed will
> be
> . to solve the case of multiple applications running on a single system, and
> that need the same tunable to be adjusted to fit their needs
> . to make a system correctly react to eventual peak loads for a given resource
> usage, i.e. make it tune up *and down* as needed.
>
> In all these cases, the akt framework will enable the kernel to adapt to
> increasing / decreasing resource consumption:
> 1) avoid allocating "a priori" a big amount of memory that will be used only
> in extreme cases. This is the effect of doing an "echo  > /proc/sys/kernel/shmmni"
> 2) the system will come back to the default values as soon as the peak load is
> over.

At least the ipc ones are supposed to be DOS limits not behavior
modifiers.  I do admit from looking at the code that there are some
consequences of increasing things like shmmni.  However I think we
would be better off with  better data structures and implementations
that remove these consequences than this autotuning of
denial-of-service limits.

i.e. I think you are treating the symptom not the problem.

Does this make sense?

Eric


Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-01-23 Thread Nadia Derbey


Andrew Morton wrote:
>> On Tue, 16 Jan 2007 07:15:22 +0100 [EMAIL PROTECTED] wrote:
>> The following kernel components register a tunable structure and call the
>> auto-tuning routine:
>>  . file system
>>  . shared memory (per namespace)
>>  . semaphore (per namespace)
>>  . message queues (per namespace)
>
> This is the part of the patch series which really matters, and I just don't
> understand it :(
>
> Why do we want to autotune these things?  What problem is this patch series
> solving?  Please describe this part of the work much, much more completely,
> so we can understand the need to add such a large amount of code to the
> kernel.

1) why these tunables?
The ipc tunables have been selected as "guinea-pig" tunables for the AKT
framework because they are likely to be often used in databases. This
applies to file-max too.
Now, if the framework itself is accepted, the set of impacted tunables
can easily be extended.


2) why autotuning:
There are at least 3 cases where it can be useful
. for workloads that are known to need a big amount of a given resource
type (say shared memories), but we don't know what the maximum amount
needed will be
. to solve the case of multiple applications running on a single system,
and that need the same tunable to be adjusted to fit their needs
. to make a system correctly react to eventual peak loads for a given
resource usage, i.e. make it tune up *and down* as needed.


In all these cases, the akt framework will enable the kernel to adapt to
increasing / decreasing resource consumption:
1) avoid allocating "a priori" a big amount of memory that will be used
only in extreme cases. This is the effect of doing an "echo  > /proc/sys/kernel/shmmni"
2) the system will come back to the default values as soon as the peak
load is over.

> It seems strange that the whole feature is Kconfigurable.  Please also
> explain the thinking behind that.

We wanted to make it configurable because it adds some overhead in terms of
1) generated kernel size
2) instructions added to the resource creation / removal code paths even
if auto-tuning is not activated for the corresponding tunable ->
performance impact.

> I suspect the patches would be much simpler if you simply required that all
> these new tunables be of type `long'.  About seven eighths of the code
> would go away.  As would most of those eye-popping macros.

Yes, agree with you: the idea here was to make the framework more
generic. But I can change that.


Regards,
Nadia





Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-01-22 Thread Andrew Morton

> On Tue, 16 Jan 2007 07:15:22 +0100 [EMAIL PROTECTED] wrote:
> The following kernel components register a tunable structure and call the
> auto-tuning routine:
>   . file system
>   . shared memory (per namespace)
>   . semaphore (per namespace)
>   . message queues (per namespace)

This is the part of the patch series which really matters, and I just don't
understand it :(

Why do we want to autotune these things?  What problem is this patch series
solving?  Please describe this part of the work much, much more completely,
so we can understand the need to add such a large amount of code to the
kernel.

It seems strange that the whole feature is Kconfigurable.  Please also
explain the thinking behind that.

I suspect the patches would be much simpler if you simply required that all
these new tunables be of type `long'.  About seven eighths of the code
would go away.  As would most of those eye-popping macros.

[RFC][PATCH 6/6] automatic tuning applied to some kernel components

2007-01-15 Thread Nadia . Derbey

[PATCH 06/06]


The following kernel components register a tunable structure and call the
auto-tuning routine:
  . file system
  . shared memory (per namespace)
  . semaphore (per namespace)
  . message queues (per namespace)


Signed-off-by: Nadia Derbey <[EMAIL PROTECTED]>


---
 fs/file_table.c |   81 
 include/linux/akt.h |1 
 include/linux/ipc.h |6 +++
 init/main.c |1 
 ipc/msg.c   |   19 
 ipc/sem.c   |   41 ++
 ipc/shm.c   |   74 ---
 7 files changed, 218 insertions(+), 5 deletions(-)

Index: linux-2.6.20-rc4/fs/file_table.c
===
--- linux-2.6.20-rc4.orig/fs/file_table.c   2007-01-15 13:08:14.0 +0100
+++ linux-2.6.20-rc4/fs/file_table.c   2007-01-15 15:44:39.0 +0100
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
@@ -34,6 +36,71 @@ __cacheline_aligned_in_smp DEFINE_SPINLO
 
 static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
+#ifdef CONFIG_AKT
+
+static int get_nr_files(void);
+
+/** automatic tuning **/
+#define FILPTHRESH 80  /* threshold = 80% */
+
+/*
+ * FUNCTION:This is the routine called to accomplish auto tuning for the
+ *  max_files tunable.
+ *
+ *  Upwards adjustment:
+ *  Adjustment is needed if nr_files has reached
+ *  (threshold / 100 * max_files)
+ *  In that case, max_files is set to
+ *  (tunable + max_files * (100 - threshold) / 100)
+ *
+ *  Downwards adjustment:
+ *   Adjustment is needed if nr_files has fallen under
+ *   (threshold / 100 * max_files previous value)
+ *   In that case max_files is set back to its previous value,
+ *   i.e. to (max_files * 100 / (200 - threshold))
+ *
+ * PARAMETERS:  cmd: controls the adjustment direction (up / down)
+ *  params: pointer to the registered tunable structure
+ *
+ * EXECUTION ENVIRONMENT: This routine should be called with the
+ *params->tunable_lck lock held
+ *
+ * RETURN VALUE: 1 if tunable has been adjusted
+ *   0 else
+ */
+static inline int maxfiles_auto_tuning(int cmd, struct auto_tune *params)
+{
+   int thr = params->threshold;
+   int min = params->min.value.val_int;
+   int max = params->max.value.val_int;
+   int tun = files_stat.max_files;
+
+   if (cmd == AKT_UP) {
+   if (get_nr_files() >= tun * thr / 100 && tun < max) {
+   int new = tun * (200 - thr) / 100;
+
+   files_stat.max_files = min(max, new);
+   return 1;
+   } else
+   return 0;
+   }
+
+   if (get_nr_files() < tun * thr / (200 - thr) && tun > min) {
+   int new = tun * 100 / (200 - thr);
+
+   files_stat.max_files = max(min, new);
+   return 1;
+   } else
+   return 0;
+}
+
+#endif /* CONFIG_AKT */
+
+/* The maximum value will be known later on */
+DEFINE_TUNABLE(maxfiles_akt, FILPTHRESH, 0, 0, &files_stat.max_files,
+   &nr_files, int);
+
+
 static inline void file_free_rcu(struct rcu_head *head)
 {
struct file *f =  container_of(head, struct file, f_u.fu_rcuhead);
@@ -44,6 +111,8 @@ static inline void file_free(struct file
 {
percpu_counter_dec(&nr_files);
call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
+
+   activate_auto_tuning(AKT_DOWN, &maxfiles_akt);
 }
 
 /*
@@ -91,6 +160,8 @@ struct file *get_empty_filp(void)
static int old_max;
struct file * f;
 
+   activate_auto_tuning(AKT_UP, &maxfiles_akt);
+
/*
 * Privileged users can go above max_files
 */
@@ -299,6 +370,16 @@ void __init files_init(unsigned long mem
files_stat.max_files = n; 
if (files_stat.max_files < NR_FILE)
files_stat.max_files = NR_FILE;
+
+   set_tunable_min_max(maxfiles_akt, n, n * 2, int);
+   set_autotuning_routine(&maxfiles_akt, maxfiles_auto_tuning);
+
files_defer_init();
percpu_counter_init(&nr_files, 0);
 } 
+
+void __init files_late_init(void)
+{
+   if (register_tunable(&maxfiles_akt))
+   printk(KERN_WARNING "Failed registering tunable file-max\n");
+}
Index: linux-2.6.20-rc4/include/linux/akt.h
===
--- linux-2.6.20-rc4.orig/include/linux/akt.h   2007-01-15 15:31:44.0 +0100
+++ linux-2.6.20-rc4/include/linux/akt.h   2007-01-15 15:45:29.0 +0100
@@ -295,5 +295,6 @@ static inline void init_auto_tuning(void
 #endif /* CONFIG_AKT */
 
 extern void fork_late_init(void);
+extern void files_late_init(void);
