Re: mod_deflate DoS using HEAD

2010-06-26 Thread Stefan Fritsch
On Tuesday 22 June 2010, Plüm, Rüdiger, VF-Group wrote:
 I am currently +0 on wether to use the patch above or my original
 proposal. Both have its pros and cons (Saving more CPU vs. be more
 picky about caching and implement an RFC SHOULD).

I have now commited your original patch because it is less likely to 
break something. If the CPU usage for compressing the first buffer 
turns out to be a problem, we can still change it later.

Cheers,
Stefan


Some memory usage optimizations

2010-06-26 Thread Stefan Fritsch
Hi,

there are some places where httpd uses more memory than necessary, 
which can increase cache misses and reduce performance on current 
hardware. Things I would like to change:

1) reorder structs to have fewer holes on 64bit arches

httpd.h has this nice comment since the days of 1.3:

/* Things placed at the end of the record to avoid breaking binary
 * compatibility.  It would be nice to remember to reorder the entire
 * record to improve 64bit alignment the next time we need to break
 * binary compatibility for some other reason.
 */

Pro: Examples for space savings on 64bit:

request_rec:   704 - 672 bytes
server_rec:208 - 192 bytes
proxy_worker:  264 - 208 bytes

Con: In some cases reordering the struct members makes the code harder 
to read because related members may no longer be grouped together.

Is it worth changing? Would somebody be -1?

(BTW, pahole is useful for finding such structs).


2) Use char instead of int for the module_levels vector in struct 
ap_logconfig. The length of the vector is ~100.

Pro: This may save ~300 bytes per server conf which are read often and 
therefore occupy CPU-cache memory

Con: On some architectures, accessing chars is slower than accessing 
ints.

Does anyone have an idea what is better, here? Int or char?

The same argument could be made for boolean flags in various other 
structs. But I don't think those are worth the effort.


3) In server/config.c, many config vectors are created (allocated and 
zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this 
is only necessary before dynamic modules are loaded. Afterwards, 
total_modules is set to the total number of modules. Therefore 
allocating a vec of length total_modules should be enough. This would 
save zeroing 128 pointers per request and connection.

It seems the attached patch works and I could not find any problems 
when adding/removing modules during graceful restart. Objections?

Cheers,
Stefan
diff --git a/server/config.c b/server/config.c
index 0f1660a..dfaac61 100644
--- a/server/config.c
+++ b/server/config.c
@@ -210,6 +210,7 @@ typedef void *(*merger_func)(apr_pool_t *, void *, void *);
  * overridden).
  */
 
+/* for use before dynamic modules are loaded */
 static ap_conf_vector_t *create_empty_config(apr_pool_t *p)
 {
 void *conf_vector = apr_pcalloc(p, sizeof(void *) *
@@ -217,6 +218,13 @@ static ap_conf_vector_t *create_empty_config(apr_pool_t *p)
 return conf_vector;
 }
 
+/* for use after dynamic modules are loaded */
+static ap_conf_vector_t *create_short_empty_config(apr_pool_t *p)
+{
+void *conf_vector = apr_pcalloc(p, sizeof(void *) * total_modules);
+return conf_vector;
+}
+
 static ap_conf_vector_t *create_default_per_dir_config(apr_pool_t *p)
 {
 void **conf_vector = apr_pcalloc(p, sizeof(void *) *
@@ -299,17 +307,17 @@ static void merge_server_configs(apr_pool_t *p, ap_conf_vector_t *base,
 
 AP_CORE_DECLARE(ap_conf_vector_t *) ap_create_request_config(apr_pool_t *p)
 {
-return create_empty_config(p);
+return create_short_empty_config(p);
 }
 
 AP_CORE_DECLARE(ap_conf_vector_t *) ap_create_conn_config(apr_pool_t *p)
 {
-return create_empty_config(p);
+return create_short_empty_config(p);
 }
 
 AP_CORE_DECLARE(ap_conf_vector_t *) ap_create_per_dir_config(apr_pool_t *p)
 {
-return create_empty_config(p);
+return create_short_empty_config(p);
 }
 
 /* Invoke the filter_init_func for all filters with FILTERS where f-r
@@ -1364,7 +1372,7 @@ AP_DECLARE(void) ap_reset_module_loglevels(struct ap_logconf *l, int val)
 {
 if (l-module_levels) {
 int i;
-for (i = 0; i  total_modules + DYNAMIC_MODULE_LIMIT; i++)
+for (i = 0; i  total_modules; i++)
 l-module_levels[i] = val;
 }
 }
@@ -1373,8 +1381,7 @@ AP_DECLARE(void) ap_set_module_loglevel(apr_pool_t *pool, struct ap_logconf *l,
 int index, int level)
 {
 if (!l-module_levels) {
-l-module_levels = apr_palloc(pool,
- sizeof(int) * (total_modules + DYNAMIC_MODULE_LIMIT));
+l-module_levels = apr_palloc(pool, sizeof(int) * total_modules);
 if (l-level == APLOG_UNSET) {
 ap_reset_module_loglevels(l, APLOG_UNSET);
 }
@@ -2002,7 +2009,7 @@ AP_CORE_DECLARE(const char *) ap_init_virtual_host(apr_pool_t *p,
 s-names = apr_array_make(p, 4, sizeof(char **));
 s-wild_names = apr_array_make(p, 4, sizeof(char **));
 
-s-module_config = create_empty_config(p);
+s-module_config = create_short_empty_config(p);
 s-lookup_defaults = ap_create_per_dir_config(p);
 
 s-limit_req_line = main_server-limit_req_line;
@@ -2022,8 +2029,7 @@ AP_DECLARE(struct ap_logconf *) ap_new_log_config(apr_pool_t *p,
 l-level = old-level;
 if (old-module_levels) {
 l-module_levels =
-apr_pmemdup(p, old-module_levels,
-sizeof(int) * (total_modules + 

Re: Some memory usage optimizations

2010-06-26 Thread William A. Rowe Jr.
On 6/26/2010 1:49 PM, Stefan Fritsch wrote:
 Hi,
 
 there are some places where httpd uses more memory than necessary, 
 which can increase cache misses and reduce performance on current 
 hardware. Things I would like to change:
 
 1) reorder structs to have fewer holes on 64bit arches
 
 Pro: Examples for space savings on 64bit:
 
 Con: In some cases reordering the struct members makes the code harder 
 to read because related members may no longer be grouped together.

Feel free to reorder for more-optimal storage, but please don't completely
disassociate related fields!  It makes debugging a pain in the neck.  For
that matter, lots of those 'add these to the end of the struct' fields should
be reordered anyways now that we are going towards 2.4.

This still needs to be human-readable.

 2) Use char instead of int for the module_levels vector in struct 
 ap_logconfig. The length of the vector is ~100.
 
 Pro: This may save ~300 bytes per server conf which are read often and 
 therefore occupy CPU-cache memory
 
 Con: On some architectures, accessing chars is slower than accessing 
 ints.
 
 Does anyone have an idea what is better, here? Int or char?
 
 The same argument could be made for boolean flags in various other 
 structs. But I don't think those are worth the effort.

Anytime it's part of a per-dir, I'd suggest char is probably better, or
even bitfields.  The shifting pain is equal to slicing a char.

But we are almost to 128 modules.  So be careful here, let's presume that
folks continue to propagate specific bits of functionality into a module.

 3) In server/config.c, many config vectors are created (allocated and 
 zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this 
 is only necessary before dynamic modules are loaded. Afterwards, 
 total_modules is set to the total number of modules. Therefore 
 allocating a vec of length total_modules should be enough. This would 
 save zeroing 128 pointers per request and connection.
 
 It seems the attached patch works and I could not find any problems 
 when adding/removing modules during graceful restart. Objections?

What of a separate alloc_modules variable, that would be init to the old
(total_modules + DYNAMIC_MODULE_LIMIT), and could be truncated when the
pre-config was complete to simply total_modules.  This might even save
additional processing/merging/copying on the stale/unused NULL pointers.



Re: Some memory usage optimizations

2010-06-26 Thread Stefan Fritsch
On Saturday 26 June 2010, William A. Rowe Jr. wrote:
  2) Use char instead of int for the module_levels vector in
  struct  ap_logconfig. The length of the vector is ~100.
 
  Pro: This may save ~300 bytes per server conf which are read
  often and  therefore occupy CPU-cache memory
 
  Con: On some architectures, accessing chars is slower than
  accessing  ints.
 
  Does anyone have an idea what is better, here? Int or char?
 
  The same argument could be made for boolean flags in various
  other structs. But I don't think those are worth the effort.
 
 Anytime it's part of a per-dir, I'd suggest char is probably
 better, or even bitfields.  The shifting pain is equal to slicing
 a char.

Since bit-field ordering is compiler-dependant, I think we should not 
use bit-fields in public headers. In private data, it may be ok. But 
accessing bit fields is even more expensive than accessing chars.

 But we are almost to 128 modules.  So be careful here, let's
 presume that folks continue to propagate specific bits of
 functionality into a module.

To make it clear: I only want to change the elements of the 
module_levels vector which only need to take values from -2 to 15. The 
index used to address the element will of course stay an int.

But trunk has  110 modules already. Maybe we should increase 
DYNAMIC_MODULE_LIMIT from 128 to 192?

 
  3) In server/config.c, many config vectors are created (allocated
  and  zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT).
  But this is only necessary before dynamic modules are loaded.
  Afterwards, total_modules is set to the total number of modules.
  Therefore allocating a vec of length total_modules should be
  enough. This would save zeroing 128 pointers per request and
  connection.
 
  It seems the attached patch works and I could not find any
  problems  when adding/removing modules during graceful restart.
  Objections?
 
 What of a separate alloc_modules variable, that would be init to
 the old (total_modules + DYNAMIC_MODULE_LIMIT), and could be
 truncated when the pre-config was complete to simply
 total_modules.  This might even save additional
 processing/merging/copying on the stale/unused NULL pointers.

I have commited something like this.

Cheers,
Stefan


Re: Some memory usage optimizations

2010-06-26 Thread William A. Rowe Jr.
On 6/26/2010 5:27 PM, Stefan Fritsch wrote:
 But trunk has  110 modules already. Maybe we should increase 
 DYNAMIC_MODULE_LIMIT from 128 to 192?

With your proposed optimizations, we can take it to 256 with still a
significant net win..

 3) In server/config.c, many config vectors are created (allocated
 and  zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT).
 But this is only necessary before dynamic modules are loaded.
 Afterwards, total_modules is set to the total number of modules.
 Therefore allocating a vec of length total_modules should be
 enough. This would save zeroing 128 pointers per request and
 connection.

 It seems the attached patch works and I could not find any
 problems  when adding/removing modules during graceful restart.
 Objections?

 What of a separate alloc_modules variable, that would be init to
 the old (total_modules + DYNAMIC_MODULE_LIMIT), and could be
 truncated when the pre-config was complete to simply
 total_modules.  This might even save additional
 processing/merging/copying on the stale/unused NULL pointers.
 
 I have commited something like this.

Will look Monday eve with anticipation, thanks!