Re: mod_deflate DoS using HEAD
On Tuesday 22 June 2010, Plüm, Rüdiger, VF-Group wrote: I am currently +0 on wether to use the patch above or my original proposal. Both have its pros and cons (Saving more CPU vs. be more picky about caching and implement an RFC SHOULD). I have now commited your original patch because it is less likely to break something. If the CPU usage for compressing the first buffer turns out to be a problem, we can still change it later. Cheers, Stefan
Some memory usage optimizations
Hi, there are some places where httpd uses more memory than necessary, which can increase cache misses and reduce performance on current hardware. Things I would like to change: 1) reorder structs to have fewer holes on 64bit arches httpd.h has this nice comment since the days of 1.3: /* Things placed at the end of the record to avoid breaking binary * compatibility. It would be nice to remember to reorder the entire * record to improve 64bit alignment the next time we need to break * binary compatibility for some other reason. */ Pro: Examples for space savings on 64bit: request_rec: 704 - 672 bytes server_rec:208 - 192 bytes proxy_worker: 264 - 208 bytes Con: In some cases reordering the struct members makes the code harder to read because related members may no longer be grouped together. Is it worth changing? Would somebody be -1? (BTW, pahole is useful for finding such structs). 2) Use char instead of int for the module_levels vector in struct ap_logconfig. The length of the vector is ~100. Pro: This may save ~300 bytes per server conf which are read often and therefore occupy CPU-cache memory Con: On some architectures, accessing chars is slower than accessing ints. Does anyone have an idea what is better, here? Int or char? The same argument could be made for boolean flags in various other structs. But I don't think those are worth the effort. 3) In server/config.c, many config vectors are created (allocated and zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this is only necessary before dynamic modules are loaded. Afterwards, total_modules is set to the total number of modules. Therefore allocating a vec of length total_modules should be enough. This would save zeroing 128 pointers per request and connection. It seems the attached patch works and I could not find any problems when adding/removing modules during graceful restart. Objections? Cheers, Stefan diff --git a/server/config.c b/server/config.c index 0f1660a..dfaac61 100644 --- a/server/config.c +++ b/server/config.c @@ -210,6 +210,7 @@ typedef void *(*merger_func)(apr_pool_t *, void *, void *); * overridden). */ +/* for use before dynamic modules are loaded */ static ap_conf_vector_t *create_empty_config(apr_pool_t *p) { void *conf_vector = apr_pcalloc(p, sizeof(void *) * @@ -217,6 +218,13 @@ static ap_conf_vector_t *create_empty_config(apr_pool_t *p) return conf_vector; } +/* for use after dynamic modules are loaded */ +static ap_conf_vector_t *create_short_empty_config(apr_pool_t *p) +{ +void *conf_vector = apr_pcalloc(p, sizeof(void *) * total_modules); +return conf_vector; +} + static ap_conf_vector_t *create_default_per_dir_config(apr_pool_t *p) { void **conf_vector = apr_pcalloc(p, sizeof(void *) * @@ -299,17 +307,17 @@ static void merge_server_configs(apr_pool_t *p, ap_conf_vector_t *base, AP_CORE_DECLARE(ap_conf_vector_t *) ap_create_request_config(apr_pool_t *p) { -return create_empty_config(p); +return create_short_empty_config(p); } AP_CORE_DECLARE(ap_conf_vector_t *) ap_create_conn_config(apr_pool_t *p) { -return create_empty_config(p); +return create_short_empty_config(p); } AP_CORE_DECLARE(ap_conf_vector_t *) ap_create_per_dir_config(apr_pool_t *p) { -return create_empty_config(p); +return create_short_empty_config(p); } /* Invoke the filter_init_func for all filters with FILTERS where f-r @@ -1364,7 +1372,7 @@ AP_DECLARE(void) ap_reset_module_loglevels(struct ap_logconf *l, int val) { if (l-module_levels) { int i; -for (i = 0; i total_modules + DYNAMIC_MODULE_LIMIT; i++) +for (i = 0; i total_modules; i++) l-module_levels[i] = val; } } @@ -1373,8 +1381,7 @@ AP_DECLARE(void) ap_set_module_loglevel(apr_pool_t *pool, struct ap_logconf *l, int index, int level) { if (!l-module_levels) { -l-module_levels = apr_palloc(pool, - sizeof(int) * (total_modules + DYNAMIC_MODULE_LIMIT)); +l-module_levels = apr_palloc(pool, sizeof(int) * total_modules); if (l-level == APLOG_UNSET) { ap_reset_module_loglevels(l, APLOG_UNSET); } @@ -2002,7 +2009,7 @@ AP_CORE_DECLARE(const char *) ap_init_virtual_host(apr_pool_t *p, s-names = apr_array_make(p, 4, sizeof(char **)); s-wild_names = apr_array_make(p, 4, sizeof(char **)); -s-module_config = create_empty_config(p); +s-module_config = create_short_empty_config(p); s-lookup_defaults = ap_create_per_dir_config(p); s-limit_req_line = main_server-limit_req_line; @@ -2022,8 +2029,7 @@ AP_DECLARE(struct ap_logconf *) ap_new_log_config(apr_pool_t *p, l-level = old-level; if (old-module_levels) { l-module_levels = -apr_pmemdup(p, old-module_levels, -sizeof(int) * (total_modules +
Re: Some memory usage optimizations
On 6/26/2010 1:49 PM, Stefan Fritsch wrote: Hi, there are some places where httpd uses more memory than necessary, which can increase cache misses and reduce performance on current hardware. Things I would like to change: 1) reorder structs to have fewer holes on 64bit arches Pro: Examples for space savings on 64bit: Con: In some cases reordering the struct members makes the code harder to read because related members may no longer be grouped together. Feel free to reorder for more-optimal storage, but please don't completely disassociate related fields! It makes debugging a pain in the neck. For that matter, lots of those 'add these to the end of the struct' fields should be reordered anyways now that we are going towards 2.4. This still needs to be human-readable. 2) Use char instead of int for the module_levels vector in struct ap_logconfig. The length of the vector is ~100. Pro: This may save ~300 bytes per server conf which are read often and therefore occupy CPU-cache memory Con: On some architectures, accessing chars is slower than accessing ints. Does anyone have an idea what is better, here? Int or char? The same argument could be made for boolean flags in various other structs. But I don't think those are worth the effort. Anytime it's part of a per-dir, I'd suggest char is probably better, or even bitfields. The shifting pain is equal to slicing a char. But we are almost to 128 modules. So be careful here, let's presume that folks continue to propagate specific bits of functionality into a module. 3) In server/config.c, many config vectors are created (allocated and zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this is only necessary before dynamic modules are loaded. Afterwards, total_modules is set to the total number of modules. Therefore allocating a vec of length total_modules should be enough. This would save zeroing 128 pointers per request and connection. It seems the attached patch works and I could not find any problems when adding/removing modules during graceful restart. Objections? What of a separate alloc_modules variable, that would be init to the old (total_modules + DYNAMIC_MODULE_LIMIT), and could be truncated when the pre-config was complete to simply total_modules. This might even save additional processing/merging/copying on the stale/unused NULL pointers.
Re: Some memory usage optimizations
On Saturday 26 June 2010, William A. Rowe Jr. wrote: 2) Use char instead of int for the module_levels vector in struct ap_logconfig. The length of the vector is ~100. Pro: This may save ~300 bytes per server conf which are read often and therefore occupy CPU-cache memory Con: On some architectures, accessing chars is slower than accessing ints. Does anyone have an idea what is better, here? Int or char? The same argument could be made for boolean flags in various other structs. But I don't think those are worth the effort. Anytime it's part of a per-dir, I'd suggest char is probably better, or even bitfields. The shifting pain is equal to slicing a char. Since bit-field ordering is compiler-dependant, I think we should not use bit-fields in public headers. In private data, it may be ok. But accessing bit fields is even more expensive than accessing chars. But we are almost to 128 modules. So be careful here, let's presume that folks continue to propagate specific bits of functionality into a module. To make it clear: I only want to change the elements of the module_levels vector which only need to take values from -2 to 15. The index used to address the element will of course stay an int. But trunk has 110 modules already. Maybe we should increase DYNAMIC_MODULE_LIMIT from 128 to 192? 3) In server/config.c, many config vectors are created (allocated and zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this is only necessary before dynamic modules are loaded. Afterwards, total_modules is set to the total number of modules. Therefore allocating a vec of length total_modules should be enough. This would save zeroing 128 pointers per request and connection. It seems the attached patch works and I could not find any problems when adding/removing modules during graceful restart. Objections? What of a separate alloc_modules variable, that would be init to the old (total_modules + DYNAMIC_MODULE_LIMIT), and could be truncated when the pre-config was complete to simply total_modules. This might even save additional processing/merging/copying on the stale/unused NULL pointers. I have commited something like this. Cheers, Stefan
Re: Some memory usage optimizations
On 6/26/2010 5:27 PM, Stefan Fritsch wrote: But trunk has 110 modules already. Maybe we should increase DYNAMIC_MODULE_LIMIT from 128 to 192? With your proposed optimizations, we can take it to 256 with still a significant net win.. 3) In server/config.c, many config vectors are created (allocated and zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this is only necessary before dynamic modules are loaded. Afterwards, total_modules is set to the total number of modules. Therefore allocating a vec of length total_modules should be enough. This would save zeroing 128 pointers per request and connection. It seems the attached patch works and I could not find any problems when adding/removing modules during graceful restart. Objections? What of a separate alloc_modules variable, that would be init to the old (total_modules + DYNAMIC_MODULE_LIMIT), and could be truncated when the pre-config was complete to simply total_modules. This might even save additional processing/merging/copying on the stale/unused NULL pointers. I have commited something like this. Will look Monday eve with anticipation, thanks!