Re: Rewritemap by internal function
On 10/10/06 7:28 AM, "Nick Kew" <[EMAIL PROTECTED]> wrote:

> Doesn't that mean AN Other module can register its own "int:foo"
> functions, and the documentation is wrong?

That is my understanding: the framework is in place, just not documented.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: httpd 2.2 cache - disable and enable
On 10/11/06 11:11 AM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote:

>> somebody have more preferences to one method ?
>
> I'd be -1 to anything that needed to perform a regular expression test
> to check the cache, it'd be a huge CPU hit on something that we're
> supposed to be doing as quickly as possible.

It's not a very large performance hit to do a regex on almost every
request. We do it on some of our sites, as our hacked version of
mod_cache allows regex matches. (In fact, all of our matches use the
provider mechanism - all providers are "resolved" at config time, not
per request.)

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
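For what it's worth, the cost argument hinges on when the pattern is compiled. A minimal stand-alone sketch of the "resolve at config time" idea, using POSIX regex (the names `cache_rule`, `rule_compile`, and `rule_matches` are illustrative, not the actual mod_cache code): the expensive `regcomp` runs once, and the per-request path is only a `regexec`.

```c
#include <regex.h>
#include <stddef.h>

typedef struct {
    regex_t preg;   /* compiled once, at "config time" */
    int     compiled;
} cache_rule;

/* Compile the pattern up front; returns 0 on success. */
int rule_compile(cache_rule *rule, const char *pattern)
{
    rule->compiled =
        (regcomp(&rule->preg, pattern, REG_EXTENDED | REG_NOSUB) == 0);
    return rule->compiled ? 0 : -1;
}

/* Per-request check: just an exec against the precompiled pattern. */
int rule_matches(cache_rule *rule, const char *uri)
{
    return rule->compiled && regexec(&rule->preg, uri, 0, NULL, 0) == 0;
}
```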
Re: mod_rewrite XML RewriteMap
On 12/29/06 3:58 PM, "Arnold Daniels" <[EMAIL PROTECTED]> wrote:

> I use mod_rewrite quite often. The thing I really miss is the option for
> an XML based RewriteMap, where an XPath statement can be used as key.
> Luckily Apache is open-source, so I've edited mod_rewrite to include
> this feature.
>
> It works as follows:
> RewriteEngine On
> RewriteMap xmlmenu xml:/var/www/develop/cfg/menu.xml
> RewriteRule ^/menu/([^/]+)/?$ ${xmlmenu://[EMAIL PROTECTED]'$1']/@page} [QSA]

Can't this be done via a rewrite function? (Not very well documented,
but it works.) -1 on doing this in mod_rewrite. It needs to be in a
mod_xml_rewrite (or whatever name) and use the rewrite function
interface.

From mod_rewrite.h (2.2.3):

    /* rewrite map function prototype */
    typedef char *(rewrite_mapfunc_t)(request_rec *r, char *key);

    /* optional function declaration */
    APR_DECLARE_OPTIONAL_FN(void, ap_register_rewrite_mapfunc,
                            (char *name, rewrite_mapfunc_t *func));

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: mod_cache+mod_rewrite behaviour
On 1/23/07 10:44 AM, "Niklas Edmundsson" <[EMAIL PROTECTED]> wrote:

> Ah, you can place CacheEnable-directives in the vhost context too.
> Then it should be sufficient, unless you want to say "ignore
> querystring for all .gif:s" or something like that. Perhaps use a
> regex instead?

In our home-grown cache module, the "rules" are actually provider
based. We have providers that provide matches based on exact match,
string match, regex, prefix, environment variable, and other assorted
things. Something like:

    CacheEnable disk regex=\.gif$ ignore_query
    CacheEnable disk prefix=/somethingelse foo=bar
    CacheEnable mem exact=/strange/stuff.asp
    CacheDisable regex=\.php$

The default "match provider" could be prefix, to allow users to still do:

    CacheEnable disk /cache_me

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
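A rough sketch of how such a provider table might look in plain C - all names here (`match_fn`, `resolve_provider`, the provider names) are illustrative, not the home-grown module's actual code. The key point is that the `exact=`/`prefix=` keyword is looked up once when the directive is parsed, so the per-request path is a direct function call:

```c
#include <string.h>

typedef int (*match_fn)(const char *arg, const char *uri);

static int match_exact(const char *arg, const char *uri)
{
    return strcmp(arg, uri) == 0;
}

static int match_prefix(const char *arg, const char *uri)
{
    return strncmp(arg, uri, strlen(arg)) == 0;
}

/* Registry of match providers, consulted at "config time" only. */
static const struct { const char *name; match_fn fn; } providers[] = {
    { "exact",  match_exact  },
    { "prefix", match_prefix },
    { NULL, NULL }
};

match_fn resolve_provider(const char *name)
{
    for (int i = 0; providers[i].name; i++)
        if (strcmp(providers[i].name, name) == 0)
            return providers[i].fn;
    return NULL;   /* unknown provider: config error */
}
```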
Re: Patch for implementing ap_document_root as a hook
On 4/23/07 11:33 AM, "Paul Querna" <[EMAIL PROTECTED]> wrote: > +1, I've been down this road before too. +1 on the concept. Still looking at patch. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
RFC: replace r->subprocess_env was Re: Patch for implementing ap_document_root as a hook
This idea has been rattling around in my head off and on for a while.
What if we replaced all the r->subprocess_env usage with something a
little more interesting...

General "environment" API:

    /* "Directly" set an env variable. Will always show up in env list. */
    apr_status_t ap_set_env(request_rec *r, const char *key, const char *val)

    /* Get the value of an env var. */
    const char *ap_get_env(request_rec *r, const char *key)

And the interesting ones:

    /* Set a handler for a given key for env variables. Can choose
     * whether or not the key shows up in the list. */
    apr_status_t ap_set_env_handler(const char *key, ap_env_func *func,
                                    int show_in_list)

    /* Return a list of available (exposed) env variables suitable
     * for iteration. */
    apr_array_header_t *ap_env_list(request_rec *r)

ap_env_func would be:

    const char *my_env_handler(request_rec *r, const char *key)

This would allow most env variables to be overridden easily. Also, many
env variables could be set "lazily," ie, only calculated when someone
actually needs them. A good example of this is when you occasionally use
UNIQUE_ID and only want the calculation to be done when you actually
need it, not on every single request.

The handler could cache its results if it wanted. We may want a flag to
say whether caching is okay or not. Or do the caching in env itself...

The handler could actually be a hook. For example, the handler for
"DOCUMENT_ROOT" could actually be a wrapper around a hook.

Thoughts?

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
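A minimal stand-alone sketch of the lazy-handler idea (everything here is hypothetical - the table, `get_env`, and the stand-in UNIQUE_ID handler are illustrations of the proposal, not httpd code): the handler only runs on first lookup, and the result is cached per entry.

```c
#include <string.h>
#include <stddef.h>

typedef const char *(*env_func)(const char *key);

static int unique_id_calls = 0;

/* Stands in for an expensive computation like UNIQUE_ID: we only want
 * this to run when somebody actually asks for the value. */
static const char *unique_id_handler(const char *key)
{
    (void)key;
    unique_id_calls++;
    return "uid-0001";
}

struct env_entry { const char *key; env_func fn; const char *cached; };

static struct env_entry env_table[] = {
    { "UNIQUE_ID", unique_id_handler, NULL },
    { NULL, NULL, NULL }
};

const char *get_env(const char *key)
{
    for (struct env_entry *e = env_table; e->key; e++) {
        if (strcmp(e->key, key) == 0) {
            if (!e->cached)          /* lazy: compute on first use only */
                e->cached = e->fn(key);
            return e->cached;
        }
    }
    return NULL;
}
```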
[PATCH] mod_wombat: add table_get and table_set
Probably not the best way to do this, but it adds the ability to get/set
on r->headers_in and r->headers_out.

Example usage:

    function handle(r)
        r:table_set(r.headers_out, "Lua", "Cool");
        val = r:table_get(r.headers_in, "User-Agent");
        r:puts("User-Agent: " .. val .. "\n");
    end

FWIW, I had never done any lua until this morning, so I'm sure it can be
done better. I'm not a huge fan of just wrapping apr_table_get/set, but
I wasn't sure, if I shoved them into a lua table, that I could keep them
(the apr table and the lua table) in sync.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

request.diff
Description: Binary data
Re: [PATCH] mod_wombat: add table_get and table_set
On 4/27/07 2:34 PM, "Brian McCallister" <[EMAIL PROTECTED]> wrote:

> Thoughts?

Sounds good to me. Like I said, I just started playing with it this
morning :) If you can point me more in the right direction, I can give
it a try. I was just scratching an itch.

Also, I want to add quick_handler to the mix. If I'm reading it
correctly, wombat_handler only uses the stat caching, so I was using the
harness stuff basically like this:

    int wombat_quick_harness(request_rec *r, int lookup)
    {
        if (lookup) {
            return DECLINED;
        }
        return wombat_request_rec_hook_harness(r, "quick");
    }

    static const char *register_quick_hook(cmd_parms *cmd, void *_cfg,
                                           const char *file,
                                           const char *function)
    {
        return register_named_file_function_hook("quick", cmd, _cfg,
                                                 file, function);
    }

I want to be able to cache "forever". Thoughts about that? It doesn't
seem ideal the way I was doing it...

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: [PATCH] mod_wombat: add table_get and table_set
On 4/27/07 2:34 PM, "Brian McCallister" <[EMAIL PROTECTED]> wrote:

> We may want to consider not putting table_set and table_get on the
> request, though. It might be better to have a general purpose
> userdata type (metatable) for apr_table_t and put the functions
> there. This would allow for something like:
>
> function handle(r)
>     r.headers_out['Lua'] = 'Cool'
>     val = r.headers_in['User-Agent']
> end

Here's the patch that does just that. Ugly, I'm sure. I know lua now,
and I know C. Still having issues stitching them together...

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

request-table.diff
Description: Binary data
Re: [PATCH] mod_wombat: add table_get and table_set
Probably more changes than need to be in one patch:

- use hooks for:
  - wombat_open - called by create_vm
  - wombat_request - called instead of apw_request_push
- added apr_lua.c and .h - only handles tables for now. Can be extended
  to do more in the future.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

wombat_hooks.diff
Description: Binary data
Re: [PATCH] mod_wombat: add table_get and table_set
On 4/30/07 5:53 PM, "Brian McCallister" <[EMAIL PROTECTED]> wrote:

> I would like to maintain a function which is analogous to
> lua_pushstring() and lua_pushinteger() for pushing the request_rec
> into a function call or whatnot from the C side.
>
> Will this work with the hook? (I am a hook newb).

Sure. The way I have it now, it calls the push function first in the
hook. We could move that outside the hook and still have the hook
available.

> Even though these are static, we might want to be careful in naming
> as these are reaching into lua's namespace (lua_* and luaL_*).

Sure, we can rename these to apr_lua_table_methods and prefix the
methods with "apr_".

> Why pass the pool in (other than matching the hook form, but this
> isn't invoked via )

I added the pool because I'm sure at some point I will need it and
didn't want to rewrite anything that already called it. I know, not a
good reason...

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
[PATCH] mod_wombat: add connection and server to request
Adds the ability to get to r.connection and r.server. Also renamed back
to apw_request_push per discussion with Brian M.

This works now:

    function quick_handler(r)
        r.headers_out["Lua"] = "Rulez";
        h = r.headers_out
        val = r.headers_in["User-Agent"];
        h["Test"] = "HELP";
        h["Browser"] = val;
        r:puts("User-Agent: " .. val .. "\n");

        c = r.connection
        r:puts("hello " .. c.remote_ip .. "\n");

        s = r.server
        r:puts("Server name: " .. s.server_hostname .. "\n");

        if string.find(val, "Firefox") then
            r.headers_out["Location"] = "http://www.cnn.com";
            return apache2.HTTP_MOVED_TEMPORARILY
        end

        return apache2.OK
    end

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

conn-and-server.diff
Description: Binary data
Re: [PATCH] mod_wombat: add connection and server to request
On 5/3/07 12:10 PM, "Brian McCallister" <[EMAIL PROTECTED]> wrote:

> We should probably figure out how to avoid pushing the various values
> into the Apache2.Request metatable -- we are going to need a general
> purpose solution sooner rather than later, I think.

I think I mostly figured this out:

    static int server_index(lua_State *L)
    {
        server_rec *s = lua_unboxpointer(L, 1);
        const char *key = luaL_checkstring(L, 2);
        ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, "server_index: %s", key);
        if (0 == apr_strnatcmp("server_hostname", key)) {
            lua_pushstring(L, s->server_hostname);
            return 1;
        }
        return 0;
    }

    static const struct luaL_Reg server_methods[] = {
        {"__index", server_index},
        {NULL, NULL}
    };

    void apw_load_request_lmodule(lua_State *L)
    {
        luaL_newmetatable(L, "Apache2.Request"); // [metatable]
        lua_pushvalue(L, -1);
        lua_setfield(L, -2, "__index");
        luaL_register(L, NULL, request_methods); // [metatable]
        lua_pop(L, 2);

        luaL_newmetatable(L, "Apache2.Connection"); // [metatable]
        lua_pushvalue(L, -1);
        lua_setfield(L, -2, "__index");
        luaL_register(L, NULL, connection_methods); // [metatable]
        lua_pop(L, 2);

        luaL_newmetatable(L, "Apache2.Server");
        lua_pushstring(L, "__index");
        lua_pushvalue(L, -2); /* pushes the metatable */
        lua_settable(L, -3);  /* metatable.__index = metatable */
        luaL_openlib(L, NULL, server_methods, 0);
        lua_pop(L, 2);
    }

And this works in lua:

    s = r.server
    r:puts("Server name: " .. s.server_hostname .. "\n");

We could probably wrap the server_index type function into something so
we don't have to write huge if/else statements every time. Maybe just
copy the way Lua does it with luaL_Reg. Could wrap the whole metatable
creation and population thing, I suppose. Maybe do that in apr_lua?

    static const struct apr_lua_reg server_methods[] = {
        {"blah", get_blah},
        ...
        {NULL, NULL}
    }

    apr_lua_register("Apache2.Server", server_methods);

That would take care of all the details. We would need to be able to get
and set - maybe pass that in as to whether it is __index or
__newindex???

(Just thinking out loud on the apr_lua stuff...)

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: [PATCH] mod_wombat: add table_get and table_set
On 5/4/07 7:42 PM, "Rici Lake" <[EMAIL PROTECTED]> wrote:

> In Lua, setting to nil is equivalent to deletion. So I think this should be:
>
>     if (lua_isnoneornil(L, 3))
>         apr_table_unset(t, key);
>     else {
>         const char *val = luaL_checkstring(L, 3);
>         apr_table_set(t, key, val);
>     }

+1

>     if (val == NULL)
>         lua_pushnil(L);
>     else
>         lua_pushstring(L, val);

+1

> Agreed. Also, it's misleading -- they are APR tables, not Lua tables.

Brian M. changed this before committing, I think.

> This is poor Lua style. You shouldn't assume (or require) the stack to
> be empty at the start of the function. Use top-relative (negative)
> indices instead -- they are no slower.

Until a day before I submitted the patch, I had never touched Lua, so
I'm sure I have bad style and form :)

> Also use lua_{get,set}field for clarity, and luaL_register (luaL_openlib
> is deprecated).

I had been using PiL 5.0 for docs, and lua-users.org was down most of
last week. I got a hard copy of the 5.1 PiL this weekend, so maybe my
style will improve ;)

Thanks for all the pointers. I think mod_wombat is almost in a state
where I can start actually working on the original problem I needed to
solve.

Question: what would be the best way to load "other modules" into
mod_wombat (not apache modules)? For example, I want/need to load the
lua pcre stuff, but I don't want to write a new apache module just to
hook that into mod_wombat. I am new to Lua, so I am sure there is a
better way. You can ping me off-list if this seems OT for httpd-dev.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
PATCH: mod_wombat: add default scope and cache to config
This patch adds LuaDefaultCacheStyle and LuaDefaultScope, to set
defaults for cache style and scope. The code needs to be changed around
so that we check dir_config first, and then revert to the default.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

default-config.diff
Description: Binary data
PATCH: mod_wombat fix finfo "leak" in vmprep.c load_file
We were always allocating finfo from a pool. Now we just do it on the
stack.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

finfo-leak.diff
Description: Binary data
[PATCH] mod_wombat separate out connection and server
Had a couple hours while on vacation after reading PiL. This makes connection, server, and apr_table into real lua "modules." I also separated out the code and started playing with getters and setters. I like the idea of doing the function tables in "plain" C, rather than "Lua" C. Mostly because I understand it more :) Also, it may be faster as it avoids using a bunch of Lua tables to keep track of callbacks and functions. Would be interesting to have a performance bake-off. Haven't found a good way to rework request.c into this arrangement, but I ran out of time. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies bakins-big.diff Description: Binary data
Re: [PATCH] mod_wombat separate out connection and server
On 5/17/07 10:26 PM, "Garrett Rooney" <[EMAIL PROTECTED]> wrote:

> I'm not a fan of the way the pools and hash tables are lazily
> initialized, as it isn't thread safe and one of the nice things about
> mod_wombat is its thread safety. Perhaps something that's initialized
> during server startup instead?

Yea, I was being "lazy" myself. I will try to hack together some init
stuff.

> Also the new files all need license headers.

K.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: [PATCH] mod_wombat separate out connection and server
Here's an updated one that adds the init stuff and license info. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies conn-server-mod.diff Description: Binary data
Re: [Fwd: Apache httpd vulenrabilities]
On 5/30/07 6:09 PM, "Jim Jagielski" <[EMAIL PROTECTED]> wrote: > "The only issue..." refers to the problems if we try to restructure > the scoreboard instead, which is good for 2.4/3.0 Scoreboard needs an overhaul anyway. So I wouldn't muck with it now. The local pid table sounds fine. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_memcache??
On 7/1/07 10:11 AM, "Frank" <[EMAIL PROTECTED]> wrote: > > I just wonder what has happen to this good idea? Did you start > implementing it? (Today I was thinking about implementing this, coz' I > need it) Never had the time. Project at work went a different direction. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_gzip and incorrect ETag response (Bug #39727)
On 8/27/07 12:34 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> Hasn't the non-compressed variant become an extreme edge-case
> by now? I would certainly hope so.

Unfortunately not. About 30% of our requests do not advertise gzip
support.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: mod_gzip and incorrect ETag response (Bug #39727)
On 8/27/07 1:19 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> You are the CNN guy, right?

Sure, why not...

> Of your 30 percent... is there an identifiable "User-Agent"
> that comprises a visible chunk of the requests?
>
> If so... what is it?

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

And other slight variations... The same user-agent has gzip in other
requests.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Config Bug in proxy_balancer?
This works:

    ...
    BalancerMember http://server1:80 route=server1
    BalancerMember http://server2:80 route=server2
    ProxyPass /path balancer://fill/ stickysession=Sticky

This does not:

    BalancerMember http://server1:80 route=server1
    BalancerMember http://server2:80 route=server2
    ...
    ProxyPass /path balancer://fill/ stickysession=Sticky

I want to be able to use the same balancer in multiple vhosts.
Re: Config Bug in proxy_balancer?
On 3/22/06 11:39 AM, "Sander Temme" <[EMAIL PROTECTED]> wrote:

> Looks like something doesn't get inherited. Have a peek at the merge
> functions for the structure(s) that affect this behaviour.
>
> S.

Looks like it may be in add_pass, where it calls ap_proxy_get_balancer:

    if (strncasecmp(r, "balancer:", 9) == 0) {
        proxy_balancer *balancer = ap_proxy_get_balancer(cmd->pool, conf, r);
        if (!balancer) {
            const char *err = ap_proxy_add_balancer(&balancer,
                                                    cmd->pool, conf, r);
            if (err)
                return apr_pstrcat(cmd->temp_pool, "ProxyPass ", err, NULL);
        }

ap_proxy_get_balancer looks to return NULL. So conf is somehow not
merged correctly? Should conf->balancers be global, rather than per
server?

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: AW: Config Bug in proxy_balancer?
Here's a really simple patch that just makes the balancers global rather than per server... -- Brian Akins Lead Systems Engineer CNN Internet Technologies global_balancers.patch Description: Binary data
RE: CustomLog suggestion
This is more of a users question. Just write a little program (C, perl,
python, whatever) to be used as a piped logger and have it split the
lines as they are logged. Just log the virtual host first (%V, I think)
and use that to split it. You only have to have one logger running/open.

-----Original Message-----
From: Alain Williams [mailto:[EMAIL PROTECTED]
Sent: Mon 5/15/2006 7:05 AM
To: dev@httpd.apache.org
Subject: CustomLog suggestion

I have a suggestion that may be a useful enhancement to CustomLog.

Background: the number of virtual hosts that I have is increasing, and
some of them are for other people - several have several virtual hosts.
I want them to have access to the log information that relates to their
sites and not be confused with stuff for others. So what I do is to have
every virtual host have its own set of log files; as pointed out, this
does not scale too far. The suggestion is to log to one place and split
the log file daily/... The trouble with this is that it makes debugging
individual vhosts slow (wait 'till the log file is split), or I need to
split continuously.

Suggestion: If a CustomLog directive specifies a log file/pipe that is
already open (eg for some other vhost) then it should share/reuse the
file descriptor. This would allow separate log files to scale a little
further.

I tried to do this but find that apache just opens the log file several
times. Looking at mod_log_config.c I see that buffered_log has a mutex
that controls different threads writing to the same log file (what I
thought would be the main issue). What needs to be added is:

* some reference count as to how often the file is open in
  config_log_state
* logic in ap_default_log_writer_init to check if 'name' is already used

Comments ?

--
Alain Williams
Parliament Hill Computers Ltd.
Linux Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  http://www.phcomp.co.uk/
#include <>
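The splitter's core job is just peeling the leading vhost token off each logged line to pick an output file. A minimal sketch of that parsing step in C (the function name and the "%V first" log format are assumptions per the suggestion above, not mod_log_config behaviour):

```c
#include <string.h>
#include <stddef.h>

/* Split "vhost rest-of-line" in place; returns the vhost token, points
 * *rest at the remainder, or returns NULL for a malformed line.  A real
 * piped logger would then append *rest to a per-vhost file. */
const char *split_vhost(char *line, char **rest)
{
    char *sp = strchr(line, ' ');
    if (!sp)
        return NULL;      /* no separator: not a line we can split */
    *sp = '\0';
    *rest = sp + 1;
    return line;
}
```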
Keepalives
Here's the problem: If you want to use keepalives, all of your workers
(threads/procs/whatever) can become busy just waiting on another request
on a keepalive connection. Raising MaxClients does not help. The Event
MPM does not seem to really help this situation. It seems to only make
each keepalive connection "cheaper." It can still allow all workers to
be blocking on keepalives.

Short-term solution: This is what we did. We use the worker MPM. We
wrote a simple module that keeps track of how many keepalive connections
are active. When a threshold is reached, it does not allow any more
keepalives. (Basically it sets r->connection->keepalive =
AP_CONN_CLOSE.) This works for us, but the limit is per process and only
works for threaded MPMs.

Long-term solution: Keep track of keepalives in the scoreboard (or
somewhere else). Allow admins to set a threshold for keepalives:

    MaxClients 1024
    MaxConcurrentKeepalives 768

Or something like that. Thoughts? I am willing to write the code if this
seems desirable. Should this just be another module or in the http core?

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
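The decision the proposed module makes is tiny; a stand-alone sketch of it (the enum, function name, and threshold semantics are illustrative assumptions - in the real module the count would come from a shared counter or the scoreboard, and CONN_CLOSE corresponds to setting r->connection->keepalive = AP_CONN_CLOSE):

```c
typedef enum { CONN_KEEPALIVE, CONN_CLOSE } conn_policy;

/* Given how many workers are currently parked on keepalive connections
 * and the configured ceiling, decide whether this response may keep
 * the connection open. */
conn_policy keepalive_policy(int active_keepalives, int max_keepalives)
{
    if (active_keepalives >= max_keepalives)
        return CONN_CLOSE;   /* too many workers tied up: force close */
    return CONN_KEEPALIVE;
}
```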
Re: Keepalives
On 6/20/05 3:14 PM, "Greg Ames" <[EMAIL PROTECTED]> wrote:

> ...so with this setup, I have roughly 3 connections for every worker
> thread, including the idle threads.

Cool. Maybe I just need the latest version. Or I could have just screwed
up my test... Anyway, there should be a way to limit how many idle
connections to keep around.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Reward SSL and IE
Not the most appropriate forum, but we are willing to pay a reward to
someone who can definitively help us with a mod_ssl (Apache 2.0.54) and
IE issue. It seems to only affect older versions (5.5 and early 6). We
have tried various workarounds from the net, but to no avail.

Call me at 404-545-6217 to discuss the problem, money, etc. This is
URGENT. I am not very skilled in ssl (first experience in over 5 years),
so I am calling on the experts. Call your friends if they can help...

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Reward SSL and IE
Akins, Brian wrote:

> Not the most appropriate forum, but we are willing to pay a reward to
> someone who can definitively help us with a mod_ssl (Apache 2.0.54) and
> IE issue. It seems to only affect older versions (5.5 and early 6).

For reference:

    [Mon Jun 20 20:23:23 2005] [debug] ssl_engine_io.c(1522): OpenSSL: I/O error, 11 bytes expected to read on BIO#87b01f8 [mem: 880daa8]
    [Mon Jun 20 20:23:23 2005] [debug] ssl_engine_kernel.c(1813): OpenSSL: Exit: error in SSLv2/v3 read client hello A
    [Mon Jun 20 20:23:23 2005] [info] (70014)End of file found: SSL handshake interrupted by system [Hint: Stop button pressed in browser?!]

Apache config:

    # from standard apache config
    BrowserMatch "Mozilla/2" nokeepalive
    BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
    BrowserMatch "RealPlayer 4\.0" force-response-1.0
    BrowserMatch "Java/1\.0" force-response-1.0
    BrowserMatch "JDK/1\.0" force-response-1.0
    BrowserMatch "Microsoft Data Access Internet Publishing Provider" redirect-carefully
    BrowserMatch "^WebDrive" redirect-carefully
    BrowserMatch "^WebDAVFS/1.[012]" redirect-carefully
    BrowserMatch "^gnome-vfs" redirect-carefully

    # ssl global options
    SSLPassPhraseDialog exec:/opt/apache/https-relay/config/https.password
    SSLSessionCache dbm:/logs/https-relay.ssl_session_cache
    SSLSessionCacheTimeout 600
    SSLMutex sem
    Listen 443

    ServerName x.com
    SSLEngine on
    SSLCertificateFile /opt/apache/https-relay/config/xx.crt
    SSLCertificateChainFile /opt/apache/https-relay/config/intermediate.crt
    SSLCertificateKeyFile /opt/apache/https-relay/config/x.key
    SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
    SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Reward SSL and IE
The problem was actually a certificate. The key was generated by a newer
version of openssl than we normally use (0.9.6 vs 0.9.7), and somehow
that translated to a cert from Verisign that did not work on Win98 and
IE. Strange.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: apache developers documentation!!!
On 6/21/05 5:29 PM, "Nick Kew" <[EMAIL PROTECTED]> wrote:

> (2) http://www.apachecon.com/ - come to our module developers tutorial
> and other talks.

When will there be another ApacheCon US?

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: how do i debug a segfault?
On 6/29/05 11:40 AM, "Paul A Houle" <[EMAIL PROTECTED]> wrote:

> [Tue Jun 28 14:45:53 2005] [notice] child pid 28182 exit signal
> Segmentation fault (11)

Sorry if I missed it - which mpm are you using?

Basically, before you start apache do:

    ulimit -c unlimited

Set CoreDumpDirectory to a directory writable by the web server user
(not root). Start apache. You should get a core file. Use GDB to look at
it (assuming the core is in /tmp):

    gdb --core=/tmp/core. /path/to/httpd

If you are using a threaded mpm, then use the following gdb command to
track down the offending thread:

    thread apply all bt

Then switch to that thread:

    thread <n>

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: how do i debug a segfault?
On 6/29/05 1:39 PM, "Paul A Houle" <[EMAIL PROTECTED]> wrote:

>> Sorry if I missed it, which mpm are you using?
>
> prefork

For prefork, just follow the directions I gave and forget all the
"thread" stuff.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
[PATCH] new proxy hook 2.1
This patch adds a new hook (request_status) that gets run in
proxy_handler just before the final return. This gives modules an
opportunity to do something based on the proxy status. A couple of
examples where this is useful:

- You are using a caching module and would rather return stale content
  than an error to the client if the origin is down.
- You proxy some subrequests (using SSI - mod_include) and do not want
  SSI errors when the backend is down. If you would normally return
  HTTP_BAD_GATEWAY, you may have a module that serves some other
  content.

This feature is one of the features of our in-house proxy module that
keeps us from moving toward the "stock" 2.1 proxy.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

proxy.diff
Description: Binary data
Subrequests, keepalives, mod_proxy in 2.1
>From the best I can tell, subrequests do not get the benefits of keepalives in mod_proxy in 2.1. What is the reason for this? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: mod-cache-requestor plan
On 7/11/05 11:48 PM, "Parin Shah" <[EMAIL PROTECTED]> wrote:

> ... should be refreshed. but, if we consider keeping non-popular but
> expensive pages in the cache, in that case how would the
> mod-c-requester make the decision?

We have been down this road. The way one might solve it is to allow
mod_cache to reload an object while serving the "old" one. Example:

    cache /A for 600 seconds
    after 500 seconds, request /A with a special header (or from a
    special client, etc); the cache does not serve from cache, but
    rather pretends the cache has expired and does the normal refresh
    stuff. The cache will continue to serve /A even though it is
    refreshing it.

Make any sense? This way, a simple cron job can be used to refresh
desired objects.

Also, one of the flaws of mod_disk_cache (at least the version I am
looking at) is that it deletes objects before reloading them. It is
better for many reasons to only replace them. That's the best way to
accomplish what I described above.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: mod-cache-requestor plan
On 7/12/05 10:27 PM, "Parin Shah" <[EMAIL PROTECTED]> wrote:

>> Also, one of the flaws of mod_disk_cache (at least the version I am
>> looking at) is that it deletes objects before reloading them. It is
>> better for many reasons to only replace them. That's the best way to
>> accomplish what I described above.
>
> If we implement it the way you suggested, then this problem would
> automatically be solved.

The basic flow of mod_disk_cache should be something like:

    Determine cache key.
    Do meta and data exist?
        Yes -> check for expiry, serve it, etc.
        No  -> insert filter, etc.

In the filter, open a deterministic tmp file, not the "random" ones like
in the current code - something like metafile.tmp. When the file is
opened, try to open it exclusively. That way only one worker is trying
to cache the file. After caching, rename from tmp to the real files.

Using such a temp file scheme also allows you to be "sloppy" with your
expiry times:

    Does the meta file exist? Yes.
    Is the meta file "fresh"? No.
    Does a tmp file exist? Yes - someone else is "refreshing" it.
    Is the tmp file less than x seconds old? Yes - serve "stale" content.

This avoids the "thundering herd" to the backend
server/database/whatever handler. Trust me, it works :)

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
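The exclusive-open-then-rename trick described above can be sketched in a few lines of POSIX C. The function names and paths are illustrative, not mod_disk_cache code: O_CREAT|O_EXCL makes the temp file a lock ("only one worker refreshes"), and rename() atomically replaces the old entry so readers never see it deleted.

```c
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

/* Returns an fd if we won the right to refresh this entry, or -1 if
 * another worker already holds the deterministic temp file. */
int cache_begin_refresh(const char *tmppath)
{
    return open(tmppath, O_WRONLY | O_CREAT | O_EXCL, 0644);
}

/* Atomically publish the refreshed entry; the old file is replaced,
 * never deleted out from under readers. */
int cache_commit(int fd, const char *tmppath, const char *realpath)
{
    if (close(fd) != 0)
        return -1;
    return rename(tmppath, realpath);
}
```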
Re: mod-cache-requestor plan
On 7/13/05 2:43 PM, "Graham Leggett" <[EMAIL PROTECTED]> wrote:

> This was one of the basic design goals of the new cache, but the code
> for it was never written.
>
> It was logged as a bug against the original v1.3 proxy cache, which
> suffered from thundering herd when cache entries expired.
>
> At some point soon when my mailbox is a little less full and I have a
> week or two spare, I plan to fix this problem if nobody beats me to it :)

It should only take a couple of hours to get a working patch together.
Maybe I'll get time this week for mod_disk_cache.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: mod-cache-requestor plan
On 7/13/05 6:36 PM, "Ian Holsman" <[EMAIL PROTECTED]> wrote:

> Hi There.
>
> just remember that this project is Parin's SoC project, and he is
> expected to do the code on it.

Sure. I am expected to do what's best for my employer and the httpd
project.

> While normally I think it would be great to get a patch, we need parin
> to do the work on this, otherwise he might get a bit upset when it
> comes to getting paid.

He can have the patch. I just want to see it done right.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: mod-cache-requestor plan
On 7/13/05 6:41 PM, "Ian Holsman" <[EMAIL PROTECTED]> wrote:

> a pool of threads read the queue and start fetching the content, and
> re-filling the cache with fresh responses.

How is this better than simply having an external cron job fetch the
urls? You have total control of throttling there, and it doesn't muck up
the cache code.

A good idea may be to have a "cache store hook" that gets called after a
cache object is stored. In it, another module could keep track of cached
urls. This list could be fed to the above cron job. I know one big web
site that may do it in a similar way...

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: mod-cache-requestor plan
On 7/14/05 9:59 AM, "Ian Holsman" <[EMAIL PROTECTED]> wrote:

> that wouldn't keep track of the popularity of the given url, only when
> it is stored.

Which would be a useful input to something like htcacheclean, so that it
does not have to scan directories.

> The priority re-fetch would make sure the popular pages are always in
> cache, while others are allowed to die at their expense.

So every request for an object would update a counter for that url? I
still think this would be better handled as an external process with
some "glue" between it and apache (IPC, dbm, shm, etc.). Both approaches
have disadvantages. I guess you just have to choose your poison :)

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: [PATCH] mod_disk_cache: change handling of varied contents
On 7/20/05 11:25 AM, "Paul Querna" <[EMAIL PROTECTED]> wrote: > If a URI is varied, all variations will be stored underneath a new > .header.vary/ directory. Looks good for starters. Good idea. I would like a way to override some varies. For example, in a reverse proxy situation, you may vary on the "User-Agent" header for gzip stuff. However, you really only care whether the user-agent can do gzip or not; you do not want to store a different version for every browser. Maybe something like this in config: VaryValue User-Agent gzip Where gzip is an environment value set by SetEnvIf or something similar. This may have been discussed before.
Corrupting error log buffer?
Maybe a little off topic, but driving me crazy... It looks like I have somehow corrupted something about the server error log. After Apache has been running for a while, entries in my error log always have the following, for example: [Mon Jul 25 13:15:28 2005] [error] [client 85.140.27.54] url_cache_handler: may serve STALE content: 0: /toon/tools/img/jewel.jpg, referer: http://schedule.cartoonnetwork.com/servlet/ScheduleServlet?action=show&showID=320361&show=Justice%20League&filter=tm But I am not logging from "referer" on. That shouldn't be there. It looks like somewhere I have a mismatch between the format and the number of arguments to ap_log_rerror. Anyone know a good way to track it down? Thanks. --Brian Akins
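One common way to catch this class of bug at compile time is GCC/clang's `format` function attribute, which makes `-Wformat` check the argument list against the format string for your own printf-style functions (I believe newer httpd headers annotate the ap_log_* family similarly). A minimal sketch, with a hypothetical wrapper name:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical printf-style logging wrapper. The attribute tells the
 * compiler to verify the format string against the variadic arguments,
 * so a format/argument mismatch like the one suspected above becomes a
 * compile-time -Wformat warning instead of garbage in the error log. */
static void my_log(char *buf, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static void my_log(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, len, fmt, ap);
    va_end(ap);
}

/* my_log(buf, n, "stale: %d: %s", 0);   <- missing arg: compiler warns */
```

With the annotation in place, building with `-Wformat` (on by default with `-Wall`) flags every call site whose arguments disagree with the format string.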
How long until 2.2
As I sit here debugging our home grown proxy code for 2.0, I wonder how long until 2.2? We wrote our own proxy because the cool 2.1 stuff was not out at the time. The new proxy stuff would be wonderful for us, but no one wants to run alpha code in production. (However, we are quick to run homegrown stuff...) So, please, I beg you -- give us 2.2 ;) Thanks.
2.1 proxy and keepalives
Does this code from 2.1 in ap_proxy_http_request still make sense? Do we not want to attempt to maintain the server connection anyway? Maybe I'm missing some other logic... /* strip connection listed hop-by-hop headers from the request */ /* even though in theory a connection: close coming from the client * should not affect the connection to the server, it's unlikely * that subsequent client requests will hit this thread/process, * so we cancel server keepalive if the client does. */ if (ap_proxy_liststr(apr_table_get(r->headers_in, "Connection"), "close")) { p_conn->close++; /* XXX: we are abusing r->headers_in rather than a copy, * give the core output handler a clue the client would * rather just close. */ c->keepalive = AP_CONN_CLOSE; }
[PATCH] mod_cache. Allow override of some vary headers
This patch allows one to override the values of some headers so that they "vary" to the same value. Config Example: #all lines that have gzip set one variable SetEnvIf Accept-Encoding gzip gzip=1 #browsers that have problems with gzip BrowserMatch "MSIE [1-3]" gzip=0 BrowserMatch "MSIE [1-5].*Mac" gzip=0 BrowserMatch "^Mozilla/4\.0[678]" gzip=0 ... CacheOverrideHeader Accept-Encoding gzip CacheOverrideHeader User-Agent gzip This would allow all browsers that send "Accept-Encoding: gzip" and do not match the BrowserMatches to be mapped to the same cache object. All the other variants would point to another object. This would be very useful in reverse proxy caches. Only patched mod_disk_cache, but mod_mem_cache should be trivial. override2.patch Description: Binary data
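The idea behind the patch can be sketched as a tiny pure function (names are illustrative, not from the patch itself): collapse the raw Accept-Encoding header to a single canonical token, so every gzip-capable browser maps to the same cached variant instead of one variant per header spelling.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce a raw Accept-Encoding value to "gzip" or
 * "" for cache-key purposes. browser_gzip_ok stands in for the env var
 * set by the SetEnvIf/BrowserMatch rules in the config example above
 * (0 for the broken browsers, 1 otherwise). */
static const char *canonical_encoding(const char *accept_encoding,
                                      int browser_gzip_ok)
{
    if (accept_encoding && strstr(accept_encoding, "gzip") && browser_gzip_ok)
        return "gzip";
    return "";   /* everything else shares the uncompressed variant */
}
```

All requests that canonicalize to "gzip" then hit one cache object, and all others hit the second, which is the whole point of overriding the vary value in a reverse proxy.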
[PATCH] mod_disk_cache deterministic tempfiles
The current 2.1 mod_disk_cache allows any number of workers to be actively trying to cache the same object. This is because of the use of apr_file_mktemp. This patch makes the tempfiles the same per cache object rather than "random". I basically added a temp_file() that mimics data_file() and header_file(). This way only one thread is trying to cache that object and avoids a "thundering herd." The other threads fail the EXCL flag, and serve like normal. newtemp.patch Description: Binary data
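The EXCL trick described here can be sketched in plain POSIX C (simplified, outside of APR; the function name and path are illustrative): the temp name is derived from the cache key instead of being random, and `O_CREAT|O_EXCL` guarantees exactly one winner.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Deterministic tempfile sketch: every worker computes the same temp
 * name for a given cache object; open(O_CREAT|O_EXCL) succeeds for
 * exactly one of them. Losers return -1 and just serve the request
 * without caching, avoiding the thundering herd. */
static int try_claim_cache_slot(const char *tmpname)
{
    int fd = open(tmpname, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0 && errno == EEXIST)
        return -1;   /* another worker is already caching this object */
    return fd;       /* we won: write the object, then rename() it */
}
```

The winner finishes by `rename()`-ing the temp file onto the final header/data name, which is atomic on POSIX filesystems.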
Re: proxy health checks [was: Proxying OPTIONS *]
In "our" proxy, we launch an external helper app that does active health checking of the origin servers. This is a HEAD request on a configurable (per origin "pool") URI (ie, http://host:port/url/blah). When an origin passes/fails a given number of checks it is marked up/down. For example, when an origin passes 2 health checks in a row, it is marked up; when it fails 3 in a row, it is marked down. We check each backend every x seconds (x usually equal to 5). The proxy module and the health checker are linked via a simple array in shared memory. The health checker marks the index of each origin as 1 (up) or 0 (down). I have been contemplating writing a balancer module that implements this, but haven't had the time. With this method, we don't have to check every time, as we assume if the origin is up in the health checker that it is up. This works several million times a day for us :) (Also, we do "true" connection pooling to the origins, but that's another story...) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
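The pass/fail debouncing described above (2 consecutive passes marks an origin up, 3 consecutive failures marks it down) is a small state machine. A sketch, using those example thresholds; the `up` field is what would live in the shared-memory array the proxy reads:

```c
#include <assert.h>

/* Per-origin health state. In the real setup this lives in shared
 * memory: the checker writes `up`, the proxy only reads it. */
typedef struct {
    int up;       /* 1 = in rotation, 0 = out of rotation */
    int passes;   /* consecutive successful checks */
    int fails;    /* consecutive failed checks */
} origin_health;

/* Called by the health checker after each HEAD probe. A single good
 * or bad check never flips state; only a run of them does. */
static void health_update(origin_health *h, int check_ok)
{
    if (check_ok) {
        h->fails = 0;
        if (++h->passes >= 2)
            h->up = 1;
    } else {
        h->passes = 0;
        if (++h->fails >= 3)
            h->up = 0;
    }
}
```

The hysteresis keeps one flaky probe from bouncing an origin in and out of rotation.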
Re: Broken URI-unescaping in mod_proxy
On 10/8/07 1:44 PM, "Roy T. Fielding" <[EMAIL PROTECTED]> wrote: > For the millionth time, if that is a problem then separate the proxy > module from the gateway ("reverse proxy") module. They do not belong > together. +1. This would sway me more to go back to the "stock" modules. The "reverse proxy" could be much more aggressive with keep-alives, connection pools, etc. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Proxying subrequests
-1 from me (if that counts.) Using ProxyPass should be fine for 95% of the use cases?? ProxyPass /cnn http://www.cnn.com/ -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Proxying subrequests
On 11/1/07 6:48 PM, "Ian Holsman" <[EMAIL PROTECTED]> wrote: > Akins, Brian wrote: >> -1 from me (if that counts.) >> >> Using ProxyPass should be fine for 95% of the use cases?? >> >> ProxyPass /cnn http://www.cnn.com/ >> >> >> > > yes. > if you: > a. have a static small number of hosts > b. those hosts don't change often > > if either of these 2 conditions aren't met, then proxypass is next to > useless. True. (You could manage with a decent config tool, however.) Couldn't some creative rewrite rules do the same thing: RewriteCond %{IS_SUBREQ} true RewriteRule ^/proxy/(.*) http://$1 [P,L] Note: not sure if that will actually work. I just foresee horrible cross-site type "vulnerabilities" in this. If it is configurable on/off, then I'm -0. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: keepalive connections and exiting MPM processes
On 11/13/07 8:30 AM, "Jeff Trawick" <[EMAIL PROTECTED]> wrote: > * Note that the condition evaluation order is extremely important. @@ -212,7 +214,8 @@ && (!apr_table_get(r->subprocess_env, "nokeepalive") || apr_table_get(r->headers_in, "Via")) && ((ka_sent = ap_find_token(r->pool, conn, "keep-alive")) -|| (r->proto_num >= HTTP_VERSION(1,1))) { +|| (r->proto_num >= HTTP_VERSION(1,1))) +&& !ap_graceful_stop_signalled()) { int left = r->server->keep_alive_max - r->connection->keepalives; r->connection->keepalive = AP_CONN_KEEPALIVE; Looks reasonable. We do so many checks in this function, and it's a pain to modify. I wish this were a hook. I have selfish reasons for this, as I want an easy way to limit the number of keepalives per child (especially in the event MPM). It would be so much easier if this were a hook: In event somewhere: static int can_http_keepalive(request_rec *r) { if (current_num_of_keepalives + 1 > configured_max) { return DECLINED; /* whatever -- anything that isn't OK */ } return OK; } When resources get tight, I'd like to disable keepalives so I don't have all the "freeloading" connections hanging around. (Sometimes every little bit helps...) Also, shouldn't Keepalive become HTTPKeepalive and live in the http_module's config (which doesn't exist, I don't think) rather than in the "global" server_rec? Keepalive doesn't make sense for mod_ftp, mod_dns, etc. While I'm thinking about this, shouldn't virtual server selection be a hook rather than the current "core" way... It would eventually make "dynamic" server_rec creation easier. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_serf is in trunk
On 11/13/07 11:28 AM, "Jim Jagielski" <[EMAIL PROTECTED]> wrote: > Agreed that mod_proxy has the potential of joining the ranks > of mod_rewrite and mod_ssl as the Modules Most Likely To Make > One Lose Their Minds And Run Screaming Hysterically Through > The Halls. We found it much easier to write our own proxy rather than try to plug away at mod_proxy... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: memory leak in 2.2.x balancer?
On 11/28/07 3:39 AM, "jean-frederic clere" <[EMAIL PROTECTED]> wrote: > One of the question is should we go on using scoreboard to store the > balancers and workers information No. There were a couple of alternatives discussed a few (?) months ago. I know I showed some "per-module scoreboard" example code. I think others had a couple of ideas as well. This does not belong in the "core" scoreboard. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: memory leak in 2.2.x balancer?
On 11/28/07 11:20 AM, "Jim Jagielski" <[EMAIL PROTECTED]> wrote: >> Agreed. >> However, we should have an httpd api that will allow >> module to register his own shared memory, that will >> be managed and handled like scoreboard with probably >> the 'generation' extension to allow graceful restarts >> with configuration changed. >> > > +1 http://marc.info/?l=apache-httpd-dev&m=115694865205979&w=2 -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
apr_ bndm? was Re: Expression Parser API?
Should we also break out the bndm stuff from mod_include? I have used it in a few modules. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: memory leak in 2.2.x balancer?
On 11/28/07 1:41 PM, "Mladen Turk" <[EMAIL PROTECTED]> wrote: > > This code is simply a scoreboard callback. > > We were talking of the API that would allow module to > register the 'shared memory intention'. Then the > core will take care of creating/attaching/removing the shared > memory. It will be the separate per-module shared memory > handled the same way, with the same security as the scoreboard is. If you strip away the "per thread" stuff in mod_slotmem, it does that. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Include variables as providers?
I have a need to possibly add more variables to mod_include (ie, DATE_LOCAL, DOCUMENT_URI) and also to have variables act like tables. Something like "$CNNVAR{foo}". I was thinking that we could have get_include_var (and others) use providers. mod_include would only provide providers for the "normal" SSI stuff, but it would allow others to extend and/or replace them. Pseudo code: get_include_var(var, ctx) { /* do the regex stuff */ /* parse out var and key from $foo{bar}, if present */ if ((provider = ap_provider_get(var))) { provider->do_stuff(var, key); /* key may be NULL */ } else { /* error */ } } Thoughts? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Include variables as providers?
On 11/28/07 2:54 PM, "Nick Kew" <[EMAIL PROTECTED]> wrote: > as in ap_register_include_handler? No. As in wanting to have other variables in addition to the environment and the "special" httpd ones (like DATE_LOCAL, DOCUMENT_URI). >> Pseudo code: > > Where would you propose to use it? Like: this is baz stuff Other crap Sorta like this: http://esi-examples.akamai.com/viewsource/geo.html But in SSI. In reality, there needs to be a way to tie "environment values" to functions, rather than just r->subprocess_env. Or maybe call them something else, but in a way that all modules (rewrite, include, etc.) can access, just like they do the "environment." ap_get_env(r, "DOCUMENT_URI", &val); A provider for this would just return r->uri, but ap_get_env(r, "BAKINS", &val) would call the provider I registered. And this would "just work" in other modules because they would always use ap_get_env (and ap_set_env, if needed). Would be nice if env was more than just key = value, ie, FOO{bar}. However, this would require redoing a significant amount of stuff - modules and config practices. I currently just need it for SSI, but just wanted to get a discussion going. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
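The lookup being proposed can be sketched outside of httpd (all names here are hypothetical, standing in for ap_get_env and a request-scoped provider table): check a provider registry first, fall back to the plain key/value environment otherwise.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variable provider: a function that computes a value on
 * demand instead of reading a stored string. */
typedef const char *(*var_provider)(void);

static const char *uri_provider(void) { return "/index.html"; }

struct provider_entry { const char *name; var_provider fn; };

/* In a real module this table would be filled by an
 * ap_register_*-style call at config time. */
static struct provider_entry providers[] = {
    { "DOCUMENT_URI", uri_provider },
    { NULL, NULL }
};

/* Sketch of the proposed ap_get_env(): providers win, otherwise fall
 * back to the ordinary environment (modeled here as `fallback`, a
 * stand-in for the r->subprocess_env lookup). */
static const char *my_get_env(const char *name, const char *fallback)
{
    for (struct provider_entry *p = providers; p->name; p++)
        if (strcmp(p->name, name) == 0)
            return p->fn();
    return fallback;
}
```

Because callers only ever see `my_get_env`, modules like rewrite and include would pick up provider-backed variables with no changes of their own, which is the "just work" property described above.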
Re: apr_ bndm? was Re: Expression Parser API?
On 11/28/07 8:27 PM, "Nick Kew" <[EMAIL PROTECTED]> wrote: > I haven't looked at that specifically, but I think it likely there's > a good case for it. Are you volunteering to hack up a patch? Sure. Why not. Needs to be against apr trunk and httpd trunk? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: apr_ bndm? was Re: Expression Parser API?
On 11/29/07 8:23 AM, "Nick Kew" <[EMAIL PROTECTED]> wrote: > Oh, er, um ... porting over to APR? That'd need to be raised on > [EMAIL PROTECTED] I read your original post as ap_bndm, meaning httpd core. I can just do ap_bndm. Just to keep it on this list... Any particular place we are sticking util stuff in the tree? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Is async the answer?
This is just some ramblings based on some observations, theories, and tests. Partially "devil's advocate" as well. Most of us seem to have convinced ourselves that high-performance network applications (including web servers) must be asynchronous in order to scale. Is this still valid? For that matter, was it ever? We just ran a large scale test on a busy website (won't mention the name...) and ran about 95% of production traffic on a single server. This was about 30k connections. We set MaxClients to 50k. The server did fine, with about 4GB RAM free and 55% CPU idle. This was the full production config, not stripped down or anything, using our cache and proxy and several "stock" modules. Granted these were fairly "beefy" servers, but nothing extraordinary: 2x dual-core 2.4 GHz CPUs with 8GB RAM, normal non-TOE Ethernet (but with checksums on card). It seems that modern OSes (this was Linux 2.6.something) deal with the "thread overhead" and all the context switches very well. All the stuff mentioned in "the c10k problem" (http://www.kegel.com/c10k.html) didn't seem to apply. We could have easily doubled the number of connections to the server, I think. We were using the normal worker MPM with keepalives for this test. The current "stable" event MPM would have helped with idle keepalive threads, but the system didn't seem to care. Response time never increased in any measurable amount. Yes, we are using sendfile, mmap, etc., so zero-copy helps us a lot. So, do we need Apache 3 (or whatever it's called) to be fully asynchronous? Is that just us reacting to "the market" trends, i.e., lighttpd? All the "Apache httpd is bloated and slow" talk is just plain horse crap. It's not that hard to configure Apache to be "fast." C programming is my "hobby," and it's not that hard to write modules that don't do stupid things and kill the performance. The biggest thing we do in our modules is to make trade-offs to avoid locking.
I.e., we would rather "waste" a few MB of RAM on some "per-module scoreboards" than use per-proc or global locking. Most of our counters are per thread, and we just add them up when someone accesses the counter (i.e., via mod_status). This made a huge difference in our performance. Also, don't get in httpd's way. Let the core handlers handle as much as possible. Just "encourage" them when needed. They are battle-tested, and improvements made there help everything, provided you haven't tried to rewrite them in your own module. Like I said, just some ramblings. We may experiment with the current event MPM some more, but I honestly do not see the huge benefit to moving to a fully async IO architecture. It's very easy to write modules in the current "one thread per request" model. People will screw up the async thing and make it slower anyway, probably. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
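The lock-avoidance trade-off described above can be sketched with plain pthreads (slot count and padding size are illustrative): each thread bumps only its own cache-line-padded counter slot, and the rare reader (mod_status style) sums the slots on demand, so the hot path takes no lock at all.

```c
#include <assert.h>
#include <pthread.h>

#define NSLOTS 4   /* assumption: one slot per worker thread */

/* Pad each slot to a cache line so threads don't false-share. */
static struct { volatile long n; char pad[64 - sizeof(long)]; } slots[NSLOTS];

/* Each worker increments only its own slot: no lock, no contention. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < 100000; i++)
        slots[id].n++;
    return NULL;
}

/* Called rarely (e.g. a status page): sum all per-thread slots. The
 * result may be momentarily stale, which is fine for statistics. */
static long total(void)
{
    long sum = 0;
    for (int i = 0; i < NSLOTS; i++)
        sum += slots[i].n;
    return sum;
}
```

The design trade is exactly the one the text names: a few extra bytes per thread instead of a global mutex or atomic contention on every request.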
Re: Is async the answer
On 1/18/08 2:16 PM, "Justin Erenkrantz" <[EMAIL PROTECTED]> wrote: > Speaking for myself, I think writing and using buckets with serf is > more straightforward than our complicated bucket brigade system with > mixed push/pull paradigms. It very well may be. Async may be easy. Except when my db connection blocks.. On stat calls.. Etc. I am by no means defending the buckets! Or anything for that matter... Just some observations. I just no longer buy into the idea that async is somehow inherently superior. It sounds good in theory, but in the "real world" I am just not seeing it. The whole reason I brought this up was to stimulate discussion. I really really would hate for us to spend many months porting everything over to async to discover that it made no positive impact on performance. Worse, it made extending httpd (or "D") much harder. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On 1/18/08 3:07 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: > That's not even a consideration, > async is really for dynamic content, proxies, and other non-sendfile > content. For dynamic stuff, "X-sendfile" works well. (Just really starting to play with that, liking it so far). The proxy that the LiveJournal folks wrote, I think, copies all the data from the origin server into a file and then uses sendfile to send to the client... Also, we have driven Apache as a proxy as far as we have squid... Paul Q and I have been kicking around the idea that even if we go to a completely async core, etc. modules could mark some hooks as "blocking" and they would run basically how they do today. (One day, Paul, I'll actually think about this more...) Having a request tied to one thread for its lifetime does make some things easier. If the underlying IO is asynchronous and it's faster/scalable/fun, then, all the better. I just am not a big fan of the "callback" method that squid uses (or used last time I looked at it). Yes, it's doable, but just seems "not quite right" to me. That's just my opinion. I'd like to be able to say, "hey httpd, write this stuff to the client" and it just happen wonderfully fast :) Currently, worker is doing a great job for us. Maybe async would be fine as well, especially if the serf buckets are as easy to use as Justin says. I just don't want us to say "we must be async" with no real reason other than "we must." -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On 1/18/08 2:20 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: > I think so, in some environments anyway. If you have a server tuned for > high throughput accross large bandwidth-delay product links then you > have the general problem of equal-priority threads sitting around with > quite a lot of large impending writes. Doesn't sendfile (and others) help in that case? Also RAM is cheap, bandwidth isn't :) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On 1/18/08 12:18 PM, "Colm MacCarthaigh" <[EMAIL PROTECTED]> wrote: > Hmmm, it depends what you mean by scale really. Async doesn't help a > daemon scale in terms of concurrency or throughput, if anything it might > even impede it, but it certainly can help improve latency and > responsivity greatly. On the whole, it's easy to see how it might make > the end user experience of a very busy server much more pleasant. I also wonder if that has actually been tested or if it's just a "factoid"? >> Response time never increased in any measurable amount. > > I suspect it might though if the scheduler became bound, async would > route the interupts more efficiently. But I wonder if the scheduler would become bound under a "reasonable" amount of traffic. > discussions on scalability baffling, the reality is that modern hardware > can outscale pretty much any amount of bandwidth you can buy regardless > of the software. Bandwidth generally isn't an issue for us anymore (thanks to gzip). We can still overrun the CPU with small-object requests/responses. On "large" objects (ie, over 16k or so), the CPU is bored when multiple gig interfaces are full. > The scalability wars should really be over, > everyone won - kernel's rule :-) Which is why I hate to see a ton of work go into an async core if it actually does very little to help performance (or if it hurts it) and makes writing modules harder. It's brain-dead simple nowadays to write well-behaved, high-performance modules (well, mostly) because you rarely worry about threads, reads/writes, etc. Full async programming is just as challenging as handling a ton of threads yourself. My $.02 US worth (which ain't much). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On 1/20/08 10:44 AM, "Graham Leggett" <[EMAIL PROTECTED]> wrote: > In terms of space, caches are not infinite in size, but then neither are > the majority of backend websites either. 73GB is pretty big for a reverse proxy cache. And fast SAS drives are pretty cheap. > Sure, but I think the point that Brian was making was that you could > support the kind of large load sizes that are traditionally associated > with event based models using a prefork or worker setup, simply by > making sure you have enough RAM. And to stimulate some conversation. I just don't want us to "buy into" the "async is better" idea because that's the "trend" in servers nowadays. If async truly is better, then let's use it. Just don't want to do it "just because everyone else is." Also, this test included all sorts of clients (slow, fast, in between). A blocking thread didn't seem to hurt the server. I'm guessing that 48k blocking threads wouldn't hurt it too bad either. Also, I'm going to look at the serf "buckets" when I get time. Story of my life, though, no time... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer
On 1/19/08 6:29 PM, "Davi Arnaut" <[EMAIL PROTECTED]> wrote: > This is true for expensive hardware and very well designed operating > systems and file systems.. and the space is not infinite. It depends on your definition of "expensive." All of our servers are fairly "commodity." The new Linux fileserver I built at home is faster than most of ours, and it cost me less than $1k. It's all the "management/redundancy" stuff that makes "real servers" so expensive. A dual dual-core Opteron with 8GB RAM is not all that "exotic." -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: My hacked mod_xsendfile
On 1/25/08 3:51 PM, "André Malo" <[EMAIL PROTECTED]> wrote: > I don't recommend doing that as it contains a race condition (the file might > be changed in the meantime). That race is in the default_handler as well, isn't it? It creates a file bucket based on the size of an earlier stat. So, we are setting the content length to what the core handler thinks it is anyway. The bucket is that "long." From default_handler: ap_set_content_length(r, r->finfo.size); e = apr_bucket_file_create(fd, 0, (apr_size_t)r->finfo.size, r->pool, c->bucket_alloc); The finfo here is from map_to_storage (directory_walk??) which happened before... In theory, to avoid this, we could apr_file_info_get on the open handle. That would need to be changed in default_handler. (That's one of the reasons I like my approach: we just rely on the features/bugs of the core. Fix/enhance that and everyone benefits.) > By backend I meant here the script providing > the x-sendfile header. That's the only place knowing the behaviour of the > file exactly. In theory, if the backend gives us an x-sendfile, it shouldn't touch the file. Especially if it's an x-sendfile-temp (lighttpd calls this something strange). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
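The race under discussion is the classic stat-then-open gap: if the path is replaced between the `stat()` and the `open()`, the size you advertise belongs to a different file than the one you send. `fstat()` on the already-open descriptor closes that gap, because it describes the exact bytes the descriptor will deliver. A minimal POSIX sketch (function name illustrative):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open first, then fstat() the descriptor. Even if the path is
 * rename()d over afterwards, st_size matches the open handle, so it is
 * safe to use as a Content-Length for exactly what will be sent. */
static off_t trusted_length(const char *path, int *fd_out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *fd_out = fd;
    return st.st_size;
}
```

This mirrors the publish-by-rename convention mentioned elsewhere in the thread: writers replace whole files atomically, and readers who fstat their own descriptor never see a half-and-half view.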
Re: My hacked mod_xsendfile
On 1/25/08 3:33 PM, "André Malo" <[EMAIL PROTECTED]> wrote: > If it should not be chunked, the backend simply has to provide a content-length along with > the x-sendfile header. Okay, I added "ap_set_content_length(r, sub->finfo.size)" and that fixes it and does not chunk. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: My hacked mod_xsendfile
On 1/25/08 3:33 PM, "André Malo" <[EMAIL PROTECTED]> wrote: > I'm not sure if a filter is semantically the right place. IMHO that smells a > bit problematic. It might be better to hack that into a function > similar to ap_internal_redirect and let it be called explicitly. That way you'd > need to hack a recognition per backend (but the code is mostly there > anyway), on the other hand you could enable and disable it per backend. A filter was just convenient. It needs to be as generic as possible so that, a) people will use it, b) no one has to edit their backend much to support it. X-sendfile is already "in the wild" and supported by a number of backends. I don't have strong feelings as to how it is implemented within httpd, but the subrequest gives a lot of flexibility to it, like using deflate, running SSI on php output, etc. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer?
On 1/24/08 3:14 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: > Working on making a FastCGI based setup the recommended approach > instead of mod_php is probably more important then async. Actually, > it's a prerequisite. FastCGI is the "recommended" way of using php and httpd 2, AFAIK. Isn't it??? > Having 30k threads still seems like a waste of resource to me though. Not if the system is handling the load very well and "needs" 30k threads. My point was that 30k threads did not seem to be a "waste of resources." I doubt an async server would have used a significantly lower amount of resources because worker did not use a significant amount of resources. > What about a hybrid approach? > Async for network IO and other stuff that doesn't require sync calls, > worker threads for other parts? That's kind of what I was thinking after ApacheCon US this year. I won't speak for others, but it seemed reasonable to most. However, after doing several real world tests, I just don't honestly see that async will be a huge improvement. Please prove me wrong with real world results. I'd be more than happy to be wrong on this, really. To be honest, I don't have strong feelings either way. I was surprised by my results. I now think that completely rewriting the core to be async *may be* a "waste of resources." If it fits nicely into some ideas on reengineering buckets and brigades (ala serf stuff), and does not actually decrease overall performance, then by all means do it. Remember, I'm partially playing devil's advocate as well... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
My hacked mod_xsendfile
I started to play with xsendfile more. I noticed the mod_xsendfile floating around tried to basically replace what the default handler does very well. Basically, my version does a subrequest for the file. This allows things like "Deny from all", etc, to work. This should be more secure, ie, if you set your deny's correctly, you can't "X-Sendfile: /etc/passwd". All in all, it seems more "httpd"-like, to me. It is very rough. I do not understand brigades enough to know why it is chunking every reply in my tests. I have tested with just a normal cgi setting the header. Not well tested. I'd like to see us work toward getting X-sendfile into the normal httpd distribution (along with mod_fcgid...) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies mod_xsendfile.c Description: Binary data
[PATCH] for bug 20832. Add double stat option
Against trunk. Basically, it adds a new config directive "EnableDoubleStat" (I know, horrible name), off by default. If on, it will use apr_file_info_get on the open file handle. Based on the patch in this thread: http://issues.apache.org/bugzilla/show_bug.cgi?id=43386 Doesn't apply cleanly against 2.2.8. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies double-stat.diff Description: Binary data
Re: My hacked mod_xsendfile
On 1/28/08 4:35 AM, "Ivan Ristic" <[EMAIL PROTECTED]> wrote: The FastCGI process is likely to be running under a different > account, but here we have a facility that allows that other process to > use the privileges of the Apache user to fetch a file. I can see how > this feature could easily find its way to the list of small tricks > that can be used to compromise a web server installation, one step at > a time. Perhaps. Most of our fastcgi stuff gets executed by httpd, so it has the same privileges. Also, php under fastcgi has access to everything completely outside httpd, for example. I guess if we choose to include support, we should add the appropriate security warnings. Also, this approach will use all the normal httpd file access controls rather than just grabbing the file "directly." It is also a "first draft" and I'm sure it needs work, but I'd like us to push to get xsendfile into core. It's already Apache-licensed, if that helps. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Is async the answer?
On 1/28/08 3:29 PM, "Olaf van der Spek" <[EMAIL PROTECTED]> wrote: > I agree that FastCGI is the better technical solution, I'm just > stating that neither the Apache documentation nor the PHP > documentation seems to state that. Even worse, they hardly document > the FastCGI way at all. The only reason I know is because at Apachecon in Austin (?) the php and httpd guys kissed and made up and said a bunch of stuff about fastcgi in a presentation. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: [PATCH] for bug 20832. Add double stat option
On 1/28/08 12:35 PM, "William A. Rowe, Jr." <[EMAIL PROTECTED]> wrote: > AFAICT, this still does not resolve the problem, it simply picks up the > replacement of the file. Which is what I care about. > It does not address the root 80% of the problem when an open file is > then changed during the transmission. You get what you deserve if you do that, IMO. All of our publishing systems write to a temp file, then rename to the real name. > For trunk, it's time finally to open the file earlier in the request > cycle and use the fstat/file_info_get for the lifetime of the request. > Thoughts? Maybe. I'm wondering how often we wind up not using the file. I guess only when we redirect or something in fixups, maybe. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Transfer time in apache
On 1/30/08 11:25 PM, "Niko Wilfritz Sianipar Sianipar" <[EMAIL PROTECTED]> wrote: > Please help me with this problem: > > HOW TO get/know/calculate transfer time of a packet not the entire of a file > (just a packet) that just sent to a client in Apache web server? Register a network filter on the connection and keep track of time there. It's not quite what you want, but it's close. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: store_headers in memcache and diskcache
On 2/6/08 1:35 PM, "Albert Lash" <[EMAIL PROTECTED]> wrote: > A little off topic, but would it make sense to use a ramfs with > mod_disk_cache to get the best performance? On linux, at least, just set cacheroot to something like /dev/shm/cache. Same principle applies for other OS's as well. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: cache - cleaning up mod_memcache and making other caches their live easier
If anyone cares, here's how we do keys and vary in our cache: On store: Generate key using the URL and r->args (we can ignore r->args per server, if needed) (http://www.domain.com/index.html?you=me) if (vary) { store the following info in the meta file: cache_version_t - ala disk_cache (ours includes expire time) length of key (apr_size_t) including \0 key with \0 length of "array" (apr_size_t) a \0-delimited array of Vary headers regenerate key (basically the original key + vary info: http://www.domain.com/index.html?you=meuser-agentmozillaaccept-encodinggzip ) } Store key in meta file. A normal meta file has this format: cache_version_t (ours includes expire time) length of key (apr_size_t) including \0 key with \0 length of "table" (apr_size_t) a \0-delimited table (key\0value\0key\0value\0) of response headers Note: the reason we use \0-delimited arrays/tables is that we read the entire metafile into memory on serving and then just apr_table_setn the values. In theory we could mmap the meta files, but we actually found that to be slower. On serving: Generate key using the URL and r->args (we can ignore r->args per server, if needed) (http://www.domain.com/index.html?you=me) Open metafile if (vary) { thaw vary array generate new key (vary values may be overridden by env) open new metafile } Thaw headers, etc. So, we only store the headers that we use in vary key calculation. On my TODO list is to make key generation a hook because we have apps that would benefit from that. Of course, we have only one provider (disk) and ignore a lot of RFC stuff (although we have made most of that configurable), but our key/vary handling is pretty fast (I spent a lot of time with profilers when writing it). I'm still working on my side to allow us to actually release the code. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
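The \0-delimited table format described above can be sketched in a few lines of C (function names illustrative, not from the actual code): freezing appends `key\0value\0` pairs, and thawing just walks the buffer handing out pointers, which is why it pairs naturally with apr_table_setn (which stores pointers without copying).

```c
#include <assert.h>
#include <string.h>

/* Append one key\0value\0 pair at offset `off`; returns the new
 * offset. Both terminating NULs are written, matching the meta file
 * layout key\0value\0key\0value\0... */
static size_t freeze_pair(char *buf, size_t off,
                          const char *key, const char *val)
{
    size_t k = strlen(key) + 1, v = strlen(val) + 1;
    memcpy(buf + off, key, k);
    memcpy(buf + off + k, val, v);
    return off + k + v;
}

/* Walk the frozen buffer and return a pointer to the value for `key`,
 * or NULL. No allocation, no copying: this is the property that makes
 * "read whole metafile, then apr_table_setn" fast. */
static const char *thaw_get(const char *buf, size_t len, const char *key)
{
    size_t off = 0;
    while (off < len) {
        const char *k = buf + off;
        const char *v = k + strlen(k) + 1;
        if (strcmp(k, key) == 0)
            return v;
        off = (size_t)(v - buf) + strlen(v) + 1;
    }
    return NULL;
}
```

Since the thawed values point into the single buffer read from disk, the whole header table costs one read and zero per-header allocations.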
Re: ping for http in mod_proxy
On 2/13/08 11:07 AM, "Jim Jagielski" <[EMAIL PROTECTED]> wrote: > I've started looking at adding "ping" support for > mod_proxy_http to complement whats in mod_proxy_ajp... > The idea is to send a simple OPTIONS * to the backend > and hope for a reply. Would it be more useful to have active health checking of backend servers? I.e., periodically hit a URL on each origin and mark it up or down based on the response; only send traffic to servers that are up. I think mod_backhand does something similar but much more complex (?). I had started looking at adding this to balancer, but I have no time. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/13/08 12:41 PM, "Plüm, Rüdiger, VF-Group" <[EMAIL PROTECTED]> wrote: > If your health checks are smarter and notice that the backend will > fail soon (e.g. because it reached 98% or 99% percent of its capacity) then > this is a different story and can be very useful. Correct. Perhaps a weighted round-robin that is based on response time would be fairly easy to code... (Says the guy with no time.) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/13/08 12:27 PM, "Plüm, Rüdiger, VF-Group" <[EMAIL PROTECTED]> wrote: > This does not help with race conditions on HTTP keepalive connections. > Nevertheless active healthchecking could be useful. But on a busy site > I guess a real request will notice before the healthcheck that one backend > is down or the frequency of health checks needs to be insane. On a busy site you want to know that a server is down before you send a few thousand requests to it, and you want to know that it's up as soon as it's available so you can send a few thousand requests to it. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/13/08 12:50 PM, "Jim Jagielski" <[EMAIL PROTECTED]> wrote: > That was the other option as well... some sort of hearbeat > loop which updates worker status. Still, we get into the issue > with how much of "how proxy connects to and communicates with > the backend" to honor or work around. An external process (using serf maybe) would be easy. Just have some of the worker stats in shared memory. The healthchecker writes status to it, and httpd reads from it. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/13/08 1:09 PM, "Jim Jagielski" <[EMAIL PROTECTED]> wrote: > And again, we're basically doubling traffic and adding > overheard (more overhead than AJP's cping/cpong) at which > point I go back into wondering whether this sort of > implementation makes sense at all... So the main issue we are trying to solve is that we have a keepalive connection to an origin, but it goes away (closes) before we make our next request, and this causes an "error." (Sorry if I'm behind on the thread.) Shouldn't the HTTP client handle all of this? I know that libcurl handles this situation very well - it just tries to reconnect. How does serf handle it? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/13/08 2:10 PM, "Jim Jagielski" <[EMAIL PROTECTED]> wrote: > The latter is relatively easy to do with the current > impl... Maybe I'll drop the ping idea and work on this ;) +1 ;) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/13/08 6:01 PM, "Graham Leggett" <[EMAIL PROTECTED]> wrote: > Is there anything stopping us going the multicasting route, say by > adding a hook or hooks of some kind to proxy that keeps track of known > server states? Multicasting doesn't work well for us, for example, because servers are spread across several different VLANs that explicitly don't allow multicast between them. If it were "hookable" or used providers, that would be fine. I think sticky sessions should be hookable/provider-based as well. Someone could write a Spread-based module for origin status (or mysql, or memcache, or...) if the interface were well defined and "clean." The way balancer is so hooked into proxy makes it hard to write a replacement without hacking "core" proxy. In proxy it could be as simple a call as:

    apr_status_t ap_get_origin_server(request_rec *r, proxy_origin_t **o)

And that figures out all the balancer and sticky stuff. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/14/08 6:44 AM, "Plüm, Rüdiger, VF-Group" <[EMAIL PROTECTED]> wrote:

> 1. We currently have no mechanism in place that "simulates" these kinds of failures we experience ourselves with the backend for the client. Returning a 500 or 503 does not cause the client to repeat the request. IMHO we would need to cut off the network connection without sending anything to trigger this behaviour in a well designed client.

Hrm.. Seems like the HTTP client should "just handle" this case.

> 2. Clients are only allowed to resend the request automatically if the request was idempotent. Clients are not allowed to do so with non-idempotent requests like POSTs without user intervention. So by probing the keepalive connection before sending the request we want to reduce these cases.

From real-world experience, I can say that we have rarely ever had an issue with POSTs. The active health checking we do is based on how our load balancers do it. They (the load balancers) can occasionally send requests to a down server for a few seconds if it goes down between health checks. Sounds bad in theory, but in reality it has never been a real issue.

-- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/14/08 9:54 AM, "Graham Leggett" <[EMAIL PROTECTED]> wrote: > In theory, you should be able to stack the providers, so that a balancer > module could return the list of servers to try in the right order, and > then another module could further reduce that list down to servers that > are actually up. Yeah, that was my thought. I guess you pass around the array of servers: just remove entries (or mark them as N/A) and/or reorder it. At the end, core proxy picks whatever is at index 0 (possibly walking the list in case of a connection error or something). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/15/08 8:13 AM, "Plüm, Rüdiger, VF-Group" <[EMAIL PROTECTED]> wrote: > Any specific reason why we need to add an hook here and why this cannot be > done by the existing provider (interface). I am scared of adding another > level of indirection here if it is not really needed and things can be already > done with the existing infrastructure. I like hooks because providers are "one-shot." I use both, but find myself using hooks more and more. A good example is the discussion around having stacked providers in mod_cache: if it were a hook, you'd already have that... Providers are good when you will have one, and only one, "thing" that needs to munge/manipulate/compute, like database stuff. With the proxy stuff, it looks like we want "n" things to be able to manipulate the data. Once you make the leap from 1 to 2, you might as well make it a hook. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: ping for http in mod_proxy
On 2/15/08 11:03 AM, "Plüm, Rüdiger, VF-Group" <[EMAIL PROTECTED]> wrote: > My main point is that I want to avoid > using both hook and provider if not really needed, as it Agreed. Was just stating my preference. As long as it's "easy" to use, I have no strong feelings either way. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_wombat for Google Summer of Code
On 3/1/08 9:41 PM, "Matthew M. Burke" <[EMAIL PROTECTED]> wrote: > If any of you use > it and have any thoughts (heck, even if you don't use it, but do have > thoughts) please send them to me and I'll compile a list and put > together a project description. Sounds good. I'll get my list together. Want it here or in personal e-mail? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_wombat for Google Summer of Code
Okay, my list:

- Ability for other modules to call Lua without worrying about caching, compiling, etc. I.e., something like:

    APR_DECLARE_OPTIONAL_FN(apr_status_t, lua_request_register,
                            (apr_pool_t *pool, lua_request_t **new,
                             const char *file));

    APR_DECLARE_OPTIONAL_FN(apr_status_t, lua_request_run,
                            (request_rec *r, lua_request_t *run,
                             const char *function, apr_status_t *rc));

  In theory, this generalized "framework" would be used by mod_wombat internally.

- Per-thread VMs. An httpd thread has a dedicated Lua VM for each file.

(We do these two things^^. Brian M. has our very hacked version of mod_wombat. I can post it here if there is interest.)

- Support for inline Lua in httpd.conf. I.e., I want to be able to do "simple" Lua in the config rather than having to use a separate file:

    require 'string'
    require 'apache2'

    function simple_redirect(r)
        if string.match(r.headers_in['User-Agent'], 'mozilla') then
            r.headers_out['Location'] = 'http://somecoolmozillasite.com/stuff'
            return apache2.HTTP_MOVED_TEMPORARILY
        end
    end

    LuaHook fixups my_cool_code:simple_redirect

  (Bad example, but you get the general idea.)

- Ability to make "real" modules in Lua:

    function register_hooks(p)
        apache2.hook_handler(my_handler, NULL, NULL, apache2.HOOK_MIDDLE)
    end

    apache2.module(
        apache2.STANDARD20_MODULE_STUFF,
        create_dir_cfg,
        merge_dir_cfg,
        create_svr_cfg,
        merge_svr_cfg,
        cmd_table,
        register_hooks
    )

  Module loading is the place I see this being tricky.

- For the truly ambitious, I'd like to be able to configure Apache via Lua:

    httpd.load_module('mime_module', 'modules/mod_mime.so')
    v = apache2.server.new()
    v:document_root = '/opt/apache/htdocs'
    mod_mime:types_config = '/etc/mime.types'
    mod_env.set_env('X-Lua', 'is cool')

  Not valid, but something like that...

-- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Adding stickysession cookie on the proxy
On 3/5/08 3:12 PM, "Jani M." <[EMAIL PROTECTED]> wrote: > To start with, am I correct in assuming that others might find use for > this feature? We do this in our "homegrown" proxy. One less thing for the java guys to think about... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_disk_cache and atimes
On 3/25/08 4:37 AM, "Dirk-Willem van Gulik" <[EMAIL PROTECTED]> wrote: > No - it does not; so you get that speed increase (which is very > noticable on a swapspace/ram-disk*). I'd like to see some numbers on that. I just did a quick test on Linux and saw no real improvement (testing our hacked-to-heck version). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_disk_cache and atimes
On 3/25/08 12:54 PM, "Dirk-Willem van Gulik" <[EMAIL PROTECTED]> wrote: > Though if anyone can find me some more serious hardware* - I'd love to > do this proper; as I am struggling getting a fixed mod_memm_cache and > mod_memcached_cache to be taxed hard enough to actually measure/ > profile sensibly Use really small files so you won't fill up the pipe. Using a 1x1 GIF, I run out of CPU before I run out of bandwidth. My cache size is smaller and is in /dev/shm. And yes, I have "more serious hardware" ;) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Dynamic configuration for the hackathon?
On 3/26/08 9:06 AM, "Nick Kew" <[EMAIL PROTECTED]> wrote: > There seems to be a demand for dynamic per-request configuration, > as evidenced by the number of users hacking it with mod_rewrite, > and the other very limited tools available. Modern mod_rewrite > usage commonly looks like programming, but it's not designed as > a programming language. Result: confused and frustrated users. This is what I had in mind when I suggested having blocks of code. No need to invent a new language when a perfectly fine one exists... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Dynamic configuration for the hackathon?
On 3/26/08 9:53 AM, "Nick Kew" <[EMAIL PROTECTED]> wrote: > I'm not talking about inventing a new language. Those who want one > have some options already, as noted below ... Right. I was just "throwing it out there," so to speak. I'm not opposed to what you are saying, just wondering if we would/should take it to the next level. As to your suggestion: so basically, the per-dir merge would use this mechanism instead of what it does now (file walk, location walk), or in addition to it? Something like:

    SetEnv coolstuff
    Set something different
    foo bar
    something completely different

(Horrible example, I know.) If it were easy to extend the expressions (i.e., I want to implement (Cache == yes/no)) and stuff like ENV{key} were made to work, I'm all for it. It *should* be fairly easy to test this out with the current system (a la Proxy blocks). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies