omit to loop-forever processing some regex acls
Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

--- HttpHeaderTools.c.orig	2008-11-07 17:00:20.0 -0500
+++ HttpHeaderTools.c	2008-11-07 17:52:14.0 -0500
@@ -246,6 +246,7 @@
         " ?,\t\r\n"
     };
     int quoted = 0;
+    delim[0][1] = del;
     delim[2][1] = del;
     assert(str && item && pos);
@@ -258,6 +259,7 @@
     *pos += strspn(*pos, delim[2]);
     *item = *pos;	/* remember item's start */
+    /* find next delimiter */
     do {
         *pos += strcspn(*pos, delim[quoted]);
@@ -265,13 +267,15 @@
             break;
         if (**pos == '"') {
             quoted = !quoted;
-            *pos += 1;
+            goto advance;
         }
         if (quoted && **pos == '\\') {
             *pos += 1;
-            if (**pos)
-                *pos += 1;
+            goto advance;
         }
+advance:
+        if (**pos)
+            (*pos)++;
     } while (**pos);
     len = *pos - *item;	/* *pos points to del or '\0' */
     /* rtrim */
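The hang the patch addresses arises when strcspn() stops on a character that none of the branches in the loop consume: *pos never advances, so the do/while spins forever. The fix makes the advance unconditional at the bottom of every iteration. A minimal standalone sketch of the fixed loop shape (the function name and delimiter sets here are simplified inventions, not Squid's actual ones):

```c
#include <assert.h>
#include <string.h>

/* Sketch of the fixed scanning loop: every iteration advances pos by at
 * least one character, so the loop terminates even when strcspn() stops
 * on a character the if-chain does not otherwise consume (e.g. an
 * unquoted backslash).  Returns the length of the first item. */
static int first_item_len(const char *str)
{
    static const char *delim[2] = { "\",\\", "\"\\" }; /* stop sets: unquoted / quoted */
    const char *pos = str;
    int quoted = 0;

    do {
        pos += strcspn(pos, delim[quoted]);
        if (!quoted && *pos == ',')
            break;                 /* real item delimiter */
        if (*pos == '"')
            quoted = !quoted;      /* toggle quote state */
        else if (quoted && *pos == '\\' && pos[1])
            pos++;                 /* step over the escaped character */
        if (*pos)
            pos++;                 /* unconditional advance: guarantees progress */
    } while (*pos);
    return (int)(pos - str);
}
```

Before the fix, an input like "a\bc,d" (unquoted backslash in the stop set) would loop forever; with the unconditional advance it is simply stepped over.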
fixup for URI decoration with port when not wanted
Affects store keys and cache peering lookups.

Matt

diff --git a/src/cf.data.pre b/src/cf.data.pre
index b9dc4c9..7d19ba4 100644
--- a/src/cf.data.pre
+++ b/src/cf.data.pre
@@ -3921,6 +3921,16 @@ DOC_START
 	sporadically hang or never complete requests set this to on.
 DOC_END
 
+NAME: httpd_accel_no_append_port
+COMMENT: on|off
+TYPE: onoff
+DEFAULT: off
+LOC: Config.onoff.accel_no_append_port
+DOC_START
+	Do not append the accelerator port to the request URI.  This
+	is intended for clustered accelerator setups.
+DOC_END
+
 COMMENT_START
 DELAY POOL PARAMETERS
diff --git a/src/client_side.c b/src/client_side.c
index 23c4274..09899c9 100644
--- a/src/client_side.c
+++ b/src/client_side.c
@@ -3842,9 +3842,13 @@ parseHttpRequest(ConnStateData * conn, HttpMsgBuf * hmsg, method_t * method_p, i
         if (strchr(host, ':'))
             snprintf(http->uri, url_sz, "%s://%s%s",
                 conn->port->protocol, host, url);
-        else
+        else if (Config.onoff.accel_no_append_port) {
+            snprintf(http->uri, url_sz, "%s://%s%s",
+                conn->port->protocol, host, url);
+        } else {
             snprintf(http->uri, url_sz, "%s://%s:%d%s",
-                conn->port->protocol, host, port, url);
+                conn->port->protocol, host, port, url);
+        }
         debug(33, 5) ("VHOST REWRITE: '%s'\n", http->uri);
     } else if (internalCheck(url)) {
         goto internal;
diff --git a/src/structs.h b/src/structs.h
index 12652ab..33c7185 100644
--- a/src/structs.h
+++ b/src/structs.h
@@ -688,6 +688,7 @@ struct _SquidConfig {
     int collapsed_forwarding;
     int relaxed_header_parser;
     int accel_no_pmtu_disc;
+    int accel_no_append_port;
     int global_internal_static;
     int httpd_suppress_version_string;
     int via;
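As a usage illustration (a hypothetical squid.conf fragment; the accelerator-mode syntax shown is the Squid-2.6-style http_port options, and the hostname is made up): with the directive on, a vhost-rewritten request keeps its URI as http://example.com/path rather than http://example.com:80/path, so store keys and peer lookups match across cluster members listening on different ports.

```
# hypothetical squid.conf fragment
http_port 80 accel vhost
httpd_accel_no_append_port on
```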
Re: Rv: Why not BerkeleyDB based object store?
Just a tangential thought; has there been any investigation into reducing
the amount of write traffic with the existing stores?

E.g., establishing a floor for reference count; if it doesn't have n refs,
don't write to disk? This will impact hit rate, of course, but may mitigate
in situations where disk caching is desirable, but writing is the
bottleneck...


On 26/11/2008, at 9:14 AM, Kinkie wrote:

> On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti <[EMAIL PROTECTED]> wrote:
>> Amazon uses BerkeleyDB for several critical parts of its website. The
>> Chicago Mercantile Exchange uses BerkeleyDB for backup and recovery of
>> its trading database. And Google uses BerkeleyDB to process Gmail and
>> Google user accounts. Are you sure BerkeleyDB is not a good idea to
>> replace the Squid filesystems, even COSS?
>
> Squid3 uses a modular storage backend system, so you're more than
> welcome to try to code it up and see how it compares.
> Generally speaking, the needs of a data cache such as Squid are very
> different from those of a general-purpose backend store.
> Among the key differences:
> - the data in the cache has little or no value: it's important to know
>   whether a file was corrupted, but it can always be thrown away and
>   fetched from the origin server at a relatively low cost
> - the workload is mostly writes: a well-tuned forward proxy will have a
>   hit rate of roughly 30%, which means about 3 writes for every read on
>   average
> - data is stored in incremental chunks
>
> Given these characteristics, the long list of mechanisms that
> database-like systems provide, such as journaling and transactions, is
> a waste of resources.
> COSS is explicitly designed to handle a workload of this kind. I would
> not trust any valuable data to it, but it's about as fast as it gets
> for a cache.
>
> IMHO BDB might be much more useful as a metadata storage engine, as
> metadata has a very different access pattern than a general-purpose
> cache store.
> But if I had any time to devote to this, my priority would be bringing
> 3.HEAD COSS up to speed with the work Adrian has done in 2.
>
> --
>   /kinkie

--
Mark Nottingham [EMAIL PROTECTED]
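The writes-per-read arithmetic in Kinkie's message can be made explicit with a back-of-envelope helper (not Squid code; it assumes every miss is fetched and written to the cache while every hit is a read):

```c
#include <assert.h>

/* With hit ratio h, each request is either a read (hit) or a write
 * (cachable miss), so the store sees (1 - h) / h writes per read. */
static double writes_per_read(double hit_ratio)
{
    return (1.0 - hit_ratio) / hit_ratio;
}
```

At the quoted 30% hit rate this gives about 2.3; "roughly 3 writes for every read" corresponds to a hit ratio nearer 25%. Either way the conclusion stands: the workload is heavily write-dominated.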
Associating accesses with cache.log entries
I've been playing around with associating specific requests with the debug
output they generate, with a simple patch to _db_print along these lines:

    if (Config.Log.accesslogs && Config.Log.accesslogs->logfile) {
        seqnum = LOGFILE_SEQNO(Config.Log.accesslogs->logfile);
    }
    snprintf(f, BUFSIZ, "%s %i| %s",
        debugLogTime(squid_curtime),
        seqnum,
        format);

This leverages the sequence number that's available in custom access logs
(%sn).

It's really useful for debugging requests that are causing problems, etc.;
rather than having to correlate times and URLs, you can just correlate
sequence numbers. It also makes it possible to automate debug output (which
is the direction I want to take this in).

Beyond the obvious cleanup that needs to happen (e.g., outputting '-' or
blank instead of 0 if there isn't an access log line associated), a few
questions:

* How do people feel about putting this in cache.log all the time? I don't
think it'll break any scripts (there aren't many, and those that exist tend
to grep for specific phrases rather than do an actual parse, AFAICT). Is
the placement above appropriate?

* The sequence number mechanism doesn't guarantee uniqueness in the log
file; if Squid is started between rotates, it will reset the counters. Has
fixing this been discussed?

* Is it reasonable to hardcode this to associate the numbers with the first
configured access_log?

* To make this really useful, it would be necessary to be able to trigger
debug_options (or just all debugging) based upon an ACL match. However,
this looks like it would require changing how debug is #defined. Any
comments on this?

Cheers,

--
Mark Nottingham [EMAIL PROTECTED]
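The formatting half of the idea can be sketched as a tiny helper (names hypothetical, not the actual _db_print signature), including the cleanup mentioned above of printing '-' when no access-log entry is associated:

```c
#include <stdio.h>
#include <string.h>

/* Prefix a cache.log line with the current access-log sequence number so
 * the two logs can be joined on it.  A seqnum of 0 means no associated
 * access-log line, rendered as '-'. */
static void format_debug_line(char *out, size_t sz, const char *timestamp,
                              long seqnum, const char *msg)
{
    if (seqnum > 0)
        snprintf(out, sz, "%s %ld %s", timestamp, seqnum, msg);
    else
        snprintf(out, sz, "%s - %s", timestamp, msg);
}
```

Joining is then a matter of matching the seqnum field against %sn in the access log, falling back to the timestamp where counters have been reset by a restart.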
Re: Associating accesses with cache.log entries
On Thu, Nov 27, 2008 at 4:21 AM, Mark Nottingham <[EMAIL PROTECTED]> wrote:
> I've been playing around with associating specific requests with the debug
> output they generate, with a simple patch to _db_print along these lines:
>
>    if (Config.Log.accesslogs && Config.Log.accesslogs->logfile) {
>        seqnum = LOGFILE_SEQNO(Config.Log.accesslogs->logfile);
>    }
>    snprintf(f, BUFSIZ, "%s %i| %s",
>        debugLogTime(squid_curtime),
>        seqnum,
>        format);
>
> This leverages the sequence number that's available in custom access logs
> (%sn).
>
> It's really useful for debugging requests that are causing problems, etc;
> rather than having to correlate times and URLs, you can just correlate
> sequence numbers. It also makes it possible to automate debug output (which
> is the direction I want to take this in).

Looks interesting to me.

> Beyond the obvious cleanup that needs to happen (e.g., outputting '-' or
> blank instead of 0 if there isn't an access log line associated), a few
> questions:
>
> * How do people feel about putting this in cache.log all the time? I don't
> think it'll break any scripts (there aren't many, and those that exist tend
> to grep for specific phrases rather than do an actual parse, AFAICT). Is
> the placement above appropriate?

I'd avoid the | character, but apart from that it makes sense to me.

> * The sequence number mechanism doesn't guarantee uniqueness in the log
> file; if Squid is started between rotates, it will reset the counters. Has
> fixing this been discussed?

I don't think that uniqueness has much value; correlating seqnum with
the timestamp will address any uncertain cases.

> * Is it reasonable to hardcode this to associate the numbers with the first
> configured access_log?
>
> * To make this really useful, it would be necessary to be able to trigger
> debug_options (or just all debugging) based upon an ACL match. However,
> this looks like it would require changing how debug is #defined. Any
> comments on this?

YES! It's something I've been thinking about for some time.
Count me in.

--
  /kinkie
Re: Associating accesses with cache.log entries
I like the idea too.

2008/11/27 Kinkie <[EMAIL PROTECTED]>:
> [...]
Re: Rv: Why not BerkeleyDB based object store?
I thought about it a while ago, but I'm just out of time, to be honest.

Writing objects to disk only if they're popular, or only when you need the
RAM to handle concurrent accesses for large objects, would probably improve
disk performance dramatically, as the amount of writing would drop.

Sponsorship for investigating and developing this is gladly accepted :)


Adrian

2008/11/26 Mark Nottingham <[EMAIL PROTECTED]>:
> Just a tangential thought; has there been any investigation into reducing
> the amount of write traffic with the existing stores?
>
> E.g., establishing a floor for reference count; if it doesn't have n refs,
> don't write to disk? This will impact hit rate, of course, but may mitigate
> in situations where disk caching is desirable, but writing is the
> bottleneck...
>
> On 26/11/2008, at 9:14 AM, Kinkie wrote:
>> [...]
>
> --
> Mark Nottingham [EMAIL PROTECTED]
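The reference-count floor Mark proposes and Adrian endorses can be sketched as a swap-out predicate (hypothetical names and struct, not actual Squid code): only write an object to disk once it has proven popular, trading a little hit rate for far less write traffic.

```c
/* Hypothetical cache entry: just the fields the policy needs. */
struct cache_obj {
    int refcount;   /* times this object has been requested */
    int on_disk;    /* already swapped out? */
};

/* Swap out only objects that have reached the reference-count floor
 * and are not already on disk. */
static int should_swap_out(const struct cache_obj *e, int min_refs)
{
    return !e->on_disk && e->refcount >= min_refs;
}
```

With min_refs = 1 this degenerates to the current write-everything behaviour; raising it skips the long tail of objects that are never requested twice, which is where most of the wasted writes go.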
Re: omit to loop-forever processing some regex acls
G'day!

If these are patches against Squid-2 then please put them into the Squid
bugzilla so we don't lose them. There's a different process for Squid-3
submissions.

Thanks!


Adrian

2008/11/26 Matt Benjamin <[EMAIL PROTECTED]>:
> [...]