Re: canned deflate conf in manual -- time to drop the NS4/vary?
Mark Nottingham wrote...

> On 02/06/2010, at 9:00 AM, toki...@aol.com wrote:
>
>> Sergey wrote... That's new to me that browsers don't cache stuff that
>> has Vary only on Accept-Encoding - can you post some statistics or
>> describe the test you ran?
>>
>> Test results and statistics...
>>
>> Apache DEV forum...
>> http://www.pubbs.net/200908/httpd/55434-modcache-moddeflate-and-vary-user-agent.html
>
> I don't see anything there but anecdotal evidence

I think you need to do a reboot on your definition of 'anecdotal'. The thread above was a focused discussion about what ACTUALLY happens if you try to 'Vary:' on 'User-Agent' in the real world these days, accompanied by some additional (relevant) information about what COULD (actually) happen if you (alternatively) try to 'Vary:' on 'Accept-Encoding:'. If you still think any of it 'lacks veracity' and is 'not trustworthy' then my only suggestion would be to spend a little time on Google or Bling. It's an ongoing 'story'.

> certainly no reproducible tests.

What sort of tests would you like to see? Anyone with access to certain browsers can 'reproduce' the reported results.

>> apache-modgzip forum...
>> http://marc.info/?l=apache-modgzip&m=103958533520502&w=2
>
> Seven and a half years old,

Yea. Wow. Boggles the mind that it's still relevant, doesn't it?

> and again anecdotal.

See above regarding use/misuse of the word 'anecdotal'. The tests (described in the link) were done using a kernel debugger and a lot of those (unpatched) browsers are still in use TODAY. I've heard kernel debuggers called a lot of things but 'anecdotal' is not on the list.

>> Etc, etc. Lots of discussion about this has taken place over on the
>> SQUID forums as well.
>
> Yes; most of it in the past few years surrounding the ETag bug in
> Apache, not browser bugs.
>
> Regards, Mark Nottingham

The 2.5 release of SQUID ( Early 2004 ) was the very FIRST version of that Proxy Server that made any attempt to handle 'Vary:' headers at all. Prior to that, they were just doing the same thing all the browsers would.
If a 'Vary:' header of ANY description arrived in the stream, it was simply treated as if it was 'Vary: *' ( STAR ) and there was no attempt to cache it at all.

There was a huge discussion about ALL of this in late December of 2003 over in SQUID land as they were trying to get 2.5 out the door. I believe, at that time, it was Robert Collins who got 'tagged' to do the 'Vary:' part and Henrik Nordstrom took the whole 'ETag' part on his shoulders. If you Google 'Vary Accept-Encoding Browsers SQUID' but also include Robert Collins' name you'll find more than if you use Henrik's name, since he was ultra-focused on the ETag thing. ( He still is ).

Regardless, they were both VERY MUCH interested in the 'Browser bugs' surrounding all of this since they both realized that SQUID was going to 'take the heat' if/when the whole 'Vary:' scheme came alive and things suddenly got weird 'on the last mile'. In the end, they did a good job implementing 'Vary:' in SQUID 2.5 but it really has been an ongoing 'adventure' that continues to this very day.

Only about 12 months ago the SQUID Users forum lit up with another 'discovered' problem surrounding all this 'Vary:' stuff, and this had to do with non-compliance on the actual 'Accept-Encoding:' fields themselves coming from Browsers/User-Agents. ( Browser BUGS ). In some cases the newly discovered problem reflects the same nightmare seen TODAY with the out-of-control use of 'User-Agent'. Too many variants being generated.

Squid Users Forum...
http://www.pubbs.net/200904/squid/57482-re-squid-users-strange-problem-regarding-accept-encoding-and-compression-regex-anyone.html

Here's just a sampling of what was being shown from REAL WORLD Server logs just 12 months ago...
Accept-Encoding: , FFF
Accept-Encoding: mzip, meflate
Accept-Encoding: identity, deflate, gzip
Accept-Encoding: gzip;q=1.0, deflate;q=0.8, chunked;q=0.6, identity;q=0.4, *;q=0
Accept-Encoding: gzip, deflate, x-gzip, identity; q=0.9
Accept-Encoding: gzip,deflate,bzip2
Accept-Encoding: ndeflate
Accept-Encoding: x-gzip, gzip
Accept-Encoding: gzip,identity
Accept-Encoding: gzip, deflate, compress;q=0.9
Accept-Encoding: gzip,deflate,X.509

Yada, yada, yada...

To this day... not even Firefox and MSIE 7 'do the same thing' with regards to this header. Though SEMANTICALLY identical... the following is STILL causing some problems for people that weren't tickety-boo with their parsing code...

Firefox sends this...

  Accept-Encoding: gzip,deflate

MSIE sends this...

  Accept-Encoding: gzip, deflate

That one even bit the SQUID folks in the butt for a couple of revisions and they are STILL trying to arrive at the best 'normalization' parsing for this sort of thing. Here's a thread from less than 60 days ago detailing this 'How do we normalize this Accept-Encoding stuff?' issue with SQUID...
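The normalization problem described above can be sketched in a few lines. This is a hypothetical illustration only (the function name and the encoding whitelist are mine, not SQUID's actual code): the goal is that semantically identical headers like 'gzip,deflate' and 'gzip, deflate' collapse into ONE canonical form and therefore ONE cached variant, while garbage tokens like 'mzip, meflate' drop out entirely.

```python
# Hypothetical sketch of Accept-Encoding normalization (not SQUID code).
# Semantically identical variants must map to one canonical string so a
# Vary-aware cache stores one variant, not one per whitespace style.

def normalize_accept_encoding(header: str) -> str:
    """Parse, filter, and canonicalize an Accept-Encoding value."""
    # Whitelist of encodings a cache might recognize (an assumption here).
    known = {"gzip", "deflate", "identity", "x-gzip", "compress"}
    encodings = []
    for item in header.split(","):
        item = item.strip().lower()
        if not item:
            continue  # tolerate stray commas, e.g. ", FFF"
        # Split off any quality value, e.g. "gzip;q=0.8"
        token, _, q = item.partition(";")
        token = token.strip()
        q = q.strip()
        if q.startswith("q="):
            try:
                if float(q[2:]) == 0.0:
                    continue  # q=0 means "not acceptable"
            except ValueError:
                continue  # malformed qvalue: drop the token
        if token in known:
            if token == "x-gzip":
                token = "gzip"  # fold the x- alias onto its canonical name
            if token not in encodings:
                encodings.append(token)
    # Sort so ordering differences also collapse to one variant.
    return ", ".join(sorted(encodings))
```

With this sketch, the Firefox and MSIE forms above both normalize to "deflate, gzip", and junk like ", FFF" normalizes to the empty string.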
Re: mod_deflate handling of empty initial brigade
Paul Fee wrote...

> Bryan McQuade wrote:
>
>> Are there any cases where it's important for ap_pass_brigade to pass
>> on an empty brigade? Doesn't sound like it, but since this is a core
>> library change I want to double check.
>
> When handling a CONNECT request, the response will have no body. In
> mod_proxy, the CONNECT handler currently skips most filters and writes
> via the connection filters. However there is a block of #if 0 code
> which intends to send only a FLUSH bucket down the filter chain.
> That's not quite the case of an entirely empty brigade, but it seems
> close enough to warrant highlighting.
>
> Thanks, Paul

Can't think of anything else that might suffer from the sudden halt of passing empty brigades down the chain ( unless some 3rd party module using tandem-work filters is using them as 'place-markers' and actually EXPECTS them to show up at some point ) but there MIGHT be at least 2 other considerations...

1. If 'ap_pass_brigade()' becomes hard-wired to simply always return SUCCESS when it sees an empty brigade and it NEVER goes down the chain anymore... then is there ANY possibility this could become a 'lost brigade'? When would it ever get reused/released if it's not even making it down to the core filter set anymore?

2. It doesn't 'feel' right that 'ap_pass_brigade()' should always return SUCCESS when it starts throwing the empty brigades on the floor and does NOT send them down the chain. That seems 'misleading'. The call did NOT actually 'SUCCEED' because ap_pass_brigade() did NOT 'pass the brigade'. Should there be some other ERROR/WARNING level return code instead of always tossing back 'SUCCESS'? Might help with debugging later on or give module writers more options for writing their own 'safety catch' return code check(s) when calling ap_pass_brigade().

Yours,
Kevin Kiley
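The two concerns above can be made concrete with a toy model. This is Python pseudo-machinery, NOT Apache's C API: the names `pass_brigade`, `core_output_filter`, and the `APR_EAGAIN_EMPTY` status are all invented here to illustrate the shape of the proposal (short-circuit empty brigades) and Kevin's suggestion (return something other than plain SUCCESS when nothing was actually passed).

```python
# Toy model (not httpd source) of the proposed ap_pass_brigade() change:
# empty brigades are no longer passed down the filter chain.

APR_SUCCESS = 0
APR_EAGAIN_EMPTY = 1  # hypothetical "nothing was passed" status code


def pass_brigade(next_filter, brigade):
    """Short-circuit empty brigades instead of walking the chain."""
    if not brigade:
        # Concern 1: if the brigade never reaches the core filter set,
        # someone must still clean it up or it becomes a "lost brigade".
        brigade.clear()
        # Concern 2: returning plain SUCCESS would be misleading, since
        # the brigade was never actually passed on; a distinct status
        # gives callers a hook for their own safety-catch checks.
        return APR_EAGAIN_EMPTY
    return next_filter(brigade)


def core_output_filter(brigade):
    """Stand-in for the bottom of the filter chain."""
    brigade.clear()  # pretend the buckets were written to the network
    return APR_SUCCESS
```

In this model an empty brigade yields the warning status and is released in place, while a non-empty brigade flows down the chain exactly as before.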
Re: Fast by default (FWIW - Some tests)
Bryan McQuade wrote...

> thanks! it is really great that you did this investigation.

You're welcome, but I wouldn't really call that an 'investigation'. More like just a quick 'observation'.

> RE: checking to see if in cache, try typing the URL into the nav bar
> and hitting enter rather than reloading.

Tried that ( with Safari ). No conditional GET was sent. Behavior was the same as pressing the 'Refresh' Toolbar button. The entire document was 'Reloaded' and not 'Refreshed'.

OT: If this discussion is going to continue let's agree that there IS, in fact, a difference between saying 'Reload' and 'Refresh'. On most browsers, the Toolbar button is SUPPOSED to behave as a 'Refresh' option. A browser is supposed to check its local cache and issue a 'conditional GET' request if it has a non-expired copy of the entity onboard. A RELOAD is when this local-cache-check process is 'skipped' and a browser simply disregards the content of its cache and RELOADS the page with no conditional GET request. CTRL-R is the industry standard browser RELOAD command. Works the same for both the MSIE and Mozilla/Firefox browser lineage. On Apple/Safari it's COMMAND-R.

Interesting side note: Official documentation for Safari actually says the Toolbar Button is SUPPOSED to be the 'Refresh' option ( NOT RELOAD ), just like other browsers, and that pressing SHIFT-Refresh is the 'official' way to force a page to RELOAD instead of REFRESHING. If that is actually the case then the quick Safari test I did really would seem to indicate that the response with the 'Vary: Accept-Encoding' header was NOT CACHED.

> most browsers use a more aggressive reload algorithm (bypassing the
> cache for the html) on reload.

Of course. See above about the established standards and the difference between 'Refresh' and 'Reload'.

> also could you set an explicit resource expiration? otherwise you're
> operating at the whim of caching heuristics which aren't explicitly
> specified by the HTTP RFC.

That is exactly why I did NOT add any resource control directives. The point of the test(s) was to observe the DEFAULT caching behavior.

> Try setting an Expires or Cache-Control: max-age header on your
> response. See
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.4
> for info on heuristics.

See above. I'm perfectly familiar with the directives. NOT using any of them is/was part of the testing.

> In my tests most browsers implement the 10% rule but not all.

YOUR tests? Exactly what tests are you referring to? Are you saying you already have some detailed caching tests for X number of browsers? Do you have any 'tests' of your own that involve 'Vary:' headers and local caching behavior? Can you share any of those results or would that violate Google's information sharing policy?

> If you have time can you also test with safari4? Perhaps there was an
> issue in 3.x that was fixed in 4.x.

I am not an Apple person. I do not personally own any Apple hardware. The MacBook I used for a test is a loaner from a client. I will not/cannot change their current software configuration(s).

Bryan, don't take this the wrong way, but everyone is perfectly aware of who you are and who you work for and what your/their agenda is. I'm not criticizing that in any way. You have every right to be here contributing to an open-source project... but remember that SOME of us just do this as a HOBBY. People like you are being PAID to be here and it's part of YOUR JOB to discover/know the answers to some of the questions. I, myself, am simply curious. There is no paycheck behind anything I do with regards to Apache. As an employee of Google I would imagine you have far more resources than I do, up to and including any number of machines that you could configure with any browser you want for your OWN testing.

> one thing that's also useful is to first load a page to put the
> resource in the browser cache, then shut down the browser and restart
> it to try to load that page again. this makes sure that the resource
> was really persisted to the disk cache and isn't just being re-served
> out of some temporary in-memory cache.

Great idea... you should make that part of YOUR testing. My curiosity has already been satisfied. Things are a little better than they were a few years ago. That was all I wanted to know yesterday. Now it's back to my REAL job, which has nothing to do with Content-Encoding on the World Wide Web.

> thanks again! very cool that you did this. -bryan

Again... you are welcome. Let me/us know how your OWN testing goes there at Google.

Yours,
Kevin
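The Refresh-versus-Reload distinction drawn above can be sketched as a small decision function. This is an illustration, not any browser's actual code; the cache-entry shape and function name are invented for the example. A 'Refresh' revalidates the cached copy with a conditional GET built from its validators; a 'Reload' (CTRL-R / SHIFT-Refresh) ignores the local cache entirely.

```python
# Sketch (hypothetical, not browser source) of the Refresh vs. Reload
# distinction: Refresh sends a conditional GET using cached validators,
# Reload sends an unconditional request that bypasses caches.

def build_request_headers(cached_entry, action):
    headers = {}
    if action == "reload":
        # Skip the local cache check entirely and ask intermediaries
        # to revalidate as well: no conditional headers at all.
        headers["Cache-Control"] = "no-cache"
        return headers
    if action == "refresh" and cached_entry:
        # Revalidate: reuse whatever validators the cached copy carried.
        if "etag" in cached_entry:
            headers["If-None-Match"] = cached_entry["etag"]
        if "last_modified" in cached_entry:
            headers["If-Modified-Since"] = cached_entry["last_modified"]
    return headers
```

In the Safari observation above, the absence of If-Modified-Since / If-None-Match on what should have been a 'Refresh' is exactly what suggested the 'Vary: Accept-Encoding' response was never cached in the first place.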
Re: canned deflate conf in manual -- time to drop the NS4/vary?
Don't forget the ongoing issue that if you ONLY vary on 'Accept-Encoding' then almost ALL browsers will then refuse to cache a response entity LOCALLY, and the pain factor moves directly to the Proxy/Content Server(s).

If you vary on 'User-Agent' ( No longer reasonable because of the abuse of that header 'out there'? ) then the browsers WILL cache responses locally and the pain is reduced at the Proxy/Content server level, but pie is not free at a truck stop and there are then OTHER issues to deal with.

The OTHER 'ongoing issue' regarding compression is that, to this day, it still ONLY works for a limited set of MIME types. The 'Accept-Encoding: gzip,deflate' header coming from ALL major browsers is still mostly a LIE. It would seem to indicate that the MIME type doesn't matter and it will 'decode' for ANY MIME type, but nothing could be further from the truth. There is no browser on the planet that will 'Accept-Encoding' for ANY/ALL mime type(s).

If you are going to turn compression ON by default, without the user having to make any decisions for their particular environment, then part of the decision for the default config has to be 'Which MIME types?'. text/plain and/or text/html only? SOME browsers can 'Accept-Encoding' on the ever-increasing .js Javascript backloads but some CANNOT.

These 2 issues alone are probably enough to justify keeping compression OFF by default. A lot of people that use Apache won't even be able to get their heads around either one of these 'issues' and they really SHOULD do a little homework before turning it ON.

Someone already quoted that... 'people expect the default config to just WORK without major issues'. That's exactly what you have now. It's not 'broken'. Why change it?

Kevin Kiley

-----Original Message-----
From: Sergey Chernyshev sergey.chernys...@gmail.com
To: dev@httpd.apache.org
Sent: Tue, Jun 1, 2010 3:09 pm
Subject: Re: canned deflate conf in manual -- time to drop the NS4/vary?
Yeah, it should only Vary on Accept-Encoding (already does). It's still not perfect, but at least it doesn't blow up proxies too much. The question to people with statistics - are there any other issues with gzip/proxy configurations?

Sergey

On Tue, Jun 1, 2010 at 11:01 AM, Eric Covener cove...@gmail.com wrote:

> IIUC, the vary: user-agent to accommodate Netscape 4 is a pain for
> caches because obviously they can only vary on the entire user-agent.
>
> http://httpd.apache.org/docs/2.2/mod/mod_deflate.html
>
> Is it time to move this aspect of the snippet into a separate note or
> some historical trivia section, to remove the Vary?
>
> On the same topic, are there still non-academic CSS and JS compression
> issues (e.g. XP-era browsers, earlier, later, ???) Should we instead
> account for these in the complicated/more compression example, and is
> there a way to do so without adding the Vary right back in?
>
> -- Eric Covener cove...@gmail.com
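The MIME-type whitelist approach discussed in this thread can be written as a short httpd.conf fragment. This is a sketch modeled on the documented mod_deflate approach for 2.2-era Apache, not a proposed default config; the exact type list is an assumption, and `Header` requires mod_headers to be loaded.

```apache
# Sketch: compress only MIME types believed safe, per the discussion
# above (the exact whitelist here is an assumption, not a recommendation).
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css

    # JavaScript is the contested case: some older browsers that sent
    # "Accept-Encoding: gzip,deflate" could not actually decode it.
    # AddOutputFilterByType DEFLATE application/javascript

    # Advertise the negotiation to caches (needs mod_headers loaded).
    Header append Vary Accept-Encoding
</IfModule>
```

Note that this is exactly the trade-off the thread describes: adding the Vary header keeps shared caches correct, at the cost of whatever local-caching quirks individual browsers have with 'Vary: Accept-Encoding'.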
Re: Fast by default
> There is zero reason for us to avoid putting deflate into the default
> configuration.

Sorry. There ARE (good) reasons to avoid doing so. I'm the one who wrote the FIRST mod_gzip module for the Apache 1.x series so you would think I'd be a strong advocate of 'auto-enablement' by default, but I am NOT. There is HOMEWORK involved here and most users will get into deep tapioca unless they understand all the (ongoing) issues.

> it is also very arguable that we should leave it off.

Yes, it is.

> I think others have argued well to enable it by default.

Disagree. I haven't seen the 'good' argument for 'auto-enablement' yet. Some of the reasons NOT to 'go there' are coming out in other similar threads right now. Here's a clip from the (concurrent) message thread entitled 'Canned deflate conf in manual - time to drop the NS4/Vary'...

[snip]
Don't forget the ongoing issue that if you ONLY vary on 'Accept-Encoding' then almost ALL browsers will then refuse to cache a response entity LOCALLY, and the pain factor moves directly to the Proxy/Content Server(s).

If you vary on 'User-Agent' ( No longer reasonable because of the abuse of that header 'out there'? ) then the browsers WILL cache responses locally and the pain is reduced at the Proxy/Content server level, but pie is not free at a truck stop and there are then OTHER issues to deal with.

The OTHER 'ongoing issue' regarding compression is that, to this day, it still ONLY works for a limited set of MIME types. The 'Accept-Encoding: gzip,deflate' header coming from ALL major browsers is still mostly a LIE. It would seem to indicate that the MIME type doesn't matter and it will 'decode' for ANY MIME type, but nothing could be further from the truth. There is no browser on the planet that will 'Accept-Encoding' for ANY/ALL mime type(s).

If you are going to turn compression ON by default, without the user having to make any decisions for their particular environment, then part of the decision for the default config has to be 'Which MIME types?'. text/plain and/or text/html only? SOME browsers can 'Accept-Encoding' on the ever-increasing .js Javascript backloads but some CANNOT.

These 2 issues alone are probably enough to justify keeping compression OFF by default. A lot of people that use Apache won't even be able to get their heads around either one of these 'issues' and they really SHOULD do a little homework before turning it ON.

Someone already quoted that... 'people expect the default config to just WORK without major issues'. That's exactly what you have now. It's not 'broken'. Why change it?

Kevin Kiley
[/snip]

-----Original Message-----
From: Greg Stein gst...@gmail.com
To: dev@httpd.apache.org
Sent: Tue, Jun 1, 2010 7:40 am
Subject: Re: Fast by default

Geez, Eric. No wonder people don't want to contribute to httpd, when they run into an attitude like yours. That dismissiveness makes me embarrassed for our community.

There is zero reason for us to avoid putting deflate into the default configuration. It is also very arguable that we should leave it off. I think others have argued well to enable it by default, while you've simply dismissed them with your holier-than-thou attitude and lack of any solid rationale.

-g

On May 31, 2010 8:06 PM, Eric Covener cove...@gmail.com wrote:

> On Mon, May 31, 2010 at 8:30 PM, Bryan McQuade bmcqu...@google.com wrote:
>
>> I propose providing an...
>
> An additional httpd.conf doesn't sound valuable to me. What slice of
> non-savvy users would scrutinize an alternate config file, can replace
> the config file of their webserver, isn't using a webserver packaged
> by their OS, and wouldn't have just gotten the same information today
> from the manual and 400,000 other websites?
>
> There's currently no IfModule bloat in the default conf, but you're
> welcome to submit a patch that adds one for deflate or expires (the
> latter seems more unwise to me). See the supplemental configuration
> section of the generated config.
>
> This doesn't address mass-vhost companies failing to allow deflate
> because it's not in the no-args httpd ./configure, which sounds
> far-fetched to me. I can't recall a users@ or #httpd user implying
> being subjected to such a thing with their own build or with cheap
> hosting.
>
> -- Eric Covener cove...@gmail.com
Re: Fast by default
> web sites are loading too slow for pipes and web-server power that we
> have.

The key phrase there is 'that WE have'. YOU need to tune YOUR configs to match what YOU have. ANYONE who uses Apache can/should/must do that. That's how that works.

The discussion at this moment is what 'default' configs should ship with Apache. It is NOT POSSIBLE to accommodate EVERYONE. The default httpd.conf for Apache is simply JFW ( Just Feckin Works )... and for a product as complicated as Apache I tend to agree with those who think that is all that it needs to 'ship' with.

Kevin Kiley

-----Original Message-----
From: Sergey Chernyshev sergey.chernys...@gmail.com
To: dev@httpd.apache.org
Sent: Tue, Jun 1, 2010 5:30 pm
Subject: Re: Fast by default

> It's not 'broken'. Why change it?

Please don't think that old configurations and practices are not broken - web sites are loading too slow for pipes and web-server power that we have. And the situation is getting worse year after year - here's an analysis by Patrick Meenan of WebPageTest.org's one year history:

http://blog.patrickmeenan.com/2010/05/are-pages-getting-faster.html

Sergey

> Kevin Kiley
> [snip]
>
> -----Original Message-----
> From: Greg Stein gst...@gmail.com
> To: dev@httpd.apache.org
> Sent: Tue, Jun 1, 2010 7:40 am
> Subject: Re: Fast by default
>
> Geez, Eric. No wonder people don't want to contribute to httpd, when
> they run into an attitude like yours. That dismissiveness makes me
> embarrassed for our community. There is zero reason for us to avoid
> putting deflate into the default configuration. It is also very
> arguable that we should leave it off. I think others have argued well
> to enable it by default, while you've simply dismissed them with your
> holier-than-thou attitude and lack of any solid rationale.
>
> -g
>
> On May 31, 2010 8:06 PM, Eric Covener cove...@gmail.com wrote:
>
>> On Mon, May 31, 2010 at 8:30 PM, Bryan McQuade bmcqu...@google.com wrote:
>>
>>> I propose providing an...
>>
>> An additional httpd.conf doesn't sound valuable to me. What slice of
>> non-savvy users would scrutinize an alternate config file, can
>> replace the config file of their webserver, isn't using a webserver
>> packaged by their OS, and wouldn't have just gotten the same
>> information today from the manual and 400,000 other websites?
>>
>> There's currently no IfModule bloat in the default conf, but you're
>> welcome to submit a patch that adds one for deflate or expires (the
>> latter seems more unwise to me). See the supplemental configuration
>> section of the generated config.
>>
>> This doesn't address mass-vhost companies failing to allow deflate
>> because it's not in the no-args httpd ./configure, which sounds
>> far-fetched to me. I can't recall a users@ or #httpd user implying
>> being subjected to such a thing with their own build or with cheap
>> hosting.
>>
>> -- Eric Covener cove...@gmail.com
Re: canned deflate conf in manual -- time to drop the NS4/vary?
Sergey wrote...

> That's new to me that browsers don't cache stuff that has Vary only on
> Accept-Encoding - can you post some statistics or describe the test
> you ran?

Test results and statistics...

Apache DEV forum...
http://www.pubbs.net/200908/httpd/55434-modcache-moddeflate-and-vary-user-agent.html

apache-modgzip forum...
http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

Etc, etc. Lots of discussion about this has taken place over on the SQUID forums as well.

> As for *all* content types, I don't think we're talking about
> compressing images and it's relatively easy to create a white-list to
> have gzip on for by default.

Apache's own mod_deflate docs show how to exclude images. That's a no-brainer. It's the OTHER mime types that get hairy.

> The question regarding support in browsers actually is very serious
> too and I'd love to see statistics for that too - it sounds too scary
> and middle-ages to me.

You must be new to this sort of thing. See links above and read the MANY related threads on the SQUID forum.

> I didn't get this impression from all the talks about gzip and
> research that guys from Google did, for example, when they were
> looking for a source of lower gzip rates (it turned out to be
> antivirus software stripping Accept-Encoding headers).

I think I know the Google R&D you are referring to and it was almost a joke. There was a LOT of research they did NOT do and they made many assumptions that are simply NOT TRUE in the REAL WORLD.

> Thank you, Sergey

You're welcome

Kevin
Re: Fast by default
Let me preface ALL the remarks below with TWO statements...

1. I haven't done any research on these HTTP-based Client/Server compression topics in quite some time. It is all, essentially, 'ancient history' for me but it still amazes me that some of the issues are, so many years later, still being 'guessed about' and no one has all the answers.

2. Let's not lose the TOPIC of this thread. The thread is about whether or not it's time to just turn mod_deflate ON by default in the 'safe' httpd.conf that ships with Apache. Regardless of disagreement on some of the internals and the remaining 'questions' I think it's clear to some now that this is NOT just an 'easy decision'. It's complicated. It WILL cause 'problems' for some installations and some environments.

Bryan McQuade wrote...

>> Kevin Kiley wrote... Don't forget the ongoing issue that if you ONLY
>> vary on 'Accept-Encoding' then almost ALL browsers will then refuse
>> to cache a response entity LOCALLY and the pain factor moves directly
>> to the Proxy/Content Server(s).
>
> I don't think this is true for browsers in use today.

Well, it's certainly true of the MSIE 6 I have 'in use today' on almost all of my Windows XP Virtual Machines that I use for testing. Also, 'I don't think this is true' is certainly not 'I'm SURE this is not true'.

> Firefox will certainly cache responses with Vary: Accept-Encoding.

I haven't done any testing with Firefox. Firefox wasn't even around when this first became an issue years ago. I'll take your word for it unless/until you can provide some Firefox-specific test results page(s) that prove this to be true.

The Mozilla/Firefox family has ALWAYS had a 'different' approach to how the client-end decompression gets done. That browser lineage chose to always use the local 'cache' as the place where the decompression takes place. That's why, when you use those browsers and you receive a compressed entity, you always get TWO cache files. One is simply the browser opening up a cache file to store the COMPRESSED version of the entity and the other is the DECOMPRESSED version.

This is true even if the response is labeled 'Cache-Control: private'. It will STILL 'cache' the response and ignore all 'Cache-Control:' directives, but that's another 'bug' story altogether. They are simply doing all the decompression 'on disk' and using plain old file-based GUNZIP to get the job done, so there HAS to be a 'cached copy' of the response regardless of any 'Cache-Control:' directives. It will also keep BOTH copies of the file around.

The MSIE browser line will also use a cache-file ( sometimes ) for the decompression but, unlike the Mozilla lineage, MSIE will DELETE the initial compressed copy to avoid confusion.

There used to be this weird-ass bug with Mozilla that would only show up if you tried to PRINT a decompressed page. The browser would forget that it had TWO disk files representing the compressed/uncompressed response and it would accidentally try to PRINT the COMPRESSED version. I certainly hope the Firefox branch of this source code line worked that little bug out.

> Eric Lawrence of the Internet Explorer team has a nice blog post that
> explains that in IE6 and later, responses with Vary: Accept-Encoding
> are cached:
> http://blogs.msdn.com/b/ieinternals/archive/2009/06/17/vary-header-prevents-caching-in-ie.aspx

The 'EricLaw' MSDN Blog link you mention only says this about MSIE 6...

[snip]
Internet Explorer 6 will treat a response with a Vary header as completely uncacheable, unless the Vary header contains only the token User-Agent or Accept-Encoding. Hence, a subsequent request will be made unconditionally, resulting in a full re-delivery of the unchanged response. This results in a significant performance problem when Internet Explorer 6 encounters Vary headers.
[/snip]

This does NOT match my own research with MSIE 6, so the guy must be talking about some very specific BUILD version or 'hotpatched' version of MSIE 6. In case you missed it... here is the link to one of the discussions about this on the Apache mod_gzip forum which contains complete test results obtained using a kernel debugger with MSIE 4, 5 and 6...

http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

Eric also fails to mention that if you include MULTIPLE 'Vary:' headers and/or multiple conditionals on the same 'Vary:' line ( as RFC 2616 says you are supposed to be able to do ) then MSIE 6 stops caching and treats those inbounds as the infamous 'Vary: *'. ( Vary: STAR ). I believe that last part is STILL TRUE even to this day, which means it is STILL 'Non-RFC compliant'. This 'other issue' was also covered in my own MSIE 6 testing at the links above.

> Other variants of Vary do prevent caching in IE but Vary:
> Accept-Encoding is safe.

According to EricLaw, yes... but see links and test results above. That is NOT what my own MSIE 6 testing showed. My testing on MSIE 6 only showed that ANY presence of ANY 'Vary:' header OTHER
Re: Study about developer and commits.
Mario...

If you would get someone in your department who knows the English language a little better to rewrite the request you might get a little more traction. I THINK I can 'decipher' what the heck you are asking but not well enough to risk a response.

Yours,
Kevin Kiley

PS: A 'psychometric text analysis of the Apache developer mailing list'. Now there's a best-seller. It would be somewhere between a dime-store mystery novel and a Greek Epic. LOL.

-----Original Message-----
From: Mário André mario...@infonet.com.br
To: dev@httpd.apache.org; 'M Jr' mj...@hotmail.com
Sent: Thu, Nov 26, 2009 9:17 am
Subject: Study about developer and commits.

Dear Developers,

We are studying behavior and characteristic about OpenSource Software (OSS) development using by reference "What can OSS mailing lists tell us? A preliminary psychometric text analysis of the Apache developer mailing list" (Hassan, Rigby) that used the Apache httpd server developer mailing list. We need to know are major developer commit in the Project Apache HTTP Server (4 developers), for to continue our study. In period:

Developer A: Top committer in release 1.3 (We guess 1997-1998)
Developer B: Top committer in release 2.0 (developer B was the top committer for 1999 and 2000)
Developer C: developer C was the top committer for 2001 and 2002
Developer D: (We guess 2003 - 2005)

Do you help us? Where do We can find this information?

Thank you for attention.

Mário André
Master's degree student
Instituto Federal de Educação, Ciência e Tecnologia de Sergipe - IFS
Mestrando em MCC - Universidade Federal de Alagoas - UFAL
http://www.marioandre.com.br/
Skype: mario-fa
Re: Study about developer and commits.
Mario wrote...

> Dear Kevin, So, I want to know who are (the) developers that more
> contributed (contributed more) with (to) the Apache project in period:
> - Release 1.3 (1997 and 1998) - Release 2.0 (1999 and 2000) - 2001 and
> 2002 - 2003 - 2005. Did (Do) you understand me?

Now, yes.

In addition to the links Eric and Rüdiger have posted, the following might be helpful...

Back in August of this year t...@sharanet.org ( Tomislav Nakic-A) posted a news item to this development list about a new 'tool' that can be used to analyze the Apache Subversion repository. AFAIK it would be able to tell you exactly what you seek to know for any timeframe since Subversion was first used for Apache development. For timeframes prior to the first use of Subversion by apache.org you will need to find a 'wayback' machine that contains the original *Nix CVS development activity.

See the following message thread from August...
http://www.mail-archive.com/dev@httpd.apache.org/msg44886.html

[snip]
On Tue, Aug 25, 2009 at 3:45 PM, t.n.a. t...@sharanet.org wrote:

Hello everyone,

I am the author of the PanBI http://www.PanBI.org open source business intelligence project. PanBI provides data extraction, transformation and loading logic as well as data warehouse schemas for a number of systems, and Subversion - used by Apache - is one of the systems supported. I have designed a dedicated Subversion data warehouse and loading logic so that Subversion repository data can be analyzed using OLAP tools. To demonstrate the functionality, I have made a short screencast http://panbi.sourceforge.net/systems/subversion/olap.html (7 minutes) using none other than the Apache web server's code repository as the one under analysis.
[/snip]

I hope that helps.

Yours,
Kevin Kiley

-----Original Message-----
From: Ruediger Pluem rpl...@apache.org
To: dev@httpd.apache.org
Sent: Thu, Nov 26, 2009 2:41 pm
Subject: Re: Study about developer and commits.
On 11/26/2009 10:06 PM, Eric Covener wrote:

> 2009/11/26 Mário André mario...@infonet.com.br:
>
>> Dear Kevin, So, I want to know who are developers that more
>> contributed with the Apache project in period: - Release 1.3 (1997
>> and 1998) - Release 2.0 (1999 and 2000) - 2001 and 2002 - 2003 -
>> 2005. Did you understand me?
>
> Looking at the top users here might get you on the right track, and it
> even has a sense of time: http://www.ohloh.net/p/apache/contributors

Or here:
http://httpd.markmail.org/search/?q=#query:list%3Aorg.apache.httpd.cvs+page:1+state:facets
http://httpd.markmail.org/search/list:org%2Eapache%2Ehttpd%2Edev

Regards
Rüdiger
Re: mod_cache, mod_deflate and Vary: User-Agent
William A. Rowe, Jr. I think we blew it :) Vary: user-agent is not practical for correcting errant browser behavior. You have not 'blown it'. From a certain perspective, it's the only reasonable thing to do. Everyone keeps forgetting one very important aspect of this issue and that is the fact that the 'Browsers' themselves are participating in the whole 'caching' scheme and that they are the source of the actual requests, so their behavior is as much a part of the equation as any inline proxy cache. There is no real solution to this problem. The HTTP protocol itself does not have the capability to deal with things correctly with regards to compressed variants. The only decision that anyone needs to make is 'Where is the pain factor?'. If you VARY on ANYTHING other than 'User-Agent' then this might show some reduction of the pain factor at the proxy level but you have now exponentially increased the pain factor at the infamous 'Last Mile'. Most modern browsers will NOT 'cache' anything that has a 'Vary:' header OTHER than 'User-Agent:'. This is as true today as it was 10 years ago. The following discussion involving myself and some of the authors of the SQUID Proxy caching Server took place just short of SEVEN (7) YEARS ago but, as unbelievable as it might seem, is still just as relevant ( and unresolved )... http://marc.info/?l=apache-modgzip&m=103958533520502&w=2 It's way too long to reproduce here but here is just the SUMMARY part. You would have to access the link above to read all the gory details... [snip] Hello all. This is a continuation of the thread entitled... [Mod_gzip] mod_gzip_send_vary=Yes disables caching on IE After several hours spent doing my own testing with MSIE and digging into MSIE internals with a kernel debugger I think I have the answers. The news is NOT GOOD.
I will start with a SUMMARY first for those who don't have the time to read the whole, ugly story but for those who want to know where the following 'conclusions' are coming from I refer you to the rest of the message and the detail. SUMMARY There is only 1 request header value that you can use with Vary: that will cause MSIE to cache a non-compressed response and that is ( drum roll please ) User-Agent. If you use ANY other (legal) request header field name in a Vary: header then MSIE ( Versions 4, 5 and 6 ) will REFUSE to cache that response in the MSIE local cache. This is why Jordan is seeing a caching problem and Slava is not. Slava is 'accidentally' using the only possible Vary: field name that will cause MSIE to behave as it should and cache a non-compressed response. Jordan is seeing non-compressed responses never being cached by MSIE because the responses are arriving with something other than Vary: User-Agent like Vary: Accept-Encoding. It should be perfectly legal and fine to send Vary: Accept-Encoding on a non-compressed response that can 'Vary' on that field value and that response SHOULD be 'cached' by MSIE... but so much for assumptions. MSIE will NOT cache this response. MSIE will treat ANY field name other than User-Agent as if Vary: * ( Vary + STAR ) was used and it will NOT cache the non-compressed response. The reason the COMPRESSED responses are, in fact, always getting cached no matter what Vary: field name is present is just as I suspected... it is because MSIE decides it MUST cache responses that arrive with Content-Encoding: gzip because it MUST have a disk ( cache ) file to work with in order to do the decompression. The problem exists in ALL versions of MSIE but it's even WORSE for any version earlier than 5.0. MSIE 4.x will not even cache responses with Vary: User-Agent. That's it for the SUMMARY. The rest of this message contains the gory details. 
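To make the MSIE behavior described in the summary concrete: a cache that implements Vary correctly builds a secondary cache key from the request header values the response names, while MSIE (per the findings above) treats any field other than User-Agent as if `Vary: *` had been sent and refuses to store the response. A minimal sketch of the correct key construction; the function name and tuple layout are illustrative, not any particular cache's internals:

```python
def vary_cache_key(url, vary_header, request_headers):
    """Build a cache key from the URL plus the request header values
    named by the response's Vary header.  'Vary: *' means the stored
    response may never be reused, so no key is produced."""
    if vary_header.strip() == "*":
        return None  # per RFC 2616, a '*' response matches no future request
    fields = [f.strip().lower() for f in vary_header.split(",") if f.strip()]
    # Request header lookup must be case-insensitive
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((f, lowered.get(f, "")) for f in sorted(fields))
```

Two requests differing only in Accept-Encoding then map to two distinct keys, which is exactly the variant separation MSIE's local cache was failing to do.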
[/snip] I participated in another lengthy 'offline' discussion about all this some 3 or 4 years ago again with the authors of SQUID. There was still no real resolution to the problem. The general consensus was that if there is always going to be a 'pain factor' then it's better to follow one of the rules of Networking and assume the following... The least amount of resources will always be present the closer you get to the last mile. In other words... it's BETTER to live with some redundant traffic at the proxy level, where the equipment and bandwidth are usually more robust and closer to the backbone, than to put the pain factor onto the 'last mile' where resources are usually more constrained. If anyone is going to start dropping some special code anywhere to 'invisibly handle the problem' my suggestion would be to look at coming up with a scheme that undoes the damage these out-of-control redundant 'User-Agent' strings are causing. The only thing a proxy cache really needs to know is whether a certain 'User-Agent' string represents a different level of DEVCAP than another one. If all that is changing is a version number and there is no change with regards to
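The 'undo the User-Agent damage' idea amounts to normalizing UA strings down to coarse capability classes before using them in cache keys, so version-number churn does not multiply stored variants. A hedged sketch; the bucket names and match patterns below are invented for illustration, and a real table would be far larger:

```python
import re

def devcap_class(user_agent):
    """Collapse a version-churned User-Agent string into a coarse
    capability class (DEVCAP level).  The classes here reflect the
    MSIE caching behavior described in the thread and are examples,
    not an agreed standard."""
    ua = user_agent.lower()
    if "msie 4" in ua:
        return "msie4-no-vary-cache"   # won't cache even Vary: User-Agent
    if re.search(r"msie [56]", ua):
        return "msie-ua-vary-only"     # caches only with Vary: User-Agent
    return "generic-http11"
```

A proxy keying its variants on `devcap_class(ua)` instead of the raw string stores a handful of entries per URL rather than one per browser build.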
Re: mod_cache, mod_deflate and Vary: User-Agent
Brian Akins of Turner Broadcasting, Inc. wrote... We are moving towards the 'if you say you support gzip, then you get gzip' attitude. There isn't a browser in the world that can 'Accept Encoding' successfully for ALL mime types. Some are better than others but there are always certain mime types that should never be returned with any 'Content Encoding' regardless of what the browser is saying. In that sense, you can never really trust the 'Accept-encoding: gzip, deflate' header at all. There is (currently) no mechanism in the HTTP protocol for a client to specify WHICH mime types it can successfully decode. It was supposed to be an 'all or nothing' DEVCAP indicator but that's not how things have evolved in the real world. There are really only 3 choices...

1. Stick with the original spec and continue to treat 'Accept-encoding: whatever' as an 'all or nothing' indicator with regards to possible mime types and treat every complaint of breakage as 'it's not our problem, your browser is non-compliant'.

2. Change the original spec and add a way for clients to indicate which mime types can be successfully decoded and then wait for all the resulting support code to be added to all Servers and Proxies.

3. Do nothing, and let every individual Server owner continue to find their own solution(s) to the problem(s).

Yours Kevin Kiley -Original Message- From: Akins, Brian brian.ak...@turner.com To: dev@httpd.apache.org dev@httpd.apache.org Sent: Thu, Aug 27, 2009 9:42 am Subject: Re: mod_cache, mod_deflate and Vary: User-Agent On 8/26/09 3:20 PM, Paul Querna p...@querna.org wrote: I would write little lua scriptlets that map user agents to two buckets: supports gzip, doesn't support gzip. store the thing in mod_cache only twice, instead of once for every user agent. We do the same basic thing. We are moving towards the if you say you support gzip, then you get gzip attitude. I think less than 1% of our clients would be affected, and I think a lot of those are fake agents anyway.
-- Brian Akins
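The two-bucket approach Paul and Brian describe (they used lua scriptlets; any language works), combined with Kevin's point about per-mime-type breakage, could look something like this sketch. The `BROKEN_AGENTS` list and `COMPRESSIBLE` set are placeholder examples, not a vetted compatibility table:

```python
# Types generally safe to compress; anything else is served as-is.
COMPRESSIBLE = {"text/html", "text/css", "text/plain", "application/javascript"}

# User-Agent substrings known (hypothetically, for this sketch) to
# mishandle gzip despite advertising it.
BROKEN_AGENTS = ("msie 4",)

def should_gzip(accept_encoding, user_agent, content_type):
    """Two buckets: gets gzip / doesn't.  'If you say you support gzip,
    then you get gzip' -- except for a short deny-list and a mime-type
    allow-list, since Accept-Encoding can't express per-type support."""
    if "gzip" not in (accept_encoding or "").lower():
        return False
    if any(b in (user_agent or "").lower() for b in BROKEN_AGENTS):
        return False
    return content_type.split(";")[0].strip().lower() in COMPRESSIBLE
```

The payoff Paul describes falls out directly: mod_cache only ever needs two stored variants per URL, one per bucket, instead of one per raw User-Agent string.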
Re: Analysis of the Apache web server code repository
I knew Trawick was a slacker most of the time. Now there's cool pie charts and movies to prove it. ROFL Hmm... why do I get the feeling this tool's real usage is so that IT managers can see who they can 'let go'? Kevin Kiley -Original Message- From: Jeff Trawick traw...@gmail.com To: dev@httpd.apache.org Sent: Tue, Aug 25, 2009 4:26 pm Subject: Re: Analysis of the Apache web server code repository On Tue, Aug 25, 2009 at 3:45 PM, t.n.a. t...@sharanet.org wrote: I have designed a dedicated Subversion data warehouse and loading logic so that Subversion repository data can be analyzed using OLAP tools. To demonstrate the functionality, I have made a short screencast http://panbi.sourceforge.net/systems/subversion/olap.html (7 minutes) using none other than the Apache web server's code repository as the one under analysis. It is fun to be in the movies; maybe I'll make my kids sit through it later ;) (And I'm curious which company you found when you looked up trawick.) But please note that ASF ids aren't shared. It is important for the integrity of the code that we know who is making contributions. You should remove the overlaid text that suggests that any ASF work is being done by a commercial entity using shared ids. (All I can say about when I commit historically is that I like to sleep at least from midnight to six a.m. US Eastern Time, at least when I'm at home ;) I don't think the same is true of many other people (wrowe).) Thanks!
Re: Analysis of the Apache web server code repository
Ah... the good 'ol days. -Original Message- From: Bill Stoddard wgstodd...@gmail.com To: dev@httpd.apache.org Sent: Wed, 26 Aug 2009 06:45:11 -0400 Subject: Re: Analysis of the Apache web server code repository toki...@aol.com wrote: I knew Trawick was a slacker most of the time. Now there's cool pie charts and movies to prove it. ROFL Hmm... why do I get the feeling this tool's real usage is so that IT managers can see who they can 'let go'? Kevin Kiley Uh oh... first I show up on list out of the blue, then you. What's next? Maybe stein and rbb show up and we can start a flame war on gzip compression? :-) Cheers, Bill
Re: Palm Treo access to OWA via Apache 2.2.x Proxy
Ray... Can you send me just the part of your httpd config that governs this transaction and the redirect to the IIS server? I really would like to reproduce this here. The moment I can actually make it happen I have tools in place that will show exactly WHEN/WHERE it's happening (not gdb. I don't use it). All I have here is a simple mod_proxy setup with a single ProxyPass statement redirecting all requests to a back end. You are doing something with Virtual Hosts here, right? That COULD be making a difference. Yours Kevin Kiley -Original Message- From: Ray Van Dolson [EMAIL PROTECTED] To: dev@httpd.apache.org Sent: Fri, 30 May 2008 08:08:55 -0700 Subject: Re: Palm Treo access to OWA via Apache 2.2.x Proxy Nope. My test client is a Perl SOCKET level IO deal and I am able to send EXACTLY what was in your email as the 'fake' Treo request, including the 'messed up' Host:redowadev.esri.com header which is missing the SPACE character after the colon. Doesn't make any difference here. Apache parses the 'Host:' header with no problems and, as you said, corrects the header when it forwards it upstream, but the original 'fixing' of the inbound header appears to have nothing to do with any decision making with regards to 'Connection: close'. It's still a total mystery where that 'Connection: close' is actually coming from and I still cannot make it happen. Ah, interesting. So much for that theory... Glad you have a working solution but something tells me this isn't over yet. Still curious about who/what is/was adding that 'Connection: close' and why. If it starts appearing again give a shout. Well, this is definitely still something I can reproduce easily (just have to remove the RequestHeader directive), and I have a test server where it's easy to troubleshoot. I would *love* to step through this with gdb -- anyone out there a gdb expert enough to tell me how to capture a bit of runtime and play it back later so I can step through the source after the fact?
Doing it in real time seems to not give consistent results. Maybe oprofile would do the trick for me... seems like there should be a way for gdb to capture the session though if it's attached... I'll dig into that sometime unless someone can give me a pointer off the top of their head :) Thanks, Ray
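Kevin's Perl socket client works because it emits the request bytes verbatim, malformed `Host:` header and all, instead of letting an HTTP library normalize them on the way out. The equivalent request builder, as a sketch (pair it with a plain TCP socket to replay the Treo request byte-for-byte):

```python
def build_raw_request(method, path, headers, version="HTTP/1.1"):
    """Assemble raw request bytes exactly as given -- including
    deliberately malformed headers such as 'Host:redowadev.esri.com'
    with no space after the colon -- so proxy behavior can be tested
    without any client-side header normalization."""
    lines = ["%s %s %s" % (method, path, version)] + list(headers) + ["", ""]
    return "\r\n".join(lines).encode("ascii")
```

Sending the result with `socket.create_connection(...).sendall(...)` reproduces the wire traffic exactly, which is the whole point of testing at SOCKET level rather than through a URL library.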
Re: Palm Treo access to OWA via Apache 2.2.x Proxy
Well, I thought this one would be easy to spot but it's not. There's nothing I can do here to reproduce the reported behavior. I wrote a Perl script client that sends your EXACT ( Palm Treo ) OPTIONS request as you had it documented in the last email. I also wrote a simple Perl Server to imitate the IIS Server and it always sends back your same exact IIS response as documented including HTTP/1.1 indicator and NO 'Connection:' header on the response. ( Sic: Keep-alive can/should be assumed ). I fed it all through Apache 2.2.8 running mod_proxy. The capture results from point to point all look identical to yours except for one thing... the behavior is correct at all times. In my testing... mod_proxy always does the RIGHT thing and, in the absence of any 'Connection:' header at all on the response from the upstream (IIS) Server, it dutifully adds the right 'Keep-Alive:' and 'Connection: Keep-Alive' headers to the response being returned to the client because at all times the default HTTP level here is 1.1 and 'Keep-Alive' can/should be assumed unless told otherwise. There was nothing I could do to make mod_proxy add 'Connection: close' to the response headed back to the client ( which is the behavior you are seeing ) unless I actually forced a 'Connection: close' header to be returned from the upstream (IIS) Server. Every time there is no 'Connection:' header at all seen on the IIS response, mod_proxy stays in 'Keep-Alive', as it should. I was testing using Apache 2.2.8 ( Stable ) version. I doubt this is all some strange behavior that is only appearing in 2.2.9 on trunk and is not evident in 2.2.8 but I suppose it's possible. The only thing that seemed to make sense for your situation after this testing was the thought that maybe, somehow, you have accidentally set the 'proxy-nokeepalive' runtime flag to TRUE and that is what is causing the 'Connection: close' header to be inserted regardless of the 'Keep-Alive' requests. So I tested that as well.
* SetEnv proxy-nokeepalive 1 If I add 'SetEnv proxy-nokeepalive 1' to the httpd.conf file then mod_proxy does, in fact, change the original 'Connection: Keep-Alive' in the initial request to 'Connection: close' when it forwards the request to the upstream ( IIS ) Server. In this case, any normal upstream Server would see the 'Connection: close' coming from mod_proxy and SHOULD add its own 'Connection: close' header to the response and close the connection after fulfilling the OPTIONS request, but just for gags I sent back the same 'No Connection header at all' response to the new OPTIONS request being changed to 'Connection: close' via mod_proxy's 'proxy-nokeepalive' option. Oddly enough... mod_proxy then reverts to assuming the response to the client should be 'Keep-Alive' even though the 'proxy-nokeepalive' flag is in effect. This appears to be a bug unto itself but still doesn't reproduce your situation ( even though it SHOULD ). Again... even when using the 'proxy-nokeepalive' option, the only way I could get mod_proxy to send back 'Connection: close' in its response to the original request is to manually insert 'Connection: close' into the response coming back from the upstream (IIS) Server. The only way this would not be a (new?) bug is if the 'proxy-nokeepalive' flag is supposed to ONLY be in effect for requests that mod_proxy makes to the upstream Server but is NOT supposed to hamper its ability to maintain 'Keep-Alive' with the original client. I don't think that was the intent of the 'proxy-nokeepalive' flag but that's the behavior I am seeing when no 'Connection: close' header comes from the upstream Server but HTTP Protocol value is still HTTP/1.1. * SetEnv force-proxy-request-1.0 1 This option also has no effect on your situation but there is some potentially odd behavior appearing when this flag is used which may or may not be YAB.
If this option is ON, then mod_proxy does, in fact, dutifully change the original HTTP/1.1 level in the original client OPTIONS request to HTTP/1.0 when it forwards the request to the upstream (IIS) Server. However... it does NOT automatically add 'Connection: close' to same upstream request ( Not sure whether it's supposed to or not since HTTP/1.0 level does not automatically preclude the use of 'Keep-Alive' but it definitely does NOT add 'Connection: close' .) Oddly enough, though, it always REMOVES the original 'Connection: Keep-Alive' header but forwards the new HTTP/1.0 request upstream with no 'Connection:' header at all. This is true even if you ALSO have the 'proxy-nokeepalive' option set to 'on' in conjunction with 'force-proxy-request-1.0'. The only time you actually get a 'Connection: close' header being forwarded to the upstream Server is if you use the 'proxy-nokeepalive' option WITHOUT also using 'force-proxy-request-1.0'. The moment 'force-proxy-request-1.0' is in effect it cancels out the behavior of 'proxy-nokeepalive' and you always get requests being forwarded
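The observed (not documented) interaction of the two flags can be summarized as a small decision table. This models the behavior reported in the tests above, including the surprise that `force-proxy-request-1.0` drops the `Connection` header entirely and overrides `proxy-nokeepalive`; it is a description of what was seen, not of what mod_proxy is specified to do:

```python
def upstream_connection(client_connection, nokeepalive, force_10):
    """Return the (protocol version, Connection header value) that
    mod_proxy was observed to forward upstream.  None means the
    Connection header is removed outright."""
    if force_10:
        # force-proxy-request-1.0 wins: HTTP/1.0, no Connection header,
        # even when proxy-nokeepalive is also set.
        return ("HTTP/1.0", None)
    if nokeepalive:
        return ("HTTP/1.1", "close")
    return ("HTTP/1.1", client_connection)  # pass through unchanged
```

Writing it down this way makes the oddity easy to see: the only combination that ever yields an explicit 'Connection: close' upstream is `proxy-nokeepalive` alone.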
Re: Palm Treo access to OWA via Apache 2.2.x Proxy
Believe I may have this working now. The Treo was sending its Host header as follows: Host:hostname.esri.com (Note the lack of space between the colon and hostname. This probably isn't valid but was corrected by Apache as it proxied the request on to IIS. However, maybe the initial invalid header there somehow caused Apache to decide the connection wouldn't support Keep-Alives? Nope. My test client is a Perl SOCKET level IO deal and I am able to send EXACTLY what was in your email as the 'fake' Treo request, including the 'messed up' Host:redowadev.esri.com header which is missing the SPACE character after the colon. Doesn't make any difference here. Apache parses the 'Host:' header with no problems and, as you said, corrects the header when it forwards it upstream, but the original 'fixing' of the inbound header appears to have nothing to do with any decision making with regards to 'Connection: close'. It's still a total mystery where that 'Connection: close' is actually coming from and I still cannot make it happen. In any case, when I add: RequestHeader set Host hostname.esri.com Everything works! Super! Using 'RequestHeader' here on a test causes no change whatsoever. Everything works as it should either with or without it. The odd thing is that Apache was still matching on the correct name based virtual host so perhaps regular apache virtual hosting uses a different system for validating the Host header than mod_proxy does... Hallelujah! And thanks to all who bore with me on this issue. Ray Glad you have a working solution but something tells me this isn't over yet. Still curious about who/what is/was adding that 'Connection: close' and why. If it starts appearing again give a shout. Yours Kevin Kiley -Original Message- From: Ray Van Dolson [EMAIL PROTECTED] To: dev@httpd.apache.org Sent: Thu, 29 May 2008 11:17 am Subject: Re: Palm Treo access to OWA via Apache 2.2.x Proxy Believe I may have this working now.
The Treo was sending its Host header as follows: Host:hostname.esri.com (Note the lack of space between the colon and hostname. This probably isn't valid but was corrected by Apache as it proxied the request on to IIS. However, maybe the initial invalid header there somehow caused Apache to decide the connection wouldn't support Keep-Alives? In any case, when I add: RequestHeader set Host hostname.esri.com Everything works! The odd thing is that Apache was still matching on the correct name based virtual host so perhaps regular apache virtual hosting uses a different system for validating the Host header than mod_proxy does... Hallelujah! And thanks to all who bore with me on this issue. Ray
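The leniency in question is simply 'split on the first colon, strip optional surrounding whitespace', which is why `Host:hostname.esri.com` parses the same as `Host: hostname.esri.com` once it reaches the header parser. A sketch of that parse step:

```python
def parse_header_line(line):
    """Split a raw header line on the first colon and strip optional
    whitespace around the value, so 'Host:hostname.esri.com' and
    'Host: hostname.esri.com' both yield the same (name, value) pair --
    the tolerance HTTP parsers generally apply, and the reason Apache
    had no trouble with the Treo's space-less Host header."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()
```

Since both spellings parse identically, the missing space was a red herring for the 'Connection: close' mystery, which matches what Kevin's byte-for-byte replay showed.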
Re: Palm Treo access to OWA via Apache 2.2.x Proxy
Your posts keep saying The Treo does this and the Treo does that and likelihood of fixing Treos is 0 percent... ...but I'm a little confused. What SOFTWARE are we talking about on the Treo? The Treo is just a handheld. It does what it's told to do. Are you using one of the carrier's standard browsers or is this a custom piece of software sending this initial OPTIONS request and then ignoring the 'Connection: Close' from the Server? If it's a piece of custom software then the likelihood of at least fixing the Why doesn't the Treo respond correctly to Close Connection part of this issue is actually around 100 percent. Just fix the client and send out updates. It is NOT a 'bug' for a Proxy server to decide to send 'Connection: Close' to an upstream server even if the original request contains Connection: Keep-Alive. It IS, however, a definite bug on the client side if a Server sends 'Connection: Close' and the client still behaves as if the connection is active. -Original Message- From: Ray Van Dolson [EMAIL PROTECTED] To: dev@httpd.apache.org Sent: Thu, 22 May 2008 5:24 pm Subject: Re: Palm Treo access to OWA via Apache 2.2.x Proxy On Thu, May 22, 2008 at 02:59:41PM -0700, Jim Jagielski wrote: On Thu, May 22, 2008 at 01:59:30PM -0700, Ray Van Dolson wrote: I promise to go and read through the RFC's, but if the Treo is requesting a Keep-Alive connection, shouldn't Apache try its best not to close the connection as quickly as it is? It doesn't matter though. If the server is saying Connection closed, the client shouldn't just ignore it. I am not disagreeing with this. The likelihood of getting all Treo devices fixed to act correctly however is right near 0% :) Ray
Re: Palm Treo access to OWA via Apache 2.2.x Proxy
Ah... okay. Thanks for the clarification. Sounds like you are just stuck in the middle trying to deal with a broken client. I thought you might be trying to actually implement the client software or something. Sure, you can fix this. Just get in with a monkey wrench if you have to and force mod_proxy to honor 'Keep-Alive' for an OPTIONS request and the behavior should then be identical to the ( known good ) direct-to-IIS example. However... if you have a client that just 'assumes' connections stay active just because it sent 'Connection: Keep-Alive' and isn't bothering to check the actual connection responses from Servers then this initial handshake thing is going to be the least of your worries. That's a client-side boo-boo that's going to keep jumping up and biting everyone in the buttocks and it won't hurt to point that out to whoever is responsible for that client. Good luck. PS: Still just curious. What is the HTTP/x.x value actually being sent by the Treo for the exchange in question? Is it the older HTTP/1.0 or is it actually requesting full HTTP/1.1 functionality? Sometimes that comes into play with this 'Keep-Alive' stuff. If it's sending HTTP/1.0 then perhaps mod_proxy is simply obeying strict standards and that's why it changes 'Keep-Alive' back to 'Close'. 'Keep-Alive' was not 'officially' part of the HTTP/1.0 specs. It just sort of 'crept in there' and was available BEFORE full implementation of HTTP/1.1. So there's still a lot of confusion out there and a lot of 'looking the other way' going on with regards to 'Keep-Alive'. Some code tries to be strict ( Apache, generally ) and others are 'loose' ( Microsoft/IIS? ). Example: MS Internet Explorer has always had an 'Advanced Option' which allows you to decide to use HTTP/1.1 for Proxy Connections but it is OFF by default. Default behavior for MSIE Proxy requests is to use the older HTTP/1.0. However... that doesn't mean it won't use Keep-Alive. It treats that part of the HTTP/1.1 spec as an exception.
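The version-dependent default Kevin describes is easy to state as code: HTTP/1.1 connections persist unless `Connection: close` is sent, while HTTP/1.0 connections close unless the unofficial `Connection: keep-alive` extension is present. A sketch of that decision:

```python
def connection_persists(version, connection_header):
    """Decide whether the connection stays open after a response.
    HTTP/1.1 defaults to persistent unless 'Connection: close';
    HTTP/1.0 defaults to close unless 'Connection: keep-alive' --
    the extension that 'crept in' before HTTP/1.1 made persistence
    official."""
    tokens = {t.strip().lower() for t in (connection_header or "").split(",")}
    if version == "HTTP/1.1":
        return "close" not in tokens
    return "keep-alive" in tokens
```

This is exactly why the Treo's behavior matters: a client that ignores the `connection_persists` answer and keeps writing to a closed socket is broken regardless of which version it speaks.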
Apologies in advance if this is all just old news to you. On my own Microsoft Windows Mobile Treo, however, this legacy Advanced Option is missing. The Pocket Internet Explorer Browser under Windows Mobile will ALWAYS send an HTTP/1.1 request. -Original Message- From: Ray Van Dolson [EMAIL PROTECTED] To: dev@httpd.apache.org Sent: Thu, 22 May 2008 11:50 pm Subject: Re: Palm Treo access to OWA via Apache 2.2.x Proxy On Thu, May 22, 2008 at 09:03:23PM -0700, [EMAIL PROTECTED] wrote: Your posts keep saying The Treo does this and the Treo does that and likelihood of fixing Treos is 0 percent... ...but I'm a little confused. What SOFTWARE are we talking about on the Treo? The Treo is just a handheld. It does what it's told to do. Are you using one of the carrier's standard browsers or is this a custom piece of software sending this initial OPTIONS request and then ignoring the 'Connection: Close' from the Server? I think the latter. The web browser component of the Treo software appears to work correctly, it's just the ActiveSync portion that is failing (part of the mail application). If it's a piece of custom software then the likelihood of at least fixing the Why doesn't the Treo respond correctly to Close Connection part of this issue is actually around 100 percent. Just fix the client and send out updates. It is NOT a 'bug' for a Proxy server to decide to send 'Connection: Close' to an upstream server even if the original request contains Connection: Keep-Alive. It IS, however, a definite bug on the client side if a Server sends 'Connection: Close' and the client still behaves as if the connection is active. Believe me, I understand this. I will definitely push Palm to address this issue with whatever influence I can wield. :) However, as Apache has other environment type settings that allow overriding of behaviors for other broken clients, I was hoping to discover something similar that might do the trick here -- or that an option could even be added.
Thanks for all the feedback. Ray
Re: mod_gzip and incorrect ETag response (Bug #39727)
I'm not proposing a solution but just pointing out that if this discussion is going to come up once again that even the latest, greatest versions of one of the most popular browsers in the world, Microsoft Internet Explorer, will still REFUSE TO CACHE any response that shows up with a Vary: on any field other than User-Agent. It is certainly true that using the same ETag for two different Variants is in violation of RFC, but just fixing ETag isn't going to fully solve the industry-wide problem. The moment the user-agent itself is refusing to cache responses that have any Vary: conditions at all then you create a thundering herd scenario now on the last mile instead of between Proxy/COS ( Content Origin Server ). The long-running question will still remain. Is it better to at least have the user-agents hanging on to non-expired responses that have actually been CACHED locally and not hammering the proxies at all, or is it simply better to have the proxy do the heavy-breathing if/when a lot of requests are coming for a document that might have two ( varied ) responses. In the case of Accept-encoding: gzip, it's pretty darn rare these days for any COS or Proxy to ever get a request from some brain-dead user-agent that can't decompress, isn't it? Hasn't the non-compressed variant become an extreme edge-case by now? I would certainly hope so. Yours Kevin Kiley In a message dated 8/27/2007 5:33:55 AM Pacific Standard Time, [EMAIL PROTECTED] writes: Just wondering if there are any plans on addressing Bug #39727, incorrect ETag on gzip:ed content (mod_deflate). Been pretty silent for a long while now, and the current implementation is a clear violation of RFC2616 and makes a mess of any shared cache trying to cache responses from mod_deflate enabled Apache servers (same problem also in mod_gzip btw..)
For details about the problem this is causing see RFC2616 section 13.6, pay specific attention to the section talking about the use of If-None-Match and the implications of this when a server responds with the same ETag for the two different variants of the same resource. There are already a couple of proposed solutions, but no consensus on which is the better or if any of them is the proper way of addressing the issue. The problem touches - ETag generation - Module interface - Conditionals processing when there are modules altering the content Squid currently has a kind of workaround in place for the Apache problem, but relies on being able to detect broken Apache servers by the presence of Apache in the Server: header, which isn't foolproof by any means. Regards Henrik ** Get a sneak peek of the all-new AOL at http://discover.aol.com/memed/aolcom30tour
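One of the proposed directions, giving each content-coded variant its own validator, can be sketched by suffixing the coding onto the origin's ETag so that If-None-Match can never match the wrong variant. The suffix scheme below is illustrative only, not the fix the thread settled on (mod_deflate did eventually gain an option along these lines):

```python
def variant_etag(original_etag, content_encoding):
    """Derive a distinct ETag per content-coding, e.g.
    '"abc123"' for the identity variant and '"abc123-gzip"' for the
    gzip variant, so the two variants of one resource never share a
    validator in violation of RFC 2616."""
    if not content_encoding or not original_etag.endswith('"'):
        return original_etag  # identity variant, or unquoted/odd ETag: leave alone
    return '%s-%s"' % (original_etag[:-1], content_encoding)
```

With distinct validators, a shared cache revalidating the compressed copy can no longer receive a 304 that silently applies to the uncompressed one, which is the Squid-side mess Henrik describes.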
Re: mod_gzip and incorrect ETag response (Bug #39727)
You are the CNN guy, right? Of your 30 percent... is there an identifiable User-Agent that comprises a visible chunk of the requests? If so... what is it? Yours... Kevin Kiley In a message dated 8/27/2007 10:09:33 AM Pacific Standard Time, [EMAIL PROTECTED] writes: On 8/27/07 12:34 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hasn't the non-compressed variant become an extreme edge-case by now? I would certainly hope so. Unfortunately not. About 30% of our requests do not advertise gzip support.. Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Completely transform a request
I'm doing some testing here on the latest build from trunk. Will let you know ASAP whether this is going to be possible from solely within a connection input filter or whether you will need other hooks to pull it off. In the meantime... if someone else is more familiar with connection input filters and already knows there is no way to do this given the current design and implementation it would cut some corners to hear it can't be done. You MAY have found a bug in connection input filtering. Maybe not. It's worth a look to see if that's the case. Kevin Kiley In a message dated 7/31/2007 5:53:10 AM Pacific Standard Time, [EMAIL PROTECTED] writes: -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 [EMAIL PROTECTED] wrote: People can get kinda short and blunt over here but be advised that the only bad discussion about technology is not having one at all and, in general, the constructive criticism is all well-intentioned. Well, then we might just have to continue discussing this technology. I'd repost my original question, and I kindly ask everyone just to forget all the OpenPGP stuff. I want to Completely Transform a Request. 100% transformation. Based on a certain logic, If an incoming request matches one of my action triggers, then I want to apply a transformation to the 100% of the incoming request. I know I can do that when I just want to modify brigade-by-brigade. But I need to read the WHOLE request before doing so. Even the METHOD line. Even the headers. Even the body. All of it. Then, completely transform that into another request, and have Apache process it. With the current input filtering framework, at the connection level, I should be able to do it. But I can't. 
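The hold-back logic Arturo needs -- pass nothing up the chain until the whole request (headers plus Content-Length body) has arrived -- is independent of the Apache filter API and can be sketched outside it. Whether a connection input filter is actually allowed to buffer this way is exactly the open question in the thread; the sketch only shows the accumulation step, with plain bytes standing in for bucket brigades:

```python
class RequestAccumulator:
    """Buffer incoming data until a complete HTTP request is available --
    the same hold-everything-back logic a connection input filter would
    need, modeled with bytes instead of bucket brigades."""

    def __init__(self):
        self.buf = b""

    def feed(self, chunk):
        """Return the complete request once available, else None."""
        self.buf += chunk
        head, sep, body = self.buf.partition(b"\r\n\r\n")
        if not sep:
            return None                       # still reading headers
        length = 0
        for line in head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.decode("ascii").strip())
        if len(body) < length:
            return None                       # still reading the body
        return head + sep + body[:length]     # the complete request
```

Once `feed` returns non-None, the transform step would run over the complete request and the rewritten bytes would be handed onward -- the part that, per the thread, the current input filtering framework makes difficult.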
If you NEED an example of what I'd like to transform, and into WHAT I want to transform it, see this post: What I'd like to transform: http://www.mail-archive.com/dev@httpd.apache.org/msg37206.html Into WHAT I want to transform it: a completely different request (i.e. different method line, different headers and different body, and I can't do that in stages, I have to read the whole request first). Sincerely, Arturo Buanzo Busleiman - Consultor Independiente en Seguridad Informatica SHOW DE FUTURABANDA - Sabado 18 de Agosto 2007 (Speed King, Capital Federal) Entradas anticipadas a traves de www.futurabanda.com.ar - Punk Rock Melodico
Re: Completely transform a request
Arturo wrote... Thanks for taking the time to discuss this with me. I really appreciate it. I was able to read a whole pgp-encrypted request, even a large 12+MB one using my code. I read the content-length header, then read up to that quantity of bytes, saving the brigades to a context brigade. Of course, I just DECLINE when the request is not a POST /HTTP_OPENPGP_DECRYPT HTTP/1.1 one. That's all good news. You're almost there. Just send your headers normally, add the encrypted BODY, and let the Server side conn-filter do its thing on the BODY data. Why isn't that good enough? Privacy. Ah... Ok... I get it now. You have the 'content' encryption/decryption part worked out but you want to also make sure no one between the client and the COS ( Content Origin Server ) can ever see where someone is going or what they are asking for. Your POST /HTTP_OPENPGP_DECRYPT HTTP/1.1 approach will, of course, always reveal the actual server being contacted unless the target is just running as a portal in some other domain... so all you are really trying to mask is the actual DOCUMENT being requested and, perhaps, any associated QUERY parms. Right? I really want to put headers and URI, and body if applicable. That's why the special POST request URI I'm using has minimal headers. The real headers, the real body, the real URI are inside the encrypted body. Ok... let's take a deep breath here. Let's define what you are really trying to do and maybe the reason you are having trouble implementing it in Apache will be a little clearer. The word for this is tunnelling. You don't want to implement your own LOC ( Left of Colon ) protocol named httpp ( http + PGP ) like Secure Socket Layer does ( https = http + SSL ). You want to tunnel the real (encrypted) request through the non-secure HTTP protocol using a fake request that appears to be un-encrypted. 
The problem you are running into is that you want to let Apache's normal http protocol handler answer the phone and then rip the rug out and create the real request and discard the dummy one. It's classic tunnelling with a twist. You want to both tunnel and redirect. No problem. You SHOULD be able to do this, if you want, however quirky some might think the approach is. By choosing to tunnel, however, you are missing the opportunity to answer the phone. Since the initial request appears to be a normal HTTP request then all of Apache's normal http input handlers are going to kick in right at the front door. I don't think just a simple filter is going to do all the job for you, in this case. Keep in mind that the first line of input for any HTTP server is not considered a Header at all. It is called the HTTP method line and by the time it is parsed by the server you have a whole bunch of server variables that have been initialized and the server now thinks it knows where it's going. What FOLLOWS the HTTP method line are things called input headers. So even if you figure out how to vaporize the inbound fake headers and replace the buckets with your own you are still going to have to do something else to pull off your redirect to the secret URI regardless of how you find out what it really is. Just to satisfy my own curiosity I worked up a module here that is, in fact, able to do this. It is by no means a working PGP demonstration but it does do a simple imitation of what you are trying to do. In other words, I posted myself some gobbly-gook which has an actual target URI in it and not only am I able to filter the gobbly-gook and turn it back into non-gobbly-gook I was able to simply do a standard internal redirect to the URI once I pulled it out of the non-gobbly-gook. It's called an internal-redirect. Apache has been able to do this for a long time and you need to take a hard look at it and see if it will work for you. 
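The decode-then-redirect dance described above can be sketched outside of Apache. The following is a hypothetical helper, not the module Kevin worked up: it just pulls the target URI out of the ordinary HTTP method line sitting at the front of an (already decrypted) tunnelled payload, which is the piece you would then hand to Apache's `ap_internal_redirect()`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: once the connection filter has decrypted the
 * tunnelled payload, the "real" request sits at the front of the
 * buffer as an ordinary HTTP method line ("METHOD SP URI SP VERSION").
 * Pull the URI out so it can be handed to ap_internal_redirect().
 * Returns 1 on success, 0 if the payload does not look like a
 * request line. */
static int extract_real_uri(const char *payload, char *uri, size_t urilen)
{
    const char *sp1 = strchr(payload, ' ');      /* end of METHOD */
    if (!sp1) return 0;
    const char *sp2 = strchr(sp1 + 1, ' ');      /* end of URI    */
    if (!sp2) return 0;

    size_t len = (size_t)(sp2 - (sp1 + 1));
    if (len == 0 || len >= urilen) return 0;

    memcpy(uri, sp1 + 1, len);
    uri[len] = '\0';
    return 1;
}
```

Inside a real filter the extracted string would go to the internal-redirect machinery; here it is only parsed, since the surrounding Apache request context cannot be reproduced in a standalone sketch.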
The trick is that you are going to have to do the same things that the existing Apache module mod_rewrite does. It's the same concept as mod_rewrite only it needs to happen a little later than usual. Take a close look at mod_rewrite to see all of the API calls it makes. You can make these same calls yourself from within your filter and make it appear as if mod_rewrite actually ran on your request. So I was able to send a fake (dummy) request into Apache, start filtering my posted gobbly-gook, pull a real destination URI out of the gobbly-gook and then tell Apache to redirect to it. It seems to work fine. What this does NOT do is address your reported problem that you are having trouble vaporizing existing request HEADER buffers and then replacing them with your own. You may, in fact, have found a bug there but it may also simply be that you are too late in the game to pull this from within a BODY data input filter. So the vaporization of existing input request headers and replacing them with others from within filter code only ( and not a server hook ) is
Re: Completely transform a request
I wrote about this last week, on dev@httpd.apache.org, with a thread whose subject was Introducing mod_openpgp: Yes, I saw that. It was your new question about Posting a Secret request and then trying to re-dump it into Apache as a Trojan Horse that had me confused. Is this the way you actually plan on implementing OpenPGP? The Netscape plugin just wakes up for every Navigation and then rewrites the request into a POST buffer + encryption? I really don't think that's going to fly in the long run. Even if you go this route... is it only going to work with Netscape, or something? That's not much of an Open standard. As to your last question... Nick Kew's post is on the mark. Look hard and long at mod_ssl and mod_deflate. If you need to get in there BEFORE the initial headers are processed look at the mod_ssl code. It does the job. If you don't care about intercepting the headers for your Trojan POST then a simple rewrite of mod_deflate should do the job. If it can't then there's still something essentially wrong with the Apache 2.0 filtering design and it would be interesting to find out. If you really do need to vanish the entire original request including the headers then you can still do that from within a non-connection based filter but you are going to have to replace all the tables that have already been created by Apache like the request's headers_in with your own tables. The memory isn't protected at that point and it's no problem to do this, but I can't think of any existing ( public ) Apache filter that does this yet. You could also just create an entire new 'sub-request' the moment your filter wakes up and discard the original but be advised that there are still some things that might not work as advertised for a 'sub-request' that currently work fine if the request is 'main'. There are still some bugs lurking in that area. Third alternative would be to look at the legacy mod_gzip code for the Apache 1.3 series. 
It implements a connection filter just like mod_ssl does but does NOT require EAPI or any rewrites to standard Apache. The methods used in the legacy mod_gzip will still work in the 2.0 series, if desired. Later... Kevin In a message dated 7/28/2007 10:03:27 AM Pacific Standard Time, [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: It is, in fact, possible to do what you are trying to do but before anyone tells you how, in public, do you mind explaining, in public, what the heck you are actually trying to do here? Hi :) I'm writing a draft on OpenPGP extensions to the HTTP Protocol. So far, I've implemented request signing, which enables Apache to verify a request using OpenPGP. http://freshmeat.net/articles/view/2599 I plan on implementing encrypted requests, too. I'm also currently writing a session management extension. All client-side stuff is contained in a Firefox extension called Enigform. I wrote about this last week, on dev@httpd.apache.org, with a thread whose subject was Introducing mod_openpgp: http://www.gossamer-threads.com/lists/apache/dev/333889 I hope the answer to what the heck is now clear. Arturo Buanzo Busleiman - Consultor Independiente en Seguridad Informatica
Re: Completely transform a request
That's why I thought bringing the concept over here was a good idea. I'm finally getting some constructive criticism! It's an interesting idea. There have been years of work put into making HTTP and Apache extensible for ideas just such as this one and regardless of what anyone thinks of your idea if you can't implement it easily right now then the ball just isn't in the end-zone yet on the Server design. People can get kinda short and blunt over here but be advised that the only bad discussion about technology is not having one at all and, in general, the constructive criticism is all well-intentioned. Not really. I have many paths. One of them is almost fully avoiding pgp from inside a browser's plugin, and just write a proxy that implements the required functionality, and the same in reverse at the server side, so it's more easily plugged into any browser/httpd app. Ultimately, I think that's the way to go. You should have that setup available regardless of anything else you cook up. Even if your standards are approved by all the various parties involved the horse has left the barn with clients and user-agents and even if they wanted to implement your new scheme it would be years before it happened and then years after that before all the bugs were worked out for all the clients, probably. That's just the way it is. It's still impossible to get all major clients up to speed on HTTP/1.1 RFCs alone, much less throw in something like this for them to implement. I'm not trying to poo-poo your current plans. Let's assume you just want to see it work the way you are going and that it SHOULD be possible to do it with Apache's current design. The problem with an encrypted request is that, well, it doesn't work on chunks. Well, if it's a request headed at an Apache server then sooner or later you are going to have to deal with 'chunking'. 
If you are going to allow HUGE posts up to Apache with your OpenPGP then at some point it better be able to 'chunk' or you'll have to force the Server to think you are, at all times, a legacy HTTP 1.0 client and then you can get conflict errors trying to use other headers. But just for the sake of getting something to work... let's assume that you are just trying to package simple, short HTTP requests and they will all fit into one buffer read on the Server side. If you can't get that working then all the chunking in the world isn't going to make any difference later. ...take into account I've already implemented reading and decrypting the request within the conn filter. Ok. So help me out here. Why isn't that enough? I think your problem at the moment might be in how you are implementing things on the CLIENT side. If you are able to run the request in your Mozilla/Firefox plug-in and add the encrypted POST data to the request then why do you need the Trojan Horse thing? Just send your headers normally, add the encrypted BODY, and let the Server side conn-filter do its thing on the BODY data. Why isn't that good enough? If I had just to decrypt the body, it'd be quite easy. Yup. That's what filters are designed to do. WHY do you need to encrypt the headers at all? Help me understand. If there is any piece of information outside the BODY that you are trying to hide from normal view then why don't you just add it to the normal request headers as an encrypted field... without trying to encrypt ALL of the headers and create the Trojan Horse scenario on the Server side? Later... Kevin Kiley In a message dated 7/28/2007 11:31:33 AM Pacific Standard Time, [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: Is this the way you actually plan on implementing OpenPGP? Not really. I have many paths. 
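To make the 'chunking' concern above concrete, here is a minimal sketch of decoding an HTTP/1.1 chunked body that already sits whole in memory. It is illustration only, not server code: a real implementation must parse incrementally, cope with chunk extensions, and read trailer headers, none of which are handled here.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal sketch: decode a chunked body held entirely in memory.
 * Each chunk is "SIZE-IN-HEX\r\n", SIZE bytes of data, "\r\n";
 * a zero-size chunk ends the body.  Returns the decoded length,
 * or -1 on malformed input or overflow of the output buffer. */
static long dechunk(const char *in, char *out, size_t outlen)
{
    long total = 0;
    for (;;) {
        char *end;
        long sz = strtol(in, &end, 16);          /* chunk-size line */
        if (end == in || sz < 0) return -1;
        if (strncmp(end, "\r\n", 2) != 0) return -1;
        in = end + 2;
        if (sz == 0) return total;               /* last-chunk seen */
        if ((size_t)(total + sz) > outlen) return -1;
        memcpy(out + total, in, (size_t)sz);     /* copy chunk data */
        total += sz;
        in += sz;
        if (strncmp(in, "\r\n", 2) != 0) return -1;
        in += 2;                                 /* skip chunk CRLF */
    }
}
```

The point of the sketch is the one Kevin makes: a filter that insists on seeing the entire request before acting has to reassemble exactly this kind of framing itself, because Apache's input chain delivers the body piecewise.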
One of them is almost fully avoiding pgp from inside a browser's plugin, and just write a proxy that implements the required functionality, and the same in reverse at the server side, so it's more easily plugged into any browser/httpd app. In any case, Enigform, mod_openpgp, etc, are all things that help me find a solid base for implementing Openpgp into http. For example, the way request signing was implemented is quite straightforward and HTTP compliant. (see http://freshmeat.net/articles/view/2599). The problem with an encrypted request is that, well, it doesn't work on chunks. I could make it work on chunks, but that would include LOTS of overhead to the request. And I mean LOTS. Are you a GnuPG user? You probably know what I'm talking about if you are. The Netscape plugin just wakes up for every Navigation and then rewrites the request into a POST buffer + encryption? Not Netscape, but Mozilla Firefox. I really don't think that's going to fly in the long run. That's why I thought bringing the concept over here was a good idea. I'm finally getting some constructive criticism! Even if you go this route... is it only going to work with Netscape, or
Re: Wrong etag sent with mod_deflate
Let me preface all comments by saying that I AGREE with BOTH Roy and Henrik... If Apache is sending the same exact (strong) ETag value for both a compressed and an identity variant of the same entity... then, according to current RFC content, that is broken behavior and it should be fixed. You can take the part of the RFC that talks specifically about how Weak Etags might seem ideal for compressed variants and argue it against Henrik's point of view that a compressed variant should ALWAYS be treated as a separate (unique) HTTP entity but I don't want to go there. Not now, anyway. Personally I tend to agree with the concept that even if DCE is employed ( Dynamic Content Encoding ) that any code that is doing DCE ( versus Transfer Encoding ) should make that dynamically generated entity appear as if it was simply a disk-based (separate) resource. DCE is, after all, just a magic trick. It is making it APPEAR to end-users as if compressed variants of entities actually physically exist and are being sent back to anyone ready/able/willing to receive them... ...and it's a GOOD TRICK, when done correctly. Roy wrote... In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. See above. Totally agree. Justin wrote... As Kevin mentioned, Squid is only using the ETag and is ignoring the Vary header. That's the crux of the broken behavior on their part. Roy wrote... Then they will still be broken regardless of what we do here. It simply isn't a relevant issue. It's relevant to the extent that I think there are still some things missing from the RFCs with regards to all this which is why a piece of software like SQUID might be doing the wrong thing as well. Best way I could elaborate on that feeling is to just walk through Roy's scenario... Roy wrote... 
Unlike Squid, RFC compliance is part of our mission, at least when it isn't due to a bug in the spec. This is not a bug in the spec. A high-efficiency response cache is expected to have multiple representations of a given resource cached. No doubt. The cache key is the URI. Yes. If the set of varying header field values that generated the cached response is different from the request set, ...as when one browser asks for a URI and sends Accept-encoding: gzip and another asks for the same URI and does NOT supply Accept-encoding: gzip... then a conditional GET request is made containing ALL of the cached entity tags in an If-None-Match field (in accordance with the Vary requirements). ...and, currently, if the cache has stored both a compressed and a non-compressed version of the same entity received from Apache ( sic: mod_deflate ) then the same ( strong ) ETag is returned in the conditional GET for both of the cached variants. Hmmm... begins to look like a problem... but is it really?... If the server says that any one of the representations, as indicated by the ETag in a 304 response, is okay, okay means fresh. In the case of a DCE encoded variant, an argument could be made here that it doesn't make a bit of difference if the ETag for the compressed or non-compressed variant is the 'same' or it is 'different'. All the cache really wants to know is Is the ORIGINAL ( uncompressed ) version of this response fresh or not? The compressed variant should ALWAYS be just the encoded version of the same original uncompressed entity. If the original uncompressed version ( indicated by strong ETag 1 ) is not fresh then there is no possible way for any compressed variant of the same entity ( marked by the same strong ETag 1 ) to be fresh. It's just not possible. So, in essence, when the Vary: has to do with just compression, then the compressed and uncompressed variants are married in a way that, perhaps, is not covered in the existing ETag RFC specifications. 
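The "synced variants" argument above can be sketched as cache logic. This is a hypothetical helper, not Squid or Apache code: when two cached variants of one URI differ only in Content-Encoding, they derive from the same source entity, so a 304 revalidation carrying their shared strong ETag refreshes both of them at once.

```c
#include <string.h>

/* Hypothetical cache bookkeeping: variants of one URI that differ
 * only in Content-Encoding share the origin's strong ETag. */
struct variant {
    const char *etag;      /* strong ETag as sent by the origin */
    const char *encoding;  /* "identity", "gzip", ...           */
    int fresh;             /* 1 once revalidated                */
};

/* A 304 response names one ETag; under the synced-variants argument,
 * every encoding-only variant carrying that ETag is refreshed,
 * because none of them can go stale without the others. */
static void revalidate_304(struct variant *v, size_t n, const char *etag_304)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(v[i].etag, etag_304) == 0)
            v[i].fresh = 1;
}
```

Whether a cache is actually permitted to reason this way is exactly what the thread disputes; the sketch only shows what the "married variants" position would mean in code.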
The ETag CAN/SHOULD be the same because there is no way for the original ( strong ETag ) to become not fresh without the other representation also becoming not fresh. These kinds of Variants are Synced in a way perhaps not ( currently ) covered by the ETag specs. then the cached representation with that entity tag is sent to the user-agent regardless of the Vary calculation. sent to means the cache has received its 304 response and decided what it CAN/SHOULD send back to the user, right? Well... if you follow the argument above about how certain variants are synced together then even if two variants on the cache share the same strong ETag... then how can the cache send back the wrong thing or NOT pay attention to the Vary calculation on its end? I don't know the exact details of the exact field problem that Henrik is trying to solve but it seems to me that EVEN THOUGH the compressed and non-compressed variants might happen to share the same (strong) ETag... if SQUID is delivering
Re: Wrong etag sent with mod_deflate
Justin wrote... No - this patch breaks conditional GETs which is what I'm against. See the problem here is that you have to teach ap_meets_conditions() about this. An ETag of 1234-gzip needs to also satisfy a conditional request when the ETag at the time ap_meets_conditions() is run is 1234. In other words, ap_meets_conditions() also needs to strip -gzip if it is present before it does the ETag comparison. But, the issue is that there is no real way for us to implement this without a butt-ugly hack. However, I disagree with Roy in that we most certainly *do* treat the ETag values as opaque - Subversion has its own ETag values - Roy's position only works if you assume the core is assigning the ETag value which has a set format - not a third-party module. IMO, any valid solution that we deploy must work *independently* of what any module may set ETag to. It is perfectly valid for a 3rd-party module to include -gzip at the end of their ETag. ...or -bzip2. mod_bzip2 has been working fine for almost a year now and presents the same issue Justin is talking about here. It (can) generate its own ETag values, if you want it to ( configurable ), and ap_meets_conditions isn't going to know what to strip or not strip. Yours Kevin
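The strip-before-compare idea Justin objects to can be sketched directly. This is a hedged illustration, not the real ap_meets_conditions(): it assumes the "-gzip" suffix convention that mod_deflate happens to use, and as Justin notes, a third-party module's opaque ETag (say "-bzip2", or anything else) defeats exactly this kind of hard-coded matching.

```c
#include <string.h>

/* Sketch of suffix-aware ETag comparison.  cond_etag is the quoted
 * tag from If-None-Match (possibly "\"1234-gzip\""); base_etag is the
 * server's current tag ("\"1234\"").  Matches exactly, or after
 * stripping the assumed "-gzip" transform suffix. */
static int etag_matches(const char *cond_etag, const char *base_etag)
{
    const char *suffix = "-gzip\"";            /* suffix inside the quotes */
    size_t clen = strlen(cond_etag), slen = strlen(suffix);

    if (strcmp(cond_etag, base_etag) == 0)
        return 1;                              /* exact match */

    if (clen > slen && strcmp(cond_etag + clen - slen, suffix) == 0) {
        /* "\"1234-gzip\"" -> compare the "\"1234" prefix, then require
         * base_etag to close with its quote right there. */
        size_t blen = strlen(base_etag);
        return blen >= 1 && blen - 1 == clen - slen
            && strncmp(cond_etag, base_etag, blen - 1) == 0
            && base_etag[blen - 1] == '"';
    }
    return 0;
}
```

The fragility is visible in the code itself: the suffix list is a compiled-in guess about what some filter appended, which is the "butt-ugly hack" being argued about.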
Re: Wrong etag sent with mod_deflate
And please stop lying about Squid. C'mon Henrik. No one is intentionally trying to LIE about Squid. If you are referring to Justin quoting ME let me supply a big fat MEA CULPA here and say right now that I haven't looked at the SQUID Vary/ETag code since the last major release and I DO NOT KNOW FOR SURE what SQUID is doing ( or not doing ) if/when it sees the same (strong) ETag for both a compressed and an identity version of the same entity. Period. I DO NOT KNOW FER SURE. I should have made that perfectly clear along with any opinion previously offered. I apologize for that. I also DID already state clearly in another post... I don't know the exact details of the exact field problem that Henrik is trying to solve... Keyphrase --don't know the exact details In my other posts, I was suggesting, however, that even if an upstream content server ( Apache ) is not sending separate unique ETags I am still having a hard time understanding why that would cause SQUID to deliver the wrong Varied response back to the user. Something is nagging at me telling me that EVEN IF the same (strong) ETag happens to be on both a compressed and a non-compressed version of the SAME ENTITY that there shouldn't be a big problem in the field ( sic: A user not getting what they asked for ). A compressed version of an entity IS the same entity... for all intents and purposes... it just has compression applied. One cannot possibly become stale without the other also being stale at the same exact moment in time. If you think something in our cache implementation of Vary/ETag is not right then say what and back it up with RFC reference. At the moment... yes... I do... but if you read my other posts I also have a feeling the reason I can't quote you Verse and Chapter from an RFC is because I have a sneaking suspicion that there is something missing from the ETag/Vary scheme that can lead to problems like this... and it's NOT IN ANY RFC YET. 
It has something to do with being too literal about a spec and ignoring common sense. In other words... you may be doing exactly what hours and hours of reading an RFC seems to be telling you you SHOULD do... but there still might be something else that OUGHT to be done. I hope the discussion continues. This is something that has been lurking for years now and it needs to get resolved. There will always be the chance that some upstream server will ( mistakenly? ) keep the same (strong) ETag on a compressed variant. People are not perfect and they make mistakes. I still think that even when that happens any caching software should follow the be lenient in what you accept and strict in what you send rule and still use the other information available to it ( sic: What the client really asked for and expects ) and do the right thing. Only the cache knows what the client is REALLY asking for. Yours... Kevin
Re: Wrong etag sent with mod_deflate
In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. Henrik is trying to make it sound like it is all Apache's fault. It is not. SQUID is screwing up, too. ...shared caches that use the etag as a variant identifier. To ONLY ever use ETag as the end-all-be-all for variant identification is, itself, a mistake. If the Vary: field is present... then THAT is what the entity (also) Varies: on and to ignore that and only rely on ETag is a screw-up. I had this argument years ago with folks at the SQUID forum. It was just prior to when they ( finally ) got around to adding any support for Vary: at all but (limited) support for ETag:. Regardless of whether it's DCE ( Dynamic Content-Encoding ) or not... if the entity Varies: on Content-encoding: but some cache software is ignoring that just because its ETag matches some other stored variant... well... that's just WRONG. Both pieces of software ( SQUID and Apache ) need just a little more code to finally get it right. Don't forget about Content-Length, either. If 2 different responses for the same requested entity come back with 2 different Content-Lengths and there is no Vary: or ETag then regardless of any other protocol semantics the only SANE thing for any caching software to do is to recognize that, assume it is not a mistake, and REPLACE the existing entity with the new one. Yea... sure... you might get a lot of cache bounce that way but at least you are returning a fresh copy. It is not possible for 2 EXACTLY identical representations of the same requested entity to have different content lengths. If the lengths are different, then SOMETHING is different with regards to what you have in your cache. To ignore that reality as well ( which most caching software does ) is just kinda stupid. No protocol ( sic: set of rules ) can ever cover all the realities. 
( Good ) software knows how to make common sense as well. Yours... Kevin Kiley In a message dated 12/8/2006 11:45:44 AM Pacific Standard Time, [EMAIL PROTECTED] writes: Argh, my stupid ISP is losing apache email again because they use spamcop. On Dec 7, 2006, at 2:45 PM, Henrik Nordstrom wrote: tor 2006-12-07 klockan 02:42 +0100 skrev Justin Erenkrantz: -1 on adding semantic junk to the existing ETag (and keeping it strong); that's blatantly uncool. Any generated ETag from mod_deflate should either be the original strong version or a weak version of any previous etag. mod_deflate by *definition* is just creating a weak version of the prior entity. No, it is changing the content-encoding value, which is changing the entity. The purpose of etag for caching is two-fold: 1) for freshness checks, and 2) handling conditional range/authoring requests. That is why the spec is full of gobbledygook on etag handling -- it was stretched at the last minute to reuse a very simple freshness check as a form of variant identifier. What we should be doing is sending transfer-encoding, not content-encoding, and get past the chicken and egg dilemma of that feature in HTTP. If we are changing content-encoding, then we must behave as if there are two different files on the server representing the resource. That means tweaking the etag and being prepared to handle that tweak on future conditional requests. In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. Roy
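Kevin's point that the Vary list, not the ETag, should drive variant selection can be sketched as hypothetical cache logic (this is not Squid's implementation). With Vary: Accept-Encoding, picking a stored variant reduces to replaying the client's Accept-Encoding against what is cached; the ETag only comes into play later, as a revalidation token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry: one stored variant of a URI. */
struct cached {
    const char *encoding;   /* "gzip" or "identity" */
    const char *body;
};

/* Select a variant by replaying the response's sole varying header
 * (Accept-Encoding) against the new request.  NULL means nothing
 * acceptable is cached and the request must go to the origin.
 * (Real Accept-Encoding parsing involves q-values; strstr() is a
 * deliberate simplification for the sketch.) */
static const struct cached *select_variant(const struct cached *v, size_t n,
                                           const char *accept_encoding)
{
    int wants_gzip = accept_encoding && strstr(accept_encoding, "gzip");
    for (size_t i = 0; i < n; i++) {
        if (wants_gzip && strcmp(v[i].encoding, "gzip") == 0)
            return &v[i];
        if (!wants_gzip && strcmp(v[i].encoding, "identity") == 0)
            return &v[i];
    }
    return NULL;
}
```

Nothing in the selection path consults an ETag, which is the thread's point: a cache that keys variants on ETag alone can hand a gzipped body to a client that never advertised gzip, no matter what the origin tagged it.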
Re: product name
I wouldn't push the "Apache" thing. Truth is... a letter could show up at any moment from lawyers of the Apache Nation regarding the name usage. Might even be way overdue. I wouldn't "go there" and draw attention to the issue at all. Yours... Kevin Kiley In a message dated 7/28/2006 3:00:23 PM Pacific Standard Time, [EMAIL PROTECTED] writes: Let me suggest: Nahche (Na-ai-che, `mischievous,' `meddlesome.' - George Wrattan). An Apache warrior, a member of the Chiricahua band. He is the second son of the celebrated Cochise, and as hereditary chief succeeded his elder brother, Tazi, on the death of the latter. His mother was a daughter of the notorious Mangas Coloradas. Why: 1) looks vaguely like Apache ;) 2) has a link to the Apache Indians 3) Sounds cool
Re: restructuring mod_ssl as an overlay
Roy wrote... The sane solution would be to convince the US government to remove encryption from the export control list, since that regulation has been totally ineffective. That is not likely to happen during this administration, though, and I don't think the ASF is allowed to lobby for it directly. Not going to happen in our lifetimes. Since World War II, when modern cryptography/encryption/decryption turned out to be the deciding factor in the conflict itself, it's all been classified as 'munitions' right along with firearms and explosives and, as such, is ( now ) under the jurisdiction of the ATF ( Bureau of Alcohol, Tobacco and Firearms. ) It isn't just a State Department policy thing. Any change in the policy would involve tons of government agencies, not just one. Roy also wrote... If anyone can think of another option, I'd like to hear it before proposing a vote. Here's another option before doing anything drastic... ...get a professional opinion. Roy... you have done fantastic research but I am seeing a lot of 'assumptions' in all the postings. ( Yours and others ). This isn't really something to get into with any 'assumptions'. You should be SURE that any changes are going to get you ( ASF ) where you want to be. Hypotheticals are fun but they don't get the horses into the barn. Here is just one example from the postings... but an important one... The mere presence of mod_ssl source code appears to be sufficient to make the product as a whole covered by 5D002 export controls --keyword=appears Does it, or doesn't it? mod_ssl is just a module. It doesn't do squat unless the OTHER ( Non ASF ) product is included in the compile. It's just a bunch of hooks into someone else's product. Are you SURE mod_ssl alone puts ASF into a 'danger zone' at all? Another example would be the early posting where the (little) crypto that IS included in Apache was shrugged off as 'insignificant'. ( MD5, SHA, Hashes, whatever ). Well... 
maybe that's a reverse case where ASF is assuming that isn't a problem but might, in fact, cause someone who doesn't really, really understand these things ( like some Justice Department lawyer ) some consternation. Best bet is for ASF to actually get a RULING on the technology from the State Department or whoever it is that would prosecute down the road if their 'assumptions' don't match your 'assumptions'. Get it from the horse's mouth. Get it in writing. You could pay tons of lawyers to look into this and they could all still turn out to be wrong. The only people who know what will satisfy them are the people who would prosecute a violation. ATF? They have jurisdiction for prosecution. Maybe they are the ones who can supply the rulings. You asked for additional options before going to vote. That's my 'additional option'. Wait... and find out the exact nature of the problem and then be SURE the changes provide the proper solution ( if there even is a problem in the first place ). Yours... Kevin
Re: Apache proxy behaviour...
There is no such thing as an intermediate proxy that has any kind of 'filtering' going on that won't, on some occasions, need to 'buffer' some data. I believe even mod_include will 'wait' for tags to resolve if they split across buffers. The real question to ask is... Why is the proxy timing out? ( I.e. Why isn't it getting the rest of the data? ) If something is always taking a huge amount of time to come up with the response then you just need to speed it up or increase your proxy timeout. Yours... Kevin Kiley In a message dated 2/2/2006 7:14:01 AM Pacific Standard Time, [EMAIL PROTECTED] writes: Hi there, I came across a problem which surprises me, as I thought Apache was working differently... We have Apache 2.0.55 working in reverse-proxy in front of different webservers. One of our websites takes a long time to process requests and respond to the client. The proxy reaches its timeout and closes the connection. So, our developers created a webpage which sends small chunks of data so that the connection is never closed. I thought it would work as, for me, Apache doesn't wait for the page to be complete before sending some chunks to the client... but instead it does! In our configuration, Apache waits to have the entire page before sending it back to the client. Is it because of one of our modules? We are using, among other things, mod_rewrite, mod_proxy, mod_deflate, mod_security, mod_headers... Thanks for your help, Regards, Thomas.
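Kevin's "increase your proxy timeout" advice maps to a couple of stock directives. The excerpt below is a hypothetical configuration sketch (the backend hostname and paths are invented): `ProxyTimeout` in mod_proxy raises the proxy's wait beyond the global `Timeout`, and setting mod_deflate's standard `no-gzip` kill switch on the slow page removes one of the filters in Thomas's chain that is known to buffer output.

```apache
# Hypothetical excerpt: give the slow backend more time instead of
# relying on trickled chunks to keep the connection alive.
ProxyTimeout 600
ProxyPass        /slow-app/ http://backend.example.com/
ProxyPassReverse /slow-app/ http://backend.example.com/

# mod_deflate buffers while it compresses; exempting the streaming
# page removes that source of buffering (other filters such as
# mod_security may still buffer the full response).
<Location /slow-app/stream>
    SetEnv no-gzip 1
</Location>
```

This does not make a buffering filter stream, which is Kevin's larger point; it only stops the timeout from firing while the filters do their work.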
Re: Directions for Win32 binary httpd
As someone who knows all of the Windows build platforms well... my 2 cents jives with your decision, Bill. Using MSVC 6.0 and keeping the makefiles is the only 'sane' thing to do at this point. There are ISSUES with just about any of the newer platforms, including the obvious "where o where did the makefiles go" stuff.

Someone said it a few messages ago ( I think it was you, Bill ). httpd is an OPEN SOURCE project ( --keyword=source ). Let other people who need/must/or just want to use some other build environment have at it, find/solve the issues, and post solutions back to the project. That's how it's supposed to work. I know Apache is pretty much your "full time job", Bill... but that still doesn't mean you're supposed to do EVERYTHING.

Yours... Kevin

PS: Whatever became of that nifty Perl script you were working on that converts nmake makefiles to project files? It was almost working at one point. Maybe you should post it into a distribution and let some other Perl wizard take it into the end zone. Might be needed later.

In a message dated 12/3/2005 3:19:17 AM Central Standard Time, [EMAIL PROTECTED] writes: Ok, I've come to a conclusion; for the coming release, only msvcrt.dll builds under Visual C++ 6.0 make sense as our binary distribution. I'm not suggesting we dismiss the potential win of supporting our Studio 2005 compiler users(!) But let's quickly compare... . binary users generally aren't building modules, they need to plug into widely distributed binary components. . source users generally can build anything from source, if they need to. If they want to interface several components, they can build our source tarball with any compiler they like, including the 1 year free license of Studio 2005. . it's pretty trivial to build/install httpd with one of several pretty minimal unix toolchains available. It seems that most of the communities are still on VC 6.
Remember the key reason we keep using it, MS dropped support for exporting makefiles. With no makefiles, you are roped into supporting only version x or newer Studio products. With .dsp/.dsw solutions, we can export makefiles on the old reliable VC 6, and users can load/convert these into Studio 2000/03/05. So I'll move ahead with all the msi tweaks required for our changed files, and we can reevaluate the state of things 6 mos or a year from now when we are almost ready to ship Apache X :) That's my conclusion, I'm still more than happy to hear out dissenting opinions. Speak up quick, though, planning to have a package up in /dev/dist by Sunday night for review, and push it out sometime early next week. Bill
Re: pgp trust for https?
Aw shucks... dad... you never let us have any fun. ROFL Kevin Hmmm... HTTP/1.1 PGP based TLS mechanisms under Itanium? Interesting ( and OT ). In a message dated 11/9/2005 2:45:13 PM Central Standard Time, [EMAIL PROTECTED] writes: Folks, somehow this thread diverged from HTTP/1.1 PGP based TLS mechanisms into a fun-with-hardware-trust thread. Please take this discussion to an appropriate security-wonk debating club forum, such as vuln-dev or bugtraq, as it's all entirely off topic on this forum. Yours, Bill
Re: pgp trust for https?
In a message dated 11/9/2005 4:12:50 PM Central Standard Time, [EMAIL PROTECTED] writes: Bill wrote... So rather than spin off-topic threads, where's the discussion of taking something that exists, such as se-linux, and actually leveraging security features of more evolved security architectures? That's when things come back on-topic here.

Well... before you jumped in... I think we were just about to get there. We were just starting to discuss the 'more evolved security architectures' and how they can improve the chain of trust, etc... and how that might come into play for the lads discussing the ( thread title ) "pgp trust for https". I am not affiliated with any company that directly manufactures or ships Itanium products... but I happen to have 2 or 3 of the beasties here and Peter is right... the answers to most people's well-worn arguments about how you can't secure an OS are now lying right on the coffee table ( once they solve the over-heating problems, LOL ). There are parts of every server ( httpd included ) that should ONLY run in the (new) protected IA64 'container' spaces. It's a given and it's inevitable.

The httpd's security isn't off topic, I'll agree. Debating or promoting different ring and kernel architectures is off topic, though, when you aren't applying them to an operating system that httpd can run on.

httpd runs fine on Itanium... it just hasn't even begun to take advantage of the new architecture, that's all... and that's all I saw Peter throwing out to the thread. It could... and it might represent some solutions for the lads who started the thread. Of course anyone is welcome to take the httpd code off to their own project to develop embedded httpd in a truly secure environment. Been there, done that. It's interesting to discover the real success of APR when you remove the operating system altogether: Apache is now so good at not caring what the OS really is that it doesn't care much, either, when there is no OS at all.
Yours... Kevin
Re: mod_deflate Vary header
Igor Sysoev wrote... Actually, with the appearance of MSIE 5.5+ the chances that a client cannot decompress the response from a downstream cache have increased. If MSIE 5.5 is configured to work via proxy with HTTP/1.0, then MSIE will never send the "Accept-Encoding" header, and it would refuse the compressed content.

You are right on the first part. If you don't have the "Use HTTP/1.1 for Proxies" checkbox on then no "Accept-Encoding:" header would be sent... ( principle of least astonishment applies here ). ...however... I think you will discover on closer inspection that the second part is not true. Even if MSIE 5.x doesn't send an "Accept-Encoding: gzip" header... it KNOWS it can decompress and will do so for any response that shows up with "Content-Encoding: gzip" regardless of whether it sent an "Accept-Encoding: gzip" header. It's part of the wild and wacky world of browsers. Netscape exhibits similar behavior. Rather than rejecting something when they know they can handle it, regardless of protocol 'envelope'... they just go ahead and do it anyway. There were ( are ) versions of Netscape, MSIE and Opera that were capable of decompressing "Content-Encoding: gzip" even BEFORE they added any HTTP/1.1 support at all. Go figure.

Actually, MSIE 5.5+ will cache the response with any "Vary" header if it also has "Content-Encoding: gzip" or "Content-Encoding: deflate" headers. But it refuses to cache if the response has no "Content-Encoding: gzip" header.

Thanks for pointing that out. You are right. I didn't make that exception clear in my last message. If the response has "Content-Encoding: gzip ( or deflate )" then it HAS to cache it. It will even do so beyond all "Cache-Control:" directives. Both MSIE and Netscape actually USE their own local cache to perform the actual decompression and, as such, will ALWAYS write the compressed data to disk first regardless of any "Cache-Control:" directives or "Vary:" headers or anything else, for that matter.
MSIE will end up with only the decompressed version in the cache but Netscape will end up with TWO copies of the response in its local cache... the original compressed version and the decompressed version. The only problem with Netscape is that it then goes brain-dead and forgets that it has 2 cached copies of the same response and sometimes tries to actually PRINT the compressed version of the response out of its local cache. Not sure they ever solved that one.

"Vary:" really doesn't hold much meaning for end-point client caches, anyway, so it's curious that it actually affects browser caching behavior one way or the other. "Vary:" is/was mostly meant to help intermediate caches figure out "what to send" to actual end-point agents ( the ones ASKING for the right response ). I can't think of many HTTP response headers that would/should matter a hoot to the end-point browser as far as "Vary:" goes once it has a response to a request. It's all about 'freshness' from the end-point cache's point of view. It's the browser that expects someone else to "Vary:" the response based on the REQUEST headers it sent.

The only reason I ever heard for MSIE deciding to never cache any response with a "Vary:" header ( other than compressed responses ) goes back to some huge bug they had to fix that was being caused by a "Vary:" header and, rather than actually add true "Vary:" support, the quick fix was to just reject everything. Better to point the blame upstream than ( God forbid ) show the user the wrong thing. I believe Netscape used the same approach. If you can't do ALL of the "Vary:" rules then the safest thing to do is just reject it all and let someone ( or something ) else worry about getting you the right content. This was also SQUID's philosophy with regards to "Vary:" in the not-too-distant past, before they made any attempt to support multiple variants at all. SQUID would never bother to cache ANYTHING that had a "Vary:" header on it.
My mod_deflate can optionally add the "Vary" header to the compressed response only.

So you still run the risk of only getting one variant 'stuck' in a downstream cache. If the uncompressed version goes out at 'start of day' with no "Vary:" then the uncompressed version can get 'stuck' where you don't want it until it actually expires.

Yours... Kevin Kiley
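The "what to send" selection that "Vary:" gives intermediate caches can be pictured as a two-level cache key: a primary key from the URI, plus a secondary key derived from the request header fields the response says it varies on. A minimal illustrative sketch ( function and variable names are invented for illustration; this is not Apache or Squid code ):

```python
import hashlib

def variant_key(uri, vary_header, request_headers):
    """Build a (primary, secondary) cache key for one variant.

    The primary key identifies the resource; the secondary key is
    derived from the request header fields named in the response's
    Vary header, which is how an intermediate cache picks the right
    stored variant to send back.
    """
    fields = [f.strip().lower() for f in vary_header.split(",") if f.strip()]
    parts = [request_headers.get(f, "").strip().lower() for f in sorted(fields)]
    secondary = hashlib.md5("\x00".join(parts).encode()).hexdigest()
    return hashlib.md5(uri.encode()).hexdigest(), secondary
```

Two requests for the same URI that differ only in "Accept-Encoding" share a primary key but get different secondary keys, i.e. they select different stored variants.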
Re: mod_deflate Vary header
This has been discussed many times before and no one seems to understand what the fundamental problem is. It is not with the servers at all, it is with the CLIENTS.

What both of you are saying is true... whether you "Vary:" on "Content-Encoding" and/or "User-Agent" or not... there is a risk of getting the wrong content ( compressed versus uncompressed ) "stuck" in a downstream cache. It is less and less likely these days that the cache will receive a request from a client that CANNOT decompress, but it is still possible. Handling requests from clients that cannot decompress has become ( at long last ) the "fringe" case but is no less important than ever.

Microsoft Internet Explorer ( all versions ) will REFUSE to cache anything locally if it shows up with any "Vary:" headers on it. Period. End of sentence. So you might think you are doing your downstream clients a favor by tacking on a "Vary:" header to the compressed data so it gets 'stored' somewhere close to them... but you would be wrong. If you don't put a "Vary:" header on it... MSIE will, in fact, cache the compressed response locally and life is good. The user won't even come back for it until it's expired on their own hard drive or they clear their browser cache. However... if you simply add a "Vary:" header to the same compressed response... MSIE now refuses to cache that response at all and you create a "thundering herd" scenario whereby the page is never local to the user for any length of time and each "forward" or "back" button hit causes the client to go upstream for the page each and every time. Even if there is a cache nearby you would discover that the clients are nailing it each and every time for the same page just because it has a "Vary:" header on it.

I believe Netscape has the same problem(s). I don't use Netscape anymore. Anyone know for sure whether Netscape actually stores "variants" correctly in its local browser cache?

Yours...
Kevin Kiley

In a message dated 11/4/2005 4:55:02 PM Central Standard Time, [EMAIL PROTECTED] writes: On 11/04/2005 07:36 AM, Florian Zumbiehl wrote: [..cut..] Maybe I'm pessimistic, but I think omitting the Vary header for uncompressed resources will lead to "poisoned" caches, which statistically will nearly always request the uncompressed variant and so actually *add* load to your server's bandwidth. Huh? Erm, could it be that you are thinking of load-reducing caches put directly in front of the web server? In that case, I wonder how the web server's bandwidth could be the bottleneck?! To put this straight: I was thinking about web servers behind V.90/ISDN/ADSL lines where _that_ line usually will be the bottleneck on any connections to the outside world, and about caching proxies in that outside world... Yes, but if you do compression because some of your clients have low bandwidth connections ( but are capable of receiving compressed pages ) and access your server via a proxy, then not sending the Vary header can "poison" the cache in a way you do not want. If the client which causes the proxy to cache the response cannot receive compressed pages, the proxy will cache *only* the uncompressed version of the page. Regards Rüdiger
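One conservative approach to the client side of this mess is to compress only for requests that explicitly advertise gzip with a non-zero q-value, and treat anything ambiguous as "no" ( serving identity to a gzip-capable client only costs bandwidth, while serving gzip to an incapable one breaks the page ). A minimal sketch of such a check -- a hypothetical helper, not mod_deflate code:

```python
def client_accepts_gzip(accept_encoding):
    """Conservatively decide whether to send gzip.

    Parses a simple Accept-Encoding header, honoring q=0 opt-outs.
    Anything ambiguous or absent is treated as "no".
    """
    if not accept_encoding:
        return False
    for token in accept_encoding.split(","):
        parts = [p.strip() for p in token.split(";")]
        coding = parts[0].lower()
        q = 1.0
        for p in parts[1:]:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0
        if coding in ("gzip", "x-gzip", "*") and q > 0:
            return True
    return False
```

Note this deliberately ignores the browsers described above that decompress without ever advertising gzip; the point of the conservative rule is that the worst case is uncompressed-but-readable.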
Re: Issues for 2.1.8
I think we all understand what Bill is saying; there is simply normal, healthy disagreement. That's good. Look... every now and then we ALL get the urge to clean up the room and move the furniture around and get the dirty laundry off the floor. Bill thinks modules/experimental is part of the 'cleanup' that he thinks should happen. More power to him. My 2 cents jives with Jim J, Brad and others. Healthy discussion and it's nice to see... but I don't think it's broken. Don't fix it. Clean it up, sure. Try and make it 'tidier' and get rid of shit that isn't going anywhere and nobody seems to care about, but don't DEEP-SIX it.

Even if the directory is EMPTY for any particular release I still think it is an ENTICEMENT to CONTRIBUTE. Its very presence in the tarball is an INVITATION for those who, as Shakespeare would say, are "Seeking the bubble reputation even in the cannon's mouth". It says "Can you think of something useful that we haven't yet?". As stated before... it's just a perfect symbol of the way everyone should look at Apache... that it has NEVER been 'finished' or 'perfect', is not now, and never will be. It is a total 'work in progress' at all times. It is, in essence, a living 'experiment' unto itself. That's all.

Kevin Kiley

In a message dated 9/22/2005 11:18:35 AM Central Daylight Time, [EMAIL PROTECTED] writes: But what you are suggesting is exactly what has proven NOT to work. Moving modules into sub-projects has already proven to kill the module. It needs to be included in the release. That is an essential part of the incubation process. modules/experimental WORKS!! That fact has already been proven true by the number of modules that have passed through modules/experimental and graduated to become standard modules. If it isn't broken, then why are you trying to fix it with something that we already know DOESN'T work. Brad
Re: Issues for 2.1.8
Jim J. wrote... People will not use it unless they can *really* trust a module. Simply expecting people to migrate to it because of the theoretical benefits isn't quite wise, until it has proven itself. The idea is to make it easier for people to have access to a module, use it and test it. More exposure means more feedback and more bug-fixes ( hopefully :) ). But simply "being there" isn't enough to expect world-wide usage; "being there" is enough to hope that people have easier access to play around with it.

Well said, Jim. The experimental modules section of the general release has always been the 'enticing' part of the tarball and the "You can play too" advertisement. It is the constant reminder that Apache has never been, is not now, and will never be FINISHED. You need to keep everything you've got to reverse the trend of the last few years toward "closing in" and losing your ability to attract new talent into your developer pool.

Yours... Kevin Kiley
Re: [PATCH] mod_cache. Allow override of some vary headers
In a message dated 8/17/2005 2:01:41 PM Central Daylight Time, [EMAIL PROTECTED] writes: CacheOverrideHeader Accept-Encoding gzip CacheOverrideHeader User-Agent gzip This would allow all browsers that send "Accept-Encoding: gzip" and do not match the BrowserMatches to be mapped to the same cache object. All the other variants would point to another object. This would be very useful in reverse proxy caches. Only patched mod_disk_cache, but mod_mem_cache should be trivial. This was all done in a few hours today, including testing.

What sort of testing did you do in those few hours? If I could play devil's advocate for a moment... my concern would be that you haven't considered certain scenarios that won't work with this patch. Did you test a scenario whereby the COS ( Content Origin Server ) is actually trying to "Vary:" the content based on "User-Agent:" regardless of the "Accept-Encoding:" value? My concern would be that if you have to set this up using BOTH of the following 'overrides'...

CacheOverrideHeader Accept-Encoding gzip
CacheOverrideHeader User-Agent gzip

...there might be times when the varied 'content' for 'User-Agent' alone gets lost in the 'overrides'. What would happen in this scenario... The COS has sent 2 totally different responses based on whether 'User-Agent' was MSIE or NETSCAPE. Both of those responses were compressed ( Content-Encoding: gzip ) and they should have been stored as 2 different ( compressed ) 'variants' for the same response based on the ( non-overridden ) actual value of the "User-Agent" field. If a request now shows up and hits your overrides and "User-Agent" is overridden by the 'gzip' environment variable, does the requestor get the ( right ) compressed variant?

Yours... Kevin
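The collapse the overrides perform, and the failure mode raised above, can be sketched like this ( hypothetical names and logic for illustration only; this is not the patch's code ). When "User-Agent" is in the override set, MSIE and Netscape requests map to the same cache slot even if the origin varied the body on User-Agent:

```python
import hashlib

def normalized_variant_key(uri, vary_fields, request_headers, overrides):
    """Cache key with per-header normalization.

    Headers listed in `overrides` are replaced by a computed token
    ('gzip' or '' from Accept-Encoding) before keying, so thousands of
    distinct User-Agent strings collapse into two slots.  The risk: if
    the origin really varies the *body* on User-Agent, genuinely
    different variants collapse into one slot too.
    """
    parts = []
    for field in sorted(f.lower() for f in vary_fields):
        value = request_headers.get(field, "")
        if field in overrides:
            value = "gzip" if "gzip" in request_headers.get("accept-encoding", "") else ""
        parts.append(value)
    return hashlib.md5((uri + "\x00" + "\x00".join(parts)).encode()).hexdigest()
```

With the override active, an MSIE request and a Netscape request ( both sending gzip ) produce the same key; without it, they key separately, which is what a User-Agent-varied origin needs.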
Re: [PATCH] fix incorrect 304's responses when cache is unwritable
In a message dated 8/11/2005 12:42:35 PM Central Daylight Time, [EMAIL PROTECTED] writes: The code will remove the header file and the disk file; but it also likely needs to go up a 'level' and remove all variants. Because if we get a 404 on a varied entity, it also means that all variants should be removed, no? Actually, no. What you really should do if you are going to drop into 'cleanup' mode on this base URI is RE-VALIDATE ALL THE VARIANTS and remove any that return bad response codes but keep the ones that are 'still ok'. It is possible ( and legal? ) for a COS ( Content Origin Server ) to return a '404' on one variant of an entity but not on another. I have seen CGI scripts that would do this very thing. Yours... Kevin Kiley
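The "RE-VALIDATE ALL THE VARIANTS" approach can be sketched as follows, where `revalidate` stands in for a hypothetical conditional GET against the origin ( all names invented for illustration ):

```python
def prune_variants(variants, revalidate):
    """Re-validate every stored variant of a URI, dropping only the
    dead ones and keeping variants the origin still serves.

    `variants` maps a variant key to a cache entry; `revalidate` is a
    callable performing a conditional request and returning the HTTP
    status code (200, 304, 404, ...).
    """
    kept = {}
    for vary_key, entry in variants.items():
        status = revalidate(entry)
        if status in (200, 304):
            kept[vary_key] = entry  # still served or unchanged: keep it
        # 404 (or other error) variants simply fall out of the cache
    return kept
```

This matches the point above: a 404 on one variant says nothing about the others, so each one gets its own check.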
Re: Supported Compilers
Is there a list of "supported" compilers? I am having to compile using gcc 2.96 and having some weirdness, but it works fine on 3.3. It may be something else with the box, but I just wanted to know if there was an "official" list.

Wasn't 2.96 the Red Hat special version ( known to be weird )? nd

Yes. Absolutely. It should NEVER have shipped with any product. It builds 'bogus' COFF and ELF images yet you will never get one single warning or error message. It's trash. Don't use it.

Yours Kevin Kiley
Re: [PATCH] another mod_deflate vs 304 response case
At 10:26 AM 11/22/2004, Cliff Woolley wrote: On Mon, 22 Nov 2004, Joe Orton wrote: There's another mod_deflate vs 304 response problem which is being triggered by ViewCVS on svn.apache.org: when a CGI script gives a "Status: 304" response the brigade contains a CGI bucket then the EOS, so it fails the "if this brigade begins with EOS do nothing" test added for the proxied-304 case. Okay, but why the next three lines? Why would Content-Encoding: gzip *ever* be set on a 304?

Let me expand on this question... we've all seen broken browsers. Why attempt to gzip non-2xx responses at all? I'd prefer ( at least on servers with modest error message responses ) to ensure the user can *read* any error response, in spite of any broken browser behavior w.r.t. deflate. It seems like a flag ( even default ) dropping gzip from non-2xx class responses could be a very useful thing. At least, if the browser results are a mess, it's due to a good response. Thoughts? Bill

Because some custom error response pages on some sites are HUGE... and they WANT to compress them. For a while there ( especially in Europe and most notably in Germany ) it seemed like there was a contest going on to see who could come up with the most bloated and complicated error response pages for their Web farm / commercial Server. Tons of JavaScript and flashing lights and bouncing balls and advertising links, you name it. I have seen some base error templates exceed 200,000 bytes of HTML and JavaScript just to say 'We're sorry... that page can't be found'.

Bottom line: If someone WANTS to be doing this sort of thing and they WANT to compress the responses they certainly should be able to, and it should all work. Anything that prevents it from working is still just a bug that's being ignored. Suggestion: Make sure someone can compress any response they want via config but then make sure NOT to recommend doing certain things and let them swim at their own risk. No lifeguard on duty.

Yours... Kevin Kiley
Re: cvs commit: httpd-2.0/server protocol.c
In the case you just mentioned... it is going to take a special 'filter' to 'sense' that a possible DOS attack is in progress. Just fair amounts of 'dataless' connection requests from one or a small number of origins doesn't qualify. There are plenty of official algorithms around now to 'sense' most of these brute force attacks and ( only then ) pop you an 'alert' or something. Just relying on a gazillion entries in a log file isn't the right way to 'officially' distinguish a DOS attack from just ( as Roy says ) 'life on the Internet'.

Sure, you may need to have some logic to determine what makes an attack and what not, but you must have the log entry to begin with so you can feed it to the algorithm.

Respectfully disagree. There is no 'may' about it. You MUST have SOMETHING that knows the difference or you don't have DOS protection. Also... if you wait all the way until you have a 'log' entry for a DOS in progress then you haven't achieved the goal of sensing them 'at the front door'. What I was suggesting is some kind of 'connection' based filter that has all the well-known DOS attack scheme algorithms in place and can 'sense' when they are happening before the Server gets overloaded. Once the DOS protection kicks in... you don't get any 'log' entries at all... the goal is to prevent the connections from ever turning into 'requests' that the Server has to waste time processing. It's your only chance to survive a real DOS attack.

Yours... Kevin Kiley

In a message dated 10/26/2004 8:50:11 AM Central Daylight Time, [EMAIL PROTECTED] writes: In the case you just mentioned... it is going to take a special 'filter' to 'sense' that a possible DOS attack is in progress. Just fair amounts of 'dataless' connection requests from one or a small number of origins doesn't qualify. There are plenty of official algorithms around now to 'sense' most of these brute force attacks and ( only then ) pop you an 'alert' or something.
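The 'connection based filter' idea -- sensing one of the well-known brute-force patterns before requests are even parsed -- can be sketched as a per-source sliding-window counter. This is an illustrative sketch only ( class name, limits, and window values are invented; this is not an Apache API, and a real front-door filter would sit at or below the accept loop ):

```python
import collections
import time

class ConnRateDetector:
    """Minimal sliding-window detector for one DoS pattern:
    many dataless connects from a single source in a short interval.
    """
    def __init__(self, limit=100, window=10.0):
        self.limit = limit      # connects allowed per window
        self.window = window    # window length in seconds
        self.hits = collections.defaultdict(collections.deque)

    def connect(self, ip, now=None):
        """Record a connect; return True if this source looks abusive."""
        now = time.monotonic() if now is None else now
        dq = self.hits[ip]
        dq.append(now)
        # Drop timestamps that have slid out of the window.
        while dq and now - dq[0] > self.window:
            dq.popleft()
        return len(dq) > self.limit
```

The point of the sketch is the shape of the logic -- count, window, threshold, then act -- rather than logging every connect and mining the logs afterwards.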
Re: cvs commit: httpd-2.0/server protocol.c
You MUST have SOMETHING that knows the difference or you don't have DOS protection. Also... if you wait all the way until you have a 'log' entry for a DOS in progress then you haven't achieved the goal of sensing them 'at the front door'. I don't set myself that goal. I agree that it's the best place to detect a DoS but it's often not possible for various reasons. With that option not available I prefer to be able to detect DoS attacks anywhere I can. Roger that. What I was suggesting is some kind of 'connection' based filter that has all the well-known DOS attack scheme algorithms in place and can 'sense' when they are happening before the Server gets overloaded. That does not need to be in web server at all. It can work from within the kernel, or be a part of a network gateway. Double Roger That Yours... Kevin Kiley
Re: cvs commit: httpd-2.0/server protocol.c
For example, we had a problem report on #apache a couple of days ago which turned out, after considerable investigation, to be the result of a single host IP issuing hundreds of request connections in a few minutes. Whether this was a deliberate attack or simply a buggy client is not clear ( to me ) but the temporary solution of blocking the IP address was certainly within the server admin's abilities.

That could easily have been one of these 'commercial' companies that are just testing sites for 'availability' and publishing the results. There are lots of these. Only one example... http://www.internethealthreport.com These guys might hit you hard and fast at any moment and they aren't sending any data... they just want to see how fast you can 'answer the phone' and they turn that into a 'site health' statistic.

Roy was right in his previous message. Apache USED to try and log something for all broken inbound connect requests but that, itself, turned into a 'please fix this right away' bug report when people's log files went through the roof. In the case you just mentioned... it is going to take a special 'filter' to 'sense' that a possible DOS attack is in progress. Just fair amounts of 'dataless' connection requests from one or a small number of origins doesn't qualify. There are plenty of official algorithms around now to 'sense' most of these brute force attacks and ( only then ) pop you an 'alert' or something. Just relying on a gazillion entries in a log file isn't the right way to 'officially' distinguish a DOS attack from just ( as Roy says ) 'life on the Internet'.

All major browsers will abandon pending connect threads for a web page whenever you hit the BACK button, as well. Connects in progress at the socket level will still complete but no data will be sent because the threads have all died. Happens 24x7x365. The 5 second rule still applies.
If people don't see good content showing up within 5 seconds they will click away from you and all the threads pending connects to you die immediately but all you might see are tons of 'dataless' connect completions on your end. It's not worth logging any of it. Yours... Kevin Kiley In a message dated 10/25/2004 11:23:24 PM Central Daylight Time, [EMAIL PROTECTED] writes: This is not an error that the server admin can solve -- it is normal life on the Internet. We really shouldn't be logging it except when on DEBUG level. That was my first reaction, too. However, Ivan Ristic pointed out that (in some cases, anyway) it is the result of a DoS attack, and there may be something a server admin can do about it. Or, if not, at least they might know why their server is suddenly performing badly. For example, we had a problem report on #apache a couple of days ago which turned out, after considerable investigation, to be the result of a single host ip issuing hundreds of request connections in a few minutes. Whether this was a deliberate attack or simply a buggy client is not clear (to me) but the temporary solution of blocking the ip address was certainly within the server admin's abilities.
Re: cvs commit: httpd-2.0/server core.c protocol.c request.c scoreboard.c uti...
Roy is right... Willy-nilly throwing casts on data objects just to satisfy some anal-retentive urge to not see any warnings appearing during a compile is the absolute WRONG thing to do when it comes to porting 32-bit code to 64-bit platforms. The situation is NOT as simple as it was when 32 bits hit the scene and everybody had to port all the 16-bit stuff. I've been working with 64-bit for quite some time and I can tell you that there are still many goblins lurking in this department and many assumptions that are not true.

Example... Roy wrote... It is far safer to cast module-provided data from int up to 64 bits than it is to cast it down from 64 bit to int.

Probably... but there is an assumption being made there on Roy's part. He is assuming that the cast will automatically clear the high order 32 bits. One SHOULD be able to 'assume' that in order not to violate the 'principle of least astonishment' but, unfortunately, on some current 64-bit platforms this is not the case. A simple cast of an 'int' to a 'long long' ( or the equivalent 64-bit wide data item ) will NOT 'clear' all the high order bits that reach the target. I am not going to name any names here on this ( public ) forum but all I can tell you is that if you think a simple typecast in ANY direction ( up or down ) on a 64-bit platform is automatically setting or clearing high/low order bits... think again. Consult with your chip manufacturer, read all defect reports, and read your 64-bit compiler or assembler manual carefully. Also... take a ( careful ) look at the ASM code being produced. Trust... but verify.

Roy is absolutely right when he says that the only way to go is from 'the bottom up' ( and keep those chip and compiler manuals and defect reports close by ).

Yours... Kevin Kiley

In a message dated 10/22/2004 8:20:38 PM Central Daylight Time, [EMAIL PROTECTED] writes: The precursor to this patch "[PATCH] WIN64: httpd API changes" was posted 10/7 so I thought we had had suitable time for discussion.
I have addressed the one issue that was raised. That explains why I didn't see it -- I was in Switzerland. There have also been several other threads on the httpd and apr lists and the feedback I had received indicated that it was appropriate to sanitize the 64-bit compile even if it incurred httpd API changes. However if there are specific security issues that this has brought up I am more than anxious to address them. Are you opposed to changing the API to fix 64-bit warnings or are there specific issues that I can address and continue to move forward rather than back out the entire patch?

I am opposed to changing the API just to mask warnings within the implementations. In any case, these changes cannot possibly be correct -- the API has to be changed from the bottom-up, not top-down. It is far safer to cast module-provided data from int up to 64 bits than it is to cast it down from 64 bit to int. Fix mismatches of the standard library functions first, then fix APR, then carefully change our implementation so that it works efficiently on the right data types as provided by APR, and finally fix the API so that modules can work. If that isn't possible, then just live with those warnings on win64. In any case, changes like

+ /* Cast to eliminate 64 bit warning */
+ rv = apr_file_gets(buf, (int)bufsiz, cfp);

are absolutely forbidden. Roy
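The two cast directions being debated can be modeled outside of C -- here in Python via ctypes, purely as an illustration, with the 64-to-32 truncation modeled by an explicit mask: a down-cast discards the high-order bits, while a signed 32-to-64 up-cast sign-extends. Whether a particular 64-bit compiler/CPU actually clears or extends those bits is exactly what the message above says must be verified against the generated ASM rather than assumed.

```python
import ctypes

# A 64-bit value that does not fit in 32 bits.
big = (1 << 32) + 5

# Model of a 64->32 down-cast: the high-order bits are silently lost.
low = ctypes.c_int32(big & 0xFFFFFFFF).value

# A signed 32->64 up-cast sign-extends, preserving the value.
wide = ctypes.c_int64(ctypes.c_int32(-1).value).value
```

Here `low` comes out as 5 even though `big` was over four billion, which is the information loss Roy's "safer to cast up than down" rule is about.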
Re: [PATCH] mod_disk cached fixed
Brian Akins wrote... [EMAIL PROTECTED] wrote... Brian Akins wrote... Serving cached content: - look up uri in cache ( via md5? ). - check varies - a list of headers to vary on - calculate new key ( md5 ) based on uri and the client's value of these headers - look up new uri in cache - continue as normal

Don't forget that you can't just 'MD5' a header from one response and compare it to an 'MD5' value for the same header field from another response.

This isn't what I meant. I meant get the "first-level" key by the md5 of the uri, not the headers.

Ok... fine... but when you wrote this... "calculate new key (md5) based on uri AND clients value of these headers" ...the AND is what got me worried. I thought you were referring to the scheme you proposed in an earlier email where you WERE planning on doing just that...

Brian Akins wrote... I actually have somewhat of a solution: URL encode the uri and any vary elements: www.cnn.com/index.html?this=that Accept-Encoding: gzip Cookie: Special=SomeValue may become: www.cnn.com%2Findex.html%3Fthis%3Dthat+Accept-Encoding%3A+gzip+Cookie%3A+Special%3DSomeValue A very simple hashing function could put this in some directory structure, so the file on disk may be: /var/cache/apache/00/89/www.cnn.com%2Findex.html%3Fthis%3Dthat+Accept-Encoding%3A+gzip+Cookie%3A+Special%3DSomeValue Should be pretty fast (?) if the urlencode was efficient. Brian Akins

Not that this wouldn't actually WORK under some circumstances ( it might ) but it would qualify as just a 'hack'. It wouldn't qualify as a good way to perform RFC standard 'Vary:'.

Brian Akins also wrote... BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip and just "vary" on no-gzip ( 1 or 0 ), but this may be hard to do just using headers...

It's not hard to do at all... the question would be whether it's ever the 'right' thing to do.

If you know a lot about the data you can do this. In "reverse proxy" mode, you would.

Take your logic just a tiny step farther.
I you know EVERYTHING about the data... then you CERTAINLY can/would. You have just hit on something that should probably be discussed further. The whole reason "Vary:" was even created was so that COS ( Content Origin Servers ) could tell downstream caches to come back upstream for a 'freshness' check for reasons OTHER than simple time/date based 'expiration' rules. I am not certain but I believe it was actually the whole "User-Agent:" deal that made it necessary. When it became obvious that different major release browsers had completely different levels of HTTP support and the HTML that might work for one would puke on another then it became necessary to have 'Multi-Variants' of the same response. I am sure the 'scheme' was intended to ( and certainly will ) handle all kinds of other situations ( Cookie values would be second, I guess ) but IIRC there was no more pressing issue for 'Vary:' and 'Multiple Variants of a request to the same URI' than to solve the emerging 'User-Agent:' nightmare. So that's all well and good. There really SHOULD be a way for any cache to hold 2 different copies of the same non-expired page for both MSIE and Netscape, when the only reason to do so is that the HTML that works for one (still) might not work for the other. But that leads back to YOUR idea ( concern )... When does a response REALLY (actually) Vary and why should you have to store tons and tons of responses all the time? That's easy... when the entire response for the same URI differs in any way from an earlier ( non-expired ) response to a request for the same URI... only then does it 'actually Vary'. If you MD5 and/or hard-CRC/cheksum the actual BODY DATA of a response and it does not differ one iota from another (earlier) non-expired response to a request for the same URI... then those 2 'responses' DO NOT VARY. It is only when the RESPONSE DATA itself is 'different' that it can be said the responses truly 'Vary'. So here is the deal... 
Even if you get 25,000 different 'User-Agents' asking for the same URI... there will most probably only be a small sub-set of actual RESPONSES coming from the COS. It is only THAT sub-set of responses that need to be stored by a cache and 'associated' with the different ( Varying ) User-Agents. So that doesn't mean a (smart) cache needs to store 25,000 variants of the same response... It only needs to STORE responses that ACTUALLY VARY. How the sub-sets of 'Varying' responses get 'associated' with the right set(s) of 'Varying' header field(s) ( ie. User-Agent ) is something that the 'Vary:' scheme lacks and was not considered in the design. Topic for discussion? Kindling for a flame ware? Not sure... but you raise an interesting question. Brian Akins also wrote... [EMAIL PROTECTED] wrote... That's why it (Vary) remains one of the least-supported features of HTTP. Squid supports it really well. Look again.
Re: [PATCH] mod_disk cached fixed
Brian Akins wrote...

Serving cached content:
- lookup uri in cache (via md5?).
- check varies - a list of headers to vary on
- calculate new key (md5) based on uri and client's values of these headers
- lookup new uri in cache
- continue as normal

Don't forget that you can't just 'MD5' a header from one response and compare it to an 'MD5' value for the same header field from another response. A "Vary:" check does not mean 'has to be exactly the same as the other one'. It just has to be 'semantically' different. You can have a header value that is formatted differently from another and it is still, essentially, the SAME as the other and does NOT VARY. That includes different amounts of whitespace and a different 'ordering' of the 'values'. As long as the 'values' are the SAME with regards to the other header, then the header fields do NOT VARY. The only way to do it right is to be able to parse each and every header (correctly) according to its BNF and compare them that way. Syntax or whitespace differences don't automatically mean a header 'Varies' at all.

The thing that sucks is if you vary on User-Agent. You wind up with a ton of entries per uri.

Yep. That's how 'Multi-Variants' works. There might be very good reasons why every 'Varying' User-Agent needs a different 'Variant' of the same response.

I cheated in another module by "varying" on an environment variable. Kind of like this: BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip ...and just "vary" on no-gzip (1 or 0), but this may be hard to do just using headers...

It's not hard to do at all... the question would be whether it's ever the 'right' thing to do. The actual compressed content for different 'User-Agents' might actually 'Vary:' as well, so no one single compressed version of a response should be used to satisfy all non-no-gzip requests if there is actually a 'Vary: User-Agent' rule involved.

It's pretty hard to 'cheat' on 'Vary:'. That's why it remains one of the least-supported features of HTTP. It's kind of an 'all or nothing' deal whereby if you can't do it ALL correctly... then you might as well do what most products do and treat ANY 'Vary:' header as if it was 'Vary: *' ( Vary: STAR ) and not even bother trying to cache it.

Kevin Kiley
Re: Aborting a filter.
On Tue, 22 Jun 2004, Peter J. Cranstone wrote: Thanks... we're currently testing a new version of mod_gzip called mod_gzip64i

For the record, I've fixed the problem.

Super!

It was a failure to support some of the compression flags. Now I'll have to (side?)port it into a CVS version of mod_deflate ...

Suggestion: 99 times out of 100 these kinds of errors from ZLIB are going to come right off the bat when the ZLIB stuff first reads the LZ77 'header'. Any 'incompatibilities' will be discovered right then and there. What you probably ought to do in your filter is make the FIRST ZLIB call and check the return code and, if bad, not even bother to compress. Just 'reject' it and shut down the filter like you already do for other 'front door' reasons. Once you start the compression and the filter is 'engaged' and you have returned 'OK' it's too late to go back. It would be VERY RARE to get past the initial header read with ZLIB without an error and THEN run into a data error somewhere in the middle of the compression. Whatever error you might get would almost always be 'knowable' at the point where you still have the chance to not do the compression at all and 'remove' the filter from this particular response.

[grumble] Why isn't this documented in the manpages or in zlib.h? [/grumble]

There is a LOT about ZLIB ( and GZIP, for that matter ) that isn't documented anywhere. Welcome to the party.

Later... Kevin
Re: [PATCH 1.3] Proxied Server:/Date: headers
William Rowe wrote... I'd worked with some interesting java and cgi code which implements proxy behavior, as opposed to using a compiled-in module such as mod_proxy. In order to properly pass on the Server: and Date: headers (which are owned by the origin server), this patch tests for the presence of a Via: header, indicating the response is proxied. If neither r->proxyreq nor a 'Via:' header exist, the server still overrides the Server: and Date: responses, per RFC.

Not seeing any other feedback on this but thought I would add my 2 cents.

I understand what you are talking about. Allowing CGI to have finer-grained control over what headers are ACTUALLY returned with a response has always been an issue with Apache. Only a MODULE really has the API calls for that kind of 'out the door' control, and sometimes not even enough...

...but this sounds like a real hack. Using 'Via:' to ASSUME anything, much less making it a permanent patch to the production code, just sounds like a bad idea. You just can't depend on anyone actually using 'Via:' at all. The patch might actually create more problems than it solves.

Maybe the discussion really should be centered on the classic issue that this problem represents... how can CGI gain better control over the actual 'on the wire' header output? ( Override Server default behavior, when necessary, etc... ). Maybe the OTHER discussion going on about mod_headers is actually RELATED to this.

Syntax: Header set|append|add|unset|echo header [value [env=[!]variable]] [persist]

Someone who wants to do 'proxy behavior' from CGI and not a module probably SHOULD be able to coordinate the actual return header content by using ENVIRONMENT variables in conjunction with mod_headers. Example...

1. CGI decides ( however it can ) that the Server: field should either be changed on the return OR remain the same.
2. CGI supplies the RIGHT 'Server:' header.
3. CGI also ( somehow? ) tells the core Server to RESPECT the decisions it has made and actually USE the new Server/Date header(s) on the return trip and not default to something else. Some environment variable signal like "Use_the_headers_I_am_giving_you_as_is"?

Or maybe what you are talking about here is just some new way for the CGI ( decision maker ) itself to be able to directly set the r->proxyreq flag BEFORE it 'returns', if/when the CGI decides that needs to be done. Right now... you are right... that's something that only mod_proxy or some other MODULE code can do. Better access to core server variables from CGI? A new API to do that? Don't know. I just think the 'Via:' thing is a bad idea and doesn't rise to the level of needing to be a permanent patch.

Later... Kevin

Original Message from William Rowe... Still hoping for some feedback. Note that this proposal affects anyone who tries to implement a proxy feed from CGI, modperl, tomcat, php, or any other interesting mechanism, where the user can't manipulate the r->proxyreq flag :) Bill
Re: mod_proxy distinguish cookies?
Roy T. Fielding wrote: I do wish people would read the specification to refresh their memory before summarizing. RFC 2616 doesn't say anything about cookies -- it doesn't have to because there are already several mechanisms for marking a request or response as varying. In this case Vary: Cookie added to the response by the server module (the only component capable of knowing how the resource varies) is sufficient for caching clients that are compliant with HTTP/1.1.

Graham wrote... My sentence "RFC2616 does not consider a request with a different cookie a different variant" should have read "RFC2616 does not recognise cookies specifically at all, as they are just another header". I did not think of the Vary case, sorry for the confusion. Regards, Graham

"Vary" still won't work for the original caller's scenario. Few people know this, but Microsoft Internet Explorer and other major browsers only PRETEND to support "Vary:". In MSIE's case... there is only 1 value that you can use with "Vary:" that will cause MSIE to make any attempt at all to cache the response and/or deal with a refresh later. That value is "User-Agent". MSIE treats all other "Vary:" header values as if it received "Vary: *" and will REFUSE to cache that response at all. This means that if you try to use "Vary:" for anything other than "User-Agent" then the browser is not going to cache anything (ever) and will be hammering away at the unlucky nearest target ProxyCache and/or Content Server.

Why in the world an end-point User-Agent would only be interested in doing a "Vary:" on its own name ( which it already knows ) ceases to be a mystery if you read the following link. The HACK that Microsoft added actually originated as a problem report to the Apache Group itself back in 1999...

URI title: Client bug: IE 4.0 breaks with "Vary" header.
http://bugs.apache.org/index.cgi/full/4118

Microsoft reacted to the problem with a simple HACK that just looks for "User-Agent", and this fixed 4.0. That simple hack is the only "Vary:" support MSIE really has to this day. The following message thread at W3C.ORG itself proves that the "Vary:" problem still exists with MSIE 6.0 ( and other major browsers )...

http://lists.w3.org/Archives/Public/ietf-http-wg/2002AprJun/0046.html

There is also a lengthy discussion about why "Vary:" is a nightmare on the client side at the mod_gzip forum. The discussion centers on the fact that major browsers will refuse to cache responses locally that have "Vary: Accept-encoding" and will end up hammering Content Servers, but the discussion expanded when it was discovered that most browsers won't do "Vary:" at all.

http://lists.over.net/pipermail/mod_gzip/2002-December/006838.html

As far as this fellow's 'Cookie' issue goes... there is, in fact, a TRICK that you can use ( for MSIE, anyway ) that actually works. Just defeat the HACK with another HACK. If a COS ( Content Origin Server ) sends out a "Vary: User-Agent" then most major browsers ( MSIE included ) will, in fact, cache the response locally and will 'react' to changes in the "User-Agent:" field when it sends out an "If-Modified-Since:" refresh request. If you create your own pseudo-cookies and just hide them in the 'extra' text fields that are allowed to be in any "User-Agent:" field then voila... it actually WORKS!

I know that's going to send chills up Roy's spine but it happens to actually WORK OK. Nothing happens other than 'the right thing'. MSIE sees a 'different' "User-Agent:" field coming back and couldn't care less WHAT the value is... it only knows that it's now 'different' and so it just goes ahead and accepts a 'fresh' response for the "Vary:". If this fellow were to simply 'stuff' his Cookie into the 'extra text' part of the User-Agent: string and send out a "Vary: User-Agent" along with the response then it would actually work the way he expects it to. Nothing else is going to solve the problem with MSIE, I'm afraid, other than this 'HACK the HACK'.

Later...
Kevin
Re: mod_proxy distinguish cookies?
Hi Neil... This is Kevin Kiley...

Personally, I don't think this discussion is all that OT for Apache but others might disagree. "Vary:" is still a broken mess out there and if 'getting it right' is still anyone's goal then these are the kinds of discussions that need to take place SOMEWHERE. Apache is not the W3C but it's about as close as you can get.

I haven't looked at this whole thing for a LOOONG time so I had to go back and check my notes regarding the MSIE 'User-Agent' trick. As absurd as it sounds... you actually got the point. "User-Agent:" IS, in fact, supposed to be a 'request-side' header, but when it comes to "Vary:"... the world can turn upside down and what doesn't seem to make any sense can actually WORK. Unfortunately... I can't find the (old) notes I had about exactly what I did to make the "Vary: User-Agent" trick actually work with MSIE. I was just mucking around and never had any intention of implementing this as a solution for anything, but I DO remember somehow making it WORK ( almost ) just the way you are doing it. If I have some time... I'll try to find those notes and the test code I know I had somewhere that WORKED.

Another fellow who just responded pointed out that "Content-Encoding:" seems to be another field that MSIE will actually react to when it comes to VARY. Well... it had been so long since I mucked with all this that I had to go back and find/read some notes. The fellow who posted is SORT OF right about "Content-Encoding:" LOOKING like it can "Vary:" but it's not really "Vary:" at work at all. The REALITY is explained in that link I already supplied in the last message...

http://lists.over.net/pipermail/mod_gzip/2002-December/006838.html

Unless there has been some major change or patch to MSIE 6.0 and above then I still stand by my original research/statement... MSIE will treat ANY field name OTHER than "User-Agent" that arrives with a "Vary:" header on a non-compressed response as if it had received "Vary: *" ( Vary: STAR ) and it will NOT CACHE that response locally. Every reference to the page ( via Refresh, Back-button, local hyperlink-jump, whatever ) will cause MSIE to go all the way upstream for a new copy of the page EVERY TIME. Maybe this is really what you want? Dunno.

The reason it also LOOKS like "Content-Encoding" is being accepted as a VARY and MSIE is sending out an 'If-Modified-Since:' on those pages is NOT because it is doing "Vary:"... it's for other strange reasons. Whenever MSIE receives a compressed response ( Content-encoding: gzip ) then it will ALWAYS cache that response... even if it has been specifically told to NEVER do that ( no-cache, Expires: -1, whatever ). It HAS to. MSIE ( and Netscape ) MUST use the CACHE FILE to DECOMPRESS the response... and it always KEEPS it around. Neither MSIE nor Netscape nor Opera is able to 'decompress' in memory. They all MUST have a cache file to work from even if they are not supposed to EVER cache that particular response. They just do it anyway.

So... to make a long story short... MSIE will always decide it MUST cache a response with any kind of "Content-Encoding:" on it and it will set the cache flags for that puppy to 'always-revalidate', and that's where the "If-Modified-Since:" output is coming from, which makes it LOOK like "Vary:" is involved... but it is NOT.

However... in the world of "Vary:" you run into this snafu whereby you can't differentiate between what you are trying to tell an inline Proxy Cache 'what to do' versus an end-point user-agent. Example: If you are a COS ( Content Origin Server ) and you want a downstream Proxy Cache to 'Vary' the ( non-expired ) response it might give out according to whether a requestor says it can handle compression or not ( Accept-encoding: gzip, deflate ) then the right VARY header to add to the response(s) is "Vary: Accept-Encoding" and not "Vary: Content-Encoding". The "Content-Encoding" only comes FROM the Server. The 'decision' you want the Proxy Cache to make can only be based on whether a requestor has sent "Accept-Encoding: gzip, deflate" ( or not ).

If there is no inline Proxy ( which is always impossible to tell ) and the response goes direct to the browser then the same "Vary:" header that would 'do the right thing' for a Proxy Cache is meaningless for the end-point user-agent itself. The User-Agent never 'varies' its own 'Accept-Encoding:' output header ( unless you are using Opera and clicking all those 'imitate other browser' options in-between requests for the same resource ).

One of the biggest misconceptions out there is that browsers are somehow REQUIRED to obey all the RFC standard caching rules as if they were HTTP/x.x compliant Proxy Caches. They are NOT. The RFCs themselves say that end-point user agents can be 'implementation specific' when it comes to caching and should not be considered true "Proxy Caches". Most major browsers DO 'follow the rules' ( sort of ) but none of them could be considered true HTTP
Re: mod_proxy distinguish cookies?
Neil wrote... Thanks again Kevin for the insight and interesting links. It seems to me that there are basically three components here: my server, intermediate caching proxies, and the end-user browser. From my understanding of the discussion so far, each of these can be covered as follows:

1. My server: Cookies can be understood (i.e. queries are differentiated) by my server's reverse proxy cache.

Sure... but only if you are receiving all the requests WHEN and AS OFTEN as you need to. ( User-Agents coming back for pages when they are supposed to )...

2. Intermediate caching proxies: I can use the 'Vary: Cookie' header to tell any intermediate caches that cookies differentiate requests.

Nope. Scratch the word 'any' and substitute 'some'. There are very few 'intermediate caching proxies' that are able to 'do the right thing' when it comes to 'Vary:'. MOST Proxy Cache Servers ( including ones that SAY they are HTTP/1.1 compliant ) do NOT handle Vary: and they will simply treat ANY response they get with a "Vary:" header of any kind exactly the way MSIE seems to. They will treat it as if it was "Vary: *" ( Vary: STAR ) and will REFUSE to cache it at all. Might as well just use 'Cache-Control: no-cache'. It will be the same behavior for caches that don't support "Vary:".

SQUID is the ONLY caching proxy I know of that even comes close to handling "Vary:" correctly, but only the latest version(s). For years now... even SQUID would just 'punt' any response that had any kind of "Vary:" header at all. It would default all "Vary: xx" headers to "Vary: *" ( Vary: STAR ) and never bother to cache them at all. Even the latest version of SQUID is still not HTTP/1.1 compliant. There are still a lot of 'Etag:' things that don't get handled correctly. It's possible to implement "Vary:" without doing full "Etag:" support as well, but there will always be times when the response is not cacheable unless full "Etag:" support is onboard.

So you CAN/SHOULD use the "Vary: Cookie" response header and it WILL work for SOME inline caches... but be fully prepared for users to report problems when the inline cache is paying no attention to your "Vary:".

3. Browsers: Pass the option cookie around as part of the URL param list (relatively easy to do using HTML::Embperl or other template solution). So if the cookie is "opts=123", then I make every link on my site be of the form "/somedir/example.html?opts=123...". This makes the page look different to the browser when the cookie is changed, so the browser will have to get the new version of the page.

Not sure. Maybe. I guess I really don't follow what the heck you are trying to do here. What do you mean by 'make every link on my site be of the form uri?' Don't you mean you want everyone USING your site to be sending these various 'cookie' deals so you can tell who is who, and something just steps in and makes sure they get the right response? You should not have to 'make every link on my site' be anything. Something else should be sorting all the requests out. I guess I just don't get what it is you are trying to do that falls outside the boundaries of normal CGI and 'standard practice'. AFAIK 'shopping carts' had this all figured out years ago.

Now... if what you meant was that every time you send a PAGE down to someone with a particular cookie ( real Cookie:, not a URI PARMS one ) you re-write all the clickable 'href' links in THAT DOCUMENT to have the 'other URI cookie', then yea, I guess that will work. That should force any 'clicks' on that page to come back to you so that YOU can decide where they go or if that Cookie needs to change. But that would mean rewriting every page on the way out the door. Surely there must be an easier way to do whatever it is you are trying to do.

Officially... the fact that you will be using QUERY PARMS at all times SHOULD take you out of the 'caching' ball game altogether, since the mere presence of QUERY PARMS in a URI is SUPPOSED to make it ineligible for caching at any point in the delivery chain. In other words... might as well use 'Cache-Control: no-cache' and just force everybody to come back all the time.

...This makes the page look different to the browser when the cookie is changed, so the browser will have to get the new version of the page.

Again... I am not sure I would say 'have to'. There is no 'have to' when it comes to what a User-Agent may or may not be doing with cached files. Most of them follow the rules but many do not. I think you might be a little confused about what is actually going on down at the browser level. Just because someone hits a 'Forward' or a 'Back' button on some GUI menu doesn't mean the HTTP freshness ( schemes ) always come into play. All you are asking the browser to do is jump between pages it has stored locally, and that local cache is not actually required to be HTTP/1.1 compliant. It usually is NOT. Only the REFRESH button ( or CTRL-R ) can FORCE
Re: deflate input filter and jk
[EMAIL PROTECTED] wrote... Hi to all, A new question to HTTP / RFC gurus. A customer has developed a custom PHP HTTP client, using HTTP 1.0 and compression.

That's like mixing Vodka and Beer... something could easily puke... but OK... I hear ya...

This HTTP client compresses both requests and replies.

Sure, why not.

For replies it works great but for requests we have a doubt.

I imagine so, yes.

Since the HTTP client compresses a request there is in the HTTP header: Content-Encoding: gzip. Also the Content-Length is set to the size of the plain request (not the size of the compressed request). Is that correct, or should it send the Content-Length with the size of the compressed request? In such a case, it seems that the mod_deflate INPUT filter should modify the Content-Length accordingly? Thanks for your help

You've got some messed up code on your hands, Henri. In your particular case... Content-Length should ALWAYS be the ACTUAL number of bytes on the wire. Anything else is going to screw something up somewhere.

You have to remember the difference between 'Content-Encoding:' and 'Transfer-Encoding:'. 'Transfer-Encoding:' is a TRANSPORT layer thing but 'Content-Encoding:' is a PRESENTATION layer thing. When any HTTP request or response says that its BODY DATA has 'Content-type:' and/or 'Content-Length:' what that really meant ( in early HTTP terms ) is...

Content-Type: = Original MIME type of original data (file).
Content-Length: = Actual length of original data (file).

The original assumption in early HTTP was that this would always represent some file on some disk: the 'Content-type:' was usually just the file extension (mapped) and the 'Content-length:' was whatever a 'stat()' call said the file length was. When Content started to get produced dynamically ( does not exist until asked for ) things got a little sticky, but the CONCEPT is still the same. Content-type: is supposed to be the MIME type 'as-if' the 'file' already existed, and 'Content-length:' would be the exact number of ( PRESENTATION LAYER ) bytes 'as-if' the 'data file' was sitting on a disk somewhere.

If ANYTHING steps in to alter or filter or convert the 'content' at the PRESENTATION layer then it MUST change the 'Content-Length:' as well, because from the 'Content-x' perspective... the content has, in fact, changed at the PRESENTATION layer. There is no HTTP header field that looks like this...

Original-Content-Length: - Length of data before P layer content changed

All you have to work with is this...

Content-Length: - Length of P layer data NOW after something changes it.

RFC 2616 says...

4.4 Message Length
3. If a Content-Length header field ( section 14.41 ) is present, its decimal value in OCTETs represents BOTH the entity-length and the transfer-length. The Content-Length header field must NOT be sent if these two lengths are different [snip]

What this really means is...

3. If a ( PRESENTATION layer ) Content-Length header field ( section 14.41 ) is present, its decimal value in OCTETs represents BOTH the entity-length ( actual PRESENTATION layer length ) and the transfer-length ( TRANSPORT layer length - actual number of bytes on the wire ). The Content-Length header field must NOT be sent if these two lengths are different [snip]

The last part is kind of moot since it's not uncommon at all for the presentation layer content-length to be 'different' from the actual transport layer length. You will see it all the time 'out there'. The only thing that gets you into real trouble is when the actual length of the data is MORE than whatever the 'Content-length:' field says it's supposed to be.

Example: Even with all the above being said... it is actually OK to leave 'Content-Length:' set to the original size of the file IF you are using GZIP or DEFLATE ( or any LZ77 ) to compress the content. As long as the specified 'Content-Length:' ( original size ) is MORE than the number of compressed LZ77 bytes on the wire you will usually still be OK. Why?... because GZIP and ZLIB and all other LZ77 decompressors already KNOW what the original content length was and they don't need HTTP to tell it to them. The size of the original file is (usually) recorded in the LZ77 container itself ( for GZIP, the ISIZE field in the stream trailer ). Even 'streamed compression' ( sic: ZLIB ) will KNOW when the decompression has ended. There's an EOD signal built into the stream itself... but that doesn't mean the Server will know what the decompressor 'knows'. Which brings us to your 'action items', methinks.

If you are using 'streamed compression' ( sic: ZLIB ) then there will only be 2 ways that the Server knows how many bytes the Client is actually SENDING...

1. The Content-Length in the request header is, in fact, the transfer length, and the Server will stop reading the upload when that length is reached and won't 'timeout' waiting for more data that never arrives.
2. The Client is using HTTP/1.1 and "Transfer-encoding:
Re: mod_deflate vs mod_gzip
In a message dated 3/30/2004 8:06:52 AM Central Standard Time, [EMAIL PROTECTED] writes: Hi to all, One of my customers is trying to use an Apache 2.0.47 using mod_deflate. Its HTTP implementation works with Apache 1.3.x and mod_gzip but not with Apache 2.0.47 and mod_deflate. The PHP gzinflate and gzuncompress were used but without luck, even when skipping the 10 first chars. Any help welcome. A beer to the winner.

C'mon, Henri... you know better than that. That's not enough information for anybody to even guess at the problem. mod_deflate WORKS... there's no doubt about that. It might be missing some config stuff to please everybody but I doubt that it's actually screwing up. Can you throw us a bone here? What is this person trying to do?

Yours... Kevin Kiley
Re: mod_deflate vs mod_gzip
Hi Henri... Kevin again...

Willing to try and help, Henri... but you've got to give us something to go on here. You are asking for crystal-ball debugging. The job doesn't pay enough for that.

Peter Cranstone wrote... What about trying mod_gzip with Apache 2.x?

That would at least tell them SOMETHING. If even mod_gzip doesn't work then you can turn on mod_gzip debug and the Apache log will be filled with more than you ever wanted to know about trying to perform DCE ( Dynamic Content Encoding ) and what may or may not be going wrong.

Later... Kevin

In a message dated 3/30/2004 9:47:53 AM Central Standard Time, [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: C'mon, Henri... you know better than that. That's not enough information for anybody to even guess at the problem. mod_deflate WORKS... there's no doubt about that. It might be missing some config stuff to please everybody but I doubt that it's actually screwing up.

I know, but I need arguments and maybe some help or advice.

Can you throw us a bone here?

Well, I didn't have access to my customer's code so I could only forward the information I've got.

What is this person trying to do?

A customer which didn't allow me to give its name :) Any help is welcome, and I could try to grab part of the PHP code of my customer
Re: mod_deflate vs mod_gzip
Maybe it's also something related to transfer encoding and chunking. Perfectly possible. Stay tuned Glued to the TV at this point. Yours... Kevin In a message dated 3/30/2004 10:22:28 AM Central Standard Time, [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: Hi Henri... Kevin again... Willing to try and help, Henri... but you've got to give us something to go on here. You are asking for crystal-ball debugging. The job doesn't pay enough for that. Ok, my customer allowed me to send the PHP code to the list so it can be studied. Peter Cranstone wrote... What about trying mod_gzip with Apache 2.x? That would at least tell them SOMETHING. If even mod_gzip doesn't work then you can turn on mod_gzip debug and the Apache log will be filled with more than you ever wanted to know about trying to perform DCE ( Dynamic Content Encoding ) and what may or may not be going wrong. Maybe it's also something related to transfer encoding and chunking. Stay tuned
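For anyone hitting the same wall: the usual cause of this symptom is a mismatch between the three deflate-family framings. PHP's gzinflate() expects raw deflate (RFC 1951), gzuncompress() expects a zlib wrapper (RFC 1950), and mod_gzip/mod_deflate send gzip (RFC 1952), whose member header is at least 10 bytes... which is why "skipping the first 10 chars" is the classic (and fragile) workaround. A minimal Python sketch of a decoder that copes with all three; the function name is mine, not from the thread:

```python
import gzip
import zlib

def decode_body(body: bytes) -> bytes:
    """Decode a compressed HTTP body whichever deflate-family framing
    the server used: gzip (RFC 1952), zlib (RFC 1950), or raw deflate
    (RFC 1951)."""
    if body[:2] == b"\x1f\x8b":                        # gzip magic number
        return gzip.decompress(body)                   # header parsed for us
    try:
        return zlib.decompress(body)                   # zlib-wrapped deflate
    except zlib.error:
        return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate, no wrapper

sample = b"Hello from mod_deflate"
assert decode_body(gzip.compress(sample)) == sample    # what mod_gzip/mod_deflate emit
assert decode_body(zlib.compress(sample)) == sample    # what gzuncompress() expects
co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
assert decode_body(co.compress(sample) + co.flush()) == sample  # what gzinflate() expects
```

Sniffing the gzip magic bytes is more robust than blindly skipping 10 bytes, since a gzip header can be longer than 10 bytes when optional fields ( filename, comment, etc. ) are present.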
Re: mod_deflate - disabling per response?
Hmmm... What I'm really looking for is a response header or some such that I can set in my JSP page or servlet in Tomcat to indicate that the response should be left alone Jess Holle I assume you want to be able to add a response header from your back-end that looks something like this... X-Do-not-compress-this: Dummy_value mod_deflate has no such pickup at this time... but you can easily just add this yourself. Just take a look at ../modules/filters/mod_deflate.c At the point where 'deflate_out_filter()' kicks in you have access to both the input (request) headers and the output (response) headers so you can pretty much do whatever you want. The routine is already checking for things in both the input and output headers with (actual) statements like...

accepts = apr_table_get(r->headers_in, "Accept-Encoding");
encoding = apr_table_get(r->headers_out, "Content-Encoding");

You can just add your OWN check right there amongst the other 'checks' and shut down the filter based on whatever criteria you like. Here are the actual lines of code that will 'shut down' the compression filter if it doesn't see 'Accept-Encoding' in the input (request) header(s)...

[snip]
/* if they don't have the line, then they can't play */
accepts = apr_table_get(r->headers_in, "Accept-Encoding");
if (accepts == NULL) {
    ap_remove_output_filter(f);
    return ap_pass_brigade(f->next, bb);
}
[snip]

All you would have to do is copy these lines right underneath where they appear and make a few little changes...

[snip]
/* If a back-end server tells us not to compress this response then OBEY... */
if ( apr_table_get(r->headers_out, "X-Do-not-compress-this") ) {
    ap_remove_output_filter(f);
    return ap_pass_brigade(f->next, bb);
}
[snip]

That's it. You don't even need a scratch pointer to do this since all you would be doing is checking for the presence of the special 'X-Do-not-compress-this:' header coming from the back-end server. The 'Dummy_value' doesn't matter.
It only matters if the header itself is present. If it has an 'X-' on the front then you don't even need to worry about removing it. Just leave it there and all downstream proxies and agents SHOULD 'ignore it'. Actually... a good case could be made for leaving it there in case someone gripes about something not getting compressed when they think it should. If the header makes it all the way back to the 'griper' then there is no doubt what 'happened'... it was specifically EXCLUDED from being compressed and it was not 'an accident' or 'a bug'... it's a 'feature'. If you wanna get fancy you could do this...

[snip]
/* If a back-end server tells us not to compress this then OBEY... */
no_compress = apr_table_get(r->headers_out, "X-Do-not-compress-this");
if ( no_compress ) {
    apr_table_unset(r->headers_out, "X-Do-not-compress-this");
    ap_remove_output_filter(f);
    return ap_pass_brigade(f->next, bb);
}
[snip]

If you go this route be sure to add the 'scratch pointer' named 'no_compress' ( underscore, not hyphen... a hyphen isn't legal in a C identifier ) to the stack variables declared at the top of the function... This...

if (!ctx) {
    char *buf, *token;
    const char *encoding, *accepts;

...would need to have your 'scratch' pointer added to it like this...

if (!ctx) {
    char *buf, *token;
    const char *encoding, *accepts, *no_compress;

This line...

apr_table_unset(r->headers_out, "X-Do-not-compress-this");

...will search the output headers for all headers matching this string and will REMOVE them from the output header(s). Caveat: See notes above about good reason(s) to actually LEAVE the 'X-Do-not-compress-this:' header there if/when it is used. BTW: mod_gzip for the Apache 1.3.x series and above has always had the ability to do this sort of thing. You can always 'control' what is or isn't getting compressed with simple configuration commands like...

mod_gzip_item_exclude rspheader "X-Do-not-compress-this: *"

That single mod_gzip configuration line accomplishes the same purpose as the code shown above.
If a RESPONSE header appears ( from the back-end or wherever ) named 'X-Do-not-compress-this:' or whatever ( totally up to you and defined in the config ) and it has any kind of 'value' at all then it will match the STAR regular expression search and the response will be excluded from compression. mod_deflate will certainly ( at some point ) need better configuration-based control over the decision-making process than it currently has. I hope this has been of some help. No one else seemed to be responding so I thought I would put my 2 cents in. Yours... Kevin Kiley In a message dated 3/5/2004 1:35:35 PM Central Standard Time, [EMAIL PROTECTED] writes: Geoffrey Young wrote: Jess Holle wrote: My apologies if this is better done on the user group, but I've been reading Apache source code and trying to understand the following. Is there any way to signal mod_deflate that a particular response should not be deflated when: 1. the URL of the request is identical to other cases that
Re: consider reopening 1.3
Geez... it's nice to discover everybody hasn't just dropped dead! I see a lot of healthy 'things to do' coming out of this thread that could inject a lot of life back into the development... which is what the various threads the past few days have all been about. Action items?... Facts to face?... -- FACT?: Apache 2.0 pre-fork ( which is the only thing still available on some of the best platforms ) is SLOWER than Apache 1.3 pre-fork. -- This gives someone who might be stuck with one of those pre-fork-only platforms, or anyone who just WANTS to stick with pre-fork, absolutely NO INCENTIVE to upgrade at all ( ever! ). The whole module-rewrite thing is another issue but as long as the same process model is SLOWER in the 'new' version than the 'old' version you have a serious migration roadblock that isn't going to go away. Okay... problem identified... what to do? - Verify that it's true. ( seems to be ). You have to KNOW, yourselves, where you are on the stopwatch and not wait for users to tell you. - If it's not (true)... do some marketing and make sure people KNOW IT. - If it is... fix it. Make it NOT TRUE. I popped off and looked at 2.0 code again just now and I can tell you right now it's (still) the filtering that's killing it. The core filters are going to need to be totally optimized and kick-ass if there's any chance of 2.0 matching 1.3 speed... and that's before anybody loads any 'extra' (optional) filters ( other than core ). I don't think this is something any one person can do considering no one seems to really have the 'whole picture' of the filtering at this point and the original (primary) author is gone. I have a few suggestions but I'm not even sure I have the 'whole picture' on how to improve things. One idea, of course, is to code in some BYPASSES ( on a config switch? Dunno ) that would allow 2.0 pre-fork core filters to not actually use the filtering (scheme) at all and put it right back into 1.3 performance range.
I am by no means suggesting you bring back BUFF for the core filters... but I AM suggesting there might be ways to BYPASS all the BUCKET_BRIGADE stuff at the core level, if that's the way someone wants to run it. The moment someone starts loading a lot of 'optional' modules you'd probably have to re-engage the filtering but I'll bet you a dollar to a donut there are a LOT of people running Apache with nothing but out-of-the-box options and CORE filters only. You might even be surprised how MANY people just run it that way. I think you would see a MAJOR bump in 2.0 usage numbers if there was any way 2.0 pre-fork could be FASTER than 1.3 in a same-box same-environment scenario. You can't really fix the module-migration roadblock nearly as easily as you could FIX THIS. - FACT?: There are still some non-maintenance mode things that people have already expressed they would like to see added to 1.3.x and probably more 'ideas' lurking nearby. - I say... let it happen. Whether it was 'officially' closed or not... when someone can't get a 2 line logging patch added to 1.3 after 6 months of trying then that is CLOSURE no matter how you look at it. Make it not so. Just let it be known that 'worthy' additions to 1.3 are still WELCOME and maybe some fresh air will blow in the window. - FACT?: One of the biggest roadblocks to 2.0 migration is ( and will remain ) module migration. - No one even knows how many freakin' 'in-house' modules there are out there ( security, auditing, billing, etc. ) that people depend on for their 'business' needs but you can be sure those 'in-house' modules are what brings home their bacon and are MORE important to them than Apache itself. In a lot of cases these people are using (private) modules written FOR them by someone who is long... long gone and they really don't have the faintest idea how to get it 'migrated' to some 'new version'.
It's too late to talk about total forward-backward compatibility for 1.3 and 2.0 modules ( that opportunity was already blown at the design stage ) but it IS POSSIBLE to start a discussion about providing better 'compatibility' between modules. Example: If a simple Apache 1.3 module is already NOT using APR but simply relies on the old BUFF API wrappers... it's perfectly possible to just 'load' that module and run it under Apache 2.0. No kidding. All you have to do is have some way for 2.0 to 'fake' the old BUFF calls with a 'filter wrapper' around that (simple) module. You might even be able to do it by 'ifdef'ing the old BUFF calls with new MACRO calls but that would require a re-compile. This would require some ( a lot? ) of code in the core module loader/runner that isn't there right now but IT COULD BE DONE. If Microsoft can carry their DLLs forward through 3 major architecture changes... you can do the same. It just takes adding the right 'smarts' to the Server itself. FACT?: Whatever the target codebase... it's become nearly impossible to get patches
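The 'filter wrapper' idea above is just the classic adapter pattern: let a legacy module keep calling the old write API while a shim routes its output into the new-style chain. A toy Python sketch of the shape of it (every name here is invented for illustration; the real BUFF and filter APIs are C structs inside httpd):

```python
class FilterChain:
    """Stand-in for a 2.0-style output filter chain: data is collected
    into a brigade-like list instead of being written directly."""
    def __init__(self):
        self.brigade = []

    def pass_data(self, data: bytes):
        self.brigade.append(data)

class BuffShim:
    """Looks like the old 1.3-era BUFF object to a legacy module, but
    routes every write into the new filter chain."""
    def __init__(self, chain: FilterChain):
        self._chain = chain

    def ap_bwrite(self, data: bytes) -> int:
        # Old-API call emulated on top of the new machinery.
        self._chain.pass_data(data)
        return len(data)

def legacy_handler(buff):
    # A 1.3-style module only ever sees the BUFF-like interface and
    # never knows a filter chain exists underneath.
    buff.ap_bwrite(b"Hello ")
    buff.ap_bwrite(b"world")

chain = FilterChain()
legacy_handler(BuffShim(chain))
assert b"".join(chain.brigade) == b"Hello world"
```

The point of the sketch: the legacy handler runs unmodified, which is exactly the property that would let an old binary module 'just load' if the server grew such a wrapper layer.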
Re: consider reopening 1.3
Last benchmarks I have currently are quite old. I think the last time I ( just a USER of Apache ) did any serious benchmarking was 2.0.40 or something... but the results were right in line with what Rasmus just posted. Apache 2.0 pre-fork was a pig compared to Apache 1.3 prefork. If I get some time off from my 'real' job in the next little while I will try and get you some benchmarks... but if you read my last message you will see that it says... FACT?: - That's a QUESTION MARK there. - Verify that it's true. ( seems to be ). You have to KNOW, yourselves, where you are on the stopwatch and not wait for users to tell you. - If it's not (true)... do some marketing and make sure people KNOW IT. - If it is... fix it. Make it NOT TRUE. That's kinda part of the problem right at the moment, isn't it? Apache 2.0 has been out for almost 2 years and nobody seems to be sure WHAT the real performance numbers are??? That's just an indication of how bad the lethargy has become and/or how piss-poor the rollout follow-up on 2.0 has been. Later... Kevin In a message dated 11/17/2003 2:01:13 AM Central Standard Time, [EMAIL PROTECTED] writes: * [EMAIL PROTECTED] wrote: -- FACT?: Apache 2.0 pre-fork ( which is the only thing still available on some of the best platforms ) is SLOWER than Apache 1.3 pre-fork. -- Do you have a supporting benchmark available? Benchmarking a PHP script as Rasmus did does not express anything about the httpd. Please don't continue to distribute misleading information again and again you know nothing about. nd
Re: consider reopening 1.3
Fantastic! So Rasmus has just uncovered some 'other' problem then which means (only) mod_perl is a pig on 2.0 or something? I guess that's better than the core being the problem. I'd like to see this get put to bed once and for all and eliminate it from the 2.0 migration discussion(s). Got any real numbers? What if sendfile was added to 1.3? I wonder how it would all stack up then? Last time I checked... 'sendfile' was not available on all platforms. What would the numbers look like on those platforms? Later... Kevin In a message dated 11/17/2003 3:02:24 AM Central Standard Time, [EMAIL PROTECTED] writes: On Mon, Nov 17, 2003 at 02:05:33AM -0500, [EMAIL PROTECTED] wrote: FACT?: Apache 2.0 pre-fork ( which is the only thing still available on some of the best platforms ) is SLOWER than Apache 1.3 pre-fork. Not for me it's not. Especially with sendfile. -- Colm MacCárthaigh Public Key: [EMAIL PROTECTED]
Re: consider reopening 1.3
You are right, apache 2.0 pre-fork is slower than apache 1.3 prefork... Maybe. Maybe not. My 'FACT?:' header had a QUESTION MARK there. Just in the last 4 or 5 messages on this thread the actual reality has become even more obfuscated. Rasmus seems to be saying it's a pig... but maybe he's simply uncovered a serious I/O problem in mod_perl or something other than the core itself. Colm says the opposite. I DON'T KNOW what the REAL story is. Maybe nobody does. Nobody's bothered to find out for sure. But wouldn't it be nice to know... since this product has more of a monopoly in its target market than even Microsoft does in any of theirs? To this day ( almost 2 years after the release of 2.0 ) no one has done any serious benchmarking... not even Covalent. If they have... I've never seen it. URI? All I know is that last time I (personally) tested Apache 2.0 pre-fork against Apache 1.3 pre-fork the 'new' version was losing hands-down as compared to the 'old' version. 'sendfile' doesn't do jack-squat for you if you have a platform that doesn't even fully support it and/or when the responses are dynamic and not static 'files'... which is more and more the reality these days. Your other points are WELL TAKEN. I don't think anyone would say that there should NOT be an MPM for Apache. If you have Native threads ( Windows ) or 3rd party threads ( UNIX ) then at least you have options. If you have FreeBSD you are still kinda screwed but I'm sure SOMEONE is going to fix that. In the meantime... while all this is getting hashed out... the subject of the thread is 'consider reopening 1.3'. Whatever else is going on with 2.0... I say +1 to that. Personally... I've always wondered how fast 1.3 could be with full 'sendfile'. Later... Kevin In a message dated 11/17/2003 4:09:37 AM Central Standard Time, [EMAIL PROTECTED] writes: You are right, apache 2.0 pre-fork is slower than apache 1.3 prefork... But one nice feature of apache 2.0 is to provide other, more powerful MPMs. The worker mpm is faster than apache 1.3.
If you look at the web server benchmarks, you will see that all servers are now providing threaded architectures because it's more stable and faster. Did you see all the speed benchmarks? For example IIS 6 is really faster than apache 1.3 and a bit faster than the threaded mpm of apache 2.0. Apache 2.0 is able to run on old platforms not using a threaded mpm and also to run "faster" on the latest platforms using threads... I also read here that companies prefer to buy a load balancer rather than take a faster web server... that's right for 50% of them, but not for the other 50% who want load balancer + webserver performance. And what about companies not able to buy a load balancer, running a webserver for many services (i think of all these companies selling packaged and hosted webservers) and using a basic linux distribution, or a BSD distrib? I think they would be happy to use the real power of threads rather than stay on Apache 1.3. (which one of them would not want to announce more power, and more security?). Apache is now fighting against webservers providing powerful features _ONLY_ based on threaded architectures. And i think in the future it will be based more and more on this; if not, why are all the unix kernels now providing thread libs and functions that are more and more powerful??? What about all these new features in Apache 2.0 which are oriented (IMHO) toward web applications, what about their security? the entire and easy control of input and output data due to filters? Apache 1.3 is working fine, yes... but did you see all the new features provided by other webservers, based on apps, apps security, and apps speed? Apache 2.0 is designed to give a solid answer to this, on all actual and future needs. It's just asking to become more and more stable. For me, Apache 1.3 will become more and more stopped, to give place to a more powerful solution, able to run the latest applications and features.
(I have been benchmarking the speed of Apache 2.0 for a year and a half now, with a special and dedicated request injector architecture able to simulate connections, hits, auth etc... And yes, the worker mpm is faster than apache 1.3.) In Apache 2.0 I trust :p regards, Matthieu [EMAIL PROTECTED] wrote: Last benchmarks I have currently are quite old. I think the last time I ( just a USER of Apache ) did any serious benchmarking was 2.0.40 or something... but the results were right in line with what Rasmus just posted. Apache 2.0 pre-fork was a pig compared to Apache 1.3 prefork. If I get some time off from my 'real' job in the next little while I will try and get you some benchmarks... but if you read my last message you will see that it says... FACT?: - That's a QUESTION MARK there. - Verify that it's true. ( seems to be ). You have to KNOW, yourselves, where you are on the stopwatch and not wait for users to tell you. - If it's not (true)... do some marketing and make sure people KNOW IT. - If it is... fix it. Make it NOT
Re: consider reopening 1.3
Hi Colm... Slainte!... Cead mile failte romhat! Go raibh maith agat! Wow... I believe everything you are saying... and please don't take this the wrong way... but I'm not sure a test that only runs for 1.1 seconds and 1000 requests with 100 clients being launched ( on the same machine? ) is a good way to get accurate results, especially in the TPS ( Transactions Per Second ) numbers. The rounding errors alone could be huge with so little time on the clock. Try the same test for a reasonable amount of TIME and see if it's any different. Rasmus' recent benchmark shows the EXACT OPPOSITE and I think you have certainly just proved that something is seriously wrong with THAT test... but I'm not sure yours is the end-all be-all proof either. When I get the chance... I'll run the same ( 6 hour ) benchmark against 2.0.47 that I did sometime back for 2.0.40 and see what that says. At that time... I did the same 'out of the box' thing you just did and got the EXACT OPPOSITE results. Apache 2.0 prefork was about twice as SLOW as 1.3 prefork... not twice as fast. Man, this is confusing. ( Pionta Guinness le do thoil... fer sure ) Later... ( Slainte )... Kevin In a message dated 11/17/2003 4:17:57 AM Central Standard Time, [EMAIL PROTECTED] writes: On Mon, Nov 17, 2003 at 04:40:02AM -0500, [EMAIL PROTECTED] wrote: Got any real numbers?
Completely unconfigured, out of the box configs;

Apache 1.3.29:

Concurrency Level:      100
Time taken for tests:   2.054841 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      1883090 bytes
HTML transferred:       1466192 bytes
Requests per second:    486.66 [#/sec] (mean)
Time per request:       205.484 [ms] (mean)
Time per request:       2.055 [ms] (mean, across all concurrent requests)
Transfer rate:          894.47 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.8      0      13
Processing:    17  193  38.7    196     297
Waiting:       17  192  38.7    196     296
Total:         23  194  37.8    196     297

Percentage of the requests served within a certain time (ms)
  50%    196
  66%    201
  75%    206
  80%    209
  90%    223
  95%    252
  98%    272
  99%    280
 100%    297 (longest request)

Apache 2.0.48 (using prefork):

Concurrency Level:      100
Time taken for tests:   1.110512 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      1909712 bytes
HTML transferred:       1460368 bytes
Requests per second:    900.49 [#/sec] (mean)
Time per request:       111.051 [ms] (mean)
Time per request:       1.111 [ms] (mean, across all concurrent requests)
Transfer rate:          1678.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3   3.2      3      13
Processing:    21  100  14.1    103     142
Waiting:        3   96  14.2    100     142
Total:         32  103  11.9    106     143

Percentage of the requests served within a certain time (ms)
  50%    106
  66%    107
  75%    107
  80%    108
  90%    109
  95%    110
  98%    114
  99%    123
 100%    143 (longest request)

That's completely unconfigured. Apache 2 is *much* more configurable, and it's possible to make things much faster. It's not just faster, it's capable of scaling much better. Two weeks ago I was sustaining 8,000 simultaneous connections and throwing out about 300 Meg of traffic. On the same hardware, 1.3 would never get close to that. I used to get problems when dealing with a few hundred connections. And before you ask; pre-fork. Btw, it's not as if you can just disregard the other mpm's.
They exist, and they are much faster again than prefork. You can't just ignore progress because it doesn't work on every platform. There are other ways in which 2.0 is much better; mod_*_cache are good examples of how it's possible for admins to serve requests a lot faster, especially if they have slow disks and so on. Last time I checked... 'sendfile' was not available on all platforms. What would the numbers look like on those platforms? Much much much better. I turn off sendfile on purpose, because TCP checksum offloading problems with my Gigabit NICs mean I can't use it for IPv6. The Friday before last, I was shipping 440 Megabits of traffic from one machine using httpd. That was using pre-fork, without sendfile. The notion that 2.0 does not outperform 1.3 is laughable; it does. Now it may underperform it in some configurations, and a lot of people report that they see slowdown with dynamic content, especially php, but that's not the same thing as saying 2.0 is slower. I'm telling you, it's certainly not in my configuration. -- Colm MacCárthaigh Public Key: [EMAIL PROTECTED]
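Since the thread turns on exactly these numbers, it's worth noting that ab's summary lines follow mechanically from requests, wall time, and concurrency. A quick Python check of Colm's figures (the 1.3.29 wall time is taken as 2.054841 s, which is what the printed 486.66 req/s and 205.484 ms mean imply):

```python
def ab_derived(requests: int, seconds: float, concurrency: int):
    """Recompute ab's derived summary lines from the raw inputs."""
    rps = requests / seconds                          # Requests per second (mean)
    mean_ms = concurrency * seconds / requests * 1e3  # Time per request (mean)
    across_ms = seconds / requests * 1e3              # ...across all concurrent requests
    return round(rps, 2), round(mean_ms, 3), round(across_ms, 3)

# Apache 2.0.48 prefork: 1000 requests in 1.110512 s at concurrency 100
assert ab_derived(1000, 1.110512, 100) == (900.49, 111.051, 1.111)
# Apache 1.3.29: 1000 requests in 2.054841 s at concurrency 100
assert ab_derived(1000, 2.054841, 100) == (486.66, 205.484, 2.055)
```

This also illustrates Kevin's rounding worry: with only ~1-2 seconds on the clock, a few milliseconds of timer noise moves the req/s figure by whole percentage points.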
Re: Apache 2.0 Uptake thoughts
William Rowe wrote... ...Ignoring for a moment the 9.13% of Apache servers that don't reveal their version whatsoever, and ignoring rounding errors, 3.57% of the servers out there use some 2.0 version of Apache, so that 6% of Apache servers (identifying themselves) run 2.0 as opposed to another version. Question for ya... using the same URI... http://www.securityspace.com/s_survey/sdata/ ...your numbers are not out of line but I don't see how you got all the way to 6 percent. For ALL Servers ( Apache or otherwise )... I get 3.6 pct. using some flavor of 2.x and only 2.36 pct. of Secure Servers. Within the subset 'Servers identified as Apache'... I get 5.4 pct. for ALL Servers and 5.11 for Secure Servers using any flavor of 2.x

[snip]
Report date: November 1, 2003

* ALL SERVERS
Total Servers found: 12,220,278
Total reporting themselves any flavor of Apache: 7,979,368

Server Name     Found   Pct.
--------------- ------  ----
Apache/2.0.40   181671  1.49
Apache/2.0.47   143631  1.18
Apache/2.0.46   030733  0.25
Apache/2.0.45   028823  0.24
Apache/2.0.44   018745  0.15
Apache/2.0.43   017849  0.15
Apache/2.0.39   008280  0.07
Apache/2.0.47   002450  0.02  * - Apache-AdvancedExtranetServer/2.0.47
Apache/2.0.42   002117  0.02
Apache/2.0.44   001859  0.02  * - Apache-AdvancedExtranetServer/2.0.44
Apache/2.0.36   001360  0.01
--------------- ------  ----
                437518  3.60

3.6 pct. of ALL Servers are using any flavor of Apache 2.x. 5.4 pct. of all Apache servers found are using any flavor of Apache 2.x.

* SECURE SERVERS
Total Secure Servers found: 154,477
Total reporting themselves any flavor of Apache: 71,541

Server Name     Found   Pct.
--------------- ------  ----
Apache/2.0.40   001627  1.05
Apache/2.0.47   000770  0.50
Apache/2.0.46   000306  0.20
Apache/2.0.43   000257  0.17
Apache/2.0.45   000248  0.16
Apache/2.0.44   000198  0.13
Apache/2.0.39   000161  0.10
Apache/2.0.47   000051  0.03  * - Apache-AdvancedExtranetServer/2.0.47
Apache/2.0.42   000022  0.01
Apache/2.0.44   000017  0.01  * - Apache-AdvancedExtranetServer/2.0.44
--------------- ------  ----
                  3657  2.36

2.36 pct. of ALL Secure Servers are using any flavor of Apache 2.x. 5.11 pct.
of all Apache Secure Servers found are using any flavor of Apache 2.x. [snip] Personally, I'm pleased by a 6% uptake in a software application that doesn't have to change till someone needs the new features, given that we continue to provide the security patches people need for their existing 1.3 infrastructure. Well... then I wonder what the percentage of folks is that have NEVER needed the 'new features' and what it will take to EVER get them to upgrade if they haven't already done so? That's obviously ( after almost 2 years of waiting to find out ) the majority of users, by far... and may remain so until... ... forever ??? Don't know. Of course it will only grow higher if folks trust 2.0 and can get their problems solved, which I hope the current dialog in [EMAIL PROTECTED] will help address. Got any comments back in the other thread about any of the following 'suggestions'? - Close 1.3 to ALL patches ( security included ) and finally put the nails in the coffin lid. - Re-open 1.3 for additions, changes, new things, since it's obvious ( by now ) that the majority of Apache users don't even need/want 2.x. - Maintain active development on ALL versions of Apache. Maybe the simple reason a lot of people haven't bothered to go anywhere near Apache 2.0 is that they simply don't realize that 1.3.x is 'dead man walking' as far as this devlist is concerned. If they embrace that horror... maybe the 2.x numbers will JUMP. I don't know... an 'announcement' or something that makes it CLEAR to the average 1.3.x joe that he's now using 'obsolete/unsupported' software? Yours... Kevin Kiley
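The small gap between Rowe's 3.57%, the survey column's 3.60, and a direct division is just accumulated rounding across the per-version rows. A quick Python check against the totals quoted above:

```python
# Totals from the November 1, 2003 securityspace.com survey quoted above.
total_servers = 12_220_278   # all servers found
apache_servers = 7_979_368   # servers identifying as any Apache flavor
found = [181671, 143631, 30733, 28823, 18745, 17849,
         8280, 2450, 2117, 1859, 1360]   # per-version Apache/2.0.x counts
apache2_found = sum(found)
assert apache2_found == 437_518          # matches the column total

share_of_all = apache2_found / total_servers * 100
share_of_apache = apache2_found / apache_servers * 100

# Direct division gives 3.58% of all servers (the column of pre-rounded
# row percentages sums to 3.60 -- rounding drift, nothing more) and
# ~5.48% of Apache-identified servers, quoted as "5.4 pct." above.
assert round(share_of_all, 2) == 3.58
assert round(share_of_apache, 2) == 5.48
```

So both Rowe's "3.57%" and Kevin's "3.6 pct." are defensible readings of the same data; the disagreement is entirely in when the rounding happens.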
Re: the wheel of httpd-dev life is surely slowing down, solutions please
In a message dated 11/13/2003 12:53:42 PM Central Standard Time, [EMAIL PROTECTED] writes: By the by... Covalent signs my paycheck. And if you look at 1.3, you'll see that I've been pretty key on staying on top of it. Kind of blows away your theory, don't it? Nope
Re: the wheel of httpd-dev life is surely slowing down, solutions please
Hi Bill... This is Kevin... William Rowe wrote... We value individual contributions here, not corporate affiliation. We means ASF, right? If so... then I think you just nailed the whole point of this thread, if I am reading the original poster's concerns correctly. There doesn't CURRENTLY seem to be much evidence that what you just said is TRUE. 'Individual' attempts to contribute are getting IGNORED and the last few words of this message thread's subject line are just asking to hear from the powers that be what they intend to do about that ( solutions please ? ). Requiring people to post and re-post and re-post patches until they are blue in the face and/or need to start shouting from mountaintops ( or find a 'backdoor' or 'inside track' into the 'real inner circle' like the last guy finally did ) is no way to 'walk the walk' that you just talked ( We value individual contributions ). Prove it. Make it EASY to contribute... not a nightmare. I think that's all the guy is asking for. ( and before you come back with the standard Are YOU willing to review patches, then? at least I'll be honest and say there's NO WAY right now or for the foreseeable future. Unlike you... I am NOT paid to work on Apache and I just don't have the time. Besides, I'm also about 100% sure you don't want me reviewing patches for Apache. Newcomers can go read some archived messages if they are curious about the history there. ) Nobody has ever gotten a 'pass' into the Apache HTTP Server for their employment or their employers efforts. Methinks thou dost protest too much. I never suggested any such thing. I never came near saying that Covalent or any of its myriad Apache developers were getting 'free passes' to anything. It still takes votes to do things at Apache and even though there are times when the vote count includes mostly Covalent people ( like when Apache 2.0 was released too early ) there is always the VETO as a safety catch.
It was never exercised because as far as anyone knows nothing untoward was going on. Apache is still a 'meritocracy'... but isn't that part of what the guy who started this thread is complaining about? He tried for 4 months to even get a leg up on the bottom rung of the 'meritocracy' and no one gave a shit. I would say that's a short-circuit to the whole scheme itself. The powers that be stay the powers that be unless 'new' people are being given the chance to show what THEY can do. The fact that folks, such as Brad and Madhu, are committers and PMC members is a result of their personal dedication to the project and that the project is proud to count them as members, regardless of current employment status. Then that means there was a time when they first tried to post a patch as well. What was their experience? Good, bad, or ugly? How did they get a 'leg up' in the meritocracy? I'm not asking these questions for myself... It's no mystery to me and I know exactly how it's done and how many rungs there are on that ladder. I am asking the questions on behalf of the original poster and I think the answers might fill out the thread subject. If you look at what has REALLY happened in the past 3 years ( yes... going back that far since it's now 4 or 5 years since 2.0 became a real blip on the radar ) there's no question that there was this intense period of development and 'new' things were happening at a fast rate. Without a doubt this period of development was abnormally intense for any five year old open source project. Good point. ...and let's hope anyone that's even paying attention to this thread isn't expecting Apache development to always be that way. Sometimes ( like now ) it just becomes 'dog work' and the flash and glitter is hard to find. There have NEVER been enough warm bodies at Apache and there never will be. I think the red flags have gone up because it's starting to appear as if NO ONE is answering the phone at all anymore. That's a problem.
As more and more developers got interested in getting 2.0 cranked out, the (limited) resources all got eaten up in the 2.0 development cycle and 1.3 development virtually shut down. It was even 'officially' shut down long before 2.0 was ready ( 1.3 officially went into maintenance mode only ). That's an interesting point. Most of my early (independent) contributions, about 600 dev hours worth, were entirely focused on making 1.3 work under Win32. I know. From some of your other comments it might appear that your memory is failing a bit, Bill, but you must remember me. I had Apache running under Windows using the (free) Borland compiler about 2 years before you appeared but the concern about making Apache for Windows REAL and going head to head with Microsoft only cranked up after that Mindspring article appeared and showed that IIS was beating the pants off of pre-fork Apache and the UNIX boys got all pissed off. That was the 2x4 that got the mule's attention. My patches to compile Apache under Windows using
Re: mod_deflate and transfer / content encoding problem
My reading of RFC 2616 is that Accept-encoding is only for content-codings. You are right. Brain fart on my part. I am still not sure how the discussion about mod_deflate has gotten anywhere near Transfer-Encoding:. mod_deflate is NOT DOING TRANSFER ENCODING. Was it you that suggested it was, or the original fellow who started the thread?

Content-encoding: gzip
Transfer-encoding: chunked

...cannot be interpreted as 'using Transfer encoding'. That would be...

Transfer-encoding: gzip, chunked

Is someone saying that's what they are actually SEEING coming out of Apache? God... I hope not. Bug-city. Clients should indicate their ability to handle transfer-codings via TE. Yep... except a Server may always ASSUME that a client can handle Transfer-encoding if it says it's HTTP 1.1 compliant. There's no need for a TE header at all. The only caveat is that you can only assume a TE encoding/decoding capability of 'chunked'. Anything other than 'chunked' has to be indicated with a TE header... you are right. Problem here is that what I said about 'not knowing' is still sort of true... it just didn't come out right. A Server still has no way of knowing if the original requestor can handle TE, or not. The TE header is a 'hop by hop' header. Might have come from the original requestor, might not. There's no way to know. And that's OK... that's all that TE: was designed for. It's all based on the NN ( Nearest Neighbor ) concept and is a property of the 'message', not the 'content'. It's just part of that strange mixture of transport AND presentation layer concepts that is modern day HTTP. Even if it shows up ( very rare ) the TE header is actually SUPPOSED to be 'removed' by the NN ( Nearest Neighbor )... [snip - From RFC 2616] 13.5.1 End-to-end and Hop-by-hop Headers For the purpose of defining the behavior of caches and non-caching proxies, we divide HTTP headers into two categories: - End-to-end headers, which are transmitted to the ultimate recipient of a request or response.
End-to-end headers in responses MUST be stored as part of a cache entry and MUST be transmitted in any response formed from a cache entry.

- Hop-by-hop headers, which are meaningful only for a single transport-level connection, and are not stored by caches or forwarded by proxies.

The following HTTP/1.1 headers are hop-by-hop headers:

- Connection
- Keep-Alive
- Proxy-Authenticate
- Proxy-Authorization
- TE
- Trailers
- Transfer-Encoding
- Upgrade

All other headers defined by HTTP/1.1 are end-to-end headers.

[snip]

"Hop-by-hop headers, which are meaningful only for a single transport-level connection, and are not stored by caches or forwarded by proxies."

The above is, of course, not what is really going on 'out there' in the real world. It's all hit or miss and you still can't be sure what's being 'forwarded' and what isn't. I have certainly seen inline proxies 'forwarding' hop-by-hop headers, and 'caches' storing them, as well. It's a jungle out there. ROFL.

If a Server really wants to be sure compressed content is being sent all the way along the response chain ( including that critical 'last mile' ) to the original requestor, then the only choice is still to just use 'Content-Encoding'... even if there is no static representation, or the page is totally dynamic and doesn't even exist until it's asked for.

ASIDE: Maybe that's where someone is getting confused about TE versus CE? When HTTP was designed the whole CONTENT concept was based on disk files and file extensions and MIME types and whatnot, but that's not how things have evolved. Content-Type is now more a 'concept' than a physical reality. There are gigabytes of Content these days that doesn't even exist until someone asks for it and is NEVER represented on disk at all. There's just no way to know if your 'last mile' is covered with TE capability, or not.
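The 13.5.1 rules quoted above are mechanical enough to sketch. Here is a minimal Python illustration ( not taken from any real proxy; the function and sample header names are mine ) of what a compliant cache or proxy is supposed to do before storing or forwarding a message:

```python
# Hop-by-hop headers per RFC 2616 13.5.1; matched case-insensitively.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def end_to_end_headers(headers):
    """Return only the end-to-end headers from a {name: value} dict --
    the ones a cache may store and a proxy may forward."""
    # RFC 2616 14.10: the Connection header may name ADDITIONAL
    # headers that are hop-by-hop for this particular connection.
    extra = set()
    conn = headers.get("Connection") or headers.get("connection")
    if conn:
        extra = {token.strip().lower() for token in conn.split(",")}
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in HOP_BY_HOP and name.lower() not in extra
    }
```

So, for example, `Content-Encoding: gzip` survives the hop while `Transfer-Encoding: chunked` and `TE: gzip` are dropped, which is exactly why CE is the only encoding a server can hope to carry end to end.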
The alternative to using the Content-Encoding: voodoo to get compressed representations of non-compressed resources all the way down to a client would be to have some sort of 'end-to-end' TE header which says 'the content was compressed because the original requestor says he wants it that way, so just pass it through'... but that ain't gonna happen anytime soon.

Content-Encoding: gzip together with Transfer-Encoding: chunked, or simply... Transfer-Encoding: gzip, chunked. It should make no difference to the 'receiver'.

Well, not if the receiver is a caching proxy...

Personally... I still don't think that matters much. I am of the opinion that it's always 'cheaper' to store compressed versions of entities and then just decompress them if you need to, which means a proxy SHOULD just go ahead and remove the chunking and stash the response regardless of whether it came in as 'Content-encoded' compression or 'Transfer-encoded' compression... but that's just me. Actually... I firmly believe any
Re: the wheel of httpd-dev life is surely slowing down, solutions please
Hi all... I just have to jump in here since the topic is fascinating... and I think there's an opportunity here to review something that has contributed to the 'slow down' at httpd-dev which no one has seemed to grasp (yet). I will call it... The Covalent Factor.

If you look at what has REALLY happened in the past 3 years ( yes... going back that far, since it's now 4 or 5 years since 2.0 became a real blip on the radar ) there's no question that there was this intense period of development and 'new' things were happening at a fast rate. As more and more developers got interested in getting 2.0 cranked out, the (limited) resources all got eaten up in the 2.0 development cycle and 1.3 development virtually shut down. It was even 'officially' shut down long before 2.0 was ready ( 1.3 officially went into maintenance-only mode ).

So now you had lots of legacy developers ( albeit... lots of weekend-warriors, too, but WWs are the heartbeat of Open Source ) who knew 1.3 very well but were now totally put out to pasture. Very few of them 'came over' to the 2.0 development. The majority of the developers for 2.0 were the 'paid to play' kind who were either being paid to work directly on Apache ( I'll get to the Covalent factor in a second ) or, at least, were being paid to do something else but no one seemed to mind if they sat there at their real jobs all day and did Apache development. They might not admit to being directly paid to work on Apache, but when that's what you are doing all day and still bringing home a paycheck, then that's exactly what that is.

The 'other' not-so-dedicated-but-certainly-interested developers felt 'shut out' of the 2.0 development cycle because it was obvious a lot of it was taking place 'off line' and nothing was being documented, so they couldn't really get a good handle on what was going on in order to make a contribution.
So they were mostly 'waiting in the wings' for a lot of the 2.0 DESIGN-level decisions and concepts to become clear so they could 'jump in'. Well... we all know that it was a looong time before some of the 2.0 design concepts became 'clear' enough for a not-paid-to-work-on-Apache person to 'jump in'.

There were basically only 2 big changes in Apache 2.0 versus 1.x...

1. MPM. Get out of the 'prefork'-only 'go fork thyself' design model.

2. ( Came later ) Add some I/O filtering and get rid of BUFF.

Well, these are both fine goals to have but they are not for the faint-of-heart ( or within the grasp of most weekend-warriors ). That brings me to The Covalent Factor.

Is anyone going to deny that at a certain point in the 2.0 development cycle the PRIMARY driving force was a group of people who ALL worked for one company? ( Covalent ). That's certainly the way it looked. ...and these guys were crankin'... I mean they were MOTIVATED, because Covalent's entire corporate life revolves around Apache, and there were actually high-level deals going on with Compaq ( and other corporate entities? ) that all revolved around getting Apache 2.0 out the door.

At one point one of the Covalent guys ( I believe it was that Ryan Bloom fellow? ) was pretty much the ONLY person who had any idea how the new 'filtering' was even SUPPOSED to work. It was quite some time before he even finished thinking it through, and it went through (too?) many re-works to even keep a good grasp on it. I remember the feeling for MONTHS was something like... let's just see what Covalent comes up with, let them put their stamp of approval on it, and then we'll see if we understand it.

...and that's pretty much how the entire SECOND primary goal of Apache 2.0 was achieved. Covalent just did it and expected everyone else to 'go along'... and they did.

The point of all this is not to FLAME anyone or any corporate entity.
No one can deny that Covalent has been crucial to the life and times ( and success ) of Apache itself, and they should be proud of the contribution(s) they have made to OSS. Covalent was started by one of the original Apache guys and some of the best developers left around here still work for Covalent. I think William Rowe still does, for example, and he's still pretty much the only human being who seems to understand the Windows version of Apache... but the legacy flood of 'covalent' email addresses showing up at httpd-dev has virtually vanished, so it's hard to tell what's what with that anymore.

There is nothing in the rules of Open Source that says you can't crank up a company that makes money off of Open Source software. Indeed... if you look at almost ALL of the major Open Source projects you will see the same sort of 'Covalent' deal going on, whereby the people who know that OSS best have found a way to get paid to work on it.

My only real point here ( and the way all this relates to the current thread ) is that maybe it's time to acknowledge that what is happening now is what will always happen to a major development project if you let too many of your
Re: mod_deflate and transfer / content encoding problem
Andre Schild wrote ( 31.10.2003 23:44:06 )...

On Fri, 31 Oct 2003, Andre Schild wrote: Please have a look at the following Mozilla bug report http://bugzilla.mozilla.org/show_bug.cgi?id=224296 It seems that mod_deflate does transfer encoding, but sets the headers as if doing content encoding. I'm not an expert in this, but that statement seems completely wrong to me. Compressing the content is a content-encoding, not a transfer-encoding. The only thing transfer-encoding is used for in HTTP is chunking. Is anyone here reading this who can answer this for sure?

Bill Stoddard replied... Compression is content-encoding not transfer-encoding. See RFC 2616 14.11, 14.41, 3.6.

Nope. Totally wrong. Read 3.6 again ( the whole thing ). 'Compression' is most certainly a valid 'Transfer-Encoding'.

[snip - From section 3.6 of RFC 2616 ]

The Internet Assigned Numbers Authority (IANA) acts as a registry for transfer-coding value tokens. Initially, the registry contains the following tokens: chunked (section 3.6.1), identity (section 3.6.2), gzip (section 3.5), compress (section 3.5), and deflate (section 3.5).

[snip]

There are actually some clients ( and servers ) out there that are RFC 2616 compliant and can handle Transfer-Encoding: gzip, chunked perfectly well. Apache cannot.

I think we should put a warning on the second recommended configuration that compressing everything can cause problems. (Especially with PDF files)

Yep. IE in particular has some really weird problems handling compressed SSL data streams containing JavaScript.

Bill

Netscape is by far the worst when it comes to not being able to handle certain 'compressed' mime types, even though it sends the all-or-nothing indicator Accept-Encoding: gzip. Netscape will even screw up a compressed .css style sheet. Opera is the 'most capable' browser in this regard... when Opera says it can Accept-Encoding: gzip it comes the closest to not being a 'liar'. It can (almost) handle anything. MSIE falls somewhere in between.
There is no browser on the planet that is telling the truth when it says Accept-Encoding: gzip when you consider that this is an ALL OR NOTHING DEVCAPS ( Device Capabilities ) indicator. Unlike the Accept: field for mime types, there isn't even any way for a user-agent to indicate WHICH mime types it will be able to 'decompress', so unless it can absolutely handle a compressed version of ANY mime type, it really has no business sending Accept-Encoding: gzip at all.

There is also no way, in current HTTP specs, for a Server to distinguish between Content-Encoding and Transfer-Encoding as far as what the client really means it can/can't do. When a User-Agent sends Accept-Encoding: a Server can only assume that it can handle the named encoding for EITHER Content-Encoding: OR Transfer-Encoding:. There's just no way to tell the difference. The same HTTP request field is supposed to be good for BOTH 'Transfer' and 'Content' encoding. Again... if a User-Agent cannot handle the named encodings for all mime types for BOTH Content-Encoding: AND Transfer-Encoding: then it has no business ever sending Accept-Encoding: in the first place. This is, of course, far from reality.

As far as a User-Agent being able to tell the difference between Content-Encoding: and Transfer-Encoding: coming from the Server, the only thing you can rely on is the response header itself. The 'compressed' BODY data itself will always be identical regardless of whether it's actually...

Content-Encoding: gzip together with Transfer-Encoding: chunked

or simply...

Transfer-Encoding: gzip, chunked

It should make no difference to the 'receiver'. The User-Agent will still be receiving a gzip compression stream with chunking bytes injected into it, and the decompression step is the same for both scenarios. You have to dechunk, then decompress. If you pass the stream data to a ZLIB decompression routine without first removing the chunking bytes, then ZLIB is going to blow sky-high.
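That 'dechunk first, then decompress' ordering is easy to demonstrate. The following Python sketch ( illustrative only; the payload and chunk size are made up ) builds a gzip stream, applies HTTP/1.1 chunked framing on top of it, and shows that a decompressor fed the raw wire bytes blows up while the dechunked stream inflates cleanly:

```python
import gzip

body = b"<html>" + b"hello world " * 200 + b"</html>"
compressed = gzip.compress(body)

def chunk(data, size=100):
    """Apply HTTP/1.1 chunked framing: hex size line, CRLF, payload,
    CRLF, terminated by a zero-size chunk."""
    out = b""
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"

def dechunk(data):
    """Remove the chunked framing, yielding the original byte stream."""
    out, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        if size == 0:
            return out
        out += data[eol + 2:eol + 2 + size]
        pos = eol + 2 + size + 2  # skip payload plus its trailing CRLF

wire = chunk(compressed)

# Wrong order: the hex chunk-size lines corrupt the gzip stream.
try:
    gzip.decompress(wire)
    blew_up = False
except OSError:  # gzip raises BadGzipFile, an OSError subclass
    blew_up = True
assert blew_up

# Right order: dechunk first, then decompress.
assert gzip.decompress(dechunk(wire)) == body
```

The bytes handed to the decompressor are identical whichever pair of headers announced them, which is the point: a mime-handler that inflates without dechunking fails either way.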
This is the reason most browsers are screwing up. They are actually passing the response stream off to an embedded mime-handler ( like an Adobe plug-in for PDF files, or some ill-coded JavaScript engine, etc. ) that has no idea how to remove the chunking bytes AND get it decompressed.

The only way for all of this to currently work 'out there' is to never rely on the Accept-Encoding: field at all, and always have a 'list' of exclusion mime-types that can be tied to 'User-Agent', since they are all different as far as what they are able to do ( or not do ). It's still a classic 'DEVCAPS' ( Device Capabilities ) issue, and the current HTTP headers just don't solve the problem(s).

You can't really trust User-Agent much. Not sure it's ever been trustworthy. A scrape-bot might be imitating Netscape request headers right down to the Accept-Encoding: field so it doesn't get blocked out, but if you actually send it compressed data it's probably going
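The 'exclusion list tied to User-Agent' idea can be sketched in a few lines. To be clear, the table contents below are hypothetical placeholders, not measured browser behavior, and the function name is mine:

```python
# Hypothetical exclusion table: user-agent tokens mapped to the mime
# types they (supposedly) mishandle when compressed. Illustrative only.
COMPRESSION_EXCLUSIONS = {
    "Mozilla/4": {"text/css", "application/x-javascript", "application/pdf"},
    "MSIE": {"application/pdf"},
}

def may_compress(user_agent, content_type, accept_encoding):
    """Decide whether to gzip a response: trust Accept-Encoding only
    after consulting the per-user-agent exclusion list."""
    if "gzip" not in (accept_encoding or "").lower():
        return False
    for ua_token, bad_types in COMPRESSION_EXCLUSIONS.items():
        if ua_token in user_agent and content_type in bad_types:
            return False
    return True
```

Under this scheme a Netscape 4 request for a stylesheet would go out uncompressed no matter what its Accept-Encoding: claims, which is the whole point of a DEVCAPS workaround.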
Re: mod_deflate -- File size lower bound needed?
[EMAIL PROTECTED] writes:

Stephen Pierzchala wrote... All: A question for discussion: should a lower bound be set in mod_deflate? I just ran a test using the Linux Documentation Project files and found that some of the files in the test group were quite small, less than 100 bytes. When mod_deflate tackled these files, I saw a file size increase of between 12 and 15 bytes. Doesn't sound like a lot, until maybe you start to add up all of those HTML error pages we all send. I open the floor to debate. :-)

while this may be easy for the cases where the filter gets passed a content-length header,

No-brainer ( if it's accurate when the filter fires up ).

it may be harder for the ones where it doesn't know the size of the page before it starts compressing..

Any filter is supposed to be able to buffer data. Harder, yes, but not impossible. The whole filtering scheme is SUPPOSED to be able to handle this, no problem. Not sure if anything has actually exercised this yet, though. Might be time to see if it really works.

The 'cheap trick' compromise would be to realize that by the time the first brigade shows up there will probably be only one of 2 scenarios...

1. The response is less than the size of the file-read and/or CGI transfer buffer size ( 4k? ) and nothing else is coming ( there is already an EOS in the first brigade ). This can be treated the same as having 'Content-Length' because you know for sure how long the full response is before you have to decide whether to fire up the compression.

2. If there's no EOS in the brigade yet, you have to assume more is coming, so now it's nut-crackin' time. If the 'minimum file size' is less than the amount of data already in the first brigade showing up then it's also a no-brainer... just pass it on without compressing. If the minimum file size is larger than what's in the first brigade... then it's time to start buffering ( if the code allows it ).
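The 12-15 byte increase Stephen reports is easy to reproduce with any gzip implementation: the gzip format alone costs an 18-byte fixed header/trailer before deflate even starts saving anything. A quick Python check ( the sample strings here are invented ):

```python
import gzip

# A ~40-byte error page: gzip's fixed header + trailer ( 18 bytes )
# plus a deflate stream that finds little to remove makes the output
# BIGGER than the input.
tiny = b"<html><body>404 Not Found</body></html>"
assert len(gzip.compress(tiny)) > len(tiny)

# A few KB of repetitive HTML shrinks dramatically -- well past the
# point where compression nearly always wins.
big = b"<tr><td>same row markup</td></tr>\n" * 200
assert len(gzip.compress(big)) < len(big) // 10
```

So a lower bound somewhere above the fixed-overhead region is cheap insurance, and anything comfortably larger is almost always worth compressing.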
I would say that just allowing a 'minimum file size' in the 2-3k byte range, and only doing the check on the first brigade, would handle most situations where anyone is worried about whether something is worth compressing. Anything over about 900 bytes is almost ALWAYS going to show some size reduction. Problem solved.

If someone wants to start setting a minimum file size of 100,000 bytes then I would suggest the 'requirement' on their end be that all responses MUST have 'Content-Length:' from the origin ( if it's not a file read ) until mod_deflate can actually 'store and forward' any size response.

I'm cool with putting in a directive, but not sure how to write the docs up to say that this is 'guidance' only and that it might be ignored.

See above. Limit it to the normal (Apache) I/O buffer read size(s) and suggest that anything above that can/should have 'Content-Length' or it's not eligible for the 'minimum size' check.

we should also put in a directive to only compress when system load is below a certain level. (but we would need an apr_get_system_load() function first .. any volunteers? )

If you go down this route, watch out for what's called 'back-flash'. You can easily get into a catch-22 at the 'threshold' rate where you are ping-ponging over/under the threshold, because currently executing ZLIB compressions will always be included in the 'system load' stat you are computing. In other words... if you don't want to compress because you think the machine is too busy, then it might only be too busy because it's already compressing. The minute you turn off compression you drop under the threshold and now you are 'thrashing' and 'ping-ponging' over/under the threshold.

You might want to always compare system load against transaction/compression task load to see if something other than normal compression activity is eating the CPU.
Low transaction count + high CPU load = something other than compression is eating the CPU, and stopping compressions won't really make much difference.

High transaction count + high CPU load + high number of compressions in progress = might be best to back off on the compressions for a moment.
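Those two rules fold into one small predicate. This is just a sketch of the heuristic: the function name and every threshold value are invented, and nothing like apr_get_system_load() exists to feed it yet.

```python
def should_back_off(transactions_per_sec, cpu_load,
                    compressions_in_progress,
                    txn_high=100, load_high=0.9, comp_high=20):
    """Only stop compressing when high load is plausibly CAUSED by
    compression. All thresholds are made-up placeholder numbers."""
    if cpu_load < load_high:
        return False          # machine isn't busy; keep compressing
    if transactions_per_sec < txn_high:
        # Low traffic + high CPU: something other than compression is
        # eating the machine, so turning compression off won't help.
        return False
    # High traffic, high CPU, many compressions running: back off.
    return compressions_in_progress > comp_high
```

Comparing load against compression activity this way is also what avoids the 'back-flash' ping-pong described above, since the decision no longer flips on raw system load alone.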
Re: mod_deflate -- File size lower bound needed?
FYI: There was a serious brain fart (mine) in the previous message... I said...

2. If there's no EOS in the brigade yet, you have to assume more is coming, so now it's nut-crackin' time. If the 'minimum file size' is less than the amount of data already in the first brigade showing up then it's also a no-brainer... just pass it on without compressing. If the minimum file size is larger than what's in the first brigade... then it's time to start buffering ( if the code allows it ).

What I meant was that if the 'minimum_file_size' is less than the amount of data already in the first brigade that shows up, then you already know that this puppy is NOT 'below the minimum' and ( based on other criteria ) is eligible for compression. At this point it is a sure-fire 'MAY' be compressed, depending on other eligibility checks.
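With the correction applied, the first-brigade logic reads something like the sketch below. The names are mine, not mod_deflate's actual API, and this is a decision table rather than real filter code:

```python
def compression_decision(first_brigade_len, has_eos, min_size):
    """Return 'pass', 'compress' or 'buffer' for the first brigade
    reaching the filter. Illustrative names only."""
    if has_eos:
        # Whole response is already here: same as having Content-Length,
        # so the minimum-size check is exact.
        return "compress" if first_brigade_len >= min_size else "pass"
    if first_brigade_len >= min_size:
        # Already past the minimum: eligible no matter what follows.
        return "compress"
    # Below the minimum with more data coming: buffer until we know.
    return "buffer"
```

The corrected case is the middle branch: data-so-far at or over the minimum means 'MAY be compressed', not 'pass it through'.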
Re: regarding EAPI
what can happen if I load a module compiled with EAPI flag into an Apache 1.3 without EAPI?? I ask for distribution of binaries and want to know if it makes no problems loading EAPI-enabled modules; or if I should

Should work *I think*. It wouldn't work the other way around, of course (you can't load a non-EAPI module into an EAPI Apache).

Not true. It's perfectly OK to load a non-EAPI module into an EAPI (SSL) compiled Apache. Just ignore the warning about "This module might crash because it's not EAPI".

As far as loading an EAPI module into a non-EAPI compiled Apache... it depends on the module. Might work, might not. If it relies on the EAPI callbacks to get its job done ( instead of just the other regular hooks ) then it probably won't work, because the non-EAPI Apache won't be making the EAPI hook calls.
Re: STATUS mailings for stable httpd 2.0
Justin wrote ( RE: Apache STATUS files )... Oh, I hate to get more email that I just delete as soon as it comes in Does anyone actually read these things though? Yes.
Re: [PATCH] mod_deflate extensions
--On Friday, November 22, 2002 12:03 PM +0100 Henri Gomez [EMAIL PROTECTED] wrote:

So we should use a copy of mod_gzip compression code in Apache 2.0. Also, as someone involved in mod_jk/jk2, I'll need gzip compress/uncompress support in Apache 2.0 for a new AJP protocol I'm working on, so having compression/uncompression in Apache 2.0 will be mandatory for me.

Justin wrote...

No, we shouldn't be including zlib code (or any variant) in our distribution. That's not our responsibility. It's also not important enough for us to further bloat our code just because an insignificant number of distributions haven't provided a good package of zlib. If it's that important, those administrators can build their own zlib or just not use any functionality requiring zlib. The point here is that no functionality is lost if zlib is missing.

I guess that's where some would disagree, and that's the point of Henri's original posting. I happen to think that a LOT of 'functionality' is 'missing' from Apache if it can't do compression/decompression 'out of the box', but that's certainly no secret since I've been voicing that opinion for some years now. It's still just one man's opinion, as is yours. Diff strokes for diff folks.

(And, if you are doing a mod_jk, zlib support should be optional not mandatory.)

OK, let's be pragmatic. Did the Apache HTTP developers agree that compression should be added to Apache 2.0 by incorporating mod_gzip compression code in Apache 2.0?

mod_deflate is already there and it uses an external zlib library, so I'm confused why we should also provide mod_gzip and/or its proprietary compression code.

No one has suggested 'also providing mod_gzip'. That's a dead issue. Apache will never be using 'mod_gzip'. The issue was whether or not it's time to think about adding some actual compression/decompression code (like ZLIB) to the source tree itself, and/or the non-ZLIB inline compression engine that mod_gzip uses, which is perfectly do-able and pretty much a no-brainer.
...and for the record... the 'compression code' in mod_gzip is NOT PROPRIETARY. It is still just simple public-domain LZ77 + Huffman. It is as 'open source' as your own product (Apache), if not even more so. ALL of the code for mod_gzip ( compression engine(s) included ) has been donated to Apache 3 different times and it's all still sitting there on your hard drives, ready for you to do whatever the heck you want with it. All of mod_gzip is also now sitting up at SOURCEFORGE, free as the wind and under the same 'do whatever you like with this' conditions.

mod_gzip is freely available, and the ASF doesn't need to distribute it (Remote Communications evangelizes it enough).

RCI has nothing to do with mod_gzip anymore. mod_gzip has all been sitting up on SOURCEFORGE for some time now.

One of the main reasons for selecting mod_deflate was that it didn't unnecessarily duplicate code. Less code is better. We don't need to repackage zlib. I have no desire for us to compete with the zlib maintainers. We have enough work as-is. -- justin

All points well taken. Your points go against the advice of the people who actually wrote ZLIB ( Dr. Mark Adler and others ) with regard to anyone who 'distributes' applications that (may) depend on it (ZLIB), but your concerns about how much the Apache Group is able to deal with are well-grounded.

Jeff Trawick has already said he thinks including the 'minimal' amount of code needed by Apache, to do what mod_deflate needs to do, in the SRC tree itself would be GOODNESS and would circumvent a lot of build problems/bug reports, but therein lies the issue itself. If it comes to a vote I would focus on that one single issue. Forget about mod_gzip's LZ77 engine or any other version of LZ77 that 'could' be used... if the Apache Group isn't even willing to add a minimal subset of ZLIB code to the SRC tree and compile/link against it, then there's no use discussing whether that extends to any 'other' compression/decompression code as well.

Yours... Kevin
Re: [PATCH] mod_deflate extensions
Henri Gomez wrote... - Put part of zlib code in Apache 2.0 source?

Jeff Trawick wrote... that is what I suspect to be the safest, easiest-to-understand way... the build would work like on Windows, where the project file for mod_deflate pulls in the right parts when building mod_deflate.so

Henri Gomez also wrote... We could grab the compression part, like mod_gzip does; mod_gzip even uses zlib includes.

If the discussion is once again open about including compression code in a base Apache source release, then let me clarify what Henri just said...

The compression engine in the original mod_gzip ( and the current one in the 1.3 series mod_gzip ) is NOT based on ZLIB at all. It is based on the ORIGINAL public-domain GZIP/LZ77 code that pre-dates ZLIB. That LZ77 code is, in many ways, much 'simpler' than ZLIB, since ZLIB became this kind of 'all things to all people' package and the I/O and API interface(s) got a lot more complicated than the original GZIP stuff. ZLIB was meant to be 'easier to deal with' at the API level, but the sacrifice was more overhead during the compression itself.

The original GZIP code was not thread-safe, so the engine in the original mod_gzip was a thread-safe rewrite that uses a Finite State Processor ( like SSL codebases ). Every compression task has a 'context handle', just like SSL does, to maintain the integrity of all the compression tasks that might be running at the same time. There are no 'globals' as there are in GZIP ( and ZLIB? ). It also got 'tweaked' to do straight pointer-based in-memory compressions, which made it a better candidate for a 'real-time' compression engine. A lot of the time-consuming things in GZIP were also sped up with better loops and code that produced better .ASM output.

So... to make a long story short... mod_gzip actually includes some original GZIP 'headers' and uses the same function names ( deflate/inflate, etc. ) as legacy GZIP and ZLIB... but those headers were not from ZLIB.
They didn't change the headers much when they created ZLIB from GZIP, but there are some subtle differences.

NOTE: The original submission of mod_gzip for Apache 2.0 used the same built-in compression engine as the 1.3 series. It was already thread-safe, so it would have been fine for 2.0. There was a LATER re-submission of mod_gzip for Apache 2.0 that followed an intense discussion about NOT using such code; it is more like mod_deflate and does, in fact, just include ZLIB headers and has the same build expectations as mod_deflate. BOTH VERSIONS are still sitting there somewhere in the Apache archives. They were both submitted some long time ( years now? ) ago.

I also refer you to the discussion thread regarding the original inclusion of mod_deflate, which contains some 'advice' posted to the Apache forum by Dr. Mark Adler ( one of the original authors of all this GZIP/ZLIB LZ77 code ). He suggested that compiling your OWN version of GZIP/ZLIB was pretty much the only 'sane' thing to do, and I agree 100%. There are actually a number of 'flaky' distributions of GZIP/ZLIB 'out there'. He ( Mark Adler ) himself pointed out that there are some 'patches' floating around for GZIP/ZLIB that never made it into ANY standard distribution, and only by applying them yourself to your own compiled OBJ/LIB could you be sure what you are actually using and what shape it is in.

Now that Apache is multi-thread capable, and there are some known issues with legacy GZIP/ZLIB distributions and multi-threading, it makes even more sense to know EXACTLY what 'version' of whatever compression code you are dealing with when the threading bug reports start coming in.

Jeff Trawick also wrote...
The reason I'm being a real stick-in-the-mud about this is because Apache users already seem to have enough trouble getting Apache to build out of the box, usually because of some screwy setup or other unexpected conditions that nobody can ever reproduce, and it is very difficult for me to accept that some quick change to turn on mod_deflate is going to do anything other than increase these problems, particularly since I committed various user errors myself getting consistent builds of mod_deflate and zlib. (Heck, I even broke the cvs executable on one machine before I found out what was going on :) )

I think you underestimate your legacy user base. Everyone already accepts the fact that building Apache is usually an absolute nightmare, especially if you are trying to get any kind of SSL going. It hasn't slowed anyone down that I can tell. They ( historically ) just stay up all night and figure it out. That's how you build Apache. If they REALLY WANT some 'feature' in their Server... they will tough it out and get it done. That's the way it's always been with Apache, so no one would be shocked if some LIB wasn't in the right place or a makefile needed tweaking.

Later... Kevin Kiley
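The per-task 'context handle' design described earlier in this message is the same shape that zlib's own streaming API offers today. As a sketch of the idea only ( Python's stdlib zlib standing in, purely as an analogy to mod_gzip's C engine ), each task owns a private compressor object, so there are no shared globals to threaten thread-safety:

```python
import gzip
import zlib

def compress_task(data):
    """Compress one task's payload with a private context.
    zlib.compressobj(wbits=31) emits a complete gzip stream; because
    every call creates its own context object, concurrent tasks share
    no global state -- the property mod_gzip's rewritten engine gets
    from its per-task 'context handles'."""
    ctx = zlib.compressobj(wbits=31)  # one context per task, like SSL
    return ctx.compress(data) + ctx.flush()

# Two 'simultaneous' tasks, each producing an independent gzip stream.
out_a = compress_task(b"task A " * 100)
out_b = compress_task(b"task B " * 100)
assert gzip.decompress(out_a) == b"task A " * 100
assert gzip.decompress(out_b) == b"task B " * 100
```

Any number of these contexts can be live at once in different threads, which is exactly the multi-threading property the legacy global-state GZIP code lacked.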
Re: [PATCH] mod_deflate extensions
Peter J. Cranstone wrote... Since when does web server throughput drop by x% factor using mod_deflate?

Jeff Trawick wrote... I don't think you need me to explain the why or the when to you.

Think again. Exactly what scenario are you assuming is supposed to be so 'obvious' that it doesn't need an explanation/discussion? There has never been a good discussion and/or presentation of real data on this topic... just a bunch of 'assumptions'... and now that compression modules have caching ability, whatever testing HAS been done needs to be done again, because perhaps any/all of the sore spots in anyone's testing can now be completely eliminated by real-time caching of compressed objects.

All of my experience with compressing Internet content in real time on Servers, with or without the caching of the compressed objects, indicates that it USUALLY, if done correctly, does nothing but INCREASE the 'throughput' of the Server. The same experience has also shown that if something ends up being much SLOWER then something is badly WRONG with the code that's doing it, and it is FIXABLE.

The assumption that YOU seem to be clinging to is that once the Server has bounced through enough APR calls to handle the transaction, with as few things showing up in STRACE as possible, the Server has done its job and the transaction is OVER ( and the CPU somehow magically free again ). This is never the case. Pie is rarely free at a truck stop.

If you dump 100,000 bytes into the I/O subsystem without taking the (few) milliseconds needed to compress it down by 70-80 percent, then SOMETHING in the CPU is still working MUCH harder than it has to. The 'data' is not GONE from the box just because the Server has made some socket calls and gone about its business. It still has to be SENT, one byte at a time, by the same CPU in the same machine. NIC cards are interrupt-driven.
Asking the I/O subsystem to constantly send 70-80 percent more data than it has to, via an interrupt-driven mechanism, is basically the most expensive thing you could ask the CPU to do. In-memory compression is NOT interrupt-driven. Compared to interrupt-driven I/O it is one of the LEAST expensive things to ask the CPU to do, on average.

Do not confuse the performance of any given standard distribution of some legacy compression library called ZLIB with whether or not, in THEORY, the real-time compression of content is able to INCREASE the throughput of the Server. ZLIB was never designed to be used as a 'real-time' compression engine. The code is VERY OLD and is still based on a streaming I/O model with heavy overhead versus direct in-memory compression. It is a FILE-based implementation of LZ77 and while it performs very well in a batch job against disk files, it still lacks some things which would qualify it as a high-performance real-time compression engine. mod_gzip does NOT use 'standard ZLIB' for this very reason. The performance was not good enough to produce consistently good throughput.

We went through this debate with mod_gzip and it doesn't hold much water. Server boxes are cheap and adding some more RAM or even a faster processor is a cheap price to pay when compared to customer satisfaction when their pages load faster.

Your 'Server boxes are cheap' comment is very telling; if I add more RAM or a faster processor we aren't talking about the same web server.

Exactly. Regardless of the fact that content compression at the origin CAN actually 'improve' the throughput of one single server ( if done correctly ), let me chime in on this point and say that if adding a little hardware, or perhaps even another ( dirt cheap these days ) Server box, is what it takes to provide a DRAMATIC improvement in the user experience then what's the gripe? If that's what it takes to provide a better experience for the USER then I agree 100% with Peter.
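For what it's worth, the '70-80 percent' reduction cited above is easy to reproduce on markup-heavy content with any gzip implementation. The sample HTML below is invented, but anything similarly repetitive behaves the same way:

```python
import gzip

# Invented but typical markup: heavily repeated tags and attributes.
html = b"<div class='row'><span class='cell'>value</span></div>\n" * 300

saved = 1 - len(gzip.compress(html)) / len(html)
assert saved > 0.7  # comfortably past the 70% mark for markup like this
```

That fraction is bytes the NIC never has to interrupt the CPU for, which is the whole throughput argument in one number.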
That is what SHOULD be the focus. Your point of view seems to indicate that you believe it's better to let your USERS have a 'worse experience' than they need to just to avoid having to beef up the Server side. I have always believed that the END USER experience should be more important than how some single piece of software 'looks' on a benchmark test. Those benchmarks that produce these holy TPS ratings are usually flawed when it comes to imitating a REAL user-experience. It's a classic argument and there have always been 2 camps... Which is more important... 1. Having a minimal amount of Server to deal with/maintain and let the users suffer more than they need to. 2. Do whatever it takes to make sure all the technology that is currently available is being put into play to provide the best USER experience possible. I have always pitched my tent in camp # 2 and I think most people that are serious about hosting Web sites circle their wagons around the same camp. But overall I agree completely that compressing content and adding more ram and/or a faster processor as appropriate is
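[Editor's note] The throughput argument above rests on compressed payloads being dramatically smaller for typical text content. A minimal sketch of the in-memory compression being described, using Python's zlib bindings for brevity (the thread itself concerns the C library; the synthetic page below is illustrative, not real data):

```python
import zlib

# Synthetic "HTML-like" payload: repetitive markup, as typical web pages are.
page = (b"<html><body>"
        + b"<p>Hello, compressible world!</p>" * 2000
        + b"</body></html>")

# One in-memory call -- no file I/O, no streaming overhead.
compressed = zlib.compress(page, 6)

ratio = 1 - len(compressed) / len(page)
print(f"{len(page)} -> {len(compressed)} bytes ({ratio:.0%} smaller)")

# Round-trip check: decompression restores the original bytes exactly.
assert zlib.decompress(compressed) == page
```

Repetitive markup like this compresses far beyond the 70-80 percent figure quoted; real pages usually land in that quoted range.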
Re: 2.0 book
Ryan Bloom wrote. It's being printed now, should be in stores in a week or two. Congratulations ( I mean it ). Interesting timing, though. That means final draft(s) went to publisher on or about the time that you initiated the release of Apache 2.0 way before it was ready for GA ( 2.0.35 ). Will the copy be available online anywhere? If not... what's the price tag for this book and will you allow portions of it to be reprinted online for users of this 'public domain' software? Yours... Kevin Kiley
Re: [STATUS] (httpd-2.0) Wed Nov 28 23:45:08 EST 2001
Hello William... This is Kevin Kiley again... See comments inline below... In a message dated 11/28/2001 10:59:26 PM Pacific Standard Time, [EMAIL PROTECTED] writes: From: [EMAIL PROTECTED] Sent: Thursday, November 29, 2001 12:30 AM In a message dated 11/28/2001 10:21:46 PM Pacific Standard Time, [EMAIL PROTECTED] writes: If you have any doubts about why sometimes submissions aren't considered for inclusion in any open source project ... well there you have it. Ya know what... I am going to stand still for this little spanking session because I am willing to admit when I have made a mistake and unlike most times before on this forum when you guys have tried to make a punching bag out of me... this time I give you permission to fire away. Ok... now on to the next question ( already asked by someone else ) Whatever happened to the 'other candidate for submission' as described by Coar? I assume that was mod_gzip itself? As described by Ken? Once again, what would he have to do with that? Nothing, really, other than the fact that the only reason I asked is that he is the one who updated the status file to read... +1 Cliff ( there's now another candidate to be evaluated ) ...and failed to actually mention what that 'other candidate' really was. I believe I missed any messages from a 'Cliff' and so I was never sure myself what that was all about since it wasn't clear in the STATUS. I have never really been sure what happened to the complete working mod_gzip for Apache 2.0 that was (freely) submitted ( after both public and private urgings by Apache developers ) If that's what Ken's note was really referring to then OK but I was personally never sure since it wasn't specific. I thought maybe this guy Cliff submitted something, too. I remember mod_gzip for Apache 2.0 was immediately hacked upon right after submission by some people ( Justin? Ian? 
Don't remember ) and they started removing features without even fully reading the source code and/or understanding what they were for ( and then put some of them back after I explained some things ) but all of that work just died out into silence and the STATUS file became the only remnant of the whole firestorm that Justin started by asking for Ian's mod_gz to be dumped into the tree ASAP. A LOT of folks on the mod_gzip forum caught the whole discussion at Apache and started asking us 'Is that other candidate really mod_gzip for 2.0 or is it something else?' and our response was always 'We do not know for sure... ask them'. And (FYI) a few people came to Apache and DID ask 'What is the status of mod_gzip version 2.0?' and no one even ack'ed their messages so we assumed it wasn't even being considered. jwoolley01/09/15 12:18:59 Modified:.STATUS Log: A chilly day in Charlottesville... Revision ChangesPath 1.294 +3 -2 httpd-2.0/STATUS [snip] @@ -117,7 +117,8 @@ and in-your-face.) This proposed change would not depricate Alias. * add mod_gz to httpd-2.0 (in modules/experimental/) - +1: Greg, Justin, Cliff, ben, Ken, Jeff + +1: Greg, Justin, ben, Ken, Jeff + 0: Cliff (there's now another candidate to be evaluated) 0: Jim (premature decision at present, IMO) -0: Doug, Ryan Kevin... I believe I've generally treated you civilly... Read the end of my previous response above. A good number of non-combatants to these 'gzip wars' are really disgusted with the language and attitude on list. Much of that has turned on your comments and hostility. If anyone really views a few 'heated exchanges' on a public forum over some specific technology issues as a 'war' then I'm sorry but I still won't apologize for being passionate about something and willing to argue/defend it. Email is a strange medium. Some people take it way too seriously, methinks. In accepting a contribution, the submitter is generally expected to support the submission, ongoing. Okay... mind blown... 
that is the exact OPPOSITE of the argument that I believe even YOU were making during the 'Please why won't you submit mod_gzip for 2.0 before we go BETA' exchanges. One of the arguments I made ( Capital I for emphasis ) was that if I was going to 'support' it I wanted to see at least one good beta of Apache 2.0 before the submission was made. Strings of arguments came right back saying 'That should NOT be your concern... if you submit mod_gzip for Apache 2.0 then WE will support it, not YOU.' Seriously... check the threads if you have time... the fact that Apache would NOT be relying on us to support it was one of the 'arm twisting' arguments that was made to try and get us to submit the code BEFORE Beta so that Ian's mod_gz wouldn't be the 'only choice'. Everyone here enjoys working on or with Apache, or they would find another server. Even
Re: [STATUS] (httpd-2.0) Wed Nov 28 23:45:08 EST 2001
In a message dated 11/29/2001 3:23:32 AM Pacific Standard Time, [EMAIL PROTECTED] writes: William A. Rowe, Jr. wrote: What is the http content-encoding value for this facility? deflate Ergo, mod_deflate. And the name change from mod_gz to mod_deflate was suggested by Roy, whom I think knows HTTP better than anyone else here.. Knowing HTTP is one thing... knowing compression formats is another. Does the output of mod_deflate have a GZIP and/or ZLIB header on it, or not? Even those 2 headers are NOT the same but that's yet another story. If it does... then it's not really 'pure deflate'. If it doesn't... then it MIGHT be pure deflate but it won't do squat in most legacy and/or modern browsers. Yours... Kevin
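[Editor's note] The header question raised above can be made concrete with zlib's `wbits` parameter, shown here in Python rather than the C API (same parameter, same semantics): positive `wbits` produces the zlib wrapper, `wbits + 16` the gzip wrapper, and negative `wbits` raw deflate with no header at all. The three framings really are different on the wire:

```python
import zlib

data = b"Hello, content-coding world! " * 10

def deflate(payload, wbits):
    co = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return co.compress(payload) + co.flush()

raw  = deflate(data, -15)      # raw DEFLATE: no header, no checksum
zfmt = deflate(data, 15)       # zlib wrapper: 2-byte header + Adler-32
gz   = deflate(data, 15 + 16)  # gzip wrapper: 10-byte header + CRC-32

print(zfmt[:2].hex())  # zlib header (typically '789c' at level 6)
print(gz[:2].hex())    # gzip magic '1f8b'

# HTTP's "deflate" content-coding officially means the zlib wrapper
# (RFC 2616 section 3.5), which is why shipping raw DEFLATE confuses
# clients, exactly as the message above warns.
assert gz[:2] == b"\x1f\x8b"
assert zfmt[0] & 0x0F == 8       # CM=8 (deflate) in the zlib header
assert zlib.decompress(raw, -15) == data
```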
Re: [STATUS] (httpd-2.0) Wed Nov 28 23:45:08 EST 2001
In a message dated 11/29/2001 3:23:27 AM Pacific Standard Time, [EMAIL PROTECTED] writes: As described by Ken? Once again, what would he have to do with that? I just happen to be the chap with the cron job that sends the current STATUS file every Wednesday. I don't maintain it; that's a shared responsibility (and one of its deficiencies, IMHO, but I've ranted about that before :-). Then even more apologies are due and are hereby given. I thought that the 'final editing' of the auto-broadcast was someone's ongoing task ( yours ). Then I have no idea who added... +1 Cliff ( there's now another candidate to be evaluated ) ...or why that comment was so ambiguous. I really did 'miss' a message somewhere and I thought some guy named Cliff ( Wooley? Doesn't say ) submitted something for consideration as well and mod_gzip was already 'off' the table. This impression was then substantiated when a few people from the mod_gzip forum who were curious about the status of the 'mod_gzip 2.0 submission' queried this forum and got absolutely no reply from anyone. Doesn't matter now... But is it too much for future reference to ask STATUS file comments to be a little more explicit? It would help others track what's really happening there at Apache. Later... Kevin
Re: [STATUS] (httpd-2.0) Wed Nov 28 23:45:08 EST 2001
In a message dated 11/28/2001 10:21:46 PM Pacific Standard Time, [EMAIL PROTECTED] writes: From: [EMAIL PROTECTED] Sent: Wednesday, November 28, 2001 11:45 PM Since when do things that have already been voted on just suddenly 'disappear' from the official Apache STATUS file(s)? As Cliff points out, modules/experimental/mod_deflate now exists. Yes, it does... and for the third time now ( and probably more to come ) I apologize to the forum and to Ken Coar for asking a stupid question. Without excessive quoting... read the logs before you go flying off the handle... Yes... I missed what happened while on vacation and I thought there was something arbitrarily removed from STATUS. My bad. http://cvs.apache.org/viewcvs.cgi/httpd-2.0/STATUS ianh01/11/28 12:14:09 Modified:.STATUS Log: deflate is in Revision ChangesPath 1.346 +1 -7 httpd-2.0/STATUS Does Ken Coar really have this kind of personal control over what does or does not go into the Apache Server and/or remains in the STATUS file? WTF does Ken Coar have to do with this, other than being the messenger of all that is good or bad [STATUS] within the Apache projects? I don't see his name on that commit message. Once again, you manage to make personal something that was, as you noted, properly voted upon, and then properly carried out. If he had removed it arbitrarily then yea... it might have got personal because I don't think any one person should have that kind of control over the STATUS file but we all know by now this is just a fuck-up on my part and the paddles are out. If you have any doubts about why sometimes submissions aren't considered for inclusion in any open source project ... well there you have it. Ya know what... I am going to stand still for this little spanking session because I am willing to admit when I have made a mistake and unlike most times before on this forum when you guys have tried to make a punching bag out of me... this time I give you permission to fire away. 
It was stupid not to realize that maybe there was a chance the damn thing finally got committed and not to go off and start reading the CVS logs. I admit it. I apologize. I missed the 'vote' or the 'commit' message or something. Ok... now on to the next question ( already asked by someone else ) Whatever happened to the 'other candidate for submission' as described by Coar? I assume that was mod_gzip itself? Didn't rise to the level of consideration? Later... Kevin
Re: [STATUS] (httpd-2.0) Wed Nov 28 23:45:08 EST 2001
In a message dated 11/28/2001 10:26:28 PM Pacific Standard Time, [EMAIL PROTECTED] writes: As you point out, vacations are rough for tracking discussions. What is the http content-encoding value for this facility? deflate Ergo, mod_deflate. 'deflate' is not GZIP, it's just PART of GZIP. Ergo... bad name choice IMHO. Browser sends... Accept-Encoding: gzip,deflate They are not actually the 'exact same thing'. If you actually send 'pure deflate' to a modern browser it will blow up even though it says it 'accepts' it. Not to worry... doesn't matter what you call it as long as it sends a GZIP header and it works. Yours... Kevin
Re: Tag time?
In a message dated 01-10-01 04:37:59 EDT, Greg Stein wrote... I have been looking and looking at the patch and does someone want to tell me where it checks for TE: which is the only way to REALLY know how the Transfer-Encoding will end? ( Blank CR/LF following CR/LF following 0 byte length, or no? ). It did not before, so it doesn't now. For the most part, this is a rearrangement of code. There is a lot of cleanup and rationalization to the code. It should be much easier to fix/extend the code now. Ah... Okay. I will stop looking for what isn't there, then. Near as I can tell it just relegates 'extra' CR/LF back to the stream as 'useless noise' instead of actually knowing for sure if it was a proper end to the Transfer-Encoding. Yup. Actually... on further inspection... turns out this is a 'good' thing. If you are not actually stopping to find out for SURE if a client is going to be ending Transfer-Encoding with a blank CR/LF then you MUST have some kind of 'noise filter'. See other messages regarding the Netscape 'out of band' CR/LF on POST potentially 'following' blank CR/LF to end trailers. Turns out that no matter what you do there may always be times when a blank CR/LF is to be considered 'line noise' and needs to be tossed by the front-end if it appears before the start of a new HTTP request. The only worst-case scenario is if you treat an 'out of nowhere' CR/LF as a valid request with no valid headers appearing before it. While it is possible to 'know' if there will be a blank CR/LF to indicate 'End of trailers' it's impossible to tell if you are really ever going to get the Netscape 'out of band' CR/LF. Might be stripped by a Proxy, might not, but it's never part of Content-Length so no way to anticipate it. Caveat: I have NOT had time to REALLY pore over this HUGE patch so I may be totally off-base... but I think this fact alone is what Ryan is trying to explain... there hasn't been enough TIME to look at this HUGE/CRITICAL patch. It *has* been reviewed. 
Given that we're in commit-then-review mode, and that two people are fine with the patch, then it can/should go in. If further problems are discovered in the patch, then they can be fixed in the repository rather than before the patch is applied. Roger that. Commit away, then. Yours... Kevin
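[Editor's note] The 'noise filter' idea discussed above can be sketched as a small chunked-body reader that, after consuming the terminating 0-length chunk and the empty trailer line, swallows any stray CR/LF pairs before the next request line. This is a hypothetical illustration in Python, not Apache's code; the function name and wire sample are invented:

```python
def read_chunked(stream: bytes):
    """Parse one chunked message body, tolerating stray CRLF afterwards."""
    body, pos = b"", 0
    while True:
        eol = stream.index(b"\r\n", pos)
        size = int(stream[pos:eol].split(b";")[0], 16)  # chunk-size [;ext]
        pos = eol + 2
        if size == 0:
            # Trailers would appear here; an empty line ends the message.
            pos = stream.index(b"\r\n", pos) + 2
            break
        body += stream[pos:pos + size]
        pos += size + 2  # skip chunk data plus its trailing CRLF
    # Noise filter: discard 'out of nowhere' CRLFs (e.g. the old Netscape
    # out-of-band CRLF on POST) rather than treating them as a request.
    while stream[pos:pos + 2] == b"\r\n":
        pos += 2
    return body, stream[pos:]

wire = (b"5\r\nHello\r\n6\r\n world\r\n0\r\n\r\n"
        + b"\r\n"                      # stray line noise between requests
        + b"GET / HTTP/1.1\r\n")
body, rest = read_chunked(wire)
print(body, rest)  # b'Hello world' b'GET / HTTP/1.1\r\n'
```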
Dr. Mark Adler on ZLIB OS_CODE
Hello all... This is Kevin Kiley. In an effort to resolve a pending issue with regards to the inclusion of code that supports dynamic IETF Content-Encoding I checked out the whole OS_CODE issue in ZLIB. If you use the OS_CODE manifest constant in whatever code you end up with in the source tree then you are automatically establishing a dependency on the ZUTIL.H header which does not normally come with standard binary distributions of ZLIB. Normally you only get ZLIB.H and ZCONF.H unless you have downloaded/installed a source-code level distribution of ZLIB. If you don't want to include ZLIB source in the Apache tree ( still recommended because of patches needed for inflate() memory leaks ) then any dependency on ZUTIL.H might still force your users to download a copy of ZLIB that they don't normally have. By substituting 0x03 for OS_CODE you remove the dependency on ZUTIL.H but what you are doing then is 'hard-coding' the OS indicator byte as 'UNIX default'. The OS code means nothing to the compressor and it is only really used by decompressors that are also required to do formatting of the decompressed data. I did some testing and discovered that all browsers tested don't really care about the OS code and the formatting will be handled by the presentation layer elements in the user-agent. So whatever codebase you end up with that uses ZLIB, if you don't include a source-level version of ZLIB in the Apache tree and you want Apache to be able to compile with most people's pre-installed ZLIB implementations just use 0x03 instead of OS_CODE and they shouldn't need ZUTIL.H. I decided to verify this with Dr. Mark Adler ( co-author of ZLIB ) just to make sure... Kevin Kiley wrote... Is the OS_CODE part of the ZLIB header ever really used? Dr. Mark Adler responded... It is only useful when the decompressed data is expected to be text, and the software after the decompressor would like some clue about how to translate end-of-line characters. 
I have no idea if any browsers make any use of it, but I doubt it. I assume that html/javascript/etc. interpreters already know how to handle different end-of-line conventions without bothering with conversion. Kevin Kiley wrote... Can the Apache guys just forget the OS_CODE or set it to 'UNIX default' like ZUTIL.H does right in the Server code and, hence, eliminate any dependency on ZUTIL.H? Dr. Mark Adler responded... I would say yes, almost certainly. mark
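[Editor's note] The substitution described above, hard-coding 0x03 (Unix) where ZUTIL.H would supply OS_CODE, amounts to writing the ten-byte gzip member header by hand. A sketch of that layout in Python (the byte positions are fixed by RFC 1952, so the C version is identical; the helper name is invented):

```python
import gzip
import struct
import zlib

def gzip_frame(data: bytes, os_code: int = 0x03) -> bytes:
    """Wrap raw DEFLATE output in a gzip member, per RFC 1952."""
    co = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no wrapper
    deflated = co.compress(data) + co.flush()
    header = struct.pack(
        "<2sBBIBB",
        b"\x1f\x8b",  # ID1/ID2: gzip magic
        8,            # CM: deflate
        0,            # FLG: no extra fields, no name, no comment
        0,            # MTIME: unset
        0,            # XFL
        os_code,      # OS byte at offset 9 -- 0x03 means "Unix"
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + deflated + trailer

frame = gzip_frame(b"hello, gzip header")
assert frame[9] == 0x03  # the hard-coded OS code, no ZUTIL.H needed
assert gzip.decompress(frame) == b"hello, gzip header"
```

The final assertion reflects Adler's point: decoders reconstruct the data without ever consulting the OS byte.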
Re: [SUBMIT] mod_gzip 2.0.26a ( Non-debug version )
In a message dated 01-09-16 15:38:37 EDT, Cliff wrote... I should have been more explicit. It's not bogus to do a conditional like the one you just displayed. I thought it was excessive to make it a whole separate function that's only used in one place. I thought it was bogus to set it to the string NULL instead of the empty string, though coincidentally a string that says NULL works out fine in the one place where this function is used. Then there's the fact that I can't think of a way that r->uri would ever be NULL. Can it really? Even when there is no r->uri, we set it to the empty string. That was that whole INTERNALLY GENERATED bogosity that we scratched our heads over for a while and then Ryan fixed it by setting it to the empty string. Under what circumstances will r->uri==NULL? That call to mod_gzip_npp() ( Null Pointer Protection ) that remains in the debug code was actually an oversight. I have never actually seen the 'r->uri' pointer cause a segfault inside any standard Apache module user-exit, hook, or filter callback. If you look at the 'debug' version of mod_gzip.c that was submitted, however, you will see that it was quite necessary to have NPP for debug mode since there are tons of places where debug is trying to print things that might very well be NULL. The Solaris version started exploding all over the place in DEBUG mode just because pointers were NULL so that's when the Null Pointer Protection stuff went in. The reason it's a function is that it used to do a whole lot of other things ( if/when something showed up as NULL ) that weren't appropriate for a macro ( it used to write a complete separate log file for things that turned up NULL during request hooks and such ). In non-debug mode... it's really all pretty irrelevant. Yours... Kevin
Re: [Fwd: Changes to Apache for Solaris2.8]
In a message dated 01-09-15 12:30:13 EDT, you write: Now they are demanding the change.. Not acked. Anyone want to take this up with them? 1. If they have already found the problem and fixed it for themselves where is the house on fire? 2. Tell them to submit a patch just like you tell everyone else and it will be reviewed for correctness. 3. They (somehow) think they need your 'permission' to make a change. Just my 2 cents. Kevin Kiley
Re: [SUBMIT] mod_gzip 2.0.26a ( Non-debug version )
In a message dated 01-09-15 15:16:16 EDT, Cliff Wooley wrote... On Sat, 15 Sep 2001 [EMAIL PROTECTED] wrote: We decided not to wait any longer for a new BETA. Attached is the current source code for mod_gzip for Apache 2.x series. It has been tested pretty heavily and seems to be working fine. There is only 1 file... mod_gzip.c. You are free to do whatever you like with this submission. Cliff Wooley wrote... Thanks Kevin. I took the liberty of applying a good dose of Apache stylistic rules (might have still missed some things, but I tried). In the heated exchange last week that was one of my specific questions ( major formatting concerns ) and the specific answer was that it didn't matter much at this point in time. If I had thought anyone still cared about where braces are I would have done that myself. I also removed some more debug stuff (the r->notes things) since none of it ever seemed to be really used. Bad move. Did you even read the code before you started butchering it or bother to visit the mod_gzip website and look at all the existing ( and tested ) Apache log analyzers for mod_gzip? The ability to add compression statistics to the Apache logs via the r->notes interface was one of the things that people needed the MOST. If you have removed the r->notes interface you have just taken a huge step backwards and made the code useless for a lot of other people's hard work writing log analysis scripts. I removed the ZLIB license since I don't think any of this code actually came from ZLIB. Cliff... now I really am confused. I really don't know what you are talking about. If a program USES ZLIB then it is SUPPOSED to have the ZLIB license. For people that get so bent out of shape about your own public license I would think you would have more respect for others' licenses. Take a deep breath, look at the code, and I suggest you put the COMPLETE LIBPNG/ZLIB license back. I believe when the rubber meets the road your own Board Members will require it to be there. 
The fact that it uses zutil.h is immaterial AFAIK. If you determine the right OS_CODE for the ZLIB/GZIP LZ77 header then you don't need zutil.h. However... just look at the code in ZUTIL.H and you will see you are better off trusting that header. It's been tested for over 8 years and it does the RIGHT thing. If I'm wrong, somebody tell me and I'll put the ZLIB license back in. See above. It really should be there if you are going to be using ZLIB. I also removed some of the verbose comments... I think it's easier to read with fewer comments in some places. Geezus... ok... whatever. You guys kill me. Some of those comments were IMPORTANT because Apache 2.0 itself isn't even finished yet and there was good information about what to expect might need to happen when it is. I took out the version numbering since it'd be hard to keep that in-sync if this were in the actual httpd-2.0 tree. It makes sense for a 3rd party module, but not for an official module. Sure... Whatever. I still think it's important that the NAME show up in the 'Server:' response field. Did you screw with that? I cleaned up a few logic things, but I tried my best not to change any functionality. Did you break it? Did you even test it to make sure? What was submitted was heavily tested and worked fine. If you broke it then I suggest putting it all back the way it was. I did, however, strip out the handling of the deprecated commands... if they're deprecated, that can be documented, but IMO there's no need to handle them just to print out an error that says this isn't supported anymore. I guess you haven't distributed many modules. If you had, then you would understand that it's easier to just warn someone they have a mistake in their config than to stop the Server cold. I guess I really don't get this urge at Apache to have such bare-bones code all the time that the users themselves are the ones who suffer. My 2 cents. 
Where I had problems with/questions about things, I put in an #error so I couldn't forget to take care of them. (They should be easily taken care of.) Because of the #error additions and the fact that I'm out of time to work on this for the day, I haven't even tried compiling this, so there's the possibility I made some stupid mistakes. I guess my only response to that is... Huh? If you are going to butcher something you could at least be a little careful about it. ( Re-compile/test as you make major changes to be sure you are not breaking everything ). I'll get back to it later no doubt, but if someone else would care to take the next turn at looking over it, I'd appreciate it. To save bandwidth, my version is here: http://www.apache.org/~jwoolley/mod_gzip.c --Cliff I submitted that code and I said you (Apache) can do whatever you like with it but I guess I didn't expect such an up-front butcher job before
Re: [SUBMIT] mod_gzip 2.0.26a ( Non-debug version )
In a message dated 01-09-15 15:44:43 EDT, Ian wrote... Comments on comments ( my 2c )... additional comments (my 2c) * Caching should be removed (there is another caching module there it should use that), failing that, maybe it should be split out to a different filter What caching are you talking about? This version isn't attempting to have a compressed object cache (yet). * functions should be static Whatever. * why are you defining your own strncmp?? Faster and guaranteed thread-safe using pointers only. * logging should be via the logging optional function Completely configurable just as it is right now. * flushing should probably flush the zlib buffer before sending it out Doesn't seem to matter. * only check if gzip is on ONCE per request. (ie if (!f->ctx)) and if you don't want it enabled remove it from the filter chain. Nope. Read the comments. In order to fully support reality you have to have cross-header field matching using regular expressions. At the time that 'insert filter' is called the response headers are not yet available. You can't make all the 'right' decisions at that point alone. Again... read the comments or read the mod_gzip forum support messages. * remove the de-chunking, you won't see this anyway Yes, you will. * remove the 'enable' flag. if the user has setoutputfilter'd it he wants it Perhaps. Safer to keep it. Why not let them 'setoutputfilter' for the whole Server and then control the actual use in a particular Server/Location with a simple 'On/Off' command? I believe this is what people expect to be able to do and it currently works that way. * the filter should be a HTTP_HEADER not a content one. Whatever. * the filter should only be run on a 'main' request (do you check this?) otherwise you will have a gzip'ed included file in a non-gziped main file. Not true. If you only run on a MAIN request you won't be able to compress certain negotiated URLs. 
When someone just asks for a directory and the name resolves to index.html it's all happening on a SUBREQUEST. See the Apache output logs after mod_gzip has been working for a while... the mod_gzip result clearly indicates if/when the request was something other than main and a simple home page request is almost always indicated as OK:SUBREQ because that's what it is. Later... Kevin
Re: [SUBMIT] mod_gzip 2.0.26a ( Non-debug version )
In a message dated 01-09-15 16:34:59 EDT, Cliff wrote... In the heated exchange last week that was one of my specific questions ( major formatting concerns ) and the specific answer was that it didn't matter much at this point in time. If I had thought anyone still cared about where braces are I would have done that myself. shrug It mattered to me. I scratched an itch. No big thing. Scratch away, then. I just got the impression that it wasn't all that important at this time. I also removed some more debug stuff (the r->notes things) since none of it ever seemed to be really used. The ability to add compression statistics to the Apache logs via the r->notes interface was one of the things that people needed the MOST. Okay, I didn't know that. It wasn't clear from the code that they were used elsewhere. I'll put them back. See the mod_gzip forum and/or home page for 'compression analysis' stuff. It's all based on the 'notes' interface. Some people like to create complete Custom logs that ONLY have the compression statistics in them using the existing Apache LogFormat/CustomLog directives and what not. I am not saying any of it has to stay. I am just giving you a 'heads up'. Whatever you end up with... I assure you that the ability to parse the stats in the logs is going to be a requirement as it became with mod_gzip 1.3.19.1a. I removed the ZLIB license since I don't think any of this code actually came from ZLIB. Cliff... now I really am confused. I really don't know what you are talking about. If a program USES ZLIB then it is SUPPOSED to have the ZLIB license. For people that get so bent out of shape about your own public license I would think you would have more respect for others' licenses. Huh? The license doesn't say that... does it? I read it several times and never got that impression. My interpretation was that if you distribute ZLIB (or a variant) itself, you must leave the license on _that_ code. 
So for example if we were distributing zutil.h, our version of zutil.h must retain the ZLIB license. Code that links with it is in a different category. I wasn't trying to disrespect the license, I just really didn't think it was supposed to be there. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. It's possible that I've misinterpreted that statement, of course... You haven't misinterpreted it. Mark and Jean-loup's LIBPNG/ZLIB license is about as 'don't care' as you get... but Mark is a friend of mine and I guess I was just saying that I think Apache SHOULD do the thing that he would 'appreciate'. If you are going to use his code then give him ( and Jean-loup ) some credit. That's all... no big whoop. However... just look at the code in ZUTIL.H and you will see you are better off trusting that header. It's been tested for over 8 years and it does the RIGHT thing. No doubt. The comments in the code submitted spelled it out. Most binary downloads of ZLIB only have the ZLIB.H and ZCONF.H header files so if you are going to require that everyone already have their own ZLIB libraries to link to ( Not recommended. See Dr. Mark Adler's comments on this forum from a few days ago ) then the odds are that they won't have ZUTIL.H and will have to suffer a ZLIB source code distribution download anyway. There are 3 ways around this... 1. Duplicate all the OS_CODE stuff from ZUTIL.H in the module itself so there is no dependency. 2. Dump the ACTUAL OS_CODE stuff from the ACTUAL ZUTIL.H right into the module. Again... no more dependency. 3. Include the latest/greatest ZLIB source in the Apache tree complete with Dr. Mark Adler's (private) patches that fix the inflate() memory leaks and not only is there no dependency on people having the 'right' ZLIB, you are sure to have the best version available at all times in your Server. 
This is 'Best bet' but is, of course, subject to a debate that will probably still need to take place. Some of those comments were IMPORTANT because Apache 2.0 itself isn't even finished yet and there was good information about what to expect might need to happen when it is. I didn't remove _all_ comments of course, not by a long shot. I just removed things that seemed really obvious and that we typically don't comment. Roger that. Whatever. Rock on. Sure... Whatever. I still think it's important that the NAME show up in the 'Server:' response field. I just ripped out that whole line. I'll put the name part back, that's certainly reasonable. It's a 'support' issue. You will discover when bug reports start coming in that people have no idea what is really running inside their Server unless it 'announces' itself on the command line or as part of the 'Server:' field. I am not saying that every module should 'name'
Re: notes table? Re: [SUBMIT] mod_gzip 2.0.26a ( Non-debug version )
In a message dated 01-09-15 17:23:07 EDT, you write: [Light comes on] Ahhh... guess I should have looked more closely at mod_log_config and I would have realized that you can configure it to write certain notes to the log file. Duh. My fault. Wasn't the consensus a while back that request_rec->notes should be removed, because the more efficient 'userdata' functions on r->pool had made the notes table obsolete? --Brian It was 'discussed' but never played out. I wouldn't say there was anything near a 'consensus' on anything. Only a few people even responded. FWIW: I think the 'notes' stuff should stay, for now anyway. Any discussion of removing it ( at this time ) is going to ignite the 'why don't we just get this Server finished first so people can at least start using it' debate. Yours... Kevin
Re: notes table? Re: [SUBMIT] mod_gzip 2.0.26a ( Non-debug version )
In a message dated 01-09-15 19:13:06 EDT, Ryan wrote... Wasn't the consensus a while back that request_rec->notes should be removed, because the more efficient 'userdata' functions on r->pool had made the notes table obsolete? --Brian It was 'discussed' but never played out. I wouldn't say there was anything near a 'consensus' on anything. Only a few people even responded. Actually, a consensus was reached. I believe that we even tried to do that work, but it isn't as easy as it should be, because it is easy to merge a table, but hard to merge a hash. Roger that. It WILL be harder. All I can say is that 'notes' still works, doesn't seem to be buggy, and comes in handy. Why not just keep it until there's more time to do the work, like after 2.0 gets out the door? There has to be some definable 'cut off' point for 2.0 changes. I think the 'userdata versus notes' discussion arose out of the new criteria which seems to have emerged ( Justin says he is now going to insist on it ) that Apache 2.0 be as FAST if not FASTER than 1.3.x before 2.0 ever sees the light of day. I thought the goal decided (long ago) was... 2.0 - Stable and released 2.1 - Address performance issues Yours... Kevin
Re: cvs commit: httpd-2.0 STATUS
In a message dated 01-09-10 10:00:09 EDT, Ryan wrote...

All I keep thinking, is that we are trying to spite RC by adding a different GZ module

Don't worry about it. Let's see if we can make a decision on what is good for the survival of Apache irrespective of what that means for RC. Peter

I agree with Peter here. The only point is to do what is right for Apache *at this time*. If everyone really agrees with Justin/Ian and others that the development tree needs a GZIP filter BEFORE the next beta and it ( for some reason ) has to become part of the core *at this moment* then just go with mod_gz. You are not going to 'spite' us... I swear. The decision has always been yours to make, not ours. We really are trying to help. We just really think that the timing is a little wrong for reasons stated.

Heck... mod_gzip doesn't even use ZLIB, so if my conversations with Mark Adler about the specifics of you guys using ZLIB if you want to don't prove that we really are just trying to help and make sure you have all the information you need to make an intelligent decision, then I don't know what would.

Yours... Kevin

The following is NOT flamebait. I swear. It is just an observation that is missing from the discussion. I am just pointing out that no one has done a really good code review of mod_gz even if the 'consensus' is to drop it into the core. I've already pointed out one or two problems ( minor ) but a new one I found is that it seems to duplicate the addition of the GZIP header at all times ( gets added twice? ) before the EOS bucket is sent down the pike. Client-side results could be unpredictable. Depends on the browser. It is also completely ignoring the FLUSH bucket when it arrives and has no way to add 'Content-length:' if/when it is necessary.
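The FLUSH/EOS concern can be modeled abstractly in C ( a toy event handler with hypothetical names, not the real Apache bucket-brigade API ): a compressing filter has to react differently to three kinds of buckets -- data ( buffer it for better compression ), FLUSH ( push everything buffered downstream immediately, keeping the stream open ), and EOS ( drain the buffer and write the trailer exactly once ).

```c
#include <stddef.h>

/* Toy model of the filter concern described above.  Hypothetical
 * names -- the real code would use apr_bucket / ap_pass_brigade. */
typedef enum { BKT_DATA, BKT_FLUSH, BKT_EOS } bucket_kind;

typedef struct {
    size_t buffered;         /* bytes held back for the compressor   */
    size_t flushed;          /* bytes pushed downstream so far       */
    int    trailers_written; /* must end up exactly 1 at stream end  */
} toy_filter;

static void filter_event(toy_filter *f, bucket_kind b, size_t len) {
    switch (b) {
    case BKT_DATA:
        f->buffered += len;          /* accumulate for better ratios */
        break;
    case BKT_FLUSH:
        /* Ignoring this bucket ( as mod_gz reportedly did ) would
         * leave clients waiting on data the generator has already
         * asked to be pushed out. */
        f->flushed += f->buffered;
        f->buffered = 0;
        break;
    case BKT_EOS:
        f->flushed += f->buffered;   /* drain whatever is left       */
        f->buffered = 0;
        f->trailers_written++;       /* trailer goes out once, here  */
        break;
    }
}
```

The invariant to check in a review is simply that the trailer counter is 1 after EOS and that a FLUSH empties the buffer without touching the trailer.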
Re: cvs commit: httpd-2.0 STATUS
In a message dated 01-09-10 12:28:55 EDT, Kevin Kiley wrote...

The following is NOT flamebait. I swear. It is just an observation that is missing from the discussion. I am just pointing out that no one has done a really good code review of mod_gz even if the 'consensus' is to drop it into the core. I've already pointed out one or two problems ( minor ) but a new one I found is that it seems to duplicate the addition of the GZIP header at all times ( gets added twice? ) before the EOS bucket is sent down the pike. Client-side results could be unpredictable. Depends on the browser. It is also completely ignoring the FLUSH bucket when it arrives and has no way to add 'Content-length:' if/when it is necessary.

Errata... I meant to say that the GZIP 'footer' containing OCL ( Original Content Length ) and CRC seems to be duplicated on each call... the HEADER only seems to go in once, as it should. Sorry about that.

Later... Kevin
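For reference, the footer described in the errata is the RFC 1952 gzip member trailer: exactly 8 bytes, written once after the final deflate block -- the CRC32 of the uncompressed data, then the uncompressed length mod 2^32, both little-endian. Emitting it on every brigade call instead of once at EOS corrupts the stream. A minimal stdlib-only sketch ( hypothetical helper names; real mod_gz/mod_gzip code would use zlib's crc32() ):

```c
#include <stdint.h>
#include <stddef.h>

/* Standard CRC-32 ( polynomial 0xEDB88320 ), bit-at-a-time.
 * Pass 0 as the initial crc; feed data incrementally if desired. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (-(crc & 1)));
    }
    return ~crc;
}

/* RFC 1952 trailer: CRC32 of the *uncompressed* data, then the
 * uncompressed length mod 2^32, each 4 bytes little-endian.
 * This must be written exactly once, when EOS is reached. */
static void gzip_trailer(uint32_t crc, uint32_t isize, unsigned char out[8]) {
    for (int i = 0; i < 4; i++) out[i]     = (unsigned char)(crc   >> (8 * i));
    for (int i = 0; i < 4; i++) out[4 + i] = (unsigned char)(isize >> (8 * i));
}
```

A browser that tolerates a duplicated trailer is being lenient; strict gzip decoders treat the extra 8 bytes as garbage or as the start of a bogus second member, which is why the client-side results were unpredictable.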
Dr. Mark Adler on ZLIB patent issues
Hello all. This is Kevin Kiley.

As promised... below is a cut from the second conversation I had with Dr. Mark Adler ( co-author of ZLIB ) this weekend regarding some of the possible legal 'patent' issues that have been raised ( Ryan, Dirk, others? ) as they might relate to using ZLIB inside of Apache to dynamically compress the presentation layer content ( E.g. IETF Content-Encoding ).

The 'summary' of what Mark had to say ( full text below ) is that he has no idea whether using ZLIB as the main engine for dynamic data compression will violate generic 'lurker' patents or not. Even Jean-loup did not specifically address these kinds of patents with regard to ZLIB. Dirk told Ryan he has some specific patent 'numbers' that he is concerned about, but he won't be back online until Tuesday, so I'm still not sure which ones Dirk was referring to or whether they are relevant.

Our own research in this area turned up a number of these generic 'dynamic compression of content' style patents, one of which is...

US5652878 - Method and apparatus for compressing data. Issued to IBM on July 29, 1997 ( Filing date: Oct 2, 1995 ).

We had lots of highly paid attorneys looking for these things and rendering opinions but, unfortunately, it wouldn't do any good for me to post our findings because ( as is the case with most IP legal work ) the opinions are not 'transferable' in a legal sense. The 'gist' of it is that they are probably not worth worrying about, since they are simply 'method' patents and the 'general use' clause always kicks in in these cases if/when there is a challenge. There is also the 'public good' clause.

We've been compressing Apache presentation layer content all over the world for over a year now and we have not received any 'legal' complaints from anyone, but the currently distributed mod_gzip does NOT contain or use ZLIB at all, so whether that matters or not... I do not know. I imagine it does NOT, since the 'lurker' patents are mostly of the generic 'method' type.
If the ASF really needs to be 'severe clear' on this, I'm afraid there is no substitute for having your own IP attorneys give you their own green light ( in writing ).

Here is all that Dr. Mark Adler had to say about the 'generic' presentation layer data delivery patent(s)...

At 8:45 PM EST - Saturday 9/8/01, Kevin Kiley and Dr. Mark Adler had the following conversation...

[snip]

Kevin Kiley asked... The Apache group doesn't appear to have any problem with the compatibility of the ZLIB/LIBPNG license with their own ( more restrictive ) ASF License, but a few of the 'patents' of ( mild ) concern at Apache are those floating around out there which are as generic as they can be and basically ( supposedly ) cover the delivery of any compressed presentation layer data via any communications interface. ( In other words... all of IETF Content-Encoding ). Ring any bells with you?

Mark Adler wrote... No. But patents are sneaky things, and I haven't spent any time looking for the lurkers out there. Jean-loup spent quite a bit of time reading patents to make sure that the zlib deflate implementation did not violate any ( and there were some that we had to skirt, where for example the level 1 compression in zlib could be faster were it not for a patent ). But he did not look for compression delivery patents. mark

[snip]

Yours... Kevin Kiley

PS: As with the previous message regarding ZLIB memory leaks, Dr. Mark Adler's verbatim comments are reprinted here on this public forum with his full permission.
Re: General Availability release qualities?
In a message dated 01-09-08 14:34:49 EDT, Justin wrote...

As most of you know (like I haven't said it enough), I'm going to be out of regular email contact for a few weeks. But, I hope this enlightens you on my perspective on what should happen before a GA is released. I look forward to seeing any replies and thoughts before I leave tomorrow (please continue this thread even if I can't reply!). How do we know when we get there if we don't know where we're going?

Not quite sure what to say on this, since you are the one that kicked the 'let's include mod_gz RIGHT NOW' football down the field last Friday. I guess all I can say is... 'Have a nice vacation'. Did you know that I was on vacation myself when you kicked that football down the field, and I had to cut it short because of that? Thanks.

Once classes start again and I get settled in my new place, I'll resume active development. -- justin

That's good to know, I guess. Thanks for sharing.

Later... Kevin
Re: [PATCH] Turn apr_table_t into a hash table
In a message dated 01-09-08 17:43:15 EDT, Ryan wrote...

I know that there aren't many modules for 2.0 today, but at some point, everybody who has a module for 1.3 will want to port it to 2.0. I can currently do that in under one hour for even complex modules. Changing API's like this after we have had 25 releases makes that harder. Ryan

It's a certainty that when the general public has enough confidence in 2.0 to start porting all the existing modules over ( mod_oas, mod_bandwidth, mod_my_module, God knows what else ), they will do so by using the 1.x.x code and just changing as little as possible... which traps all the other API calls in there. It certainly would be best to be very careful about changing any more API's at this point until 2.0 is out the door and flying under its own power. It's almost ready.

Just an opinion. Kevin Kiley