Re: Do these broken clients still exist?
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> Does someone with a high-traffic, general-interest web site want to take a look through their logs for these user-agent strings? I don't mind keeping them if they make up even 1/100 of a percent of the traffic, but it seems silly to keep these extra regexes on every single request if these clients don't exist anymore in the wild.
>
> #
> # The following directives modify normal HTTP response behavior to
> # handle known problems with browser implementations.
> #
> BrowserMatch "Mozilla/2" nokeepalive
> BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
> BrowserMatch "RealPlayer 4\.0" force-response-1.0
> BrowserMatch "Java/1\.0" force-response-1.0
> BrowserMatch "JDK/1\.0" force-response-1.0

You're lucky, we've just got our monthly statistics.

Pages: 8776121082
Visitors: 436645066 ( requests)
Different user-agent strings: 2703
Matched strings: 11
Visitors who use those browsers: 3576

clem
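For scale, the figures above can be set against the 1/100 of a percent threshold from the original question. This is a minimal sketch, assuming the 3576 matching visitors are counted against the 436645066 visitor total (the post does not say so explicitly); the variable names are illustrative.

```python
# Figures quoted in the statistics above
matched_visitors = 3576
total_visitors = 436_645_066

# "1/100 of a percent" from the original question, expressed in percent
threshold_percent = 0.01

# Share of visitors whose user agent matched one of the five patterns
share_percent = 100.0 * matched_visitors / total_visitors
below_threshold = share_percent < threshold_percent

print(f"{share_percent:.5f}% of visitors; below threshold: {below_threshold}")
```

Under that assumption the matching clients come to roughly 0.0008% of visitors, about an order of magnitude below the threshold.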
Re: Do these broken clients still exist?
On Sun, 3 Apr 2005 13:58:56 -0400 (Eastern Daylight Time), Joshua Slive wrote:
> Does someone with a high-traffic, general-interest web site want to take a look through their logs for these user-agent strings? I don't mind keeping them if they make up even 1/100 of a percent of the traffic, but it seems silly to keep these extra regexes on every single request if these clients don't exist anymore in the wild.

Regexes are pretty cheap for a 'normal' apache setup. In the initial testing of a production server (2x 3.2 GHz Xeon, 6 GB RAM) we found that, serving static pages, the overhead of processing regexes didn't become noticeable until we had 1000 rewriting rules. Even then, at least 30% of the hits on this server are cgi-scripts, so the overhead of regexes is really nothing compared to the other ways we abuse our machine.

In doing this testing I did notice that Apache's handling of regexes is pretty simplistic. Much of the time you can consolidate a large stack of regexes into a single state machine, and that could give vast (factors of hundreds or thousands) improvements in performance for handling large rule sets.

On the other hand, it doesn't really matter. The people we've inherited this server from left us several very large regexes with a few hundred pipe symbols each that match UAs of non-browser clients that we don't want using our service. The trouble is that inevitably this kind of regex starts mutating into malignant forms as people start using parens. Also, we have no documentation for the rules; on slow days I think about breaking these up into 500-1000 rules, which we could in principle comment one-by-one... This wouldn't really impact the performance of our machine under 'real' circumstances, but we could measure the impact under specialized testing.
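The trade-off described above (one documented rule per client versus one consolidated pattern) can be sketched as follows. The token list is hypothetical, since the real rules are not shown in the thread. Note also that Python's `re` module is a backtracking engine, so this only illustrates keeping the rules as a commented list and generating the single alternation from it; it does not demonstrate the state-machine speedup itself.

```python
import re

# Hypothetical unwanted-client tokens, one per line so each can carry a comment.
BLOCKED_TOKENS = [
    "BadBot/1.0",          # hypothetical: ignores robots.txt
    "SiteSucker",          # hypothetical: mirrors whole sites
    "EvilCrawler (beta)",  # hypothetical: contains parens, so it must be escaped
]

def build_combined(tokens):
    """Join the tokens into one alternation, escaping regex metacharacters."""
    return re.compile("|".join(re.escape(t) for t in tokens))

def matches_individually(tokens, user_agent):
    """Reference behaviour: test every rule separately, one after another."""
    return any(re.search(re.escape(t), user_agent) for t in tokens)

COMBINED = build_combined(BLOCKED_TOKENS)

def is_blocked(user_agent):
    """Single search against the consolidated pattern."""
    return COMBINED.search(user_agent) is not None
```

Generating the big pattern from a list keeps the per-rule comments and avoids hand-edited parentheses creeping into the alternation.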
Re: Do these broken clients still exist?
My only concern is folks who just reinstalled their OS, and then, mostly for support sites. I'd think the typical server wouldn't need to deal with these.

It's also odd to use regex for non-pattern strings, like these. All of them could be trivial strcmp's rather than the regex sledgehammer.

At 12:58 PM 4/3/2005, Joshua Slive wrote:
> I don't mind keeping them if they make up even 1/100 of a percent of the traffic, but it seems silly to keep these extra regexes on every single request if these clients don't exist anymore in the wild.
>
> BrowserMatch "Mozilla/2" nokeepalive
> BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
> BrowserMatch "RealPlayer 4\.0" force-response-1.0
> BrowserMatch "Java/1\.0" force-response-1.0
> BrowserMatch "JDK/1\.0" force-response-1.0
Re: Do these broken clients still exist?
William A. Rowe, Jr. wrote:
> My only concern is folks who just reinstalled their OS, and then, mostly for support sites. I'd think the typical server wouldn't need to deal with these.
>
> It's also odd to use regex for non-pattern strings, like these. All of them could be trivial strcmp's rather than the regex sledgehammer.
>
> At 12:58 PM 4/3/2005, Joshua Slive wrote:
>> I don't mind keeping them if they make up even 1/100 of a percent of the traffic, but it seems silly to keep these extra regexes on every single request if these clients don't exist anymore in the wild.
>>
>> BrowserMatch "Mozilla/2" nokeepalive
>> BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
>> BrowserMatch "RealPlayer 4\.0" force-response-1.0
>> BrowserMatch "Java/1\.0" force-response-1.0
>> BrowserMatch "JDK/1\.0" force-response-1.0

I probably wasn't correct when I said regex. I'd have to check the code to be sure, but I believe that mod_setenvif knows enough to interpret these as simple string matches.

So if they exist in the wild, even in a minuscule quantity, then they should stay.

Joshua.
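The idea of treating metacharacter-free patterns as simple string matches can be sketched as below. This is an illustration of the technique only, not mod_setenvif's actual code, which nobody in the thread had checked; `literal_of` and `browser_match` are hypothetical names. Because BrowserMatch patterns are unanchored, the plain-string equivalent is a substring search (like `strstr`) rather than a whole-string `strcmp`.

```python
import re

# Characters that give a pattern real regex meaning when left unescaped
REGEX_META = set(".^$*+?{}[]|()")

def literal_of(pattern):
    """Return the plain string a pattern matches, or None if it needs a regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Backslash + punctuation is an escaped literal, e.g. '\.' means '.'
            if i + 1 < len(pattern) and not pattern[i + 1].isalnum():
                out.append(pattern[i + 1])
                i += 2
                continue
            return None  # '\d', '\w', or a trailing backslash: leave to the engine
        if ch in REGEX_META:
            return None
        out.append(ch)
        i += 1
    return "".join(out)

def browser_match(pattern, user_agent):
    """Use a substring search when possible, otherwise fall back to the regex."""
    literal = literal_of(pattern)
    if literal is not None:
        return literal in user_agent
    return re.search(pattern, user_agent) is not None
```

All five patterns quoted in the thread reduce to plain strings under this check, so none of them would reach the regex fallback.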
Re: Do these broken clients still exist?
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
>...
> BrowserMatch "Mozilla/2" nokeepalive
> BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
> BrowserMatch "RealPlayer 4\.0" force-response-1.0
> BrowserMatch "Java/1\.0" force-response-1.0
> BrowserMatch "JDK/1\.0" force-response-1.0

Over a two-week period on blogspot, the hit percentages for the above:

Mozilla/2: 0.20%
MSIE 4.0b2: a single hit
RealPlayer: no hits
Java/1.0: no hits
JDK/1.0: 0.76%

I would recommend tossing ALL of them. The middle three for obvious reasons. The JDK because it is really, really insignificant.

The Mozilla/2 is a bit trickier. The problem is that the match string is *way* too loose. It is matching *way* more browsers than I bet it was intended to match (I found 201 different User-Agent strings). For example, consider this User-Agent:

Mozilla/2.0 (compatible; PlanetWeb/1.011 Golden; SEGA Saturn; TV;

Do we really know if the keepalives are broken on that client? It certainly doesn't seem to bear any resemblance to the Mozilla that I'm guessing that BrowserMatch was trying to detect. Or maybe this one:

Mozilla/2.0 (compatible; MSIE 3.0; Update a; AK; Windows 95) via proxy gateway CERN-HTTPD/3.0 libwww/2.17

Given that it appears to be proxied, then I'm betting the keepalive is totally fine.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
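The looseness described above is easy to reproduce: the unanchored pattern matches any User-Agent containing "Mozilla/2". The sketch below runs it against the two strings quoted in the message (the first exactly as it appears there, truncated). The `TIGHTER` pattern and the Netscape-style sample are assumptions added for contrast; nobody in the thread proposed them.

```python
import re

# The pattern from the stock configuration: unanchored, so it is a substring test
LOOSE = re.compile(r"Mozilla/2")

# Hypothetical tighter pattern: anchored at the start and rejecting the
# "(compatible; ..." form that other clients use to imitate Mozilla
TIGHTER = re.compile(r"^Mozilla/2\.\d+ \((?!compatible;)")

SAMPLES = [
    # Quoted in the message above (truncated there)
    "Mozilla/2.0 (compatible; PlanetWeb/1.011 Golden; SEGA Saturn; TV;",
    "Mozilla/2.0 (compatible; MSIE 3.0; Update a; AK; Windows 95) "
    "via proxy gateway CERN-HTTPD/3.0 libwww/2.17",
    # Assumed shape of a genuine Netscape 2 string, for contrast only
    "Mozilla/2.0 (Win95; I)",
]

def classify(user_agent):
    """Return (matched by loose pattern, matched by tighter pattern)."""
    return (LOOSE.search(user_agent) is not None,
            TIGHTER.search(user_agent) is not None)

for ua in SAMPLES:
    print(classify(ua), ua)
```

Whether a tighter pattern would be worth keeping at all is a separate question from the hit counts reported above.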
Re: Do these broken clients still exist?
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> BrowserMatch "Mozilla/2" nokeepalive

I don't know about the rest, but Ask Jeeves spoofs this user-agent in its webcrawls:

Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)

Not sure if the crawler supports keepalives, but either way it's a (somewhat artificial) source of requests matching the useragent.

--
Colm MacCárthaigh
Re: Do these broken clients still exist?
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> Does someone with a high-traffic, general-interest web site want to take a look through their logs for these user-agent strings? I don't mind keeping them if they make up even 1/100 of a percent of the traffic, but it seems silly to keep these extra regexes on every single request if these clients don't exist anymore in the wild.

[snip]

> BrowserMatch "Mozilla/2" nokeepalive

I still see this; .001% of ~2M requests from the following full UA strings:

Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)
Mozilla/2.0 (compatible; MS FrontPage 4.0)
Mozilla/2.0 (compatible; MS FrontPage 5.0)
Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)
Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)
Mozilla/2.0 (compatible; MSIE 3.02; Windows CE)
Mozilla/2.0 (compatible; MSIE 3.02; Windows CE; PPC; 240x320) BlackBerry7100/4.0.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
Mozilla/2.0 (compatible; MSIE 3.02; Windows CE; PPC; 240x320)
Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)
Mozilla/2.0 (compatible; MSIE 3.0b; AOL 4.0; Windows 3.1)

--n

--
huey dd of=/dev/fd0 if=/dev/flippy bs=1024
huey ^^^ Making Flippy Floppy
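A log scan of the kind reported above could look roughly like this. It is a sketch under stated assumptions: the access log is in combined format with the User-Agent as the last quoted field, and `tally` and `UA_FIELD` are illustrative names. The five patterns are the ones from the configuration under discussion.

```python
import re
from collections import Counter

# The five BrowserMatch patterns from the configuration under discussion
PATTERNS = {
    "Mozilla/2": re.compile(r"Mozilla/2"),
    "MSIE 4.0b2": re.compile(r"MSIE 4\.0b2;"),
    "RealPlayer 4.0": re.compile(r"RealPlayer 4\.0"),
    "Java/1.0": re.compile(r"Java/1\.0"),
    "JDK/1.0": re.compile(r"JDK/1\.0"),
}

# Assumption: combined log format, where the line ends with "referer" "user-agent"
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def tally(lines):
    """Count requests per pattern; returns (requests parsed, Counter of hits)."""
    hits = Counter()
    total = 0
    for line in lines:
        m = UA_FIELD.search(line)
        if m is None:
            continue  # skip lines that do not end in a quoted field
        total += 1
        user_agent = m.group(1)
        for name, pattern in PATTERNS.items():
            if pattern.search(user_agent):
                hits[name] += 1
    return total, hits

def percentages(total, hits):
    """Express each pattern's hit count as a percentage of parsed requests."""
    return {name: 100.0 * hits[name] / total for name in PATTERNS} if total else {}
```

In use, `tally(open(path))` over a week or two of logs would give the per-pattern shares; collecting the distinct matching User-Agent strings as well would show how broad each pattern is in practice.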
Re: Do these broken clients still exist?
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> Does someone with a high-traffic, general-interest web site want to take a look through their logs for these user-agent strings?

I, uh, have just a few logs that I can scan. ;-)

I'll use Blogger/BlogSpot logs, as those will have a wider variety of user agents than the main Google site. (which is fortunate, as I don't have access to the main logs anyways)

A week or two of logs should provide good coverage. You never know what times of day, or days of the week, that a given UA might be active.

> I don't mind keeping them if they make up even 1/100 of a percent of the traffic, but it seems silly to keep these extra regexes on every single request if these clients don't exist anymore in the wild.

I'll take a look at some logs tomorrow, and mail back with some results.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
Re: Do these broken clients still exist?
On Apr 3, 2005, at 7:57 PM, Greg Stein wrote:
> On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
>> Does someone with a high-traffic, general-interest web site want to take a look through their logs for these user-agent strings?
>
> I, uh, have just a few logs that I can scan. ;-)

Showoff. :-)