Re: Do these broken clients still exist?

2005-04-04 Thread Clement Laforet
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> Does someone with a high-traffic, general-interest web site want to take a
> look through their logs for these user-agent strings.  I don't mind
> keeping them if they make up even 1/100 of a percent of the traffic, but it
> seems silly to keep these extra regexes on every single request if these
> clients don't exist anymore in the wild.
>
> #
> # The following directives modify normal HTTP response behavior to
> # handle known problems with browser implementations.
> #
> BrowserMatch "Mozilla/2" nokeepalive
> BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
> BrowserMatch "RealPlayer 4\.0" force-response-1.0
> BrowserMatch "Java/1\.0" force-response-1.0
> BrowserMatch "JDK/1\.0" force-response-1.0

You're lucky, we've just got our monthly statistics.
Pages: 8776121082
Visitors: 436645066 ( requests)
Different user-agent strings: 2703
Matched strings: 11
Visitors who use those browsers: 3576
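
As a rough check of these figures against the 1/100 of a percent threshold
from the original question, here is a small sketch of the arithmetic. It
assumes that 436645066 is the right denominator (the label on that line is
incomplete, so this is a guess); the variable names are illustrative only.

```python
# Figures copied from the statistics above.
matched_visitors = 3576
total_visitors = 436_645_066   # assumption: this is the relevant total

# Threshold named in the original question: 1/100 of a percent.
threshold_percent = 0.01

share_percent = 100.0 * matched_visitors / total_visitors
below_threshold = share_percent < threshold_percent

print(f"share: {share_percent:.6f}%  below threshold: {below_threshold}")
```

Under that assumption the matched share is well under the threshold.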

clem


Re: Do these broken clients still exist?

2005-04-04 Thread Paul A. Houle
On Sun, 3 Apr 2005 13:58:56 -0400 (Eastern Daylight Time), Joshua Slive  
[EMAIL PROTECTED] wrote:

> Does someone with a high-traffic, general-interest web site want to take
> a look through their logs for these user-agent strings.  I don't mind
> keeping them if they make up even 1/100 of a percent of the traffic, but
> it seems silly to keep these extra regexes on every single request if
> these clients don't exist anymore in the wild.


Regexes are pretty cheap for a 'normal' Apache setup.
	In the initial testing of a production server (2x 3.2 GHz Xeon, 6 GB
RAM) we found that, serving static pages, the overhead of processing
regexes didn't become noticeable until we had 1000 rewriting rules.  Even
then, at least 30% of the hits on this server are CGI scripts, so the
overhead of regexes is really nothing compared to the other ways we abuse
our machine.

	In doing this testing I did notice that Apache's handling of regexes is
pretty simplistic.  Much of the time you can consolidate a large stack of
regexes into a single state machine, and that could give vast (factors of
hundreds or thousands) improvements in performance for handling large rule
sets.  On the other hand, it doesn't really matter.
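
The consolidation idea can be sketched as follows. This is only an
illustration in Python, not how Apache works: the pattern list is a stand-in
taken from the directives quoted in this thread, and the function names are
made up. Note also that Python's re module is a backtracking engine, so this
shows the "one combined pattern" shape but not the state-machine speedup; that
would need a DFA-based regex engine.

```python
import re

# Stand-in rule set; a real one might have hundreds of entries.
PATTERNS = [
    r"Mozilla/2",
    r"MSIE 4\.0b2;",
    r"RealPlayer 4\.0",
    r"Java/1\.0",
    r"JDK/1\.0",
]

# One compiled regex per rule, each tried in turn.
SEPARATE = [re.compile(p) for p in PATTERNS]

# All rules folded into a single alternation, scanned in one pass.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))


def matches_any_separate(user_agent):
    """Try every rule one after another."""
    return any(rx.search(user_agent) for rx in SEPARATE)


def matches_any_combined(user_agent):
    """Try the single consolidated pattern."""
    return COMBINED.search(user_agent) is not None
```

Both functions answer the same yes/no question; the combined form loses the
information about which rule matched unless named groups are added.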

	The people we've inherited this server from left us several very large
regexes with a few hundred pipe symbols each that match UAs of
non-browser clients that we don't want using our service.  The trouble is
that inevitably this kind of regex starts mutating into malignant forms as
people start using parens.  Also, we have no documentation for the rules;
on slow days I think about breaking these up into 500-1000 rules, which
we could in principle comment one by one...  This wouldn't really impact
the performance of our machine under 'real' circumstances, but we could
measure the impact under specialized testing.


Re: Do these broken clients still exist?

2005-04-04 Thread William A. Rowe, Jr.
My only concern is folks who just reinstalled their OS, and
then, mostly for support sites.  I'd think the typical server
wouldn't need to deal with these.  It's also odd to use regex
for non-pattern strings, like these.  All of them could be
trivial strcmp's rather than the regex sledgehammer.
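
A minimal sketch of the plain-string alternative, in Python rather than C.
Because BrowserMatch patterns are unanchored, the closest plain-string
equivalent is a substring search (strstr-style) rather than a whole-string
comparison. The table and the function name env_for are illustrative only;
the literals and variable names come from the directives quoted below.

```python
# Literal substrings and the variables the quoted directives would set.
RULES = [
    ("Mozilla/2", ["nokeepalive"]),
    ("MSIE 4.0b2;", ["nokeepalive", "downgrade-1.0", "force-response-1.0"]),
    ("RealPlayer 4.0", ["force-response-1.0"]),
    ("Java/1.0", ["force-response-1.0"]),
    ("JDK/1.0", ["force-response-1.0"]),
]


def env_for(user_agent):
    """Return the set of variables whose literal appears anywhere in the UA."""
    env = set()
    for literal, names in RULES:
        if literal in user_agent:  # substring search, no regex engine involved
            env.update(names)
    return env
```

For example, env_for("Java/1.0") yields only force-response-1.0, and a
User-Agent containing none of the literals yields an empty set.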

At 12:58 PM 4/3/2005, Joshua Slive wrote:
> I don't mind keeping them if they make up even 1/100 of a percent of the
> traffic, but it seems silly to keep these extra regexes on every single request
> if these clients don't exist anymore in the wild.
>
> BrowserMatch "Mozilla/2" nokeepalive
> BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
> BrowserMatch "RealPlayer 4\.0" force-response-1.0
> BrowserMatch "Java/1\.0" force-response-1.0
> BrowserMatch "JDK/1\.0" force-response-1.0



Re: Do these broken clients still exist?

2005-04-04 Thread Joshua Slive

William A. Rowe, Jr. wrote:
> My only concern is folks who just reinstalled their OS, and
> then, mostly for support sites.  I'd think the typical server
> wouldn't need to deal with these.  It's also odd to use regex
> for non-pattern strings, like these.  All of them could be
> trivial strcmp's rather than the regex sledgehammer.
>
> At 12:58 PM 4/3/2005, Joshua Slive wrote:
> > I don't mind keeping them if they make up even 1/100 of a percent of the
> > traffic, but it seems silly to keep these extra regexes on every single request
> > if these clients don't exist anymore in the wild.
> >
> > BrowserMatch "Mozilla/2" nokeepalive
> > BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
> > BrowserMatch "RealPlayer 4\.0" force-response-1.0
> > BrowserMatch "Java/1\.0" force-response-1.0
> > BrowserMatch "JDK/1\.0" force-response-1.0

I probably wasn't correct when I said regex.  I'd have to check the
code to be sure, but I believe that mod_setenvif knows enough to
interpret these as simple string matches.

So if they exist in the wild, even in a minuscule quantity, then they
should stay.

Joshua.


Re: Do these broken clients still exist?

2005-04-04 Thread Greg Stein
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
...
> BrowserMatch "Mozilla/2" nokeepalive
> BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
> BrowserMatch "RealPlayer 4\.0" force-response-1.0
> BrowserMatch "Java/1\.0" force-response-1.0
> BrowserMatch "JDK/1\.0" force-response-1.0

Over a two-week period on BlogSpot, the hit percentages for the above:

  Mozilla/2: 0.20%
  MSIE 4.0b2: a single hit
  RealPlayer: no hits
  Java/1.0: no hits
  JDK/1.0: 0.76%

I would recommend tossing ALL of them. The middle three for obvious
reasons. The JDK because it is really, really insignificant.

The Mozilla/2 is a bit trickier. The problem is that the match string is
*way* too loose. It is matching *way* more browsers than I bet it was
intended to match (I found 201 different User-Agent strings). For example,
consider this User-Agent:

  Mozilla/2.0 (compatible; PlanetWeb/1.011 Golden; SEGA Saturn; TV;

Do we really know if the keepalives are broken on that client? It
certainly doesn't seem to bear any resemblance to the Mozilla that I'm
guessing that BrowserMatch was trying to detect. Or maybe this one:

  Mozilla/2.0 (compatible; MSIE 3.0; Update a; AK; Windows 95) via proxy 
gateway CERN-HTTPD/3.0 libwww/2.17

Given that it appears to be proxied, then I'm betting the keepalive is
totally fine.
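
To illustrate the looseness, here is a small Python sketch comparing the
existing pattern with a tighter one. The tighter pattern is purely
hypothetical (it simply refuses anything that announces itself as
"compatible"), and the third sample User-Agent is an assumed example of a
native Mozilla 2.x string, not one taken from the logs.

```python
import re

# The pattern as used by the current directive: unanchored, very loose.
LOOSE = re.compile(r"Mozilla/2")

# Hypothetical tighter pattern: must start with Mozilla/2.x and must not be
# a "(compatible; ...)" clone.
TIGHT = re.compile(r"^Mozilla/2\.\S* \((?!compatible;)")

SAMPLES = [
    "Mozilla/2.0 (compatible; PlanetWeb/1.011 Golden; SEGA Saturn; TV;",
    "Mozilla/2.0 (compatible; MSIE 3.0; Update a; AK; Windows 95) via proxy "
    "gateway CERN-HTTPD/3.0 libwww/2.17",
    "Mozilla/2.02 (Win95; I)",  # assumed example of a native 2.x UA
]

# For each sample: (does the loose pattern match, does the tight one match)
results = [
    (LOOSE.search(ua) is not None, TIGHT.search(ua) is not None)
    for ua in SAMPLES
]
```

With these samples the loose pattern matches all three strings, while the
hypothetical tight pattern matches only the last one.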

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: Do these broken clients still exist?

2005-04-03 Thread Colm MacCarthaigh
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> BrowserMatch "Mozilla/2" nokeepalive

I don't know about the rest, but Ask Jeeves spoofs this user-agent in
its webcrawls;

Mozilla/2.0 (compatible; Ask Jeeves/Teoma;
+http://sp.ask.com/docs/about/tech_crawling.html)

Not sure if the crawler supports keepalives, but either way it's a
(somewhat artificial) source of requests matching the useragent.

-- 
Colm MacCárthaigh                        Public Key: [EMAIL PROTECTED]


Re: Do these broken clients still exist?

2005-04-03 Thread Noah
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> Does someone with a high-traffic, general-interest web site want to take a
> look through their logs for these user-agent strings.  I don't mind
> keeping them if they make up even 1/100 of a percent of the traffic, but it
> seems silly to keep these extra regexes on every single request if these
> clients don't exist anymore in the wild.

[snip]
> BrowserMatch "Mozilla/2" nokeepalive

I still see this; 0.001% of ~2M requests from the following full UA
strings:

Mozilla/2.0 (compatible; Ask Jeeves/Teoma; 
+http://sp.ask.com/docs/about/tech_crawling.html)
Mozilla/2.0 (compatible; MS FrontPage 4.0)
Mozilla/2.0 (compatible; MS FrontPage 5.0)
Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)
Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)
Mozilla/2.0 (compatible; MSIE 3.02; Windows CE)
Mozilla/2.0 (compatible; MSIE 3.02; Windows CE; PPC; 240x320) 
BlackBerry7100/4.0.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
Mozilla/2.0 (compatible; MSIE 3.02; Windows CE; PPC; 240x320)
Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)
Mozilla/2.0 (compatible; MSIE 3.0b; AOL 4.0; Windows 3.1)
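
A tally like this can be produced with a short script. The sketch below is
one possible way, not the method actually used here: it assumes the Apache
"combined" log format (User-Agent as the last quoted field), ignores UAs that
contain escaped quotes, and uses made-up sample log lines and a made-up
function name.

```python
import re
from collections import Counter

# Assumption: "combined" log format, User-Agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def tally(lines, needle="Mozilla/2"):
    """Count parsed requests and the distinct UAs that contain `needle`."""
    total = 0
    hits = Counter()
    for line in lines:
        m = UA_FIELD.search(line)
        if m is None:
            continue  # skip lines that do not look like combined-format entries
        total += 1
        ua = m.group(1)
        if needle in ua:
            hits[ua] += 1
    share = 100.0 * sum(hits.values()) / total if total else 0.0
    return total, hits, share


# Made-up sample lines for illustration.
SAMPLE_LOG = [
    '192.0.2.1 - - [03/Apr/2005:13:58:56 -0400] "GET / HTTP/1.0" 200 512 '
    '"-" "Mozilla/2.0 (compatible; MS FrontPage 4.0)"',
    '192.0.2.2 - - [03/Apr/2005:13:59:01 -0400] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (X11; U; Linux i686)"',
]

total, hits, share = tally(SAMPLE_LOG)
```

Sorting hits by count then gives the list of full UA strings behind the
percentage.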

--n

-- 
huey dd of=/dev/fd0 if=/dev/flippy bs=1024
huey ^^^ Making Flippy Floppy



Re: Do these broken clients still exist?

2005-04-03 Thread Greg Stein
On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> Does someone with a high-traffic, general-interest web site want to take a
> look through their logs for these user-agent strings.

I, uh, have just a few logs that I can scan. ;-)

I'll use Blogger/BlogSpot logs, as those will have a wider variety of user
agents than the main Google site. (which is fortunate, as I don't have
access to the main logs anyways)

A week or two of logs should provide good coverage. You never know what
times of day, or days of the week, that a given UA might be active.

> I don't mind
> keeping them if they make up even 1/100 of a percent of the traffic, but it
> seems silly to keep these extra regexes on every single request if these
> clients don't exist anymore in the wild.

I'll take a look at some logs tomorrow, and mail back with some results.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: Do these broken clients still exist?

2005-04-03 Thread Ben Collins-Sussman
On Apr 3, 2005, at 7:57 PM, Greg Stein wrote:
> On Sun, Apr 03, 2005 at 01:58:56PM -0400, Joshua Slive wrote:
> > Does someone with a high-traffic, general-interest web site want to take a
> > look through their logs for these user-agent strings.
>
> I, uh, have just a few logs that I can scan. ;-)

Showoff.  :-)