Re: Favicon spam

2001-11-22 Thread Ian Davey

Greg Miller wrote:

 Last I heard, the industry averages were supposed to be something 
 like 3:1 pageviews-to-users ratio and 50% repeat visitors. So the 
 number of favicon 404s would be approximately 1/6 of the total number 
 of pageviews.

 That would only be true if every site consisted of just a single page, 
 which is clearly untrue. From what I've read so far, the current 
 implementation requests the favicon once for each domain.
 
 Erm, no. It would be *untrue* if each site consisted of a single page.


Yeah, sorry, I misread what you'd written. Is this per web site, or per 
domain name? I'm not sure how relevant those figures are anyway; they 
certainly don't gel with the patterns I've seen on sites for which I 
have access to the statistics. There are few sites these days on which 
you can navigate to what you want by visiting just three pages, and 
those on which you can are likely to be one of many sites hosted 
on a single domain (e.g. geocities.com). I imagine the above industry 
averages are largely influenced by behemoths like AOL and MSN.

 account the average number of images/stylesheets/javascript appearing 
 in external files. As this should be based on resources requested, not 
 pageviews as that is misleading.
 
 I thought I was quite clear about the fact that this was only a matter 
 of pageviews. I don't know of any good web-wide stats for requests or 
 bandwidth, and I suspect no useful stats could be determined since 
 things vary too widely.

Personally I think specific examples would be far more useful than 
industry averages in this case anyway, as averages are far too swayed 
by huge hosts.


 You should probably also take into account the % of /favicon.ico 
 associated with domains, as those wouldn't appear as 404s (e.g. 
 Netscape Enterprise Server seems to come with one as default).
 
 From a bandwidth perspective, those are even worse than 404s. As I 
 mentioned before, averages are no consolation to the people getting hit 
 with worst-case scenarios.


But I thought part of your argument was about 404 errors in weblogs. These 
wouldn't occur when favicons already exist, so in that case it's no more 
a bandwidth problem than any other image. The major problem with this is 
the 404s, i.e. requesting resources that don't exist, rather than 
the bandwidth.

ian.








Re: Favicon spam

2001-11-22 Thread Ian Davey

Greg Miller wrote:

 
 That's not a terrible increase in bandwidth (the exact figures would 
 depend on protocol overhead and such), but web hosts have a nasty habit 
 of charging for disk space, which often includes the space for those log 
 files that shoot up by over 20% if everyone adopts this favicon practice 
 or 7% with the hypothetical 30% marketshare that was mentioned earlier.


I might be going out on a limb, but it sounds as though the real 
problem is the disk space used by the logfiles collected to generate 
statistics... I've encountered this: the logfiles tend to take up far 
more space than the websites they cater for and quickly eat up gigabytes, 
but this is really a different issue, one that argues for better 
management of logfiles.

If your site is small I see little point in collecting anything but 
minimal filtered statistics, a summary rather than lots of raw data.
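A minimal sketch of that kind of summary, assuming the server writes the Common Log Format (host ident user [date] "request" status bytes). It reduces raw lines to a handful of counters and keeps /favicon.ico 404s separate from other 404s, which is also the comparison asked for elsewhere in this thread. The function name, regex and sample lines are hypothetical.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "METHOD path PROTOCOL" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')


def summarize(lines):
    """Reduce raw access-log lines to a few counters instead of keeping them all."""
    counts = Counter()
    for line in lines:
        match = CLF_LINE.match(line)
        if match is None:
            counts["unparsed"] += 1
            continue
        path, status = match.group(3), match.group(4)
        counts["requests"] += 1
        if status == "404":
            # Separate favicon misses from ordinary broken links.
            counts["favicon_404" if path == "/favicon.ico" else "other_404"] += 1
    return counts


# Hypothetical log lines, not real data.
SAMPLE_LOG = [
    '10.0.0.1 - - [22/Nov/2001:09:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [22/Nov/2001:09:00:01 +0000] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.2 - - [22/Nov/2001:09:05:00 +0000] "GET /missing.gif HTTP/1.0" 404 209',
]

if __name__ == "__main__":
    counts = summarize(SAMPLE_LOG)
    print(counts["requests"], counts["favicon_404"], counts["other_404"])  # 3 1 1
```

Only the counters need to be kept; the raw lines can be discarded once they have been summarized.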

As a user I've found the feature quite useful, especially when using 
tabbed browsing, and I can't see that Mozilla, Konqueror or even 
Netscape has the clout to get people to put links to a favicon on 
every page of their site. On the other hand, I've been surprised by how 
many sites do have them...

Though, as useful as I find this, I think that checking for favicons when:

1) bookmarking
2) visiting a bookmarked site without a cached 404 for the favicon

would be a better compromise than the current one. The reason is that 
you'd get a favicon for your most visited sites. Why would I want an icon 
cached for any old site I just happened to visit? They're only really 
useful for sites I visit regularly. 2) covers sites I have already 
bookmarked which may not yet have acquired a favicon. It may cause a bit 
of noise in logs, but a far more acceptable amount; it takes advantage of 
the caching of favicon status and only comes from visitors who care 
enough to bookmark your pages. You could also have a pref, "use favourite 
icons for bookmarks", to let this be turned on or off.
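The proposal above can be sketched as a small decision object. This is a hypothetical illustration, not Mozilla's code: the class and method names are made up, hosts stand in for sites, and it assumes a host whose icon is already cached needn't be re-requested either.

```python
from dataclasses import dataclass, field


@dataclass
class FaviconPolicy:
    """Hypothetical sketch: request /favicon.ico only for bookmarked sites."""
    enabled: bool = True                            # the suggested on/off pref
    bookmarked: set = field(default_factory=set)
    cached_404: set = field(default_factory=set)    # hosts known to 404
    cached_icon: set = field(default_factory=set)   # hosts with a cached icon

    def should_fetch_on_bookmark(self, host: str) -> bool:
        """Case 1): look for an icon whenever the user bookmarks a page."""
        self.bookmarked.add(host)
        return self.enabled

    def should_fetch_on_visit(self, host: str) -> bool:
        """Case 2): bookmarked host with no cached 404 (and no cached icon)."""
        return (self.enabled
                and host in self.bookmarked
                and host not in self.cached_404
                and host not in self.cached_icon)

    def record_404(self, host: str) -> None:
        self.cached_404.add(host)

    def record_icon(self, host: str) -> None:
        self.cached_icon.add(host)


if __name__ == "__main__":
    policy = FaviconPolicy()
    print(policy.should_fetch_on_visit("example.org"))  # False: never bookmarked
```

Casual visits to sites that were never bookmarked produce no favicon requests at all, which is where the reduction in log noise comes from.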

ian.





Re: Favicon spam

2001-11-21 Thread Ian Davey

Greg Miller wrote:

 Jonas Sicking wrote:
 
 It would be really interesting to get some hard numbers on this. Just
 looking at the current logs will not really say anything since very few
 people browse with a mozilla with this pref turned on. So we need to come up
 with some way to approximate the number of 404s per (for example) month in
 the event of a browser with, say, 30% marketshare using the current
 configuration.
 
 
 Last I heard, the industry averages were supposed to be something like 
 3:1 pageviews-to-users ratio and 50% repeat visitors. So the number of 
 favicon 404s would be approximately 1/6 of the total number of pageviews.

That would only be true if every site consisted of just a single page, 
which is clearly untrue. From what I've read so far, the current 
implementation requests the favicon once for each domain.

So your number above needs to be divided by the average number of 
pages visited by a single user on a server. You also need to take into 
account the average number of external images/stylesheets/JavaScript 
files per page, since this should be based on resources requested, not 
pageviews, which is misleading.

So it should actually be:

1/(6*visited pages per server*resources per page)

To fill in some numbers pulled from the air:

1/(6*10*10)

So that works out to 1/6000 resource requests. If you can come up with 
some numbers to fill in the above guesses then you'd get closer to the 
actual figure.

You should probably also take into account the percentage of domains 
that already serve a /favicon.ico, as requests to those wouldn't appear 
as 404s (e.g. Netscape Enterprise Server seems to come with one by default).

ian.





Re: Favicon spam

2001-11-21 Thread Ian Davey

Ian Davey wrote:

 
 1/(6*10*10)
 
 So that accounts to 1/6000 resource requests. If you can come up with 
 some numbers to fill in the above guesses then you'd get closer to the 
 actual figure.

That should be 1/600 - it's too early in the morning :-)
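With the arithmetic corrected, the estimate reduces to one line. A small sketch using exact fractions; the 6, 10 and 10 inputs are the guessed figures from the previous message, not measurements, and the function name is hypothetical.

```python
from fractions import Fraction


def favicon_404_share(pageviews_per_404: int,
                      pages_per_server: int,
                      resources_per_page: int) -> Fraction:
    """Ian's estimate: fraction of resource requests that are favicon 404s."""
    return Fraction(1, pageviews_per_404 * pages_per_server * resources_per_page)


if __name__ == "__main__":
    print(favicon_404_share(6, 10, 10))  # 1/600
```

Greg's reply argues that the 3:1 ratio already accounts for pages per visitor; in that case the pages-per-server factor drops out, and the same guess of 10 resources per page gives 1/60 of resource requests.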

ian.





Re: Favicon spam

2001-11-21 Thread Greg Miller

Ian Davey wrote:

 Greg Miller wrote:
 Last I heard, the industry averages were supposed to be something like 
 3:1 pageviews-to-users ratio and 50% repeat visitors. So the number of 
 favicon 404s would be approximately 1/6 of the total number of pageviews.
 
 
 That would only be true if every site consisted of just a single page, 
 which is clearly untrue. From what I've read so far, the current 
 implementation requests the favicon once for each domain.


Erm, no. It would be *untrue* if each site consisted of a single page.


 
 So your number above needs to be divided by the average number of 
 pages visited by a single user on a server. You also need to take into 


Already did that. That's what the 3:1 figure was for.


 account the average number of images/stylesheets/javascript appearing in 
 external files. As this should be based on resources requested, not 
 pageviews as that is misleading.


I thought I was quite clear about the fact that this was only a matter 
of pageviews. I don't know of any good web-wide stats for requests or 
bandwidth, and I suspect no useful stats could be determined since 
things vary too widely.


 You should probably also take into account the % of /favicon.ico 
 associated with domains, as those wouldn't appear as 404s (e.g. Netscape 
 Enterprise Server seems to come with one as default).


From a bandwidth perspective, those are even worse than 404s. As I 
mentioned before, averages are no consolation to the people getting hit 
with worst-case scenarios.





Re: Favicon spam

2001-11-20 Thread David Hyatt

The icon is not cached forever. It simply has no specified 
expiration. That just means it won't be doomed based solely on an 
expiration date. It can still be removed from the cache as the cache 
fills up and needs to evict items.
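A rough sketch of the distinction being drawn, assuming a simple size-bounded LRU cache. This is not Mozilla's cache implementation; the class and its methods are hypothetical. An entry stored without an expiry time is never dropped because of a date, but it is still evicted once the cache is full and it is the least recently used entry.

```python
import time
from collections import OrderedDict
from typing import Any, Optional


class BoundedCache:
    """Hypothetical LRU cache whose entries may or may not carry an expiry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # key -> (value, expires_at); expires_at is None for "no expiration"
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def put(self, key: str, value: Any, expires_at: Optional[float] = None) -> None:
        self._entries[key] = (value, expires_at)
        self._entries.move_to_end(key)          # mark as most recently used
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict least recently used

    def get(self, key: str, now: Optional[float] = None) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = time.time() if now is None else now
        if expires_at is not None and now >= expires_at:
            del self._entries[key]              # removed because of its date
            return None
        self._entries.move_to_end(key)
        return value


if __name__ == "__main__":
    cache = BoundedCache(capacity=2)
    cache.put("a.example/favicon.ico", "404")   # no expiration
    cache.put("b.example/favicon.ico", "icon-b")
    cache.put("c.example/favicon.ico", "icon-c")  # full: evicts the oldest
    print(cache.get("a.example/favicon.ico"))   # None
```

The "404" entry never expires by date, yet it still disappears under capacity pressure, so the browser would eventually ask for /favicon.ico again.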

dave

Jonas Sicking wrote:

 A lot of opinions have been expressed about whether the favicon should
 be on or off by default, since it might spam webservers with requests to a
 non-existent file.
 
 It would be really interesting to get some hard numbers on this. Just
 looking at the current logs will not really say anything since very few
 people browse with a mozilla with this pref turned on. So we need to come up
 with some way to approximate the number of 404s per (for example) month in
 the event of a browser with, say, 30% marketshare using the current
 configuration.
 
 Since the absence of a /favicon.ico is cached, the number of 404-ing requests
 will be much lower than the number of pagehits. Brendan says that the
 absence is cached persistently and never expires; does that mean that
 mozilla won't request /favicon.ico again unless the user manually clears the
 cache? In that case the number of 404s will be approximately equal to the
 number of new users every month * 30%.
 
 If it's not possible to extract the number of new users from the logs, I
 think that the number of new IP addresses * 1.5 is a good enough estimate.
 There are probably more than 1.5 users per IP on average, but probably not
 all of them visit the site. If someone has a better number than 1.5,
 please speak up; my guess is very uneducated.
 
 However, it seems a bit wrong to me that a resource is cached forever. What
 if a site wants to start supporting /favicon.ico? Will only new users see the
 new icon? IMHO a resource should be reloaded at least occasionally so that if
 the resource appears or changes we will eventually catch it.
 
 So say that we reload every 2 weeks. That means every user will reload
 /favicon.ico once every 14 days, which means that the number of 404s per
 month will be the number of distinct users during 14 days * 30% * 30/14.
 
 So, we've got:
 
 Hits = newUsersPerMonth * 0.3 if we cache indefinitely
 
 Hits = distinctUsersPerXDays * 0.3 * 30/X if we refetch every X days
 
 where IP addresses * 1.5 could approximate the number of users. IMHO the right
 thing would be to use the second formula with X ~= 14.
 
 So it would be great if someone with access to the logs of a rather heavily
 used site could run these formulas and compare the results to the number of
 normal 404s.
 
 / Jonas Sicking
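Jonas's two formulas, plus the IP-based user estimate, written out as functions. The 30% share, 1.5 users per IP and 30-day month are the assumptions stated in his message; the function names and the sample inputs are hypothetical, and real inputs would have to come from a site's own logs.

```python
DAYS_PER_MONTH = 30  # the "30" in 30/X


def users_from_ips(distinct_ips: float, users_per_ip: float = 1.5) -> float:
    """Approximate distinct users from distinct IP addresses (the 1.5 guess)."""
    return distinct_ips * users_per_ip


def hits_cache_forever(new_users_per_month: float, share: float = 0.3) -> float:
    """Monthly favicon 404s if the 404 is cached indefinitely."""
    return new_users_per_month * share


def hits_refetch_every(distinct_users_per_x_days: float, x_days: float,
                       share: float = 0.3) -> float:
    """Monthly favicon 404s if each browser re-requests every x_days days."""
    return distinct_users_per_x_days * share * DAYS_PER_MONTH / x_days


if __name__ == "__main__":
    # Hypothetical inputs, not real log data.
    print(round(hits_cache_forever(users_from_ips(10_000))))        # 4500
    print(round(hits_refetch_every(users_from_ips(28_000), 14)))    # 27000
```

For example, 28,000 distinct IPs in a 14-day window (about 42,000 users at 1.5 per IP) would mean roughly 27,000 favicon 404s a month with X = 14.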
 
 
 





Re: Favicon spam

2001-11-20 Thread Greg Miller

Jonas Sicking wrote:

 It would be really interesting to get some hard numbers on this. Just
 looking at the current logs will not really say anything since very few
 people browse with a mozilla with this pref turned on. So we need to come up
 with some way to approximate the number of 404s per (for example) month in
 the event of a browser with, say, 30% marketshare using the current
 configuration.

Last I heard, the industry averages were supposed to be something like 
3:1 pageviews-to-users ratio and 50% repeat visitors. So the number of 
favicon 404s would be approximately 1/6 of the total number of pageviews.
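The 1/6 figure follows from treating each new visitor as one favicon 404 (later visits hit the cached 404): one user per 3 pageviews, times the 50% of users who are new. A sketch with exact fractions; the function name is hypothetical, and the adoption parameter is an added assumption for scaling by a browser's market share, such as the hypothetical 30% mentioned in the thread.

```python
from fractions import Fraction


def favicon_404_fraction(pageviews_per_user: Fraction,
                         repeat_share: Fraction,
                         adoption: Fraction = Fraction(1)) -> Fraction:
    """Fraction of pageviews that trigger a favicon 404, assuming each new
    visitor requests /favicon.ico once and then caches the 404."""
    users_per_pageview = 1 / pageviews_per_user   # 3:1 ratio -> 1/3
    new_visitor_share = 1 - repeat_share          # 50% repeat -> 1/2 new
    return users_per_pageview * new_visitor_share * adoption


if __name__ == "__main__":
    print(favicon_404_fraction(Fraction(3), Fraction(1, 2)))                   # 1/6
    print(favicon_404_fraction(Fraction(3), Fraction(1, 2), Fraction(3, 10)))  # 1/20
```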

However, that's only an average and the effect on the number of requests 
and bandwidth consumed would vary wildly depending on the individual 
site. Every site without a favicon would suffer--it's just a question of 
degree. Good thing no browser I'm aware of has an equivalent policy for 
CSS (which would benefit me at the expense of people without external 
CSS), JS (which would benefit people using external JS files at the 
expense of everyone else), etc.