I have some doubts about massive "hosts" files for adblocking.  I
downloaded one that listed 13,148 sites.  I fed them through a script
that called "host" for each entry and saved the output to a text file.
The result was 1,059 addresses.  Note that some adservers return
multiple IP addresses for the same name, so fewer than 1,059 of the
13,148 names actually resolved.  Back-of-the-envelope, close to 95% of
the entries in the large hosts file are invalid and return "not found:
3(NXDOMAIN)".
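
  A minimal sketch of the kind of script involved, assuming the
blocklist uses the usual "0.0.0.0 hostname" layout and that host(1) is
installed:

#!/bin/sh
# Pull the hostnames out of a hosts-style blocklist and resolve each one.
# Usage: ./resolve.sh blocklist > resolved.txt
awk '/^(0\.0\.0\.0|127\.0\.0\.1)[[:space:]]/ { print $2 }' "$1" |
while read -r name ; do
        host "$name"
done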

  I'm not here to trash the people compiling the lists; the problem is
that hosts files are the wrong tool for the job.  Advertisers know about
hosts files and deliberately generate random subdomain names with short
lifetimes to invalidate them.  Within a week, most of the names have
probably been rotated out.  Further analysis of the 1,059 addresses
shows 810 unique entries, i.e. 249 duplicates.  It gets even better: 44
of the addresses fall in 52.84.146.xxx, so I could block that entire /24
with one entry.  There are multiple similar occurrences, which could be
aggregated into small CIDRs, greatly reducing the number of blocking
rules.
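
  Spotting the aggregation candidates is a one-liner.  A rough sketch,
assuming the resolved addresses have already been extracted to one IPv4
address per line in a file I'm calling resolved.txt:

cut -d. -f1-3 resolved.txt | sort | uniq -c | sort -rn | head

Each output line is a count followed by a /24 prefix; anything with a
large count is a candidate for a single CIDR rule.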

  I'm not a deep networking expert.  My question is whether I'm better
off adding iptables reject/drop rules or "reject routes", e.g...

route add -net 10.0.0.0 netmask 255.0.0.0 metric 1024 reject

(an example from the "route" man page).  iptables rules have to be
entered twice, once for outbound and once for inbound traffic, whereas a
reject route only needs to be entered once.  This exercise is intended
to block web adservers, so another question is how web browsers react to
route blocking versus iptables blocking.
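
  For concreteness, here is roughly what the two approaches would look
like for one of the /24s above (a sketch only; 52.84.146.0/24 is just
the example block from earlier, and the "ip" command is the iproute2
equivalent of the "route" example):

# iptables: one rule for outbound traffic, one for inbound
iptables -A OUTPUT -d 52.84.146.0/24 -j REJECT
iptables -A INPUT  -s 52.84.146.0/24 -j DROP

# reject route: a single entry
ip route add prohibit 52.84.146.0/24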

  While I'm at it (I did say I'm not an expert), is there another way
to handle this?  E.g. redirect "blocked CIDRs" via iptables or route to
a local pixel image?  Would that get an immediate response from the web
browser, versus timing out with "regular blocking"?
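
  Something like the following is what I'm picturing (a rough sketch; it
assumes a tiny local web server listening on 127.0.0.1:80 that serves a
1x1 pixel, reuses the example /24 from above, and would only cover plain
HTTP, not HTTPS):

# Send outbound port-80 traffic for a blocked CIDR to the local pixel server
iptables -t nat -A OUTPUT -d 52.84.146.0/24 -p tcp --dport 80 \
         -j REDIRECT --to-ports 80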

-- 
Walter Dnes <waltd...@waltdnes.org>
I don't run "desktop environments"; I run useful applications
