Re: Research - Valid Data Gathering vs Annoying Others

2004-08-07 Thread Mike Leber


On Fri, 6 Aug 2004, John K Lerchey wrote:
 Hi NANOG folks,
 
 We have a situation (which has come up in the past) that I'd like some 
 opinions on.
 
 Periodically, we have researchers who develop projects which will do 
 things like randomly port probe off-campus addresses...

Here are some observations based on an internal corporate R&D project we
ran about four years ago that crawled all the websites on the Internet for
use with a search engine.

* Lower your impact.  Limit the number of requests sent to a specific IP
within a time period, and limit how fast you make requests.  Don't assume
adjacent IPs aren't the same server; don't make parallel requests to IPs
within the same /24.  Limit the total number of requests you make to a
specific IP, and limit the amount of data transferred from each IP.

* Make sure to implement a block list to avoid scanning people that ask
you to stop.

* Make your hostname something that helps explain what you are doing.

* Make sure that other people in your group know that you are running the
experiment and who to forward phone calls to.

* Run a webserver on the IP or IPs that are doing the scanning explaining
what you are doing.

* Honor robots.txt and other access-denied responses or error codes.

* Don't assume the data returned is valid or nonhostile.  Some people run
search engine traps (infinitely large programmatically generated websites)
to try to salt the search engines with their bogus advertising data.  
Some people want to crash any program that scans them.  Some people will
do things you didn't think of.

* Expect some people to send automated complaints without knowing that 
they are sending them and without understanding the contents of the
complaints they are sending.

* Expect some people to complain about you attacking them on port 53 when
you look up the address for their domain name, even if you never scan
their website or otherwise interact with any of their IPs.  (During the
experiment this was the largest source of complaints.)

* If you run the project 24 x 7, you need to respond 24 x 7.
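The rate limits and opt-out handling in the list above can be sketched as a small scheduler. This is a minimal illustration, not code from the original project; the `PoliteFetcher` name, the 5-second delay, and the per-IP request cap are all made-up example values, and the robots.txt snippet uses Python's standard `urllib.robotparser`:

```python
import time
import urllib.robotparser


class PoliteFetcher:
    """Toy request scheduler enforcing some of the limits above (illustrative only)."""

    def __init__(self, min_delay=5.0, max_requests_per_ip=100):
        self.min_delay = min_delay               # seconds between hits to one IP (example value)
        self.max_requests = max_requests_per_ip  # lifetime cap per IP (example value)
        self.last_hit = {}                       # ip -> time of last allowed request
        self.counts = {}                         # ip -> total requests made so far
        self.blocklist = set()                   # IPs that asked us to stop

    def block(self, ip):
        """Honor an opt-out request: never contact this IP again."""
        self.blocklist.add(ip)

    def may_fetch(self, ip, now=None):
        """Return True if a request to `ip` is allowed right now, and record it."""
        if ip in self.blocklist:
            return False
        if self.counts.get(ip, 0) >= self.max_requests:
            return False
        now = time.monotonic() if now is None else now
        last = self.last_hit.get(ip)
        if last is not None and now - last < self.min_delay:
            return False
        self.last_hit[ip] = now
        self.counts[ip] = self.counts.get(ip, 0) + 1
        return True


# Honoring robots.txt with the standard-library parser (rules parsed
# from an inline string here; a real crawler would fetch the file):
rp = urllib.robotparser.RobotFileParser()
rp.parse("User-agent: *\nDisallow: /private/".splitlines())
```

A real crawler would also need to treat whole netblocks (not just single IPs) as one opt-out unit, per the /24 caveat above.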

Mike.

+- H U R R I C A N E - E L E C T R I C -+
| Mike Leber   Direct Internet Connections   Voice 510 580 4100 |
| Hurricane Electric Web Hosting  Colocation   Fax 510 580 4151 |
| [EMAIL PROTECTED]   http://www.he.net |
+---+





Re: Research - Valid Data Gathering vs. Annoying Others

2004-08-07 Thread Stephen J. Wilcox

if i typo and enter your ip in my browser, that's illegal?

there's no way to tell the intentions of a received packet.. is it research, is 
it hostile, is it an infected machine, is it an error?

whilst it's prudent to assume that anything received unexpectedly is malicious, 
it doesn't necessarily follow that you should do anything more with it than 
discard it.. you should obtain more data before deciding to take action

and whilst you may not like the research community carrying out unauthorized 
probes, it is these guys who made and maintain the internet's systems; you should 
cut them some slack in the name of research!

Steve

On Fri, 6 Aug 2004, Robert Bonomi wrote:

 
 To: [EMAIL PROTECTED]
 Subject: Re: Research - Valid Data Gathering vs Annoying Others
 
  Date: Fri, 6 Aug 2004 14:09:01 -0400 (EDT)
  From: John K Lerchey [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Subject: Research - Valid Data Gathering vs Annoying Others
 
 
  Hi NANOG folks,
 
  We have a situation (which has come up in the past) that I'd like some 
  opinions on.
 
 [[.. $ mount /dev/soapbox  # you have been warned.   ..]]
 
  Periodically, we have researchers who develop projects which will do 
  things like randomly port probe off-campus addresses.  The most recent 
  instance of this is a group studying bottlenecks on the internet.  Thus, 
  they hit hosts (again, semi-randomly) on both the commodity internet and 
  on I2 (Abilene) to look for places where there is traffic congestion.
 
  The problem is that many of their random targets consider the probes to 
  be either malicious in nature, or outright attacks.
 
 Why not?  Their network, *THEIR* rules. 
 
 *HOW* is one supposed to tell a 'benign' probe from a 'hostile' one,
 when it is addressed to a machine that doesn't exist, or to a 'service'
 that doesn't exist on an existing machine?
 
 With all the 'overtly hostile' traffic out there, why on earth would anyone
 consider that, with regard to 'unexpected'/'abnormal' traffic, there should 
 be _any_ 'expectation of innocence'?
 
 Surely you don't think that the 'recipient' needs to do a _complete_analysis_
 of what was being attempted, and why -- including making a determination of
 the 'intentions' of the perpetrator -- for -every- 'unauthorized' attempt 
 to use their network, before complaining about the fact of an attempt at
 'unauthorized use'?
 
 I have a very _simple_ rule -- if it isn't intended for a service I make
 available, on a machine I let the world have access to, then it is, _by_
 definition_, an attempt to access that machine 'without authorization, or
 in excess of the authorization granted'.  Because the -only- 'authorized
 use is those things whiich I expressly let past my firewall.  Ergo, if
 the firewall blocks it, it _IS_ an 'unauthorized access' attempt.
 
 Whereupon, 18 USC 1030 (b) becomes *very* relevant, given the language
 of 18 USC 1030 (a)(2)(C).  The minimum penalty is 'up to a year imprisonment';
 given any 'aggravating circumstances' it could be up to 20 years.
 
 
 On my _personal_ network, at home (a /29 -- big wow:), I currently see
 well over FIFTEEN THOUSAND unauthorized probes per day.  Of those, a
 *maximum* of 1-in-four-thousand *might* possibly be legitimate.
 
 I give people the 'benefit of the doubt', and assume that these probes
 are coming from virus-infected (unbeknownst to the owner) machines, rather
 than 'deliberate, with malice aforethought' hacking attempts by the machine's
 owner.
 
 HOWEVER, that notwithstanding, *EVERY*ONE* gets reported to the responsible
 _network_operator_ -- as an 'apparent virus-infected machine on your network',
 with the relevant supporting documentation, and a simple request that the 
 machine be disabled from external network access until it has been sterilized
 and secured against further infection.
 
 The reporting is mostly to help the other operators keep _their_ networks
 clean.  And to get those machines off-line  -- so that they cannot infect
 other 'unprotected' machines.  I'm confident _my_ network is adequately 
 protected. grin
 
 Note: I don't care _what_ the 'name' of the machine is -- I don't even
 check for rDNS, I look up the registered netblock _owner_ of the IP address,
 at the RIR.  And THAT is where the complaint reports go.   
 
 
   As a result of this, 
  we, of course, get complaints.
 
 Deservedly so.
 
  One suggestion that I received from a co-worker to help mitigate this is 
  to have the researchers run the experiments off of a www host, and to have 
  the default page explain the experiment and also provide contact info.
 
 People are supposed to 'take it on faith' that what the website _says_ about 
 what is going on _is_ what is *actually* happening?
 
 I hope you don't mind if I laugh -- Computerized 'social engineering', in
 an attempt to deflect complaints, _is_ a humorous concept.
 
 Do you *really* think that anybody is going to bother to go 

Re: that MIT paper again

2004-08-07 Thread Paul Vixie

i wrote:

 wrt the mit paper on why small ttl's are harmless, i recommend that
 y'all actually read it, the whole thing, plus some of the references,
 rather than assuming that the abstract is well supported by the body.
 
 http://nms.lcs.mit.edu/papers/dns-imw2001.html

here's what i've learned by watching nanog's reaction to this paper, and
by re-reading the paper itself.

1. almost nobody has time to invest in reading this kind of paper.
2. almost everybody is willing to form a strong opinion regardless of that.
3. people from #2 use the paper they didn't read in #1 to justify an opinion.

4. folks who need academic credit will write strong self-consistent papers.
5. those papers do not have to be inclusive or objective to get published.
6. on the internet, many folks by nature think locally and act globally.

7. #6 includes manufacturers, operators, endusers, spammers, and researchers.
8. the confluence of bad science and uninterested operators is disheartening.
9. good actual policy must often fly in the face of accepted mantra.

we now return control of your television set to you.