Package: denyhosts
Version: 2.6-7

Also reported upstream as http://sourceforge.net/tracker/?func=detail&aid=3286240&group_id=131204&atid=720419

Since I have no idea how stable & findable the sourceforge tracker will be in some far future, I thought I'd include the entire report below. There are some things Debian can do pending an upstream solution.


(Warning: long report)

First of all: THANKS for this very useful piece of software!

Having run denyhosts in cron mode for a couple of years now, I was happy and never bothered to check its inner workings. But just now I have installed a few new boxes with denyhosts in daemon mode (Debian package), which gives very nice /var/log/denyhosts output.

So I noticed that denyhosts will receive at most 50 new hosts from the central database during any sync run. I'd guess that's a hardcoded limit to prevent overloading the central server. The problem is that I get 50 new blocked hosts on _every_ sync run, which made me suspect that I'm not receiving quite enough.

Indeed, the denyhosts stats page shows an average of 14624 new hosts per day (though that page hasn't been updated for more than a month now). Default sync interval is 1 hour, so any denyhosts client will by default receive at most 24*50=1200 new hosts per day. That is 1200/14624 = just 8.2% of all new active crackers of that day. In other words, any given client will _never_ know of 91.8% of all active crackers.

That's bad, since it renders the whole sync stuff mostly useless.

So, I tried to tweak the sync settings a bit. If I can only receive a limited number of new hosts per day, I might as well opt to receive the hosts that are most likely to attack me, i.e. the hosts that have already attacked the most other clients. Let's say SYNC_DOWNLOAD_THRESHOLD=10. Quick botnet cracks would be handy to catch early, so SYNC_DOWNLOAD_RESILIENCY=3h. And to keep up with it all, SYNC_INTERVAL=8m. This seems a much more useful setting: the client mostly receives some 30-40 new crackers during each sync run, but sometimes still the maximum of 50 (once every few hours).
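
For reference, this is what the tweak looks like in the config file (assuming the Debian location /etc/denyhosts.conf; the values are just the ones I ended up with, adapt per site):

  # /etc/denyhosts.conf (assumed Debian default location)
  SYNC_INTERVAL = 8m
  SYNC_DOWNLOAD_THRESHOLD = 10
  SYNC_DOWNLOAD_RESILIENCY = 3h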

Conclusion 1: Please adjust the default sync settings, which are _way_ wrong.


Yet there's still another problem. During any given sync run with the above settings, only 5-10 hosts get added to hosts.deny. Indeed, most of the hosts returned by the sync are _already_ in the hosts.deny file; they were put there by previous sync runs. A quick check shows that the central server will happily re-send the same cracker at least 10-20 times (in one case, 37 repeats).

This means the sync run is not working as advertised.

Okay, syncing is hard. A syncing client defines a set of "qualifying" crackers, namely those that have been reported at least N times over a resiliency period of at least T hours. From that set, any given sync run wants exactly those crackers that _started_ to qualify between time S1 and S2 (S1=previous sync, S2=now).

[Sidenote: S1 = WORK_DIR/sync-timestamp should be _server_ time, since client clocks can be way off. I didn't check if this file is indeed filled with a server-provided value.]

I'd expect that the "_started_ to qualify" idea is the big problem.

But the denyhosts central database design is closed-source (Boohoo!), so I can't check what is going wrong. That's the problem with closed-source.

Yet I couldn't stop myself thinking about this problem, and ended up designing a denyhosts central-server idea myself. This design "should" work as intended, but I leave the implementation and verification as an exercise to the reader.


(WARNING: This message is licensed under GNU GPL>=3. Other licensing available on request & payment.)


It's probably easiest to use simple C types to make my point. Actually, it could be a C implementation, given enough RAM (currently ~10M crackers, max ~3k reports per cracker).

cracker_t:
  addr_t    ipaddress
  time_t    firsttime
  time_t    latesttime
  time_t    resiliency      (= .latesttime - .firsttime)
  int       totalreports    (= all reports including cleaned-up ones)
  int       currentreports  (= length of current reports list)
  report_t *reports

report_t:
  addr_t    reportingipaddress
  time_t    firstreporttime
  time_t    latestreporttime
  time_t    latesttotalresiliency
                     (=report.latestreporttime - cracker.firsttime)
  report_t *next     (linked list, or somesuch)
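
For concreteness, a minimal (untested) C rendering of the above might look as follows; the concrete choice for addr_t and the extra next pointer on cracker_t (for the crackers list used below) are my own additions, not part of the design:

  #include <stdint.h>
  #include <time.h>

  typedef uint32_t addr_t;      /* placeholder: e.g. an IPv4 address */

  typedef struct report report_t;
  struct report {
      addr_t    reportingipaddress;
      time_t    firstreporttime;
      time_t    latestreporttime;
      time_t    latesttotalresiliency;  /* = latestreporttime - cracker.firsttime */
      report_t *next;                   /* simple singly-linked list */
  };

  typedef struct cracker cracker_t;
  struct cracker {
      addr_t     ipaddress;
      time_t     firsttime;
      time_t     latesttime;
      time_t     resiliency;       /* = latesttime - firsttime */
      int        totalreports;     /* all reports, including cleaned-up ones */
      int        currentreports;   /* length of current reports list */
      report_t  *reports;          /* sorted by firstreporttime, oldest first */
      cracker_t *next;             /* crackers list, most-recent-last (assumed) */
  };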


Sync upload, reporter sends in some cracker:
  make sure reporter itself isn't "reliably" listed as cracker
    (if so, then drop the report entirely)
  if reported cracker already registered:
    if reportingipaddress already in reports list (*Note 1*):
      update report.latestreporttime and .latesttotalresiliency
      update cracker.latesttime and .resiliency
    else
      add new report at end of reports list
          (must stay sorted for condition (c) below)
      update cracker.* stuff
  else
    add new cracker at end of crackers list
       (should stay sorted for easy most-recently-first access)
    add new report
    set cracker.* stuff
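
Spelled out as an (untested) C sketch using the types above; the helpers find_cracker(), find_report(), is_reliably_listed() and the append_new_*() functions are hypothetical:

  /* Sketch of the sync-upload path; all helper functions are assumed. */
  void handle_report(addr_t reporter, addr_t cracked, time_t now)
  {
      cracker_t *c;
      report_t  *r;

      /* Drop reports coming from hosts we "reliably" consider crackers. */
      if (is_reliably_listed(reporter))
          return;

      c = find_cracker(cracked);
      if (c != NULL) {
          r = find_report(c, reporter);
          if (r != NULL) {
              /* Repeated report from the same reporter: merge (but see Note 1). */
              r->latestreporttime      = now;
              r->latesttotalresiliency = now - c->firsttime;
          } else {
              /* New reporter: append at the end, keeping the list sorted
                 by firstreporttime so condition (c) below stays simple. */
              r = append_new_report(c, reporter, now);
              c->currentreports++;
              c->totalreports++;
          }
          c->latesttime = now;
          c->resiliency = c->latesttime - c->firsttime;
      } else {
          /* Unknown cracker: new entry at the most-recent end of the list. */
          c = append_new_cracker(cracked, now);
          (void)append_new_report(c, reporter, now);
          c->currentreports = 1;
          c->totalreports   = 1;
          c->resiliency     = 0;
      }
  }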


*Note 1* Merging repeated reports from the same reportingipaddress is a design tradeoff. Ideally, all reports would be stored separately even when they come repeatedly from the same reporter, but that could possibly use too much database memory. It also possibly messes with the meaning of the (c) condition below, depending on whether/how you want to count NAT-ed hosts.

However, a simplistic always-merging solution would allow the (d) condition below to be manipulated: a small group of malicious clients may keep sending repeated reports for the same bogus cracker every 30 minutes. Due to the merging, and the fact that no honest client will report the bogus cracker, the (d) condition will always trigger and this bogus cracker will certainly be included in all following sync downloads. Do this for a slowly-changing list of 50 bogus crackers, and no real crackers will ever come through again.

So Alternative 2: just drop repeated reports from same reportingipaddress if .latestreporttime is less than 24 hours ago. After 24 hours, merge once. Then again wait 24 hours, and repeat.
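
A minimal sketch of that 24-hour rule, reusing the hypothetical types above (the caller would invoke this instead of the unconditional merge in the handle_report() sketch):

  /* Alternative 2, sketched: drop duplicates younger than 24h, merge otherwise.
     Needs <stdbool.h> in addition to the includes above. */
  bool merge_or_drop_duplicate(report_t *r, const cracker_t *c, time_t now)
  {
      if (now - r->latestreporttime < 24*60*60)
          return false;                      /* too soon: drop silently */
      r->latestreporttime      = now;        /* merge once */
      r->latesttotalresiliency = now - c->firsttime;
      return true;                           /* merged */
  }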

Alternative 2b: wait 12 hours if it is report[0], 18 hours if it is report[1], 24 hours if report[>=2].

Both of these variants still have some probability to force the (d) condition to pass from time to time.

Alternative 3: After 24 hours, keep old report untouched and add new 2nd report at end of reports list. Wait another 24 hours, dropping new duplicate reports. Then add a 3rd report for this reportingipaddress at end of reports list. New duplicates may be merged to this 3rd report indefinitely, since the second test of (d) will keep failing at the 2nd report (assuming all sensible clients have T less than 24 hours).

Downside of this variant is that the meaning of the (c) condition is changed (which may not necessarily be bad in this case).


Sync download: client provides N, T, S1; server knows (and sends) S2.
  limit S1 to a reasonable value, for example 1 day ago.
  _bottom-up_ check for qualifying crackers (most-recently-first, to
  have the most useful results within the max-50 limit)
  (a) cracker qualifies if
        cracker.currentreports >= N && cracker.resiliency >= T
  (b) cracker can only have _started_ to qualify during S1--S2 if it was
      actually updated during that period, so if
        cracker.latesttime >= S1    (always <= S2=now)
  we want to send as soon as both N and T are reached = (a), i.e. as
  soon as the _last_ of them is reached:
  (c) cracker _started_ to qualify for N during S1--S2 if
        reports[N-1].firstreporttime >= S1
  (d) cracker _started_ to qualify for T during S1--S2 if
        any report has .latestreporttime >= S1 &&
          .latesttotalresiliency >= T
        but no report has .latestreporttime <= S1 &&
          .latesttotalresiliency >= T
  cracker will be sent to syncing client if  a && b && (c || d)

(a) and (b) are very fast checks; (c) may be slow(er) depending on implementation, but is constant-time per cracker; (d) can be slowest, but its second check will often fail early. Depending on implementation, the first few (d) checks can be combined with (c).
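
A possible (untested) C rendering of this per-cracker test, on top of the struct sketch above; the function names are mine, and <stdbool.h> is assumed:

  /* (c): the N-th distinct reporter first appeared during S1--S2.
     Relies on the reports list being sorted by firstreporttime. */
  static bool qualifies_c(const cracker_t *c, int n, time_t s1)
  {
      const report_t *r = c->reports;
      for (int i = 0; i < n - 1 && r != NULL; i++)
          r = r->next;
      return r != NULL && r->firstreporttime >= s1;
  }

  /* (d): some report reached total resiliency T during S1--S2,
     and no report had already reached it at or before S1. */
  static bool qualifies_d(const cracker_t *c, time_t t, time_t s1)
  {
      bool reached_now = false;
      for (const report_t *r = c->reports; r != NULL; r = r->next) {
          if (r->latesttotalresiliency >= t) {
              if (r->latestreporttime <= s1)
                  return false;              /* already qualified before S1 */
              reached_now = true;
          }
      }
      return reached_now;
  }

  /* Full test: a && b && (c || d). */
  bool should_send(const cracker_t *c, int n, time_t t, time_t s1)
  {
      if (c->currentreports < n || c->resiliency < t)   /* (a) */
          return false;
      if (c->latesttime < s1)                           /* (b) */
          return false;
      return qualifies_c(c, n, s1) || qualifies_d(c, t, s1);
  }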


"Use case" for (c):
Client sets N=5 reporting hosts, T=5 hours resiliency, S1 = 1 hour ago.
New cracker gets 1 report each day.
The cracker will qualify for T from the second day onward, but not yet for N.
On the fifth day, the fifth report is added, which means the cracker _starts_ to qualify. The cracker will be sent the first time the client connects after the fifth report is received. All following client connects will have S1 after report[4].firstreporttime, so the cracker will not be sent again.


"Use case" for (d):
Client sets N=5 reporting hosts, T=5 hours resiliency, S1 = 1 hour ago. A new cracker gets 10 reports within 1 hour, then takes a week of vacation, then after that week suddenly one new report arrives, and three hours later another one. The cracker will not qualify during that week, and will only _start_ to qualify when the first new report arrives. At that point, (a) becomes true for the first time ever, the new report passes the first check of (d), and no report violates the second check of (d). The next client connect will have an increased S1, making the first check of (d) fail. The second new report ("three hours later") is of no consequence: it does pass the first check of (d), but by then the earlier new report already violates the second check of (d).

This would cause repeated sync-downloads if the final "three hours later" report actually was a duplicate from the same reportingipaddress as the earlier report and the two were merged. See *Note 1* above.


Expiry/maintenance every hour/day:
  remove reports with .latestreporttime older than (for example) 1 month
    and update only cracker.currentreports (not .totalreports)
  remove reports that were reported by what we now "reliably" know to
    be crackers themselves
  remove crackers that have no reports left
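
A rough (untested) sketch of that cleanup pass, again using the hypothetical list layout from above; is_reliably_listed() is the same assumed helper as in the upload sketch, and free() needs <stdlib.h>:

  /* Hourly/daily maintenance pass over the in-memory crackers list. */
  void expire(cracker_t **head, time_t now, time_t max_report_age)
  {
      cracker_t **cp = head;
      while (*cp != NULL) {
          cracker_t *c  = *cp;
          report_t **rp = &c->reports;
          while (*rp != NULL) {
              report_t *r = *rp;
              if (now - r->latestreporttime > max_report_age ||
                  is_reliably_listed(r->reportingipaddress)) {
                  *rp = r->next;              /* unlink stale/bad report */
                  free(r);
                  c->currentreports--;        /* totalreports stays as-is */
              } else {
                  rp = &r->next;
              }
          }
          if (c->reports == NULL) {
              *cp = c->next;                  /* no reports left: drop cracker */
              free(c);
          } else {
              cp = &c->next;
          }
      }
  }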


Well that's about it. I hope this is clear enough.

Conclusion 2: Please adjust the central server design to start working correctly, possibly by implementing above ideas.

Conclusion 2.5 in case you're not interested in doing that: Anyone else willing to try it?


Which brings me to the last related issue: preventing database poisoning by botnets.

Any black-hat botnet operator wanting to keep a bunch of real crackers off the radar will need to make sure that 50+ bogus crackers are sent to all clients all the time. (Because as long as a total of fewer than 50 crackers is sent to a client, all qualifying _real_ crackers will certainly be included.) So, many repeated 50-new-cracker sync downloads are highly suspicious.

... which seems to be the case right now. Oops.

Some of this might well be caused by the semi-broken central server issues as described above (10-20 times repeated downloads for any given cracker), some of this might not...

Next to solving the underlying central server issues, increasing the maximum of 50 crackers per sync would be a _very_ good idea to counter this.

But pending that, and indeed supplemental to that, the straightforward client-based solution actually works: simply increase SYNC_DOWNLOAD_THRESHOLD (called N above). Why does that work? Well, if a real cracker is forcibly kept off of the sync downloads, very few hosts will receive the real cracker and block it proactively. Which means the cracker will be seen - and _reactively_ blocked - by many hosts. All these many hosts will report the cracker to the central database. Which means this real cracker will easily overcome a very high threshold.

As long as a client keeps the threshold low, it will receive the newest 50 qualifying crackers, which might well be bogus, and it will never see the real cracker, which quickly becomes (much) older and drowns in the noise. As soon as a client sets a higher threshold, the real cracker will still qualify easily, but there are far fewer bogus crackers to fill the maximum of 50.

This keeps working when all clients raise their thresholds simultaneously. Actually, it will work better, since more clients will receive more real crackers and proactively block them. In the "completely swamped" situation, many clients will report a cracker since they do not _know_ that it has already been reported very many times. In the "controlled" situation, any cracker will be reported at most average(thresholds)+1 times, after which everyone else will have proactively blocked it.

And this will probably continue working until a black-hat botnet operator generates more traffic to the denyhosts central server than to all actually brute-forced ssh hosts combined. After all, he'll need to make sure there are always at least 50 new (very-)high-threshold bogus crackers in the database. At that point, it'll be very easy to single out who the "heavy denyhosts contributors" are, and to simply block them from contributing. Additionally, this will reveal a major part of the botnet IP addresses, which is usually not what botnet operators want... (Indeed, this would effectively amount to DoS'ing the denyhosts central server, which would probably be a lot easier to do _outside_ the denyhosts sync protocol.)


So I'd propose an automatically-adaptive threshold for all clients, as follows:

1. The central server will report some maximum allowed number of crackers, and additionally include an explicit "too many" notice if more crackers were available. (This is to allow easy server-based adjustment of the maximum.)

2. If a client sees the "too many" notice for, say, 10 consecutive sync cycles, it will increase its threshold by 1. Then it starts counting 10 sync cycles again, etc.

3. If a client does not see the "too many" notice for, say, 30 consecutive sync cycles, it will decrease its threshold by 1, but no further down than the configfile value (which becomes a minimum, effectively).

More elaborate control algorithms would be possible, but this should already work quite nicely.
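
Just to show how little client-side state this needs, here is a minimal sketch of the control loop (hypothetical names, C only for illustration; in the real Python client the counters would be persisted in WORK_DIR as mentioned below):

  #include <stdbool.h>

  /* Hypothetical per-client state; saved to / loaded from WORK_DIR. */
  typedef struct {
      int threshold;        /* current effective SYNC_DOWNLOAD_THRESHOLD    */
      int min_threshold;    /* configfile value = effective lower bound     */
      int too_many_streak;  /* consecutive syncs with the "too many" notice */
      int ok_streak;        /* consecutive syncs without it                 */
  } adapt_t;

  void adapt_after_sync(adapt_t *a, bool server_said_too_many)
  {
      if (server_said_too_many) {
          a->ok_streak = 0;
          if (++a->too_many_streak >= 10) {           /* item 2 above */
              a->threshold++;
              a->too_many_streak = 0;
          }
      } else {
          a->too_many_streak = 0;
          if (++a->ok_streak >= 30) {                 /* item 3 above */
              if (a->threshold > a->min_threshold)
                  a->threshold--;
              a->ok_streak = 0;
          }
      }
  }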

This client-side adjustment idea has the significant advantage that it will automatically find the optimal SYNC_DOWNLOAD_THRESHOLD value for any given combination of SYNC_DOWNLOAD_RESILIENCY and SYNC_INTERVAL. (As opposed to other, completely server-based ideas that I had been considering.)

To cater for manual or cron-based setups, the threshold and sync-cycle counters should be saved in the WORK_DIR.

Pending an update of the central server, items 2 and 3 can already be implemented in clients, with the "too many" check replaced by ">=50 newly received crackers". But note that, due to the current server issues outlined above, most of the received "real" crackers will actually be duplicates.

Conclusion 3: Please implement something like the outlined automatic threshold adjustment system to make poisoning the denyhosts central database a useless affair.


Best regards, and keep up the good work,

Anne Bezemer

J.A.BezemerXopensourcepartners.nl | s/X/@/

This message and all ideas expressed herein are Copyright (C) 2011 by me, and licensed under GNU General Public License version 3 or higher. Other licensing options are available on request & payment. No warranty whatsoever.


