Hello Ludovic,

Quoting "Ludovic =?iso-8859-1?Q?Court=E8s?=" <l...@gnu.org> from ml.p2p.hackers:
:You may be interested in the talk on GNUnet’s DHT available at
:<http://www.gnu.org/ghm/2010/denhaag/> under “DHT and routing in
:GNUnet”.  It presents security issues and how GNUnet’s implementation

Looks like it was a great show...  I grabbed some slides that I will review
calmly during the end-of-year vacation period. :-)

Regarding DHT security and Kademlia, I have implemented methods in the Gnutella
DHT (Kademlia-based) to prevent Sybil attacks, based on pure statistical
distribution of KUIDs in the path.

This is based on the following paper:

        "Efficient DHT attack mitigation through peers' ID distribution"
        Thibault Cholez et al.
        INRIA, Jun 2010, 8th.

I have read many papers on DHT security but the conclusions and algorithms
proposed by the authors most always involved a central authority of some kind
and I wanted something that would be decentralized and self-tuning (almost).

I was also looking for something simple enough to implement myself (i.e. not
depending on a priori cooperation among Gnutella servent providers) yet
efficient enough to get a decent level of protection (meaning overcoming the
protection would require "significant" resources).

The paper above provided me with the logic I was looking for.

My implementation in the Gnutella client gtk-gnutella is done in C.  Here is
a source code exerpt (only comments) that describes the logic behind the
approach.  This is my synthesis of the paper mentionned above.

First, the passive, systematic protections:

    /*
     * The following are safety precautions described in the article:
     *
     * "Efficient DHT attack mitigation through peers's ID distribution" by
     * Thibault Cholez et al., published in June 2010.
     *
     * The preventive rules implemeted here attempt to fight Sybil attacks:
     *
     * - We limit the amount of hosts coming from the same class-C network.
     *   This makes it more difficult to attack a KUID because rogue nodes
     *   have to be spread among many subnets.
     *
     * - We discard nodes too close from the target as being suspicious.
     *   The maximum amount of common bits allowed is determined at lookup
     *   creation based on the current estimated DHT size. Given that KUIDs
     *   are randomly generated, the chance the threshold be hit by accident
     *   is only 1 / 2^KDA_C.  And even then, there's no real damage done
     *   to the lookup path because that should only impact 1 node among the
     *   KDA_K closest.  The chance that 2 be affected is negligible in
     *   practice, unless an attack is under way in which case we want to
     *   exclude these nodes anyway.
     *
     * Naturally, lookups aimed at a specific node to grab its security token
     * must avoid performing the "too close" check, by construction!
     */

The constants mentionned in the comments are:

        KDA_K = 20
        KDA_C = 10

Then the active, self-tuned ones:

    /*
     * The following are safety precautions described in the article:
     *
     * "Efficient DHT attack mitigation through peers's ID distribution" by
     * Thibault Cholez et al., published in June 2010.
     * 
     * The idea is that Sybil attacks will necessarily change the statistical
     * distribution of the KUIDs surrounding the target.  By comparing the
     * actual KUID prefix distribution with the theoretical one, we can detect
     * that something is unusual.
     *
     * The divergence between the theoretical distribution of prefixes and
     * the actual one is measured by computing the Kullback-Leibler divergence
     * (K-L divergence for short).
     *
     * The measured distribution of prefixes sharing "i" leading bits with the
     * target is M(i).  For instance, if 6 nodes from the 20 closest share
     * 13 bits with the target, M(13) = 6/ 20 = 0.3.
     *
     * The prefix length b_min at which we start looking is the theoretical
     * k-ball furthest threshold, which dht_get_kball_furthest() gives us.
     * It is computed as:
     *
     *    b_min = E[log2(estimated_DHT_size / KDA_K)]
     *
     * with E[x] being the integer part of x.
     *
     * The maximum prefix length we consider, b_max, is computed as:
     *
     *    b_max = b_min + KDA_C
     *
     * where KDA_C is the "closeness factor", an arbitrary amount of extra
     * bits we allow to have in common with the key before looking suspicious.
     *
     * Starting at b_min, the theoretical distribution of prefixes, T(i) is
     * computed as:
     *
     *    T(i) = 1 / 2^(i - b_min + 1)
     *
     * So if b_min is 13, T(13) = 1/ 2, T(14) = 1/2^2, T(15) = 1/2^3, etc...
     * Up to T(b_max) = 1 / 2^(KDA_C + 1).
     *
     * The K-L divergence of T from M is given by:
     *
     *    Dkl(M|T) = SUM(M(i) * log2(M(i) / T(i))
     *            i = b_min to b_max
     *
     * Intuitively, the larger M(i)/T(i), the larger the divergence added
     * by the i-bit prefix.  Given that an attack will focus on getting close
     * to the KUID target, T(i) will get smaller and smaller as "i" increases
     * and a large M(i) will indicate a potential attack.
     *
     * Since Dkl is a summation, we can determine which prefix length
     * contributes the most towards the divergence and therefore remove these
     * nodes from the path as a counter-measure.
     *
     * The beauty of that protection is that the more efficient the Sybil
     * attack is designed to be, the more we'll spot it and defeat it!
     */

I have made some improvements to the algorithm described in the paper based
on my observations of how the algorithm behaved in real life.  You can find
these (commented as well) in the source code of gtk-gnutella, available
on Sourceforge's SVN for the gtk-gnutella project.

The sources are huge, so just look at src/dht/lookup.c where the above logic
is implemented, in routines lookup_path_is_safe() and lookup_node_is_safe().

Cheers,
Raphael

_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers

Reply via email to