Re: [tor-dev] Sybil attack detection (was: Karsten's July 2014)

2014-08-05 Thread Andrea Shepard
On Tue, Aug 05, 2014 at 11:24:32AM -0400, Philipp Winter wrote:
> On Tue, Aug 05, 2014 at 11:37:45AM +0200, Karsten Loesing wrote:
> > Started looking into better algorithms to detect Sybil attacks on the
> > Tor network.  Current thinking is that we should define relay similarity
> > metrics like common IP address prefix length or time between first seen
> > in a consensus, go throw the consensus archives, and see which relays
> > look similar but are not defined to be in the same family.
> 
> Do you already have some code or more details on that?  I'm quite
> interested in this topic and I'm wondering if it wouldn't be best to
> start with something simple like cosine similarity [0].  We would have
> to transform a relay descriptor into a numerical vector and then
> calculate the cosine similarity to other relay vectors.  However, this
> might scale poorly as we would need (n^2)/2 comparisons.
> 
> We might also want to weigh the vector's elements differently as some of
> them are easy to control for an attacker (advertised bandwidth, uptime,
> flags, exit policy) and others require more effort (IP address, ASN,
> observed bandwidth).  Like you mentioned, the key thing to look at might
> be time, i.e., uptime and derived features such as "total online time in
> last month" etc.
> 
> [0] https://en.wikipedia.org/wiki/Cosine_similarity

You only need O(n*log(n)) if you can define any similarity metric that
respects the triangle inequality.  There's a lot of research on data
structures for this:

https://en.wikipedia.org/wiki/Metric_tree

-- 
Andrea Shepard

PGP fingerprint (ECC): BDF5 F867 8A52 4E4A BECF  DE79 A4FF BC34 F01D D536
PGP fingerprint (RSA): 3611 95A4 0740 ED1B 7EA5  DF7E 4191 13D9 D0CF BDA5


pgpbb3vBkhc9W.pgp
Description: PGP signature
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] Sybil attack detection (was: Karsten's July 2014)

2014-08-05 Thread Philipp Winter
On Tue, Aug 05, 2014 at 11:37:45AM +0200, Karsten Loesing wrote:
> Started looking into better algorithms to detect Sybil attacks on the
> Tor network.  Current thinking is that we should define relay similarity
> metrics like common IP address prefix length or time between first seen
> in a consensus, go throw the consensus archives, and see which relays
> look similar but are not defined to be in the same family.

Do you already have some code or more details on that?  I'm quite
interested in this topic and I'm wondering if it wouldn't be best to
start with something simple like cosine similarity [0].  We would have
to transform a relay descriptor into a numerical vector and then
calculate the cosine similarity to other relay vectors.  However, this
might scale poorly as we would need (n^2)/2 comparisons.

We might also want to weigh the vector's elements differently as some of
them are easy to control for an attacker (advertised bandwidth, uptime,
flags, exit policy) and others require more effort (IP address, ASN,
observed bandwidth).  Like you mentioned, the key thing to look at might
be time, i.e., uptime and derived features such as "total online time in
last month" etc.

[0] https://en.wikipedia.org/wiki/Cosine_similarity

Cheers,
Philipp
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev