Re: [tor-dev] Projects to combat/defeat data correlation

2014-01-16 Thread Andreas Krey
On Wed, 15 Jan 2014 21:16:20 +, Jim Rucker wrote:
 There was a story in the news recently of a Harvard student who used Tor to
 send a bomb threat to Harvard in order to cancel classes so he wouldn't
 have to take a test. He was apprehended within a day, which puts into
 question the anonymity of Tor.

This was because it was known that the threat was delivered via Tor, and
that he was the only one in $(organizational unit of harvard) using Tor
at that time, and he confessed when confronted with that. There
was nothing that actually proved that he made the threat. (Unless this
is a case of parallel construction, of course, which I don't assume.)

...
 Are there any projects in Tor being worked in to combat data correlation?
 For instance, relays the send/recv constant data rates continuously -
 capping data rates and padding partial or non-packets with random data to
 maintain the data rates

At the moment that would be prohibitively expensive. Also, it wouldn't
guard against the scenario above - you can't be online and shoveling
data all the time, so long-term correlation is still possible.
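
To make the long-term correlation point concrete, here is a minimal
sketch of an intersection attack. It assumes an observer who can record
which users were connected to Tor during each event of interest; the
user names, the sample data, and the function name are all hypothetical.

```python
def intersect_candidates(observations):
    """Return the users who were online during every observed event."""
    candidates = None
    for online_users in observations:
        users = set(online_users)
        # Keep only users who were also online during all earlier events.
        candidates = users if candidates is None else candidates & users
    return candidates if candidates is not None else set()


# Hypothetical data: who was seen using Tor while each message was sent.
observations = [
    {"alice", "bob", "carol"},
    {"alice", "carol", "dave"},
    {"alice", "erin"},
]
print(sorted(intersect_candidates(observations)))
```

Padding hides what a user sends while connected, but not whether the
user is connected at all, so every additional observation still shrinks
the candidate set.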

Andreas

-- 
Totally trivial. Famous last words.
From: Linus Torvalds torvalds@*.org
Date: Fri, 22 Jan 2010 07:29:21 -0800
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Anyone wanting to write some Weather-tight code?

2014-01-16 Thread Abhiram Chintangal
Hello Norbert and Karsten,

I have added a couple of attachments to the projects wiki-page. The
first one is a UML diagram of the data-models being used in the
current weather. It should give us a good idea about the current
implementation.

The second attachment is the Design Document from the current Tor
Weather application. It should give a decent idea about the important
modules in the application and more importantly its work-flow.

Thanks!


[tor-dev] Using MaxMind's GeoIP2 databases in tor, BridgeDB, metrics-*, Onionoo, etc.

2014-01-16 Thread Karsten Loesing
Hi devs,

you probably know that we use MaxMind's GeoIP database in several of our
products (list may not be exhaustive):

 - tor: We ship little-t-tor with a geoip and a geoip6 file for clients
to support excluding relays by country code and for relays to generate
by-country statistics.
 - BridgeDB: I vaguely recall that the BridgeDB service uses GeoIP data
to return only bridges that are not blocked in a user's country.  Or
maybe that was a feature yet to be implemented.
 - Onionoo: The Onionoo service uses MaxMind's city database to provide
location information of relays.  (It also uses MaxMind's ASN database to
provide information on AS number and name.)
 - metrics-db: I'm planning to use GeoIP data to resolve bridge IP
addresses to country codes in the bridge descriptor sanitizing process.
 - metrics-web: We have been using GeoIP data to provide statistics on
relays by country.  This is currently disabled because the
implementation was eating too many resources, but I plan to put these
statistics back.

However, the GeoIP database that we currently use has a big shortcoming:
it replaces valid country codes with A1 or A2 whenever MaxMind thinks
that a relay is an anonymizing proxy or satellite provider.

That's why we currently repair their database by either automatically
guessing what country code an A1 entry could have had [1, 2], or by
manually looking it up in RIR delegation files [3, 4].  This is just a
workaround.  Also, I think BridgeDB doesn't repair its GeoIP database.
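
As an illustration of what "automatically guessing" a country code for
an A1 entry can look like, here is a minimal sketch. It is not a copy of
deanonymind.py [1, 2]; the heuristic (take the country of the two
directly adjacent ranges if they agree) and the tuple layout are
assumptions made for this example.

```python
def guess_a1_countries(ranges):
    """Replace A1 by the neighbours' country code when both neighbours agree.

    ranges: list of (start, end, country_code) tuples with integer
    addresses, sorted by start address.
    """
    repaired = []
    for i, (start, end, cc) in enumerate(ranges):
        if cc == "A1" and 0 < i < len(ranges) - 1:
            _, prev_end, prev_cc = ranges[i - 1]
            next_start, _, next_cc = ranges[i + 1]
            # Only guess when the A1 range is directly enclosed by two
            # ranges that carry the same real country code.
            adjacent = prev_end + 1 == start and end + 1 == next_start
            if adjacent and prev_cc == next_cc and prev_cc not in ("A1", "A2"):
                cc = prev_cc
        repaired.append((start, end, cc))
    return repaired


# Hypothetical ranges: the first A1 entry is enclosed by DE, the second is not.
sample = [(0, 9, "DE"), (10, 19, "A1"), (20, 29, "DE"),
          (30, 39, "A1"), (40, 49, "FR")]
print(guess_a1_countries(sample))
```

Entries that cannot be guessed this way stay A1 and would still need a
manual lookup.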

Here's the good news: MaxMind now provides their databases in new
formats which provide the A1/A2 information in *addition* to the correct
country codes [5, 6].  We should switch!

How do we switch?  The first option is to ship their binary database files
and include their APIs [7] in our products.  It looks like there are APIs for C,
Java, and Python, so all the languages we need for the tools listed
above.  Pros: we can kick out our parsing and lookup code.  Cons: we
need to check if their licenses are compatible, we have to kick out our
parsing and lookup code and learn their APIs, and we add new dependencies.

Another option is to write a new tool that parses their full databases
and converts them into file formats we already support.  (This would
also allow us to provide a custom format with multiple database versions
which would be pretty useful for metrics, see #6471.)  Also, it looks
like their license, Creative Commons Attribution-ShareAlike 3.0
Unported, allows converting their database to a different format.  If we
want to write such a tool, we have a few options:

 - We use their database specification [8] and write our own parser
using a language of our choice (read: whoever writes it pretty much
decides).  We could skip the binary search tree part of their files and
only process the contents.  Whenever they change their format, we'll
have to adapt.
 - We use their Python API [9] to build our parser, though it looks like
that requires pip or easy_install and compiling their C API.  I don't
know enough about Python to assess what headaches that's going to cause.
 - We use their Java API [10] to build our parser, though we're probably
forced to use Maven rather than Ant.  I don't have much experience with
Maven.  Also, using Java probably makes me the default (and only)
maintainer, which I'd want to avoid if possible.
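
Whichever parser we pick, the output side of such a conversion tool
could be fairly small.  Here is a rough sketch, assuming the parser
yields (CIDR network, country code) pairs and that the target is the
IPv4 geoip file layout with one "INTIPLOW,INTIPHIGH,CC" entry per line.
The record list and the function name are hypothetical; reading the
MaxMind file itself is left out.

```python
import ipaddress


def to_geoip_lines(records):
    """Convert (cidr, country_code) pairs to 'low,high,CC' lines."""
    rows = []
    for cidr, country in records:
        net = ipaddress.ip_network(cidr)
        low = int(net.network_address)
        high = int(net.broadcast_address)
        rows.append((low, high, country))
    # Lookup code that binary-searches the file needs sorted ranges.
    rows.sort()
    return [f"{low},{high},{country}" for low, high, country in rows]


# Hypothetical parser output, deliberately unsorted.
records = [("1.0.1.0/24", "CN"), ("1.0.0.0/24", "AU")]
for line in to_geoip_lines(records):
    print(line)
```

A geoip6 variant would need its own writer, since that file does not
use integer addresses.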

Thoughts?  What other options did I miss, and what pros and cons did I
overlook?

And is this something that people on this list would want to help with,
once we have agreed on one of the options?  If so, please feel free to join
the discussion now and maybe influence which path we're going to take.

All the best,
Karsten


[1]
https://gitweb.torproject.org/tor.git/blob/HEAD:/src/config/deanonymind.py
[2]
https://gitweb.torproject.org/onionoo.git/blob/HEAD:/geoip/deanonymind.py
[3] https://gitweb.torproject.org/tor.git/blob/HEAD:/src/config/geoip-manual
[4] https://gitweb.torproject.org/onionoo.git/blob/HEAD:/geoip/geoip-manual
[5] http://dev.maxmind.com/geoip/geoip2/whats-new-in-geoip2/
[6] http://dev.maxmind.com/geoip/geoip2/geolite2/
[7] http://dev.maxmind.com/geoip/geoip2/downloadable/
[8]
https://github.com/marklr/MaxMind-IPDB-perl/blob/master/docs/MaxMind-IPDB-spec.md
[9] https://pypi.python.org/pypi/geoip2
[10] https://github.com/maxmind/MaxMind-DB-Reader-java


Re: [tor-dev] Projects to combat/defeat data correlation

2014-01-16 Thread Matthew Finkel
On Wed, Jan 15, 2014 at 09:16:20PM -0600, Jim Rucker wrote:
 Are there any projects in Tor being worked in to combat data correlation?
 For instance, relays the send/recv constant data rates continuously -
 capping data rates and padding partial or non-packets with random data to
 maintain the data rates

The very quick answer without providing much detail is that you may want
to look at scramblesuit [0][1]. It doesn't try to provide constant
throughput, but (as the website says) we alter inter-arrival times and
the transported protocol's packet length distribution. This isn't a
perfect solution, and won't impress a GPA (global passive adversary),
but it's a start if you're dealing with a localized passive observer.
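
To give a rough idea of what altering packet lengths and inter-arrival
times means, here is a toy sketch. It is not scramblesuit's actual
algorithm: the length choices, the delay bound, the zero-byte padding,
and the function name are all assumptions made for illustration.

```python
import random


def shape_traffic(payload, length_choices, max_delay, rng):
    """Split payload into padded packets with randomized sizes and delays."""
    packets = []
    offset = 0
    while offset < len(payload):
        # Draw the on-the-wire size from the chosen length distribution.
        size = rng.choice(length_choices)
        chunk = payload[offset:offset + size]
        # Pad a short final chunk up to the drawn size.
        chunk += b"\x00" * (size - len(chunk))
        # Draw how long to wait before sending this packet.
        delay = rng.uniform(0.0, max_delay)
        packets.append((delay, chunk))
        offset += size
    return packets


rng = random.Random(2014)  # fixed seed so the example is repeatable
packets = shape_traffic(b"x" * 1000, [128, 256, 512], 0.01, rng)
print([len(chunk) for _, chunk in packets])
```

A real transport would also need length framing so the receiver can
strip the padding, and padding that is indistinguishable from the
encrypted payload.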

- Matt

[0] http://www.cs.kau.se/philwint/scramblesuit/
[1] https://gitweb.torproject.org/user/phw/scramblesuit.git


Re: [tor-dev] Proposal 225: Strawman proposal: commit-and-reveal shared rng

2014-01-16 Thread Kang
I don't think that a solution which uses DKG (distributed key
generation) is overkill; I think it would be more secure.
The more all-or-nothing security provided by DKG-based schemes seems
preferable to the sliding scale of influence provided by coin-flipping
ones.
Then again, I don't know that much about coin-flipping protocols, so
that could simply be me being ignorant.

On the other hand, he's right that DKG-based schemes would probably be
more complex, at least with regard to the number of calculations.
Take the Fouque protocol, for instance.
I've actually implemented a proof-of-concept version of the Fouque
protocol and it's not at all efficient.
Using a 1024 bit prime it can run for 5 participants in about 2
minutes (calculations only) on a single decent computer.
I imagine a more speed focused implementation running over 10
directory authorities would have an acceptable runtime, but 20 or 30
authorities might be pushing it.

Luckily other DKG protocols apparently have far less complexity [ O(n)
as opposed to O(n^2) ] since they don't have the nice
one-round-sharing-phase feature.
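
As a back-of-the-envelope check, here is a small sketch that
extrapolates from the one measurement above (5 participants, about 2
minutes). It assumes runtime grows exactly with n^2 or with n and
ignores network latency and any implementation speedups, so the numbers
are only indicative.

```python
BASE_PARTICIPANTS = 5
BASE_SECONDS = 120  # about 2 minutes, calculations only


def estimate_seconds(n, exponent):
    """Scale the measured runtime by (n / 5) ** exponent."""
    return BASE_SECONDS * (n / BASE_PARTICIPANTS) ** exponent


for n in (10, 20, 30):
    quadratic = estimate_seconds(n, 2) / 60
    linear = estimate_seconds(n, 1) / 60
    print(f"n={n}: O(n^2) ~ {quadratic:.0f} min, O(n) ~ {linear:.0f} min")
```

Under those assumptions the quadratic estimate is already above an hour
at 30 participants, which is why the gap between O(n) and O(n^2)
matters here.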

I guess it's all just a balancing act with no real correct answer.
Security vs calculation complexity vs network traffic vs protocol
complexity vs ...


Re: [tor-dev] Projects to combat/defeat data correlation

2014-01-16 Thread David Stainton
In that case, would it then look like zero in $(organizational unit of
harvard) using tor and
one in $(organizational unit of harvard) using scramblesuit?

I like the idea of the tor pluggable transport combiner... wherein we
could wrap a pseudo-random appearing obfuscation protocol (such as
obfs3, scramblesuit etc) in a whitelisted obfuscation protocol such
as http?, sshrproxy, hexchat etc.

I imagine the anonymity set would be much smaller for these combined
transports... fewer people using them.


On Thu, Jan 16, 2014 at 12:54 PM, Matthew Finkel
matthew.fin...@gmail.com wrote:
 On Wed, Jan 15, 2014 at 09:16:20PM -0600, Jim Rucker wrote:
 Are there any projects in Tor being worked in to combat data correlation?
 For instance, relays the send/recv constant data rates continuously -
 capping data rates and padding partial or non-packets with random data to
 maintain the data rates

 The very quick answer without providing much detail is that you may want
 to look at scramblesuit [0][1]. It doesn't try to provide constant
 throughput, but (as the website says) we alter inter-arrival times and
 the transported protocol's packet length distribution. This isn't a
 perfect solution, and won't impress a GPA, but it's a start if you're
 dealing with a localized passive observer.

 - Matt

 [0] http://www.cs.kau.se/philwint/scramblesuit/
 [1] https://gitweb.torproject.org/user/phw/scramblesuit.git


Re: [tor-dev] Projects to combat/defeat data correlation

2014-01-16 Thread Ximin Luo
 I imagine the anonymity set would be much smaller for these combined
 transports... fewer people using them.

In my understanding, the anonymity set doesn't apply to use of PTs since this 
is only at the entry side. The exit side does not know[1] what PT the 
originator is using, so is unable to use that information to de-anonymise.

[1] at least, in theory should not know, perhaps someone can check there are no 
side-channels? would be pretty scary if exit could work out that originator is 
using PTs.


Re: [tor-dev] Projects to combat/defeat data correlation

2014-01-16 Thread Kevin P Dyer
On Wed, Jan 15, 2014 at 7:16 PM, Jim Rucker mrjim...@gmail.com wrote:
 [snip]

 From my understanding (please correct me if I'm wrong) Tor has a weakness in
 that if someone can monitor data going into the relays and going out of the
 exit nodes then they can defeat the anonymity of tor by correlating the size
 and number of packets being sent to relays and comparing those that the
 packets leaving the exit nodes.

 Are there any projects in Tor being worked in to combat data correlation?
 For instance, relays the send/recv constant data rates continuously -
 capping data rates and padding partial or non-packets with random data to
 maintain the data rates

What you are referring to is a traffic confirmation attack. It's a
deceptively hard problem --- even if the naive strategy of sending
data at a constant rate worked (for some definition) it would be
prohibitively expensive in practice. It is also worth reiterating that
even if such a countermeasure is in place, it wouldn't conceal the
fact that a specific user is connecting to the Tor network.
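
For intuition, here is a minimal sketch of traffic confirmation by
volume correlation. It assumes the observer has per-interval byte
counts for one flow entering the network and for several flows leaving
it; the flow names and numbers are made up, and real attacks use more
robust statistics than plain Pearson correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if one is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_match(entry_flow, exit_flows):
    """Return the name of the exit flow that correlates best with entry_flow."""
    return max(exit_flows, key=lambda name: pearson(entry_flow, exit_flows[name]))


# Hypothetical kilobytes per interval seen at the entry and at two exits.
entry = [10, 0, 35, 5, 20, 0]
exits = {
    "exit_a": [9, 1, 33, 6, 21, 0],
    "exit_b": [20, 20, 18, 22, 19, 21],
}
print(best_match(entry, exits))
```

A perfectly constant-rate flow has zero variance, so this particular
statistic gets no signal from it; that is the appeal of the padding
idea, and its bandwidth cost is the problem.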

If you are interested in recent academic works on traffic analysis,
you should have a look at [1] and [2]. They explore the related
setting of website fingerprinting attacks and defenses (including the
one you suggest.)

-Kevin

[1] https://kpdyer.com/publications/oakland2012-peekaboo.pdf
[2] http://cacr.uwaterloo.ca/techreports/2013/cacr2013-30.pdf


Re: [tor-dev] Dusting off dir-spec.txt

2014-01-16 Thread Nick Mathewson
On Tue, Jan 14, 2014 at 1:56 PM, Karsten Loesing kars...@torproject.org wrote:
 [...]
 (Let me know if you prefer this review to happen in a ticket rather than
 here.)


Thanks, Karsten!  I think it should ideally be a ticket?

-- 
Nick


Re: [tor-dev] Projects to combat/defeat data correlation

2014-01-16 Thread Griffin Boyce

Ximin Luo wrote:
In my understanding, the anonymity set doesn't apply to use of PTs 
since this is only at the entry side. The exit side does not know[1] 
what PT the originator is using, so is unable to use that information 
to de-anonymise.


[1] at least, in theory should not know, perhaps someone can check 
there are no side-channels? would be pretty scary if exit could work 
out that originator is using PTs.


  Anonymity is still a consideration, even if it's highly unlikely to 
be impinged upon by pluggable transports.  For example, if a network 
notices someone connect to a known obfsproxy bridge, then they can make 
an educated guess that the person is using both Tor and obfsproxy.  With 
flashproxy, this is of much less concern given address diversity.  With 
bananaphone, it wouldn't really apply at all as far as I can see.


~Griffin


Re: [tor-dev] Review of Proposal 147: Eliminate the need for v2 directories in generating v3 directories

2014-01-16 Thread Nick Mathewson
On Wed, Jan 15, 2014 at 9:15 PM, Roger Dingledine a...@mit.edu wrote:
 On Wed, Jan 15, 2014 at 01:08:03PM +0100, Karsten Loesing wrote:
 I talked to Roger on IRC, and here's why this proposal may indeed be
 overkill:

 As of January 2013, there is only a single version 3 directory authority
 left that serves version 2 statuses: dizum.  moria1 and tor26 have been
 rejecting version 2 requests for a long time, and it's mostly an
 oversight that dizum still serves them.  The other six authorities have
 never generated version 2 statuses for others to be used as pre-voting
 opinions.  So, it's basically not true that version 2 statuses are
 required for the version 3 protocol to work properly.

 See git commits 2e692bd8 and eaf5487d, which went into 0.2.2.12-alpha:
   o Major bugfixes:
 - Many relays have been falling out of the consensus lately because
   not enough authorities know about their descriptor for them to get
   a majority of votes. When we deprecated the v2 directory protocol,
   we got rid of the only way that v3 authorities can hear from each
   other about other descriptors. Now authorities examine every v3
   vote for new descriptors, and fetch them from that authority. Bugfix
   on 0.2.1.23.

 That was the stopgap that made proposal 147 not so critical. I think
 based on Karsten's recent results that maybe it's enough.

Sounds good to me.

Is this in dir-spec.txt? I'm not finding it at first glance.  If it
isn't, Karsten, would you be able to add it?  Probably we should do
that _after_ merging your dirspec branch.

-- 
Nick


Re: [tor-dev] Review of Proposal 147: Eliminate the need for v2 directories in generating v3 directories

2014-01-16 Thread Nick Mathewson
On Wed, Jan 15, 2014 at 7:08 AM, Karsten Loesing kars...@torproject.org wrote:
 [...]
 I talked to Roger on IRC, and here's why this proposal may indeed be
 overkill:

 As of January 2013, there is only a single version 3 directory authority
 left that serves version 2 statuses: dizum.  moria1 and tor26 have been
 rejecting version 2 requests for a long time, and it's mostly an
 oversight that dizum still serves them.  The other six authorities have
 never generated version 2 statuses for others to be used as pre-voting
 opinions.  So, it's basically not true that version 2 statuses are
 required for the version 3 protocol to work properly.

 Here's a possible way to move this forward.

 - Please review and merge my prop147tweaks branch that contains tweaks
 from our discussion above, regardless of whether this proposal will be
 implemented or not.

Done.

 - I'm going to run a quick analysis of archived vote documents to see
 how much authorities would have benefited from the others' votes before
 generating their own votes.

 - I'm going to ask Alex to disable version 2 statuses on dizum using
 DisableV2DirectoryInfo_ 1 to see what that does to the network.  We
 should probably finish the 2048 bits RSA keys upgrade first before
 changing yet another variable.

Sounds good.

 - If there's no convincing argument to implement opinion documents, we
 close this proposal as rejected.

Great.

 What do you think?

Sounds good.


Re: [tor-dev] Projects to combat/defeat data correlation

2014-01-16 Thread David Stainton
Yeah, I guess if the PT doesn't draw attention and the bridge IP is not known
then one's Tor traffic may be somewhat obscured.

What about bananaphone? Do you mean the bananaphone PT?
It is trivially detectable... more so than, say, a transport like obfs3
whose output looks like pseudo-random noise.




On Thu, Jan 16, 2014 at 8:33 PM, Griffin Boyce grif...@cryptolab.net wrote:
 Ximin Luo wrote:

 In my understanding, the anonymity set doesn't apply to use of PTs since
 this is only at the entry side. The exit side does not know[1] what PT the
 originator is using, so is unable to use that information to de-anonymise.

 [1] at least, in theory should not know, perhaps someone can check there
 are no side-channels? would be pretty scary if exit could work out that
 originator is using PTs.


   Anonymity is still a consideration, even if it's highly unlikely to be
 impinged upon by pluggable transports.  For example, if a network notices
 someone connect to a known obfsproxy bridge, then they can make an educated
 guess that the person is using both Tor and obfsproxy.  With flashproxy,
 this is of much less concern given address diversity.  With bananaphone, it
 wouldn't really apply at all as far as I can see.

 ~Griffin
