[tor-dev] On the visualization of OONI bridge reachability data

2014-10-04 Thread George Kadianakis
== What is bridge reachability data? ==

By bridge reachability data I'm referring to information about which
Tor bridges are censored in different parts of the world.

The OONI project has been developing a test that allows probes in
censored countries to test which bridges are blocked and which are
not. The test simply takes as input a list of bridges and tests
whether they work. It's also able to test obfuscated bridges with
various pluggable transports (PTs).
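
As a rough illustration of the "take a list of bridges and test whether
they work" loop, here is a minimal Python sketch. It is not OONI's test:
the default probe only attempts a plain TCP connection, and the names
tcp_reachable and check_bridges are made up for this example.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, reset: all count as "not working".
        return False


def check_bridges(bridges, probe=tcp_reachable, timeout=5.0):
    """Map each (host, port) pair to the boolean result of `probe`.

    `probe` is injectable so a stricter check (for example a full Tor
    bootstrap, or a PT handshake) can replace the TCP stand-in.
    """
    return {(host, port): probe(host, port, timeout) for host, port in bridges}
```

A successful TCP connection says nothing about whether an obfuscated
transport actually works, so a real probe would have to go further.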

== Why do we care about this bridge reachability data? ==

A few different parties care about the results of the bridge
reachability test [0]. Some examples:

Tor developers and censorship researchers can study the bridge
reachability data to learn which PTs are currently useful around the
world, by seeing which pluggable transports get blocked and where.  We
can also learn which bridge distribution mechanisms are busted and
which are not.

Bridge operators, the press, funders and curious people can learn
which countries conduct censorship and how advanced their technology
is. They can also learn how long it takes jurisdictions to block
public bridges. And in general, they can get a better understanding of
how well Tor is doing in censorship circumvention around the world.

Finally, censored users and world travelers can use the data to learn
which PTs are safe to use in a given jurisdiction.

== Visualizing bridge reachability data ==

So let's look at the data.

Currently, OONI bridge reachability reports look like this:
https://ooni.torproject.org/reports/0.1/CN/bridge_reachability-2014-07-02T21Z-AS4538-probe.yamloo
and you can retrieve them from this directory listing:
https://ooni.torproject.org/reports/0.1/

That's nice, but I doubt that many people will be able to access (let
alone understand) those reports. Hence, we need some kind of
visualization (and better dir listing) to conveniently display the
data to human beings.

However, a simple x-to-y graph will not suffice: our problem is
multidimensional. There are many use cases for the data, and bridges
have various characteristics (obfuscation method, distribution method,
etc.), so there is more than one useful way to visualize this
dataset.

To give you an idea, I will show you two mockups of visualizations
that I would find useful. Please don't pay attention to the data
itself, I just made some things up while on a train.

Here is one that shows which PTs are blocked in which countries:
https://people.torproject.org/~asn/bridget_vis/countries_pts.jpg
The list would only include countries that are blocking at least one
bridge. Green is "works", red is "blocked". Also, you can imagine the
same visualization, but instead of PT names for columns it has
distribution methods ("BridgeDB HTTP distributor", "BridgeDB mail
distributor", "Private bridge", etc.).
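
The data behind such a countries-by-PTs grid could be built roughly as
in the Python sketch below. The input tuple layout, the function name
blocked_matrix and the majority rule for deciding a cell's colour are
all assumptions made for illustration.

```python
from collections import defaultdict


def blocked_matrix(measurements):
    """Build {country: {transport: "works" | "blocked"}}.

    `measurements` is an iterable of (country, transport, reachable)
    tuples. A cell is "works" when at least half of its measurements
    succeeded (an arbitrary rule chosen for this sketch). Countries
    that block nothing are left out, as in the mockup.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [ok, failed]
    for country, transport, reachable in measurements:
        counts[country][transport][0 if reachable else 1] += 1

    matrix = {}
    for country, per_transport in counts.items():
        row = {
            transport: "works" if ok >= failed else "blocked"
            for transport, (ok, failed) in per_transport.items()
        }
        if "blocked" in row.values():
            matrix[country] = row
    return matrix
```

Swapping the transport name for a distribution method in the input
tuples would give the second variant of the grid without code changes.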

And here is another one that shows how fast jurisdictions block the
default TBB bridges:
https://people.torproject.org/~asn/bridget_vis/tbb_blocked_timeline.jpg

These visualizations could be helpful, but they are not the only ones.

What other use cases do you imagine using this dataset for?

What graphs or visualizations would you like to see?

[0]: Here are some use cases:

  Tor developers / Researchers:
  *** Which pluggable transports are blocked and where?
  *** Do they do DPI? Or did they just block the TBB hardcoded bridges?
  *** Which jurisdictions are most aggressive and what blocking
      technology do they use?
  *** Do they block based on IP or on (IP && PORT)?

  Users:
  *** Which pluggable transport should I use in my jurisdiction?

  Bridge operators / Press / Funders / Curious people:
  *** Which jurisdictions conduct Tor censorship? (block pluggable
      transports/distribution methods)
  *** How quickly do jurisdictions block bridges?
  *** How many users/traffic (and which locations) did the blocked
      bridges serve? Can be found out through extrainfo descriptors.
  *** How well are Tor bridges doing in censorship circumvention?

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-06 Thread Matthew Finkel
On Sat, Oct 04, 2014 at 06:27:22PM -0700, M. C. McGrath wrote:
> Hi,
> These were a few possibilities for visualization that we came up with
> at the OTF summit (I can send the full notes from that discussion if
> everyone is okay with it):
> - Timelines (by protocol, pool, country)
> - Pie charts for above
> - Timeline/graph of time it takes to block bridge from when added to
>   TBB (github parser)

Similar to the next one, I wonder if showing a map corresponding to
this data would also help. At t0, zero countries block the built-in
bridges, at t1 = only China blocks, at t2 = China + Iran block, at t3 =
China + Iran + Syria block, t4 = t3 + Turkey, etc. I'm thinking this
would be nice in addition to the timeline which George sketched (where
some of the time points are clickable and update the map). I don't
actually know how difficult this is to make.
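
The t0, t1, t2 frames for such a clickable map could be derived from
first-blocked timestamps, as in this Python sketch. The input shape
and the name blocking_timeline are hypothetical.

```python
def blocking_timeline(first_blocked):
    """Turn {country: first_blocked_timestamp} into cumulative map frames.

    Returns a list of (timestamp, countries_blocking_by_then) in time
    order. Countries that start blocking at the same timestamp share
    one frame.
    """
    frames = []
    blocking = set()
    for country, ts in sorted(first_blocked.items(), key=lambda kv: kv[1]):
        blocking.add(country)
        if frames and frames[-1][0] == ts:
            frames[-1] = (ts, sorted(blocking))
        else:
            frames.append((ts, sorted(blocking)))
    return frames
```

Each frame then corresponds to one clickable point on the timeline,
with the country list driving which regions get highlighted.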

> - Geographic breakdown by region (if enough data points) Could be
> similar to this map of % of internet users who use Tor by country
> https://transparencytoolkit.org/tormap.html

That's cool. Are you able to add a legend to it?

Having something like this or similar to RSF's Press Freedom Index [0],
based on the number of bridge users, would be nice. This is doable
today, using the available metrics data. We'll probably never be able
to know the number of users per protocol per country, but at least we
can visualize where in the world bridges (in general) are used most
and if this changes over time.

[0] http://rsf.org/index2014/en-index2014.php

But, it would also be really cool if we can create a map like this
based on the reachability of bridges per country per protocol and
maybe, in addition, color-code/denote how the ISPs/country are
interfering with the connection (e.g. throttling, DNS cache
poisoning, IP addr/port blocking).

> - At what point in the tor bootstrapping does it fail (may be
> difficult to determine, especially anonymized)?

Yes, but there's already a risk to running ooni-probe (at least right
now, hopefully this will change in time). We will eventually need
probes running in most countries if we want a good understanding of
what network interference is taking place and who is affected.

> - In all visualizations, compare with control (filter, line break,
> plot alongside, etc)
> 
> And the variables we thought would be relevant to visualizations:
> Protocol
> Pool
> Country (and region)- Iran, China, Netherlands (control)
> Time it takes to be blocked
> Point in bootstrap where it fails
> Classify the bridges by commercial/residential connection
> Time we started scanning the bridge from where
> 

Maybe latency measurements per protocol? Initially, I'm thinking
"the time it takes to download a consensus from the bridge" but
there are many variables that may affect this. Anyone have a better
idea?

I think this mostly covers it. The only addition I can think of right
now is comparing different control countries against each other (and
different ISPs within the control countries). Maybe we'll find
something interesting.

> It should be relatively simple to make rough versions of a lot of
> visualizations to see what works once we have a parser/converter that
> will generate JSONs (or similar) from OONI output that include the
> variables listed above.
> 

Is someone already working on this? I'm not really volunteering, merely
curious if this is in progress. :)

> Are there any other variables that would be particularly helpful to
> track or visualize? And are there any visualizations (listed or
> otherwise) that anyone would find particularly helpful?
> 
> 

Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-06 Thread Arturo Filastò
On 10/6/14, 6:28 PM, Matthew Finkel wrote:
> On Sat, Oct 04, 2014 at 06:27:22PM -0700, M. C. McGrath wrote:
>> These were a few possibilities for visualization that we came up with
>> at the OTF summit (I can send the full notes from that discussion if
>> everyone is okay with it):

Is this something that is different from what is on this pad:
https://pad.riseup.net/p/bridgereachability?

If so please do!

>> - Timelines (by protocol, pool, country)
>> - Pie charts for above
>> - Timeline/graph of time it takes to block bridge from when added to
>>   TBB (github parser)
> 
> Similar to the next one, I wonder if showing a map corresponding to
> this data would also help. At t0, zero countries block the built-in
> bridges, at t1 = only China blocks, at t2 = China + Iran block, at t3 =
> China + Iran + Syria block, t4 = t3 + Turkey, etc. I'm thinking this
> would be nice in addition to the timeline which George sketched (where
> some of the time points are clickable and update the map). I don't
> actually know how difficult this is to make.
> 

I like this idea, though having both the map and the timeline will take
up quite a bit of screen real estate. I think that both of these are
useful graphs to have, and linking the two into one giant one probably
does not require that much effort, so I would go for it.

>> - Geographic breakdown by region (if enough data points) Could be
>> similar to this map of % of internet users who use Tor by country
>> https://transparencytoolkit.org/tormap.html

[...]

> 
> But, it would also be really cool if we can create a map like this
> based on the reachability of bridges per country per protocol and
> maybe, in addition, color-code/denote how the ISPs/country are
> interfering with the connection (e.g. throttling, DNS cache
> poisoning, IP addr/port blocking).

This would indeed be very cool. A problem is that it's quite hard to
make a statement as to which protocol is working especially in cases
like China where the blocking does not happen immediately.

What we can do, however, is have something like bubbles over every country
that show the percentage of bridges of every category that we have
detected as "not working" in the country at that given time, and whether
"not working" means "Tor cannot bootstrap to 100%", "the connection
attempt failed" or "the connection was reset".

>> - At what point in the tor bootstrapping does it fail (may be
>> difficult to determine, especially anonymized)?
> 
> Yes, but there's already a risk to running ooni-probe (at least right
> now, hopefully this will change in time). We will eventually need
> probes running in most countries if we want a good understanding of
> what network interference is taking place and who is affected.
> 

I don't think it's an issue to publish at what point Tor bootstrap
failed as it doesn't give away any particularly personally identifiable
information. Also keep in mind that at this stage all of the
measurements are being conducted from machines that we have rented and
operated ourselves so privacy of the probe operator is not much of a
problem.

>> - In all visualizations, compare with control (filter, line break,
>> plot alongside, etc)
>>
>> And the variables we thought would be relevant to visualizations:
>> Protocol
>> Pool
>> Country (and region)- Iran, China, Netherlands (control)
>> Time it takes to be blocked
>> Point in bootstrap where it fails
>> Classify the bridges by commercial/residential connection
>> Time we started scanning the bridge from where
>>
> 
> Maybe latency measurements per protocol? Initially, I'm thinking
> "the time it takes to download a consensus from the bridge" but
> there are many variables that may affect this. Anyone have a better
> idea?
> 
> I think this mostly covers it. The only addition I can think of right
> now is comparing different control countries against each other (and
> different ISPs within the control countries). Maybe we'll find
> something interesting.
> 

I was more thinking of something like "downloading a resource of [10k,
100k, 1M] from a fixed location" so that we don't have the variable of
the consensus size and can use this as a benchmark.

What I am looking for is patterns that can be symptoms of throttling of
encrypted/tor traffic.
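
A benchmark of that shape might look like the following Python sketch.
The byte counts chosen for 10k/100k/1M, the fetch callback and both
function names are assumptions; how the download is routed through a
bridge is deliberately left to the caller.

```python
import time

# Assumed byte counts for the three resource sizes mentioned above.
SIZES = {"10k": 10 * 1024, "100k": 100 * 1024, "1M": 1024 * 1024}


def measure_throughput(fetch, size, clock=time.monotonic):
    """Time fetch(size) and return the observed bytes per second.

    `fetch` must download the fixed-size resource and return its bytes.
    `clock` is injectable so the function can be tested without waiting.
    """
    start = clock()
    body = fetch(size)
    elapsed = clock() - start
    if len(body) != size:
        raise ValueError("incomplete download")
    return size / elapsed if elapsed > 0 else float("inf")


def throttling_ratio(via_bridge, via_control):
    """Bridge throughput relative to a control path (1.0 means no slowdown)."""
    return via_bridge / via_control
```

A ratio that stays well below 1.0 for one country while control
vantage points stay near 1.0 would be the kind of pattern to look into.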

>> It should be relatively simple to make rough versions of a lot of
>> visualizations to see what works once we have a parser/converter that
>> will generate JSONs (or similar) from OONI output that include the
>> variables listed above.
>>
> 
> Is someone already working on this? I'm not really volunteering, merely
> curious if this is in progress. :)
> 

I have written such scripts, but have not yet published them since I
still need to finish cleaning them up.

The kind of data that they end up generating looks something like this:
http://arturo.filasto.net/vizPlayground/bridge_rearchability.csv


Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-07 Thread Arturo Filastò
On 10/7/14, 6:18 AM, M. C. McGrath wrote:
>> The kind of data that they end up generating looks something like 
>> this: 
>> http://arturo.filasto.net/vizPlayground/bridge_rearchability.csv
> 
> Nice- though the link seems to be dead for me (I get a 404). But it is
> great that you are working on this as having this even in a rough form
> makes it possible to get started on visualizations.

Yeah sorry I mistyped.

The correct link is:
http://arturo.filasto.net/vizPlayground/bridge_reachability.csv

~ Art.






Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-07 Thread Nima Fatemi
Arturo Filastò wrote:
> On 10/7/14, 6:18 AM, M. C. McGrath wrote:
>>> The kind of data that they end up generating looks something like 
>>> this: 
>>> http://arturo.filasto.net/vizPlayground/bridge_rearchability.csv
>>
>> Nice- though the link seems to be dead for me (I get a 404). But it is
>> great that you are working on this as having this even in a rough form
>> makes it possible to get started on visualizations.
> 
> Yeah sorry I mistyped.
> 
> The correct link is:
> http://arturo.filasto.net/vizPlayground/bridge_reachability.csv

This is awesome. One quick question, though: isn't it better to hash the
fingerprints? I mean, you're giving out a list of _working_ bridges in
certain places to whoever is looking.


-- 
Nima
0XC009DB191C92A77B | @mrphs

"I disapprove of what you say, but I will defend to the death your right
to say it" --Evelyn Beatrice Hall





Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-08 Thread Arturo Filastò
On 10/7/14, 8:56 PM, Nima Fatemi wrote:
> This is awesome. one quick question tho. isn't it better to hash the
> fingerprints?

Yes you are right.
The fingerprints are now hashed.

~ Art.


Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-23 Thread isis
George Kadianakis transcribed 4.1K bytes:
> Currently, OONI bridge reachability reports look like this:
> https://ooni.torproject.org/reports/0.1/CN/bridge_reachability-2014-07-02T21Z-AS4538-probe.yamloo
> and you can retrieve them from this directory listing:
> https://ooni.torproject.org/reports/0.1/

A few concerns:

1. The tests have no control.

   I am concerned that the test has no real control.  One cannot say, "The
experiment is testing if these bridges are reachable from China, and the
control is whether or not they are reachable from the US."  The problem with
that is that there is absolutely no way to determine if the act of measurement
is affecting the data being measured.  How do you know that the test isn't
causing the bridges to get blocked?

2. This test is attempting to connect simultaneously to multiple bridges with
multiple different PT protocols.

   That is, this test is doing precisely what we all decided that Tor Browser
should *not* do, because the Great Firewall probably can't ask for better
filter training material. :(

3. That test still isn't able to reliably start some transports,
   i.e. fteproxy.

4. The fingerprint should always be in the bridge line; otherwise you've got
   no proof that you've actually connected to the bridge. :)

5. There is unnecessarily unsafe data in the report output.

   BridgeDB sends the bridge descriptors to the Metrics backend, so that
Metrics can process them, come up with all the rest of the graphs we have, and
put the sanitised data in Onionoo.  What if these reports were to contain only
data which is public, such as the data which Onionoo currently has?

   To play it safe, I would prefer not to have a bunch of bridge fingerprints
and ip:ports lying around, on a thousand poorly maintained machines all over
the planet.  The generated reports could instead output:

   * The hashed fingerprint (as is the case for bridges in onionoo)
   * The hashed ip:port
   * The transport name
   * [true|false|null] for whether the test was successful.

   This way, the data could be added to the rest of the bridge's data in
Onionoo, and all the visualisation/metrics tools which use Onionoo (all of
them, I believe) won't need to do anything different.  BridgeDB could then
get the data from Onionoo.

6. Your tests would give more accurate data if they didn't use "real"
   bridges.

   I've mentioned this in #ooni on IRC, but for everyone else: To figure out
if a PT protocol is blocked, you do not need to use "real" bridges from Tor
Browser or BridgeDB.  If you (ideally automatically) set up a couple of
bridges for each protocol, this would:

   * Reduce the number of test inputs, making test runs complete faster and use
     less memory.
   * Eliminate the potential to get "real" bridges blocked through testing.
   * Test both sides of the connection, thus reducing false negatives.
   * Allow us to more accurately control variables while attempting to
     determine if a PT protocol is blocked by a certain country.


> Here is one that shows which PTs are blocked in which countries:
> https://people.torproject.org/~asn/bridget_vis/countries_pts.jpg The
> list would only include countries that are blocking at least a
> bridge. Green is "works", red is "blocked". Also, you can imagine the
> same visualization, but instead of PT names for columns it has
> distribution methods ("BridgeDB HTTP distributor", "BridgeDB mail
> distributor", "Private bridge", etc.).


To be honest, I don't care which pool. Also, that data is already publicly
available in Onionoo (or deducible via its lack of availability).


> And here is another one that shows how fast jurisdictions block the
> default TBB bridges:
> https://people.torproject.org/~asn/bridget_vis/tbb_blocked_timeline.jpg


Neat idea!


> These visualizations could be helpful, but they are not the only ones.
> 
> What other use cases do you imagine using this dataset for?


In order to better hand out bridges, it would be quite excellent if BridgeDB
could someday have something like:

 { hashed_bridge_address: SHA1('IP:PORT'),
   hashed_bridge_fingerprint: SHA1('FINGERPRINT'),
   pt_method: PT_METHOD|'vanilla',
   regions: {
     ...,
     BR: {
       reachable: false,
       since: TIMESTAMP_WHEN_IT_FIRST_BECAME_UNREACHABLE },
     ...,
     CA: {
       reachable: true,
       since: TIMESTAMP_WHEN_IT_FIRST_BECAME_REACHABLE },
     CN: {
       reachable: false,
       since: TIMESTAMP_WHEN_IT_FIRST_BECAME_UNREACHABLE },
     ...,
   },
 },
 ...,
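
To make the intended use concrete, here is a small Python sketch of how
records of that shape could be filtered per region. Only the field
names come from the structure above; the function name reachable_in and
the decision to skip bridges without a measurement are assumptions.

```python
def reachable_in(records, region):
    """Return the records marked reachable in `region`.

    Each record is a dict shaped like the structure above. A bridge
    with no entry for the region, or with reachable set to anything
    other than True, is skipped (a conservative choice for this sketch).
    """
    return [
        record
        for record in records
        if record["regions"].get(region, {}).get("reachable") is True
    ]
```

Whether an unmeasured bridge should be handed out or held back is a
policy question that this sketch does not try to answer.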

-- 
 ♥Ⓐ isis agora lovecruft
_
OpenPGP: 4096R/0A6A58A14B5946ABDE18E207A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt




Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-23 Thread isis
isis transcribed 6.6K bytes:
>* The hashed fingerprint (as is the case for bridges in onionoo)
>* The hashed ip:port

Actually, my apologies, I was quite tired when I wrote this and totally
completely wrong.

A hashed ip:port would be a terrible idea because IPv4 space is only 2^32 and
ports are 2^16. In total that's a 2^48 message space. Hashing for a preimage
to get the bridge addresses is quite feasible within those constraints, as is
precomputing the attack offline.

We should come up with a different way to hide ip:ports.
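
To put numbers on this, the Python sketch below computes how long it
takes to enumerate the 2^48 space at a given hash rate and shows the
attack on a deliberately tiny candidate set. The hash rate is an
assumed figure, the 'IP:PORT' input format follows the SHA1('IP:PORT')
notation used earlier in the thread, and the function names are made up.

```python
import hashlib
import ipaddress

MESSAGE_SPACE = 2 ** 32 * 2 ** 16  # every IPv4 address times every port = 2^48


def seconds_to_enumerate(hashes_per_second):
    """Time needed to hash every possible IPv4 'IP:PORT' string once."""
    return MESSAGE_SPACE / hashes_per_second


def recover_address(target_hex, network, ports):
    """Brute-force a SHA-1('IP:PORT') digest over a small candidate set.

    Iterates every address in `network` (for example "192.0.2.0/24")
    and every port in `ports`; returns the matching string or None.
    """
    for ip in ipaddress.ip_network(network):
        for port in ports:
            candidate = f"{ip}:{port}"
            if hashlib.sha1(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None
```

At an assumed 10^9 hashes per second, seconds_to_enumerate gives about
281,475 seconds, a little over three days for the entire space, and the
resulting table can be reused against every published hash.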

-- 
 ♥Ⓐ isis agora lovecruft
_
OpenPGP: 4096R/0A6A58A14B5946ABDE18E207A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt




Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-25 Thread Karsten Loesing
On 24/10/14 01:53, isis wrote:
> isis transcribed 6.6K bytes:
>>* The hashed fingerprint (as is the case for bridges in onionoo)
>>* The hashed ip:port
> 
> Actually, my apologies, I was quite tired when I wrote this and totally
> completely wrong.
> 
> A hashed ip:port would be a terrible idea because IPv4 space is only 2^32 and
> ports are 2^16. In total that's a 2^48 message space. Hashing for a preimage
> to get the bridge addresses is quite feasible within those constraints, as is
> precomputing the attack offline.
> 
> We should come up with a different way to hide ip:ports.

I'm lacking context, but just in case this is even remotely relevant,
here's how CollecTor sanitizes bridge IP addresses:

https://collector.torproject.org/formats.html#bridge-descriptors

All the best,
Karsten



Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-25 Thread Matthew Finkel
On Sat, Oct 25, 2014 at 01:01:52PM +0200, Karsten Loesing wrote:
> On 24/10/14 01:53, isis wrote:
> > isis transcribed 6.6K bytes:
> >>* The hashed fingerprint (as is the case for bridges in onionoo)
> >>* The hashed ip:port
> > 
> > Actually, my apologies, I was quite tired when I wrote this and totally
> > completely wrong.
> > 
> > A hashed ip:port would be a terrible idea because IPv4 space is only
> > 2^32 and ports are 2^16. In total that's a 2^48 message space. Hashing
> > for a preimage to get the bridge addresses is quite feasible within
> > those constraints, as is precomputing the attack offline.
> > 
> > We should come up with a different way to hide ip:ports.
> 
> I'm lacking context, but just in case this is even remotely relevant,
> here's how CollecTor sanitizes bridge IP addresses:
> 
> https://collector.torproject.org/formats.html#bridge-descriptors

Hey Karsten,

Yes, this is very relevant, thanks! Currently our plan involves
keying the JSON dataset using unsanitized "IP Address:port" internally
and the sanitized public version will replace this key with
H(H(fingerprint)). This seems like the easiest way to avoid the
problem of leaking the IP address.

At this point, we don't think we need an IP address in the resulting
dataset, so a unique, linkable fingerprint seems sufficient. If we
find that IP addresses are useful then CollecTor's algorithm seems like
a good starting point.

- Matt


Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-10-25 Thread Matthew Finkel
On Sat, Oct 25, 2014 at 11:26:50AM +, Matthew Finkel wrote:
> On Sat, Oct 25, 2014 at 01:01:52PM +0200, Karsten Loesing wrote:
> > On 24/10/14 01:53, isis wrote:
> > > isis transcribed 6.6K bytes:
> > >>* The hashed fingerprint (as is the case for bridges in onionoo)
> > >>* The hashed ip:port
> > > 
> > > Actually, my apologies, I was quite tired when I wrote this and totally
> > > completely wrong.
> > > 
> > > A hashed ip:port would be a terrible idea because IPv4 space is only
> > > 2^32 and ports are 2^16. In total that's a 2^48 message space.
> > > Hashing for a preimage to get the bridge addresses is quite feasible
> > > within those constraints, as is precomputing the attack offline.
> > > 
> > > We should come up with a different way to hide ip:ports.
> > 
> > I'm lacking context, but just in case this is even remotely relevant,
> > here's how CollecTor sanitizes bridge IP addresses:
> > 
> > https://collector.torproject.org/formats.html#bridge-descriptors
> 
> Hey Karsten,
> 
> Yes, this is very relevant, thanks! Currently our plan involves
> keying the JSON dataset using unsanitized "IP Address:port" internally
> and the sanitized public version will replace this key with
> H(H(fingerprint)). This seems like the easiest way to avoid the
> problem of leaking the IP address.

Whoops, that should be H(fingerprint), nothing special. Sorry, I got a
little hashing happy.
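
A minimal Python sketch of that re-keying step follows. The thread
does not say which hash H is or what it is applied to; SHA-1 over the
20 decoded fingerprint bytes is an assumption here, and the entry
layout and function names are hypothetical.

```python
import hashlib


def hashed_fingerprint(fingerprint_hex):
    """H(fingerprint): SHA-1 over the decoded bytes, as upper-case hex.

    Both the choice of SHA-1 and hashing the raw bytes instead of the
    hex string are assumptions for this sketch.
    """
    return hashlib.sha1(bytes.fromhex(fingerprint_hex)).hexdigest().upper()


def sanitize(internal):
    """Re-key an internal dataset for publication.

    `internal` maps "IP:port" strings to entries that contain a
    "fingerprint" field. The result is keyed by H(fingerprint) and
    carries neither the address nor the original fingerprint.
    """
    public = {}
    for _address, entry in internal.items():
        rest = {k: v for k, v in entry.items() if k != "fingerprint"}
        public[hashed_fingerprint(entry["fingerprint"])] = rest
    return public
```

Entries that share a fingerprint (one bridge with several addresses)
collapse onto a single key here; merging them properly is left out.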


Re: [tor-dev] On the visualization of OONI bridge reachability data

2014-11-02 Thread isis
Matthew Finkel transcribed 1.6K bytes:
> On Sat, Oct 25, 2014 at 01:01:52PM +0200, Karsten Loesing wrote:
> > On 24/10/14 01:53, isis wrote:
> > > isis transcribed 6.6K bytes:
> > >>* The hashed fingerprint (as is the case for bridges in onionoo)
> > >>* The hashed ip:port
> > > 
> > > Actually, my apologies, I was quite tired when I wrote this and totally
> > > completely wrong.
> > > 
> > > A hashed ip:port would be a terrible idea because IPv4 space is only
> > > 2^32 and ports are 2^16. In total that's a 2^48 message space.
> > > Hashing for a preimage to get the bridge addresses is quite feasible
> > > within those constraints, as is precomputing the attack offline.
> > > 
> > > We should come up with a different way to hide ip:ports.
> > 
> > I'm lacking context, but just in case this is even remotely relevant,
> > here's how CollecTor sanitizes bridge IP addresses:
> > 
> > https://collector.torproject.org/formats.html#bridge-descriptors
> 
> Yes, this is very relevant, thanks! Currently our plan involves
> keying the JSON dataset using unsanitized "IP Address:port" internally
> and the sanitized public version will replace this key with
> H(H(fingerprint)). This seems like the easiest way to avoid the
> problem of leaking the IP address.
> 
> At this point, we don't think we need an IP address in the resulting
> dataset, so a unique, linkable fingerprint seems sufficient. If we
> find that IP addresses are useful then CollecTor's algorithm seems like
> a good starting point.

I agree that we could probably do without any IP:port information in the
resulting reports.  The hashed fingerprint is enough for BridgeDB to deduce a
bridge's IP:ports; it should also be enough for Metrics to deduce which bridge
a particular set of additional reachability information concerns, without
needing to do any additional processing of either the IP:ports or the
fingerprints.

With respect to CollecTor's algorithms for sanitising bridge IP:ports (should
we decide to instead keep the bridge address information in OONI's bridge
reachability reports and wish to sanitise those reports), Robert Ransom spoke
with me on the 24th of October, and made the following points and suggestions:

Robert Ransom transcribed 1.0K bytes:
> The Metrics system currently sanitizes bridge TCP addresses (IP+port)
> by HMACing them with a secret key stored on the server.  That won't
> work for the reachability testing system for two reasons:
> 
> * The reachability-testing bridge clients should not know the key
> needed to obfuscate TCP (or UDP, or other) addresses
> deterministically.  (A deterministic public-key encryption would be
> just as bad as a hash.)
> 
> * BridgeDB must be able to learn the address for which a bridge's
> reachability test was performed, so that it can decide whether the
> reachability-test results are valid for the bridge's current address.
> 
> I would suggest that the reachability-testing bridge client report a
> (randomized) public-key encryption of the address, where the
> decryption key is held by BridgeDB (so it can check whether the
> reachability test is relevant to the current ‘Bridge line’) and the
> Metrics sanitization server (so it can compute and publish a
> deterministically sanitized address, following the current
> sanitization procedure).
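
For readers unfamiliar with the keyed-hash idea in the first
paragraph, here is a generic Python sketch of HMAC-based address
sanitization. It is not the actual Metrics/CollecTor procedure: the
use of SHA-256, the 'IP:PORT' input string and the function name are
assumptions. The randomized public-key encryption suggested above
needs a third-party library and is not shown.

```python
import hashlib
import hmac


def sanitize_address(secret_key, address):
    """Deterministically obfuscate 'IP:PORT' with a server-side secret.

    Without `secret_key` the small 2^48 input space cannot simply be
    enumerated, which is the advantage over a plain hash. Anyone who
    does hold the key can enumerate it, which is why the probes
    themselves should not be given the key.
    """
    return hmac.new(secret_key, address.encode(), hashlib.sha256).hexdigest()
```

The same address always maps to the same token under one key, so
results stay linkable across reports while the address stays hidden
from anyone without the key.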

-- 
 ♥Ⓐ isis agora lovecruft
_
OpenPGP: 4096R/0A6A58A14B5946ABDE18E207A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt

