Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Prateek Mittal
Thanks isis. We worked on PIR-Tor a while ago, so my recollection could be
a bit rusty, but here are some thoughts.

1) From a performance perspective, the best results are obtained by using
the information-theoretic variant of PIR-Tor (ITPIR) (as opposed to the
computational variant). As you correctly point out, we considered the three
guard relays to be the PIR directory servers, with the assumption that not
all three collude. (Our argument for the non-colluding assumption was that
if all three guards are malicious, then the security offered by Tor is
terrible already.) Note that ITPIR necessarily requires more than one
server. Given Tor's move to a single guard design, the first challenge is
to figure out alternative PIR servers for the client.

(I haven't thought much about this challenge, but it seems that the single
guard could be one server, and two other relays (maybe with guard flags)
could play the role of PIR directory servers. I'll have to think more about
this, but it is possible that this design continues to provide similar
security properties to Tor)
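
(For anyone unfamiliar with why ITPIR needs at least two non-colluding
servers, here is a toy two-server XOR sketch in Python -- purely
illustrative, not the scheme or the Percy library used in PIR-Tor. Each
server sees only a uniformly random bit-vector, so neither learns which
block the client wants unless the two collude.)

    import os, secrets

    def xor_bytes(a, b):
        # XOR two equal-length byte strings.
        return bytes(x ^ y for x, y in zip(a, b))

    def server_answer(db, query_bits):
        # Each server XORs together the blocks its query bit-vector selects.
        answer = bytes(len(db[0]))
        for block, bit in zip(db, query_bits):
            if bit:
                answer = xor_bytes(answer, block)
        return answer

    # Toy "relay directory": 8 fixed-size descriptor blocks.
    db = [os.urandom(64) for _ in range(8)]
    wanted = 5

    q1 = [secrets.randbelow(2) for _ in db]   # uniformly random, reveals nothing
    q2 = list(q1)
    q2[wanted] ^= 1                           # differs from q1 only at `wanted`

    block = xor_bytes(server_answer(db, q1), server_answer(db, q2))
    assert block == db[wanted]                # client recovers the block it wanted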

2) Even outside the context of PIR, there are several advantages to having
some structure in the database, with fine-grained signatures on individual
relay/micro descriptors (relays could also be grouped into blocks, with
signatures at the block level). For example, if the Tor network were to use
2 hop circuits, then clients would have close to 0 (or at least
significantly less than 1%) overhead for maintaining network state.

Why? Because the first hop (guard) is fixed, fetching the directory to
select the guard is only a *one-time* operation; note that much of the
directory overhead in Mike Perry's analysis comes from *periodic fetches*.
Secondly, clients do not need to use PIR to fetch the exit relay in this
setting, since guards know the identity of exits in a two hop design anyway
(bandwidth overhead of fetching a single exit is simply the size of the
descriptor/micro-descriptor -- resulting in *several orders of magnitude*
bandwidth savings). (This optimization was referred to in the paper in the
context of fetching middle relays)

(Of course there are anonymity implications of moving to two hop designs.
Perhaps the most significant concern is easy guard identification, but
perhaps an additional 100 million clients + move to a single guard reduces
these concerns. The bandwidth savings could be very significant, especially
in the context of mobile clients, which otherwise might use hundreds of MB
per month just to maintain network state. Anyway, I digress.)
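
(To make the back-of-the-envelope savings concrete, here is a small Python
sketch; every figure in it is an assumed round number for illustration, not
a measurement from the paper or from Mike's analysis.)

    # Illustrative monthly bandwidth for maintaining network state. All
    # numbers below are assumptions picked as round figures.
    consensus_fetch_mb = 1.5           # assumed full consensus + microdescriptors
    fetches_per_month  = 30 * 8        # one fetch every three hours
    exit_descriptor_kb = 1.0           # assumed size of one signed micro-descriptor
    circuits_per_month = 30 * 24 * 6   # a new circuit every ten minutes

    periodic_fetch_mb = consensus_fetch_mb * fetches_per_month
    two_hop_mb        = exit_descriptor_kb * circuits_per_month / 1024.0

    print("periodic full fetches: ~%d MB/month" % periodic_fetch_mb)       # ~360 MB
    print("two-hop, per-circuit exit fetch: ~%.1f MB/month" % two_hop_mb)  # ~4.2 MB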

Please find some comments about your previous mail inline (below).

On Tue, Sep 30, 2014 at 1:04 AM, isis i...@torproject.org wrote:


   1. The authors assume that a client's three [sic] Guard Nodes will be the
      directory mirrors for that client, accepting PIR queries for descriptors.


Right, please see (1) above.


   3. There is zero handling of colluding directory mirrors.


This seems incorrect? We used the Percy open source library to perform PIR,
which does handle collusion among mirrors (to the best of my recollection).
The parameters can be set such that one honest server is sufficient for
security.


   4. The *increase* to descriptor size is not well factored in to the
      analysis.

      4.b. All of the Directory Authorities would need to sign each and every
           descriptor by itself. This would result in the current
           microdescriptors, which are sized from roughly 900 bytes to some of
           the larger ones which are 2.5 KB, increasing in size by roughly 4 KB
           (that's the size of the DirAuth signatures on the current consensus).



I am not sure which signature scheme you are considering. In the paper, we
talk about threshold BLS signatures, which have only 22 byte overhead.
(This might increase for better security guarantees, but 4KB overhead seems
very conservative.)



 While one of the benefits of PIR would be that clients would no longer need to
 request the descriptors for all relays listed in the consensus, this benefit
 actually doesn't help as much as it would seem, because switching to a PIR
 scheme would require a client to make three of these roughly O(146449)-bit
 requests, every ten minutes or so, when the client wants a new circuit.


Right, clients would need roughly 18 middle/exit relays (fetched via PIR
queries) over a three-hour interval to build a new Tor circuit every 10
minutes. My recollection is that even in 2011, PIR seemed more efficient (we
estimated a 12KB overhead per PIR query) than a full network fetch every
three hours (12*18 = 216KB with PIR vs several MBs without PIR). Though these numbers
might change depending on implementation details (including the use of
micro-descriptors), the benefits would only improve with growing network
size.
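
(Spelling out that arithmetic -- the 12KB/query figure is our 2011 estimate,
and the full-fetch size below is an assumed round number, so treat both as
indicative only:)

    pir_query_kb        = 12    # 2011 estimate of per-query PIR overhead
    circuits_per_window = 18    # one circuit every 10 minutes over 3 hours
    full_fetch_mb       = 3     # assumed size of a full network-state fetch

    pir_total_kb = pir_query_kb * circuits_per_window
    print("PIR: %d KB vs full fetch: ~%d MB per three hours"
          % (pir_total_kb, full_fetch_mb))    # 216 KB vs ~3 MB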

Thanks,
Prateek

-- 
Prateek Mittal
Assistant Professor
Princeton University
http://www.princeton.edu/~pmittal/
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Nikita Borisov
On Tue, Sep 30, 2014 at 6:29 AM, Prateek Mittal pmit...@princeton.edu wrote:
   4. The *increase* to descriptor size is not well factored in to the
      analysis.

      4.b. All of the Directory Authorities would need to sign each and every
           descriptor by itself. This would result in the current
           microdescriptors, which are sized from roughly 900 bytes to some of
           the larger ones which are 2.5 KB, increasing in size by roughly 4 KB
           (that's the size of the DirAuth signatures on the current consensus).



 I am not sure which signature scheme you are considering. In the paper, we
 talk about threshold BLS signatures, which have only 22 byte overhead. (This
 might increase for better security guarantees, but 4KB overhead seems very
 conservative.)

You can make this even lower because every PIR query returns a block
that is roughly the square root of the size of the entire database; you
would only need a signature on each block.
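
(Roughly, with numbers that are assumptions rather than measurements -- the
22-byte figure is the one Prateek quoted above:)

    import math

    n_relays  = 6000    # assumed network size
    sig_bytes = 22      # threshold BLS signature size from Prateek's mail

    descs_per_block = math.isqrt(n_relays)                   # ~77 descriptors per block
    n_blocks        = math.ceil(n_relays / descs_per_block)  # ~78 blocks

    print("per-descriptor signatures: %d bytes total" % (n_relays * sig_bytes))
    print("per-block signatures:      %d bytes total" % (n_blocks * sig_bytes))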

- Nikita
-- 
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Nikita Borisov
On Tue, Sep 30, 2014 at 12:04 AM, isis i...@torproject.org wrote:
 [1]: I assume that we need replication in Tor's use case. There are papers,
  such as the following:

  Kushilevitz, Eyal, and Rafail Ostrovsky.
Replication is not needed: Single database, computationally-private
information retrieval.
Proceedings of the 38th Annual Symposium on Foundations of Computer
Science (FOCS). IEEE Computer Society, 1997.
http://www.cs.fiu.edu/~carbunar/teaching/cis5374/slides/pir-single.pdf

  for which the research doesn't apply because it was aimed at
  computationally-hiding PIR schemes, and obviously Tor aims for
  information theoretic security. Other than the PIR-Tor paper, I haven't
  found any literature which analyses PIR for anything close to Tor's use
  case. (Although I'd be stoked to hear about any such papers.)

The current Tor design relies heavily on computational assumptions in
all of its cryptography, so I don't think that the Tor network setting
is a reason to use information-theoretic rather than computational
PIR. We advocated ITPIR because it is much more efficient and because
collusion among your three guards was already a significant problem.
With the move to a single guard, that argument no longer makes as much
sense. We do have an analysis in the paper of the computational PIR
scenario, but it does impose significantly more (computational)
overhead on the directory mirrors.

- Nikita
-- 
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Moritz Bartl
On 09/30/2014 06:28 AM, AFO-Admin wrote:
 E.g. if you have a server with 2x E5-2683 v3 and a 10 Gbit/s pipe you
 would need at least 14 IPs to use most of the CPU.

Multicore support is hard and needs developers. Raising the limit from 2
relays per IP to x per IP has been discussed in the past and would be an
easy change.

-- 
Moritz Bartl
https://www.torservers.net/
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] Getting the network consensus

2014-09-30 Thread Michael Rogers

Hi all,

While working on peer-to-peer hidden services I've been wondering how
long two clients need to wait before arriving at the same view of the
Tor network. The answer, which I find surprising, is that this is
never guaranteed to happen. Here's why.

In what follows I'll assume that the interval between network
consensuses being published is one hour, but the conclusions remain
the same if the interval is longer or shorter - just change the units.
I'll use 1:00 consensus to mean a consensus that becomes valid at
1:00, is fresh until 2:00, and is valid until 4:00.

According to dir-spec.txt, each directory cache downloads the 1:00
consensus at a randomly chosen time between 1:00 and 1:30. Each client
replaces the 1:00 consensus with a newer consensus at a randomly
chosen time between 2:45 and 3:51. (The precise endpoint is 3:50:37.5,
due to a calculation described in section 5.1 of the spec. I'm curious
to know the origin of that calculation.)
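
(For reference, the endpoint itself is easy to reproduce; the sketch below
applies the section 5.1 rule as I describe it near the end of this mail,
using the example times above.)

    from datetime import datetime

    # Client replacement window for the 1:00 consensus, per dir-spec 5.1:
    # start 3/4 of the way into the first interval after freshness ends,
    # end 7/8 of the remaining time before the consensus becomes invalid.
    valid_after = datetime(2014, 9, 30, 1, 0)
    fresh_until = datetime(2014, 9, 30, 2, 0)
    valid_until = datetime(2014, 9, 30, 4, 0)

    interval = fresh_until - valid_after              # one hour
    start    = fresh_until + interval * 3 / 4         # 2:45:00
    end      = start + (valid_until - start) * 7 / 8  # 3:50:37.5

    print("replace between %s and %s" % (start.time(), end.time()))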

Some observations:

1. The period when clients are replacing each consensus slightly
overlaps the period when they're replacing the previous consensus. For
about five and a half minutes of every hour there are twice as many
clients replacing their consensuses, so directory caches will
experience greater load during this period.

How much greater? We can divide clients that are downloading the
consensus into three groups:

A. Clients that have never downloaded a consensus.

B. Clients that are replacing a consensus that's no longer valid.

C. Clients that are replacing a valid consensus with a newer consensus.

Clients in group A bypass the directory caches and hit the
authorities, so they don't contribute to load on the caches. Clients
in group B have just started up and loaded an expired consensus from
disk. These clients are equally likely to hit the caches at any point
in the consensus interval (although the load will vary by time of day,
day of week, etc). Clients in group C are twice as likely to hit the
caches between 45 minutes and 51 minutes past the hour as at any other
time, due to the overlap between replacement intervals. So unless
group C is dwarfed by group B, we should see a noticeable increase in
load on the caches between 45 and 51 minutes past the hour. (If group
C is dwarfed by group B then the increase will be lost in the noise.)

If you run a directory cache, do your logs show that pattern?

2. The time a client chooses for replacing a newly downloaded
consensus may be earlier than the time it chose for replacing the
previous consensus. Again, this is due to the overlap between
replacement intervals: a client may replace the 1:00 consensus at 3:48
and then choose 3:47 as the random time to replace the newly
downloaded 2:00 consensus.

What happens in that case? I haven't looked at the code, but I'm going
to speculate that it doesn't explode in flames or travel through time.
A reasonable way to handle the situation would be to round up the
replacement time to the present time - in other words, to replace the
newly downloaded consensus immediately. If that guess is correct, the
extra load during the overlap between replacement intervals will get
squeezed towards the end of the overlap, as replacement times
occasionally get rounded up but never get rounded down. Some clients
will download two consensuses back-to-back during the overlap.

Again, if you run a directory cache I'd be interested to know whether
you're seeing this pattern.

3. Clients often skip consensuses. The interval for clients to replace
the 1:00 consensus runs from 2:45 to 3:51, and the interval for caches
to download the 3:00 consensus runs from 3:00 to 3:30. So there's
roughly a 46% chance that a client replacing the 1:00 consensus will
get the 2:00 consensus, and roughly a 54% chance that it will get the
3:00 consensus.
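
(Those percentages come from a simple uniform model -- an assumption on my
part, not something in dir-spec: the client's replacement time is uniform
over its window, its cache's download time is uniform over 3:00-3:30, and
the client sees the 3:00 consensus iff the cache has already fetched it.
A few lines of Python reproduce the figures:)

    import random

    random.seed(0)
    trials, got_new = 200000, 0
    for _ in range(trials):
        client = random.uniform(105.0, 170.625)  # 2:45 .. 3:50:37.5, minutes past 1:00
        cache  = random.uniform(120.0, 150.0)    # 3:00 .. 3:30
        if client >= cache:                      # cache already serves the 3:00 consensus
            got_new += 1

    print("2:00 consensus: %.0f%%, 3:00 consensus: %.0f%%"
          % (100 * (1 - got_new / trials), 100 * got_new / trials))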

This means it's possible for two clients to replace their consensuses
regularly but never see the same consensus: one of them sees the
consensuses published on odd hours, the other sees the consensuses
published on even hours. This situation is unlikely to continue for
long if each client makes a fresh random choice for each replacement
time, but there's no time at which we can say the clients will
definitely have seen the same consensus.

I can't see any partitioning attacks arising from this, but I thought
I'd point it out anyway because I find the result surprising.

As I said before, I'm curious to know the origin of the calculation in
section 5.1 that produces the overlap between replacement intervals.
The replacement time is chosen uniformly at random from the interval
between the point 3/4 of the way into the first interval after the
consensus is no longer fresh, and the point 7/8 of the way through the
time remaining after that before the consensus becomes invalid. It's
the 7/8 part that results in the odd figure of 50 minutes 37.5 seconds.

If the replacement time were to be chosen uniformly at random from the
interval between the time 3/4 

[tor-dev] Damian's Status Report - September 2014

2014-09-30 Thread Damian Johnson
Code! Beautiful code! How I've missed thee!

As mentioned before I'm hip deep in the middle of an arm rewrite, not only to
overhaul the arm codebase, but also to ensure Stem will be up to snuff for a Tor
GUI or web dashboard (projects I'd *love* to jump into someday). I'm delighted
to announce a nice bit of progress on this front.

September went toward rewriting its header panel (the top portion of arm's
interface)...

  * Old: zomg my eyes, they bleed!
  
https://gitweb.torproject.org/arm.git/blob/e249dc8f5c4c282326324161cf8421f947cf660e:/src/cli/headerPanel.py

  * New hotness:
  https://gitweb.torproject.org/arm.git/blob/HEAD:/arm/header_panel.py

As mentioned though this is really more about Stem than arm. Stem improvements
made along the way this month include...

  * Far better handling for the 'private' keyword (the default prefix) and the
default suffix in our ExitPolicy class. (#10107)

  * New BaseController.connection_time() method that reports when our control
connection was established or detached. [1]

  * New Controller.get_accounting_stats() method which reports accounting
information for our relay. [2]

  * Controller now fetches our own descriptor if no fingerprint is provided to
get_server_descriptor(), get_network_status(), or get_microdescriptor().

  * New crop() and join() methods for the str_tools module. [3][4]

  * Dropped a redundant get_* prefix from most of our util function names.
Old names still work as aliases.

  * Fixed an OSError that our launch_tor_with_config() function raised when
called too many times, caught thanks to RSenet. (#13141)
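
Here's a quick taste of a couple of the additions above -- just a sketch,
not code from arm, and it assumes a local tor instance that is itself a
relay (so it has its own descriptor), with ControlPort 9051 and cookie
authentication, plus a Stem checkout recent enough to include these methods...

    from stem.control import Controller
    from stem.util import str_tools

    with Controller.from_port(port = 9051) as controller:
        controller.authenticate()

        # No fingerprint needed any more to get our own relay's descriptor.
        desc = controller.get_server_descriptor()
        print('our relay: %s (%s)' % (desc.nickname, desc.address))

        # Accounting info for our relay, None if AccountingMax isn't set.
        stats = controller.get_accounting_stats(None)
        if stats:
            print('bytes read this period: %i' % stats.read_bytes)

        # New str_tools helper.
        print(str_tools.crop('a long status line that needs shortening', 24))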


DocTor, the monitor for directory authority health, also got some love this
month...

  * Corrected suppression behavior when we have staggered alerts. This should
greatly cut down on the amount of noise on the list.

  * We now check if too many relays lack a bandwidth authority measurement...
or rather, we would if this check weren't perpetually in alarm. (#12989, #13046)

  * Revised DocTor's schedule as requested by Sebastian. (#13210)

  * Expanded suppression for messages about turtles and kicked off a
conversation to at long last sort this out. (#13296)


One final note is that I uploaded my Tor Ecosystem presentation to YouTube so
volunteer page visitors can see a streaming presentation rather than
downloading a 53 MB video...

  https://www.youtube.com/watch?v=fb6iqZcQsSg

Cheers! -Damian


[1] 
https://stem.torproject.org/api/control.html#stem.control.BaseController.connection_time
[2] 
https://stem.torproject.org/api/control.html#stem.control.Controller.get_accounting_stats
[3] https://stem.torproject.org/api/util/str_tools.html#stem.util.str_tools.crop
[4] https://stem.torproject.org/api/util/str_tools.html#stem.util.str_tools.join
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread isis
Moritz Bartl transcribed 0.5K bytes:
 Raising the limit from 2 relays per IP to x per IP has been discussed in the
 past and would be an easy change.

We *still* have that limit? I thought we killed it a long time ago.

Can we kill it now? It's not going to do anything to prevent Sybils; it'll
only prevent good relay operators on larger machines from contributing more
bandwidth.

-- 
 ♥Ⓐ isis agora lovecruft
_
GPG: 4096R/A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt


___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Mike Perry
isis:
 Moritz Bartl transcribed 0.5K bytes:
  Raising the limit from 2 relays per IP to x per IP has been discussed in the
  past and would be an easy change.
 
 We *still* have that limit? I thought we killed it a long time ago.
 
 Can we kill it now? It's not going to do anything to prevent Sybils; it'll
 only prevent good relay operators on larger machines from contributing more
 bandwidth.

If not kill it, raising it to 4-8 might make more sense and be a good
interim step.

We could lower it again once we have real multicore support (which I still
think we should aim for, because it also improves the mix properties of
fast relays to an external observer, but which I realize is no small task).


-- 
Mike Perry


___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev