Re: [tor-dev] Scaling tor for a global population

2014-10-02 Thread Mike Perry
Sebastian Hahn:
 On 27 Sep 2014, at 02:18, Mike Perry wrote:
  If we were willing to tolerate 10% directory overhead this would allow
  for 5 times as many users. In other words, 100M daily connecting users.
  We would still need to find some way to fund the growth of the network
  to support this 40X increase, but there are no actual *technical*
  reasons why it cannot be done.
 One thing there isn't an automatic answer to is whether our current users
 and our would-be users differ in their usage patterns. Currently, my
 intuition would be that most of our users are responsible for a relatively
 small amount of traffic, whereas we have some users who pull a lot of
 data. I wonder what happens with Netflix, youtube and other services.
 We might at least want to try and figure this into the equation by
 estimating our average daily bytes sent/received per users, and
 comparing that to the bytes sent/received by our target group.

This is a great point, Sebastian. Some of the browser vendors do this
sort of analytics, and other parties have also made some inferences

In particular, these two studies are old, and may not be accurate
anymore, but they say that between 5-15% of each browser's userbase
used Private Browsing Mode, depending on the browser (pg 9):

And moreover, that the typical usage duration for Private Browsing Mode
was 4-22 minutes, with 10 minutes being most common:

Refreshed studies of this type of data would be very helpful for
determining what size rollout we could handle, and if we'd want to
further limit it by making the Tor mode opt-in, not prominent, or
otherwise less likely to be selected by most users, at least for the
initial versions.

We could also capture more detailed usage frequency analytics and
bytecount statistics in an alpha or other trial version.

Mike Perry

tor-dev mailing list

Re: [tor-dev] Scaling tor for a global population

2014-10-01 Thread Sebastian G. bastik.tor
30.09.2014, 01:12 isis:
 isis [mash-up]
 [3]: Please, don't give all the shit relays to me as bridges. I think it's
  less important scalability-wise (right now) to have a strict cutoff rate
  for bridges, but eventually, when/if we ever have Bridge Bandwidth
  Authorities, BridgeDB should cut off anything below some completely
  arbitrary rate, like 100 KB/s. I've gotten a bridge (from which was 28 B/s. Yes, *bytes*. That thing
  was probably slowing down the rest of the Tor network just by *existing*
  via its molasses-speeds blocking the Exit from continuing the response
  after SENDME number of cells, which is probably eventually going to cause
  TCP timeouts on the Exit's side and a whole bunch of other messes.

I think the Tor Project requires a high number of bridges to make
collection of all addresses harder for some adversaries. I'm aware that
adversaries can outrun the number of bridges. This might or might not be
valid until you shut down all vanilla bridges.

Obviously, bridges that provide too little bandwidth should not take
part in the network.

   What I usually recommend to users is based on their bandwidth and how
 frequently their IP changes.  If their connection is fast and their IP never
 changes (eg, a desktop or server), then run a non-exit relay [2].  For a
 laptop that moves to-from work, then a relay or bridge.
 Actually, anything with a constantly changing IP is a terrible idea for a
 bridge. Think of it this way: BridgeDB hands you a bridge line on Monday. On Tuesday,
 the bridge's IP rotates to something else. Now your Tor doesn't work. Wah-wah,
 sucks for you!

The bridge being down is indeed a problem. I got the same recommendation
of running a bridge on my home-network, rather than a relay (or exit)
from a person working for/on Tor. I passed that recommendation around,
IIRC even on this list, and no one complained.

It is problematic to have this information around and you, isis,
speaking up so late.

The changing IP part is beneficial for censorship circumvention. It
would be broken if clients could learn the new IP address automatically.

Distribution of bridges just sucks for circumvention of censorship. (Not
your work, isis.) That could be improved by handing out puzzles.

 There's one issue if you remove all the small relays, only relays run by
 the NSA will be around. Not many people have access to multi-megabit upload
 speeds. And those that do might also be using bittorrent.
 I'm quite certain that I'm definitely not the NSA, and I run a multi megabyte
 exit relay[.]

I don't think you are the Tor Project either, isis. You are working for
the Tor Project. Well depending on the view all employees are the Tor

I'm not suggesting you would run those nodes for the NSA or that you
work for them in any way.

Thank you all for your work in upscaling Tor.

Sebastian G. bastik

Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Prateek Mittal
Thanks isis. We worked on PIR-Tor a while ago, so my recollection could be
a bit rusty, but here are some thoughts.

1) From a performance perspective, the best results are obtained by using
the information-theoretic variant of PIR-Tor (ITPIR) (as opposed to the
computational variant). As you correctly point out, we considered the three
guard relays to be the PIR directory servers, with the assumption that not
all three collude. (Our argument for the non-colluding assumption was that
if all three guards are malicious, then the security offered by Tor is
terrible already.) Note that ITPIR necessarily requires more than one
server. Given Tor's move to a single guard design, the first challenge is
to figure out alternative PIR servers for the client.

(I haven't thought much about this challenge, but it seems that the single
guard could be one server, and two other relays (maybe with guard flags)
could play the role of PIR directory servers. I'll have to think more about
this, but it is possible that this design continues to provide similar
security properties to Tor)

2) Even outside the context of PIR, there are several advantages to having
some structure in the database, with fine grained signatures on individual
relay/micro descriptors (relays could also be grouped into blocks, with
signatures at the block level). For example, if the Tor network were to use
2 hop circuits, then clients would have close to 0 (or at least
significantly less than 1%) overhead for maintaining network state.

Why? Because the first hop (guard) is fixed, fetching the directory to
select the guard is only a *one-time* operation; note that much of the
directory overhead in Mike Perry's analysis comes from *periodic fetches*.
Secondly, clients do not need to use PIR to fetch the exit relay in this
setting, since guards know the identity of exits in a two hop design anyway
(bandwidth overhead of fetching a single exit is simply the size of the
descriptor/micro-descriptor -- resulting in *several orders of magnitude*
bandwidth savings). (This optimization was referred to in the paper in the
context of fetching middle relays)

(Of course there are anonymity implications of moving to two hop designs.
Perhaps the most significant concern is easy guard identification, but
perhaps an additional 100 million clients + move to a single guard reduces
these concerns. The bandwidth savings could be very significant, especially
in the context of mobile clients, which otherwise might use hundreds of MB
per month just to maintain network state. Anyway, I digress.)

Please find some comments about your previous mail inline (below).

On Tue, Sep 30, 2014 at 1:04 AM, isis wrote:

   1. The authors assume that a client's three [sic] Guard Nodes will be the
  directory mirrors for that client, accepting PIR queries for

Right, please see (1) above.

   3. There is zero handling of colluding directory mirrors.

This seems incorrect? We used the Percy open source library to perform PIR,
which does handle collusion among mirrors (to the best of my recollection).
The parameters can be set such that one honest server is sufficient for

   4. The *increase* to descriptor size is not well factored in to the

 4.b. All of the Directory Authorities would need to sign each and every
   descriptor by itself. This would result in the current
   microdescriptors, which are sized from roughly 900 bytes to some
   the larger ones which are 2.5 KB, increasing in size by roughly
 4 KB
   (that's the size of the DirAuth signatures on the current

I am not sure which signature scheme you are considering. In the paper, we
talk about threshold BLS signatures, which have only 22 byte overhead.
(This might increase for better security guarantees, but 4KB overhead seems
very conservative.)

 While one of the benefits of PIR would be that clients would no longer
 need to
 request the descriptors for all relays listed in the consensus, this
 actually doesn't help as much as it would seem, because switching to a PIR
 scheme would require a client to make three of these roughly O(146449)-bit
 requests, every ten minutes or so, when the client wants a new circuit.

Right, clients would roughly need 18 middle/exit relays (using PIR queries)
over a three hour interval to build a Tor circuit every 10 minutes. My
recollection is that even in 2011, PIR seemed more efficient (we estimated
a 12KB overhead per PIR query) than a full network fetch every three hours
(12*18 = 216KB with PIR vs several MBs without PIR). Though these numbers
might change depending on implementation details (including the use of
micro-descriptors), the benefits would only improve with growing network


Prateek Mittal
Assistant Professor
Princeton University

Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Nikita Borisov
On Tue, Sep 30, 2014 at 6:29 AM, Prateek Mittal wrote:
   4. The *increase* to descriptor size is not well factored in to the

 4.b. All of the Directory Authorities would need to sign each and
   descriptor by itself. This would result in the current
   microdescriptors, which are sized from roughly 900 bytes to some
   the larger ones which are 2.5 KB, increasing in size by roughly
 4 KB
   (that's the size of the DirAuth signatures on the current

 I am not sure which signature scheme you are considering. In the paper, we
 talk about threshold BLS signatures, which have only 22 byte overhead. (This
 might increase for better security guarantees, but 4KB overhead seems very

You can make this even lower because every PIR query returns a block
which is roughly square root the size of the entire database; you
would only need to have a signature on each block.
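
A minimal sketch of the block-level signing idea, assuming equally sized descriptors so that "square root of the database" can be counted in descriptors instead of bytes. The 22-byte signature size is the threshold BLS figure quoted above; the relay count of 6000 is purely hypothetical.

```python
import math


def block_layout(num_descriptors: int) -> tuple[int, int]:
    """Return (descriptors_per_block, number_of_blocks) for sqrt-sized blocks."""
    per_block = math.isqrt(num_descriptors - 1) + 1   # ceil(sqrt(n)) for n >= 1
    blocks = math.ceil(num_descriptors / per_block)
    return per_block, blocks


def signature_bytes_per_descriptor(num_descriptors: int, sig_bytes: int = 22) -> float:
    """Amortised signature overhead when one signature covers a whole block."""
    per_block, _ = block_layout(num_descriptors)
    return sig_bytes / per_block


if __name__ == "__main__":
    n = 6000  # HYPOTHETICAL number of relays, not taken from the thread
    print(block_layout(n))
    print(round(signature_bytes_per_descriptor(n), 2))
```

With one signature per block, the number of signatures the authorities produce scales with the square root of the relay count instead of the relay count itself.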

- Nikita
Nikita Borisov -
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL

Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Nikita Borisov
On Tue, Sep 30, 2014 at 12:04 AM, isis wrote:
 [1]: I assume that we need replication in Tor's use case. There are papers,
  such as the following:

  Kushilevitz, Eyal, and Rafail Ostrovsky.
Replication is not needed: Single database, computationally-private
information retrieval.
2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
IEEE Computer Society, 2013.

  for which the research doesn't apply because it was aimed at
  computationally-hiding PIR schemes, and obviously Tor aims for
  information theoretic security. Other than the PIR-Tor paper, I haven't
  found any literature which analyses PIR for anything close to Tor's use
  case. (Although I'd be stoked to hear about any such papers.)

The current Tor design relies heavily on computational assumptions in
all of its cryptography, so I don't think that the Tor network setting
is a reason to use information-theoretic rather than computational
PIR. We advocated ITPIR because it is much more efficient and because
collusion among your three guards was already a significant problem.
With the move to a single guard, that argument no longer makes as much
sense. We do have an analysis in the paper of the computational PIR
scenario, but it does impose significantly more (computational)
overhead on the directory mirrors.

- Nikita
Nikita Borisov -
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL

Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Moritz Bartl
On 09/30/2014 06:28 AM, AFO-Admin wrote:
 E.g. if you have a server with 2x E5-2683 v3 and a 10 Gbit/s pipe, you
 would need at least 14 IPs to use most of the CPU.

Multicore support is hard and needs developers. Raising the limit from 2
relays per IP to x per IP has been discussed in the past and would be an
easy change.
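
The "14 IPs" figure quoted above can be reproduced with a quick calculation. This is a sketch that assumes 14 cores per CPU and one single-threaded tor process per core; the alternative per-IP limits in the loop are illustrative values only.

```python
import math


def ips_needed(cores: int, relays_per_ip: int) -> int:
    """IPs required to run one relay per core under a relays-per-IP cap."""
    return math.ceil(cores / relays_per_ip)


# ASSUMPTION: two 14-core CPUs, one tor process per physical core.
CORES = 2 * 14

if __name__ == "__main__":
    for limit in (2, 4, 8):  # 2 is the current limit; 4 and 8 are illustrative
        print(f"{limit} relays/IP -> {ips_needed(CORES, limit)} IPs")
```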

Moritz Bartl

Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread isis
Moritz Bartl transcribed 0.5K bytes:
 Raising the limit from 2 relays per IP to x per IP has been discussed in the
 past and would be an easy change.

We *still* have that limit? I thought we killed it a long time ago.

Can we kill it now? It's not going to do anything to prevent Sybils, it'll
only prevent good relay operators on larger machines from giving more

 ♥Ⓐ isis agora lovecruft
GPG: 4096R/A3ADB67A2CDB8B35
Current Keys:


Re: [tor-dev] Scaling tor for a global population

2014-09-30 Thread Mike Perry
 Moritz Bartl transcribed 0.5K bytes:
  Raising the limit from 2 relays per IP to x per IP has been discussed in the
  past and would be an easy change.
 We *still* have that limit? I thought we killed it a long time ago.
 Can we kill it now? It's not going to do anything to prevent Sybils, it'll
 only prevent good relay operators on larger machines from giving more

If not kill it, raising it to 4-8 might make more sense and be a good
interim step.

We could lower it again if we have real multicore support (which I still
think we should aim for because it also improves the mix properties of
fast relays to an external observer, but I realize is no small task).

Mike Perry


Re: [tor-dev] Scaling tor for a global population

2014-09-29 Thread isis
Ryan Carboni transcribed 1.1K bytes:
 There's one issue if you remove all the small relays, only relays run by
 the NSA will be around. Not many people have access to multi-megabit upload
 speeds. And those that do might also be using bittorrent.

I'm quite certain that I'm definitely not the NSA, and I run a multi megabyte
exit relay:

 ♥Ⓐ isis agora lovecruft
GPG: 4096R/A3ADB67A2CDB8B35
Current Keys:

Post Scriptum:

  Mike, for the numbers you're gathering on the tor-relays mailing list
  thread, the above exit relay costs me approximately $20/month.


Re: [tor-dev] Scaling tor for a global population

2014-09-29 Thread Ryan Carboni
On Mon, Sep 29, 2014 at 4:36 PM, isis wrote:

 Ryan Carboni transcribed 1.1K bytes:
  There's one issue if you remove all the small relays, only relays run by
  the NSA will be around. Not many people have access to multi-megabit
  speeds. And those that do might also be using bittorrent.

 I'm quite certain that I'm definitely not the NSA, and I run a multi
 exit relay:

You're missing the point. It would be trivial for a multibillion dollar
organization to sybil attack Tor if you add excessive restrictions.

Re: [tor-dev] Scaling tor for a global population

2014-09-29 Thread Thomas White
It would be trivial for a multimillion dollar organisation to sybil
attack Tor even as it stands right now.

On 30/09/2014 01:12, Ryan Carboni wrote:
 On Mon, Sep 29, 2014 at 4:36 PM, isis wrote:
 Ryan Carboni transcribed 1.1K bytes:
 There's one issue if you remove all the small relays, only relays run by
 the NSA will be around. Not many people have access to multi-megabit
 speeds. And those that do might also be using bittorrent.

 I'm quite certain that I'm definitely not the NSA, and I run a multi
 exit relay:

 You're missing the point. It would be trivial for a multibillion dollar
 organization to sybil attack Tor if you add excessive restrictions.

Re: [tor-dev] Scaling tor for a global population

2014-09-29 Thread Nikita Borisov
On Mon, Sep 29, 2014 at 7:12 PM, Ryan Carboni wrote:
 You're missing the point. It would be trivial for a multibillion dollar
 organization to sybil attack Tor if you add excessive restrictions.

If you look at the numbers isis posted, all relays below the median
contribute less than 3% of the overall bandwidth. I think cutting all
of them off might be overkill, but even if you did, you would not make
a Sybil attack appreciably harder.
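
For anyone wanting to check the "below the median" share against a real relay list, a small sketch follows. The sample bandwidths are made up for illustration and are not the numbers isis posted.

```python
def below_median_share(bandwidths: list[float]) -> float:
    """Fraction of total bandwidth contributed by the slower half of relays."""
    ordered = sorted(bandwidths)
    slower_half = ordered[: len(ordered) // 2]
    return sum(slower_half) / sum(ordered)


if __name__ == "__main__":
    # HYPOTHETICAL advertised bandwidths in KB/s, heavily skewed on purpose.
    sample = [20, 50, 80, 150, 2000, 5000, 9000, 20000]
    print(f"{below_median_share(sample):.1%}")
```

Feeding it the advertised bandwidths from a current consensus would show how much capacity a median cutoff actually removes.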

- Nikita
Nikita Borisov -
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL

Re: [tor-dev] Scaling tor for a global population

2014-09-29 Thread isis
Andrew Lewman transcribed 1.8K bytes:
 The last research paper I see directly addressing scalability is Torsk
 ( or PIR-Tor

Using Private Information Retrieval (PIR) to retrieve a 50 KB/s relay would
likely make the current network [0] completely unusable.


First, the PIR-Tor paper makes a few naïve assumptions which grossly affect
the analysis of the overhead for directory fetches:

  1. The authors assume that a client's three [sic] Guard Nodes will be the
 directory mirrors for that client, accepting PIR queries for descriptors.

  2. There is zero handling of cases where multiple directory mirrors have
 differing data.

  3. There is zero handling of colluding directory mirrors.

  4. The *increase* to descriptor size is not well factored in to the analysis.

 In the PIR-Tor paper, §4.2:
   | Even if all three guard relays are compromised, they cannot actively
   | manipulate elements in the PIR database since they are signed by the
   | directory authorities[...]
 Single descriptors are *not* signed, the set of all descriptors is.
 Therefore, if a client wanted to actually check that all (or a majority)
 of the Directory Authorities had signed a descriptor, that client would
 need to:

 4.a. Download all of them and check. Which negates the whole point of
  this PIR thing.

 4.b. All of the Directory Authorities would need to sign each and every
  descriptor by itself. This would result in the current
  microdescriptors, which are sized from roughly 900 bytes to some of
  the larger ones which are 2.5 KB, increasing in size by roughly 4 KB
  (that's the size of the DirAuth signatures on the current consensus).

  Result: Each microdescriptor is 4X larger, and the Directory Authorities
  need to do several orders of magnitude more signatures.


Second, because the PIR-Tor paper doesn't seem to be what we would actually
want to implement, the following is an analysis.

Using some safe assumptions about what security guarantees we would likely
want to require from any PIR scheme we used... That is, we would likely want a
1-roundtrip (RT), b-private, b-Byzantine, k-out-of-m databases, [1] PIR
scheme. (Meaning that there are m total directory mirrors, only k of those
actually answer a set of your queries, and you want the privacy of your
queries and the authenticity of the answers to your queries to be protected
even if b number of directory mirrors are malicious and colluding with one
another.)

Then, with those assumptions in mind, from Theorem 7.7 in §7.1 of this review
on PIR schemes [2] (which is, admittedly, a bit dated) used in one of Dan
Boneh's classes, we have a size complexity [3] of

  O((k/3b) n^(1/[(k-1)/3]) m log_2 m)

where b < k/3 (Meaning that the number of malicious, colluding directory mirrors
 which answered is less than one third of the total number of
 mirrors which answered.)

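 >>> import math
 >>> m = 3000
 >>> k = 30
 >>> b = 4
 >>> n = 160  # the number of bits in a relay fingerprint
 >>> # (k/3b) * n^(1/[(k-1)/3]) * m * log_2(m), per the formula above
 >>> ((float(k) / (3.0 * float(b)))
 ...   * (n ** (1.0 / ((float(k) - 1.0) / 3.0)))
 ...   * float(m) * math.log(float(m), 2.0))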

the above space complexity comes out to O(146449)-bits per lookup. (Where, by
lookup, I mean the set of all queries pertaining to fetching a single relay
descriptor, since, in PIR, a query is usually for a single bit.) This would
mean that adding any sane PIR scheme for directory requests would result in
something like a 900X increase in the bytes used for each request.

While one of the benefits of PIR would be that clients would no longer need to
request the descriptors for all relays listed in the consensus, this benefit
actually doesn't help as much as it would seem, because switching to a PIR
scheme would require a client to make three of these roughly O(146449)-bit
requests, every ten minutes or so, when the client wants a new circuit.
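
To put the bit count above into more familiar units, here is a small sketch that re-evaluates the same formula as a function and converts the result. Reading the "900X" figure as the ratio between the lookup size and a 160-bit fingerprint is an interpretation, not something the message states.

```python
import math


def pir_lookup_bits(m: int, k: int, b: int, n: int) -> float:
    """Evaluate (k/3b) * n^(1/[(k-1)/3]) * m * log2(m) from the message above."""
    return (k / (3.0 * b)) * n ** (3.0 / (k - 1)) * m * math.log2(m)


bits = pir_lookup_bits(m=3000, k=30, b=4, n=160)
lookup_bytes = bits / 8
# Three lookups per new circuit, as described in the paragraph above.
per_circuit_kib = 3 * lookup_bytes / 1024
# ASSUMPTION: the "900X" comparison is against a bare 160-bit fingerprint.
ratio_vs_fingerprint = bits / 160

if __name__ == "__main__":
    print(f"{bits:.0f} bits per lookup, about {lookup_bytes / 1024:.1f} KiB")
    print(f"about {per_circuit_kib:.1f} KiB per new circuit")
    print(f"about {ratio_vs_fingerprint:.0f}x a 160-bit fingerprint")
```

Changing m, k, or b shows how quickly the per-lookup cost moves with the number of mirrors and the collusion threshold.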

Don't get me wrong; PIR is awesome. But the current Tor network is likely
*not* the correct real-world application of any of the current PIR schemes. At
least, it isn't until we have some HUGE number of nodes in the network, which
by #1 and #2 in my original reply to this thread, [4] we shouldn't.

[0]: Note, I said the current network. An imaginary future Tor network which
 has 10,000,000 relays would be different. And then note the very last
 paragraph where I state that we probably shouldn't ever have 10,000,000

[1]: I assume that we need replication in Tor's use case. There are papers,
 such as the following:

 Kushilevitz, Eyal, and Rafail Ostrovsky.

Re: [tor-dev] Scaling tor for a global population

2014-09-28 Thread Sebastian Hahn

On 28 Sep 2014, at 02:12, Tom Ritter wrote:
 why not also change the consensus
 and related document formats to be something more efficient than ASCII
 text?  Taking the latest consensus and doing some rough estimates, I
 found the following:
 Original consensus, xz-ed: 407K
 Change flags to uint16: ~399K
 +Removing names: 363K
 +Compressing IPv6 to 16Bytes + 4 Bytes - 360K
 +Compressing IPv4 to 4 Bytes + 4Bytes + 4bytes - 315K
 +Compressing the Datetime to 4 bytes - 291K
 +Compressing the Version string to 4bytes - 288K
 +Replacing reject 1-65K to a single byte - 287K
 +Replacing Bandwidth=# with a 4 byte - 273K
 These numbers are optimistic - you won't see quite this much gain, but
 if I'm understanding you correctly that the consensus is painful, it
 seems like you could save at least 50K-70K out of 400K with relatively
 straightforward changes.

This analysis doesn't make much sense, I'm afraid. We use compression
on the wire, so repeating flags as human-readable strings has a much
lower overhead than you estimate, for example. Re-doing your estimates
with actually compressed consensuses might make sense, but probably
you'll see a lot less value.


Re: [tor-dev] Scaling tor for a global population

2014-09-28 Thread teor
On 27 Sep 2014, at 10:18 , wrote:
 Date: Fri, 26 Sep 2014 17:18:07 -0700
 From: Mike Perry
 Subject: Re: [tor-dev] Scaling tor for a global population
 2. Cut off relays below the median capacity, and turn them into bridges. 
Relays in the top 10% of the network are 164 times faster than
relays in the 50-60% range, 1400 times faster than relays in the 
70-80% range, and 35000 times faster than relays in the 90-100% range.
In fact, many relays are so slow that they provide less bytes to the
network than it costs to tell all of our users about them. There
should be a sweet spot where we can set this cutoff such that the
overhead from directory activity balances the loss of capacity from
these relays, as a function of userbase size.
Result: ~2X reduction in consensus and directory size.
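
The "sweet spot" described in the quoted proposal can be framed as a break-even test. The sketch below is only a model: the user count and the per-user directory cost of listing one relay are hypothetical placeholders, not measured values.

```python
SECONDS_PER_DAY = 86_400


def relay_bytes_per_day(rate_bytes_per_sec: float) -> float:
    """Bytes a relay can carry per day at a sustained rate."""
    return rate_bytes_per_sec * SECONDS_PER_DAY


def directory_cost_per_day(users: int, bytes_per_user_per_day: float) -> float:
    """Network-wide bytes spent per day telling every user about one relay."""
    return users * bytes_per_user_per_day


def is_net_burden(rate_bytes_per_sec: float, users: int,
                  bytes_per_user_per_day: float) -> bool:
    """True if listing the relay costs more bytes than the relay provides."""
    return (relay_bytes_per_day(rate_bytes_per_sec)
            < directory_cost_per_day(users, bytes_per_user_per_day))


def break_even_rate(users: int, bytes_per_user_per_day: float) -> float:
    """Relay rate (bytes/s) at which directory cost equals contribution."""
    return directory_cost_per_day(users, bytes_per_user_per_day) / SECONDS_PER_DAY


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 2M users, 500 directory bytes per user per day per relay.
    print(is_net_burden(1_000, 2_000_000, 500))
    print(round(break_even_rate(2_000_000, 500)))
```

Because the directory cost scales with the number of users, the break-even rate rises as the userbase grows, which matches the "as a function of userbase size" remark.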

Do these extra relays function as spare (albeit slower) capacity in the network?
If so, we'd have to be careful to set the cut-off low enough that any spikes in 
usage could be accommodated.

Also, doesn't reducing the number of routers in the consensus risk harming 
network diversity?

I understand that if these routers are a net burden on the network, then 
they're not actually contributing much to diversity overall. 

But what about users with obscure configurations like New Zealand routers 
only, or custom list of 20 trusted routers? (Is a restricted set of routers 
a common enough use case?)
Do we risk pulling the data they need from the consensus?
Then again, users with these sorts of configurations are probably technically 
savvy enough to fix any issues that occur with any changeover.


pgp 0xABFED1AC


Re: [tor-dev] Scaling tor for a global population

2014-09-28 Thread Tom Ritter
On 28 September 2014 07:00, Sebastian Hahn wrote:
 This analysis doesn't make much sense, I'm afraid. We use compression
 on the wire, so repeating flags as human-readable strings has a much
 lower overhead than you estimate, for example. Re-doing your estimates
 with actually compressed consensuses might make sense, but probably
 you'll see a lot less value.

All of those numbers were after compressing the consensus document
using xz, which is the best compression method I know.


Re: [tor-dev] Scaling tor for a global population

2014-09-28 Thread Sebastian Hahn

On 28 Sep 2014, at 16:33, Tom Ritter wrote:
 On 28 September 2014 07:00, Sebastian Hahn wrote:
 This analysis doesn't make much sense, I'm afraid. We use compression
 on the wire, so repeating flags as human-readable strings has a much
 lower overhead than you estimate, for example. Re-doing your estimates
 with actually compressed consensuses might make sense, but probably
 you'll see a lot less value.
 All of those numbers were after compressing the consensus document
 using xz, which is the best compression method I know.

sorry, I completely missed that you showed compressed numbers. Interesting

Re: [tor-dev] Scaling tor for a global population

2014-09-28 Thread Ryan Carboni
There's one issue if you remove all the small relays, only relays run by
the NSA will be around. Not many people have access to multi-megabit upload
speeds. And those that do might also be using bittorrent.

Re: [tor-dev] Scaling tor for a global population

2014-09-27 Thread Virgil Griffith
To avoid squashing the Tor network with all of these new clients, the
company would almost certainly have to run some big relays to help
compensate for the additional load.  Another proposal would be some sort of
incentive for running relays.


Re: [tor-dev] Scaling tor for a global population

2014-09-27 Thread Fabio Pietrosanti (naif)
Il 9/27/14, 2:33 AM, Mike Perry ha scritto:

 We could also handle controlled rollouts to fractions of their userbase
 to test the waters, and slowly add high capacity nodes to the network to
 support these new users, to ensure we have the people ready to accept
 payment for running the servers, and maintain diversity.
I read your very detailed estimations and improvement paths, i love it!

However, I see that the main suggestions to increase the network
capacity can be simplified as follows:
- improve big nodes ability to push even more traffic
- add more big nodes

Other improvements are to reduce the consensus size and directory
load, but not specifically on network capacity.

While this is the obvious way to add more capacity i feel that's going
to have impacts such as:
1) reduce the diversity (thus the anonymity, because few players will
handle most of the network's traffic)
2) make it irrelevant for anyone to run their own small/volunteer relay

That sounds like the easier way to scale up in a defined amount of
time and with a defined budget, but imho also with consequences and
pre-defined limits.

I feel that the only way to scale up without limits and consequences is
to have end-users become active elements of the network, where we have
success stories such as Skype.

End-users have important network resources available that can be
estimated and used (with care).

Not all end-users are equal: I'm now on a 2M Hyperlan line (damn digital
divide!), but someone else in Stockholm or San Francisco is on a
1000M/100M fiber connection @home (not in a datacenter), while in
Milan I have a 100M/10M fiber!

Those bandwidth resources are amazing, usually quite cheap (home broadband
lines), and widely available in end-users' hands.

IMHO those are the bandwidth resources, widely available, cheap, very
diverse/sparse that could help the Tor network to scale-up.

How to use it properly within/for the Tor network? That's a different topic.

But those big bandwidth resources are there, available under our feet,
in our home, and we're not using it!


Re: [tor-dev] Scaling tor for a global population

2014-09-27 Thread M. Ziebell
Besides the fact that this could be a great opportunity for tor in many
ways I see two problems we should consider:

- this vast growth would be artificial. What happens to all the
  users and servers if they stop supporting the product or close-down?

- IMHO it is a problem if the network expands because SOME users
  pay for it. Don't get me wrong. I know that many people are willing
  to spend their private money to run tor-relays ... but I know of no
  user who has to pay for it. (Except torplug, but that is a different

Possible that I'm getting something completely wrong, but just for the



Re: [tor-dev] Scaling tor for a global population

2014-09-27 Thread Tom Ritter
On 26 September 2014 22:28, Mike Perry wrote:
 That's basically what I'm arguing: We can increase the capacity of the
 network by reducing directory waste but adding more high capacity relays
 to replace this waste, causing the overall directory to be the same
 size, but with more capacity.

I'm sure that diffs will make a huge difference, but if you're
focusing on the directory documents why not also change the consensus
and related document formats to be something more efficient than ASCII
text?  Taking the latest consensus and doing some rough estimates, I
found the following:

Original consensus, xz-ed: 407K
Change flags to uint16: ~399K
+Removing names: 363K
+Compressing IPv6 to 16Bytes + 4 Bytes - 360K
+Compressing IPv4 to 4 Bytes + 4Bytes + 4bytes - 315K
+Compressing the Datetime to 4 bytes - 291K
+Compressing the Version string to 4bytes - 288K
+Replacing reject 1-65K to a single byte - 287K
+Replacing Bandwidth=# with a 4 byte - 273K

These numbers are optimistic - you won't see quite this much gain, but
if I'm understanding you correctly that the consensus is painful, it
seems like you could save at least 50K-70K out of 400K with relatively
straightforward changes.
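
The cumulative sizes listed above are easier to compare as per-step savings. This sketch only rearranges the numbers from the list; the shortened step labels are paraphrases.

```python
# Cumulative xz-compressed consensus sizes in KB, copied from the list above.
STEPS = [
    ("Original consensus, xz-ed", 407),
    ("Flags as uint16", 399),
    ("Remove names", 363),
    ("IPv6 as 16+4 bytes", 360),
    ("IPv4 as 4+4+4 bytes", 315),
    ("Datetime as 4 bytes", 291),
    ("Version string as 4 bytes", 288),
    ("reject 1-65K as one byte", 287),
    ("Bandwidth as 4 bytes", 273),
]


def step_savings(steps):
    """Return (step name, KB saved by that step alone) for each change."""
    return [(name, prev - size)
            for (_, prev), (name, size) in zip(steps, steps[1:])]


savings = step_savings(STEPS)
total_saved = STEPS[0][1] - STEPS[-1][1]
fraction_saved = total_saved / STEPS[0][1]

if __name__ == "__main__":
    for name, saved in sorted(savings, key=lambda item: -item[1]):
        print(f"{saved:3d}K  {name}")
    print(f"total: {total_saved}K ({fraction_saved:.1%} of the original)")
```

Sorting by savings makes it clear that the IPv4 and nickname changes account for most of the optimistic estimate.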


Re: [tor-dev] Scaling tor for a global population

2014-09-27 Thread Ryan Carboni

 But, because this fraction rises with both D and U, these research
 papers rightly point out that you can't keep adding relays *and* users
 and expect Tor to scale.

Broadcast a fraction of all available directories? Use md5 as a random
number generator, hash the ECC/RSA keys using md5. A user connecting to the
network will generate an 8-bit random value, and based on that, will
download one of 1/256 directories.
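
A minimal sketch of the sharding idea as described above, assuming the first byte of the MD5 digest of a relay's public key picks the shard. The function names and the key encoding are hypothetical; nothing here reflects how Tor actually distributes directories.

```python
import hashlib
import secrets

NUM_SHARDS = 256  # one shard per possible 8-bit value


def shard_of_relay(identity_key: bytes) -> int:
    """Map a relay's public key bytes to a shard via the first MD5 digest byte."""
    return hashlib.md5(identity_key).digest()[0]


def pick_client_shard() -> int:
    """Client side: draw a uniformly random 8-bit shard index."""
    return secrets.randbelow(NUM_SHARDS)


def shard_directory(relay_keys: list[bytes], shard: int) -> list[bytes]:
    """Relays a client would learn about if it fetched only this shard."""
    return [key for key in relay_keys if shard_of_relay(key) == shard]


if __name__ == "__main__":
    # HYPOTHETICAL stand-ins for real relay identity keys.
    keys = [f"relay-{i}".encode() for i in range(6000)]
    mine = pick_client_shard()
    print(mine, len(shard_directory(keys, mine)))
```

MD5 is used here only as a cheap bucket function, as in the proposal; each client would then see roughly 1/256 of the relays, which has its own anonymity implications.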

Right now, any relay with more than ~100Mbit of capacity really
needs to run an additional tor relay instance on that link to make
use of it. If they have AES-NI, this might go up to 300Mbit.

Any plans to use ChaCha8 instead of AES? It would be an order of magnitude
faster. It is also unlikely for ChaCha8 to become sufficiently insecure to
affect web-size non-video traffic.

Re: [tor-dev] Scaling tor for a global population

2014-09-27 Thread Mike Perry
M. Ziebell:
 Besides the fact that this could be a great opportunity for tor in many
 ways I see two problems we should consider:
 - this vast growth would be artificial. What happens to all the
   users and servers if they stop supporting the product or close-down?

In this case, presumably the users would disappear, as this vendor's
browser product would no longer include Tor (and I would hope push out
an update that removed the feature).

 - IMHO it is a problem if the network expands because SOME users
   pay for it. Don't get me wrong. I know that many people are willing
   to spend their private money to run tor-relays ... but I know of no
   user who has to pay for it. (Except torplug, but that is a different

Well, I think the model we are considering is that a vendor is
monetizing their users some other way (perhaps because they purchased
the phone that comes with Tor on it, or through advertising revenue
during non-private mode).

This vendor would presumably then have an interest in contributing
either relays, money, or both to make sure the Tor network is fast
enough to be useful to their users.

I am worried that this particular vendor is thinking about just using
our Tor Browser patches without talking to us on the engineering side,
but I guess this is still in the initial stages of discussion.

Mike Perry


Re: [tor-dev] Scaling tor for a global population

2014-09-27 Thread Mike Perry
Mike Perry:
 5. Invest in the Tor network.
Based purely on extrapolating from the Noisebridge relays, we could
add ~300 relays, and double the network capacity for $3M/yr, or about $1
per user per year (based on the user counts from:
Note that this value should be treated as a minimum estimate. We
actually want to ensure diversity as we grow the network, which may make
this number higher. I am working on better estimates using replies from:
Automated donation/funding distribution mechanisms such as are especially interesting ways to do this
(and can even automatically enforce our diversity goals) but more
traditional partnerships are also possible.
Result: 100% capacity increase for each O($3M/yr), or ~$1 per new user
per year.
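
A back-of-the-envelope check of the figures quoted above, as a hedged sketch. It assumes cost grows linearly with capacity (the quoted text itself warns that diversity goals may push the real number higher), and the helper name and the capacity multiples tried at the end are illustrative only.

```python
ADDED_RELAYS = 300             # relays needed to double capacity (quoted above)
ANNUAL_COST_USD = 3_000_000    # yearly cost of that doubling (quoted above)
COST_PER_USER_USD = 1          # "~$1 per new user per year" (quoted above)

# Implied cost of one added relay per year.
cost_per_relay = ANNUAL_COST_USD // ADDED_RELAYS

# Number of new users implied by $3M/yr at about $1 per user per year.
implied_new_users = ANNUAL_COST_USD // COST_PER_USER_USD


def annual_cost_for_capacity(multiple: int) -> int:
    """Yearly cost to reach `multiple` times today's capacity.

    Assumes each additional 100% of current capacity costs ANNUAL_COST_USD,
    i.e. purely linear scaling (an assumption, not a measured result).
    """
    return (multiple - 1) * ANNUAL_COST_USD


print(f"cost per added relay: ${cost_per_relay:,}/yr")
print(f"implied new users per doubling: {implied_new_users:,}")
for multiple in (2, 10, 40):
    print(f"{multiple}x capacity: ${annual_cost_for_capacity(multiple):,}/yr")
```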

Naif's point about there being 100Mbit residential uplinks out there
suggests that there may be a hybrid approach here.

If this vendor could detect super-high-speed client uplinks, they could
ask only these users if they wanted to be non-exit relays. But this is
complicated, as it also requires understanding if the user's ISP will
get upset at the traffic consumption or the fact that a listening TCP
service is running. For example, I know Comcast calls their residential
service unlimited, but yells at you if you transfer more than 250GB in
a month, or if they discover any listening TCP ports on your IP address.
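
To put the 250GB figure in perspective, here is a rough sketch of how quickly a relay would reach it. It assumes decimal gigabytes, a 30-day month, and that the limit is measured against traffic flowing at the stated rate; a real relay moves each byte both in and out, so it would likely reach the limit sooner. The function names are hypothetical.

```python
CAP_BYTES = 250 * 10**9           # 250GB monthly limit from the message
UPLINK_BITS_PER_SEC = 100 * 10**6  # 100Mbit residential uplink
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month (assumption)


def hours_to_hit_cap(cap_bytes: int, rate_bits_per_sec: int) -> float:
    """Hours of continuous transfer at the given rate before the cap is hit."""
    return cap_bytes * 8 / rate_bits_per_sec / 3600


def sustainable_rate_mbit(cap_bytes: int, seconds: int) -> float:
    """Average rate (Mbit/s) that uses exactly the cap over the period."""
    return cap_bytes * 8 / seconds / 10**6


hours = hours_to_hit_cap(CAP_BYTES, UPLINK_BITS_PER_SEC)
rate = sustainable_rate_mbit(CAP_BYTES, SECONDS_PER_MONTH)
print(f"A saturated 100Mbit uplink reaches 250GB in about {hours:.1f} hours")
print(f"Staying under 250GB/month allows about {rate:.2f} Mbit/s on average")
```

Under these assumptions a saturated 100Mbit link uses the whole month's allowance in well under a day, so a home relay on such a plan would need to be heavily rate-limited.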

Even if we could figure these problems out by looking up ISP policy
based on client IP address, I think we still need to fund exit relays. I
don't think we can just enlist random home users' connections to be exits
without giving them a wall of text explaining how to deal with issues
that may arise.

So this may be something to consider to reduce network expenditure, but
it won't completely eliminate it.

Mike Perry


[tor-dev] Scaling tor for a global population

2014-09-26 Thread Andrew Lewman
I had a conversation with a vendor yesterday. They are
interested in including Tor as their private browsing mode and
basically shipping a re-branded tor browser which lets people toggle the
connectivity to the Tor network on and off.

They very much like Tor Browser and would like to ship it to their
customer base. Their product has 10-20% of the global market, out of
roughly 2.8 billion global Internet users.

As Tor Browser is open source, they are already working on it. However,
their concern is scaling up to handle some percent of global users
with tor mode enabled. They're willing to entertain offering their
resources to help us solve the scalability challenges of handling
hundreds of millions of users and relays on Tor.
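
The scale involved can be sketched from the figures above. The market share and Internet population come from this message; the fraction of those users who would actually turn Tor mode on is unknown, so the 5-15% range below is only a stand-in, borrowed from the private-browsing usage figures cited elsewhere in this thread.

```python
GLOBAL_INTERNET_USERS = 2_800_000_000  # "roughly 2.8 billion" (this message)
MARKET_SHARE_PERCENT = (10, 20)        # vendor's share of the global market
TOR_MODE_PERCENT = (5, 15)             # assumed share using Tor mode (stand-in)


def percent_of(count: int, percent: int) -> int:
    """Integer percentage of a count (avoids floating-point rounding)."""
    return count * percent // 100


# Vendor's total user base, low and high estimates.
vendor_users = tuple(
    percent_of(GLOBAL_INTERNET_USERS, share) for share in MARKET_SHARE_PERCENT
)

# Users with Tor mode enabled, pairing the low share with the low usage
# figure and the high share with the high usage figure.
tor_mode_users = tuple(
    percent_of(users, usage)
    for users, usage in zip(vendor_users, TOR_MODE_PERCENT)
)

print(f"vendor users: {vendor_users[0]:,} to {vendor_users[1]:,}")
print(f"Tor-mode users: {tor_mode_users[0]:,} to {tor_mode_users[1]:,}")
```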

As this question keeps coming up from businesses looking at
privacy as the next must-have feature in their products, I'm trying to
compile a list of tasks to solve to help us scale. The old 2008
three-year roadmap looks at performance,

I've been through the specs, to see if
there are proposals for scaling the network or directory authorities. I
didn't see anything directly related.

The last research paper I see directly addressing scalability is Torsk
( or PIR-Tor

Is there a better list available for someone new to Tor to read up on
the scalability challenges?

pgp 0x6B4D6475

Re: [tor-dev] Scaling tor for a global population

2014-09-26 Thread Fabio Pietrosanti (naif)
On 9/26/14, 4:58 PM, Andrew Lewman wrote:
 They very much like Tor Browser and would like to ship it to their
 customer base. Their product is 10-20% of the global market, this is of
 roughly 2.8 billion global Internet users.
. WOW! .

 Is there a better list available for someone new to Tor to read up on
 the scalability challenges?
As a basic concept, I don't think that Tor could scale up to huge
numbers without making end users an active part of the network.

Fabio Pietrosanti (naif)
HERMES - Center for Transparency and Digital Human Rights


Re: [tor-dev] Scaling tor for a global population

2014-09-26 Thread Thomas White

I think one of the important thoughts here, at least as an exit
operator, is that having a large group like that can significantly
influence how Tor is seen, and I am sure having that kind of backing
could open many avenues for us.

If we are to scale up, we can reduce CPU load (or optimise) per node
or we can have more ISPs who welcome Tor. I think whilst there are
lots of great people doing fantastic work expanding the list of ISPs
who accept Tor, we need to perhaps revisit those ISPs who have shown
themselves to be hostile towards us. Do you believe this person or
group could assist in perhaps persuading ISPs to open up to Tor exit
operators like myself?

If they are willing to offer their name as a backing, I'd be more than
happy to dedicate many hours per week to getting in touch with
ISPs and trying to change their policies. If we see much success, I can
easily co-ordinate a revamp of the good/bad ISP list, which has become
a bit messy over the last few months. Given the sheer volume of
traffic my exits have pushed (Petabytes a month), the number of abuse
complaints I've had and even police raids, I am quite comfortable
giving ISPs the honest picture. Some won't open up even with major
backing, but I am sure we can convince some to change their policies
when they see and hear from the actual operators who aren't in prison
(since many of them seem to equate running Tor exits with being a
criminal or a guaranteed way to get in trouble with the police).

-T

