Re: On the performance scalability of Tor

2007-07-18 Thread Vlad "SATtva" Miller
Michael_google gmail_Gersten wrote on 19.07.2007 11:43:
> The discussion about fast nodes: Is there a way to tell the client,
> "Only select fast nodes"? Does "fast" imply high throughput (but it
> might be slow to start up), or fast startup/turn around?

I think that if such an option were implemented and many people used it,
these "fast nodes" would soon stop being fast. I base this simply on
Steven's original blog post.
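
To put rough numbers on that reasoning, here is a toy sketch. Everything in it
is an assumption made for illustration: the capacities, the client count, the
function names, and the simplification that clients pick nodes in proportion
to capacity for a single hop. It is not how Tor's path selection is
implemented.

```python
def share_on_fast_node(opt_in: float, clients: int,
                       fast_capacity: float, total_capacity: float) -> float:
    """Bandwidth per client on a fast node.

    A fraction `opt_in` of clients spreads over fast nodes only; the rest
    spread over every node. Both groups pick nodes in proportion to capacity.
    """
    clients_per_unit_capacity = clients * (
        opt_in / fast_capacity + (1.0 - opt_in) / total_capacity
    )
    return 1.0 / clients_per_unit_capacity


def share_on_slow_node(opt_in: float, clients: int,
                       total_capacity: float) -> float:
    """Bandwidth per client on a slow node (used only by non-opt-in clients)."""
    remaining = clients * (1.0 - opt_in)
    return float("inf") if remaining == 0 else total_capacity / remaining


if __name__ == "__main__":
    # Hypothetical network: node capacities in KB/s, "fast" means >= 500.
    capacities = [1000, 800, 200, 100, 50, 50]
    fast = sum(c for c in capacities if c >= 500)
    total = sum(capacities)
    clients = 1000
    for opt_in in (0.0, 0.25, 0.5, 0.75, 1.0):
        f = share_on_fast_node(opt_in, clients, fast, total)
        s = share_on_slow_node(opt_in, clients, total)
        print(f"opt-in {opt_in:4.0%}: fast node {f:.2f} KB/s per client, "
              f"slow node {s:.2f} KB/s per client")
```

In this model the per-client share on the fast nodes only goes down as more
clients opt in, while the nodes left behind become less loaded.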

-- 
SATtva | security consulting
www.vladmiller.info | www.pgpru.com



Re: On the performance scalability of Tor

2007-07-18 Thread Michael_google gmail_Gersten

> They can be issued concurrently. Tor doesn't care.


Indeed; I see Vidalia show a lot of streams connecting all at once,
followed by all of them switching to open at once (TCP streams inside
a Tor circuit).

The time it takes to open a new HTTP (TCP) connection through a Tor
circuit seems to be very long. I know it has to be encrypted/decrypted
at each node, as well as originated at the exit to the destination, but
I have to wonder: Can this be sped up at all?

The discussion about fast nodes: Is there a way to tell the client,
"Only select fast nodes"? Does "fast" imply high throughput (but it
might be slow to start up), or fast startup/turn around?


Re: On the performance scalability of Tor

2007-07-18 Thread Mike Perry
Thus spake Roger Dingledine ([EMAIL PROTECTED]):

> On Wed, Jul 18, 2007 at 07:52:14PM -0700, Mike Perry wrote:
> > Thus spake Mike Perry ([EMAIL PROTECTED]):
> > 
> > > RELAY_EXTEND is the way this is done. I believe clients can and do
> > > send multiple RELAY_EXTENDs in a row, so it's not like its a
> > 
> > Sorry, I'm a moron. I meant to say RELAY_BEGIN. Also, Roger/Nick,
> > please correct me if these can't be issued concurrently.
> 
> They can be issued concurrently. Tor doesn't care.
> 
> Also, with HTTP/1.1 pipelining to the actual website, we just need to
> open a single TCP stream (one RELAY_BEGIN) and we can then fetch many
> pages in a row.
> 
> I'm not sure if this is better or worse than fetching them in parallel. I
> suspect Juliusz (the polipo developer) has opinions :), and I'll defer
> to him.

Yeah, cause it's not like optimizing high latency networks and chatty
protocols for speed is my day job or anything. Probably should wait
for the expert to weigh in to really be sure ;)

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: On the performance scalability of Tor

2007-07-18 Thread Roger Dingledine
On Wed, Jul 18, 2007 at 07:52:14PM -0700, Mike Perry wrote:
> Thus spake Mike Perry ([EMAIL PROTECTED]):
> 
> > RELAY_EXTEND is the way this is done. I believe clients can and do
> > send multiple RELAY_EXTENDs in a row, so it's not like its a
> 
> Sorry, I'm a moron. I meant to say RELAY_BEGIN. Also, Roger/Nick,
> please correct me if these can't be issued concurrently.

They can be issued concurrently. Tor doesn't care.

Also, with HTTP/1.1 pipelining to the actual website, we just need to
open a single TCP stream (one RELAY_BEGIN) and we can then fetch many
pages in a row.

I'm not sure if this is better or worse than fetching them in parallel. I
suspect Juliusz (the polipo developer) has opinions :), and I'll defer
to him.

--Roger



Re: On the performance scalability of Tor

2007-07-18 Thread Mike Perry
Thus spake Mike Perry ([EMAIL PROTECTED]):

> RELAY_EXTEND is the way this is done. I believe clients can and do
> send multiple RELAY_EXTENDs in a row, so it's not like its a

Sorry, I'm a moron. I meant to say RELAY_BEGIN. Also, Roger/Nick,
please correct me if these can't be issued concurrently.


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: On the performance scalability of Tor

2007-07-18 Thread Mike Perry
Thus spake Robert Hogan ([EMAIL PROTECTED]):

> On Wednesday 18 July 2007 15:58:17 Steven Murdoch wrote:
> > As always, comments and suggestions, either here on the list or on the
> > blog, are appreciated.

(Excellent observations, Steven. I think you're spot-on. I've long
suspected this sort of behavior also. The converse is why I always
argue to make Tor more efficient to get more users. Also, I think
Tor first needs to fix its balancing issues before more nodes can
really support more users to begin with.)

> It has always seemed to me that there is plenty of raw 'bandwidth' on the tor 
> network. I've just downloaded the tor tarball at a relatively nippy 17KB/s. 
> Not greased lightning by any means but clearly if it was just bandwidth at 
> issue the general browsing experience would be a lot different.

Well, also, the bandwidth of Tor fetches can vary considerably, and in
a balanced network this shouldn't be the case.  Currently, the middle
of the network is overloaded due to guard bug #440 and exit clipping.
I published a brief study of this in December, though at that point I
did not yet know the cause.

The 35-45% tier nodes are much more unreliable than the upper AND
lower bandwidths. Note that the guard node cutoff is about 50% by
bandwidth.
http://archives.seul.org/or/talk/Dec-2006/msg00123.html

I think this, plus the exit issue, is among the reasons why Tor
bandwidth performance is irregular. Another major reason is path
selection: for example, crossing the Atlantic Ocean 4 times to
retrieve some document.

> From a layman's point of view, opening a web page with tor seems to involve at 
> least 10 to 15 separate streams, usually over the same circuit. Once the 
> streams are up they are only up for a short time before a new one is created. 
> Just looking at the connection monitor on TorK it seems to me that half the 
> time is spent creating these streams and half the time (often a lot less) 
> actually using them.
> 
> I presume Tor is just reflecting the behaviour of privoxy and the browser 
> here, which is opening up new tcp sessions for numerous different requests to 
> the same destination. I understand that pipelining in firefox and polipo 
> mitigates this somewhat. Or does it? I'm not 100% sure on that score.
> 
> At any rate couldn't/shouldn't Tor take care of this? Why can't Tor maintain 
> and reuse a successfully created stream for all requests to an active 
> destination and let the exit break out the requests into their respective tcp 
> sessions? Put simply, if my understanding is correct Tor is respecting the 
> tcp architecture in the wrong place, at the client (creating new streams for 
> each tcp request/session) rather than the exit (creating new tcp sessions 
> where appropriate from the same stream).

The problem is the round trip time to create the TCP connection. A
client has to tell the exit to create the TCP connections somehow.
RELAY_EXTEND is the way this is done. I believe clients can and do
send multiple RELAY_EXTENDs in a row, so it's not like it's a
chattiness issue. Sending multiple requests in a row is effectively
the same as sending a single request with a bunch of "please connect"
requests stacked in it, from a networking standpoint. You just have to
wait for the requests to cross n oceans and 3 queues on the way out,
and on the way back. So long as you are not waiting for them to be
sent one at a time, you are going about as fast as you can go.
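
A rough way to see why back-to-back requests beat one-at-a-time requests is
the toy model below. It is only a sketch: the function names, the 1.5 s
circuit round trip and the 12 streams are assumptions made for illustration,
not measurements, and the oceans and queues are folded into a single
round-trip figure. The "stream-open request" here stands for the RELAY_BEGIN
cell named in the correction elsewhere in this thread.

```python
def sequential_open_time(n_streams: int, circuit_rtt: float) -> float:
    """Each stream-open request is sent only after the previous reply arrives."""
    return n_streams * circuit_rtt


def concurrent_open_time(n_streams: int, circuit_rtt: float,
                         send_gap: float = 0.0) -> float:
    """All requests are sent back to back, `send_gap` seconds apart.

    The last reply arrives one round trip after the last request leaves.
    """
    return (n_streams - 1) * send_gap + circuit_rtt


if __name__ == "__main__":
    rtt = 1.5      # assumed client-to-exit round trip through the circuit, seconds
    streams = 12   # in line with the 10 to 15 streams per page mentioned above
    print(f"one at a time: {sequential_open_time(streams, rtt):.1f} s")
    print(f"back to back:  {concurrent_open_time(streams, rtt):.1f} s")
```

Under these assumptions the back-to-back case costs about one round trip no
matter how many streams are opened, which is the sense in which stacking the
requests into one message would not buy anything extra.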

So, pipelining may or may not help this aspect, at least. So long as the
circuit is reused, it probably is not terribly significant whether
there is a single RELAY_EXTEND that will set up a pipelined
HTTP request, or a bunch of RELAY_EXTENDs issued in parallel for a
bunch of HTTP requests. What probably matters more is how many
concurrent proxy requests your browser is willing to issue.
Concurrency is the main way to mitigate request/response latency.

But, that being said, the one case where pipelining WILL help more
than just multiple HTTP requests is if you have a high latency between
your exit and your TCP destination. High latency makes TCP slow start
a real bottleneck for short-lived connections like HTTP. So if you
exit out of Germany to fetch documents in the US, you will spend most
of your time waiting for TCP's congestion windows to grow to fill
the BDP (bandwidth-delay product) of their links. If you use
pipelining, one TCP connection and thus only one TCP slow start is
incurred for these fetches. So that can be a huge bonus for
cross-oceanic webpage downloads.
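
The slow start cost can be sketched with an idealized count of round trips
between the exit and the destination. The model below is an assumption-laden
illustration, not a TCP implementation: the window starts at 2 segments and
doubles every round trip, there is no loss, no receiver window limit and no
delayed ACKs, and the object sizes and function names are made up.

```python
import math


def slow_start_rounds(segments: int, initial_window: int = 2) -> int:
    """Round trips needed to deliver `segments` when the congestion window
    starts at `initial_window` and doubles every round trip (no loss)."""
    rounds, window, sent = 0, initial_window, 0
    while sent < segments:
        sent += window
        window *= 2
        rounds += 1
    return rounds


def fetch_rtts(segments: int, initial_window: int = 2) -> int:
    """One round trip for the TCP handshake plus the slow-start rounds."""
    return 1 + slow_start_rounds(segments, initial_window)


def separate_connections_rtts(objects: int, segments_each: int,
                              max_parallel: int = 1) -> int:
    """Each object gets its own connection; at most `max_parallel` run at once."""
    batches = math.ceil(objects / max_parallel)
    return batches * fetch_rtts(segments_each)


def pipelined_rtts(objects: int, segments_each: int) -> int:
    """All objects share one connection, so slow start is paid only once."""
    return fetch_rtts(objects * segments_each)


if __name__ == "__main__":
    objects, segments_each = 10, 30   # hypothetical page: 10 objects, 30 segments each
    for parallel in (1, 2):
        rtts = separate_connections_rtts(objects, segments_each, parallel)
        print(f"separate connections, {parallel} at a time: {rtts} RTTs")
    print(f"one pipelined connection: {pipelined_rtts(objects, segments_each)} RTTs")
```

Multiply the round-trip counts by the exit-to-destination RTT to get time. The
gap narrows as `max_parallel` grows, since each parallel connection runs its
own slow start at the same time, which fits the remark above that browser
concurrency matters too.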


P.S. Yes, I do know way too much about this sort of thing. :)

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: On the performance scalability of Tor

2007-07-18 Thread Robert Hogan
On Wednesday 18 July 2007 15:58:17 Steven Murdoch wrote:
> A frequently stated problem with Tor is the poor performance and
> improving this is the goal of several sub-projects. One of these is to
> simply encourage the deployment of more Tor servers. This will
> increase the capacity of the network, but the consequent improvement
> to users is more difficult to estimate.
>
> The intuitive hypothesis -- that a n% increase in network capacity
> will result in an n% increase in performance for users -- is almost
> certainly wrong. In fact, the defining factor is how the number of
> users scales with the available bandwidth -- a currently unknown
> function.
>
> I discuss this way of looking at Tor as well as the consequences and
> limitations of the approach in a blog post published today:
>
> 
> http://www.lightbluetouchpaper.org/2007/07/18/economics-of-tor-performance/
>
> As always, comments and suggestions, either here on the list or on the
> blog, are appreciated.
>
> Steven.

It has always seemed to me that there is plenty of raw 'bandwidth' on the tor 
network. I've just downloaded the tor tarball at a relatively nippy 17KB/s. 
Not greased lightning by any means but clearly if it was just bandwidth at 
issue the general browsing experience would be a lot different.

From a layman's point of view, opening a web page with tor seems to involve at 
least 10 to 15 separate streams, usually over the same circuit. Once the 
streams are up they are only up for a short time before a new one is created. 
Just looking at the connection monitor on TorK it seems to me that half the 
time is spent creating these streams and half the time (often a lot less) 
actually using them.

I presume Tor is just reflecting the behaviour of privoxy and the browser 
here, which is opening up new tcp sessions for numerous different requests to 
the same destination. I understand that pipelining in firefox and polipo 
mitigates this somewhat. Or does it? I'm not 100% sure on that score.

At any rate couldn't/shouldn't Tor take care of this? Why can't Tor maintain 
and reuse a successfully created stream for all requests to an active 
destination and let the exit break out the requests into their respective tcp 
sessions? Put simply, if my understanding is correct Tor is respecting the 
tcp architecture in the wrong place, at the client (creating new streams for 
each tcp request/session) rather than the exit (creating new tcp sessions 
where appropriate from the same stream).

Feel free to slap some sense into me on this one.
-- 

Browse Anonymously Anywhere - http://anonymityanywhere.com
TorK- KDE Anonymity Manager - http://tork.sf.net
KlamAV  - KDE Anti-Virus- http://www.klamav.net