Re: [tor-dev] The case for Tor-over-QUIC

2022-03-20 Thread Ali Clark
Sections 3 and 6 of the Quux paper have some relevant discussion [1]

> Unfortunately, it appears that the Quux QUIC paper studied QUIC at the
> wrong position - between relays, and the QuTor implementation is
> unclear. This means that it may still be an open question as to if
> QUIC's congestion control algorithms will behave optimally in a
> Tor-like network under heavy load.

  As Reardon and Goldberg noted in concluding remarks, approaches other
  than hop-by-hop will incur an extra cost for retransmissions, since
  these must be rerouted through a larger part of the network [RG09].
 
  As Tschorsch and Scheuermann discuss [TS12], due to the longer RTT of
  TCP connections, end-to-end approaches will also take longer to “ramp
  up” through slow start and up to a steady state.
 
  Both of these factors (not to mention increased security risk of
  information leakage [DM09]) suggest that hop-by-hop designs are likely
  to yield better results. In fact, the hop-by-hop approach may be viewed as
  an instance of the Split TCP Performance-Enhancing Proxy design, whereby
  arbitrary TCP connections are split in two to negate the issues noted
  above.
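
To put rough numbers on the ramp-up point, here is a minimal sketch. It assumes idealized slow start (the congestion window doubles once per RTT, with no loss), and that each hop's connection ramps up independently and concurrently. The RTT, hop count, window sizes, and function name are all hypothetical values chosen for illustration, not figures from the paper.

```python
import math


def slow_start_ramp_time(rtt_s, target_segments, initial_segments=10):
    """Approximate time for slow start to grow cwnd from initial to target.

    Idealized model: cwnd doubles once per RTT and no packets are lost.
    """
    if target_segments <= initial_segments:
        return 0.0
    rounds = math.ceil(math.log2(target_segments / initial_segments))
    return rounds * rtt_s


HOP_RTT_S = 0.05   # hypothetical RTT of one relay-to-relay hop
HOPS = 3           # hypothetical circuit length
TARGET = 640       # hypothetical steady-state window, in segments

# Hop-by-hop: each connection only sees its own hop's RTT.
hop_by_hop = slow_start_ramp_time(HOP_RTT_S, TARGET)
# End-to-end: one connection sees the RTT of the whole path.
end_to_end = slow_start_ramp_time(HOP_RTT_S * HOPS, TARGET)

print(f"hop-by-hop: {hop_by_hop:.2f}s, end-to-end: {end_to_end:.2f}s")
```

Under these assumptions the number of doubling rounds is the same in both cases, so the ramp-up time scales directly with the RTT the connection observes.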

> Unfortunately, the Quux implementation appears to use QUIC at a
> suboptimal position -- they replace Tor's TLS connections with QUIC,
> and use QUIC streams in place of Tor's circuit ID -- but only between
> relays. This means that it does not benefit from QUIC's end-to-end
> congestion control for the entire path of the circuit. Such a design
> will not solve the queuing and associated OOM problems at Tor relays,
> since relays would be unable to drop cells to signal backpressure to
> endpoints. Drops will instead block every circuit on a connection
> between relays, and even then, due to end-to-end reliability, relays
> will still need to queue without bound, subject to Tor's current (and
> inadequate) flow control.

  A fully QUIC relay path (with slight modification to fix a limit on
  internal buffer sizes) would allow end-to-end backpressure to be used
  from the client application TCP stream up to the exit TCP stream.
  Leaving aside Tor’s inbound rate limit mechanism but retaining the
  global outbound limit, this design would allow max-min fairness to be
  achieved in the network, as outlined by Tschorsch and Scheuermann
  [TS11].
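
For readers unfamiliar with max-min fairness, the sketch below shows the idea for a single shared link only. It is not the network-wide scheme from [TS11]; the function name, capacity, and per-circuit demands are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Max-min fair split of one link's capacity among competing demands.

    Circuits are served in order of increasing demand. Each gets the
    smaller of its demand and an equal share of what is still unallocated,
    so capacity left over by small circuits is passed on to larger ones.
    """
    allocation = {}
    remaining = float(capacity)
    pending = sorted(demands.items(), key=lambda item: item[1])
    for index, (name, demand) in enumerate(pending):
        fair_share = remaining / (len(pending) - index)
        allocation[name] = min(demand, fair_share)
        remaining -= allocation[name]
    return allocation


# Hypothetical example: one light circuit and two bulk circuits.
alloc = max_min_fair(12, {"a": 2, "b": 8, "c": 8})
print(alloc)
```

Here circuit "a" receives its full demand of 2, and the two bulk circuits split the remaining 10 units evenly.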

  ...

  Once implemented however, backpressure would allow Tor to adopt a
  significantly improved internal design. In such a design, a Tor relay
  could read a single cell from one QUIC stream’s read buffer, onion crypt
  it, and immediately place it onto the write buffer of the next stream in
  the circuit. This process would be able to operate at the granularity of
  a single cell because the read and write operations for QUIC are very
  cheap user-space function calls and not syscalls as for host TCP.

  The schedule of this action would be governed by the existing EWMA
  scheduler for circuits that have both a readable stream and a writeable
  stream (and as allowed by a global outgoing token bucket), allowing
  optimal quality of service for circuits.

  It’s expected that backpressure implemented in this way will yield
  significant performance and fairness gains on top of the performance
  improvement found in this thesis.
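
The per-cell forwarding loop described in the excerpt above could look roughly like the toy model below. This is a sketch under stated assumptions, not Quux or Tor code: the `Circuit` class, `relay_tick`, `onion_crypt`, the write-buffer limit, and the per-cell EWMA decay are all hypothetical simplifications (Tor's real EWMA scheduler decays activity over time rather than per cell), and the 514-byte cell size is assumed for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

CELL_BYTES = 514  # assumption: fixed cell size, for illustration only


@dataclass
class Circuit:
    name: str
    read_buf: deque = field(default_factory=deque)   # cells readable from the inbound stream
    write_buf: deque = field(default_factory=deque)  # cells queued on the outbound stream
    write_limit: int = 4   # hypothetical bound on the outbound buffer (backpressure)
    ewma: float = 0.0      # recent activity; lower means higher priority

    def ready(self):
        # A circuit is schedulable only if it can be both read and written.
        return bool(self.read_buf) and len(self.write_buf) < self.write_limit


def onion_crypt(cell):
    # Placeholder for the relay's layer of onion encryption/decryption.
    return cell


def relay_tick(circuits, tokens, decay=0.9):
    """Move single cells while the global outgoing token bucket allows it.

    Returns (cells forwarded, tokens left).
    """
    forwarded = 0
    while tokens >= CELL_BYTES:
        ready = [c for c in circuits if c.ready()]
        if not ready:
            break  # everything is idle or blocked by backpressure
        quietest = min(ready, key=lambda c: c.ewma)
        quietest.write_buf.append(onion_crypt(quietest.read_buf.popleft()))
        for c in circuits:
            c.ewma *= decay
        quietest.ewma += 1.0
        tokens -= CELL_BYTES
        forwarded += 1
    return forwarded, tokens


bulk = Circuit("bulk", read_buf=deque(range(100)))
interactive = Circuit("interactive", read_buf=deque(range(2)))
forwarded, tokens_left = relay_tick([bulk, interactive], tokens=10 * CELL_BYTES)
print(forwarded, tokens_left, len(bulk.write_buf), len(interactive.write_buf))
```

In this run the bulk circuit stops being scheduled once its outbound buffer is full, even though tokens remain. That is the backpressure effect: unread cells stay in the inbound stream's buffer instead of accumulating in the relay.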

One issue for Quux was that it used the Chromium demo QUIC server code as the
basis for its implementation, which was fine for performance research but not
such a good choice for Tor's networking stack.

Several Rust implementations have been released with server-side (not just
client-side) usage, so I expect that to be much less of an issue today.

io_uring is also a significant development since Quux was developed, as
it can reduce the performance hit for host-TCP syscalls, or for using
recvmsg instead of recvmmsg with QUIC if the implementation makes
it difficult to use recvmmsg on the listener side.

[1] https://www.benthamsgaze.org/wp-content/uploads/2016/09/393617_Alastair_Clark_aclark_2360661_860598830.pdf

The following paper has in-depth discussion, but I don't have a copy to
hand unfortunately:

Ali Clark. Tor network performance — transport and flow control.  Technical 
report, University College London, April 2016
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Thesis using QUIC in Tor

2016-09-30 Thread Ali Clark
> Well done, this looks really neat! A couple of questions:

Thanks Jesse :)

> 1) Are you looking into publishing your work in a peer-reviewed journal
> such as CSS, NDSS, PoPETS, or elsewhere?

Not at the moment; however, there's another research group investigating QUIC,
and I've also shared my code with them.

> 2) Did you examine the performance improvements for 6-hop onion/hidden
> service circuits?

Afraid I didn't have time, but I expect performance should improve for that
case too.

> 3) Tor currently multiplexes circuits over the same TLS connection. This
> is by design to avoid leaking circuit-level metadata, including the
> observation of construction and tear-down. The first paragraph on page
> 21 seems to suggest that QUUX leaks this information. Is this correct,
> or did you take steps to address this? For that matter, does QUUX leak
> any additional metadata that could assist with de-anonymization attacks?

That paragraph refers only to the internal code buffers before sending, so it
shouldn't be an issue. The stream frames are contained in an encrypted QUIC
packet for transfer over a QUIC connection, and it shouldn't be possible to
tell what streams/circuits are communicating just by looking at an encrypted
QUIC packet from a connection between relays.

The initial stream creation currently sends an "unusual" 32-byte hash and
4-byte circ-id on the connection. If the connection is busy, this would hopefully
be resegmented with other streams' data on the connection to create a full
packet. If that's an issue, the size could instead be rounded up to a full cell
size. However, in truth I would be surprised if Tor currently resists
traffic analysis on creation of circuits, since I expect handshake cell timings
would be quite identifiable, especially over a quiet relay.
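
As a rough illustration of the "round up to a full cell" option, the sketch below pads the 36-byte stream-creation message with zero bytes. The function names, the 514-byte cell size, the big-endian circ-id encoding, and the zero padding are assumptions made for the example, not the actual Quux wire format.

```python
import os

CELL_BYTES = 514     # assumption: target size of one full cell
HASH_BYTES = 32      # hash length mentioned above
CIRC_ID_BYTES = 4    # circ-id length mentioned above


def build_stream_header(hash_value: bytes, circ_id: int,
                        pad_to: int = CELL_BYTES) -> bytes:
    """Serialize hash + circ-id and zero-pad the result to a full cell."""
    if len(hash_value) != HASH_BYTES:
        raise ValueError("hash must be exactly 32 bytes")
    header = hash_value + circ_id.to_bytes(CIRC_ID_BYTES, "big")
    return header.ljust(pad_to, b"\x00")


def parse_stream_header(message: bytes):
    """Recover (hash, circ_id); the fixed-size fields make the padding ignorable."""
    hash_value = message[:HASH_BYTES]
    circ_id = int.from_bytes(
        message[HASH_BYTES:HASH_BYTES + CIRC_ID_BYTES], "big")
    return hash_value, circ_id


example_hash = os.urandom(HASH_BYTES)  # stand-in for the real hash
message = build_stream_header(example_hash, circ_id=7)
print(len(message), parse_stream_header(message) == (example_hash, 7))
```

Because the hash and circ-id have fixed lengths, the receiver can ignore the padding without needing a length field.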

I agree that for a busy relay (both in and out), analysis of a Tor relay's
established circuit traffic should be difficult, and I expect it'd be about as
difficult for QUIC, depending on its algorithm/heuristics for how it chooses to
resegment stream data into packets and send them.


[tor-dev] Thesis using QUIC in Tor

2016-09-30 Thread Ali Clark
Hi all,

For my master's thesis this summer I looked into the performance impact from
using QUIC instead of TCP/TLS as the relay transport. Results from the
experiments look quite promising.

For more details and the thesis, please see my blog post:
https://www.benthamsgaze.org/2016/09/29/quux-a-quic-un-multiplexing-of-the-tor-relay-transport/

I'm happy to respond to questions either here or in comments on the blog post.

Ali