[tor-dev] Onion Service Intro Point Retry Behavior

2019-10-29 Thread David Goulet
Greetings everyone,

After some discussions between arma, mikeperry, asn and me, it is time for a
famous tor-dev@ email thread to try to get to a consensus if need be.

In the past weeks, a set of HSv3 reachability issues has been found regarding
*service* intro points (IP):

- hs-v3: Service can keep unused intro points in its list
  https://bugs.torproject.org/31561
- hs-v3: Service can pick more than HiddenServiceNumIntroductionPoints
  intro points https://bugs.torproject.org/31548
- hs-v3: Stop using ip->circuit_established flag
  https://bugs.torproject.org/32094
- hs-v3: Client can re-pick bad intro points
  https://bugs.torproject.org/31541
- hs-v3: Service circuit retry limit should not close a valid circuit
  https://bugs.torproject.org/31652

Long story short, a couple of weeks ago we almost merged a new behavior on the
service side with #31561 that would have ditched an intro point if its circuit
timed out instead of retrying it. (Today, a service always retries its intro
point up to 3 times on any type of circuit failure.)

And here comes the core of the discussion of this thread: Retry the intro
point on failure, or simply ditch it on failure and pick a new one?

Some 7 years ago, the ticket below was created, and roughly 4 years ago we
implemented a mechanism that makes a service retry establishing the intro
point circuit up to 3 times when it collapses (except for very, very specific
cases for which we wouldn't):

https://bugs.torproject.org/8239

HSv3 has tried to keep feature parity with v2 there, up until now, when the
above bugs have been mostly fixed.

That all being said, regarding the retry feature, there are pros and cons.
I'll try to organize them below based on many ad hoc discussions in the past
and what I can get from all the tickets up to this day (there could be more!
this is just what I could recall and find in the tickets):

== Pros ==

The primary original argument for retrying is based on the mobile use case. If
a .onion is running on a cellphone and the network happens to go bad all of a
sudden, the service is better off re-establishing the intro circuits, which
would let the client's retry attempts finally succeed after a bit instead of
the client having to re-fetch a descriptor and go to the new intro points.

Thus, in theory, it is mostly a reachability argument.

One question that can arise from this is: Will the client be able to reconnect
using the old intro points by the time the service has re-established them?

In other words, does the retry behavior of the *client* allow enough time for
the service to stabilize for the mobile use case? I'm curious to learn from
people with experience with this!

== Cons ==

Recently, mikeperry raised concerns about the retry behavior altogether and
proposed to simply ditch the intro point each time instead of retrying.

(@Mike, I do invite you to comment here, as you mentioned rationales for this
many times but I don't have enough IRC backlog :S).

== Pros _and_ Cons at the same time ==

There is a possible Guard discovery attack argument against retrying. But it
is nuanced on what exactly constitutes a failure and when the service should
retry vs ditch.

Quote from https://trac.torproject.org/projects/tor/ticket/8239#comment:6

> FWIW, it's also worth mentioning that making HSes more stubborn towards
> old IPs might also allow guard discovery attacks from the IP. That is the
> IP kills incoming circuits, till a compromised middle node is selected,
> and since the HS is stubborn it will keep on establishing new circuits.

This was mentioned by waldo here:
https://lists.torproject.org/pipermail/tor-dev/2014-May/006843.html

... which is where the "what is the failure" question is important, as arma
mentions in the same ticket:

> That's why you should only stick to your intro point when it's your
> network that failed (that is, the connection between you and your guard),
> not the intro circuit. (This is what I meant in the body of the bug in the
> 'main tricky point' sentence.)
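
To make the guard discovery concern quoted above a bit more concrete, here is
a rough back-of-the-envelope sketch in Python. It assumes (a simplification
for illustration, not something stated in the ticket) that every rebuilt
circuit picks its middle node independently and that the adversary holds a
fraction `f` of the middle selection weight. The function name and the
numbers are hypothetical.

```python
def prob_compromised_middle(f, circuits):
    """Chance that at least one of `circuits` independently built
    circuits goes through an adversary-controlled middle node.

    f        -- assumed fraction of middle selection weight held by
                the adversary (hypothetical value, 0 <= f <= 1)
    circuits -- number of circuits the service builds to the same IP
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if circuits < 0:
        raise ValueError("circuits must be non-negative")
    # Each circuit avoids the adversary with probability (1 - f), so all
    # of them avoid it with probability (1 - f) ** circuits.
    return 1.0 - (1.0 - f) ** circuits


if __name__ == "__main__":
    # 3 is the retry limit mentioned in this thread; 50 stands in for a
    # service that is far more stubborn about its intro point.
    for n in (1, 3, 50):
        print(n, prob_compromised_middle(0.05, n))
```

Under these assumptions, a small retry limit caps what an IP can gain by
killing circuits, while a very stubborn service hands it many more draws.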

We have had this discussion in Tor many times before, on "how to detect
network failures" vs "circuit failures". In other words, if the link to your
Guard fails, that would be enough to consider it a network failure and thus
retry the intro point.

But if the circuit collapses due to, let's say, a DESTROY or TRUNCATED cell,
then it could be the IP closing it for the purpose of an attack, and thus you
would select a new intro point. But it could also be that the middle node
died... That signal has many false positives.
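
As a way to compare the options, the two policies can be sketched in a few
lines of Python. This is only an illustration of the decision being
discussed, not tor's actual code: the enum members, function names and the
choice to treat a timeout like a circuit close are all assumptions made for
the sketch.

```python
from enum import Enum, auto

MAX_INTRO_RETRIES = 3  # retry limit mentioned in this thread


class Failure(Enum):
    GUARD_LINK_DOWN = auto()  # our own network: the link to the guard failed
    DESTROY = auto()          # circuit torn down by a DESTROY cell
    TRUNCATED = auto()        # circuit cut short by a TRUNCATED cell
    TIMEOUT = auto()          # circuit timed out (assumed category)


class Action(Enum):
    RETRY_SAME_IP = auto()
    PICK_NEW_IP = auto()


def current_policy(failure, retries_so_far):
    """Behavior described in this mail: retry on any failure, up to the
    limit. The failure reason is deliberately ignored."""
    if retries_so_far < MAX_INTRO_RETRIES:
        return Action.RETRY_SAME_IP
    return Action.PICK_NEW_IP


def refined_policy(failure, retries_so_far):
    """Hypothetical refinement: stick to the IP only when our own network
    failed, and rotate on anything that the IP could have caused."""
    if failure is Failure.GUARD_LINK_DOWN and retries_so_far < MAX_INTRO_RETRIES:
        return Action.RETRY_SAME_IP
    # A DESTROY/TRUNCATED cell or a timeout may be a misbehaving IP or just
    # a dead middle node; the service cannot tell, so it picks a new IP.
    return Action.PICK_NEW_IP
```

The false positive problem shows up in `refined_policy`: a middle node that
simply died leads to the same `PICK_NEW_IP` result as a hostile IP.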

So, to repeat what I said at the beginning, today an HSv3 service will
_always_ retry up to 3 times regardless of the reason why the circuit
collapsed.

Should that behavior get more refined with the network failure vs circuit
close argument? Should we stop retrying altogether? Should we change the retry
behavior client side to better match the latency of the mobile use case?

Whatever we decide, most importantly, we need to 

Re: [tor-dev] HSv3 descriptor work in stem

2019-10-29 Thread George Kadianakis
George Kadianakis writes:

> Damian Johnson writes:
>
>> Thanks George! Yup, work on that branch is in progress:
>>
>> https://gitweb.torproject.org/user/atagar/stem.git/log/?h=hsv3
>
> Hello Damian,
>
> thanks for the reply here! I'm now back and ready to start working again
> on onionbalance/stem.
>
> What is your plan with the hsv3 branch? Should I start reviewing your
> changes already, or give you more time to do more?
>
> Thanks a lot for all the work! :)

Hello again,

I took a super quick look (particularly at the easy parts of your
changes). Thanks for all the changes!

My only feedback so far is that the python2 port commits have broken
python3 for me (particularly the ed25519 blinding implementation). In
general, the ed25519 blinding implementation is very hairy python3
crypto code, and I think it won't be easy to support both versions.

Would it be egregious to provide hsv3 support only for python3 users so
that we can use python3 features as we wish?

I personally plan to use HSv3 support for onionbalance and that will be
in python3, so I wouldn't mind that at all. Not sure who else is gonna
use hsv3 support in the near future.

Cheers!

PS: From now on perhaps we can use #31823 for code related discussions
(sorry for the medium mixing)
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev