Re: [tor-dev] DirAuth usage and 503 try again later

2021-01-28 Thread Anthony Korte
I have an unrelated question... where could I go to find similar minds so
that I may ask it, or would it be appropriate and acceptable to do that
here? Thanks.

On Mon, Jan 11, 2021, 6:21 PM James  wrote:

> Good day.
>
> Is there any chance that torpy (https://github.com/torpyorg/torpy) could
> have triggered this issue,
> https://gitlab.torproject.org/tpo/core/tor/-/issues/33018 ?
>
> Some worrying facts:
> - Torpy uses the old-style full consensus (not microdescriptors).
> - When the consensus is not present in the cache (first-time use), it
> downloads the consensus from random directory authorities only.
> - Before August 2020 it used plain HTTP requests to the DirAuths. Now
> it creates "CREATE_FAST" circuits to the DirAuths (is that the right
> way, by the way?)
>
> On the other hand:
> - Torpy stores the consensus on disk (so when the client restarts it
> does not need to download the full consensus again).
> - It only tries to download a new consensus after the time set by the
> valid_time field of the consensus, which is more than 1 hour away (so
> not very often).
> - Torpy tries to fetch the consensus via the "diff" feature (to
> minimize traffic).
>
> Still, some of these features may not work well under some conditions,
> which could cause a lot of consensus downloads in Jan 2020... Or maybe
> you know more about this situation?
>
>
>
> Do you have any recommendations for tor client implementations?
> Can you explain in a few paragraphs what the behavior of the original
> tor client is? As far as I understand, when the original tor starts for
> the first time it tries to download the consensus from the fallback
> dirs, not from the DAs? Is this the key point?
>
> There is one more issue,
> https://gitlab.torproject.org/tpo/core/tor/-/issues/40239 ,
> which I don't fully understand. Let's imagine it's the first run of a
> tor client, and that time coincidentally coincides with DA voting. Does
> that mean the client will not be able to download the consensus? That
> would be a strange decision. Or do you mean clients must download the
> consensus from fallback dirs, which are never part of the "voting"
> process?
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] DirAuth usage and 503 try again later

2021-01-20 Thread Sebastian Hahn



> On 18. Jan 2021, at 18:00, Roger Dingledine  wrote:
> While I was looking at moria1's directory activity during the overload,
> I did say to myself "wow that's a lot of microdescriptor downloads".
>
> So hearing that torpy isn't caching microdescriptors yet makes me think
> that it's a good bet for explaining our overload last weekend.

The fact that torpy doesn't use microdescriptors makes me think there's
at least some other party involved here. Hopefully they can also improve
their software, but it makes me wonder what that software is :/

Cheers
Sebastian


Re: [tor-dev] DirAuth usage and 503 try again later

2021-01-18 Thread Roger Dingledine
On Sat, Jan 16, 2021 at 01:56:02AM +0300, James wrote:
> In any case, it seems to me that a high-level description of the
> official tor client's logic would be very useful.

Hi James! Thanks for starting this discussion.

While I was looking at moria1's directory activity during the overload,
I did say to myself "wow that's a lot of microdescriptor downloads".

So hearing that torpy isn't caching microdescriptors yet makes me think
that it's a good bet for explaining our overload last weekend.

I agree that we should have clearer docs for "how to be nice to the Tor
network." We actually have an open ticket for that goal but nobody has
worked on it in a while:
https://gitlab.torproject.org/tpo/core/tor/-/issues/7106

Quoting from that ticket:

"""Second, it's easy to make client-side decisions that harm the Tor
network. For examples, you can hold your TLS connections open too long,
or do too many TLS connections, or make circuits too often, or ask the
directory authorities for everything. We need to write up a spec to
clarify how well-behaving Tor clients should do things. Maybe that means
we write up some principles along the way, or maybe we just identify
every design point that matters and say what to do for each of them."""

And in fact, since Nick has been working a lot on Arti lately:
https://gitlab.torproject.org/tpo/core/arti/
it might be a perfect time for him to help document the current Tor
behavior and the current Arti behavior, and we can think about where
there is room for improvement.

>  If you have
> some sort of statistics about the increased traffic, we can compare that

Here's the most interesting graph so far:
https://metrics.torproject.org/dirbytes.html

So from that graph, the number of bytes handled by the directory
authorities doesn't go up a lot, because they were already rate limited
(instead, they just failed more often).

But the number of bytes handled by directory mirrors (including
fallbackdirs) shot up a huge amount. For context, if we imagine that
the normal Tor network handles between 2M and 8M daily users, then that
added dir mirror load would imply an extra 4M to 16M daily users if they
follow Tor's directory update habits. I'm guessing that the torpy users
weren't following Tor's directory update habits, and so a much smaller
set of users accounted for a much larger fraction of the load.

> >The logic that an already-downloaded network_status document is used
> >rather than trying to download a new one does not work.
> It works, but probably not in an optimal way. It caches network_status only.

Here's my first start at three principles we should all follow when
writing Tor clients:

(1) Reduce redundant interactions. For examples:

- Cache as much as possible of the directory information you fetch
(consensus documents, microdescriptors, certs)

- If a directory fetch failed, don't just relaunch a duplicate request
right after (because it will probably fail too).

- If your setup involves running multiple Tors locally, consider using a
shared directory cache, so only one of them needs to fetch new directory
info and then all of them can use it.
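A shared cache like that can be sketched in a few lines of Python (torpy's
own language). This is an illustrative sketch under assumptions of my own;
the class and method names are made up, not torpy's actual cache:

```python
import os
import tempfile
import time

class SharedDirCache:
    """A minimal shared on-disk cache for directory documents, so that
    several local Tor clients can reuse one fetched copy instead of each
    hitting the network. (Illustrative sketch, not torpy's actual cache.)"""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.cache_dir, name)

    def get(self, name, max_age):
        """Return the cached document if it is fresher than max_age
        seconds, else None (meaning: one client should fetch it)."""
        try:
            path = self._path(name)
            if time.time() - os.path.getmtime(path) > max_age:
                return None
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            return None

    def put(self, name, data):
        # Write to a temp file and rename atomically, so a concurrent
        # reader never sees a half-written document.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, self._path(name))
```

The atomic-rename trick is what makes the cache safe to share between
processes without locking: readers either see the old document or the new
one, never a partial write.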

(2) Reduce impact of interactions. For examples:

- Always use the "If-Modified-Since" header on consensus updates, so
they don't send you a consensus that you already have.

- Try to use the consensus diff system, so if you have an existing
consensus you aren't fetching an entire new consensus.

- Ask for compression, to save overall bandwidth in the network.

- Move load off of directory authorities, and then off of fallback
directories, as soon as possible. That is, if you have a list of
fallbackdirs, ask them instead of directory authorities. And once
you have a consensus and you've chosen your directory guards, ask
them instead of the fallbackdirs.
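As a rough illustration of the first three of these points, here is a
Python sketch of how a client might assemble a polite consensus request.
The helper name and argument structure are hypothetical; the header names
are the ones described in Tor's directory protocol, but check dir-spec
before relying on them:

```python
from email.utils import formatdate

def build_consensus_request(cached_valid_after=None, cached_digests=None):
    """Build the path and headers for a polite consensus fetch.

    cached_valid_after: Unix timestamp of the cached consensus (or None).
    cached_digests: digests of cached consensuses, used to ask for a
    diff instead of a full document.
    (Hypothetical helper -- structure is illustrative, not torpy's.)
    """
    path = "/tor/status-vote/current/consensus"
    headers = {
        # Ask for compression, to save overall bandwidth in the network.
        "Accept-Encoding": "deflate, gzip",
    }
    if cached_valid_after is not None:
        # Don't make the server send a consensus we already have.
        headers["If-Modified-Since"] = formatdate(cached_valid_after,
                                                  usegmt=True)
    if cached_digests:
        # Ask for a consensus diff against a document we already hold.
        headers["X-Or-Diff-From-Consensus"] = ", ".join(cached_digests)
    return path, headers
```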

(3) Plan ahead for what your current code will do in a few years when
the world is different.

- To start here, check out the "slow zombies and fast zombies" discussion
in Proposal 266:
https://gitweb.torproject.org/torspec.git/tree/proposals/266-removing-current-obsolete-clients.txt

- Specifically, think about how your code handles failures, and design
your interactions with the Tor network so that if many people are running
your code in the future, and it's failing for example because it is
asking directory questions in an old format or because the directory
servers have started rate limiting differently, it will back off rather
than become more aggressive.

- When possible, look for ways to recognize when your code is asking old
questions, so it can warn the user and stop interacting with the network.
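One way to make "back off rather than become more aggressive" concrete is
exponential backoff with jitter. A Python sketch of the principle (the
schedule and function name are illustrative, not tor's actual ones):

```python
import random

def next_retry_delay(attempt, base=3.0, ceiling=3600.0):
    """Delay in seconds before retry number `attempt` (0-based).

    Each failure roughly doubles the wait, up to a ceiling, and the
    jitter spreads out a fleet of failing clients so they don't all
    retry in lockstep. (A sketch of the principle, not tor's schedule.)"""
    delay = min(base * (2 ** attempt), ceiling)
    # Jitter: pick uniformly in [delay/2, delay].
    return random.uniform(delay / 2, delay)
```

The ceiling matters for the "in a few years" scenario: even a permanently
broken client converges to a bounded, spread-out request rate instead of
hammering the directories.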

...What else should be on the list?

Thanks!
--Roger



Re: [tor-dev] DirAuth usage and 503 try again later

2021-01-15 Thread Sebastian Hahn
Hi James,

thanks for already working on patches for these issues! I will reply
inline some more.

> On 15. Jan 2021, at 23:56, James  wrote:
> 
> First of all, sorry if torpy hurt the Tor network in some way. It was
> unintentional.

I believe you :)

> In any case, it seems to me that a high-level description of the
> official tor client's logic would be very useful.

Indeed. The more people work on alternative clients etc, the more we can
learn here. Perhaps you can help point out places where documentation
could help or something was not easy to understand.

> >First, I found this string in the code: "Hardcoded into each Tor client
> >is the information about 10 beefy Tor nodes run by trusted volunteers".
> >The word beefy is definitely wrong here. The nodes are not particularly
> >powerful, which is why we have the fallback dir design for
> >bootstrapping.
> At first glance, it seemed that the AuthDirs were the most trusted and
> reliable place for obtaining the consensus. Now I understand better.

The consensus is signed, so all the places to get it from are equally
trusted. That's the beauty of the consensus system :) The dirauths are
just trusted to create it; it doesn't matter who spreads it.

> >Once this
> >happens, torpy goes into a deathly loop of "consensus invalid,
> >trying again". There are no timeouts, backoffs, or failures noted.
> Not really, because torpy only has 3 retries for getting the consensus.
> But you are probably right, because user code can retry calling torpy in
> a loop, which would keep trying to download the network_status... If you
> have some sort of statistics about the increased traffic, we can compare
> that with the times when the consensus was signed by 4 signers, which is
> enough for tor but not enough for torpy.

Interesting, I ran torpy and on the console it seemed to try more
often. Perhaps it made some progress and then failed on a different
thing, which it then retried.

To your second point, something like this can probably be done using
https://metrics.torproject.org. But I am not doing the analysis here
at the moment for personal reasons, sorry. Maybe someone else wants
to look at it.

> >The code frequently throws exceptions, but when an exception occurs
> >it just continues doing what it was doing before. It has absolutely
> >no regard for constraining its resources when using the Tor network.
> What kind of constraints would you advise?

I think instead of throwing an exception and continuing, you should
give clear error messages and consider whether you need to stop
execution. For example, if you downloaded a consensus and it is
invalid, you're likely not going to get a valid one by trying again
immediately. Instead, it would be better to declare who gave you the
invalid one and log a sensible error.

In addition, properly using already downloaded directory information
would be a much more considerate use of resources.

> >The logic that an already-downloaded network_status document is used
> >rather than trying to download a new one does not work.
> It works, but probably not in an optimal way. It caches network_status only.

I may have confused it with asking for the diff. But that should not
be necessary at all if you already have the latest one, so don't ask
for a diff in this case.

> >I have
> >a network_status document, but the dirauths are contacted anyway.
> >Perhaps descriptors are not cached to disk and downloaded on every new
> >start of the application?
> 
> Exactly. Descriptors and the hourly network_status diff were always
> requested from the AuthDirs.

Please cache descriptors.

> >New consensuses never seem to be downloaded from guards, only from
> >dirauths.
> Thanks for pointing that out. I looked more deeply into the tor client
> sources. So basically, if we have a network_status we can ask guard
> nodes for the network_status and descriptors. Otherwise we use fallback
> dirs to download the network_status. I've implemented such logic in the
> last commit.

Cool!

> >- Stop automatically retrying on failure, without backoff
> I've added delays and backoff between retries.
> 
> >- Cache failures to disk to ensure a newly started torpy_cli does not
> >  request the same resources again that the previous instance failed to
> >  get.
> That will be on the list. But even with a retry loop a level above and
> without this feature, with backoff the delays would look like: 3 sec, 5,
> 7, 9; 3, 5, 7, 9. Seems ok?

Well, the problem is that if I run torpy_cli 100 times in parallel, we
will still send many requests per second. From the dirauth access
patterns, we can see that some people indeed use it that way. So I think
the backoff is a great start (the tor client uses exponential backoff, I
think) but it definitely is not enough. If you couldn't get something
this hour and you tried a few times, you need to stop trying again for
this hour.
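The "stop for the rest of the hour" idea could look roughly like this in
Python, with the state persisted to disk so that a freshly started (or
parallel) instance inherits it. Names and structure here are hypothetical,
not torpy's or tor's actual mechanism:

```python
import json
import os
import tempfile
import time

class HourlyAttemptBudget:
    """Persist per-resource failure counts so that a freshly started
    process stops retrying a resource once this hour's budget is spent.
    (Hypothetical sketch.)"""

    def __init__(self, state_file=None, max_attempts_per_hour=3):
        if state_file is None:
            state_file = os.path.join(tempfile.gettempdir(),
                                      "dir_attempts.json")
        self.state_file = state_file
        self.max_attempts = max_attempts_per_hour

    def _load(self):
        try:
            with open(self.state_file) as f:
                return json.load(f)
        except (OSError, ValueError):
            return {}

    def may_try(self, resource, now=None):
        now = time.time() if now is None else now
        hour = int(now // 3600)
        entry = self._load().get(resource)
        # A new hour resets the budget.
        return (entry is None or entry["hour"] != hour
                or entry["attempts"] < self.max_attempts)

    def record_failure(self, resource, now=None):
        now = time.time() if now is None else now
        hour = int(now // 3600)
        state = self._load()
        entry = state.get(resource)
        if entry is None or entry["hour"] != hour:
            entry = {"hour": hour, "attempts": 0}
        entry["attempts"] += 1
        state[resource] = entry
        with open(self.state_file, "w") as f:
            json.dump(state, f)
```

Because the budget lives in a file rather than in memory, 100 parallel
invocations sharing one state file collectively stay within the cap
instead of each bringing a fresh retry allowance.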

> > Defenses are probably necessary to implement even if
> >torpy can be fixed very quickly, because 

[tor-dev] DirAuth usage and 503 try again later

2021-01-15 Thread James

Sebastian,
Thank you for comments.

First of all, sorry if torpy hurt the Tor network in some way. It was
unintentional.


In any case, it seems to me that a high-level description of the
official tor client's logic would be very useful.


>First, I found this string in the code: "Hardcoded into each Tor client
>is the information about 10 beefy Tor nodes run by trusted volunteers".
>The word beefy is definitely wrong here. The nodes are not particularly
>powerful, which is why we have the fallback dir design for
>bootstrapping.
At first glance, it seemed that the AuthDirs were the most trusted and
reliable place for obtaining the consensus. Now I understand better.



>The code counts Serge as a directory authority which signs the
>consensus, and checks that over half of the dirauths signed it. But
>Serge is only the bridge authority and never signs the consensus, so
>torpy will reject some consensuses that are indeed valid.
Yep, you're right here. Thanks for pointing that out.

>Once this
>happens, torpy goes into a deathly loop of "consensus invalid,
>trying again". There are no timeouts, backoffs, or failures noted.
Not really, because torpy only has 3 retries for getting the consensus.
But you are probably right, because user code can retry calling torpy in
a loop, which would keep trying to download the network_status... If you
have some sort of statistics about the increased traffic, we can compare
that with the times when the consensus was signed by 4 signers, which is
enough for tor but not enough for torpy.



>The code frequently throws exceptions, but when an exception occurs
>it just continues doing what it was doing before. It has absolutely
>no regard for constraining its resources when using the Tor network.
What kind of constraints would you advise?

>The logic that an already-downloaded network_status document is used
>rather than trying to download a new one does not work.
It works, but probably not in an optimal way. It caches network_status only.


>I have
>a network_status document, but the dirauths are contacted anyway.
>Perhaps descriptors are not cached to disk and downloaded on every new
>start of the application?

Exactly. Descriptors and the hourly network_status diff were always
requested from the AuthDirs.



>New consensuses never seem to be downloaded from guards, only from
>dirauths.
Thanks for pointing that out. I looked more deeply into the tor client
sources. So basically, if we have a network_status we can ask guard
nodes for the network_status and descriptors. Otherwise we use fallback
dirs to download the network_status. I've implemented such logic in the
last commit.



>There are probably more things suboptimal that I missed here.
If you find more, please let me know. It's really helpful.

>Generally, I think torpy needs to implement the following quickly if it
>wants to stop hurting the network. This is in order of priority, but I
>think _ALL_ (maybe more) are needed before torpy stops being an abuser
>of the network:
>

>- Stop automatically retrying on failure, without backoff
I've added delays and backoff between retries.

>- Cache failures to disk to ensure a newly started torpy_cli does not
>  request the same resources again that the previous instance failed to
>  get.
That will be on the list. But even with a retry loop a level above and
without this feature, with backoff the delays would look like: 3 sec, 5,
7, 9; 3, 5, 7, 9. Seems ok?


>- Fix consensus validation logic to work the same way as tor cli (maybe
>  as easy as removing Serge)
Done. Only auth dirs with the V3_DIRINFO flag are counted now. It
wasn't obvious =(


>- use microdescs/consensus, cache descriptors
On the list.

Moreover, I've switched to using fallback dirs instead of auth dirs,
and to guards when torpy has a "reasonable" live consensus.


> Defenses are probably necessary to implement even if
> torpy can be fixed very quickly, because the older versions of torpy
> are out there and I assume will continue to be used. Hopefully that
> point is wrong?
I believe the old versions don't work any more because they cannot
connect to the auth dirs. Users are getting 503 many times, so they
will update the client. I hope.



Thank you very much. And sorry again.


Re: [tor-dev] DirAuth usage and 503 try again later

2021-01-11 Thread Sebastian Hahn



> On 11. Jan 2021, at 23:20, James  wrote:
> 
> Good day.
> 
> Is there any chance that torpy (https://github.com/torpyorg/torpy) could
> have triggered this issue,
> https://gitlab.torproject.org/tpo/core/tor/-/issues/33018 ?
> 
> Some worrying facts:
> - Torpy uses the old-style full consensus (not microdescriptors).
> - When the consensus is not present in the cache (first-time use), it
> downloads the consensus from random directory authorities only.
> - Before August 2020 it used plain HTTP requests to the DirAuths. Now it
> creates "CREATE_FAST" circuits to the DirAuths (is that the right way,
> by the way?)
> 
> On the other hand:
> - Torpy stores the consensus on disk (so when the client restarts it does
> not need to download the full consensus again).
> - It only tries to download a new consensus after the time set by the
> valid_time field of the consensus, which is more than 1 hour away (so
> not very often).
> - Torpy tries to fetch the consensus via the "diff" feature (to minimize
> traffic).
> 
> Still, some of these features may not work well under some conditions,
> which could cause a lot of consensus downloads in Jan 2020... Or maybe
> you know more about this situation?

Hi there,

thanks for the message. I think it is very likely that torpy is
responsible for at least a part of the increased load we're seeing on
dirauths. I have taken a (very!) quick look at the source, and it appears
that there are some problems. Please excuse any inaccuracies; I am not
that strong in Python, nor have I done much Tor development recently:

First, I found this string in the code: "Hardcoded into each Tor client
is the information about 10 beefy Tor nodes run by trusted volunteers".
The word beefy is definitely wrong here. The nodes are not particularly
powerful, which is why we have the fallback dir design for
bootstrapping.

The code counts Serge as a directory authority which signs the
consensus, and checks that over half of the dirauths signed it. But
Serge is only the bridge authority and never signs the consensus, so
torpy will reject some consensuses that are indeed valid. Once this
happens, torpy goes into a deathly loop of "consensus invalid,
trying again". There are no timeouts, backoffs, or failures noted.
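The signature-counting fix can be illustrated in a few lines of Python.
The authority names below are the nine v3 dirauths as of early 2021 (an
assumption of this sketch); a real client should decide based on the
V3_DIRINFO flag rather than a hardcoded list:

```python
# The nine v3 directory authorities as of early 2021; Serge (the bridge
# authority) is deliberately absent because it never signs the consensus.
V3_AUTHORITIES = {
    "moria1", "tor26", "dizum", "gabelmoo", "dannenberg",
    "maatuska", "Faravahar", "longclaw", "bastet",
}

def consensus_has_enough_signatures(signers):
    """True if more than half of the v3 authorities signed the consensus.

    Counting Serge in the denominator (as torpy did) raises the threshold
    by one and so rejects some consensuses that tor itself accepts."""
    valid = set(signers) & V3_AUTHORITIES
    return len(valid) > len(V3_AUTHORITIES) // 2
```

With nine v3 authorities the threshold is five signatures; with Serge
wrongly included it becomes six, which is exactly how a validly signed
consensus ends up rejected.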

The code frequently throws exceptions, but when an exception occurs
it just continues doing what it was doing before. It has absolutely
no regard for constraining its resources when using the Tor network.

The logic that an already-downloaded network_status document is used
rather than trying to download a new one does not work. I have a
network_status document, but the dirauths are contacted anyway.
Perhaps descriptors are not cached to disk and are downloaded on every
new start of the application?

New consensuses never seem to be downloaded from guards, only from
dirauths.

If my analysis above is at least mostly correct, then if only a few
people run a scraper using torpy and call the binary in a loop, they
will quickly overload the dirauths, causing exactly the trouble we're
seeing. The effects compound, because torpy is relentless in retrying.
In particular, a scraper calling torpy in a loop would just conclude
that a single file failed to download and move on to the next, once
again creating load on all the dirauths.

There are probably more suboptimal things that I missed here.
Generally, I think torpy needs to implement the following quickly if it
wants to stop hurting the network. This is in order of priority, but I
think _ALL_ (maybe more) are needed before torpy stops being an abuser
of the network:

- Stop automatically retrying on failure, without backoff
- Cache failures to disk to ensure a newly started torpy_cli does not
  request the same resources again that the previous instance failed to
  get.
- Fix consensus validation logic to work the same way as tor cli (maybe
  as easy as removing Serge)
- Use the microdesc consensus, and cache descriptors

I wonder if we can actively defend against network abuse like this in
a sensible way. Perhaps you have some ideas, too? I think torpy has the
ability to also quickly overwhelm fallback dirs in its current
implementation, so simply switching to them from dirauths is not a
solution here. Defenses are probably necessary to implement even if
torpy can be fixed very quickly, because the older versions of torpy are
out there and I assume will continue to be used. Hopefully that point
is wrong?

Thanks
Sebastian



[tor-dev] DirAuth usage and 503 try again later

2021-01-11 Thread James

Good day.

Is there any chance that torpy (https://github.com/torpyorg/torpy) could
have triggered this issue,
https://gitlab.torproject.org/tpo/core/tor/-/issues/33018 ?


Some worrying facts:
- Torpy uses the old-style full consensus (not microdescriptors).
- When the consensus is not present in the cache (first-time use), it
downloads the consensus from random directory authorities only.
- Before August 2020 it used plain HTTP requests to the DirAuths. Now
it creates "CREATE_FAST" circuits to the DirAuths (is that the right
way, by the way?)


On the other hand:
- Torpy stores the consensus on disk (so when the client restarts it
does not need to download the full consensus again).
- It only tries to download a new consensus after the time set by the
valid_time field of the consensus, which is more than 1 hour away (so
not very often).
- Torpy tries to fetch the consensus via the "diff" feature (to
minimize traffic).

Still, some of these features may not work well under some conditions,
which could cause a lot of consensus downloads in Jan 2020... Or maybe
you know more about this situation?




Do you have any recommendations for tor client implementations?
Can you explain in a few paragraphs what the behavior of the original
tor client is? As far as I understand, when the original tor starts for
the first time it tries to download the consensus from the fallback
dirs, not from the DAs? Is this the key point?


There is one more issue,
https://gitlab.torproject.org/tpo/core/tor/-/issues/40239 ,
which I don't fully understand. Let's imagine it's the first run of a
tor client, and that time coincidentally coincides with DA voting. Does
that mean the client will not be able to download the consensus? That
would be a strange decision. Or do you mean clients must download the
consensus from fallback dirs, which are never part of the "voting"
process?

