Re: [tor-relays] Overloaded state indicator on relay-search

2021-09-28 Thread Silvia/Hiro


On 9/28/21 8:40 PM, Toralf Förster wrote:
> On 9/23/21 3:39 PM, Silvia/Hiro wrote:
>> Let us know how you find this new feature.
> It would be nice if even the search form would have that feature too.
> Currently here all is green:
> https://metrics.torproject.org/rs.html#search/zwiebeltoralf
> whereas the details page of each of the two relays shows the overload indicator.
> 

Yes, good catch. I have just deployed a few minor fixes, among which the
overloaded indicator in the search form. I had planned to announce it
tomorrow together with a few updates to the support article following
the email threads on the list, but since you mentioned it, I thought you
should know already :))

Cheers,
-hiro


> -- 
> Toralf
___
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays


Re: [tor-relays] Overloaded state indicator on relay-search

2021-09-28 Thread Bleedangel Tor Admin

I am having a lot of trouble figuring out why my relay keeps showing as
overloaded on the search page. I believe I have more than enough memory and
CPU power for the hardware not to be overloaded. My server's internet
connection is 10 Gb up/down, unmetered.

I cannot for the life of me figure out why the relay search page continuously
tells me I am overloaded.
Can someone assist me in troubleshooting this?

Thank you. Pertinent hardware information is pasted below:

output of /proc/meminfo:

--[ BEGIN PASTE ]--

MemTotal:   65777296 kB
MemFree:    63088388 kB
MemAvailable:   63415088 kB
Buffers:  180096 kB
Cached:   736396 kB
SwapCached:    0 kB
Active:   449428 kB
Inactive:    1729304 kB
Active(anon):  14552 kB
Inactive(anon):  1292048 kB
Active(file): 434876 kB
Inactive(file):   437256 kB
Unevictable:   0 kB
Mlocked:   0 kB
SwapTotal:  33520636 kB
SwapFree:   33520636 kB
Dirty:   236 kB
Writeback: 0 kB
AnonPages:   1262300 kB
Mapped:   273164 kB
Shmem: 48592 kB
KReclaimable:  94828 kB
Slab: 204624 kB
SReclaimable:  94828 kB
SUnreclaim:   109796 kB
KernelStack:    5040 kB
PageTables: 9308 kB
NFS_Unstable:  0 kB
Bounce:    0 kB
WritebackTmp:  0 kB
CommitLimit:    66409284 kB
Committed_AS:    2374432 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   34348 kB
VmallocChunk:  0 kB
Percpu:    16384 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages:    0 kB
ShmemPmdMapped:    0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
CmaTotal:  0 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:    0
HugePages_Rsvd:    0
HugePages_Surp:    0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k:  332988 kB
DirectMap2M: 7981056 kB
DirectMap1G:    58720256 kB
--[ END PASTE ]--

output of /proc/cpuinfo:

--[ BEGIN PASTE ]--
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 23
model   : 113
model name  : AMD Ryzen 5 3600 6-Core Processor
stepping    : 0
microcode   : 0x8701021
cpu MHz : 2200.000
cache size  : 512 KB
physical id : 0
siblings    : 12
core id : 0
cpu cores   : 6
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 16
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf 
rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave 
avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext 
perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall 
fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni 
xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total 
cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock 
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter 
pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov 
succor smca sme sev sev_es
bugs    : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 7202.22
TLB size    : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 23
model   : 113
model name  : AMD Ryzen 5 3600 6-Core Processor
stepping    : 0
microcode   : 0x8701021
cpu MHz : 2200.000
cache size  : 512 KB
physical id : 0
siblings    : 12
core id : 1
cpu cores   : 6
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 16
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf 
rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave 
avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext 
perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall 
fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni 
xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total 
cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd 

Re: [tor-relays] Overloaded state indicator on relay-search

2021-09-28 Thread Toralf Förster

On 9/23/21 3:39 PM, Silvia/Hiro wrote:

Let us know how you find this new feature.

It would be nice if the search form had that feature too.
Currently everything here is green:
https://metrics.torproject.org/rs.html#search/zwiebeltoralf
whereas the details page of each of the two relays shows the overload indicator.

--
Toralf
___
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays


Re: [tor-relays] Overloaded state indicator on relay-search

2021-09-28 Thread Gary C. New via tor-relays
David,

This is exactly the type of information I was hoping for. You should make this
an article and link it to the overloaded support page.

I guess I assumed that Tor performed external timeout monitoring as opposed to
relay-reported resource monitoring.

It's interesting that you mention load balancing Tor, as that is precisely what
my recent efforts have been geared toward.

I'm fairly confident that my last overloaded state was due to migrating one of
my Tor relay nodes onto a previously provisioned BotFarm node and forgetting to
kill the existing bot processes, leaving them competing for resources. I can
confirm that when load balancing Tor relay nodes, the whole is only as good as
the weakest link; it's important to have identical Tor relay nodes in order to
evenly distribute circuits and maintain consensus.

In this paradigm, I was hoping to be able to define a timeout value associated
with the overloaded state and tune the load balancer to redistribute to
different upstream nodes should a Tor relay node reach that value. However, it
seems this is a moot point after reading your summary of the reporting process.

At present, I have the upstream load-balancing timeout values disabled and let
the Tor nodes build or tear down circuits based on the available resources per
node. I do see spikes alternate through various nodes throughout the day. It
would be nice to find an upstream timeout value to better manage those spikes.
Any recommendations would be greatly appreciated.
Respectfully,

Gary
P.S. This is all being done on ASUSWRT-Merlin using AiMesh nodes, but it isn't
limited to that architecture. I hope to publish a tutorial after ironing out
all the kinks.

Re: [tor-relays] Overloaded state indicator on relay-search

2021-09-28 Thread David Goulet
On 27 Sep (14:23:34), Gary C. New via tor-relays wrote:

>  George,
> The referenced support article provides recommendations as to what might be
> causing the overloaded state, but it doesn't provide the metric(s) for how
> Tor decides whether a relay is overloaded. I'm trying to ascertain the
> latter.  I would assume the overloaded state metric(s) is/are a maximum
> timeout value and/or recurrence value, etc.  By knowing what the
> overloaded state metric is, I can tune my Tor relay just short of it.  Thank
> you for your reply.  Respectfully,

Hi Gary!

I'll try to answer as best I can from what we've worked on for these
overload metrics.

Essentially, there are a few places within a Tor relay where we can easily
notice an "overloaded" state. I'll list them and tell you how we decide:

1. Out-Of-Memory invocation

  Tor has its own OOM handler and it is invoked when 75% of the total memory
  tor thinks it can use is reached. Thus, let's say tor thinks it can use 2GB
  in total, then at 1.5GB of memory usage it will start freeing memory. That
  is considered an overload state.

  Now the real question here is what memory "tor thinks" it has.
  Unfortunately, it is not the greatest estimation, but it is what it is.
  When tor starts, it will use MaxMemInQueues for that value, or it will look
  at the total RAM available on the system and apply this algorithm:

    if RAM >= 8GB {
      memory = RAM * 40%
    } else {
      memory = RAM * 75%
    }
    /* Capped. */
    memory = min(memory, 8GB)  /* 8GB on 64-bit, 2GB on 32-bit */
    /* Minimum value. */
    memory = max(250MB, memory)

  Why we picked those numbers, I can't tell you; they come from the very
  early days of the tor software.

  And so to avoid such an overload state, running a relay with more than 2GB
  of RAM on 64-bit should clearly be the bare minimum in my opinion. 4GB would
  be much, much better. In DDoS circumstances, there is a whole lot of memory
  pressure.

  A keen observer will notice that this approach also has the problem that it
  doesn't shield tor from the OS OOM killer itself. The reason is that because
  we take the total memory on the system when tor starts, if the overall
  system has many other applications running and using RAM, we end up eating
  too much memory and the OS could OOM-kill tor without tor even noticing
  memory pressure. Fortunately, this is not a problem affecting the overload
  status.
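
  To make the numbers concrete, here is a small Python sketch of the default
  calculation and the 75% trigger described above; the function names and the
  4GB example are mine, only the percentages and caps come from the
  description:

    GiB = 1024 ** 3
    MiB = 1024 ** 2

    def default_mem_in_queues(total_ram_bytes, is_64bit=True):
        """Approximate tor's default MaxMemInQueues, per the algorithm above."""
        if total_ram_bytes >= 8 * GiB:
            memory = total_ram_bytes * 0.40
        else:
            memory = total_ram_bytes * 0.75
        cap = 8 * GiB if is_64bit else 2 * GiB   # platform-dependent cap
        memory = min(memory, cap)
        memory = max(250 * MiB, memory)          # never below 250 MB
        return int(memory)

    def oom_threshold(mem_in_queues):
        """Tor's own OOM handling kicks in at 75% of that figure."""
        return int(mem_in_queues * 0.75)

    if __name__ == "__main__":
        ram = 4 * GiB   # hypothetical 4 GB relay
        limit = default_mem_in_queues(ram)
        print(f"limit {limit / GiB:.2f} GiB, OOM handling starts near "
              f"{oom_threshold(limit) / GiB:.2f} GiB")

  For a 4GB machine that works out to roughly 3GB "usable", with OOM handling
  starting around 2.25GB.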

2. Onionskins processing

  Tor is sadly single threaded _except_ for when the "onion skins" are
  processed, that is, the cryptographic work that needs to be done on the
  famous "onion layers" of every circuit.

  For that we have a thread pool and outsource all of that work to it. It can
  happen that this pool starts dropping work due to back pressure, and that in
  turn is an overload state.

  Why can this happen? Essentially, CPU pressure. If your server is running
  at capacity and it is not running only your tor, then this is likely to
  trigger.
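
  As a rough illustration (not something tor does itself), you can compare the
  host's load average against its core count to spot this kind of CPU pressure
  on a Linux/Unix relay:

    import os

    def cpu_pressure():
        """Print 1/5/15-minute load averages relative to the CPU core count.

        A sustained load well above the core count is the sort of pressure
        that can make the onionskin thread pool start dropping work.
        """
        cores = os.cpu_count() or 1
        for label, load in zip(("1m", "5m", "15m"), os.getloadavg()):
            status = "saturated" if load > cores else "ok"
            print(f"load {label}: {load:.2f} / {cores} cores -> {status}")

    if __name__ == "__main__":
        cpu_pressure()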

3. DNS Timeout

  This applies only to Exits. If tor starts noticing DNS timeouts, you'll get
  the overload flag. This might not be because your relay is overloaded in
  terms of resources, but it signals a problem on the network.

  And DNS timeouts at the Exits are a _huge_ UX problem for tor users, so Exit
  operators really need to be on top of them. It is not obvious from the
  overload line alone, but at least if an operator notices the overload line,
  they can then investigate DNS timeouts in case there is no resource
  pressure.
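
  If you want a quick, rough check from the exit host itself, timing a few
  lookups through the system resolver can hint at trouble; the test names and
  the 5 second threshold below are arbitrary choices, not anything tor uses:

    import socket
    import time

    TEST_NAMES = ["torproject.org", "example.com", "wikipedia.org"]  # arbitrary
    SLOW_SECONDS = 5.0  # arbitrary threshold for flagging a lookup as slow

    def check_dns():
        for name in TEST_NAMES:
            start = time.monotonic()
            try:
                socket.getaddrinfo(name, None)  # uses the system resolver
                elapsed = time.monotonic() - start
                flag = "SLOW" if elapsed > SLOW_SECONDS else "ok"
                print(f"{name}: {elapsed * 1000:.0f} ms ({flag})")
            except socket.gaierror as err:
                print(f"{name}: resolution failed ({err})")

    if __name__ == "__main__":
        check_dns()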

4. TCP port exhaustion

  This should be extremely rare though. The idea behind this one is that you
  ran out of TCP ports, which on Linux is usually the range 32768-60999, so
  having that many connections would lead to the overload state.

  However, I think (I might be wrong though) that nowadays this range is per
  source IP and not process wide, so it would likely have to be deliberate on
  someone's part to put your relay in that state.
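
  For the curious, here is a rough way to eyeball this on a Linux host by
  reading the ephemeral port range from procfs and counting current TCP
  sockets; it is only an indicator, not how tor itself measures exhaustion:

    from pathlib import Path

    def ephemeral_port_range():
        """Return (low, high) of the local ephemeral port range on Linux."""
        low, high = Path("/proc/sys/net/ipv4/ip_local_port_range").read_text().split()
        return int(low), int(high)

    def count_tcp_sockets():
        """Count entries in /proc/net/tcp and /proc/net/tcp6, minus headers."""
        total = 0
        for name in ("/proc/net/tcp", "/proc/net/tcp6"):
            path = Path(name)
            if path.exists():
                total += max(len(path.read_text().splitlines()) - 1, 0)
        return total

    if __name__ == "__main__":
        low, high = ephemeral_port_range()
        print(f"ephemeral port range: {low}-{high} ({high - low + 1} ports)")
        print(f"current TCP sockets:  {count_tcp_sockets()}")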


There are two other overload lines that tor relays report:
"overload-ratelimits" and "overload-fd-exhausted", but they are not used yet
for the overload status on Metrics. You can find them in your relay
descriptor[0] if you are curious.

They are about when your relay reaches its global connection limit too often
and when your relay runs out of file descriptors.
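
As an aside, one rough way to keep an eye on file descriptor pressure on a
Linux host is to compare the tor process's open descriptors against its soft
limit via procfs; the pidof lookup below assumes a single tor process and may
need adjusting for your setup:

    import subprocess
    from pathlib import Path

    def fd_usage(pid):
        """Return (open_fds, soft_limit) for a process, read from procfs."""
        open_fds = len(list(Path(f"/proc/{pid}/fd").iterdir()))
        soft_limit = None
        for line in Path(f"/proc/{pid}/limits").read_text().splitlines():
            if line.startswith("Max open files"):
                soft = line.split()[3]  # fourth field is the soft limit
                soft_limit = None if soft == "unlimited" else int(soft)
                break
        return open_fds, soft_limit

    if __name__ == "__main__":
        pid = subprocess.check_output(["pidof", "tor"]).split()[0].decode()
        used, limit = fd_usage(pid)
        print(f"tor (pid {pid}): {used} open fds, soft limit {limit}")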

Hope this helps, but overall, as you can see, a lot of factors can influence
these metrics, so the ideal, ideal, ideal situation for a tor relay is that it
runs alone on a fairly good machine. Any kind of pullback from a tor relay,
like being overloaded, has cascading effects on the network, both in terms of
UX and in terms of load balancing, which tor is not yet very good at (but we
are working hard on making it much better!!).

Cheers!
David

[0] https://collector.torproject.org/recent/relay-descriptors/server-descriptors/?C=M;O=D
