commit 5a44410af282b3193298518783e492e1a3cdae4b
Author: juga0 <j...@riseup.net>
Date:   Wed May 20 13:21:23 2020 +0000
    fix: doc: Update/clarify Torflow aggregation
---
 docs/source/torflow_aggr.rst | 351 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 297 insertions(+), 54 deletions(-)

diff --git a/docs/source/torflow_aggr.rst b/docs/source/torflow_aggr.rst
index 49441b4..10b12a8 100644
--- a/docs/source/torflow_aggr.rst
+++ b/docs/source/torflow_aggr.rst
@@ -3,27 +3,34 @@
 Torflow measurements aggregation
 ==================================

-From Torflow's README.spec.txt (section 2.2)::
+The goal of Torflow's aggregation, or scaling, is:
+
+From Torflow's `README.spec.txt`_ (section 2.2)::

     In this way, the resulting network status consensus bandwidth values
     are effectively re-weighted proportional to how much faster the node
     was as compared to the rest of the network.

-The variables and steps used in Torflow:
+With and without PID control
+----------------------------
+
+Per relay measurements' bandwidth
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-**strm_bw**::
+The per relay measurements' bandwidths are calculated in the same way whether
+or not `PID controller`_ feedback is used.
+
+From Torflow's `README.spec.txt`_ (section 1.6)::

     The strm_bw field is the average (mean) of all the streams for the relay
     identified by the fingerprint field.
-    strm_bw = sum(bw stream x)/|n stream|
-
-**filt_bw**::

     The filt_bw field is computed similarly, but only the streams equal to
     or greater than the strm_bw are counted in order to filter very slow
     streams due to slow node pairings.

-**filt_sbw and strm_sbw**::
+In the code (`SQLSupport.py`_), ``strm_bw`` is ``sbw`` and
+``filt_bw`` is ``filt_sbw``::

     for rs in RouterStats.query.filter(stats_clause).\
          options(eagerload_all('router.streams.circuit.routers')).all():
@@ -46,36 +53,73 @@ The variables and steps used in Torflow:
      if sbw_cnt: rs.filt_sbw = tot_sbw/sbw_cnt
      else: rs.filt_sbw = None

-**filt_avg, and strm_avg**::
+This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
+step 1.
+
+Denoting ``strm_bw`` as ``bw_i`` and ``filt_bw`` as ``bwfilt_i``,
+if ``bw_j`` is a measurement for a relay ``i`` and ``m`` is the number of
+measurements for that relay, then:
+
+.. math::
+
+   bw_i = \mu(bw_j) = \frac{\sum_{j=1}^{m}bw_j}{m}
+
+.. math::
+
+   bwfilt_i &= \mu(max(\mu(bw_j), bw_j)) \\
+   &= \frac{\sum_{j=1}^{m} max\left(\frac{\sum_{j=1}^{m}bw_j}{m}, bw_j\right)}{m}
+
+Network measurements' bandwidth average
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+From `README.spec.txt`_ (section 2.1)::

     Once we have determined the most recent measurements for each node, we
     compute an average of the filt_bw fields over all nodes we have measured.

-::
+In Torflow's `aggregate.py`_ code::

    filt_avg = sum(map(lambda n: n.filt_bw, nodes.itervalues()))/float(len(nodes))
    strm_avg = sum(map(lambda n: n.strm_bw, nodes.itervalues()))/float(len(nodes))

-**true_filt_avg and true_strm_avg**::
+In the code, both with and without PID control, all types of nodes get the
+same average.

-    for cl in ["Guard+Exit", "Guard", "Exit", "Middle"]:
-      true_filt_avg[cl] = filt_avg
-      true_strm_avg[cl] = strm_avg
+This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
+step 2.

-In the non-pid case, all types of nodes get the same avg
+Denoting ``strm_avg`` as ``bwstrm`` and ``filt_avg`` as ``bwfilt``, if ``n`` is
+the number of relays in the network, then:

-**n.fbw_ratio and n.fsw_ratio**::
+.. math::
+
+   bwstrm &= \mu(bw_i) \\
+   &= \frac{\sum_{i=1}^{n}bw_i}{n} \\
+   &= \frac{\sum_{i=1}^{n} \frac{\sum_{j=1}^{m}bw_j}{m} }{n}
+
+.. math::
+
+   bwfilt &= \mu(bwfilt_i) \\
+   &= \frac{\sum_{i=1}^{n}bwfilt_i}{n} \\
+   &= \frac{\sum_{i=1}^{n}\frac{\sum_{j=1}^{m}max(\mu(bw_j), bw_j)}{m}}{n} \\
+   &= \frac{\sum_{i=1}^{n}\frac{\sum_{j=1}^{m}max\left(\frac{\sum_{j=1}^{m}bw_j}{m}, bw_j\right)}{m}}{n}
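+
+For illustration only, a minimal Python sketch of steps 1 and 2 above; the
+relay names, values and variable names are made up for the example, they are
+not Torflow's data structures::
+
+    # Hypothetical stream measurements per relay, in bytes/second; Torflow
+    # reads these from its SQL storage (SQLSupport.py).
+    measurements = {
+        "relay_A": [5000, 6000, 7000],
+        "relay_B": [1000, 1500, 9500],
+    }
+
+    def mean(xs):
+        return sum(xs) / len(xs)
+
+    # Step 1: per relay, bw_i is the mean of its streams and bwfilt_i the
+    # mean of only the streams at or above that mean (section 1.6 filtering).
+    bw = {fp: mean(bws) for fp, bws in measurements.items()}
+    bwfilt = {fp: mean([b for b in bws if b >= bw[fp]])
+              for fp, bws in measurements.items()}
+
+    # Step 2: network-wide averages over all measured relays.
+    bwstrm_avg = mean(list(bw.values()))      # bwstrm, strm_avg in Torflow
+    bwfilt_avg = mean(list(bwfilt.values()))  # bwfilt, filt_avg in Torflow
+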
-    for n in nodes.itervalues():
-      n.fbw_ratio = n.filt_bw/true_filt_avg[n.node_class()]
-      n.sbw_ratio = n.strm_bw/true_strm_avg[n.node_class()]
-**n.ratio**::
+Per relay bandwidth ratio
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+From `README.spec.txt`_ (section 2.2)::

     These averages are used to produce ratios for each node by dividing the
     measured value for that node by the network average.

-::
+In Torflow's `aggregate.py`_ code::
+
+    for n in nodes.itervalues():
+      n.fbw_ratio = n.filt_bw/true_filt_avg[n.node_class()]
+      n.sbw_ratio = n.strm_bw/true_strm_avg[n.node_class()]
+
+    [snip]

     # Choose the larger between sbw and fbw
     if n.sbw_ratio > n.fbw_ratio:
@@ -83,68 +127,267 @@ In the non-pid case, all types of nodes get the same avg
     else:
       n.ratio = n.fbw_ratio

-**desc_bw**:
+This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
+steps 2 and 3.

-It is the minimum of all the descriptor bandwidth values::
+Denoting ``fbw_ratio`` as ``rf_i``, ``sbw_ratio`` as ``rs_i`` and ``ratio``
+as ``r_i``:
+
+.. math::
+
+   rf_i = \frac{bwfilt_i}{bwfilt}
+
+   rs_i = \frac{bw_i}{bwstrm}
+
+.. math::
+
+   r_i = max(rf_i, rs_i)
+   = max\left(\frac{bwfilt_i}{bwfilt}, \frac{bw_i}{bwstrm}\right)
+   = max\left(\frac{bwfilt_i}{\mu(bwfilt_i)}, \frac{bw_i}{\mu(bw_i)}\right)
+
+Per relay descriptor bandwidth
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+From `TorCtl.py`_ code, the descriptor bandwidth is the minimum of all the
+descriptor bandwidth values::

     bws = map(int, g)
     bw_observed = min(bws)
+
+    [snip]
+
+    return Router(ns.idhex, ns.nickname, bw_observed, dead, exitpolicy,
+                  ns.flags, ip, version, os, uptime, published, contact,
+                  rate_limited, ns.orhash, ns.bandwidth, extra_info_digest,
+                  ns.unmeasured)
+
+Because of the matched regular expression, ``bws`` is **not** all the
+descriptor bandwidth values, but only the observed and the burst bandwidth;
+i.e., it does not take the average bandwidth into account, which seems to be a
+bug in Torflow.
+
+This is passed to ``Router``, in which ``bw`` is set to the consensus bandwidth
+when there is one, and to the descriptor bandwidth otherwise::
+
+    (idhex, name, bw, down, exitpolicy, flags, ip, version, os, uptime,
+     published, contact, rate_limited, orhash,
+     ns_bandwidth,extra_info_digest,unmeasured) = args
+
+    [snip]
+
+    if ns_bandwidth != None:
+      self.bw = max(ns_bandwidth,1) # Avoid div by 0
+    else:
+      self.bw = max(bw,1) # Avoid div by 0
+
+    [snip]
+
+    self.desc_bw = max(bw,1) # Avoid div by 0

-**new_bw**::
+They are then written by `SQLSupport.py`_ as the descriptor and consensus
+bandwidth::
+
+    f.write(" desc_bw="+str(int(cvt(s.avg_desc_bw,0))))
+    f.write(" ns_bw="+str(int(cvt(s.avg_bw,0)))+"\n")
+
+Without PID control
+-------------------
+
+Per relay scaled bandwidth
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+From `README.spec.txt`_ (section 2.2)::

     These ratios are then multiplied by the most recent observed descriptor
     bandwidth we have available for each node, to produce a new value for
     the network status consensus process.

-::
+In `aggregate.py`_ code::

     n.new_bw = n.desc_bw*n.ratio

-The descriptor observed bandwidth is multiplied by the ratio.
+This is also expressed in pseudocode in the `bandwidth file spec`_, section B.4
+step 5.
+
+Denoting ``new_bw`` as ``bwnew_i`` and ``use_bw`` as ``descbw_i``:
+
+.. math::
+
+   descbw_i = min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right)
+
+   bwnew_i &= descbw_i \times r_i \\
+   &= min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times max(rf_i, rs_i) \\
+   &= min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times max\left(\frac{bwfilt_i}{bwfilt}, \frac{bw_i}{bwstrm}\right) \\
+   &= min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times max\left(\frac{bwfilt_i}{\mu(bwfilt_i)}, \frac{bw_i}{\mu(bw_i)}\right)
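+
+Again for illustration, a minimal sketch of the ratio and the scaling steps
+above, reusing the hypothetical names of the previous sketch (they are not
+Torflow's own structures)::
+
+    # Hypothetical descriptor bandwidths (the minimum described above).
+    desc_bw = {"relay_A": 8000, "relay_B": 5000}
+
+    new_bw = {}
+    for fp in desc_bw:
+        # Ratios of this relay's bandwidths to the network averages.
+        fbw_ratio = bwfilt[fp] / bwfilt_avg
+        sbw_ratio = bw[fp] / bwstrm_avg
+        # Torflow keeps the larger of the two ratios ...
+        ratio = max(fbw_ratio, sbw_ratio)
+        # ... and, without PID control, multiplies the descriptor bandwidth
+        # by it to obtain the scaled bandwidth.
+        new_bw[fp] = desc_bw[fp] * ratio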
+
+
+With PID control
+----------------
+
+Per relay descriptor bandwidth
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Even though `README.spec.txt`_ talks about the consensus bandwidth, in
+`aggregate.py`_ code the consensus bandwidth is never used, since
+``use_desc_bw`` is initialized to ``True`` and never changed::
+
+    self.use_desc_bw = True
+
+    [snip]
+
+    if cs_junk.bwauth_pid_control:
+      if cs_junk.use_desc_bw:
+        n.use_bw = n.desc_bw
+      else:
+        n.use_bw = n.ns_bw
+
+Per relay scaled bandwidth
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+From `README.spec.txt`_ section 3.1::
+
+    The bandwidth authorities measure F_node: the filtered stream
+    capacity through a given node (filtering is described in Section 1.6).
+
+    [snip]
+
+    pid_error = e(t) = (F_node - F_avg)/F_avg.
+
+    [snip]
+
+    new_consensus_bw = old_consensus_bw +
+                       old_consensus_bw * K_p * e(t) +
+                       old_consensus_bw * K_i * \integral{e(t)} +
+                       old_consensus_bw * K_d * \derivative{e(t)}
+
+    [snip]
+
+    For the case where K_p = 1, K_i=0, and K_d=0, it can be seen that this
+    system is equivalent to the one defined in 2.2, except using consensus
+    bandwidth instead of descriptor bandwidth:
+
+    new_bw = old_bw + old_bw*e(t)
+    new_bw = old_bw + old_bw*(F_node/F_avg - 1)
+    new_bw = old_bw*F_node/F_avg
+    new_bw = old_bw*ratio
+
+In Torflow's code this is actually the case, and most of the PID code is not
+executed because of the default ``K`` values.
+
+It seems then that ``F_node`` is ``filt_bw`` in Torflow's code or ``bwfilt_i``
+here, and ``F_avg`` is ``filt_avg`` in Torflow's code or ``bwfilt`` here.
+
+In `aggregate.py`_ code, the PID error also depends on which of the ratios is
+greater::
+
+    if cs_junk.use_best_ratio and n.sbw_ratio > n.fbw_ratio:
+      n.pid_error = (n.strm_bw - true_strm_avg[n.node_class()]) \
+                     / true_strm_avg[n.node_class()]
+    else:
+      n.pid_error = (n.filt_bw - true_filt_avg[n.node_class()]) \
+                     / true_filt_avg[n.node_class()]
+
+    [snip]
+
+    n.new_bw = n.use_bw + cs_junk.K_p*n.use_bw*n.pid_error
+
+Denoting ``pid_error`` as ``e_i``, in the case that ``rs_i`` > ``rf_i``:
+
+.. math::
+
+   e_i = \frac{bw_i - bwstrm}{bwstrm} = \frac{bw_i}{bwstrm} - 1
+
+   bwn_i = descbw_i + descbw_i \times e_i = descbw_i \times (1 + e_i)
+   = descbw_i \times (1 + \frac{bw_i}{bwstrm} - 1)
+   = descbw_i \times \frac{bw_i}{bwstrm} = descbw_i \times rs_i
+
+And in the case that ``rs_i`` < ``rf_i``:
+
+.. math::
+
+   e_i = \frac{bwfilt_i - bwfilt}{bwfilt} = \frac{bwfilt_i}{bwfilt} - 1
+
+   bwn_i = descbw_i + descbw_i \times e_i = descbw_i \times (1 + e_i)
+   = descbw_i \times (1 + \frac{bwfilt_i}{bwfilt} - 1)
+   = descbw_i \times \frac{bwfilt_i}{bwfilt} = descbw_i \times rf_i
+
+So it is the same as the scaled bandwidth in the case without PID control,
+i.e.:
+
+.. math::
+
+   bwn_i = descbw_i \times max(rf_i, rs_i)
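+
+To illustrate why the default gains make both cases equal, a sketch continuing
+the hypothetical names of the previous sketches; it only shows the
+equivalence, it is not Torflow's code::
+
+    K_p = 1  # default gains; K_i and K_d are 0, so those terms vanish
+
+    for fp in desc_bw:
+        use_bw = desc_bw[fp]  # use_desc_bw is always True, ns_bw is not used
+        # The error is taken from the stream or the filtered bandwidth,
+        # whichever gives the larger ratio.
+        if bw[fp] / bwstrm_avg > bwfilt[fp] / bwfilt_avg:
+            pid_error = (bw[fp] - bwstrm_avg) / bwstrm_avg
+        else:
+            pid_error = (bwfilt[fp] - bwfilt_avg) / bwfilt_avg
+        # use_bw * (1 + e_i) is exactly use_bw * max(rf_i, rs_i).
+        new_bw[fp] = use_bw + K_p * use_bw * pid_error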
+
+With and without PID control
+----------------------------
+
+Per relay scaled bandwidth limit
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once each relay bandwidth is scaled, it is limited to a maximum, calculated as
+the sum of the scaled bandwidths of all the relays in the current consensus,
+times 0.05.
+
+From `aggregate.py`_ code::

     NODE_CAP = 0.05

-::
+    [snip]
+
+    if n.idhex in prev_consensus:
+      if prev_consensus[n.idhex].bandwidth != None:
+        prev_consensus[n.idhex].measured = True
+        tot_net_bw += n.new_bw
+
+    [snip]

     if n.new_bw > tot_net_bw*NODE_CAP:
-      plog("INFO", "Clipping extremely fast "+n.node_class()+" node "+n.idhex+"="+n.nick+
-           " at "+str(100*NODE_CAP)+"% of network capacity ("+
-           str(n.new_bw)+"->"+str(int(tot_net_bw*NODE_CAP))+") "+
-           " pid_error="+str(n.pid_error)+
-           " pid_error_sum="+str(n.pid_error_sum))
+      [snip]
       n.new_bw = int(tot_net_bw*NODE_CAP)

-However, tot_net_bw does not seems to be updated when not using pid.
-This clipping would make faster relays to all have the same value.
-All of that can be expressed as:
+.. math::
+
+   bwn_i &= min\left(bwnew_i,
+   \sum_{i=1}^{n}bwnew_i \times 0.05\right) \\
+   &= min\left(
+   \left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times r_i\right),
+   \sum_{i=1}^{n}\left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times r_i\right)
+   \times 0.05\right) \\
+   &= min\left(
+   \left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times max\left(rf_i, rs_i\right)\right),
+   \sum_{i=1}^{n}\left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times
+   max\left(rf_i, rs_i\right)\right) \times 0.05\right) \\
+   &= min\left(
+   \left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times max\left(\frac{bwfilt_i}{bwfilt},
+   \frac{bw_i}{bwstrm}\right)\right),
+   \sum_{i=1}^{n}\left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times
+   max\left(\frac{bwfilt_i}{bwfilt},
+   \frac{bw_i}{bwstrm}\right)\right) \times 0.05\right)

 .. math::

-    bwn_i =& min\\left(bwnew_i,
-    \\sum_{i=1}^{n}bwnew_i \\times 0.05\\right) \\
-
-    &= min\\left(
-    \\left(min\\left(bwobs_i, bwavg_i, bwbur_i \\right) \\times r_i\\right),
-    \\sum_{i=1}^{n}\\left(min\\left(bwobs_i, bwavg_i, bwbur_i \\right) \\times r_i\\right)
-    \\times 0.05\\right)\\
-
-    &= min\\left(
-    \\left(min\\left(bwobs_i, bwavg_i, bwbur_i \\right) \\times max\\left(rf_i, rs_i\\right)\\right),
-    \\sum_{i=1}^{n}\\left(min\\left(bwobs_i, bwavg_i, bwbur_i \\right) \\times
-    max\\left(rf_i, rs_i\\right)\\right) \\times 0.05\\right)\\
-
-    &= min\\left(
-    \\left(min\\left(bwobs_i, bwavg_i, bwbur_i \\right) \\times max\\left(\\frac{bwfilt_i}{bwfilt},
-    \\frac{bw_i}{bwstrm}\\right)\\right),
-    \\sum_{i=1}^{n}\\left(min\\left(bwobs_i, bwavg_i, bwbur_i \\right) \\times
-    max\\left(\\frac{bwfilt_i}{bwfilt},
-    \\frac{bw_i}{bwstrm}\\right)\\right) \\times 0.05\\right)
+    bwn_i = min\left(
+    \left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times max\left(\frac{bwfilt_i}{bwfilt},
+    \frac{bw_i}{bwstrm}\right)\right),
+    \sum_{i=1}^{n}\left(min\left(bwobs_i, bwavg_i, bwburst_i, measuredconsensusbw_i \right) \times
+    max\left(\frac{bwfilt_i}{bwfilt},
+    \frac{bw_i}{bwstrm}\right)\right) \times 0.05\right)
+
+
+Per relay scaled bandwidth rounding
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Finally, the new scaled bandwidth is expressed in kilobytes and rounded to a
+number of digits.
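+
+To illustrate the last two steps, a sketch continuing the hypothetical names
+of the previous sketches; rounding to three significant digits is an
+assumption for the example, not a value taken from Torflow::
+
+    NODE_CAP = 0.05
+
+    # Cap each scaled bandwidth at 5% of the total scaled network bandwidth.
+    # With only a couple of example relays the cap clips everything; in a
+    # real consensus it only affects the fastest relays.
+    tot_net_bw = sum(new_bw.values())
+    capped_bw = {fp: min(b, int(tot_net_bw * NODE_CAP))
+                 for fp, b in new_bw.items()}
+
+    def round_sig(value, digits=3):
+        # Keep only the leading significant digits of the value.
+        return int(float("%.{}g".format(digits) % value))
+
+    # Express in kilobytes and round.
+    final_bw = {fp: round_sig(b / 1000.0) for fp, b in capped_bw.items()}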
+
+.. _README.spec.txt: https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt
+.. _PID controller: https://en.wikipedia.org/wiki/PID_controller
+.. _SQLSupport.py: https://gitweb.torproject.org/pytorctl.git/tree/SQLSupport.py#n493
+.. _bandwidth file spec: https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
+.. _aggregate.py: https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/aggregate.py
+.. _TorCtl.py: https://gitweb.torproject.org/pytorctl.git/tree/TorCtl.py