Re: [tor-dev] Understanding bwauth data in Stem?

2014-12-09 Thread Karsten Loesing

On 06/12/14 00:26, Anna Kornfeld Simpson wrote:
 Thanks all for the responses!
 
 On Fri, Nov 21, 2014 at 4:53 PM, Sebastian Hahn
 sebast...@torproject.org wrote:
 
 Hi there,
 
 On 21 Nov 2014, at 23:44, Damian Johnson ata...@torproject.org
 wrote:
 In other words, if I sorted the descriptors by measured value, what
 would that order mean?
 
 I *think* that would be the ordering of 'relays who receive the
 most tor client traffic due to having a more highly weighted
 heuristic for relay selection'.
 
 that would be accurate, is my understanding
 
 
 Is there documentation of why this heuristic for relay selection
 does not correlate that well with bandwidth in the descriptor?
 I've attached a couple of scatter plots pulled from moria1's
 measured and bandwidth values for each descriptor a couple
 hours ago (and the plots look similar from the other bwauths).  One
 shows all values, the other shows the bottom 75% of values (sorted
 by measurements), and neither shows as much of a correlation as I
 would expect.  Are there factors other than bandwidth that 
 contribute to this heuristic for relay selection?

Hi Anna,

I don't have answers, but maybe ideas for further investigations:

 - Not sure if this was mentioned before, but did you take a look at
the spec?
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt

 - Maybe try removing bandwidth values close to 1, or just values
exactly at 1.  IIRC, values are capped at that value.  (Removing
just those values may be more accurate than removing the top 25%.)

 - Very small bandwidth values might result from newly started
or restarted relays.  (Advertised) bandwidth values are the volume of
traffic, both incoming and outgoing, that a relay is willing to
sustain, as configured by the operator and claimed to be observed from
recent data transfers.  If a relay hasn't observed larger data
transfers, the reported bandwidth value will be small, even though the
(past) measurements might be large.  Maybe compare this for single
relays over time.

 - There's an interesting pattern at 1024 (?) kB/s.  Maybe there are
more at 512 kB/s and other values.  Can you reduce the amount of
overplotting in the graph?  In R/ggplot2, you'd set the alpha value
to something smaller than 1, so that dots become somewhat transparent.
These patterns could be normal, because operators tend to pick
certain bandwidth rates more often than others.
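
A minimal sketch of how these checks could be scripted, assuming the
(bandwidth, measured) pairs have already been extracted from a vote.
All function names are made up for illustration, the cap is passed in
by the caller rather than hard-coded, and the rank correlation is a
hand-written Spearman coefficient that the thread itself does not
prescribe.  The plotting helper assumes matplotlib is installed and
uses its `alpha` argument as the counterpart of the ggplot2 setting
mentioned above.

```python
from statistics import mean


def drop_capped(pairs, cap):
    """Remove (bandwidth, measured) pairs whose bandwidth sits exactly at cap."""
    return [(bw, m) for bw, m in pairs if bw != cap]


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(pairs):
    """Rank correlation between bandwidth and measured values."""
    xs = average_ranks([bw for bw, _ in pairs])
    ys = average_ranks([m for _, m in pairs])
    return pearson(xs, ys)


def plot_pairs(pairs, path, alpha=0.2):
    """Save a scatter plot with translucent dots to reduce overplotting."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([bw for bw, _ in pairs], [m for _, m in pairs], s=4, alpha=alpha)
    ax.set_xlabel("bandwidth (from descriptor)")
    ax.set_ylabel("measured (from bwauth vote)")
    fig.savefig(path)
```

A rank correlation is used here because it only asks whether the two
orderings agree, which is closer to the original question about what
sorting by measured value means.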

All the best,
Karsten

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Understanding bwauth data in Stem?

2014-11-22 Thread Karsten Loesing
On 21/11/14 23:44, Damian Johnson wrote:
 Agreed, it would be nice for CollecTor to have bandwidth authority
 information. However, with a few small exceptions (like rdns and geoip
 lookups) CollecTor is simply a distilled version of what's in the
 consensus. That is to say, by directly collecting descriptor information
 like you are, you already have a superset of what CollecTor provides.

Minor clarification: I think you're confusing Onionoo with CollecTor.

Onionoo indeed distills data obtained from CollecTor and adds things
like rDNS and GeoIP lookups.  But Onionoo is probably not the right tool
for researching bandwidth authorities.

CollecTor simply fetches descriptors from the directory authorities and
makes them available.  There's no difference between using Stem to fetch
recent votes or using votes from CollecTor's descriptor archives.
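
As a rough illustration of the Stem side of that comparison, the
sketch below downloads each authority's current vote and counts how
many entries carry a bwauth measurement.  It assumes a Stem release
that still provides `stem.descriptor.remote.get_authorities()` and
`DescriptorDownloader.get_vote()`; the helper names are made up for
this example, and the download needs network access.

```python
def count_measured(measured_values):
    """Return (measured, unmeasured) counts.

    measured_values holds one item per relay: an int when the vote has a
    Measured= value for it, or None when it does not.
    """
    values = list(measured_values)
    measured = sum(1 for value in values if value is not None)
    return measured, len(values) - measured


def fetch_vote_entries():
    """Yield (authority_name, entries) for every vote that can be fetched."""
    # Imported lazily so the counting helper works without Stem installed.
    from stem.descriptor import remote

    downloader = remote.DescriptorDownloader()
    for name, authority in remote.get_authorities().items():
        if authority.v3ident is None:
            continue  # not a voting v3 authority
        try:
            yield name, list(downloader.get_vote(authority).run())
        except Exception as exc:
            print("Unable to fetch %s's vote: %s" % (name, exc))


def report():
    """Print how many relays each authority's vote marks as measured."""
    for name, entries in fetch_vote_entries():
        measured, unmeasured = count_measured(e.measured for e in entries)
        print("%s: %i measured, %i unmeasured" % (name, measured, unmeasured))
```

Calling `report()` would print one line per authority; votes from
authorities that do not run a bwauth should show no measured entries.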

All the best,
Karsten



Re: [tor-dev] Understanding bwauth data in Stem?

2014-11-22 Thread Karsten Loesing
On 22/11/14 01:53, Sebastian Hahn wrote:
 On 21 Nov 2014, at 23:44, Damian Johnson ata...@torproject.org wrote:
 Separately, is there a way (using Stem or some other tool) to see the raw
 bwauth measurements rather than the weights?

 I don't believe this is exposed anywhere, so only the bandwidth authority
 operators have this. And by 'have' I mean 'maybe in their logs, or possibly
 not even surfaced at all'.
 
 We could publish those. Let's ask karsten if he thinks that'd be worthwhile?

Mike and I discussed this a few years ago in the context of a
deliverable where we were supposed to get bwauth and torperf data up on
metrics.tp.o (#2394, #2534):

https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorF/Year1

I'm afraid I can't find Mike's exact statement anymore, but it was
something along the lines of: the bandwidth authority measurement setup is so
artificial, I can't imagine how anyone would use the raw measurement
data for anything useful.  I cc'ed Mike to correct me if my memory is
wrong, or to say if he changed his opinion.

That being said, I don't see any harm in publishing raw bandwidth
authority data if it's for research purposes.  Let's just not create the
whole infrastructure for making the data available on CollecTor, at
least until we're more certain that that's worthwhile.

I'm also cc'ing Aaron who has worked quite a bit on the bandwidth
authority code.

All the best,
Karsten



Re: [tor-dev] Understanding bwauth data in Stem?

2014-11-21 Thread Sebastian Hahn
Hi there,

On 21 Nov 2014, at 23:44, Damian Johnson ata...@torproject.org wrote:
 In other words, if I sorted the descriptors by measured value, what would
 that order mean?
 
 I *think* that would be the ordering of 'relays who receive the most tor
 client traffic due to having a more highly weighted heuristic for relay
 selection'.

that would be accurate, is my understanding

 That said, this is an area I'm honestly not that familiar with. I'm looping in
 Sebastian, Karsten, and Roger. As mentioned on irc Sebastian has touched the
 Bandwidth Authorities most recently, so he's likely the most knowledgeable at
 present about this space.

I've tried fixing stuff and have mostly given up. I'm not too familiar with
it, and probably can't help too much. I'll try to answer questions if there
are any, tho.

 Separately, is there a way (using Stem or some other tool) to see the raw
 bwauth measurements rather than the weights?
 
 I don't believe this is exposed anywhere, so only the bandwidth authority
 operators have this. And by 'have' I mean 'maybe in their logs, or possibly
 not even surfaced at all'.

We could publish those. Let's ask karsten if he thinks that'd be worthwhile?

 Is that a calculation I can reverse?
 
 Maybe run a bandwidth authority of your own? This could be a terrible idea.
 Sebastian would know.

You can look at the votes from before a consensus was formed; they'll contain
the values from each measuring bwauth. Running your own bwauth might be
interesting, but it's probably not very useful if you want to learn values for
the deployed network.
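
To make that concrete, here is a small sketch that pulls the per-relay
values out of a vote's text.  It relies on the
`w Bandwidth=... Measured=...` line format from the directory
specification; the function name and the two-relay sample are invented
for illustration, and real votes carry many more line types that this
parser simply skips.

```python
def parse_vote_weights(vote_text):
    """Return (nickname, identity, bandwidth, measured) tuples from a vote.

    Only 'r' and 'w' lines are read. measured is None when the vote has no
    Measured= value for that relay.
    """
    results = []
    current = None  # (nickname, identity) of the entry being read
    for line in vote_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r" and len(fields) > 2:
            current = (fields[1], fields[2])  # start of a new relay entry
        elif fields[0] == "w" and current is not None:
            values = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            bandwidth = int(values["Bandwidth"]) if "Bandwidth" in values else None
            measured = int(values["Measured"]) if "Measured" in values else None
            results.append(current + (bandwidth, measured))
            current = None
    return results


# Made-up sample in the shape of a vote's router status entries;
# identities and digests are placeholders, not real relay data.
SAMPLE_VOTE = """\
r relayA AAAA BBBB 2014-11-21 20:00:00 192.0.2.1 9001 0
s Fast Running Valid
w Bandwidth=5000 Measured=3200
r relayB CCCC DDDD 2014-11-21 20:00:00 192.0.2.2 9001 0
s Running Valid
w Bandwidth=120
"""

for nickname, identity, bandwidth, measured in parse_vote_weights(SAMPLE_VOTE):
    print(nickname, identity, bandwidth, measured)
```

Running the same parser over the votes of each measuring authority
gives one measured value per relay per bwauth, which can then be
compared side by side.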

Cheers
Sebastian