Re: [tor-dev] Understanding bwauth data in Stem?
On 06/12/14 00:26, Anna Kornfeld Simpson wrote:
> Thanks all for the responses!
>
> On Fri, Nov 21, 2014 at 4:53 PM, Sebastian Hahn sebast...@torproject.org wrote:
>> Hi there,
>>
>> On 21 Nov 2014, at 23:44, Damian Johnson ata...@torproject.org wrote:
>>>> In other words, if I sorted the descriptors by measured value, what would that order mean?
>>>
>>> I *think* that would be the ordering of 'relays who receive the most tor client traffic due to having a more highly weighted heuristic for relay selection'.
>>
>> that would be accurate, is my understanding
>
> Is there documentation of why this heuristic for relay selection does not correlate that well with bandwidth in the descriptor? I've attached a couple of scatter plots pulled from moria1's measured and bandwidth values for each descriptor a couple hours ago (and the plots look similar from the other bwauths). One shows all values, the other shows the bottom 75% of values (sorted by measurements), and neither shows as much of a correlation as I would expect. Are there factors other than bandwidth that contribute to this heuristic for relay selection?

Hi Anna,

I don't have answers, but maybe ideas for further investigations:

 - Not sure if this was mentioned before, but did you take a look at the spec?
   https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt

 - Maybe try removing bandwidth values close to 1, or just values exactly at 1. IIRC, values are capped at that value. (Removing just those values may be more accurate than removing the top 25%.)

 - Very small bandwidth values might be the result of newly started or restarted relays. (Advertised) bandwidth values are the volume of traffic, both incoming and outgoing, that a relay is willing to sustain, as configured by the operator and claimed to be observed from recent data transfers.
   If a relay didn't observe larger data transfers, the reported bandwidth value will be small, even though its (past) measurements might be large. Maybe compare this for single relays over time.

 - There's an interesting pattern at 1024 (?) kB/s. Maybe there are more at 512 kB/s and others. Can you reduce the amount of overplotting in the graph? In R/ggplot2, you'd set the alpha value to something smaller than 1, so that dots become somewhat transparent. It could be that these patterns are normal, because operators tend to pick certain bandwidth rates more often than others.

All the best,
Karsten

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
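Karsten's suggestions can be tried with a few lines of Python. This is an illustrative sketch, not something from the thread, and the function names are hypothetical. It assumes that "advertised bandwidth" means the minimum of the rate, burst, and observed values on a server descriptor's `bandwidth` line (the definition Onionoo uses), treats a measured value of exactly 1 as a placeholder to drop, and uses a Spearman rank correlation because Anna's question is about ordering rather than linear agreement. Ranks also sidestep the fact that descriptor values (bytes/s) and vote values are in different units.

```python
# Sketch only: helpers for comparing advertised vs. measured bandwidth.
# Assumptions (not from the thread): advertised bandwidth is
# min(rate, burst, observed) from a server descriptor's "bandwidth" line,
# and a measured value of exactly 1 is a placeholder worth dropping.


def advertised_bandwidth(rate, burst, observed):
    """A relay is never credited with more than it is configured for or has observed."""
    return min(rate, burst, observed)


def drop_placeholder_measurements(pairs, placeholder=1):
    """Keep (advertised, measured) pairs whose measured value is not the placeholder."""
    return [(adv, meas) for adv, meas in pairs if meas != placeholder]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = shared
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two items")
    rx, ry = ranks(xs), ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are equal")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical relays: (rate, burst, observed, measured).
    # The first one mimics a freshly restarted relay: small observed value,
    # so its advertised bandwidth is small despite a generous rate.
    relays = [(5_000_000, 10_000_000, 40_000, 900),
              (1_048_576, 1_048_576, 2_000_000, 1500),
              (20_000_000, 20_000_000, 8_000_000, 6000),
              (3_000_000, 3_000_000, 2_500_000, 1)]
    pairs = [(advertised_bandwidth(r, b, o), m) for r, b, o, m in relays]
    kept = drop_placeholder_measurements(pairs)
    print(spearman([a for a, _ in kept], [m for _, m in kept]))  # prints 1.0
```

Running the same comparison per relay over several days would follow Karsten's "compare this for single relays over time" idea; the placeholder value is a parameter so it can be changed if the cap turns out to be something other than 1.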
Re: [tor-dev] Understanding bwauth data in Stem?
On 21/11/14 23:44, Damian Johnson wrote:
> Agreed, it would be nice for CollecTor to have bandwidth authority information. However, with a few small exceptions (like rdns and geoip lookups) CollecTor is simply a distilled version of what's in the consensus. That is to say, by directly collecting descriptor information like you are, you already have a superset of what CollecTor provides.

Minor clarification: I think you're confusing Onionoo with CollecTor. Onionoo indeed distills data obtained from CollecTor and adds things like rDNS and GeoIP lookups. But Onionoo is probably not the right tool for researching bandwidth authorities.

CollecTor simply fetches descriptors from the directory authorities and makes them available. There's no difference between using Stem to fetch recent votes or using votes from CollecTor's descriptor archives.

All the best,
Karsten
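Since a vote from CollecTor's archive is the same document Stem would fetch, the bwauth values can be read straight from its text. Per dir-spec, each relay entry in a v3 vote starts with an `r` line, and its `w` line carries `Bandwidth=` plus, in votes from measuring authorities, `Measured=`. Stem's `RouterStatusEntryV3` exposes these as `bandwidth` and `measured`; the stdlib-only sketch below does the same for a raw vote file, keyed by the base64 identity from the `r` line. It is a minimal sketch, not a full parser: every other line type is skipped, and `parse_vote_weights` is a hypothetical name.

```python
# Sketch only: read Bandwidth= and Measured= out of a raw v3 vote
# (for example one downloaded from CollecTor's archive). Only "r" and "w"
# lines are looked at; everything else in the vote is skipped.


def parse_vote_weights(vote_text):
    """Map each relay's base64 identity (from its "r" line) to its weights."""
    relays = {}
    current = None
    for line in vote_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) >= 3:
            # r nickname identity digest YYYY-MM-DD HH:MM:SS IP ORPort DirPort
            current = {"nickname": parts[1], "bandwidth": None, "measured": None}
            relays[parts[2]] = current
        elif parts[0] == "w" and current is not None:
            for item in parts[1:]:
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    current["bandwidth"] = int(value)
                elif key == "Measured":
                    # Present only in votes from measuring bwauths.
                    current["measured"] = int(value)
    return relays


if __name__ == "__main__":
    # Hypothetical two-relay excerpt in the shape of a vote's router entries.
    sample = (
        "r relayA AAAAAAAA BBBBBBBB 2014-11-21 12:00:00 192.0.2.1 9001 0\n"
        "s Fast Running Valid\n"
        "w Bandwidth=20 Measured=35\n"
        "r relayB CCCCCCCC DDDDDDDD 2014-11-21 12:00:00 192.0.2.2 443 0\n"
        "w Bandwidth=100\n"
    )
    for identity, info in parse_vote_weights(sample).items():
        print(identity, info["nickname"], info["bandwidth"], info["measured"])
```

A relay whose `measured` stays `None` was simply not measured by the authority that signed that vote.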
Re: [tor-dev] Understanding bwauth data in Stem?
On 22/11/14 01:53, Sebastian Hahn wrote:
> On 21 Nov 2014, at 23:44, Damian Johnson ata...@torproject.org wrote:
>>> Separately, is there a way (using Stem or some other tool) to see the raw bwauth measurements rather than the weights?
>>
>> I don't believe this is exposed anywhere, so only the bandwidth authority operators have this. And by 'have' I mean 'maybe in their logs, or possibly not even surfaced at all'.
>
> We could publish those. Let's ask karsten if he thinks that'd be worthwhile?

Mike and I discussed this a few years ago in the context of a deliverable where we were supposed to get bwauth and torperf data up on metrics.tp.o (#2394, #2534):

https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorF/Year1

I'm afraid I can't find Mike's exact statement anymore, but it was something along the lines of: the bandwidth authority measurement setup is so artificial, I can't imagine how anyone would use the raw measurement data for anything useful. I cc'ed Mike to correct me if my memory is wrong, or to say if he changed his opinion.

That being said, I don't see any harm in publishing raw bandwidth authority data if it's for research purposes. Let's just not create the whole infrastructure for making the data available on CollecTor, at least until we're more certain that that's worthwhile.

I'm also cc'ing Aaron, who has worked quite a bit on the bandwidth authority code.

All the best,
Karsten
Re: [tor-dev] Understanding bwauth data in Stem?
Hi there,

On 21 Nov 2014, at 23:44, Damian Johnson ata...@torproject.org wrote:
>> In other words, if I sorted the descriptors by measured value, what would that order mean?
>
> I *think* that would be the ordering of 'relays who receive the most tor client traffic due to having a more highly weighted heuristic for relay selection'.

that would be accurate, is my understanding

> That said, this is an area I'm honestly not that familiar with. I'm looping in Sebastian, Karsten, and Roger. As mentioned on IRC, Sebastian has touched the Bandwidth Authorities most recently, so he's likely the most knowledgeable at present about this space.

I've tried fixing stuff and have mostly given up. I'm not too familiar with it, and probably can't help too much. I'll try to answer questions if there are any, though.

>> Separately, is there a way (using Stem or some other tool) to see the raw bwauth measurements rather than the weights?
>
> I don't believe this is exposed anywhere, so only the bandwidth authority operators have this. And by 'have' I mean 'maybe in their logs, or possibly not even surfaced at all'.

We could publish those. Let's ask karsten if he thinks that'd be worthwhile?

>> Is that a calculation I can reverse?
>
> Maybe run a bandwidth authority of your own? This could be a terrible idea. Sebastian would know.

You can look at the votes before a consensus was formed; they'll contain the values for each measuring bwauth. Running your own bwauth might be interesting, but it's probably not very useful if you want to learn values for the deployed network.

Cheers
Sebastian
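Sebastian's pointer to the votes can be taken one step further by lining up each relay's `Measured=` values across the bwauths' votes. dir-spec describes the consensus weight as the median of the `Measured=` votes once at least three authorities have measured a relay. The sketch below assumes the low median for an even number of votes, and takes as input a dict shaped like `{authority: {fingerprint: measured or None}}`; the authority names, fingerprints, and function names are all hypothetical. It does not model the other consensus rules, such as the handling of relays with too few measurements.

```python
# Sketch only: line up Measured= values from several bwauth votes.
# Assumptions: input is {authority: {fingerprint: measured or None}},
# at least three measurements are required (per dir-spec), and an even
# number of votes resolves to the low median.


def low_median(values):
    """Middle element of the sorted values; the lower one for even counts."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def combine_measurements(votes, min_measurers=3):
    """Return (per-relay measurements by authority, median for well-measured relays)."""
    by_relay = {}
    for authority, measurements in votes.items():
        for fingerprint, measured in measurements.items():
            if measured is not None:
                by_relay.setdefault(fingerprint, {})[authority] = measured
    combined = {
        fingerprint: low_median(per_authority.values())
        for fingerprint, per_authority in by_relay.items()
        if len(per_authority) >= min_measurers
    }
    return by_relay, combined


if __name__ == "__main__":
    # Hypothetical authorities and relays.
    votes = {
        "authA": {"R1": 100, "R2": None},
        "authB": {"R1": 300, "R2": 50},
        "authC": {"R1": 200},
        "authD": {"R1": 400},
    }
    by_relay, combined = combine_measurements(votes)
    print(by_relay["R1"])  # how much the bwauths disagree about R1
    print(combined)        # R2 has only one measurement, so it is left out
```

Keeping the per-authority values alongside the median makes it easy to see which bwauth pulls a relay's weight up or down, which is about as close to "reversing the calculation" as the published votes allow.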