Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-08 Thread Karsten Loesing
On 05/04/14 17:46, Lukas Erlacher wrote:
 Hello Nikita, Karsten,
 
 On 04/05/2014 05:03 PM, Nikita Borisov wrote:
 On Sat, Apr 5, 2014 at 3:58 PM, Karsten Loesing
 kars...@torproject.org wrote:
 Installing packages using Python-specific package managers is
 going to make our sysadmins sad, so we should have a very good
 reason for wanting such a package.  In general, we don't need
 the latest and greatest package.  Unless we do.
 What about virtualenv? Part of the premise behind it is that you
 can configure appropriate packages as a developer / operator
 without having to bother sysadmins and making them worried about
 system-wide effects.
 
 - Nikita
 
 I was going to mention virtualenv as well, but I have to admit that
 I find it weird and scary, especially since I haven't found good
 documentation for it. If there is somebody who is familiar with
 virtualenv, that would probably be the best solution.

I'm afraid I don't know enough about Python or virtualenv.  So far, it
has been almost zero effort for our sysadmins to install a package from
the repositories and keep it up-to-date.  I'd like to stick with the
apt-get approach and save the virtualenv approach for situations where
we really need a package that is not contained in the repositories.

Thanks for the suggestion, though!

 On 04/05/2014 04:58 PM, Karsten Loesing wrote:
 My hope with challenger is that it's written quickly, works
 quietly for a year, and then disappears without anybody
 noticing.  I'd rather not maintain yet another thing.
 So, maybe Weather is a better candidate for using onion-py than
 challenger.
 
 Yes, I understand.
 
 Yeah, I think we'll want to define a maximum lifetime of cache 
 entries, or the poor cache will explode pretty soon.
 
 What usage patterns do we have to expect? Do we want to hit Onionoo
 to check whether the cache is still valid for every request, or should
 we do hard caching for several minutes? The best UX solution
 would be to have a background task that keeps the cache current, so
 user requests can be delivered without hitting Onionoo at all.

That's a fine question.  I can see various caching approaches here.
But I just realized that this is premature optimization.  Let's first
build the thing and download whatever we need, whenever we need it.
And once we know what caching needs we have, let's build the cache.

 In other words, unless we do something intelligent with the cache,
 the cache is not actually going to be very useful.

Valid point. :)

 Great, your help would be much appreciated!  Want to send me a
 pull request whenever you have something to merge?
 
 Will do.

Great.  Thanks!

All the best,
Karsten



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-08 Thread Lukas Erlacher
Hi Kostas,

right now, we're coding challenger against what exists in Debian Wheezy,
which means version 0.12.1 of the requests lib from the python-requests
package you mentioned, where response.json (not response.json()) is the
correct way to get JSON content from the response.
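To illustrate the difference (a quick sketch; the summary URL is just an
example endpoint):

    import requests

    r = requests.get('https://onionoo.torproject.org/summary')

    # requests 0.x (Wheezy's python-requests): .json is a property
    data = r.json
    # requests 1.0 and later: .json() is a method
    # data = r.json()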

If you want to make your own "grab stuff from Onionoo" script suite, I'd
recommend working with onion-py [1]. It's very new, very spiffy, and uses
Python 3 and the newest requests lib. (Full disclosure: it's my baby and
I'm desperately looking for testers/users, but that should be obvious to
anyone who has read this thread.)
Alternatively, convince the right people (presumably Karsten and arma)
that challenger should switch to a more sustainable runtime than what we
can get from Wheezy's repositories. ;-)

Cheers,
Luke

[1] https://github.com/duk3luk3/onion-py





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-07 Thread Christian
On 07.04.2014 10:43, Karsten Loesing wrote:
 On 06/04/14 21:29, Christian wrote:
 On 04.04.2014 19:13, Karsten Loesing wrote:
 Christian, Lukas, everyone,

 I learned today that we should have something working in a week or two.
  That's why I started hacking on this today and produced some code:

 https://github.com/kloesing/challenger

 Here are a few things I could use help with:

  - Anybody want to help turn this script into a web app, possibly
 using Flask?  See the first "next step" in README.md.

  - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
 into the "Add local cache for ..." bullet points under "Next steps"?  Is
 this something OnionPy could support?  Want to write the glue code?

  - Christian, want to help write the graphing code that visualizes the
 `combined-*.json` files produced by that tool?  The README.md suggests a
 few possible graphs.


 Sure,
 should I create a new repo for the website with graphing code or work
 directly in the kloesing/challenger repository?
 
 My hope is that we can turn my script into a Flask web app that serves
 JSON data, which is then graphed by your JavaScript embedded into the
 HTML.  So it probably makes sense to have everything in a single
 repository.  I'd say feel free to clone kloesing/challenger and send me
 pull requests.  And feel free to create new directories as needed; we
 can still move things around later.
 

I sent you a pull request with the first working version:
https://github.com/kloesing/challenger/pull/2 .
The UI is temporary, but it works so far.

 All the best,
 Karsten
 



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Lukas Erlacher
Hi Karsten,

On 04/05/2014 09:58 AM, Karsten Loesing wrote:
 On second thought, and after sleeping over this, I'm less convinced that we 
 should use an external library for the caching. We should rather start with a 
 simple dict in memory and flush it based on some simple rules. That would 
 allow us to tweak the caching specifically for our use case. And it would 
 mean avoiding a dependency. We can think about moving to onion-py at a later 
 point. That gives you the opportunity to unspaghettize your code, and once 
 that is done we'll have a better idea what caching needs we have for the 
 challenger tool to decide whether to move to onion-py or not. Would you still 
 want to help write the simple caching code for challenger? 
I cleaned up the caching code and added a simple in-memory dict caching
provider to onion-py that has no further dependencies. (It also has no
provisions for eviction/flushing at all, but I will add that next. Right
now everything is cached forever, though of course a new response from
Onionoo replaces an old one.)
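For the eviction, I'm thinking of something along these lines (a rough
sketch, not yet in onion-py; the 5-minute TTL is a placeholder):

    import time

    class SimpleCache:
        # In-memory dict cache whose entries expire after `ttl` seconds.
        def __init__(self, ttl=300):
            self.ttl = ttl
            self.entries = {}  # key -> (timestamp, value)

        def get(self, key):
            hit = self.entries.get(key)
            if hit is None:
                return None
            stamp, value = hit
            if time.time() - stamp > self.ttl:
                del self.entries[key]  # expired: evict and report a miss
                return None
            return value

        def set(self, key, value):
            self.entries[key] = (time.time(), value)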

I can write the Onionoo API code and caching code for challenger, if I
can use Python 3 and the requests library (see below).
Of course I'd really like to actually have a user for onion-py, since it
would help me get the necessary feedback and polish to push the library
to version 1.0, but I understand if that isn't appropriate for this
project.
  I don't really understand what the code does. What is meant by
 "combining" documents? What exactly are we trying to measure? Once I
 know that and have thought of a sensible way to integrate it into
 onion-py, I'm confident I can in fact write that glue code :)
 Right now, the script sums up all graphs contained in Onionoo's
 bandwidth, clients, uptime, and weights documents.  It also limits the
 range of the new graphs to max(first) to max(last) of given input graphs.

 For example, assume we want to know the total bandwidth provided by the
 following 2 relays participating in the relay challenge:

 datetime:  0, 1, 2, 3, 4, 5, ...

 relay 1:     [5, 4, 5, 6]
 relay 2:  [4, 3, 5, 4]

 combined:    [8, 9, 9, 6]

 This is not perfect for various reasons, but it's the best I came up
 with yesterday.  Also, as we all know, perfect is the enemy of good.

 (If you're curious, reason #1: the graph goes down at the end, and we
 can't say whether it's because relay 2 disappeared or did not report
 data yet; reason #2: we're weighting both relays' B/s equally, though
 relay 1 might have been online 24/7 and relay 2 only long enough that
 Onionoo doesn't put in null; there may be more reasons.)
Ah, I see! :) So for scalar attributes of relays (such as 
consensus_weight_fraction) it's just a sum, and for histories it's the graphs 
combined as you just outlined. That makes sense, thank you!
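As a sanity check, here's that rule in miniature (hypothetical code with
integer time steps, not the actual challenger code):

    def combine(histories):
        # histories: list of (first, values) pairs, one value per time step.
        new_first = max(first for first, _ in histories)
        new_last = max(first + len(values) - 1 for first, values in histories)
        combined = []
        for t in range(new_first, new_last + 1):
            # Sum whatever data points exist at time t; a relay without
            # data for t simply contributes nothing.
            combined.append(sum(values[t - first]
                                for first, values in histories
                                if 0 <= t - first < len(values)))
        return new_first, combined

    # Relay 1 starts at t=1, relay 2 at t=0, as in the example above:
    combine([(1, [5, 4, 5, 6]), (0, [4, 3, 5, 4])])  # => (1, [8, 9, 9, 6])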
 I'm also not sure about Python 3.  Whatever we write needs to run on
 Debian Wheezy with whatever libraries are present there.  If they're all
 Python 3, great.  If not, can't do.

I would strongly prefer to use Python 3. I understand wanting to use
Debian stable (I use it myself), but Python 3 is 6 years old, Python 2 is
completely dead, and its use for new projects is not recommended.
The only mandatory dependency for onion-py, and for me, is requests (I
really dislike using urllib* directly - if you want to know why, check
https://gist.github.com/kennethreitz/973705). But the python3-requests
package in Wheezy is from 2012, and there is no python3-flask. :-(

Is there anything standing against using pip (the python3-pip package) to
install requests and flask from PyPI?

 Thanks for your feedback!

 All the best,
 Karsten
Cheers,
Luke





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Nikita Borisov
On Sat, Apr 5, 2014 at 8:58 AM, Karsten Loesing kars...@torproject.org wrote:
 Right now, the script sums up all graphs contained in Onionoo's
 bandwidth, clients, uptime, and weights documents.  It also limits the
 range of the new graphs to max(first) to max(last) of given input graphs.

 For example, assume we want to know the total bandwidth provided by the
 following 2 relays participating in the relay challenge:

 datetime:  0, 1, 2, 3, 4, 5, ...

 relay 1:     [5, 4, 5, 6]
 relay 2:  [4, 3, 5, 4]

 combined:    [8, 9, 9, 6]

 This is not perfect for various reasons, but it's the best I came up
 with yesterday.  Also, as we all know, perfect is the enemy of good.

 (If you're curious, reason #1: the graph goes down at the end, and we
 can't say whether it's because relay 2 disappeared or did not report
 data yet; reason #2: we're weighting both relays' B/s equally, though
 relay 1 might have been online 24/7 and relay 2 only long enough that
 Onionoo doesn't put in null; there may be more reasons.)

For the relay challenge, wouldn't you want to include the entire
period that data is available for (i.e., min(first) to max(last))?
Otherwise, if you are looking at a month's worth of data and a new
relay arrives on the last day, your graph would only contain that day.

Also, I think you would want to do datetime.strptime(max(first), ...)
here: https://github.com/kloesing/challenger/blob/master/challenge.py#L177-L178
Otherwise you're just taking the last relay's first and last values as
the new_first and new_last.
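I.e., something along these lines (paraphrased, not the actual
challenge.py code; Onionoo history timestamps look like
"2014-04-05 12:00:00"):

    from datetime import datetime

    FMT = '%Y-%m-%d %H:%M:%S'

    def combined_range(histories):
        # Take the max over *all* relays first, then parse, instead of
        # overwriting new_first/new_last on every loop iteration.
        new_first = datetime.strptime(max(h['first'] for h in histories), FMT)
        new_last = datetime.strptime(max(h['last'] for h in histories), FMT)
        return new_first, new_last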

Cheers,
- Nikita
-- 
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Nikita Borisov
On Sat, Apr 5, 2014 at 3:58 PM, Karsten Loesing kars...@torproject.org wrote:
 Installing packages using Python-specific package managers is going to
 make our sysadmins sad, so we should have a very good reason for
 wanting such a package.  In general, we don't need the latest and
 greatest package.  Unless we do.

What about virtualenv? Part of the premise behind it is that you can
configure appropriate packages as a developer / operator without
having to bother sysadmins and making them worried about system-wide
effects.
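For example (assuming virtualenv and a python3 interpreter are already
installed; package names may differ):

    $ virtualenv -p python3 env        # create an isolated environment
    $ . env/bin/activate               # use it in the current shell
    (env)$ pip install requests flask  # installs into env/, not system-wide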

- Nikita
-- 
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Lukas Erlacher
Hello Nikita, Karsten,

On 04/05/2014 05:03 PM, Nikita Borisov wrote:
 On Sat, Apr 5, 2014 at 3:58 PM, Karsten Loesing kars...@torproject.org 
 wrote:
 Installing packages using Python-specific package managers is going to
 make our sysadmins sad, so we should have a very good reason for
 wanting such a package.  In general, we don't need the latest and
 greatest package.  Unless we do.
 What about virtualenv? Part of the premise behind it is that you can
 configure appropriate packages as a developer / operator without
 having to bother sysadmins and making them worried about system-wide
 effects.

 - Nikita

I was going to mention virtualenv as well, but I have to admit that I find
it weird and scary, especially since I haven't found good documentation
for it. If there is somebody who is familiar with virtualenv, that would
probably be the best solution.

On 04/05/2014 04:58 PM, Karsten Loesing wrote:
 My hope with challenger is that it's written quickly, works quietly
 for a year, and then disappears without anybody noticing.  I'd
 rather not maintain yet another thing.  So, maybe Weather is a
 better candidate for using onion-py than challenger.

Yes, I understand.
 Yeah, I think we'll want to define a maximum lifetime of cache
 entries, or the poor cache will explode pretty soon.

What usage patterns do we have to expect? Do we want to hit Onionoo to
check whether the cache is still valid for every request, or should we do
hard caching for several minutes? The best UX solution would be to have a
background task that keeps the cache current, so user requests can be
delivered without hitting Onionoo at all.
In other words, unless we do something intelligent with the cache, the
cache is not actually going to be very useful.

 Great, your help would be much appreciated!  Want to send me a pull
 request whenever you have something to merge?

Will do.

Cheers,
Luke





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Lukas Erlacher
On 04/05/2014 04:58 PM, Karsten Loesing wrote:
 Great, your help would be much appreciated!  Want to send me a pull
 request whenever you have something to merge?


Alright, so I wrote a few lines and sent you a pull request. Could you
please check whether it downloads the data you expect?
And once we know exactly what we want to cache, and how, I'll add the
logic for that.

Cheers,
Luke





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-04 Thread Karsten Loesing
Christian, Lukas, everyone,

I learned today that we should have something working in a week or two.
 That's why I started hacking on this today and produced some code:

https://github.com/kloesing/challenger

Here are a few things I could use help with:

 - Anybody want to help turn this script into a web app, possibly
using Flask?  See the first "next step" in README.md.

 - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
into the "Add local cache for ..." bullet points under "Next steps"?  Is
this something OnionPy could support?  Want to write the glue code?

 - Christian, want to help write the graphing code that visualizes the
`combined-*.json` files produced by that tool?  The README.md suggests a
few possible graphs.

Thanks in advance!  You're all helping grow the Tor network!

Also replying to Christian's mail inline.

On 28/03/14 09:07, Christian wrote:
 On 27.03.2014 16:25, Karsten Loesing wrote:
 On 27/03/14 11:57, Roger Dingledine wrote:
 Hi Christian, other tor relay fans,

 I'm looking for some volunteers, hopefully including Christian, to work
 on metrics and visualization of impact from new relays.

 We're working with EFF to do another Tor relay challenge [*], to both
 help raise awareness of the value of Tor, and encourage many people to
 run relays -- probably non-exit relays for the most part, since that's
 the easiest for normal volunteers to step up and do.

 You can read about the first round from several years ago here:
 https://www.eff.org/torchallenge

 To make it succeed, the challenge for us here is to figure out what to
 measure to track progress, and then measure it and graph it for everybody.

 I'm figuring that like last time, EFF will collect a list of fingerprints
 of relays that signed up because of the challenge.

 One of the main pushes we're aiming for this year is longevity: it's
 easy to sign up a relay for two weeks and then stop. We want to emphasize
 consistency and encourage having the relays up for many months.
 
 Do you want the challenge application to simply provide some graphs or
 give some sort of interactive dashboard (client-side JavaScript)?

You asked Roger, and I'm not Roger, but I'd say let's start with some
graphs.  We can always make it more interactive later.  Though I doubt
it will be necessary.

 Before going through your list of things we'd want to track below, let's
 first talk about our options to turn a list of fingerprints into fancy
 graphs:

  1. Write a new metrics-web module and put graphs on the metrics
 website.  This means parsing relay descriptors and storing certain
 per-relay statistics for all relays.  That gives us maximum flexibility
 in the kinds of statistics, but is also most expensive in terms of
 developer hours.  I don't want to do this.

  2. Extend Globe to show details pages for multiple relays.  This
 requires us to move to the server-based Globe-node, because the poor
 browser shouldn't download graph data for all relays, but the server
 should return a single graph for all relays.  It's also unclear if the
 new graphs will be of general interest for Globe users, and if the rest
 of the Globe details will be confusing to people interested in the relay
 challenge.  Probably not a great idea, but I'm not sure.

 
 I agree that Globe isn't the best place to display the challenge graphs.
 Currently the only focus for Globe is to provide data for single relays
 and bridges.
 IMO it would be better if the challenge participants list added links to
 Atlas, Blutmagie, and Globe.

Agreed!

  3. Extend Onionoo to return aggregate graph data for a given set of
 fingerprints.  Seems useful.  But has the big disadvantage that Onionoo
 would suddenly have to create responses dynamically.  I'm worried about
 creating a new performance bottleneck there, and this is certainly not
 possible with poor overloaded yatei.

  4. Write a new little tool that fetches Onionoo documents once (or
 twice) per day for all relays participating in the relay challenge and
 that produces graph data.  That new tool could probably re-use some
 Compass code for the backend and some Globe code for the frontend.
 Graphs could be integrated directly into EFF's website.  This is
 currently my favorite approach.

 
 I like this idea.

Glad to hear!  I slightly moved away from the "fetch once or twice per
day" idea to a more elaborate approach.  But the general idea is still
the same.

 Note for 2--4: Onionoo currently only gives out data for relays that
 have been running in the past 7 days.  I'd have to extend it to give out
 all data for a list of fingerprints, regardless of when relays were
 running the last time.  That's 2--3 days of coding and testing for me.
 It's also potentially creating a bottleneck, so we should first have a
 replacement for yatei.

 So what are the things we'd want to track?

 - Number of relays signed up that are Running, over time.

 We can do something here with Onionoo's new uptime documents.

 - Total 

Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-04 Thread Lukas Erlacher
Hello everyone (reply all ftw),

On 04/04/2014 07:13 PM, Karsten Loesing wrote:
 Christian, Lukas, everyone,

 I learned today that we should have something working in a week or two.
  That's why I started hacking on this today and produced some code:

 https://github.com/kloesing/challenger

 Here are a few things I could use help with:

  - Anybody want to help turn this script into a web app, possibly
 using Flask?  See the first "next step" in README.md.
I might be able to do that, but currently I don't have enough free time to make 
a commitment.
  - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
 into the "Add local cache for ..." bullet points under "Next steps"?  Is
 this something OnionPy could support?  Want to write the glue code?
onion-py already supports transparent caching using memcached. I use a
(hopefully) unique serialisation of the query as the key (see the
serializer functions here:
https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L7)
and have a bit of spaghetti code to check for available cached data and
the 304 response status from Onionoo
(https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L97).
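The pattern boils down to this (an illustrative sketch, not the actual
onion-py code; see the links above for that):

    import requests

    def fetch(cache, doc_type, params):
        # Cache key: a deterministic serialisation of the query.
        key = doc_type + ':' + ','.join(
            '%s=%s' % item for item in sorted(params.items()))
        cached = cache.get(key)  # (last_modified, document) or None
        headers = {}
        if cached is not None:
            headers['If-Modified-Since'] = cached[0]
        r = requests.get('https://onionoo.torproject.org/' + doc_type,
                         params=params, headers=headers)
        if r.status_code == 304:  # unchanged on the server: reuse the cache
            return cached[1]
        doc = r.json()
        cache.set(key, (r.headers.get('Last-Modified'), doc))
        return doc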

I don't really understand what the code does. What is meant by "combining"
documents? What exactly are we trying to measure? Once I know that and
have thought of a sensible way to integrate it into onion-py, I'm
confident I can in fact write that glue code :)

Cutting off the rest of the quote tree here (is that a polite thing to do
on mailing lists? Sorry if not.), I just have two more comments on Roger's
thoughts:

1. Groups of relays taking the challenge together could just form relay
families, and we could count relay families in aggregate. (I'm already
thinking about relay families a lot, because gamambel wants me to overhaul
the torservers exit-funding scripts to use relay families.)
2. If you want to do something with consensus weight, why not compare
against all other "new" relays, based on the first_seen property? ("New"
can be adjusted until sufficiently pretty graphs emerge, and we'd need to
periodically (every 4 or 12 or 24 hours?) fetch the consensus_weights from
Onionoo.) Rough sketch below.
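Something like this could work for idea 2 (a hypothetical sketch using
Onionoo's details documents and its type/fields parameters; the cutoff
date is made up):

    import requests

    def new_relay_weights(cutoff='2014-01-01'):
        # Consensus weight fractions of all relays first seen after `cutoff`.
        r = requests.get(
            'https://onionoo.torproject.org/details',
            params={'type': 'relay', 'fields':
                    'fingerprint,first_seen,consensus_weight_fraction'})
        return {d['fingerprint']: d.get('consensus_weight_fraction', 0)
                for d in r.json().get('relays', [])
                if d.get('first_seen', '') >= cutoff}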

Cheers,
Luke

PS: If you'd like me to support different backends for the caching in
onion-py, I'm open to integrating anything that has a Python 3 library.





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-28 Thread Karsten Loesing
On 27/03/14 19:51, Runa A. Sandvik wrote:
 On Thu, Mar 27, 2014 at 3:25 PM, Karsten Loesing kars...@torproject.org 
 wrote:
 Before going through your list of things we'd want to track below, let's
 first talk about our options to turn a list of fingerprints into fancy
 graphs:
 
 Would it be possible to also have a "Top 10 countries with the most
 Tor relays" graph?

Hi Runa!

Hmm hmm hmm---yes!  Onionoo's details documents contain country
information, and it shouldn't be too hard to combine them with uptime or
bandwidth information to make per-country graphs.
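The counting part could look like this (a quick sketch against Onionoo's
details documents; the country field is a lowercase two-letter code):

    from collections import Counter
    import requests

    r = requests.get('https://onionoo.torproject.org/details',
                     params={'type': 'relay', 'running': 'true',
                             'fields': 'country'})
    counts = Counter(d.get('country', '??')
                     for d in r.json().get('relays', []))
    for country, relay_count in counts.most_common(10):
        print(country, relay_count)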

(Wow, your question made me rethink how we resolve relay/bridge IP
addresses to country codes for statistics.  I was always thinking that
we need to remember the full history of country codes that a
relay/bridge IP address was resolved to, because a relay/bridge could be
moved to another country, or a new IP-to-country database might change
its mind about which country it is in.  But that doesn't really matter
for statistics where we're mostly interested in the big picture.  We can
probably just use whatever country code we learned last and apply that
to the full history of the relay/bridge.  Guess I should resume working
on per-country graphs for the metrics website soon, for both relays and
bridges.  Thanks!)

(Disclaimer: it's pre-second coffee time!)

All the best,
Karsten



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-28 Thread Runa A. Sandvik
On Fri, Mar 28, 2014 at 5:45 AM, Karsten Loesing kars...@torproject.org wrote:
 On 27/03/14 19:51, Runa A. Sandvik wrote:
 On Thu, Mar 27, 2014 at 3:25 PM, Karsten Loesing kars...@torproject.org 
 wrote:
 Before going through your list of things we'd want to track below, let's
 first talk about our options to turn a list of fingerprints into fancy
 graphs:

 Would it be possible to also have a "Top 10 countries with the most
 Tor relays" graph?

 Hi Runa!

Hi Karsten! :)

 Hmm hmm hmm---yes!  Onionoo's details documents contain country
 information, and it shouldn't be too hard to combine them with uptime or
 bandwidth information to make per-country graphs.

 (Wow, your question made me rethink how we resolve relay/bridge IP
 addresses to country codes for statistics.  I was always thinking that
 we need to remember the full history of country codes that a
 relay/bridge IP address was resolved to, because a relay/bridge could be
 moved to another country, or a new IP-to-country database might change
 its mind about which country it is in.  But that doesn't really matter
 for statistics where we're mostly interested in the big picture.  We can
 probably just use whatever country code we learned last and apply that
 to the full history of the relay/bridge.  Guess I should resume working
 on per-country graphs for the metrics website soon, for both relays and
 bridges.  Thanks!)

Great! I look forward to seeing the stats for this.

-- 
Runa A. Sandvik


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-27 Thread Karsten Loesing
On 27/03/14 11:57, Roger Dingledine wrote:
 Hi Christian, other tor relay fans,
 
 I'm looking for some volunteers, hopefully including Christian, to work
 on metrics and visualization of impact from new relays.
 
 We're working with EFF to do another Tor relay challenge [*], to both
 help raise awareness of the value of Tor, and encourage many people to
 run relays -- probably non-exit relays for the most part, since that's
 the easiest for normal volunteers to step up and do.
 
 You can read about the first round from several years ago here:
 https://www.eff.org/torchallenge
 
 To make it succeed, the challenge for us here is to figure out what to
 measure to track progress, and then measure it and graph it for everybody.
 
 I'm figuring that like last time, EFF will collect a list of fingerprints
 of relays that signed up because of the challenge.
 
 One of the main pushes we're aiming for this year is longevity: it's
 easy to sign up a relay for two weeks and then stop. We want to emphasize
 consistency and encourage having the relays up for many months.

Before going through your list of things we'd want to track below, let's
first talk about our options to turn a list of fingerprints into fancy
graphs:

 1. Write a new metrics-web module and put graphs on the metrics
website.  This means parsing relay descriptors and storing certain
per-relay statistics for all relays.  That gives us maximum flexibility
in the kinds of statistics, but is also most expensive in terms of
developer hours.  I don't want to do this.

 2. Extend Globe to show details pages for multiple relays.  This
requires us to move to the server-based Globe-node, because the poor
browser shouldn't download graph data for all relays, but the server
should return a single graph for all relays.  It's also unclear if the
new graphs will be of general interest for Globe users, and if the rest
of the Globe details will be confusing to people interested in the relay
challenge.  Probably not a great idea, but I'm not sure.

 3. Extend Onionoo to return aggregate graph data for a given set of
fingerprints.  Seems useful.  But has the big disadvantage that Onionoo
would suddenly have to create responses dynamically.  I'm worried about
creating a new performance bottleneck there, and this is certainly not
possible with poor overloaded yatei.

 4. Write a new little tool that fetches Onionoo documents once (or
twice) per day for all relays participating in the relay challenge and
that produces graph data.  That new tool could probably re-use some
Compass code for the backend and some Globe code for the frontend.
Graphs could be integrated directly into EFF's website.  This is
currently my favorite approach.
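The fetching part of option 4 could be as simple as this (a hypothetical
sketch using Onionoo's lookup-by-fingerprint parameter):

    import requests

    def fetch_bandwidth_documents(fingerprints):
        # One Onionoo bandwidth document per participating relay.
        docs = {}
        for fp in fingerprints:
            r = requests.get('https://onionoo.torproject.org/bandwidth',
                             params={'lookup': fp})
            relays = r.json().get('relays', [])
            if relays:
                docs[fp] = relays[0]
        return docs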

Note for 2--4: Onionoo currently only gives out data for relays that
have been running in the past 7 days.  I'd have to extend it to give out
all data for a list of fingerprints, regardless of when relays were
running the last time.  That's 2--3 days of coding and testing for me.
It's also potentially creating a bottleneck, so we should first have a
replacement for yatei.

 So what are the things we'd want to track?
 
 - Number of relays signed up that are Running, over time.

We can do something here with Onionoo's new uptime documents.

 - Total bandwidth history of these running relays, over time.

We can sum up data from bandwidth documents for this.

 - Maybe a graph showing the total number of bytes ever contributed
   by these relays? That would impress people perhaps.

Sure, same data as above.

 - Total consensus weight of these running relays, over time.

We only have total consensus weight *fraction*, but yes.

 - Something emphasizing duration -- e.g. the total consensus weight of
   the subset of the relays that have been in the consensus for 90% of
   the past month, 2 months, 6 months, etc. Are there better ideas here
   I hope? We'll want to be cognizant that if we're in the first week
   of the challenge, the 2 month graph will be empty and thus look sad.

Not sure what the 90% part is for, but yes, graphs with total consensus
weight fraction are doable.

Regarding the sad-looking 2-month graph, we can easily define the date
when the challenge starts and not show graphs until they make sense.
Note that the current intervals for most data are 1 week, 1 month, 3
months, 1 year, and 5 years.

 - Something comparing the above numbers to the total numbers. Given how
   huge some of the relays are lately, it would be easy to visualize
   the new contribution as a tiny irrelevant fraction, which could be
   disheartening to new relay operators even if their relays will actually
   become a big deal with some patience. What are some strategies for
   making this work right? E.g. a layer graph showing y layered on top of
   x where y is the new contribution, rather than a percentage-of-total
   graph that shows approximately 0%.

Absolute contributions to consensus weight are not available, just
relative fractions.

 We could also 

Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-27 Thread Runa A. Sandvik
On Thu, Mar 27, 2014 at 3:25 PM, Karsten Loesing kars...@torproject.org wrote:
 Before going through your list of things we'd want to track below, let's
 first talk about our options to turn a list of fingerprints into fancy
 graphs:

Would it be possible to also have a "Top 10 countries with the most
Tor relays" graph?

-- 
Runa A. Sandvik