Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-08 Thread Kostas Jakeliunas
On Wed, Apr 9, 2014 at 4:18 AM, Kostas Jakeliunas wrote:

> On Wed, Apr 9, 2014 at 4:06 AM, Lukas Erlacher  wrote:
>
>> Hi Kostas,
>>
>> right now, we're coding challenger against what exists in debian wheezy,
>> which means version 0.1.2 of the requests lib from the python-requests
>> package you mentioned, where response.json (an attribute, not the
>> response.json() method) is how you get JSON content from the response.
>>
>> If you want to make your own "grab stuff from onionoo" script suite, I'd
>> recommend working with onion-py[1]. It's very new, very spiffy, and uses
>> Python 3 and the newest requests lib. (Full disclosure: it's my baby and
>> I'm desperately looking for testers/users, but that should be obvious to
>> anyone who has read this thread.)
>> Alternatively, convince the right people (presumably Karsten and arma)
>> that challenger should switch to a more sustainable runtime than "what we
>> can get from wheezy's repositories". ;-)
>>
>
> A-ha! :) That makes sense. (fwiw, I used pip under virtualenv in wheezy;
> the requests lib version is ancient indeed; such is life. fwiw, convincing
> wheezy cavepeople to use what you suggest makes sense. It's a false
> dichotomy between 'ensures dependencies' vs. 'breaks dependencies.')
>
> So
>
>   - the timeout stuff might be useful to everyone involved; it's rough
>   - the 'fix' might be useful for people using old 'requests'
>

Actually, I might have that one kind of backwards. So timeout stuff for
everyone (who wants to use things from the
'luk3duk3-onionoo-integration'[2] branch), the 'fix' for *certain* people
(for example, for those using pip.)


>- your onion-py sounds nice
>
> g'day
>
>
>> Cheers,
>> Luke
>>
>> [1] https://github.com/duk3luk3/onion-py
>
>
[2]:
https://github.com/kloesing/challenger/commits/luk3duk3-onionoo-integration


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-08 Thread Kostas Jakeliunas
On Wed, Apr 9, 2014 at 4:06 AM, Lukas Erlacher  wrote:

> Hi Kostas,
>
> right now, we're coding challenger against what exists in debian wheezy,
> which means version 0.1.2 of the requests lib from the python-requests
> package you mentioned, where response.json (an attribute, not the
> response.json() method) is how you get JSON content from the response.
>
> If you want to make your own "grab stuff from onionoo" script suite, I'd
> recommend working with onion-py[1]. It's very new, very spiffy, and uses
> Python 3 and the newest requests lib. (Full disclosure: it's my baby and
> I'm desperately looking for testers/users, but that should be obvious to
> anyone who has read this thread.)
> Alternatively, convince the right people (presumably Karsten and arma)
> that challenger should switch to a more sustainable runtime than "what we
> can get from wheezy's repositories". ;-)
>

A-ha! :) That makes sense. (fwiw, I used pip under virtualenv in wheezy;
the requests lib version is ancient indeed; such is life. fwiw, convincing
wheezy cavepeople to use what you suggest makes sense. It's a false
dichotomy between 'ensures dependencies' vs. 'breaks dependencies.')

So

  - the timeout stuff might be useful to everyone involved; it's rough
  - the 'fix' might be useful for people using old 'requests'
  - your onion-py sounds nice

g'day


> Cheers,
> Luke
>
> [1] https://github.com/duk3luk3/onion-py
>


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-08 Thread Lukas Erlacher
Hi Kostas,

right now, we're coding challenger against what exists in debian wheezy, which
means version 0.1.2 of the requests lib from the python-requests package you
mentioned, where response.json (an attribute, not the response.json() method)
is how you get JSON content from the response.
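
If you ever need code that has to run against both the old and the new
requests API, a tiny wrapper like this works (just a convenience sketch,
nothing challenger actually ships):

    def json_of(response):
        """Return parsed JSON whether response.json is the old attribute
        (requests 0.x, as in wheezy) or the new method (requests >= 1.0)."""
        return response.json() if callable(response.json) else response.json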

If you want to make your own "grab stuff from onionoo" script suite, I'd
recommend working with onion-py[1]. It's very new, very spiffy, and uses Python
3 and the newest requests lib. (Full disclosure: it's my baby and I'm
desperately looking for testers/users, but that should be obvious to anyone who
has read this thread.)
Alternatively, convince the right people (presumably Karsten and arma) that 
challenger should switch to a more sustainable runtime than "what we can get 
from wheezy's repositories". ;-)

Cheers,
Luke

[1] https://github.com/duk3luk3/onion-py





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-08 Thread Kostas Jakeliunas
On Tue, Apr 8, 2014 at 12:59 PM, Karsten Loesing wrote:

> On 05/04/14 17:46, Lukas Erlacher wrote:
> > Hello Nikita, Karsten,
> >
> > On 04/05/2014 05:03 PM, Nikita Borisov wrote:
> >> On Sat, Apr 5, 2014 at 3:58 PM, Karsten Loesing
> >>  wrote:
> >>> Installing packages using Python-specific package managers is
> >>> going to make our sysadmins sad, so we should have a very good
> >>> reason for wanting such a package.  In general, we don't need
> >>> the latest and greatest package.  Unless we do.
> >> What about virtualenv? Part of the premise behind it is that you
> >> can configure appropriate packages as a developer / operator
> >> without having to bother sysadmins and making them worried about
> >> system-wide effects.
> >>
> >> - Nikita
> >
> > I was going to mention virtualenv as well, but I have to admit that
> > I find it weird and scary, especially since I haven't found good
> > documentation for it. If there is somebody who is familiar with
> > virtualenv that would probably be the best solution.
>
> I'm afraid I don't know enough about Python or virtualenv.  So far, it
> was almost zero effort for our sysadmins to install a package from the
> repositories and keep that up-to-date.  I'd like to stick with the
> apt-get approach and save the virtualenv approach for situations when
> we really need a package that is not contained in the repositories.
>
> Thanks for the suggestion, though!
>
> > On 04/05/2014 04:58 PM, Karsten Loesing wrote:
> >> My hope with challenger is that it's written quickly, working
> >> quietly for a year, and then disappearing without anybody
> >> noticing.  I'd rather not want to maintain yet another thing.
> >> So, maybe Weather is a better candidate for using onion-py than
> >> challenger.
> >
> > Yes, I understand.
> >
> >> Yeah, I think we'll want to define a maximum lifetime of cache
> >> entries, or the poor cache will explode pretty soon.
> >
> > What usage patterns do we have to expect? Do we want to hit onionoo
> > to check if the cache is still valid for every request, or should
> > we do "hard caching" for several minutes? The best UX solution
> > would be to have a background task that keeps the cache current so
> > user requests can be delivered without hitting onionoo at all.
>
> That's a fine question.  I can see various caching approaches here.
> But I just realize that this is premature optimization.  Let's first
> build the thing and download whatever we need and whenever we need it.
>  And once we know what caching needs we have, let's build the cache.
>
> > In other words, unless we do something intelligent with the cache,
> > the cache is not actually going to be very useful.
>
> Valid point. :)
>
> >> Great, your help would be much appreciated!  Want to send me a
> >> pull request whenever you have something to merge?
> >
> > Will do.
>
> Great.  Thanks!
>

Hi Karsten and others,

I got to run the challenger script by chance[1], and spotted a small
mistake that was preventing Lukas' onion.py downloader code from working.
Ended up forking and creating a separate branch:

https://github.com/wfn/challenger/commits/wfn_fix_luk3s_download

Relevant commits:

  - 38d88bcb1136f97881f81152d3d883c4e9480188[2] (enables downloader)
  - 39c800643c040474402fc62d2a2db75c25889dfc[3] (this is the one with the
small thingie-fix)

(It was a very small thing with the way the 'requests' module
handles/provides json documents.)

I was doing this to be able to give Roger the 'combined-*.json' files for
currently vulnerable (re: openssl) relays (he wanted to see which part of
the combined weight fraction they comprise, etc.)

Fingerprints for those relays are here, fwiw:
http://ravinesmp.com/volatile/challenger-stuff/vuln_fingerprints.txt (the
original link that Roger gave me was http://fpaste.org/92688/ )
(count: 1024.)

If you download these fingerprints, you can just run `python challenge.py
-f vuln_fingerprints.txt`

(for anyone using virtualenv, you might need to `pip install requests`, and
then things should work. For anyone who's just cloned the thing, everything
should probably work after simply installing the 'requests' python module,
if it's not there. I see that 'python-requests' is available in the repos.)

I guess the code hasn't been tested with that many fingerprints before. Good
news: it works (where 'works' means 'I opened the resulting files and they
contained all those fingerprints, and/or they contained lots of numbers.')
Kinda-bad news: Onionoo doesn't seem to share the enthusiasm; it hiccups and
spits out a 502 Proxy Error some time after the lookups for the first
document (combined bandwidth) are made.

My cheap quick hack was to insert time.sleep() here and there:

  - 7425ef6fc00dedf3b2b7f2649e832fb4c93909ae[4]

(cheap hack is cheap, but it worked. Note: takes time to download
everything. Didn't time it yet - sorry.)
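
Roughly, the hack amounts to this (an illustration of the approach, not the
exact code in that commit):

    import time
    import requests

    def polite_get(url, params=None, pause=2.0, retries=5):
        """Sleep between attempts and retry when Onionoo answers 502 Proxy Error."""
        for attempt in range(1, retries + 1):
            response = requests.get(url, params=params)
            if response.status_code != 502:
                return response
            time.sleep(pause * attempt)   # back off a little more each time
        return response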

For anyone interested, these are the resulting 'combined-*.json' files from
all those fingerprints:

  -
http://ravinesmp.com

Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-08 Thread Karsten Loesing
On 05/04/14 17:46, Lukas Erlacher wrote:
> Hello Nikita, Karsten,
> 
> On 04/05/2014 05:03 PM, Nikita Borisov wrote:
>> On Sat, Apr 5, 2014 at 3:58 PM, Karsten Loesing
>>  wrote:
>>> Installing packages using Python-specific package managers is
>>> going to make our sysadmins sad, so we should have a very good
>>> reason for wanting such a package.  In general, we don't need
>>> the latest and greatest package.  Unless we do.
>> What about virtualenv? Part of the premise behind it is that you
>> can configure appropriate packages as a developer / operator
>> without having to bother sysadmins and making them worried about
>> system-wide effects.
>> 
>> - Nikita
> 
> I was going to mention virtualenv as well, but I have to admit that
> I find it weird and scary, especially since I haven't found good
> documentation for it. If there is somebody who is familiar with
> virtualenv that would probably be the best solution.

I'm afraid I don't know enough about Python or virtualenv.  So far, it
was almost zero effort for our sysadmins to install a package from the
repositories and keep that up-to-date.  I'd like to stick with the
apt-get approach and save the virtualenv approach for situations when
we really need a package that is not contained in the repositories.

Thanks for the suggestion, though!

> On 04/05/2014 04:58 PM, Karsten Loesing wrote:
>> My hope with challenger is that it's written quickly, working
>> quietly for a year, and then disappearing without anybody
>> noticing.  I'd rather not want to maintain yet another thing.
>> So, maybe Weather is a better candidate for using onion-py than
>> challenger.
> 
> Yes, I understand.
> 
>> Yeah, I think we'll want to define a maximum lifetime of cache 
>> entries, or the poor cache will explode pretty soon.
> 
> What usage patterns do we have to expect? Do we want to hit onionoo
> to check if the cache is still valid for every request, or should
> we do "hard caching" for several minutes? The best UX solution
> would be to have a background task that keeps the cache current so
> user requests can be delivered without hitting onionoo at all.

That's a fine question.  I can see various caching approaches here.
But I just realize that this is premature optimization.  Let's first
build the thing and download whatever we need and whenever we need it.
 And once we know what caching needs we have, let's build the cache.

> In other words, unless we do something intelligent with the cache,
> the cache is not actually going to be very useful.

Valid point. :)

>> Great, your help would be much appreciated!  Want to send me a
>> pull request whenever you have something to merge?
> 
> Will do.

Great.  Thanks!

All the best,
Karsten



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-07 Thread Christian
On 07.04.2014 10:43, Karsten Loesing wrote:
> On 06/04/14 21:29, Christian wrote:
>> On 04.04.2014 19:13, Karsten Loesing wrote:
>>> Christian, Lukas, everyone,
>>>
>>> I learned today that we should have something working in a week or two.
>>>  That's why I started hacking on this today and produced some code:
>>>
>>> https://github.com/kloesing/challenger
>>>
>>> Here are a few things I could use help with:
>>>
>>>  - Anybody want to help turning this script into a web app, possibly
>>> using Flask?  See the first next step in README.md.
>>>
>>>  - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
>>> into the "Add local cache for ..." bullet points under "Next steps"?  Is
>>> this something OnionPy could support?  Want to write the glue code?
>>>
>>>  - Christian, want to help write the graphing code that visualizes the
>>> `combined-*.json` files produced by that tool?  The README.md suggests a
>>> few possible graphs.
>>>
>>
>> Sure,
>> should I create a new repo for the website with graphing code or work
>> directly in the kloesing/challenger repository?
> 
> My hope is that we can turn my script into a Flask web app which serves
> JSON data which is then graphed by your JavaScript that is embedded into
> the HTML.  So it probably makes sense to have everything in a single
> repository.  I'd say feel free to clone kloesing/challenger and send me
> pull requests.  And feel free to create new directories as needed, we
> can still move around things later.
> 

I sent you a pull request with the first working version:
https://github.com/kloesing/challenger/pull/2 .
The UI is temporary, but it works so far.

> All the best,
> Karsten
> 



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-07 Thread Karsten Loesing
On 06/04/14 21:29, Christian wrote:
> On 04.04.2014 19:13, Karsten Loesing wrote:
>> Christian, Lukas, everyone,
>>
>> I learned today that we should have something working in a week or two.
>>  That's why I started hacking on this today and produced some code:
>>
>> https://github.com/kloesing/challenger
>>
>> Here are a few things I could use help with:
>>
>>  - Anybody want to help turning this script into a web app, possibly
>> using Flask?  See the first next step in README.md.
>>
>>  - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
>> into the "Add local cache for ..." bullet points under "Next steps"?  Is
>> this something OnionPy could support?  Want to write the glue code?
>>
>>  - Christian, want to help write the graphing code that visualizes the
>> `combined-*.json` files produced by that tool?  The README.md suggests a
>> few possible graphs.
>>
> 
> Sure,
> should I create a new repo for the website with graphing code or work
> directly in the kloesing/challenger repository?

My hope is that we can turn my script into a Flask web app which serves
JSON data which is then graphed by your JavaScript that is embedded into
the HTML.  So it probably makes sense to have everything in a single
repository.  I'd say feel free to clone kloesing/challenger and send me
pull requests.  And feel free to create new directories as needed, we
can still move around things later.

All the best,
Karsten



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-06 Thread Christian
On 04.04.2014 19:13, Karsten Loesing wrote:
> Christian, Lukas, everyone,
> 
> I learned today that we should have something working in a week or two.
>  That's why I started hacking on this today and produced some code:
> 
> https://github.com/kloesing/challenger
> 
> Here are a few things I could use help with:
> 
>  - Anybody want to help turning this script into a web app, possibly
> using Flask?  See the first next step in README.md.
> 
>  - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
> into the "Add local cache for ..." bullet points under "Next steps"?  Is
> this something OnionPy could support?  Want to write the glue code?
> 
>  - Christian, want to help write the graphing code that visualizes the
> `combined-*.json` files produced by that tool?  The README.md suggests a
> few possible graphs.
> 

Sure,
should I create a new repo for the website with graphing code or work
directly in the kloesing/challenger repository?


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Lukas Erlacher
On 04/05/2014 04:58 PM, Karsten Loesing wrote:
> Great, your help would be much appreciated!  Want to send me a pull
> request whenever you have something to merge?
>
>
Alright, so I wrote a few lines and sent you a pull request. Could you please 
check if that downloads the data you expect?
And when we know what exactly we want to cache and how, I'll add the logic for 
that.

Cheers,
Luke





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Lukas Erlacher
Hello Nikita, Karsten,

On 04/05/2014 05:03 PM, Nikita Borisov wrote:
> On Sat, Apr 5, 2014 at 3:58 PM, Karsten Loesing  
> wrote:
>> Installing packages using Python-specific package managers is going to
>> make our sysadmins sad, so we should have a very good reason for
>> wanting such a package.  In general, we don't need the latest and
>> greatest package.  Unless we do.
> What about virtualenv? Part of the premise behind it is that you can
> configure appropriate packages as a developer / operator without
> having to bother sysadmins and making them worried about system-wide
> effects.
>
> - Nikita

I was going to mention virtualenv as well, but I have to admit that I find it 
weird and scary, especially since I haven't found good documentation for it. If 
there is somebody who is familiar with virtualenv that would probably be the 
best solution.

On 04/05/2014 04:58 PM, Karsten Loesing wrote:
> My hope with challenger is that it's written quickly, working quietly
> for a year, and then disappearing without anybody noticing.  I'd
> rather not want to maintain yet another thing.  So, maybe Weather is a
> better candidate for using onion-py than challenger.

Yes, I understand.
> Yeah, I think we'll want to define a maximum lifetime of cache
> entries, or the poor cache will explode pretty soon.

What usage patterns do we have to expect? Do we want to hit onionoo to check if 
the cache is still valid for every request, or should we do "hard caching" for 
several minutes? The best UX solution would be to have a background task that 
keeps the cache current so user requests can be delivered without hitting 
onionoo at all.
In other words, unless we do something intelligent with the cache, the cache is 
not actually going to be very useful.
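
For the background-task variant, I'm thinking of something along these lines
(sketch only; `cache` is any dict-like object and `fetch` is whatever grabs
the documents from onionoo):

    import threading

    def keep_cache_warm(cache, fetch, interval=300):
        """Refresh the cache every `interval` seconds so user requests
        never have to wait for onionoo."""
        cache.update(fetch())
        timer = threading.Timer(interval, keep_cache_warm, (cache, fetch, interval))
        timer.daemon = True   # don't keep the process alive just for the cache
        timer.start()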

> Great, your help would be much appreciated!  Want to send me a pull
> request whenever you have something to merge?

Will do.

Cheers,
Luke





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Karsten Loesing
On 05/04/14 16:42, Nikita Borisov wrote:
> On Sat, Apr 5, 2014 at 8:58 AM, Karsten Loesing  
> wrote:
>> Right now, the script sums up all graphs contained in Onionoo's
>> bandwidth, clients, uptime, and weights documents.  It also limits the
>> range of the new graphs to max(first) to max(last) of given input graphs.
>>
>> For example, assume we want to know the total bandwidth provided by the
>> following 2 relays participating in the relay challenge:
>>
>> datetime:   0, 1, 2, 3, 4, 5, ...
>>
>> relay 1:      [5, 4, 5, 6]
>> relay 2:   [4, 3, 5, 4]
>>
>> combined:     [8, 9, 9, 6]
>>
>> This is not perfect for various reasons, but it's the best I came up
>> with yesterday.  Also, as we all know, perfect is the enemy of good.
>>
>> (If you're curious, reason #1: the graph goes down at the end, and we
>> can't say whether it's because relay 2 disappeared or did not report
>> data yet; reason #2: we're weighting both relays' B/s equally, though
>> relay 1 might have been online 24/7 and relay 2 only long enough that
>> Onionoo doesn't put in null; there may be more reasons.)
> 
> For the relay challenge, wouldn't you want to include the entire
> period that data is available for (i.e., min(first) to max(last))?
> Otherwise, if you are looking at a month's worth of data and a new
> relay arrives on the last day, your graph would only contain that day.

Very good point!

The reason why I didn't include everything from min(first) to max(last)
is that any graph covers the last $time_period of the relay or bridge
being online and reporting data.  So, the "3_days" graph of a specific
relay could show a 3-day period weeks ago, and we wouldn't want to merge
that with other 3-day periods which are more recent.  Of course, you're
right that a new relay covering only a few hours in its "3_days" graph
would reduce our combined graph to just that.  Oops.

So, I guess what we want to do is include everything from $(now - 3
days) to $now in the combined graph.  Will fix.
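
In other words, something like this (just sketching the intent, not the
actual challenge.py change):

    from datetime import datetime, timedelta

    def combined_graph_interval(days=3):
        """Interval covered by a combined "3_days" graph: the last N days, ending now."""
        last = datetime.utcnow()
        first = last - timedelta(days=days)
        return first, last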

> Also, I think you would want to do datetime.strptime(max(first), ...)
> here: 
> https://github.com/kloesing/challenger/blob/master/challenge.py#L177-L178
> Otherwise you're just taking the last relay's first and last values as
> the new_first and new_last.

Another very good point.  Will fix.

Thanks for the review!

All the best,
Karsten



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Karsten Loesing
On 05/04/14 12:19, Lukas Erlacher wrote:
> Hi Karsten,
> 
> On 04/05/2014 09:58 AM, Karsten Loesing wrote:
>> On second thought, and after sleeping over this, I'm less
>> convinced that we should use an external library for the caching.
>> We should rather start with a simple dict in memory and flush it
>> based on some simple rules. That would allow us to tweak the
>> caching specifically for our use case. And it would mean avoiding
>> a dependency. We can think about moving to onion-py at a later
>> point. That gives you the opportunity to unspaghettize your code,
>> and once that is done we'll have a better idea what caching needs
>> we have for the challenger tool to decide whether to move to
>> onion-py or not. Would you still want to help write the simple
>> caching code for challenger?
> 
> I cleaned up the caching code and added a simple in-memory dict
> caching provider that has no further dependencies to onion-py. (it
> also has no provisions for eviction/flushing at all, but I will add
> that next. Right now everything is cached forever, but of course a
> new response from OnionOO replaces an old one.)

Yeah, I think we'll want to define a maximum lifetime of cache
entries, or the poor cache will explode pretty soon.
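
Something as simple as this would probably do for a start (a sketch, not
actual challenger code):

    import time

    class SimpleCache:
        """Tiny in-memory cache with a maximum entry lifetime."""

        def __init__(self, max_age=3600):
            self.max_age = max_age   # seconds
            self.entries = {}        # key -> (timestamp, value)

        def get(self, key):
            entry = self.entries.get(key)
            if entry is None:
                return None
            timestamp, value = entry
            if time.time() - timestamp > self.max_age:
                del self.entries[key]   # expired, evict it
                return None
            return value

        def put(self, key, value):
            self.entries[key] = (time.time(), value)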

> I can write the OnionOO API code and caching code for challenger,
> if I can use Python 3 and the requests library. (See below)

Great, your help would be much appreciated!  Want to send me a pull
request whenever you have something to merge?

See my response regarding Python 3 below.

> Of course I'd really like to actually have a user for onion-py,
> since it would help getting the necessary feedback and polish to
> push the library to version 1.0, but I understand if that isn't
> appropriate for this project.

My hope with challenger is that it's written quickly, working quietly
for a year, and then disappearing without anybody noticing.  I'd
rather not want to maintain yet another thing.  So, maybe Weather is a
better candidate for using onion-py than challenger.

>>> I don't really understand what the code does. What is meant by 
>>> "combining" documents? What exactly are we trying to measure?
>>> Once I know that and have thought of a sensible way to
>>> integrate it into onion-py I'm confident I can in fact write
>>> that glue code :)
>> Right now, the script sums up all graphs contained in Onionoo's 
>> bandwidth, clients, uptime, and weights documents.  It also
>> limits the range of the new graphs to max(first) to max(last) of
>> given input graphs.
>> 
>> For example, assume we want to know the total bandwidth provided
>> by the following 2 relays participating in the relay challenge:
>> 
>> datetime:   0, 1, 2, 3, 4, 5, ...
>>
>> relay 1:      [5, 4, 5, 6]
>> relay 2:   [4, 3, 5, 4]
>>
>> combined:     [8, 9, 9, 6]
>> 
>> This is not perfect for various reasons, but it's the best I came
>> up with yesterday.  Also, as we all know, perfect is the enemy of
>> good.
>> 
>> (If you're curious, reason #1: the graph goes down at the end,
>> and we can't say whether it's because relay 2 disappeared or did
>> not report data yet; reason #2: we're weighting both relays' B/s
>> equally, though relay 1 might have been online 24/7 and relay 2
>> only long enough that Onionoo doesn't put in null; there may be
>> more reasons.)
> 
> Ah, I see! :) So for scalar attributes of relays (such as
> consensus_weight_fraction) it's just a sum, and for histories it's
> the graphs combined as you just outlined. That makes sense, thank
> you!

Right.  Though details documents are not included, so just graphs, no
scalar attributes.

>> I'm not also sure about Python 3.  Whatever we write needs to run
>> on Debian Wheezy with whatever libraries are present there.  If
>> they're all Python 3, great.  If not, can't do.
> 
> I would strongly prefer to use Python 3. I understand wanting to
> use debian stable (I use it myself), but Python 3 is 6 years old
> and Python 2 is completely dead and its use for new projects is not
> recommended. The only mandatory dependency for onion-py, and for
> me, is requests (I really dislike using urllib* directly - if you
> want to know why, check
> https://gist.github.com/kennethreitz/973705), and the
> python3-requests package in Wheezy is from 2012, and there is no
> python3-flask. :-(
> 
> Is there anything standing against using pip (python3-pip package)
> to install requests and flask from pypi?

If there's a way to build it only with packages coming out of Wheezy's
apt-get, our sysadmins will like us more, and that's a good thing.

Installing packages using Python-specific package managers is going to
make our sysadmins sad, so we should have a very good reason for
wanting such a package.  In general, we don't need the latest and
greatest package.  Unless we do.

All the best,
Karsten



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Nikita Borisov
On Sat, Apr 5, 2014 at 3:58 PM, Karsten Loesing  wrote:
> Installing packages using Python-specific package managers is going to
> make our sysadmins sad, so we should have a very good reason for
> wanting such a package.  In general, we don't need the latest and
> greatest package.  Unless we do.

What about virtualenv? Part of the premise behind it is that you can
configure appropriate packages as a developer / operator without
having to bother sysadmins and making them worried about system-wide
effects.

- Nikita
-- 
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Nikita Borisov
On Sat, Apr 5, 2014 at 8:58 AM, Karsten Loesing  wrote:
> Right now, the script sums up all graphs contained in Onionoo's
> bandwidth, clients, uptime, and weights documents.  It also limits the
> range of the new graphs to max(first) to max(last) of given input graphs.
>
> For example, assume we want to know the total bandwidth provided by the
> following 2 relays participating in the relay challenge:
>
> datetime:  0, 1, 2, 3, 4, 5, ...
>
> relay 1: [5, 4, 5, 6]
> relay 2:  [4, 3, 5, 4]
>
> combined:[8, 9, 9, 6]
>
> This is not perfect for various reasons, but it's the best I came up
> with yesterday.  Also, as we all know, perfect is the enemy of good.
>
> (If you're curious, reason #1: the graph goes down at the end, and we
> can't say whether it's because relay 2 disappeared or did not report
> data yet; reason #2: we're weighting both relays' B/s equally, though
> relay 1 might have been online 24/7 and relay 2 only long enough that
> Onionoo doesn't put in null; there may be more reasons.)

For the relay challenge, wouldn't you want to include the entire
period that data is available for (i.e., min(first) to max(last))?
Otherwise, if you are looking at a month's worth of data and a new
relay arrives on the last day, your graph would only contain that day.

Also, I think you would want to do datetime.strptime(max(first), ...)
here: https://github.com/kloesing/challenger/blob/master/challenge.py#L177-L178
Otherwise you're just taking the last relay's first and last values as
the new_first and new_last.
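
That is, something along these lines (the names here are made up, not the
actual variables in challenge.py, and I'm assuming Onionoo's usual
"YYYY-MM-DD hh:mm:ss" timestamps):

    from datetime import datetime

    FMT = "%Y-%m-%d %H:%M:%S"
    histories = [   # one entry per relay; sample data
        {"first": "2014-03-01 00:00:00", "last": "2014-04-04 00:00:00"},
        {"first": "2014-03-20 00:00:00", "last": "2014-04-05 00:00:00"},
    ]
    new_first = datetime.strptime(max(h["first"] for h in histories), FMT)
    new_last = datetime.strptime(max(h["last"] for h in histories), FMT)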

Cheers,
- Nikita
-- 
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Lukas Erlacher
Hi Karsten,

On 04/05/2014 09:58 AM, Karsten Loesing wrote:
> On second thought, and after sleeping over this, I'm less convinced that we 
> should use an external library for the caching. We should rather start with a 
> simple dict in memory and flush it based on some simple rules. That would 
> allow us to tweak the caching specifically for our use case. And it would 
> mean avoiding a dependency. We can think about moving to onion-py at a later 
> point. That gives you the opportunity to unspaghettize your code, and once 
> that is done we'll have a better idea what caching needs we have for the 
> challenger tool to decide whether to move to onion-py or not. Would you still 
> want to help write the simple caching code for challenger? 
I cleaned up the caching code and added a simple in-memory dict caching 
provider that has no further dependencies to onion-py. (it also has no 
provisions for eviction/flushing at all, but I will add that next. Right now 
everything is cached forever, but of course a new response from OnionOO 
replaces an old one.)

I can write the OnionOO API code and caching code for challenger, if I can use 
Python 3 and the requests library. (See below)
Of course I'd really like to actually have a user for onion-py, since it would 
help getting the necessary feedback and polish to push the library to version 
1.0, but I understand if that isn't appropriate for this project.
>>  I don't really understand what the code does. What is meant by
>> "combining" documents? What exactly are we trying to measure? Once I
>> know that and have thought of a sensible way to integrate it into
>> onion-py I'm confident I can infact write that glue code :)
> Right now, the script sums up all graphs contained in Onionoo's
> bandwidth, clients, uptime, and weights documents.  It also limits the
> range of the new graphs to max(first) to max(last) of given input graphs.
>
> For example, assume we want to know the total bandwidth provided by the
> following 2 relays participating in the relay challenge:
>
> datetime:   0, 1, 2, 3, 4, 5, ...
>
> relay 1:      [5, 4, 5, 6]
> relay 2:   [4, 3, 5, 4]
>
> combined:     [8, 9, 9, 6]
>
> This is not perfect for various reasons, but it's the best I came up
> with yesterday.  Also, as we all know, perfect is the enemy of good.
>
> (If you're curious, reason #1: the graph goes down at the end, and we
> can't say whether it's because relay 2 disappeared or did not report
> data yet; reason #2: we're weighting both relays' B/s equally, though
> relay 1 might have been online 24/7 and relay 2 only long enough that
> Onionoo doesn't put in null; there may be more reasons.)
Ah, I see! :) So for scalar attributes of relays (such as 
consensus_weight_fraction) it's just a sum, and for histories it's the graphs 
combined as you just outlined. That makes sense, thank you!
> I'm not also sure about Python 3.  Whatever we write needs to run on
> Debian Wheezy with whatever libraries are present there.  If they're all
> Python 3, great.  If not, can't do.

I would strongly prefer to use Python 3. I understand wanting to use debian 
stable (I use it myself), but Python 3 is 6 years old and Python 2 is 
completely dead and its use for new projects is not recommended.
The only mandatory dependency for onion-py, and for me, is requests (I really 
dislike using urllib* directly - if you want to know why, check 
https://gist.github.com/kennethreitz/973705), and the python3-requests package 
in Wheezy is from 2012, and there is no python3-flask. :-(

Is there anything standing against using pip (python3-pip package) to install 
requests and flask from pypi?
>
> Thanks for your feedback!
>
> All the best,
> Karsten
Cheers,
Luke





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-05 Thread Karsten Loesing
On 04/04/14 21:24, Lukas Erlacher wrote:
> Hello everyone (reply all ftw),

Hi Lukas,

> On 04/04/2014 07:13 PM, Karsten Loesing wrote:
>> Christian, Lukas, everyone,
>> 
>> I learned today that we should have something working in a week or
>> two. That's why I started hacking on this today and produced some
>> code:
>> 
>> https://github.com/kloesing/challenger
>> 
>> Here are a few things I could use help with:
>> 
>> - Anybody want to help turning this script into a web app,
>> possibly using Flask?  See the first next step in README.md.
>
> I might be able to do that, but currently I don't have enough free
> time to make a commitment.

Okay.  Maybe I'll give it a try by stealing heavily from Sathya's
Compass code.  Unless somebody else wants to give this a try?

>> - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to
>> look into the "Add local cache for ..." bullet points under "Next
>> steps"?  Is this something OnionPy could support?  Want to write
>> the glue code?
>
> onion-py already supports transparent caching using memcached. I use
> a (hopefully) unique serialisation of the query as the key (see
> serializer functions here:
> https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L7)
> and have a bit of spaghetti code to check for available cached data
> and the 304 response status from onionoo
> (https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L97).

On second thought, and after sleeping over this, I'm less convinced that
we should use an external library for the caching.  We should rather
start with a simple dict in memory and flush it based on some simple
rules.  That would allow us to tweak the caching specifically for our
use case.  And it would mean avoiding a dependency.

We can think about moving to onion-py at a later point.  That gives you
the opportunity to unspaghettize your code, and once that is done we'll
have a better idea what caching needs we have for the challenger tool to
decide whether to move to onion-py or not.

Would you still want to help write the simple caching code for challenger?

>  I don't really understand what the code does. What is meant by
> "combining" documents? What exactly are we trying to measure? Once I
> know that and have thought of a sensible way to integrate it into
> onion-py I'm confident I can in fact write that glue code :)

Right now, the script sums up all graphs contained in Onionoo's
bandwidth, clients, uptime, and weights documents.  It also limits the
range of the new graphs to max(first) to max(last) of given input graphs.

For example, assume we want to know the total bandwidth provided by the
following 2 relays participating in the relay challenge:

datetime:   0, 1, 2, 3, 4, 5, ...

relay 1:      [5, 4, 5, 6]
relay 2:   [4, 3, 5, 4]

combined:     [8, 9, 9, 6]

This is not perfect for various reasons, but it's the best I came up
with yesterday.  Also, as we all know, perfect is the enemy of good.

(If you're curious, reason #1: the graph goes down at the end, and we
can't say whether it's because relay 2 disappeared or did not report
data yet; reason #2: we're weighting both relays' B/s equally, though
relay 1 might have been online 24/7 and relay 2 only long enough that
Onionoo doesn't put in null; there may be more reasons.)
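
In code, the combination is roughly this (a toy version just to make the
example above concrete; the real script works on Onionoo's history objects,
not plain lists):

    def combine(series):
        """Sum graph lines given as (first, values) pairs on a shared time axis,
        restricted to the max(first)..max(last) range described above."""
        start = max(first for first, values in series)
        end = max(first + len(values) - 1 for first, values in series)
        combined = [0] * (end - start + 1)
        for first, values in series:
            for i, value in enumerate(values):
                if start <= first + i <= end:
                    combined[first + i - start] += value
        return combined

    relay1 = (1, [5, 4, 5, 6])   # data points for datetimes 1..4
    relay2 = (0, [4, 3, 5, 4])   # data points for datetimes 0..3
    print(combine([relay1, relay2]))   # [8, 9, 9, 6]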

> Cutting off the rest of the quote tree here (is that a polite thing
> to do on mailing lists? Sorry if not.), I just have two more comments
> towards Roger's thoughts:
> 
> 1. Groups of relays taking the challenge together could just form
> relay families and we could count relay families in aggregate. (I'm
> already thinking about relay families a lot because gamambel wants me
> to overhaul the torservers exit-funding scripts to use relay
> families.)

Relay families are a difficult topic.  I remember spending a day or two
figuring out how to group by family in Compass a while back.  There must
be some notes or thoughts on Trac if you're curious.

Regarding these graphs, I'm not sure what we would gain from grouping
new relays by family.  My current plan is to provide only graphs that
have a single graph line for all relays and bridges participating in the
challenge.  So, "total bytes read", "total bytes written", "total number
of new relays and bridges", "total consensus weight fraction added",
"total advertised bandwidth added", etc.  I don't think we should add
categories by family or any other criteria.  KISS.

> 2. If you want to do something with consensus weight, why
> not compare against all other new relays based on the first_seen
> property? ("new" can be adjusted until sufficiently pretty graphs
> emerge; and we'd need to periodically (every 4 or 12 or 24 hours?)
> fetch the consensus_weights from onionoo)

I'm not sure what you mean.  We do have consensus weight fractions in
(combined) weights documents.  I'm also planning to add absolute
consensus weights to those documents in the future.

By "fetching something periodically from Onionoo", do you mean 

Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-04 Thread Lukas Erlacher
Hello everyone (reply all ftw),

On 04/04/2014 07:13 PM, Karsten Loesing wrote:
> Christian, Lukas, everyone,
>
> I learned today that we should have something working in a week or two.
>  That's why I started hacking on this today and produced some code:
>
> https://github.com/kloesing/challenger
>
> Here are a few things I could use help with:
>
>  - Anybody want to help turning this script into a web app, possibly
> using Flask?  See the first next step in README.md.
I might be able to do that, but currently I don't have enough free time to make 
a commitment.
>  - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
> into the "Add local cache for ..." bullet points under "Next steps"?  Is
> this something OnionPy could support?  Want to write the glue code?
onion-py already supports transparent caching using memcached. I use a 
(hopefully) unique serialisation of the query as the key (see serializer 
functions here: 
https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L7) and 
have a bit of spaghetti code to check for available cached data and the 304 
response status from onionoo 
(https://github.com/duk3luk3/onion-py/blob/master/onion_py/manager.py#L97).
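
Roughly, a lookup goes like this (a much simplified sketch of what the
manager does, not the real onion-py code; `cache` is anything with
get()/set()):

    import requests

    def cache_key(doc_type, **params):
        """Serialise a query deterministically so it can serve as a cache key."""
        return doc_type + ":" + ",".join("%s=%s" % kv for kv in sorted(params.items()))

    def get_document(cache, doc_type, **params):
        key = cache_key(doc_type, **params)
        cached = cache.get(key)   # (last_modified, document) or None
        headers = {}
        if cached is not None:
            headers["If-Modified-Since"] = cached[0]
        r = requests.get("https://onionoo.torproject.org/" + doc_type,
                         params=params, headers=headers)
        if r.status_code == 304:    # not modified, serve from the cache
            return cached[1]
        document = r.json()
        cache.set(key, (r.headers.get("Last-Modified"), document))
        return document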

I don't really understand what the code does. What is meant by "combining" 
documents? What exactly are we trying to measure? Once I know that and have 
thought of a sensible way to integrate it into onion-py I'm confident I can 
in fact write that glue code :)

Cutting off the rest of the quote tree here (is that a polite thing to do on 
mailing lists? Sorry if not.), I just have two more comments towards Roger's 
thoughts:

1. Groups of relays taking the challenge together could just form relay 
families and we could count relay families in aggregate. (I'm already thinking 
about relay families a lot because gamambel wants me to overhaul the torservers 
exit-funding scripts to use relay families.)
2. If you want to do something with consensus weight, why not compare against 
all other new relays based on the first_seen property? ("new" can be adjusted 
until sufficiently pretty graphs emerge; and we'd need to periodically (every 4 
or 12 or 24 hours?) fetch the consensus_weights from onionoo)

Cheers,
Luke

PS: If you'd like me to support different backends for the caching in onion-py, 
I'm open to integrating anything that has a python 3 library.





Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-04-04 Thread Karsten Loesing
Christian, Lukas, everyone,

I learned today that we should have something working in a week or two.
 That's why I started hacking on this today and produced some code:

https://github.com/kloesing/challenger

Here are a few things I could use help with:

 - Anybody want to help turning this script into a web app, possibly
using Flask?  See the first next step in README.md.

 - Lukas, you announced OnionPy on tor-dev@ the other day.  Want to look
into the "Add local cache for ..." bullet points under "Next steps"?  Is
this something OnionPy could support?  Want to write the glue code?

 - Christian, want to help write the graphing code that visualizes the
`combined-*.json` files produced by that tool?  The README.md suggests a
few possible graphs.

Thanks in advance!  You're all helping grow the Tor network!

Also replying to Christian's mail inline.

On 28/03/14 09:07, Christian wrote:
> On 27.03.2014 16:25, Karsten Loesing wrote:
>> On 27/03/14 11:57, Roger Dingledine wrote:
>>> Hi Christian, other tor relay fans,
>>>
>>> I'm looking for some volunteers, hopefully including Christian, to work
>>> on metrics and visualization of impact from new relays.
>>>
>>> We're working with EFF to do another "Tor relay challenge" [*], to both
>>> help raise awareness of the value of Tor, and encourage many people to
>>> run relays -- probably non-exit relays for the most part, since that's
>>> the easiest for normal volunteers to step up and do.
>>>
>>> You can read about the first round from several years ago here:
>>> https://www.eff.org/torchallenge
>>>
>>> To make it succeed, the challenge for us here is to figure out what to
>>> measure to track progress, and then measure it and graph it for everybody.
>>>
>>> I'm figuring that like last time, EFF will collect a list of fingerprints
>>> of relays that signed up "because of the challenge".
>>>
>>> One of the main pushes we're aiming for this year is longevity: it's
>>> easy to sign up a relay for two weeks and then stop. We want to emphasize
>>> consistency and encourage having the relays up for many months.
> 
> Do you want the challenge application to simply provide some graphs or
> give some sort of interactive dashboard (clientside JavaScript)?

You asked Roger, and I'm not Roger, but I'd say let's start with some
graphs.  We can always make it more interactive later.  Though I doubt
it will be necessary.

>> Before going through your list of things we'd want to track below, let's
>> first talk about our options to turn a list of fingerprints into fancy
>> graphs:
>>
>>  1. Write a new metrics-web module and put graphs on the metrics
>> website.  This means parsing relay descriptors and storing certain
>> per-relay statistics for all relays.  That gives us maximum flexibility
>> in the kinds of statistics, but is also most expensive in terms of
>> developer hours.  I don't want to do this.
>>
>>  2. Extend Globe to show details pages for multiple relays.  This
>> requires us to move to the server-based Globe-node, because the poor
>> browser shouldn't download graph data for all relays, but the server
>> should return a single graph for all relays.  It's also unclear if the
>> new graphs will be of general interest for Globe users, and if the rest
>> of the Globe details will be confusing to people interested in the relay
>> challenge.  Probably not a great idea, but I'm not sure.
>>
> 
> I agree that Globe isn't the best place to display the challenge graphs.
> Currently the only focus for Globe is to provide data for single relays
> and bridges.
> Imo it would be better if the challenge participants list adds links to
> atlas, blutmagie and globe.

Agreed!

>>  3. Extend Onionoo to return aggregate graph data for a given set of
>> fingerprints.  Seems useful.  But has the big disadvantage that Onionoo
>> would suddenly have to create responses dynamically.  I'm worried about
>> creating a new performance bottleneck there, and this is certainly not
>> possible with poor overloaded yatei.
>>
>>  4. Write a new little tool that fetches Onionoo documents once (or
>> twice) per day for all relays participating in the relay challenge and
>> that produces graph data.  That new tool could probably re-use some
>> Compass code for the backend and some Globe code for the frontend.
>> Graphs could be integrated directly into EFF's website.  This is
>> currently my favorite approach.
>>
> 
> I like this idea.

Glad to hear!  I slightly moved away from the "fetches once or twice per
day" idea to a more elaborate approach.  But the general idea is still
the same.

>> Note for 2--4: Onionoo currently only gives out data for relays that
>> have been running in the past 7 days.  I'd have to extend it to give out
>> all data for a list of fingerprints, regardless of when relays were
>> running the last time.  That's 2--3 days of coding and testing for me.
>> It's also potentially creating a bottleneck, so we should first have a
>> replacement for yatei.
>>
>>> So what are the

Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-28 Thread Runa A. Sandvik
On Fri, Mar 28, 2014 at 5:45 AM, Karsten Loesing  wrote:
> On 27/03/14 19:51, Runa A. Sandvik wrote:
>> On Thu, Mar 27, 2014 at 3:25 PM, Karsten Loesing  
>> wrote:
>>> Before going through your list of things we'd want to track below, let's
>>> first talk about our options to turn a list of fingerprints into fancy
>>> graphs:
>>
>> Would it be possible to also have a "Top 10 countries with the most
>> Tor relays" graph?
>
> Hi Runa!

Hi Karsten! :)

> Hmm hmm hmm---yes!  Onionoo's details documents contain country
> information, and it shouldn't be too hard to combine them with uptime or
> bandwidth information to make per-country graphs.
>
> (Wow, your question made me rethink how we resolve relay/bridge IP
> addresses to country codes for statistics.  I was always thinking that
> we need to remember the full history of country codes that a
> relay/bridge IP address was resolved to, because a relay/bridge could be
> moved to another country, or a new IP-to-country database might change
> its mind about which country it is in.  But that doesn't really matter
> for statistics where we're mostly interested in the big picture.  We can
> probably just use whatever country code we learned last and apply that
> to the full history of the relay/bridge.  Guess I should resume working
> on per-country graphs for the metrics website soon, for both relays and
> bridges.  Thanks!)

Great! I look forward to seeing the stats for this.

-- 
Runa A. Sandvik


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-27 Thread Karsten Loesing
On 27/03/14 19:51, Runa A. Sandvik wrote:
> On Thu, Mar 27, 2014 at 3:25 PM, Karsten Loesing  
> wrote:
>> Before going through your list of things we'd want to track below, let's
>> first talk about our options to turn a list of fingerprints into fancy
>> graphs:
> 
> Would it be possible to also have a "Top 10 countries with the most
> Tor relays" graph?

Hi Runa!

Hmm hmm hmm---yes!  Onionoo's details documents contain country
information, and it shouldn't be too hard to combine them with uptime or
bandwidth information to make per-country graphs.
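
Quick sketch of the counting part, using Onionoo's details documents
(parameter and field names are from memory, so double-check before relying
on them):

    from collections import Counter
    import requests

    details = requests.get("https://onionoo.torproject.org/details",
                           params={"running": "true", "fields": "country"}).json()
    top10 = Counter(relay.get("country", "??")
                    for relay in details["relays"]).most_common(10)
    print(top10)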

(Wow, your question made me rethink how we resolve relay/bridge IP
addresses to country codes for statistics.  I was always thinking that
we need to remember the full history of country codes that a
relay/bridge IP address was resolved to, because a relay/bridge could be
moved to another country, or a new IP-to-country database might change
its mind about which country it is in.  But that doesn't really matter
for statistics where we're mostly interested in the big picture.  We can
probably just use whatever country code we learned last and apply that
to the full history of the relay/bridge.  Guess I should resume working
on per-country graphs for the metrics website soon, for both relays and
bridges.  Thanks!)

(Disclaimer: it's pre-second coffee time!)

All the best,
Karsten



Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-27 Thread Runa A. Sandvik
On Thu, Mar 27, 2014 at 3:25 PM, Karsten Loesing  wrote:
> Before going through your list of things we'd want to track below, let's
> first talk about our options to turn a list of fingerprints into fancy
> graphs:

Would it be possible to also have a "Top 10 countries with the most
Tor relays" graph?

-- 
Runa A. Sandvik


Re: [tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-27 Thread Karsten Loesing
On 27/03/14 11:57, Roger Dingledine wrote:
> Hi Christian, other tor relay fans,
> 
> I'm looking for some volunteers, hopefully including Christian, to work
> on metrics and visualization of impact from new relays.
> 
> We're working with EFF to do another "Tor relay challenge" [*], to both
> help raise awareness of the value of Tor, and encourage many people to
> run relays -- probably non-exit relays for the most part, since that's
> the easiest for normal volunteers to step up and do.
> 
> You can read about the first round from several years ago here:
> https://www.eff.org/torchallenge
> 
> To make it succeed, the challenge for us here is to figure out what to
> measure to track progress, and then measure it and graph it for everybody.
> 
> I'm figuring that like last time, EFF will collect a list of fingerprints
> of relays that signed up "because of the challenge".
> 
> One of the main pushes we're aiming for this year is longevity: it's
> easy to sign up a relay for two weeks and then stop. We want to emphasize
> consistency and encourage having the relays up for many months.

Before going through your list of things we'd want to track below, let's
first talk about our options to turn a list of fingerprints into fancy
graphs:

 1. Write a new metrics-web module and put graphs on the metrics
website.  This means parsing relay descriptors and storing certain
per-relay statistics for all relays.  That gives us maximum flexibility
in the kinds of statistics, but is also most expensive in terms of
developer hours.  I don't want to do this.

 2. Extend Globe to show details pages for multiple relays.  This
requires us to move to the server-based Globe-node, because the poor
browser shouldn't download graph data for all relays, but the server
should return a single graph for all relays.  It's also unclear if the
new graphs will be of general interest for Globe users, and if the rest
of the Globe details will be confusing to people interested in the relay
challenge.  Probably not a great idea, but I'm not sure.

 3. Extend Onionoo to return aggregate graph data for a given set of
fingerprints.  Seems useful.  But has the big disadvantage that Onionoo
would suddenly have to create responses dynamically.  I'm worried about
creating a new performance bottleneck there, and this is certainly not
possible with poor overloaded yatei.

 4. Write a new little tool that fetches Onionoo documents once (or
twice) per day for all relays participating in the relay challenge and
that produces graph data.  That new tool could probably re-use some
Compass code for the backend and some Globe code for the frontend.
Graphs could be integrated directly into EFF's website.  This is
currently my favorite approach.

Note for 2--4: Onionoo currently only gives out data for relays that
have been running in the past 7 days.  I'd have to extend it to give out
all data for a list of fingerprints, regardless of when relays were
running the last time.  That's 2--3 days of coding and testing for me.
It's also potentially creating a bottleneck, so we should first have a
replacement for yatei.
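
To make option 4 a bit more concrete, the fetching part could be as simple as
this (illustrative only; error handling, caching, and the 7-day limitation
mentioned above all need more thought):

    import requests

    ONIONOO = "https://onionoo.torproject.org"

    def fetch_documents(doc_type, fingerprints):
        """Fetch one Onionoo document (e.g. 'bandwidth') per relay fingerprint."""
        documents = {}
        for fingerprint in fingerprints:
            r = requests.get("%s/%s" % (ONIONOO, doc_type),
                             params={"lookup": fingerprint})
            r.raise_for_status()
            documents[fingerprint] = r.json()
        return documents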

> So what are the things we'd want to track?
> 
> - Number of relays signed up that are Running, over time.

We can do something here with Onionoo's new uptime documents.

> - Total bandwidth history of these running relays, over time.

We can sum up data from bandwidth documents for this.

> - Maybe a graph showing the total number of bytes ever contributed
>   by these relays? That would impress people perhaps.

Sure, same data as above.

> - Total consensus weight of these running relays, over time.

We only have total consensus weight *fraction*, but yes.

> - Something emphasizing duration -- e.g. the total consensus weight of
>   the subset of the relays that have been in the consensus for 90% of
>   the past month, 2 months, 6 months, etc. Are there better ideas here
>   I hope? We'll want to be cognizant that if we're in the first week
>   of the challenge, the 2 month graph will be empty and thus look sad.

Not sure what the 90% part is for, but yes, graphs with total consensus
weight fraction are doable.

Regarding the sad-looking 2 month graph, we can easily define the data
when the challenge starts and not show graphs until they make sense.
Note that the current intervals for most data are 1 week, 1 month, 3
months, 1 year, and 5 years.

> - Something comparing the above numbers to the total numbers. Given how
>   huge some of the relays are lately, it would be easy to visualize
>   the new contribution as a tiny irrelevant fraction, which could be
>   disheartening to new relay operators even if their relays will actually
>   become a big deal with some patience. What are some strategies for
>   making this work right? E.g. a layer graph showing y layered on top of
>   x where y is the new contribution, rather than a percentage-of-total
>   graph that shows approximately 0%.

Absolute contributions to consensus weight are not available

[tor-relays] Metrics for assessing EFF's Tor relay challenge?

2014-03-27 Thread Roger Dingledine
Hi Christian, other tor relay fans,

I'm looking for some volunteers, hopefully including Christian, to work
on metrics and visualization of impact from new relays.

We're working with EFF to do another "Tor relay challenge" [*], to both
help raise awareness of the value of Tor, and encourage many people to
run relays -- probably non-exit relays for the most part, since that's
the easiest for normal volunteers to step up and do.

You can read about the first round from several years ago here:
https://www.eff.org/torchallenge

To make it succeed, the challenge for us here is to figure out what to
measure to track progress, and then measure it and graph it for everybody.

I'm figuring that like last time, EFF will collect a list of fingerprints
of relays that signed up "because of the challenge".

One of the main pushes we're aiming for this year is longevity: it's
easy to sign up a relay for two weeks and then stop. We want to emphasize
consistency and encourage having the relays up for many months.

So what are the things we'd want to track?

- Number of relays signed up that are Running, over time.
- Total bandwidth history of these running relays, over time.
- Maybe a graph showing the total number of bytes ever contributed
  by these relays? That would impress people perhaps.
- Total consensus weight of these running relays, over time.
- Something emphasizing duration -- e.g. the total consensus weight of
  the subset of the relays that have been in the consensus for 90% of
  the past month, 2 months, 6 months, etc. Are there better ideas here
  I hope? We'll want to be cognizant that if we're in the first week
  of the challenge, the 2 month graph will be empty and thus look sad.
- Something comparing the above numbers to the total numbers. Given how
  huge some of the relays are lately, it would be easy to visualize
  the new contribution as a tiny irrelevant fraction, which could be
  disheartening to new relay operators even if their relays will actually
  become a big deal with some patience. What are some strategies for
  making this work right? E.g. a layer graph showing y layered on top of
  x where y is the new contribution, rather than a percentage-of-total
  graph that shows approximately 0%.
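
As a rough illustration of the layered-graph idea, assuming something like
matplotlib ends up rendering the graphs (all numbers below are made up):

    import matplotlib.pyplot as plt

    days = list(range(30))
    existing = [4000 + 10 * d for d in days]   # made-up totals for existing relays
    challenge = [5 * d for d in days]          # made-up contribution from new relays

    # layer the (small) new contribution on top of the existing total
    plt.stackplot(days, existing, challenge)
    plt.ylabel("bandwidth (arbitrary units)")
    plt.savefig("layered.png")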

We could also imagine more niche categories. For example, if we're hoping
to get people to sign up relays at universities, we could imagine that
the folks running the challenge give us a list of fingerprints of relays
that self-identify as being at universities, and then we do up the same
set of graphs with that subset of relays.

So, Christian, others, how much of this is possible as-is or with some
limited tweaking, with Globe and related scripts? I am hoping the answer
is most of it. :) I also cc Karsten because a lot of this overlaps with
the metrics scripts, but I am expecting Karsten to push back against
the idea of integrating these measurements more with the metrics project.

Any other ideas for what to measure to help people know whether their
contribution is being worthwhile?

[*] Please don't take this mail as any official announcement, or timeline,
or any of that. At this point we need to collect people to help make
this happen, not collect news stories.

Thanks!
--Roger
