[tor-dev] Help me help you : )

2015-12-08 Thread Lain Iwakura
Hello guys,
My name is Lain, and I'm from Brazil. It's not my real name :) but I
love it.
I identify a lot with the Tor project and would like some simple guidance.
I have been programming (in Pascal) for the last five years, and I want
to contribute to developing the project, but I don't know where to
start, which language to learn, or where my work would be most
effective... Perhaps something that a veteran is tired of doing, which I
could do and learn from while speeding up the work.
Anyway, give me some tips.

Hey! I am learning English for this. Help me to be useful. :)


___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread nusenu

> Also, here are the steps to reproduce:
> 
>   wget https://collector.torproject.org/archive/relay-descriptors/consensuses/consensuses-2015-11.tar.xz
>   tar xvJf consensuses-2015-11.tar.xz
>   go get git.torproject.org/user/phw/sybilhunter.git
>   sybilhunter -data consensuses-2015-11/ -uptime

How much of an effort would it be to support Onionoo files as input
data? (Onionoo data would make it possible to display more attributes,
such as AS, country code, and first-seen date.)
I could provide some archived Onionoo data.





Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Mon, Dec 07, 2015 at 01:44:47PM -0800, David Fifield wrote:
> On Mon, Dec 07, 2015 at 02:51:23PM -0500, Philipp Winter wrote:
> > I spent some time improving the existing relay uptime visualisation [0].
> > Inspired by a research paper [1], the new algorithm uses single-linkage
> > clustering with Pearson's correlation coefficient as distance function.
> > The idea is that relays are grouped next to each other if their uptime
> > (basically a binary sequence) is highly correlated.  Check out the
> > following gallery.  It contains monthly relay uptime images, dating back
> > to 2007:
> > 
> 
> How about just taking the XOR of two sequences as the distance?

Here's Nov 2015, with XOR as distance:


> It would be interesting to know if there are any near-perfect
> anticorrelations; i.e., one relay starts when another stops.

It looks like there are many of them.  So far, I calculated the
distance as 1 - Pearson(s1,s2) because I'm only interested in
positively correlated sequences.  Here's an uptime image with
Pearson(s1,s2) as the distance function, so positive correlation is
considered just as much as negative correlation.  Have a look at the
leftmost part:


Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Mon, Dec 07, 2015 at 09:57:18PM +, nusenu wrote:
> > and every column is a relay.  White pixels mean
> > that a relay was offline and black pixels mean that a relay was
> > online.  Red pixels are used to highlight suspiciously similar clusters.
> 
> I assume they are highlighted only if they exceed a certain group size?
> What is the threshold?

Exactly.  Groups >= 5 are considered for highlighting.

> Until I looked at the Heartbleed example, I assumed grouping required
> "perfect matches" across the entire month, but after seeing the
> Heartbleed example I'm not sure whether that is actually the case or
> whether two distinct groups are just next to each other without a
> "separator" between them.

Right, I don't use perfect matching, so we can account for some noise,
e.g., some of the Sybils having small downtimes, or not starting and
stopping at the exact same hour.  Here's the code:
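The linked code is the actual implementation. As an independent illustration of why single linkage tolerates this kind of noise, here is a hedged Go sketch that cuts the clustering at a fixed distance: cutting a single-linkage dendrogram at height h yields the connected components of the graph that links every pair of relays at distance <= h, which a union-find structure computes directly. For brevity it uses the normalized XOR distance rather than Pearson's coefficient, and the names singleLinkage, find, and maxDist are hypothetical.

```go
package main

import "fmt"

// distance is the normalized XOR (Hamming) distance between two uptime
// sequences of equal, non-zero length.
func distance(a, b []bool) float64 {
	diff := 0
	for i := range a {
		if a[i] != b[i] {
			diff++
		}
	}
	return float64(diff) / float64(len(a))
}

// find returns the representative of i's set, compressing the path.
func find(parent []int, i int) int {
	for parent[i] != i {
		parent[i] = parent[parent[i]]
		i = parent[i]
	}
	return i
}

// singleLinkage groups relay indices so that two relays share a cluster if
// a chain of relays connects them in which every step has distance
// <= maxDist. The all-pairs loop is O(n^2) in the number of relays.
func singleLinkage(uptime [][]bool, maxDist float64) [][]int {
	parent := make([]int, len(uptime))
	for i := range parent {
		parent[i] = i
	}
	for i := 0; i < len(uptime); i++ {
		for j := i + 1; j < len(uptime); j++ {
			if distance(uptime[i], uptime[j]) <= maxDist {
				parent[find(parent, i)] = find(parent, j)
			}
		}
	}
	// Collect clusters in order of their first member.
	groups := map[int][]int{}
	var order []int
	for i := range uptime {
		r := find(parent, i)
		if _, ok := groups[r]; !ok {
			order = append(order, r)
		}
		groups[r] = append(groups[r], i)
	}
	clusters := make([][]int, 0, len(order))
	for _, r := range order {
		clusters = append(clusters, groups[r])
	}
	return clusters
}

func main() {
	uptime := [][]bool{
		{true, true, true, false, false, false, true, true},
		{true, true, true, false, false, false, true, false}, // one noisy hour
		{true, true, true, false, false, true, true, true},   // another stray hour
		{false, true, false, true, false, true, false, true}, // unrelated relay
	}
	fmt.Println(singleLinkage(uptime, 0.125)) // tolerate 1 of 8 hours
}
```

With maxDist = 0.125, the program prints `[[0 1 2] [3]]`: relays 1 and 2 differ in two hours, yet they land in the same cluster because each is within one hour of relay 0. This chaining is what lets single linkage absorb Sybils with small downtimes, and it is also why a generous cutoff can place two distinct groups side by side without a visible separator.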


> I would also find it useful to have it accept fingerprints as input and
> graph their uptime to look at a given set of relays in certain cases
> 
> example input could be the fingerprints from [1]+[2] after these relays
> have been around for some time.

Good point.  That has been on my todo list and I hope to get it done
soon.

> Are you planning to generate these graphs on an ongoing basis?

Yes, I would like to.  We could easily generate them every other hour,
or even hourly.  The details will depend on the thread Karsten
started:


Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Tue, Dec 08, 2015 at 04:52:45PM +, nusenu wrote:
>> Also, here are the steps to reproduce:
>> 
>>   wget https://collector.torproject.org/archive/relay-descriptors/consensuses/consensuses-2015-11.tar.xz
>>   tar xvJf consensuses-2015-11.tar.xz
>>   go get git.torproject.org/user/phw/sybilhunter.git
>>   sybilhunter -data consensuses-2015-11/ -uptime
> 
> How much of an effort would it be to support onionoo files as input
> data? (onionoo data would be able to display more data like AS, CC,
> first-seen)
> I could provide some archived onionoo data.

It's not trivial, but feasible.  Sybilhunter uses a Go-based descriptor
parsing library [0] that doesn't support Onionoo's format, so we would
need an Onionoo parser as well as an update to sybilhunter's uptime
analysis code.
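To give a rough idea of the parsing side, here is a Go sketch that reads an Onionoo details document with encoding/json and maps fingerprints to the AS, country, and first-seen data mentioned above, so that uptime columns could be annotated. The JSON field names (relays, fingerprint, nickname, country, as_number, first_seen) and the timestamp layout are assumptions to verify against the Onionoo protocol specification for the version in use; parseDetails and relayMeta are hypothetical names, and the sketch does not cover Onionoo's separate uptime documents.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
	"time"
)

// onionooDetails models only the fields this sketch needs. The JSON
// field names are assumptions to check against the Onionoo specification.
type onionooDetails struct {
	Relays []struct {
		Fingerprint string `json:"fingerprint"`
		Nickname    string `json:"nickname"`
		Country     string `json:"country"`
		ASNumber    string `json:"as_number"`
		FirstSeen   string `json:"first_seen"`
	} `json:"relays"`
}

// relayMeta is the per-relay metadata an uptime plot could show.
type relayMeta struct {
	Nickname  string
	Country   string
	AS        string
	FirstSeen time.Time
}

// parseDetails maps upper-cased fingerprints to relay metadata.
func parseDetails(data []byte) (map[string]relayMeta, error) {
	var doc onionooDetails
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	meta := make(map[string]relayMeta, len(doc.Relays))
	for _, r := range doc.Relays {
		// Assumed timestamp layout: "YYYY-MM-DD hh:mm:ss" in UTC.
		first, err := time.Parse("2006-01-02 15:04:05", r.FirstSeen)
		if err != nil {
			return nil, fmt.Errorf("relay %s: %w", r.Fingerprint, err)
		}
		meta[strings.ToUpper(r.Fingerprint)] = relayMeta{
			Nickname:  r.Nickname,
			Country:   r.Country,
			AS:        r.ASNumber,
			FirstSeen: first,
		}
	}
	return meta, nil
}

func main() {
	// Hypothetical, hand-written input in the assumed format.
	data := []byte(`{"relays":[{"fingerprint":"0123456789ABCDEF0123456789ABCDEF01234567",
		"nickname":"example","country":"de","as_number":"AS3320",
		"first_seen":"2015-11-01 00:00:00"}]}`)
	meta, err := parseDetails(data)
	if err != nil {
		log.Fatal(err)
	}
	for fp, m := range meta {
		fmt.Println(fp, m.Nickname, m.Country, m.AS, m.FirstSeen.Format("2006-01-02"))
	}
}
```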

[0] 

Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Mon, Dec 07, 2015 at 11:43:38PM -0500, grarpamp wrote:
> Can one be generated covering each year, and maybe a five-year one?

I haven't checked the complexity of the clustering algorithm I use, but
it's probably quadratic.  I think a full year's worth of uptimes would
require pruning the data, e.g., removing all relays that were online for
only one or two hours.
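The pruning step could look like this Go sketch. An all-pairs clustering computes n(n-1)/2 distances, so the work shrinks quadratically: halving the number of relays cuts the pairwise comparisons to roughly a quarter. The threshold of three online hours mirrors the "one or two hours" example above; pruneShortLived and pairs are hypothetical names, and uptime is assumed to map fingerprints to hourly []bool sequences.

```go
package main

import "fmt"

// pruneShortLived drops relays that were online for fewer than minHours
// hours, so fewer sequences enter the pairwise clustering step.
func pruneShortLived(uptime map[string][]bool, minHours int) map[string][]bool {
	kept := make(map[string][]bool)
	for fp, seq := range uptime {
		online := 0
		for _, up := range seq {
			if up {
				online++
			}
		}
		if online >= minHours {
			kept[fp] = seq
		}
	}
	return kept
}

// pairs is the number of distance computations an all-pairs clustering needs.
func pairs(n int) int { return n * (n - 1) / 2 }

func main() {
	uptime := map[string][]bool{ // hypothetical relay keys
		"relayA": {true, true, true, true},
		"relayB": {true, false, false, false}, // online for a single hour
		"relayC": {false, true, true, true},
	}
	kept := pruneShortLived(uptime, 3)
	fmt.Println(len(uptime), pairs(len(uptime)), len(kept), pairs(len(kept)))
}
```

Running it prints `3 3 2 1`: pruning relayB drops the pair count from three to one.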

For now, here are three months, Sep 2015 to Nov 2015, in a 12 MiB file:


Cheers,
Philipp


[tor-dev] Scaling Tor Metrics, Round 2

2015-12-08 Thread Spencer

Hi,

Karsten Loesing:
> We briefly discussed making a JavaScript-free Globe a while ago by
> using Node.js.  I'm not sure whether this would also work for Metrics.
> It may depend on how interactive graphs are supposed to be.

As said later in this thread, .png seems okay, though I can see the load
on the server if tons of peeps hit the site; I respect the client-side
preference.

Thanks :)

> I think the main option is to keep rendering graphs on the server.
> Right now, we're using R/ggplot2 for that, but we could switch to
> server-side JavaScript or really anything else.  The main downside is
> lack of real interactivity.

I see the need for interaction :)  David McCandless [0] has some cool
stuff that isn't very interactive (but uses JS).

Can the data be processed offline by each person? Tor Rendering Engine
:P

Wordlife,
Spencer

[0]: http://www.davidmccandless.com


