Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread Nick Mathewson
On Wed, Aug 12, 2015 at 5:34 PM, nusenu nus...@openmailbox.org wrote:
 from today's measurement meeting:

 15:00:20 virgil karsten: I've decided I'm going to fix the definition of 
 median
 15:00:26 virgil in the tor sourcecode
 15:00:36 karsten virgil: is it broken?
 15:00:53 karsten or just not specified as clearly as it should be?
 15:01:01 virgil for ordered list {a,b,c,d}, it returns b instead of 
 (b+c)/2.
 15:01:24 karsten yes. maybe that's for a reason (which I don't know).
 15:01:40 virgil I look forward to hearing this reason when my patch is 
 rejected.
 15:01:41 karsten like, using value (b+c)/2 would break for some reason, 
 whereas any of a, b, c, d would be fine.
 15:01:45 Sebastian you cannot do that
 15:01:51 Sebastian without breaking Tor's voting
 15:02:21 Sebastian Tor's specification requires low median for a bunch of 
 directory stuff


 I'd be interested in the reason as well.

The correct fix here is to update the code documentation to define the
functions as returning the low-median, and to update dir-spec.txt to
say so too.  I'd accept documentation patches like that.

Changing the code to return the mean of the two center elements from
even arrays would break all authority voting, and wouldn't actually be
useful.

-- 
Nick
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread nusenu
 Changing the code to return the mean of the two center elements from
 even arrays would break all authority voting, and wouldn't actually be
 useful.

Yes, that is what Sebastian said on IRC as well. Can you shed some light
as to why it would break voting?

thanks



signature.asc
Description: OpenPGP digital signature
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread teor

 On 14 Aug 2015, at 03:10 , nusenu nus...@openmailbox.org wrote:
 
 Changing the code to return the mean of the two center elements from
 even arrays would break all authority voting, and wouldn't actually be
 useful.
 
 Yes, that is what Sebastian said on IRC as well. Can you shed some light
 as to why it would break voting?

If the authorities supply different values in the consensus, voting breaks.

Authorities using the low-median would supply one value, and authorities using 
the mean-median would supply another value. (Authorities typically run 
different versions of tor, and don't upgrade all at once.)

Breaking changes like this are typically negotiated among the authorities using 
numbered consensus methods. Once enough authorities support a new consensus 
method, it is activated during voting.

Rather than creating a new consensus method to implement mean-median, it's much 
easier to patch the documentation to specify low-median. (And I see no 
significant gain in changing from low-median to mean-median.)

I'd rather see bandwidth measurements become more accurate, for more relays, 
more of the time, than change how their median is defined.

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
pgp ABFED1AC
https://gist.github.com/teor2345/d033b8ce0a99adbc89c5

teor at blah dot im
OTR D5BE4EC2 255D7585 F3874930 DB130265 7C9EBBC7



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread Sebastian Hahn

 On 13 Aug 2015, at 18:50, Nick Mathewson ni...@alum.mit.edu wrote:
 
 On Wed, Aug 12, 2015 at 5:34 PM, nusenu nus...@openmailbox.org wrote:
 from today's measurement meeting:
 
 15:00:20 virgil karsten: I've decided I'm going to fix the definition of 
 median
 15:00:26 virgil in the tor sourcecode
 15:00:36 karsten virgil: is it broken?
 15:00:53 karsten or just not specified as clearly as it should be?
 15:01:01 virgil for ordered list {a,b,c,d}, it returns b instead of 
 (b+c)/2.
 15:01:24 karsten yes. maybe that's for a reason (which I don't know).
 15:01:40 virgil I look forward to hearing this reason when my patch is 
 rejected.
 15:01:41 karsten like, using value (b+c)/2 would break for some reason, 
 whereas any of a, b, c, d would be fine.
 15:01:45 Sebastian you cannot do that
 15:01:51 Sebastian without breaking Tor's voting
 15:02:21 Sebastian Tor's specification requires low median for a bunch of 
 directory stuff
 
 
 I'd be interested in the reason as well.
 
 The correct fix here is to update the code documentation to define the
 functions as returning the low-median, and to update dir-spec.txt to
 say so too.  I'd accept documentation patches like that.

The documentation for the code already says that. The spec could be
updated to say low-median consistently, tho.

Cheers
Sebastian
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-12 Thread nusenu
from today's measurement meeting:

 15:00:20 virgil karsten: I've decided I'm going to fix the definition of 
 median
 15:00:26 virgil in the tor sourcecode
 15:00:36 karsten virgil: is it broken?
 15:00:53 karsten or just not specified as clearly as it should be?
 15:01:01 virgil for ordered list {a,b,c,d}, it returns b instead of (b+c)/2.
 15:01:24 karsten yes. maybe that's for a reason (which I don't know).
 15:01:40 virgil I look forward to hearing this reason when my patch is 
 rejected.
 15:01:41 karsten like, using value (b+c)/2 would break for some reason, 
 whereas any of a, b, c, d would be fine.
 15:01:45 Sebastian you cannot do that
 15:01:51 Sebastian without breaking Tor's voting
 15:02:21 Sebastian Tor's specification requires low median for a bunch of 
 directory stuff


I'd be interested in the reason as well.






signature.asc
Description: OpenPGP digital signature
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Virgil Griffith
Is there some implementation-specific reason not to use the standard
mathematical definition of median?  If not, I propose changing the
implementation to become it.

-V

On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson ni...@alum.mit.edu wrote:

 On Mon, Aug 10, 2015 at 1:11 PM, nusenu nus...@openmailbox.org wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA512
 
  Hi,
 
  https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
 
  If 3 or more authorities provide a Measured= keyword for a router,
  the authorities produce a consensus containing a w Bandwidth=
  keyword equal to the median of the Measured= votes.
 
  a random sample from recent votes:
 
  grep 37.59.38.117 -A 3 *|grep Measured
  w Bandwidth=6869 Measured=7570
  w Bandwidth=6869 Measured=15500
  w Bandwidth=6869 Measured=18100
  w Bandwidth=6869 Measured=30500
 
  Tor says the median value is
  15500
 
  2015-08-10-16-00-00-consensus:
  w Bandwidth=15500
 
  but the median of these 4 values is actually:
  (18100+15500)/2 = 16800
  no?
 
  Has tor a different definition of 'median' and simply takes always the
  second ordered measurement vote out of 4 votes or is there a bug in
  the spec or implementation?

 There's one misplaced throwaway sentence in dir-spec.txt:

   All ties in computing medians are broken in favor of the smaller or
earlier item.
 

 We should bring this, and probably other things, into a definitions
 section earlier in dir-spec.txt.  Patches welcome. ;)

 --
 Nick
 ___
 tor-dev mailing list
 tor-dev@lists.torproject.org
 https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Andreas Krey
On Tue, 11 Aug 2015 13:44:48 +, Virgil Griffith wrote:
 I mean the median.
 
 From Wikipedia...
 
 For example, if *a*  *b*  *c*, then the median of the list {*a*, *b*, *c*}
 is *b*, and, if *a*  *b*  *c*  *d*, then the median of the list {*a*, *b*
 , *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) / 2.

In the preceding paragraph wikipedia isn't that strict:
'the median is then usually defined to be the mean of the two middle values',
and tor is unusual. :-)

Andreas

-- 
Totally trivial. Famous last words.
From: Linus Torvalds torvalds@*.org
Date: Fri, 22 Jan 2010 07:29:21 -0800
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Maciej Soltysiak
Virgil's absolutely right. Median as the middle value in a _sorted_ set
is:
- for odd number of data points, it's the middle one: set[N/2]
- for even number of data points, it's the average of two in the middle:
(set[N/2] + set[(N+1)/2]) / 2

Best regards,
Maciej

On Tue, Aug 11, 2015 at 3:44 PM, Virgil Griffith i...@virgil.gr wrote:

 I mean the median.

 From Wikipedia...

 For example, if *a*  *b*  *c*, then the median of the list {*a*, *b*,
 *c*} is *b*, and, if *a*  *b*  *c*  *d*, then the median of the list {
 *a*, *b*, *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) /
 2.

 -V

 On Tue, Aug 11, 2015 at 9:29 PM John oneoft...@riseup.net wrote:

 I think you are confusing the median with the mean:

 https://en.wikipedia.org/wiki/Median
 https://en.wikipedia.org/wiki/Mean

 Taking the median instead of the mean can be beneficial in situations
 where you have larger outliers in your data, which typically affect the
 mean very much.

 -j

 Virgil Griffith:
  Is there some implementation-specific reason not to use the standard
  mathematical definition of median?  If not, I propose changing the
  implementation to become it.
 
  -V
 
  On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson ni...@alum.mit.edu
 wrote:
 
  On Mon, Aug 10, 2015 at 1:11 PM, nusenu nus...@openmailbox.org
 wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA512
 
  Hi,
 
  https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
 
  If 3 or more authorities provide a Measured= keyword for a router,
  the authorities produce a consensus containing a w Bandwidth=
  keyword equal to the median of the Measured= votes.
 
  a random sample from recent votes:
 
  grep 37.59.38.117 -A 3 *|grep Measured
  w Bandwidth=6869 Measured=7570
  w Bandwidth=6869 Measured=15500
  w Bandwidth=6869 Measured=18100
  w Bandwidth=6869 Measured=30500
 
  Tor says the median value is
  15500
 
  2015-08-10-16-00-00-consensus:
  w Bandwidth=15500
 
  but the median of these 4 values is actually:
  (18100+15500)/2 = 16800
  no?
 
  Has tor a different definition of 'median' and simply takes always the
  second ordered measurement vote out of 4 votes or is there a bug in
  the spec or implementation?
 
  There's one misplaced throwaway sentence in dir-spec.txt:
 
All ties in computing medians are broken in favor of the smaller or
 earlier item.
  
 
  We should bring this, and probably other things, into a definitions
  section earlier in dir-spec.txt.  Patches welcome. ;)
 
  --
  Nick
  ___
  tor-dev mailing list
  tor-dev@lists.torproject.org
  https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
 
 
 
 
  ___
  tor-dev mailing list
  tor-dev@lists.torproject.org
  https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
 
 ___
 tor-dev mailing list
 tor-dev@lists.torproject.org
 https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


 ___
 tor-dev mailing list
 tor-dev@lists.torproject.org
 https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Virgil Griffith
I mean the median.

From Wikipedia...

For example, if *a*  *b*  *c*, then the median of the list {*a*, *b*, *c*}
is *b*, and, if *a*  *b*  *c*  *d*, then the median of the list {*a*, *b*
, *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) / 2.

-V

On Tue, Aug 11, 2015 at 9:29 PM John oneoft...@riseup.net wrote:

 I think you are confusing the median with the mean:

 https://en.wikipedia.org/wiki/Median
 https://en.wikipedia.org/wiki/Mean

 Taking the median instead of the mean can be beneficial in situations
 where you have larger outliers in your data, which typically affect the
 mean very much.

 -j

 Virgil Griffith:
  Is there some implementation-specific reason not to use the standard
  mathematical definition of median?  If not, I propose changing the
  implementation to become it.
 
  -V
 
  On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson ni...@alum.mit.edu
 wrote:
 
  On Mon, Aug 10, 2015 at 1:11 PM, nusenu nus...@openmailbox.org wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA512
 
  Hi,
 
  https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
 
  If 3 or more authorities provide a Measured= keyword for a router,
  the authorities produce a consensus containing a w Bandwidth=
  keyword equal to the median of the Measured= votes.
 
  a random sample from recent votes:
 
  grep 37.59.38.117 -A 3 *|grep Measured
  w Bandwidth=6869 Measured=7570
  w Bandwidth=6869 Measured=15500
  w Bandwidth=6869 Measured=18100
  w Bandwidth=6869 Measured=30500
 
  Tor says the median value is
  15500
 
  2015-08-10-16-00-00-consensus:
  w Bandwidth=15500
 
  but the median of these 4 values is actually:
  (18100+15500)/2 = 16800
  no?
 
  Has tor a different definition of 'median' and simply takes always the
  second ordered measurement vote out of 4 votes or is there a bug in
  the spec or implementation?
 
  There's one misplaced throwaway sentence in dir-spec.txt:
 
All ties in computing medians are broken in favor of the smaller or
 earlier item.
  
 
  We should bring this, and probably other things, into a definitions
  section earlier in dir-spec.txt.  Patches welcome. ;)
 
  --
  Nick
  ___
  tor-dev mailing list
  tor-dev@lists.torproject.org
  https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
 
 
 
 
  ___
  tor-dev mailing list
  tor-dev@lists.torproject.org
  https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
 
 ___
 tor-dev mailing list
 tor-dev@lists.torproject.org
 https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread John
I think you are confusing the median with the mean:

https://en.wikipedia.org/wiki/Median
https://en.wikipedia.org/wiki/Mean

Taking the median instead of the mean can be beneficial in situations
where you have larger outliers in your data, which typically affect the
mean very much.

-j

Virgil Griffith:
 Is there some implementation-specific reason not to use the standard
 mathematical definition of median?  If not, I propose changing the
 implementation to become it.
 
 -V
 
 On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson ni...@alum.mit.edu wrote:
 
 On Mon, Aug 10, 2015 at 1:11 PM, nusenu nus...@openmailbox.org wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512

 Hi,
 
 https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028

 If 3 or more authorities provide a Measured= keyword for a router,
 the authorities produce a consensus containing a w Bandwidth=
 keyword equal to the median of the Measured= votes.

 a random sample from recent votes:

 grep 37.59.38.117 -A 3 *|grep Measured
 w Bandwidth=6869 Measured=7570
 w Bandwidth=6869 Measured=15500
 w Bandwidth=6869 Measured=18100
 w Bandwidth=6869 Measured=30500

 Tor says the median value is
 15500

 2015-08-10-16-00-00-consensus:
 w Bandwidth=15500

 but the median of these 4 values is actually:
 (18100+15500)/2 = 16800
 no?

 Has tor a different definition of 'median' and simply takes always the
 second ordered measurement vote out of 4 votes or is there a bug in
 the spec or implementation?

 There's one misplaced throwaway sentence in dir-spec.txt:

   All ties in computing medians are broken in favor of the smaller or
earlier item.
 

 We should bring this, and probably other things, into a definitions
 section earlier in dir-spec.txt.  Patches welcome. ;)

 --
 Nick
 ___
 tor-dev mailing list
 tor-dev@lists.torproject.org
 https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

 
 
 
 ___
 tor-dev mailing list
 tor-dev@lists.torproject.org
 https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
 
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-10 Thread Nick Mathewson
On Mon, Aug 10, 2015 at 1:11 PM, nusenu nus...@openmailbox.org wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512

 Hi,

 https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028

 If 3 or more authorities provide a Measured= keyword for a router,
 the authorities produce a consensus containing a w Bandwidth=
 keyword equal to the median of the Measured= votes.

 a random sample from recent votes:

 grep 37.59.38.117 -A 3 *|grep Measured
 w Bandwidth=6869 Measured=7570
 w Bandwidth=6869 Measured=15500
 w Bandwidth=6869 Measured=18100
 w Bandwidth=6869 Measured=30500

 Tor says the median value is
 15500

 2015-08-10-16-00-00-consensus:
 w Bandwidth=15500

 but the median of these 4 values is actually:
 (18100+15500)/2 = 16800
 no?

 Has tor a different definition of 'median' and simply takes always the
 second ordered measurement vote out of 4 votes or is there a bug in
 the spec or implementation?

There's one misplaced throwaway sentence in dir-spec.txt:

  All ties in computing medians are broken in favor of the smaller or
   earlier item.


We should bring this, and probably other things, into a definitions
section earlier in dir-spec.txt.  Patches welcome. ;)

-- 
Nick
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] tor's definition of 'median'

2015-08-10 Thread nusenu
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Hi,

https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028

 If 3 or more authorities provide a Measured= keyword for a router,
 the authorities produce a consensus containing a w Bandwidth=
 keyword equal to the median of the Measured= votes.

a random sample from recent votes:

grep 37.59.38.117 -A 3 *|grep Measured
w Bandwidth=6869 Measured=7570
w Bandwidth=6869 Measured=15500
w Bandwidth=6869 Measured=18100
w Bandwidth=6869 Measured=30500

Tor says the median value is
15500

2015-08-10-16-00-00-consensus:
w Bandwidth=15500

but the median of these 4 values is actually:
(18100+15500)/2 = 16800
no?

Has tor a different definition of 'median' and simply takes always the
second ordered measurement vote out of 4 votes or is there a bug in
the spec or implementation?
-BEGIN PGP SIGNATURE-

iQIcBAEBCgAGBQJVyNtYAAoJEFv7XvVCELh0YU8P/0hWhCLfafvFDPdAfUwUJFPA
A1ZA946JGKg1Vjy+ch3tffjDHRosXt7K5U33N9rKMUlW9ul2BQ+uNgzK4eTbghHz
yCMn6D+uLk1xruYTsIUZ+Pk/ZywaUKj/FngohVvQnaJIgJCHCEnCIqqBNEK0PjUh
EBzI5GdyWpEA2fh55PSRuSNCVzbiVhGwYSKgrVwFrFcKr9iPLmdZsa9SD1mb1PD1
IZOkU6TnSnFGbPaN+pHYdNr5/QJA1/08yYBpG7qAJaTCAMvMif7bKMJ7ElYo7opc
oXcECIcBBPnATgKvbO47dbSnX3s+vPMsrRhhdTa1BMr1MIzVG3RWGhNSJwXXQTJh
jyaPGj10JRWNxDsY0ro001MX0HymmXeLLCkY4nWsUqPXDiZcQe4oRzLQas5bOI94
ct4tCavj9pRNp2XYCWe631gqcsQ3xV7y37dWUCvpdXqt2NC1B7j2w/Y/UiuNArSj
QBtS4Ap5wqnU5JySjndi+lIPOlaPk9uitmzpKLxNF9fnpI6ZECP3T50vHVeZiiQ9
7o1ZyBsPLk3bi2hRy8ZJ0wyO+5WF8bdrlsKXSR40UjrwQZ5kCQf6xmWiSg9PqZFU
rIL14yCOsQkR3P4/IgMlL8dtgFk3Asmkkx039144fRKveEhffFjo54CUMhg/jLra
EF5cUqz4QdgcpRvKa/5h
=lA/6
-END PGP SIGNATURE-
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev