Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread Sebastian Hahn

> On 13 Aug 2015, at 18:50, Nick Mathewson  wrote:
> 
> On Wed, Aug 12, 2015 at 5:34 PM, nusenu  wrote:
>> from today's measurement meeting:
>> 
>>> 15:00:20  karsten: I've decided I'm going to fix the definition of 
>>> median
>>> 15:00:26  in the tor sourcecode
>>> 15:00:36  virgil: is it broken?
>>> 15:00:53  or just not specified as clearly as it should be?
>>> 15:01:01  for ordered list {a,b,c,d}, it returns b instead of 
>>> (b+c)/2.
>>> 15:01:24  yes. maybe that's for a reason (which I don't know).
>>> 15:01:40  I look forward to hearing this reason when my patch is 
>>> rejected.
>>> 15:01:41  like, using value (b+c)/2 would break for some reason, 
>>> whereas any of a, b, c, d would be fine.
>>> 15:01:45  you cannot do that
>>> 15:01:51  without breaking Tor's voting
>>> 15:02:21  Tor's specification requires low median for a bunch of 
>>> directory stuff
>> 
>> 
>> I'd be interested in the reason as well.
> 
> The correct fix here is to update the code documentation to define the
> functions as returning the low-median, and to update dir-spec.txt to
> say so too.  I'd accept documentation patches like that.

The documentation for the code already says that. The spec could be
updated to say low-median consistently, tho.

Cheers
Sebastian
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread teor

> On 14 Aug 2015, at 03:10 , nusenu  wrote:
> 
>> Changing the code to return the mean of the two center elements from
>> even arrays would break all authority voting, and wouldn't actually be
>> useful.
> 
> Yes, that is what Sebastian said on IRC as well. Can you shed some light
> as to why it would break voting?

If the authorities supply different values in the consensus, voting breaks.

Authorities using the low-median would supply one value, and authorities using 
the mean-median would supply another value. (Authorities typically run 
different versions of tor, and don't upgrade all at once.)

Breaking changes like this are typically negotiated among the authorities using 
numbered consensus methods. Once enough authorities support a new consensus 
method, it is activated during voting.

Rather than creating a new consensus method to implement mean-median, it's much 
easier to patch the documentation to specify low-median. (And I see no 
significant gain in changing from low-median to mean-median.)

I'd rather see bandwidth measurements become more accurate, for more relays, 
more of the time, than change how their median is defined.

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
pgp ABFED1AC
https://gist.github.com/teor2345/d033b8ce0a99adbc89c5

teor at blah dot im
OTR D5BE4EC2 255D7585 F3874930 DB130265 7C9EBBC7



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread nusenu
> Changing the code to return the mean of the two center elements from
> even arrays would break all authority voting, and wouldn't actually be
> useful.

Yes, that is what Sebastian said on IRC as well. Can you shed some light
as to why it would break voting?

thanks



signature.asc
Description: OpenPGP digital signature
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-13 Thread Nick Mathewson
On Wed, Aug 12, 2015 at 5:34 PM, nusenu  wrote:
> from today's measurement meeting:
>
>> 15:00:20  karsten: I've decided I'm going to fix the definition of 
>> median
>> 15:00:26  in the tor sourcecode
>> 15:00:36  virgil: is it broken?
>> 15:00:53  or just not specified as clearly as it should be?
>> 15:01:01  for ordered list {a,b,c,d}, it returns b instead of 
>> (b+c)/2.
>> 15:01:24  yes. maybe that's for a reason (which I don't know).
>> 15:01:40  I look forward to hearing this reason when my patch is 
>> rejected.
>> 15:01:41  like, using value (b+c)/2 would break for some reason, 
>> whereas any of a, b, c, d would be fine.
>> 15:01:45  you cannot do that
>> 15:01:51  without breaking Tor's voting
>> 15:02:21  Tor's specification requires low median for a bunch of 
>> directory stuff
>
>
> I'd be interested in the reason as well.

The correct fix here is to update the code documentation to define the
functions as returning the low-median, and to update dir-spec.txt to
say so too.  I'd accept documentation patches like that.

Changing the code to return the mean of the two center elements from
even arrays would break all authority voting, and wouldn't actually be
useful.

-- 
Nick
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-12 Thread nusenu
from today's measurement meeting:

> 15:00:20  karsten: I've decided I'm going to fix the definition of 
> median
> 15:00:26  in the tor sourcecode
> 15:00:36  virgil: is it broken?
> 15:00:53  or just not specified as clearly as it should be?
> 15:01:01  for ordered list {a,b,c,d}, it returns b instead of (b+c)/2.
> 15:01:24  yes. maybe that's for a reason (which I don't know).
> 15:01:40  I look forward to hearing this reason when my patch is 
> rejected.
> 15:01:41  like, using value (b+c)/2 would break for some reason, 
> whereas any of a, b, c, d would be fine.
> 15:01:45  you cannot do that
> 15:01:51  without breaking Tor's voting
> 15:02:21  Tor's specification requires low median for a bunch of 
> directory stuff


I'd be interested in the reason as well.






signature.asc
Description: OpenPGP digital signature
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-12 Thread Virgil Griffith
I looked into this.

Apparently Tor often uses the "low median", in cases where it needs to be a
middle value, but an inbetween value is not allowed.  This is chiefly for
voting.

On Tue, Aug 11, 2015 at 10:49 PM Andreas Krey  wrote:

> On Tue, 11 Aug 2015 13:44:48 +, Virgil Griffith wrote:
> > I mean the median.
> >
> > >From Wikipedia...
> >
> > For example, if *a* < *b* < *c*, then the median of the list {*a*, *b*,
> *c*}
> > is *b*, and, if *a* < *b* < *c* < *d*, then the median of the list {*a*,
> *b*
> > , *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) / 2.
>
> In the preceding paragraph wikipedia isn't that strict:
> 'the median is then usually defined to be the mean of the two middle
> values',
> and tor is unusual. :-)
>
> Andreas
>
> --
> "Totally trivial. Famous last words."
> From: Linus Torvalds 
> Date: Fri, 22 Jan 2010 07:29:21 -0800
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Andreas Krey
On Tue, 11 Aug 2015 13:44:48 +, Virgil Griffith wrote:
> I mean the median.
> 
> >From Wikipedia...
> 
> For example, if *a* < *b* < *c*, then the median of the list {*a*, *b*, *c*}
> is *b*, and, if *a* < *b* < *c* < *d*, then the median of the list {*a*, *b*
> , *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) / 2.

In the preceding paragraph wikipedia isn't that strict:
'the median is then usually defined to be the mean of the two middle values',
and tor is unusual. :-)

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Maciej Soltysiak
Virgil's absolutely right. Median as the "middle" value in a _sorted_ set
is:
- for odd number of data points, it's the middle one: set[N/2]
- for even number of data points, it's the average of two in the middle:
(set[N/2] + set[(N+1)/2]) / 2

Best regards,
Maciej

On Tue, Aug 11, 2015 at 3:44 PM, Virgil Griffith  wrote:

> I mean the median.
>
> From Wikipedia...
>
> For example, if *a* < *b* < *c*, then the median of the list {*a*, *b*,
> *c*} is *b*, and, if *a* < *b* < *c* < *d*, then the median of the list {
> *a*, *b*, *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) /
> 2.
>
> -V
>
> On Tue, Aug 11, 2015 at 9:29 PM John  wrote:
>
>> I think you are confusing the median with the mean:
>>
>> https://en.wikipedia.org/wiki/Median
>> https://en.wikipedia.org/wiki/Mean
>>
>> Taking the median instead of the mean can be beneficial in situations
>> where you have larger outliers in your data, which typically affect the
>> mean very much.
>>
>> -j
>>
>> Virgil Griffith:
>> > Is there some implementation-specific reason not to use the standard
>> > mathematical definition of "median"?  If not, I propose changing the
>> > implementation to become it.
>> >
>> > -V
>> >
>> > On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson 
>> wrote:
>> >
>> >> On Mon, Aug 10, 2015 at 1:11 PM, nusenu 
>> wrote:
>> >>> -BEGIN PGP SIGNED MESSAGE-
>> >>> Hash: SHA512
>> >>>
>> >>> Hi,
>> >>>
>> >>> https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
>> >>>
>>  If 3 or more authorities provide a Measured= keyword for a router,
>>  the authorities produce a consensus containing a "w" Bandwidth=
>>  keyword equal to the median of the Measured= votes.
>> >>>
>> >>> a random sample from recent votes:
>> >>>
>> >>> grep 37.59.38.117 -A 3 *|grep Measured
>> >>> w Bandwidth=6869 Measured=7570
>> >>> w Bandwidth=6869 Measured=15500
>> >>> w Bandwidth=6869 Measured=18100
>> >>> w Bandwidth=6869 Measured=30500
>> >>>
>> >>> Tor says the median value is
>> >>> 15500
>> >>>
>> >>> 2015-08-10-16-00-00-consensus:
>> >>> w Bandwidth=15500
>> >>>
>> >>> but the median of these 4 values is actually:
>> >>> (18100+15500)/2 = 16800
>> >>> no?
>> >>>
>> >>> Has tor a different definition of 'median' and simply takes always the
>> >>> second ordered measurement vote out of 4 votes or is there a bug in
>> >>> the spec or implementation?
>> >>
>> >> There's one misplaced throwaway sentence in dir-spec.txt:
>> >>
>> >> "  All ties in computing medians are broken in favor of the smaller or
>> >>earlier item.
>> >> "
>> >>
>> >> We should bring this, and probably other things, into a "definitions"
>> >> section earlier in dir-spec.txt.  Patches welcome. ;)
>> >>
>> >> --
>> >> Nick
>> >> ___
>> >> tor-dev mailing list
>> >> tor-dev@lists.torproject.org
>> >> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>> >>
>> >
>> >
>> >
>> > ___
>> > tor-dev mailing list
>> > tor-dev@lists.torproject.org
>> > https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>> >
>> ___
>> tor-dev mailing list
>> tor-dev@lists.torproject.org
>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>>
>
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Virgil Griffith
I mean the median.

>From Wikipedia...

For example, if *a* < *b* < *c*, then the median of the list {*a*, *b*, *c*}
is *b*, and, if *a* < *b* < *c* < *d*, then the median of the list {*a*, *b*
, *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*) / 2.

-V

On Tue, Aug 11, 2015 at 9:29 PM John  wrote:

> I think you are confusing the median with the mean:
>
> https://en.wikipedia.org/wiki/Median
> https://en.wikipedia.org/wiki/Mean
>
> Taking the median instead of the mean can be beneficial in situations
> where you have larger outliers in your data, which typically affect the
> mean very much.
>
> -j
>
> Virgil Griffith:
> > Is there some implementation-specific reason not to use the standard
> > mathematical definition of "median"?  If not, I propose changing the
> > implementation to become it.
> >
> > -V
> >
> > On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson 
> wrote:
> >
> >> On Mon, Aug 10, 2015 at 1:11 PM, nusenu  wrote:
> >>> -BEGIN PGP SIGNED MESSAGE-
> >>> Hash: SHA512
> >>>
> >>> Hi,
> >>>
> >>> https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
> >>>
>  If 3 or more authorities provide a Measured= keyword for a router,
>  the authorities produce a consensus containing a "w" Bandwidth=
>  keyword equal to the median of the Measured= votes.
> >>>
> >>> a random sample from recent votes:
> >>>
> >>> grep 37.59.38.117 -A 3 *|grep Measured
> >>> w Bandwidth=6869 Measured=7570
> >>> w Bandwidth=6869 Measured=15500
> >>> w Bandwidth=6869 Measured=18100
> >>> w Bandwidth=6869 Measured=30500
> >>>
> >>> Tor says the median value is
> >>> 15500
> >>>
> >>> 2015-08-10-16-00-00-consensus:
> >>> w Bandwidth=15500
> >>>
> >>> but the median of these 4 values is actually:
> >>> (18100+15500)/2 = 16800
> >>> no?
> >>>
> >>> Has tor a different definition of 'median' and simply takes always the
> >>> second ordered measurement vote out of 4 votes or is there a bug in
> >>> the spec or implementation?
> >>
> >> There's one misplaced throwaway sentence in dir-spec.txt:
> >>
> >> "  All ties in computing medians are broken in favor of the smaller or
> >>earlier item.
> >> "
> >>
> >> We should bring this, and probably other things, into a "definitions"
> >> section earlier in dir-spec.txt.  Patches welcome. ;)
> >>
> >> --
> >> Nick
> >> ___
> >> tor-dev mailing list
> >> tor-dev@lists.torproject.org
> >> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> >>
> >
> >
> >
> > ___
> > tor-dev mailing list
> > tor-dev@lists.torproject.org
> > https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> >
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread John
I think you are confusing the median with the mean:

https://en.wikipedia.org/wiki/Median
https://en.wikipedia.org/wiki/Mean

Taking the median instead of the mean can be beneficial in situations
where you have larger outliers in your data, which typically affect the
mean very much.

-j

Virgil Griffith:
> Is there some implementation-specific reason not to use the standard
> mathematical definition of "median"?  If not, I propose changing the
> implementation to become it.
> 
> -V
> 
> On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson  wrote:
> 
>> On Mon, Aug 10, 2015 at 1:11 PM, nusenu  wrote:
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA512
>>>
>>> Hi,
>>> 
>>> https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
>>>
 If 3 or more authorities provide a Measured= keyword for a router,
 the authorities produce a consensus containing a "w" Bandwidth=
 keyword equal to the median of the Measured= votes.
>>>
>>> a random sample from recent votes:
>>>
>>> grep 37.59.38.117 -A 3 *|grep Measured
>>> w Bandwidth=6869 Measured=7570
>>> w Bandwidth=6869 Measured=15500
>>> w Bandwidth=6869 Measured=18100
>>> w Bandwidth=6869 Measured=30500
>>>
>>> Tor says the median value is
>>> 15500
>>>
>>> 2015-08-10-16-00-00-consensus:
>>> w Bandwidth=15500
>>>
>>> but the median of these 4 values is actually:
>>> (18100+15500)/2 = 16800
>>> no?
>>>
>>> Has tor a different definition of 'median' and simply takes always the
>>> second ordered measurement vote out of 4 votes or is there a bug in
>>> the spec or implementation?
>>
>> There's one misplaced throwaway sentence in dir-spec.txt:
>>
>> "  All ties in computing medians are broken in favor of the smaller or
>>earlier item.
>> "
>>
>> We should bring this, and probably other things, into a "definitions"
>> section earlier in dir-spec.txt.  Patches welcome. ;)
>>
>> --
>> Nick
>> ___
>> tor-dev mailing list
>> tor-dev@lists.torproject.org
>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>>
> 
> 
> 
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> 
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-11 Thread Virgil Griffith
Is there some implementation-specific reason not to use the standard
mathematical definition of "median"?  If not, I propose changing the
implementation to become it.

-V

On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson  wrote:

> On Mon, Aug 10, 2015 at 1:11 PM, nusenu  wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > Hi,
> >
> > https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
> >
> >> If 3 or more authorities provide a Measured= keyword for a router,
> >> the authorities produce a consensus containing a "w" Bandwidth=
> >> keyword equal to the median of the Measured= votes.
> >
> > a random sample from recent votes:
> >
> > grep 37.59.38.117 -A 3 *|grep Measured
> > w Bandwidth=6869 Measured=7570
> > w Bandwidth=6869 Measured=15500
> > w Bandwidth=6869 Measured=18100
> > w Bandwidth=6869 Measured=30500
> >
> > Tor says the median value is
> > 15500
> >
> > 2015-08-10-16-00-00-consensus:
> > w Bandwidth=15500
> >
> > but the median of these 4 values is actually:
> > (18100+15500)/2 = 16800
> > no?
> >
> > Has tor a different definition of 'median' and simply takes always the
> > second ordered measurement vote out of 4 votes or is there a bug in
> > the spec or implementation?
>
> There's one misplaced throwaway sentence in dir-spec.txt:
>
> "  All ties in computing medians are broken in favor of the smaller or
>earlier item.
> "
>
> We should bring this, and probably other things, into a "definitions"
> section earlier in dir-spec.txt.  Patches welcome. ;)
>
> --
> Nick
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor's definition of 'median'

2015-08-10 Thread Nick Mathewson
On Mon, Aug 10, 2015 at 1:11 PM, nusenu  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> Hi,
>
> https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
>
>> If 3 or more authorities provide a Measured= keyword for a router,
>> the authorities produce a consensus containing a "w" Bandwidth=
>> keyword equal to the median of the Measured= votes.
>
> a random sample from recent votes:
>
> grep 37.59.38.117 -A 3 *|grep Measured
> w Bandwidth=6869 Measured=7570
> w Bandwidth=6869 Measured=15500
> w Bandwidth=6869 Measured=18100
> w Bandwidth=6869 Measured=30500
>
> Tor says the median value is
> 15500
>
> 2015-08-10-16-00-00-consensus:
> w Bandwidth=15500
>
> but the median of these 4 values is actually:
> (18100+15500)/2 = 16800
> no?
>
> Has tor a different definition of 'median' and simply takes always the
> second ordered measurement vote out of 4 votes or is there a bug in
> the spec or implementation?

There's one misplaced throwaway sentence in dir-spec.txt:

"  All ties in computing medians are broken in favor of the smaller or
   earlier item.
"

We should bring this, and probably other things, into a "definitions"
section earlier in dir-spec.txt.  Patches welcome. ;)

-- 
Nick
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev