On Wednesday 06 August 2008 13:06, Ed Tomlinson wrote:
> On August 6, 2008, Daniel Cheng wrote:
> > On Wed, Aug 6, 2008 at 7:14 AM, Matthew Toseland
> > <toad at amphibian.dyndns.org> wrote:
> > > On Tuesday 05 August 2008 23:11, Ed Tomlinson wrote:
> > >> On August 5, 2008, Matthew Toseland wrote:
> > >> > On Sunday 03 August 2008 00:58, you wrote:
> > >> > > On August 2, 2008, Matthew Toseland wrote:
> > >> > > > On Saturday 02 August 2008 02:41, Ed Tomlinson wrote:
> > >> > > > > On August 1, 2008, Michael Rogers wrote:
> > >> > > > > > Daniel Cheng wrote:
> > >> > > > > > > in a circular space, we can get an infinite number of
> > >> > > > > > > "average" by changing the point of reference. --- choose the
> > >> > > > > > > point of reference with the smallest standard deviation.
> > >> > > > > >
> > >> > > > > > From an infinite number of alternatives? Sounds like it might
> > >> > > > > > take a while. ;-) I think we can get away with just trying each
> > >> > > > > > location as the reference point, but that's still O(n^2)
> > >> > > > > > running time.
> > >> > > > > >
> > >> > > > > > How about this: the average of two locations is the location
> > >> > > > > > midway along the shortest line between them. So to estimate the
> > >> > > > > > average of a set of locations, choose two locations at random
> > >> > > > > > from the set and replace them with their average, and repeat
> > >> > > > > > until there's only one location in the set.
> > >> > > > > >
> > >> > > > > > It's alchemy but at least it runs in linear time. :-)
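
A rough sketch of that pairwise trick in Java (untested; the class and method
names are made up, and locations are assumed to be doubles in [0, 1)):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class PairwiseCircularAverage {
        // Midpoint of the shortest arc between two locations on the circle.
        static double midpoint(double a, double b) {
            double d = b - a;
            if (d > 0.5) d -= 1.0;
            if (d < -0.5) d += 1.0;
            return ((a + d / 2.0) % 1.0 + 1.0) % 1.0; // wrap into [0, 1)
        }

        // Remove and return a random element in O(1), swapping with the last.
        static double takeRandom(List<Double> pool, Random rand) {
            int i = rand.nextInt(pool.size());
            double v = pool.get(i);
            pool.set(i, pool.get(pool.size() - 1));
            pool.remove(pool.size() - 1);
            return v;
        }

        // Repeatedly replace two random locations with their midpoint until
        // only one remains. Each pass shrinks the pool by one, so linear time.
        static double estimate(List<Double> locations, Random rand) {
            List<Double> pool = new ArrayList<Double>(locations);
            while (pool.size() > 1) {
                double a = takeRandom(pool, rand);
                double b = takeRandom(pool, rand);
                pool.add(midpoint(a, b));
            }
            return pool.get(0);
        }
    }
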
> > >> > > > >
> > >> > > > > Another idea that should run in linear time.  Consider each key
> > >> > > > > a point on the edge of a circle (with a radius of 1).  Convert
> > >> > > > > each key (0 = 0 degrees, 1 = 360) to an x, y coord and average
> > >> > > > > these numbers.  Once all keys are averaged, convert the (x, y)
> > >> > > > > back into a key to get the average.
> > >> > > > >
> > >> > > > > eg    x = sin(key * 360), y = cos(key * 360)  assuming the angle
> > >> > > > > is in degrees not radians.
> > >> > > > >       where a key is a number between 0 and 1
> > >> > > >
> > >> > > > You miss the point. We already have what is effectively an angle,
> > >> > > > it's just in 0 to 1 instead of 0 to 360 deg / 2*pi rads. The
> > >> > > > problem is the circular keyspace.
> > >> > >
> > >> > > No I have _not_ missed the point.  If you map each key onto the rim
> > >> > > of a circle and average the resulting x and y coords of all the
> > >> > > keys, you get an average in a circular keyspace.  Try it.
> > >> > >
> > >> > > In fact the radius of the averaged x, y will also be a measure of
> > >> > > just how specialized your store is...
> > >> > > (eg r = sqrt(average(x coords)^2 + average(y coords)^2))
> > >> >
> > >> > Cool! So the principle is that the closer the mid-point is to being
> > >> > actually on the circle, the more specialised the store?
> > >>
> > >> Yes.  'r' should be between 0 and 1.  The closer to 0, the less
> > >> specialized the store is.  It is not a perfect measure of specialization
> > >> though.  If the store is specialized in two areas 180 degrees apart, or
> > >> 3 areas 120 degrees apart, and so on, r could be small with the store
> > >> fairly specialized...
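
For reference, the whole thing in Java would be something like this (a rough
sketch, untested; the method name is made up, and keys are assumed to be
doubles in [0, 1)):

    // Returns { mean location in [0, 1), r in [0, 1] }.
    static double[] circularMean(double[] keys) {
        double sumX = 0, sumY = 0;
        for (double key : keys) {
            double angle = key * 2 * Math.PI; // same as key * 360 degrees
            sumX += Math.sin(angle);
            sumY += Math.cos(angle);
        }
        double x = sumX / keys.length;
        double y = sumY / keys.length;
        // atan2 recovers the angle of the mean point; map it back to [0, 1).
        double mean = (Math.atan2(x, y) / (2 * Math.PI) + 1.0) % 1.0;
        // r near 1 means highly specialised; near 0 means spread out (or, as
        // Ed says, specialised in several evenly spaced clusters).
        double r = Math.sqrt(x * x + y * y);
        return new double[] { mean, r };
    }
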
> > >>
> > >> > Somebody should implement this... We don't need to keep the actual
> > >> > samples, we can just keep a running average of x and y, right? What
> > >> > about sensitivity? Can we reuse the
> > >> > bootstrapping-decaying-running-average code? Aka klein filter with
> > >> > sensitivity reducing over time, so that for the first N samples it's
> > >> > effectively a simple running average, and after that it's a klein
> > >> > filter with sensitivity equal to that at the end of the first phase?
> > >> > Or would we need to use a running average and reset it periodically?
> > >>
> > >> You just need to keep the totals for x and y along with the number of
> > >> keys (call this n).  When a key is removed from the store, subtract its
> > >> x, y coord and reduce n by 1.  Reverse this when adding a key...
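
That incremental version might look like this in Java (a sketch only,
untested; the class name is made up):

    // Tracks the circular mean of the store's keys incrementally.
    public class StoreLocationTracker {
        private double sumX, sumY;
        private long n;

        public synchronized void onKeyAdded(double location) {
            double angle = location * 2 * Math.PI;
            sumX += Math.sin(angle);
            sumY += Math.cos(angle);
            n++;
        }

        public synchronized void onKeyRemoved(double location) {
            double angle = location * 2 * Math.PI;
            sumX -= Math.sin(angle);
            sumY -= Math.cos(angle);
            n--;
        }

        public synchronized double meanLocation() {
            // Scaling by 1/n does not change the angle, so use the sums directly.
            return (Math.atan2(sumX, sumY) / (2 * Math.PI) + 1.0) % 1.0;
        }

        public synchronized double specialisation() {
            // r as above: distance of the mean point from the centre.
            return Math.sqrt(sumX * sumX + sumY * sumY) / n;
        }
    }
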
> > >
> > > That could work for the datastore, yes, assuming it's persistent; meaning
> > > it will gradually become more inaccurate with unclean shutdowns. And we
> > > can't just wipe it either, because then we'd have to reconstruct it from
> > > the store before adding more; maybe it could be integrated with bloom
> > > filter reconstruction for sdiz's store?
> > 
> > The original routing key will not be stored in the new store; only a
> > SHA256-digested version will be stored.
> > 
> > Do we need to store the location for each key? (so that we can update
> > the average location when the key is removed)
> > Maybe a running average is just good enough?
> > Or just store an approximated location, not the actual value?
> 
> A salted location would work as long as the salt had a mean of 0 and was not
> all that large (eg. less than the routingMissDistance value).
> 
> On another topic: for FOF routing, we could also salt locations sent to
> friends.  Again, if the salt is less than the routingMissDistance value, I
> do not think it will affect routing too much...  This would make it harder
> for the receiver of FOF info to figure out the actual friends.  Needs
> simulation though.
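
Salting itself is trivial, something like this (a hypothetical sketch;
maxSalt would be chosen below the routingMissDistance value):

    // Add bounded, zero-mean noise to a location and wrap back into [0, 1).
    static double saltLocation(double location, double maxSalt,
            java.util.Random rand) {
        // Uniform in [-maxSalt, maxSalt], so the salt has mean 0.
        double salt = (rand.nextDouble() * 2.0 - 1.0) * maxSalt;
        return (location + salt + 1.0) % 1.0;
    }
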
>  
> > > For requests we'd need something more like a BDRA anyway.
> 
> If you use a decaying running average, the half life would have to be fairly
> large and adjusted if the store size changed (we would have to track x, y
> & n).  It will also tend to distort the mean, since new keys will have more
> effect than old keys - for something like a store, where every key is
> equally important, this is not a great model.
> 
> In this case, I think we _will_ get more meaningful results from a true
> average.
> 
> How long would it take to recreate the true numbers (maybe from salted keys)
> after an unclean shutdown?

It depends on what you use it for. My main interest here is in an accurate and
efficient key location averager that we can use, for example, to estimate the
specialisation of incoming requests. For that we will probably want a decaying
average.
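
Something like an exponentially decaying version of the same (x, y) trick
might do (a sketch only, not the existing
bootstrapping-decaying-running-average code; the class name and alpha are
made up):

    // Decaying circular average of reported key locations.
    public class DecayingCircularAverage {
        private double x, y;
        private final double alpha; // e.g. 0.01; smaller = longer memory

        public DecayingCircularAverage(double alpha) {
            this.alpha = alpha;
        }

        public synchronized void report(double location) {
            double angle = location * 2 * Math.PI;
            // Move the mean point a fixed fraction towards the new sample.
            x += alpha * (Math.sin(angle) - x);
            y += alpha * (Math.cos(angle) - y);
        }

        public synchronized double meanLocation() {
            return (Math.atan2(x, y) / (2 * Math.PI) + 1.0) % 1.0;
        }
    }
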

With regards to biasing swapping by the datastore, I would have to see some 
detailed simulations, plus I'd want our theoreticians to support it. So far 
none of them have expressed an opinion. IMHO the more obvious and urgent 
solution is to not swap on opennet, but we need vive to do some simulations 
to determine how much impact this would have on a hybrid network.
> 
> Thanks
> Ed