On August 6, 2008, Daniel Cheng wrote:
> On Wed, Aug 6, 2008 at 7:14 AM, Matthew Toseland
> <[EMAIL PROTECTED]> wrote:
> > On Tuesday 05 August 2008 23:11, Ed Tomlinson wrote:
> >> On August 5, 2008, Matthew Toseland wrote:
> >> > On Sunday 03 August 2008 00:58, you wrote:
> >> > > On August 2, 2008, Matthew Toseland wrote:
> >> > > > On Saturday 02 August 2008 02:41, Ed Tomlinson wrote:
> >> > > > > On August 1, 2008, Michael Rogers wrote:
> >> > > > > > Daniel Cheng wrote:
> >> > > > > > > in a circular space, we can get an infinite number of "averages"
> >> > > > > > > by changing the point of reference. --- choose the point of
> >> > > > > > > reference with the smallest standard deviation.
> >> > > > > >
> >> > > > > > From an infinite number of alternatives? Sounds like it might take a
> >> > > > > > while. ;-) I think we can get away with just trying each location as
> >> > > > > > the reference point, but that's still O(n^2) running time.
> >> > > > > >
> >> > > > > > How about this: the average of two locations is the location midway
> >> > > > > > along the shortest line between them. So to estimate the average of a
> >> > > > > > set of locations, choose two locations at random from the set and
> >> > > > > > replace them with their average, and repeat until there's only one
> >> > > > > > location in the set.
> >> > > > > >
> >> > > > > > It's alchemy but at least it runs in linear time. :-)
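
For illustration, a quick Java sketch of this pairwise trick (all the names
here are mine, nothing below is existing Freenet code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class PairwiseCircularAverage {
        private static final Random rand = new Random();

        // Midpoint of a and b along the shorter arc of the [0,1) circle.
        static double midpoint(double a, double b) {
            double diff = b - a;
            if (diff > 0.5) diff -= 1.0;        // go the short way round
            else if (diff < -0.5) diff += 1.0;
            double mid = a + diff / 2.0;
            return mid - Math.floor(mid);       // wrap back into [0,1)
        }

        // Pick two locations at random, replace them with their midpoint,
        // repeat until one remains. Linear number of iterations.
        static double estimate(List<Double> locations) {
            List<Double> work = new ArrayList<Double>(locations);
            if (work.isEmpty()) throw new IllegalArgumentException("no locations");
            while (work.size() > 1) {
                double a = work.remove(rand.nextInt(work.size()));
                double b = work.remove(rand.nextInt(work.size()));
                work.add(midpoint(a, b));
            }
            return work.get(0);
        }
    }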
> >> > > > >
> >> > > > > Another idea that should run in linear time.  Consider each key a
> >> > > > > point on the edge of a circle (with a radius of 1).  Convert each key
> >> > > > > (0 = 0 degrees, 1 = 360) to an x, y coord and average these numbers.
> >> > > > > Once all keys are averaged convert the (x, y) back into a key to get
> >> > > > > the average.
> >> > > > >
> >> > > > > eg    x = sin(key * 360), y = cos(key * 360)  assuming the angle is in
> >> > > > > degrees not radians.
> >> > > > >       where a key is a number between 0 and 1
> >> > > >
> >> > > > You miss the point. We already have what is effectively an angle, it's
> >> > > > just in 0 to 1 instead of 0 to 360 deg / 2*pi rads. The problem is the
> >> > > > circular keyspace.
> >> > >
> >> > > No I have _not_ missed the point.  If you map each key onto the rim of a
> >> > > circle and average the resulting x and y coords of all the keys you get an
> >> > > average in a circular keyspace.  Try it.
> >> > >
> >> > > In fact the radius of the averaged x, y will also be a measure of just how
> >> > > specialized your store is... (eg r = sqrt(average(x coords)^2 +
> >> > > average(y coords)^2))
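
Again for illustration, a minimal Java sketch of the x/y averaging and the r
measure, in radians since that is what java.lang.Math expects (the names are
made up):

    public class CircularMean {
        // Returns { mean key in [0,1), r }, where r is the length of the
        // mean vector: 1 = fully specialised, 0 = no specialisation.
        static double[] average(double[] keys) {
            double sumX = 0, sumY = 0;
            for (double key : keys) {
                double angle = key * 2 * Math.PI;  // key in [0,1) -> radians
                sumX += Math.sin(angle);
                sumY += Math.cos(angle);
            }
            double avgX = sumX / keys.length;
            double avgY = sumY / keys.length;
            double r = Math.sqrt(avgX * avgX + avgY * avgY);
            // x = sin, y = cos above, so atan2(x, y) recovers the angle;
            // atan2 returns (-pi, pi], so map it back into [0, 1).
            double mean = Math.atan2(avgX, avgY) / (2 * Math.PI);
            if (mean < 0) mean += 1.0;
            return new double[] { mean, r };
        }
    }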
> >> >
> >> > Cool! So the principle is that the closer the mid-point is to being actually
> >> > on the circle, the more specialised the store?
> >>
> >> Yes.  'r' should be between 0 and 1.  The closer to 0 the less specialized the
> >> store is.  It is not a perfect measure of specialization though.  If the store
> >> is specialized in two areas 180 degrees apart, or 3 areas 120 degrees apart and
> >> so on, r could be small with the store fairly specialized...
> >>
> >> > Somebody should implement this... We don't need to keep the actual samples,
> >> > we can just keep a running average of x and y, right? What about sensitivity,
> >> > can we reuse the bootstrapping-decaying-running-average code? Aka klein filter
> >> > with sensitivity reducing over time, so that for the first N samples it's
> >> > effectively a simple running average and after that it's a klein filter with
> >> > sensitivity equal to that at the end of the first phase? Or would we need to
> >> > use a running average and reset it periodically?
> >>
> >> You just need to keep the totals for x and y along with the number of keys
> >> (call this n).  When a key is removed from the store subtract its x, y coords
> >> and reduce n by 1; reverse this when adding a key...
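
Roughly like the sketch below; the class and hook names are invented, and
sumX, sumY and n are the only state that needs persisting:

    public class StoreLocationTracker {
        private double sumX, sumY;
        private long n;

        public synchronized void onKeyAdded(double key) {
            double angle = key * 2 * Math.PI;
            sumX += Math.sin(angle);
            sumY += Math.cos(angle);
            n++;
        }

        public synchronized void onKeyRemoved(double key) {
            double angle = key * 2 * Math.PI;
            sumX -= Math.sin(angle);
            sumY -= Math.cos(angle);
            n--;
        }

        // Mean location in [0,1), or -1 if the store is empty.
        public synchronized double meanLocation() {
            if (n == 0) return -1;
            double mean = Math.atan2(sumX / n, sumY / n) / (2 * Math.PI);
            return mean < 0 ? mean + 1.0 : mean;
        }

        // Specialisation r in [0,1].
        public synchronized double specialisation() {
            if (n == 0) return 0;
            double avgX = sumX / n, avgY = sumY / n;
            return Math.sqrt(avgX * avgX + avgY * avgY);
        }
    }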
> >
> > That could work for the datastore, yes, assuming it's persistent; meaning it
> > will gradually become more inaccurate with unclean shutdowns. And we can't
> > just wipe it either, because then we'd have to reconstruct it from the store
> > before adding more; maybe it could be integrated with bloom filter
> > reconstruction for sdiz's store?
> 
> The original routing key will not be stored in the new store, only a
> SHA256-digested version will be stored.
> 
> Do we need to store the location for each key? (so that we can update
> the average location when the key is removed)
> Maybe a running average is just good enough?
> Or just store an approximated location, not the actual value?

A salted location would work as long as the salt had a mean of 0 and was not all
that large (eg. less than the routingMissDistance value).

On another topic: for FOF routing, we could also salt the locations sent to
friends.  Again, if the salt is less than the routingMissDistance value, I do
not think it will affect routing too much...  This would make it harder for the
receiver of FOF info to figure out the actual friends.  Needs simulation though.
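
The salting itself could be as simple as the sketch below; the constant is a
placeholder for whatever routingMissDistance actually is on the node, and in
practice you would want a SecureRandom:

    import java.util.Random;

    public class LocationSalter {
        // Placeholder value; the real bound would come from the node.
        static final double ROUTING_MISS_DISTANCE = 0.01;
        private static final Random rand = new Random();

        // Location plus zero-mean noise uniform on [-d, d], wrapped to [0,1).
        static double salt(double location) {
            double noise = (rand.nextDouble() * 2 - 1) * ROUTING_MISS_DISTANCE;
            double salted = location + noise;
            return salted - Math.floor(salted);
        }
    }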
 
> > For requests we'd need something more like a BDRA anyway.

If you use a decaying running average, the half-life would have to be fairly
large and adjusted if the store size changed (we would have to track x, y & n).
It will also tend to distort the mean, since new keys will have more effect
than old keys - for something like a store, where every key is equally
important, this is not a great model.
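
For reference, the kind of average being discussed - a generic sketch of a
bootstrapping decaying running average, not Freenet's actual class; for
circular data you would keep one per x and y component and convert back as
above:

    public class DecayingAverage {
        private final long maxReports;  // bootstrap length N
        private double value;
        private long reports;

        DecayingAverage(long maxReports) {
            this.maxReports = maxReports;
        }

        // For the first maxReports samples this is a plain running average;
        // after that the weight is frozen at 1/maxReports, so newer samples
        // keep displacing older ones - exactly the distortion described above.
        void report(double sample) {
            reports++;
            double weight = 1.0 / Math.min(reports, maxReports);
            value += (sample - value) * weight;
        }

        double current() {
            return value;
        }
    }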

In this case, I think we _will_ get more meaningful results from a true average.

How long would it take to recreate the true numbers (maybe from salted keys)
after an unclean shutdown?
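
It should only be one linear pass over whatever the store can enumerate, so it
is bounded by store scan speed. A sketch, reusing the invented
StoreLocationTracker above and assuming some iterable source of stored
locations:

    public class RebuildAverage {
        // One O(n) pass to recreate sumX, sumY and n after a crash.
        static StoreLocationTracker rebuild(Iterable<Double> storedLocations) {
            StoreLocationTracker tracker = new StoreLocationTracker();
            for (double loc : storedLocations) {
                tracker.onKeyAdded(loc);
            }
            return tracker;
        }
    }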

Thanks
Ed

