On Wed, Aug 6, 2008 at 7:14 AM, Matthew Toseland <toad at amphibian.dyndns.org> wrote: > On Tuesday 05 August 2008 23:11, Ed Tomlinson wrote: >> On August 5, 2008, Matthew Toseland wrote: >> > On Sunday 03 August 2008 00:58, you wrote: >> > > On August 2, 2008, Matthew Toseland wrote: >> > > > On Saturday 02 August 2008 02:41, Ed Tomlinson wrote: >> > > > > On August 1, 2008, Michael Rogers wrote: >> > > > > > Daniel Cheng wrote: >> > > > > > > in a circular space, we can get infinite number of "average" by >> > changing >> > > > > > > point of reference. --- choose the point of reference with the >> > smallest >> > > > > > > standard deviation. >> > > > > > >> > > > > > From an infinite number of alternatives? Sounds like it might take > a >> > > > > > while. ;-) I think we can get away with just trying each location > as >> > the >> > > > > > reference point, but that's still O(n^2) running time. >> > > > > > >> > > > > > How about this: the average of two locations is the location > midway >> > > > > > along the shortest line between them. So to estimate the average > of a >> > > > > > set of locations, choose two locations at random from the set and >> > > > > > replace them with their average, and repeat until there's only one >> > > > > > location in the set. >> > > > > > >> > > > > > It's alchemy but at least it runs in linear time. :-) >> > > > > >> > > > > Another idea that should run in linear time. Consider each key a > point >> > on >> > > > the edge >> > > > > of a circle (with a radius of 1). Convert each key (0=0 degress, > 1=360) >> > to >> > > > an x, y cord and >> > > > > average these numbers. Once all keys are averaged convert the (x, > y) >> > back >> > > > into a key to >> > > > > get the average. >> > > > > >> > > > > eg x = sin (key * 360), y = cos(key * 360) assuming the angle is > in >> > > > degrees not radians. >> > > > > where a key is a number between 0 and 1 >> > > > >> > > > You miss the point. We already have what is effectively an angle, it's >> > just in >> > > > 0 to 1 instead of 0 to 360 deg / 2*pi rads. The problem is the > circular >> > > > keyspace. >> > > >> > > No I have _not_ missed the point. If you map each key onto the rim of a >> > circle and >> > > average resulting the x and y coords of all the keys you get an average > in a >> > circular >> > > keyspace. Try it. >> > > >> > > If fact radius of the averaged x, y will also be a measure of just how >> > specialized your >> > > store is... (eg r = sqrt(average(x cords)^2+average(y cords)^2) >> > >> > Cool! So the principle is that the closer the mid-point is to being > actually >> > on the circle, the more specialised the store? >> >> Yes. 'r' should be between 0 and 1. The closer to 0 the less specialized > the store is. >> It is not a perfect measure of specialization though. If the store is > specialized in >> two areas 180 degrees apart or 3 areas 120 degrees apart and so on, r could > be small >> with the store fairly specialized... >> >> > Somebody should implement this... We don't need to keep the actual > samples, we >> > can just keep a running average of x and y, right? What about sensitivity, >> > can we reuse the bootstrapping-decaying-running-average code? Aka klein >> > filter with sensitivity reducing over time so that for the first N samples >> > it's effectively a simple running average and after that it's a klein > filter >> > with sensitivity equal to that at the end of the first phase? Or would we >> > need to use a running average and reset it periodically? >> >> you just need to keep the totals for x and y along with the number of keys > (call this n) . When a >> key is removed from the store subtract its x, y coord and reduce n by 1. > reverse this when adding >> a key... > > That could work for the datastore yes, assuming it's persistent; meaning it > will gradually become more inaccurate with unclean shutdowns. and we can't > just wipe it either because then we'd have to reconstruct it from the store > before adding more; maybe it could be integrated with bloom filter > reconstruction for sdiz's store?
The original routing key will not be stored in the new store, only a SHA256-digested version will be stored. Do we need to store the location for each key? (so that we can update the average location when the key is removed) Maybe a running average is just good enough? Or just store an approximated location, not the actual value? > For requests we'd need something more like a BDRA anyway. >> >> Ed > > _______________________________________________ > Devl mailing list > Devl at freenetproject.org > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl >
