Re: How to deal with 96 Dimensional Points ?

Werner Van Belle Tue, 30 Mar 2010 22:07:27 -0700

Geert-Jan Brits wrote:
> Perhaps you could give us a (generalized) description of your use-case, so
> we can better grasp what you want to achieve, and how you want to use it.
> i.e: since I can't imagine/ envison a real 'eucledian distance' over 96
> dimensions I bet you're talking a generalized distance function over N
> dimenions.
> This is usually only used in two general ways afaik: 
> 2 calculating an ordered top-M list of closests points to the
> target point (Chris' implementation slightly altered) 
This is indeed the situation. A small alteration to chris his
implementation won't do, since we do not know with radius to start with,
so it is not just a matter of adapting the post-filtering.


> 3 (hmm maybe three:
> clustering points based on their distance to eachother)
>   
Yes, this is part of the usecase, but at the moment not my main focus. A
statistical approach will need to employed for that, without going for
full aggregation.
> It helps if we know what you'r after. For instance: if you're points don't
> change often and you want to achieve case 1 or 2  I would calculate these
> once and all-at-once and save them in a seperate table, bc. the on-demand
> variant may quickly become too slow. again depending on your case: option A.
> {<pointid, {neighborids}>}   <-- list of neigborids per point id, with
> pointid as key.
> option B {<pointid, neighborid} <-- one neighborid per  point id, with
> pointid + neighborid as key.
>   
That is not an option. Every 2 minutes or so the next point is randomly
choosen and we need a collection of points in the neighboorhood.
> perhaps also helpful foor google etc.: - a distance function if more often
> called a similarity function
> - top-n 'points' for a given point are usually called its neighbors. - in
> most cases you don't have to take the sqrt in Chris' implementation which
> can save a lot (but instead  do:  if($SumSq <=($r*$r)){//code here}
>   
Indeed, but this is only a fraction of the time. The larger problem lies
in searching all points that have potential. An idea that might work is
to modify the radius of what we are looking at while we are searching
based on the maximum radius we have so far and cut down distance
comparisons if they will surely fall outside the current N closest
neighbours.

-- 
http://werner.yellowcouch.org/

signature.asc
Description: OpenPGP digital signature

Re: How to deal with 96 Dimensional Points ?

Reply via email to