Re: [PERFORM] Cursor + upsert (astronomical data)

Marc Mamin Sun, 27 Jul 2014 12:06:25 -0700

[Craig]
>>If you haven't looked at clustering algorithms yet, you might want to do so.
>>Your problem is a special case of clustering, where you have a large number
>>of small clusters.  A good place to start is the overview on Wikipedia:
>>http://en.wikipedia.org/wiki/Cluster_analysis


According to this list, your method is similar to 
http://en.wikipedia.org/wiki/Basic_sequential_algorithmic_scheme,
but with what seems to be some logical errors.

>The simplest approach I could think of is that I process each row of the 3e8 
>rows
>sequentially and ask:
>Do I have an identifier in the radius of 1 arcsec?
>No: Generate one and assign me to it.
>Yes: Update it and assigne me to it. The update is done as weighted average – 
>I keep the
>number of how many observations the identifier has been computed. The result 
>is that the
>identifier will have average coordinates of all the observations it identifies 
>– it will
>be the center.


Let say you have 2 single measures on a line at arcsec 1 and 2.1. which hence 
correspond to 2 ipix_cat.
Now add a new measure at 1.9: as you choose any of the possible adjacent 
ipix_cat whitout considering the least distance, you may end with an ipix_at at 
1.45 which is at less than 1 arcsec then the next one.
   Moreover you have raised the weight of both ipix_cat.
     Which increases the lock probability when trying a nutlithreaded upgrade.

The max distance between 2 observations belonging to a same ipix_cat tends to 2 
arcsec with your method. If this is ok, you should probbaly modify your method 
so that the 2 first points of my example would have megred to a single 
ipix_cat. You could use your weigth for this: increase your search radius to 
2arcsec and then reject the candidates located between 1 and 2 arsec depending 
on their weight. The additional work load might be compensated by the smaller 
number of ipix_cat that woul will have.

Re: [PERFORM] Cursor + upsert (astronomical data)

Reply via email to