Luca Delucchi wrote:
> maybe v.median [0] could help?
Not for large datasets. First, it requires that the data fit into RAM. Second, numpy.median() sorts the entire array and takes the middle value, which is somewhere between O(n log n) in the typical case and O(n^2) in the worst case (numpy.median uses numpy's default sorting algorithm, which is quicksort).

See r.quantile for a more efficient approach for large datasets. The first pass computes a histogram, which allows upper and lower bounds to be determined for each quantile. The second pass extracts the values which lie within those bounds and sorts them. Except in pathological cases (where the majority of the data lie within a tiny proportion of the overall range), only a small fraction of the data is sorted.

In any case: is this question about the mean or the median? Calculating the mean is far simpler, and can easily be done in O(n) time and O(1) space.

-- 
Glynn Clements <gl...@gclements.plus.com>

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
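For illustration, here is a rough numpy sketch of the two-pass idea: pass one builds a histogram to bracket the median's rank, pass two sorts only the values inside that bracket. This is not r.quantile's actual implementation (which is C and works on raster rows); the function name and bin count are placeholders of mine.

```python
import numpy as np

def two_pass_median(data, bins=1000):
    """Median via histogram bounds; sorts only a small slice of the data.

    Sketch of the r.quantile-style approach, not the real code.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    counts, edges = np.histogram(data, bins=bins)
    cum = np.cumsum(counts)
    # 0-based ranks of the two middle elements (equal when n is odd).
    target_lo = (n - 1) // 2
    target_hi = n // 2
    # First bin whose cumulative count covers each target rank.
    bin_lo = int(np.searchsorted(cum, target_lo + 1))
    bin_hi = int(np.searchsorted(cum, target_hi + 1))
    lo, hi = edges[bin_lo], edges[bin_hi + 1]
    # Elements below `lo` are exactly those in earlier bins, so their
    # count tells us which ranks the sorted window starts at.
    below = int(np.count_nonzero(data < lo))
    window = np.sort(data[(data >= lo) & (data <= hi)])
    return 0.5 * (window[target_lo - below] + window[target_hi - below])
```

With a reasonable bin count, the second pass typically sorts only n/bins values, so the overall cost is close to O(n) plus a negligible sort.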
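And the O(n)-time, O(1)-space mean mentioned above amounts to a single incremental pass, e.g.:

```python
def running_mean(values):
    """Mean in one pass over an iterable, using O(1) memory.

    The incremental update mean += (x - mean) / i avoids summing into
    one large accumulator, which helps numerical stability.
    """
    mean = 0.0
    for i, x in enumerate(values, start=1):
        mean += (x - mean) / i
    return mean
```

Because it consumes any iterable, this works on data streamed from disk without ever holding the full dataset in RAM.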