Dear Tom,
Hengl, T. wrote:
Dear Edzer,
I do have a similar problem (extensive computation time, memory limits). I
would like to run OK interpolation of 60,000 points at 500,000 new
locations, with an nmax of the 30 closest points (I am using a Windows XP
machine with 2 GB RAM).
But I doubt that it can ever run in less than a few hours in gstat. I
mean, at each new location the algorithm has to derive distances to all
60,000 observations (in effect a 500,000 x 60,000 distance matrix over the
whole job) and then find the 30 closest ones. There is no more intelligent
way to pick the closest points - or is there?
There is, and it is being used: the short name is spatial index and
quadtree; you'll find links and explanation on
http://www.gstat.org/manual/node8.html#SECTION00351000000000000000
Both Computers & Geosciences papers on gstat mention this.
To see the effect of this, try enlarging split to a number larger than
the number of observations, as in

  krige(log(zinc) ~ 1, meuse, meuse[1, ], vgm(1, "Exp", 300),
      set = list(split = 1000))

Here the bucket size is 1000, so all observations will be ranked
according to distance to the prediction location (which does not require
an n^2 matrix), using quicksort.
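As a minimal sketch of how to time the two search strategies against
each other (using only the meuse data shipped with package sp, and
nmax = 30 as in your setting):

  library(sp)
  library(gstat)
  data(meuse);      coordinates(meuse) <- ~x+y       # 155 observations
  data(meuse.grid); coordinates(meuse.grid) <- ~x+y  # 3103 prediction points
  v <- vgm(1, "Exp", 300)
  # default: quadtree-based neighbourhood search
  system.time(krige(log(zinc) ~ 1, meuse, meuse.grid, v, nmax = 30))
  # split > number of observations: plain distance ranking with quicksort
  system.time(krige(log(zinc) ~ 1, meuse, meuse.grid, v, nmax = 30,
                    set = list(split = 1000)))

At meuse's size the difference is small; with 60,000 observations it
will not be.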
Other software, such as GSLIB (iirc), puts a coarse regular grid over the
domain to speed up neighbourhood selection.
--
Edzer
Thanx,
Tom
________________________________
From: [EMAIL PROTECTED] on behalf of Dave Depew
Sent: Fri 9/19/2008 5:00 PM
To: Edzer Pebesma
Cc: Chris Taylor; r-sig-geo@stat.math.ethz.ch
Subject: Re: [R-sig-Geo] cokriging question
Thanks for the quick responses.
I've often done global OK or UK runs that can take ~2-3 days to complete.
I always assumed that was because the matrices were so large, but Task
Manager indicated that Rgui.exe only consumed ~800 MB of RAM during the
process.
I'll try it by passing the maxdist to gstat() rather than predict.gstat().
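If I read the documentation right, that means something like the
following sketch (made-up object names obs/v1/v2; EcoSAV.grid as in my
call quoted below):

  library(gstat)
  g <- gstat(id = "v1", formula = v1 ~ 1, data = obs, maxdist = 100)
  g <- gstat(g, id = "v2", formula = v2 ~ 1, data = obs, maxdist = 100)
  # ... attach the fitted variogram and cross-variogram models here ...
  ck <- predict(g, newdata = EcoSAV.grid)  # a maxdist given here is ignored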
The only other time I encounter the memory.c line is when I neglect to
remove duplicate observations.
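For reference, the duplicate removal I mean looks roughly like this
(a sketch, assuming a SpatialPointsDataFrame named obs):

  library(sp)
  dup <- zerodist(obs)          # pairs of points at identical locations
  if (nrow(dup) > 0)
      obs <- obs[-dup[, 2], ]   # drop the second point of each pair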
I think I should only have at most 500 or so points within a 100-200 m
maxdist...
Edzer Pebesma wrote:
Good morning Dave (late afternoon here),
Chris Taylor wrote:
Good morning Edzer and Dave,
Thanks for bringing up this point. I had a similar issue recently using
krige(): with observations at 5,800 locations, attempting to krige()
predictions at 112,000 locations resulted in the same "memory.c" error
message. Reducing the prediction locations to <<50,000 and reducing
maxdist seemed to help, but the predictions still took a very long time
(>2 hours). (Running Windows XP with 4 GB memory.)
Can you clarify your suspicion regarding the "lack of standardization
of coordinates"?
In this message, a trend was modelled based on x and y coordinates, as
follows:
       x                y               DN4          indicator4
 Min.   :670462   Min.   :4215236   Min.   :18.00   Min.   :0.0000
 1st Qu.:670683   1st Qu.:4215456   1st Qu.:24.00   1st Qu.:0.0000
 Median :670904   Median :4215677   Median :32.50   Median :0.0000
 Mean   :670904   Mean   :4215677   Mean   :43.26   Mean   :0.4795
 3rd Qu.:671125   3rd Qu.:4215898   3rd Qu.:64.00   3rd Qu.:1.0000
 Max.   :671346   Max.   :4216119   Max.   :87.00   Max.   :1.0000
g <- gstat(id = "indicator6",
           formula = indicator6 ~ x + y + x*y + sqrt(x) + sqrt(y),
           locations = ~x + y, data = band6.data, ...
Computing x*y gives numbers many orders of magnitude larger than
sqrt(x) or the intercept: here the x:y interaction is on the order of
670,000 x 4,200,000, i.e. about 3e12, against a sqrt(x) of about 8e2,
so the trend matrix becomes badly conditioned. The usual advice is to
(somewhat) standardize the coordinates before using them in a trend.
But I doubt this helps you very much.
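For completeness, a minimal sketch of what I mean by standardizing
(scaling both coordinates to [0, 1], which keeps all trend terms at
comparable magnitude and keeps sqrt() well defined; the prediction grid
would need the same xs/ys columns):

  band6.data$xs <- (band6.data$x - min(band6.data$x)) / diff(range(band6.data$x))
  band6.data$ys <- (band6.data$y - min(band6.data$y)) / diff(range(band6.data$y))
  g <- gstat(id = "indicator6",
             formula = indicator6 ~ xs + ys + xs*ys + sqrt(xs) + sqrt(ys),
             locations = ~x + y, data = band6.data)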
I find it hard to consider >2 hours a very long time before I know all
the details (e.g. how many points were there within your maxdist?), the
reason why you want an instant answer, and preferably have heard a
comparison with other software. If you then tell me what your budget is,
I might come up with possible solutions, starting very cheap: e.g. look
at demo(snow) in package gstat, or use an OS that can assign these 4 GB
to a single process.
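To sketch the cheap route (along the lines of gstat's demo(snow); obs,
newdat, the variable z, and the variogram model v are placeholders, not
your objects):

  library(sp); library(gstat); library(snow)
  cl <- makeCluster(2, type = "SOCK")
  clusterEvalQ(cl, library(gstat))
  # chunk the prediction locations, one chunk per worker
  idx <- splitIndices(length(newdat), length(cl))
  res <- parLapply(cl, idx, function(i, obs, newdat, v)
      krige(z ~ 1, obs, newdat[i, ], v), obs, newdat, v)
  out <- do.call(rbind, res)   # recombine the predictions
  stopCluster(cl)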
--
Edzer
Chris
Edzer Pebesma wrote:
Dave,
12,000 observations fit, in the C representation, in less than 1 MB
(64 bytes per observation, so 12,000 x 64 bytes = 0.73 MB).
The issue is that you assume passing maxdist to predict.gstat has an
effect. It doesn't; you need to pass it to the function gstat().
The same thing happened in this
https://stat.ethz.ch/pipermail/r-sig-geo/2008-September/004182.html
message, where nmax was passed to predict.gstat, and simulation took
forever. The other issue in that question was, I suspect, a lack of
standardization of the coordinates used in a trend surface.
--
Edzer
Dave Depew wrote:
Is there a limit to the number of observations, or to the size of the
file, that can be co-kriged in gstat?
I have a dataset of ~12,000 observations (2 variables); the variograms,
cross-variogram and LMC are fit well, and co-kriging starts OK:
Linear Model of Coregionalization found. Good.
[using ordinary cokriging]
then it immediately outputs
"memory.c", line 57: can't allocate memory in function m_get()
Error in predict.gstat(fit.ck, newdata = EcoSAV.grid, maxdist = 100) :
m_get
I've tried different maxdist values from 10 to 1000, with exactly the
same result.
I recently upgraded my RAM to 4 GB and flipped the Windows XP /3GB
switch.
--
David Depew
PhD Candidate
Department of Biology
University of Waterloo
200 University Ave W.
Waterloo, ON. Canada
N2L 3G1
(T) 1-519-888-4567 x33895
(F) 1-519-746-0614
http://www.science.uwaterloo.ca/~ddepew
--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster,
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9