Re: [R-sig-Geo] SpatialGridDataFrame to data.frame

Ned Horning Thu, 12 Feb 2009 01:11:23 -0800

Roger and Robert,

Thanks for the help. Once again I am close but I can't figure out how touse the attribute data to control the sampling. I'd like to stratify thesampling by attribute value. In the Shapefile I have an integerattribute "covertype". There can be several polygons with the samecovertype ID. What I want to do is get 100 sample points from polygonsof covertype=1, 100 points from polygons of covertype=2... Is thatpossible? I looked at readShapePoly, readORG, and the suite of tools inspsample and but didn't see what I was looking for.

It looks like overlay() is a good way to assign the value of covertypeto coordinate pairs.

One solution to fix all of this is to have a separate Shapefile for eachcovertype but I'd like to avoid that if possible.


Ned

Roger Bivand wrote:

On Thu, 12 Feb 2009, Robert Hijmans wrote:
Hi Ned,

Good to hear that. For your other question, have a look at:

require(maptools)
?readShapePoly
Or more generally readOGR() in rgdal for another use of the underlyingshapelib code.
require(sp)
?sample.Polygons
Most likely the spsample() method will be enough, without having topick a specific method - but if necessary the Polygons objects can bedissolved using unionSpatialPolygons() in maptools.
Roger
Robert


On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horn...@amnh.org> wrote:
Robert,
This worked - thanks. It's always uplifting to see an actual imageafterworking on something for a while. Now I can start playing withparametersand playing with different approaches. I'm just (re)starting my Reducationand I'm pretty slow getting the hang of it but your examples help alot.They also give me more avenues to discover other functionality anddifferentways of doing things. I hope I can keep working with R and GRASSuntil Iknow it this time. My goal is to get to the point where I can beproductive
with these packages and start training other folks.
The raster package looks very nice and I'll keep an eye on itsdevelopment.
One step that I am currently doing in GRASS is to read a Shapefile with
training data (polygons with an integer attribute representing aland cover
type) and then randomly create 100 points within each set of polygons
representing a specific land cover type. I do this for each landcover type
and then concatenate the results into a text file. This file has the
coordinates that I use in xyValues to get the pixels values from theSPOT
image. Is there a way to do this sampling using the raster package or
another R package that you are familiar with? In GRASS I convert the
Shapefile to a raster image before doing the random sampling and itwould be
nice if I could skip this step.

All the best,

Ned

Robert Hijmans wrote:
I see, in my example, I had a single quantitative variable but you
probably have land cover classes or something like that. If the
classes are in fact numbers stored as text then use

pred <- as.numeric(pred)

but if you have words such as 'forest', 'crops', 'water' you could do
something like

...
  pred <- predict(randfor, rowvals)
  pred[pred=='forest'] <- 1
  pred[pred=='crops'] <- 2
  pred[pred=='water'] <- 3
  pred <- as.numeric(pred)
  predrast <- setValues(predrast, pred, r)
...

not pretty, you could also fit RF with classes that can be interpreted
as numbers..
Make sure you do not get:

Warning message:
NAs introduced by coercion


which would suggest that some character values could not be
transformed to numbers.

On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horn...@amnh.org> wrote:
Robert,
Using predrast <- setValues(predrast, as.vector(pred), r) I gotanother
error: values must be numeric, integer or logical.

class(pred) = "factor"
dim(pred) = NULL
class(v) = "character"
length(v) == ncol(spot) = TRUE





Ned

Robert Hijmans wrote:
Strange. You could try
    predrast <- setValues(predrast, as.vector(pred), r)

But it would be good to know what pred is.

Can you do this:

class(pred)
dim(pred)
v <- as.vector(pred)
class(v)
length(v) == ncol(spot)


Robert
On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horn...@amnh.org>wrote:
Robert and Roger,
Thanks for the information and pointers. The raster packagelooks quite
interesting and I'll try to get up to speed on some of its
capabilities.
Are
the man pages the best way to do that or is that a single document
available?

I made some progress but still have some questions. I followed the
steps
laid out by Robert and everything went fine except I ran into anerror
with
"predrast <- setValues(predrast, pred, r)" in the for loop whenI triedprocessing one line at a time and "r <- setValues(r, pred)" whenI ran
the
full image in one go. The error was: "values must be a vector." Any
idea
what I'm doing wrong?
I tried to read the GRASS files directly but got a messagesaying it is
not
a supported file format. Can you confirm that is the case or amI doingsomething wrong? I was able to read a tiff version of the image.I am
able
to run gdalinfo on GRASS files just fine from a terminal window.

Thanks again for the help.

Ned


Robert Hijmans wrote:
Ned,

This is an example of running a RandomForest prediction with the
raster package (for the simple case that there are no NA valuesin the
raster data; if there are, you have to into account that "predict'
does not return any values (not even NA) for those cells).

Robert

#install.packages("raster", repos="http://R-Forge.R-project.org";)
require(raster)
require(randomForest)

# for single band files
spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
# for multiple band files
# spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
c(1,2,3) )

# simulate random points and values to model with
xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
response <- runif(100) * 100
# read values of raster layers at points, and bind to respinse
trainvals <- cbind(response, xyValues(spot, xy))

# run RandomForest
randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)

# apply the prediction, row by row
predrast <- setRaster(spot)
filename(predrast) <- 'RF_pred.grd'
for (r in 1:nrow(spot)) {
     spot <- readRow(spot, r)
     rowvals <- values(spot, names=TRUE)
# this next line should not be necessary, but it is
# I'll fix that
     colnames(rowvals) <- c('b1', 'b2', 'b3')
     pred <- predict(randfor, rowvals)
     predrast <- setValues(predrast, pred, r)
     predrast <- writeRaster(predrast, overwrite=TRUE)
}

plot(predrast)
On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand<roger.biv...@nhh.no>
wrote:
Ned:
The three bands are most likely treated as 4-byte integers, sothe
object
will be 2732 by 3058 by 3 by 4 plus a little bit. That's100MB. They
may
get copied too. There are no single byte user-level containersfor
you
(there is a raw data type, but you can't calculate with it).Possibly
saying spot_frame <- slot(spot, "data") will save one copying
operation,
but its hard to tell - your choice of method first adds innall thecoordinates, which are 8-byte numbers, so more than doublesits size
and
makes more copies (to 233MB for each copy). Running gc() several
times
manually between steps often helps by making the garbagecollector
more
aggressive.
I would watch the developments in the R-Forge package"raster", whichbuilds on some of these things, and try to see how that works.If you
have
the GDAL-GRASS plugin for rasters, you can use readGDAL toread from
GRASS
- which would work with raster package functions now. Look at the
code
of
recent readRAST6 to see which incantations are needed. If you are
going
to
use randomForest for prediction, you can use smaller tilesuntil you
find
an alternative solution. Note that feeding a data frame ofintegers
to
a
model fitting or prediction function will result in coercion to a
matrix of doubles, so your subsequent workflow should takethat into
account.
 Getting more memory is another option, and may be very cost and
especially
time effective - at the moment your machine is swapping. Buying
memory
may
save you time programming around too little memory.

Hope this helps,

Roger


---
Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
roger.biv...@nhh.no



-----Original Message-----
From: r-sig-geo-boun...@stat.math.ethz.ch on behalf of NedHorning
Sent: Wed 11.02.2009 07:40
To: r-sig-geo@stat.math.ethz.ch
Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame

Greetings,
I am trying to read an image from GRASS using the spgrass6commandreadRAST6 and then convert it into a data.frame object so Ican use
it
with randomForest. The byte image I'm reading is 2732 rows x 3058
columns x 3 bands. It's a small subset of a larger image Iwould like
to
use eventually. I have no problem reading the image usingreadRAST6
but
when I try to convert it to a data.frame object my linux system
resources (1BG RAM, 3GB swap) nearly get maxed out and it runsfor acouple hours before I kill the process. The image is less than25MB
so
I'm surprised it requires this level of memory. Can someonelet me
know
why this is. Should I use something other than the GRASSinterface
for
this? These are the commands I'm using:

spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
spot_frame <- as(spot, "data.frame")

Any help would be appreciated.

All the best,

Ned

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo




    [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: [R-sig-Geo] SpatialGridDataFrame to data.frame

Reply via email to