Roger,Thanks - It looks like this will do the trick. I had to change [X$covertype="forest",] to [X$covertype=="forest",] but it appears to work just fine.
Ned Roger Bivand wrote:
On Thu, 12 Feb 2009, Ned Horning wrote:Roger and Robert,Thanks for the help. Once again I am close but I can't figure out how to use the attribute data to control the sampling. I'd like to stratify the sampling by attribute value. In the Shapefile I have an integer attribute "covertype". There can be several polygons with the same covertype ID. What I want to do is get 100 sample points from polygons of covertype=1, 100 points from polygons of covertype=2... Is that possible? I looked at readShapePoly, readORG, and the suite of tools in spsample and but didn't see what I was looking for.It looks like overlay() is a good way to assign the value of covertype to coordinate pairs.One solution to fix all of this is to have a separate Shapefile for each covertype but I'd like to avoid that if possible.If you subset the SpatialPolygonsDataFrame object by covertype, and then spsample() for the subset, you might get a bit nearer. Subset using the [ operator as for data.frames:X_forest <- X[X$covertype="forest",] forest_pts <- spsample(X_forest, n=100) (untried)Using lapply, you could probably step along the factor (categorical) levels of covertype if there are many and (say) n is fixed.RogerNed Roger Bivand wrote:On Thu, 12 Feb 2009, Robert Hijmans wrote:Hi Ned, Good to hear that. For your other question, have a look at: require(maptools) ?readShapePolyOr more generally readOGR() in rgdal for another use of the underlying shapelib code.require(sp) ?sample.PolygonsMost likely the spsample() method will be enough, without having to pick a specific method - but if necessary the Polygons objects can be dissolved using unionSpatialPolygons() in maptools.RogerRobert On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horn...@amnh.org> wrote:Robert,This worked - thanks. It's always uplifting to see an actual image after working on something for a while. Now I can start playing with parameters and playing with different approaches. I'm just (re)starting my R education and I'm pretty slow getting the hang of it but your examples help a lot. They also give me more avenues to discover other functionality and different ways of doing things. I hope I can keep working with R and GRASS until I know it this time. My goal is to get to the point where I can be productivewith these packages and start training other folks.The raster package looks very nice and I'll keep an eye on its development. One step that I am currently doing in GRASS is to read a Shapefile with training data (polygons with an integer attribute representing a land covertype) and then randomly create 100 points within each set of polygonsrepresenting a specific land cover type. I do this for each land cover typeand then concatenate the results into a text file. This file has thecoordinates that I use in xyValues to get the pixels values from the SPOTimage. Is there a way to do this sampling using the raster package or another R package that you are familiar with? In GRASS I convert theShapefile to a raster image before doing the random sampling and it would benice if I could skip this step. All the best, Ned Robert Hijmans wrote:I see, in my example, I had a single quantitative variable but you probably have land cover classes or something like that. If the classes are in fact numbers stored as text then use pred <- as.numeric(pred)but if you have words such as 'forest', 'crops', 'water' you could dosomething like ... pred <- predict(randfor, rowvals) pred[pred=='forest'] <- 1 pred[pred=='crops'] <- 2 pred[pred=='water'] <- 3 pred <- as.numeric(pred) predrast <- setValues(predrast, pred, r) ...not pretty, you could also fit RF with classes that can be interpretedas numbers.. Make sure you do not get: Warning message: NAs introduced by coercion which would suggest that some character values could not be transformed to numbers.On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horn...@amnh.org> wrote:Robert,Using predrast <- setValues(predrast, as.vector(pred), r) I got anothererror: values must be numeric, integer or logical. class(pred) = "factor" dim(pred) = NULL class(v) = "character" length(v) == ncol(spot) = TRUE Ned Robert Hijmans wrote:Strange. You could try predrast <- setValues(predrast, as.vector(pred), r) But it would be good to know what pred is. Can you do this: class(pred) dim(pred) v <- as.vector(pred) class(v) length(v) == ncol(spot) RobertOn Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horn...@amnh.org> wrote:Robert and Roger,Thanks for the information and pointers. The raster package looks quiteinteresting and I'll try to get up to speed on some of its capabilities. Arethe man pages the best way to do that or is that a single documentavailable?I made some progress but still have some questions. I followed thestepslaid out by Robert and everything went fine except I ran into an errorwith"predrast <- setValues(predrast, pred, r)" in the for loop when I tried processing one line at a time and "r <- setValues(r, pred)" when I ranthefull image in one go. The error was: "values must be a vector." Anyidea what I'm doing wrong?I tried to read the GRASS files directly but got a message saying it isnota supported file format. Can you confirm that is the case or am I doing something wrong? I was able to read a tiff version of the image. I amable to run gdalinfo on GRASS files just fine from a terminal window. Thanks again for the help. Ned Robert Hijmans wrote:Ned, This is an example of running a RandomForest prediction with theraster package (for the simple case that there are no NA values in the raster data; if there are, you have to into account that "predict'does not return any values (not even NA) for those cells). Robert#install.packages("raster", repos="http://R-Forge.R-project.org")require(raster) require(randomForest) # for single band files spot <- stack('b1.tif', 'b2.tif', 'b3.tif') # for multiple band files# spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),c(1,2,3) ) # simulate random points and values to model with xy <- xyFromCell(spot, round(runif(100) * ncell(spot))) response <- runif(100) * 100 # read values of raster layers at points, and bind to respinse trainvals <- cbind(response, xyValues(spot, xy)) # run RandomForest randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals) # apply the prediction, row by row predrast <- setRaster(spot) filename(predrast) <- 'RF_pred.grd' for (r in 1:nrow(spot)) { spot <- readRow(spot, r) rowvals <- values(spot, names=TRUE) # this next line should not be necessary, but it is # I'll fix that colnames(rowvals) <- c('b1', 'b2', 'b3') pred <- predict(randfor, rowvals) predrast <- setValues(predrast, pred, r) predrast <- writeRaster(predrast, overwrite=TRUE) } plot(predrast)On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <roger.biv...@nhh.no>wrote:Ned:The three bands are most likely treated as 4-byte integers, so theobjectwill be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. Theymayget copied too. There are no single byte user-level containers foryou(there is a raw data type, but you can't calculate with it). Possiblysaying spot_frame <- slot(spot, "data") will save one copying operation,but its hard to tell - your choice of method first adds inn all the coordinates, which are 8-byte numbers, so more than doubles its sizeandmakes more copies (to 233MB for each copy). Running gc() severaltimesmanually between steps often helps by making the garbage collectormore aggressive.I would watch the developments in the R-Forge package "raster", which builds on some of these things, and try to see how that works. If youhavethe GDAL-GRASS plugin for rasters, you can use readGDAL to read fromGRASS- which would work with raster package functions now. Look at thecode ofrecent readRAST6 to see which incantations are needed. If you aregoing touse randomForest for prediction, you can use smaller tiles until youfindan alternative solution. Note that feeding a data frame of integersto amodel fitting or prediction function will result in coercion to a matrix of doubles, so your subsequent workflow should take that intoaccount.Getting more memory is another option, and may be very cost andespecially time effective - at the moment your machine is swapping. Buying memory may save you time programming around too little memory. Hope this helps, Roger --- Roger Bivand, NHH, Helleveien 30, N-5045 Bergen, roger.biv...@nhh.no -----Original Message-----From: r-sig-geo-boun...@stat.math.ethz.ch on behalf of Ned HorningSent: Wed 11.02.2009 07:40 To: r-sig-geo@stat.math.ethz.ch Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame Greetings,I am trying to read an image from GRASS using the spgrass6 command readRAST6 and then convert it into a data.frame object so I can useitwith randomForest. The byte image I'm reading is 2732 rows x 3058 columns x 3 bands. It's a small subset of a larger image I would liketouse eventually. I have no problem reading the image using readRAST6but when I try to convert it to a data.frame object my linux systemresources (1BG RAM, 3GB swap) nearly get maxed out and it runs for a couple hours before I kill the process. The image is less than 25MBsoI'm surprised it requires this level of memory. Can someone let meknowwhy this is. Should I use something other than the GRASS interfacefor this? These are the commands I'm using:spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))spot_frame <- as(spot, "data.frame") Any help would be appreciated. All the best, Ned _______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo_______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
_______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo