On Friday 25 January 2008, Roger Bivand wrote: > On Thu, 24 Jan 2008, Dylan Beaudette wrote: > > On Thursday 24 January 2008, Roger Bivand wrote: > >> On Thu, 24 Jan 2008, Dylan Beaudette wrote: > >>> Hi Roger. Thanks for contributing some answers to this. > >>> > >>> I was recently working with a colleague on developing some sample > >>> exercises for new students. Since joining new attribute data to a GIS > >>> layer's table is a very common operation we included some samples on > >>> how to do this within R. You have hinted at some possible ways to do it > >>> above, but do you have a 'best practices' approach to doing this using > >>> 'sp' methods and objects? > >>> > >>> For example: > >>> > >>> # contains an attribute col named 'veg_code' > >>> veg <- readOGR(something.shp) > >>> > >>> # code meanings: indexed by 'veg_code' > >>> codes <- read.dbf(table.dbf) > >>> > >>> # what is the best way to join up the attributes in 'veg' with the rows > >>> in 'codes' ? > >> > >> Hi Dylan, > >> > >> This is a different question, but I won't break it out of this thread > >> yet. > >> > >> You are doing look-up on codes to give labels to veg$veg_code, right? > >> > >> veg$veg_code are integer indices to codes$V1 (say V1, I don't know what > >> it is). If length(unique(veg$veg_code)) == length(codes$V1), and > >> sort(unique(veg$veg_code)) is 1:length(codes$V1), you should think of > >> the factor as your friend: > >> > >> veg$veg_code_factor <- factor(veg$veg_code, > >> labels=as.character(codes$V1)) > >> > >> If not, you need another layer using perhaps order() or match() on the > >> matching substring of codes$V1 to find out which value of veg$veg_code > >> should have which label in as.character(codes$V1). Alternatively use the > >> levels= argument to factor(). > >> > >> Something like: > >> > >> set.seed(1) > >> veg_code <- rpois(100, 4) > >> table(veg_code) > >> V1 <- paste("code", 0:10) > >> V1 > >> levs <- 0:10 > >> veg_code_factor <- factor(veg_code, levels=levs, labels=V1) > >> table(veg_code_factor, veg_code) > >> > >> No merging or messing with veg itself is needed, apart from adding a > >> single extra factor column. The factor abstraction is a great strength > >> of the S language. > >> > >> Have I misunderstood you? > >> > >> Roger > > > > I think so. > > > > I was (trying to) describe the process of joining, either 1:1 or many:1, > > the att table associated with an sp object and some other data frame. > > > > merge() seems to work fine, but the order of the rows are different from > > the original data frame attached to the sp object.
Thanks for the detailed response Roger. > My experience is that merge() is often a false friend, because of object > coercion and sorting, as you suggest. The rules for SpatialLinesDataFrame > and SpatialPolygonsDataFrame are simple, and SpatialPointsDataFrame (and > by extension SpatialPixelsDataFrame) can be made as simple. The first two > constructor functions take a match.ID= argument, default TRUE, which > matches the geometry ID - the ID slot in the component Lines or Polygons > objects - to the data.frame row.name. For SpatialPointsDataFrame(), the > same argument exists, but is only used if the coordinate matrix has row > names to be matched to the data.frame row names. If it has been used, the > data slot row names are the geometry IDs. > > From there, it gets harder. merge() takes lots of arguments, and a lot of > trial and error is needed to reach a satisfactory result, that is a single > data frame with row names matching the geometry IDs (not necessarily in > the right order, but the same set of IDs). > > > My question was on the best way to 'update' the data frame attached to an > > sp object, based on the results from a merge() with some other data. > > For one to one, there isn't really an alternative to extracting the data > slot, do the merge, check the geometry IDs against the output data frame > row names, and re-construct the object. That is, essentially where this > thread began. > > If there is less data than geometries, merge() should fill out with NAs. > If there is more data than geometries, the output data frame will need > subsetting. Both match(), order(), and %in% are very handy here. > > For one geometry list to many - see reshape() first, for example to > flatten a "tall" set of space/time observations so that the time values > become new attribute columns. This could be met stations stacked by > station and date, which need widening to stations by date*attributes > > If need be, the IDs of the geometries can be changed too, see > ?"spChFIDs-methods" in maptools. > > Any closer? I think that you have answered my questions and reaffirmed my thoughts on how to proceed. I am mostly interested in 1:1 and 1:many joins (geom:attr), and I think that the following approach should work in the general case: # read vector data: v <- readOGR(dsn='...', layer='...') # safer approach to 1:1 or 1:many (geom:atts) joins # add a sorting id for later use [EMAIL PROTECTED] <- 1:nrow([EMAIL PROTECTED]) # make a copy of the original table orig.table <- [EMAIL PROTECTED] # make the table with 'joined' data new.table <- merge(x=orig.table, y=veg_codes, by.x='CODE', by.y='code') # re-order this table based on the original row order new.table.ordered <- new.table[order(new.table$sorting_id), ] # restore the origina row names row.names(new.table.ordered) <- row.names(orig.table) # replace the data table [EMAIL PROTECTED] <- new.table.ordered This should ensure that the merged rows are in the original order, and have the original row.names. Does that sound correct? Thanks, Dylan > Roger > > > does that help? > > > > cheers, > > > > Dylan > > > >>> As of now we are using merge to replace the dataframe slot of the > >>> original file. We first re-order the results from merge to match the > >>> original row ordering: > >>> > >>> > >>> # an example file: > >>> veg <- readOGR(dsn='ArcGISLabData/BrownsPond/', layer='vegevector') > >>> > >>> # some example codes > >>> veg_codes <- data.frame(code=1:4, meaning=c('code 1','code 2','code > >>> 3','code 4')) > >>> > >>> # join the original data table with the veg codes table > >>> combined <- merge([EMAIL PROTECTED], y=veg_codes, by.x='CODE', > >>> by.y='code') > >>> > >>> > >>> # overwrite the original data frame with the combined version > >>> # note that the original order needs to be restored > >>> # since the original data was sorted on 'ID', we can use that to > >>> restore # the correct order in the 'combined' dataframe: > >>> [EMAIL PROTECTED] <- combined[order(combined$ID),] > >>> > >>> > >>> In summary, is there a safer or preferred way to do this? > >>> > >>> thanks, > >>> > >>> Dylan -- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341 _______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo