Yes, but why you are quering each point individually (one by one) in a loop, and why you are writing every single data point in a different file? Loops are very slow in R, and reading/writing from/to a disk file is also a very slow process. Do it several hundred thousand times and it is no surprise that this takes forever. I do not have time to examine your code either (next time provide a minimally reproducible example, not a long script), but it seems that you have some netCDF files with environmental data and a shapefile with locations, and you want to get the environmental data for each location. If this is what you want to do, you should:
a) Read the shapefiles into a SpatialPoints object. b) Use the raster package and read the netCDF rasters into a rasterbrick object. c) Use the over function and do an overlay between the points and the rasterbrick. You will get a SpaitalPointsDataFrame object, with the data associated to each point. d) Save the data into a single file. This should take some time, say 20-30 minutes perhaps, maybe much less, depending on your machine speed. Julian ping yang writes: > Hi, > > The purpose is to generate weather information(tmin,tmim,prcp) for every 4 > KM (every 4KM there is a point represent that grid) for 32 years throughout > Contingent U.S. > I want to run several states simultaneously which I want it run them > parallel. > > Thanks, > > Ping > > > On Sun, Feb 2, 2014 at 1:45 PM, Julian Burgos <[email protected]> wrote: > >> A clear explanation of what you are trying to do would help. I cannot >> figure out why you need to generate such a huge amount of files! Writing >> data into a disk is a very slow step. >> >> > Hi r-sig-geo, >> > >> > >> > >> > I am using the following script to extraction weather information for >> each >> > point (from a rasterized shapefile for the COUS) which generate about >> > 460,000 files: >> > >> > Originally I read the coordinates from a shapefile: >> > >> > Library(rgdal) >> > >> > pl <- readOGR(".","US_2_points") >> > >> > >> > >> > then I used the following code to extract the information from three >> > NetCDF >> > files that contains the tmin,tmax and pr for each year : >> > >> > for (year in 1981:2012) >> > >> > { >> > >> > #get the combined netCDF file >> > >> > tminfile <- paste("tmin","_",year,".nc",sep='') >> > >> > b_tmin <- brick(tminfile,varname='tmin') >> > >> > pptfile <- paste("ppt","_",year,".nc",sep='') >> > >> > b_ppt <- brick(pptfile,varname='ppt') >> > >> > tmaxfile <- paste("tmax","_",year,".nc",sep='') >> > >> > b_tmax <- brick(tmaxfile,varname='tmax') >> > >> > #Get the first year here!!! >> > >> > print(paste("processing year :",year,sep='')) >> > >> > for(l in 1:length(pl)) >> > >> > { >> > >> > v <- NULL >> > >> > #generate file with the name convention with >> > t_n(latitude)w(longitude).txt, 5 digits after point should be work >> > >> > filename <- >> > >> paste("e:/PRISM/US3/N",round(coordinates(pl[l,])[2],5),"W",abs(round(coordinates(pl[l,])[1],5)),".wth",sep='') >> > >> > >> > print(paste("processing file :",filename,sep='')) >> > >> > tmin <- >> > as.numeric(round(extract(b_tmin,coordinates(pl[l,])),digits=1)) >> > >> > tmax <- >> > as.numeric(round(extract(b_tmax,coordinates(pl[l,])),digits=1)) >> > >> > ppt <- >> > as.numeric(round(extract(b_ppt,coordinates(pl[l,])),digits=2)) >> > >> > v <- cbind(tmax,tmin,ppt) >> > >> > tablename <- c("tmin","tmax","ppt") >> > >> > v <- data.frame(v) >> > >> > colnames(v) <- tablename >> > >> > v["default"] <- 0 >> > >> > v["year"] <- year >> > >> > date <- >> > >> seq(as.Date(paste(year,"/1/1",sep='')),as.Date(paste(year,"/12/31",sep='')),"days") >> > >> > month <- as.numeric(substr(date,6,7)) >> > >> > day <- as.numeric(substr(date,9,10)) >> > >> > v["month"] <- month >> > >> > v["day"] <- day >> > >> > v <- v[c("year","month","day","default","tmin","tmax","ppt")] >> > >> > #write into a file with the fixed wide format >> > >> > if(file.existes(filename)) >> > >> > { >> > >> > >> > >> write.fwf(x=v,filename,append=TRUE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6)) >> > >> > } >> > >> > else >> > >> > { >> > >> > >> > >> write.fwf(x=v,filename,append=FALSE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6)) >> > >> > } >> > >> > >> > >> > I found it will takes weeks to finish, then I divide the points from the >> > shape file into each State(to generate a shape file for each state for >> the >> > points), and using the following function: >> > >> > >> > >> > weather4Point <- function(state) >> > >> > { >> > >> > #get the point file for the state >> > >> > setwd("c:/PRISM/US1Points/") >> > >> > pl <- readOGR(".",paste("points4_",state,sep='')) >> > >> > setwd("e:/PRISM/NetCDF/") >> > >> > for (year in 1981:2012) >> > >> > { >> > >> > #get the combined netCDF file >> > >> > tminfile <- paste("tmin","_",year,".nc",sep='') >> > >> > b_tmin <- brick(tminfile,varname='tmin') >> > >> > pptfile <- paste("ppt","_",year,".nc",sep='') >> > >> > b_ppt <- brick(pptfile,varname='ppt') >> > >> > tmaxfile <- paste("tmax","_",year,".nc",sep='') >> > >> > b_tmax <- brick(tmaxfile,varname='tmax') >> > >> > #Get the first year here!!! >> > >> > print(paste("processing year :",year,sep='')) >> > >> > for(l in 1:length(pl)) >> > >> > { >> > >> > v <- NULL >> > >> > #generate file with the name convention with >> > t_n(latitude)w(longitude).txt, 5 digits after point should be work >> > >> > filename <- >> > >> paste("e:/PRISM/US3/N",round(coordinates(pl[l,])[2],5),"W",abs(round(coordinates(pl[l,])[1],5)),".wth",sep='') >> > >> > >> > print(paste("processing file :",filename,sep='')) >> > >> > tmin <- >> > as.numeric(round(extract(b_tmin,coordinates(pl[l,])),digits=1)) >> > >> > tmax <- >> > as.numeric(round(extract(b_tmax,coordinates(pl[l,])),digits=1)) >> > >> > ppt <- >> > as.numeric(round(extract(b_ppt,coordinates(pl[l,])),digits=2)) >> > >> > v <- cbind(tmax,tmin,ppt) >> > >> > tablename <- c("tmin","tmax","ppt") >> > >> > v <- data.frame(v) >> > >> > colnames(v) <- tablename >> > >> > v["default"] <- 0 >> > >> > v["year"] <- year >> > >> > date <- >> > >> seq(as.Date(paste(year,"/1/1",sep='')),as.Date(paste(year,"/12/31",sep='')),"days") >> > >> > month <- as.numeric(substr(date,6,7)) >> > >> > day <- as.numeric(substr(date,9,10)) >> > >> > v["month"] <- month >> > >> > v["day"] <- day >> > >> > v <- v[c("year","month","day","default","tmin","tmax","ppt")] >> > >> > v >> > >> > #print(paste(v), zero.print = ".") >> > >> > #write into a file with the APEX format >> > >> > if(file.existes(filename)) >> > >> > { >> > >> > >> > >> write.fwf(x=v,filename,append=TRUE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6)) >> > >> > } >> > >> > else >> > >> > { >> > >> > >> > >> write.fwf(x=v,filename,append=FALSE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6)) >> > >> > } >> > >> > } >> > >> > } >> > >> > } >> > >> > >> > >> > Then using the following code by opening several R session to make this >> > process "parallel". >> > >> > >> > >> > States <- c("AL","AZ","AR","CO","CT","DC","FL","GA","ID","IA", >> > >> > "IN","KS","KY","LA","MA","ME","MI","MN","MT","NC") >> > >> > for (state in States) >> > >> > { >> > >> > weather4Point(state) >> > >> > } >> > >> > >> > >> > However, I still feel slow when I plan to do for the whole US. I am >> asking >> > here is there a better solution to speed up this process(any possibility >> > to >> > employ parallel programming in R) ? >> > >> > >> > >> > Environment information: >> > >> > R version 3.0.1 (2013-05-16) >> > >> > Platform: x86_64-w64-mingw32/x64 (64-bit) >> > >> > Memory: 16 GB >> > >> > Processor: Intel(R) Core(tm) i7-3770 CPU @ 3.4Ghz 3.4Ghz >> > >> > >> > >> > Looking forward to any suggestions? >> > >> > >> > >> > Regards, >> > >> > >> > >> > Ping >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > R-sig-Geo mailing list >> > [email protected] >> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo >> > >> >> >> -- Julian Mariano Burgos, PhD Hafrannsóknastofnun/Marine Research Institute Skúlagata 4, 121 Reykjavík, Iceland Sími/Telephone : +354-5752037 Bréfsími/Telefax: +354-5752001 Netfang/Email: [email protected] _______________________________________________ R-sig-Geo mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/r-sig-geo
