Hi all:

Wanted to respond a bit to this thread. rasterEngine is designed for parallel processing of a single file (by chunking the inputs and passing each chunk to a different worker). I do have a parallel resample algorithm on my radar, and I think I could adapt it for use with rasterEngine. However, this particular test case, with that many files, needs a highly optimized system, so I think it would benefit from:

1) Using e.g. gdalwarp natively from GDAL, which should be a lot faster than any resampling technique I've seen in any software package. You can use gdalUtils as a within-R wrapper for the core GDAL binaries:

install.packages("gdalUtils", repos="http://R-Forge.R-project.org")

Note that I just pushed version 0.2.0 to both r-forge and to CRAN. Waiting to hear from CRAN; as soon as they accept it I'll make a general announcement.

2) As Matteo mentioned, the "unit" of parallelization with that many files should be the file itself (not within-file parallelization). In general, I recommend you develop your parallel code using foreach, and then use whatever parallel backend you want. "parallel" is now built into core R, so you might want to stick with that as a backend rather than snow/snowfall. foreach is, in my opinion, conceptually easier than the other parallel packages, and is also very flexible (you can move your code from a single, multicore machine to an OpenMPI cluster with little effort). Each worker will run only one resampling job at a time, so having multiple files open shouldn't be an issue (if you have 8 workers, you'll only have 8 files open at a time).

To be clear, we're talking about the difference between:

(what rasterEngine does):
fileA -> fileA_chunk1 -> worker1
      -> fileA_chunk2 -> worker2

(what Matteo and I are proposing):
fileA -> worker1
fileB -> worker2

Be aware that resampling is I/O intensive, so parallel processing the files may not give you much (or any) benefit if you are on a single hard drive. Even minor tweaks like reading from one drive and writing to another may gain you some additional speed.

If you've never learned parallel computing in R, I HIGHLY recommend working through:
http://trg.apbionet.org/euasiagrid/docs/parallelR.notes.pdf

This is the best tutorial I've seen on parallel computing in R. Wherever it says library("snow") I'd recommend replacing that with library("parallel"), and pay attention to the foreach discussion.
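
A minimal sketch of the file-per-worker approach described above, assuming foreach, doParallel and gdalUtils are installed and the GDAL binaries are available. The names resample_one, resample_files, worker and n_workers are hypothetical, and the tr = c(1, 1) target resolution assumes the inputs are in a degree-based (geographic) coordinate system. It is meant as a starting point, not a tuned implementation.

```r
library(foreach)
library(doParallel)  # lets foreach use the built-in "parallel" package as its backend

# Hypothetical per-file worker: resample one file to a 1-degree grid with gdalwarp.
# Assumes gdalUtils and the GDAL binaries are installed and that the input is in
# a geographic (degree-based) coordinate system.
resample_one <- function(infile, outfile) {
  gdalUtils::gdalwarp(srcfile = infile, dstfile = outfile,
                      tr = c(1, 1), r = "bilinear", overwrite = TRUE)
  outfile
}

# Hypothetical driver: the unit of parallelization is the file, so each worker
# handles one file at a time (n_workers files open at most).
resample_files <- function(infiles, outdir, worker = resample_one, n_workers = 2) {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  outfiles <- file.path(outdir, basename(infiles))

  cl <- parallel::makeCluster(n_workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # always shut the workers down
  doParallel::registerDoParallel(cl)

  # fileA -> worker1, fileB -> worker2, ...
  foreach(i = seq_along(infiles), .combine = c) %dopar% {
    worker(infiles[i], outfiles[i])
  }
}
```

Usage would look something like resample_files(list.files("in_dir", full.names = TRUE), "out_dir", n_workers = 8), where "in_dir" and "out_dir" are placeholder paths. Because resampling is I/O bound, it is probably worth timing a small batch with different n_workers values before committing to a full run.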

--j

On Wed, Jan 8, 2014 at 12:00 PM, Camilo Mora <cm...@dal.ca> wrote:
> Thank you Roger,
>
> Yes I should have mentioned that I have looked extensively over this
> question. Curiously, Jonathan From "spatial.tools" provided me with some
> advice as his tool at the current time does not run the function "resample".
>
> Thanks again,
>
> Camilo
>
> Quoting Roger Bivand <roger.biv...@nhh.no>:
>
>> Did you notice the thread started yesterday that appears to meet your
>> need:
>>
>> https://stat.ethz.ch/pipermail/r-sig-geo/2014-January/020156.html
>>
>> It is always a good idea to look at the list archives, a search on:
>>
>> "list:R-sig-geo raster parallel"
>>
>> gives a number of potentially interesting hits. You could then preface
>> your posting by saying that you have already tried some possible solutions,
>> and would like help with them.
>>
>> Roger
>>
>> On Wed, 8 Jan 2014, Camilo Mora wrote:
>>
>>> Hi everyone,
>>>
>>> I am using the package "raster" to interpolate a large number of rasters
>>> (~1million) of different resolutions to a unique 1degree resolution grid and
>>> wonder if you know if it is possible to do this in parallel computer?.
>>>
>>> My code (example below) works like a charm but it will take 30 days to
>>> process all the rasters. Sadly, the process only uses one core of my
>>> computer. I wonder if there is a way to run this code (example below) in
>>> parallel computer?.
>>>
>>> Thanks,
>>>
>>> Camilo
>>>
>>> ####TEST CODE######
>>> library (raster)
>>>
>>> #creates 3 test rasters
>>> a <- raster(nrow=3, ncol=3)
>>> a[] <- 1:9
>>>
>>> b <- raster(nrow=3, ncol=3)
>>> b[] <- 10:18
>>>
>>> c <- raster(nrow=3, ncol=3)
>>> c[] <- 19:27
>>>
>>> #concatenates the rasters
>>> d<-brick(a,b,c)
>>>
>>> #creates a raster at a different resolution
>>> s <- raster(nrow=10, ncol=10)
>>>
>>> #interpolates data from the brick to the new resolution
>>> s <- resample(d, s, method='bilinear')
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: roger.biv...@nhh.no

--
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL 61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007