Sorry for the barrage of emails, but here is a more relevant example with two parallel solutions:
install.packages("spatial.tools", repos="http://R-Forge.R-project.org", type="source")
library(raster)         # raster(), calc()
library(foreach)        # foreach() and %dopar%
library(spatial.tools)  # rasterEngine(), sfQuickInit(), sfQuickStop()

# The function passed to rasterEngine does the looping via lapply. It returns
# a list of arrays, one per target ID; each list component is written to its
# own output file (matched against the output filenames):
classID_rasterEngine_function <- function(inraster, target_ids) {
  output <- lapply(target_ids, function(x, inraster) return(inraster == x), inraster=inraster)
  return(output)
}

tahoe_lidar_highesthit <- raster(system.file("external/tahoe_lidar_highesthit.tif", package="spatial.tools"))

# We'll set up a set of inputs/outputs to loop through.
# Use list(), not c(), to hold several Raster objects:
output_basenames <- c("boulanger_test1", "boulanger_test2")
inrasters <- list(tahoe_lidar_highesthit, tahoe_lidar_highesthit)

# Here are the IDs you want to map:
target_ids <- 1:3

### We'll do two parallel processing approaches. The first will loop through
### each input sequentially, and process each input in parallel.

# Parallel processing applied within a file:
sfQuickInit()
for(i in seq_along(output_basenames)) {
  output_filenames <- paste(output_basenames[i], target_ids, sep="_")
  output <- rasterEngine(inraster=inrasters[[i]], fun=classID_rasterEngine_function,
    filename=output_filenames, args=list(target_ids=target_ids))
}
sfQuickStop()

### Parallel processing of each individual file (rasterEngine will run
### sequentially in this case). Here you are processing multiple inputs/outputs
### at once, but rasterEngine itself runs sequentially inside each worker:
sfQuickInit()
foreach(i=seq_along(output_basenames), .packages="spatial.tools") %dopar% {
  output_filenames <- paste(output_basenames[i], target_ids, sep="_")
  # You could use calc() here instead:
  output <- rasterEngine(inraster=inrasters[[i]], fun=classID_rasterEngine_function,
    filename=output_filenames, args=list(target_ids=target_ids))
}
sfQuickStop()

On Thu, Mar 13, 2014 at 11:41 AM, Jonathan Greenberg <j...@illinois.edu> wrote:
> There are typically diminishing returns with larger and larger "chunks", but there is also a lower limit on file size and chunk size below which parallel processing stops helping.
> However, if you are only reading and writing once, I don't see much of an advantage in loading everything into memory up front, since that is what calc()/focal() and rasterEngine() already do (read from the file once, process it, and write the output), just in small chunks. If you are using the same input file over and over again, then I do see it being helpful. As a general rule, we try to use chunking in raster processing because it makes processing almost infinitely scalable (almost :)
>
> One issue that I've run into with R is that I've found it REALLY hard to do a memory profile of a function -- if I could figure out the memory footprint of a function (its MAX memory usage during its execution), I could auto-optimize the chunk size. Right now, I use a conservative multiplier of the number of input bands divided by the number of workers in the cluster, but this doesn't account for memory spikes within the function itself. Perhaps Robert can chime in on any tricks he's come across for auto-optimizing the chunk size?
>
> --j
>
>
> On Thu, Mar 13, 2014 at 11:26 AM, Boulanger, Yan
> <yan.boulan...@rncan-nrcan.gc.ca> wrote:
>> Could the system.time() results in this case depend strongly on how much data is held in RAM? In my case, I'm far from being memory limited (RAM = 192 GB), and most of the time it's faster to put everything in memory and then process it. The major speed-limiting factor here is I/O.
>>
>> yan
>>
>> Yan Boulanger, Chercheur scientifique / Research scientist
>> Ressources Naturelles Canada, Canadian Forest Service
>> Centre de Foresterie des Laurentides
>> 1055, rue du P.E.P.S.
>> C.P. 10380, succ. Sainte-Foy
>> Québec (Québec) Canada
>> G1V 4C7
>> Tel. : +1 418 649-6859
>>
>>
>> -----Original Message-----
>> From: jgrn...@gmail.com [mailto:jgrn...@gmail.com] On Behalf Of Jonathan Greenberg
>> Sent: 13 March 2014 12:18
>> To: Alex Zvoleff
>> Cc: Boulanger, Yan; r-sig-geo@r-project.org
>> Subject: Re: [R-sig-Geo] loops in rasterEngine
>>
>> Yan:
>>
>> Looks like you are getting great help with this -- I want to echo Alex's note that rasterEngine is not a catchall -- for REALLY simple processes you'll get better performance using calc() or using FEWER workers (which may seem counterintuitive). I'm submitting a paper this week that showed that a function that just multiplies a raster by 10 ran faster than calc() only when using 4 workers (sfQuickInit(cpus=4)) (vs. calc's 1), but was slower than calc with fewer or more workers. As a rule, rasterEngine, at present, is slower than calc when operating in sequential mode.
>>
>> Now, as an important note, if you grab the latest spatial.tools from R-Forge, I have added a feature that will return multiple rasters at once, which seems like what you want to do. You'll want to return a list of arrays (each component will be written to its own raster) and make sure you specify the output filenames (the components will be matched against the output filenames). This may result in a significant speedup because you are only reading each raster once and returning all the outputs (whereas the example above reads/writes the rasters for every i).
>>
>> --j
>>
>> On Thu, Mar 13, 2014 at 9:06 AM, Alex Zvoleff <azvol...@conservation.org> wrote:
>>> On Wed, Mar 12, 2014 at 11:29 PM, Boulanger, Yan
>>> <yan.boulan...@rncan-nrcan.gc.ca> wrote:
>>>> Actually, I have several rasters of more than 440 000 000 pixels (MODIS covering all of Canada) and I have a 32-core machine, so I would like to take advantage of it! ;-)
>>>>
>>>> Time is money (really?!!)
>>>
>>> As mentioned earlier, I would be careful about using rasterEngine for this kind of task. It may actually slow you down. I would recommend testing on smaller subsets to determine your gains (or losses) from doing this type of calculation in parallel versus sequentially. While I have seen great speed increases from using rasterEngine for CPU-intensive calculations, it sounds like your processing is heavily I/O-intensive. I am not sure 32 cores will help you unless you have a very fast disk or RAID array.
>>>
>>> Alex
>>>
>>>>
>>>> Thanks again!
>>>> yan
>>>>
>>>> From: Forrest Stevens [mailto:forr...@ufl.edu]
>>>> Sent: 12 March 2014 22:25
>>>> To: Boulanger, Yan
>>>> Cc: r-sig-geo@r-project.org
>>>> Subject: Re: [R-sig-Geo] loops in rasterEngine
>>>>
>>>> Hi Yan, for such a simple process I would be surprised if rasterEngine() were worth the overhead, though admittedly Jonathan Greenberg might have more information on the topic. To do such an operation, this is the approach I would take without using rasterEngine():
>>>>
>>>>
>>>> for (i in 1:5) {
>>>>   assign(paste("Safranyik_zones_1961_1990b_", i, sep=""),
>>>>          Safranyik_zones_1961_1990b == i)
>>>> }
>>>>
>>>>
>>>> To do it using rasterEngine(), this is the function definition I would use. This of course requires that you've already created a cluster using one of the supported parallel backends; otherwise you'll gain nothing from the parallel processing.
>>>>
>>>>
>>>> require("spatial.tools")
>>>> require("doParallel")  # attaches parallel (makeCluster) and provides registerDoParallel
>>>>
>>>> ## Begin a parallel cluster and register it with foreach.
>>>> ## The number of nodes/cores to use in the cluster:
>>>> cpus = 2
>>>> cl <- makeCluster(spec = cpus, type = "PSOCK", methods = FALSE)
>>>> ## Register the cluster with foreach:
>>>> registerDoParallel(cl)
>>>>
>>>> ## Or use the following, quick and dirty way:
>>>> #sfQuickInit(cpus=2)
>>>>
>>>> fun_zone <- function(zones, i, ...) {
>>>>   return(zones == i)
>>>> }
>>>>
>>>> for (j in 1:5) {
>>>>   assign(paste("Safranyik_zones_1961_1990b_", j, sep=""),
>>>>          rasterEngine(zones=Safranyik_zones_1961_1990b, args=list("i"=j),
>>>>                       fun=fun_zone))
>>>> }
>>>>
>>>> stopCluster(cl)
>>>> #sfQuickStop()
>>>>
>>>>
>>>> Hope this helps,
>>>> Forrest
>>>>
>>>> --
>>>> Forrest R. Stevens
>>>> Ph.D. Candidate, QSE3 IGERT Fellow
>>>> Department of Geography
>>>> Land Use and Environmental Change Institute
>>>> University of Florida
>>>> www.clas.ufl.edu/users/forrest
>>>>
>>>> On Wed, Mar 12, 2014 at 8:51 PM, Boulanger, Yan
>>>> <yan.boulan...@rncan-nrcan.gc.ca> wrote:
>>>> Hi folks,
>>>>
>>>> I guess I have a lot to learn about writing functions, but I'm stuck using rasterEngine. It seems like it should be very easy to do, but I'm apparently missing something... I have a raster, Safranyik_zones_1961_1990, with integer values from 1 to 5. I would like to create five rasters, each with value 1 where Safranyik_zones_1961_1990 equals "i" and NA otherwise. I would like to run everything in a loop. Here's what I thought would work:
>>>>
>>>> fun_zone <- function(Safranyik_zones,i,...)
>>>> {
>>>>   Safranyik_zonesb <- Safranyik_zones
>>>>   Safranyik_zonesb[] <- NA
>>>>   Safranyik_zonesb[Safranyik_zones == i] <- 1
>>>>   return(Safranyik_zonesb)
>>>> }
>>>>
>>>> for (i in 1:5){
>>>>   Safranyik_zones_1961_1990b <- rasterEngine(Safranyik_zones=Safranyik_zones_1961_1990, i=i, fun=fun_zone)
>>>>   assign(paste("Safranyik_zones_1961_1990b_", i, sep=""), Safranyik_zones_1961_1990b[[1]])
>>>> }
>>>>
>>>> Of course, it says that "i" is missing...:
>>>>
>>>>> Error in Safranyik_zones == i : 'i' is missing
>>>>
>>>> Any help?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Yan
>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>>
>>> --
>>> Alex Zvoleff
>>> Postdoctoral Associate
>>> Tropical Ecology Assessment and Monitoring (TEAM) Network
>>> Conservation International
>>> 2011 Crystal Dr. Suite 500, Arlington, Virginia 22202, USA
>>> Tel: +1-703-341-2749, Fax: +1-703-979-0953, Skype: azvoleff
>>> http://www.teamnetwork.org | http://www.conservation.org
>>
>>
>>
>> --
>> Jonathan A. Greenberg, PhD
>> Assistant Professor
>> Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
>> Department of Geography and Geographic Information Science
>> University of Illinois at Urbana-Champaign
>> 259 Computing Applications Building, MC-150
>> 605 East Springfield Avenue
>> Champaign, IL 61820-6371
>> Phone: 217-300-1924
>> http://www.geog.illinois.edu/~jgrn/
>> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
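The per-chunk function in the first example can be tried without spatial.tools or any raster file. This is a sketch under stated assumptions: a small hypothetical base-R array, `chunk`, stands in for one block of data that rasterEngine hands to the user function (assumed here to arrive as a plain array). It shows that the lapply-based function returns one logical array per target ID, and adds a hypothetical variant, `classID_one_na`, that produces the 1/NA coding asked for in the original question. As Forrest's reply shows, an ID value reaches such a function through args=list(...) rather than as a bare i=i in the rasterEngine call, which appears to be what produced the "'i' is missing" error.

```r
# Hypothetical stand-in for one chunk of a zone raster: a 2 x 3 x 1 array
# of class IDs. rasterEngine is assumed to pass the user function a plain
# array like this; the sketch needs no packages.
chunk <- array(c(1, 2, 3, 2, 1, 3), dim = c(2, 3, 1))

# Same logic as classID_rasterEngine_function in the example above:
# one logical array (TRUE where the cell equals the ID) per target ID.
classID_rasterEngine_function <- function(inraster, target_ids) {
  lapply(target_ids, function(x, inraster) inraster == x, inraster = inraster)
}

# Hypothetical variant for the original request: 1 where the cell equals
# the ID and NA elsewhere. ifelse() keeps the dimensions of its test.
classID_one_na <- function(inraster, target_ids) {
  lapply(target_ids, function(x) ifelse(inraster == x, 1, NA))
}

out_logical <- classID_rasterEngine_function(chunk, target_ids = 1:3)
out_one_na  <- classID_one_na(chunk, target_ids = 1:3)

# One output array per ID, each with the same shape as the input chunk.
print(length(out_logical))
print(dim(out_one_na[[2]]))
print(out_one_na[[2]][, , 1])
```

Inside rasterEngine, each component of the returned list would then be written to the matching entry of filename=, as described in the thread; that file-writing step is not reproduced in this sketch.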