Sorry for the barrage of emails, but here is a more relevant example
with two parallel solutions:

install.packages("spatial.tools",
                 repos="http://R-Forge.R-project.org", type="source")
library(spatial.tools)

# The function passed to rasterEngine does the looping via lapply:
classID_rasterEngine_function <- function(inraster,target_ids)
{
  output <- lapply(target_ids,
                   function(x,inraster) return(inraster == x),
                   inraster=inraster)
  return(output)
}

tahoe_lidar_highesthit <- raster(
  system.file("external/tahoe_lidar_highesthit.tif", package="spatial.tools"))

# We'll set up a set of input/outputs to loop through:
output_basenames <- c("boulanger_test1","boulanger_test2")
inrasters <- list(tahoe_lidar_highesthit, tahoe_lidar_highesthit)

# Here are the ids you want to map:
target_ids <- 1:3

### We'll do two parallel processing approaches.  The first will loop
### through each input sequentially, and process the input in parallel:

# Parallel process applied within a file:
sfQuickInit()
for(i in seq_along(output_basenames))
{
  output_filenames <- paste(output_basenames[i],target_ids,sep="_")
  output <- rasterEngine(inraster=inrasters[[i]],
                         fun=classID_rasterEngine_function,
                         filename=output_filenames,
                         args=list(target_ids=target_ids))
}
sfQuickStop()

### The second approach processes multiple inputs/outputs at once, one per
### worker, while rasterEngine itself runs sequentially on each file:

sfQuickInit()
foreach(i=seq_along(output_basenames),.packages="spatial.tools") %dopar%
{
  output_filenames <- paste(output_basenames[i],target_ids,sep="_")
  # You could use calc() here instead:
  output <- rasterEngine(inraster=inrasters[[i]],
                         fun=classID_rasterEngine_function,
                         filename=output_filenames,
                         args=list(target_ids=target_ids))
}
sfQuickStop()
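
To make the pairing between the returned list and the output filenames concrete, here is a small base-R sketch. It assumes, as described further down this thread, that list components are matched to filenames by position. The plain matrix is only a stand-in for a raster chunk, and nothing below calls rasterEngine itself.

```r
# Stand-in for one chunk of raster values (a plain matrix, not a Raster object).
chunk <- matrix(c(1, 2, 3, 2, 1, 3), nrow = 2)
target_ids <- 1:3
output_basenames <- c("boulanger_test1", "boulanger_test2")

# Same body as classID_rasterEngine_function: one logical array per target id.
class_masks <- lapply(target_ids,
                      function(x, inraster) inraster == x,
                      inraster = chunk)

# One filename per list component, built the same way as in the loops above.
output_filenames <- paste(output_basenames[1], target_ids, sep = "_")
names(class_masks) <- output_filenames

print(output_filenames)
print(class_masks[["boulanger_test1_2"]])
```

The point is only that length(target_ids) list components need length(target_ids) filenames for each input.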



On Thu, Mar 13, 2014 at 11:41 AM, Jonathan Greenberg <j...@illinois.edu> wrote:
> There are typically diminishing returns with larger and larger
> "chunks", but there is also a lower limit on file size and chunk size
> below which parallel processing does not help.  However, if you are only
> reading and writing once, I don't see much of an advantage in
> loading everything into memory at once, since calc()/focal()
> and rasterEngine() all do the same thing (read from the file once, process
> it, and write the output), just in small pieces.  If you are using the same
> input file over and over again, then I do see it being helpful.  As a
> general rule, we try to use chunking in raster processing because it
> becomes almost infinitely scalable (almost :)
>
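
A rough illustration of the chunking idea, in base R only. The matrix stands in for a raster on disk, and row_chunks() and process_chunk() are hypothetical helpers made up for this sketch, not functions from raster or spatial.tools. The lapply() call is where a parallel apply could be swapped in.

```r
# Toy "raster": a 6 x 4 matrix of class ids.
set.seed(1)
r <- matrix(sample(1:3, 24, replace = TRUE), nrow = 6)

# Hypothetical helper: split row indices into blocks of at most chunk_rows rows.
row_chunks <- function(n_rows, chunk_rows) {
  split(seq_len(n_rows), ceiling(seq_len(n_rows) / chunk_rows))
}

# The per-chunk function (here: flag cells equal to 2).
process_chunk <- function(x) x == 2

# Process block by block, then bind the pieces back together in row order.
chunks <- row_chunks(nrow(r), chunk_rows = 4)
pieces <- lapply(chunks, function(rows) process_chunk(r[rows, , drop = FALSE]))
chunked <- do.call(rbind, unname(pieces))

# The chunked result matches processing everything in memory at once.
print(all(chunked == process_chunk(r)))
```

Only one block has to be held in memory at a time, which is why the approach scales to inputs much larger than RAM.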
> One issue that I've run into with R is that I've found it REALLY hard
> to do a memory profile of a function -- if I could figure out the
> memory footprint of a function (its MAX memory usage during its
> execution), I could auto-optimize the chunk size.  Right now, I use a
> conservative multiplier of the number of input bands divided by the
> number of workers in the cluster, but this doesn't account for memory
> spikes within the function itself.  Perhaps Robert can chime in on any
> tricks for auto-optimizing the chunk size he's come across?
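
One rough way to approach this in base R is to reset the "max used" counters with gc(reset = TRUE) and read them back after the call. The sketch below assumes that these counters are a good enough proxy for peak usage. peak_memory_mb() and rows_per_chunk() are hypothetical helpers, and the chunk-size formula is an illustrative heuristic, not the multiplier spatial.tools actually uses.

```r
# Sum of the "max used" megabyte columns reported by gc().
max_used_mb <- function() {
  g <- gc()
  sum(g[, which(colnames(g) == "max used") + 1])
}

# Hypothetical helper: approximate extra peak memory (MB) while running fun(...).
peak_memory_mb <- function(fun, ...) {
  invisible(gc(reset = TRUE))   # reset the "max used" counters
  before <- max_used_mb()
  fun(...)
  max_used_mb() - before
}

# Hypothetical heuristic: rows per chunk such that every worker's chunk,
# scaled by an observed memory multiplier, fits within a memory budget.
rows_per_chunk <- function(budget_mb, n_cols, n_bands, n_workers,
                           multiplier = 1, bytes_per_cell = 8) {
  row_mb <- n_cols * n_bands * bytes_per_cell / 1024^2
  max(1, floor(budget_mb / (row_mb * multiplier * n_workers)))
}

# A test function that allocates a large temporary vector.
test_fun <- function(n) { x <- numeric(n); sum(x * 2) }
peak <- peak_memory_mb(test_fun, 5e6)
rows <- rows_per_chunk(budget_mb = 1024, n_cols = 20000, n_bands = 1,
                       n_workers = 4, multiplier = 3)
print(c(peak_mb = peak, rows_per_chunk = rows))
```

The multiplier would come from comparing the measured peak against the size of the input chunk, so spikes inside the function are at least partly accounted for.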
>
> --j
>
>
> On Thu, Mar 13, 2014 at 11:26 AM, Boulanger, Yan
> <yan.boulan...@rncan-nrcan.gc.ca> wrote:
>> Could it be that system.time in this case depends strongly on how much data 
>> is held in RAM? In my case, I'm far from being memory limited (RAM = 192 
>> GB) and most of the time it's faster to put everything in memory and then 
>> process it. The major factor limiting speed here is I/O.
>>
>> yan
>>
>> Yan Boulanger, Chercheur scientifique / Research scientist
>> Ressources Naturelles Canada, Canadian Forest Service
>> Centre de Foresterie des Laurentides
>> 1055, rue du P.E.P.S.
>> C.P. 10380, succ. Sainte-Foy
>> Québec (Québec) Canada
>> G1V 4C7
>> Tel. : +1 418 649-6859
>>
>>
>> -----Original Message-----
>> From: jgrn...@gmail.com [mailto:jgrn...@gmail.com] On Behalf Of Jonathan 
>> Greenberg
>> Sent: March 13, 2014 12:18
>> To: Alex Zvoleff
>> Cc: Boulanger, Yan; r-sig-geo@r-project.org
>> Subject: Re: [R-sig-Geo] loops in rasterEngine
>>
>> Yan:
>>
>> Looks like you are getting great help with this -- I want to echo Alex's 
>> note that rasterEngine is not a catchall -- for REALLY simple processes 
>> you'll get better performance using calc() or using FEWER workers (which may 
>> seem counterintuitive).  I'm submitting a paper this week that showed that 
>> a function that just multiplies a raster by 10 ran faster than calc() only 
>> when using 4 workers (sfQuickInit(cpus=4)) (vs. calc's 1), but was slower 
>> than calc if you have fewer or more workers.  As a rule, rasterEngine, at 
>> present, is slower than calc when operating in sequential mode.
>>
>> Now, as an important note, if you grab the latest spatial.tools from 
>> r-forge, I have added a feature that will return multiple rasters at once, 
>> which seems like what you want to do.  You'll want to return a 
>> list-of-arrays (each component will be written to its own raster) and make 
>> sure you specify the output filenames (the components will be matched 
>> against the output filenames).  This may result in a significant speedup 
>> because you are only reading each raster once and returning all the outputs 
>> (whereas the example above reads and writes the rasters for every i).
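
To see why the list-of-arrays form can save I/O, here is a toy comparison in base R. read_input() is a hypothetical stand-in for reading a raster from disk, and it simply counts how often it is called. This is a sketch of the reasoning, not a benchmark of rasterEngine.

```r
# Simulated input: reading is the expensive step, so count how often it happens.
reads <- 0
read_input <- function() {
  reads <<- reads + 1
  matrix(c(1, 2, 3, 1, 2, 3), nrow = 2)
}
target_ids <- 1:3

# Approach 1: one pass per id, so the input is read once for every id.
reads <- 0
per_id <- lapply(target_ids, function(i) read_input() == i)
reads_per_id <- reads

# Approach 2: read once and return a list of arrays, one component per id.
reads <- 0
x <- read_input()
all_at_once <- lapply(target_ids, function(i) x == i)
reads_all_at_once <- reads

print(c(per_id = reads_per_id, all_at_once = reads_all_at_once))
print(identical(per_id, all_at_once))
```

Both approaches produce the same masks. Only the number of reads differs, and that is the cost that dominates when the job is I/O bound.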
>>
>> --j
>>
>> On Thu, Mar 13, 2014 at 9:06 AM, Alex Zvoleff <azvol...@conservation.org> 
>> wrote:
>>> On Wed, Mar 12, 2014 at 11:29 PM, Boulanger, Yan
>>> <yan.boulan...@rncan-nrcan.gc.ca> wrote:
>>>> Actually, I have several rasters of more than 440 000 000 pixels
>>>> (MODIS covering all Canada) and I have a 32-cores machine so I would
>>>> like to take advantage of it! ;-)
>>>>
>>>> Time is money (really?!!)
>>>
>>> As mentioned earlier, I would be careful about using rasterEngine for
>>> this kind of task. It may actually slow you down. I would recommend
>>> testing on smaller subsets to determine your gains (or losses) from
>>> doing this type of calculation in parallel versus sequentially. While
>>> I have seen great speed increases for CPU intensive calculations from
>>> using rasterEngine, it sounds like your processing is heavily IO
>>> intensive. I am not sure 32 cores will help you unless you have a very
>>> fast disk or RAID array.
>>>
>>> Alex
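
A minimal sketch of that kind of subset test, in base R. median_elapsed() is a hypothetical helper, and the two candidate functions are only placeholders for "the sequential version" and "the parallel version" of a real workflow.

```r
# Hypothetical helper: median elapsed seconds of fun(x) over several repetitions.
median_elapsed <- function(fun, x, reps = 5) {
  times <- vapply(seq_len(reps),
                  function(k) system.time(fun(x))[["elapsed"]],
                  numeric(1))
  median(times)
}

set.seed(42)
full  <- matrix(sample(1:5, 4e5, replace = TRUE), nrow = 1000)  # stand-in for a large raster
small <- full[1:100, , drop = FALSE]                            # small test window

# Two candidate implementations of the same operation.
vectorised <- function(x) x == 3
row_by_row <- function(x) t(apply(x, 1, function(row) row == 3))

# Check that they agree before comparing their timings.
print(all(vectorised(small) == row_by_row(small)))

timings <- c(vectorised = median_elapsed(vectorised, small),
             row_by_row = median_elapsed(row_by_row, small))
print(timings)
```

Timings on a small window will not scale linearly to the full raster, but they are usually enough to show whether the extra machinery pays for itself.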
>>>
>>>>
>>>> Thanks again!
>>>> yan
>>>>
>>>> From: Forrest Stevens [mailto:forr...@ufl.edu]
>>>> Sent: March 12, 2014 22:25
>>>> To: Boulanger, Yan
>>>> Cc: r-sig-geo@r-project.org
>>>> Subject: Re: [R-sig-Geo] loops in rasterEngine
>>>>
>>>> Hi Yan, for such a simple process I would be surprised if 
>>>> rasterEngine() were worth the overhead, though admittedly Jonathan 
>>>> Greenberg might have more information on the topic.  To do such an 
>>>> operation, this is the approach I would take without using rasterEngine():
>>>>
>>>>
>>>> for (i in 1:5) {
>>>>   assign(paste("Safranyik_zones_1961_1990b_",i, sep=""),
>>>>          Safranyik_zones_1961_1990b == i)
>>>> }
>>>>
>>>>
>>>> To do it using rasterEngine(), this is the function definition that I would 
>>>> use. This of course requires that you've already created a cluster using 
>>>> one of the various supported parallel backends; otherwise you'll gain 
>>>> nothing from the parallel processing.
>>>>
>>>>
>>>> require("spatial.tools")
>>>> require("doParallel")  # registerDoParallel(); also attaches parallel for makeCluster()
>>>>
>>>> ## Begin a parallel cluster and register it with foreach:
>>>> ## The number of nodes/cores to use in the cluster
>>>> cpus = 2
>>>> cl <- makeCluster(spec = cpus, type = "PSOCK", methods = FALSE)
>>>> ## Register the cluster with foreach:
>>>> registerDoParallel(cl)
>>>>
>>>> ##       Or use the following, quick and dirty way:
>>>> #sfQuickInit(cpus=2)
>>>>
>>>> fun_zone <- function( zones, i, ...) {
>>>>   return(zones == i)
>>>> }
>>>>
>>>> for (j in 1:5){
>>>>   assign(paste("Safranyik_zones_1961_1990b_",j, sep=""),
>>>>          rasterEngine( zones=Safranyik_zones_1961_1990b, args=list("i"=j),
>>>>                        fun=fun_zone) )
>>>> }
>>>>
>>>> stopCluster(cl)
>>>> #sfQuickStop()
>>>>
>>>>
>>>> Hope this helps,
>>>> Forrest
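
As an optional variation on the loops above, the results can be collected in a named list rather than created with assign(). This is only a sketch of that idiom: the small matrix stands in for the zone raster, and the names follow the same pattern as in the loops.

```r
# Stand-in for the zone raster (values 1 to 5).
zones <- matrix(c(1, 2, 3, 4, 5, 1), nrow = 2)
ids <- 1:5

# One logical layer per zone id, named like the variables created by assign().
zone_layers <- setNames(lapply(ids, function(i) zones == i),
                        paste("Safranyik_zones_1961_1990b_", ids, sep = ""))

print(names(zone_layers))
print(zone_layers[["Safranyik_zones_1961_1990b_3"]])
```

A list keeps the five outputs together, which makes it easier to loop over them later.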
>>>>
>>>> --
>>>> Forrest R. Stevens
>>>> Ph.D. Candidate, QSE3 IGERT Fellow
>>>> Department of Geography
>>>> Land Use and Environmental Change Institute University of Florida
>>>> http://www.clas.ufl.edu/users/forrest
>>>>
>>>> On Wed, Mar 12, 2014 at 8:51 PM, Boulanger, Yan 
>>>> <yan.boulan...@rncan-nrcan.gc.ca> wrote:
>>>> Hi folks,
>>>>
>>>> I guess I have a lot to learn about writing functions, but I'm stuck when 
>>>> using rasterEngine. It seems that it should be very easy to do but I'm 
>>>> missing something, apparently... I have a raster, 
>>>> Safranyik_zones_1961_1990, with integer values from 1 to 5. I would like 
>>>> to create five rasters whose value is 1 where the raster 
>>>> Safranyik_zones_1961_1990 is equal to "i", and NA otherwise. I would like 
>>>> to run everything in a loop. Here's what I thought would be OK.
>>>>
>>>> fun_zone <- function(Safranyik_zones,i,...) {
>>>>   Safranyik_zonesb <- Safranyik_zones
>>>>   Safranyik_zonesb[] <- NA
>>>>   Safranyik_zonesb[Safranyik_zones == i] <- 1
>>>>   return(Safranyik_zonesb)
>>>> }
>>>>
>>>> for (i in 1:5){
>>>>   Safranyik_zones_1961_1990b <-
>>>>     rasterEngine(Safranyik_zones=Safranyik_zones_1961_1990,i=i,
>>>>                  fun=fun_zone)
>>>>   assign(paste("Safranyik_zones_1961_1990b_",i, sep=""),
>>>>          Safranyik_zones_1961_1990b[[1]])
>>>> }
>>>>
>>>> Of course, it says that "i" is missing:
>>>>
>>>>> Error in Safranyik_zones == i : 'i' is missing
>>>>
>>>> Any help?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Yan
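
The replies above pass the extra parameter through args=list(i=j). The toy below mimics that calling convention in base R so the "i is missing" behaviour can be reproduced without a raster. toy_engine() is a hypothetical stand-in written for this sketch and makes no claim about how rasterEngine is implemented. fun_zone() returns 1 where the zone matches and NA elsewhere, as in the original question.

```r
# Hypothetical toy engine (NOT spatial.tools code): the data argument is passed
# by name, and extra parameters reach fun only through the args list.
toy_engine <- function(zones, fun, args = NULL) {
  do.call(fun, c(list(zones = zones), args))
}

# 1 where the zone equals i, NA elsewhere.
fun_zone <- function(zones, i, ...) {
  out <- zones
  out[] <- NA
  out[zones == i] <- 1
  out
}

zones <- matrix(c(1, 2, 3, 4, 5, 1), nrow = 2)

# Without args, i never reaches fun_zone and R reports it as missing.
missing_i <- tryCatch(toy_engine(zones, fun_zone),
                      error = function(e) conditionMessage(e))
print(missing_i)

# With args = list(i = j), each call receives its own i.
zone_masks <- lapply(1:5, function(j) toy_engine(zones, fun_zone, args = list(i = j)))
print(zone_masks[[1]])
```

Note that zones == i on its own gives TRUE/FALSE layers. The explicit NA fill is what produces the 1/NA coding asked for.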
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>>
>>> --
>>> Alex Zvoleff
>>> Postdoctoral Associate
>>> Tropical Ecology Assessment and Monitoring (TEAM) Network Conservation
>>> International
>>> 2011 Crystal Dr. Suite 500, Arlington, Virginia 22202, USA
>>> Tel: +1-703-341-2749, Fax: +1-703-979-0953, Skype: azvoleff
>>> http://www.teamnetwork.org | http://www.conservation.org
>>
>>
>>
>> --
>> Jonathan A. Greenberg, PhD
>> Assistant Professor
>> Global Environmental Analysis and Remote Sensing (GEARS) Laboratory 
>> Department of Geography and Geographic Information Science University of 
>> Illinois at Urbana-Champaign
>> 259 Computing Applications Building, MC-150
>> 605 East Springfield Avenue
>> Champaign, IL  61820-6371
>> Phone: 217-300-1924
>> http://www.geog.illinois.edu/~jgrn/
>> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007



-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL  61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
