[R-sig-Geo] Faster way to extract raster values using multiple polygons?

2016-04-06 Thread Thiago V. dos Santos via R-sig-Geo
Dear all,

I am trying to extract a time series from a raster brick using multiple 
polygons.

The raster brick is a global temperature netcdf file with 1200 layers (months) 
and the polygons are each of the 5567 municipalities in Brazil. I intend to 
extract the temperature time series for each municipality.

As a final result, I would like to have a data frame containing: -the name and 
coordinates of the municipality, -the date tag from the raster, and -the 
extracted temperature value.

My initial, VERY SLOW attempt was something like this:


library(raster)

# Create some sample raster data
idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
r <- raster(ncol=360, nrow=180)
b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)
b <- setZ(b, idx)

# Import polygon data
br <- getData("GADM", country="Brazil", level=2) # about 7MB to download

# Subset the SMALLEST state - only 75 municipalities
br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
plot(br_sub)

# Now let's extract the data for each municipality
beginCluster()
e <- extract(b, br_sub, fun=mean, df=T)
endCluster()


Even using the smallest state possible, this example takes about 20 minutes to 
run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, there 
are states with over 850 municipalities. And remember, in total there are over 
5500 municipalities I need to extract the data for.

I have the feeling that my code is not very optimized.

Has anybody ever dealt with this amount of data in this kind of raster 
operation? Is there any fastest way to achieve my expected result?
Thanks in advance,
-- Thiago V. dos Santos

PhD student
Land and Atmospheric Science

University of Minnesota

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


Re: [R-sig-Geo] Faster way to extract raster values using multiple polygons?

2016-04-06 Thread Ben Tupper
Hi,

The resolution of your raster data (1 degree) is much more coarse than what 
your polygons represent.  Could you short-circuit the process by assuming that 
the temp at the centroid of each polygon would suitably represent the mean 
temperature across each polygon?  Unless you have some much bigger polygons, I 
can't imagine it will be very far off. If so, then you could pretty quickly 
extract the values for each layer in the raster at each centroid.  Perhaps like 
this?

cents <- coordinates(br_sub)
v <- extract(b, cents)

Is that close enough?

Cheers,
Ben

> On Apr 6, 2016, at 8:00 AM, Thiago V. dos Santos via R-sig-Geo 
>  wrote:
> 
> Dear all,
> 
> I am trying to extract a time series from a raster brick using multiple 
> polygons.
> 
> The raster brick is a global temperature netcdf file with 1200 layers 
> (months) and the polygons are each of the 5567 municipalities in Brazil. I 
> intend to extract the temperature time series for each municipality.
> 
> As a final result, I would like to have a data frame containing: -the name 
> and coordinates of the municipality, -the date tag from the raster, and -the 
> extracted temperature value.
> 
> My initial, VERY SLOW attempt was something like this:
> 
> 
> library(raster)
> 
> # Create some sample raster data
> idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
> r <- raster(ncol=360, nrow=180)
> b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)
> b <- setZ(b, idx)
> 
> # Import polygon data
> br <- getData("GADM", country="Brazil", level=2) # about 7MB to download
> 
> # Subset the SMALLEST state - only 75 municipalities
> br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
> plot(br_sub)
> 
> # Now let's extract the data for each municipality
> beginCluster()
> e <- extract(b, br_sub, fun=mean, df=T)
> endCluster()
> 
> 
> Even using the smallest state possible, this example takes about 20 minutes 
> to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, 
> there are states with over 850 municipalities. And remember, in total there 
> are over 5500 municipalities I need to extract the data for.
> 
> I have the feeling that my code is not very optimized.
> 
> Has anybody ever dealt with this amount of data in this kind of raster 
> operation? Is there any fastest way to achieve my expected result?
> Thanks in advance,
> -- Thiago V. dos Santos
> 
> PhD student
> Land and Atmospheric Science
> 
> University of Minnesota
> 
> ___
> R-sig-Geo mailing list
> R-sig-Geo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


Re: [R-sig-Geo] Faster way to extract raster values using multiple polygons?

2016-04-08 Thread Thiago V. dos Santos via R-sig-Geo
Hi Ben,


Thank you so much for the suggestion. 

It is pretty fast, and I guess it's a good assumption for temperature. I'll 
have to do the same with precipitation, which is way more spatially variable 
than temperature. In that case, I think I'll have to wait the 10 hours it takes 
to run "extract" using a mean value of all cells covering a polygon.

It is a one-time processing anyway, so I should not worry too much about the 
waiting time.

Greetings,
-- Thiago V. dos Santos

PhD student
Land and Atmospheric Science
University of Minnesota




On Wednesday, April 6, 2016 10:48 AM, Ben Tupper  wrote:
Hi,

The resolution of your raster data (1 degree) is much more coarse than what 
your polygons represent.  Could you short-circuit the process by assuming that 
the temp at the centroid of each polygon would suitably represent the mean 
temperature across each polygon?  Unless you have some much bigger polygons, I 
can't imagine it will be very far off. If so, then you could pretty quickly 
extract the values for each layer in the raster at each centroid.  Perhaps like 
this?

cents <- coordinates(br_sub)
v <- extract(b, cents)

Is that close enough?

Cheers,
Ben


> On Apr 6, 2016, at 8:00 AM, Thiago V. dos Santos via R-sig-Geo 
>  wrote:
> 
> Dear all,
> 
> I am trying to extract a time series from a raster brick using multiple 
> polygons.
> 
> The raster brick is a global temperature netcdf file with 1200 layers 
> (months) and the polygons are each of the 5567 municipalities in Brazil. I 
> intend to extract the temperature time series for each municipality.
> 
> As a final result, I would like to have a data frame containing: -the name 
> and coordinates of the municipality, -the date tag from the raster, and -the 
> extracted temperature value.
> 
> My initial, VERY SLOW attempt was something like this:
> 
> 
> library(raster)
> 
> # Create some sample raster data
> idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
> r <- raster(ncol=360, nrow=180)
> b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)
> b <- setZ(b, idx)
> 
> # Import polygon data
> br <- getData("GADM", country="Brazil", level=2) # about 7MB to download
> 
> # Subset the SMALLEST state - only 75 municipalities
> br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
> plot(br_sub)
> 
> # Now let's extract the data for each municipality
> beginCluster()
> e <- extract(b, br_sub, fun=mean, df=T)
> endCluster()
> 
> 
> Even using the smallest state possible, this example takes about 20 minutes 
> to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, 
> there are states with over 850 municipalities. And remember, in total there 
> are over 5500 municipalities I need to extract the data for.
> 
> I have the feeling that my code is not very optimized.
> 
> Has anybody ever dealt with this amount of data in this kind of raster 
> operation? Is there any fastest way to achieve my expected result?
> Thanks in advance,
> -- Thiago V. dos Santos
> 
> PhD student
> Land and Atmospheric Science
> 
> University of Minnesota
> 
> ___
> R-sig-Geo mailing list
> R-sig-Geo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo