Re: [R-sig-Geo] stack many files without loading into memory
Just wanted to update this thread in case anyone else comes looking, since some of these things were not immediately clear to me. I ended up doing:

    library(raster)
    library(ncdf4)

    fn = list.files('serverpath')
    fnstack = stack(fn)
    layerdates = names(fnstack)

    # Instead of writeRaster, use ncdf4 directly to get around the issue in
    # this thread:
    # http://r-sig-geo.2731867.n2.nabble.com/writeRaster-does-not-preserve-names-when-writing-to-NetCDF-td7586909.html
    dim1 = ncdim_def('Long', 'degree', seq(-112.25, -104.125, 0.0041667))
    dim2 = ncdim_def('Lat', 'degree', seq(43.75, 33, -0.0041667))
    # layerdates is a vector something like 20120101, 20120109, ... etc.,
    # since that's what my files were called
    dim3 = ncdim_def('time', 'yrdoy', unlim = T, vals = layerdates)
    var = ncvar_def('swe', 'meters', dim = list(dim1, dim2, dim3), missval = -99,
                    longname = 'snow water equivalent', compression = 9)

    # Important to note: dim1 is the x direction and should be ascending;
    # dim2 is the y direction and should be descending. This is because the
    # cell numbers of a Raster* object start top-left and count by row.
    outputfn = 'localpath'
    newnc = nc_create(outputfn, var)
    ncvar_put(newnc, var, vals = getValues(fnstack))
    # add a global attribute defining the geographic information
    ncatt_put(newnc, 0, 'proj4string', '+proj=longlat +datum=WGS84')
    nc_close(newnc)

Then when I open the file:

    ncnew = nc_open(outputfn)
    # this gives the list of dates stored above in dim3; you can get the
    # spatial coordinates likewise in dim[[1]] and dim[[2]]
    # (or ncnew$dim$Lat$vals etc.)
    ncnew$dim[[3]]$vals
    lyr = grep('20120109', ncnew$dim[[3]]$vals)  # use grep to find the date again
    # get the raster I stored for that date
    ncvar_get(ncnew, 'swe', start = c(1, 1, lyr), count = c(-1, -1, 1))
    nc_close(ncnew)

Hope that helps someone!

Dominik Schneider
o 303.735.6296 | c 518.956.3978

--
View this message in context: http://r-sig-geo.2731867.n2.nabble.com/stack-many-files-without-loading-into-memory-tp7587729p7587831.html
Sent from the R-sig-geo mailing list archive at Nabble.com.
___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
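To turn one of those extracted dates back into a georeferenced Raster* object, something like the sketch below should work. This is untested against the actual file; the variable name 'swe' and the extent values are taken from the code above, and the transpose reflects that ncvar_get returns the Long dimension first while raster() expects rows to run north to south.

```r
library(raster)
library(ncdf4)

ncnew <- nc_open(outputfn)
lyr <- grep('20120109', ncnew$dim[[3]]$vals)
m <- ncvar_get(ncnew, 'swe', start = c(1, 1, lyr), count = c(-1, -1, 1))
# m has dims (lon, lat); t(m) gives (lat, lon), with the first row at 43.75N,
# matching raster()'s top-down row order
r <- raster(t(m), xmn = -112.25, xmx = -104.125, ymn = 33, ymx = 43.75,
            crs = '+proj=longlat +datum=WGS84')
nc_close(ncnew)
```

(Whether the seq() values are cell centers or cell edges is glossed over here; the extent may need a half-cell adjustment.)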
Re: [R-sig-Geo] stack many files without loading into memory
Ok - looks like it worked this time for 112 files from 2012. The netcdf is 2.25 GB while the compressed multiband geotiff is 510 MB. Does the netcdf really have that much overhead? The 112 files at ~10 MB each are only 1.12 GB in total.

I like the tidiness of 1 file per year, so I'll have to play with how easily these can be accessed and the best way of annotating the layers. I was just reading that netcdf4 is based on hdf5 with a subset of features, so I might look to see whether hdf5 can do what I want.

Thanks
ds
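One thing that can drive a size gap like that is the chunk layout the deflate filter sees. A sketch of how the variable definition from earlier in the thread could be tuned, assuming a reasonably recent ncdf4 with the chunksizes argument to ncvar_def (untested; 'swe' and the dims are taken from the thread):

```r
library(ncdf4)

dim1 <- ncdim_def('Long', 'degree', seq(-112.25, -104.125, 0.0041667))
dim2 <- ncdim_def('Lat', 'degree', seq(43.75, 33, -0.0041667))
dim3 <- ncdim_def('time', 'yrdoy', 1:112, unlim = TRUE)

# prec = 'float' stores 4-byte values ('double' would double the raw size);
# chunksizes of one full layer lets the deflate filter compress spatially
# coherent blocks rather than thin slices across time
var <- ncvar_def('swe', 'meters', list(dim1, dim2, dim3), missval = -99,
                 prec = 'float', compression = 9,
                 chunksizes = c(dim1$len, dim2$len, 1))
```

Comparing `ncdump -hs` output for the two layouts would show the chunking actually used.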
Re: [R-sig-Geo] stack many files without loading into memory
Hi Michael - Yes, saving as GTiff with the compression options reduced the file size from ~10 MB to ~2.5 MB for a single file, but I am having a lot of trouble getting it to save the whole stack. I'm definitely running out of memory on my computer, so maybe R is being slow and timing out? I've left it overnight with no success (for a single year). That said, is there some overhead involved in this? 112 files * 10 MB is only 1.12 GB. I might try this on the command line with the gdal tools.

Another issue I'm encountering is that I don't seem to be able to save layer names in either GTiff or CDF format. Since these are ~100 remote sensing images from the year, I need to be able to annotate the layer names so I know the dates. (I actually posted onto an older thread about this, specific to the CDF format, because it came up in something else I was doing.)

Thanks for your help.
[R-sig-Geo] stack many files without loading into memory
Hi - I have some data on a server but would like to bring them local in a somewhat compressed format that is still easy to access.

/Volumes/hD/2012 - 100 geotiffs
~/project/data/ - store those geotiffs here without needing server access.

Untested, but I think I could do something like:

    s = stack()
    writeRaster(s, '2012stack')
    fn = list.files('/Volumes/hD/2012', pattern = '*.tif', full.names = T)
    lapply(fn, function(f){
      s = stack('2012stack')
      r = raster(f)
      names(r) = gsub(pattern = '.tif', replacement = '', basename(f))
      s = addLayer(s, r)
      writeRaster(s, '2012stack')
    })

Or is it better to save to a .RData? Is there a better way that doesn't require me to loop through each geotiff, since I can't load it all into memory?

Thanks
Dominik
Re: [R-sig-Geo] stack many files without loading into memory
Wouldn't that keep the link to the server on which they are stored now?

Dominik Schneider
o 303.735.6296 | c 518.956.3978

On Wed, Feb 4, 2015 at 12:50 PM, Michael Sumner mdsum...@gmail.com wrote:
> Why not stack(fn) ?
Re: [R-sig-Geo] stack many files without loading into memory
Why not stack(fn) ?

On Thu, 5 Feb 2015 06:41 Dominik Schneider dominik.schnei...@colorado.edu wrote:
> Hi - I have some data on a server but would like to bring them local in a
> somewhat compressed format that is still easy to access.
Re: [R-sig-Geo] stack many files without loading into memory
On Thu Feb 05 2015 at 7:41:33 AM Dominik Schneider dominik.schnei...@colorado.edu wrote:
> I think you are correct.
>
>     s = stack(fn, quick = T)
>     writeRaster(s, 'localpath/2012data')

Ugh, sorry, yes that's me reading too fast. I should have suggested the next step to writeRaster. I'm not sure why you don't include the file extension here, though? Why not writeRaster(s, 'localpath/2012data.grd')?

> would get the data local. I guess the trade-off is that the file size is an
> order of magnitude bigger than if I saved them in an .RData file, but much
> quicker to access.

You might achieve similar compression if you choose GeoTIFF, with the right options (and you need rgdal). Try a test with a single layer, e.g.

    s = stack(fn, quick = TRUE)
    require(rgdal)
    writeRaster(s[[1]], 'localpath/2012data_temp01.tif',
                options = c('COMPRESS=LZW', 'TILED=YES'))

Does the file size of 2012data_temp01.tif look promising? The native rasterfile format does not support compression as far as I know. Tiling may be a help or a hindrance, depending on the dimensions and the extra margin added by the tiles if they need to extend beyond the raster's edges - you can control the tile size with BLOCKXSIZE/BLOCKYSIZE if needed: http://www.gdal.org/frmt_gtiff.html

(NetCDF4 - with the ncdf4 package - can also compress and tile natively, but I haven't tried that via raster myself.)

Cheers, Mike.
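Scaling the single-layer test above to the whole stack uses the same creation options. A sketch (untested; the paths follow the ones used in this thread) that writes the multiband file and compares the achieved size against the sum of the inputs:

```r
library(raster)
require(rgdal)

s <- stack(fn, quick = TRUE)
writeRaster(s, 'localpath/2012data.tif',
            options = c('COMPRESS=LZW', 'TILED=YES'),
            overwrite = TRUE)

# compressed multiband size vs total size of the input geotiffs, in MiB
file.info('localpath/2012data.tif')$size / 2^20
sum(file.info(fn)$size) / 2^20
```

Because writeRaster processes the stack in chunks, this avoids loading all ~100 layers into memory at once.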