Folks:

I asked this question some time ago and found what at first appeared to be
the best solution, but I've now run into a new problem.  First off, using ff
as Jens suggested seemed to work:

# outdata_ncells = number of rows * number of columns * number of bands
# in an image
out <- ff(vmode="double", length=outdata_ncells, filename=filename)
finalizer(out) <- "close"
close(out)

This was working fine until I attempted to set length to a VERY large
number: outdata_ncells = 17711913600.  At 8 bytes per double, this would
create a file of 131.964GB (counting 1GB as 1024^3 bytes).  Big, but not
obscenely so (and certainly not larger than the filesystem can handle).
However, length appears to be capped at .Machine$integer.max (I'm on a
64-bit Windows box):
> .Machine$integer.max
[1] 2147483647
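
For anyone checking the numbers, the arithmetic above can be reproduced in base R. This is only a sketch: bytes_per_double is a name introduced here for illustration, and it assumes 8 bytes per cell because the ff call uses vmode="double".

```r
# Number of cells from the message above (rows * columns * bands).
outdata_ncells <- 17711913600

# Assumption: each cell is stored as an 8-byte double (vmode = "double").
bytes_per_double <- 8

# Total file size in bytes, then in units of 1024^3 bytes.
size_bytes <- outdata_ncells * bytes_per_double
size_gib   <- size_bytes / 1024^3

# The requested length is larger than the largest R integer.
exceeds_int_max <- outdata_ncells > .Machine$integer.max

print(round(size_gib, 3))
print(exceeds_int_max)
```

The cell count is held as a double here, which is why the comparison against .Machine$integer.max works without overflow.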

Any suggestions on how to solve this problem for much larger file sizes?
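
One possible direction, offered only as a sketch and not confirmed anywhere in this thread, is to split the total cell count into segments that each stay within .Machine$integer.max, so that each segment could back its own file. The helpers split_lengths and locate_cell below are hypothetical names, not part of ff, and whether one ff object per segment is practical has not been tested.

```r
# Hypothetical helpers (not part of ff): names and layout are assumptions.

# Split a total cell count into segment lengths that each fit in an integer.
split_lengths <- function(total, max_len = .Machine$integer.max) {
  max_len <- as.numeric(max_len)      # work in doubles to avoid overflow
  n_full  <- total %/% max_len        # number of completely filled segments
  rest    <- total %% max_len         # cells left over for a final segment
  lens    <- rep(max_len, n_full)
  if (rest > 0) lens <- c(lens, rest)
  lens
}

# Map a 1-based global cell index to a segment number and a 1-based offset.
locate_cell <- function(index, max_len = .Machine$integer.max) {
  max_len <- as.numeric(max_len)
  list(segment = (index - 1) %/% max_len + 1,
       offset  = (index - 1) %% max_len + 1)
}

lens <- split_lengths(17711913600)
print(length(lens))
print(locate_cell(2147483648))
```

For the cell count above this yields nine segments, and the first cell past the integer limit maps to offset 1 of segment 2.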

--j


On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg <j...@illinois.edu> wrote:

> Thanks, all!  I'll try these out.  I'm trying to work up something that is
> platform independent (if possible) for use with mmap.  I'll do some tests
> on these suggestions and see which works best. I'll try to report back in a
> few days.  Cheers!
>
> --j
>
>
>
> 2012/5/3 "Jens Oehlschlägel" <jens.oehlschlae...@truecluster.com>
>
>> Jonathan,
>>
>> On some filesystems (e.g. NTFS, see below) it is possible to create
>> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
>> actually writing initial values.
>> Package 'ff' does this automatically and also allows the file to be
>> accessed in parallel. Check the example below and see how creating a big
>> file is immediate.
>>
>> Jens Oehlschlägel
>>
>>
>> > library(ff)
>> > library(snowfall)
>> > ncpus <- 2
>> > n <- 1e8
>> > system.time(
>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>> + )
>>        user      system     elapsed
>>        0.01        0.00        0.02
>> > # check finalizer, with an explicit filename we should have a 'close' finalizer
>> > finalizer(x)
>> [1] "close"
>> > # if not, set it to 'close' in order to not let slaves delete x on slave shutdown
>> > finalizer(x) <- "close"
>> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
>> R Version:  R version 2.15.0 (2012-03-30)
>>
>> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
>> CPUs.
>>
>> > sfLibrary(ff)
>> Library ff loaded.
>> Library ff loaded in cluster.
>>
>> Warning message:
>> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
>> = TRUE,  :
>>   'keep.source' is deprecated and will be ignored
>> > sfExport("x") # note: do not export the same ff multiple times
>> > # explicitly opening avoids a gc problem
>> > # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but
>> > # prevents OS write storms when the file is larger than RAM
>> > sfClusterEval(open(x, caching="mmeachflush"))
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > system.time(
>> + sfLapply( chunk(x, length=ncpus), function(i){
>> +   x[i] <- runif(sum(i))
>> +   invisible()
>> + })
>> + )
>>        user      system     elapsed
>>        0.00        0.00       30.78
>> > system.time(
>> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
>> +   c(0.05, 0.95)) )
>> + )
>>        user      system     elapsed
>>        0.00        0.00        4.38
>> > # for completeness
>> > sfClusterEval(close(x))
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > csummary(s)
>>              5%  95%
>> Min.    0.04998 0.95
>> 1st Qu. 0.04999 0.95
>> Median  0.05001 0.95
>> Mean    0.05001 0.95
>> 3rd Qu. 0.05002 0.95
>> Max.    0.05003 0.95
>> > # stop slaves
>> > sfStop()
>>
>> Stopping cluster
>>
>> > # with the close finalizer we are responsible for deleting the file
>> > # explicitly (unless we want to keep it)
>> > delete(x)
>> [1] TRUE
>> > # remove r-side metadata
>> > rm(x)
>> > # truly free memory
>> > gc()
>>
>>
>>
>>  *Sent:* Thursday, 03 May 2012 at 00:23
>> *From:* "Jonathan Greenberg" <j...@illinois.edu>
>> *To:* r-help <r-help@r-project.org>, r-sig-...@r-project.org
>> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on
>> disk?
>>  R-helpers:
>>
>> What would be the absolute fastest way to make a large "empty" file (e.g.
>> filled with all zeroes) on disk, given a byte size and a given number
>> of empty values? I know I can use writeBin, but the "object" in
>> this case may be far too large to store in main memory. I'm asking because
>> I'm going to use this file in conjunction with mmap to do parallel writes
>> to this file. Say, I want to create a blank file of 10,000 floating point
>> numbers.
>>
>> Thanks!
>>
>> --j
>>
>> --
>> Jonathan A. Greenberg, PhD
>> Assistant Professor
>> Department of Geography and Geographic Information Science
>> University of Illinois at Urbana-Champaign
>> 607 South Mathews Avenue, MC 150
>> Urbana, IL 61801
>> Phone: 415-763-5476
>> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
>> http://www.geog.illinois.edu/people/JonathanGreenberg.html
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> r-sig-...@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>>
>>
>
>
>



-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 217-300-1924
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
