   ff has a utility function file.resize() which lets you set a new file size
   in bytes, passed as a double.
   See ?file.resize
   Jens Oehlschlägel
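A minimal sketch of the file.resize() approach, assuming the ff package is installed. The helper name make_empty_file() and the temporary path are hypothetical. The size is computed and passed as a double, so it is not tied to .Machine$integer.max:

```r
library(ff)

# Hypothetical helper: create (or truncate) `path`, then grow it to `nbytes`
# bytes with ff's file.resize(). `nbytes` is a double, so it may exceed
# .Machine$integer.max.
make_empty_file <- function(path, nbytes) {
  file.create(path)           # file.resize() changes the size of an existing file
  file.resize(path, nbytes)   # new size in bytes, passed as a double
}

# Example: room for 10,000 doubles at 8 bytes each.
path <- tempfile(fileext = ".bin")
make_empty_file(path, 10000 * 8)
file.info(path)$size
```

For the 17711913600-cell image in the message below, the call would be make_empty_file(path, 17711913600 * 8). Whether the new region is allocated sparsely depends on the platform and filesystem.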
   Sent: Thursday, 27 September 2012 at 21:17
   From: "Jonathan Greenberg" <>
   To: r-help <>,
   Subject: Re: [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   I asked this question some time ago and found what at first appeared to be
   the best solution, but I'm now running into a new problem. First off, ff, as
   Jens suggested, seemed to work:
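   # outdata_ncells = the number of rows * number of columns * number of bands in an image:
   out <- ff(vmode="double", length=outdata_ncells, filename=outfile)  # reconstructed call; outfile is a placeholder path
   finalizer(out) <- "close"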
   This worked fine until I tried to set length to a VERY large number:
   outdata_ncells = 17711913600. That would create a file of 131.964 GiB
   (8 bytes per double). Big, but not obscenely so, and certainly not larger
   than the filesystem can handle. However, length appears to be restricted
   by .Machine$integer.max (I'm on a 64-bit Windows box):
   > .Machine$integer.max
   [1] 2147483647
   Any suggestions on how to solve this problem for much larger file sizes?
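The numbers in the message above can be checked directly in R. The cell count is a double above .Machine$integer.max, while the byte size is still an exactly representable double. The last assignment counts how many pieces of at most .Machine$integer.max cells the data would need if it were split across several vectors (a possible workaround assumed here, not one proposed in the thread):

```r
outdata_ncells <- 17711913600                        # rows * columns * bands
too_long <- outdata_ncells > .Machine$integer.max    # TRUE: exceeds the length limit
nbytes <- outdata_ncells * 8                         # 8 bytes per double: 141695308800
gib <- nbytes / 2^30                                 # about 131.964 GiB
n_pieces <- ceiling(outdata_ncells / .Machine$integer.max)  # 9 pieces

# Print the byte count without scientific notation.
format(nbytes, scientific = FALSE)
```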
   On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg wrote:
   > Thanks, all! I'll try these out. I'm trying to work up something that is
   > platform independent (if possible) for use with mmap. I'll do some tests
   > on these suggestions, see which works best, and try to report back in a
   > few days. Cheers!
   > --j
   > 2012/5/3 "Jens Oehlschlägel" <>
   >> Jonathan,
   >> On some filesystems (e.g. NTFS, see below) it is possible to create
   >> 'sparse' memory-mapped files, i.e. to reserve the space without the cost
   >> of actually writing initial values.
   >> Package 'ff' does this automatically and also allows the file to be
   >> accessed in parallel. Check the example below and note that creating even
   >> a big file is immediate.
   >> Jens Oehlschlägel
   >> > library(ff)
   >> > library(snowfall)
   >> > ncpus <- 2
   >> > n <- 1e8
   >> > system.time(
   >> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
   >> + )
   >> user system elapsed
   >> 0.01 0.00 0.02
   >> > # check finalizer: with an explicit filename we should have a 'close' finalizer
   >> > finalizer(x)
   >> [1] "close"
   >> > # if not, set it to 'close' in order not to let slaves delete x on slave shutdown
   >> > finalizer(x) <- "close"
   >> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
   >> R Version: R version 2.15.0 (2012-03-30)
   >> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
   >> > sfLibrary(ff)
   >> Library ff loaded.
   >> Library ff loaded in cluster.
   >> Warning message:
   >> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts = TRUE, :
   >> 'keep.source' is deprecated and will be ignored
   >> > sfExport("x") # note: do not export the same ff multiple times
   >> > # explicitly opening avoids a gc problem
   >> > sfClusterEval(open(x, caching="mmeachflush")) # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but prevents OS write storms when the file is larger than RAM
   >> [[1]]
   >> [1] TRUE
   >> [[2]]
   >> [1] TRUE
   >> > system.time(
   >> + sfLapply( chunk(x, length=ncpus), function(i){
   >> + x[i] <- runif(sum(i))
   >> + invisible()
   >> + })
   >> + )
   >> user system elapsed
   >> 0.00 0.00 30.78
   >> > system.time(
   >> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) )
   >> + )
   >> user system elapsed
   >> 0.00 0.00 4.38
   >> > # for completeness
   >> > sfClusterEval(close(x))
   >> [[1]]
   >> [1] TRUE
   >> [[2]]
   >> [1] TRUE
   >> > csummary(s)
   >> 5% 95%
   >> Min. 0.04998 0.95
   >> 1st Qu. 0.04999 0.95
   >> Median 0.05001 0.95
   >> Mean 0.05001 0.95
   >> 3rd Qu. 0.05002 0.95
   >> Max. 0.05003 0.95
   >> > # stop slaves
   >> > sfStop()
   >> Stopping cluster
   >> > # with the close finalizer we are responsible for deleting the file explicitly (unless we want to keep it)
   >> > delete(x)
   >> [1] TRUE
   >> > # remove r-side metadata
   >> > rm(x)
   >> > # truly free memory
   >> > gc()
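Outside ff, a similar zero-filled file can often be created in base R by seeking past the end of a new file and writing a single byte; on filesystems that support sparse files, the skipped region is typically not written physically. This is a swapped-in technique rather than what ff does internally, R's own documentation discourages relying on seek() on Windows, and the helper name make_sparse_file() is hypothetical:

```r
# Hypothetical helper: create a zero-filled file of `nbytes` bytes by writing
# only its last byte. `nbytes` may be a double larger than .Machine$integer.max.
make_sparse_file <- function(path, nbytes) {
  con <- file(path, open = "wb")               # create or truncate the file
  on.exit(close(con))
  seek(con, where = nbytes - 1, rw = "write")  # move the write position near the end
  writeBin(as.raw(0), con)                     # one byte; the file now has length nbytes
  invisible(path)
}

# Example: room for 10,000 doubles, as in the original question.
path <- tempfile(fileext = ".bin")
make_sparse_file(path, 10000 * 8)
file.info(path)$size
```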
   >> *Sent:* Thursday, 03 May 2012 at 00:23
   >> *From:* "Jonathan Greenberg" <>
   >> *To:* r-help <>,
   >> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   >> R-helpers:
   >> What would be the absolute fastest way to make a large "empty" file (e.g.
   >> filled with all zeroes) on disk, given a byte size and a given number of
   >> empty values? I know I can use writeBin, but the "object" in this case may
   >> be far too large to store in main memory. I'm asking because I'm going to
   >> use this file in conjunction with mmap to do parallel writes to it. Say I
   >> want to create a blank file of 10,000 floating point numbers.
   >> Thanks!
   >> --j
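As a rough illustration of the parallel-write pattern the question describes, here is a base R sketch that preallocates room for 10,000 doubles and lets two worker processes from the parallel package each fill a disjoint block. It uses ordinary seek()/writeBin() file I/O rather than memory mapping, so it stands in for the mmap approach rather than implementing it; chunk_ranges() and write_chunk() are hypothetical helpers:

```r
library(parallel)

# Hypothetical helper: split cells 1..n into k contiguous [from, to] ranges.
# Computed in doubles, so n may exceed .Machine$integer.max.
chunk_ranges <- function(n, k) {
  breaks <- floor(n * (0:k) / k)
  list(from = breaks[-(k + 1)] + 1, to = breaks[-1])
}

# Hypothetical worker: overwrite cells from..to of an existing file of doubles.
write_chunk <- function(path, from, to) {
  con <- file(path, open = "r+b")                   # read/write, no truncation
  on.exit(close(con))
  seek(con, where = (from - 1) * 8, rw = "write")   # 8 bytes per double
  writeBin(runif(to - from + 1), con, size = 8)
  TRUE
}

n <- 10000        # 10,000 floating point numbers
ncpus <- 2
path <- tempfile(fileext = ".bin")

# Preallocate n doubles by writing only the last byte.
con <- file(path, open = "wb")
seek(con, where = n * 8 - 1, rw = "write")
writeBin(as.raw(0), con)
close(con)

# Each worker fills its own non-overlapping byte range.
rng <- chunk_ranges(n, ncpus)
cl <- makeCluster(ncpus)   # PSOCK cluster, also available on Windows
res <- clusterMap(cl, write_chunk, from = rng$from, to = rng$to,
                  MoreArgs = list(path = path))
stopCluster(cl)
```

Because the byte ranges do not overlap, the workers do not need to coordinate. For a file of 17711913600 cells, each worker would loop over smaller sub-blocks instead of generating its whole range with one runif() call, since a single range would not fit in memory.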
   >> --
   >> Jonathan A. Greenberg, PhD
   >> Assistant Professor
   >> Department of Geography and Geographic Information Science
   >> University of Illinois at Urbana-Champaign
   >> 607 South Mathews Avenue, MC 150
   >> Urbana, IL 61801
   >> Phone: 415-763-5476
   >> AIM: jgrn307, MSN:, Gchat: jgrn307, Skype: jgrn3007
   >> _______________________________________________
   >> R-sig-hpc mailing list
   Jonathan A. Greenberg, PhD
   Assistant Professor
   Department of Geography and Geographic Information Science
   University of Illinois at Urbana-Champaign
   607 South Mathews Avenue, MC 150
   Urbana, IL 61801
   Phone: 217-300-1924
   AIM: jgrn307, MSN:, Gchat: jgrn307, Skype: jgrn3007


______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
