Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Jonathan,

ff has a utility function file.resize() which allows giving a new file size in bytes using doubles. See ?file.resize

Regards
Jens Oehlschlägel

Sent: Thursday, 27 September 2012, 21:17
From: Jonathan Greenberg j...@illinois.edu
To: r-help r-help@r-project.org, r-sig-...@r-project.org
Subject: Re: [R-sig-hpc] Quickest way to make a large empty file on disk?

Folks:

I asked this question some time ago and found what appeared (at first) to be the best solution, but I'm now running into a new problem. First off, it seemed like ff, as Jens suggested, worked:

# outdata_ncells = the number of rows * number of columns * number of bands in an image:
out <- ff(vmode = "double", length = outdata_ncells, filename = filename)
finalizer(out) <- "close"
close(out)

This was working fine until I attempted to set length to a VERY large number: outdata_ncells = 17711913600. This would create a file that is 131.964 GB: big, but not obscenely so (and certainly not larger than the filesystem can handle). However, length appears to be restricted by .Machine$integer.max (I'm on a 64-bit Windows box):

> .Machine$integer.max
[1] 2147483647

Any suggestions on how to solve this problem for much larger file sizes?

--j

On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg j...@illinois.edu wrote:

Thanks, all! I'll try these out. I'm trying to work up something that is platform independent (if possible) for use with mmap. I'll do some tests on these suggestions and see which works best. I'll try to report back in a few days. Cheers!

--j

2012/5/3 Jens Oehlschlägel jens.oehlschlae...@truecluster.com

Jonathan,

On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse' memory-mapped files, i.e. to reserve the space without the cost of actually writing initial values. Package 'ff' does this automatically and also allows the file to be accessed in parallel. Check the example below and note that creating the big file is immediate.
Jens Oehlschlägel

> library(ff)
> library(snowfall)
> ncpus <- 2
> n <- 1e8
> system.time(
+   x <- ff(vmode = "double", length = n, filename = "c:/Temp/x.ff")
+ )
   user  system elapsed
   0.01    0.00    0.02
> # check the finalizer; with an explicit filename we should have a 'close' finalizer
> finalizer(x)
[1] "close"
> # if not, set it to 'close' in order to not let slaves delete x on slave shutdown
> finalizer(x) <- "close"
> sfInit(parallel = TRUE, cpus = ncpus, type = "SOCK")
R Version:  R version 2.15.0 (2012-03-30)
snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
> sfLibrary(ff)
Library ff loaded.
Library ff loaded in cluster.
Warning message:
In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts = TRUE, :
  'keep.source' is deprecated and will be ignored
> sfExport("x")  # note: do not export the same ff multiple times
> # explicitly opening avoids a gc problem; opening with 'mmeachflush' instead of
> # 'mmnoflush' is a bit slower but prevents OS write storms when the file is
> # larger than RAM
> sfClusterEval(open(x, caching = "mmeachflush"))
[[1]]
[1] TRUE
[[2]]
[1] TRUE
> system.time(
+   sfLapply(chunk(x, length = ncpus), function(i){
+     x[i] <- runif(sum(i))
+     invisible()
+   })
+ )
   user  system elapsed
   0.00    0.00   30.78
> system.time(
+   s <- sfLapply(chunk(x, length = ncpus), function(i) quantile(x[i], c(0.05, 0.95)))
+ )
   user  system elapsed
   0.00    0.00    4.38
> # for completeness
> sfClusterEval(close(x))
[[1]]
[1] TRUE
[[2]]
[1] TRUE
> csummary(s)
              5%  95%
Min.     0.04998 0.95
1st Qu.  0.04999 0.95
Median   0.05001 0.95
Mean     0.05001 0.95
3rd Qu.  0.05002 0.95
Max.     0.05003 0.95
> # stop slaves
> sfStop()
Stopping cluster
> # with the 'close' finalizer we are responsible for deleting the file
> # explicitly (unless we want to keep it)
> delete(x)
[1] TRUE
> # remove R-side metadata
> rm(x)
> # truly free memory
> gc()

Sent: Thursday, 3 May 2012, 00:23
From: Jonathan Greenberg j...@illinois.edu
To: r-help r-help@r-project.org, r-sig-...@r-project.org
Subject: [R-sig-hpc] Quickest way to make a large empty file on disk?

R-helpers:

What would be the absolute fastest way to make a large empty file (e.g. filled with all zeroes) on disk, given a byte size and a given number of empty values? I know I can use writeBin, but the object in this case may be far too large to store in main memory. I'm asking because I'm going to use this file in conjunction with mmap to do parallel writes to it. Say, I want to create a blank file of 10,000 floating point numbers.

Thanks!

--j
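Jens's sparse-file point can be sanity-checked at the OS level, outside R. A minimal sketch, assuming GNU coreutils on a *nix box and a sparse-capable filesystem (ext4, NTFS, ...); the filename is illustrative:

```shell
# Reserve 1 GiB by setting the file length, without writing any data blocks.
truncate -s 1G sparse.bin

# Apparent size is the full 1 GiB (1073741824 bytes)...
ls -l sparse.bin

# ...but on a sparse-capable filesystem almost no blocks are actually
# allocated, which is why creation is immediate:
du -h sparse.bin
```

This is the same trick ff relies on: the file length is set with a metadata operation, and unwritten ranges read back as zeroes.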
On Thu, Sep 27, 2012 at 3:49 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

Hello,

If you really need to trash your disk, why not use seek()?

> fl <- file("Test.txt", open = "wb")
> seek(fl, where = 1024, origin = "start", rw = "write")
[1] 0
> writeChar(character(1), fl, nchars = 1, useBytes = TRUE)
Warning message:
In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) :
  writeChar: more characters requested than are in the string - will zero-pad
> close(fl)

File Test.txt is now 1 Kb in size.

Hope this helps,
Rui Barradas
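Rui's seek-then-write-one-byte trick has a direct shell analogue. A sketch assuming GNU dd; the filename mirrors his example:

```shell
# Seek one byte short of the desired size and write a single zero byte;
# the file is extended to 1024 bytes without writing the first 1023.
dd if=/dev/zero of=Test.txt bs=1 count=1 seek=1023 2>/dev/null

ls -l Test.txt   # 1024 bytes
```

On filesystems that support holes, the skipped range costs no disk blocks; elsewhere the OS fills it with literal zeroes, which is the "trash your disk" case Rui alludes to.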
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Hello, If you really need to trash your disk, why not use seek()? fl - file(Test.txt, open = wb) seek(fl, where = 1024, origin = start, rw = write) [1] 0 writeChar(character(1), fl, nchars = 1, useBytes = TRUE) Warning message: In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) : writeChar: more characters requested than are in the string - will zero-pad close(fl) File Test.txt is now 1Kb in size. Hope this helps, Rui Barradas Em 27-09-2012 20:17, Jonathan Greenberg escreveu: Folks: Asked this question some time ago, and found what appeared (at first) to be the best solution, but I'm now finding a new problem. First off, it seemed like ff as Jens suggested worked: # outdata_ncells = the number of rows * number of columns * number of bands in an image: out-ff(vmode=double,length=outdata_ncells,filename=filename) finalizer(out) - close close(out) This was working fine until I attempted to set length to a VERY large number: outdata_ncells = 17711913600. This would create a file that is 131.964GB. Big, but not obscenely so (and certainly not larger than the filesystem can handle). However, length appears to be restricted by .Machine$integer.max (I'm on a 64-bit windows box): .Machine$integer.max [1] 2147483647 Any suggestions on how to solve this problem for much larger file sizes? --j On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg j...@illinois.eduwrote: Thanks, all! I'll try these out. I'm trying to work up something that is platform independent (if possible) for use with mmap. I'll do some tests on these suggestions and see which works best. I'll try to report back in a few days. Cheers! --j 2012/5/3 Jens Oehlschlägel jens.oehlschlae...@truecluster.com Jonathan, On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse' memory-mapped files, i.e. reserving the space without the cost of actually writing initial values. Package 'ff' does this automatically and also allows to access the file in parallel. 
Check the example below and see how big file creation is immediate. Jens Oehlschlägel library(ff) library(snowfall) ncpus - 2 n - 1e8 system.time( + x - ff(vmode=double, length=n, filename=c:/Temp/x.ff) + ) User System verstrichen 0.010.000.02 # check finalizer, with an explicit filename we should have a 'close' finalizer finalizer(x) [1] close # if not, set it to 'close' inorder to not let slaves delete x on slave shutdown finalizer(x) - close sfInit(parallel=TRUE, cpus=ncpus, type=SOCK) R Version: R version 2.15.0 (2012-03-30) snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs. sfLibrary(ff) Library ff loaded. Library ff loaded in cluster. Warnmeldung: In library(package = ff, character.only = TRUE, pos = 2, warn.conflicts = TRUE, : 'keep.source' is deprecated and will be ignored sfExport(x) # note: do not export the same ff multiple times # explicitely opening avoids a gc problem sfClusterEval(open(x, caching=mmeachflush)) # opening with 'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS write storms when the file is larger than RAM [[1]] [1] TRUE [[2]] [1] TRUE system.time( + sfLapply( chunk(x, length=ncpus), function(i){ + x[i] - runif(sum(i)) + invisible() + }) + ) User System verstrichen 0.000.00 30.78 system.time( + s - sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) ) + ) User System verstrichen 0.000.004.38 # for completeness sfClusterEval(close(x)) [[1]] [1] TRUE [[2]] [1] TRUE csummary(s) 5% 95% Min.0.04998 0.95 1st Qu. 0.04999 0.95 Median 0.05001 0.95 Mean0.05001 0.95 3rd Qu. 0.05002 0.95 Max.0.05003 0.95 # stop slaves sfStop() Stopping cluster # with the close finalizer we are responsible for deleting the file explicitely (unless we want to keep it) delete(x) [1] TRUE # remove r-side metadata rm(x) # truly free memory gc() *Gesendet:* Donnerstag, 03. 
Mai 2012 um 00:23 Uhr *Von:* Jonathan Greenberg j...@illinois.edu *An:* r-help r-help@r-project.org, r-sig-...@r-project.org *Betreff:* [R-sig-hpc] Quickest way to make a large empty file on disk? R-helpers: What would be the absolute fastest way to make a large empty file (e.g. filled with all zeroes) on disk, given a byte size and a given number number of empty values. I know I can use writeBin, but the object in this case may be far too large to store in main memory. I'm asking because I'm going to use this file in conjunction with mmap to do parallel writes to this file. Say, I want to create a blank file of 10,000 floating point numbers. Thanks! --j -- Jonathan A. Greenberg, PhD Assistant Professor Department of Geography and Geographic Information Science University of Illinois at Urbana-Champaign 607 South
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Jonathan, On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse' memory-mapped files, i.e. reserving the space without the cost of actually writing initial values. Package 'ff' does this automatically and also allows to access the file in parallel. Check the example below and see how big file creation is immediate. Jens Oehlschlägel library(ff) library(snowfall) ncpus - 2 n - 1e8 system.time( + x - ff(vmode=double, length=n, filename=c:/Temp/x.ff) + ) User System verstrichen 0.010.000.02 # check finalizer, with an explicit filename we should have a 'close' finalizer finalizer(x) [1] close # if not, set it to 'close' inorder to not let slaves delete x on slave shutdown finalizer(x) - close sfInit(parallel=TRUE, cpus=ncpus, type=SOCK) R Version: R version 2.15.0 (2012-03-30) snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs. sfLibrary(ff) Library ff loaded. Library ff loaded in cluster. Warnmeldung: In library(package = ff, character.only = TRUE, pos = 2, warn.conflicts = TRUE, : 'keep.source' is deprecated and will be ignored sfExport(x) # note: do not export the same ff multiple times # explicitely opening avoids a gc problem sfClusterEval(open(x, caching=mmeachflush)) # opening with 'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS write storms when the file is larger than RAM [[1]] [1] TRUE [[2]] [1] TRUE system.time( + sfLapply( chunk(x, length=ncpus), function(i){ + x[i] - runif(sum(i)) + invisible() + }) + ) User System verstrichen 0.000.00 30.78 system.time( + s - sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) ) + ) User System verstrichen 0.000.004.38 # for completeness sfClusterEval(close(x)) [[1]] [1] TRUE [[2]] [1] TRUE csummary(s) 5% 95% Min.0.04998 0.95 1st Qu. 0.04999 0.95 Median 0.05001 0.95 Mean0.05001 0.95 3rd Qu. 
0.05002 0.95 Max.0.05003 0.95 # stop slaves sfStop() Stopping cluster # with the close finalizer we are responsible for deleting the file explicitely (unless we want to keep it) delete(x) [1] TRUE # remove r-side metadata rm(x) # truly free memory gc() Gesendet: Donnerstag, 03. Mai 2012 um 00:23 Uhr Von: Jonathan Greenberg j...@illinois.edu An: r-help r-help@r-project.org, r-sig-...@r-project.org Betreff: [R-sig-hpc] Quickest way to make a large empty file on disk? R-helpers: What would be the absolute fastest way to make a large empty file (e.g. filled with all zeroes) on disk, given a byte size and a given number number of empty values. I know I can use writeBin, but the object in this case may be far too large to store in main memory. I'm asking because I'm going to use this file in conjunction with mmap to do parallel writes to this file. Say, I want to create a blank file of 10,000 floating point numbers. Thanks! --j -- Jonathan A. Greenberg, PhD Assistant Professor Department of Geography and Geographic Information Science University of Illinois at Urbana-Champaign 607 South Mathews Avenue, MC 150 Urbana, IL 61801 Phone: 415-763-5476 AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007 [1]http://www.geog.illinois.edu/people/JonathanGreenberg.html [[alternative HTML version deleted]] ___ R-sig-hpc mailing list r-sig-...@r-project.org [2]https://stat.ethz.ch/mailman/listinfo/r-sig-hpc References 1. http://www.geog.illinois.edu/people/JonathanGreenberg.html 2. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Thanks, all! I'll try these out. I'm trying to work up something that is platform independent (if possible) for use with mmap. I'll do some tests on these suggestions and see which works best. I'll try to report back in a few days. Cheers! --j 2012/5/3 Jens Oehlschlägel jens.oehlschlae...@truecluster.com Jonathan, On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse' memory-mapped files, i.e. reserving the space without the cost of actually writing initial values. Package 'ff' does this automatically and also allows to access the file in parallel. Check the example below and see how big file creation is immediate. Jens Oehlschlägel library(ff) library(snowfall) ncpus - 2 n - 1e8 system.time( + x - ff(vmode=double, length=n, filename=c:/Temp/x.ff) + ) User System verstrichen 0.010.000.02 # check finalizer, with an explicit filename we should have a 'close' finalizer finalizer(x) [1] close # if not, set it to 'close' inorder to not let slaves delete x on slave shutdown finalizer(x) - close sfInit(parallel=TRUE, cpus=ncpus, type=SOCK) R Version: R version 2.15.0 (2012-03-30) snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs. sfLibrary(ff) Library ff loaded. Library ff loaded in cluster. 
Warnmeldung: In library(package = ff, character.only = TRUE, pos = 2, warn.conflicts = TRUE, : 'keep.source' is deprecated and will be ignored sfExport(x) # note: do not export the same ff multiple times # explicitely opening avoids a gc problem sfClusterEval(open(x, caching=mmeachflush)) # opening with 'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS write storms when the file is larger than RAM [[1]] [1] TRUE [[2]] [1] TRUE system.time( + sfLapply( chunk(x, length=ncpus), function(i){ + x[i] - runif(sum(i)) + invisible() + }) + ) User System verstrichen 0.000.00 30.78 system.time( + s - sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) ) + ) User System verstrichen 0.000.004.38 # for completeness sfClusterEval(close(x)) [[1]] [1] TRUE [[2]] [1] TRUE csummary(s) 5% 95% Min.0.04998 0.95 1st Qu. 0.04999 0.95 Median 0.05001 0.95 Mean0.05001 0.95 3rd Qu. 0.05002 0.95 Max.0.05003 0.95 # stop slaves sfStop() Stopping cluster # with the close finalizer we are responsible for deleting the file explicitely (unless we want to keep it) delete(x) [1] TRUE # remove r-side metadata rm(x) # truly free memory gc() *Gesendet:* Donnerstag, 03. Mai 2012 um 00:23 Uhr *Von:* Jonathan Greenberg j...@illinois.edu *An:* r-help r-help@r-project.org, r-sig-...@r-project.org *Betreff:* [R-sig-hpc] Quickest way to make a large empty file on disk? R-helpers: What would be the absolute fastest way to make a large empty file (e.g. filled with all zeroes) on disk, given a byte size and a given number number of empty values. I know I can use writeBin, but the object in this case may be far too large to store in main memory. I'm asking because I'm going to use this file in conjunction with mmap to do parallel writes to this file. Say, I want to create a blank file of 10,000 floating point numbers. Thanks! --j -- Jonathan A. 
Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html

___
R-sig-hpc mailing list
r-sig-...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Look at the man page for dd (assuming you are on *nix). A quick Google will get you a command to try. I'm not at my desk or I would as well.

Jeff

Jeffrey Ryan | Founder | jeffrey.r...@lemnica.com
www.lemnica.com

On May 2, 2012, at 5:23 PM, Jonathan Greenberg j...@illinois.edu wrote:

[original question quoted in full above]
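For reference, the dd approach above can be sketched as follows (assuming GNU dd, whose count/seek arguments accept size suffixes; the filename bigfile.bin is just an example). Seeking past the desired size with count=0 writes no data blocks at all:

```shell
# Create a 128 MiB file without writing any data blocks:
# count=0 copies nothing; seek=128M moves the end-of-file marker,
# leaving a sparse file on filesystems that support holes.
dd if=/dev/zero of=bigfile.bin bs=1 count=0 seek=128M

# Apparent size (ls) vs. actually allocated size (du):
ls -l bigfile.bin
du -h bigfile.bin
```

On filesystems without sparse-file support, the same command may allocate (or fail to reserve) real blocks, so it is worth checking du on the target filesystem.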
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Something like this is one way I know of: http://markus.revti.com/2007/06/creating-empty-file-with-specified-size/

Jeff

Jeffrey Ryan | Founder | jeffrey.r...@lemnica.com
www.lemnica.com

On May 2, 2012, at 5:23 PM, Jonathan Greenberg j...@illinois.edu wrote:

[original question quoted in full above]
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
On May 2, 2012, at 6:23 PM, Jonathan Greenberg wrote:

[original question quoted in full above]

The most trivial way is to simply seek to the end and write a byte:

> n = 1e5
> f = file("foo", "wb")
> seek(f, n - 1)
[1] 0
> writeBin(raw(1), f)
> close(f)
> file.info("foo")$size
[1] 1e+05

Cheers,
Simon
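The same seek-to-end effect is available directly from the shell via GNU coreutils' truncate, which extends a file to a given size without writing its contents (a sketch; foo.bin is a placeholder name):

```shell
# Reserve room for 10,000 doubles (80,000 bytes) without writing them:
truncate -s 80000 foo.bin

# The reported size is 80000 bytes, but on sparse-capable filesystems
# almost no disk blocks are actually allocated:
ls -l foo.bin
du foo.bin
```

Like the seek-and-write-one-byte trick, this leaves a hole rather than physically zeroed blocks, which is exactly what makes it fast.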
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
On most UNIX systems this will leave a large unallocated virtual "hole" in the file. If you are not bothered by spreading the allocation task out over the program execution interval, this won't matter and will probably give the best performance. However, if you want to benchmark your algorithms without the erratic filesystem updates mixed in, then you need to write all of those zeroes. For that to work most efficiently, write data in large blocks, and if possible bypass the C standard library.

---
Jeff Newmiller  jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.
---

Simon Urbanek simon.urba...@r-project.org wrote:

[Simon's reply and the original question, quoted in full above]
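When the zeroes really should land on disk (no sparse hole), the large-block advice above can be sketched with dd reading from /dev/zero (assuming GNU dd; zeros.bin is a placeholder name):

```shell
# Physically write 128 MiB of zero bytes in 1 MiB blocks.
# Large blocks (bs=1M) keep the per-write overhead low, per the advice above.
dd if=/dev/zero of=zeros.bin bs=1M count=128

# du now reports (close to) the full size: the blocks really are allocated.
du -h zeros.bin
```

This is slower than creating a sparse file, but it makes subsequent benchmark timings independent of lazy block allocation.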
Re: [R] [R-sig-hpc] Quickest way to make a large empty file on disk?
Jonathan,

10,000 numbers is pretty small, so I don't think time will be a big problem. You could write this using writeBin with no problems. For larger files, why not just use a loop? The writing is pretty fast, so I don't think you'll have too many problems. On my machine:

> ptm <- proc.time()
> zz <- file("testbin.bin", "wb")
> for(i in 1:10) writeBin(rep(0, 1e7), zz, size=16)
> close(zz)
> proc.time() - ptm
   user  system elapsed
  2.416   1.728  16.705

Otherwise I would suggest writing a little piece of C code to do what you want.

Robert

-----Original Message-----
From: r-sig-hpc-boun...@r-project.org [mailto:r-sig-hpc-boun...@r-project.org] On Behalf Of Jonathan Greenberg
Sent: Thursday, 3 May 2012 8:24 AM
To: r-help; r-sig-...@r-project.org
Subject: [R-sig-hpc] Quickest way to make a large empty file on disk?

[original question quoted in full above]