Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread Simon Urbanek

On Sep 28, 2012, at 12:44 PM, Jonathan Greenberg wrote:

> Rui:
> 
> Quick follow-up -- it looks like seek does do what I want (I see Simon
> suggested it some time ago) -- what do you mean by "trash your disk"?

I can't speak for Rui, but the difference between seeking and an explicit write is 
that the FS can optimize the former by not actually writing anything to disk 
(which is why it's so fast on some OS/FS combos). However, what this means is that 
the layout on the disk may not be sequential, depending on the write patterns of 
the actual data blocks, because the FS may keep a mask of unused blocks and 
not write them. But that is just a FS issue and thus varies vastly by OS and 
FS. For your use this probably doesn't matter, as you probably don't need to 
stream the resulting file at the end.
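
A quick way to see that effect (a sketch only: it assumes a *nix system with the
'du' utility on the PATH and a filesystem that supports sparse files; the size is
illustrative):

f <- file("sparse.bin", "wb")
seek(f, where = 1e9 - 1, rw = "write")    # jump ~1 GB forward without writing anything
writeBin(raw(1), f)                       # one byte at the end fixes the apparent length
close(f)
file.info("sparse.bin")$size                          # apparent size: 1e+09
system2("du", c("-k", "sparse.bin"), stdout = TRUE)   # allocated blocks: typically just a few KB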


> What I'm
> trying to accomplish is getting parallel, asynchronous writes to a large
> binary image (just a binary file) working.  Each node writes to a different
> sector of the file via mmap, "filling in" the values as the process runs,
> but the file needs to be pre-created before I can mmap it.  Running a
> writeBin with a bunch of 0s would mean I'd basically have to write the file
> twice, but the seek/ff trick seems to be much faster.
> 
> Do I risk doing some damage to my filesystem if I use seek?  I see there is
> a strongly worded warning in the help for ?seek:
> 
> "Use of seek on Windows is discouraged. We have found so many errors in the
> Windows implementation of file positioning that users are advised to use it
> only at their own risk, and asked not to waste the *R* developers' time
> with bug reports on Windows' deficiencies." --> there's no detail here on
> which errors people have experienced, so I'm not sure if doing something as
> simple as just "creating" a file using seek falls under the "discouraging"
> category.
> 

A quick search in my mail shows the issues were related to what Windows reports 
as the seek location on text files when querying. AFAICS it did not affect the 
side effect of seek, which is what you're interested in.
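
To make the two uses of seek concrete (a sketch; the return values noted are the
documented behaviour, not a demonstration of the Windows problem):

f <- file("foo.bin", "wb")
seek(f, where = 99)   # side effect: moves the write position; returns the previous offset (0)
seek(f)               # query only: reports the current offset (99), moves nothing
writeBin(raw(1), f)
close(f)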

Cheers,
Simon


> As a note, we are trying to work this up on both Windows and *nix systems,
> hence our wanting to have a single approach that works on both OSs.
> 
> --j
> 
> 
> On Thu, Sep 27, 2012 at 3:49 PM, Rui Barradas  wrote:
> 
>> Hello,
>> 
>> If you really need to trash your disk, why not use seek()?
>> 
>>> fl <- file("Test.txt", open = "wb")
>>> seek(fl, where = 1024, origin = "start", rw = "write")
>> [1] 0
>>> writeChar(character(1), fl, nchars = 1, useBytes = TRUE)
>> Warning message:
>> In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) :
>>  writeChar: more characters requested than are in the string - will
>> zero-pad
>>> close(fl)
>> 
>> 
>> File "Test.txt" is now 1Kb in size.
>> 
>> Hope this helps,
>> 
>> Rui Barradas
>> On 27-09-2012 20:17, Jonathan Greenberg wrote:
>> 
>> Folks:
>> 
>> Asked this question some time ago, and found what appeared (at first) to be
>> the best solution, but I'm now finding a new problem.  First off, it seemed
>> like ff as Jens suggested worked:
>> 
>> # outdata_ncells = the number of rows * number of columns * number of bands
>> in an image:
>> out<-ff(vmode="double",length=outdata_ncells,filename=filename)
>> finalizer(out) <- close
>> close(out)
>> 
>> This was working fine until I attempted to set length to a VERY large
>> number: outdata_ncells = 17711913600.  This would create a file that is
>> 131.964GB.  Big, but not obscenely so (and certainly not larger than the
>> filesystem can handle).  However, length appears to be restricted
>> by .Machine$integer.max (I'm on a 64-bit windows box):
>> 
>> .Machine$integer.max
>> 
>> [1] 2147483647
>> 
>> Any suggestions on how to solve this problem for much larger file sizes?
>> 
>> --j
>> 
>> 
>> On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg  
>> wrote:
>> 
>> 
>> Thanks, all!  I'll try these out.  I'm trying to work up something that is
>> platform independent (if possible) for use with mmap.  I'll do some tests
>> on these suggestions and see which works best. I'll try to report back in a
>> few days.  Cheers!
>> 
>> --j
>> 
>> 
>> 
>> 2012/5/3 "Jens Oehlschlägel"  
>> 
>> 
>> Jonathan,
>> 
>> On some filesystems (e.g. NTFS, see below) it is possible to create
>> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
>> actually writing initial values.
>> Package 'ff' does this automatically and also allows to access the file
>> in parallel. Check the example below and see how big file creation is
>> immediate.
>> 
>> Jens Oehlschlägel
>> 
>> 
>> 
>> library(ff)
>> library(snowfall)
>> ncpus <- 2
>> n <- 1e8
>> system.time(
>> 
>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>> + )
   user  system elapsed
   0.01    0.00    0.02
>> 
>> # check finalizer, with an explicit filename we should have a 'close'
>> 
>> finalizer
>> 
>> finalizer(x)
>> 
>> [1] "close"
>> 
>> # if not, set it

Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread Rui Barradas

Hello,

I've written a function to try to answer your original request, but I've 
run into a problem; see at the end.

In the meantime, replies are inline.
On 28-09-2012 17:44, Jonathan Greenberg wrote:

Rui:

Quick follow-up -- it looks like seek does do what I want (I see Simon
suggested it some time ago) -- what do you mean by "trash your disk"?
Nothing special, just that sometimes there are good ways of doing so. 
mmap seems to be safe.

   What I'm
trying to accomplish is getting parallel, asynchronous writes to a large
binary image (just a binary file) working.  Each node writes to a different
sector of the file via mmap, "filling in" the values as the process runs,
but the file needs to be pre-created before I can mmap it.  Running a
writeBin with a bunch of 0s would mean I'd basically have to write the file
twice, but the seek/ff trick seems to be much faster.

Do I risk doing some damage to my filesystem if I use seek?  I see there is
a strongly worded warning in the help for ?seek:

"Use of seek on Windows is discouraged. We have found so many errors in the
Windows implementation of file positioning that users are advised to use it
only at their own risk, and asked not to waste the *R* developers' time
with bug reports on Windows' deficiencies." --> there's no detail here on
which errors people have experienced, so I'm not sure if doing something as
simple as just "creating" a file using seek falls under the "discouraging"
category.


I'm not a great system programmer, but 20+ years of using seek on 
Windows have shown nothing of the sort. In fact, I've just found a 
problem with ubuntu 12.04: where seek gives the expected result on 
Windows, it goes up to a certain point on ubuntu and then "stops 
seeking", or whatever is happening. I installed ubuntu very recently, so 
I really don't know the reason for the behavior you can see in the example 
run below. But I do know that Windows 7 is causing no problem, as expected.

As a note, we are trying to work this up on both Windows and *nix systems,
hence our wanting to have a single approach that works on both OSs.

--j


#
# Function: creates a file of ascii nulls using seek/writeBin. File size
# can be big.
#
createBig <- function(filename, size){
    if(size == 0) return(0)
    chunk <- .Machine$integer.max
    nchunks <- as.integer(size / chunk)
    rest <- size - as.double(nchunks)*as.double(chunk)
    fl <- file(filename, open = "wb")
    for(i in seq_len(nchunks)){
        seek(fl, where = chunk - 1, origin = "current", rw = "write")
        writeBin(raw(1), fl)
        # -- debug --
        print(seek(fl, where = NA))
    }
    if(rest > 0){
        seek(fl, where = rest - 1, origin = "current", rw = "write")
        writeBin(raw(1), fl)
    }
    close(fl)
}

As you can see from the debug prints, on Windows 7 everything works as 
planned, while on ubuntu 12.04, when it reaches about 17 GB, seek stops 
seeking. The increments in file size become 1 byte at a time, which is 
explained by the writeBin instruction. (The different, slightly larger, 
size is irrelevant; the code was run several times, all with the same 
result: at 17179869176 bytes it no longer works.)
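
For what it's worth, the offset where the ubuntu run stalls is exactly eight times
the chunk size used above (just an observation, not a diagnosis):

8 * .Machine$integer.max   # 17179869176, the last offset the seeks reach on ubuntu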


#
#
# System: Windows 7 / R 2.15.1

size <- 10*.Machine$integer.max + sample(.Machine$integer.max, 1)
size
[1] 22195364413

createBig("Test.txt", size)
[1] 2147483647
[1] 4294967294
[1] 6442450941
[1] 8589934588
[1] 10737418235
[1] 12884901882
[1] 15032385529
[1] 17179869176
[1] 19327352823
[1] 21474836470

file.info("Test.txt")$size
[1] 22195364413
file.info("Test.txt")$size %/% .Machine$integer.max
[1] 10
file.info("Test.txt")$size %% .Machine$integer.max
[1] 720527943

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] fortunes_1.5-0

#
#
# System: ubuntu 12.04 precise pangolin / R 2.15.1
size <- 10*.Machine$integer.max + sample(.Machine$integer.max, 1)
size
[1] 23091487381

createBig("Test.txt", size)
[1] 2147483647
[1] 4294967294
[1] 6442450941
[1] 8589934588
[1] 10737418235
[1] 12884901882
[1] 15032385529
[1] 17179869176
[1] 17179869177
[1] 17179869178

file.info("Test.txt")$size
[1] 17179869179
file.info("Test.txt")$size %/% .Machine$integer.max
[1] 8
file.info("Test.txt")$size %% .Machine$integer.max
[1] 3


sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=pt_PT.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=pt_PT.UTF-8LC_COLLATE=pt_PT.UTF-8
 [5] LC_MONETARY=pt_PT.UTF-8LC_MESSAGES=pt_PT.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C

Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread Jonathan Greenberg
Rui:

Quick follow-up -- it looks like seek does do what I want (I see Simon
suggested it some time ago) -- what do you mean by "trash your disk"?  What I'm
trying to accomplish is getting parallel, asynchronous writes to a large
binary image (just a binary file) working.  Each node writes to a different
sector of the file via mmap, "filling in" the values as the process runs,
but the file needs to be pre-created before I can mmap it.  Running a
writeBin with a bunch of 0s would mean I'd basically have to write the file
twice, but the seek/ff trick seems to be much faster.
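
For the record, a minimal single-process sketch of that pre-create-then-mmap
pattern (it assumes Jeff Ryan's 'mmap' package and its real64() storage mode;
in the real setup each worker would write to its own disjoint range):

library(mmap)
n <- 1e6                                  # number of doubles; illustrative only
f <- file("image.bin", "wb")
seek(f, where = n * 8 - 1, rw = "write")  # pre-create the file by seeking to its end
writeBin(raw(1), f)
close(f)
m <- mmap("image.bin", mode = real64())   # map the file as a vector of doubles
m[1:100] <- runif(100)                    # "fill in" one sector of the file
munmap(m)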

Do I risk doing some damage to my filesystem if I use seek?  I see there is
a strongly worded warning in the help for ?seek:

"Use of seek on Windows is discouraged. We have found so many errors in the
Windows implementation of file positioning that users are advised to use it
only at their own risk, and asked not to waste the *R* developers' time
with bug reports on Windows' deficiencies." --> there's no detail here on
which errors people have experienced, so I'm not sure if doing something as
simple as just "creating" a file using seek falls under the "discouraging"
category.

As a note, we are trying to work this up on both Windows and *nix systems,
hence our wanting to have a single approach that works on both OSs.

--j


On Thu, Sep 27, 2012 at 3:49 PM, Rui Barradas  wrote:

>  Hello,
>
> If you really need to trash your disk, why not use seek()?
>
> > fl <- file("Test.txt", open = "wb")
> > seek(fl, where = 1024, origin = "start", rw = "write")
> [1] 0
> > writeChar(character(1), fl, nchars = 1, useBytes = TRUE)
> Warning message:
> In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) :
>   writeChar: more characters requested than are in the string - will
> zero-pad
> > close(fl)
>
>
> File "Test.txt" is now 1Kb in size.
>
> Hope this helps,
>
> Rui Barradas
> On 27-09-2012 20:17, Jonathan Greenberg wrote:
>
> Folks:
>
> Asked this question some time ago, and found what appeared (at first) to be
> the best solution, but I'm now finding a new problem.  First off, it seemed
> like ff as Jens suggested worked:
>
> # outdata_ncells = the number of rows * number of columns * number of bands
> in an image:
> out<-ff(vmode="double",length=outdata_ncells,filename=filename)
> finalizer(out) <- close
> close(out)
>
> This was working fine until I attempted to set length to a VERY large
> number: outdata_ncells = 17711913600.  This would create a file that is
> 131.964GB.  Big, but not obscenely so (and certainly not larger than the
> filesystem can handle).  However, length appears to be restricted
> by .Machine$integer.max (I'm on a 64-bit windows box):
>
>  .Machine$integer.max
>
>  [1] 2147483647
>
> Any suggestions on how to solve this problem for much larger file sizes?
>
> --j
>
>
> On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg  
> wrote:
>
>
>  Thanks, all!  I'll try these out.  I'm trying to work up something that is
> platform independent (if possible) for use with mmap.  I'll do some tests
> on these suggestions and see which works best. I'll try to report back in a
> few days.  Cheers!
>
> --j
>
>
>
> 2012/5/3 "Jens Oehlschlägel"  
> 
>
>  Jonathan,
>
> On some filesystems (e.g. NTFS, see below) it is possible to create
> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
> actually writing initial values.
> Package 'ff' does this automatically and also allows to access the file
> in parallel. Check the example below and see how big file creation is
> immediate.
>
> Jens Oehlschlägel
>
>
>
>  library(ff)
> library(snowfall)
> ncpus <- 2
> n <- 1e8
> system.time(
>
>  + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
> + )
>    user  system elapsed
>    0.01    0.00    0.02
>
>  # check finalizer, with an explicit filename we should have a 'close'
>
>  finalizer
>
>  finalizer(x)
>
>  [1] "close"
>
>  # if not, set it to 'close' inorder to not let slaves delete x on slave
>
>  shutdown
>
>  finalizer(x) <- "close"
> sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
>
>  R Version:  R version 2.15.0 (2012-03-30)
>
> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
> CPUs.
>
>
>  sfLibrary(ff)
>
>  Library ff loaded.
> Library ff loaded in cluster.
>
> Warning message:
> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
> = TRUE,  :
>   'keep.source' is deprecated and will be ignored
>
>  sfExport("x") # note: do not export the same ff multiple times
> # explicitely opening avoids a gc problem
> sfClusterEval(open(x, caching="mmeachflush")) # opening with
>
>  'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS
> write storms when the file is larger than RAM
> [[1]]
> [1] TRUE
>
> [[2]]
> [1] TRUE
>
>
>  system.time(
>
>  + sfLapply( chunk(x, length=ncpus), function(i){
> +   x[i] <- runif(sum(i))
> +   invisible()
> + })
> + )
>    user  system elapsed
>    0.00    0.00   30.78
>
>

Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread jens . oehlschlaegel

   Jonathan,
   ff has a utility function file.resize() which allows you to give a new file size
   in bytes using doubles.
   See ?file.resize
   Regards
   Jens Oehlschlägel
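
   A sketch of that suggestion for the size in question (the argument names here
   are an assumption -- check ?file.resize for the real interface):

   library(ff)
   file.resize("image.bin", 17711913600 * 8)  # 17711913600 doubles of 8 bytes each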
   Sent: Thursday, 27 September 2012 at 21:17
   From: "Jonathan Greenberg" 
   To: r-help , r-sig-...@r-project.org
   Subject: Re: [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   Folks:
   Asked this question some time ago, and found what appeared (at first) to be
   the best solution, but I'm now finding a new problem. First off, it seemed
   like ff as Jens suggested worked:
   # outdata_ncells = the number of rows * number of columns * number of bands
   in an image:
   out<-ff(vmode="double",length=outdata_ncells,filename=filename)
   finalizer(out) <- close
   close(out)
   This was working fine until I attempted to set length to a VERY large
   number: outdata_ncells = 17711913600. This would create a file that is
   131.964GB. Big, but not obscenely so (and certainly not larger than the
   filesystem can handle). However, length appears to be restricted
   by .Machine$integer.max (I'm on a 64-bit windows box):
   > .Machine$integer.max
   [1] 2147483647
   Any suggestions on how to solve this problem for much larger file sizes?
   --j
   OnThu,   May   3,   2012   at   10:44   AM,   Jonathan   Greenberg
   wrote:
   > Thanks, all! I'll try these out. I'm trying to work up something that is
   > platform independent (if possible) for use with mmap. I'll do some tests
   > on these suggestions and see which works best. I'll try to report back in
   a
   > few days. Cheers!
   >
   > --j
   >
   >
   >
   > 2012/5/3 "Jens Oehlschlägel" 
   >
   >> Jonathan,
   >>
   >> On some filesystems (e.g. NTFS, see below) it is possible to create
   >> 'sparse' memory-mapped files, i.e. reserving the space without the cost
   of
   >> actually writing initial values.
   >> Package 'ff' does this automatically and also allows to access the file
   >> in parallel. Check the example below and see how big file creation is
   >> immediate.
   >>
   >> Jens Oehlschlägel
   >>
   >>
   >> > library(ff)
   >> > library(snowfall)
   >> > ncpus <- 2
   >> > n <- 1e8
   >> > system.time(
   >> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
   >> + )
>> user system elapsed
   >> 0.01 0.00 0.02
   >> > # check finalizer, with an explicit filename we should have a 'close'
   >> finalizer
   >> > finalizer(x)
   >> [1] "close"
   >> > # if not, set it to 'close' inorder to not let slaves delete x on slave
   >> shutdown
   >> > finalizer(x) <- "close"
   >> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
   >> R Version: R version 2.15.0 (2012-03-30)
   >>
   >> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
   >> CPUs.
   >>
   >> > sfLibrary(ff)
   >> Library ff loaded.
   >> Library ff loaded in cluster.
   >>
>> Warning message:
   >> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
   >> = TRUE, :
   >> 'keep.source' is deprecated and will be ignored
   >> > sfExport("x") # note: do not export the same ff multiple times
   >> > # explicitely opening avoids a gc problem
   >> > sfClusterEval(open(x, caching="mmeachflush")) # opening with
   >> 'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS
   >> write storms when the file is larger than RAM
   >> [[1]]
   >> [1] TRUE
   >>
   >> [[2]]
   >> [1] TRUE
   >>
   >> > system.time(
   >> + sfLapply( chunk(x, length=ncpus), function(i){
   >> + x[i] <- runif(sum(i))
   >> + invisible()
   >> + })
   >> + )
>> user system elapsed
   >> 0.00 0.00 30.78
   >> > system.time(
   >> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
   >> c(0.05, 0.95)) )
   >> + )
>> user system elapsed
   >> 0.00 0.00 4.38
   >> > # for completeness
   >> > sfClusterEval(close(x))
   >> [[1]]
   >> [1] TRUE
   >>
   >> [[2]]
   >> [1] TRUE
   >>
   >> > csummary(s)
   >> 5% 95%
   >> Min. 0.04998 0.95
   >> 1st Qu. 0.04999 0.95
   >> Median 0.05001 0.95
   >> Mean 0.05001 0.95
   >> 3rd Qu. 0.05002 0.95
   >> Max. 0.05003 0.95
   >> > # stop slaves
   >> > sfStop()
   >>
   >> Stopping cluster
   >>
   >> > # with the close finalizer we are responsible for deleting the file
   >> explicitely (unless we want to keep it)
   >> > delete(x)
   >> [1] TRUE
   >> > # remove r-side metadata
   >> > rm(x)
   >> > # truly free memory
   >> > gc()
   >>
   >>
   >>
>> *Sent:* Thursday, 3 May 2012 at 00:23
>> *From:* "Jonathan Greenberg" 
>> *To:* r-help , r-sig-...@r-project.org
>> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on
   >> disk?
   >> R-helpers:
   >>
   >> What would be the absolute fastest way to make a large "empty" file (e.g.
   >> filled with all zeroes) on disk, given a byte size and a given number
   >> number of empty values. I know I can use writeBin, but the "object" in
   >>  this case may be far too large to store in main

Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-27 Thread Rui Barradas
Hello,

If you really need to trash your disk, why not use seek()?

 > fl <- file("Test.txt", open = "wb")
 > seek(fl, where = 1024, origin = "start", rw = "write")
[1] 0
 > writeChar(character(1), fl, nchars = 1, useBytes = TRUE)
Warning message:
In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) :
   writeChar: more characters requested than are in the string - will 
zero-pad
 > close(fl)


File "Test.txt" is now 1Kb in size.

Hope this helps,

Rui Barradas
On 27-09-2012 20:17, Jonathan Greenberg wrote:
> Folks:
>
> Asked this question some time ago, and found what appeared (at first) to be
> the best solution, but I'm now finding a new problem.  First off, it seemed
> like ff as Jens suggested worked:
>
> # outdata_ncells = the number of rows * number of columns * number of bands
> in an image:
> out<-ff(vmode="double",length=outdata_ncells,filename=filename)
> finalizer(out) <- close
> close(out)
>
> This was working fine until I attempted to set length to a VERY large
> number: outdata_ncells = 17711913600.  This would create a file that is
> 131.964GB.  Big, but not obscenely so (and certainly not larger than the
> filesystem can handle).  However, length appears to be restricted
> by .Machine$integer.max (I'm on a 64-bit windows box):
>> .Machine$integer.max
> [1] 2147483647
>
> Any suggestions on how to solve this problem for much larger file sizes?
>
> --j
>
>
> On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg wrote:
>
>> Thanks, all!  I'll try these out.  I'm trying to work up something that is
>> platform independent (if possible) for use with mmap.  I'll do some tests
>> on these suggestions and see which works best. I'll try to report back in a
>> few days.  Cheers!
>>
>> --j
>>
>>
>>
>> 2012/5/3 "Jens Oehlschlägel" 
>>
>>> Jonathan,
>>>
>>> On some filesystems (e.g. NTFS, see below) it is possible to create
>>> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
>>> actually writing initial values.
>>> Package 'ff' does this automatically and also allows to access the file
>>> in parallel. Check the example below and see how big file creation is
>>> immediate.
>>>
>>> Jens Oehlschlägel
>>>
>>>
 library(ff)
 library(snowfall)
 ncpus <- 2
 n <- 1e8
 system.time(
>>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>>> + )
>>> user  system elapsed
>>> 0.01    0.00    0.02
 # check finalizer, with an explicit filename we should have a 'close'
>>> finalizer
 finalizer(x)
>>> [1] "close"
 # if not, set it to 'close' inorder to not let slaves delete x on slave
>>> shutdown
 finalizer(x) <- "close"
 sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
>>> R Version:  R version 2.15.0 (2012-03-30)
>>>
>>> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
>>> CPUs.
>>>
 sfLibrary(ff)
>>> Library ff loaded.
>>> Library ff loaded in cluster.
>>>
>>> Warning message:
>>> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
>>> = TRUE,  :
>>>'keep.source' is deprecated and will be ignored
 sfExport("x") # note: do not export the same ff multiple times
 # explicitely opening avoids a gc problem
 sfClusterEval(open(x, caching="mmeachflush")) # opening with
>>> 'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS
>>> write storms when the file is larger than RAM
>>> [[1]]
>>> [1] TRUE
>>>
>>> [[2]]
>>> [1] TRUE
>>>
 system.time(
>>> + sfLapply( chunk(x, length=ncpus), function(i){
>>> +   x[i] <- runif(sum(i))
>>> +   invisible()
>>> + })
>>> + )
>>> user  system elapsed
>>> 0.00    0.00   30.78
 system.time(
>>> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
>>> c(0.05, 0.95)) )
>>> + )
>>> user  system elapsed
>>> 0.00    0.00    4.38
 # for completeness
 sfClusterEval(close(x))
>>> [[1]]
>>> [1] TRUE
>>>
>>> [[2]]
>>> [1] TRUE
>>>
 csummary(s)
>>>   5%  95%
>>> Min.0.04998 0.95
>>> 1st Qu. 0.04999 0.95
>>> Median  0.05001 0.95
>>> Mean0.05001 0.95
>>> 3rd Qu. 0.05002 0.95
>>> Max.0.05003 0.95
 # stop slaves
 sfStop()
>>> Stopping cluster
>>>
 # with the close finalizer we are responsible for deleting the file
>>> explicitely (unless we want to keep it)
 delete(x)
>>> [1] TRUE
 # remove r-side metadata
 rm(x)
 # truly free memory
 gc()
>>>
>>>
>>>   *Sent:* Thursday, 3 May 2012 at 00:23
>>> *From:* "Jonathan Greenberg" 
>>> *To:* r-help , r-sig-...@r-project.org
>>> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on
>>> disk?
>>>   R-helpers:
>>>
>>> What would be the absolute fastest way to make a large "empty" file (e.g.
>>> filled with all zeroes) on disk, given a byte size and a given number
>>> number of empty values. I know I can use writeBin, but the "object" in
>>> this case may be far too large to store in main memory. I'm as

Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-27 Thread Jonathan Greenberg
Folks:

Asked this question some time ago, and found what appeared (at first) to be
the best solution, but I'm now finding a new problem.  First off, it seemed
like ff as Jens suggested worked:

# outdata_ncells = the number of rows * number of columns * number of bands
in an image:
out<-ff(vmode="double",length=outdata_ncells,filename=filename)
finalizer(out) <- close
close(out)

This was working fine until I attempted to set length to a VERY large
number: outdata_ncells = 17711913600.  This would create a file that is
131.964GB.  Big, but not obscenely so (and certainly not larger than the
filesystem can handle).  However, length appears to be restricted
by .Machine$integer.max (I'm on a 64-bit windows box):
> .Machine$integer.max
[1] 2147483647

Any suggestions on how to solve this problem for much larger file sizes?

--j


On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg wrote:

> Thanks, all!  I'll try these out.  I'm trying to work up something that is
> platform independent (if possible) for use with mmap.  I'll do some tests
> on these suggestions and see which works best. I'll try to report back in a
> few days.  Cheers!
>
> --j
>
>
>
> 2012/5/3 "Jens Oehlschlägel" 
>
>> Jonathan,
>>
>> On some filesystems (e.g. NTFS, see below) it is possible to create
>> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
>> actually writing initial values.
>> Package 'ff' does this automatically and also allows to access the file
>> in parallel. Check the example below and see how big file creation is
>> immediate.
>>
>> Jens Oehlschlägel
>>
>>
>> > library(ff)
>> > library(snowfall)
>> > ncpus <- 2
>> > n <- 1e8
>> > system.time(
>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>> + )
>>    user  system elapsed
>>    0.01    0.00    0.02
>> > # check finalizer, with an explicit filename we should have a 'close'
>> finalizer
>> > finalizer(x)
>> [1] "close"
>> > # if not, set it to 'close' inorder to not let slaves delete x on slave
>> shutdown
>> > finalizer(x) <- "close"
>> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
>> R Version:  R version 2.15.0 (2012-03-30)
>>
>> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
>> CPUs.
>>
>> > sfLibrary(ff)
>> Library ff loaded.
>> Library ff loaded in cluster.
>>
>> Warning message:
>> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
>> = TRUE,  :
>>   'keep.source' is deprecated and will be ignored
>> > sfExport("x") # note: do not export the same ff multiple times
>> > # explicitely opening avoids a gc problem
>> > sfClusterEval(open(x, caching="mmeachflush")) # opening with
>> 'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS
>> write storms when the file is larger than RAM
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > system.time(
>> + sfLapply( chunk(x, length=ncpus), function(i){
>> +   x[i] <- runif(sum(i))
>> +   invisible()
>> + })
>> + )
>>    user  system elapsed
>>    0.00    0.00   30.78
>> > system.time(
>> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
>> c(0.05, 0.95)) )
>> + )
>>    user  system elapsed
>>    0.00    0.00    4.38
>> > # for completeness
>> > sfClusterEval(close(x))
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > csummary(s)
>>  5%  95%
>> Min.0.04998 0.95
>> 1st Qu. 0.04999 0.95
>> Median  0.05001 0.95
>> Mean0.05001 0.95
>> 3rd Qu. 0.05002 0.95
>> Max.0.05003 0.95
>> > # stop slaves
>> > sfStop()
>>
>> Stopping cluster
>>
>> > # with the close finalizer we are responsible for deleting the file
>> explicitely (unless we want to keep it)
>> > delete(x)
>> [1] TRUE
>> > # remove r-side metadata
>> > rm(x)
>> > # truly free memory
>> > gc()
>>
>>
>>
>>  *Sent:* Thursday, 3 May 2012 at 00:23
>> *From:* "Jonathan Greenberg" 
>> *To:* r-help , r-sig-...@r-project.org
>> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on
>> disk?
>>  R-helpers:
>>
>> What would be the absolute fastest way to make a large "empty" file (e.g.
>> filled with all zeroes) on disk, given a byte size and a given number
>> number of empty values. I know I can use writeBin, but the "object" in
>> this case may be far too large to store in main memory. I'm asking because
>> I'm going to use this file in conjunction with mmap to do parallel writes
>> to this file. Say, I want to create a blank file of 10,000 floating point
>> numbers.
>>
>> Thanks!
>>
>> --j
>>
>> --
>> Jonathan A. Greenberg, PhD
>> Assistant Professor
>> Department of Geography and Geographic Information Science
>> University of Illinois at Urbana-Champaign
>> 607 South Mathews Avenue, MC 150
>> Urbana, IL 61801
>> Phone: 415-763-5476
>> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
>> http://www.geog.illinois.edu/people/JonathanGreenberg.html
>>
>> [[alternative HTML version deleted]]
>>
>> __

Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-05-03 Thread Jonathan Greenberg
Thanks, all!  I'll try these out.  I'm trying to work up something that is
platform independent (if possible) for use with mmap.  I'll do some tests
on these suggestions and see which works best. I'll try to report back in a
few days.  Cheers!

--j



2012/5/3 "Jens Oehlschlägel" 

> Jonathan,
>
> On some filesystems (e.g. NTFS, see below) it is possible to create
> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
> actually writing initial values.
> Package 'ff' does this automatically and also allows to access the file in
> parallel. Check the example below and see how big file creation is
> immediate.
>
> Jens Oehlschlägel
>
>
> > library(ff)
> > library(snowfall)
> > ncpus <- 2
> > n <- 1e8
> > system.time(
> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
> + )
>    user  system elapsed
>    0.01    0.00    0.02
> > # check finalizer, with an explicit filename we should have a 'close'
> finalizer
> > finalizer(x)
> [1] "close"
> > # if not, set it to 'close' inorder to not let slaves delete x on slave
> shutdown
> > finalizer(x) <- "close"
> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
> R Version:  R version 2.15.0 (2012-03-30)
>
> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
>
> > sfLibrary(ff)
> Library ff loaded.
> Library ff loaded in cluster.
>
> Warning message:
> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
> = TRUE,  :
>   'keep.source' is deprecated and will be ignored
> > sfExport("x") # note: do not export the same ff multiple times
> > # explicitely opening avoids a gc problem
> > sfClusterEval(open(x, caching="mmeachflush")) # opening with
> 'mmeachflush' inststead of 'mmnoflush' is a bit slower but prevents OS
> write storms when the file is larger than RAM
> [[1]]
> [1] TRUE
>
> [[2]]
> [1] TRUE
>
> > system.time(
> + sfLapply( chunk(x, length=ncpus), function(i){
> +   x[i] <- runif(sum(i))
> +   invisible()
> + })
> + )
>    user  system elapsed
>    0.00    0.00   30.78
> > system.time(
> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
> c(0.05, 0.95)) )
> + )
>    user  system elapsed
>    0.00    0.00    4.38
> > # for completeness
> > sfClusterEval(close(x))
> [[1]]
> [1] TRUE
>
> [[2]]
> [1] TRUE
>
> > csummary(s)
>  5%  95%
> Min.0.04998 0.95
> 1st Qu. 0.04999 0.95
> Median  0.05001 0.95
> Mean0.05001 0.95
> 3rd Qu. 0.05002 0.95
> Max.0.05003 0.95
> > # stop slaves
> > sfStop()
>
> Stopping cluster
>
> > # with the close finalizer we are responsible for deleting the file
> explicitely (unless we want to keep it)
> > delete(x)
> [1] TRUE
> > # remove r-side metadata
> > rm(x)
> > # truly free memory
> > gc()
>
>
>
>  *Sent:* Thursday, 3 May 2012 at 00:23
> *From:* "Jonathan Greenberg" 
> *To:* r-help , r-sig-...@r-project.org
> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on disk?
>  R-helpers:
>
> What would be the absolute fastest way to make a large "empty" file (e.g.
> filled with all zeroes) on disk, given a byte size and a given number
> number of empty values. I know I can use writeBin, but the "object" in
> this case may be far too large to store in main memory. I'm asking because
> I'm going to use this file in conjunction with mmap to do parallel writes
> to this file. Say, I want to create a blank file of 10,000 floating point
> numbers.
>
> Thanks!
>
> --j
>
> --
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
> http://www.geog.illinois.edu/people/JonathanGreenberg.html
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-hpc mailing list
> r-sig-...@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
>
>


-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html

[[alternative HTML version deleted]]



Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-05-03 Thread Jens Oehlschlägel

   Jonathan,
   On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse'
   memory-mapped files, i.e. reserving the space without the cost of actually
   writing initial values.
   Package 'ff' does this automatically and also allows you to access the file in
   parallel. Check the example below and see that creation of the big file is
   immediate.
   Jens Oehlschlägel
   > library(ff)
   > library(snowfall)
   > ncpus <- 2
   > n <- 1e8
   > system.time(
   + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
   + )
   user  system elapsed
   0.01    0.00    0.02
   > # check finalizer, with an explicit filename we should have a 'close'
   finalizer
   > finalizer(x)
   [1] "close"
    > # if not, set it to 'close' in order to not let slaves delete x on slave
   shutdown
   > finalizer(x) <- "close"
   > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
   R Version:  R version 2.15.0 (2012-03-30)
   snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
   > sfLibrary(ff)
   Library ff loaded.
   Library ff loaded in cluster.
    Warning message:
   In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts =
   TRUE,  :
 'keep.source' is deprecated and will be ignored
   > sfExport("x") # note: do not export the same ff multiple times
    > # explicitly opening avoids a gc problem
   > sfClusterEval(open(x, caching="mmeachflush")) # opening with 'mmeachflush'
    instead of 'mmnoflush' is a bit slower but prevents OS write storms when
   the file is larger than RAM
   [[1]]
   [1] TRUE
   [[2]]
   [1] TRUE
   > system.time(
   + sfLapply( chunk(x, length=ncpus), function(i){
   +   x[i] <- runif(sum(i))
   +   invisible()
   + })
   + )
   user  system elapsed
   0.00    0.00   30.78
   > system.time(
   + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05,
   0.95)) )
   + )
   user  system elapsed
   0.00    0.00    4.38
   > # for completeness
   > sfClusterEval(close(x))
   [[1]]
   [1] TRUE
   [[2]]
   [1] TRUE
   > csummary(s)
5%  95%
   Min.0.04998 0.95
   1st Qu. 0.04999 0.95
   Median  0.05001 0.95
   Mean0.05001 0.95
   3rd Qu. 0.05002 0.95
   Max.0.05003 0.95
   > # stop slaves
   > sfStop()
   Stopping cluster
   >  # with the close finalizer we are responsible for deleting the file
    explicitly (unless we want to keep it)
   > delete(x)
   [1] TRUE
   > # remove r-side metadata
   > rm(x)
   > # truly free memory
   > gc()
    Sent: Thursday, 3 May 2012 at 00:23
    From: "Jonathan Greenberg" 
    To: r-help , r-sig-...@r-project.org
    Subject: [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   R-helpers:
   What would be the absolute fastest way to make a large "empty" file (e.g.
   filled with all zeroes) on disk, given a byte size and a given number
   number of empty values. I know I can use writeBin, but the "object" in
   this case may be far too large to store in main memory. I'm asking because
   I'm going to use this file in conjunction with mmap to do parallel writes
   to this file. Say, I want to create a blank file of 10,000 floating point
   numbers.
   Thanks!
   --j
   --
   Jonathan A. Greenberg, PhD
   Assistant Professor
   Department of Geography and Geographic Information Science
   University of Illinois at Urbana-Champaign
   607 South Mathews Avenue, MC 150
   Urbana, IL 61801
   Phone: 415-763-5476
   AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
   [1]http://www.geog.illinois.edu/people/JonathanGreenberg.html
   [[alternative HTML version deleted]]
   ___
   R-sig-hpc mailing list
   r-sig-...@r-project.org
   [2]https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

References

   1. http://www.geog.illinois.edu/people/JonathanGreenberg.html
   2. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-05-02 Thread Denham Robert
Jonathan,
10,000 numbers is pretty small, so I don't think time will be a
big problem. You could write this using writeBin with no problems. For
larger files, why not just use a loop? The writing is pretty fast, so I
don't think you'll have too many problems. 

On my machine:

> ptm <- proc.time()
> zz <- file("testbin.bin", "wb")
> for(i in 10) writeBin(rep(0,1),zz, size=16)
> close(zz)
> proc.time() - ptm
   user  system elapsed 
  2.416   1.728  16.705 
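
(As shown, the loop above runs only once and writes a single value; a sketch of
the chunked zero-writing idea with purely illustrative sizes follows.)

zz <- file("testbin.bin", "wb")
zeros <- rep(0, 1e6)                  # 1e6 doubles = 8 MB per chunk
for (i in 1:128) writeBin(zeros, zz)  # ~1 GB of explicit zeroes in total
close(zz)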
 
Otherwise I would suggest writing a little piece of C code to do what
you want.

Robert
  

-Original Message-
From: r-sig-hpc-boun...@r-project.org
[mailto:r-sig-hpc-boun...@r-project.org] On Behalf Of Jonathan Greenberg
Sent: Thursday, 3 May 2012 8:24 AM
To: r-help; r-sig-...@r-project.org
Subject: [R-sig-hpc] Quickest way to make a large "empty" file on disk?

R-helpers:

What would be the absolute fastest way to make a large "empty" file
(e.g.
filled with all zeroes) on disk, given a byte size and a given number
number of empty values.  I know I can use writeBin, but the "object" in
this case may be far too large to store in main memory.  I'm asking
because I'm going to use this file in conjunction with mmap to do
parallel writes to this file.  Say, I want to create a blank file of
10,000 floating point numbers.

Thanks!

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science University of
Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html

[[alternative HTML version deleted]]

___
R-sig-hpc mailing list
r-sig-...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc





Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-05-02 Thread Jeff Newmiller
On most UNIX systems this will leave a large unallocated virtual "hole" in the 
file. If you are not bothered by spreading the allocation task out over the 
program execution interval, this won't matter and will probably give the best 
performance.  However, if you wanted to benchmark your algorithms without the 
erratic filesystem updates mixed in, then you need to write all of those 
zeroes. For that to work most efficiently, write data in large blocks, and if 
possible bypass the C standard library.
---
Jeff Newmiller
DCN
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.

Simon Urbanek  wrote:

>
>On May 2, 2012, at 6:23 PM, Jonathan Greenberg wrote:
>
>> R-helpers:
>> 
>> What would be the absolute fastest way to make a large "empty" file
>(e.g.
>> filled with all zeroes) on disk, given a byte size and a given number
>> number of empty values.  I know I can use writeBin, but the "object"
>in
>> this case may be far too large to store in main memory.  I'm asking
>because
>> I'm going to use this file in conjunction with mmap to do parallel
>writes
>> to this file.  Say, I want to create a blank file of 10,000 floating
>point
>> numbers.
>> 
>
>The most trivial way is to simply seek to the end and write a byte:
>
>> n=1e5
>>  f=file("foo","wb")
>> seek(f,n-1)
>[1] 0
>> writeBin(raw(1),f)
>> close(f)
>> file.info("foo")$size
>[1] 1e+05
>
>Cheers,
>Simon
>
>
>> Thanks!
>> 
>> --j
>> 
>> -- 
>> Jonathan A. Greenberg, PhD
>> Assistant Professor
>> Department of Geography and Geographic Information Science
>> University of Illinois at Urbana-Champaign
>> 607 South Mathews Avenue, MC 150
>> Urbana, IL 61801
>> Phone: 415-763-5476
>> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype:
>jgrn3007
>> http://www.geog.illinois.edu/people/JonathanGreenberg.html
>> 
>>  [[alternative HTML version deleted]]
>> 
>> ___
>> R-sig-hpc mailing list
>> r-sig-...@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>> 
>>
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-05-02 Thread Simon Urbanek

On May 2, 2012, at 6:23 PM, Jonathan Greenberg wrote:

> R-helpers:
> 
> What would be the absolute fastest way to make a large "empty" file (e.g.
> filled with all zeroes) on disk, given a byte size and a given number
> number of empty values.  I know I can use writeBin, but the "object" in
> this case may be far too large to store in main memory.  I'm asking because
> I'm going to use this file in conjunction with mmap to do parallel writes
> to this file.  Say, I want to create a blank file of 10,000 floating point
> numbers.
> 

The most trivial way is to simply seek to the end and write a byte:

> n=1e5
>  f=file("foo","wb")
> seek(f,n-1)
[1] 0
> writeBin(raw(1),f)
> close(f)
> file.info("foo")$size
[1] 1e+05

Cheers,
Simon


> Thanks!
> 
> --j
> 
> -- 
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
> http://www.geog.illinois.edu/people/JonathanGreenberg.html
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-hpc mailing list
> r-sig-...@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> 
>



Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-05-02 Thread Jeff Ryan
Something like:

http://markus.revti.com/2007/06/creating-empty-file-with-specified-size/

Is one way I know of. 

Jeff

Jeffrey Ryan|Founder|jeffrey.r...@lemnica.com

www.lemnica.com

On May 2, 2012, at 5:23 PM, Jonathan Greenberg  wrote:

> R-helpers:
> 
> What would be the absolute fastest way to make a large "empty" file (e.g.
> filled with all zeroes) on disk, given a byte size and a given number
> number of empty values.  I know I can use writeBin, but the "object" in
> this case may be far too large to store in main memory.  I'm asking because
> I'm going to use this file in conjunction with mmap to do parallel writes
> to this file.  Say, I want to create a blank file of 10,000 floating point
> numbers.
> 
> Thanks!
> 
> --j
> 
> -- 
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
> http://www.geog.illinois.edu/people/JonathanGreenberg.html
> 
>[[alternative HTML version deleted]]
> 
> ___
> R-sig-hpc mailing list
> r-sig-...@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc



Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-05-02 Thread Jeff Ryan
Look at the man page for dd (assuming you are on *nix).

A quick google will get you a command to try. I'm not at my desk or I would as 
well. 
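
For reference, one common form of such a command, run from R via system2() (GNU
dd is assumed and the sizes are illustrative):

# write ~1 GB of explicit zero blocks
system2("dd", c("if=/dev/zero", "of=zeros.bin", "bs=1M", "count=1024"))
# or just extend a file to 1 GB without writing (sparse where the FS supports it)
system2("dd", c("if=/dev/zero", "of=sparse.bin", "bs=1", "count=0", "seek=1073741824"))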

Jeff

Jeffrey Ryan|Founder|jeffrey.r...@lemnica.com

www.lemnica.com

On May 2, 2012, at 5:23 PM, Jonathan Greenberg  wrote:

> R-helpers:
> 
> What would be the absolute fastest way to make a large "empty" file (e.g.
> filled with all zeroes) on disk, given a byte size and a given number
> number of empty values.  I know I can use writeBin, but the "object" in
> this case may be far too large to store in main memory.  I'm asking because
> I'm going to use this file in conjunction with mmap to do parallel writes
> to this file.  Say, I want to create a blank file of 10,000 floating point
> numbers.
> 
> Thanks!
> 
> --j
> 
> -- 
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
> http://www.geog.illinois.edu/people/JonathanGreenberg.html
> 
>[[alternative HTML version deleted]]
> 
> ___
> R-sig-hpc mailing list
> r-sig-...@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
