Re: [R] Reading File Sizes: very slow!

2021-09-27 Thread Rui Barradas

Hello,

R 4.1.1 on Ubuntu 20.04, sessionInfo at the end.

I'm arriving a bit late to this thread, but here are the timings I'm 
getting on a 10+ year-old PC.


1. I am not getting running times anywhere close to 5 or 10 minutes.
2. Like Bill said, there seems to be a caching effect: the first runs 
are consistently slower. And this is Ubuntu, not Windows, so different 
OS's show the same behavior. It's not unexpected; disk accesses are 
slow operations and have been cached for a long time now.
3. I am not sure if this is relevant, but to clear the Windows File 
Explorer cache, open a File Explorer window and click


View > Options > (Privacy section) Clear

4. Now for my timings. The cache effect is large: from 23 s down to 2.5 s.
But even on an old PC, nowhere near 300 s or 500 s.
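On Linux, the slow first run can be reproduced at will by dropping the page cache between timings. A minimal sketch, assuming the standard /proc/sys/vm/drop_caches interface (writing to it needs root; the MSG variable is only for illustration):

```shell
# Drop the Linux page cache, dentries and inodes (needs root);
# after this, the next size.pkg() run should be slow again.
if [ -w /proc/sys/vm/drop_caches ]; then
    sync                              # flush dirty pages first
    echo 3 > /proc/sys/vm/drop_caches
    MSG="caches dropped"
else
    MSG="not root: try  sync; echo 3 | sudo tee /proc/sys/vm/drop_caches"
fi
echo "$MSG"
```

This answers Bill's question below only for Linux; I do not know a comparable documented knob for the Windows file cache.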

rui@rui:~$ R -q -f rhelp.R
#
# functions size.pkg and size.f.pkg omitted
#
> R_LIBS_USER <- Sys.getenv("R_LIBS_USER")
>
> cat("\nLeonard Mada's code:\n\n")

Leonard Mada's code:

> system.time({
+ x = size.pkg(path=R_LIBS_USER, file=NULL)
+ })
   user  system elapsed
  1.700   0.988  23.339
> system.time({
+ x = size.pkg(path=R_LIBS_USER, file=NULL)
+ })
   user  system elapsed
  1.578   0.921   2.540
> system.time({
+ x = size.pkg(path=R_LIBS_USER, file=NULL)
+ })
   user  system elapsed
  1.542   0.949   2.523
>
> cat("\nBill Dunlap's code:\n\n")

Bill Dunlap's code:

> system.time(L1 <- size.f.pkg(R_LIBS_USER))
   user  system elapsed
  1.608   0.887   2.538
> system.time(L2 <- size.f.pkg(R_LIBS_USER))
   user  system elapsed
  1.515   0.982   2.510
> identical(L1,L2)
[1] TRUE
> length(L1)
[1] 1773
> length(dir(R_LIBS_USER,recursive=TRUE))
[1] 85204
>
> cat("\n\nsessionInfo return value:\n\n")


sessionInfo return value:

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
 [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
 [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.1


And the sapply code.


rui@rui:~$ R -q -f rhelp2.R
> R_LIBS_USER <- Sys.getenv("R_LIBS_USER")
> path <- R_LIBS_USER
> system.time({
+ sapply(list.dirs(path=path, full.name=F, recursive=F),
+  function(f) length(list.files(path = file.path(path, f),
+ full.names = FALSE, recursive = TRUE)))
+ })
   user  system elapsed
  0.802   0.901  15.964
>
>
rui@rui:~$ R -q -f rhelp2.R
> R_LIBS_USER <- Sys.getenv("R_LIBS_USER")
> path <- R_LIBS_USER
> system.time({
+ sapply(list.dirs(path=path, full.name=F, recursive=F),
+  function(f) length(list.files(path = file.path(path, f),
+ full.names = FALSE, recursive = TRUE)))
+ })
   user  system elapsed
  0.730   0.528   1.264


Once again the 2nd run took a fraction of the 1st run.
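As a cross-check, the same per-package file count can be done outside R with find. A self-contained sketch on a synthetic library; pkgA, pkgB and the file names are made up for illustration:

```shell
# Build a throwaway "library" with a known layout.
LIB=$(mktemp -d)
mkdir -p "$LIB/pkgA" "$LIB/pkgB"
touch "$LIB/pkgA/f1" "$LIB/pkgB/f1" "$LIB/pkgB/f2"

# One recursive traversal per package, like the sapply()/list.files() call.
nA=$(find "$LIB/pkgA" -type f | wc -l)
nB=$(find "$LIB/pkgB" -type f | wc -l)
printf 'pkgA %s\npkgB %s\n' "$nA" "$nB"

rm -rf "$LIB"
```

Running something like this over R_LIBS_USER gives an R-independent baseline for how fast the filesystem itself can enumerate the files.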

Leonard, if you are getting those timings, is there another process 
running, or one that previously ran and ate up the cache?


Hope this helps,

Rui Barradas

On 26/09/21 at 23:31, Leonard Mada via R-help wrote:


On 9/27/2021 1:06 AM, Leonard Mada wrote:


Dear Bill,


Does list.files() always sort the results?

It seems so. The option full.names = FALSE has no effect on this:
the results always seem to be sorted.


Maybe it is better to process the files in an unsorted order: as
stored on the disk?



After some more investigations:

This took only a few seconds:

sapply(list.dirs(path=path, full.name=F, recursive=F),
      function(f) length(list.files(path = paste0(path, "/", f),
full.names = FALSE, recursive = TRUE)))

# maybe with caching, but the difference is enormous


It seems BH contains *by far* the most files: 11701 files.

But excluding it from processing had only a linear effect: still 377 s.


I had a look at src/main/platform.c, but do not fully understand it.


Sincerely,


Leonard




Sincerely,


Leonard


On 9/25/2021 8:13 PM, Bill Dunlap wrote:

On my Windows 10 laptop I see evidence of the operating system
caching information about recently accessed files.  This makes it
hard to say how the speed might be improved.  Is there a way to clear
this cache?


system.time(L1 <- size.f.pkg(R.home("library")))

    user  system elapsed
    0.48    2.81   30.42

system.time(L2 <- size.f.pkg(R.home("library")))

    user  system elapsed
    0.35    1.10    1.43

identical(L1,L2)

[1] TRUE

length(L1)

[1] 30

length(dir(R.home("library"),recursive=TRUE))

[1] 12949

On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help
<r-help@r-project.org> wrote:

 Dear List Members,


 I tried to compute the file siz

Re: [R] Reading File Sizes: very slow!

2021-09-26 Thread Leonard Mada via R-help


On 9/27/2021 1:06 AM, Leonard Mada wrote:
>
> Dear Bill,
>
>
> Does list.files() always sort the results?
>
> It seems so. The option: full.names = FALSE does not have any effect: 
> the results seem always sorted.
>
>
> Maybe it is better to process the files in an unsorted order: as 
> stored on the disk?
>

After some more investigations:

This took only a few seconds:

sapply(list.dirs(path=path, full.name=F, recursive=F),
     function(f) length(list.files(path = paste0(path, "/", f), 
full.names = FALSE, recursive = TRUE)))

# maybe with caching, but the difference is enormous


It seems BH contains *by far* the most files: 11701 files.

But excluding it from processing had only a linear effect: still 377 s.


I had a look at src/main/platform.c, but do not fully understand it.


Sincerely,


Leonard


>
> Sincerely,
>
>
> Leonard
>
>
> On 9/25/2021 8:13 PM, Bill Dunlap wrote:
>> On my Windows 10 laptop I see evidence of the operating system 
>> caching information about recently accessed files.  This makes it 
>> hard to say how the speed might be improved.  Is there a way to clear 
>> this cache?
>>
>> > system.time(L1 <- size.f.pkg(R.home("library")))
>>    user  system elapsed
>>    0.48    2.81   30.42
>> > system.time(L2 <- size.f.pkg(R.home("library")))
>>    user  system elapsed
>>    0.35    1.10    1.43
>> > identical(L1,L2)
>> [1] TRUE
>> > length(L1)
>> [1] 30
>> > length(dir(R.home("library"),recursive=TRUE))
>> [1] 12949
>>
>> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help 
>> <r-help@r-project.org> wrote:
>>
>> Dear List Members,
>>
>>
>> I tried to compute the file sizes of each installed package and the
>> process is terribly slow.
>>
>> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
>>
>>
>> 1.) Package Sizes
>>
>>
>> system.time({
>>      x = size.pkg(file=NULL);
>> })
>> # elapsed time: 509 s !!!
>> # 512 Packages; 1.64 GB;
>> # R 4.1.1 on MS Windows 10
>>
>>
>> The code for the size.pkg() function is below and the latest
>> version is
>> on Github:
>>
>> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
>> 
>>
>>
>> Questions:
>> Is there a way to get the file size faster?
>> It takes long on Windows as well, but of the order of 10-20 s,
>> not 10
>> minutes.
>> Do I miss something?
>>
>>
>> 1.b.) Alternative
>>
>> It came to my mind to read first all file sizes and then use
>> tapply or
>> aggregate - but I do not see why it should be faster.
>>
>> Would it be meaningful to benchmark each individual package?
>>
>> Although I am not very inclined to wait 10 minutes for each new
>> try out.
>>
>>
>> 2.) Big Packages
>>
>> Just as a note: there are a few very large packages (in my list
>> of 512
>> packages):
>>
>> 1  123,566,287   BH
>> 2  113,578,391   sf
>> 3  112,252,652    rgdal
>> 4   81,144,868   magick
>> 5   77,791,374 openNLPmodels.en
>>
>> I suspect that sf & rgdal have a lot of duplicated data structures
>> and/or duplicate code and/or duplicated libraries - although I am
>> not an
>> expert in the field and did not check the sources.
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>> ===
>>
>>
>> # Package Size:
>> size.f.pkg = function(path=NULL) {
>>  if(is.null(path)) path = R.home("library");
>>  xd = list.dirs(path = path, full.names = FALSE, recursive =
>> FALSE);
>>  size.f = function(p) {
>>      p = paste0(path, "/", p);
>> sum(file.info(list.files(path=p,
>> pattern=".",
>>          full.names = TRUE, all.files = TRUE, recursive =
>> TRUE))$size);
>>  }
>>  sapply(xd, size.f);
>> }
>>
>> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
>>  x = size.f.pkg(path=path);
>>  x = as.data.frame(x);
>>  names(x) = "Size"
>>  x$Name = rownames(x);
>>  # Order
>>  if(sort) {
>>      id = order(x$Size, decreasing=TRUE)
>>      x = x[id,];
>>  }
>>  if( ! is.null(file)) {
>>      if( ! is.character(file)) {
>>          print("Error: Size NOT written to file!");
>>      } else write.csv(x, file=file, row.names=FALSE);
>>  }
>>  return(x);
>> }
>>
>> __
>> R-help@r-project.org  mailing list
>> -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> 
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> 
>> and provide commented, minimal,

Re: [R] Reading File Sizes: very slow!

2021-09-26 Thread Leonard Mada via R-help
Dear Bill,


Does list.files() always sort the results?

It seems so. The option full.names = FALSE has no effect on this: 
the results always seem to be sorted.


Maybe it is better to process the files in an unsorted order: as stored 
on the disk?
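As an aside, GNU ls can do exactly that: the -U option lists entries in directory (readdir) order instead of sorting them. A small self-contained sketch with made-up file names; the unsorted order is filesystem-dependent:

```shell
# Compare sorted vs. unsorted directory listings in a throwaway directory.
D=$(mktemp -d)
touch "$D/b" "$D/a" "$D/c"
sorted=$(ls "$D" | tr '\n' ' ')       # default: alphabetical
unsorted=$(ls -U "$D" | tr '\n' ' ')  # readdir order, filesystem-dependent
printf 'sorted:   %s\nunsorted: %s\n' "$sorted" "$unsorted"
rm -rf "$D"
```

Timing `ls` against `ls -U` on a large library would show whether sorting is actually the bottleneck, or whether the time goes into the per-file stat calls.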


Sincerely,


Leonard


On 9/25/2021 8:13 PM, Bill Dunlap wrote:
> On my Windows 10 laptop I see evidence of the operating system caching 
> information about recently accessed files.  This makes it hard to say 
> how the speed might be improved.  Is there a way to clear this cache?
>
> > system.time(L1 <- size.f.pkg(R.home("library")))
>    user  system elapsed
>    0.48    2.81   30.42
> > system.time(L2 <- size.f.pkg(R.home("library")))
>    user  system elapsed
>    0.35    1.10    1.43
> > identical(L1,L2)
> [1] TRUE
> > length(L1)
> [1] 30
> > length(dir(R.home("library"),recursive=TRUE))
> [1] 12949
>
> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help 
> <r-help@r-project.org> wrote:
>
> Dear List Members,
>
>
> I tried to compute the file sizes of each installed package and the
> process is terribly slow.
>
> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
>
>
> 1.) Package Sizes
>
>
> system.time({
>      x = size.pkg(file=NULL);
> })
> # elapsed time: 509 s !!!
> # 512 Packages; 1.64 GB;
> # R 4.1.1 on MS Windows 10
>
>
> The code for the size.pkg() function is below and the latest
> version is
> on Github:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
> 
>
>
> Questions:
> Is there a way to get the file size faster?
> It takes long on Windows as well, but of the order of 10-20 s, not 10
> minutes.
> Do I miss something?
>
>
> 1.b.) Alternative
>
> It came to my mind to read first all file sizes and then use
> tapply or
> aggregate - but I do not see why it should be faster.
>
> Would it be meaningful to benchmark each individual package?
>
> Although I am not very inclined to wait 10 minutes for each new
> try out.
>
>
> 2.) Big Packages
>
> Just as a note: there are a few very large packages (in my list of
> 512
> packages):
>
> 1  123,566,287   BH
> 2  113,578,391   sf
> 3  112,252,652    rgdal
> 4   81,144,868   magick
> 5   77,791,374 openNLPmodels.en
>
> I suspect that sf & rgdal have a lot of duplicated data structures
> and/or duplicate code and/or duplicated libraries - although I am
> not an
> expert in the field and did not check the sources.
>
>
> Sincerely,
>
>
> Leonard
>
> ===
>
>
> # Package Size:
> size.f.pkg = function(path=NULL) {
>  if(is.null(path)) path = R.home("library");
>  xd = list.dirs(path = path, full.names = FALSE, recursive =
> FALSE);
>  size.f = function(p) {
>      p = paste0(path, "/", p);
> sum(file.info(list.files(path=p,
> pattern=".",
>          full.names = TRUE, all.files = TRUE, recursive =
> TRUE))$size);
>  }
>  sapply(xd, size.f);
> }
>
> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
>  x = size.f.pkg(path=path);
>  x = as.data.frame(x);
>  names(x) = "Size"
>  x$Name = rownames(x);
>  # Order
>  if(sort) {
>      id = order(x$Size, decreasing=TRUE)
>      x = x[id,];
>  }
>  if( ! is.null(file)) {
>      if( ! is.character(file)) {
>          print("Error: Size NOT written to file!");
>      } else write.csv(x, file=file, row.names=FALSE);
>  }
>  return(x);
> }
>
> __
> R-help@r-project.org  mailing list --
> To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> 
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> 
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading File Sizes: very slow!

2021-09-26 Thread Jiefei Wang
What kind of disk do you use? The hardware differences might be important
to this issue.

Best,
Jiefei

Leonard Mada via R-help wrote on Sunday, 26 September 2021 at 9:04 PM:

> Dear Bill,
>
>
> - using the MS Windows Properties dialog: ~ 15 s;
>
> [Windows new start, 1st operation, bulk size]
>
> - using R / file.info() (2nd operation): still 523.6 s
>
> [and R seems mostly unresponsive during this time]
>
>
> Unfortunately, I do not know how to clear any cache.
>
> [The cache may play a role only for smaller sizes? But I am rather not
> inclined to run the ~ 10 minutes procedure multiple times.]
>
>
> Sincerely,
>
>
> Leonard
>
>
> On 9/26/2021 5:49 AM, Richard O'Keefe wrote:
> > On a $150 second-hand laptop with 0.9GB of library,
> > and a single-user installation of R so only one place to look
> > LIBRARY=$HOME/R/x86_64-pc-linux-gnu-library/4.0
> > cd $LIBRARY
> > echo "kbytes package"
> > du -sk * | sort -k1n
> >
> > took 150 msec to report the disc space needed for every package.
> >
> > That'
> >
> > On Sun, 26 Sept 2021 at 06:14, Bill Dunlap 
> wrote:
> >> On my Windows 10 laptop I see evidence of the operating system caching
> >> information about recently accessed files.  This makes it hard to say
> how
> >> the speed might be improved.  Is there a way to clear this cache?
> >>
> >>> system.time(L1 <- size.f.pkg(R.home("library")))
> >> user  system elapsed
> >> 0.48    2.81   30.42
> >>> system.time(L2 <- size.f.pkg(R.home("library")))
> >> user  system elapsed
> >> 0.35    1.10    1.43
> >>> identical(L1,L2)
> >> [1] TRUE
> >>> length(L1)
> >> [1] 30
> >>> length(dir(R.home("library"),recursive=TRUE))
> >> [1] 12949
> >>
> >> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <
> >> r-help@r-project.org> wrote:
> >>
> >>> Dear List Members,
> >>>
> >>>
> >>> I tried to compute the file sizes of each installed package and the
> >>> process is terribly slow.
> >>>
> >>> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
> >>>
> >>>
> >>> 1.) Package Sizes
> >>>
> >>>
> >>> system.time({
> >>>   x = size.pkg(file=NULL);
> >>> })
> >>> # elapsed time: 509 s !!!
> >>> # 512 Packages; 1.64 GB;
> >>> # R 4.1.1 on MS Windows 10
> >>>
> >>>
> >>> The code for the size.pkg() function is below and the latest version is
> >>> on Github:
> >>>
> >>> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
> >>>
> >>>
> >>> Questions:
> >>> Is there a way to get the file size faster?
> >>> It takes long on Windows as well, but of the order of 10-20 s, not 10
> >>> minutes.
> >>> Do I miss something?
> >>>
> >>>
> >>> 1.b.) Alternative
> >>>
> >>> It came to my mind to read first all file sizes and then use tapply or
> >>> aggregate - but I do not see why it should be faster.
> >>>
> >>> Would it be meaningful to benchmark each individual package?
> >>>
> >>> Although I am not very inclined to wait 10 minutes for each new try
> out.
> >>>
> >>>
> >>> 2.) Big Packages
> >>>
> >>> Just as a note: there are a few very large packages (in my list of 512
> >>> packages):
> >>>
> >>> 1  123,566,287   BH
> >>> 2  113,578,391   sf
> >>> 3  112,252,652    rgdal
> >>> 4   81,144,868   magick
> >>> 5   77,791,374 openNLPmodels.en
> >>>
> >>> I suspect that sf & rgdal have a lot of duplicated data structures
> >>> and/or duplicate code and/or duplicated libraries - although I am not
> an
> >>> expert in the field and did not check the sources.
> >>>
> >>>
> >>> Sincerely,
> >>>
> >>>
> >>> Leonard
> >>>
> >>> ===
> >>>
> >>>
> >>> # Package Size:
> >>> size.f.pkg = function(path=NULL) {
> >>>   if(is.null(path)) path = R.home("library");
> >>>   xd = list.dirs(path = path, full.names = FALSE, recursive =
> FALSE);
> >>>   size.f = function(p) {
> >>>   p = paste0(path, "/", p);
> >>>   sum(file.info(list.files(path=p, pattern=".",
> >>>   full.names = TRUE, all.files = TRUE, recursive =
> TRUE))$size);
> >>>   }
> >>>   sapply(xd, size.f);
> >>> }
> >>>
> >>> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
> >>>   x = size.f.pkg(path=path);
> >>>   x = as.data.frame(x);
> >>>   names(x) = "Size"
> >>>   x$Name = rownames(x);
> >>>   # Order
> >>>   if(sort) {
> >>>   id = order(x$Size, decreasing=TRUE)
> >>>   x = x[id,];
> >>>   }
> >>>   if( ! is.null(file)) {
> >>>   if( ! is.character(file)) {
> >>>   print("Error: Size NOT written to file!");
> >>>   } else write.csv(x, file=file, row.names=FALSE);
> >>>   }
> >>>   return(x);
> >>> }
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>  [[alternat

Re: [R] Reading File Sizes: very slow!

2021-09-26 Thread Leonard Mada via R-help

Dear Bill,


- using the MS Windows Properties dialog: ~ 15 s;

[Windows new start, 1st operation, bulk size]

- using R / file.info() (2nd operation): still 523.6 s

[and R seems mostly unresponsive during this time]


Unfortunately, I do not know how to clear any cache.

[The cache may play a role only for smaller sizes? But I am rather not 
inclined to run the ~10-minute procedure multiple times.]



Sincerely,


Leonard


On 9/26/2021 5:49 AM, Richard O'Keefe wrote:

On a $150 second-hand laptop with 0.9GB of library,
and a single-user installation of R so only one place to look
LIBRARY=$HOME/R/x86_64-pc-linux-gnu-library/4.0
cd $LIBRARY
echo "kbytes package"
du -sk * | sort -k1n

took 150 msec to report the disc space needed for every package.

That'

On Sun, 26 Sept 2021 at 06:14, Bill Dunlap  wrote:

On my Windows 10 laptop I see evidence of the operating system caching
information about recently accessed files.  This makes it hard to say how
the speed might be improved.  Is there a way to clear this cache?


system.time(L1 <- size.f.pkg(R.home("library")))

user  system elapsed
0.48    2.81   30.42

system.time(L2 <- size.f.pkg(R.home("library")))

user  system elapsed
0.35    1.10    1.43

identical(L1,L2)

[1] TRUE

length(L1)

[1] 30

length(dir(R.home("library"),recursive=TRUE))

[1] 12949

On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <
r-help@r-project.org> wrote:


Dear List Members,


I tried to compute the file sizes of each installed package and the
process is terribly slow.

It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.


1.) Package Sizes


system.time({
  x = size.pkg(file=NULL);
})
# elapsed time: 509 s !!!
# 512 Packages; 1.64 GB;
# R 4.1.1 on MS Windows 10


The code for the size.pkg() function is below and the latest version is
on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


Questions:
Is there a way to get the file size faster?
It takes long on Windows as well, but of the order of 10-20 s, not 10
minutes.
Do I miss something?


1.b.) Alternative

It came to my mind to read first all file sizes and then use tapply or
aggregate - but I do not see why it should be faster.

Would it be meaningful to benchmark each individual package?

Although I am not very inclined to wait 10 minutes for each new try out.


2.) Big Packages

Just as a note: there are a few very large packages (in my list of 512
packages):

1  123,566,287   BH
2  113,578,391   sf
3  112,252,652    rgdal
4   81,144,868   magick
5   77,791,374 openNLPmodels.en

I suspect that sf & rgdal have a lot of duplicated data structures
and/or duplicate code and/or duplicated libraries - although I am not an
expert in the field and did not check the sources.


Sincerely,


Leonard

===


# Package Size:
size.f.pkg = function(path=NULL) {
  if(is.null(path)) path = R.home("library");
  xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
  size.f = function(p) {
  p = paste0(path, "/", p);
  sum(file.info(list.files(path=p, pattern=".",
  full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
  }
  sapply(xd, size.f);
}

size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
  x = size.f.pkg(path=path);
  x = as.data.frame(x);
  names(x) = "Size"
  x$Name = rownames(x);
  # Order
  if(sort) {
  id = order(x$Size, decreasing=TRUE)
  x = x[id,];
  }
  if( ! is.null(file)) {
  if( ! is.character(file)) {
  print("Error: Size NOT written to file!");
  } else write.csv(x, file=file, row.names=FALSE);
  }
  return(x);
}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Reading File Sizes: very slow!

2021-09-25 Thread Richard O'Keefe
On a $150 second-hand laptop with 0.9GB of library,
and a single-user installation of R so only one place to look
LIBRARY=$HOME/R/x86_64-pc-linux-gnu-library/4.0
cd $LIBRARY
echo "kbytes package"
du -sk * | sort -k1n

took 150 msec to report the disc space needed for every package.

That'

On Sun, 26 Sept 2021 at 06:14, Bill Dunlap  wrote:
>
> On my Windows 10 laptop I see evidence of the operating system caching
> information about recently accessed files.  This makes it hard to say how
> the speed might be improved.  Is there a way to clear this cache?
>
> > system.time(L1 <- size.f.pkg(R.home("library")))
>    user  system elapsed
>    0.48    2.81   30.42
> > system.time(L2 <- size.f.pkg(R.home("library")))
>    user  system elapsed
>    0.35    1.10    1.43
> > identical(L1,L2)
> [1] TRUE
> > length(L1)
> [1] 30
> > length(dir(R.home("library"),recursive=TRUE))
> [1] 12949
>
> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <
> r-help@r-project.org> wrote:
>
> > Dear List Members,
> >
> >
> > I tried to compute the file sizes of each installed package and the
> > process is terribly slow.
> >
> > It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
> >
> >
> > 1.) Package Sizes
> >
> >
> > system.time({
> >  x = size.pkg(file=NULL);
> > })
> > # elapsed time: 509 s !!!
> > # 512 Packages; 1.64 GB;
> > # R 4.1.1 on MS Windows 10
> >
> >
> > The code for the size.pkg() function is below and the latest version is
> > on Github:
> >
> > https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
> >
> >
> > Questions:
> > Is there a way to get the file size faster?
> > It takes long on Windows as well, but of the order of 10-20 s, not 10
> > minutes.
> > Do I miss something?
> >
> >
> > 1.b.) Alternative
> >
> > It came to my mind to read first all file sizes and then use tapply or
> > aggregate - but I do not see why it should be faster.
> >
> > Would it be meaningful to benchmark each individual package?
> >
> > Although I am not very inclined to wait 10 minutes for each new try out.
> >
> >
> > 2.) Big Packages
> >
> > Just as a note: there are a few very large packages (in my list of 512
> > packages):
> >
> > 1  123,566,287   BH
> > 2  113,578,391   sf
> > 3  112,252,652    rgdal
> > 4   81,144,868   magick
> > 5   77,791,374 openNLPmodels.en
> >
> > I suspect that sf & rgdal have a lot of duplicated data structures
> > and/or duplicate code and/or duplicated libraries - although I am not an
> > expert in the field and did not check the sources.
> >
> >
> > Sincerely,
> >
> >
> > Leonard
> >
> > ===
> >
> >
> > # Package Size:
> > size.f.pkg = function(path=NULL) {
> >  if(is.null(path)) path = R.home("library");
> >  xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
> >  size.f = function(p) {
> >  p = paste0(path, "/", p);
> >  sum(file.info(list.files(path=p, pattern=".",
> >  full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
> >  }
> >  sapply(xd, size.f);
> > }
> >
> > size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
> >  x = size.f.pkg(path=path);
> >  x = as.data.frame(x);
> >  names(x) = "Size"
> >  x$Name = rownames(x);
> >  # Order
> >  if(sort) {
> >  id = order(x$Size, decreasing=TRUE)
> >  x = x[id,];
> >  }
> >  if( ! is.null(file)) {
> >  if( ! is.character(file)) {
> >  print("Error: Size NOT written to file!");
> >  } else write.csv(x, file=file, row.names=FALSE);
> >  }
> >  return(x);
> > }
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Reading File Sizes: very slow!

2021-09-25 Thread Bill Dunlap
On my Windows 10 laptop I see evidence of the operating system caching
information about recently accessed files.  This makes it hard to say how
the speed might be improved.  Is there a way to clear this cache?

> system.time(L1 <- size.f.pkg(R.home("library")))
   user  system elapsed
   0.48    2.81   30.42
> system.time(L2 <- size.f.pkg(R.home("library")))
   user  system elapsed
   0.35    1.10    1.43
> identical(L1,L2)
[1] TRUE
> length(L1)
[1] 30
> length(dir(R.home("library"),recursive=TRUE))
[1] 12949

On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <
r-help@r-project.org> wrote:

> Dear List Members,
>
>
> I tried to compute the file sizes of each installed package and the
> process is terribly slow.
>
> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
>
>
> 1.) Package Sizes
>
>
> system.time({
>  x = size.pkg(file=NULL);
> })
> # elapsed time: 509 s !!!
> # 512 Packages; 1.64 GB;
> # R 4.1.1 on MS Windows 10
>
>
> The code for the size.pkg() function is below and the latest version is
> on Github:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
>
>
> Questions:
> Is there a way to get the file size faster?
> It takes long on Windows as well, but of the order of 10-20 s, not 10
> minutes.
> Do I miss something?
>
>
> 1.b.) Alternative
>
> It came to my mind to read first all file sizes and then use tapply or
> aggregate - but I do not see why it should be faster.
>
> Would it be meaningful to benchmark each individual package?
>
> Although I am not very inclined to wait 10 minutes for each new try out.
>
>
> 2.) Big Packages
>
> Just as a note: there are a few very large packages (in my list of 512
> packages):
>
> 1  123,566,287   BH
> 2  113,578,391   sf
> 3  112,252,652    rgdal
> 4   81,144,868   magick
> 5   77,791,374 openNLPmodels.en
>
> I suspect that sf & rgdal have a lot of duplicated data structures
> and/or duplicate code and/or duplicated libraries - although I am not an
> expert in the field and did not check the sources.
>
>
> Sincerely,
>
>
> Leonard
>
> ===
>
>
> # Package Size:
> size.f.pkg = function(path=NULL) {
>  if(is.null(path)) path = R.home("library");
>  xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
>  size.f = function(p) {
>  p = paste0(path, "/", p);
>  sum(file.info(list.files(path=p, pattern=".",
>  full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
>  }
>  sapply(xd, size.f);
> }
>
> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
>  x = size.f.pkg(path=path);
>  x = as.data.frame(x);
>  names(x) = "Size"
>  x$Name = rownames(x);
>  # Order
>  if(sort) {
>  id = order(x$Size, decreasing=TRUE)
>  x = x[id,];
>  }
>  if( ! is.null(file)) {
>  if( ! is.character(file)) {
>  print("Error: Size NOT written to file!");
>  } else write.csv(x, file=file, row.names=FALSE);
>  }
>  return(x);
> }
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading File Sizes: very slow!

2021-09-25 Thread Leonard Mada via R-help

Dear List Members,


I tried to compute the file sizes of each installed package and the 
process is terribly slow.


It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.


1.) Package Sizes


system.time({
        x = size.pkg(file=NULL);
})
# elapsed time: 509 s !!!
# 512 Packages; 1.64 GB;
# R 4.1.1 on MS Windows 10


The code for the size.pkg() function is below and the latest version is 
on Github:


https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


Questions:
Is there a way to get the file size faster?
It takes long on Windows as well, but of the order of 10-20 s, not 10 
minutes.

Am I missing something?


1.b.) Alternative

It came to my mind to read first all file sizes and then use tapply or 
aggregate - but I do not see why it should be faster.


Would it be meaningful to benchmark each individual package?

Although I am not very inclined to wait 10 minutes for each new try out.


2.) Big Packages

Just as a note: there are a few very large packages (in my list of 512 
packages):


1  123,566,287   BH
2  113,578,391   sf
3  112,252,652    rgdal
4   81,144,868   magick
5   77,791,374 openNLPmodels.en

I suspect that sf & rgdal have a lot of duplicated data structures 
and/or duplicate code and/or duplicated libraries - although I am not an 
expert in the field and did not check the sources.



Sincerely,


Leonard

===


# Package Size:
size.f.pkg = function(path=NULL) {
    if(is.null(path)) path = R.home("library");
    xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
    size.f = function(p) {
        p = paste0(path, "/", p);
        sum(file.info(list.files(path=p, pattern=".",
            full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
    }
    sapply(xd, size.f);
}

size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
    x = size.f.pkg(path=path);
    x = as.data.frame(x);
    names(x) = "Size"
    x$Name = rownames(x);
    # Order
    if(sort) {
        id = order(x$Size, decreasing=TRUE)
        x = x[id,];
    }
    if( ! is.null(file)) {
        if( ! is.character(file)) {
            print("Error: Size NOT written to file!");
        } else write.csv(x, file=file, row.names=FALSE);
    }
    return(x);
}
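A one-pass alternative (a sketch only; size.pkg2 is a hypothetical name, not code from this thread): list every file once, take the sizes in a single file.size() call, and aggregate per package with tapply(). As the timings later in this thread suggest, the dominant cost is the OS file-system cache rather than the R code, so this mainly saves the repeated directory walks.

```r
size.pkg2 <- function(path = NULL) {
    if (is.null(path)) path <- R.home("library")
    # one recursive listing of every file under every package
    files <- list.files(path, full.names = TRUE,
                        recursive = TRUE, all.files = TRUE)
    sizes <- file.size(files)
    # package name = first path component below 'path'
    # (assumes 'path' has no trailing slash)
    rel <- substring(files, nchar(path) + 2)
    pkg <- sub("/.*", "", rel)
    sort(tapply(sizes, pkg, sum), decreasing = TRUE)
}
```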



Re: [R] reading file labels into R

2017-01-27 Thread PIKAL Petr
Hi

Did you try

?list.files

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of WRAY
> NICHOLAS
> Sent: Friday, January 27, 2017 1:07 PM
> To: r-help ; r-help-request  project.org>
> Subject: [R] reading file labels into R
>
> Hello R-ren   I have a list of csv files in a folder which are labelled
> essentially in this way (actual data has scores of files)
>
> F010116, F020116, F030116
>
> G020116, G030116, G040116, G 050116
>
> H020116, H030116
>
> where F G and H are engines I've got data from and the numbers are the
> dates. I can manually make a "register" to create a paste label of each engine
> and the dates for which I have data, which, as in the example, are not the
> same for each engine, but I am wondering whether there's any way of
> getting R to read the labels from the folder so that it can then loop through
> successive engines and dates without being explicitly told what labels it will
> need to upload each csv file in turn
>
> If anyone has ideas I'd be grateful
>
> Thanks,Nick
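A sketch of the list.files() approach Petr points to, assuming the layout described above (one engine letter followed by a ddmmyy date, one CSV per engine and date):

```r
# in practice: files <- list.files("data/", pattern = "\\.csv$")
files  <- c("F010116.csv", "G020116.csv", "H020116.csv")    # assumed names
engine <- substr(files, 1, 1)                               # F, G, H
date   <- as.Date(sub("^.(\\d{6})\\.csv$", "\\1", files), "%d%m%y")
# then loop without a manual register, e.g.
# dat <- lapply(files, read.csv); names(dat) <- paste(engine, date, sep = "_")
```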



[R] reading file labels into R

2017-01-27 Thread WRAY NICHOLAS
Hello R-ren   I have a list of csv files in a folder which are labelled
essentially in this way (actual data has scores of files)

F010116, F020116, F030116

G020116, G030116, G040116, G 050116

H020116, H030116

where F G and H are engines I've got data from and the numbers are the dates. I
can manually make a "register" to create a paste label of each engine and the
dates for which I have data, which, as in the example, are not the same for each
engine, but I am wondering whether there's any way of getting R to read the
labels from the folder so that it can then loop through successive engines and
dates without being explicitly told what labels it will need to upload each csv
file in turn

If anyone has ideas I'd be grateful

Thanks, Nick



Re: [R] reading file in zip archive

2012-05-31 Thread David Winsemius


On May 31, 2012, at 6:11 AM, Iain Gallagher wrote:


Hi Phil

That's it. Thanks.

Will have a read at the docs now and see if I can figure out why  
leaving the 'r'ead instruction out works. Seems counter-intuitive!


It says that unz uses binary mode. You were specifying text mode. See  
if open="rb" is any more successful.


--
David.


Best

Iain




From: Phil Spector 
To: Iain Gallagher 
Cc: r-help 
Sent: Thursday, 31 May 2012, 0:06
Subject: Re: [R] reading file in zip archive

Iain -
   Do you see the same behaviour if you use

z <- unz(pathToZip, 'x.txt')

instead of

z <- unz(pathToZip, 'x.txt','r')

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
spec...@stat.berkeley.edu


On Wed, 30 May 2012, Iain Gallagher wrote:


Hi Phil

Thanks, but this still doesn't work.

Here's a reproducible example (was wrapping my head around these  
functions before).


x <- as.data.frame(cbind(rep('a',5), rep('b',5)))
y <- as.data.frame(cbind(rep('c',5), rep('d',5)))

write.table(x, 'x.txt', sep='\t', quote=FALSE)
write.table(y, 'y.txt', sep='\t', quote=FALSE)

zip('test.zip', files = c('x.txt', 'y.txt'))

pathToZip <- paste(getwd(), '/test.zip', sep='')

z <- unz(pathToZip, 'x.txt', 'r')
zT <- read.table(z, header=FALSE, sep='\t')

Error in read.table(z, header = FALSE, sep = "\t") :
  seek not enabled for this connection

As I said in my previous email readLines fails as well. Rather  
strange really.


Anyway, as before any advice would be appreciated.

Best

Iain

_
From: Phil Spector 
To: Iain Gallagher 
Cc: r-help 
Sent: Wednesday, 30 May 2012, 20:16
Subject: Re: [R] reading file in zip archive

Iain -
Once you specify the file to unzip in the call to unz, there's no
need to repeat the filename in read.table.  Try:

z <- unz(pathToZip, 'goCats.txt', 'r')
zT <- read.table(z, header=TRUE, sep='\t')

(Although I can't reproduce the exact error which you saw.)

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu



On Wed, 30 May 2012, Iain Gallagher wrote:


Hi List

I have a series of zip archives each containing several files. One  
of these files is called
goCats.txt and I would like to read it into R from the archive.  
It's a simple tab delimited text

file.
pathToZip <- '/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/afInfection/commonNorm/twoHrs/af2hrs.zip'


z <- unz(pathToZip, 'goCats.txt', 'r')
zT <- read.table(z, 'goCats.txt', header=T, sep='\t')

Error in read.table(z, "goCats.txt", header = T, sep = "\t") :
  seek not enabled for this connection


The same error arises with readLines.

Can anyone advise?

Best

iain


sessionInfo()

R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


loaded via a namespace (and not attached):
[1] tools_2.15.0



David Winsemius, MD
West Hartford, CT



Re: [R] reading file in zip archive

2012-05-31 Thread Iain Gallagher
Hi Phil

That's it. Thanks.

Will have a read at the docs now and see if I can figure out why leaving the 
'r'ead instruction out works. Seems counter-intuitive!

Best

Iain




 From: Phil Spector 
To: Iain Gallagher  
Cc: r-help  
Sent: Thursday, 31 May 2012, 0:06
Subject: Re: [R] reading file in zip archive

Iain -
   Do you see the same behaviour if you use

z <- unz(pathToZip, 'x.txt')

instead of

z <- unz(pathToZip, 'x.txt','r')

                    - Phil Spector
                     Statistical Computing Facility
                     Department of Statistics
                     UC Berkeley
                    spec...@stat.berkeley.edu


On Wed, 30 May 2012, Iain Gallagher wrote:

> Hi Phil
> 
> Thanks, but this still doesn't work.
> 
> Here's a reproducible example (was wrapping my head around these functions 
> before).
> 
> x <- as.data.frame(cbind(rep('a',5), rep('b',5)))
> y <- as.data.frame(cbind(rep('c',5), rep('d',5)))
> 
> write.table(x, 'x.txt', sep='\t', quote=FALSE)
> write.table(y, 'y.txt', sep='\t', quote=FALSE)
> 
> zip('test.zip', files = c('x.txt', 'y.txt'))
> 
> pathToZip <- paste(getwd(), '/test.zip', sep='')
> 
> z <- unz(pathToZip, 'x.txt', 'r')
> zT <- read.table(z, header=FALSE, sep='\t')
> 
> Error in read.table(z, header = FALSE, sep = "\t") :
>   seek not enabled for this connection
> 
> As I said in my previous email readLines fails as well. Rather strange really.
> 
> Anyway, as before any advice would be appreciated.
> 
> Best
> 
> Iain
> 
> _
> From: Phil Spector 
> To: Iain Gallagher 
> Cc: r-help 
> Sent: Wednesday, 30 May 2012, 20:16
> Subject: Re: [R] reading file in zip archive
> 
> Iain -
>     Once you specify the file to unzip in the call to unz, there's no
> need to repeat the filename in read.table.  Try:
> 
> z <- unz(pathToZip, 'goCats.txt', 'r')
> zT <- read.table(z, header=TRUE, sep='\t')
> 
> (Although I can't reproduce the exact error which you saw.)
> 
>                     - Phil Spector
>                     Statistical Computing Facility
>                     Department of Statistics
>                     UC Berkeley
>                     spec...@stat.berkeley.edu
> 
> 
> 
> On Wed, 30 May 2012, Iain Gallagher wrote:
> 
> > Hi List
> >
> > I have a series of zip archives each containing several files. One of these 
> > files is called
> goCats.txt and I would like to read it into R from the archive. It's a simple 
> tab delimited text
> file.
> > pathToZip <- '/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/afInfection/commonNorm/twoHrs/af2hrs.zip'
> >
> > z <- unz(pathToZip, 'goCats.txt', 'r')
> > zT <- read.table(z, 'goCats.txt', header=T, sep='\t')
> >
> > Error in read.table(z, "goCats.txt", header = T, sep = "\t") :
> >   seek not enabled for this connection
> >
> >
> > The same error arises with readLines.
> >
> > Can anyone advise?
> >
> > Best
> >
> > iain
> >
> >> sessionInfo()
> > R version 2.15.0 (2012-03-30)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
> >  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
> >  [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
> >  [7] LC_PAPER=C                LC_NAME=C
> >  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > loaded via a namespace (and not attached):
> > [1] tools_2.15.0



Re: [R] reading file in zip archive

2012-05-30 Thread Phil Spector

Iain -
   Do you see the same behaviour if you use

z <- unz(pathToZip, 'x.txt')

instead of

z <- unz(pathToZip, 'x.txt','r')

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Wed, 30 May 2012, Iain Gallagher wrote:


Hi Phil

Thanks, but this still doesn't work.

Here's a reproducible example (was wrapping my head around these functions 
before).

x <- as.data.frame(cbind(rep('a',5), rep('b',5)))
y <- as.data.frame(cbind(rep('c',5), rep('d',5)))

write.table(x, 'x.txt', sep='\t', quote=FALSE)
write.table(y, 'y.txt', sep='\t', quote=FALSE)

zip('test.zip', files = c('x.txt', 'y.txt'))

pathToZip <- paste(getwd(), '/test.zip', sep='')

z <- unz(pathToZip, 'x.txt', 'r')
zT <- read.table(z, header=FALSE, sep='\t')

Error in read.table(z, header = FALSE, sep = "\t") :
  seek not enabled for this connection

As I said in my previous email readLines fails as well. Rather strange really.

Anyway, as before any advice would be appreciated.

Best

Iain

_
From: Phil Spector 
To: Iain Gallagher 
Cc: r-help 
Sent: Wednesday, 30 May 2012, 20:16
Subject: Re: [R] reading file in zip archive

Iain -
    Once you specify the file to unzip in the call to unz, there's no
need to repeat the filename in read.table.  Try:

z <- unz(pathToZip, 'goCats.txt', 'r')
zT <- read.table(z, header=TRUE, sep='\t')

(Although I can't reproduce the exact error which you saw.)

                    - Phil Spector
                    Statistical Computing Facility
                    Department of Statistics
                    UC Berkeley
                    spec...@stat.berkeley.edu



On Wed, 30 May 2012, Iain Gallagher wrote:

> Hi List
>
> I have a series of zip archives each containing several files. One of these 
files is called
goCats.txt and I would like to read it into R from the archive. It's a simple 
tab delimited text
file.
> pathToZip <- '/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/afInfection/commonNorm/twoHrs/af2hrs.zip'
>
> z <- unz(pathToZip, 'goCats.txt', 'r')
> zT <- read.table(z, 'goCats.txt', header=T, sep='\t')
>
> Error in read.table(z, "goCats.txt", header = T, sep = "\t") :
>   seek not enabled for this connection
>
>
> The same error arises with readLines.
>
> Can anyone advise?
>
> Best
>
> iain
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>  [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
>  [7] LC_PAPER=C                LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.0





Re: [R] reading file in zip archive

2012-05-30 Thread Peter Langfelder
On Wed, May 30, 2012 at 12:47 PM, Iain Gallagher
 wrote:
> Hi Phil
>
> Thanks, but this still doesn't work.
>
>
> Here's a reproducible example (was wrapping my head around these functions 
> before).
>
> x <- as.data.frame(cbind(rep('a',5), rep('b',5)))
> y <- as.data.frame(cbind(rep('c',5), rep('d',5)))
>
> write.table(x, 'x.txt', sep='\t', quote=FALSE)
> write.table(y, 'y.txt', sep='\t', quote=FALSE)
>
> zip('test.zip', files = c('x.txt', 'y.txt'))
>
> pathToZip <- paste(getwd(), '/test.zip', sep='')
>
> z <- unz(pathToZip, 'x.txt', 'r')
> zT <- read.table(z, header=FALSE, sep='\t')
>
> Error in read.table(z, header = FALSE, sep = "\t") :
>   seek not enabled for this connection

I get the same error and I don't have direct advice on how to avoid
it, but you can avoid working directly with the zip connection by
first unzipping the file, then reading it in:

pathToZip <- paste(getwd(), '/test.zip', sep='')
file = "x.txt"
command = paste("unzip -o", pathToZip, file);  # -o overwrites; -f only refreshes files that already exist
system(command)
zT = read.table(file, header = FALSE, sep = "\t")

By the way, I got an error reading the file x.txt since the
write.table command also saved row names. I had to add

row.names = FALSE

to the write table calls to make it work, like this:


x <- as.data.frame(cbind(rep('a',5), rep('b',5)))
y <- as.data.frame(cbind(rep('c',5), rep('d',5)))

write.table(x, 'x.txt', sep='\t', quote=FALSE, row.names = FALSE)
write.table(y, 'y.txt', sep='\t', quote=FALSE, row.names = FALSE)

zip('test.zip', files = c('x.txt', 'y.txt'))

HTH,

Peter



Re: [R] reading file in zip archive

2012-05-30 Thread Iain Gallagher
Hi Phil

Thanks, but this still doesn't work. 


Here's a reproducible example (was wrapping my head around these functions 
before).

x <- as.data.frame(cbind(rep('a',5), rep('b',5)))
y <- as.data.frame(cbind(rep('c',5), rep('d',5)))

write.table(x, 'x.txt', sep='\t', quote=FALSE)
write.table(y, 'y.txt', sep='\t', quote=FALSE)

zip('test.zip', files = c('x.txt', 'y.txt'))

pathToZip <- paste(getwd(), '/test.zip', sep='')

z <- unz(pathToZip, 'x.txt', 'r')
zT <- read.table(z, header=FALSE, sep='\t')

Error in read.table(z, header = FALSE, sep = "\t") : 
  seek not enabled for this connection


As I said in my previous email readLines fails as well. Rather strange really.

Anyway, as before any advice would be appreciated.

Best

Iain





 From: Phil Spector 
To: Iain Gallagher  
Cc: r-help  
Sent: Wednesday, 30 May 2012, 20:16
Subject: Re: [R] reading file in zip archive

Iain -
    Once you specify the file to unzip in the call to unz, there's no
need to repeat the filename in read.table.  Try:

z <- unz(pathToZip, 'goCats.txt', 'r')
zT <- read.table(z, header=TRUE, sep='\t')

(Although I can't reproduce the exact error which you saw.)

                    - Phil Spector
                     Statistical Computing Facility
                     Department of Statistics
                     UC Berkeley
                    spec...@stat.berkeley.edu



On Wed, 30 May 2012, Iain Gallagher wrote:

> Hi List
>
> I have a series of zip archives each containing several files. One of these 
> files is called goCats.txt and I would like to read it into R from the 
> archive. It's a simple tab delimited text file.
> pathToZip <- 
> '/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/afInfection/commonNorm/twoHrs/af2hrs.zip'
>
> z <- unz(pathToZip, 'goCats.txt', 'r')
> zT <- read.table(z, 'goCats.txt', header=T, sep='\t')
>
> Error in read.table(z, "goCats.txt", header = T, sep = "\t") :
>   seek not enabled for this connection
>
>
> The same error arises with readLines.
>
> Can anyone advise?
>
> Best
>
> iain
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>  [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
>  [7] LC_PAPER=C                LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.0



Re: [R] reading file in zip archive

2012-05-30 Thread Phil Spector

Iain -
   Once you specify the file to unzip in the call to unz, there's no
need to repeat the filename in read.table.  Try:

z <- unz(pathToZip, 'goCats.txt', 'r')
zT <- read.table(z, header=TRUE, sep='\t')

(Although I can't reproduce the exact error which you saw.)

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Wed, 30 May 2012, Iain Gallagher wrote:


Hi List

I have a series of zip archives each containing several files. One of these 
files is called goCats.txt and I would like to read it into R from the archive. 
It's a simple tab delimited text file.
pathToZip <- 
'/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/afInfection/commonNorm/twoHrs/af2hrs.zip'

z <- unz(pathToZip, 'goCats.txt', 'r')
zT <- read.table(z, 'goCats.txt', header=T, sep='\t')

Error in read.table(z, "goCats.txt", header = T, sep = "\t") :
  seek not enabled for this connection


The same error arises with readLines.

Can anyone advise?

Best

iain


sessionInfo()

R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.15.0






[R] reading file in zip archive

2012-05-30 Thread Iain Gallagher
Hi List

I have a series of zip archives each containing several files. One of these 
files is called goCats.txt and I would like to read it into R from the archive. 
It's a simple tab delimited text file.
pathToZip <- 
'/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/afInfection/commonNorm/twoHrs/af2hrs.zip'

z <- unz(pathToZip, 'goCats.txt', 'r')
zT <- read.table(z, 'goCats.txt', header=T, sep='\t')

Error in read.table(z, "goCats.txt", header = T, sep = "\t") : 
  seek not enabled for this connection


The same error arises with readLines.

Can anyone advise?

Best

iain

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8   LC_NUMERIC=C 
 [3] LC_TIME=en_GB.utf8    LC_COLLATE=en_GB.utf8    
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=C    LC_NAME=C    
 [9] LC_ADDRESS=C  LC_TELEPHONE=C   
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C  

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_2.15.0



Re: [R] Reading file

2011-04-27 Thread Jeff Newmiller
I don't know the answer to your question, but I avoid these problems by saving 
my data as csv and avoiding direct interaction with Excel files. Excel is NOT a 
database, even though it has supposed support through ODBC. I find this holds 
true regardless of the programming environment from which I try to access Excel.
---
Jeff Newmiller
Research Engineer (Solar/Batteries, /Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Val  wrote:

> Hi all,
>
> I am trying to read an Excel file using the following command:
>
> library(RODBC)
> data = odbcConnectExcel(file.choose())
> sqlTables(data)
> Bdat = sqlFetch(data, "test")
> odbcClose(data)
> head(Bdat)
>
> 1. The above script works if the Excel file is open. If the Excel file
> is not open, then I get the message "External table is not in the
> expected format" and it stops.
>
> 2. Instead of "(file.choose())" I wanted to use the following command
>
> data <- read.table("G:/test.xlsx", header=T)
>
> But it did not work again.
>
> Warning message:
> In read.table("G:/test.xlsx", header = T) :
>   incomplete final line found by readTableHeader on 'G:/test.xlsx'
>
> Did I miss something there?
>
> Your help is highly appreciated
>
> Val





[R] Reading file

2011-04-27 Thread Val
Hi all,

I am trying to read an Excel file using the following command
  library(RODBC)
 data=odbcConnectExcel(file.choose())
 sqlTables(data)
Bdat=sqlFetch(data, "test")
odbcClose(data)
head(Bdat)

1. The above script works if the Excel file is open. If the Excel file is
not open, then I get the following message, "External table is not in the
expected format", and it stops.

2.  Instead of "(file.choose())" I wanted to use the following command
data <- read.table("G:/test.xlsx",  header=T)

  But it did not work again.
Warning message:
   In read.table("G:/test.xlsx", header = T) :
   incomplete final line found by readTableHeader on 'G:/test.xlsx'

Did I miss something there?


Your help is highly appreciated

Val




[R] Reading file names containing Metacharacters

2011-01-31 Thread Jeisson F

I am trying to read some file names from a specific directory, and the
names contain metacharacters.
The file names look like V4.35_T01-400720.csv.
In total I have 14 files, for which the value T01 goes up to T14. I need to
read the files into a string vector that looks like

> names
"V4.35_T01-400720.csv" "V4.35_T02-400720.csv" ... "V4.35_T14-400720.csv"

So far, I have not been able to read the ".", "_" and "-" characters.
Regards
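A sketch of one answer: the dot, underscore and dash need no escaping when building the names - sprintf() generates the whole T01..T14 sequence. Only when matching names with list.files(pattern = ...) is a regular expression involved, and there only the dots need escaping. (fnames is a hypothetical variable name.)

```r
fnames <- sprintf("V4.35_T%02d-400720.csv", 1:14)
# when scanning a directory instead (pattern is a regex, escape the dots):
# list.files(pattern = "^V4\\.35_T\\d{2}-400720\\.csv$")
```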



Re: [R] Reading file

2010-06-01 Thread David Winsemius


On Jun 1, 2010, at 3:19 PM, Robert Tsutakawa wrote:

I am trying to read a source program into a mac pro laptop, which  
uses Snow Leopard.  R is unable to find the file containing my  
source program.  I'm using the function  source(" file name").  I  
need some examples or detailed instructions.  I have no problem  
reading the file using PC.

Bob


Just drag the file from a Finder window to the console to get the full  
path name.


A cross-platform solution would be file.choose()

--
David



Re: [R] Reading file

2010-06-01 Thread Stephan Kolassa

Have you set the correct working directory?

?setwd
?getwd

HTH
Stephan


Robert Tsutakawa schrieb:
I am trying to read a source program into a mac pro laptop, which uses 
Snow Leopard.  R is unable to find the file containing my source 
program.  I'm using the function  source(" file name").  I need some 
examples or detailed instructions.  I have no problem reading the file 
using PC.

Bob




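Both suggestions in one sketch: give source() a full path (which is what file.choose() returns), or setwd() to the script's folder first. The temporary script below is a stand-in for the real "file name".

```r
scr <- tempfile(fileext = ".R")        # stand-in for the real script path
writeLines("made_by_source <- 42", scr)
source(scr)    # absolute path, so the working directory does not matter
# interactive equivalent: source(file.choose())
# or: setwd("/path/to/scripts"); source("myprog.R")   # hypothetical path
```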


[R] Reading file

2010-06-01 Thread Robert Tsutakawa
I am trying to read a source program into a mac pro laptop, which uses  
Snow Leopard.  R is unable to find the file containing my source  
program.  I'm using the function  source(" file name").  I need some  
examples or detailed instructions.  I have no problem reading the file  
using PC.

Bob



Re: [R] Reading file from remote location or network drive.

2009-01-02 Thread Prof Brian Ripley

This is an FAQ (both in the main FAQ and the rw-FAQ)

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-file-names-work-in-Windows_003f
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

You may find it easier to map your network drives: most users do.

See also ?Quotes in R.
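
To spell out what those FAQ entries say about quoting Windows paths in 
R (the host and share names below are taken from the question; treat 
the whole thing as a sketch):

```r
## In an R string literal, "\" starts an escape sequence, so each
## backslash in a Windows/UNC path must be doubled -- or replaced
## with "/", which Windows also accepts.
p1 <- "\\\\192.192.192.3\\Shared\\iris1.csv"  # doubled backslashes
p2 <- "//192.192.192.3/Shared/iris1.csv"      # forward slashes

## Both literals denote 32 characters once parsed; cat() shows the
## actual contents (print() would re-escape the backslashes):
nchar(p1)        # 32 -- each "\\" in the literal is one backslash
cat(p1, "\n")    # \\192.192.192.3\Shared\iris1.csv
```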

On Fri, 2 Jan 2009, Harsh wrote:


Hello,

I'm trying to pull data from a network drive on a Windows machine. The
location is read into a string and then used later with a data
input command.


rem<- "\\192.192.192.3\Shared\iris1.csv"

Warning messages:
1: '\S' is an unrecognized escape in a character string
2: '\i' is an unrecognized escape in a character string
3: unrecognized escapes removed from "\\192.168.16.3\Shared\iris1.csv"

When using a data input operation

datafile<- read.csv(rem,header= T, sep = ",")
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
 cannot open file '\192.192.192.3Sharediris1.csv': No such file or directory

I have tried to use strsplit to split on "\\"


strsplit(rem,"\\")

Error in strsplit(rem, "\\") : invalid split pattern '\'
In addition: Warning message:
In strsplit(rem, "\\") : regcomp error:  'Trailing backslash'

Also, I tried to split to extract all characters and this is what I obtained.


print(strsplit(rem,""))

[[1]]
[1] "\\" "1"  "9"  "2"  "."  "1"  "6"  "8"  "."  "1"  "6"  "."  "3"  "S"  "h"
[16] "a"  "r"  "e"  "d"  "i"  "r"  "i"  "s"  "1"  "."  "c"  "s"  "v"


The problem is that I cannot easily check each character and, if it
is "\", convert it to "/".

Of course, if I were to assign


rem<- "//192.192.192.3/Shared/iris1.csv"


Then rem can be used successfully

datafile<- read.csv(rem,header= T, sep = ",")


Alternatively, I would like to know: if the network drive had a
username and password, how would I pass those credentials to
read.csv below?

datafile<- read.csv("\\192.168.16.3\Shared\iris1.csv username:user
password:user",header= T, sep = ",")


Clearly not, do read the help page as requested.




Thank you,

Harsh Singhal
Mu Sigma Decision Systems Inc.,
Chicago, IL
USA




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Reading file from remote location or network drive.

2009-01-02 Thread ONKELINX, Thierry
Dear Harsh,

You have to replace each "\" with "\\" or try to use "/" instead.
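
A quick sketch of that replacement, assuming the path has already been 
entered with properly escaped backslashes. The key point is that 
strsplit() and gsub() treat their pattern as a regular expression by 
default, so a literal backslash needs fixed = TRUE (or the 
four-character regex "\\\\"):

```r
## A UNC path written with doubled backslashes; the parsed string
## actually contains single backslashes.
rem <- "\\\\192.192.192.3\\Shared\\iris1.csv"

## Replace every backslash with a forward slash; fixed = TRUE makes
## gsub() treat "\\" (one literal backslash) as plain text, not regex.
rem2 <- gsub("\\", "/", rem, fixed = TRUE)
cat(rem2, "\n")   # //192.192.192.3/Shared/iris1.csv

## strsplit() has the same quirk: without fixed = TRUE the pattern
## "\\" is an invalid regex -- the exact error shown in the question.
strsplit(rem, "\\", fixed = TRUE)
```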

HTH,

Thierry 




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
thierry.onkel...@inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
Namens Harsh
Verzonden: vrijdag 2 januari 2009 8:46
Aan: r-help@r-project.org
Onderwerp: [R] Reading file from remote location or network drive.

Hello,

I'm trying to pull data from a network drive on a Windows machine. The
location is read into a string and then used later with a data
input command.

> rem<- "\\192.192.192.3\Shared\iris1.csv"
Warning messages:
1: '\S' is an unrecognized escape in a character string
2: '\i' is an unrecognized escape in a character string
3: unrecognized escapes removed from "\\192.168.16.3\Shared\iris1.csv"

When using a data input operation

datafile<- read.csv(rem,header= T, sep = ",")
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
  cannot open file '\192.192.192.3Sharediris1.csv': No such file or
directory

I have tried to use strsplit to split on "\\"

> strsplit(rem,"\\")
Error in strsplit(rem, "\\") : invalid split pattern '\'
In addition: Warning message:
In strsplit(rem, "\\") : regcomp error:  'Trailing backslash'

Also, I tried to split to extract all characters and this is what I
obtained.

> print(strsplit(rem,""))
[[1]]
 [1] "\\" "1"  "9"  "2"  "."  "1"  "6"  "8"  "."  "1"  "6"  "."  "3"  "S"  "h"
[16] "a"  "r"  "e"  "d"  "i"  "r"  "i"  "s"  "1"  "."  "c"  "s"  "v"


The problem is that I cannot easily check each character and, if it
is "\", convert it to "/".

Of course, if I were to assign

> rem<- "//192.192.192.3/Shared/iris1.csv"

Then rem can be used successfully
> datafile<- read.csv(rem,header= T, sep = ",")

Alternatively, I would like to know: if the network drive had a
username and password, how would I pass those credentials to
read.csv below?

datafile<- read.csv("\\192.168.16.3\Shared\iris1.csv username:user
password:user",header= T, sep = ",")


Thank you,

Harsh Singhal
Mu Sigma Decision Systems Inc.,
Chicago, IL
USA





[R] Reading file from remote location or network drive.

2009-01-01 Thread Harsh
Hello,

I'm trying to pull data from a network drive on a Windows machine. The
location is read into a string and then used later with a data
input command.

> rem<- "\\192.192.192.3\Shared\iris1.csv"
Warning messages:
1: '\S' is an unrecognized escape in a character string
2: '\i' is an unrecognized escape in a character string
3: unrecognized escapes removed from "\\192.168.16.3\Shared\iris1.csv"

When using a data input operation

datafile<- read.csv(rem,header= T, sep = ",")
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
  cannot open file '\192.192.192.3Sharediris1.csv': No such file or directory

I have tried to use strsplit to split on "\\"

> strsplit(rem,"\\")
Error in strsplit(rem, "\\") : invalid split pattern '\'
In addition: Warning message:
In strsplit(rem, "\\") : regcomp error:  'Trailing backslash'

Also, I tried to split to extract all characters and this is what I obtained.

> print(strsplit(rem,""))
[[1]]
 [1] "\\" "1"  "9"  "2"  "."  "1"  "6"  "8"  "."  "1"  "6"  "."  "3"  "S"  "h"
[16] "a"  "r"  "e"  "d"  "i"  "r"  "i"  "s"  "1"  "."  "c"  "s"  "v"


The problem is that I cannot easily check each character and, if it
is "\", convert it to "/".

Of course, if I were to assign

> rem<- "//192.192.192.3/Shared/iris1.csv"

Then rem can be used successfully
> datafile<- read.csv(rem,header= T, sep = ",")

Alternatively, I would like to know: if the network drive had a
username and password, how would I pass those credentials to
read.csv below?

datafile<- read.csv("\\192.168.16.3\Shared\iris1.csv username:user
password:user",header= T, sep = ",")


Thank you,

Harsh Singhal
Mu Sigma Decision Systems Inc.,
Chicago, IL
USA
