It may be worth writing a script to transpose the data (in awk, it
takes 10 min on my laptop) and then reading in the transposed data:
> system.time({x <- read.delim("testTransposed.txt", header=FALSE,
+     colClasses="numeric", nrows=700000); x <- t(x)})
   user  system elapsed
  4.958   0.412   5.477
b
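
A minimal sketch of the same round trip on a toy matrix, in case it helps to see the idea end to end. The sizes, the temporary file, and the use of write.table() to stand in for the external transpose step are illustrative assumptions, not the script used above.

```r
# Toy stand-in for the real data: 6 rows x 1000 columns (hypothetical sizes).
set.seed(1)
m <- matrix(round(rnorm(6 * 1000), 3), nrow = 6)

# Write the transposed layout (1000 lines x 6 fields), as an external
# transpose script would produce. The file name is a temporary placeholder.
tfile <- tempfile(fileext = ".txt")
write.table(t(m), tfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the tall file (few columns, many rows), then transpose back in memory.
x <- read.delim(tfile, header = FALSE, colClasses = "numeric", nrows = 1000)
x <- t(as.matrix(x))
dimnames(x) <- NULL
```

The point is only that read.delim() sees 6 columns instead of 700000; the final t() is done in memory.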
On Sep 25, 2009, at 1:35 PM, Ping-Hsun Hsieh wrote:
Thanks, Ben.
The matrix is a pure numeric matrix (6x700000, 31 MB).
I tried colClasses='numeric' together with nrows=7 (one of the lines
is the header) on the matrix.
I also tested read.delim() without setting the two options.
Here is the time spent reading the matrix in each test.
system.time(tmp <- read.delim("test_data.txt"))
     user    system   elapsed
50985.421    27.665 51013.384
system.time(tmp <- read.delim("test_data.txt", colClasses="numeric",
    nrows=7, comment.char=""))
     user    system   elapsed
51301.563    60.491 51362.208
It seems setting the options does not speed up the reading at all.
Is it because of the header line? I will test it.
Did I misunderstand something?
One additional and interesting observation:
The run with the options does save a lot of memory. It took ~150 MB,
while the other took ~4 GB to read the matrix.
I will try scan() and see if it helps.
Thanks!
Mike
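
For reference, a rough sketch of what the scan() route could look like on a small toy file. The file contents, sizes, and variable names here are made up for illustration, and it assumes a purely numeric, tab-delimited body under a single header line.

```r
# Toy tab-delimited file with a header line and 6 numeric rows
# (hypothetical stand-in for test_data.txt).
set.seed(2)
m <- matrix(round(runif(6 * 500), 3), nrow = 6)
colnames(m) <- paste0("c", seq_len(ncol(m)))
f <- tempfile(fileext = ".txt")
write.table(m, f, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the header separately, then the numeric body as one flat vector.
hdr  <- strsplit(readLines(f, n = 1), "\t", fixed = TRUE)[[1]]
vals <- scan(f, what = numeric(), sep = "\t", skip = 1, quiet = TRUE)

# scan() returns values in file order (row by row), hence byrow = TRUE.
y <- matrix(vals, ncol = length(hdr), byrow = TRUE,
            dimnames = list(NULL, hdr))
```

Because scan() fills a single numeric vector, it avoids building a data frame with one column object per field, which is where a very wide file is likely to hurt.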
-----Original Message-----
From: Benilton Carvalho [mailto:bcarv...@jhsph.edu]
Sent: Wednesday, September 23, 2009 4:56 PM
To: Ping-Hsun Hsieh
Cc: r-help@r-project.org
Subject: Re: [R] read.delim very slow in reading files with lots of
columns
use the 'colClasses' argument and you can also set 'nrows'.
b
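
A small sketch of that suggestion on a toy file, for anyone following along. The file and sizes are invented for the example. As far as I understand the documentation, nrows counts data rows only, so the header line is not included in the count.

```r
# Toy file: header line plus 6 numeric rows (hypothetical stand-in).
set.seed(3)
m <- matrix(round(rnorm(6 * 200), 3), nrow = 6)
colnames(m) <- paste0("c", seq_len(ncol(m)))
f <- tempfile(fileext = ".txt")
write.table(m, f, sep = "\t", quote = FALSE, row.names = FALSE)

# Declare every column numeric and give the number of data rows up front,
# so read.delim() does not have to guess the column types.
d <- read.delim(f, colClasses = "numeric", nrows = 6, comment.char = "")
```

A single colClasses value is recycled across all columns, so there is no need to spell out one entry per column.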
On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote:
Hi,
I am trying to read a tab-delimited file into R (ver. 2.8). The
machine I am using is 64-bit Linux with 16 GB of memory.
The file is basically a matrix (~600x700000) and is about 3 GB.
read.delim() ran extremely slowly (hours), even on a subset of
the file (31 MB, 6x700000).
I monitored the memory usage and found that it consistently took less
than 1% of the 16 GB of memory.
Does read.delim() have difficulty reading files with lots of columns?
Any suggestions?
Thanks,
Mike
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.