Re: [R] Fwd: combining data from multiple read.delim() invocations.

2014-07-01 Thread Bert Gunter
On Tue, Jul 1, 2014 at 12:03 PM, John McKown wrote:
> On Tue, Jul 1, 2014 at 11:31 AM, David L Carlson wrote:
>
>> There is a better way. First we need some data. This creates three files
>> in your working directory, each with five rows:
>>
>> write.table(data.frame(rep("A", 5), Sys.time(), Sys.time()),
>>     "A.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>> write.table(data.frame(rep("B", 5), Sys.time(), Sys.time()),
>>     "B.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>> write.table(data.frame(rep("C", 5), Sys.time(), Sys.time()),
>>     "C.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>>
>> Now to read and combine them into a single data.frame:
>>
>> fls <- c("A.tab", "B.tab", "C.tab")
>> df.list <- lapply(fls, read.delim, header=FALSE,
>>     col.names=c("lpar","started","ended"),
>>     as.is=TRUE, na.strings='\\N',
>>     colClasses=c("character","POSIXct","POSIXct"))
>> df.all <- do.call(rbind, df.list)
>> > str(df.all)
>> 'data.frame':   15 obs. of  3 variables:
>>  $ lpar   : chr  "A" "A" "A" "A" ...
>>  $ started: POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>>  $ ended  : POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>>
>> -
>> David L Carlson
>>
>
> I do like that better than my version. Mainly because it is fewer
> statements. I'm rather new with R and the *apply series of functions is
> "bleeding edge" for me. And I haven't seen "do.call" before either. I'm
> still reading. But the way that I learn best is to try projects as I am
> learning. So I get ahead of myself.

If you have not already done so, please read "An Introduction to R" or an
online tutorial of your choice before posting further. I do not
consider it proper to post queries concerning basics that you can
easily learn about yourself.  I DO consider it proper to post queries
about such topics if you have made the effort but are still confused.
That is what this list is for. You can decide -- and chastise me if
you like -- into which category you fit.

Cheers,
Bert



>
> According to the Linux "time" command, your method for a single input file,
> resulting in 144 output elements in the data.frame, took:
> real    0m0.525s
> user    0m0.441s
> sys     0m0.063s
>
> Mine:
> real    0m0.523s
> user    0m0.446s
> sys     0m0.060s
>
> Basically, a "wash". For a stress test, I took in all 136 of my files in a
> single execution. Output was 22,823 elements in the data.frame.
> Yours:
> real    3m32.651s
> user    3m26.837s
> sys     0m2.292s
>
> Mine:
> real    3m24.603s
> user    3m20.225s
> sys     0m0.969s
>
> Still a wash. Of course, since I run this only once a week, on a Sunday,
> the time is not too important. I actually think that your solution is a bit
> more readable than mine. So long as I document what is going on.
>
> ===
>
> I had considered combining all the files together using the R "pipe"
> command to run the UNIX "cat" command, something like:
>
> command <- paste(c("cat", arguments), collapse=" ")
> read.delim(pipe(command), ...
>
> but I was trying to be "pure R" since I am a Linux bigot surrounded by
> Windows weenies.
>
> ===
>
> Hook'em horns!
>
> --
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
>
> Maranatha! <><
> John McKown
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



[R] Fwd: combining data from multiple read.delim() invocations.

2014-07-01 Thread John McKown
On Tue, Jul 1, 2014 at 11:31 AM, David L Carlson wrote:

> There is a better way. First we need some data. This creates three files
> in your working directory, each with five rows:
>
> write.table(data.frame(rep("A", 5), Sys.time(), Sys.time()),
>     "A.tab", sep="\t", row.names=FALSE, col.names=FALSE)
> write.table(data.frame(rep("B", 5), Sys.time(), Sys.time()),
>     "B.tab", sep="\t", row.names=FALSE, col.names=FALSE)
> write.table(data.frame(rep("C", 5), Sys.time(), Sys.time()),
>     "C.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>
> Now to read and combine them into a single data.frame:
>
> fls <- c("A.tab", "B.tab", "C.tab")
> df.list <- lapply(fls, read.delim, header=FALSE,
>     col.names=c("lpar","started","ended"),
>     as.is=TRUE, na.strings='\\N',
>     colClasses=c("character","POSIXct","POSIXct"))
> df.all <- do.call(rbind, df.list)
> > str(df.all)
> 'data.frame':   15 obs. of  3 variables:
>  $ lpar   : chr  "A" "A" "A" "A" ...
>  $ started: POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>  $ ended  : POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>
> -
> David L Carlson
>

I do like that better than my version. Mainly because it is fewer
statements. I'm rather new with R and the *apply series of functions is
"bleeding edge" for me. And I haven't seen "do.call" before either. I'm
still reading. But the way that I learn best is to try projects as I am
learning. So I get ahead of myself.
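
For anyone else meeting these two functions for the first time, here is a
minimal sketch of what they do in the code quoted above. It uses small
in-memory data frames instead of files; make_part, parts, combined and
by_hand are made-up names for illustration only.

```r
# Hypothetical stand-in for "read one file": returns a small data frame.
make_part <- function(id) {
  data.frame(lpar = rep(id, 2), n = 1:2, stringsAsFactors = FALSE)
}

# lapply() calls make_part() once per element and returns a list of results.
parts <- lapply(c("A", "B", "C"), make_part)

# do.call() calls rbind() with the list elements as its arguments, so it
# behaves like writing out rbind(parts[[1]], parts[[2]], parts[[3]]) by hand.
combined <- do.call(rbind, parts)
by_hand  <- rbind(parts[[1]], parts[[2]], parts[[3]])

print(identical(combined, by_hand))
str(combined)
```

The advantage of do.call() is that the call does not have to change when
the number of list elements (here, files) changes.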

According to the Linux "time" command, your method for a single input file,
resulting in 144 output elements in the data.frame, took:
real    0m0.525s
user    0m0.441s
sys     0m0.063s

Mine:
real    0m0.523s
user    0m0.446s
sys     0m0.060s

Basically, a "wash". For a stress test, I took in all 136 of my files in a
single execution. Output was 22,823 elements in the data.frame.
Yours:
real    3m32.651s
user    3m26.837s
sys     0m2.292s

Mine:
real    3m24.603s
user    3m20.225s
sys     0m0.969s

Still a wash. Of course, since I run this only once a week, on a Sunday,
the time is not too important. I actually think that your solution is a bit
more readable than mine. So long as I document what is going on.
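
The "time" figures above cover the whole process, including R start-up and
whatever else the script does. To time only the read-and-combine step from
inside R, system.time() can be wrapped around it. A sketch, assuming a few
made-up files in a temporary directory rather than the real inputs (tmp,
the "value" column and the file contents are illustrative only):

```r
# Create three small tab-delimited files in a temporary directory.
tmp <- tempfile("tabs")
dir.create(tmp)
for (id in c("A", "B", "C")) {
  write.table(data.frame(lpar = rep(id, 5), value = 1:5),
              file.path(tmp, paste0(id, ".tab")),
              sep = "\t", row.names = FALSE, col.names = FALSE)
}

# Pick up every *.tab file instead of listing the names by hand.
fls <- list.files(tmp, pattern = "\\.tab$", full.names = TRUE)

# Time only the read-and-combine step.
elapsed <- system.time({
  df.list <- lapply(fls, read.delim, header = FALSE,
                    col.names = c("lpar", "value"), as.is = TRUE)
  df.all <- do.call(rbind, df.list)
})
print(elapsed[["elapsed"]])  # wall-clock seconds for the block above
```

If this step turns out to be a small part of the total, the difference
between the two versions will not show up in the shell timings at all.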

===

I had considered combining all the files together using the R "pipe"
command to run the UNIX "cat" command, something like:

command <- paste(c("cat", arguments), collapse=" ")
read.delim(pipe(command), ...

but I was trying to be "pure R" since I am a Linux bigot surrounded by
Windows weenies.
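
A "pure R" counterpart to the cat idea might look like the sketch below:
read the lines of every file with readLines(), concatenate them, and hand
the result to read.delim() through its text= argument so the data is
parsed once. The files, the temporary directory and the column names here
are made up for illustration, and the approach assumes the files have no
header lines.

```r
# Create three small tab-delimited files in a temporary directory.
tmp <- tempfile("cat")
dir.create(tmp)
for (id in c("A", "B", "C")) {
  writeLines(paste(id, 1:5, sep = "\t"), file.path(tmp, paste0(id, ".tab")))
}
fls <- list.files(tmp, pattern = "\\.tab$", full.names = TRUE)

# Concatenate the lines of all files, much as "cat" would, then parse once.
all.lines <- unlist(lapply(fls, readLines))
df.all <- read.delim(text = all.lines, header = FALSE,
                     col.names = c("lpar", "n"), as.is = TRUE)
str(df.all)
```

This works the same way on Windows, since no external command is run.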

===

Hook'em horns!

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown



