If you want something that is fast, read the file in with readLines,
strip off each colon and the number after it with gsub, write the
result to a temp file, and then read it back in with read.table.
Here is a 355K-line file:

> temp <- tempfile()
> input <- readLines('/temp/colon.txt')
> length(input)
[1] 355212
> system.time(input <- gsub("(:[0-9]+)", "", input))
   user  system elapsed
   0.72    0.00    0.74
> head(input)
[1] "1  5  27  345" "1  5  27  345" "1  5  27  345" "1  5  27  345" "1  5  27  345" "1  5  27  345"
> writeLines(input, temp)
> system.time(newInput <- read.table(temp))
   user  system elapsed
   1.08    0.02    1.13
> dim(newInput)
[1] 355212      4
>
> head(newInput)
  V1 V2 V3  V4
1  1  5 27 345
2  1  5 27 345
3  1  5 27 345
4  1  5 27 345
5  1  5 27 345
6  1  5 27 345
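Note that the gsub() step removes the ":value" part of each pair, so the
four columns of newInput are the label and the column indices (5, 27,
345), not the values (1, 3, 10).  If you want the values placed in
their own columns, something along these lines should work.  It is
only a sketch: read_libsvm_lines() is a hypothetical helper, not a
function from any package, and it assumes 1-based integer indices as
in the example row.  It builds a dense matrix, so for very wide data
the Matrix package's sparseMatrix(i, j, x) would be a better target.

```r
# Hypothetical helper: parse libsvm-style "label idx:val idx:val ..." lines
# into a dense data.frame with a label column plus columns V1..Vmax.
read_libsvm_lines <- function(lines) {
  # Split each line on runs of whitespace (leading/trailing blanks trimmed)
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  # The first field of each line is the label
  labels <- as.numeric(vapply(fields, `[`, character(1), 1))
  # The remaining fields are "index:value" pairs
  pairs  <- lapply(fields, function(f) f[-1])
  row_id <- rep(seq_along(pairs), lengths(pairs))
  kv     <- unlist(pairs)
  idx    <- as.integer(sub(":.*$", "", kv))      # part before the colon
  val    <- as.numeric(sub("^[^:]*:", "", kv))   # part after the colon
  n_col  <- if (length(idx)) max(idx) else 0L
  # Columns that never appear in a row stay 0, as the sparse format implies
  dense <- matrix(0, nrow = length(lines), ncol = n_col)
  dense[cbind(row_id, idx)] <- val
  out <- data.frame(label = labels, dense)
  names(out)[-1] <- paste0("V", seq_len(n_col))
  out
}

# Example row from the original question
df <- read_libsvm_lines("1  5:1  27:3  345:10")
df[1, c("label", "V5", "V27", "V345")]   # label 1; V5 = 1, V27 = 3, V345 = 10
```

For a file, the same call works on the result of readLines('/temp/colon.txt').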


On Tue, Oct 9, 2012 at 12:56 AM, Noah Silverman <noahsilver...@ucla.edu> wrote:
> I have a bunch of data sets that were created for the libsvm tool.  They are 
> in "colon separated sparse format".
>
> i.e.
>
> 1  5:1  27:3  345:10
>
> Is a row with the label of "1" and only has values in columns 5, 27, and 345.
>
> I want to read these into a data.frame in R.
>
> Is there a simple way to do this?
>
> --
> Noah Silverman, M.S.
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
