Thank you so much, Jorge and Arun - I'll give it a try!
Dimitri

On Fri, Jun 7, 2013 at 11:27 PM, arun <smartpink...@yahoo.com> wrote:

> Hi,
> I tried it on a dataset with 1e5 rows:
>
> l1<- letters[1:10]
> s1<-sapply(seq_along(l1),function(i) paste(rep(l1[i],3),collapse=""))
> set.seed(24)
>
> x1<-data.frame(x=paste(
>   paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),
>   paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),
>   paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),
>   sep="_"),stringsAsFactors=FALSE)
>
> # Strip all letters, then let read.table() split the remaining digits on "_"
> system.time(resNew<-data.frame(x=x1,
>   read.table(text=gsub("[A-Za-z]","",x1[,1]),sep="_",header=FALSE),
>   stringsAsFactors=FALSE))
> #   user  system elapsed
> #  2.712   0.016   2.732
>
> head(resNew)
> #                  x V1 V2 V3
> #1  ccc12_ggg2_jjj14 12  2 14
> #2  ccc7_ddd15_aaa11  7 15 11
> #3 hhh12_ddd14_fff12 12 14 12
> #4  fff11_bbb15_aaa6 11 15  6
> #5   ggg12_ccc9_ggg8 12  9  8
> #6   jjj8_eee12_eee4  8 12  4
>
> A.K.
>
>
> ----- Original Message -----
> From: arun <smartpink...@yahoo.com>
> To: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
> Cc: R help <r-help@r-project.org>
> Sent: Friday, June 7, 2013 11:00 PM
> Subject: Re: [R] splitting a string column into multiple columns faster
>
> Hi,
> Maybe this helps:
>
>
> res<-data.frame(x=x,read.table(text=gsub("[A-Za-z]","",x[,1]),sep="_",header=FALSE),stringsAsFactors=FALSE)
> res
> #               x V1 V2 V3
> #1 aaa1_bbb1_ccc3  1  1  3
> #2 aaa2_bbb3_ccc2  2  3  2
> #3 aaa3_bbb2_ccc1  3  2  1
> A.K.
>
> ----- Original Message -----
> From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
> To: r-help <r-help@r-project.org>
> Cc:
> Sent: Friday, June 7, 2013 9:24 PM
> Subject: [R] splitting a string column into multiple columns faster
>
> Hello!
>
> I have a column in my data frame that I have to split: I have to distill
> the numbers from the text. Below is my example and my solution.
>
> x<-data.frame(x=c("aaa1_bbb1_ccc3","aaa2_bbb3_ccc2","aaa3_bbb2_ccc1"))
> x
> library(stringr)
> out<-as.data.frame(str_split_fixed(x$x,"aaa",2))
> out2<-as.data.frame(str_split_fixed(out$V2,"_bbb",2))
> out3<-as.data.frame(str_split_fixed(out2$V2,"_ccc",2))
> result<-cbind(x,out2[1],out3)
> result
> My problem is:
> str_split_fixed is relatively slow. In my real data frame I have over
> 80,000 rows, so it takes almost 30 seconds to run just one line (like
> out<-... above).
> And it's even slower because I have to do it step by step several times.
>
> Is there a way to do it by specifying all 3 delimiters at once
> ("aaa","_bbb","_ccc") and then split it in one go into a data frame with
> several columns?
>
> Thanks a lot for any pointers!
>
> --
> Dimitri Liakhovitski
>
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
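
For the archive, here is a minimal sketch of the two one-step ideas discussed in this thread, using only base R and the toy data from the original question. It assumes every string has exactly three fields and that the delimiters are literally "aaa", "_bbb" and "_ccc"; the object names (digits_only, parts_a, parts_b, pieces, res) are made up for illustration. Variant A uses strsplit() in place of the read.table(text=...) call from the quoted reply, which is a substitution made here for illustration and not the method posted above.

```r
# Toy data from the original question
x <- data.frame(x = c("aaa1_bbb1_ccc3", "aaa2_bbb3_ccc2", "aaa3_bbb2_ccc1"),
                stringsAsFactors = FALSE)

# Variant A: strip every letter, leaving e.g. "1_1_3", then split on "_"
digits_only <- gsub("[A-Za-z]", "", x$x)
parts_a <- do.call(rbind, strsplit(digits_only, "_", fixed = TRUE))
storage.mode(parts_a) <- "integer"   # character matrix -> integer matrix

# Variant B: name all three delimiters in one regular expression.
# A match at the start of a string yields a leading "" element, so drop it.
pieces <- strsplit(x$x, "aaa|_bbb|_ccc")
parts_b <- do.call(rbind, lapply(pieces, function(p) as.integer(p[-1])))

# Assemble the result (assumes exactly three fields per row)
res <- data.frame(x = x$x, parts_a, stringsAsFactors = FALSE)
names(res)[-1] <- c("V1", "V2", "V3")
res
```

Both variants work on the whole column at once, so wrapping them in system.time() on the real data should show whether either one helps there.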


-- 
Dimitri Liakhovitski

