On Wed, 2006-11-01 at 16:47 +0100, [EMAIL PROTECTED] wrote:
> Hello,
>
> I've a very long character array (>500k characters) that I need to split
> by '\n', resulting in an array of about 60k numbers. The help on
> strsplit says to use perl=TRUE to get better performance, but still it
> takes several minutes to split this string.
>
> The massive string is the return value of a call to
> xmlElementsByTagName from the XML library and looks like this:
>
> ...
> 12345
> 564376
> 5674
> 6356656
> 5666
> ...
>
> I have to read about a hundred of these files and was wondering whether
> there's a more efficient way to turn this string into an array of
> numerics. Any ideas?
>
> thanks a lot for your help
> and kind regards,
>
> Arne

> Vec <- sample(c(0:9, "\n"), 500000, replace = TRUE)

> str(Vec)
 chr [1:500000] "7" "0" "9" "6" "5" "3" "1" "9" ...

> table(Vec)
Vec
   \n     0     1     2     3     4     5     6     7     8     9
45432 45723 45641 45526 45460 45284 45378 45392 45374 45314 45476

> sink("Vec.txt")
> cat(Vec)
> sink()

First 10 lines of Vec.txt:

7 0 9 6 5 3 1 9 8 1 8 3 4 2
1 2 2
3 7 7 6 8 3 4 7 4
9 2 1 9 8 7 2 0 9 4 3
9 3 5 2 2 5 8 0 5 4 5 6 1 5 8 7 4 1 2 8 3 2 6 4 9 4 1 6 8 5 0 8 8 8 5 3 0 5 3 5 4 8 5 4 3
9
5 3 6 5 8 9 7 6 9
5 8
2 4 6
5

> system.time(Vec.Split <- scan("Vec.txt", sep = "\n"))
Read 41276 items
[1] 0.180 0.004 0.186 0.000 0.000

> str(Vec.Split)
 num [1:41276] 7.10e+13 1.22e+02 3.78e+08 9.22e+10 9.35e+44 ...

> sprintf("%.0f", Vec.Split[1:10])
 [1] "70965319818342"
 [2] "122"
 [3] "377683474"
 [4] "92198720943"
 [5] "935225805456158720742405574866620654670577664"
 [6] "9"
 [7] "536589769"
 [8] "58"
 [9] "246"
[10] "5"

Does that help?

Marc Schwartz
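
If the string is already sitting in memory, as in the original question, it may be possible to skip the temporary file altogether. The following is only a sketch under assumptions: `x` is a made-up stand-in for the value returned by the XML call (one long string of newline-separated integers), and the `text` argument of scan() is only available in reasonably recent versions of R. The second variant uses strsplit() with fixed = TRUE, which treats '\n' as a literal string instead of a regular expression and is often much faster than the regex-based split.

```r
# Hypothetical test data: 60,000 integers joined by newlines into ONE string,
# standing in for the real XML content (which is not shown in the thread).
set.seed(1)
nums <- sample.int(1e7, 60000, replace = TRUE)
x <- paste(sprintf("%d", nums), collapse = "\n")

# Variant 1: let scan() parse the in-memory string directly.
# Whitespace (including newlines) is the default separator.
v1 <- scan(text = x, what = numeric(), quiet = TRUE)

# Variant 2: split on a literal newline, then convert to numeric.
# fixed = TRUE bypasses the regular expression engine.
v2 <- as.numeric(strsplit(x, "\n", fixed = TRUE)[[1]])

# Both variants should give the same numeric vector.
print(length(v1))
print(identical(v1, v2))
```

Wrapping either variant in system.time(), as above, would show whether it is fast enough for a hundred files on a given machine.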