[R] character to numeric conversion
Hi.

Is there a straightforward way to convert a character string containing comma-delimited numbers to a numeric vector?

In my application, I use

system(executable.string, intern=TRUE)

which returns a string like

"[0.E-38, 2.096751179214927596171268230, 3.678944959657480671183123052, 4.976528845643001020345216157, 6.072390165503099343887569007, 7.007958550337542210168866070, 7.807464185827177139302778736, 8.486139455817034846608029724, 9.053706780665060873259065771, 9.516172308326877463284426111, 9.876856047379733199590985269, 10.13695826383869052536062804, 10.29580989588667234885515374, 10.35092785255025551187463209, 10.29795676261278695909972578, 10.13052574735986793562227138, 9.839990935943625006580521345, 9.414977153151389385186358494, 8.840562526759586215404890348, 8.096830792651667245232639586, 7.156244887881612948153311800, 5.978569259122249264778017262, 4.499809670330265066808481929, 2.602689685444383764768503589, 0.E-38]"

(the output is a single line). In a big run, the string may contain 10^5 or possibly 10^6 numbers. What's the recommended way to convert this to a numeric vector?

-- 
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] character to numeric conversion
You could give strsplit() a try, e.g.,

strg <- "0.E-38, 2.096751179214927596171268230, 3.678944959657480671183123052"
strg <- paste(rep(strg, 5000), collapse = ", ")
##
f.out <- factor(strsplit(strg, ", ")[[1]])
n.out <- as.numeric(levels(f.out))[as.integer(f.out)]

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: Robin Hankin
To: RHelp help <r-help@stat.math.ethz.ch>
Sent: Monday, March 19, 2007 10:18 AM
Subject: [R] character to numeric conversion
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
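Dimitris's recipe splits a string that has no brackets. A minimal sketch for Robin's actual bracketed output, assuming `x` is a shortened stand-in for the line returned by system(..., intern = TRUE), first strips the brackets with gsub() and then splits on a comma plus any following whitespace. It also computes the factor variant (convert each distinct string once, then index back) so the two results can be compared.

```r
# 'x' is a shortened, hypothetical stand-in for Robin's system() output.
x <- "[0.E-38, 2.096751179214927596171268230, 10.35092785255025551187463209, 0.E-38]"

# Remove the enclosing brackets, then split on a comma plus any following whitespace.
fields <- strsplit(gsub("[][]", "", x), ",[[:space:]]*")[[1]]

# Plain conversion of every field.
direct <- as.numeric(fields)

# Dimitris's factor trick: convert each distinct string only once, then index back.
f <- factor(fields)
via_factor <- as.numeric(levels(f))[as.integer(f)]
```

The factor detour mainly pays off when many values repeat, as in the rep()-generated test string above; for output like Robin's, where nearly every value is distinct, plain as.numeric() on the split fields should do the same job.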
Re: [R] character to numeric conversion
Robin Hankin wrote:
Hi. Is there a straightforward way to convert a character string containing comma-delimited numbers to a numeric vector? [...] What's the recommended way to convert this to a numeric vector?

scan() on a text connection:

> x <- "[0.E-38, 2.096751179214927596171268230,
+ 3.678944959657480671183123052, 4.976528845643001020345216157,
+ 6.072390165503099343887569007, 7.007958550337542210168866070,
+ 7.807464185827177139302778736, 8.486139455817034846608029724,
+ 9.053706780665060873259065771, 9.516172308326877463284426111,
+ 9.876856047379733199590985269, 10.13695826383869052536062804,
+ 10.29580989588667234885515374, 10.35092785255025551187463209,
+ 10.29795676261278695909972578, 10.13052574735986793562227138,
+ 9.839990935943625006580521345, 9.414977153151389385186358494,
+ 8.840562526759586215404890348, 8.096830792651667245232639586,
+ 7.156244887881612948153311800, 5.978569259122249264778017262,
+ 4.499809670330265066808481929, 2.602689685444383764768503589, 0.E-38]"
> tc <- textConnection(gsub("[][ \n]","",x))
> xx <- scan(tc,sep=",")
Read 25 items
> summary(xx)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   4.977   8.097   7.049   9.840  10.350
> close(tc)

(By far, the hardest bit was getting the gsub regexp right...)

Alternatively, just get rid of the brackets and replace commas with whitespace. A problem with sep="," is that it gets confused by line endings following a comma.

> tc <- textConnection(gsub(",", " ", gsub("[][]", "", x)))
> xx <- scan(tc)
Read 25 items
> summary(xx)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   4.977   8.097   7.049   9.840  10.350
> close(tc)

-- 
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
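R versions more recent than the one used in this thread let scan() read from a character string directly through its text argument, which saves creating and closing the text connection by hand. A sketch of Peter's second variant written that way, with `x` again a shortened, hypothetical stand-in for the real output (the embedded line break mimics his multi-line example):

```r
# Shortened stand-in for the output; note the line break after a comma.
x <- "[0.E-38, 2.096751179214927596171268230,
3.678944959657480671183123052, 0.E-38]"

# Drop the brackets and turn commas into spaces, so scan() sees only
# whitespace-separated numbers and line endings cause no trouble.
cleaned <- gsub(",", " ", gsub("[][]", "", x))

# what = double() is the default; quiet = TRUE suppresses "Read n items".
xx <- scan(text = cleaned, quiet = TRUE)
```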
Re: [R] character to numeric conversion
Hello everybody, thanks for the tips. I *think* this should be the same thread.

The manpage for system() says that lines of over 8095 characters will be split. This is causing me problems. How do I get round the 8095 character limit?

Simple toy example follows:

jj <- system("echo 4 | awk '{for(i=1;i<100;i++){printf(\"%s,\", $1)}}'| sed -e \"s/,$//\"",intern=T)

This is fine. But...

jj <- system("echo 4 | awk '{for(i=1;i1;i++){printf(\"%s,\", $1)}}'| sed -e \"s/,$//\"",intern=T)

has jj split into three bits, which is upsetting my call. In my application the split occurs in the middle of a multi-digit number, which messes up my conversion to numeric.

-- 
Robin Hankin
Re: [R] character to numeric conversion
On 19 Mar 2007, at 11:20, Robin Hankin wrote:
The manpage for system() says that lines of over 8095 characters will be split. [...] How do I get round the 8095 character limit?

Er, just paste the output together using paste(..., collapse = ""). The split is clean, so concatenating the lines will not lose any characters.

HTH
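A sketch of the reassembly, with the split simulated by substring() rather than produced by a real system() call (the repeated "10.35" field and the 3000-field length are hypothetical). Cutting the line every 8095 characters happens to end the first piece on the leading "1" of a number, which is the failure Robin describes; pasting the pieces back with collapse = "" restores the original line before it is split on commas.

```r
# Hypothetical long output line: 3000 copies of "10.35", comma-separated.
full <- paste(rep("10.35", 3000), collapse = ",")

# Simulate the split into pieces of at most 8095 characters.
n <- nchar(full)
starts <- seq(1, n, by = 8095)
jj <- substring(full, starts, pmin(starts + 8094, n))

# Converting a piece on its own breaks the number that straddles the cut.
naive_last <- tail(as.numeric(strsplit(jj[1], ",")[[1]]), 1)  # 1, not 10.35

# Rejoin with no separator, then split on commas and convert.
rejoined <- paste(jj, collapse = "")
vals <- as.numeric(strsplit(rejoined, ",")[[1]])
```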
Re: [R] character to numeric conversion
On Mon, 19 Mar 2007, Robin Hankin wrote:
The manpage for system() says that lines of over 8095 characters will be split. [...] How do I get round the 8095 character limit?

Could you use sed or awk in the pipe, while still inside the system() call, to change each "," into "\n", for example via the record separator RS in awk? Each number would then come back as its own short line.

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
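A sketch of doing the split on the shell side, assuming a Unix-like shell. Here tr is swapped in for the sed/awk Roger mentions, because tr ',' '\n' turns every comma into a newline portably, and echo is a hypothetical stand-in for Robin's executable. Each number then arrives as its own short line, far below the 8095-character limit, and only the brackets and spaces are left for R to strip.

```r
# 'echo' stands in for the real executable; a Unix-like shell is assumed.
# In the R string, "\\n" passes a literal \n to tr, which reads it as newline.
cmd <- paste("echo '[0.E-38, 2.096751179214927596171268230, 10.35092785255025551187463209]'",
             "| tr ',' '\\n'")

lines <- system(cmd, intern = TRUE)         # one element per number
xx <- as.numeric(gsub("[][ ]", "", lines))  # drop brackets and leading spaces
```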
Re: [R] character to numeric conversion
Here is one way. This matches each run of characters that can occur in a number and converts each such run to numeric.

library(gsubfn)
strapply(x, "[-0-9+.E]+", as.numeric)

On 3/19/07, Robin Hankin wrote:
Hi. Is there a straightforward way to convert a character string containing comma-delimited numbers to a numeric vector? [...] What's the recommended way to convert this to a numeric vector?
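strapply() needs the gsubfn package and, by default, returns a list with one component per input string, so [[1]] or unlist() is needed to get a plain vector. The same pattern extraction can be sketched in base R with regmatches() and gregexpr(), which are swapped in here for strapply(); `x` is a shortened, hypothetical stand-in for Robin's output.

```r
# Shortened stand-in for the system() output.
x <- "[0.E-38, 2.096751179214927596171268230, 10.35092785255025551187463209, 0.E-38]"

# Extract every run of number-like characters; gregexpr() gives one match
# set per input string, so [[1]] takes the matches for the single string.
tokens <- regmatches(x, gregexpr("[-0-9+.E]+", x))[[1]]
xx <- as.numeric(tokens)
```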
Re: [R] character to numeric conversion
Could you replace 'system' with 'pipe' and read directly from the pipe connection, rather than going through the intermediate step of a text string? If the external function just returns the numbers with commas and spaces (but no line feeds), then you should be able to use 'scan' directly on the connection.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
(801) 408-8111

-----Original Message-----
From: Robin Hankin
Sent: Monday, March 19, 2007 5:20 AM
To: Peter Dalgaard
Cc: RHelp help; Robin Hankin
Subject: Re: [R] character to numeric conversion
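A sketch of Greg's suggestion, with echo as a hypothetical stand-in for the real executable and its output assumed to contain only numbers, commas and spaces (no brackets, no line feeds). scan() reads straight from the pipe() connection, so the output is never held as one long character string and the 8095-character split of system() should not arise.

```r
# 'echo' stands in for the real executable; a Unix-like shell is assumed.
con <- pipe("echo '0.E-38, 2.096751179214927596171268230, 10.35092785255025551187463209'")

# scan() opens 'con' for the duration of the call and closes it afterwards.
# Numeric fields are stripped of surrounding white space, so the spaces
# after the commas are harmless.
xx <- scan(con, sep = ",", quiet = TRUE)
```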