Re: [R] Cut intervals (character) to numeric midpoint; regex problem
You also might want to look at demo("gsubfn-cut")

On Tue, Dec 1, 2009 at 2:41 PM, David Winsemius dwinsem...@comcast.net wrote:

Starting with the head of a 499 element matrix whose column names are now the labels from a cut() operation, I needed to get to a vector of midpoints to serve as the basis for plotting a calibration curve ( exp(linear predictor) vs. :

    > dput(head(dimnames(mtcal)[2][[1]]))  # was starting point
    > testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
    +              "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]")

I started this message with the thought of requesting an answer but kept asking myself if I really had checked the docs and tested my understanding. I eventually solved it using the gsubfn function from the gsubfn package:

    > testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+.?[[:digit:]]*),(-?[[:digit:]]+.?[[:digit:]]*)\\]",
    +     ~ (as.numeric(x)+as.numeric(y))/2, testvec))

I did discover that carriage returns in the middle of the pattern will not give the desired results, so if the pattern is broken by your mail client, be sure to rejoin it in the console.

The extra ?'s after the decimal point are in there because I had 4 NA's around the median linear predictor:

    > dimnames(mtcal)[2][[1]][which(is.na(testintvl))]
    [1] "(-1.008,-1]"  "(-1,-0.9922]" "(0.9914,1]"   "(1,1.009]"

So a better test vector would be:

    > testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
    +              "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]",
    +              "(-1.008,-1]", "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]")
    > testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+.?[[:digit:]]*),(-?[[:digit:]]+.?[[:digit:]]*)\\]",
    +     ~ (as.numeric(x)+as.numeric(y))/2, testvec))
    > testintvl
     [1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680 -1.0040 -0.9961  0.9957  1.0045

I offer this to those who may feel regex challenged (as I often do). The gsubfn function is pretty slick. I don't see an author listed for the function, but the author of the package documents is Gabor Grothendieck.
-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
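
For readers without gsubfn installed, the same two-capture-group idea can be sketched in base R with regexec() and regmatches(). This is only an illustration, not the method used in the message above: the helper name label_midpoints is made up, the decimal point is escaped here, and the pattern assumes the default "(a,b]" label form that cut() produces with right = TRUE.

```r
# A shortened version of the test vector from the message above.
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-1.008,-1]", "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]")

# Hypothetical helper: midpoint of each "(lower,upper]" label, NA if no match.
label_midpoints <- function(labels) {
  # One capture group per endpoint; the fractional part is optional so "-1" matches.
  num <- "(-?[[:digit:]]+\\.?[[:digit:]]*)"
  pattern <- paste0("^\\(", num, ",", num, "\\]$")
  parts <- regmatches(labels, regexec(pattern, labels))
  # Each element is c(full match, lower, upper), or character(0) when nothing matched.
  vapply(parts, function(p) {
    if (length(p) != 3L) return(NA_real_)
    mean(as.numeric(p[2:3]))
  }, numeric(1))
}

label_midpoints(testvec)
```

Returning NA for labels that do not match makes unexpected formats visible, much as the four NA's did in the original message.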
Re: [R] Cut intervals (character) to numeric midpoint; regex problem
Perhaps this should work too:

    sapply(strsplit(gsub("^\\W|\\W$", "", testvec), ","), function(x) sum(as.numeric(x))/2)

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
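
The one-liner above packs three steps into a single expression. A step-by-step sketch, using a shortened label vector and made-up variable names, may make the intermediate values easier to inspect. vapply() is swapped in for sapply() here only so that the result is guaranteed to be a numeric vector.

```r
labels <- c("(-8.616,-3.084]", "(-1,-0.9922]", "(0.9914,1]")

# Step 1: drop the leading and trailing non-word character (the bracket on each end).
stripped <- gsub("^\\W|\\W$", "", labels)

# Step 2: split each label at the comma into its two endpoints.
endpoints <- strsplit(stripped, ",", fixed = TRUE)

# Step 3: average the two endpoints of each label.
midpoints <- vapply(endpoints, function(x) mean(as.numeric(x)), numeric(1))

stripped
midpoints
```

Because only the first and last characters are removed, the minus signs on negative endpoints survive step 1.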
Re: [R] Cut intervals (character) to numeric midpoint; regex problem
I'm sitting here chuckling. Your solution is just so pure. I would offer an enhancement. When I tested with my cuts that had "-" before the digits, your solution dropped them, so my suggestion for the pattern would be:

    [-[:digit:].]+

I will admit that I thought it might fail with positive numbers, but it does not seem to:

    > interv <- strapply(testvec, "[-[:digit:].]+", as.numeric, simplify = TRUE)
    > interv
           [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]    [,8]   [,9] [,10]
    [1,] -8.616 -3.084 -2.876 -2.756 -2.668 -2.597 -1.008 -1.0000 0.9914 1.000
    [2,] -3.084 -2.876 -2.756 -2.668 -2.597 -2.539 -1.000 -0.9922 1.0000 1.009

I was not able to get that pattern to give acceptable results in gsubfn, so I obviously need to study this more closely.

-- 
David.

On Dec 1, 2009, at 2:47 PM, Gabor Grothendieck wrote:

You also might want to look at demo("gsubfn-cut")

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
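
A likely reason the character-class pattern does not carry over to gsubfn directly is that gsubfn applies its replacement function to each match separately, so with this pattern the function only ever sees one endpoint at a time and cannot average a pair. Extracting all matches first and averaging afterwards avoids that. The sketch below shows the extract-then-average idea in base R with gregexpr() and regmatches(); it is an illustration with made-up variable names, not the strapply call from the message.

```r
labels <- c("(-8.616,-3.084]", "(-1,-0.9922]", "(0.9914,1]")

# gregexpr() locates every run of sign, digit, or dot characters in each label;
# regmatches() returns them as a list with one character vector per label.
tokens <- regmatches(labels, gregexpr("[-[:digit:].]+", labels))

# Convert to numeric: one column per label, rows are the lower and upper endpoints.
endpoints <- sapply(tokens, as.numeric)

# Averaging down each column gives the interval midpoints.
mids <- colMeans(endpoints)
mids
```

The endpoints matrix has the same 2-by-n layout as the interv matrix printed in the message.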
Re: [R] Cut intervals (character) to numeric midpoint; regex problem
Try this:

    > library(gsubfn)
    > strapply(testvec, "[-+.0-9]+", as.numeric, simplify = ~ colMeans(cbind(...)))
    [1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680
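
One caveat when recovering midpoints from labels: cut() rounds the break values it prints (its dig.lab argument defaults to 3 significant digits), so midpoints parsed from labels are only as precise as the labels themselves. The sketch below uses made-up breaks and data, and a hypothetical helper midpoints_from_labels written in base R, to compare label-derived midpoints with midpoints computed directly from the breaks.

```r
breaks <- c(-3, -1.2345, 0.4567, 3)  # made-up breaks for illustration
x <- c(-2, 0, 2)                     # made-up data, one value per bin

# Hypothetical helper: parse the two numbers in each factor level and average them.
midpoints_from_labels <- function(f) {
  labs <- levels(f)
  tokens <- regmatches(labs, gregexpr("[-+.0-9]+", labs))
  vapply(tokens, function(p) mean(as.numeric(p)), numeric(1))
}

# Midpoints computed directly from the breaks, with no rounding.
exact <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Default labels are rounded, so the parsed midpoints are slightly off.
rounded <- midpoints_from_labels(cut(x, breaks, include.lowest = TRUE))

# Asking cut() for more digits in the labels recovers the exact midpoints here.
precise <- midpoints_from_labels(cut(x, breaks, include.lowest = TRUE, dig.lab = 6))

rbind(exact, rounded, precise)
```

When the original breaks are still available, computing midpoints from them directly sidesteps the parsing altogether. The label route is mainly useful when only the labels survive, as with the dimnames in this thread.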
Re: [R] Cut intervals (character) to numeric midpoint; regex problem
Very nice. I also tried it on some other valid (200, 2) and invalid (2..2) numbers, and it worked as expected. It did not accept --2.597, but that hardly seems to be a plausible result from a cut operation.

-- 
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
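
The kind of edge-case checking described above can be made explicit by validating each label against a strict pattern before converting it. The following base R sketch is one way to do that, under the assumption that labels have the "(a,b]" form. The label vector is made up from the cases mentioned in the message, and malformed labels are mapped to NA rather than passed to as.numeric().

```r
labels <- c("(-2.668,-2.597]", "(200,2]", "(2..2,3]", "(--2.597,1]")

# Strict number pattern: optional sign, digits, optional fractional part.
num <- "[-+]?[0-9]+(\\.[0-9]+)?"
strict <- paste0("^\\(", num, ",", num, "\\]$")

# TRUE only for labels made of exactly two well-formed numbers.
ok <- grepl(strict, labels)

# Parse and average only the well-formed labels; leave the rest as NA.
midpoints <- rep(NA_real_, length(labels))
tokens <- regmatches(labels[ok], gregexpr(num, labels[ok]))
midpoints[ok] <- vapply(tokens, function(p) mean(as.numeric(p)), numeric(1))

data.frame(label = labels, ok = ok, midpoint = midpoints)
```

Checking first means a doubled sign or a doubled decimal point shows up as an explicit NA with ok = FALSE instead of as a conversion warning.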