Re: [R] Cut intervals (character) to numeric midpoint; regex problem

David Winsemius Tue, 01 Dec 2009 12:15:03 -0800

I'm sitting here chuckling. Your solution is just so "pure".

I would offer an enhancement. When I tested with my cuts that had "-" before the digits, your solution dropped them, so my suggestion for the pattern would be: "[-[:digit:].]+"

I will admit that I thought it might fail with positive numbers, but it does not seem to:

> interv <- strapply(testvec, "[-[:digit:].]+", as.numeric, simplify= TRUE)

> interv

       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]    [,8]   [,9] [,10]
[1,] -8.616 -3.084 -2.876 -2.756 -2.668 -2.597 -1.008 -1.0000 0.9914 1.000
[2,] -3.084 -2.876 -2.756 -2.668 -2.597 -2.539 -1.000 -0.9922 1.0000 1.009

I was not able to get that pattern to give acceptable results in gsubfn, so I obviously need to study this more closely.
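For readers without gsubfn installed, the same extract-then-average idea can be sketched in base R. This is only an illustration, not the strapply call above: it assumes every label holds exactly two numbers, and the names pieces and midpts are made up for the example.

```r
# A shortened test vector that includes the integer-endpoint cases
testvec <- c("(-8.616,-3.084]", "(-1.008,-1]", "(-1,-0.9922]",
             "(0.9914,1]", "(1,1.009]")

# gregexpr() locates every run of minus signs, digits and dots;
# regmatches() returns the matched substrings, one list element per label
pieces <- regmatches(testvec, gregexpr("[-[:digit:].]+", testvec))

# Each label yields two endpoints, so sapply() gives a 2-row matrix
# (row 1 = lower bounds, row 2 = upper bounds), like the strapply result
interv <- sapply(pieces, as.numeric)

# The midpoint of each interval is the mean of its column
midpts <- colMeans(interv)
midpts
```

The character class treats "-" and "." as ordinary members of a number, so it would presumably need tightening for labels in scientific notation such as "1e+05".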


--
David.

On Dec 1, 2009, at 2:47 PM, Gabor Grothendieck wrote:

You also might want to look at

demo("gsubfn-cut")

On Tue, Dec 1, 2009 at 2:41 PM, David Winsemius <dwinsem...@comcast.net> wrote:

Starting with the head of a 499 element matrix whose column names are now the labels from a cut() operation, I needed to get to a vector of midpoints to serve as the basis for plotting a calibration curve ( exp(linear predictor) vs. :
> dput(head(dimnames(mtcal)[2][[1]])) # was starting point
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]","(-2.876,-2.756]", "(-2.756,-2.668]",
"(-2.668,-2.597]", "(-2.597,-2.539]")
I started this message with the thought of requesting an answer but kept asking myself if I really had checked the docs and tested my understanding. I eventually solved it using gsubfn from the gsubfn package:
testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+.?[[:digit:]]*),(-?[[:digit:]]+.?[[:digit:]]*)\\]",
    ~ (as.numeric(x)+as.numeric(y))/2, testvec))
# I did discover that carriage returns in the middle of the pattern will not give desired results, so if this is broken by your mail-client, be sure to rejoin in the console.
The extra "?"'s after the decimal point are in there because I had 4 NA's around the median linear predictor:
> dimnames(mtcal)[2][[1]][which(is.na(testintvl))]
[1] "(-1.008,-1]"  "(-1,-0.9922]" "(0.9914,1]"   "(1,1.009]"
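The reason for those NA's can be illustrated with a small base-R sketch: a pattern that insists on a decimal part does not match labels with an integer endpoint, so the label is presumably left unconverted and as.numeric() returns NA. The helper interval_pattern and the variable names below are hypothetical, and the decimal point is escaped here, unlike in the pattern above.

```r
labels <- c("(-2.668,-2.597]", "(-1.008,-1]", "(1,1.009]")

# Hypothetical helper: wrap a number sub-pattern into "(a,b]" with two
# capture groups; paste0() also avoids line breaks inside the pattern
interval_pattern <- function(num) paste0("^\\((", num, "),(", num, ")\\]$")

strict  <- interval_pattern("-?[[:digit:]]+\\.[[:digit:]]+")   # decimal part required
relaxed <- interval_pattern("-?[[:digit:]]+\\.?[[:digit:]]*")  # decimal part optional

grepl(strict, labels)    # TRUE FALSE FALSE: integer endpoints are missed
grepl(relaxed, labels)   # TRUE TRUE TRUE

# regexec() keeps the capture groups: element 1 is the whole match,
# elements 2 and 3 are the lower and upper bounds
parts <- regmatches(labels, regexec(relaxed, labels))
midpoints <- sapply(parts, function(p) mean(as.numeric(p[2:3])))
midpoints
```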

So a better test vector would be:
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]","(-2.876,-2.756]", "(-2.756,-2.668]","(-2.668,-2.597]", "(-2.597,-2.539]", "(-1.008,-1]","(-1,-0.9922]", "(0.9914,1]", "(1,1.009]" )
> testintvl <-as.numeric(gsubfn("\\((-?[[:digit:]]+.?[[:digit:]]*),(-?[[:digit:]]+.?[[:digit:]]*)\\]",
+ ~ (as.numeric(x)+as.numeric(y))/2,  testvec))

> testintvl
 [1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680 -1.0040 -0.9961  0.9957  1.0045

I offer this to those who may feel regex challenged (as I often do). The gsubfn function is pretty slick. I don't see an author listed for the function, but the author of the package documents is Gabor Grothendieck.
--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


