Thanks, Marc, for your reply and detailed explanation. As you said, I agree that using the stringr package doesn't gain me anything really important; however, I have already written a long code-book and do not want to change anything now. Besides, the function names in stringr are more meaningful.
I have one more query here. Does "\\1" mean that I want to report the selected string (instead of replacing it with something)? What are the other related things? Can you point me to an online reference?

Thanks,

-----Original Message-----
From: Marc Schwartz [mailto:marc_schwa...@me.com]
Sent: 07 July 2011 21:54
To: Bogaso Christofer
Cc: r-help@r-project.org
Subject: Re: [R] Working with string

On Jul 7, 2011, at 11:21 AM, Bogaso Christofer wrote:

> Hi there, I have to extract a relevant portion from a given
> string, which is a mix of numbers and characters. It has the
> following sequence:
>
> Some string - some numbers - "c/C" (or "p/P") - then again some set
> of numbers.
>
> Examples of such strings are "fdahsdfcha163517253c463278643",
> "fdahsdfcha163517253C463278643", "fdahsdfcha163517253P463278643",
> "fdahsdfcha163517253p463278643", etc.
>
> I have tried using the latest stringr package to accomplish that. Here is
> my try:
>
>> library(stringr)
>> str_extract("fdahsdfcha163517253c463278643", "[c]")
>
> [1] "c"
>
> But it seems that the above code is fetching the "c" from "fdahsdfcha" only.
> My goal is to find out which of "C/c/P/p" sits between the above two sets
> of numbers. Can somebody help me do that? I would like to use stringr
> syntax because I am already using a lot of other functions from that package.
> If I can do it using that package, it would be good in terms of consistency.
>
> Thanks for your help.

I don't use 'stringr', but you can get the desired result using ?gsub:

x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
       "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")

> gsub(".+[0-9]+([cCpP])[0-9]+", "\\1", x)
[1] "c" "C" "P" "p"

The regex in the first argument tells gsub to find a sequence of any characters, followed by a sequence of numbers, followed by a single 'c', 'C', 'p' or 'P', finally followed by a sequence of numbers.
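A likely reason the str_extract() attempt picked up the "c" from "fdahsdfcha" is that a regex search reports the leftmost match, and the first 'c' in the string sits in that letter prefix. The sketch below uses only base R's regexpr() and regmatches() (not stringr) to illustrate this, and shows one alternative to the back-reference approach: require a digit on each side of the letter, then keep the middle character. It assumes every string contains such a digit-letter-digit run; regmatches() silently drops elements that have no match.

```r
# Minimal base-R sketch; no stringr needed.
x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
       "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")

# "[c]" alone matches the leftmost 'c', which is inside the "fdahsdfcha" prefix.
pos <- as.integer(regexpr("[c]", x[1]))   # 8

# Requiring a digit on both sides skips the prefix; this matches e.g. "3c4".
hits <- regmatches(x, regexpr("[0-9][cCpP][0-9]", x))

# Keep only the middle character of each three-character match.
flags <- substr(hits, 2, 2)               # "c" "C" "P" "p"
```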
Surrounding the [cCpP] in parens allows us to use a 'back reference' and return what is found within the parens using the "\\1" in the second argument.

From a brief review of the stringr manual, it looks like str_extract() supports the use of a regex for the pattern argument, but does not support the use of back references. It looks like str_replace_all() is a wrapper to gsub(), so you may want to look at that function and the examples for it. Thus, the syntax might be something like:

str_replace_all(x, ".+[0-9]+([cCpP])[0-9]+", "\\1")

and therefore, I am not sure what you are really saving by using it versus gsub() directly.

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
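On the follow-up question about "\\1": in the replacement argument of sub() and gsub(), "\\1" through "\\9" stand for whatever the first through ninth parenthesized groups in the pattern matched, so matched pieces can be kept, dropped or reordered. R's own ?regex and ?sub help pages describe this. The sketch below is illustrative only: the pattern and variable names are not from the thread, and it assumes the prefix consists of lowercase letters. It uses sub() rather than gsub() because the anchored pattern can match at most once per string.

```r
x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253P463278643")

# Four capture groups: letter prefix, first digits, the c/C/p/P flag, last digits.
pat <- "^([a-z]+)([0-9]+)([cCpP])([0-9]+)$"

# Each group is referred to by its number in the replacement string.
flag    <- sub(pat, "\\3", x)          # "c" "P"
before  <- sub(pat, "\\2", x)          # "163517253" "163517253"
swapped <- sub(pat, "\\4\\3\\2", x)    # last digits, flag, first digits

# With perl = TRUE, "\\U" upper-cases the rest of the replacement.
upper_flag <- sub(pat, "\\U\\3", x, perl = TRUE)  # "C" "P"
```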