[R] matching problem
Hi R gurus,

I have a matching problem that I can't solve. I have tried multiple solutions and searched various help sites, but I can't get it to work.

This is the problem:

    myexstrings = c("*AAA.AA","BBB BB","*.CCC.","**dd- d")

What I want to do is remove any non-word characters at the beginning, and everything from the first non-word character that follows the first run of word characters, so that the result becomes:

    c("AAA","BBB","CCC","dd")

I can figure out the start: sub("^\\W*","", myexstrings,perl=T) will remove the unwanted beginnings, but then there is the rest.

And please, no links to any help pages; I have been looking at most of them for the last hour without any success.

Thanks
Regards
Tom

--
View this message in context: http://www.nabble.com/matching-problem-tp18152158p18152158.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] matching problem
I have a matching problem that I can't solve.

    mystring = "xxx{XX}yy{YYY}zzz{Z}"

where "x","X","y","Y","z","Z" can basically be anything: letters, digits, etc. I'm only interested in the content within each "{}".

I am close, but not really there yet.

    library(gsubfn)
    strapply(mystring,"\\{[^\\}]+",, perl=F)

gives me

    [[1]]
    [1] "{XX"  "{YYY" "{Z"

but what should I add to the code to remove the "{" from the answer?

Regards Tom

--
View this message in context: http://www.nabble.com/matching-problem-tp18850718p18850718.html
[R] Matching Problem
Hi,

I have this vector of strings:

    MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")

where I want to extract only the text after "I(" and before "^", so that the result contains only c("Test1","Test2","Test1.Test2").

I am not very skilled in the use of matching patterns, so bear with me, but I believe I should use gsub('^.\\(', "",MyData) for removing the "I(" and gsub("\\^.+", '',MyData) for the end. But there has got to be a more elegant way that does the trick in one go.

So I would appreciate it if anyone could give me some advice.

Thanks Tom

--
View this message in context: http://www.nabble.com/Matching-Problem-tp15430660p15430660.html
Re: [R] matching problem
On 27 Jun 2008, at 12:23, Tom.O wrote:
> This is the problem
> myexstrings = c("*AAA.AA","BBB BB","*.CCC.","**dd- d")
> [...]
> I can figure out the start, sub("^\\W*","", myexstrings,perl=T) will
> remove the unwanted beginnings but then its the rest.

Try

    gsub("\\W*","", myexstrings,perl=T)

Cheers,

--Hans
Re: [R] matching problem
Well, I have tried that and it's unfortunately not the solution. It returns all the word characters in the string, but I don't want the characters after the first non-word character that follows them. Only the starting characters are of interest.

    > gsub("\\W*","", myexstrings,perl=T)
    [1] "AAAAA" "BBBBB" "CCC"   "ddd"

Regards Tom

Hans-Jörg Bibiko wrote:
> Try
>
> gsub("\\W*","", myexstrings,perl=T)
>
> Cheers,
>
> --Hans

--
View this message in context: http://www.nabble.com/matching-problem-tp18152158p18153583.html
Re: [R] matching problem
This should do what you want:

    > myexstrings = c("*AAA.AA","BBB BB","*.CCC.","**dd- d")
    > a = gsub("^\\W*","", myexstrings,perl=T)
    > b = gsub("\\W.*", "", a, perl=T)
    > b
    [1] "AAA" "BBB" "CCC" "dd"

The first call removes any non-word characters from the beginning (as you already figured out). The second removes the first remaining non-word character AND everything following it.

on 06/27/2008 06:23 AM Tom.O said the following:
> This is the problem
> myexstrings = c("*AAA.AA","BBB BB","*.CCC.","**dd- d")
> [...]
> so that the string becomes:
>
> c("AAA","BBB","CCC","dd")
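The two-step approach above can be wrapped in a small helper. This is only a sketch: the function name `first_word` is hypothetical and not from the thread, and it uses sub() with R's default regex engine rather than gsub() with perl=T, which should behave the same for these patterns since each one can match at most once per string.

```r
# Hypothetical helper (not from the thread): keep only the first run
# of word characters in each string.
first_word <- function(x) {
  x <- sub("^\\W*", "", x)  # step 1: drop leading non-word characters
  sub("\\W.*$", "", x)      # step 2: drop the next non-word character and the rest
}

myexstrings <- c("*AAA.AA", "BBB BB", "*.CCC.", "**dd- d")
result <- first_word(myexstrings)
print(result)
```

On the sample vector this should give the four strings asked for in the original post.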
Re: [R] matching problem
On 27 Jun 2008, at 13:56, Tom.O wrote:
> Well I have tried that and it's unfortuanally not the solution.
> [...]
> Only the starting characters ore of interest.

Oops, try this one:

    gsub("^\\W*(\\w+)\\W.*","\\1", myexstrings,perl=T)

--Hans
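One caveat with the single-call pattern above: it requires a non-word character after the first run of word characters, so a string such as "*AAA" (an input not in the original post, added here only for illustration) would come back unchanged. A possible variant, offered as a sketch rather than as the poster's method, lets `.*` absorb whatever follows:

```r
# "*AAA" is an illustrative extra input, not part of the original data.
x <- c("*AAA.AA", "BBB BB", "*.CCC.", "**dd- d", "*AAA")

# Pattern from the thread: needs a trailing non-word character to match.
strict <- gsub("^\\W*(\\w+)\\W.*", "\\1", x, perl = TRUE)

# Relaxed variant: \\w+ is greedy, so it takes the whole first word run
# and .* consumes the remainder, which may be empty.
relaxed <- sub("^\\W*(\\w+).*$", "\\1", x, perl = TRUE)

print(strict)
print(relaxed)
```

Both versions agree on the four strings from the original post; they differ only on the extra input.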
Re: [R] matching problem
Thanks guys, all of you. You have just made this weekend a much happier weekend.

Regards Tom

Hans-Jörg Bibiko wrote:
> Oops,
>
> try this one:
>
> gsub("^\\W*(\\w+)\\W.*","\\1", myexstrings,perl=T)
>
> --Hans

--
View this message in context: http://www.nabble.com/matching-problem-tp18152158p18156086.html
Re: [R] matching problem
Here is a solution using strapply from the gsubfn package:

    library(gsubfn)
    strapply(myexstrings, "(\\w+).*", backref = -1, simplify = c)

It matches the first string of word characters followed by anything else, and then returns the first backreference in each match, i.e. the portion within parentheses, simplifying it all into a character vector (rather than a list).

On Fri, Jun 27, 2008 at 6:23 AM, Tom.O wrote:
> This is the problem
> myexstrings = c("*AAA.AA","BBB BB","*.CCC.","**dd- d")
> [...]
> so that the string becomes:
>
> c("AAA","BBB","CCC","dd")
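For readers without gsubfn installed, the same "extract the match instead of deleting everything else" idea can be sketched in base R with regexpr() and regmatches(). This is an alternative technique, not what the post above uses, and it assumes every input contains at least one word character, because regmatches() silently drops elements that have no match.

```r
myexstrings <- c("*AAA.AA", "BBB BB", "*.CCC.", "**dd- d")

# regexpr() locates the first run of word characters in each string;
# regmatches() pulls out exactly those substrings.
first_runs <- regmatches(myexstrings, regexpr("\\w+", myexstrings))
print(first_runs)
```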
Re: [R] matching problem
One option is:

    strapply(mystring, "\\{[^\\}]+" , function(x)gsub("\\{", "", x), perl = F)

On Wed, Aug 6, 2008 at 10:01 AM, Tom.O wrote:
> library(gsubfn)
> strapply(mystring,"\\{[^\\}]+",, perl=F)
>
> gives me
> [[1]]
> [1] "{XX" "{YYY" "{Z"
>
> but what should I add in the code to remove the "{" in the answer

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] matching problem
On Wed, Aug 6, 2008 at 9:01 AM, Tom.O wrote:
> library(gsubfn)
> strapply(mystring,"\\{[^\\}]+",, perl=F)
>
> gives me
> [[1]]
> [1] "{XX" "{YYY" "{Z"
>
> but what should I add in the code to remove the "{" in the answer

Surround the portion you want with parentheses. That makes it a backreference, and you can ask for the backreference rather than the entire match. backref = -1 means return 1 backreference but not the entire match.

    strapply(mystring,"\\{([^\\}]+)", backref = -1)
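A rough base R counterpart to the backreference idea, for comparison only: gregexpr() finds every "{...}" group, and the braces are stripped afterwards. The variable names `groups` and `inside` are made up for this sketch, and it assumes the braces are not nested.

```r
mystring <- "xxx{XX}yy{YYY}zzz{Z}"

# All "{...}" groups in the string, braces included.
groups <- regmatches(mystring, gregexpr("\\{[^}]+\\}", mystring))[[1]]

# Strip the surrounding braces to keep only the content.
inside <- gsub("[{}]", "", groups)
print(inside)
```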
Re: [R] Matching Problem
    > sub("^I\\((.*)\\^.*$", "\\1", MyData)
    [1] "Test1"       "Test2"       "Test1"       "Test2"       "Test1.Test2"

In this case, there is a simple way of discovering which variable names are present, though:

    > all.vars(parse(text = MyData))
    [1] "Test1"       "Test2"       "Test1.Test2"

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163 AUSTRALIA
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
On Behalf Of Tom.O
Sent: Tuesday, 12 February 2008 8:44 PM
To: r-help@r-project.org
Subject: [R] Matching Problem

> MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
> where I want to extract only the text after "I(" and before "^" so that the
> string returned only contain c("Test1","Test2","Test1.Test2")
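The all.vars() route works because each string happens to be a syntactically valid R expression; it is a parser-based alternative to regular expressions, not a general string tool. A short sketch of the two steps, with the intermediate names `exprs` and `vars` chosen here for illustration:

```r
MyData <- c("Test1", "Test2", "I(Test1^2)", "I(Test2^3)", "I(Test1.Test2^2)")

# parse() turns each string into an unevaluated R expression.
exprs <- parse(text = MyData)

# all.vars() collects the distinct variable names used in those
# expressions; function names such as I and ^ are not included.
vars <- all.vars(exprs)
print(vars)
```

If any element were not valid R syntax, parse() would stop with an error, so the regex versions remain the safer choice for arbitrary text.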
Re: [R] Matching Problem
Here is one way of doing it:

    > MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
    > x <- gsub("^(.*\\(|)([^^)]*|.*).*", "\\2", MyData)
    > x
    [1] "Test1"       "Test2"       "Test1"       "Test2"       "Test1.Test2"
    > unique(x)
    [1] "Test1"       "Test2"       "Test1.Test2"

On Feb 12, 2008 5:44 AM, Tom.O wrote:
> MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
> where I want to extract only the text after "I(" and before "^" so that the
> string returned only contain c("Test1","Test2","Test1.Test2")

--
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?
Re: [R] Matching Problem
A way to do it is to use groups (in Perl terminology) in connection with regular expressions. My (limited) understanding of it is as follows. Consider

    > s <- "BBBEEE"
    > gsub("BBB(.*)EEE(.*)", "\\1AAA\\2", s)
    [1] "AAA"

The terms in the parentheses are groups, which you can refer to with \\1, \\2 etc. in the replacement string. So a solution to your problem could be:

    > MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
    > gsub("^I\\((.*)\\^.+", "\\1", MyData)
    [1] "Test1"       "Test2"       "Test1"       "Test2"       "Test1.Test2"

Now use unique on the result.

Regards
Søren

From: on behalf of Tom.O
Sent: Tue 12-02-2008 11:44
To: r-help@r-project.org
Subject: [R] Matching Problem

> MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
> where I want to extract only the text after "I(" and before "^" so that the
> string returned only contain c("Test1","Test2","Test1.Test2")
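To make the role of the groups easier to see, here is a small sketch with a non-empty string on each side of the markers. The input "BBBxxEEEyy" is invented for illustration; the second half applies the same idea to MyData and finishes with unique().

```r
# Group 1 captures what sits between BBB and EEE, group 2 what follows EEE.
s <- "BBBxxEEEyy"
swapped <- sub("BBB(.*)EEE(.*)", "\\1AAA\\2", s)
print(swapped)

MyData <- c("Test1", "Test2", "I(Test1^2)", "I(Test2^3)", "I(Test1.Test2^2)")

# Strings without "I(" do not match and pass through unchanged.
vars <- unique(sub("^I\\((.*)\\^.+", "\\1", MyData))
print(vars)
```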
Re: [R] Matching Problem
Maybe:

    all.vars(parse(text=paste(MyData, collapse="+")))

On 12/02/2008, Tom.O wrote:
> MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
> where I want to extract only the text after "I(" and before "^" so that the
> string returned only contain c("Test1","Test2","Test1.Test2")

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Matching Problem
See this post:

    https://stat.ethz.ch/pipermail/r-help/2008-February/153819.html

as well as the rest of that thread.

On Feb 12, 2008 5:44 AM, Tom.O wrote:
> MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
> where I want to extract only the text after "I(" and before "^" so that the
> string returned only contain c("Test1","Test2","Test1.Test2")
Re: [R] Matching Problem
    unique(gsub("^.*\\((.+)\\^.*", "\\1", MyData))

?

b

On Feb 12, 2008, at 5:44 AM, Tom.O wrote:
> MyData <- c("Test1","Test2","I(Test1^2)","I(Test2^3)","I(Test1.Test2^2)")
> where I want to extract only the text after "I(" and before "^" so that the
> string returned only contain c("Test1","Test2","Test1.Test2")