On Sat, Oct 30, 2010 at 9:43 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Oct 30, 2010, at 8:42 AM, Gabor Grothendieck wrote:
>
>> On Fri, Oct 29, 2010 at 6:54 PM, M.Ribeiro <mresende...@yahoo.com.br>
>> wrote:
>>>
>>> So, I am having a tricky reference file to extract information from.
>>>
>>> The format of the file is
>>>
>>> x 1 + 4 * 3 + 5 + 6 + 11 * 0.5
>>>
>>> So, the elements that are not being multiplied (1, 5 and 6) and the
>>> elements before the multiplication sign (4 and 11) means actually the
>>> reference for the row in a matrix where I need to extract the element
>>> from.
>>>
>>> The numbers after the multiplication sign are regular numbers
>>> Ex:
>>>
>>> > x<-matrix(20:35)
>>>
>>> I would like to read the rows 1,4,5,6 and 11 and sum then. However the
>>> numbers in the elements row 4 and 11 are multiplied by 3 and 0.5
>>>
>>> So it would be
>>> 20 + 23 * 3 + 24 + 25 + 30 * 0.5.
>>>
>>> And I have this format in different files so I can't do all by hand.
>>> Can anybody help me with a script that can differentiate this?
>>
>>
>> I assume that every number except for the second number in the pattern
>> number * number is to be replaced by that row number in x. Try this.
>> We define a regular expression which matches the first number ([0-9]+)
>> of each potential pair and optionally (?) spaces ( *) a star (\\*),
>> more spaces ( *) and digits [0-9.]+ passing the first and second
>> backreferences (matches to the parenthesized portions of the regular
>> expression) to f and inserting the output of f where the matches had
>> been.
>>
>> library(gsubfn)
>> f <- function(a, b) paste(x[as.numeric(a)], b)
>> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>>
>> If the objective is to then perform the calculation that that
>> represents then try this:
>> sapply(s2, function(x) eval(parse(text = x)))
>>
>> For example,
>>
>> > s <- c("1 + 4 * 3 + 5 + 6 + 11 * 0.5", "1 + 4 * 3 + 5 + 6 + 11 * 0.5")
>> > x <- matrix(20:35)
>> > f <- function(a, b) paste(x[as.numeric(a)], b)
>> > s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>> > s2
>> [1] "20 + 23 * 3 + 24 + 25 + 30 * 0.5" "20 + 23 * 3 + 24 + 25 + 30 * 0.5"
>> >
>> > sapply(s2, function(x) eval(parse(text = x)))
>> 20 + 23 * 3 + 24 + 25 + 30 * 0.5 20 + 23 * 3 + 24 + 25 + 30 * 0.5
>>                              153                              153
>>
>> For more see the gsubfn home page at http://gsubfn.googlecode.com
>
>
> I am scratching my head regarding the gsubfn workings. It appears that as
> gsubfn moves across the input strings that it will either match just
> "[0-9+]" or it will match "[0-9+] *\\* *[0-9.]+?".

In the regular expression "([0-9]+)( *\\* *[0-9.]+)?" it matches the
first (...) and then the (...)? part. ? means 0 or 1 occurrences, so
the second group either matches its content or, if that is not
possible, matches the empty string.

>
> In either case the match will do a lookup in x[] for the first match using
> the "a" index, and if there is a match for the second position assigned to
> "*b" then that x[a] will be followed by "*b" and is therefore destined to
> be multiplied by "b". I cannot quite figure out how the NULL value gets
> not-matched to the second back-reference and then doesn't screw up the f()
> function by only providing one argument to a two argument function. Maybe
> it's due to this? (So can you comment on how optional back-references return
> values?)

(...)? says to match 0 or 1 occurrences of the ... expression. If (...)
does not match then (...)? will still succeed by matching the empty
string, so the function is always called with two arguments. Try this:

> s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
> g <- function(a, b) sprintf("<a='%s'><b='%s'>", a, b)
> gsubfn("([0-9]+)( *\\* *[0-9.]+)?", g, s)
[1] "<a='1'><b=''> + <a='4'><b=' * 3'> + <a='5'><b=''> + <a='6'><b=''> + <a='11'><b=' * 0.5'>"

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
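
The empty-string behaviour of the optional group can also be checked
without gsubfn. The following is only a base-R sketch of the same idea,
not the gsubfn approach used in this thread: it pulls out each full
match with gregexpr(), uses sub() with \\1 and \\2 to recover the two
groups, and writes the replacements back with `regmatches<-`. The names
pattern, tokens, a, b and result are made up for this illustration.

```r
x <- matrix(20:35)
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
pattern <- "([0-9]+)( *\\* *[0-9.]+)?"

# Every full match of the pattern in s: a row index, optionally
# followed by "* multiplier".
m <- gregexpr(pattern, s)
tokens <- regmatches(s, m)[[1]]

# Each token is a complete match, so sub() replaces the whole token
# with the requested group. When the optional group did not take part
# in the match, \\2 expands to the empty string.
a <- sub(pattern, "\\1", tokens)
b <- sub(pattern, "\\2", tokens)

# Look up row a of x and re-attach the (possibly empty) multiplier.
s2 <- s
regmatches(s2, m) <- list(paste0(x[as.numeric(a)], b))

# Evaluate the rewritten expression.
result <- eval(parse(text = s2))
```

Here b has one element per match, with "" wherever no "* number" part
followed the row index, which is why a two-argument function can always
be called with both arguments.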