Re: [R] Differenciate numbers from reference for rows
On Sat, Oct 30, 2010 at 9:43 AM, David Winsemius wrote:
> I am scratching my head regarding the gsubfn workings. It appears that as
> gsubfn moves across the input strings that it will either match just
> "[0-9]+" or it will match "[0-9]+ *\\* *[0-9.]+".

In the regular expression "([0-9]+)( *\\* *[0-9.]+)?" it matches the first (...) and then the (...)? part. The ? means 0 or 1 occurrences, so the second group either matches its content or, if that is not possible, matches the empty string.

> In either case the match will do a lookup in x[] for the first match using
> the "a" index, and if there is a match for the second position assigned to
> "*b" then that x[a] will be followed by "*b" and is therefore destined to
> be multiplied by "b". I cannot quite figure out how the NULL value gets
> not-matched to the second back-reference and then doesn't screw up the f()
> function by only providing one argument to a two argument function. Maybe
> it's due to this? (So can you comment on how optional back-references return
> values?)

(...)? says to match 0 or 1 occurrences of the ... expression. If (...) does not match then (...)? will be successful in matching the empty string. The function is always called with two arguments.

Try this:

> s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
> g <- function(a, b) sprintf("", a, b)
> gsubfn("([0-9]+)( *\\* *[0-9.]+)?", g, s)
[1] " + + + + "

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
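The point about the optional group can also be checked without gsubfn. What follows is a minimal base-R sketch, not code from the thread: it uses gsub() with the backreferences \\1 and \\2, and the square-bracket delimiters are an arbitrary choice made here so that each captured piece is visible. Where the optional group takes no part in a match, \\2 contributes an empty string, which corresponds to the empty second argument that the replacement function receives.

```r
# Same pattern as in the thread: a row index, optionally followed by "* number".
pattern <- "([0-9]+)( *\\* *[0-9.]+)?"
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"

# Wrap each match as [first group|second group]; the brackets are illustrative only.
marked <- gsub(pattern, "[\\1|\\2]", s)
print(marked)
```

Terms without a multiplier show nothing after the bar, while "4 * 3" and "11 * 0.5" show the multiplier text, star included, in the second slot.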
Re: [R] Differenciate numbers from reference for rows
On Oct 30, 2010, at 8:42 AM, Gabor Grothendieck wrote:
> library(gsubfn)
> f <- function(a, b) paste(x[as.numeric(a)], b)
> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>
> If the objective is to then perform the calculation that that
> represents then try this:
>
> sapply(s2, function(x) eval(parse(text = x)))

I am scratching my head regarding the gsubfn workings. It appears that as gsubfn moves across the input strings it will either match just "[0-9]+" or it will match "[0-9]+ *\\* *[0-9.]+". In either case the match will do a lookup in x[] for the first match using the "a" index, and if there is a match for the second position assigned to "*b" then that x[a] will be followed by "*b" and is therefore destined to be multiplied by "b".

I cannot quite figure out how the NULL value gets not-matched to the second back-reference and then doesn't screw up the f() function by only providing one argument to a two argument function. Maybe it's due to this? (So can you comment on how optional back-references return values?)

> paste("a", NULL)
[1] "a "

Furthermore, somehow (and this is further functional magic I am missing) these results are concatenated in a string, and then evaluated, a step which I do get.

-- 
David Winsemius, MD
West Hartford, CT
Re: [R] Differenciate numbers from reference for rows
On Fri, Oct 29, 2010 at 6:54 PM, M.Ribeiro wrote:
> So, I am having a tricky reference file to extract information from.
>
> The format of the file is
>
> x 1 + 4 * 3 + 5 + 6 + 11 * 0.5
>
> So, the elements that are not being multiplied (1, 5 and 6) and the elements
> before the multiplication sign (4 and 11) means actually the reference for
> the row in a matrix where I need to extract the element from.
>
> The numbers after the multiplication sign are regular numbers
> Ex:
>
>> x<-matrix(20:35)
>
> I would like to read the rows 1,4,5,6 and 11 and sum them. However the
> numbers in the elements row 4 and 11 are multiplied by 3 and 0.5
>
> So it would be
> 20 + 23 * 3 + 24 + 25 + 30 * 0.5.
>
> And I have this format in different files so I can't do all by hand.
> Can anybody help me with a script that can differentiate this?

I assume that every number, except for the second number in the pattern number * number, is a row index to be replaced by the value in that row of x. Try this. We define a regular expression which matches the first number ([0-9]+) of each potential pair and, optionally (?), spaces ( *), a star (\\*), more spaces ( *) and digits [0-9.]+. gsubfn passes the first and second backreferences (the matches to the parenthesized portions of the regular expression) to f and inserts the output of f where the matches had been.

library(gsubfn)
f <- function(a, b) paste(x[as.numeric(a)], b)
s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)

If the objective is then to perform the calculation that the resulting string represents, try this:

sapply(s2, function(x) eval(parse(text = x)))

For example,

> s <- c("1 + 4 * 3 + 5 + 6 + 11 * 0.5", "1 + 4 * 3 + 5 + 6 + 11 * 0.5")
> x <- matrix(20:35)
> f <- function(a, b) paste(x[as.numeric(a)], b)
> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
> s2
[1] "20 + 23 * 3 + 24 + 25 + 30 * 0.5" "20 + 23 * 3 + 24 + 25 + 30 * 0.5"
> sapply(s2, function(x) eval(parse(text = x)))
20 + 23 * 3 + 24 + 25 + 30 * 0.5 20 + 23 * 3 + 24 + 25 + 30 * 0.5 
                             153                              153 

For more see the gsubfn home page at http://gsubfn.googlecode.com

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
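For readers without gsubfn installed, the substitute-then-evaluate idea can be approximated in base R. This is only a sketch under assumptions, not the method posted above: it reuses the same pattern, takes gregexpr() and regmatches() as the stand-in for gsubfn, and the variable names tokens, idx, rest and result are introduced here for illustration.

```r
x <- matrix(20:35)
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
pattern <- "([0-9]+)( *\\* *[0-9.]+)?"

# Locate every "index" or "index * multiplier" token in the string.
m <- gregexpr(pattern, s)
tokens <- regmatches(s, m)[[1]]

# Split each token into its leading row index and whatever follows it.
idx  <- as.integer(sub("^([0-9]+).*$", "\\1", tokens))
rest <- sub("^[0-9]+", "", tokens)

# Put the looked-up values back where the tokens were.
s2 <- s
regmatches(s2, m) <- list(paste0(x[idx], rest))

# Evaluate the rewritten arithmetic expression.
result <- eval(parse(text = s2))
print(s2)
print(result)
```

As with the original, eval(parse()) runs whatever the string contains, so this is only appropriate for reference files whose contents are trusted.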
Re: [R] Differenciate numbers from reference for rows
On Oct 29, 2010, at 11:16 PM, Dennis Murphy wrote:
> On Fri, Oct 29, 2010 at 3:54 PM, M.Ribeiro wrote:
>> So, I am having a tricky reference file to extract information from.
>>
>> The format of the file is
>>
>> x 1 + 4 * 3 + 5 + 6 + 11 * 0.5

I saw the beginning of this task as parsing to extract the numbers from a character string (possibly decimal numbers in the case of the third and seventh positions) delimited by + and *:

library(gsubfn)
x <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
xin <- readLines(textConnection(x))
xp <- strapply(xin, "^(\\d+) \\+ (\\d+) \\* (\\d+\\.*\\d*) \\+ (\\d+) \\+ (\\d+) \\+ (\\d+) \\* (\\d+\\.*\\d*)", c)
sapply(xp, as.numeric)
     [,1]
[1,]  1.0
[2,]  4.0
[3,]  3.0
[4,]  5.0
[5,]  6.0
[6,] 11.0
[7,]  0.5

-- 
David Winsemius, MD
West Hartford, CT
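The pattern above is tied to one layout of exactly seven numbers. A more general parse could look like the following sketch, which assumes the expressions contain only non-negative numbers joined by + and *, with no parentheses. parse_terms is a hypothetical helper introduced here, not a function from the thread or from gsubfn; it splits on + first, then on *, and treats a missing multiplier as 1.

```r
# Hypothetical helper (not from the thread): split an expression such as
# "1 + 4 * 3 + 5" into row indices and multipliers.
parse_terms <- function(s) {
  terms <- trimws(strsplit(s, "+", fixed = TRUE)[[1]])
  parts <- strsplit(terms, "*", fixed = TRUE)
  data.frame(
    row  = vapply(parts, function(p) as.integer(trimws(p[1])), integer(1)),
    # A term without "*" gets an implicit multiplier of 1.
    mult = vapply(parts,
                  function(p) if (length(p) > 1) as.numeric(trimws(p[2])) else 1,
                  numeric(1))
  )
}

x <- matrix(20:35)
terms <- parse_terms("1 + 4 * 3 + 5 + 6 + 11 * 0.5")

# Weighted sum of the referenced rows of x.
total <- sum(x[terms$row] * terms$mult)
print(terms)
print(total)
```

Keeping the indices and multipliers as two numeric columns avoids evaluating text as code, and the same two vectors can be fed to a crossprod()-style calculation.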
Re: [R] Differenciate numbers from reference for rows
Hi:

x <- matrix(20:35, ncol = 1)
u <- c(1, 4, 5, 6, 11)   # 'x values'
m <- c(1, 3, 1, 1, 0.5)

# Function to compute the inner product of the multipliers with the extracted
# elements of x determined by u
f <- function(mat, inputs, mults) crossprod(mat[inputs], mults)

f(x, u, mults = c(1, 3, 1, 1, 0.5))
     [,1]
[1,]  153
20 + 23 * 3 + 24 + 25 + 30 * 0.5
[1] 153

The function is flexible enough to allow you to play with the input matrix (although a vector would also work), the 'observation vector' inputs and the set of multipliers. Here's one way to apply it to several index vectors at once (not necessarily the most efficient):

uv <- matrix(sample(1:15, 25, replace = TRUE), ncol = 5)
uv   # like an X matrix, where each row provides the input values of the vars
     [,1] [,2] [,3] [,4] [,5]
[1,]   12    8   11   10   15
[2,]   15   11   14   14    8
[3,]    4    8    4   10   12
[4,]   10    5    2    1    7
[5,]   11    4    9    1   11

# Apply the function f to each row of uv:
apply(uv, 1, function(y) f(x, y, mults = c(1, 3, 1, 1, 0.5)))
[1] 188.0 203.5 171.5 155.0 162.0

The direct matrix version:

crossprod(t(matrix(x[uv], ncol = 5)), c(1, 3, 1, 1, 0.5))
      [,1]
[1,] 188.0
[2,] 203.5
[3,] 171.5
[4,] 155.0
[5,] 162.0

Notice that the apply() call returns a vector whereas crossprod() returns a matrix. x[uv] selects the x values associated with the indices in uv and returns a vector in column-major order. The crossprod() call transposes the reshaped x[uv] and then 'matrix' multiplies it by the vector c(1, 3, 1, 1, 0.5).

HTH,
Dennis

On Fri, Oct 29, 2010 at 3:54 PM, M.Ribeiro wrote:
> So, I am having a tricky reference file to extract information from.
>
> The format of the file is
>
> x 1 + 4 * 3 + 5 + 6 + 11 * 0.5
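Because uv in the message is drawn with sample() and no seed, rerunning that code gives different numbers. The sketch below is an illustration rather than part of the original post: it enters by hand the index values that reproduce the quoted results and checks that the row-by-row and matrix calculations agree. The names w, by_row and by_matrix are introduced here, and %*% is used in place of crossprod(t(...)), which computes the same product.

```r
x <- matrix(20:35, ncol = 1)
w <- c(1, 3, 1, 1, 0.5)

# Index matrix entered by hand (one row per set of indices) so the run is reproducible.
uv <- matrix(c(12,  8, 11, 10, 15,
               15, 11, 14, 14,  8,
                4,  8,  4, 10, 12,
               10,  5,  2,  1,  7,
               11,  4,  9,  1, 11),
             ncol = 5, byrow = TRUE)

# Row-by-row version: weighted sum of the selected elements of x.
by_row <- apply(uv, 1, function(idx) sum(x[idx] * w))

# Matrix version: x[uv] comes back in column-major order, so reshaping it with
# ncol = 5 restores the layout of uv before multiplying by the weights.
by_matrix <- drop(matrix(x[uv], ncol = 5) %*% w)

print(by_row)
print(by_matrix)
```

Both vectors should match the five values quoted in the message (188.0, 203.5, 171.5, 155.0, 162.0).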
[R] Differenciate numbers from reference for rows
So, I am having a tricky reference file to extract information from.

The format of the file is

x 1 + 4 * 3 + 5 + 6 + 11 * 0.5

So, the elements that are not being multiplied (1, 5 and 6) and the elements before the multiplication sign (4 and 11) are actually references to the rows of a matrix from which I need to extract the elements.

The numbers after the multiplication sign are regular numbers.
Ex:

> x<-matrix(20:35)
> x
      [,1]
 [1,]   20
 [2,]   21
 [3,]   22
 [4,]   23
 [5,]   24
 [6,]   25
 [7,]   26
 [8,]   27
 [9,]   28
[10,]   29
[11,]   30
[12,]   31
[13,]   32
[14,]   33
[15,]   34
[16,]   35

I would like to read rows 1, 4, 5, 6 and 11 and sum them. However, the elements in rows 4 and 11 are multiplied by 3 and 0.5.

So it would be
20 + 23 * 3 + 24 + 25 + 30 * 0.5.

And I have this format in different files, so I can't do it all by hand.
Can anybody help me with a script that can differentiate this?

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/Differenciate-numbers-from-reference-for-rows-tp3019853p3019853.html
Sent from the R help mailing list archive at Nabble.com.