Re: [R] Differenciate numbers from reference for rows

2010-10-30 Thread Gabor Grothendieck
On Sat, Oct 30, 2010 at 9:43 AM, David Winsemius  wrote:
>
> On Oct 30, 2010, at 8:42 AM, Gabor Grothendieck wrote:
>
>> On Fri, Oct 29, 2010 at 6:54 PM, M.Ribeiro 
>> wrote:
>>>
>>> So, I am having a tricky reference file to extract information from.
>>>
>>> The format of the file is
>>>
>>> x   1 + 4 * 3 + 5 + 6 + 11 * 0.5
>>>
>>> So, the elements that are not being multiplied (1, 5 and 6) and the
>>> elements
>>> before the multiplication sign (4 and 11) means actually the reference
>>> for
>>> the row in a matrix where I need to extract the element from.
>>>
>>> The numbers after the multiplication sign are regular numbers
>>> Ex:
>>>
>>> x<-matrix(20:35)
>>>
>>> I would like to read the rows 1,4,5,6 and 11 and sum then. However the
>>> numbers in the elements row 4 and 11 are multiplied by 3 and 0.5
>>>
>>> So it would be
>>> 20 + 23 * 3 + 24 + 25 + 30 * 0.5.
>>>
>>> And I have this format in different files so I can't do all by hand.
>>> Can anybody help me with a script that can differentiate this?
>>
>>
>> I assume that every number except for the second number in the pattern
>> number * number is to be replaced by that row number in x.  Try this.
>> We define a regular expression which matches the first number ([0-9]+)
>> of each potential pair and optionally (?) spaces ( *) a star (\\*),
>> more spaces ( *) and digits [0-9.]+ passing the first and second
>> backreferences (matches to the parenthesized portions of the regular
>> expression) to f and inserting the output of f where the matches had
>> been.
>>
>> library(gsubfn)
>> f <- function(a, b) paste(x[as.numeric(a)], b)
>> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>>
>> If the objective is to then perform the calculation that that
>> represents then try this:
>> sapply(s2, function(x) eval(parse(text = x)))
>>
>> For example,
>>
>>> s <- c("1 + 4 * 3 + 5 + 6 + 11 * 0.5", "1 + 4 * 3 + 5 + 6 + 11 * 0.5")
>>> x <- matrix(20:35)
>>> f <- function(a, b) paste(x[as.numeric(a)], b)
>>> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>>> s2
>>
>> [1] "20  + 23  * 3 + 24  + 25  + 30  * 0.5" "20  + 23  * 3 + 24  + 25  + 30  * 0.5"
>>>
>>> sapply(s2, function(x) eval(parse(text = x)))
>>
>> 20  + 23  * 3 + 24  + 25  + 30  * 0.5 20  + 23  * 3 + 24  + 25  + 30  * 0.5
>>                                   153                                   153
>>
>> For more see the gsubfn home page at http://gsubfn.googlecode.com
>
>
> I am scratching my head regarding the gsubfn workings. It appears that as
> gsubfn moves across the input strings it will either match just
> "[0-9]+" or it will match "[0-9]+ *\\* *[0-9.]+".

In the regular expression

   "([0-9]+)( *\\* *[0-9.]+)?"

it matches the first (...) and then the (...)? part.  The ? means 0 or 1
occurrences, so the second group either matches its content or, if that
is not possible, matches the empty string.

>
> In either case the match will do a lookup in x[] for the first match using
> the "a" index, and if there is a match for the second position assigned to
> "*b" then that x[a] will be followed by "*b"  and is therefore destined to
> be multiplied by "b". I cannot quite figure out how the NULL value gets
> not-matched to the second back-reference and then doesn't screw up the f()
> function by only providing one argument to a two argument function. Maybe
> it's due to this? (So can you comment on how optional back-references return
> values?)

(...)? says to match 0 or 1 occurrences of the ... expression.  If
(...) does not match then (...)? will be successful in matching the
empty string.  The function is always called with two arguments.  Try
this:

> s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
> g <- function(a, b) sprintf("", a, b)
> gsubfn("([0-9]+)( *\\* *[0-9.]+)?", g, s)
[1] " +  +  + 
+ "
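
For readers without gsubfn installed, the same point can be checked in
base R. This is only a sketch: the angle-bracket format string and the
names pat, hits, groups and shown are illustrative choices, not taken
from the thread. It shows that the optional second group comes back as
an empty string whenever no multiplier follows the number.

```r
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
pat <- "([0-9]+)( *\\* *[0-9.]+)?"

# Every full match of the pattern in s
hits <- regmatches(s, gregexpr(pat, s))[[1]]

# Re-match each hit on its own to recover the two groups;
# element 1 is the whole match, elements 2 and 3 are the groups
groups <- regmatches(hits, regexec(pat, hits))

# Show both groups side by side; an unmatched optional group is ""
shown <- vapply(groups,
                function(g) sprintf("<%s|%s>", g[2], g[3]),
                character(1), USE.NAMES = FALSE)
print(shown)
```

Each element of shown has the row reference before the bar and whatever
the optional group captured after it, so terms without a multiplier show
nothing after the bar.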



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Differenciate numbers from reference for rows

2010-10-30 Thread David Winsemius


On Oct 30, 2010, at 8:42 AM, Gabor Grothendieck wrote:

> On Fri, Oct 29, 2010 at 6:54 PM, M.Ribeiro wrote:
>>
>> So, I am having a tricky reference file to extract information from.
>>
>> The format of the file is
>>
>> x   1 + 4 * 3 + 5 + 6 + 11 * 0.5
>>
>> So, the elements that are not being multiplied (1, 5 and 6) and the
>> elements before the multiplication sign (4 and 11) means actually the
>> reference for the row in a matrix where I need to extract the element from.
>>
>> The numbers after the multiplication sign are regular numbers
>> Ex:
>>
>> x<-matrix(20:35)
>>
>> I would like to read the rows 1,4,5,6 and 11 and sum then. However the
>> numbers in the elements row 4 and 11 are multiplied by 3 and 0.5
>>
>> So it would be
>> 20 + 23 * 3 + 24 + 25 + 30 * 0.5.
>>
>> And I have this format in different files so I can't do all by hand.
>> Can anybody help me with a script that can differentiate this?
>
> I assume that every number except for the second number in the pattern
> number * number is to be replaced by that row number in x.  Try this.
> We define a regular expression which matches the first number ([0-9]+)
> of each potential pair and optionally (?) spaces ( *) a star (\\*),
> more spaces ( *) and digits [0-9.]+ passing the first and second
> backreferences (matches to the parenthesized portions of the regular
> expression) to f and inserting the output of f where the matches had
> been.
>
> library(gsubfn)
> f <- function(a, b) paste(x[as.numeric(a)], b)
> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>
> If the objective is to then perform the calculation that that
> represents then try this:
> sapply(s2, function(x) eval(parse(text = x)))
>
> For example,
>
> s <- c("1 + 4 * 3 + 5 + 6 + 11 * 0.5", "1 + 4 * 3 + 5 + 6 + 11 * 0.5")
> x <- matrix(20:35)
> f <- function(a, b) paste(x[as.numeric(a)], b)
> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
> s2
> [1] "20  + 23  * 3 + 24  + 25  + 30  * 0.5" "20  + 23  * 3 + 24  + 25  + 30  * 0.5"
>
> sapply(s2, function(x) eval(parse(text = x)))
> 20  + 23  * 3 + 24  + 25  + 30  * 0.5 20  + 23  * 3 + 24  + 25  + 30  * 0.5
>                                   153                                   153
>
> For more see the gsubfn home page at http://gsubfn.googlecode.com



I am scratching my head regarding the gsubfn workings. It appears that
as gsubfn moves across the input strings it will either match
just "[0-9]+" or it will match "[0-9]+ *\\* *[0-9.]+".


In either case the match will do a lookup in x[] for the first match  
using the "a" index, and if there is a match for the second position  
assigned to "*b" then that x[a] will be followed by "*b"  and is  
therefore destined to be multiplied by "b". I cannot quite figure out  
how the NULL value gets not-matched to the second back-reference and  
then doesn't screw up the f() function by only providing one argument  
to a two argument function. Maybe it's due to this? (So can you  
comment on how optional back-references return values?)


> paste("a", NULL)
[1] "a "

Furthermore, somehow (and this is further functional magic I am
missing) these results are concatenated in a string, and then
evaluated, a step which I do get.
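
On the concatenation question, a rough base-R picture of what the
substitution step amounts to may help. This is a sketch of the idea, not
gsubfn's actual implementation; the helper name f2 and the use of
`regmatches<-` are assumptions made for illustration. Each match is
replaced in place by the function's return value, so the text between
matches (the " + " separators) is never touched and the result is still
one string.

```r
x <- matrix(20:35)
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
pat <- "([0-9]+)( *\\* *[0-9.]+)?"

# Hypothetical stand-in for f(): takes one full match, splits it into
# its two groups and looks the first one up in x
f2 <- function(hit) {
  g <- regmatches(hit, regexec(pat, hit))[[1]]
  paste(x[as.numeric(g[2])], g[3])
}

m <- gregexpr(pat, s)                 # positions of all matches in s
hits <- regmatches(s, m)[[1]]         # the matched substrings
repl <- vapply(hits, f2, character(1), USE.NAMES = FALSE)

s2 <- s
regmatches(s2, m) <- list(repl)       # write replacements back in place
print(s2)

total <- eval(parse(text = s2))       # evaluate the rewritten expression
print(total)
```

Nothing is explicitly concatenated here: only the matched spans are
swapped out, and everything else in the original string stays where it
was.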


--

David Winsemius, MD
West Hartford, CT



Re: [R] Differenciate numbers from reference for rows

2010-10-30 Thread Gabor Grothendieck
On Fri, Oct 29, 2010 at 6:54 PM, M.Ribeiro  wrote:
>
> So, I am having a tricky reference file to extract information from.
>
> The format of the file is
>
> x   1 + 4 * 3 + 5 + 6 + 11 * 0.5
>
> So, the elements that are not being multiplied (1, 5 and 6) and the elements
> before the multiplication sign (4 and 11) means actually the reference for
> the row in a matrix where I need to extract the element from.
>
> The numbers after the multiplication sign are regular numbers
> Ex:
>
>> x<-matrix(20:35)
>> x
>      [,1]
>  [1,]   20
>  [2,]   21
>  [3,]   22
>  [4,]   23
>  [5,]   24
>  [6,]   25
>  [7,]   26
>  [8,]   27
>  [9,]   28
> [10,]   29
> [11,]   30
> [12,]   31
> [13,]   32
> [14,]   33
> [15,]   34
> [16,]   35
>
> I would like to read the rows 1,4,5,6 and 11 and sum then. However the
> numbers in the elements row 4 and 11 are multiplied by 3 and 0.5
>
> So it would be
> 20 + 23 * 3 + 24 + 25 + 30 * 0.5.
>
> And I have this format in different files so I can't do all by hand.
> Can anybody help me with a script that can differentiate this?


I assume that every number, except for the second number in the pattern
number * number, is to be replaced by the element in that row of x.  Try
this.  We define a regular expression which matches the first number,
([0-9]+), of each potential pair and then, optionally (?), spaces ( *),
a star (\\*), more spaces ( *) and digits ([0-9.]+).  gsubfn passes the
first and second backreferences (the matches to the parenthesized
portions of the regular expression) to f and inserts the output of f
where the matches had been.

library(gsubfn)
f <- function(a, b) paste(x[as.numeric(a)], b)
s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)

If the objective is then to perform the calculation that each string
represents, try this:
sapply(s2, function(x) eval(parse(text = x)))

For example,

> s <- c("1 + 4 * 3 + 5 + 6 + 11 * 0.5", "1 + 4 * 3 + 5 + 6 + 11 * 0.5")
> x <- matrix(20:35)
> f <- function(a, b) paste(x[as.numeric(a)], b)
> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
> s2
[1] "20  + 23  * 3 + 24  + 25  + 30  * 0.5" "20  + 23  * 3 + 24  + 25  + 30  * 0.5"
> sapply(s2, function(x) eval(parse(text = x)))
20  + 23  * 3 + 24  + 25  + 30  * 0.5 20  + 23  * 3 + 24  + 25  + 30  * 0.5
                                  153                                   153

For more see the gsubfn home page at http://gsubfn.googlecode.com


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Differenciate numbers from reference for rows

2010-10-30 Thread David Winsemius


On Oct 29, 2010, at 11:16 PM, Dennis Murphy wrote:


> Hi:
>
> x <- matrix(20:35, ncol = 1)
> u <- c(1, 4, 5, 6, 11)  # 'x values'
> m <- c(1, 3, 1, 1, 0.5)
>
> # Function to compute the inner product of the multipliers with the
> # extracted elements of x determined by u
> f <- function(mat, inputs, mults) crossprod(mat[inputs], mults)
> f(x, u, mults = c(1, 3, 1, 1, 0.5))
>      [,1]
> [1,]  153
> 20 + 23 * 3 + 24 + 25 + 30 * 0.5
> [1] 153
>
> The function is flexible enough to allow you to play with the input matrix
> (although a vector would also work), the 'observation vector' inputs and the
> set of multipliers. Here's one way (not necessarily the most efficient):
>
> uv <- matrix(sample(1:15, 25, replace = TRUE), ncol = 5)
> uv   # like an X matrix, where each row provides the input values of the vars
>      [,1] [,2] [,3] [,4] [,5]
> [1,]   12    8   11   10   15
> [2,]   15   11   14   14    8
> [3,]    4    8    4   10   12
> [4,]   10    5    2    1    7
> [5,]   11    4    9    1   11
>
> # Apply the function f to each row of uv:
> apply(uv, 1, function(y) f(x, y, mults = c(1, 3, 1, 1, 0.5)))
> [1] 188.0 203.5 171.5 155.0 162.0
>
> The direct matrix version:
> crossprod(t(matrix(x[uv], ncol = 5)), c(1, 3, 1, 1, 0.5))
>       [,1]
> [1,] 188.0
> [2,] 203.5
> [3,] 171.5
> [4,] 155.0
> [5,] 162.0
>
> Notice that the apply() call returns a vector whereas crossprod() returns a
> matrix.
> x[uv] selects the x values associated with the indices in uv and returns a
> vector in column-major order. The crossprod() call transposes the reshaped
> x[uv] and then 'matrix' multiplies it by the vector c(1, 3, 1, 1, 0.5).
>
> HTH,
> Dennis
>
> On Fri, Oct 29, 2010 at 3:54 PM, M.Ribeiro wrote:
>
>> So, I am having a tricky reference file to extract information from.
>>
>> The format of the file is
>>
>> x   1 + 4 * 3 + 5 + 6 + 11 * 0.5


I saw the beginning of this task as parsing to extract the digits from  
a character string (possibly decimal digits in the case of the third  
and seventh positions) delimited by + and *:


library(gsubfn)
x <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
xin <- readLines(textConnection(x))
xp <- strapply(xin,
  "^(\\d+) \\+ (\\d+) \\* (\\d+\\.*\\d*) \\+ (\\d+) \\+ (\\d+) \\+ (\\d+) \\* (\\d+\\.*\\d*)",
  c)

 sapply(xp, as.numeric)
 [,1]
[1,]  1.0
[2,]  4.0
[3,]  3.0
[4,]  5.0
[5,]  6.0
[6,] 11.0
[7,]  0.5
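
The pattern above is tied to exactly seven numbers in a fixed layout. If
the files contain expressions with a varying number of terms, one
possible generalization is to split on "+" and then on "*". The
following is a sketch under that assumption (only "+" and "*" appear,
with at most one "*" per term); the names terms, parts, rows, mults and
total are illustrative and not from the thread.

```r
x <- matrix(20:35)
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"

# One piece per additive term, e.g. " 4 * 3 "
terms <- strsplit(s, "+", fixed = TRUE)[[1]]

# Split each term into row reference and optional multiplier
parts <- strsplit(terms, "*", fixed = TRUE)

# First piece is always the row reference (as.numeric ignores padding)
rows <- vapply(parts, function(p) as.numeric(p[1]), numeric(1))

# Second piece, when present, is the multiplier; otherwise use 1
mults <- vapply(parts,
                function(p) if (length(p) > 1) as.numeric(p[2]) else 1,
                numeric(1))

total <- sum(x[rows] * mults)
print(total)
```

This keeps the row references and multipliers as numeric vectors, so the
result can be computed directly without building and evaluating a new
expression string.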

--
David





David Winsemius, MD
West Hartford, CT



Re: [R] Differenciate numbers from reference for rows

2010-10-29 Thread Dennis Murphy
Hi:

x <- matrix(20:35, ncol = 1)
u <- c(1, 4, 5, 6, 11)  # 'x values'
m <- c(1, 3, 1, 1, 0.5)

# Function to compute the inner product of the multipliers with the
# extracted elements of x determined by u
f <- function(mat, inputs, mults) crossprod(mat[inputs], mults)
f(x, u, mults = c(1, 3, 1, 1, 0.5))
 [,1]
[1,]  153
20 + 23 * 3 + 24 + 25 + 30 * 0.5
[1] 153

The function is flexible enough to allow you to play with the input matrix
(although a vector would also work), the 'observation vector' inputs and the
set of multipliers. Here's one way (not necessarily the most efficient):

uv <- matrix(sample(1:15, 25, replace = TRUE), ncol = 5)
uv   # like an X matrix, where each row provides the input values of the vars
     [,1] [,2] [,3] [,4] [,5]
[1,]   12    8   11   10   15
[2,]   15   11   14   14    8
[3,]    4    8    4   10   12
[4,]   10    5    2    1    7
[5,]   11    4    9    1   11

# Apply the function f to each row of uv:
apply(uv, 1, function(y) f(x, y, mults = c(1, 3, 1, 1, 0.5)))
[1] 188.0 203.5 171.5 155.0 162.0

The direct matrix version:
crossprod(t(matrix(x[uv], ncol = 5)), c(1, 3, 1, 1, 0.5))
  [,1]
[1,] 188.0
[2,] 203.5
[3,] 171.5
[4,] 155.0
[5,] 162.0

Notice that the apply() call returns a vector whereas crossprod() returns a
matrix.
x[uv] selects the x values associated with the indices in uv and returns a
vector in column-major order. The crossprod() call transposes the reshaped
x[uv] and then 'matrix' multiplies it by the vector c(1, 3, 1, 1, 0.5).
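
To make the column-major point concrete, here is a small check using a
fixed uv instead of a random one. The values are the ones that appear to
have been printed for uv above (the digit boundaries in the archived
output are partly inferred, so treat them as an assumption); vals,
res_mult and res_cross are illustrative names.

```r
x <- matrix(20:35, ncol = 1)
mults <- c(1, 3, 1, 1, 0.5)

# Fixed index matrix, entered row by row (assumed values, see above)
uv <- matrix(c(12,  8, 11, 10, 15,
               15, 11, 14, 14,  8,
                4,  8,  4, 10, 12,
               10,  5,  2,  1,  7,
               11,  4,  9,  1, 11), ncol = 5, byrow = TRUE)

# x[uv] walks uv column by column, so refilling a 5-column matrix
# puts every looked-up value back in the position of its index
vals <- matrix(x[uv], ncol = 5)

res_mult  <- drop(vals %*% mults)                # plain matrix product
res_cross <- drop(crossprod(t(vals), mults))     # t(t(vals)) %*% mults
print(res_mult)
```

Both forms give the same row-wise weighted sums, because crossprod(A, b)
computes t(A) %*% b and the extra t() undoes that transpose.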

HTH,
Dennis

On Fri, Oct 29, 2010 at 3:54 PM, M.Ribeiro  wrote:

>
> So, I am having a tricky reference file to extract information from.
>
> The format of the file is
>
> x   1 + 4 * 3 + 5 + 6 + 11 * 0.5
>
> So, the elements that are not being multiplied (1, 5 and 6) and the
> elements
> before the multiplication sign (4 and 11) means actually the reference for
> the row in a matrix where I need to extract the element from.
>
> The numbers after the multiplication sign are regular numbers
> Ex:
>
> > x<-matrix(20:35)
> > x
>  [,1]
>  [1,]   20
>  [2,]   21
>  [3,]   22
>  [4,]   23
>  [5,]   24
>  [6,]   25
>  [7,]   26
>  [8,]   27
>  [9,]   28
> [10,]   29
> [11,]   30
> [12,]   31
> [13,]   32
> [14,]   33
> [15,]   34
> [16,]   35
>
> I would like to read the rows 1,4,5,6 and 11 and sum then. However the
> numbers in the elements row 4 and 11 are multiplied by 3 and 0.5
>
> So it would be
> 20 + 23 * 3 + 24 + 25 + 30 * 0.5.
>
> And I have this format in different files so I can't do all by hand.
> Can anybody help me with a script that can differentiate this?
> Thanks


[R] Differenciate numbers from reference for rows

2010-10-29 Thread M.Ribeiro

So, I am having a tricky reference file to extract information from.

The format of the file is

x   1 + 4 * 3 + 5 + 6 + 11 * 0.5

So, the elements that are not being multiplied (1, 5 and 6) and the elements
before the multiplication sign (4 and 11) are actually references to the
rows of a matrix from which I need to extract the elements.

The numbers after the multiplication sign are regular numbers.
Ex:

> x<-matrix(20:35)
> x
  [,1]
 [1,]   20
 [2,]   21
 [3,]   22
 [4,]   23
 [5,]   24
 [6,]   25
 [7,]   26
 [8,]   27
 [9,]   28
[10,]   29
[11,]   30
[12,]   31
[13,]   32
[14,]   33
[15,]   34
[16,]   35

I would like to read rows 1, 4, 5, 6 and 11 and sum them. However, the
elements in rows 4 and 11 are first multiplied by 3 and 0.5, respectively.

So it would be
20 + 23 * 3 + 24 + 25 + 30 * 0.5.

And I have this format in different files so I can't do all by hand.
Can anybody help me with a script that can differentiate this?
Thanks
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Differenciate-numbers-from-reference-for-rows-tp3019853p3019853.html
Sent from the R help mailing list archive at Nabble.com.
