Re: [R] LM with summation function

2012-05-23 Thread Robbie Edwards
Thank you Peter, works perfectly.

Funny how simple things are once someone tells you the answer =)

robbie




 

Re: [R] LM with summation function

2012-05-22 Thread Robbie Edwards
Hi all,

Thanks for the replies, but I realize I've done a bad job explaining my
problem.  To help, I've created some sample data to explain the problem.

df <- data.frame(x=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 y=c(109, 232, 363, 496, 625, 744, 847, 928, 981, 1000, 979, 912),
                 s=c(109, 341, 704, 1200, 1825, 2569, 3416, 4344, 5325, 6325, 7304, 8216))

In this data frame, y results from y = x * b1 + x^2 * b2 + x^3 * b3 and s
is sum of the current y and all previous y (s3 = y1 + y2 + y3).

I know I can find b1, b2 and b3 using:
lm(y ~ 0 + x + I(x^2) + I(x^3), data=df)

yielding...
Coefficients:
     x  I(x^2)  I(x^3)
   100      10      -1

However, I need to find b1, b2 and b3 using the s column.  The reason is
that I don't actually know the values of y in the actual data set, and
in the actual data I only have a few of the values.  Imagine this data is
being used as a reward schedule for something like a loyalty points
program.  y represents the number of points needed for each level, while
s is the total number of points to reach that level.  In the real problem,
my data looks more like this:

d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))

Where I need to use a few sample points to help define the parameters of
the curve.

thanks again and hopefully this makes the problem a bit clearer.

robbie




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] LM with summation function

2012-05-22 Thread R. Michael Weylandt
But if I understand your problem correctly, you can get the y values
from the s values. I'm relying on your statement that s is sum of the
current y and all previous y (s3 = y1 + y2 + y3). E.g.,

y <- c(1, 4, 6, 9, 3, 7)

s1 = 1
s2 = 4 + s1 = 5
s3 = 6 + s2 = 11

more generally

s <- cumsum(y)

Then if we only see s, we can get back the y vector by doing

c(s[1], diff(s))

which is identical to y.

So for your data, the underlying y must have been c(109, 1091, 4125,
2891) right?

Or have I completely misunderstood your problem?

Michael
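
A minimal base-R sketch of the round trip described above, using the toy vector from this message. It assumes s is observed at every index with no gaps, which is the case this suggestion relies on; the name y_back is only illustrative.

```r
# Toy vector from the message above
y <- c(1, 4, 6, 9, 3, 7)

# Running totals: s[i] = y[1] + ... + y[i]
s <- cumsum(y)

# Invert the running total: first element, then first differences
y_back <- c(s[1], diff(s))

identical(y_back, y)   # TRUE
```

The inversion works here only because consecutive elements of s differ by exactly one y value.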

On Tue, May 22, 2012 at 12:25 PM, Robbie Edwards
robbie.edwa...@gmail.com wrote:
> Actually, I can't.  I don't know the y values.  Only the s and only for a
> subset of the data.
>
> Like this.
>
> d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))
>
> On Tue, May 22, 2012 at 11:57 AM, R. Michael Weylandt
> michael.weyla...@gmail.com wrote:
>>
>> You can reconstruct the y values by taking first-differences of the s
>> vector, no? Then it sounds like you're good to go
>>
>> Best, Michael



Re: [R] LM with summation function

2012-05-22 Thread Robbie Edwards
I don't think I can.

For the sample data

 d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))

when x = 4, s = 1200.  However, that s4 is sum of y1 + y2 + y3 + y4.
 Wouldn't I have to know the y for x = 2 and x = 3 to get the value of y
for x = 4?

In the previous message, I created two sample data frames.  d is what I'm
trying to use to create df.  I only know what's in d, df is just used to
illustrate what I'm trying to get from d.

robbie
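
To make the objection concrete, here is a small sketch using the illustrative full schedule posted earlier in the thread (in the real problem y is unknown). The names x_obs, s_obs and block are hypothetical. It shows that differencing the sparse totals yields sums over the gaps rather than individual y values.

```r
# Full illustrative schedule from the earlier sample data
y <- c(109, 232, 363, 496, 625, 744, 847, 928, 981, 1000, 979, 912)
s <- cumsum(y)

# Only these levels are actually observed
x_obs <- c(1, 4, 9, 12)
s_obs <- s[x_obs]                    # 109 1200 5325 8216

# Differences of the sparse totals are block sums, not single y values
block <- c(s_obs[1], diff(s_obs))    # 109 1091 4125 2891

block[2] == sum(y[2:4])              # TRUE: 232 + 363 + 496
block[2] == y[4]                     # FALSE
```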







Re: [R] LM with summation function

2012-05-22 Thread R. Michael Weylandt
Ahh sorry -- I didn't understand that x was supposed to be an
index, so I was using the row number as the index for the summation. Yes,
my proposal probably won't work without further assumptions [i.e.,
you could assume linear growth between observations, but that will
bias something in some direction (not sure which)].

I'll ponder it some more and get back to you if I come up with anything.

Michael
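
A rough sketch of the interpolation idea mentioned above, to show what it does on the sample data. Filling the gaps with approx() is an assumption made here for illustration, not something proposed in detail in the thread, and the names s_interp, y_interp and fit_interp are hypothetical.

```r
d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))

# Assumption: fill the missing cumulative totals by linear interpolation
s_interp <- approx(d$x, d$s, xout = 1:12)$y

# Differencing the interpolated totals gives a step-shaped guess at y
y_interp <- c(s_interp[1], diff(s_interp))

# Fit the cubic (no intercept) to the guessed y values
fit_interp <- lm(y_interp ~ 0 + x + I(x^2) + I(x^3),
                 data = data.frame(x = 1:12, y_interp = y_interp))

coef(fit_interp)   # compare with the generating values 100, 10, -1
```

Because linear interpolation spreads each block sum evenly across its gap, the guessed y is piecewise constant, and the fitted coefficients do not reproduce the values that generated the sample data.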

 

Re: [R] LM with summation function

2012-05-22 Thread Peter Ehlers

Robbie,

Here's what I *think* you are trying to do:

1.
y is a cubic function of x:

  y = b1*x + b2*x^2 + b3*x^3

2.
s is the cumsum of y:

  s_i = y_1 + ... + y_i

3.
Given a subset of x = 1:n and the corresponding
values of s, estimate the coefficients of the cubic.

If that is the correct understanding, then you should
be able to estimate the coefficients as follows:

a) since s_i = b1 * sum of x_k for k=1, ..., i
   + b2 * sum of (x_k)^2 for k=1, ..., i
   + b3 * sum of (x_k)^3 for k=1, ..., i

we can regress s on the cumsums of x, x^2 and x^3:

using your sample data:
  d <- data.frame(x = c(1, 4, 9, 12),
                  s = c(109, 1200, 5325, 8216))

  e <- data.frame(x = 1:12)
  e <- merge(e, d, all.x = TRUE)   # keep all 12 rows; s is NA where unobserved
  e <- within(e, {
    z3 <- cumsum(x^3)
    z2 <- cumsum(x^2)
    z1 <- cumsum(x)
  })

  coef(lm(s ~ 0 + z1 + z2 + z3, data = e))   # rows with NA in s are dropped

#  z1  z2  z3
# 100  10  -1


Peter Ehlers
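
As a possible follow-up check, the fitted coefficients can be used to rebuild the per-level values and the full cumulative schedule. This is a sketch under the same assumptions as above (x is the index 1:12 and y is a cubic through the origin); the names x_all, Z, obs, y_hat and s_hat are illustrative only.

```r
d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))

# Running sums of x, x^2 and x^3 over the full index
x_all <- 1:12
Z <- data.frame(z1 = cumsum(x_all),
                z2 = cumsum(x_all^2),
                z3 = cumsum(x_all^3))

# Keep only the rows for the observed levels and attach s
obs <- cbind(Z[d$x, ], s = d$s)
fit <- lm(s ~ 0 + z1 + z2 + z3, data = obs)
b <- unname(coef(fit))

# Rebuild the per-level values and the cumulative schedule
y_hat <- b[1] * x_all + b[2] * x_all^2 + b[3] * x_all^3
s_hat <- cumsum(y_hat)

round(y_hat)
```

With four observed totals and three coefficients there is only one residual degree of freedom, so real data with noise would call for more observed levels before trusting the estimates.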


Re: [R] LM with summation function

2012-05-18 Thread Rolf Turner

On 19/05/12 05:44, Robbie Edwards wrote:

> Hi all,
>
> I'm trying to model some data where the y is defined by
>
> y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3
>
> Hopefully that reads clearly for email.
>
> Anyway, if it wasn't for the summation, I know I would do it like this
>
> lm(y ~ x + x2 + x3)
>
> Where x2 and x3 are x^2 and x^3.
>
> However, since each value of x is related to the previous values of x, I
> don't know how to do this.  Any help is greatly appreciated.


If your mail says what it seems to say, then your question makes
no sense.  You are in effect trying to fit a linear model to a single
point:

y = B1*s1 + B2*s2 + B3*s3

where s1 = sum(x), s2 = sum(x^2) and s3=sum(x^3)

and you have only a single value of each of s1, s2, s3.

If you have replicate values of s1, s2, and s3 (i.e. replicate
vectors (x1, ... x50)) --- and of course a corresponding y value
for each replicate --- then just form s1, s2, and s3 as vectors
whose entries correspond to the replicates and then fit

lm(y ~ s1 + s2 + s3)

If I have misunderstood what you are asking then please provide
a self-contained reproducible example as the posting guide requests.

cheers,

Rolf Turner
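
A sketch of the replicate case described above. Everything here is simulated and hypothetical: the number of replicates, the uniform x values, the generating coefficients and the noise level are assumptions chosen only to make the example run, and none of them come from the thread.

```r
set.seed(1)   # hypothetical simulated data, not from the thread

# One row per replicate vector (x1, ..., x50)
n_rep <- 20
X <- matrix(runif(n_rep * 50), nrow = n_rep)

# One value of each sum per replicate
s1 <- rowSums(X)
s2 <- rowSums(X^2)
s3 <- rowSums(X^3)

# Assumed generating coefficients, for illustration only
y <- 2 * s1 - 1 * s2 + 0.5 * s3 + rnorm(n_rep, sd = 0.1)

fit <- lm(y ~ s1 + s2 + s3)
coef(fit)
```

With a single vector of 50 x values there would be only one row, which is the single-point situation the message warns about.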



Re: [R] LM with summation function

2012-05-18 Thread Bert Gunter
Following up on Rolf's post:

1) cumulative summation (cumsum) maybe?
2) In fact, you should probably **not** fit the non-summation version
as you have stated. See ?poly.

I would guess that context is important here. Based on (my
interpretation) of the rather strange nature of your request,  I
suspect that you shouldn't be trying to do what you're doing **at
all**;  but that's just a guess, of course.

-- Bert
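
One way to read the ?poly pointer: raw powers of x are strongly correlated, while poly() builds an orthogonal basis that spans the same cubic. The sketch below uses the cubic from the sample data posted elsewhere in this thread and fits it with an intercept, which is an assumption made here for illustration; the names fit_raw and fit_orth are hypothetical.

```r
x <- 1:12
y <- 100 * x + 10 * x^2 - x^3             # cubic from the thread's sample data

fit_raw  <- lm(y ~ x + I(x^2) + I(x^3))   # raw powers
fit_orth <- lm(y ~ poly(x, 3))            # orthogonal polynomial basis

all.equal(fitted(fit_raw), fitted(fit_orth))   # same fitted curve
cor(x^2, x^3)                                  # raw columns are nearly collinear
```

The coefficients of fit_orth refer to the orthogonal basis and are not b1, b2 and b3; poly(x, 3, raw = TRUE) gives the raw-power parameterization when those are needed.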




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] LM with summation function

2012-05-18 Thread David Winsemius


On May 18, 2012, at 1:44 PM, Robbie Edwards wrote:

> Hi all,
>
> I'm trying to model some data where the y is defined by
>
> y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3
>
> Hopefully that reads clearly for email.

cumsum( rowSums( cbind(B1 * x,  B2 * x^2, B3 * x^3)))

> Anyway, if it wasn't for the summation, I know I would do it like this
>
> lm(y ~ x + x2 + x3)
>
> Where x2 and x3 are x^2 and x^3.
>
> However, since each value of x is related to the previous values of x, I
> don't know how to do this.  Any help is greatly appreciated.





David Winsemius, MD
West Hartford, CT
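
A short sketch of how the one-line expression above behaves once the coefficients are known. The values of B1, B2 and B3 and the index x = 1:12 are assumptions taken from the sample data posted elsewhere in this thread; in the original question they are unknown.

```r
# Assumed values for illustration; unknown in the original question
B1 <- 100; B2 <- 10; B3 <- -1
x <- 1:12

# Per-row cubic terms, summed across columns, then accumulated over x
s <- cumsum(rowSums(cbind(B1 * x, B2 * x^2, B3 * x^3)))

s[c(1, 4, 9, 12)]   # 109 1200 5325 8216
```

This computes the running total from known coefficients; it does not by itself estimate them.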
