Re: [R] LM with summation function
Thank you Peter, works perfectly. Funny how simple things are once someone tells you the answer =)

robbie

On Tue, May 22, 2012 at 9:37 PM, Peter Ehlers wrote:
> Robbie,
>
> Here's what I *think* you are trying to do:
Re: [R] LM with summation function
Robbie,

Here's what I *think* you are trying to do:

1. y is a cubic function of x:

   y = b1*x + b2*x^2 + b3*x^3

2. s is the cumsum of y:

   s_i = y_1 + ... + y_i

3. Given a subset of x = 1:n and the corresponding values of s, estimate the coefficients of the cubic.

If that is the correct understanding, then you should be able to estimate the coefficients as follows. Since

   s_i = b1 * (sum of x_k for k = 1, ..., i)
       + b2 * (sum of (x_k)^2 for k = 1, ..., i)
       + b3 * (sum of (x_k)^3 for k = 1, ..., i),

s is linear in b1, b2 and b3, so we can regress s on the cumsums of x, x^2 and x^3. The cumsums have to be taken over the full index x = 1:12, not just over the observed rows. Using your sample data:

d <- data.frame(x = c(1, 4, 9, 12),
                s = c(109, 1200, 5325, 8216))

e <- data.frame(x = 1:12)
e <- merge(e, d, all.x = TRUE)   # s is NA at the unobserved levels
e <- within(e, {
  z3 <- cumsum(x^3)
  z2 <- cumsum(x^2)
  z1 <- cumsum(x)
})

# rows with NA s are dropped by lm() under the default na.action
coef(lm(s ~ 0 + z1 + z2 + z3, data = e))

#  z1  z2  z3
# 100  10  -1

Peter Ehlers

On 2012-05-22 09:43, Robbie Edwards wrote:
> I don't think I can.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
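Because x in Robbie's problem is a level index 1, 2, ..., n, the three cumulative sums in Peter's model have well-known closed forms: 1 + ... + n = n(n+1)/2, 1^2 + ... + n^2 = n(n+1)(2n+1)/6 and 1^3 + ... + n^3 = (n(n+1)/2)^2. The sketch below assumes exactly that (integer levels starting at 1, with only the observations being sparse) and evaluates the sums at the observed x, so no merge onto the full grid 1:12 is needed. The helper name cumsum_design() is hypothetical, not part of any package.

```r
# Robbie's observed data: level index x and cumulative points s.
d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))

# Hypothetical helper: closed-form values of cumsum(x), cumsum(x^2) and
# cumsum(x^3) at position n, assuming the underlying index is 1, 2, ..., n.
cumsum_design <- function(n) {
  data.frame(z1 = n * (n + 1) / 2,
             z2 = n * (n + 1) * (2 * n + 1) / 6,
             z3 = (n * (n + 1) / 2)^2)
}

d2 <- cbind(d, cumsum_design(d$x))

# Same no-intercept regression of s on the cumulative power sums,
# fitted only on the rows where s is actually observed.
fit <- lm(s ~ 0 + z1 + z2 + z3, data = d2)
coef(fit)
```

With four observations and three coefficients the toy data are fitted exactly, giving 100, 10 and -1 up to floating-point rounding; with real, noisy totals the same call gives least-squares estimates.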
Re: [R] LM with summation function
Ahh, sorry -- I didn't understand that x was supposed to be an index, so I was using the row number as the index for the summation. Yes, my proposal probably won't work without further assumptions (i.e., you could assume linear growth between observations, but that will bias something in some direction... not sure which).

I'll ponder it some more and get back to you if I come up with anything.

Michael

On Tue, May 22, 2012 at 12:43 PM, Robbie Edwards wrote:
> I don't think I can.
>
> For the sample data
>
> d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))
>
> when x = 4, s = 1200. However, that s4 is sum of y1 + y2 + y3 + y4.
> Wouldn't I have to know the y for x = 2 and x = 3 to get the value of y
> for x = 4?
Re: [R] LM with summation function
I don't think I can. For the sample data

d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))

when x = 4, s = 1200. However, that s4 is the sum y1 + y2 + y3 + y4. Wouldn't I have to know the y for x = 2 and x = 3 to get the value of y for x = 4?

In the previous message, I created two sample data frames. d is what I'm trying to use to create df. I only know what's in d; df is just used to illustrate what I'm trying to get from d.

robbie

On Tue, May 22, 2012 at 12:30 PM, R. Michael Weylandt <michael.weyla...@gmail.com> wrote:
> But if I understand your problem correctly, you can get the y values
> from the s values.
Re: [R] LM with summation function
But if I understand your problem correctly, you can get the y values from the s values. I'm relying on your statement that "s is sum of the current y and all previous y (s3 = y1 + y2 + y3)." E.g.,

y <- c(1, 4, 6, 9, 3, 7)

s1 = 1
s2 = 4 + s1 = 5
s3 = 6 + s2 = 11

more generally

s <- cumsum(y)

Then if we only see s, we can get back the y vector by doing

c(s[1], diff(s))

which is identical to y.

So for your data, the underlying y must have been c(109, 1091, 4125, 2891), right?

Or have I completely misunderstood your problem?

Michael

On Tue, May 22, 2012 at 12:25 PM, Robbie Edwards wrote:
> Actually, I can't. I don't know the y values. Only the s and only for a
> subset of the data.
>
> Like this.
>
> d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))
>
> On Tue, May 22, 2012 at 11:57 AM, R. Michael Weylandt wrote:
>>
>> You can reconstruct the y values by taking first-differences of the s
>> vector, no? Then it sounds like you're good to go
>>
>> Best, Michael
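Michael's identity and Robbie's objection can both be checked in a few lines. The sketch below first confirms that c(s[1], diff(s)) undoes cumsum() on Michael's toy vector, then applies the same differencing to Robbie's partial data d. Assuming the full schedule is the cubic behind Robbie's df (b1 = 100, b2 = 10, b3 = -1), each difference turns out to be the total of y over a block of consecutive levels, not a single y value.

```r
# Michael's toy example: differencing undoes cumsum().
y <- c(1, 4, 6, 9, 3, 7)
s <- cumsum(y)                       # 1 5 11 20 23 30
recovered <- c(s[1], diff(s))
identical(recovered, y)              # TRUE

# Robbie's partial data: s is only known at a few levels.
d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))
block_sums <- c(d$s[1], diff(d$s))   # 109 1091 4125 2891

# Full schedule, assumed to be the cubic 100x + 10x^2 - x^3 from Robbie's df.
x_full <- 1:12
y_full <- 100 * x_full + 10 * x_full^2 - x_full^3

# Each difference is the total of y over levels (previous x + 1) .. x.
starts <- c(1, head(d$x, -1) + 1)    # 1 2 5 10
block_totals <- mapply(function(a, b) sum(y_full[a:b]), starts, d$x)
all.equal(block_totals, block_sums)  # TRUE
y_full[4]                            # 496, not 1091
```

This is why the differenced values on their own cannot be fed to lm(y ~ 0 + x + I(x^2) + I(x^3)) as if they were y.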
Re: [R] LM with summation function
Hi all,

Thanks for the replies, but I realize I've done a bad job explaining my problem. To help, I've created some sample data to explain the problem.

df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 y = c(109, 232, 363, 496, 625, 744, 847, 928, 981, 1000, 979, 912),
                 s = c(109, 341, 704, 1200, 1825, 2569, 3416, 4344, 5325, 6325, 7304, 8216))

In this data frame, y results from y = x * b1 + x^2 * b2 + x^3 * b3, and s is the sum of the current y and all previous y (s3 = y1 + y2 + y3).

I know I can find b1, b2 and b3 using:

lm(y ~ 0 + x + I(x^2) + I(x^3), data=df)

yielding...

Coefficients:
     x  I(x^2)  I(x^3)
   100      10      -1

However, I need to find b1, b2 and b3 using the s column. The reason is that I don't actually know the values of y in the actual data set, and in the actual data I only have a few of the s values. Imagine this data is being used as a reward schedule for something like a loyalty points program: y represents the number of points needed for each level, while s is the total number of points to reach that level. In the real problem, my data looks more like this:

d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))

where I need to use a few sample points to help define the parameters of the curve.

Thanks again, and hopefully this makes the problem a bit clearer.

robbie

On Fri, May 18, 2012 at 7:40 PM, David Winsemius wrote:
> On May 18, 2012, at 1:44 PM, Robbie Edwards wrote:
>
>> y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3
>
> cumsum( rowSums( cbind(B1 * x, B2 * x^2, B3 * x^3)))
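As a consistency check on the example, Robbie's df can be regenerated from the coefficients he reports instead of typed in by hand. The sketch below assumes b1 = 100, b2 = 10, b3 = -1 (the values from his lm() output), rebuilds y and s = cumsum(y), and confirms both that the typed-in columns match and that d is just the rows x = 1, 4, 9, 12 of s. The fit on y is only possible here because y is known in the toy data.

```r
b <- c(100, 10, -1)                           # b1, b2, b3 from the lm() output
x <- 1:12
y <- b[1] * x + b[2] * x^2 + b[3] * x^3       # points needed for each level
df <- data.frame(x = x, y = y, s = cumsum(y)) # s = total points to reach level

# The columns typed out in the message.
stopifnot(all(df$y == c(109, 232, 363, 496, 625, 744, 847, 928, 981, 1000, 979, 912)))
stopifnot(all(df$s == c(109, 341, 704, 1200, 1825, 2569, 3416, 4344, 5325, 6325, 7304, 8216)))

# d is a subset of the s column.
d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))
stopifnot(all(df$s[d$x] == d$s))

# The fit on y, possible only because y is known in this toy example.
fit_y <- lm(y ~ 0 + x + I(x^2) + I(x^3), data = df)
coef(fit_y)
```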
Re: [R] LM with summation function
On May 18, 2012, at 1:44 PM, Robbie Edwards wrote:

> Hi all,
>
> I'm trying to model some data where the y is defined by
>
> y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3
>
> Hopefully that reads clearly for email.

cumsum( rowSums( cbind(B1 * x, B2 * x^2, B3 * x^3)))

> Anyway, if it wasn't for the summation, I know I would do it like this
>
> lm(y ~ x + x2 + x3)
>
> Where x2 and x3 are x^2 and x^3.
>
> However, since each value of x is related to the previous values of x, I
> don't know how to do this. Any help is greatly appreciated.

David Winsemius, MD
West Hartford, CT
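David's one-liner builds the cumulative totals directly from the cubic. Plugging in the coefficients from Robbie's later example (B1 = 100, B2 = 10, B3 = -1, taken from the thread rather than from David's message) with x = 1:12 reproduces the s column Robbie posted:

```r
B1 <- 100; B2 <- 10; B3 <- -1
x <- 1:12

# cbind() puts the three terms for each level in one row, rowSums() adds
# them to give y for that level, and cumsum() accumulates y into s.
s <- cumsum(rowSums(cbind(B1 * x, B2 * x^2, B3 * x^3)))
s
```

Because cumsum() is linear, the same s equals B1 * cumsum(x) + B2 * cumsum(x^2) + B3 * cumsum(x^3), which is what makes s a linear model in the cumulative power sums.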
Re: [R] LM with summation function
Following up on Rolf's post:

1) Cumulative summation (cumsum), maybe?

2) In fact, you should probably **not** fit the non-summation version as you have stated. See ?poly.

I would guess that context is important here. Based on my interpretation of the rather strange nature of your request, I suspect that you shouldn't be trying to do what you're doing **at all**; but that's just a guess, of course.

-- Bert

On Fri, May 18, 2012 at 1:56 PM, Rolf Turner wrote:
> On 19/05/12 05:44, Robbie Edwards wrote:
>>
>> I'm trying to model some data where the y is defined by
>>
>> y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3
>
> If your mail says what it seems to say, then your question makes
> no sense.
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
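Bert's pointer to ?poly is presumably about the raw powers x, x^2 and x^3, which are strongly correlated and can make a fit poorly conditioned; that reading is an interpretation, not something his message spells out. A small sketch on Robbie's toy cubic shows that poly(x, 3) spans the same space as the raw powers plus an intercept, so the fitted values are identical, while its columns are orthonormal. Note this is the with-intercept cubic: poly() builds its basis orthogonal to the constant term, so dropping the intercept with poly() is a different model from the thread's 0 + x + I(x^2) + I(x^3).

```r
x <- 1:12
y <- 100 * x + 10 * x^2 - x^3        # Robbie's toy schedule

fit_raw  <- lm(y ~ x + I(x^2) + I(x^3))
fit_orth <- lm(y ~ poly(x, 3))        # orthogonal polynomial basis

# Same column space (intercept + cubic), so the fitted values agree.
all.equal(fitted(fit_raw), fitted(fit_orth))

# The raw powers are strongly correlated with each other ...
round(cor(cbind(x, x^2, x^3)), 3)

# ... while the poly() columns are orthonormal: t(P) %*% P is the identity.
P <- unclass(poly(x, 3))              # drop the "poly" class, keep the matrix
round(crossprod(P), 10)
```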
Re: [R] LM with summation function
On 19/05/12 05:44, Robbie Edwards wrote:
> Hi all,
>
> I'm trying to model some data where the y is defined by
>
> y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3
>
> Hopefully that reads clearly for email.
>
> Anyway, if it wasn't for the summation, I know I would do it like this
>
> lm(y ~ x + x2 + x3)
>
> Where x2 and x3 are x^2 and x^3.
>
> However, since each value of x is related to the previous values of x, I
> don't know how to do this. Any help is greatly appreciated.

If your mail says what it seems to say, then your question makes no sense. You are in effect trying to fit a linear model to a single point:

y = B1*s1 + B2*s2 + B3*s3

where s1 = sum(x), s2 = sum(x^2) and s3 = sum(x^3), and you have only a single value of each of s1, s2, s3.

If you have replicate values of s1, s2, and s3 (i.e. replicate vectors (x1, ..., x50)) --- and of course a corresponding y value for each replicate --- then just form s1, s2, and s3 as vectors whose entries correspond to the replicates and then fit

lm(y ~ s1 + s2 + s3)

If I have misunderstood what you are asking then please provide a self-contained reproducible example as the posting guide requests.

cheers,

Rolf Turner
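Rolf's replicate setup can be sketched with simulated data. Everything here is made up for illustration: 30 replicates, each a vector of 50 x values drawn with runif(), and the coefficients 100, 10, -1 borrowed from Robbie's later toy example. With a noise-free y, lm(y ~ s1 + s2 + s3) recovers the slopes and an intercept of essentially zero; the point is only the mechanics of forming one s1, s2, s3 per replicate.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
n_rep <- 30                        # hypothetical number of replicates
B <- c(100, 10, -1)                # illustrative coefficients

# One row per replicate, each row a vector (x1, ..., x50).
X <- matrix(runif(n_rep * 50, min = 0, max = 10), nrow = n_rep)

# Per-replicate sums of x, x^2 and x^3.
s1 <- rowSums(X)
s2 <- rowSums(X^2)
s3 <- rowSums(X^3)

# Noise-free response, one value per replicate.
y <- B[1] * s1 + B[2] * s2 + B[3] * s3

fit <- lm(y ~ s1 + s2 + s3)
coef(fit)
```

With a single replicate there would be one row and four unknowns, which is the degenerate "single point" case Rolf describes.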
[R] LM with summation function
Hi all,

I'm trying to model some data where the y is defined by

y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3

Hopefully that reads clearly for email.

Anyway, if it wasn't for the summation, I know I would do it like this

lm(y ~ x + x2 + x3)

where x2 and x3 are x^2 and x^3.

However, since each value of x is related to the previous values of x, I don't know how to do this. Any help is greatly appreciated.

robbie