Re: [R] Curve Fitting/Regression with Multiple Observations
Many thanks for the suggestion! It may reduce the computation time needed to find the x value for a given y value (for hundreds of pairs). I will certainly look into the manuals for approx() and approxfun() in this regard.

Again, thank you for taking the time to read my previous posts and for making this valuable suggestion.

Regards,
Joseph

On Sat, May 1, 2010 at 12:41 AM, Greg Snow <greg.s...@imail.org> wrote:
> [snip]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Curve Fitting/Regression with Multiple Observations
Dear Keith,

Thanks for the suggestion and for taking the time to respond. However, I think there is a misunderstanding, and it seems that you have not read all of my previous e-mails. For instance, can a hand-drawn curve give you an inverse function (analytically or numerically) so that you can find the x value for a given y value (not just for one point, but for hundreds of points)?

As for statistical inference, I admit that my communication was not very clear. My intention is to obtain a smoothed curve from the simulation data in a way that is as statistically meaningful as possible for my intended use of the resulting curve. As I said before, I do not know all the theoretical details behind the regression and curve-fitting functions available in R (I do know the basics, though, as someone with a PhD in Elec. Eng., contrary to someone's assessment), but I am doing my best to catch up by reading textbooks and manuals, and posting this question to the list is definitely a way to learn from the many experts and advanced users of R.

By the way, I wonder why most of the responses I've received from this list are so cynical (or skeptical?) and, in some sense, delivered in quite an arrogant way. It is very hard to imagine receiving such responses in my own areas of computer simulation and optical communications/networking. If a newbie asks the list a question that does not make much sense, or yet another FAQ, it is usually ignored (i.e., no response) because we are all too busy to deal with it. Sometimes, though, a kind soul (like Gabor) takes his/her own valuable time and doesn't mind explaining all the details from the simple basics.

Again, what I want to hear from the list is the proper use of the regression/curve-fitting functions of R for my simulation data with replications: should they be applied after taking means, or directly to the replicated data? So far I haven't heard anyone specifically address this question, although there were several seemingly related suggestions.
Regards,
Joseph

On Fri, Apr 30, 2010 at 4:25 AM, kMan <kchambe...@gmail.com> wrote:
> Dear Joseph,
>
> If you do not need to make any inferences, that is, you just want it to look pretty, then drawing a curve by hand is as good a solution as any. Plus, there is no reason for expert testimony to say that the curve does not mean anything.
>
> Sincerely,
> KeithC.
>
> -----Original Message-----
> From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo@gmail.com]
> Sent: Tuesday, April 27, 2010 2:33 PM
> To: Gabor Grothendieck
> Cc: r-help@r-project.org
> Subject: Re: [R] Curve Fitting/Regression with Multiple Observations
>
> Frankly speaking, I am not looking for such a framework. The system I'm studying is a communication network (like an M/M/1 queue, but far too complicated to analyze mathematically using classical queueing theory), and the conclusion I want to draw is qualitative rather than quantitative: a high-level comparative study of various network architectures based on the equivalence principle (a concept specific to networking, not in the general sense).
>
> What I want in this regard is a smooth, non-decreasing (hence one-to-one) function built from the simulation data because, later in my processing, I need the inverse of that curve to find the x value for a given y value. That was, in fact, the reason I used exponential (i.e., non-decreasing) curve fitting.
>
> Even though I don't need a statistical inference framework for my work, I want to make sure that my use of regression/curve-fitting techniques on my simulation data (as a tool for obtaining the curve mentioned above) is proper and is common practice among experts like you. To answer my question, I dug through the Internet a great deal but have found no clear explanation so far. Your suggestions and examples (always!) are much appreciated, but I am still not sure whether using those regression procedures on the kind of data I described is the right thing to do.
>
> Again, many thanks for your prompt and kind answers,
> Joseph
>
> On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
>> If you are looking for a framework for statistical inference, you could look at additive models as in the mgcv package, which has a book associated with it if you need more info, e.g.:
>>
>> library(mgcv)
>> fm <- gam(dist ~ s(speed), data = cars)
>> summary(fm)
>> plot(dist ~ speed, cars, pch = 20)
>> fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
>> matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
>>
>> On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim <kyeongsoo@gmail.com> wrote:
>>> Hello Gabor,
>>>
>>> Many thanks for providing actual examples for the problem! In fact, I know how to apply various R functions, including the loess, lowess, and smooth.spline procedures, and how to generate plots with them. My question, however, is whether applying those procedures directly to data with multiple observations/duplicate points(?) is on a sound basis or not. Before asking
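On the question of fitting to the per-x means versus fitting to all replications directly, a small least-squares check may help. This is only a sketch under stated assumptions: the M/M/1-like curve 1/(1 - x), the noise level, the five replications per point, and the cubic polynomial are all illustrative stand-ins, not the model or data discussed in the thread. With the same number of replications at every x, ordinary least squares gives the same coefficients either way; only the residual degrees of freedom (and hence the standard errors) differ.

```r
set.seed(1)

# Hypothetical stand-in for simulation output: 9 load levels, 5 replications each
d <- data.frame(x = rep(seq(0.1, 0.9, by = 0.1), each = 5))
d$y <- 1 / (1 - d$x) + rnorm(nrow(d), sd = 0.05)

# Fit 1: cubic polynomial fitted directly to all replicated observations
fit_raw <- lm(y ~ poly(x, 3, raw = TRUE), data = d)

# Fit 2: the same model fitted to the per-x means
means <- aggregate(y ~ x, data = d, FUN = mean)
fit_mean <- lm(y ~ poly(x, 3, raw = TRUE), data = means)

# With balanced replication the two sets of coefficients agree
# up to numerical tolerance
same_coef <- all.equal(unname(coef(fit_raw)), unname(coef(fit_mean)))
print(same_coef)
```

If the number of replications differed between x values, the fit to the means would need weights equal to the replication counts to reproduce the fit to the raw data.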
Re: [R] Curve Fitting/Regression with Multiple Observations
You may want to run RSiteSearch("monotone splines") at the R prompt. The 3rd hit looks quite promising. However, if I understand your data correctly, you have multiple y values for the same x values. If so, can you justify inverting the regression function?

The traffic on this mailing list is very high, and the signal-to-noise ratio is rather low. This tends to burn out those who started with good intentions to help.

Andy

From: Kyeong Soo (Joseph) Kim
> [snip]
Re: [R] Curve Fitting/Regression with Multiple Observations
Dear Joseph,

I have had a similar experience with replies. Andy's assessment of the signal-to-noise ratio on the list is, I believe, quite accurate, and quite elegant. My experience has generally been that R replies get better with age. I welcome the feedback you just provided.

Sincerely,
KeithC.

-----Original Message-----
From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo@gmail.com]
Sent: Friday, April 30, 2010 4:10 AM
To: kMan
Cc: r-help@r-project.org
Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

[snip]
Re: [R] Curve Fitting/Regression with Multiple Observations
Dear Andy,

You're the kind soul I mentioned in my previous e-mail! Yours is certainly the kind of response I've been looking for, and now I can start from it, especially splinefun() with the monoH.FC method.

As for my simulation data, your understanding is correct: there are multiple y values, from different replications, for the same x value. Even so, each set of observations can be interpreted as the combination of one fixed but unknown deterministic component and random components that differ between replications (inherent in any Monte Carlo simulation). So the underlying assumption is that there is a one-to-one (monotone) function between x and y.

This is typical of many computer simulations in networking. As I said before, for the M/M/1 queue, for instance, you can get a nice closed-form (monotone) function of the utilization (i.e., \rho) for the average delay of customers in the queueing system. Simulation runs with different random seeds, however, give slightly different average delays for a given utilization. Still, we know from the underlying model that there is a one-to-one correspondence between the utilization and the average delay. Of course, unlike for the simple M/M/1 queue, we do not know the exact models for most of the actual networking systems we analyze, but it is well accepted, and assumed in nearly all existing work in this area, that there is still a one-to-one correspondence between the utilization (or system load) and performance measures such as delay, throughput, and packet loss.

I do appreciate your suggestion; it will be of tremendous help for my current research. Thanks also for your assessment of this list, which I take as valuable advice for the future.

With Regards,
Joseph

On Fri, Apr 30, 2010 at 12:52 PM, Liaw, Andy <andy_l...@merck.com> wrote:
> [snip]
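The approach Joseph describes here (average the replications at each utilization, interpolate monotonically, then invert) could be sketched roughly as follows. Everything in the block is an assumption made for illustration: the service rate `mu`, the noise level, the number of replications, and the use of the M/M/1 mean delay 1/(mu * (1 - rho)) as a stand-in for real simulation output. Note that `splinefun()` with `method = "monoH.FC"` interpolates rather than smooths, and the result is monotone only if the means themselves are, so the sketch checks that first. Inversion also needs a strictly increasing curve, which is slightly stronger than non-decreasing.

```r
set.seed(42)

mu   <- 1                          # service rate (assumed)
rho  <- seq(0.1, 0.9, by = 0.1)    # utilization levels
reps <- 10                         # replications per level (assumed)

# M/M/1 mean time in system, used only to fake simulation output
true_delay <- function(rho) 1 / (mu * (1 - rho))

# One column per utilization level, one row per replication
sim <- sapply(rho, function(r) true_delay(r) + rnorm(reps, sd = 0.02))
delay_mean <- colMeans(sim)

# The Fritsch-Carlson spline is monotone only if the data are
stopifnot(all(diff(delay_mean) > 0))
f <- splinefun(rho, delay_mean, method = "monoH.FC")

# Numerical inverse: the utilization at which the fitted delay equals y
inverse <- function(y) {
  uniroot(function(x) f(x) - y, interval = range(rho))$root
}

rho_hat <- inverse(2)   # the noise-free answer would be 0.5
print(rho_hat)
```

If the means were not monotone (for example with noisier data or fewer replications), some smoothing or isotonic step would be needed before this interpolation; which one is appropriate is exactly the question raised in the thread and is not settled by this sketch.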
Re: [R] Curve Fitting/Regression with Multiple Observations
Dear Keith,

I will keep that in mind in my future postings. Again, thanks for your time and advice!

Regards,
Joseph

On Fri, Apr 30, 2010 at 3:54 PM, kMan <kchambe...@gmail.com> wrote:
> [snip]
Re: [R] Curve Fitting/Regression with Multiple Observations
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kyeong Soo (Joseph) Kim
Sent: Friday, April 30, 2010 4:10 AM
To: kMan
Cc: r-help@r-project.org
Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

[snip]

> By the way, I wonder why most of the responses I've received from this list are so cynical (or skeptical?) and in some sense done in a quite arrogant way. It's very hard to imagine that one would receive such responses in my own areas of computer simulation and optical communications/networking. If a newbie asks a question to the list not making much sense or another FAQ, that is usually ignored (i.e., no response) because all we are too busy to deal with that. Sometimes, though, a kind soul (like Gabor) takes his/her own valuable time and doesn't mind explaining all the details from simple basics.

In my experience with this list, and others, the perceived level of cynical/skeptical/arrogant answers has more to do with the reader than with the writer. If you want to be offended, you will find things to be offended about even when no offense was intended. If you look for help and useful responses (follow the posting guide) and are thankful for what you learn, you will learn more and be bothered less.

R-help is a mixture of different levels and cultures. In framing responses, it is hard to know what the other person may find offensive (I was once yelled at and chewed out quite thoroughly for truthfully answering "no" when asked if I drink coffee). Most responders on this list (actually I would say all, but there might be an exception that I have not noticed) are trying to be helpful; there is just a large variability in the tone of the responses.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
Re: [R] Curve Fitting/Regression with Multiple Observations
I have already learned a lot from the list, both technical and otherwise, and cannot thank everyone enough for those valuable suggestions. In fact, as I said in my previous posts, I have received really critical help and advice that directly addresses the issues I have.

By the way, there are one or two points in your post that I agree with, but I am not sure why you only pointed out side issues (by snipping part of what I said) without touching on the main topic of this thread at all. I could go on but won't, because arguing for the sake of argument is of no value to anyone in this thread. It would have been better if you had focused on the topic and provided some technical and practical information that I could learn from and be very thankful for.

Regards,
Joseph

On Fri, Apr 30, 2010 at 11:35 PM, Greg Snow <greg.s...@imail.org> wrote:
> [snip]
Re: [R] Curve Fitting/Regression with Multiple Observations
I did not understand enough of the rest of your question to give any better response than others have given.

Looking back at your previous posts, there is one suggestion that I can make that may help. You can use the approx or approxfun functions to approximate an inverse, just generate a bunch of x,y pairs from your function, then feed them to approx while switching x and y. Not an exact inverse, but if you give it enough points then it will be close.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
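The inversion idea suggested in this message can be sketched in a few lines of R. This is only an illustration under stated assumptions: the forward function `f` is a made-up, strictly increasing stand-in for whatever fitted curve needs inverting, and the grid limits and query values are arbitrary.

```r
# Hypothetical forward curve (an assumption for illustration only);
# any strictly increasing function could take its place.
f <- function(x) 2 * exp(0.3 * x)

# Tabulate the curve on a dense grid of x values.
x_grid <- seq(0, 10, length.out = 1001)
y_grid <- f(x_grid)

# Swap the roles of x and y: interpolate x as a function of y.
f_inv <- approxfun(x = y_grid, y = x_grid)

# Look up x for several y values in one vectorised call.
y_query <- c(2.5, 5, 20)
x_hat <- f_inv(y_query)

# Round-trip check: f(x_hat) should be close to y_query.
print(cbind(y_query, x_hat, back = f(x_hat)))
```

Because the table is built once and the lookup is vectorised, hundreds of y values can be inverted in a single call. With the default settings, approxfun returns NA for y values outside the tabulated range, and a finer grid brings the approximation closer to the exact inverse.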
Re: [R] Curve Fitting/Regression with Multiple Observations
Dear Joseph,

If you do not need to make any inferences, that is, you just want it to look pretty, then drawing a curve by hand is as good a solution as any. Plus, there is no reason for expert testimony to say that the curve does not mean anything.

Sincerely,
KeithC.
[R] Curve Fitting/Regression with Multiple Observations
I recently came to realize the true power of R for statistical analysis -- mainly for post-processing of data from large-scale simulations -- and have been converting many of my existing Python (SciPy) scripts to ones based on R and/or Perl. In the middle of this conversion, I revisited the problem of curve fitting for simulation data with multiple observations resulting from repetitions.

In the past, I first processed the simulation data (i.e., multiple y's from repetitions) to get a mean with a confidence interval for each value of x (the independent variable) and then applied a spline procedure to those mean values only (i.e., unique pairs of (x_i, y_i) for i=1, 2, ...) to get a smoothed curve. Because of rather large confidence intervals, however, the resulting curves were hardly smooth enough for my purpose, so I had to fix the function to an exponential and use least-squares methods to fit its parameters to the data.

From a plot with confidence intervals, it's rather easy for one to visually and manually(?) figure out a smoothed curve for it. So I'm thinking right now of directly applying a spline (or whatever regression procedure suits this purpose) to the simulation data with repetitions rather than to the means. The simulation data in this case look like this (assuming three repetitions):

# x  y
  1  1.2
  1  0.9
  1  1.3
  2  2.2
  2  1.7
  2  2.0
  ...

So my idea is to let the spline procedure handle the fluctuations in the data (i.e., in the repetitions) by itself. But I wonder whether this direct application of spline procedures to data with multiple observations makes sense from the statistical analysis (i.e., theoretical) point of view.

It may be a stupid question and quite obvious to many, but personally I don't know where to start. It would be greatly appreciated if anyone could shed some light on this.

Many thanks in advance,
Joseph
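For the parametric route mentioned in this post (an exponential fitted by least squares), the difference between using all repetitions and using only the means can be checked directly. The following is a minimal sketch, not the poster's actual script: the data are simulated, and the model form y = a * exp(b * x), the noise level, and all variable names are assumptions made for illustration.

```r
set.seed(1)

# Made-up simulation output: 10 x values, 3 repetitions each (an assumption).
x <- rep(1:10, each = 3)
y <- exp(0.25 * x) + rnorm(length(x), sd = 0.1)
sim <- data.frame(x = x, y = y)

# Per-x means, as in the "means first" workflow.
means <- aggregate(y ~ x, data = sim, FUN = mean)

# Starting values from a straight-line fit on the log scale.
init <- coef(lm(log(y) ~ x, data = sim))
start <- list(a = exp(init[[1]]), b = init[[2]])

# Least-squares fit of y = a * exp(b * x) to all points and to the means.
fit_all   <- nls(y ~ a * exp(b * x), data = sim,   start = start)
fit_means <- nls(y ~ a * exp(b * x), data = means, start = start)

print(rbind(all_points = coef(fit_all), means_only = coef(fit_means)))

# The fitted exponential has a closed-form inverse: x = log(y / a) / b.
inv_fit <- function(y, fit) log(y / coef(fit)[["a"]]) / coef(fit)[["b"]]
```

When every x value has the same number of repetitions, the residual sum of squares over all points differs from the (scaled) sum of squares over the means only by a constant that does not depend on the parameters, so the two sets of estimates should agree up to the optimiser's tolerance. What the full data add is information about the spread around the curve, which affects standard errors rather than the fitted curve itself.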
Re: [R] Curve Fitting/Regression with Multiple Observations
Joseph:

I believe you need to stop inventing your own statistical methods and consult a professional statistician. I do not think this list is the proper place to look for a statistics tutorial when your statistical background appears to be so inadequate for the task.

Sorry to be so direct -- perhaps I am wrong in my assessment. But if I am even close, would you like an accountant to fix your car or an auto mechanic to do your taxes?

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] Curve Fitting/Regression with Multiple Observations
This will compute a loess curve and plot it:

    example(loess)
    plot(dist ~ speed, cars, pch = 20)
    lines(cars$speed, fitted(cars.lo))

Also this directly plots it but does not give you the values of the curve separately:

    library(lattice)
    xyplot(dist ~ speed, cars, type = c("p", "smooth"))
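When the values of the smoothed curve are needed rather than just a plot, one option is to fit loess explicitly and call predict on a grid. The sketch below uses simulated data with three repetitions per x value; the data-generating function, noise level, and grid are assumptions for illustration only.

```r
set.seed(2)

# Made-up data: 10 x values with 3 repetitions each (an assumption).
x <- rep(1:10, each = 3)
y <- exp(0.25 * x) + rnorm(length(x), sd = 0.3)
sim <- data.frame(x = x, y = y)

# loess accepts repeated x values directly, as in the cars data.
fit <- loess(y ~ x, data = sim)

# Evaluate the fitted curve on a grid inside the observed x range.
grid <- data.frame(x = seq(1, 10, by = 0.5))
grid$y_hat <- predict(fit, newdata = grid)

print(head(grid))
```

The grid of fitted values can then be passed to lines() for plotting or used as a lookup table in later processing.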
Re: [R] Curve Fitting/Regression with Multiple Observations
Hello Gabor,

Many thanks for providing actual examples for the problem!

In fact I know how to apply and generate plots using various R functions, including the loess, lowess, and smooth.spline procedures. My question, however, is whether applying those procedures directly to data with multiple observations/duplicate points(?) is on a sound basis or not.

Before asking my question to the list, I checked the smooth.spline manual page and found the cv option mentioned in relation to duplicate points, but I'm not sure that "duplicate points" in the manual has the same meaning as "multiple observations" in my case. To me, the manual seems a bit unclear in this regard.

Looking at the cars data, I found it has multiple points with the same speed but different dist, which is exactly what I mean by multiple observations, but I am still not sure.

Regards,
Joseph
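One way to probe how smooth.spline treats repeated x values is to fit it once to all repetitions and once to the per-x means and compare. This is a hedged sketch on simulated data, not a statement about any particular simulation: the data, the choice df = 4, and the variable names are all assumptions. As far as the documentation describes, the returned component `x` holds the distinct x values, which suggests that tied observations are combined internally.

```r
set.seed(3)

# Made-up data: 10 x values with 3 repetitions each (an assumption).
x <- rep(1:10, each = 3)
y <- exp(0.25 * x) + rnorm(length(x), sd = 0.3)

# Fit directly to all 30 points, fixing the degrees of freedom.
fit_raw <- smooth.spline(x, y, df = 4)

# Fit to the 10 per-x means with the same degrees of freedom.
ux   <- sort(unique(x))
ybar <- as.vector(tapply(y, x, mean))
fit_means <- smooth.spline(ux, ybar, df = 4)

# Number of distinct x values the raw-data fit is based on.
print(length(fit_raw$x))

# Largest difference between the two fitted curves on a grid.
grid <- seq(1, 10, by = 0.25)
gap <- max(abs(predict(fit_raw, grid)$y - predict(fit_means, grid)$y))
print(gap)
```

With equal numbers of repetitions and the smoothing amount fixed through df, the two curves should essentially coincide. The two approaches can differ when the smoothing parameter is chosen automatically, and leave-one-out cross-validation (cv = TRUE) is the case the manual flags as questionable with non-unique x values, so fixing df or spar, or keeping the default generalized cross-validation, looks like the safer choice here.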
Re: [R] Curve Fitting/Regression with Multiple Observations
If you are looking for a framework for statistical inference you could look at additive models as in the mgcv package, which has a book associated with it if you need more info, e.g.

    library(mgcv)
    fm <- gam(dist ~ s(speed), data = cars)
    summary(fm)
    plot(dist ~ speed, cars, pch = 20)
    fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
    matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
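None of the smoothers shown so far constrains the fitted curve to be non-decreasing, which matters if the curve is later to be inverted (the requirement raised in the reply that follows). As one possible sketch using only base R, and not a method proposed by anyone in this thread, the per-x means can be passed through isotonic regression (isoreg) and then interpolated with a monotone spline (splinefun with method = "hyman"). The data and all names below are assumptions for illustration.

```r
set.seed(4)

# Made-up data: 10 x values with 3 repetitions each (an assumption).
x <- rep(1:10, each = 3)
y <- exp(0.25 * x) + rnorm(length(x), sd = 0.3)

# Per-x means of the repetitions.
ux   <- sort(unique(x))
ybar <- as.vector(tapply(y, x, mean))

# Isotonic regression: the closest non-decreasing sequence to the means.
iso <- isoreg(ux, ybar)

# Monotone cubic interpolation through the isotonic values.
g <- splinefun(ux, iso$yf, method = "hyman")

# Numerical inverse: solve g(x) = y0 for x within the observed x range.
g_inv <- function(y0) {
  uniroot(function(x) g(x) - y0, interval = range(ux), tol = 1e-9)$root
}

# Example lookup for a y value between the two ends of the curve.
y0 <- (g(1) + g(10)) / 2
x0 <- g_inv(y0)
print(c(y0 = y0, x0 = x0, check = g(x0)))
```

Isotonic regression can produce flat stretches where neighbouring means are pooled, so the resulting curve is non-decreasing but not necessarily strictly increasing; on a flat stretch the inverse is not unique and uniroot simply returns one of the admissible x values.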
Re: [R] Curve Fitting/Regression with Multiple Observations
Frankly speaking, I am not looking for such a framework.

The system I'm studying is a communication network (like an M/M/1 queue, but way too complicated to analyze mathematically using classical queueing theory), and the conclusion I want to draw is qualitative rather than quantitative -- a high-level comparative study of various network architectures based on the equivalence principle (a concept specific to networking, not in the general sense).

What I want in this regard is a smooth, non-decreasing (hence one-to-one) function built out of simulation data, because later in my processing I need an inverse function of the said curve to find an x value given the y value. That was, in fact, the reason I used the exponential (i.e., non-decreasing function) curve fitting.

Even though I don't need a statistical inference framework for my work, I want to make sure that my use of regression/curve fitting techniques with my simulation data (as a tool for getting the mentioned curve) is proper and a usual practice among experts like you.

To get an answer to my question, I dug a lot through the Internet but have found no clear explanation so far. Your suggestions and the examples you provide (always!) are much appreciated, but I am still not sure that using those regression procedures with the kind of data I described is the right thing to do.

Again, many thanks for your prompt and kind answers,
Joseph

On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

If you are looking for a framework for statistical inference you could look at additive models as in the mgcv package, which has a book associated with it if you need more info, e.g.

    library(mgcv)
    fm <- gam(dist ~ s(speed), data = cars)
    summary(fm)
    plot(dist ~ speed, cars, pch = 20)
    fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
    matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))

On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim kyeongsoo@gmail.com wrote:

Hello Gabor,

Many thanks for providing actual examples for the problem!

In fact I know how to apply and generate plots using various R functions, including the loess, lowess, and smooth.spline procedures. My question, however, is whether applying those procedures directly to data with multiple observations/duplicate points(?) is on a sound basis or not.

Before asking my question to the list, I checked the smooth.spline manual page and found the cv option mentioned in relation to duplicate points, but I'm not sure that "duplicate points" in the manual has the same meaning as "multiple observations" in my case. To me, the manual seems a bit unclear in this regard.

Looking at the cars data, I found it has multiple points with the same speed but different dist, which is exactly what I mean by multiple observations, but I am still not sure.

Regards,
Joseph

On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

This will compute a loess curve and plot it:

    example(loess)
    plot(dist ~ speed, cars, pch = 20)
    lines(cars$speed, fitted(cars.lo))

Also this directly plots it but does not give you the values of the curve separately:

    library(lattice)
    xyplot(dist ~ speed, cars, type = c("p", "smooth"))

On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim kyeongsoo@gmail.com wrote:

I recently came to realize the true power of R for statistical analysis -- mainly for post-processing of data from large-scale simulations -- and have been converting many of my existing Python (SciPy) scripts to ones based on R and/or Perl.
In the middle of this conversion, I revisited the problem of curve fitting for simulation data with multiple observations resulting from repetitions. In the past, I first processed simulation data (i.e., multiple y's from repetitions) to get a mean with a confidence interval for a given value of x (independent variable) and then applied spline procedure for those mean values only (i.e., unique pairs of (x_i, y_i) for i=1, 2, ...) to get a smoothed curve. Because of rather large confidence intervals, however, the resulting curves were hardly smooth enough for my purpose, I had to fix the function to exponential and used least square methods to fit its parameters for data. From a plot with confidence intervals, it's rather easy for one to visually and manually(?) figure out a smoothed curve for it. So I'm thinking right now of directly applying spline (or whatever regression procedures for this purpose) to the simulation data with repetitions rather than means. The simulation data in this case looks like this (assuming three repetitions): # x y 1 1.2 1 0.9 1 1.3 2 2.2 2 1.7 2 2.0 ... So my idea is to let spline procedure handle the fluctuations in the data (i.e., in repetitions) by itself. But I wonder whether this direct application of spline procedures for data with multiple observations makes