Re: [R] Curve Fitting/Regression with Multiple Observations

kMan Fri, 30 Apr 2010 07:55:29 -0700

Dear Joseph,

I have had a similar experience with replies. Andy's assessment of the
signal-to-noise ratio on the list is, I believe, quite accurate, and quite
elegant. My experience has generally been that R-replies get better with age.


I welcome the feedback you just provided.

Sincerely,
KeithC.

-----Original Message-----
From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo....@gmail.com] 
Sent: Friday, April 30, 2010 4:10 AM
To: kMan
Cc: r-help@r-project.org
Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

Dear Keith,

Thanks for the suggestion and taking your time to respond to it.

But you misunderstand something, and it seems that you did not read all my
previous e-mails.
For instance, can a hand-drawn curve give you an inverse function
(analytically or numerically) so that you can find an x value given the y value
(not just for one, but for hundreds of points)?
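
For illustration, numerical inversion of a fitted non-decreasing curve could
look something like the sketch below. It is only a sketch under assumptions:
the x values and per-x means are made up, and splinefun() with
method = "monoH.FC" interpolates the means (it does not smooth them), so a
smoothed set of fitted values could be passed in instead. The helper name
f_inv is hypothetical.

```r
# Hypothetical per-x means that are already non-decreasing (values made up).
x    <- 1:5
ybar <- c(1.1, 2.0, 3.1, 4.1, 5.0)

# Monotone cubic Hermite interpolation (Fritsch-Carlson) through the means.
f <- splinefun(x, ybar, method = "monoH.FC")

# Numerical inverse: for each y, solve f(x) = y for x on the fitted range.
f_inv <- function(y) {
  sapply(y, function(yi)
    uniroot(function(xx) f(xx) - yi, interval = range(x), tol = 1e-9)$root)
}

# Works for many query points at once.
y_query <- c(1.5, 2.5, 4.8)
x_found <- f_inv(y_query)
```

The inversion only works for y values inside the range covered by the curve,
since uniroot() needs a sign change over the search interval.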

As for the statistical inferences, I admit that my communications were not
very clear. My intention is to get a smoothed curve from the simulation data in
as statistically meaningful a way as possible for my intended use of the
resulting curve.

As I said before, I don't know all the theoretical details behind the
regression and curve fitting functions available in R (though, as someone with
a PhD in Elec. Eng., I do know the basics, contrary to someone's assessment),
but I am doing my best to catch up by reading textbooks and manuals, and
posting this question to this list is definitely a way to learn from the many
experts and advanced users of R.

By the way, I wonder why most of the responses I've received from this list are 
so cynical (or skeptical?) and in some sense done in a quite arrogant way. It's 
very hard to imagine that one would receive such responses in my own areas of 
computer simulation and optical communications/networking. If a newbie asks
the list a question that does not make much sense, or yet another FAQ, it is
usually ignored (i.e., no response) because we are all too busy to deal with
it. Sometimes, though, a kind soul (like Gabor) takes his/her own valuable
time and doesn't mind explaining all the details starting from the basics.

Again, what I want to hear from the list is the proper use of the
regression/curve fitting functions of R for my simulation data with
replications: should they be applied after taking means, or directly to the
raw data? So far no one has specifically addressed my question, although
there were several seemingly related suggestions.
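
To make the two options concrete, here is a minimal sketch comparing them for
an ordinary least-squares fit. The data are hypothetical (the first six rows
follow the example format quoted further down this thread; the rest are made
up), and a straight line via lm() is used only to keep the sketch short. For
least squares, fitting the per-x means weighted by the replication counts
gives the same coefficients as fitting all replications directly; what
changes is the residual degrees of freedom, and hence any standard errors.

```r
# Hypothetical simulation output: 3 replications at each of 5 x values.
sim <- data.frame(
  x = rep(1:5, each = 3),
  y = c(1.2, 0.9, 1.3,  2.2, 1.7, 2.0,  2.9, 3.3, 3.1,
        3.8, 4.4, 4.1,  5.2, 4.7, 5.0)
)

# (a) Fit directly to all replications.
fit_raw <- lm(y ~ x, data = sim)

# (b) Collapse to per-x means first, keeping replication counts as weights.
means   <- aggregate(y ~ x, data = sim, FUN = mean)
means$n <- as.vector(table(sim$x))
fit_means <- lm(y ~ x, data = means, weights = n)

# Same point estimates, different residual degrees of freedom.
coef_raw   <- coef(fit_raw)
coef_means <- coef(fit_means)
df_raw     <- df.residual(fit_raw)    # 15 observations - 2 parameters
df_means   <- df.residual(fit_means)  # 5 means - 2 parameters
```

Whether the same equivalence carries over to a particular smoother (loess,
smooth.spline, gam) depends on how that procedure handles ties and weights,
so it is worth checking on the actual data.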

Regards,
Joseph

On Fri, Apr 30, 2010 at 4:25 AM, kMan <kchambe...@gmail.com> wrote:
> Dear Joseph,
>
> If you do not need to make any inferences, that is, you just want it to look 
> pretty, then drawing a curve by hand is as good a solution as any. Plus, 
> there is no reason for expert testimony to say that the curve does not mean 
> anything.
>
> Sincerely,
> KeithC.
>
> -----Original Message-----
> From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo....@gmail.com]
> Sent: Tuesday, April 27, 2010 2:33 PM
> To: Gabor Grothendieck
> Cc: r-help@r-project.org
> Subject: Re: [R] Curve Fitting/Regression with Multiple Observations
>
> Frankly speaking, I am not looking for such a framework.
>
> The system I'm studying is a communication network (like an M/M/1 queue, but
> way too complicated to analyze mathematically using classical queueing
> theory), and the conclusion I want to draw is qualitative rather than
> quantitative -- a high-level comparative study of various network
> architectures based on the "equivalence principle" (a concept specific to
> networking, not in the general sense).
>
> What I want in this regard is a smooth, non-decreasing (hence one-to-one)
> function built out of simulation data, because later in my processing I need
> an inverse function of the said curve to find an x value given the y value.
> That was, in fact, the reason I used exponential (i.e., non-decreasing
> function) curve fitting.
>
> Even though I don't need a statistical inference framework for my work, I 
> want to make sure that my use of regression/curve fitting techniques with my 
> simulation data (as a tool for getting the mentioned curve) is proper and a 
> usual practice among experts like you.
>
> To get an answer to my question, I dug a lot through the Internet but found
> no clear explanation so far.
>
> Your suggestions and examples (always!) are much appreciated, but I am still
> not sure that using those regression procedures with the kind of data I
> described is the right thing to do.
>
> Again, many thanks for your prompt and kind answers, Joseph
>
>
> On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck <ggrothendi...@gmail.com> 
> wrote:
>> If you are looking for a framework for statistical inference you
>> could look at additive models as in the mgcv package, which has a
>> book associated with it if you need more info, e.g.
>>
>> library(mgcv)
>> fm <- gam(dist ~ s(speed), data = cars)
>> summary(fm)
>> plot(dist ~ speed, cars, pch = 20)
>> fm.ci <- with(predict(fm, se = TRUE),
>>               cbind(0, -2*se.fit, 2*se.fit) + c(fit))
>> matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
>>
>>
>> On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim 
>> <kyeongsoo....@gmail.com> wrote:
>>> Hello Gabor,
>>>
>>> Many thanks for providing actual examples for the problem!
>>>
>>> In fact I know how to apply and generate plots using various R 
>>> functions including loess, lowess, and smooth.spline procedures.
>>>
>>> My question, however, is whether applying those procedures directly
>>> to data with multiple observations/duplicate points(?) is on a
>>> sound basis or not.
>>>
>>> Before asking my question to the list, I checked the smooth.spline
>>> manual pages and found a mention of the "cv" option in relation to
>>> duplicate points, but I'm not sure "duplicate points" in the manual
>>> has the same meaning as "multiple observations" in my case. To me,
>>> the manual seems a bit unclear in this regard.
>>>
>>> Looking at the "cars" data, I found it has multiple points with the
>>> same "speed" but different "dist", which is exactly what I mean by
>>> multiple observations, but I am still not sure.
>>>
>>> Regards,
>>> Joseph
>>>
>>>
>>> On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck 
>>> <ggrothendi...@gmail.com> wrote:
>>>> This will compute a loess curve and plot it:
>>>>
>>>> example(loess)
>>>> plot(dist ~ speed, cars, pch = 20)
>>>> lines(cars$speed, fitted(cars.lo))
>>>>
>>>> Also this directly plots it but does not give you the values of the 
>>>> curve separately:
>>>>
>>>> library(lattice)
>>>> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
>>>>
>>>>
>>>>
>>>> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim 
>>>> <kyeongsoo....@gmail.com> wrote:
>>>>> I recently came to realize the true power of R for statistical
>>>>> analysis -- mainly for post-processing of data from large-scale
>>>>> simulations -- and have been converting many of my existing
>>>>> Python (SciPy) scripts to ones based on R and/or Perl.
>>>>>
>>>>> In the middle of this conversion, I revisited the problem of curve 
>>>>> fitting for simulation data with multiple observations resulting 
>>>>> from repetitions.
>>>>>
>>>>> In the past, I first processed simulation data (i.e., multiple y's
>>>>> from repetitions) to get a mean with a confidence interval for a
>>>>> given value of x (independent variable) and then applied a spline
>>>>> procedure to those mean values only (i.e., unique pairs of (x_i,
>>>>> y_i) for i=1, 2, ...) to get a smoothed curve. Because of rather
>>>>> large confidence intervals, however, the resulting curves were
>>>>> hardly smooth enough for my purpose, so I had to fix the function to
>>>>> an exponential and use least-squares methods to fit its parameters
>>>>> to the data.
>>>>>
>>>>> From a plot with confidence intervals, it's rather easy for one to
>>>>> visually and manually(?) figure out a smoothed curve for it.
>>>>> So I'm thinking right now of directly applying spline (or whatever 
>>>>> regression procedures for this purpose) to the simulation data 
>>>>> with repetitions rather than means. The simulation data in this 
>>>>> case looks like this (assuming three repetitions):
>>>>>
>>>>> # x    y
>>>>> 1      1.2
>>>>> 1      0.9
>>>>> 1      1.3
>>>>> 2      2.2
>>>>> 2      1.7
>>>>> 2      2.0
>>>>> ...      ....
>>>>>
>>>>> So my idea is to let the spline procedure handle the fluctuations
>>>>> in the data (i.e., in repetitions) by itself.
>>>>> But I wonder whether this direct application of spline procedures
>>>>> to data with multiple observations makes sense from the
>>>>> statistical analysis (i.e., theoretical) point of view.
>>>>>
>>>>> It may be a stupid question and quite obvious to many, but 
>>>>> personally I don't know where to start.
>>>>> It would be greatly appreciated if anyone could shed some light
>>>>> on this.
>>>>>
>>>>> Many thanks in advance,
>>>>> Joseph
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>
>
>
>
>

