Re: [R] Curve Fitting/Regression with Multiple Observations

2010-05-02 Thread Kyeong Soo (Joseph) Kim
Many thanks for the suggestion!

That may reduce the computational time needed to find the x value for
a given y value (for hundreds of pairs). I will certainly look into the
manuals for approx() and approxfun() in this regard.

Again, thanks for taking the time to read my previous posts and make
this valuable suggestion.

Regards,
Joseph

On Sat, May 1, 2010 at 12:41 AM, Greg Snow greg.s...@imail.org wrote:
 I did not understand enough of the rest of your question to give any better 
 response than others have given.

 Looking back at your previous posts, there is one suggestion that I can make 
 that may help.  You can use the approx or approxfun functions to approximate 
 an inverse, just generate a bunch of x,y pairs from your function, then feed 
 them to approx while switching x and y.  Not an exact inverse, but if you 
 give it enough points then it will be close.
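
A minimal sketch of the approx()/approxfun() inversion described above. The function f is a hypothetical stand-in for the fitted curve (any strictly increasing function would do), and the grid size is an arbitrary choice; neither comes from the thread.

```r
# Hypothetical strictly increasing function standing in for the fitted curve
f <- function(x) exp(2 * x)

# Tabulate (x, y) pairs on a fine grid
x_grid <- seq(0, 1, length.out = 1001)
y_grid <- f(x_grid)

# Swap the roles of x and y to get an approximate inverse by linear interpolation
f_inv <- approxfun(y_grid, x_grid)

# Look up x for several y values at once
y_new <- c(1.5, 3, 6)
x_est <- f_inv(y_new)
```

With the default rule argument, approxfun() returns NA for y values outside the tabulated range, so the grid should cover every y value that will be looked up.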

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111


 -Original Message-
 From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo@gmail.com]
 Sent: Friday, April 30, 2010 5:24 PM
 To: Greg Snow
 Cc: r-help@r-project.org
 Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

 I have already learned a lot from the list, both technical and not,
 and cannot thank you enough for those valuable suggestions. In fact, as
 I said in my previous posts, I got really critical help and advice,
 which really addressed the issues I have.

 By the way, there is one point or two in your post I agree on, but I
 am not sure why you just pointed out side issues (by snipping a part
 of what I said) without touching the main topic of this thread at all.
 I can go on but won't, because arguing for the sake of argument is of
 no value to anyone in this thread.

 It would have been better if you could have focused on the topic and
 provided some technical and practical information which I could learn
 from and be very thankful for.

 Regards,
 Joseph

 On Fri, Apr 30, 2010 at 11:35 PM, Greg Snow greg.s...@imail.org
 wrote:
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
  project.org] On Behalf Of Kyeong Soo (Joseph) Kim
  Sent: Friday, April 30, 2010 4:10 AM
  To: kMan
  Cc: r-help@r-project.org
  Subject: Re: [R] Curve Fitting/Regression with Multiple Observations
 
  [snip]
 
  By the way, I wonder why most of the responses I've received from
  this list are so cynical (or skeptical?) and in some sense done in a
  quite arrogant way. It's very hard to imagine that one would receive
  such responses in my own areas of computer simulation and optical
  communications/networking. If a newbie asks a question to the list
  not making much sense or another FAQ, that is usually ignored (i.e.,
  no response) because we are all too busy to deal with that.
  Sometimes, though, a kind soul (like Gabor) takes his/her own
  valuable time and doesn't mind explaining all the details from
  simple basics.
 
  In my experience with this list, and others, the perceived level of
  cynical/skeptical/arrogant answers has more to do with the reader
  than with the writer.  If you want to be offended, you will find
  things to be offended about even when none was intended.  If you look
  for help and useful responses (follow the posting guide) and are
  thankful for what you learn, you will learn more and be bothered
  less.
 
  R-help is a mixture of different levels and cultures.  In framing
  responses it is hard to know what the other person may find offensive
  (I was once yelled at and chewed out quite thoroughly for truthfully
  answering no when asked if I drink coffee).
 
  Most responders on this list (actually I would say all, but there
  might be an exception that I have not noticed) are trying to be
  helpful, there is just a large variability in the tones of the
  responses.
 
  --
  Gregory (Greg) L. Snow Ph.D.
  Statistical Data Center
  Intermountain Healthcare
  greg.s...@imail.org
  801.408.8111
 
 
 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Kyeong Soo (Joseph) Kim
Dear Keith,

Thanks for the suggestion and for taking the time to respond.

But you seem to have misunderstood something, and it seems that you
have not read all my previous e-mails.
For instance, can a hand-drawn curve give you an inverse function
(analytically or numerically) so that you can find an x value given
the y value (not just for one, but for hundreds of points)?

As for the statistical inferences, I admit that my communications were
not very clear. My intention is to get a smoothed curve from the
simulation data in as statistically meaningful a way as possible
for my intended use of the resulting curve.

As said before, I don't know all the theoretical details behind the
regression and curve fitting functions available in R (though I do
know the basics, as one with a PhD in Elec. Eng., contrary to
someone's assessment), but I am doing my best to catch up by reading
textbooks and manuals, and posting this question to this list is
definitely a way to learn from many experts and advanced users of R.

By the way, I wonder why most of the responses I've received from this
list are so cynical (or skeptical?) and in some sense done in a quite
arrogant way. It's very hard to imagine that one would receive such
responses in my own areas of computer simulation and optical
communications/networking. If a newbie asks a question to the list not
making much sense or another FAQ, that is usually ignored (i.e., no
response) because we are all too busy to deal with that. Sometimes,
though, a kind soul (like Gabor) takes his/her own valuable time and
doesn't mind explaining all the details from simple basics.

Again, what I want to hear from the list is the proper use of
regression/curve fitting functions of R for my simulation data with
replications: Applying after taking means or directly on them? So far
I haven't heard anyone even specifically touching my question,
although there were several seemingly related suggestions.
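
One way to see what is at stake in the means-versus-replications question, at least for ordinary least squares: when every x value has the same number of replications, fitting to the raw data and fitting to the per-x means give the same coefficients; only the residual degrees of freedom (and so the standard errors) differ. The sketch below uses made-up data and a quadratic chosen purely for illustration. Smoothers such as loess() or smooth.spline() treat repeated x values in their own ways and are not covered by this identity.

```r
set.seed(1)

# Made-up simulation output: 9 load levels, 5 replications each
x <- rep(seq(0.1, 0.9, by = 0.1), each = 5)
y <- 1 / (1 - x) + rnorm(length(x), sd = 0.2)

# Fit directly on all replications
fit_raw <- lm(y ~ poly(x, 2, raw = TRUE))

# Fit on the per-x means (tapply returns them in increasing order of x)
x_bar <- sort(unique(x))
y_bar <- as.vector(tapply(y, x, mean))
fit_mean <- lm(y_bar ~ poly(x_bar, 2, raw = TRUE))

# Same coefficients, different residual degrees of freedom
coef_raw <- unname(coef(fit_raw))
coef_mean <- unname(coef(fit_mean))
df_raw <- df.residual(fit_raw)
df_mean <- df.residual(fit_mean)
```

With unequal numbers of replications the two fits agree only if the means are weighted by their replication counts (the weights argument of lm()).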

Regards,
Joseph

On Fri, Apr 30, 2010 at 4:25 AM, kMan kchambe...@gmail.com wrote:
 Dear Joseph,

 If you do not need to make any inferences, that is, you just want it to look 
 pretty, then drawing a curve by hand is as good a solution as any. Plus, 
 there is no reason for expert testimony to say that the curve does not mean 
 anything.

 Sincerely,
 KeithC.

 -Original Message-
 From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo@gmail.com]
 Sent: Tuesday, April 27, 2010 2:33 PM
 To: Gabor Grothendieck
 Cc: r-help@r-project.org
 Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

 Frankly speaking, I am not looking for such a framework.

 The system I'm studying is a communication network (like an M/M/1 queue, but 
 way too complicated to analyze mathematically using classical queueing 
 theory) and the conclusion I want to make is qualitative rather than 
 quantitative -- a high-level comparative study of various network 
 architectures based on the equivalence principle (a concept specific to 
 networking, not in the general sense).

 What I want in this regard is a smooth, non-decreasing (hence
 one-to-one) function built out of simulation data because later in my 
 processing, I need an inverse function of the said curve to find out an x 
 value given the y value. That was, in fact, the reason I used the exponential 
 (i.e., non-decreasing function) curve fitting.

 Even though I don't need a statistical inference framework for my work, I 
 want to make sure that my use of regression/curve fitting techniques with my 
 simulation data (as a tool for getting the mentioned curve) is proper and a 
 usual practice among experts like you.

 To get an answer to my question, I dug a lot through the Internet but found 
 no clear explanation so far.

 Your suggestions and examples (always!) are much appreciated, but I 
 am still not sure that using those regression procedures with the kind of 
 data I described is the right thing to do.

 Again, many thanks for your prompt and kind answers, Joseph


 On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck ggrothendi...@gmail.com 
 wrote:
 If you are looking for a framework for statistical inference you could
 look at additive models as in the mgcv package which has  a book
 associated with it if you need more info. e.g.

 library(mgcv)
 fm <- gam(dist ~ s(speed), data = cars)
 summary(fm)
 plot(dist ~ speed, cars, pch = 20)
 fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
 matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))


 On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim
 kyeongsoo@gmail.com wrote:
 Hello Gabor,

 Many thanks for providing actual examples for the problem!

 In fact I know how to apply and generate plots using various R
 functions including loess, lowess, and smooth.spline procedures.

 My question, however, is whether applying those procedures directly
 on the data with multiple observations/duplicate points(?) is on a
 sound basis or not.

 Before asking

Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Liaw, Andy
You may want to run 

RSiteSearch("monotone splines")

at the R prompt.  The 3rd hit looks quite promising.  However, if I 
understand your data, you have multiple y values for the same x
values.  If so, can you justify inverting the regression function?

The traffic on this mailing list is very high, and the signal to
noise ratio is rather low.  This has the tendency of burning out
those who started with good intentions to help.

Andy 


Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread kMan
Dear Joseph,

I have had a similar experience with replies. Andy's assessment about signal to 
noise on the list is, I believe, quite accurate, and quite elegant. My 
experience has generally been that R-replies get better with age. 

I welcome the feedback you just provided.

Sincerely,
KeithC.


Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Kyeong Soo (Joseph) Kim
Dear Andy,

You're the kind soul I mentioned in my previous e-mail!

Certainly yours is the kind of response I've been looking for, and now
I can start with that, especially splinefun() with monoH.FC
method.
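
A sketch of how splinefun() with method = "monoH.FC" might be combined with uniroot() to go from y back to x. The per-x means below are invented, strictly increasing numbers, and invert() is a hypothetical helper, not something from the thread or from R itself. Note that splinefun() interpolates: the curve passes through the means rather than smoothing them.

```r
# Invented per-x means of a delay-like measure (strictly increasing)
x <- seq(0.1, 0.9, by = 0.1)
y_bar <- c(1.12, 1.24, 1.45, 1.66, 2.03, 2.48, 3.35, 5.02, 9.90)

# Monotone Hermite interpolant (Fritsch-Carlson) through the means
f_hat <- splinefun(x, y_bar, method = "monoH.FC")

# Hypothetical helper: numerically solve f_hat(x0) = y for x0 within the data range
invert <- function(y) {
  uniroot(function(x0) f_hat(x0) - y, interval = range(x), tol = 1e-9)$root
}

# Invert several y values
y_new <- c(1.5, 2, 4)
x_est <- vapply(y_new, invert, numeric(1))
```

uniroot() needs the target y to lie between the smallest and largest mean; for hundreds of lookups, tabulating f_hat on a fine grid and passing the swapped pairs to approxfun() is an alternative.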

As for my simulation data, your understanding is correct; there are
multiple y values from different replications for the same x values.
Even though there are multiple y values for a given x value, this
could be interpreted as the combination of multiple, different random
components (inherent in any Monte Carlo simulation) + one fixed,
unknown deterministic component. So the underlying assumption is that
there is a one-to-one (monotone) function between x and y.

This is typical in many computer simulations in networking. As said
before, for instance, you can get a nice, closed-form (monotone)
function of utilization (i.e., \rho) for the average delay of
customers in an M/M/1 queue. The simulation with
different random seeds, however, gives slightly different average
delays for a given utilization per run. Still, we know from the
underlying model that there is a one-to-one correspondence between the
utilization and the average delay. Of course, unlike the simple M/M/1
queue, for most actual networking systems to be analyzed, we don't know
the exact models, but it is well accepted and assumed in nearly all
existing work in this area that there is still a one-to-one
correspondence between the utilization (or system load) and
performance measures like delay, throughput, and packet loss.
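
For the M/M/1 case mentioned above, the standard closed form for the mean time in system is T = 1 / (mu * (1 - rho)), which is strictly increasing in rho on [0, 1) and can be inverted by hand. The function names below are made up for illustration, and mu = 1 is an arbitrary service rate.

```r
# Mean time in system of an M/M/1 queue as a function of utilization rho
mm1_delay <- function(rho, mu = 1) 1 / (mu * (1 - rho))

# Closed-form inverse: utilization that produces a given mean delay
mm1_load <- function(delay, mu = 1) 1 - 1 / (mu * delay)

rho <- seq(0.1, 0.9, by = 0.1)
delay <- mm1_delay(rho)

# Round trip: the inverse recovers the original utilizations
rho_back <- mm1_load(delay)
```

A check of this kind is one way to validate a numerical inversion procedure on a model whose exact inverse is known before applying it to simulation output with no closed form.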

I do appreciate your suggestion and this would be of tremendous help
for my current research.
Also, thanks for the assessment of this list, which I take as
valuable advice for the future.

With Regards,
Joseph


On Fri, Apr 30, 2010 at 12:52 PM, Liaw, Andy andy_l...@merck.com wrote:
 You may want to run

 RSiteSearch("monotone splines")

 at the R prompt.  The 3rd hit looks quite promising.  However, if I
 understand your data, you have multiple y values for the same x
 values.  If so, can you justify inverting the regression function?

 The traffic on this mailing list is very high, and the signal to
 noise ratio is rather low.  This has the tendency of burning out
 those who started with good intentions to help.

 Andy


Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Kyeong Soo (Joseph) Kim
Dear Keith,

I will keep that in mind in my future postings.
Again, thanks for your time and advice!

Regards,
Joseph

On Fri, Apr 30, 2010 at 3:54 PM, kMan kchambe...@gmail.com wrote:
 Dear Joseph,

 I have had a similar experience with replies. Andy's assessment about signal to 
 noise on the list is, I believe, quite accurate, and quite elegant. My 
 experience has generally been that R-replies get better with age.

 I welcome the feedback you just provided.

 Sincerely,
 KeithC.


Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Greg Snow

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Kyeong Soo (Joseph) Kim
 Sent: Friday, April 30, 2010 4:10 AM
 To: kMan
 Cc: r-help@r-project.org
 Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

[snip]

 By the way, I wonder why most of the responses I've received from this
 list are so cynical (or skeptical?) and in some sense done in a quite
 arrogant way. It's very hard to imagine that one would receive such
 responses in my own areas of computer simulation and optical
 communications/networking. If a newbie asks a question to the list not
 making much sense or another FAQ, that is usually ignored (i.e., no
 response) because we are all too busy to deal with that. Sometimes,
 though, a kind soul (like Gabor) takes his/her own valuable time and
 doesn't mind explaining all the details from simple basics.

In my experience with this list, and others, the perceived level of 
cynical/skeptical/arrogant answers has more to do with the reader than with the 
writer.  If you want to be offended, you will find things to be offended about 
even when none was intended.  If you look for help and useful responses (follow 
the posting guide) and are thankful for what you learn, you will learn more and 
be bothered less.

R-help is a mixture of different levels and cultures.  In framing responses it 
is hard to know what the other person may find offensive (I was once yelled at 
and chewed out quite thoroughly for truthfully answering no when asked if I 
drink coffee).

Most responders on this list (actually I would say all, but there might be an 
exception that I have not noticed) are trying to be helpful, there is just a 
large variability in the tones of the responses.  

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111





Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Kyeong Soo (Joseph) Kim
I have already learned a lot from the list, both technical and not,
and cannot thank enough for those valuable suggestions. In fact, as
said in my previous posts, I got really critical help and advices,
which really addresses the issues I have.

By the way, there is one point or two in your post I agree on, but I
am not sure why you just pointed out side issues (by snipping a part
of my saying) without touching the main topic of this thread at all. I
can go on but won't because arguing for the sake of argument is of no
value to anyone in this thread.

It would have been better if you could have focused on the topic and
provided some technical and practical information which I could learn
from and be very thankful for.

Regards,
Joseph

On Fri, Apr 30, 2010 at 11:35 PM, Greg Snow greg.s...@imail.org wrote:

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Kyeong Soo (Joseph) Kim
 Sent: Friday, April 30, 2010 4:10 AM
 To: kMan
 Cc: r-help@r-project.org
 Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

 [snip]

 By the way, I wonder why most of the responses I've received from this
 list are so cynical (or skeptical?) and in some sense done in a quite
 arrogant way. It's very hard to imagine that one would receive such
 responses in my own areas of computer simulation and optical
 communications/networking. If a newbie asks a question to the list not
 making much sense or another FAQ, that is usually ignored (i.e., no
 response) because all we are too busy to deal with that. Sometimes,
 though, a kind soul (like Gabor) takes his/her own valuable time and
 doesn't mind explaining all the details from simple basics.

 In my experience with this list, and others, the perceived level of 
 cynical/skeptical/arrogant answers has more to do with the reader than with 
 the writer.  If you want to be offended, you will find things to be offended 
 about even when none was intended.  If you look for help and useful responses 
 (follow the posting guide) and are thankful for what you learn, you will 
 learn more and be bothered less.

 R-help is a mixture of different levels and cultures.  In framing responses 
 it is hard to know what the other person may find offensive (I was once 
 yelled at and chewed out quite thoroughly for truthfully answering no when 
 asked if I drink coffee).

 Most responders on this list (actually I would say all, but there might be an 
 exception that I have not noticed) are trying to be helpful, there is just a 
 large variability in the tones of the responses.

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111







Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Greg Snow
I did not understand enough of the rest of your question to give any better 
response than others have given.

Looking back at your previous posts, there is one suggestion that I can make 
that may help.  You can use the approx or approxfun functions to approximate an 
inverse, just generate a bunch of x,y pairs from your function, then feed them 
to approx while switching x and y.  Not an exact inverse, but if you give it 
enough points then it will be close.
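
A minimal sketch of this suggestion, assuming a strictly increasing curve. The function f below is a hypothetical stand-in for whatever curve was fitted, and the grid size is an arbitrary choice:

```r
# Hypothetical stand-in for the fitted, strictly increasing curve.
f <- function(x) 2 * exp(0.3 * x)

# Generate many (x, y) pairs from the function.
x_grid <- seq(0, 10, length.out = 1001)
y_grid <- f(x_grid)

# Swap the roles of x and y: interpolate x as a function of y.
f_inv <- approxfun(y_grid, x_grid)

# Look up x for several y values at once (vectorised, so hundreds of
# lookups cost little more than one).
y_target <- c(3, 5, 10)
x_hat <- f_inv(y_target)

# For this toy curve the exact inverse is known, so the error can be checked.
x_exact <- log(y_target / 2) / 0.3
max(abs(x_hat - x_exact))
```

With the default settings, approxfun returns NA for y values outside the range of y_grid, so the grid should cover every y value that will be looked up.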

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo@gmail.com]
 Sent: Friday, April 30, 2010 5:24 PM
 To: Greg Snow
 Cc: r-help@r-project.org
 Subject: Re: [R] Curve Fitting/Regression with Multiple Observations
 
 I have already learned a lot from the list, both technical and not,
 and cannot thank enough for those valuable suggestions. In fact, as
 said in my previous posts, I got really critical help and advices,
 which really addresses the issues I have.
 
 By the way, there is one point or two in your post I agree on, but I
 am not sure why you just pointed out side issues (by snipping a part
 of my saying) without touching the main topic of this thread at all. I
 can go on but won't because arguing for the sake of argument is of no
 value to anyone in this thread.
 
 It would have been better if you could have focused on the topic and
 provided some technical and practical information which I could learn
 from and be very thankful for.
 
 Regards,
 Joseph
 
 On Fri, Apr 30, 2010 at 11:35 PM, Greg Snow greg.s...@imail.org
 wrote:
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
  project.org] On Behalf Of Kyeong Soo (Joseph) Kim
  Sent: Friday, April 30, 2010 4:10 AM
  To: kMan
  Cc: r-help@r-project.org
  Subject: Re: [R] Curve Fitting/Regression with Multiple Observations
 
  [snip]
 
  By the way, I wonder why most of the responses I've received from this
  list are so cynical (or skeptical?) and in some sense done in a quite
  arrogant way. It's very hard to imagine that one would receive such
  responses in my own areas of computer simulation and optical
  communications/networking. If a newbie asks a question to the list not
  making much sense or another FAQ, that is usually ignored (i.e., no
  response) because all we are too busy to deal with that. Sometimes,
  though, a kind soul (like Gabor) takes his/her own valuable time and
  doesn't mind explaining all the details from simple basics.
 
  In my experience with this list, and others, the perceived level of
  cynical/skeptical/arrogant answers has more to do with the reader than
  with the writer.  If you want to be offended, you will find things to
  be offended about even when none was intended.  If you look for help
  and useful responses (follow the posting guide) and are thankful for
  what you learn, you will learn more and be bothered less.
 
  R-help is a mixture of different levels and cultures.  In framing
  responses it is hard to know what the other person may find offensive
  (I was once yelled at and chewed out quite thoroughly for truthfully
  answering no when asked if I drink coffee).
 
  Most responders on this list (actually I would say all, but there
  might be an exception that I have not noticed) are trying to be
  helpful, there is just a large variability in the tones of the
  responses.
 
  --
  Gregory (Greg) L. Snow Ph.D.
  Statistical Data Center
  Intermountain Healthcare
  greg.s...@imail.org
  801.408.8111
 
 
 
 


Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-29 Thread kMan
Dear Joseph,

If you do not need to make any inferences, that is, you just want it to look 
pretty, then drawing a curve by hand is as good a solution as any. Plus, there 
is no reason for expert testimony to say that the curve does not mean anything.

Sincerely,
KeithC.

-Original Message-
From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo@gmail.com] 
Sent: Tuesday, April 27, 2010 2:33 PM
To: Gabor Grothendieck
Cc: r-help@r-project.org
Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

Frankly speaking, I am not looking for such a framework.

The system I'm studying is a communication network (like an M/M/1 queue, but way 
too complicated to analyze mathematically using classical queueing theory), 
and the conclusion I want to draw is qualitative rather than quantitative -- a 
high-level comparative study of various network architectures based on the 
equivalence principle (a concept specific to networking, not in the general 
sense).

What I want in this regard is a smooth, non-decreasing (ideally strictly
increasing, hence one-to-one) function built out of simulation data, because 
later in my processing I need an inverse function of the said curve to find an 
x value given the y value. That was, in fact, the reason I used the exponential 
(i.e., non-decreasing function) curve fitting.

Even though I don't need a statistical inference framework for my work, I want 
to make sure that my use of regression/curve fitting techniques with my 
simulation data (as a tool for getting the mentioned curve) is proper and a 
usual practice among experts like you.

To get an answer to my question, I dug a lot through the Internet but have 
found no clear explanation so far.

Your suggestions and providing examples (always!) are much appreciated, but I 
am still not sure the use of those regression procedures with the kind of data 
I described is a right way to do.

Again, many thanks for your prompt and kind answers, Joseph


On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck ggrothendi...@gmail.com 
wrote:
 If you are looking for a framework for statistical inference you could 
 look at additive models as in the mgcv package which has  a book 
 associated with it if you need more info. e.g.

 library(mgcv)
 fm <- gam(dist ~ s(speed), data = cars)
 summary(fm)
 plot(dist ~ speed, cars, pch = 20)
 fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
 matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))


 On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim 
 kyeongsoo@gmail.com wrote:
 Hello Gabor,

 Many thanks for providing actual examples for the problem!

 In fact I know how to apply and generate plots using various R 
 functions including loess, lowess, and smooth.spline procedures.

 My question, however, is whether applying those procedures directly 
 on the data with multiple observations/duplicate points(?) is on the 
 sound basis or not.

 Before asking my question to the list, I checked smooth.spline manual 
 pages and found the mentioning of cv option related with duplicate 
 points, but I'm not sure duplicate points in the manual has the 
 same meaning as multiple observations in my case. To me, the manual 
 seems a bit unclear in this regard.

 Looking at car data, I found it has multiple points with the same 
 speed but different dist, which is exactly what I mean by 
 multiple observations, but am still not sure.

 Regards,
 Joseph


 On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck 
 ggrothendi...@gmail.com wrote:
 This will compute a loess curve and plot it:

 example(loess)
 plot(dist ~ speed, cars, pch = 20)
 lines(cars$speed, fitted(cars.lo))

 Also this directly plots it but does not give you the values of the 
 curve separately:

 library(lattice)
 xyplot(dist ~ speed, cars, type = c("p", "smooth"))



 On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim 
 kyeongsoo@gmail.com wrote:
 I recently came to realize the true power of R for statistical 
 analysis -- mainly for post-processing of data from large-scale 
 simulations -- and have been converting many of existing 
 Python(SciPy) scripts to those based on R and/or Perl.

 In the middle of this conversion, I revisited the problem of curve 
 fitting for simulation data with multiple observations resulting 
 from repetitions.

 In the past, I first processed simulation data (i.e., multiple y's 
 from repetitions) to get a mean with a confidence interval for a 
 given value of x (independent variable) and then applied spline 
 procedure for those mean values only (i.e., unique pairs of (x_i, 
 y_i) for i=1, 2, ...) to get a smoothed curve. Because of rather 
 large confidence intervals, however, the resulting curves were 
 hardly smooth enough for my purpose, I had to fix the function to 
 exponential and used least square methods to fit its parameters for data.

 From a plot with confidence intervals, it's rather easy for one to
 visually and manually(?) figure out a smoothed curve for it.
 So

[R] Curve Fitting/Regression with Multiple Observations

2010-04-27 Thread Kyeong Soo (Joseph) Kim
I recently came to realize the true power of R for statistical
analysis -- mainly for post-processing of data from large-scale
simulations -- and have been converting many of existing Python(SciPy)
scripts to those based on R and/or Perl.

In the middle of this conversion, I revisited the problem of curve
fitting for simulation data with multiple observations resulting from
repetitions.

In the past, I first processed simulation data (i.e., multiple y's
from repetitions) to get a mean with a confidence interval for a given
value of x (independent variable) and then applied spline procedure
for those mean values only (i.e., unique pairs of (x_i, y_i) for i=1,
2, ...) to get a smoothed curve. Because of the rather large confidence
intervals, however, the resulting curves were hardly smooth enough for
my purpose, so I had to fix the function to an exponential and use
least-squares methods to fit its parameters to the data.
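
The workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (three repetitions per x, an exponential trend with made-up parameters), the confidence interval is a plain t-interval, and nls is one of several ways to do the least-squares fit.

```r
set.seed(1)

# Simulated stand-in for the simulation output: 3 repetitions per x value.
x <- rep(1:10, each = 3)
y <- exp(0.3 * x) + rnorm(length(x), sd = 0.1)

# Step 1: mean and 95% t-based confidence interval for each x value.
y_bar   <- tapply(y, x, mean)
ci_half <- tapply(y, x, function(v) qt(0.975, length(v) - 1) * sd(v) / sqrt(length(v)))
means <- data.frame(x      = as.numeric(names(y_bar)),
                    y_mean = as.numeric(y_bar),
                    lower  = as.numeric(y_bar - ci_half),
                    upper  = as.numeric(y_bar + ci_half))

# Step 2: least-squares fit of y = a * exp(b * x) to all observations.
# Starting values come from a log-linear fit to the means.
start <- coef(lm(log(y_mean) ~ x, data = means))
fit <- nls(y ~ a * exp(b * x),
           start = list(a = exp(start[[1]]), b = start[[2]]))
a_hat <- coef(fit)[["a"]]
b_hat <- coef(fit)[["b"]]

# The fitted exponential has a closed-form inverse: x as a function of y.
x_for_y <- function(y0) log(y0 / a_hat) / b_hat
x_for_y(5)
```

A parametric form like this gives the inverse for free, at the cost of assuming the exponential shape is adequate.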

From a plot with confidence intervals, it's rather easy for one to
visually and manually(?) figure out a smoothed curve for it.
So I'm thinking right now of directly applying spline (or whatever
regression procedures for this purpose) to the simulation data with
repetitions rather than means. The simulation data in this case looks
like this (assuming three repetitions):

# x    y
1  1.2
1  0.9
1  1.3
2  2.2
2  1.7
2  2.0
...  

So my idea is to let spline procedure handle the fluctuations in the
data (i.e., in repetitions) by itself.
But I wonder whether this direct application of spline procedures for
data with multiple observations makes sense from the statistical
analysis (i.e., theoretical) point of view.
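
For ordinary least squares, at least, the question has a well-known answer: fitting all the replicated observations gives the same coefficients as a weighted fit to the per-x means, with the number of repetitions as weights. The sketch below checks this numerically on simulated data; the quadratic model and the noise level are arbitrary choices for illustration, and the result is shown here only for lm, not for penalized smoothers.

```r
set.seed(42)

# Simulated data with 3 repetitions at each x value.
x <- rep(1:8, each = 3)
y <- 1 + 0.5 * x + 0.1 * x^2 + rnorm(length(x), sd = 0.3)

# Fit 1: least squares on all raw observations.
fit_raw <- lm(y ~ x + I(x^2))

# Fit 2: weighted least squares on the per-x means,
# weighted by the number of repetitions behind each mean.
d <- data.frame(x = sort(unique(x)),
                y = as.numeric(tapply(y, x, mean)),
                w = as.numeric(tapply(y, x, length)))
fit_means <- lm(y ~ x + I(x^2), data = d, weights = w)

# The two sets of coefficients agree up to numerical precision.
all.equal(unname(coef(fit_raw)), unname(coef(fit_means)))
```

What the two fits do not share is the residual information: the fit to the means discards the spread within each set of repetitions, so its standard errors rest on far fewer degrees of freedom.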

It may be a stupid question and quite obvious to many, but personally
I don't know where to start.
It would be greatly appreciated if anyone could shed some light on
this.

Many thanks in advance,
Joseph



Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-27 Thread Bert Gunter
Joseph:

I believe you need to stop inventing your own statistical methods and
consult a professional statistician. I do not think this list is the proper
place to look for a statistics tutorial when your statistical background
appears to be so inadequate for the task.

Sorry to be so direct -- perhaps I am wrong in my assessment. But if I am
even close, would you like an accountant to fix your car or an auto mechanic
to do your taxes?

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Kyeong Soo (Joseph) Kim
Sent: Tuesday, April 27, 2010 10:31 AM
To: r-help@r-project.org
Subject: [R] Curve Fitting/Regression with Multiple Observations

I recently came to realize the true power of R for statistical
analysis -- mainly for post-processing of data from large-scale
simulations -- and have been converting many of existing Python(SciPy)
scripts to those based on R and/or Perl.

In the middle of this conversion, I revisited the problem of curve
fitting for simulation data with multiple observations resulting from
repetitions.

In the past, I first processed simulation data (i.e., multiple y's
from repetitions) to get a mean with a confidence interval for a given
value of x (independent variable) and then applied spline procedure
for those mean values only (i.e., unique pairs of (x_i, y_i) for i=1,
2, ...) to get a smoothed curve. Because of rather large confidence
intervals, however, the resulting curves were hardly smooth enough for
my purpose, I had to fix the function to exponential and used least
square methods to fit its parameters for data.

From a plot with confidence intervals, it's rather easy for one to
visually and manually(?) figure out a smoothed curve for it.
So I'm thinking right now of directly applying spline (or whatever
regression procedures for this purpose) to the simulation data with
repetitions rather than means. The simulation data in this case looks
like this (assuming three repetitions):

# x    y
1  1.2
1  0.9
1  1.3
2  2.2
2  1.7
2  2.0
...  

So my idea is to let spline procedure handle the fluctuations in the
data (i.e., in repetitions) by itself.
But I wonder whether this direct application of spline procedures for
data with multiple observations makes sense from the statistical
analysis (i.e., theoretical) point of view.

It may be a stupid question and quite obvious to many, but personally
I don't know where to start.
It would be greatly appreciated if anyone can shed a light on this in
this regard.

Many thanks in advance,
Joseph



Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-27 Thread Gabor Grothendieck
This will compute a loess curve and plot it:

example(loess)
plot(dist ~ speed, cars, pch = 20)
lines(cars$speed, fitted(cars.lo))

Also this directly plots it but does not give you the values of the
curve separately:

library(lattice)
xyplot(dist ~ speed, cars, type = c("p", "smooth"))



On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
kyeongsoo@gmail.com wrote:
 I recently came to realize the true power of R for statistical
 analysis -- mainly for post-processing of data from large-scale
 simulations -- and have been converting many of existing Python(SciPy)
 scripts to those based on R and/or Perl.

 In the middle of this conversion, I revisited the problem of curve
 fitting for simulation data with multiple observations resulting from
 repetitions.

 In the past, I first processed simulation data (i.e., multiple y's
 from repetitions) to get a mean with a confidence interval for a given
 value of x (independent variable) and then applied spline procedure
 for those mean values only (i.e., unique pairs of (x_i, y_i) for i=1,
 2, ...) to get a smoothed curve. Because of rather large confidence
 intervals, however, the resulting curves were hardly smooth enough for
 my purpose, I had to fix the function to exponential and used least
 square methods to fit its parameters for data.

 From a plot with confidence intervals, it's rather easy for one to
 visually and manually(?) figure out a smoothed curve for it.
 So I'm thinking right now of directly applying spline (or whatever
 regression procedures for this purpose) to the simulation data with
 repetitions rather than means. The simulation data in this case looks
 like this (assuming three repetitions):

 # x    y
 1      1.2
 1      0.9
 1      1.3
 2      2.2
 2      1.7
 2      2.0
 ...      

 So my idea is to let spline procedure handle the fluctuations in the
 data (i.e., in repetitions) by itself.
 But I wonder whether this direct application of spline procedures for
 data with multiple observations makes sense from the statistical
 analysis (i.e., theoretical) point of view.

 It may be a stupid question and quite obvious to many, but personally
 I don't know where to start.
 It would be greatly appreciated if anyone can shed a light on this in
 this regard.

 Many thanks in advance,
 Joseph



Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-27 Thread Kyeong Soo (Joseph) Kim
Hello Gabor,

Many thanks for providing actual examples for the problem!

In fact I know how to apply and generate plots using various R
functions including loess, lowess, and smooth.spline procedures.

My question, however, is whether applying those procedures directly to
data with multiple observations/duplicate points(?) rests on a sound
basis or not.

Before asking my question to the list, I checked the smooth.spline manual
page and found the cv option mentioned in relation to duplicate
points, but I'm not sure "duplicate points" in the manual has the same
meaning as "multiple observations" in my case. To me, the manual seems
a bit unclear in this regard.

Looking at the cars data, I found it has multiple points with the same
speed but different dist, which is exactly what I mean by "multiple
observations", but I am still not sure.

Regards,
Joseph


On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 This will compute a loess curve and plot it:

 example(loess)
 plot(dist ~ speed, cars, pch = 20)
 lines(cars$speed, fitted(cars.lo))

 Also this directly plots it but does not give you the values of the
 curve separately:

 library(lattice)
 xyplot(dist ~ speed, cars, type = c("p", "smooth"))



 On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
 kyeongsoo@gmail.com wrote:
 I recently came to realize the true power of R for statistical
 analysis -- mainly for post-processing of data from large-scale
 simulations -- and have been converting many of existing Python(SciPy)
 scripts to those based on R and/or Perl.

 In the middle of this conversion, I revisited the problem of curve
 fitting for simulation data with multiple observations resulting from
 repetitions.

 In the past, I first processed simulation data (i.e., multiple y's
 from repetitions) to get a mean with a confidence interval for a given
 value of x (independent variable) and then applied spline procedure
 for those mean values only (i.e., unique pairs of (x_i, y_i) for i=1,
 2, ...) to get a smoothed curve. Because of rather large confidence
 intervals, however, the resulting curves were hardly smooth enough for
 my purpose, I had to fix the function to exponential and used least
 square methods to fit its parameters for data.

 From a plot with confidence intervals, it's rather easy for one to
 visually and manually(?) figure out a smoothed curve for it.
 So I'm thinking right now of directly applying spline (or whatever
 regression procedures for this purpose) to the simulation data with
 repetitions rather than means. The simulation data in this case looks
 like this (assuming three repetitions):

 # x    y
 1      1.2
 1      0.9
 1      1.3
 2      2.2
 2      1.7
 2      2.0
 ...      

 So my idea is to let spline procedure handle the fluctuations in the
 data (i.e., in repetitions) by itself.
 But I wonder whether this direct application of spline procedures for
 data with multiple observations makes sense from the statistical
 analysis (i.e., theoretical) point of view.

 It may be a stupid question and quite obvious to many, but personally
 I don't know where to start.
 It would be greatly appreciated if anyone can shed a light on this in
 this regard.

 Many thanks in advance,
 Joseph



Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-27 Thread Gabor Grothendieck
If you are looking for a framework for statistical inference you could
look at additive models as in the mgcv package which has  a book
associated with it if you need more info. e.g.

library(mgcv)
fm <- gam(dist ~ s(speed), data = cars)
summary(fm)
plot(dist ~ speed, cars, pch = 20)
fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))


On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim
kyeongsoo@gmail.com wrote:
 Hello Gabor,

 Many thanks for providing actual examples for the problem!

 In fact I know how to apply and generate plots using various R
 functions including loess, lowess, and smooth.spline procedures.

 My question, however, is whether applying those procedures directly on
 the data with multiple observations/duplicate points(?) is on the
 sound basis or not.

 Before asking my question to the list, I checked smooth.spline manual
 pages and found the mentioning of cv option related with duplicate
 points, but I'm not sure duplicate points in the manual has the same
 meaning as multiple observations in my case. To me, the manual seems
 a bit unclear in this regard.

 Looking at car data, I found it has multiple points with the same
 speed but different dist, which is exactly what I mean by multiple
 observations, but am still not sure.

 Regards,
 Joseph


 On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
 This will compute a loess curve and plot it:

 example(loess)
 plot(dist ~ speed, cars, pch = 20)
 lines(cars$speed, fitted(cars.lo))

 Also this directly plots it but does not give you the values of the
 curve separately:

 library(lattice)
 xyplot(dist ~ speed, cars, type = c("p", "smooth"))



 On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
 kyeongsoo@gmail.com wrote:
 I recently came to realize the true power of R for statistical
 analysis -- mainly for post-processing of data from large-scale
 simulations -- and have been converting many of existing Python(SciPy)
 scripts to those based on R and/or Perl.

 In the middle of this conversion, I revisited the problem of curve
 fitting for simulation data with multiple observations resulting from
 repetitions.

 In the past, I first processed simulation data (i.e., multiple y's
 from repetitions) to get a mean with a confidence interval for a given
 value of x (independent variable) and then applied spline procedure
 for those mean values only (i.e., unique pairs of (x_i, y_i) for i=1,
 2, ...) to get a smoothed curve. Because of rather large confidence
 intervals, however, the resulting curves were hardly smooth enough for
 my purpose, I had to fix the function to exponential and used least
 square methods to fit its parameters for data.

 From a plot with confidence intervals, it's rather easy for one to
 visually and manually(?) figure out a smoothed curve for it.
 So I'm thinking right now of directly applying spline (or whatever
 regression procedures for this purpose) to the simulation data with
 repetitions rather than means. The simulation data in this case looks
 like this (assuming three repetitions):

 # x    y
 1      1.2
 1      0.9
 1      1.3
 2      2.2
 2      1.7
 2      2.0
 ...      

 So my idea is to let spline procedure handle the fluctuations in the
 data (i.e., in repetitions) by itself.
 But I wonder whether this direct application of spline procedures for
 data with multiple observations makes sense from the statistical
 analysis (i.e., theoretical) point of view.

 It may be a stupid question and quite obvious to many, but personally
 I don't know where to start.
 It would be greatly appreciated if anyone can shed a light on this in
 this regard.

 Many thanks in advance,
 Joseph



Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-27 Thread Kyeong Soo (Joseph) Kim
Frankly speaking, I am not looking for such a framework.

The system I'm studying is a communication network (like an M/M/1 queue,
but way too complicated to analyze mathematically using classical
queueing theory), and the conclusion I want to draw is qualitative
rather than quantitative -- a high-level comparative study of various
network architectures based on the equivalence principle (a concept
specific to networking, not in the general sense).

What I want in this regard is a smooth, non-decreasing (ideally strictly
increasing, hence one-to-one) function built out of simulation data,
because later in my processing I need an inverse function of the said
curve to find an x value given the y value. That was, in fact, the
reason I used the exponential (i.e., non-decreasing function) curve
fitting.
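
One possible way to get such a curve without fixing a parametric form is a monotone interpolating spline through the per-x means, inverted numerically. The sketch below is an illustration only: the means are invented numbers that are already strictly increasing, splinefun with method = "monoH.FC" is one of several monotone options in base R, and an interpolating spline passes through the means rather than smoothing them. If the means were not monotone, something like isoreg would have to be applied first.

```r
# Invented per-x means, assumed already strictly increasing.
x     <- 1:8
y_bar <- c(1.1, 2.0, 2.6, 3.9, 5.5, 7.0, 9.8, 13.1)

# Monotone cubic Hermite spline (Fritsch-Carlson) through the means.
f <- splinefun(x, y_bar, method = "monoH.FC")

# Numerical inverse: solve f(x0) = y0 on the range of the data.
f_inv <- function(y0) {
  uniroot(function(x0) f(x0) - y0, interval = range(x), tol = 1e-9)$root
}

x_hat <- f_inv(6)   # x value at which the curve reaches y = 6
f(x_hat)            # should reproduce 6 up to the solver tolerance
```

uniroot needs y0 to lie between the smallest and largest mean; outside that range it stops with an error instead of extrapolating.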

Even though I don't need a statistical inference framework for my
work, I want to make sure that my use of regression/curve fitting
techniques with my simulation data (as a tool for getting the
mentioned curve) is proper and a usual practice among experts like
you.

To get an answer to my question, I dug a lot through the Internet but
have found no clear explanation so far.

Your suggestions and providing examples (always!) are much
appreciated, but I am still not sure the use of those regression
procedures with the kind of data I described is a right way to do.

Again, many thanks for your prompt and kind answers,
Joseph


On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 If you are looking for a framework for statistical inference you could
 look at additive models as in the mgcv package which has  a book
 associated with it if you need more info. e.g.

 library(mgcv)
 fm <- gam(dist ~ s(speed), data = cars)
 summary(fm)
 plot(dist ~ speed, cars, pch = 20)
 fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
 matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))


 On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim
 kyeongsoo@gmail.com wrote:
 Hello Gabor,

 Many thanks for providing actual examples for the problem!

 In fact I know how to apply and generate plots using various R
 functions including loess, lowess, and smooth.spline procedures.

 My question, however, is whether applying those procedures directly on
 the data with multiple observations/duplicate points(?) is on the
 sound basis or not.

 Before asking my question to the list, I checked smooth.spline manual
 pages and found the mentioning of cv option related with duplicate
 points, but I'm not sure duplicate points in the manual has the same
 meaning as multiple observations in my case. To me, the manual seems
 a bit unclear in this regard.

 Looking at car data, I found it has multiple points with the same
 speed but different dist, which is exactly what I mean by multiple
 observations, but am still not sure.

 Regards,
 Joseph


 On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
 This will compute a loess curve and plot it:

 example(loess)
 plot(dist ~ speed, cars, pch = 20)
 lines(cars$speed, fitted(cars.lo))

 Also this directly plots it but does not give you the values of the
 curve separately:

 library(lattice)
 xyplot(dist ~ speed, cars, type = c("p", "smooth"))



 On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
 kyeongsoo@gmail.com wrote:
 I recently came to realize the true power of R for statistical
 analysis -- mainly for post-processing of data from large-scale
 simulations -- and have been converting many of existing Python(SciPy)
 scripts to those based on R and/or Perl.

 In the middle of this conversion, I revisited the problem of curve
 fitting for simulation data with multiple observations resulting from
 repetitions.

 In the past, I first processed simulation data (i.e., multiple y's
 from repetitions) to get a mean with a confidence interval for a given
 value of x (independent variable) and then applied spline procedure
 for those mean values only (i.e., unique pairs of (x_i, y_i) for i=1,
 2, ...) to get a smoothed curve. Because of rather large confidence
 intervals, however, the resulting curves were hardly smooth enough for
 my purpose, I had to fix the function to exponential and used least
 square methods to fit its parameters for data.

 From a plot with confidence intervals, it's rather easy for one to
 visually and manually(?) figure out a smoothed curve for it.
 So I'm thinking right now of directly applying spline (or whatever
 regression procedures for this purpose) to the simulation data with
 repetitions rather than means. The simulation data in this case looks
 like this (assuming three repetitions):

 # x    y
 1      1.2
 1      0.9
 1      1.3
 2      2.2
 2      1.7
 2      2.0
 ...      

 So my idea is to let spline procedure handle the fluctuations in the
 data (i.e., in repetitions) by itself.
 But I wonder whether this direct application of spline procedures for
 data with multiple observations makes