Re: [R] nls, convergence and starting values
Hi Patrick,

there is specialized functionality in R that offers both automated calculation of starting values and relatively robust optimization. It can be used successfully in many common cases of nonlinear regression, including your data:

    library(drc)  # on CRAN

    ## Fitting 3-parameter logistic model
    ## (slightly different parameterization from SSlogis())
    bdd.m1 <- drm(pourcma ~ transat, weights = sqrt(nbfeces), data = bdd, fct = L.3())

    plot(bdd.m1, broken = TRUE, conLevel = 0.0001)

    summary(bdd.m1)

Of course, the standard errors are huge because the data do not really support this model (as already pointed out in other replies to this post).

Christian

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls, convergence and starting values
Patrick Giraudoux (univ-fcomte.fr) writes:

> Patrick Burns wrote:
> > Patrick Giraudoux wrote:
> >> Bert Gunter wrote:
> >>> Based on a simple scatterplot of pourcma vs transat, a 4 parameter
> >>> logistic looks like wild overfitting, and that may be the source of
> >>> your problems. Given the huge scatter, a straight line is about as
> >>> much as would seem sensible. I think this falls into the "Why ever
> >>> would you want to do such a thing?" category.
> >>>
> >>> -- Bert
> >>
> >> Right, well, the general idea was just to show that the "straight
> >> line" was indeed the best model (in the other data sets, with model
> >> comparison, the logistic one was clearly shown to be the best...).
> >> Can the fact that convergence cannot be obtained be an acceptable and
> >> sufficient reason to select the null model (the straight line)?
> >
> > It is my experience that convergence problems are often encountered
> > when the model makes little sense. I'm not so sure that
> > non-convergence on its own is a good reason to reject the model.
> > That is, to answer your specific question, I think it is acceptable
> > but not sufficient.
> >
> > Patrick Burns
> > patrick burns-stat.com
> > +44 (0)20 8525 0696
> > http://www.burns-stat.com
> > (home of "The R Inferno" and "A Guide for the Unwilling S User")
>
> OK. Thanks for this opinion. Actually I shared it intuitively but,
> facing such a situation for the first time, was quite uncomfortable
> making a decision (and I still am). We are touching on epistemology...
> and are maybe a bit far from the purely technical, and thus from R list
> issues.
A technical solution to this particular problem:

    with(bdd, plot(pourcma ~ transat))
    stval <- list(Asym = 30, xmid = 0.07, scal = 0.02)
    ## logistic curve at the starting values
    with(stval, curve(Asym / (1 + exp((xmid - x) / scal)), add = TRUE))

    ## the call from the original post
    nls(pourcma ~ SSlogis(transat, Asym, xmid, scal),
        start = c(Asym = 30, xmid = 0.07, scal = 0.02), data = bdd,
        weights = sqrt(nbfeces), trace = TRUE, algorithm = "plinear")

    library(bbmle)
    ## maximum-likelihood fit of the same logistic mean with normal errors
    m1 <- mle2(pourcma ~ dnorm(mean = Asym / (1 + exp((xmid - transat) / scal)), sd = sd),
               start = c(stval, list(sd = 0.1)), method = "Nelder-Mead",
               data = bdd)
    ## overlay the fitted curve
    with(as.list(coef(m1)), curve(Asym / (1 + exp((xmid - x) / scal)), add = TRUE, col = 2))

The mle2 fit happens to be able to find the flat-line solution (although it should really complain about lack of convergence, since the scale parameter should go to infinity and the midpoint parameter should be arbitrary in this case: only Asym and the standard deviation are well defined).

Ben Bolker
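The identifiability point in this reply can be checked numerically. What follows is only a minimal sketch in base R and is not part of the original message; the helper name `logistic` and the parameter values are made up for illustration. It evaluates Asym/(1+exp((xmid-x)/scal)) with a very large `scal` over roughly the observed range of transat, for two different values of `xmid`.

```r
## Hypothetical helper: the logistic curve in the SSlogis() parameterization.
logistic <- function(x, Asym, xmid, scal) Asym / (1 + exp((xmid - x) / scal))

## Grid covering roughly the range of transat in the thread's data (0 to about 0.5).
x <- seq(0, 0.5, length.out = 11)

## Very large scale parameter, two different midpoints (illustrative values).
flat_a <- logistic(x, Asym = 60, xmid = 0.07, scal = 1e6)
flat_b <- logistic(x, Asym = 60, xmid = 0.40, scal = 1e6)

## Both curves are practically the constant Asym / 2 ...
print(range(flat_a))
## ... and changing xmid makes practically no difference.
print(max(abs(flat_a - flat_b)))
```

In this limit the fitted values depend on the parameters only through a single constant level, so a flat-line fit leaves `xmid` and `scal` undetermined, which would be consistent with the convergence trouble and the huge standard errors reported in the thread.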
Re: [R] nls, convergence and starting values
Patrick Burns wrote:
> Patrick Giraudoux wrote:
>> Bert Gunter wrote:
>>> Based on a simple scatterplot of pourcma vs transat, a 4 parameter
>>> logistic looks like wild overfitting, and that may be the source of
>>> your problems. Given the huge scatter, a straight line is about as
>>> much as would seem sensible. I think this falls into the "Why ever
>>> would you want to do such a thing?" category.
>>>
>>> -- Bert
>>
>> Right, well, the general idea was just to show that the "straight
>> line" was indeed the best model (in the other data sets, with model
>> comparison, the logistic one was clearly shown to be the best...).
>> Can the fact that convergence cannot be obtained be an acceptable and
>> sufficient reason to select the null model (the straight line)?
>
> It is my experience that convergence problems are often encountered
> when the model makes little sense. I'm not so sure that
> non-convergence on its own is a good reason to reject the model.
> That is, to answer your specific question, I think it is acceptable
> but not sufficient.
>
> Patrick Burns
> patr...@burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of "The R Inferno" and "A Guide for the Unwilling S User")

OK. Thanks for this opinion. Actually I shared it intuitively but, facing such a situation for the first time, was quite uncomfortable making a decision (and I still am). We are touching on epistemology... and are maybe a bit far from the purely technical, and thus from R list issues.

Thanks again, anyway,

Patrick
Re: [R] nls, convergence and starting values
Patrick Giraudoux wrote:
> Bert Gunter wrote:
>> Based on a simple scatterplot of pourcma vs transat, a 4 parameter
>> logistic looks like wild overfitting, and that may be the source of
>> your problems. Given the huge scatter, a straight line is about as
>> much as would seem sensible. I think this falls into the "Why ever
>> would you want to do such a thing?" category.
>>
>> -- Bert
>
> Right, well, the general idea was just to show that the "straight
> line" was indeed the best model (in the other data sets, with model
> comparison, the logistic one was clearly shown to be the best...).
> Can the fact that convergence cannot be obtained be an acceptable and
> sufficient reason to select the null model (the straight line)?
>
> Patrick

It is my experience that convergence problems are often encountered when the model makes little sense. I'm not so sure that non-convergence on its own is a good reason to reject the model. That is, to answer your specific question, I think it is acceptable but not sufficient.

Patrick Burns
patr...@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of "The R Inferno" and "A Guide for the Unwilling S User")
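One way to look for positive evidence, rather than relying on non-convergence alone, is to compare models that can always be fitted. The sketch below is an illustration only and not something proposed in the thread: it uses simulated toy data (not the bdd data), and the names `toy`, `m0`, `m1`, `aic_tab` and `f_test` are made up here. It fits a flat-line null model and a straight line by weighted least squares and compares them with AIC and an F test.

```r
## Toy data only: simulated, NOT the bdd data from the thread.
set.seed(1)
toy <- data.frame(x = runif(41, 0, 0.5),
                  n = sample(5:43, 41, replace = TRUE))
toy$y <- 30 + rnorm(41, sd = 15)   # response with no real dependence on x

## Null model (flat line) and straight line, weighted as in the thread's calls.
m0 <- lm(y ~ 1, data = toy, weights = sqrt(n))
m1 <- lm(y ~ x, data = toy, weights = sqrt(n))

## Smaller AIC indicates a better trade-off between fit and complexity.
aic_tab <- AIC(m0, m1)
## F test for the nested pair of linear models.
f_test <- anova(m0, m1)

print(aic_tab)
print(f_test)
```

`AIC()` also accepts fitted `nls` objects, so when a nonlinear fit does converge on another data set it could be added to the same kind of comparison, provided all models are fitted to the same response and with the same weights.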
Re: [R] nls, convergence and starting values
Bert Gunter wrote:
> Based on a simple scatterplot of pourcma vs transat, a 4 parameter
> logistic looks like wild overfitting, and that may be the source of
> your problems. Given the huge scatter, a straight line is about as
> much as would seem sensible. I think this falls into the "Why ever
> would you want to do such a thing?" category.
>
> -- Bert

Right, well, the general idea was just to show that the "straight line" was indeed the best model (in the other data sets, with model comparison, the logistic one was clearly shown to be the best...). Can the fact that convergence cannot be obtained be an acceptable and sufficient reason to select the null model (the straight line)?

Patrick
Re: [R] nls, convergence and starting values
Based on a simple scatterplot of pourcma vs transat, a 4 parameter logistic looks like wild overfitting, and that may be the source of your problems. Given the huge scatter, a straight line is about as much as would seem sensible. I think this falls into the "Why ever would you want to do such a thing?" category.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Patrick Giraudoux
Sent: Friday, March 27, 2009 12:39 PM
To: r-h...@stat.math.ethz.ch
Cc: Francis Raoul
Subject: [R] nls, convergence and starting values

"in non linear modelling finding appropriate starting values is something like an art"... (maybe from somewhere in Crawley, 2007)

Here a colleague and I just want to compare different response models to a null model. This has worked OK for almost all the other data sets except this one (dumped below). Whatever we try, with whatever algorithm, and even when subsetting the data (to check whether some singular point was the cause of the mess), we do not reach convergence, or we run into singular gradients (?), etc. For example:

    nls(pourcma~SSlogis(transat, Asym, xmid, scal), start=c(Asym=30, xmid=0.07, scal=0.02),data=bdd, weights=sqrt(nbfeces),trace=T,alg="plinear")

Has anyone a hint about an alternative approach to fitting a model?
Or an idea to get evidence that such a model cannot be fitted to the data?

[quoted data set snipped; it is given in full in the original post]
[R] nls, convergence and starting values

"in non linear modelling finding appropriate starting values is something like an art"... (maybe from somewhere in Crawley, 2007)

Here a colleague and I just want to compare different response models to a null model. This has worked OK for almost all the other data sets except this one (dumped below). Whatever we try, with whatever algorithm, and even when subsetting the data (to check whether some singular point was the cause of the mess), we do not reach convergence, or we run into singular gradients (?), etc. For example:

    nls(pourcma ~ SSlogis(transat, Asym, xmid, scal),
        start = c(Asym = 30, xmid = 0.07, scal = 0.02), data = bdd,
        weights = sqrt(nbfeces), trace = TRUE, algorithm = "plinear")

Has anyone a hint about an alternative approach to fitting a model? Or an idea to get evidence that such a model cannot be fitted to the data?

    bdd <- structure(list(
      transat = c(0.0697, 0.13079, 0.314265, 0.241613, 0.039319, 0, 0, 0, 0, 0,
        0.0805, 0.41, 0.30585, 0.27465, 0.06085, 0.09114, 0.05766, 0.036983,
        0.093186, 0.046624, 0, 0, 0, 0, 0.000616, 0, 0.0025, 0.0325, 0.03125,
        0.04599, 0.38398, 0.524505, 0.450337, 0.061831, 0.133926, 0.091806,
        0.00928, 0.25114, 0.3074, 0.431056, 0.026158),
      transma = c(0.04141, 0.01599, 0.101803, 0.002378, 0.039319,
        0.00472459016393443, 0.0031016393442623, 0.000178524590163934,
        0.00255704918032787, 0.000346229508196721, 0.0665, 0.012, 0.0553,
        0.0045, 0.0056, 0.00155, 0.00124, 0.011966, 0.001736, 0.004712,
        3.62903225806452e-05, 9.79838709677419e-05, 2.20161290322581e-05,
        0.00462, 0.01006444, 0.00213, 0.046, 0.005, 0.01195, 0.07154, 0.08468,
        0.141182, 0.086578, 0.027959, 0.003159, 0.003081, 0.13862, 0.00754,
        0.078648, 0.068324, 0.025288),
      nbfeces = c(22L, 26L, 43L, 30L, 35L, 25L, 21L, 36L, 34L, 37L, 23L, 32L,
        40L, 35L, 30L, 16L, 25L, 37L, 37L, 34L, 31L, 35L, 41L, 31L, 34L, 39L,
        5L, 14L, 31L, 13L, 21L, 34L, 32L, 36L, 36L, 40L, 31L, 35L, 39L, 29L,
        32L),
      pourcma = c(50, 34.6153846153846, 27.9069767441860, 43.3,
        65.7142857142857, 32, 28.5714285714286, 22.2, 50, 10.8108108108108,
        26.0869565217391, 40.625, 12.5, 22.8571428571429, 43.3, 6.25, 4,
        10.8108108108108, 16.2162162162162, 23.5294117647059,
        25.8064516129032, 45.7142857142857, 39.0243902439024,
        25.8064516129032, 41.7, 27.5, 20, 14.2857142857143, 22.5806451612903,
        15.3846153846154, 38.0952380952381, 17.6470588235294, 78.125, 61.1,
        25, 37.5, 22.5806451612903, 40, 17.9487179487179, 41.3793103448276,
        50),
      pourcat = c(22.7272727272727, 30.7692307692308, 41.8604651162791, 56.7,
        5.71428571428571, 0, 0, 0, 0, 0, 30.4347826086957, 15.625, 45,
        74.2857142857143, 13.3, 50, 12, 18.9189189189189, 27.0270270270270,
        20.5882352941176, 0, 0, 0, 0, 0, 5, 40, 0, 0, 7.69230769230769,
        9.52380952380952, 38.2352941176471, 59.375, 5.56, 41.7, 42.5,
        9.67741935483871, 14.2857142857143, 51.2820512820513,
        79.3103448275862, 6.25)),
      .Names = c("transat", "transma", "nbfeces", "pourcma", "pourcat"),
      class = "data.frame", row.names = c(NA, -41L))
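On the question of starting values: `SSlogis()` is a self-starting model, so base R can compute starting values itself. The sketch below is an illustration under stated assumptions and is not taken from the thread. It uses simulated data with a clear logistic shape (not the bdd data), and the names `sim`, `init` and `fit`, as well as the "true" parameter values 30, 0.2 and 0.05, are invented for the example.

```r
## Simulated data with a clear logistic shape (NOT the bdd data from the post).
set.seed(42)
sim <- data.frame(x = seq(0, 0.5, length.out = 41))
sim$y <- 30 / (1 + exp((0.2 - sim$x) / 0.05)) + rnorm(41, sd = 0.3)

## Starting values computed by the self-starting model itself.
init <- getInitial(y ~ SSlogis(x, Asym, xmid, scal), data = sim)

## With a self-starting model, nls() can be called without 'start'.
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal), data = sim)

print(init)
print(coef(fit))
```

When the data show no logistic shape, as the replies suggest is the case for bdd, the self-start step can itself fail or return parameter values that mean little, so a failure there points to the data and model rather than to the choice of starting values.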