Re: [R] non-linear regression and root finding
On Mon, 6 Nov 2023 14:55:39 -0500 J C Nash wrote: > However, I'm wondering if this approach is worth writing up, at least > as a vignette or blog post. It does need a shorter example and some > explanation of the "why" and some testing perhaps. Do you mean using this problem as a basis to illustrate ordering constraints on parameters? Weird constraints do come up every now and then in regression problems. I could definitely offer my help with at least some of the text. > If there's interest, I'll be happy to join in. And my own posting > suggests how the ordering is enforced by bounding the "delta" > parameters from below. I have just tried nlsr::nlxb for a slightly larger dataset shared by Troels off-list, and it worked great with the delta parameters as you suggested, thank you! It's interesting that nlxb and nlsLM give slightly different answers, differing in 0.5 pK units for pK1 and (pK2-pK1) but not (pK3-pK2). Then again, they both agree that the standard error for pK1 and (pK2-pK1) is very large, so perhaps the problem is just very ill-conditioned. -- Best regards, Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-linear regression and root finding
G'day Troels, On Tue, 7 Nov 2023 07:14:02 +0100 Troels Ring wrote: > Be as it may, I wonder if not your method might work if only we KNOW > that pK1 is either positive OR negative, in which case we have pK1 = > -exp(theta1)? If pK1 can be either negative or positive (or 0 :-) ), and it is just the ordering that you want to have enforced, then I would try the parameterisation: pK1 = pK1 pK2 = pK1 + exp(theta2) pK3 = pk2 + exp(theta3) and optimise over pK1, theta2 and theta3. As long as you want to know the estimates only. Asking for standard errors of the original estimates would open another can of worms. :-) Cheers, Berwin __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-linear regression and root finding
Thanks a lot, Berwin. Unfortunately, pK1 may well be negative and as I understand the literature it may be poorly defined as such, and also seems to be at a boundary, since when lower is set to say rep(-4,3) pK1 is returned as -4 while pK2 and pK3 are undisturbed. Perhaps the point is that pK1 is not carrying any information at the pH around 5. Fair enough, I guess. Only, I believe I need stick to including all three pK values to be in agreement with the molecular information about HEPES - although even this is contentious. Be as it may, I wonder if not your method might work if only we KNOW that pK1 is either positive OR negative, in which case we have pK1 = -exp(theta1)? Best wishes Troels Den 07-11-2023 kl. 05:10 skrev Berwin A Turlach: G'day Troels, On Mon, 6 Nov 2023 20:43:10 +0100 Troels Ring wrote: Thanks a lot! This was amazing. I'm not sure I see how the conditiion pK1 < pK2 < pK3 is enforced? One way of enforcing such constraints (well, in finite computer arithemtic only "<=" can be enforced) is to rewrite the parameters as: pK1 = exp(theta1) ## only if pK1 > 0 pK2 = pK1 + exp(theta2) pK3 = pk2 + exp(theta3) And then use your optimiser to optimise over theta1, theta2 and theta3. There might be better approaches depending on the specific problem. Cheers, Berwin __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-linear regression and root finding
G'day Troels, On Mon, 6 Nov 2023 20:43:10 +0100 Troels Ring wrote: > Thanks a lot! This was amazing. I'm not sure I see how the conditiion > pK1 < pK2 < pK3 is enforced? One way of enforcing such constraints (well, in finite computer arithemtic only "<=" can be enforced) is to rewrite the parameters as: pK1 = exp(theta1) ## only if pK1 > 0 pK2 = pK1 + exp(theta2) pK3 = pk2 + exp(theta3) And then use your optimiser to optimise over theta1, theta2 and theta3. There might be better approaches depending on the specific problem. Cheers, Berwin __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-linear regression and root finding
I won't send to list, but just to the two of you, as I don't have anything to add at this time. However, I'm wondering if this approach is worth writing up, at least as a vignette or blog post. It does need a shorter example and some explanation of the "why" and some testing perhaps. If there's interest, I'll be happy to join in. And my own posting suggests how the ordering is enforced by bounding the "delta" parameters from below. Note that nls() can only handle bounds in the "port" algorithm, and the man page rather pours cold water on using that algorithm. Best, JN On 2023-11-06 14:43, Troels Ring wrote: Thanks a lot! This was amazing. I'm not sure I see how the conditiion pK1 < pK2 < pK3 is enforced? - it comes from the derivation via generalized Henderson-Hasselbalch but perhaps it is not really necessary. Anyway, the use of Vectorize did the trick! Best wishes Troels Den 06-11-2023 kl. 19:19 skrev Ivan Krylov: В Mon, 6 Nov 2023 17:53:49 +0100 Troels Ring пишет: Hence I wonder if I could somehow have non linear regression to find the 3 pK values. Below is HEPESFUNC which delivers charge in the fluid for known pKs, HEPTOT and SID. Is it possible to have root-finding in the formula with nls? Sure. Just reformulate the problem in terms of a function that takes a vector of predictors (your independent variable SID) and the desired parameters (pK1, pK2, pK3) as separate arguments and then returns predicted values of the dependent variable (to compare against pHobs): kw <- 1e-14 # I'm assuming pHm <- Vectorize(function(SID, pK1, pK2, pK3) -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, HEPTOT=HEPTOT,SID = SID, pK1=pK1,pK2=pK2,pK3=pK3)$root)) (Yes, Vectorize() doesn't make the function any faster, but I don't see an easy way to rewrite this function to make it truly vectorised.) Unfortunately, nls() seems to take a step somewhere where crossprod() of the Jacobian of the model function cannot be inverted and fails with the message "Singular gradient". I wish that R could have a more reliable built-in nonlinear least squares solver. (I could also be holding it wrong.) Meanwhile, we have excellent CRAN packages nlsr and minpack.lm: minpack.lm::nlsLM( pHobs ~ pHm(SID, pK1, pK2, pK3), data.frame(pHobs = pHobs, SID = SID), start = c(pK1 = pK1, pK2 = pK2, pK3 = pK3), # the following is also needed to avoid MINPACK failing to fit lower = rep(-1, 3), upper = rep(9, 3) ) # Nonlinear regression model # model: pHobs ~ pHm(SID, pK1, pK2, pK3) # data: data.frame(pHobs = pHobs, SID = SID) # pK1 pK2 pK3 # -1.000 2.966 7.606 # residual sum-of-squares: 0.001195 # # Number of iterations to convergence: 15 # Achieved convergence tolerance: 1.49e-08 (Unfortunately, your code seemed to have a lot of non-breakable spaces, which confuse R's parser and make it harder to copy&paste it.) I think that your function can also be presented as a degree-5 polynomial in H, so it should also be possible to use polyroot() to obtain your solutions in a more exact manner. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-linear regression and root finding
Thanks a lot! This was amazing. I'm not sure I see how the conditiion pK1 < pK2 < pK3 is enforced? - it comes from the derivation via generalized Henderson-Hasselbalch but perhaps it is not really necessary. Anyway, the use of Vectorize did the trick! Best wishes Troels Den 06-11-2023 kl. 19:19 skrev Ivan Krylov: В Mon, 6 Nov 2023 17:53:49 +0100 Troels Ring пишет: Hence I wonder if I could somehow have non linear regression to find the 3 pK values. Below is HEPESFUNC which delivers charge in the fluid for known pKs, HEPTOT and SID. Is it possible to have root-finding in the formula with nls? Sure. Just reformulate the problem in terms of a function that takes a vector of predictors (your independent variable SID) and the desired parameters (pK1, pK2, pK3) as separate arguments and then returns predicted values of the dependent variable (to compare against pHobs): kw <- 1e-14 # I'm assuming pHm <- Vectorize(function(SID, pK1, pK2, pK3) -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, HEPTOT=HEPTOT,SID = SID, pK1=pK1,pK2=pK2,pK3=pK3)$root)) (Yes, Vectorize() doesn't make the function any faster, but I don't see an easy way to rewrite this function to make it truly vectorised.) Unfortunately, nls() seems to take a step somewhere where crossprod() of the Jacobian of the model function cannot be inverted and fails with the message "Singular gradient". I wish that R could have a more reliable built-in nonlinear least squares solver. (I could also be holding it wrong.) Meanwhile, we have excellent CRAN packages nlsr and minpack.lm: minpack.lm::nlsLM( pHobs ~ pHm(SID, pK1, pK2, pK3), data.frame(pHobs = pHobs, SID = SID), start = c(pK1 = pK1, pK2 = pK2, pK3 = pK3), # the following is also needed to avoid MINPACK failing to fit lower = rep(-1, 3), upper = rep(9, 3) ) # Nonlinear regression model # model: pHobs ~ pHm(SID, pK1, pK2, pK3) #data: data.frame(pHobs = pHobs, SID = SID) #pK1 pK2pK3 # -1.000 2.966 7.606 # residual sum-of-squares: 0.001195 # # Number of iterations to convergence: 15 # Achieved convergence tolerance: 1.49e-08 (Unfortunately, your code seemed to have a lot of non-breakable spaces, which confuse R's parser and make it harder to copy&paste it.) I think that your function can also be presented as a degree-5 polynomial in H, so it should also be possible to use polyroot() to obtain your solutions in a more exact manner. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-linear regression and root finding
Your script is missing something (in particular kw). I presume you are trying to estimate the pK values. You may have more success with package nlsr than nls(). nlsr::nlxb() tries to get the Jacobian of the model specified by a formula and do so by applying symbolic or automatic differentiation. The multi expression function would probably not work. (That is one advantage of nls(), but it has an unstabilized solver and unless carefully called will use a simple forward derivative approximation.) There is also nlsr::nlfb() that can use functions for the residuals and jacobian. Your function is messy, but likely the jacobian can be developed with a bit of work. Some of the symbolic derivative features of R can help here. Alternatively, there are several numerical approximations in nlsr and the vignette "Intro to nlsr" explains how to use these. Note that simple forward and backward approximations are generally not worth using, but central approximation is quite good. The solver in the nlsr package allows of bounds on the parameters. Since you want them ordered, you may want to transform pK1 < pK2 <- pK3 to pk1, deltapk2, deltapk3 so pK2 = pk1+deltapk2 and pk3 = pk2 + deltapk3 = pk1 + deltapk2 + deltapk3 and all values bounded below by 0 or possibly some small numbers to keep the paramters apart. I won't pretend any of this is trivial. It is VERY easy to make small errors that still allow for output that is disastrously incorrect. If it is possible to get a single "formula" as one expression even if spread over multiple lines, then nlxb() might be able to handle it. J Nash (maintainer of nlsr and optimx) On 2023-11-06 11:53, Troels Ring wrote: HEPESFUNC <- function(H,SID,HEPTOT,pK1,pK2,pK3) { XX <- (H^3/(10^-pK3*10^-pK2*10^-pK1)+H^2/(10^-pK3*10^-pK2)+H/(10^-pK3)) IV <- HEPTOT/(1+XX) I <- IV*H^3/(10^-pK3*10^-pK2*10^-pK1) II <- IV*H^2/(10^-pK3*10^-pK2) III <- IV*H/10^-pK3 H - kw/H + SID + I*2 + II - IV } HEPTOT <- 0.050 SID <- c(-seq(10,1)*1e-4,0,seq(1,10)*1e-4) pHobs <- c(4.63,4.68,4.72,4.77,4.83,4.9,4.96,5.04,5.12,5.21,5.3, 5.39,5.48,5.55,5.63,5.69,5.74,5.8,5.85,5.89,5.93) pK1 <- -1; pK2 <- 3; pK3 <- 7.55 # literature values pK3 <- 7.6; pK2 <- 2.96 # values eye-balled to be better pH <- c() for (i in 1:length(SID)) { pH[i] <- -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, HEPTOT=HEPTOT,SID = SID[i], pK1=pK1,pK2=pK2,pK3=pK3)$root) } plot(SID,pH) points(SID,pHobs,col="red") __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non-linear regression and root finding
В Mon, 6 Nov 2023 17:53:49 +0100 Troels Ring пишет: > Hence I wonder if I could somehow have non linear regression to find > the 3 pK values. Below is HEPESFUNC which delivers charge in the > fluid for known pKs, HEPTOT and SID. Is it possible to have > root-finding in the formula with nls? Sure. Just reformulate the problem in terms of a function that takes a vector of predictors (your independent variable SID) and the desired parameters (pK1, pK2, pK3) as separate arguments and then returns predicted values of the dependent variable (to compare against pHobs): kw <- 1e-14 # I'm assuming pHm <- Vectorize(function(SID, pK1, pK2, pK3) -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, HEPTOT=HEPTOT,SID = SID, pK1=pK1,pK2=pK2,pK3=pK3)$root)) (Yes, Vectorize() doesn't make the function any faster, but I don't see an easy way to rewrite this function to make it truly vectorised.) Unfortunately, nls() seems to take a step somewhere where crossprod() of the Jacobian of the model function cannot be inverted and fails with the message "Singular gradient". I wish that R could have a more reliable built-in nonlinear least squares solver. (I could also be holding it wrong.) Meanwhile, we have excellent CRAN packages nlsr and minpack.lm: minpack.lm::nlsLM( pHobs ~ pHm(SID, pK1, pK2, pK3), data.frame(pHobs = pHobs, SID = SID), start = c(pK1 = pK1, pK2 = pK2, pK3 = pK3), # the following is also needed to avoid MINPACK failing to fit lower = rep(-1, 3), upper = rep(9, 3) ) # Nonlinear regression model # model: pHobs ~ pHm(SID, pK1, pK2, pK3) #data: data.frame(pHobs = pHobs, SID = SID) #pK1 pK2pK3 # -1.000 2.966 7.606 # residual sum-of-squares: 0.001195 # # Number of iterations to convergence: 15 # Achieved convergence tolerance: 1.49e-08 (Unfortunately, your code seemed to have a lot of non-breakable spaces, which confuse R's parser and make it harder to copy&paste it.) I think that your function can also be presented as a degree-5 polynomial in H, so it should also be possible to use polyroot() to obtain your solutions in a more exact manner. -- Best regards, Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.