Re: [R] nls start values
Hi! thanks a lot for this suggestion! I tried to implement it like this, and it worked nicely. I used the method suggested by Gabor Grothendieck for simplification: frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift gridfit - nls2(frml, algorithm = grid-search, data=gendat, start = startdf) sine_nls - nls2(gridfit, data=gendat, start = gridfit, algorithm=port) But I also tried the method of running the nls with start-values for phase between 0 and 2pi in five steps and then choosing the one which has the lowest Standardized Residual Sum of Squares. This worked even better in some cases and I think it's also conceptually simpler. Yours, Nick On 13/12/11 16:53, Hans W Borchers wrote: Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes: I'm using nls to fit periodic gene-expression data to sine waves. I need to set the upper and lower boundaries, because I do not want any negative phase and amplitude solutions. This means that I have to use the port algorithm. The problem is, that depending on what start value I choose for phase, the fit works for some cases, but not for others. In the example below, the fit works using phase=pi, but not using phase=0. But there are many examples which fit just fine using 0. Is there a comparable alternative to nls that is not so extremely influenced by the start values? Use package `nls2' to first search on a grid, and then apply `nls' again to identify the globally best point: # Data for example fit afreq - 1 / (24 / 2 / pi) tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12, 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24) gene_expression - c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381, 1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690, 1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981, 2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175, 1.829686, 1.773260, 1.588768, 1.563774, 1.559192) shift=mean(gene_expression) # y-axis (expression) shift # Grid search library(nls2) frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2)) nls2(frml, algorithm = grid-search, start = startdf, control = list(maxiter=200)) # Perfect fit startvals - list(phase = 4.4880, amp = 0.2857) sine_nls - nls(frml, start=startvals) # phaseamp # 4.3964 0.2931 # residual sum-of-squares: 0.1378 Maybe this can be done in one step. Hans Werner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls start values
Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes: I'm using nls to fit periodic gene-expression data to sine waves. I need to set the upper and lower boundaries, because I do not want any negative phase and amplitude solutions. This means that I have to use the port algorithm. The problem is, that depending on what start value I choose for phase, the fit works for some cases, but not for others. In the example below, the fit works using phase=pi, but not using phase=0. But there are many examples which fit just fine using 0. Is there a comparable alternative to nls that is not so extremely influenced by the start values? Use package `nls2' to first search on a grid, and then apply `nls' again to identify the globally best point: # Data for example fit afreq - 1 / (24 / 2 / pi) tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12, 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24) gene_expression - c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381, 1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690, 1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981, 2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175, 1.829686, 1.773260, 1.588768, 1.563774, 1.559192) shift=mean(gene_expression) # y-axis (expression) shift # Grid search library(nls2) frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2)) nls2(frml, algorithm = grid-search, start = startdf, control = list(maxiter=200)) # Perfect fit startvals - list(phase = 4.4880, amp = 0.2857) sine_nls - nls(frml, start=startvals) # phaseamp # 4.3964 0.2931 # residual sum-of-squares: 0.1378 Maybe this can be done in one step. Hans Werner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls start values
Niklaus: 1. First, you mat not need to use nls at all, although I am not familiar with the port algorithm, so I could very well be wrong about this. Generally speaking, one uses time series methods (e.g. fourier analysis) to fit periodic sine waves, so you may wish to check out CRAN's TimeSeries task view to see whether there is something there that fits your constrained fit situation. 2. If you DO need to fit a nonlinear function the short answer to your questions is maybe/maybe not; obviously, Hans's suggestions may help you get a better starting point but it still usea the same sensitive algorithm, which is some version of gradient descent iirc. The optimx package contains a varied collection of optimizers, some of which may well be more robust than that of nls2. Check out that package and the Optimization task view for background, references, and alternatives (such as derivative-free optimizers) Cheers, Bert On Tue, Dec 13, 2011 at 7:53 AM, Hans W Borchers hwborch...@googlemail.com wrote: Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes: I'm using nls to fit periodic gene-expression data to sine waves. I need to set the upper and lower boundaries, because I do not want any negative phase and amplitude solutions. This means that I have to use the port algorithm. The problem is, that depending on what start value I choose for phase, the fit works for some cases, but not for others. In the example below, the fit works using phase=pi, but not using phase=0. But there are many examples which fit just fine using 0. Is there a comparable alternative to nls that is not so extremely influenced by the start values? Use package `nls2' to first search on a grid, and then apply `nls' again to identify the globally best point: # Data for example fit afreq - 1 / (24 / 2 / pi) tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12, 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24) gene_expression - c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381, 1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690, 1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981, 2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175, 1.829686, 1.773260, 1.588768, 1.563774, 1.559192) shift=mean(gene_expression) # y-axis (expression) shift # Grid search library(nls2) frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2)) nls2(frml, algorithm = grid-search, start = startdf, control = list(maxiter=200)) # Perfect fit startvals - list(phase = 4.4880, amp = 0.2857) sine_nls - nls(frml, start=startvals) # phase amp # 4.3964 0.2931 # residual sum-of-squares: 0.1378 Maybe this can be done in one step. Hans Werner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls start values
On Tue, Dec 13, 2011 at 10:53 AM, Hans W Borchers hwborch...@googlemail.com wrote: Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes: I'm using nls to fit periodic gene-expression data to sine waves. I need to set the upper and lower boundaries, because I do not want any negative phase and amplitude solutions. This means that I have to use the port algorithm. The problem is, that depending on what start value I choose for phase, the fit works for some cases, but not for others. In the example below, the fit works using phase=pi, but not using phase=0. But there are many examples which fit just fine using 0. Is there a comparable alternative to nls that is not so extremely influenced by the start values? Use package `nls2' to first search on a grid, and then apply `nls' again to identify the globally best point: # Data for example fit afreq - 1 / (24 / 2 / pi) tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12, 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24) gene_expression - c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381, 1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690, 1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981, 2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175, 1.829686, 1.773260, 1.588768, 1.563774, 1.559192) shift=mean(gene_expression) # y-axis (expression) shift # Grid search library(nls2) frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2)) nls2(frml, algorithm = grid-search, start = startdf, control = list(maxiter=200)) # Perfect fit startvals - list(phase = 4.4880, amp = 0.2857) sine_nls - nls(frml, start=startvals) # phase amp # 4.3964 0.2931 # residual sum-of-squares: 0.1378 Just one small point here. If out is the result of the nls2 call above then then coef(out) is the best set of parameter values among those tested on the grid and: nls2(out, start = out) is a quick way to run it again starting from the value found using grid search. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] nls start values
I'm using nls to fit periodic gene-expression data to sine waves. I need to set the upper and lower boundaries, because I do not want any negative phase and amplitude solutions. This means that I have to use the port algorithm. The problem is, that depending on what start value I choose for phase, the fit works for some cases, but not for others. In the example below, the fit works using phase=pi, but not using phase=0. But there are many examples which fit just fine using 0. Is there a comparable alternative to nls that is not so extremely influenced by the start values? # Data for example fit lowervals - list(phase=0, amp=0) uppervals - list(phase=2*pi, amp=2) afreq - 1 / (24 / 2 / pi) gene_expression - c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381, 1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690, 1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981, 2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175, 1.829686, 1.773260, 1.588768, 1.563774, 1.559192) tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12,14,14,14,16,16,16,18,18,18,20,20,20,24,24,24) shift=mean(gene_expression) # y-axis (expression) shift # Perfect fit startvals - list(phase=pi, amp=0.5) sine_nls - nls(gene_expression ~ sin(tpoints * afreq + phase) * amp + shift, start=startvals, algorithm=port, lower=lowervals, upper=uppervals) # Convergence failure startvals - list(phase=0, amp=0.5) sine_nls - nls(gene_expression ~ sin(tpoints * afreq + phase) * amp + shift, start=startvals, algorithm=port, lower=lowervals, upper=uppervals) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.