Re: [R] nls start values

2011-12-16 Thread Nick Fankhauser
Hi!

thanks a lot for this suggestion! I tried to implement it like this, and
it worked nicely.
I used the method suggested by Gabor Grothendieck for simplification:

frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift
gridfit - nls2(frml, algorithm = grid-search, data=gendat, start =
startdf)
sine_nls - nls2(gridfit, data=gendat, start = gridfit, algorithm=port)

But I also tried the method of running the nls with start-values for
phase between 0 and 2pi in five steps and then choosing the one which
has the lowest Standardized Residual Sum of Squares. This worked even
better in some cases and I think it's also conceptually simpler.

Yours,
Nick

On 13/12/11 16:53, Hans W Borchers wrote:
 Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes:

   
 I'm using nls to fit periodic gene-expression data to sine waves. I need
 to set the upper and lower boundaries, because I do not want any
 negative phase and amplitude solutions. This means that I have to use
 the port algorithm. The problem is, that depending on what start value
 I choose for phase, the fit works for some cases, but not for others.
 In the example below, the fit works using phase=pi,  but not using
 phase=0. But there are many examples which fit just fine using 0.

 Is there a comparable alternative to nls that is not so extremely
 influenced by the start values?

 
 Use package `nls2' to first search on a grid, and then apply `nls' again
 to identify the globally best point:

 # Data for example fit
 afreq - 1 / (24 / 2 / pi)
 tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12,
  14,14,14,16,16,16,18,18,18,20,20,20,24,24,24)
 gene_expression -
 c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381,
   1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690,
   1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981,
   2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175,
   1.829686, 1.773260, 1.588768, 1.563774, 1.559192)
 shift=mean(gene_expression) # y-axis (expression) shift

 # Grid search
 library(nls2)
 frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift
 startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2))
 nls2(frml, algorithm = grid-search, start = startdf,
control = list(maxiter=200))

 # Perfect fit
 startvals - list(phase = 4.4880, amp = 0.2857)
 sine_nls - nls(frml, start=startvals)
 #  phaseamp 
 # 4.3964 0.2931 
 # residual sum-of-squares: 0.1378

 Maybe this can be done in one step.
 Hans Werner

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls start values

2011-12-13 Thread Hans W Borchers
Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes:

 I'm using nls to fit periodic gene-expression data to sine waves. I need
 to set the upper and lower boundaries, because I do not want any
 negative phase and amplitude solutions. This means that I have to use
 the port algorithm. The problem is, that depending on what start value
 I choose for phase, the fit works for some cases, but not for others.
 In the example below, the fit works using phase=pi,  but not using
 phase=0. But there are many examples which fit just fine using 0.
 
 Is there a comparable alternative to nls that is not so extremely
 influenced by the start values?
 

Use package `nls2' to first search on a grid, and then apply `nls' again
to identify the globally best point:

# Data for example fit
afreq - 1 / (24 / 2 / pi)
tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12,
 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24)
gene_expression -
c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381,
  1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690,
  1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981,
  2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175,
  1.829686, 1.773260, 1.588768, 1.563774, 1.559192)
shift=mean(gene_expression) # y-axis (expression) shift

# Grid search
library(nls2)
frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift
startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2))
nls2(frml, algorithm = grid-search, start = startdf,
   control = list(maxiter=200))

# Perfect fit
startvals - list(phase = 4.4880, amp = 0.2857)
sine_nls - nls(frml, start=startvals)
#  phaseamp 
# 4.3964 0.2931 
# residual sum-of-squares: 0.1378

Maybe this can be done in one step.
Hans Werner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls start values

2011-12-13 Thread Bert Gunter
Niklaus:

1. First, you mat not need to use nls at all, although I am not
familiar with the port algorithm, so I could very well be wrong
about this.  Generally speaking, one uses time series methods (e.g.
fourier analysis) to fit periodic sine waves, so you may wish to check
out CRAN's TimeSeries task view to see whether there is something
there that fits your constrained fit situation.

2. If you DO need to fit a nonlinear function the  short answer to
your questions is maybe/maybe not; obviously, Hans's suggestions may
help you get a better starting point but it still usea the same
sensitive algorithm, which is some version of gradient descent iirc.
The optimx package contains a varied collection of optimizers, some of
which may well be more robust than that of nls2. Check out that
package and the Optimization task view for background, references, and
alternatives (such as derivative-free optimizers)

Cheers,

Bert

On Tue, Dec 13, 2011 at 7:53 AM, Hans W Borchers
hwborch...@googlemail.com wrote:
 Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes:

 I'm using nls to fit periodic gene-expression data to sine waves. I need
 to set the upper and lower boundaries, because I do not want any
 negative phase and amplitude solutions. This means that I have to use
 the port algorithm. The problem is, that depending on what start value
 I choose for phase, the fit works for some cases, but not for others.
 In the example below, the fit works using phase=pi,  but not using
 phase=0. But there are many examples which fit just fine using 0.

 Is there a comparable alternative to nls that is not so extremely
 influenced by the start values?


 Use package `nls2' to first search on a grid, and then apply `nls' again
 to identify the globally best point:

    # Data for example fit
    afreq - 1 / (24 / 2 / pi)
    tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12,
                 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24)
    gene_expression -
    c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381,
      1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690,
      1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981,
      2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175,
      1.829686, 1.773260, 1.588768, 1.563774, 1.559192)
    shift=mean(gene_expression) # y-axis (expression) shift

    # Grid search
    library(nls2)
    frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift
    startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2))
    nls2(frml, algorithm = grid-search, start = startdf,
               control = list(maxiter=200))

    # Perfect fit
    startvals - list(phase = 4.4880, amp = 0.2857)
    sine_nls - nls(frml, start=startvals)
    #  phase    amp
    # 4.3964 0.2931
    # residual sum-of-squares: 0.1378

 Maybe this can be done in one step.
 Hans Werner

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls start values

2011-12-13 Thread Gabor Grothendieck
On Tue, Dec 13, 2011 at 10:53 AM, Hans W Borchers
hwborch...@googlemail.com wrote:
 Niklaus Fankhauser niklaus.fankhauser at cell.biol.ethz.ch writes:

 I'm using nls to fit periodic gene-expression data to sine waves. I need
 to set the upper and lower boundaries, because I do not want any
 negative phase and amplitude solutions. This means that I have to use
 the port algorithm. The problem is, that depending on what start value
 I choose for phase, the fit works for some cases, but not for others.
 In the example below, the fit works using phase=pi,  but not using
 phase=0. But there are many examples which fit just fine using 0.

 Is there a comparable alternative to nls that is not so extremely
 influenced by the start values?


 Use package `nls2' to first search on a grid, and then apply `nls' again
 to identify the globally best point:

    # Data for example fit
    afreq - 1 / (24 / 2 / pi)
    tpoints - c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12,
                 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24)
    gene_expression -
    c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381,
      1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690,
      1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981,
      2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175,
      1.829686, 1.773260, 1.588768, 1.563774, 1.559192)
    shift=mean(gene_expression) # y-axis (expression) shift

    # Grid search
    library(nls2)
    frml - gene_expression ~ sin(tpoints * afreq + phase) * amp + shift
    startdf - data.frame(phase=c(0, 2*pi), amp = c(0, 2))
    nls2(frml, algorithm = grid-search, start = startdf,
               control = list(maxiter=200))

    # Perfect fit
    startvals - list(phase = 4.4880, amp = 0.2857)
    sine_nls - nls(frml, start=startvals)
    #  phase    amp
    # 4.3964 0.2931
    # residual sum-of-squares: 0.1378


Just one small point here.  If out is the result of the nls2 call
above then then coef(out) is the best set of parameter values among
those tested on the grid and:

   nls2(out, start = out)

is a quick way to run it again starting from the value found using grid search.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] nls start values

2011-12-11 Thread Niklaus Fankhauser
I'm using nls to fit periodic gene-expression data to sine waves. I need
to set the upper and lower boundaries, because I do not want any
negative phase and amplitude solutions. This means that I have to use
the port algorithm. The problem is, that depending on what start value
I choose for phase, the fit works for some cases, but not for others.
In the example below, the fit works using phase=pi,  but not using
phase=0. But there are many examples which fit just fine using 0.

Is there a comparable alternative to nls that is not so extremely
influenced by the start values?

# Data for example fit
lowervals - list(phase=0, amp=0)
uppervals - list(phase=2*pi, amp=2)
afreq - 1 / (24 / 2 / pi)
gene_expression - c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436,
1.471568, 1.623381,
1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690,
1.985159, 2.205064,
2.160308, 2.120189, 2.194758, 2.165993, 2.189981, 2.098671, 2.122207,
2.012621, 1.963610,
1.884184, 1.955160, 1.801175, 1.829686, 1.773260, 1.588768, 1.563774,
1.559192)
tpoints -
c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12,14,14,14,16,16,16,18,18,18,20,20,20,24,24,24)
shift=mean(gene_expression) # y-axis (expression) shift

# Perfect fit
startvals - list(phase=pi, amp=0.5)
sine_nls - nls(gene_expression ~ sin(tpoints * afreq + phase) * amp +
shift, start=startvals, algorithm=port, lower=lowervals, upper=uppervals)

# Convergence failure
startvals - list(phase=0, amp=0.5)
sine_nls - nls(gene_expression ~ sin(tpoints * afreq + phase) * amp +
shift, start=startvals, algorithm=port, lower=lowervals, upper=uppervals)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.