Re: [R] Very slow optim()

2021-03-12 Thread Deepayan Sarkar
On Sat, Mar 13, 2021 at 10:08 AM Spencer Graves wrote:
>
> TWO COMMENTS:
>
>
> 1.  DID YOU ASSIGN THE OUTPUT OF "optim" to an object, like "est <-
> optim(...)"?  If yes and if "optim" terminated normally, the 60,000+
> parameters should be there as est$par.  See the documentation on "optim".
>
>
> 2.  WHAT PROBLEM ARE YOU TRYING TO SOLVE?
>
>
>   I hope you will forgive me for being blunt (or perhaps bigoted), but
> I'm skeptical about anyone wanting to use optim to estimate 60,000+
> parameters.  With a situation like that, I think you would be wise to
> recast the problem as one in which those 60,000+ parameters are sampled
> from some hyperdistribution characterized by a small number of
> hyperparameters.  Then write a model where your observations are sampled
> from distribution(s) controlled by these random parameters.  Then
> multiply the likelihood of the observations by the likelihood of the
> hyperdistribution and integrate out the 60,000+ parameters, leaving only
> a small number of hyperparameters.

Just a comment on this comment: I think it's perfectly reasonable to
optimize 60k+ parameters with conjugate gradient. CG was originally
developed to solve linear systems of the form Ax = b; if x were not
large, one would just use solve(A, b) instead of an iterative method.
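
A tiny illustrative sketch of that equivalence (toy sizes; for a symmetric
positive-definite A, the minimizer of 0.5*x'Ax - b'x is the solution of Ax = b):

set.seed(1)
n <- 50
M <- matrix(rnorm(n * n), n)
A <- crossprod(M) + diag(n)     # symmetric positive definite
b <- rnorm(n)

f  <- function(x) 0.5 * sum(x * (A %*% x)) - sum(b * x)
gr <- function(x) as.vector(A %*% x - b)

cg <- optim(rep(0, n), f, gr, method = "CG", control = list(maxit = 1000))
max(abs(cg$par - solve(A, b)))  # should be small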

Use of CG is quite common in image processing. A relatively small
300x300 image will give you 90k parameters.

-Deepayan



Re: [R] Very slow optim()

2021-03-12 Thread Spencer Graves

TWO COMMENTS:


1.  DID YOU ASSIGN THE OUTPUT OF "optim" to an object, like "est <- 
optim(...)"?  If yes and if "optim" terminated normally, the 60,000+ 
parameters should be there as est$par.  See the documentation on "optim".
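
	  For example, an illustrative toy (your own starting vector and objective 
function go where start_values and negloglik appear):

negloglik    <- function(theta) sum((theta - seq_along(theta))^2)  # toy stand-in objective
start_values <- rep(0, 10)

est <- optim(start_values, negloglik, method = "L-BFGS-B")
str(est$par)       # the estimates live here after optim() returns
est$convergence    # 0 indicates optim() reported normal termination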



2.  WHAT PROBLEM ARE YOU TRYING TO SOLVE?


	  I hope you will forgive me for being blunt (or perhaps bigoted), but 
I'm skeptical about anyone wanting to use optim to estimate 60,000+ 
parameters.  With a situation like that, I think you would be wise to 
recast the problem as one in which those 60,000+ parameters are sampled 
from some hyperdistribution characterized by a small number of 
hyperparameters.  Then write a model where your observations are sampled 
from distribution(s) controlled by these random parameters.  Then 
multiply the likelihood of the observations by the likelihood of the 
hyperdistribution and integrate out the 60,000+ parameters, leaving only 
a small number of hyperparameters.



	  When everything is linear and all the random variables / random 
effects and observation errors follow normal distributions, this is the 
classic linear mixed-effects situation that is routinely handled well 
by the nlme package, documented in the companion book, Pinheiro and 
Bates (2000) Mixed-Effects Models in S and S-PLUS (Springer).  If the 
models are nonlinear but with reasonably well-behaved curvature, and 
the random variables / random effects and observation errors are still 
normal, the nlme package and Pinheiro and Bates still provide a great 
approach to most such situations, as far as I know.  When the 
observations are non-normally distributed, the best software I know is 
the lme4 package.  I have not used it recently, but it was written and 
is maintained by some of the leading experts in this area, as far as I know.
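
	  As a purely illustrative sketch of approach "2" (the data and grouping 
below are invented), a random-intercept model collapses many group-level 
parameters into a handful of hyperparameters:

library(lme4)

set.seed(1)
ngrp   <- 500                                  # stand-in for the "many parameters"
groups <- factor(rep(seq_len(ngrp), each = 4))
mu     <- rnorm(ngrp)                          # group effects drawn from a hyperdistribution
y      <- rnorm(length(groups), mean = mu[as.integer(groups)], sd = 0.5)

fit <- lmer(y ~ 1 + (1 | groups))
summary(fit)              # reports only the hyperparameters (intercept and two variances)
head(ranef(fit)$groups)   # the individual group effects are still recoverable if wanted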



CONCLUSION:


	  If you are short on time and "1" will work for you, do that. 
Obviously, you will need to do some further analysis to understand the 
60,000+ parameters you estimated -- which implies by itself that you 
really should be using approach "2".  However, if I'm short on time and 
need an answer, then I'd ignore "2" and hope to get something by 
plotting and doing other things with the 60,000+ parameters that should 
be in "est$par" if "optim" actually ended normally.



	  However, if the problem is sufficiently important to justify more 
work, then I'd want to cast it as some kind of mixed-effects model, per 
"2" -- perhaps using an analysis of "1" as a first step towards "2".



  Hope this helps.
  Spencer


On 2021-03-12 20:53, J C Nash wrote:

optim() has no method really suitable for very large numbers of parameters.

- CG as set up has never worked very well in any of its implementations
   (I wrote it, so am allowed to say so!). Rcgmin in optimx package works
   better, as does Rtnmin. Neither are really intended for 60K parameters
   however.

- optim::L-BFGS-B is reasonable, but my experience is that it still is not
   intended for more than a couple of hundred parameters.

JN



On 2021-03-12 9:31 p.m., Jeff Newmiller wrote:

Calculate fewer of them?

If you don't set up your code to save intermediate results, then you cannot see 
intermediate results.

On March 11, 2021 8:32:17 PM PST, "毕芳妮 via R-help"  wrote:

Dear list,
I am using optim() to estimate over 60 thousand parameters, and I use a
server to run the program. But it has run for 5 hours and no result has
come out. What can I do to display some of the results that optim() has
already calculated?









Re: [R] Very slow optim()

2021-03-12 Thread J C Nash
optim() has no method really suitable for very large numbers of parameters.

- CG as set up has never worked very well in any of its implementations
  (I wrote it, so am allowed to say so!). Rcgmin in optimx package works
  better, as does Rtnmin. Neither are really intended for 60K parameters
  however.

- optim::L-BFGS-B is reasonable, but my experience is that it still is not
  intended for more than a couple of hundred parameters.
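
If you want to try those alternatives, the call is roughly as below (a sketch
only; argument details and which methods are bundled vary across optimx
versions, so check the package documentation):

library(optimx)

fn <- function(x) sum((x - 1)^2)   # toy objective
gr <- function(x) 2 * (x - 1)      # an analytic gradient matters at this scale
x0 <- rep(0, 1000)

res <- optimx(x0, fn, gr = gr, method = c("Rcgmin", "L-BFGS-B"))
summary(res, order = "value")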

JN



On 2021-03-12 9:31 p.m., Jeff Newmiller wrote:
> Calculate fewer of them?
> 
> If you don't set up your code to save intermediate results, then you cannot 
> see intermediate results.
> 
> On March 11, 2021 8:32:17 PM PST, "毕芳妮 via R-help"  
> wrote:
>> Dear list, 
>> I am using optim() to estimate over 60 thousand parameters, and I use a
>> server to run the program. But it has run for 5 hours and no result has
>> come out. What can I do to display some of the results that optim() has
>> already calculated?



Re: [R] Very slow optim(): solved

2021-03-12 Thread Jeff Newmiller
Calculate fewer of them?

If you don't set up your code to save intermediate results, then you cannot see 
intermediate results.
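
One way to arrange that is to wrap the objective function so it periodically
saves the current parameter vector; optim() itself stays unchanged (a sketch;
the file name and reporting interval are arbitrary):

make_traced_fn <- function(fn, every = 100, file = "optim_progress.rds") {
  count <- 0
  function(par, ...) {
    count <<- count + 1
    val <- fn(par, ...)
    if (count %% every == 0) {
      # save the latest evaluation so partial results survive a long run
      saveRDS(list(eval = count, value = val, par = par), file)
      cat("eval", count, "objective", val, "\n")
    }
    val
  }
}

# usage, with a toy objective standing in for the real one
fn  <- function(x) sum((x - 1)^2)
est <- optim(rep(0, 50), make_traced_fn(fn), method = "CG")

For some of optim()'s methods (e.g. BFGS, L-BFGS-B), control = list(trace = 1,
REPORT = 10) also prints progress, although it does not save the parameters.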

On March 11, 2021 8:32:17 PM PST, "毕芳妮 via R-help"  wrote:
>Dear list, 
>I am using optim() to estimate over 60 thousand parameters, and I use a
>server to run the program. But it has run for 5 hours and no result has
>come out. What can I do to display some of the results that optim() has
>already calculated?

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Very slow optim(): solved

2021-03-12 Thread 毕芳妮 via R-help
Dear list, 
 I am using optim() to estimate over 60 thousand parameters, and I use a 
server to run the program. But it has run for 5 hours and no result has come 
out. What can I do to display some of the results that optim() has already 
calculated?


Re: [R] R extension memory leak detection question

2021-03-12 Thread Duncan Murdoch

On 12/03/2021 12:37 p.m., Duncan Murdoch wrote:

On 12/03/2021 12:13 p.m., xiaoyan yu wrote:

I am writing a C++ program based on R extensions and am also trying to test
the program with Google AddressSanitizer.

I thought that if we don't PROTECT a variable returned by an allocation API
such as Rf_allocVector, there would be a memory leak. However, AddressSanitizer
didn't report one. Is my understanding correct? Or will I see the memory
leak only if I compile the R source code with AddressSanitizer?



Your question is unclear without an actual example.  It all depends on
how the variable was created and how you use it.

If your real code is only a few lines, post it here. 


Sorry, to R-devel, not here.


 Otherwise, please

put together a minimal working example that contains the essence of what
you are doing in a few lines.  Check that it compiles, and we can
provide advice about whether it is doing dangerous things.

Duncan Murdoch





Re: [R] R extension memory leak detection question

2021-03-12 Thread Duncan Murdoch

On 12/03/2021 12:13 p.m., xiaoyan yu wrote:

I am writing a C++ program based on R extensions and am also trying to test
the program with Google AddressSanitizer.

I thought that if we don't PROTECT a variable returned by an allocation API
such as Rf_allocVector, there would be a memory leak. However, AddressSanitizer
didn't report one. Is my understanding correct? Or will I see the memory
leak only if I compile the R source code with AddressSanitizer?



Your question is unclear without an actual example.  It all depends on 
how the variable was created and how you use it.


If your real code is only a few lines, post it here.  Otherwise, please 
put together a minimal working example that contains the essence of what 
you are doing in a few lines.  Check that it compiles, and we can 
provide advice about whether it is doing dangerous things.


Duncan Murdoch



Re: [R] R extension memory leak detection question

2021-03-12 Thread Bert Gunter
This is the wrong list for such questions. Post to r-devel instead.

Cheers,

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Mar 12, 2021 at 9:13 AM xiaoyan yu  wrote:

> I am writing a C++ program based on R extensions and am also trying to test
> the program with Google AddressSanitizer.
>
> I thought that if we don't PROTECT a variable returned by an allocation API
> such as Rf_allocVector, there would be a memory leak. However, AddressSanitizer
> didn't report one. Is my understanding correct? Or will I see the memory
> leak only if I compile the R source code with AddressSanitizer?
>
>  Please help!
>
> Thanks,
> Xiaoyan
>



[R] R extension memory leak detection question

2021-03-12 Thread xiaoyan yu
I am writing a C++ program based on R extensions and am also trying to test
the program with Google AddressSanitizer.

I thought that if we don't PROTECT a variable returned by an allocation API
such as Rf_allocVector, there would be a memory leak. However, AddressSanitizer
didn't report one. Is my understanding correct? Or will I see the memory
leak only if I compile the R source code with AddressSanitizer?

 Please help!

Thanks,
Xiaoyan



Re: [R] [External] Re: [External] Re: Help please

2021-03-12 Thread Areti Panopoulou
Thank you so much everyone for your time and help! I really appreciate it.

Kind Regards,
Areti

On Thu, Mar 11, 2021 at 10:34 PM Rasmus Liland  wrote:

> That is such a nice email crammed with
> useful info I'm sure Areti Panopoulou
> appreciates.  JR
>



Re: [R] quantile from quantile table calculation without original data

2021-03-12 Thread Abby Spurdle
Hi Petr,

In principle, I like David's approach the best.
However, I note that there's a bug in the squared step.
Furthermore, the variance of the sample quantiles should increase as
they move away from the modal region.

I've built on David's approach, but changed it to a two stage
optimization algorithm.
The parameter estimates from the first stage are used to compute density values.
Then the second stage is weighted, using the scaled density values.

I tried to create an iteratively reweighted algorithm.
However, it didn't converge.
(But that doesn't necessarily mean it can't be done).

The following code returns the value: 1.648416e-05

qfit.lnorm <- function (p, q, lower.tail=TRUE, ...,
par0 = c (-0.5, 0.5) )
{   n <- length (p)
qsample <- q

objf <- function (par)
{   qmodel <- qlnorm (p, par [1], par [2], lower.tail)
sum ( (qmodel - qsample)^2) / n
}
objf.w <- function (wpar, w)
{   qmodel <- qlnorm (p, wpar [1], wpar [2], lower.tail)
sum (w * (qmodel - qsample)^2)
}

wpar0 <- optim (par0, objf)$par
w <- dlnorm (p, wpar0 [1], wpar0 [2], lower.tail)
optim (wpar0, objf.w,, w=w)
}

par <- qfit.lnorm (temp$percent, temp$size, FALSE)$par
plnorm (0.1, par [1], par [2])


On Tue, Mar 9, 2021 at 2:52 AM PIKAL Petr  wrote:
>
> Hallo David, Abby and Bert
>
> Thank you for your solutions. In the meantime I found package 
> rriskDistributions, which was able to calculate values for lognormal 
> distribution from quantiles.
>
> Abby
> > 1-psolution
> [1] 9.980823e-06
>
> David
> > plnorm(0.1, -.7020649, .4678656)
> [1] 0.0003120744
>
> rriskDistributions
> > plnorm(0.1, -.6937355, .3881209)
> [1] 1.697379e-05
>
> Bert suggested asking for the original data before the quantile calculation, which 
> is probably the best but also the most problematic solution. Actually, the original 
> data may be unavailable, as the result comes from a particle size measurement, where 
> the software always transforms the original data and outputs only 
> descriptive results.
>
> All your results are quite consistent with the available values as they are 
> close to 1, so for me, each approach works.
>
> Thank you again.
>
> Best regards.
> Petr
>
> > -Original Message-
> > From: David Winsemius 
> > Sent: Sunday, March 7, 2021 1:33 AM
> > To: Abby Spurdle ; PIKAL Petr
> > 
> > Cc: r-help@r-project.org
> > Subject: Re: [R] quantile from quantile table calculation without original 
> > data
> >
> >
> > On 3/6/21 1:02 AM, Abby Spurdle wrote:
> > > I came up with a solution.
> > > But not necessarily the best solution.
> > >
> > > I used a spline to approximate the quantile function.
> > > Then use that to generate a large sample.
> > > (I don't see any need for the sample to be random, as such).
> > > Then compute the sample mean and sd, on a log scale.
> > > Finally, plug everything into the plnorm function:
> > >
> > > p <- seq (0.01, 0.99,, 1e6)
> > > Fht <- splinefun (temp$percent, temp$size)
> > > x <- log (Fht (p) )
> > > psolution <- plnorm (0.1, mean (x), sd (x), FALSE)
> > > psolution
> > >
> > > The value of the solution is very close to one.
> > > Which is not a surprise.
> > >
> > > Here's a plot of everything:
> > >
> > > u <- seq (0.01, 1.65,, 200)
> > > v <- plnorm (u, mean (x), sd (x), FALSE)
> > > plot (u, v, type="l", ylim = c (0, 1) )
> > > points (temp$size, temp$percent, pch=16)
> > > points (0.1, psolution, pch=16, col="blue")
> >
> > Here's another approach, which uses minimization of the squared error to
> > get the parameters for a lognormal distribution.
> >
> > temp <- structure(list(size = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069, 0.3781,
> > 0.3047, 0.2681, 0.1907), percent = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 
> > 0.95,
> > 0.99)), .Names = c("size", "percent"
> > ), row.names = c(NA, -9L), class = "data.frame")
> >
> > obj <- function(x) {sum( qlnorm(1-temp$percent, x[[1]], x[[2]])-temp$size
> > )^2}
> >
> > # Note the inversion of the poorly named and flipped "percent" column,
> >
> > optim( list(a=-0.65, b=0.42), obj)
> >
> > #
> >
> > $par
> >   a  b
> > -0.7020649  0.4678656
> >
> > $value
> > [1] 3.110316e-12
> >
> > $counts
> > function gradient
> >51   NA
> >
> > $convergence
> > [1] 0
> >
> > $message
> > NULL
> >
> >
> > I'm not sure how principled this might be. There's no consideration in this
> > approach for expected sampling error at the right tail where the magnitudes
> > of the observed values will create much larger contributions to the sum of
> > squares.
> >
> > --
> >
> > David.
> >
> > >
> > >
> > > On Sat, Mar 6, 2021 at 8:09 PM Abby Spurdle 
> > wrote:
> > >> I'm sorry.
> > >> I misread your example, this morning.
> > >> (I didn't read the code after the line that calls plot).
> > >>
> > >> After looking at this problem again, interpolation doesn't apply, and
> > >> extrapolation would be a last resort.
> > >> If you can assume your data comes from a particular type of
> > >> distribution, such as