A non-helpful reply on a "language" issue: "truncated" data are quite different from "censored" data and require different methodologies to analyze. You -- and many others posting here -- appear to have confused the two.

-- Bert
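For readers following the distinction, the two situations enter a likelihood differently: a right-censored observation contributes the survival function S(t), while a truncated sample rescales every density term by the probability of being observed at all. The sketch below is purely illustrative and not from the thread; the names time, event, and tau are hypothetical inputs.

    ## Illustrative only: Weibull negative log-likelihoods for the two cases.
    ## 'time', 'event' (1 = observed, 0 = right-censored) and 'tau' (the
    ## truncation point) are hypothetical, not objects from this thread.

    negll_censored <- function(par, time, event) {
      shape <- par[1]; scale <- par[2]
      if (shape <= 0 || scale <= 0) return(Inf)
      ## observed rows contribute the density, censored rows contribute S(t)
      -sum(event * dweibull(time, shape, scale, log = TRUE) +
           (1 - event) * pweibull(time, shape, scale,
                                  lower.tail = FALSE, log.p = TRUE))
    }

    negll_truncated <- function(par, time, tau) {
      ## only values with time <= tau were ever recorded, so every density
      ## term is rescaled by P(T <= tau)
      shape <- par[1]; scale <- par[2]
      if (shape <= 0 || scale <= 0) return(Inf)
      -sum(dweibull(time, shape, scale, log = TRUE) -
           pweibull(tau, shape, scale, log.p = TRUE))
    }

    ## Either can be minimised with optim(), e.g.
    ## optim(c(1, 10), negll_censored, time = time, event = event)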
On Mon, Nov 14, 2011 at 8:20 AM, Michele Mazzucco <michelemazzu...@gmail.com> wrote:
> David,
>
> here is the smallest dataset
>
> # Bid Price Survival Censored
> 0.03 0.029 1 1
> 0.03 0.029 11 1
> 0.03 0.029 10 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 1 1
> 0.03 0.029 1 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 16 1
> 0.03 0.029 15 1
> 0.03 0.029 14 1
> 0.03 0.029 13 1
> 0.03 0.029 12 1
> 0.03 0.029 11 1
> 0.03 0.029 10 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 30 1
> 0.03 0.029 29 1
> 0.03 0.029 28 1
> 0.03 0.029 27 1
> 0.03 0.029 26 1
> 0.03 0.029 25 1
> 0.03 0.029 24 1
> 0.03 0.029 23 1
> 0.03 0.029 22 1
> 0.03 0.029 21 1
> 0.03 0.029 20 1
> 0.03 0.029 19 1
> 0.03 0.029 18 1
> 0.03 0.029 17 1
> 0.03 0.029 16 1
> 0.03 0.029 15 1
> 0.03 0.029 14 1
> 0.03 0.029 13 1
> 0.03 0.029 12 1
> 0.03 0.029 11 1
> 0.03 0.029 10 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.027 71 0
> 0.03 0.027 70 0
> 0.03 0.027 69 0
> 0.03 0.027 68 0
> 0.03 0.027 67 0
> 0.03 0.027 66 0
> 0.03 0.027 65 0
> 0.03 0.027 64 0
> 0.03 0.027 63 0
> 0.03 0.027 62 0
> 0.03 0.027 61 0
> 0.03 0.027 60 0
> 0.03 0.027 59 0
> 0.03 0.027 58 0
> 0.03 0.027 57 0
> 0.03 0.027 56 0
> 0.03 0.027 55 0
> 0.03 0.027 54 0
> 0.03 0.027 53 0
> 0.03 0.027 52 0
> 0.03 0.027 51 0
> 0.03 0.027 50 0
> 0.03 0.027 49 0
> 0.03 0.027 48 0
> 0.03 0.027 47 0
> 0.03 0.027 46 0
> 0.03 0.027 45 0
> 0.03 0.027 44 0
> 0.03 0.027 43 0
> 0.03 0.027 42 0
> 0.03 0.027 41 0
> 0.03 0.027 40 0
> 0.03 0.027 39 0
> 0.03 0.027 38 0
> 0.03 0.027 37 0
> 0.03 0.027 36 0
> 0.03 0.027 35 0
> 0.03 0.027 34 0
> 0.03 0.027 33 0
> 0.03 0.027 32 0
> 0.03 0.027 31 0
> 0.03 0.027 30 0
> 0.03 0.027 29 0
> 0.03 0.027 28 0
> 0.03 0.027 27 0
> 0.03 0.027 26 0
> 0.03 0.027 25 0
> 0.03 0.027 24 0
> 0.03 0.027 23 0
> 0.03 0.027 22 0
> 0.03 0.027 21 0
> 0.03 0.027 20 0
> 0.03 0.027 19 0
> 0.03 0.027 18 0
> 0.03 0.027 17 0
> 0.03 0.027 16 0
> 0.03 0.027 15 0
> 0.03 0.027 14 0
> 0.03 0.027 13 0
> 0.03 0.027 12 0
> 0.03 0.027 11 0
> 0.03 0.027 10 0
> 0.03 0.027 9 0
> 0.03 0.027 8 0
> 0.03 0.027 7 0
> 0.03 0.027 6 0
> 0.03 0.027 5 0
> 0.03 0.027 4 0
> 0.03 0.027 3 0
> 0.03 0.027 2 0
> 0.03 0.027 1 0
>
>
> This is column # 3
> 1 11 10 9 8 7 6 5 4 3 2 1 1 1 9 8 7 6 5 4 3 2 1 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
> 6 5 4 3 2 1 7 6 5 4 3 2 1 5 4 3 2 1 4 3 2 1 6 5 4 3 2 1 30 29 28 27 26 25 24 23 22 21
> 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 8 7 6 5 4 3 2 1 4 3 2 1 71 70 69 68
> 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39
> 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10
> 9 8 7 6 5 4 3 2 1
>
>
> This is the script I have used for fitting the dataset to a Weibull
> distribution
>
> my_plot <- function() {
>   require(plotrix)       # for plotCI
>   require(MASS)          # for fitdistr
>   require(survival)      # for survival analysis
>   require(fitdistrplus)  # for fitdist
>
>   lifetimes <- read.table("lifetime.txt", header = FALSE, comment.char = "#")
>   s <- Surv(lifetimes$V3, lifetimes$V4)
>   mfit <- survfit(s ~ V1, data = lifetimes, conf.type = "plain")
>
>   cdf <- ecdf(lifetimes$V3)
>   empirical <- numeric(24)
>   T <- seq(1, 24)
>   for (i in T) empirical[i] <- cdf(i)
>
>   fit <- fitdist(lifetimes$V3, "weibull")
>   plot(T, 1 - pweibull(T, fit$estimate[1], fit$estimate[2]), type = "l",
>        col = "red", lwd = 2, lty = 1, xlim = c(1, 24),
>        xlab = "Time [hours]", ylab = "S(t)")
>   lines(T, 1 - empirical, col = "blue", lwd = 2, lty = 2)
>   plotCI(T, mfit$surv[T], ui = mfit$upper[T], li = mfit$lower[T],
>          col = "green", lwd = 1, lty = 3, add = TRUE)
>   legend("bottomleft", c("Weibull fit", "1 - Empirical CDF", "Empirical S(t)"),
>          col = c("red", "blue", "green"), horiz = FALSE, lty = 1:3, lwd = 2)
> }
>
> my_plot()
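It is worth noting that fitdist() in the script above is handed the raw times in V3 and never sees the censoring indicator in V4, so censored rows are treated as exact failure times. A censoring-aware Weibull fit is possible with survreg() from the survival package the script already loads. The following is a minimal sketch, assuming the same lifetime.txt file and that V4 = 1 marks an observed event; it is an alternative reading, not what the poster ran.

    ## Minimal sketch: Weibull fit that uses the censoring indicator.
    ## Assumes lifetime.txt as posted, with V3 = time and V4 = 1 for an
    ## observed event, 0 for a right-censored one.
    library(survival)

    lifetimes <- read.table("lifetime.txt", header = FALSE, comment.char = "#")
    sr <- survreg(Surv(V3, V4) ~ 1, data = lifetimes, dist = "weibull")

    ## survreg() uses a location-scale parameterisation; convert to the
    ## shape/scale arguments expected by dweibull()/pweibull():
    wshape <- 1 / sr$scale
    wscale <- exp(coef(sr)[1])

    ## These could stand in for fit$estimate[1] / fit$estimate[2] when
    ## drawing the fitted survival curve, e.g.
    ## lines(1:24, pweibull(1:24, wshape, wscale, lower.tail = FALSE))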
>
>
> About your questions:
> 1 - What do you mean? I guess I don't know the answer -- that's why I
>     posted my question on this mailing list.
> 2 - As I wrote in my first email, I am interested only in the first portion
>     of the survival times (24 hours). Hence, I'd prefer a "good" fit over
>     the interval of interest rather than a "reasonable" fit over the whole
>     dataset.
> 3 - I went through those tutorials, but couldn't find an answer. That's
>     probably because those tutorials are "general", as you pointed out,
>     while my question is rather specific.
>
> Cheers,
> Michele
>
>
> On Nov 14, 2011, at 5:59 PM, David Winsemius wrote:
>
> >
> > On Nov 14, 2011, at 6:11 AM, Michele Mazzucco wrote:
> >
> >> Hello David,
> >>
> >> thanks for your answer.
> >> I have done as you told me; however, the fit is very poor, much worse
> >> than that obtained from using the whole dataset (without upper bound).
> >> Any idea?
> >
> > Counter questions in the absence of data:
> >
> > Is there a reason, from domain-specific knowledge and theory, to expect
> > that the procedure you are attempting should be correct?
> >
> > Why would you want to exclude data?
> >
> > Why are you not looking for more general tutorials (such as the one by
> > Ricci) rather than asking what is as yet an open-ended question on
> > fitting methods that surely does not admit an answer easily provided on
> > a mailing list?
> >
> > --
> > David.
> >
> >
> >>
> >> Thanks,
> >> Michele
> >>
> >> On Nov 4, 2011, at 8:56 PM, David Winsemius wrote:
> >>
> >>>
> >>> On Nov 3, 2011, at 7:54 AM, Michele Mazzucco wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I am trying to fit a distribution to some data about survival times.
> >>>> I am interested only in a specific interval, e.g., while the data lie
> >>>> in the interval (0, ..., 600), I want the best fit for the interval
> >>>> (0, ..., 24).
> >>>>
> >>>> I have tried both fitdistr (MASS package) and fitdist (from the
> >>>> fitdistrplus package), but I could not get them working, e.g.
> >>>>
> >>>> fitdistr(left, "weibull", upper=24)
> >>>> Error in optim(x = c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L, :
> >>>>   L-BFGS-B needs finite values of 'fn'
> >>>> In addition: Warning message:
> >>>> In dweibull(x, shape, scale, log) : NaNs produced
> >>>>
> >>>> Am I doing something wrong?
> >>>
> >>> You didn't supply data to test, but shouldn't you supply a lower bound
> >>> if you want to fit "weibull"? It is, after all, bounded at 0.
> >>>
> >>>> left <- c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L, 50*runif(100))
> >>>> fitdistr(left, "weibull", upper=24)
> >>> Error in optim(x = c(529, 528, 527, 526, 525, 524, 523, 522, 521, 18.3964251773432, :
> >>>   L-BFGS-B needs finite values of 'fn'
> >>> In addition: Warning message:
> >>> In dweibull(x, shape, scale, log) : NaNs produced
> >>>
> >>>> fitdistr(left, "weibull", upper=24, lower=0.5)
> >>>       shape          scale
> >>>   0.58195013    24.00000000
> >>>  (0.04046087)  ( 3.38621367)
> >>>
> >>>>
> >>>> Thanks,
> >>>> Michele
> >>>>
> >>>> p.s. I have seen similar posts, e.g.,
> >>>> http://tolstoy.newcastle.edu.au/R/help/05/02/11558.html, but I am not
> >>>> sure whether I can apply the same approach here.
> >>>
> >>> David Winsemius, MD
> >>> Heritage Laboratories
> >>> West Hartford, CT
> >>
> >
> > David Winsemius, MD
> > West Hartford, CT
>

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
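On the original question of getting a good fit over just the first 24 hours: one possible reading (a guess, not something the thread settles) is to treat the observations falling in (0, 24] as a sample from a Weibull right-truncated at 24 and maximise that truncated likelihood directly, rather than passing upper/lower bounds to the optimiser. A minimal sketch along those lines, assuming the same lifetime.txt file and ignoring the censoring indicator for brevity:

    ## Sketch only: fit a Weibull to the first 24 hours, treating the window
    ## as right-truncation at tau = 24. Assumes lifetime.txt as posted above;
    ## the censoring column V4 is ignored here to keep the example short.
    lifetimes <- read.table("lifetime.txt", header = FALSE, comment.char = "#")
    tau  <- 24
    tsub <- lifetimes$V3[lifetimes$V3 <= tau]

    negll <- function(par) {
      shape <- par[1]; scale <- par[2]
      if (shape <= 0 || scale <= 0) return(Inf)
      ## truncated Weibull log-likelihood: density rescaled by P(T <= tau)
      -sum(dweibull(tsub, shape, scale, log = TRUE) -
           pweibull(tau, shape, scale, log.p = TRUE))
    }

    fit24 <- optim(c(1, 10), negll)
    fit24$par  # shape and scale fitted to the (0, 24] window only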