Re: [R] Fitting mean and covariance of Multivariate normal with censored data

Sicotte, Hugues Ph.D. Mon, 13 Nov 2006 14:43:49 -0800

Nobody answered, but this is what I did.

I used an iterative poor man's imputation and filled in the data with a
single value at each iteration. My covariance matrices and variance will
be a little underestimated, but it'll do for me at this stage of this
project.


I iteratively filled in the data using the current data to estimate the
parameters of the normal distribution.
Using the current estimates, I figured out how much of the current
distribution is outside each cut-off. Then fill-in at the mid-point of
the probability density that is outside the area. I limited my
corrections to 3*sigma.

Here is the code section inside the loop (I am not showing another part
of the code that does outlier elimination, nor the rest of the loop that
recomputes the covariance and the parameters of the normal distribution)

Trim the data to only use "lines" with at least 60% real data.
Trim the columns to those that have enough data to estimate the
parameters of the normal distribution (e.g. ~ 20)

Loop over iter iterations ( from 1 to iterlast)

                  Compute mean, variance, and co-variance matrix.
                  Code to detect outliers.. Only activated after
"iterlast" iterations
                               Sort and trim worst outliers (left and
right, up to 10% of data)

                  For Each "line" of data, find out which elements were
missing in original data set.

                        for(im in missings) {                
                           # Compute the quantile on the cutoff away
from N(mode,variance) for both cutoff
 
quantOver<-pnorm(c(highR),mean=mu[im],sd=sigmaMu[im])
 
quantUnder<-pnorm(c(lowR),mean=mu[im],sd=sigmaMu[im])
                           # Find out which end of the distribution is
the most outside of the limit
                           quantOverMid<- (1.0-quantOver)/2
                           quantUnderMid<- quantUnder/2
#                          correct the mean, but underestimate the
variance a little bit in cases
#                          where width of pattern> width of
cutoff-region
                           fixUp<-mu[im]
                           fixLow<-mu[im]
#                          Don't fix if distribution is cutoff at edges.
                           if(quantOverMid<0.05) {
                                quantOverMid<-0.0
                           } else {
 
fixUp<-qnorm(c(1.0-quantOverMid),mean=mu[im],sd=sigmaMu[im])
                           }
                           if(quantUnderMid<0.05) {
                                quantUnderMid<-0.0
                           } else {
 
fixLow<-qnorm(c(quantUnderMid),mean=mu[im],sd=sigmaMu[im])
                           }
#                          Censored variables "pull" from both sides.
                           quantNormSum <- (quantOverMid+quantUnderMid)
                           if(quantNormSum>=0.05) {
                               quantNormSum<- 1.0/quantNormSum
                               fix <-
(fixUp*quantOverMid+fixLow*quantUnderMid)*quantNormSum
                           } else {
#                              If missing data is in the tails, correct
by cutoff.
                               if(quantNormSum>1e-10) {
                                   fix<-
(highR*(1.0-quantOver)+lowR*quantUnder)/((1.0-quantOver)+quantUnder)
                               } else {
                                   fix<-mu[im]
                               }
                           }
                           if(!is.finite(fix)) {
                               if(1.0-quantOver > quantUnder) {
                                    fix<-mu[im]+3*sigmaMu[im]
                               } else {
                                 fix<-  mu[im]-3*sigmaMu[im]
                               }
                           } else {
                               if(fix>mu[im]+3*sigmaMu[im]) {
                                     fix<-mu[im]+3*sigmaMu[im]
                               } else if (fix< mu[im]-3*sigmaMu[im]) {
                                     fix<-  mu[im]-3*sigmaMu[im]
                               }
                           }




> _____________________________________________ 
> From:         Sicotte, Hugues   Ph.D.  
> Sent: Wednesday, November 01, 2006 11:38 AM
> To:   'r-help@stat.math.ethz.ch'
> Subject:      Fitting mean and covariance of Multivariate normal with
> censored data
> 
> Hello,
>         I have been googling for 2 days and I cannot find the answer
> in previous posts.
> 
> I have a set of d-dimensional data elements (d=11 .. 14), each data
> point can be censored at different values both
> Lower-limit and upper limit.
> 
> N = 2000 sets of vectors of
> D=11 data points per vector.
> Each of the N*D points can have different upper and lower limits.
> 
> I "simply" want to fit a Multivariate distribution to the data and get
> the mean and covariance matrix  of the sample.
> 
> The closest post I saw mentionned the function survreg in the
> survival5 package, but I can't figure out how
> To use it for a non-regression model.
> 
> Thanks,
>            Hugues
> 

        [[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fitting mean and covariance of Multivariate normal with censored data

Reply via email to