Re: [R] Generating samples from truncated multivariate Student-t distribution

David Winsemius Wed, 10 May 2017 12:27:06 -0700

> On May 10, 2017, at 11:02 AM, Czarek Kowalski <czarek230...@gmail.com> wrote:
> 
> Previously I had used another language to make calculations based on
> theory. I have calculated using R and I have received another results.
> My theoretical calculation does not take into account the full
> covariance matrix (only 6 elements from diagonal). Code based on
> theory:
> 
> df = 4;   #degrees of freedom
> sigmas = c(1, 4, 2, 5, 3, 6) # roots of diagonal elements of covariance matrix
> meann = c(55, 40, 50, 35, 45, 30)
> alfa1 = 20; # lower truncation
> beta1 = 60; # upper truncation
> a = (alfa1 - meann)/sigmas;
> b = (beta1 - meann)/sigmas;
> E = meann + sigmas * ((gamma(df - 1)/2)*((df + a^2)^(-(df-1)/2) - (df
> + b^2)^(-(df-1)/2))*df^(df/2))/(2*(pt(b,df)-pt(a,df))*gamma(df/2)*gamma(1/2))


This looks wrong:

(gamma(df - 1)/2)

According to your "theory" (which you have not yet supported with references) 
in the attached pdf file, that should be:

gamma( (df - 1)/2 )

I'm not a statistician and checking your "theoretical" expression for the mean 
of a truncated PDF is not really on-topic for R help. When I look at 
http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF I see a lot of 
normalization factors that seem quite different than yours. Perhaps you should 
post any further difficulties to stats.stackexchange.com ?

You may also want to consult:

https://www.jstatsoft.org/article/view/v016c02


Authors:        Saralees Nadarajah, Samuel Kotz
Title:  R Programs for Truncated Distributions
Abstract:       Truncated distributions arise naturally in many practical 
situations. In this note, we provide programs for computing six quantities of 
interest (probability density function, mean, variance, cumulative distribution 
function, quantile function and random numbers) for any truncated distribution: 
whether it is left truncated, right truncated or doubly truncated. The programs 
are written in R: a freely downloadable statistical software.

Best;
David.


> E
> 
> 
> Kind regards
> Czarek
> 
> On 9 May 2017 at 23:50, David Winsemius <dwinsem...@comcast.net> wrote:
>> 
>>> On May 9, 2017, at 2:33 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>> 
>>> 
>>>> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230...@gmail.com> wrote:
>>>> 
>>>> I have already posted that in attachement - pdf file.
>>> 
>>> I see that now. I failed to scroll to the 3rd page.
>>> 
>>>> I am posting
>>>> plain text here:
>>>> 
>>>>> library(tmvtnorm)
>>>>> meann = c(55, 40, 50, 35, 45, 30)
>>>>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>>>>                  1, 16, -6, -6, -2, 12,
>>>>                  0, -6, 4, 2, -2, -5,
>>>>                  2, -6, 2, 25, 0, -17,
>>>>                 -1, -2, -2, 0, 9, -5,
>>>>                 -1, 12, -5, -17, -5, 36), 6, 6)
>>>> df = 4
>>>> lower = c(20, 20, 20, 20, 20, 20)
>>>> upper = c(60, 60, 60, 60, 60, 60)
>>>> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
>>>> 
>>>> 
>>>>> sum(X1[,1]) / 100000
>>>> [1] 54.98258
>>>> sum(X1[,2]) / 100000
>>>> [1] 40.36153
>>>> sum(X1[,3]) / 100000
>>>> [1] 49.83571
>>>> sum(X1[,4]) / 100000
>>>> [1] 34.69571      # "4th element of mean vector"
>>>> sum(X1[,5]) / 100000
>>>> [1] 44.81081
>>>> sum(X1[,6]) / 100000
>>>> [1] 31.10834
>>>> 
>>>> And corresponding results received using equation (3) from pdf file:
>>>> [54.97,
>>>> 40,
>>>> 49.95,
>>>> 35.31, #  "4th element of mean vector"
>>>> 44.94,
>>>> 31.32]
>>>> 
>>> 
>>> I get similar results for the output from your code,
>>> 
>>> My 100-fold run of your calculations were:
>>> 
>>> meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, 
>>> upper)
>>> colMeans(Xbig)} )
>>> 
>>> describe(meansBig[4,])  # describe is from Hmisc package
>>> 
>>> meansBig[4, ]
>>>      n  missing distinct     Info     Mean      Gmd      .05      .10      
>>> .25
>>>    100        0      100        1     34.7  0.01954    34.68    34.68    
>>> 34.69
>>>    .50      .75      .90      .95
>>>  34.70    34.72    34.72    34.73
>>> 
>>> lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
>>> highest: 34.72939 34.73012 34.73051 34.73742 34.74441
>>> 
>>> 
>>> So agree, 35.31 is outside the plausible range of an RV formed with that 
>>> package, but I don't have any code relating to your calculations from 
>>> theory.
>> 
>> Further investigation:
>> 
>> covDiag <- covv*( row(covv)==col(covv) )  # just the diagonal means
>> 
>> Repeat with all zero covariances:
>> 
>>> meansDiag <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covDiag, df, 
>>> lower, upper)
>> + colMeans(Xbig)} )
>>> describe(meansDiag[4,])
>> meansDiag[4, ]
>>       n  missing distinct     Info     Mean      Gmd      .05      .10      
>> .25
>>     100        0      100        1    35.23  0.02074    35.21    35.21    
>> 35.22
>>     .50      .75      .90      .95
>>   35.23    35.25    35.26    35.26
>> 
>> lowest : 35.18360 35.19756 35.20098 35.20179 35.20622
>> highest: 35.26367 35.26635 35.26791 35.27251 35.27302
>> 
>> So failing to account for the covariances in your theoretical calculations 
>> mostly explains the apparent discrepancy, although your value of 35.31 would 
>> be at the  far end of a statistical distribution and I wonder about some 
>> sort of error in your theoretical calculation, which didn't appear to take 
>> into account the covariance matrix.
>> 
>> Best;
>> David.
>> 
>> 
>> 
>>> 
>>> Best;
>>> David.
>>> 
>>> 
>>>> On 9 May 2017 at 22:17, David Winsemius <dwinsem...@comcast.net> wrote:
>>>>> 
>>>>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Of course I have expected the difference between theory and a sample
>>>>>> of realizations of RV's and result mean should still be a random
>>>>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>>>>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>>>>>> expected that the difference would be smaller because of law of large
>>>>>> numbers (for 10mln samples the difference is quite similar).
>>>>> 
>>>>> I for one have no idea what is meant by a "4th element of mean vector". 
>>>>> So I have now idea what to consider "big". I have found that my 
>>>>> intuitions about multivariate distributions, especially those where the 
>>>>> covariate structure is as complex as you have assumed, are often far from 
>>>>> simulated results.
>>>>> 
>>>>> I suggest you post some code and results.
>>>>> 
>>>>> --
>>>>> David.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On 9 May 2017 at 21:40, David Winsemius <dwinsem...@comcast.net> wrote:
>>>>>>> 
>>>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Dear Members,
>>>>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>>>>>> from truncated multivariate Student-t distribution using rtmvt
>>>>>>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>>>>>>> using equation (3) from attached pdf. The problem is, that after
>>>>>>>> summing all elements in one column of rtmvt result (and dividing by
>>>>>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>>>>>> You tell me, what is incorrect, why there is a difference?
>>>>>>> 
>>>>>>> I guess the question is why you would NOT expect a difference between 
>>>>>>> theory and a sample of realizations of RV's? The result mean should 
>>>>>>> still be a random variable, night wahr?
>>>>>>> 
>>>>>>> 
>>>>>>>> Yours faithfully
>>>>>>>> Czarek Kowalski
>>>>>>>> <truncatedT.pdf>______________________________________________
>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide 
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>> 
>>>>>>> David Winsemius
>>>>>>> Alameda, CA, USA
>>>>>>> 
>>>>> 
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>> 
>>> 
>>> David Winsemius
>>> Alameda, CA, USA
>>> 
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Generating samples from truncated multivariate Student-t distribution

Reply via email to