Re: [R] Chebyshev Inequality — MVUE

David Winsemius Sun, 10 Jul 2011 18:17:05 -0700

The formula from Finley is reproduced in Johnson, Kotz, andBalakrishnan's "Continuous Distributions: Vol. 1" in the beginning oftheir Log Normal chapter. I am not clear that the recursive formula inW. Huber's spreadsheet is a correct representation of the iterativeversion there, but cannot claim to have disproven it either. (I amsuggesting checking it against references.)

It certainly seems to deliver results close to the results of thepsi() approximation, which is also reproduced in that text. (And on athree year-old computer, the "converges slowly" comment does not seemto have meaning.) Results have no discernible time lag, so I thinkspeed of convergence in 1941 (at the end of the mechanical computationera) may have quite a different meaning in the electronic era.


--
David.

On Jul 10, 2011, at 7:42 PM, Durant, James T. (ATSDR/DTEM/PRMSB) wrote:

Ah, thanks so much.

I found the excel spreadsheet almost right after I posted to the rgroup. I had concerns about using Chebyshev and wanted to reproduceDr. Whubers simulation to see for myself how it performs. To beclear I normally use Lands exact or bootstrap for the UCL (orsometimes take a Bayesian approach with an uninformed prior).Someone wanted me to use proUCL from EPA for something and I hadnever seen a Chebyshev inequality until then, so I was curious aboutits performance.


Thanks to you both! You have made my day.


Jim


----- Original Message -----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Sunday, July 10, 2011 03:14 PM
To: ted.hard...@wlandres.net <ted.hard...@wlandres.net>

Cc: Durant, James T. (ATSDR/DTEM/PRMSB); r-help@r-project.org <r-help@r-project.org>

Subject: Re: [R] Chebyshev Inequality  — MVUE


On Jul 10, 2011, at 2:49 PM, (Ted Harding) wrote:

On 10-Jul-11 16:27:04, Durant, James T. (ATSDR/DTEM/PRMSB) wrote:

Hello,
I was interested in trying to write an R script to calculate a
UCL for a lognormal distribution using the Chebyshev Inequality
-- MVUE Approach (based on EPA’s guidance found in
http://www.epa.gov/oswer/riskassessment/pdf/ucl.pdf).
This looks like it should be straight forward, but I am need to
calculate an MVUE for the population mean and an MVUE for the
population variance, which requires a value (g_n) from a table A7,
found in Aitchison and Brown (1969): The lognormal distribution.
I have looked across the RSiteSearch and can not seem to find a
function that will give me g_n or the MVUE for mean and variance
of lognormal distribution.

Is there an R function that will give me g_n or will calculate
an MVUE for the population mean and variance for the lognormal
distribution?

VR
Jim
James T. Durant, MSPH CIH
Emergency Response Coordinator
US Agency for Toxic Substances and Disease Registry
Atlanta, GA 30341
770-378-1695


Some quick comments. I will try to repond more fully later.

1. The Chebyshev inequality is usually very conservative.
As a simple example, consider X with a negative exponential
distribution with density exp(x), so that the population
mean is 1 and the population variance is also 1.

Then, for a factor K, Chebyshev says that

Prob(|X-1] > K*1) < 1/(K^2).

This is only informative if K>1. So (e.g.) take K=2. Then the
Chebyshev
result is that this Prob < 1/4. HOwever, because X is positive, the

event in question is X > 1 + 2 = 3 so Prob is exp(-3) = 0.0498 <1/20.


The reference you cite suggests ("Exhibit 5") applying the method to
log-transformed data, which for lognormal data would be normally
distributed. So apply Chebyshev to N(0,1) (mean=0, var=1). Then

Prob(|X-0| > K*1)  < 1/(K^2) as before.

Now take K=2 again (i.e. outside +/- 2 SDs, so Prob approx=0.05).
But Chebyshev still says "Prob < 1/4 = 0.25".

So, as a first comment, I am seriously wondering about the wisdom
of basing an approach on Chebyshev's inequality. Note also the
comments in your reference at the end of that section (bottom of
page 12) headed "Caveats about the Chebyshev method.", which is
essentially a warning on similar lines to the above.

2. The function in the reference you cite is not "g_n" but "psi_n",
and the Table cited from Aitchison and Brown is not A7 but A2.

On page 45 of Aitchison and Brown (1969), section 5.41 "The Method
of Maximum Likelihood", the function psi_n is defined (Eqn 5.38)
so as to be applicable to the sufficient statistics mean(log(X))
and var(log(X)) to yield unbiased estimators of the population
mean of X and the population variance of X (Eqns (5.40) and (5.42)).

psi_n is defined as an infinite series which, according to A&B
(page 46) "converges only slowly", and they exhibit a finite-form
asymptotic approximation to it (Eqn (5.43)) which is accurate
asyn=mptotically to O(1/(n^3)). This fairly simple expression
would be easy to define as a function in R:

psi <- function(t,n){
exp(t)*(1 - t*(t+1)/n + (t^2)*(3(t^2) + 22*t + 21)/(6*(n^2)))
}


ITYM:
psi <- function(t,n){
  exp(t)*(1 - t*(t+1)/n + (t^2)*(3*(t^2) + 22*t + 21)/(6*(n^2)))
 }

I was doing a bit of searching an found some VB code that whuber (and
am wondering if it's the same whuber as frequently makes cogent posts
on stats.stackexchange.com ?)  had posted in an Excel macro about ten
years ago that claimed to have reproduced the A9 Table in Gilbert.

http://www.quantdec.com/envstats/software/ln_mvue.xls

His macro was named Finney and I transposed it into R:

Finney <- function(m , z){
           aTol <- 0.0000000001
            iterMax <- 1000
          if (m <= -1) {# issue an error
                    error("Finney = 0#")}
            x <- z * m * m / (m + 1)
            if (abs(x) < aTol) { return(Finney = 1L)}
     # This is the correct answer.
            iMax = abs(trunc(z) + 1) + 20
            if (iMax > iterMax) {error("iMax > iterMax")}
     # Init
            a = 1L
            g = a       # Lead terms

           for ( i in seq(iMax) ) {
        # Test for convergence
               if (abs(a) <= aTol * abs(g)) {
                              break()} # Compute the next term
               a <-  a * x / (m + 2 * (i - 1)) / i
        #'
        #' Accumulate terms
        #'
               g = g + a}  #   Next
               return(g)
         }

The order of the arguments is reversed but they seem to offer similar
results:

psi(2, 30)

[1] 6.332695

Finney(30, 2)

[1] 6.254139

Finney(60, 1.5)

[1] 4.230381

psi(1.5, 60)

[1] 4.229944

I would think that a conservative statistical method _should_ be used
when assessing toxic risks as the OP might to be doing, given his
address and title.

--
David.


Hoping this helps. As I say, I hope to find time later to look
at this in more detail.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jul-11                                       Time: 19:49:39
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Chebyshev Inequality — MVUE

Reply via email to