[Rd] phyper accuracy and efficiency (PR#6772)

terra Thu, 15 Apr 2004 09:08:09 -0700

Full_Name: Morten Welinder
Version: snapshot
OS: 
Submission from: (NULL) (65.213.85.218)



Time to kick phyper's tires...

The current version has very serious cancellation issues.  For example, if you
ask
for a small right-tail you are likely to get total cancellation.  For example
phyper(59, 150, 150, 60, FALSE, FALSE) gives 6.372680161e-14.  The right answer
is
dhyper(0, 150, 150, 60, FALSE) which is 5.111204798e-22.

phyper is also really slow for large arguments.

Therefore, I suggest using the code below.  This is a sniplet from Gnumeric, so
you'll have to s/gnm_float/double/ and s/gnum//, etc.  The code isn't perfect.
In fact, if  i*(NR+NB)  is close to   n*NR, then this code can take a while.
Not longer than the old code, though.





/*
 * New phyper implementation.  Copyright 2004 Morten Welinder.
 * Distributed under the GNU General Public License.
 *
 * Thanks to Ian Smith for ideas.
 */
/*
 * Calculate
 *
 *          phyper (i, NR, NB, n, TRUE, FALSE)
 *   [log]  ----------------------------------
 *             dhyper (i, NR, NB, n, FALSE)
 *
 * without actually calling phyper.  This assumes that
 *
 *     i * (NR + NB) <= n * NR
 *
 */
static gnm_float
pdhyper (gnm_float i, gnm_float NR, gnm_float NB, gnm_float n, gboolean log_p)
{
  gnm_float sum = 0;
  gnm_float term = 1;

  while (i > 0 && term >= GNUM_EPSILON * sum) {
          term *= i * (NB - n + i) / (n + 1 - i) / (NR + 1 - i);
          sum += term;
          i--;
  }

  return log_p ? log1pgnum (sum) : 1 + sum;
}


gnm_float
phyper (gnm_float i, gnm_float NR, gnm_float NB, gnm_float n, int lower_tail,
int log_p)
{
        gnm_float d, pd;

#ifdef IEEE_754
        if (isnangnum (i) || isnangnum (NR) || isnangnum (NB) || isnangnum (n))
                return i + NR + NB + n;
#endif

        i = floorgnum (i + 1e-7);
        NR = floorgnum (NR + 0.5);
        NB = floorgnum (NB + 0.5);
        n = floorgnum (n + 0.5);

        if (NR < 0 || NB < 0 || !finitegnum (NR + NB) || n < 0 || n > NR + NB)
                ML_ERR_return_NAN;

        if (i * (NR + NB) > n * NR) {
                /* Swap tails.  */
                gnm_float oldNB = NB;
                NB = NR;
                NR = oldNB;
                i = n - i - 1;
                lower_tail = !lower_tail;
        }

        if (i < 0)
                return R_DT_0;

        d = dhyper (i, NR, NB, n, log_p);
        pd = pdhyper (i, NR, NB, n, log_p);

        return log_p ? R_DT_log (d + pd) : R_D_Lval (d * pd);
}

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel

[Rd] phyper accuracy and efficiency (PR#6772)

Reply via email to