The newly committed version of p.adjust() contains some problems:

> p.adjust(c(0.05,0.5),method="hommel")
[1] 0.05 0.50

No adjustment!
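
If I recall correctly, with only two p-values Hommel's procedure coincides with Hochberg's, so the adjusted values should come out as 0.10 and 0.50. A quick check by hand:

p <- c(0.05, 0.5)
## Hochberg step-up by hand: multiply the decreasingly sorted p-values by 1, 2,
## take the running minimum, then map back to the original order
pmin(1, cummin(1:2 * sort(p, decreasing = TRUE)))[2:1]
## should give 0.10 0.50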

I can't see how the new treatment of NAs can be justified. One needs to distinguish between NAs which represent missing p-values and NAs which represent unknown p-values. In virtually all applications giving rise to NAs, the NAs represent missing p-values which could not be computed because of missing data. In such cases, the observed p-values should definitely be adjusted as if the NAs weren't there, because NAs represent p-values which genuinely don't exist.
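
In code, the treatment I have in mind is simply to drop the NAs before adjusting and to carry them through to the result; a minimal sketch (the function name is just for illustration):

adjust.omit.na <- function(p, method = "fdr") {
    ok <- !is.na(p)                             # p-values that actually exist
    p[ok] <- p.adjust(p[ok], method = method)   # adjust among themselves, n = sum(ok)
    p                                           # NAs are carried through unchanged
}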

I can only think of one situation in which the NAs might represent unknown but existing p-values. This would be when a large experiment has been conducted leading to many p-values. Instead of inputting all the p-values to the p.adjust() function, you decide to enter only the smallest p-values and represent the others using NAs. The trouble with this approach is that it can only be used with method="bonferroni". All the other adjustment methods are step-up or step-down methods, or, like Hommel's method, involve closure. For these methods, you simply have to know all the p-values before you can adjust any of them.
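
For illustration (numbers made up): if, say, 1000 tests were done but only the three smallest p-values are entered, Bonferroni can still be applied because it needs only the total number of tests:

p.small <- c(1e-5, 2e-4, 3e-3)    # the three smallest of 1000 p-values
pmin(1, p.small * 1000)           # Bonferroni adjustment with n = 1000
## A step-up method such as "fdr" has no shortcut like this: the adjusted value
## of the smallest p-value depends on all the larger p-values.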

For example, you're returning

> p.adjust(c(0.05,NA,NA,NA,NA,NA,NA,NA,NA,NA),method="fdr")
 [1] 0.5  NA  NA  NA  NA  NA  NA  NA  NA  NA

But the unknown p-values might have been like this:
> p.adjust(c(0.05,0.051,0.051,0.051,0.051,0.051,0.051,0.051,0.051,0.051),method="fdr")
[1] 0.051 0.051 0.051 0.051 0.051 0.051 0.051 0.051 0.051 0.051


in which case the first adjusted p-value would have been 0.051 not 0.5.

How can you justify returning 0.5 for the first adjusted p-value, when the correct value could actually be a factor of 10 lower, even when the first p-value is in fact the smallest of the p-values?
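
To see where the dependence comes from, here is the Benjamini-Hochberg ("fdr") adjustment written out in a few lines of R (my own paraphrase of the algorithm, not the p.adjust() code itself):

bh <- function(p) {
    n <- length(p)
    o <- order(p, decreasing = TRUE)        # largest p-value first
    ro <- order(o)                          # to restore the original order
    pmin(1, cummin(n / (n:1) * p[o]))[ro]   # p_(i) * n/i, then running minimum
}
bh(c(0.05, rep(0.051, 9)))   # all 0.051: the running minimum pulls 0.5 down to 0.051

The cummin() step is exactly why the adjusted value of the smallest p-value cannot be computed without knowing the other p-values.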

At 07:29 AM 9/01/2005, Martin Maechler wrote:
I've thought more and made experiments with R code versions
and just now committed a new version of  p.adjust()  to R-devel
--> https://svn.r-project.org/R/trunk/src/library/stats/R/p.adjust.R
which does sensible NA handling by default and
*additionally* has an "na.rm" argument (set to FALSE by
default).  The extended 'Examples' section on the help page
    https://svn.r-project.org/R/trunk/src/library/stats/man/p.adjust.Rd
shows how the new NA handling is typically much more sensible
than using "na.rm = TRUE".

Martin


>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >>>>> on Sat, 8 Jan 2005 17:19:23 +0100 writes:

>>>>> "GS" == Gordon K Smyth <[EMAIL PROTECTED]>
>>>>>     on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes:

    MM>     <.............>

    GS> p.adjust() unfortunately gives incorrect results when
    GS> 'p' includes NAs.  The results from topTable are
    GS> correct.  topTable() takes care to remove NAs before
    GS> passing the values to p.adjust().

    MM> There's at least one bug in p.adjust(): The "hommel"
    MM> method currently does not work at all with NAs (and I
    MM> have an uncommitted fix ready for this bug).  OTOH, the
    MM> current version of p.adjust() ``works'' with NA's, apart
    MM> from Hommel's method, but by using "n = length(p)" in
    MM> the correction formulae, i.e. *including* the NAs for
    MM> determining sample size `n' {my fix to "hommel" would do
    MM> this as well}.

    MM> My question is what p.adjust() should do when there are
    MM> NA's more generally, or more specifically which `n' to
    MM> use in the correction formula. Your proposal amounts to
    MM> ``drop NA's and forget about them till the very end''
    MM> (where they are wanted in the result), i.e., your sample
    MM> size `n' would be sum(!is.na(p)) instead of length(p).

My approach to NAs (in the topTable function in the limma package) is correct in the microarray context, where the NAs represent missing (non-existent) p-values which could not be computed.
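
Concretely, on the example above this treatment leaves the single observed p-value alone (a sketch, not the actual limma code):

p <- c(0.05, rep(NA, 9))
ok <- !is.na(p)
p[ok] <- p.adjust(p[ok], method = "fdr")   # only one observed p-value, so n = 1
p                                          # 0.05 followed by nine NAs, not 0.5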


    MM> To me it doesn't seem obvious that this setting "n =
    MM> #{non-NA observations}" is desirable for all P-value
    MM> adjustment methods. One argument for keeping ``n = #{all
    MM> observations}'' at least for some correction methods is
    MM> the following "continuity" one:

    MM> If only a few ``irrelevant'' (let's say > 0.5) P-values
    MM> are replaced by NA, the adjusted relevant small P-values
    MM> shouldn't change much, ideally not at all.  I'm really
    MM> no scholar on this topic, but e.g. for "holm" I think I
    MM> would want to keep ``full n'' because of the above
    MM> continuity argument.

I don't see how the treatment of NAs follows from this continuity argument. The argument seems rather informal and doesn't obviously relate to the concepts of power and control of FWER and FDR, which are what the theory of the adjustment methods is based on.


The NAs should surely be treated in a consistent way for all the adjustment methods.

 BTW, for "fdr", I don't see a
    MM> straightforward way to achieve the desired continuity.
    MM> 5D Of course, p.adjust() could adopt the possibility of
    MM> chosing how NA's should be treated e.g. by another
    MM> argument ``use.na = TRUE/FALSE'' and hence allow both
    MM> versions.

    MM> Feedback very welcome, particularly from ``P-value
    MM> experts'' ;-)

    MM> Martin Maechler, ETH Zurich

Gordon
