The newly committed version of p.adjust() has some problems:
> p.adjust(c(0.05,0.5),method="hommel")
[1] 0.05 0.50
No adjustment!
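For reference (my expectation, not output from the new version): with only two p-values, Hommel's adjustment coincides with Hochberg's, so the smaller p-value should come back as min(2*0.05, 0.5) = 0.1:

> p.adjust(c(0.05, 0.5), method = "hochberg")
[1] 0.1 0.5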
I can't see how the new treatment of NAs can be justified. One needs to distinguish between NAs which represent missing p-values and NAs which represent unknown but existing p-values. In virtually all applications that give rise to NAs, the NAs are missing p-values which could not be computed because of missing data. In such cases the observed p-values should definitely be adjusted as if the NAs weren't there, because the NAs stand for p-values which genuinely don't exist.
I can only think of one situation in which the NAs might represent unknown but existing p-values. This would be when a large experiment has been conducted, leading to many p-values. Instead of inputting all the p-values to p.adjust(), you decide to enter only the smallest p-values and represent the others by NAs. The trouble with this approach is that it can only be used with method="bonferroni", as the sketch below illustrates. All the other adjustment methods are step-up or step-down methods, or involve closure as Hommel's method does. For these methods, you simply have to know all the p-values before you can adjust any of them.
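To see why (a minimal sketch of the standard formulas, written by me, not taken from the p.adjust code): the Bonferroni adjustment of each p-value depends only on that p-value and the total count n, whereas a step-down method such as Holm couples every adjusted value to all the smaller p-values through a running maximum.

# Bonferroni: depends only on p_i and n, so n can be supplied
# even when most of the p-values themselves are unknown.
bonf <- function(p, n = length(p)) pmin(1, n * p)

# Holm (step-down): the cummax() couples each adjusted value
# to all the smaller p-values, so none of them may be unknown.
holm <- function(p) {
  n <- length(p)
  o <- order(p)
  pmin(1, cummax((n - seq_len(n) + 1) * p[o]))[order(o)]
}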
For example, you're returning
> p.adjust(c(0.05,NA,NA,NA,NA,NA,NA,NA,NA,NA),method="fdr")
[1] 0.5 NA NA NA NA NA NA NA NA NA
But the unknown p-values might have been like this:
> p.adjust(c(0.05,0.051,0.051,0.051,0.051,0.051,0.051,0.051,0.051,0.051),method="fdr")
[1] 0.051 0.051 0.051 0.051 0.051 0.051 0.051 0.051 0.051 0.051
in which case the first adjusted p-value would have been 0.051, not 0.5.
How can you justify returning 0.5 for the first adjusted p-value, when the correct value could actually be a factor of 10 lower, even when the first p-value is in fact the smallest of the p-values?
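The reason is built into the step-up formula itself: the adjusted value of the smallest p-value is a minimum over terms involving all of the p-values. Here is the Benjamini-Hochberg computation done by hand for the example above (a sketch of the standard formula; I am not quoting the p.adjust source):

p <- c(0.05, rep(0.051, 9))
n <- length(p)
o <- order(p, decreasing = TRUE)   # largest p-value first
i <- n:1                           # corresponding ranks n, ..., 1
# BH: adjusted p_(i) = min over j >= i of (n/j) * p_(j), capped at 1
pmin(1, cummin(n/i * p[o]))[order(o)]
# returns 0.051 for all ten: the candidate 10 * 0.05 = 0.5 for the
# smallest p-value is overridden by the (10/10) * 0.051 terms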
At 07:29 AM 9/01/2005, Martin Maechler wrote:
I've thought more and made experiments with R code versions, and just now committed a new version of p.adjust() to R-devel --> https://svn.r-project.org/R/trunk/src/library/stats/R/p.adjust.R which does sensible NA handling by default and *additionally* has an "na.rm" argument (set to FALSE by default). The extended 'Examples' section on the help page https://svn.r-project.org/R/trunk/src/library/stats/man/p.adjust.Rd shows how the new NA handling is typically much more sensible than using "na.rm = TRUE".
Martin
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]> >>>>> on Sat, 8 Jan 2005 17:19:23 +0100 writes:
>>>>> "GS" == Gordon K Smyth <[EMAIL PROTECTED]> >>>>> on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes:
MM> <.............>
GS> p.adjust() unfortunately gives incorrect results when
GS> 'p' includes NAs. The results from topTable are
GS> correct. topTable() takes care to remove NAs before
GS> passing the values to p.adjust().
MM> There's at least one bug in p.adjust(): The "hommel"
MM> method currently does not work at all with NAs (and I
MM> have an uncommitted fix ready for this bug). OTOH, the
MM> current version of p.adjust() ``works'' with NA's, apart
MM> from Hommel's method, but by using "n = length(p)" in
MM> the correction formulae, i.e. *including* the NAs for
MM> determining sample size `n' {my fix to "hommel" would do
MM> this as well}.
MM> My question is what p.adjust() should do when there are
MM> NA's more generally, or more specifically which `n' to
MM> use in the correction formula. Your proposal amounts to
MM> ``drop NA's and forget about them till the very end''
MM> (where they are wanted in the result), i.e., your sample
MM> size `n' would be sum(!is.na(p)) instead of length(p).
My approach to NAs (in the topTable function in the limma package) is correct in the microarray context, where the NAs represent missing (non-existent) p-values which could not be computed. In outline it works as in the sketch below.
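(A minimal sketch of the idea only; this is not the limma source, and the helper name is mine.) Drop the NAs, adjust the observed p-values with n equal to their number, then put the NAs back in their original positions:

adjust_omit_na <- function(p, method = "fdr") {
  out <- rep(NA_real_, length(p))               # NAs keep their positions
  ok <- !is.na(p)
  out[ok] <- p.adjust(p[ok], method = method)   # n = number of observed p-values
  out
}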
MM> To me it doesn't seem obvious that this setting "n =
MM> #{non-NA observations}" is desirable for all P-value
MM> adjustment methods. One argument for keeping ``n = #{all
MM> observations}'' at least for some correction methods is
MM> the following "continuity" one:
MM> If only a few ``irrelevant'' (let's say > 0.5) P-values
MM> are replaced by NA, the adjusted relevant small P-values
MM> shouldn't change much, ideally not at all. I'm really
MM> no scholar on this topic, but e.g. for "holm" I think I
MM> would want to keep ``full n'' because of the above
MM> continuity argument.
I don't see how the treatment of NAs follows from this continuity argument. The argument seems rather informal and doesn't obviously relate to power or to control of the FWER or FDR, which are the concepts the theory of the adjustment methods is based on.
The NAs should surely be treated in a consistent way for all the adjustment methods.
BTW, for "fdr", I don't see a MM> straightforward way to achieve the desired continuity. MM> 5D Of course, p.adjust() could adopt the possibility of MM> chosing how NA's should be treated e.g. by another MM> argument ``use.na = TRUE/FALSE'' and hence allow both MM> versions.
MM> Feedback very welcome, particularly from ``P-value experts'' ;-)
MM> Martin Maechler, ETH Zurich
Gordon
______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel