Re: [Rd] Silent failure with NA results in fligner.test()

2021-01-24 Thread Karolis K
Thank you very much for the update.

I understand leaving NaN/NA in these cases; that can make sense.
But maybe this situation could produce a warning, to inform the user
of what happened?
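A minimal sketch of such a warning, written as a user-side wrapper rather than a change to stats. The name `fligner_test_warn` and the warning text are hypothetical; the wrapper only relies on `fligner.test()` returning an `htest` object with `statistic` and `p.value` components.

```r
# Hypothetical user-side wrapper (not part of stats): call fligner.test()
# and warn when the statistic or p-value comes back as NaN/NA.
fligner_test_warn <- function(x, g, ...) {
  res <- stats::fligner.test(x, g, ...)
  if (is.na(res$statistic) || is.na(res$p.value)) {  # is.na() is TRUE for NaN too
    warning("Fligner-Killeen statistic is undefined (NaN/NA), e.g. when all ",
            "absolute residuals from the group medians are tied",
            call. = FALSE)
  }
  res
}

# Example from this thread: emits a warning instead of failing silently
res <- fligner_test_warn(c(2, 3, 4, 5), gl(2, 2))
```

The result is returned unchanged, so existing code that checks for NA keeps working; only the silence goes away.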

Kind regards,
Karolis K.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Silent failure with NA results in fligner.test()

2021-01-24 Thread Kurt Hornik
> Karolis K writes:

> To me it seems like returning chi-sq = 0 and p-value = 1 would make sense.
> It would also be consistent with other scenarios of equal variance in all
> groups. One example:

> fligner.test(1:8, gl(2,4))
> #Fligner-Killeen test of homogeneity of variances
> #
> # data:  1:8 and gl(2, 4)
> # Fligner-Killeen:med chi-squared = 0, df = 1, p-value = 1

> But I am aware that other tests implemented in stats:: sometimes throw
> errors in similar situations.

> Maybe someone more familiar with the behaviour and philosophy of
> stats:: can weigh in here?

Thanks for spotting this.  After some internal discussions, we've come
to the conclusion that there is no "obvious" way to handle situations
where the Fligner-Killeen:med chi-squared test statistic is undefined
(i.e., when the denominator is zero).  [Owing to the discreteness of the
ranks, trying to take limits will not work.]  For now, these
situations consistently give NaN/NA instead of errors (and the numeric
computations were improved so that it should no longer be possible to
get a zero denominator and a non-zero numerator).
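To see why the denominator vanishes, the statistic can be written out from the code in the patches quoted in this thread. The notation is added here for illustration: $a_i$ are the normal scores, $r_i$ the ranks of the absolute residuals $|x_i - \operatorname{med}_j|$, $G_j$ the indices in group $j$, $n_j = |G_j|$, and $s_a^2$ the sample variance of the scores.

```latex
\begin{aligned}
T &= \frac{\sum_{j=1}^{k} \frac{1}{n_j}\Bigl(\sum_{i \in G_j} a_i\Bigr)^2 - n\,\bar{a}^2}{s_a^2},
  \qquad a_i = \Phi^{-1}\!\left(\frac{1 + r_i/(n+1)}{2}\right) \\
r_i &= \frac{n+1}{2} \ \text{for all } i
  \;\Longrightarrow\; a_i = \Phi^{-1}\!\left(\tfrac{3}{4}\right) = c \ \text{for all } i \\
\text{numerator} &= \sum_{j=1}^{k} \frac{(n_j c)^2}{n_j} - n c^2
  = c^2 \sum_{j=1}^{k} n_j - n c^2 = 0,
  \qquad s_a^2 = 0
\end{aligned}
```

So when all absolute residuals are tied, $T = 0/0$, which R evaluates to `NaN`.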

Best
-k

> Warm regards,
> Karolis K.



Re: [Rd] Silent failure with NA results in fligner.test()

2020-12-24 Thread Martin Maechler
Not sure.
If all of the variances are zero, they are homogeneous in that sense,
and I would give a p-value of 1.
If only *some* of the variances are zero, it's less easy.

I would still try *not* to give an error in such cases, and would even
prefer an NA statistic and p-value, because these values really are
"not available" for such data.
But it is not strictly an error to try such a test on data of the
correct format. Consequently, I would personally even try not to
give the current error, but rather return NA values here:
>  if (all(x == 0))
>  stop("data are essentially constant")
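A minimal sketch of the behaviour described here: compute the statistic as in the code quoted in this thread, but return NA for both statistic and p-value, instead of an error or NaN, when it is undefined. `fk_med_na` is a hypothetical name and this is not the stats implementation; the degenerate case is detected with an exact tie check on the scores (`all(a == a[1L])`), swapped in for the `var()` comparisons used in the patches.

```r
# Hypothetical sketch (not stats::fligner.test): Fligner-Killeen:med
# statistic that returns NA, rather than an error or NaN, when undefined.
fk_med_na <- function(x, g) {
  g <- factor(g)
  n <- length(x)
  k <- nlevels(g)
  x <- x - tapply(x, g, median)[g]              # center each group at its median
  a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)  # normal scores of ranked |x|
  if (all(a == a[1L])) {
    # all scores tied (this includes constant data): statistic not available
    return(list(statistic = NA_real_, parameter = k - 1, p.value = NA_real_))
  }
  stat <- sum(tapply(a, g, sum)^2 / tapply(a, g, length))
  stat <- (stat - n * mean(a)^2) / var(a)
  list(statistic = stat, parameter = k - 1,
       p.value = pchisq(stat, k - 1, lower.tail = FALSE))
}

fk_med_na(c(2, 3, 4, 5), gl(2, 2))$p.value  # NA rather than NaN or an error
```

Because identical ranks always map to identical scores, the tie check is exact and does not depend on how `var()` rounds.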



-- 
Martin    http://stat.ethz.ch/~maechler
Seminar für Statistik, ETH Zürich HG G 16   Rämistrasse 101
CH-8092 Zurich, SWITZERLAND   ☎ +41 44 632 3408<><



Re: [Rd] Silent failure with NA results in fligner.test()

2020-12-23 Thread Karolis K
To me it seems like returning chi-sq = 0 and p-value = 1 would make sense.
It would also be consistent with other scenarios of equal variance in all
groups. One example:

fligner.test(1:8, gl(2,4))
#Fligner-Killeen test of homogeneity of variances
#
# data:  1:8 and gl(2, 4)
# Fligner-Killeen:med chi-squared = 0, df = 1, p-value = 1
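For contrast, a short check of why `fligner.test(1:8, gl(2,4))` gives 0. It re-computes the median-centering and scoring lines from the patches in this thread (the variable names are illustrative): both groups have the same absolute residuals, so their score sums coincide and the numerator vanishes, while the scores themselves are not all tied, so there is no 0/0.

```r
x <- 1:8
g <- gl(2, 4)
n <- length(x)
r <- unname(abs(x - tapply(x, g, median)[g]))  # |residuals| from group medians
split(r, g)                                     # both groups: 1.5 0.5 0.5 1.5
a <- qnorm((1 + rank(r) / (n + 1)) / 2)        # normal scores
tapply(a, g, sum)                               # equal group sums: statistic 0
var(a) > 0                                      # TRUE: scores not all tied
```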

But I am aware that other tests implemented in stats:: sometimes throw
errors in similar situations.

Maybe someone more familiar with the behaviour and philosophy of
stats:: can weigh in here?

Warm regards,
Karolis K.



Re: [Rd] Silent failure with NA results in fligner.test()

2020-12-21 Thread Kurt Hornik
> Karolis K writes:

Any preferences?

Best
-k

> Hello,
> In certain cases fligner.test() returns NaN statistic and NA p-value.
> The issue happens when, after centering with the median, all absolute
> values become equal, which then leads to identical ranks.
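The mechanism can be checked step by step on the first example. A minimal sketch that re-computes the centering and scoring lines of `fligner.test()` as they appear in the patches (the names `res` and `rk` are hypothetical; this is not a call into stats internals):

```r
x <- c(2, 3, 4, 5)
g <- gl(2, 2)
n <- length(x)
res <- unname(x - tapply(x, g, median)[g])  # -0.5  0.5 -0.5  0.5
rk  <- rank(abs(res))                       # all tied: 2.5 2.5 2.5 2.5
a   <- qnorm((1 + rk / (n + 1)) / 2)        # every score equals qnorm(0.75)
all(a == a[1])                              # TRUE: var(a) is (numerically) 0
                                            # and the statistic is 0/0 = NaN
```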

> Below are a few examples:

> # 2 groups, 2 values each
> # issue is caused by residual values after centering (-0.5, 0.5, -0.5, 0.5)
> # then, after taking the absolute value, all the ranks become identical.
>> fligner.test(c(2,3,4,5), gl(2,2))

> Fligner-Killeen test of homogeneity of variances

> data:  c(2, 3, 4, 5) and gl(2, 2)
> Fligner-Killeen:med chi-squared = NaN, df = 1, p-value = NA


> # similar situation with more observations and 3 groups
>> fligner.test(c(2,3,2,3,4,4,5,5,8,9,9,8), gl(3,4))

> Fligner-Killeen test of homogeneity of variances

> data:  c(2, 3, 2, 3, 4, 4, 5, 5, 8, 9, 9, 8) and gl(3, 4)
> Fligner-Killeen:med chi-squared = NaN, df = 2, p-value = NA


> Two simple patches are proposed below. The first returns a p-value of 1,
> and the second throws an error.
> I'm not sure which one is more appropriate, so I'm submitting both.

> Warm regards,
> Karolis Koncevičius

> ---

> Index: fligner.test.R
> ===================================================================
> --- fligner.test.R	(revision 79650)
> +++ fligner.test.R	(working copy)
> @@ -59,8 +59,13 @@
>          stop("data are essentially constant")
>
>      a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)
> -    STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
> -    STATISTIC <- (STATISTIC - n * mean(a)^2) / var(a)
> +    if (var(a) > 0) {
> +        STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
> +        STATISTIC <- (STATISTIC - n * mean(a)^2) / var(a)
> +    }
> +    else {
> +        STATISTIC <- 0
> +    }
>      PARAMETER <- k - 1
>      PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
>      names(STATISTIC) <- "Fligner-Killeen:med chi-squared"
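With the first patch, a tied case sets `STATISTIC <- 0`; since the upper tail of a chi-squared distribution at 0 is 1 for any degrees of freedom, the reported p-value is 1. A quick check with the degrees of freedom from the two examples (k - 1 = 1 and 2):

```r
# p-value the first patch would report once STATISTIC is set to 0
pchisq(0, df = 1, lower.tail = FALSE)  # 1 (two groups)
pchisq(0, df = 2, lower.tail = FALSE)  # 1 (three groups)
```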

> ---

> Index: fligner.test.R
> ===================================================================
> --- fligner.test.R	(revision 79650)
> +++ fligner.test.R	(working copy)
> @@ -57,6 +57,8 @@
>      x <- x - tapply(x,g,median)[g]
>      if (all(x == 0))
>          stop("data are essentially constant")
> +    if (var(abs(x)) == 0)
> +        stop("absolute residuals from the median are essentially constant")
>
>      a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)
>      STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
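The condition added by the second patch can be tried on the inputs from this thread. A sketch with `resid_abs_constant` as a hypothetical helper wrapping the patch's `var(abs(x)) == 0` test after the same median centering; the exact comparison with 0 works here because the tied absolute residuals (all 0.5) are exactly equal in floating point, which may not hold for arbitrary data.

```r
# Hypothetical helper: the check from the second patch, applied after
# the median centering used in fligner.test()
resid_abs_constant <- function(x, g) {
  g <- factor(g)
  x <- x - tapply(x, g, median)[g]
  var(abs(x)) == 0
}

resid_abs_constant(c(2, 3, 4, 5), gl(2, 2))                          # TRUE
resid_abs_constant(c(2, 3, 2, 3, 4, 4, 5, 5, 8, 9, 9, 8), gl(3, 4))  # TRUE
resid_abs_constant(1:8, gl(2, 4))                                    # FALSE
```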
