Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-23 Thread Rolf Turner



On 08/23/2018 06:15 PM, Ivan Calandra wrote:


Thanks all for the enlightenment.

So, it does make sense that mean() produces NaN and median()/sd() NA, 
from a calculation point of view at least.
But I still think it also makes sense that the mean of NA is NA as well, 
be it only for consistency with other functions. That's just my opinion 
of course. I can still convert NaN to NA at the end if I need to.


But the mean of NA *is* NA!


x <- NA
mean(x)
[1] NA


This is *not* the same scenario as having nothing left after *removing* 
all NAs:



x <- rep(NA,3)
mean(x,na.rm=TRUE)
[1] NaN


Seems quite consistent/coherent to me.
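The two cases can be told apart directly with is.na() and is.nan(). Below is a minimal base-R sketch; the variable names x1, x2, m1 and m2 are only for illustration.

```r
# Case 1: the data consist of a single missing value, so the mean is NA
x1 <- NA
m1 <- mean(x1)

# Case 2: removing every NA leaves nothing, so the mean is 0/0 = NaN
x2 <- rep(NA, 3)
m2 <- mean(x2, na.rm = TRUE)

# is.na() is TRUE for both results; is.nan() picks out only the second
print(c(is.na(m1), is.nan(m1), is.na(m2), is.nan(m2)))   # TRUE FALSE TRUE TRUE
```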

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Ivan Calandra

Thanks all for the enlightenment.

So, it does make sense that mean() produces NaN and median()/sd() NA, 
from a calculation point of view at least.
But I still think it also makes sense that the mean of NA is NA as well, 
be it only for consistency with other functions. That's just my opinion 
of course. I can still convert NaN to NA at the end if I need to.
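Converting NaN to NA afterwards needs only base R. The helper name nan_to_na() is hypothetical and this is just one way to do it. It relies on is.nan() being FALSE for NA, so genuine NAs are left as they are. It assumes an atomic vector, because is.nan() does not accept lists.

```r
# Hypothetical helper (not part of base R): turn NaN into NA, leave the rest alone
nan_to_na <- function(v) {
  v[is.nan(v)] <- NA   # is.nan() is FALSE for NA, so existing NAs are untouched
  v
}

x <- c(NA, NA, NA)
res <- nan_to_na(c(mean   = mean(x, na.rm = TRUE),
                   median = median(x, na.rm = TRUE),
                   sd     = sd(x, na.rm = TRUE)))
print(res)   # all three entries are now NA
```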


Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra



Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Ted Harding
I think that one can usefully look at this question from the
point of view of what "NaN" and "NA" are abbreviations for
(at any rate, according to the understanding I have held
for many years, which may be over-simplified).

NaN: Not a Number
NA: Not Available

So NA is typically used for missing values, whereas NaN
represents the results of numerical calculations that
cannot produce a definite number.
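In R the two markers can be distinguished with is.na() and is.nan(). is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN. A minimal base-R illustration:

```r
# One missing value, one undefined numeric result, one ordinary number
vals <- c(NA, NaN, 1)

is.na(vals)    # TRUE TRUE FALSE: NaN also counts as "not available"
is.nan(vals)   # FALSE TRUE FALSE: only the undefined result is NaN
```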

Hence 0/0 is not a number, so it is NaN; similarly Inf/Inf.

Thus, with your x <- c(NA, NA, NA) and mean(x, na.rm=TRUE):
the set of values of x left after na.rm=TRUE is empty, so
sum(x, na.rm=TRUE) = 0 and the number of remaining elements
is 0; hence mean = 0/0 = NaN.
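The arithmetic above can be replayed step by step in base R. This sketch follows the sum/length reasoning. It does not claim to show how mean() is implemented internally.

```r
x <- c(NA, NA, NA)
kept <- x[!is.na(x)]   # what is left after removing the NAs: an empty vector
s <- sum(kept)         # the sum over no values is 0
n <- length(kept)      # and there are 0 values
s / n                  # 0/0, i.e. NaN
mean(x, na.rm = TRUE)  # NaN, matching the ratio above

# The same undefined forms written directly
c(0 / 0, Inf / Inf)    # NaN NaN
```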

But for median(x, na.rm=TRUE), the median is found by
searching among the available elements for the value that
divides the set of values into two halves; since no elements
of x are available after na.rm=TRUE, the median is not
available, hence NA.
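The "search for the middle value" view of the median can be written out as a toy function. simple_median() is a hypothetical illustration and not the source of stats::median. It assumes the NAs have already been removed. When there are no values there is no middle element to return, so it gives NA rather than NaN.

```r
# Hypothetical toy median (not the source of stats::median), for NA-free input
simple_median <- function(v) {
  n <- length(v)
  if (n == 0L) return(NA_real_)      # no values, so no middle value to find
  v <- sort(v)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    v[half]                          # odd n: the single middle value
  } else {
    mean(v[half + 0:1])              # even n: average of the two middle values
  }
}

simple_median(c(3, 1, 2))      # 2
simple_median(c(4, 1, 3, 2))   # 2.5
simple_median(numeric(0))      # NA
```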

Best wishes to all,
Ted.



Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Marc Schwartz via R-help
Hi,

It might even be worthwhile to review this recent thread on R-Devel:

  https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html

which touches upon a subtly related topic vis-a-vis NaN handling.

Regards,

Marc Schwartz




Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Bert Gunter
... And FWIW (not much, I agree), note that if z = numeric(0) and sum(z) =
0, then mean(z) = NaN makes sense: length(z) = 0, so sum(z)/length(z) is
0/0, which gives NaN. So you can see the sorts of issues you may need to consider.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )




Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Bert Gunter
Actually, the dissonance is a bit more basic.

After xxx(..., na.rm=TRUE) with all NAs in ..., you have numeric(0). So
what you see is actually:

> z <- numeric(0)
> mean(z)
[1] NaN
> median(z)
[1] NA
> sd(z)
[1] NA
> sum(z)
[1] 0
etc.
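The claim that the na.rm=TRUE calls reduce to the numeric(0) case can be checked by removing the NAs by hand and comparing results. This is a base-R sketch. It uses a double vector so that the empty result is exactly numeric(0).

```r
x <- c(NA_real_, NA_real_, NA_real_)
z <- x[!is.na(x)]                  # remove the NAs by hand
stopifnot(identical(z, numeric(0)))

# Each summary on the all-NA vector with na.rm = TRUE, next to the same summary on numeric(0)
funs <- list(mean = mean, median = median, sd = sd, sum = sum)
for (nm in names(funs)) {
  f <- funs[[nm]]
  cat(nm, ": na.rm result =", f(x, na.rm = TRUE), "; numeric(0) result =", f(z), "\n")
}
```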

I imagine that there may be more of these little inconsistencies due to the
organic way R evolved over time. What the conventions should be can be
purely a matter of personal opinion in the absence of accepted standards.
But I would first look to see what accepted standards exist, if any.

-- Bert




Re: [R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Duncan Murdoch

On 22/08/2018 10:33 AM, Ivan Calandra wrote:

Dear useRs,

I have just noticed that when input is only NA with na.rm=TRUE, mean()
results in NaN, whereas median() and sd() produce NA. Shouldn't it all
be the same? I think NA makes more sense than NaN in that case.


The mean can be defined as sum(x)/length(x), so if x is length 0, you 
get 0/0 which is NaN.


median(x) is documented in its help page to give NA for x of length 0.

sd(x) is documented to give an error for such x and NA for length 1, but 
it gives NA for both.
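The sd() behaviour described here is easy to check. The following is a quick base-R check of that observation. Treat it as an observation rather than a guarantee, since behaviour could differ between R versions.

```r
sd_empty <- sd(numeric(0))   # no observations at all
sd_one   <- sd(5)            # a single observation: n - 1 = 0 degrees of freedom
c(empty = sd_empty, one = sd_one)
```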


Duncan Murdoch


x <- c(NA, NA, NA)
mean(x, na.rm=TRUE)
[1] NaN
median(x, na.rm=TRUE)
[1] NA
sd(x, na.rm=TRUE)
[1] NA

Thanks for any feedback.

Best,
Ivan





[R] differing behavior of mean(), median() and sd() with na.rm

2018-08-22 Thread Ivan Calandra

Dear useRs,

I have just noticed that when input is only NA with na.rm=TRUE, mean() 
results in NaN, whereas median() and sd() produce NA. Shouldn't it all 
be the same? I think NA makes more sense than NaN in that case.


x <- c(NA, NA, NA)
mean(x, na.rm=TRUE)
[1] NaN
median(x, na.rm=TRUE)
[1] NA
sd(x, na.rm=TRUE)
[1] NA


Thanks for any feedback.

Best,
Ivan

