Re: [Rd] xtabs(), factors and NAs

2017-01-21 Thread Milan Bouchet-Valat
On Friday, 20 January 2017 at 18:59 +0100, Martin Maechler wrote:
> Milan Bouchet-Valat
> on Thu, 19 Jan 2017 13:58:31 +0100 writes:
> > Hi all,
> > I know this issue has been discussed a few times in the past already,
> > but Martin Maechler suggested in a bug report [1] that I raise it here.
> > 
> > Basically, there is no (easy) way of printing NAs for all variables
> > when calling xtabs() on factors. Passing 'exclude=NULL,
> > na.action=na.pass' works for character vectors, but not for factors.
> > 
> 
> [ yes, but your example below is *not* showing that ... so it may be
>   a bit confusing!]  {Reason: stringsAsFactors etc.: data.frame()
>   converted the character vector to a factor, the default before R 4.0.0}
Yes, sorry, that illustrates why one should never try to make an
example prettier at the last minute. For reference, here's the correct
example:

> test <- data.frame(x=c("a",NA), stringsAsFactors=FALSE)
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
   a <NA> 
   1    1 

> test <- data.frame(x=factor(c("a",NA)))
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
a 
1 


> > > test <- data.frame(x=c("a",NA))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a 
> > 1 
> > 
> > > test <- data.frame(x=factor(c("a",NA)))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a 
> > 1 
> > 
> > 
> > Even if it's documented, this inconsistency is annoying. When checking
> > data, it is often useful to print all NA values temporarily, without
> > calling addNA() individually on all crossed variables.
> 
>   {Note this is not (just) about print()ing; the issue is
>    about the resulting *object*.}
> > 
> > Would it make sense to add a new argument similar to table()'s useNA
> > which would behave the same for all input vector types?
> 
> You have to be aware that  table()  has been changed since R
> 3.3.2, i.e., is different in R-devel and hence will be different
> in R 3.4.0.
> table()'s handling of NAs has become very involved /
> sophisticated(*), and currently I'd rather keep
> xtabs()'s behavior much simpler. 
> 
> Interestingly, after starting to play with data containing NAs and
>   xtabs(*, na.action=na.pass)
> I have already detected bugs (for sparse=TRUE) and cases where
> the current xtabs() behavior seems dubious to me.
> So, the issue is --- as so often --- more involved than assumed initially.
> 
> We (R core) will probably do something, but do need more time
> before we can promise anything more...
OK, thanks. Given for how long this behavior has existed, there's
certainly no hurry...


Regards

> Thank you for raising the issue,
> Martin Maechler, ETH Zurich
> 
> 
> *) R-devel sources always current at
>    https://svn.r-project.org/R/trunk/src/library/base/R/table.R
> 
> > 
> > Regards
> > [1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] xtabs(), factors and NAs

2017-01-20 Thread Martin Maechler
> Milan Bouchet-Valat 
> on Thu, 19 Jan 2017 13:58:31 +0100 writes:

> Hi all,
> I know this issue has been discussed a few times in the past already,
> but Martin Maechler suggested in a bug report [1] that I raise it here.
> 
> Basically, there is no (easy) way of printing NAs for all variables
> when calling xtabs() on factors. Passing 'exclude=NULL,
> na.action=na.pass' works for character vectors, but not for factors.
> 
[ yes, but your example below is *not* showing that ... so it may be
  a bit confusing!]  {Reason: stringsAsFactors etc.: data.frame()
  converted the character vector to a factor, the default before R 4.0.0}

> > test <- data.frame(x=c("a",NA))
> > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> x
> a 
> 1 
> 
> > test <- data.frame(x=factor(c("a",NA)))
> > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> x
> a 
> 1 
> 
> 
> Even if it's documented, this inconsistency is annoying. When checking
> data, it is often useful to print all NA values temporarily, without
> calling addNA() individually on all crossed variables.

  {Note this is not (just) about print()ing; the issue is
   about the resulting *object*.}
> 
> Would it make sense to add a new argument similar to table()'s useNA
> which would behave the same for all input vector types?

You have to be aware that  table()  has been changed since R
3.3.2, i.e., is different in R-devel and hence will be different
in R 3.4.0.
table()'s handling of NAs has become very involved /
sophisticated(*), and currently I'd rather keep
xtabs()'s behavior much simpler. 

Interestingly, after starting to play with data containing NAs and
  xtabs(*, na.action=na.pass)
I have already detected bugs (for sparse=TRUE) and cases where
the current xtabs() behavior seems dubious to me.
So, the issue is --- as so often --- more involved than assumed initially.

We (R core) will probably do something, but do need more time
before we can promise anything more...

Thank you for raising the issue,
Martin Maechler, ETH Zurich


*) R-devel sources always current at
   https://svn.r-project.org/R/trunk/src/library/base/R/table.R

> 
> Regards

> [1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630



[Rd] xtabs(), factors and NAs

2017-01-19 Thread Milan Bouchet-Valat
Hi all,

I know this issue has been discussed a few times in the past already,
but Martin Maechler suggested in a bug report [1] that I raise it here.

Basically, there is no (easy) way of printing NAs for all variables
when calling xtabs() on factors. Passing 'exclude=NULL,
na.action=na.pass' works for character vectors, but not for factors.

> test <- data.frame(x=c("a",NA))
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
a 
1 

> test <- data.frame(x=factor(c("a",NA)))
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
a 
1 


Even if it's documented, this inconsistency is annoying. When checking
data, it is often useful to print all NA values temporarily, without
calling addNA() individually on all crossed variables.
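The addNA() workaround mentioned above can be applied to all crossed variables at once with a small wrapper. This is a minimal sketch: add_na_levels() is a hypothetical helper name, not part of base R, and it assumes the crossed variables are factor or character columns of a data frame. It relies on the fact that entries of an explicit NA level are ordinary level codes, so is.na() is FALSE for them and xtabs()'s default na.action keeps those rows.

```r
# add_na_levels() is a hypothetical helper, not part of base R or stats.
# It gives every factor or character column of a data frame an explicit
# NA level via base::addNA(), so xtabs() tabulates missing values for
# both column types.
add_na_levels <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col) || is.character(col)) {
      # addNA() calls factor() on non-factors first; ifany = TRUE adds
      # the NA level only to columns that actually contain NAs.
      df[[nm]] <- addNA(col, ifany = TRUE)
    }
  }
  df
}

# Usage: a factor column with an NA crossed with a complete character column.
test <- data.frame(x = factor(c("a", NA)), y = c("u", "v"),
                   stringsAsFactors = FALSE)

# Entries of the explicit NA level are not NA codes, so is.na() is FALSE
# for them and the default na.action does not drop those rows.
tab <- xtabs(~ x + y, data = add_na_levels(test))
print(tab)
```

Using ifany = TRUE keeps complete columns free of an empty <NA> level, so the table only grows along variables that actually have missing values.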

Would it make sense to add a new argument similar to table()'s useNA
which would behave the same for all input vector types?
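For comparison, a minimal sketch of table()'s existing useNA argument applied to the same two vector types. It only illustrates the uniform behaviour the proposal refers to; it does not implement any new xtabs() argument.

```r
# table() with useNA = "ifany" adds an <NA> cell for character vectors
# and for factors alike.
chr <- c("a", NA)
fac <- factor(chr)   # factor() leaves NA out of the levels by default

t_chr <- table(chr, useNA = "ifany")
t_fac <- table(fac, useNA = "ifany")
print(t_chr)
print(t_fac)

# Both tables count one "a" and one NA.
stopifnot(identical(as.vector(t_chr), as.vector(t_fac)))
```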


Regards


[1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630
