On Mon, 24 May 2021, Adrian Dușa wrote:

On Mon, May 24, 2021 at 2:11 PM Greg Minshall <minsh...@umich.edu> wrote:

[...]
if you have 500 columns of possibly-NA'd variables, you could have one
column of 500 "bits", where each bit has one of N values, N being the
number of explanations the corresponding column has for why the NA
exists.


PLEASE DO NOT DO THIS!

It will not work reliably, as has been explained to you ad nauseam in
this thread.

If you distribute code that does this it will only lead to bug reports
on R that will waste R-core time.

As Alex explained, you can use attributes for this. If you need
operations to preserve attributes across subsetting you can define
subsetting methods that do that.

If you are dead set on doing something in C you can try to develop an
ALTREP class that provides augmented missing value information.

Best,

luke




The mere thought of implementing something like that gives me shivers. Not
to mention such a solution should also be robust when subsetting,
splitting, column and row binding, etc. and everything can be lost if the
user deletes that particular column without realising its importance.

Social science datasets are much more alive and complex than one might
first think: there are multi-wave studies with tens of countries, and
aggregating such data is already a complex process to add even more
complexity on top of that.

As undocumented as they may be, or even subject to change, I think the R
internals are much more reliable that this.

Best wishes,
Adrian



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to