Re: [Rcpp-devel] Missing values

Romain Francois Fri, 16 Nov 2012 00:26:08 -0800

Thanks for exploring these issues. This looks very useful.


I get:

> str( first_log(NA) )
 logi TRUE
> str( first_int(NA_integer_) )
 int NA
> str( first_num(NA_real_) )
 num NA
> str( first_char(NA_character_) )
 chr "NA"

For first_log: a bool can only be true or false. In R, logical vectors are represented as arrays of ints. When we coerce to bool, we test whether the value is not 0. This works for most cases, but NA is stored as a non-zero int, so it comes out as true. I guess conversion to bool should be avoided.
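
To make the first_log case concrete, here is a minimal sketch in plain C++ (no Rcpp). It assumes the logical NA is stored as the int INT_MIN, which is how R's headers define NA_LOGICAL. The names kNaLogical, Tri and to_tri are made up for illustration and are not part of Rcpp.

```cpp
#include <cassert>  // only needed for the checks mentioned below
#include <climits>

// Assumption: R stores a logical NA as the int INT_MIN (NA_LOGICAL).
const int kNaLogical = INT_MIN;

// What a plain conversion to bool does: any non-zero int becomes true.
bool to_bool(int v) { return v != 0; }

// Hypothetical three-valued alternative that keeps NA distinct.
enum class Tri { kFalse, kTrue, kMissing };

Tri to_tri(int v) {
  if (v == kNaLogical) return Tri::kMissing;
  return v != 0 ? Tri::kTrue : Tri::kFalse;
}
```

With these definitions, to_bool(kNaLogical) is true (the NA is lost), while to_tri(kNaLogical) is Tri::kMissing.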


We have the is_na template function that can help:

> evalCpp( 'traits::is_na<LGLSXP>( NA_LOGICAL )' )
[1] TRUE
> evalCpp( 'traits::is_na<REALSXP>( NA_REAL )' )
[1] TRUE

And from this I can see we don't have is_na<STRSXP>, will fix this.

> str( evalCpp( 'traits::get_na<REALSXP>()' ) )
 num NA
> str( evalCpp( 'traits::get_na<INTSXP>()' ) )
 int NA

I guess we could come up with a nicer syntax for these, maybe static functions in Vector<> so that we could do:


IntegerVector::is_na( )
NumericVector::get_na( )
...

More below:


On 15/11/12 23:36, Hadley Wickham wrote:

Hi all,

I'm working on a description of how missing values work in Rcpp
(expanding on FAQ 3.4).  I'd really appreciate any comments,
corrections or suggestions on the text below.

Thanks!

Hadley


# Missing values

If you're working with missing values, you need to know two things:

* what happens when you put missing values in scalars (e.g. `double`)
* how to get and set missing values in vectors (e.g. `NumericVector`)

## Scalars

The following code explores what happens when you coerce the first
element of a vector into the corresponding scalar:

     cppFunction('int first_int(IntegerVector x) {
       return(x[0]);
     }')
     cppFunction('double first_num(NumericVector x) {
       return(x[0]);
     }')
     cppFunction('std::string first_char(CharacterVector x) {
       return((std::string) x[0]);
     }')
     cppFunction('bool first_log(LogicalVector x) {
       return(x[0]);
     }')

     first_log(NA)
     first_int(NA_integer_)
     first_num(NA_real_)
     first_char(NA_character_)

So

* `NumericVector` -> `double`: NAN

* `IntegerVector` -> `int`: NAN (not sure how this works given that
integer types don't usually have a missing value)


> str( evalCpp( 'std::numeric_limits<int>::min()' ) )
 int NA

This is how NA_integer_ is represented.
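
One consequence worth spelling out: since the integer NA is just an ordinary int value, C++ arithmetic does not propagate it. A minimal sketch in plain C++ (no Rcpp), assuming the INT_MIN representation shown above; is_na_int, add_naive and add_na are hypothetical names, and overflow is not handled here.

```cpp
#include <cassert>  // only needed for the checks mentioned below
#include <limits>

// Assumption: NA_integer_ is the value std::numeric_limits<int>::min().
const int kNaInt = std::numeric_limits<int>::min();

bool is_na_int(int v) { return v == kNaInt; }

// Naive addition: NA is not special-cased, so NA + 1 is a regular int.
int add_naive(int a, int b) { return a + b; }

// NA-aware addition: propagate NA explicitly before doing the arithmetic.
int add_na(int a, int b) {
  if (is_na_int(a) || is_na_int(b)) return kNaInt;
  return a + b;
}
```

Here add_naive(kNaInt, 1) gives a large negative number that is no longer NA, whereas add_na(kNaInt, 1) stays NA.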

* `CharacterVector` -> `std::string`: the string "NA"


Ouch. We definitely need to fix this. Will do.

* `LogicalVector` -> `bool`: TRUE

If you're working with doubles, depending on your problem, you may be
able to get away with ignoring missing values and working with NaNs.
R's numeric missing value (`NA_real_`) is a special type of the IEEE 754
floating point number NaN (not a number). That means if you coerce it to
`double` in your C++ code, it will behave like a regular NaN.
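
As a rough illustration of "a special type of NaN", the sketch below builds a NaN with a specific payload in plain C++ and tells it apart from an ordinary NaN. It assumes that R's NA_real_ is the NaN whose low 32 bits hold the value 1954, which is what R's own sources use. The names make_na_real and is_r_na are hypothetical.

```cpp
#include <cassert>  // only needed for the checks mentioned below
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumption: NA_real_ has high word 0x7FF00000 and low word 1954 (0x7A2).
const std::uint64_t kNaRealBits = 0x7FF00000000007A2ULL;

// Build the NA bit pattern as a double without violating aliasing rules.
double make_na_real() {
  double d;
  std::memcpy(&d, &kNaRealBits, sizeof d);
  return d;
}

// True only for NaNs carrying the 1954 payload in the low 32 bits.
bool is_r_na(double x) {
  if (!std::isnan(x)) return false;
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return (bits & 0xFFFFFFFFULL) == 1954;
}
```

So std::isnan is true for both NA and NaN, while a payload check like is_r_na separates the two.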

In a logical context they always evaluate to FALSE:

     evalCpp("NAN == 1")
     evalCpp("NAN < 1")
     evalCpp("NAN > 1")
     evalCpp("NAN == NAN")

But be careful when combining them with boolean values:

     evalCpp("NAN && TRUE")
     evalCpp("NAN || FALSE")
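
The same behaviour can be checked in plain C++ without Rcpp. This is only a sketch with made-up helper names (eq, lt, gt, ne, as_bool). Note one exception to "always false": `!=` against NaN is true. Converting NaN to bool gives true because NaN is not zero, which is why `&&` and `||` can surprise you.

```cpp
#include <cassert>  // only needed for the checks mentioned below
#include <cmath>

// Ordered and equality comparisons involving NaN are all false ...
bool eq(double a, double b) { return a == b; }
bool lt(double a, double b) { return a < b; }
bool gt(double a, double b) { return a > b; }

// ... except !=, which is true whenever either side is NaN.
bool ne(double a, double b) { return a != b; }

// Conversion to bool only asks "is it non-zero?", so NaN becomes true.
bool as_bool(double x) { return static_cast<bool>(x); }
```

For a NaN value n, eq(n, n) is false, ne(n, n) is true, and as_bool(n) && true is true.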

In numeric contexts, they propagate similarly to NA in R:

     evalCpp("NAN + 1")
     evalCpp("NAN - 1")
     evalCpp("NAN / 1")
     evalCpp("NAN * 1")
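
Propagation matters most in reductions: a single NaN poisons the whole result. A minimal plain C++ sketch, with hypothetical helpers sum_all and sum_skip_nan (the latter is a rough analogue of `na.rm = TRUE` and treats every NaN, including NA, as missing):

```cpp
#include <cassert>  // only needed for the checks mentioned below
#include <cmath>
#include <vector>

// Plain sum: one NaN in the input makes the total NaN.
double sum_all(const std::vector<double>& x) {
  double total = 0.0;
  for (double v : x) total += v;
  return total;
}

// Skip NaN entries before adding, similar in spirit to na.rm = TRUE.
double sum_skip_nan(const std::vector<double>& x) {
  double total = 0.0;
  for (double v : x) {
    if (!std::isnan(v)) total += v;
  }
  return total;
}
```

For the input {1.0, NaN, 2.0}, sum_all returns NaN and sum_skip_nan returns 3.0.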


That's very useful to let people know of these issues.

## Vectors

To set a missing value in a vector, you need to use a missing value
specific to the type of vector. Unfortunately these are not named
terribly consistently:

     cppFunction('
       List missing_sampler() {

         NumericVector num(1);
         num[0] = NA_REAL;

         IntegerVector intv(1);
         intv[0] = NA_INTEGER;

         LogicalVector lgl(1);
         lgl[0] = NA_LOGICAL;

         CharacterVector chr(1);
         chr[0] = NA_STRING;

         List out(4);
         out[0] = num;
         out[1] = intv;
         out[2] = lgl;
         out[3] = chr;
         return(out);
       }
     ')
     str(missing_sampler())

To check if a value in a vector is missing, use `ISNA`:

     cppFunction('
       LogicalVector is_na2(NumericVector x) {
         LogicalVector out(x.size());

         NumericVector::iterator x_it;
         LogicalVector::iterator out_it;
         for (x_it = x.begin(), out_it = out.begin(); x_it != x.end();
              x_it++, out_it++) {
           *out_it = ISNA(*x_it);
         }
         return(out);
       }
     ')
     is_na2(c(NA, 5.4, 3.2, NA))

Rcpp provides a helper function called `is_na` that works similarly to
`is_na2` above, producing a logical vector that's true where the value
in the vector was missing.


As said above, I'll add

...Vector::is_na
...Vector::get_na

to have something more consistent and not as cryptic as traits::is_na<...>( ). People should not need to know what REALSXP, INTSXP, LGLSXP, ... mean.
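
To give an idea of the shape I have in mind, here is a rough sketch in plain C++, keyed on the element type rather than on the SEXP type code. This is not the Rcpp implementation; na_traits is a hypothetical name, and the double version simply treats any NaN as missing instead of checking for R's specific NA payload.

```cpp
#include <cassert>  // only needed for the checks mentioned below
#include <cmath>
#include <limits>

// Hypothetical traits class: one is_na / get_na pair per element type.
template <typename T>
struct na_traits;

template <>
struct na_traits<int> {
  // Assumption: integer NA is the smallest representable int.
  static int get_na() { return std::numeric_limits<int>::min(); }
  static bool is_na(int x) { return x == get_na(); }
};

template <>
struct na_traits<double> {
  // Assumption: any NaN counts as missing in this simplified version.
  static double get_na() { return std::numeric_limits<double>::quiet_NaN(); }
  static bool is_na(double x) { return std::isnan(x); }
};
```

Calling code would then write na_traits<int>::is_na(x) or na_traits<double>::get_na() without ever mentioning INTSXP or REALSXP.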




--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible
