[Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
Hi,

I have written a small piece of C++ code using Armadillo, compiled with
the inline and RcppArmadillo packages. The input is data.matrix(X), and
some cells may be NA. Example in R:

X = matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)

I am calculating a conditional correlation on the columns of that matrix,
picking pairs of column vectors, so cor(X,Y). The problem is that
sometimes one or both vectors have an empty cell. In that case I would
like to skip that row and proceed with calculating Pearson's correlation
on the remaining data. I know the degrees of freedom will differ, but I
have over 100 rows, so skipping a few shouldn't matter much.

My question boils down to this: how do I find which colvec cells are nan,
and remove those indices from both the X and the Y colvec, before
calculating the correlation?

I would be very grateful for help.

Kind regards,
Mateusz Kaduk
___
Rcpp-devel mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
Dear Dirk,

I don't see how to do that in Armadillo, but I think I can create a
matrix of the same size, NumericMatrix B = is.na(X);
Maybe I could use this for indexing, to save time and avoid introducing
extra looping?

Also, I want to regress column X on Z, and column Y on Z. Does
arma::solve(...) handle nan values, or do I have to check for them
myself?

The function works nicely: while the R implementation takes at least
40 minutes (surely more, because I terminated it), Armadillo computes
everything in less than one minute. But I skipped columns with missing
cells, which I now want to include.

Thanks,
Mateusz

On 9 October 2012 17:37, Dirk Eddelbuettel wrote:
> On 9 October 2012 at 10:08, Douglas Bates wrote:
> | You may find it easier to use the Rcpp class NumericMatrix than to use
> | RcppArmadillo. Detection of NA's is built in to R and Rcpp classes but
> | not RcppArmadillo. For each pair of columns, run a loop that checks for
> | NA's at each position in each column, skips the position if NA's are
> | detected and otherwise increments the squared sums, cross-product and
> | number of elements.
>
> All true, but at the same time, once in RcppArmadillo or RcppEigen ... it
> is convenient to stay there.
>
> You should be able to set up a little "sweeper" function which applies
> one of
>
>    R_IsNA        just NA
>    R_IsNaN       just NaN
>    R_IsFinite    NA, NaN or Inf
>
> across a vector or matrix and returns you an index vector. Armadillo can
> use indexing vectors in ways that are similar in R.
>
> Dirk
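The loop Douglas Bates describes (skip a position when either value is missing, otherwise accumulate the sums, squared sums and cross-product) could look roughly like the sketch below. It is written in plain standard C++ on std::vector rather than on NumericMatrix or arma::colvec, and the function name pairwise_complete_cor is made up for illustration. It relies on the fact that R's NA for doubles is a NaN, so std::isnan catches both.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation over the rows where both x[i] and y[i] are present.
// std::isnan is true for R's NA_real_ as well as for an ordinary NaN.
// Degenerate input (for example fewer than two complete pairs) is not
// handled specially and ends in a 0/0 division.
double pairwise_complete_cor(const std::vector<double>& x,
                             const std::vector<double>& y) {
    assert(x.size() == y.size());
    double n = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i])) continue;  // skip this row
        n += 1.0;
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    const double num = n * sxy - sx * sy;
    const double den = std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    return num / den;
}
```

The single-pass sums are the simplest form of the idea. For data with a large mean relative to its spread, a two-pass version that first computes the means on the complete pairs is numerically safer.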
Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
Can you provide an example of how to convert an Armadillo colvec to a
uvec, which I assume works as a selector for rows?

Thanks
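A minimal sketch of the "sweeper" idea follows, again in plain standard C++ so that it does not depend on any particular Armadillo version. The helper names finite_in_both and take are hypothetical. The point it illustrates is that an index vector holds the positions to keep, not a 0/1 flag per row, so it is built by looping once over the data and recording each usable position.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: positions at which both columns hold a finite value
// (std::isfinite is false for NA, NaN and +/-Inf).
std::vector<std::size_t> finite_in_both(const std::vector<double>& x,
                                        const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isfinite(x[i]) && std::isfinite(y[i])) idx.push_back(i);
    }
    return idx;
}

// Hypothetical helper: gather the values of v at the given positions.
std::vector<double> take(const std::vector<double>& v,
                         const std::vector<std::size_t>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < v.size());
        out.push_back(v[i]);
    }
    return out;
}
```

Applying take to both columns with the same index vector gives two shorter vectors of equal length, which can then be passed to whatever correlation routine is in use.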
Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
I am not running R's cor(..) function. I am calling arma::cor(...) on two
vectors. I cannot use R's function, because I have some nested loops and
extra steps before calculating Pearson's correlation.
The only things I am providing from R are the matrix of values and a
matrix of zeros and ones that marks where the missing values are.
Now in RcppArmadillo I need to select the rows that have values.
The problem is that, given arma::mat x and arma::colvec rowind, I cannot
select values in this way:
colvec col = x(rowind,i);
So forget about correlations, there are more steps. The problem is how to
select, with Armadillo, the values in given rows of a specific column.
Let's say colvec col = { 1,1,1,1,0,1}
and I want to select only the values where there is a 1, from column "i"
of arma::mat x.
I hope it is now clear that this is not about correlations, but about how
to select values (without the missing ones).
On 9 October 2012 19:43, Douglas Bates wrote:
> By the way, how are you calculating this correlation in R? Are you
> using the cor function?
>
> I'm confused because the cor function in R does a bit of bookkeeping
> then calls a C function using .Internal. It seems unlikely to me that
> one could make a C++/Rcpp function run much faster.
>
>
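The selection asked for above (values of column i in the rows where a 0/1 vector holds 1) can be sketched without any library support. ColMajorMatrix below is a hypothetical stand-in for a dense matrix, not an Armadillo type. It uses column-major storage, which is the layout both R and Armadillo use, and select_from_column is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix with column-major storage.
struct ColMajorMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;  // element (r, c) lives at data[r + c * nrow]
};

// Values of column `col` in the rows where keep[r] is non-zero.
std::vector<double> select_from_column(const ColMajorMatrix& x,
                                       std::size_t col,
                                       const std::vector<int>& keep) {
    assert(col < x.ncol);
    assert(keep.size() == x.nrow);
    assert(x.data.size() == x.nrow * x.ncol);
    std::vector<double> out;
    for (std::size_t r = 0; r < x.nrow; ++r) {
        if (keep[r] != 0) out.push_back(x.data[r + col * x.nrow]);
    }
    return out;
}
```

With the mask { 1,1,1,1,0,1} from the message, the result has five elements and the fifth row of the column is dropped.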
Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
Maybe I am not good at coding, but even though I managed to make a uvec
of ones and zeros, I cannot use it to select rows in a given column. The
compiler complains:

error: no matching function for call to
‘arma::Mat::submat(arma::uvec&, int&)’

On 9 October 2012 19:55, Dirk Eddelbuettel wrote:
> On 9 October 2012 at 19:07, [email protected] wrote:
> | Can you provide an example how to convert Armadillo colvec to uvec
> | vector which I assume works as selector for rows?
>
> You may need to loop (or use STL iterators) to fill the uvec position by
> position. Then use the subset, and proceed with your correlation
> calculation. I don't think there is a shortcut.
>
> Dirk
Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
The problem is that I want to calculate regressions etc. with these
shorter vectors. So what I do first is calculate a vector that is zero
wherever a missing value occurs, simply by elementwise multiplication:

colvec indX = a(span::all,i);
colvec indY = a(span::all,j);
colvec indZ = a(span::all,k);
colvec ind = indX % indY % indZ;
uvec rowind = conv_to<uvec>::from(ind);

Here a is the arma::mat object with ones where there is a value and zeros
where there are missing values (nan). ind is the colvec holding the
elementwise product, and for every further operation and regression I
want to select with rowind. But the compiler complains about

colvec colX = x.submat(rowind,i);
colvec colY = x.submat(rowind,j);

I think I can just use cor(..) if I remove the common rows from all the
vectors I use for columns i and j of the matrix. Let me think of an
example.

On 9 October 2012 20:18, Douglas Bates wrote:
> On Tue, Oct 9, 2012 at 1:06 PM, [email protected] wrote:
> > Maybe I am not good in coding, but even though I managed to make uvec
> > with ones and zeros, I cannot use it to select rows in given column,
> > compiler complains
> > error: no matching function for call to
> > ‘arma::Mat::submat(arma::uvec&, int&)’
>
> Basically your problem comes down to looking at each pair of columns
> and determining the joint missingness pattern. You can try to force
> the calculation into arma but I think you are better off just writing
> the code in loops. To be careful you should calculate the means of
> each column on the pairwise non-missing values as in the enclosed
> function.
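One reading of the workflow in this message is a partial correlation: keep the rows where X, Y and Z are all present, regress X on Z and Y on Z, and correlate the residuals. The sketch below follows that reading in plain standard C++, with means taken over the jointly non-missing rows as Douglas Bates advises. It is an assumption about the intended computation, not the poster's code, and the names complete_rows, residuals_on and partial_cor are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rows where x, y and z are all non-NaN (the joint missingness pattern).
std::vector<std::size_t> complete_rows(const std::vector<double>& x,
                                       const std::vector<double>& y,
                                       const std::vector<double>& z) {
    assert(x.size() == y.size() && y.size() == z.size());
    std::vector<std::size_t> rows;
    for (std::size_t r = 0; r < x.size(); ++r) {
        if (!std::isnan(x[r]) && !std::isnan(y[r]) && !std::isnan(z[r])) {
            rows.push_back(r);
        }
    }
    return rows;
}

// Residuals of the least-squares fit v = a + b * z, restricted to `rows`.
// Means are computed on those rows only.
std::vector<double> residuals_on(const std::vector<double>& v,
                                 const std::vector<double>& z,
                                 const std::vector<std::size_t>& rows) {
    const double n = static_cast<double>(rows.size());
    double mv = 0.0, mz = 0.0;
    for (std::size_t r : rows) { mv += v[r]; mz += z[r]; }
    mv /= n;
    mz /= n;
    double szz = 0.0, svz = 0.0;
    for (std::size_t r : rows) {
        szz += (z[r] - mz) * (z[r] - mz);
        svz += (v[r] - mv) * (z[r] - mz);
    }
    const double slope = svz / szz;
    std::vector<double> res;
    res.reserve(rows.size());
    for (std::size_t r : rows) res.push_back((v[r] - mv) - slope * (z[r] - mz));
    return res;
}

// Correlation of the X and Y residuals after removing the linear effect of Z.
// Degenerate input (too few complete rows, constant Z) is not handled.
double partial_cor(const std::vector<double>& x,
                   const std::vector<double>& y,
                   const std::vector<double>& z) {
    const std::vector<std::size_t> rows = complete_rows(x, y, z);
    const std::vector<double> ex = residuals_on(x, z, rows);
    const std::vector<double> ey = residuals_on(y, z, rows);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < ex.size(); ++i) {
        sxy += ex[i] * ey[i];
        sxx += ex[i] * ex[i];
        syy += ey[i] * ey[i];
    }
    return sxy / std::sqrt(sxx * syy);  // residuals already have zero mean
}
```

Because each fit includes an intercept, the residuals have zero mean on the complete rows, which is why the last step needs no further centring.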
Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)
I already solved my problem, but I don't want to continue since it's off
topic.

Mateusz

On 12 October 2012 01:17, Davor Cubranic wrote:
> According to Armadillo docs, submat's arguments are two uvec's, not a
> uvec and an int.
>
> Davor
[Rcpp-devel] A proper R arrays with any dimensions in Rcpp ?
Hi,

I would like to implement an algorithm that works on arrays with 4 or
more dimensions. Is it possible to have full support for R arrays in
Rcpp?

Thanks
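As background to the question: an R array of any rank is stored as one flat vector in column-major order, with the extents kept in its dim attribute. One workable approach, whatever dedicated array support Rcpp offers, is therefore to treat the data as a flat vector and compute offsets by hand. The sketch below shows only that index arithmetic in plain standard C++, and the name column_major_offset is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Zero-based offset of element (idx[0], idx[1], ...) in a column-major
// array with extents dim[0], dim[1], ... The first index varies fastest,
// as in R.
std::size_t column_major_offset(const std::vector<std::size_t>& dim,
                                const std::vector<std::size_t>& idx) {
    assert(dim.size() == idx.size());
    std::size_t offset = 0;
    std::size_t stride = 1;
    for (std::size_t k = 0; k < dim.size(); ++k) {
        assert(idx[k] < dim[k]);
        offset += idx[k] * stride;
        stride *= dim[k];
    }
    return offset;
}
```

For a 2 x 3 x 4 x 5 array, the last element has zero-based indices (1, 2, 3, 4) and offset 1 + 2*2 + 3*6 + 4*24 = 119, which is the final position of the 120-element vector.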
