[Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

2012-10-09 Thread mateusz.ka...@gmail.com
Hi,

I have written a small piece of C++ code using Armadillo, via inline with
the RcppArmadillo package.
The input is data.matrix(X). Some cells might be NA. Example in R: X =
matrix(sample(c(rnorm(10*9.9),NA)),ncol=10)

I am calculating conditional correlation on columns of that matrix, just
picking vectors, so cor(X,Y).
The problem is that sometimes I might have an empty cell in one or both
vectors; in that case I would like to skip that row and proceed with
calculating Pearson's correlation on the remaining data. I know that there
will be a difference in degrees of freedom, but I have over 100 rows, so
skipping a few shouldn't matter that much.

Basically my question boils down to solving the problem:
How to find which colvec cells are nan, and remove this index from both X
and Y colvec, before calculating correlation.

I would be very grateful for help,

Kind regards,
Mateusz Kaduk
___
Rcpp-devel mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

2012-10-09 Thread mateusz.ka...@gmail.com
Dear Dirk,

I don't see how to do that in Armadillo, but I think I can create a same-size
NumericMatrix B = is.na(X);
Maybe I could use this for indexing, to save time and avoid introducing
extra looping?

Also, I want to regress column X on Z, and column Y on Z.
Does arma::solve(...) handle nan values, or do I have to check that
myself?

The function works nicely, and while the R implementation takes at least
40 min (surely more, because I terminated it), Armadillo computes everything
in less than one minute. But I skipped columns with missing cells, which I
now want to include.

Thanks,
Mateusz

On 9 October 2012 17:37, Dirk Eddelbuettel  wrote:

>
> On 9 October 2012 at 10:08, Douglas Bates wrote:
> | You may find it easier to use the Rcpp class NumericMatrix than to use
> | RcppArmadillo.  Detection of NA's is built in to R and Rcpp classes but not
> | RcppArmadillo. For each pair of columns, run a loop that checks for NA's at
> | each position in each column, skips the position if NA's are detected and
> | otherwise increments the squared sums, cross-product and number of elements.
>
> All true, but at the same time, once in RcppArmadillo or RcppEigen ... it is
> convenient to stay there.
>
> You should be able to set up a little "sweeper" function which applies one of
>
> R_IsNA      true for NA only
> R_IsNaN     true for NaN only
> R_finite    false for NA, NaN or Inf
>
> across a vector or matrix and returns you an index vector. Armadillo can use
> indexing vectors in ways that are similar to R.
>
> Dirk
>
> --
> Dirk Eddelbuettel | [email protected] | http://dirk.eddelbuettel.com
>
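A minimal sketch of the "sweeper" idea described above, written in plain standard C++ so that it stands alone. The helper name complete_rows is hypothetical, not part of Rcpp or Armadillo. It uses std::isfinite as the test, which rejects NaN and Inf; R's NA_real_ is a NaN at the bit level, so it is rejected too. Inside Rcpp code one could swap in R_IsNA or R_IsNaN to tell the two apart. Armadillo itself offers arma::find and, in more recent versions, arma::find_finite, which should give a comparable index vector directly; treat that as something to check against the Armadillo version in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Rcpp or Armadillo function): returns the
// row positions at which every one of the given columns holds a finite
// value. std::isfinite is false for NaN and +/-Inf, and therefore also
// for R's NA_real_, which is stored as a NaN.
std::vector<std::size_t> complete_rows(
    const std::vector<std::vector<double>>& cols) {
    std::vector<std::size_t> keep;
    if (cols.empty()) return keep;

    const std::size_t n = cols.front().size();
    for (const auto& c : cols) assert(c.size() == n);  // equal lengths

    for (std::size_t r = 0; r < n; ++r) {
        bool ok = true;
        for (const auto& c : cols) {
            if (!std::isfinite(c[r])) { ok = false; break; }
        }
        if (ok) keep.push_back(r);  // row r is usable in all columns
    }
    return keep;
}
```

The returned positions play the role of the index vector: the same vector can be used to subset X, Y and Z, so all three stay aligned row by row.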

Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

2012-10-09 Thread mateusz.ka...@gmail.com
Can you provide an example of how to convert an Armadillo colvec to a uvec,
which I assume works as a selector for rows?

Thanks


Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

2012-10-09 Thread mateusz.ka...@gmail.com
I am not running R's cor(..) function. I am calling arma::cor(...) on two
vectors. I cannot use R's function, because I have some nested loops and
extra steps before calculating Pearson's correlation.

The only things I provide from R are the matrix of values and a matrix
of zeros and ones indicating where the missing values are.
Now in RcppArmadillo I need to select the rows that have values.

The problem is that, given arma::mat x and arma::colvec rowind, I cannot
select values this way:
colvec col = x(rowind,i);

So forget about correlations; there are more steps. The problem is how to
select, with Armadillo, the values in given rows of a specific column.

Let's say colvec col = { 1,1,1,1,0,1}
and I want to select only the values where there is a 1, from column "i" of
arma::mat x.

I hope it is now clear that this is not about correlations, but about how
to select values (without the missing ones).
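
The selection described here can be sketched in two steps: turn the 0/1 mask into a list of row positions, then gather those rows from the chosen column. The version below is plain standard C++ with a flat column-major vector standing in for arma::mat; the helper names mask_to_index and select_rows are hypothetical. The point it illustrates is that an index vector holds positions (0,1,2,3,5), not the mask itself (1,1,1,1,0,1). In Armadillo, arma::find(mask) should produce such positions as a uvec, and the non-contiguous form of submat expects a uvec for the rows and another uvec for the columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: turn a 0/1 mask into the positions of its non-zero
// entries, e.g. {1,1,1,1,0,1} -> {0,1,2,3,5}.
std::vector<std::size_t> mask_to_index(const std::vector<double>& mask) {
    std::vector<std::size_t> idx;
    for (std::size_t r = 0; r < mask.size(); ++r) {
        if (mask[r] != 0.0) idx.push_back(r);
    }
    return idx;
}

// Hypothetical helper: pick the rows listed in idx from column col of a
// column-major matrix stored in a flat vector with nrow rows.
std::vector<double> select_rows(const std::vector<double>& x,
                                std::size_t nrow, std::size_t col,
                                const std::vector<std::size_t>& idx) {
    assert(nrow > 0 && (col + 1) * nrow <= x.size());
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t r : idx) {
        assert(r < nrow);                  // position must be a valid row
        out.push_back(x[col * nrow + r]);  // column-major element (r, col)
    }
    return out;
}
```

Computing the index once from the combined mask and reusing it for every column keeps the shortened vectors the same length, which is what the later regression and correlation steps need.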

On 9 October 2012 19:43, Douglas Bates  wrote:

> By the way, how are you calculating this correlation in R?  Are you
> using the cor function?
>
> I'm confused because the cor function in R does a bit of bookkeeping
> then calls a C function using .Internal.  It seems unlikely to me that
> one could make a C++/Rcpp function run much faster.
>
>

Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

2012-10-09 Thread mateusz.ka...@gmail.com
Maybe I am not good at coding, but even though I managed to make a uvec of
ones and zeros, I cannot use it to select rows in a given column; the compiler
complains:
error: no matching function for call to
‘arma::Mat<double>::submat(arma::uvec&, int&)’

On 9 October 2012 19:55, Dirk Eddelbuettel  wrote:

>
> On 9 October 2012 at 19:07, [email protected] wrote:
> | Can you provide an example how to convert Armadillo colvec to uvec vector
> | which I assume works as selector for rows?
>
> You may need to loop (or use STL iterators) to fill the uvec position by
> position.  Then use the subset, and proceed with your correlation
> calculation.  I don't think there is a shortcut.
>
> Dirk
>
> --
> Dirk Eddelbuettel | [email protected] | http://dirk.eddelbuettel.com
>

Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

2012-10-09 Thread mateusz.ka...@gmail.com
The problem is that I want to calculate regressions etc. with these
shorter vectors.
So yes, what I do first is calculate a vector with a zero wherever
a missing value occurs, simply by element-wise multiplication.

  colvec indX = a(span::all,i);
  colvec indY = a(span::all,j);
  colvec indZ = a(span::all,k);
  colvec ind = indX % indY % indZ;
  uvec rowind = conv_to<uvec>::from(ind);

The matrix a is the arma::mat object with ones where there is a
value and zeros where there are missing values (nan).
ind is the colvec with the element-wise product, and for any further
operation or regression I want to select with rowind.

But the compiler complains about
colvec colX = x.submat(rowind,i);
colvec colY = x.submat(rowind,j);

I think I can just use cor(..) if I remove the common rows from all the
vectors I use for columns i and j of the matrix.
Let me think of an example.

On 9 October 2012 20:18, Douglas Bates  wrote:
>
> Basically your problem comes down to looking at each pair of columns
> and determining the joint missingness pattern.  You can try to force
> the calculation into arma but I think you are better off just writing
> the code in loops.  To be careful you should calculate the means of
> each column on the pairwise non-missing values as in the enclosed
> function.
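
The enclosed function mentioned above did not survive in the archive. What follows is not that function, only a minimal sketch of the loop it describes, in plain standard C++: the means are computed over the positions where both vectors are non-missing, and the centred sums of squares and cross-products are accumulated over the same positions. The name pairwise_cor is hypothetical, and std::isnan is assumed as the missingness test (it is also true for R's NA_real_).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pearson correlation of x and y over the positions where both are
// non-missing. Returns NaN when fewer than two complete pairs exist or
// when either variable is constant over the complete pairs.
double pairwise_cor(const std::vector<double>& x,
                    const std::vector<double>& y) {
    assert(x.size() == y.size());
    const double nan_v = std::numeric_limits<double>::quiet_NaN();

    // Pass 1: means over the jointly non-missing positions.
    std::size_t n = 0;
    double sx = 0.0, sy = 0.0;
    for (std::size_t r = 0; r < x.size(); ++r) {
        if (std::isnan(x[r]) || std::isnan(y[r])) continue;
        sx += x[r];
        sy += y[r];
        ++n;
    }
    if (n < 2) return nan_v;
    const double mx = sx / n;
    const double my = sy / n;

    // Pass 2: centred sums of squares and cross-products.
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t r = 0; r < x.size(); ++r) {
        if (std::isnan(x[r]) || std::isnan(y[r])) continue;
        const double dx = x[r] - mx;
        const double dy = y[r] - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return nan_v;
    return sxy / std::sqrt(sxx * syy);
}
```

For two vectors this should agree with what R reports for cor(x, y, use = "complete.obs"), since only rows complete in both vectors contribute.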

Re: [Rcpp-devel] Correlation in ArmadilloRcpp with missing values (nan)

2012-10-11 Thread mateusz.ka...@gmail.com
I already solved my problem, but I don't want to continue since it's off topic.

Mateusz

On 12 October 2012 01:17, Davor Cubranic  wrote:
> According to Armadillo docs, submat's arguments are two uvec's, not a uvec
> and an int.
>
> Davor


[Rcpp-devel] A proper R arrays with any dimensions in Rcpp ?

2012-02-20 Thread mateusz.ka...@gmail.com
Hi,

I would like to implement an algorithm which works on arrays with 4 or more
dimensions. Is it possible to have full support for R arrays in Rcpp?

Thanks