Re: [R] Use apply only on non-missing values

Joris Meys Wed, 02 Jun 2010 16:36:19 -0700

Not really a direct answer on your question, but:
> system.time(replicate(10000,apply(as.matrix(theta), 1, rasch, b_vector)))
   user  system elapsed
   4.51    0.03    4.55


> system.time(replicate(10000,theta%*%t(b_vector)))
   user  system elapsed
   0.25    0.00    0.25

It does make a difference on large datasets...
Cheers
Joris

On Wed, Jun 2, 2010 at 4:44 PM, Doran, Harold <hdo...@air.org> wrote:

> I have a function that I am currently using very inefficiently. The
> following are needed to illustrate the problem:
>
> set.seed(12345)
> dat <- matrix(sample(c(0,1), 110, replace = TRUE), nrow = 11, ncol=10)
> mis <- sample(1:110, 5)
> dat[mis] <- NA
> theta <- rnorm(11)
> b_vector <- runif(10, -4,4)
> empty <- which(is.na(t(dat)))
>
> So, I have a matrix (dat) with some values within the matrix missing. In my
> real world problem, the matrix is huge, and most values are missing. The
> function in question is called derivs() and is below. But, let me step
> through the inefficient portions.
>
> First, I create a matrix of some predicted probabilities as:
>
> rasch <- function(theta,b) 1/ (1 + exp(b-theta))
> mat <- apply(as.matrix(theta), 1, rasch, b_vector)
>
> However, I only need those predicted probabilities in places where the data
> are not missing. So, the next step in the function is
>
> mat[empty] <- NA
>
> which manually places NAs in places where the data are missing (notice the
> matrix 'mat' is the transpose of the data matrix and so I get the empty
> positions from the transpose of dat).
>
> Afterwards, the function computes the gradient and hessians needed to
> complete the MLE estimation.
>
> All of this works in the sense that it yields the correct answers for my
> problem. But, the glaring problem is that I create predicted probabilities
> for every cell in 'mat' when in many cases they are not needed. I end up
> replacing those values with NAs. In my real world problem, this is horribly
> inefficient and slow.
>
> My question is then is there a way to use apply such that is computes the
> necessary predicted probabilities only when the data are not missing to
> yield the matrix 'mat'. My desired end result is the matrix 'mat' created
> after the manually placing the NAs in the appropriate cells.
>
> Thanks
> Harold
>
>
> derivs <- function(dat, b_vector, theta){
>                                mat <- apply(as.matrix(theta), 1, rasch,
> b_vector)
>                                mat[empty] <- NA
>                                gradient <- -(colSums(dat, na.rm = TRUE) -
> rowSums(mat, na.rm = TRUE))
>                                hessian <-  -(rowSums(mat * (1-mat), na.rm =
> TRUE))
>                                list('gradient' = gradient, 'hessian' =
> hessian)
>                }
>
>
>
> > sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United
> States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.1
> >
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Use apply only on non-missing values

Reply via email to