On Jun 15, 2013, at 2:38 PM, Fabrice Tourre wrote:

> Dear expert,
> 
> How can I make follow codematrix function more faster?
> 
> top20.dat <- top20.dat[,7:length(top20.dat[1,])]
> top40.dat <-
> read.table("top40snps.ped",header=F,sep="\t",stringsAsFactors=F)

Did you attach a file with a non-'.txt' extension?

> row.names(top40.dat) <- top40.dat[,1]
> top40.dat <- top40.dat[,7:length(top40.dat[1,])]
> codematrix <- function(dat)
> {
>    new.dat <- dat
>    for(col in 1:length(dat[1,]))
>    {

I'm guessing that using `ifelse` would be much faster than going column by 
column and and then row by painful row through this testing with nested 
`if(cond){conseq}else{alter}` . You should gain efficiency by setting up the 
results of the `strsplit` operation on a full column at a time. Build a 
function that would work one column at a time and then lapply it to the 
dataframe.

>        tbl <- table(dat[,col])
>        max.allel <- names(which(tbl==max(table(dat[,col]))))
>        for(row in 1:length(dat[,1]))
>            {
>                if(dat[row,col]=="0 0")
>                {
>                    new.dat[row,col]=NA
>                }else{
>                    if(dat[row,col]==max.allel) {
>                        new.dat[row,col]=0
>                    }else{
>                        allele <- unlist(strsplit(
> as.character(dat[row,col])," "))
>                        if(allele[1]==allele[2]){
>                            new.dat[row,col]=2
>                        }else{
>                            new.dat[row,col]=1

You could leave the "==max.allelle" test on the outer of nested ifelse 
operations to "overwrite" the resutls of the testing of the two split-bits. But 
I would make it a %in%-test so that it won't fail when mor than one maximum 
occur.

Perhaps (untested and a lot of guesswork):

testsplitfunc <- function(col){
         temptbl <- table(col)
         tempspl <-  strsplit(as.character(col) , split=" ")
         allele <-cbind( sapply(temp, "[", 1),
                         sapply(temp, "[", 2) )
         res <- ifelse ( col %in% names(temptbl)[ which(tbl==max(temptbl))] , 
                     0,
                     ifelse( allele[,1]==allele[,2], 2, 1) )
         is.na(res) <- col=="0 0"
         }

code.top20 <- do.call(cbind, lapply(top20.dat, testsplitfunc) )
                     




>                        }
>                    }
>                }
>            }
>        #})
>        cat(paste(col," ",sep=""))
>    }
>    return(new.dat)
> }
> code.top20 <- codematrix(top20.dat)

In the absence of a problem description I will leave the details unaddressed.
> 
>       [[alternative HTML version deleted]]


David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to