On Jun 15, 2013, at 2:38 PM, Fabrice Tourre wrote: > Dear expert, > > How can I make follow codematrix function more faster? > > top20.dat <- top20.dat[,7:length(top20.dat[1,])] > top40.dat <- > read.table("top40snps.ped",header=F,sep="\t",stringsAsFactors=F)
Did you attach a file with a non-'.txt' extension? > row.names(top40.dat) <- top40.dat[,1] > top40.dat <- top40.dat[,7:length(top40.dat[1,])] > codematrix <- function(dat) > { > new.dat <- dat > for(col in 1:length(dat[1,])) > { I'm guessing that using `ifelse` would be much faster than going column by column and and then row by painful row through this testing with nested `if(cond){conseq}else{alter}` . You should gain efficiency by setting up the results of the `strsplit` operation on a full column at a time. Build a function that would work one column at a time and then lapply it to the dataframe. > tbl <- table(dat[,col]) > max.allel <- names(which(tbl==max(table(dat[,col])))) > for(row in 1:length(dat[,1])) > { > if(dat[row,col]=="0 0") > { > new.dat[row,col]=NA > }else{ > if(dat[row,col]==max.allel) { > new.dat[row,col]=0 > }else{ > allele <- unlist(strsplit( > as.character(dat[row,col])," ")) > if(allele[1]==allele[2]){ > new.dat[row,col]=2 > }else{ > new.dat[row,col]=1 You could leave the "==max.allelle" test on the outer of nested ifelse operations to "overwrite" the resutls of the testing of the two split-bits. But I would make it a %in%-test so that it won't fail when mor than one maximum occur. Perhaps (untested and a lot of guesswork): testsplitfunc <- function(col){ temptbl <- table(col) tempspl <- strsplit(as.character(col) , split=" ") allele <-cbind( sapply(temp, "[", 1), sapply(temp, "[", 2) ) res <- ifelse ( col %in% names(temptbl)[ which(tbl==max(temptbl))] , 0, ifelse( allele[,1]==allele[,2], 2, 1) ) is.na(res) <- col=="0 0" } code.top20 <- do.call(cbind, lapply(top20.dat, testsplitfunc) ) > } > } > } > } > #}) > cat(paste(col," ",sep="")) > } > return(new.dat) > } > code.top20 <- codematrix(top20.dat) In the absence of a problem description I will leave the details unaddressed. > > [[alternative HTML version deleted]] David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.