HI, Dear R community, I have one data set like this, What I want to do is to calculate the cumulative coverage. The following codes works for small data set (#rows = 100), but when feed the whole data set, it still running after 24 hours. Can someone give some suggestions for long vector?
id reads Contig79:1 4 Contig79:2 8 Contig79:3 13 Contig79:4 14 Contig79:5 17 Contig79:6 20 Contig79:7 25 Contig79:8 27 Contig79:9 32 Contig79:10 33 Contig79:11 34 matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth", sep="\t", skip=0, header=F,fill=T) # dim(matt) [1] 3384766 2 matt_plot<-function(matt, outputfile) { names(matt)<-c("id","reads") cover<-matt$reads #calculate the cumulative coverage. + cover_per<-function (data) { + output<-numeric(0) + for (i in data) { + x<-(100*sum(ifelse(data >= i, 1, 0))/length(data)) + output<-c(output, x) + } + return(output) + } result<-cover_per(cover) Thanks so much! -- Sincerely, Changbin -- [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.