[R] Is there an efficient way to find the overlapped, upstream and downstream ranges for a bunch of ranges?
I have a bunch of genes (nearly ~50,000) from the whole genome, which I read into a GRanges object. A range (gene) can be seen as an observation with three columns (chromosome, start and end), like this:

      seqnames start end width strand
gene1     chr1     1   5     5      +
gene2     chr1    10  15     6      +
gene3     chr1    12  17     6      +
gene4     chr1    20  25     6      +
gene5     chr1    30  40    11      +

I am wondering whether there is an efficient way to find *the overlapped, upstream and downstream genes for each gene in the GRanges*. For example, assuming all_genes_gr is a GRanges of ~50,000 genes, the result I want looks like this:

gene_name upstream_gene downstream_gene overlapped_gene
gene1     NA            gene2           NA
gene2     gene1         gene4           gene3
gene3     gene1         gene4           gene2
gene4     gene3         gene5           NA

Currently, the strategy I use is this:

library(GenomicRanges)
find_overlapped_gene <- function(idx, all_genes_gr) {
    #cat(idx, "\n")
    curr_gene <- all_genes_gr[idx]
    other_genes <- all_genes_gr[-idx]
    n <- countOverlaps(curr_gene, other_genes)
    # the genes in other_genes that overlap curr_gene
    gene <- subsetByOverlaps(other_genes, curr_gene)
    return(list(n, gene))
}
system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))

However, for 100 genes this takes nearly ~8 s according to system.time(). That means that with 50,000 genes it would take nearly one hour just to find the overlapped genes. Is there any algorithm or strategy to do this efficiently, perhaps 50,000 genes in ~10 min or even less?

Yao He

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
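A possible vectorized alternative follows. It is a sketch and does not come from the thread. Instead of looping over genes, it calls findOverlaps(), follow() and precede() from GenomicRanges once on the whole object, using the toy genes from the table above. The helper name gene_table() is hypothetical.

The sketch assumes all genes are on the "+" strand. In that case follow() returns the nearest non-overlapping gene before each gene (upstream) and precede() the nearest one after it (downstream). As I understand the GenomicRanges documentation, both functions are strand-aware: for "-" strand genes, downstream means toward lower coordinates. Also, findOverlaps() does not match "+" with "-" unless ignore.strand = TRUE. Collapsing several overlapping partners into one comma-separated string is an arbitrary choice.

```r
library(GenomicRanges)

# Toy genes from the post, assumed to be on the "+" strand
all_genes_gr <- GRanges(seqnames = "chr1",
                        ranges = IRanges(start = c(1, 10, 12, 20, 30),
                                         end   = c(5, 15, 17, 25, 40)),
                        strand = "+")
names(all_genes_gr) <- paste0("gene", 1:5)

gene_table <- function(gr) {
  nm <- names(gr)
  # all overlapping pairs in one call, then drop each gene's hit on itself
  hits <- findOverlaps(gr, gr)
  hits <- hits[queryHits(hits) != subjectHits(hits)]
  # collapse the partners of each gene; genes without partners get NA
  ov <- tapply(nm[subjectHits(hits)],
               factor(queryHits(hits), levels = seq_along(gr)),
               paste, collapse = ",")
  data.frame(gene_name       = nm,
             upstream_gene   = nm[follow(gr, gr)],   # nearest non-overlapping gene before
             downstream_gene = nm[precede(gr, gr)],  # nearest non-overlapping gene after
             overlapped_gene = as.character(ov),
             stringsAsFactors = FALSE)
}

res <- gene_table(all_genes_gr)
res
```

Each of the three calls works on the whole GRanges at once, so this should scale far better than 50,000 separate subsetting calls. I have not timed it on data of that size.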
[R] Transform a list of multiple matrices into the data.frame I want
Dear all: I have a list like this, which is standard output of the str_locate_all() function (stringr package):

$K
     start end
$GSEGTCSCSSK
     start end
[1,]     6   6
[2,]     8   8
$GFSTTCPAHVDDLTPEQVLDGDVNELMDVVLHHVPEAK
     start end
[1,]     6   6
$LVECIGQELIFLLPNK
     start end
[1,]     4   4
$NFK
     start end
$HR
     start end
$AYASLFR
     start end

I want to transform this list into this:

ID                                      start.1 start.2
K                                            NA      NA
GSEGTCSCSSK                                   6       8
GFSTTCPAHVDDLTPEQVLDGDVNELMDVVLHHVPEAK        6      NA
LVECIGQELIFLLPNK                              4      NA
NFK                                          NA      NA
HR                                           NA      NA
AYASLFR                                      NA      NA

I have already tried t() and lapply(), but it is hard to handle the NA values and the different number of rows in each matrix.

Thanks in advance
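One base-R way to do this pads each matrix's start column with NA to the longest length, then stacks the padded vectors row by row. Below is a sketch on a hand-built copy of the list above. The helper mk() and the function name locate_to_df() are hypothetical. mk() only rebuilds the list so the example runs without stringr; with real data, `loc` would be the str_locate_all() result itself.

```r
# Hypothetical helper: a 2-column start/end matrix like str_locate_all() returns
mk <- function(pos) cbind(start = pos, end = pos)

loc <- list(K = mk(integer(0)),
            GSEGTCSCSSK = mk(c(6L, 8L)),
            GFSTTCPAHVDDLTPEQVLDGDVNELMDVVLHHVPEAK = mk(6L),
            LVECIGQELIFLLPNK = mk(4L),
            NFK = mk(integer(0)),
            HR = mk(integer(0)),
            AYASLFR = mk(integer(0)))

locate_to_df <- function(loc) {
  n_max <- max(vapply(loc, nrow, integer(1)))
  padded <- lapply(loc, function(m) {
    s <- m[, "start"]
    length(s) <- n_max          # extends shorter vectors with NA
    s
  })
  # one row per list element, filled row by row
  starts <- matrix(unlist(padded), ncol = n_max, byrow = TRUE,
                   dimnames = list(NULL, paste0("start.", seq_len(n_max))))
  data.frame(ID = names(loc), starts, stringsAsFactors = FALSE)
}

res <- locate_to_df(loc)
res
```

Building the matrix with byrow = TRUE, rather than transposing a vapply() result, keeps the layout correct even when every element has at most one match.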
[R] Do association study based on mixed linear model
Dear All,

I want to do an association study based on a mixed linear model. My model includes several fixed effects and random effects, and it also incorporates some covariates such as birth weight. The data comprise about 180 individuals, 12 variables and 6 fixed-effect estimates.

As asreml-R is not free, are there any packages suitable for my study? I have heard of nlme and lme4, but I'm not sure whether they can incorporate covariates, or how efficient they are computationally.

Thanks for your recommendations.

Yao He
Master candidate in 2nd year
Department of Animal Genetics & Breeding
Room 436, College of Animal Science & Technology
China Agriculture University, Beijing, 100193
E-mail: yao.h.1...@gmail.com
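Both nlme::lme() and lme4::lmer() accept continuous covariates, such as birth weight, in the fixed-effects formula alongside factors. Below is a minimal sketch with nlme, which ships with R as a recommended package. It uses simulated data of the stated size (180 animals).

The variable names (sire, sex, genotype, birth_wt) and the simulated effects are illustrative assumptions, not the poster's actual model. If the random effect needs a known relationship (pedigree or genomic) matrix, as asreml-R allows, neither lme() nor lmer() takes one directly, as far as I know.

```r
library(nlme)

set.seed(1)
n <- 180
# Simulated stand-in data: sire as random effect, sex and genotype as
# fixed factors, birth weight as a continuous covariate
dat <- data.frame(
  sire     = factor(sample(paste0("S", 1:20), n, replace = TRUE)),
  sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  genotype = factor(sample(c("AA", "AB", "BB"), n, replace = TRUE)),
  birth_wt = rnorm(n, mean = 40, sd = 4)
)
sire_eff <- rnorm(20, sd = 2)
dat$y <- 10 + 0.5 * dat$birth_wt + 1.5 * (dat$sex == "M") +
  sire_eff[as.integer(dat$sire)] + rnorm(n)

# Fixed effects: genotype (the association term), sex and the covariate
# birth_wt; random intercept for sire
fit <- lme(y ~ genotype + sex + birth_wt, random = ~ 1 | sire, data = dat)
anova(fit)    # sequential F-tests for the fixed effects
fixef(fit)    # fixed-effect estimates, including the birth_wt slope
```

The lme4 equivalent of the model line would be `lmer(y ~ genotype + sex + birth_wt + (1 | sire), data = dat)`. With about 180 records, fitting time is unlikely to be the bottleneck for either package.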
Re: [R] How to transpose it in a fast way?
Thanks for everybody's help! I learned a lot from this discussion!

2013/3/10 jim holtman jholt...@gmail.com:
Did you check out the 'colbycol' package?

On Fri, Mar 8, 2013 at 5:46 PM, Martin Morgan mtmor...@fhcrc.org wrote:
On 03/08/2013 06:01 AM, Jan van der Laan wrote:
You could use the fact that scan reads the data rowwise, and the fact that arrays are stored columnwise:

# generate a small example dataset
exampl <- array(letters[1:25], dim=c(5,5))
write.table(exampl, file="example.dat", row.names=FALSE, col.names=FALSE, sep="\t", quote=FALSE)

# and read...
d <- scan("example.dat", what=character())
d <- array(d, dim=c(5,5))

t(exampl) == d

Although this is probably faster, it doesn't help with the large size. You could use the n option of scan to read chunks/blocks and feed those to, for example, an ff array (which you ideally have preallocated).

I think it's worth asking what the overall goal is; all we get from this exercise is another large file that we can't easily manipulate in R!

But nothing like a little challenge. The idea, I think, would be to transpose in chunks of rows by scanning in some number of rows and writing to a temporary file:

tpose1 <- function(fin, nrowPerChunk, ncol) {
    v <- scan(fin, character(), nmax=ncol * nrowPerChunk)
    m <- matrix(v, ncol=ncol, byrow=TRUE)
    fout <- tempfile()
    write(m, fout, nrow(m), append=TRUE)
    fout
}

Apparently the data is 60k x 60k, so we could maybe easily read 60k x 10k at a time from some file:

fl <- "big.txt"
ncol <- 60000L
nrowPerChunk <- 10000L
nChunks <- ncol / nrowPerChunk

fin <- file(fl); open(fin)
fls <- replicate(nChunks, tpose1(fin, nrowPerChunk, ncol))
close(fin)

'fls' is now a vector of file paths, each containing a transposed slice of the matrix. The next task is to splice these together.
We could do this by taking a slice of rows from each file, cbind'ing them together, and writing to an output:

splice <- function(fout, cons, nrowPerChunk, ncol) {
    slices <- lapply(cons, function(con) {
        v <- scan(con, character(), nmax=nrowPerChunk * ncol)
        matrix(v, nrowPerChunk, byrow=TRUE)
    })
    m <- do.call(cbind, slices)
    write(t(m), fout, ncol(m), append=TRUE)
}

We'd need to use open connections as inputs and output:

cons <- lapply(fls, file); for (con in cons) open(con)
fout <- file("big_transposed.txt"); open(fout, "w")
xx <- replicate(nChunks, splice(fout, cons, nrowPerChunk, nrowPerChunk))
for (con in cons) close(con)
close(fout)

As another approach, it looks like the data are genotypes. If they really only consist of pairs of A, C, G, T, then two pairs (e.g., 'AA' 'CT') could be encoded as a single byte:

alf <- c("A", "C", "G", "T")
nms <- outer(alf, alf, paste0)
map <- outer(setNames(as.raw(0:15), nms),
             setNames(as.raw(bitwShiftL(0:15, 4)), nms),
             "|")

with, e.g.,

> map[matrix(c("AA", "CT"), ncol=2)]
[1] d0

This changes the representation problem. Before, the 60k x 60k array is a 3.6 billion element vector of 60k * 60k * 8 bytes (approx. 30 Gbytes). After, it is 60k x 30k = 1.8 billion elements (fits in R-2.15 vectors) of approx. 1.8 Gbytes (probably usable on an 8 Gbyte laptop).

Personally, I would probably put this data in a netCDF / HDF5 file. Perhaps I'd use snpStats or GWASTools in Bioconductor, http://bioconductor.org.

Martin

HTH,
Jan

peter dalgaard pda...@gmail.com wrote:
On Mar 7, 2013, at 01:18, Yao He wrote:
Dear all: I have a big data file of 60000 columns and 60000 rows like this:

AA AC AA AA ... AT
CC CC CT CT ... TC
...

I want to transpose it, and the output is a new file like this:

AA CC ...
AC CC ...
AA CT ...
AA CT ...
AT TC ...
The key point is that I can't read it into R with read.table() because the data is too large, so I tried this:

c <- file("silygenotype.txt", "r")
geno_t <- list()
repeat {
    line <- readLines(c, n=1)
    if (length(line) == 0) break  # end of file
    line <- unlist(strsplit(line, "\t"))
    geno_t <- cbind(geno_t, line)
}
write.table(geno_t, "xxx.txt")

It works, but it is too slow. How can I optimize it?

As others have pointed out, that's a lot of data!

You seem to have the right idea: if you read the columns line by line, there is nothing to transpose. A couple of points, though:

- The cbind() is a potential performance hit since it copies the list every time around. Use geno_t <- vector("list", 60000) and then geno_t[[i]] <- etc.

- You might use scan() instead of readLines() and strsplit().

- Perhaps consider the data type, as you seem to be reading strings with 16 possible values. (I suspect that R already optimizes string storage to make this point moot, though.)

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
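Here is a small sketch of Peter's first two suggestions, run on a hypothetical three-line stand-in file. It preallocates the list instead of growing it with cbind(), and reads each line with scan() instead of readLines() plus strsplit(). It assumes the number of input lines is known in advance.

The sketch still holds the whole matrix in memory. For the full 60k x 60k file, the chunked approach or the compact encoding discussed earlier in the thread would still be needed.

```r
# Hypothetical stand-in for the large tab-separated genotype file
infile <- tempfile(fileext = ".txt")
writeLines(c("AA\tAC\tAA\tAT",
             "CC\tCC\tCT\tTC",
             "AG\tGG\tAA\tCT"), infile)

n_lines <- 3L                       # assumed known, e.g. from a line count
geno_t <- vector("list", n_lines)   # preallocate instead of cbind() in a loop
con <- file(infile, "r")
for (i in seq_len(n_lines)) {
  # scan() splits one line on tabs directly
  geno_t[[i]] <- scan(con, what = character(), sep = "\t", nlines = 1, quiet = TRUE)
}
close(con)

# input line i becomes column i, i.e. the transpose
m <- do.call(cbind, geno_t)
outfile <- tempfile(fileext = ".txt")
write.table(m, outfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(outfile)
```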
[R] how to read a df like that and transform it?
Dear all,

I have a data.frame like this:

father mother num_daughter daughter
291 3906 0 NULL
275 4219 0 NULL
273 4236 1 49410
281 4163 1 49408
274 4226 1 49406
295 3869 2 49403 49404
287 4113 0 NULL
295 3871 1 49401
292 3895 4 49396 49397 49398 49399
291 3900 3 49392

How can I read it into R and transform it like this:

father mother num_daughter daughter1 daughter2 daughter3 daughter4
291 3906 0 NULL
275 4219 0 NULL
273 4236 1 49410
281 4163 1 49408
274 4226 1 49406
295 3869 2 49403 49404
287 4113 0 NULL
295 3871 1 49401
292 3895 4 49396 49397 49398 49399
291 3900 3 49392

library(plyr), library(reshape2) and other good packages are OK for me.

Thanks a lot!

Yao He
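A sketch for reading such a ragged file with base R follows. count.fields() finds the longest line; read.table() is then given fill = TRUE plus explicit col.names of that width. The explicit names matter because otherwise read.table() guesses the number of columns from the first five lines only. "NULL" is turned into NA via na.strings.

The rows below are a few from the post. The sketch assumes mother ID and daughter count are separate whitespace-delimited fields (they run together in the archived text).

```r
# A few rows from the post (whitespace-separated fields assumed)
txt <- c("father mother num_daughter daughter",
         "291 3906 0 NULL",
         "273 4236 1 49410",
         "295 3869 2 49403 49404",
         "292 3895 4 49396 49397 49398 49399")

# The longest line decides how many daughter columns are needed
tc <- textConnection(txt)
n_fields <- max(count.fields(tc))
close(tc)
daughter_cols <- paste0("daughter", seq_len(n_fields - 3))

# fill = TRUE pads short rows with NA; explicit col.names fixes the width
ped <- read.table(text = txt, skip = 1, fill = TRUE, na.strings = "NULL",
                  col.names = c("father", "mother", "num_daughter", daughter_cols))
ped
```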
Re: [R] how to generate a matrix from my data.frame
Thanks a lot, it works!

2013/1/11 Rui Barradas ruipbarra...@sapo.pt:
Hello,

Here are two ways.

dat <- read.table(text = "
id1 id2 value
2353 2353 0.096313
2353 2409 0.301773
[...etc...]
2356 2356 0
2356 2611 0
2611 2611 0
", header = TRUE)

mat1 <- matrix(nrow = 53, ncol = 53)  # initialize with NA's
mat1[upper.tri(mat1, diag = TRUE)] <- dat$value

mat2 <- matrix(0, nrow = 53, ncol = 53)  # initialize with zeros
mat2[upper.tri(mat2, diag = TRUE)] <- dat$value

Hope this helps,

Rui Barradas

On 10-01-2013 15:21, Yao He wrote:
Dear All,

It is a little hard to give a good small example of my question, so I will show the full data at the bottom and in the attachment. Maybe someone could tell me an appropriate way to show it. I'm sorry for the inconvenience.

Q: How do I generate a 53*53 triangular matrix from my data?

Some problems that confuse me:
1. Since it is a triangular matrix, I have tried to turn col1 and col2 into row and column indices, but I don't know how to build a matrix from the values' indices.
2. As you see, 2353 corresponds to 53 ids in col2; however, 2409 corresponds to 52, 2500 to 51, and so on, so it is hard to use matrix() to generate it.

id1 id2 value
2353 2353 0.096313
2353 2409 0.301773
2353 2500 0.169518
2353 2598 0.11274
2353 2610 0.107414
2353 2300 0.034492
2353 2507 0.037521
2353 2530 0.064125
2353 2327 0.029259
2353 2389 0.036423
2353 2408 0.029259
2353 2463 0.036423
2353 2420 0.04409
2353 2563 0.055038
2353 2462 0.046478
2353 2292 0.036369
2353 2405 0.036369
2353 2543 0.053413
2353 2557 0.058151
2353 2583 0.081512
2353 2322 0.044373
2353 2535 0.04847
2353 2536 0.035538
2353 2581 0.035538
2353 2570 0.07711
2353 2476 0.047081
2353 2534 0.047081
2353 2280 0.088264
2353 2316 0.073608
2353 2339 0.067307
2353 2331 0.061172
2353 2343 0.060425
2353 2352 0.041153
2353 2293 0.040764
2353 2338 0.045128
2353 2449 0.040764
2353 2296 0.061333
2353 2453 0.046074
2353 2460 0.060387
2353 2474 0.060387
2353 2603 0.060387
2353 2282 0.048065
2353 2313 0.05584
2353 2538 0.050873
2353 2522 0.065727
2353 2489 0.041023
2353 2564 0.039696
2353 2594 0.056946
2353 2274 0.060875
2353 2451 0.037468
2353 2321 0
2353 2356 0
2353 2611 0
2409 2409 0.096313
2409 2500 0.169518
2409 2598 0.11274
2409 2610 0.107414
2409 2300 0.034492
2409 2507 0.037521
2409 2530 0.064125
2409 2327 0.029259
2409 2389 0.036423
2409 2408 0.029259
2409 2463 0.036423
2409 2420 0.04409
2409 2563 0.055038
2409 2462 0.046478
2409 2292 0.036369
2409 2405 0.036369
2409 2543 0.053413
2409 2557 0.058151
2409 2583 0.081512
2409 2322 0.044373
2409 2535 0.04847
2409 2536 0.035538
2409 2581 0.035538
2409 2570 0.07711
2409 2476 0.047081
2409 2534 0.047081
2409 2280 0.088264
2409 2316 0.073608
2409 2339 0.067307
2409 2331 0.061172
2409 2343 0.060425
2409 2352 0.041153
2409 2293 0.040764
2409 2338 0.045128
2409 2449 0.040764
2409 2296 0.061333
2409 2453 0.046074
2409 2460 0.060387
2409 2474 0.060387
2409 2603 0.060387
2409 2282 0.048065
2409 2313 0.05584
2409 2538 0.050873
2409 2522 0.065727
2409 2489 0.041023
2409 2564 0.039696
2409 2594 0.056946
2409 2274 0.060875
2409 2451 0.037468
2409 2321 0
2409 2356 0
2409 2611 0
2500 2500 0.048615
2500 2598 0.051979
2500 2610 0.041031
2500 2300 0.032974
2500 2507 0.052788
2500 2530 0.041435
2500 2327 0.038071
2500 2389 0.051659
2500 2408 0.038071
2500 2463 0.051659
2500 2420 0.052635
2500 2563 0.07872
2500 2462 0.048615
2500 2292 0.044365
2500 2405 0.044365
2500 2543 0.04277
2500 2557 0.051109
2500 2583 0.047409
2500 2322 0.054512
2500 2535 0.039368
2500 2536 0.041763
2500 2581 0.041763
2500 2570 0.063148
2500 2476 0.043755
2500 2534 0.043755
2500 2280 0.063164
2500 2316 0.083961
2500 2339 0.074127
2500 2331 0.051094
2500 2343 0.060066
2500 2352 0.038208
2500 2293 0.048698
2500 2338 0.048218
2500 2449 0.048698
2500 2296 0.073212
2500 2453 0.070595
2500 2460 0.073677
2500 2474 0.073677
2500 2603 0.073677
2500 2282 0.073677
2500 2313 0.068443
2500 2538 0.079865
2500 2522 0.059723
2500 2489 0.049873
2500 2564 0.087639
2500 2594 0.05175
2500 2274 0.043396
2500 2451 0.046532
2500 2321 0
2500 2356
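An order-independent variant is sketched below on a 3-id excerpt of the posted data. It indexes the cells by id through the matrix dimnames instead of relying on the order of upper.tri().

As far as I can tell, the order matters here. R visits the upper.tri(diag = TRUE) cells column by column: (1,1), (1,2), (2,2), (1,3), ... The posted rows instead list all partners of 2353 first, which matches lower.tri() order. Taking the row/column order from unique(c(id1, id2)) is an assumption.

```r
# A 3-id excerpt of the posted data (id1, id2, value)
dat <- data.frame(
  id1   = c(2353, 2353, 2353, 2409, 2409, 2500),
  id2   = c(2353, 2409, 2500, 2409, 2500, 2500),
  value = c(0.096313, 0.301773, 0.169518, 0.096313, 0.169518, 0.048615)
)

ids <- unique(c(dat$id1, dat$id2))            # row/column order of the result
m <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))

# A two-column character matrix indexes cells by their dimnames,
# so the row order of dat does not matter
key1 <- as.character(dat$id1)
key2 <- as.character(dat$id2)
m[cbind(key1, key2)] <- dat$value             # given triangle + diagonal
m[cbind(key2, key1)] <- dat$value             # mirror to make it symmetric
m
```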
Re: [R] how to count A, C, T, G in each row in a big data.frame?
, "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "AG", "GG", "GA", "GG", "GT", "CT", "GA", "CT", "AA", "AA", "GA"),
    X2460 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "AG", "GG", "GG", "GG", "TG", "CT", "GG", "CC", "AA", "AA", "AA"),
    X2474 = c("AA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GA", "AG", "AG", "GG", "TT", "CC", "AG", "TC", "AA", "AA", "GA"),
    X2603 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AG", "AG", "GG", "TT", "CC", "AG", "CC", "AA", "AA", "GA"),
    X2282 = c("GA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GG", "AA", "GG", "TT", "TT", "AA", "CC", "AA", "AA", "GA"),
    X2313 = c("AG", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "AG", "GA", "GG", "GT", "CC", "GA", "CT", "AA", "AA", "AA"),
    X2538 = c("AA", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "AA", "AG", "GG", "TG", "CC", "AG", "CC", "AA", "AA", "AA"),
    X2522 = c("AG", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GA", "GG", "TT", "TC", "GG", "CC", "AA", "AG", "GA"),
    X2489 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "GT", "TC", "AG", "CC", "AA", "AA", "AG"),
    X2564 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GA", "GG", "GG", "GG", "TT", "CC", "AA", "CT", "AA", "AA", "AA"),
    X2594 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AA", "AG", "GG", "TT", "TC", "AG", "TC", "AA", "AA", "AG"),
    X2274 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "TT", "CT", "GG", "CC", "AA", "AA", "GA"),
    X2451 = c("AG", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "TT", "CT", "GG", "CC", "AA", "AA", "GA"),
    X2321 = c("GG", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "AG", "AA", "GG", "TT", "TT", "AA", "CC", "AA", "AA", "AA"),
    X2356 = c("AA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GA", "AG", "AG", "GG", "TG", "TC", "AG", "TT", "AA", "AA", "AA"),
    X2611 = c("AG", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "GG", "GA", "GG", "TT", "CT", "GA", "TT", "AA", "AA", "AG")),
    .Names = c("name", "chr", "pos", "strand", "X2353", "X2409", "X2500", "X2598", "X2610", "X2300", "X2507", "X2530", "X2327", "X2389", "X2408", "X2463", "X2420", "X2563", "X2462", "X2292", "X2405", "X2543", "X2557", "X2583", "X2322", "X2535", "X2536", "X2581", "X2570", "X2476", "X2534", "X2280", "X2316", "X2339", "X2331", "X2343", "X2352", "X2293", "X2338", "X2449", "X2296", "X2453", "X2460", "X2474", "X2603", "X2282", "X2313", "X2538", "X2522", "X2489", "X2564", "X2594", "X2274", "X2451", "X2321", "X2356", "X2611"),
    row.names = 27412:27431, class = "data.frame")

# create a 'key' of characters in the X columns
indx <- which(grepl("^X", names(x)))
x$key <- apply(x[, indx], 1, paste, collapse = '')

# create counts
counts <- t(apply(x, 1, function(z){
    c(A = nchar(gsub("[^A]", '', z['key']))
    , C = nchar(gsub("[^C]", '', z['key']))
    , G = nchar(gsub("[^G]", '', z['key']))
    , T = nchar(gsub("[^T]", '', z['key']))
    )
}))

# output
counts
      A.key C.key G.key T.key
27412    81     0    25     0
27413     0    29     0    77
27414     0     0     0   106
27415     0   106     0     0
27416     0    27     0    79
27417     0   106     0     0
27418     0   106     0     0
27419     0     0     0   106
27420     0   106     0     0
27421    10     0    96     0
27422    37     0    69     0
27423    39     0    67     0
27424     4     0   102     0
27425     0     0    20    86
27426     0    65     0    41
27427    40     0    66     0
27428     0    78     0    28
27429   106     0     0     0
27430    97     0     9     0
27431    68     0    38     0

On Wed, Jan 9, 2013 at 9:23 AM, Yao He yao.h.1...@gmail.com wrote:
Dear All, I have a data.frame like this:
Re: [R] how to count A, C, T, G in each row in a big data.frame?
Thanks a lot. The problem is that I don't know how to handle the output list, as I want to calculate the frequency of A, G, T or C by row.

Yao He

2013/1/10 Jessica Streicher j.streic...@micromata.de:
Sorry, you wanted rows; I wrote it for columns.

#rows would be:
test2 <- apply(test[,-c(1:4)], 1, function(x){table(t(x))})

#find single values in a row
sapply(test2, function(row){
    allVars <- paste(names(row), collapse="")
    u <- unique(strsplit(allVars, "")[[1]])
    parts <- sapply(names(row), function(x){u %in% strsplit(x, "")[[1]]})
    mat <- parts %*% row
    rownames(mat) <- u
    mat
})

Though I guess lists aren't ideal, but there's another answer as well, I see.

On 09.01.2013, at 15:23, Yao He wrote:
Dear All, I have a data.frame like this:

structure(list(name = c("Gga_rs10722041", "Gga_rs10722249", "Gga_rs10722565", "Gga_rs10723082", "Gga_rs10723993", "Gga_rs10724555", "Gga_rs10726238", "Gga_rs10726461", "Gga_rs10726774", "Gga_rs10726967", "Gga_rs10727581", "Gga_rs10728004", "Gga_rs10728156", "Gga_rs10728177", "Gga_rs10728373", "Gga_rs10728585", "Gga_rs10729598", "Gga_rs10729643", "Gga_rs10729685", "Gga_rs10729827"),
    chr = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L),
    pos = c(11248993L, 20038370L, 16164457L, 38050527L, 20307106L, 13707090L, 12230458L, 36732967L, 2790856L, 1305785L, 29631963L, 13606593L, 13656397L, 2261611L, 32096703L, 13733153L, 16524147L, 558735L, 12514023L, 3619538L),
    strand = c("+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+"),
    X2353 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AG", "AG", "AG", "TT", "CC", "AG", "CC", "AA", "GG", "GG"),
    X2409 = c("AA", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "GG", "AG", "AG", "TT", "CC", "AG", "CC", "AA", "AG", "GA"),
    X2500 = c("GA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "GT", "CT", "GG", "CC", "AA", "AA", "AA"),
    X2598 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AA", "AG", "GG", "TT", "CC", "AG", "TC", "AA", "AA", "AG"),
    X2610 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", "GA", "GG", "TT", "CC", "GA", "CC", "AA", "AA", "GA"),
    X2300 = c("GA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", "AA", "AG", "TT", "TC", "AA", "TC", "AA", "AG", "AA"),
    X2507 = c("AG", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GA", "GG", "TT", "TC", "GG", "CC", "AA", "GA", "AG"),
    X2530 = c("AG", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "AA", "GG", "GG", "TT", "CC", "GG", "CC", "AA", "AA", "AA"),
    X2327 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", "GG", "GG", "TT", "TC", "GG", "CC", "AA", "AA", "AA"),
    X2389 = c("AA", "CC", "TT", "CC", "CC", "CC", "CC", "TT", "CC", "AG", "GG", "AG", "GG", "TT", "TC", "AG", "CC", "AA", "AA", "AA"),
    X2408 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", "GA", "GG", "TT", "CC", "GA", "CC", "AA", "AA", "AG"),
    X2463 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "TT", "CT", "GG", "CC", "AA", "AA", "GA"),
    X2420 = c("GA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "AG", "GG", "GG", "TG", "TT", "GG", "CT", "AA", "AA", "AA"),
    X2563 = c("GA", "CC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GA", "GG", "GG", "GT", "TT", "GG", "CT", "AA", "AA", "AA"),
    X2462 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AA", "GG", "GG", "GT", "TC", "GG", "CC", "AA", "AA", "AA"),
    X2292 = c("GA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", "AA", "GG", "TG", "TC", "AA", "TC", "AA", "AA", "AA"),
    X2405 = c("GA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "AG", "GG", "TG", "TT", "AA", "CT", "AA", "AA", "AA"),
    X2543 = c("AA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GA", "GA", "GA", "GG", "TT", "CT", "GA", "TT", "AA", "AA", "GG"),
    X2557 = c("AG", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "AG", "GA", "GG", "GT", "CT", "GA", "CT", "AA", "AA", "AG"),
    X2583 = c("GA", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "GA", "GG", "GG", "GG", "CT", "GA", "CT", "AA", "AA", "AG"),
    X2322 = c("AG", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "GT", "TT", "GG", "CC", "AA", "AA", "GA"),
    X2535 = c("AA", "TC", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", "GG", "GG", "TT", "CC", "GG", "CC", "AA", "AA", "AG"),
    X2536 = c("GA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GG", "AG", "GG", "TT", "TC", "AG", "TC", "AA", "AA", "GA"),
    X2581 = c("AG", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "GG", "GA", "GG", "TT", "CC", "GA", "CT", "AA", "AA", "AG"),
    X2570 = c("AA", "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "TT", "TC", "GG", "CC", "AA", "AA", "GG"),
    X2476 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "GT", "TC", "AG", "CC", "AA", "AA", "AG"),
    X2534 = c("GA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GA", "AG", "GG", "TG", "CC", "AG", "TC", "AA", "AA", "AA"),
    X2280 = c("AA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "AG", "AG", "GG", "TT", "CC", "GG", "CC", "AA", "AA", "AG"),
    X2316 = c("AA", "CC", "TT", "CC", "CC", "CC", "CC", "TT", "CC", "AG", "AA", "AA", "AG", "TT", "TC", "GG", "CT", "AA", "GG", "GG"),
    X2339 = c("AA", "CC", "TT", "CC", "CC", "CC", "CC", "TT", "CC", "GA", "AA", "GG", "GG", "GT", "CT", "GG", "TT", "AA", "AA", "AG"),
    X2331 = c("AA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "TT", "CC", "GG", "CC", "AA", "AA", "AG"),
    X2343 = c("AA", "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "TT", "CT", "GG", "CC", "AA", "AA", "GA"),
    X2352 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AA", "GG", "GG", "TT", "CC", "GG", "CC", "AA", "GA", "AG"),
    X2293 = c("GA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", "AA", "GG", "TT", "TC", "AA", "CT", "AA", "AA", "AA"),
    X2338 = c("GA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", "AG", "GG", "TT", "TC", "AG", "TC", "AA", "AA", "GA"),
    X2449 = c("AA"
Re: [R] how to count A, C, T, G in each row in a big data.frame?
This output is really good. Maybe I can go on from this output. Every time, your help lets me understand R further. The first four columns are irrelevant; that was my oversight.

2013/1/10 William Dunlap wdun...@tibco.com:
Can you get what you need from the following, where 'd' is your data.frame, the first four columns of which are irrelevant to this problem?

> dd <- d[,-(1:4)] ; table(rownames(dd)[row(dd)], unlist(dd))

      AA AG CC CT GA GG GT TC TG TT
27412 29 10  0  0 13  1  0  0  0  0
27413  0  0  4  9  0  0  0 12  0 28
27414  0  0  0  0  0  0  0  0  0 53
27415  0  0 53  0  0  0  0  0  0  0
...
27430 46  3  0  0  2  2  0  0  0  0
27431 19 15  0  0 15  4  0  0  0  0

table() is pretty quick.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Yao He
Sent: Wednesday, January 09, 2013 4:04 PM
To: jim holtman
Cc: R help
Subject: Re: [R] how to count A, C, T, G in each row in a big data.frame?

In fact, I want to calculate the gene frequency of each SNP. The key problems are:
1. My data.frame is large, about 50,000 rows, so it is very slow to split() it by row.
2. The alleles differ between SNPs (rows): some are A/G, some are G/C. It is a little awkward for me to handle.

Thank you for your help

2013/1/9 jim holtman jholt...@gmail.com:
forgot the data.
this will count the characters; you can add logic with 'table' to count groups
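The stated goal is allele frequencies per SNP, where different SNPs carry different alleles. One possible extension of Bill's table() trick handles this: split each genotype into its two alleles with substr(), then tabulate alleles instead of genotypes. prop.table() turns the counts into per-row frequencies.

The three-SNP genotype matrix below is a hypothetical toy example. With the real data, `g` would be something like as.matrix(d[, -(1:4)]).

```r
# Toy genotype matrix: rows = SNPs, columns = animals (hypothetical values)
g <- matrix(c("AA", "CT", "GG",
              "AG", "TT", "GA",
              "GA", "CC", "GG"),
            nrow = 3, dimnames = list(c("snp1", "snp2", "snp3"), NULL))

snp <- rownames(g)[row(g)]                       # SNP id of every cell
alleles <- c(substr(g, 1, 1), substr(g, 2, 2))   # all first, then all second alleles
allele_counts <- table(rep(snp, 2), alleles)
allele_freq <- prop.table(allele_counts, margin = 1)
allele_counts
round(allele_freq, 3)
```

Only the alleles actually present at each SNP get non-zero counts, so A/G and G/C SNPs need no special handling.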
Re: [R] how to count A, C, T, G in each row in a big data.frame?
Hi arun,

Then how could I split the genotype columns (id AA AG CC CT GA GG GT TC TG TT) and get a table of letter counts, such as:

      id  A  T  C  G
#1 27412 81  0  0 25
#2 27413  0 77 29  0

Thanks

2013/1/10 arun smartpink...@yahoo.com:
Hi Yao,
You could also use:

library(reshape2)
dd <- dat1[,-(1:4)]
res <- dcast(melt(within(dd, {id = row.names(dd)}), id.var = "id"), id ~ value, length)
head(res)
#     id AA AG CC CT GA GG GT TC TG TT
#1 27412 29 10  0  0 13  1  0  0  0  0
#2 27413  0  0  4  9  0  0  0 12  0 28
#3 27414  0  0  0  0  0  0  0  0  0 53
#4 27415  0  0 53  0  0  0  0  0  0  0
#5 27416  0  0  3  9  0  0  0 12  0 29
#6 27417  0  0 53  0  0  0  0  0  0  0

#Just for comparison:
dat2 <- dat1[rep(row.names(dat1), 2000),]
nrow(dat2)
#[1] 40000
row.names(dat2) <- 1:40000
dd <- dat2[,-(1:4)]
system.time(res1 <- table(rownames(dd)[row(dd)], unlist(dd)))
#   user  system elapsed
#  5.840   0.104   5.954
system.time(res2 <- dcast(melt(within(dd, {id = row.names(dd)}), id.var = "id"), id ~ value, length))
#   user  system elapsed
#  3.100   0.064   3.167
head(res1, 3)
#      AA AG CC CT GA GG GT TC TG TT
#  1   29 10  0  0 13  1  0  0  0  0
#  10   0  4  0  0  6 43  0  0  0  0
#  100 19 15  0  0 15  4  0  0  0  0
head(res2, 3)
#    id AA AG CC CT GA GG GT TC TG TT
#1    1 29 10  0  0 13  1  0  0  0  0
#2   10  0  4  0  0  6 43  0  0  0  0
#3  100 19 15  0  0 15  4  0  0  0  0
A.K.

----- Original Message -----
From: Yao He yao.h.1...@gmail.com
To: R help r-help@r-project.org
Cc:
Sent: Wednesday, January 9, 2013 9:23 AM
Subject: [R] how to count A,C,T,G in each row in a big data.frame?
Dear All, I have a data.frame like this:
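One option for going from genotype counts (AA, AG, ...) to letter counts (A, T, C, G) is a matrix product with a small weight matrix. The weights record how many copies of each base a genotype contains: 2 for a homozygote, 1 for a heterozygote. The sketch uses the rows for SNPs 27412 and 27413 from the genotype-count output posted in this thread.

Counting with %in%, as in the earlier column-wise reply, would count a homozygote such as AA only once.

```r
# Genotype counts for SNPs 27412 and 27413, copied from the thread's output
gt <- rbind("27412" = c(AA = 29, AG = 10, CC = 0, CT = 0, GA = 13,
                        GG = 1, GT = 0, TC = 0, TG = 0, TT = 0),
            "27413" = c(AA = 0, AG = 0, CC = 4, CT = 9, GA = 0,
                        GG = 0, GT = 0, TC = 12, TG = 0, TT = 28))

bases <- c("A", "T", "C", "G")
# w[k, b] = copies of base b in genotype k (AA -> 2 A; AG -> 1 A and 1 G)
w <- sapply(bases, function(b)
  vapply(strsplit(colnames(gt), ""), function(ch) sum(ch == b), numeric(1)))

letter_counts <- gt %*% w
letter_counts
```

The result reproduces the two example rows requested above (81 0 0 25 and 0 77 29 0).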
Re: [R] ggplot not showing all the years on the x-axis
Hi, this is a question about how to set the scale. Since ii is numeric, the x scale is continuous, so try adding scale_x_continuous() with explicit breaks (scale_x_discrete() does not apply here), like this:

plot <- tmpplot + geom_line() + scale_x_continuous(breaks = ii)

Yao He

2013/1/8 Francesco Sarracino <f.sarrac...@gmail.com>:
Dear R helpers,
I am currently having a hard time fixing the values on the x-axis of a plot with ggplot: even though I have 12 years, ggplot labels only 3 of them. Here is my example:

library(ggplot2)
ii <- 2000:2011
ss <- rnorm(12, 0, 1)
pm <- data.frame(ii, ss)
tmpplot <- ggplot(pm, aes(x = ii, y = ss))
plot <- tmpplot + geom_line()
plot

In my case, ggplot labels only the years 2000, 2004 and 2008 on the x-axis, but I'd like to have all the years from 2000 to 2011. I know how to fix this with the standard plot in R, but for consistency I'd like to use ggplot. Can anyone help?
Thanks in advance,
f.

R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

--
Master candidate in 2nd year
Department of Animal Genetics and Breeding
Room 436, College of Animal Science and Technology, China Agriculture University, Beijing, 100193
E-mail: yao.h.1...@gmail.com
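A minimal, self-contained sketch of the fix above, assuming ggplot2 is installed. The seed and the object name `p` are illustrative choices; calling the plot `p` rather than `plot` avoids masking base R's plot() function.

```r
library(ggplot2)

set.seed(1)                      # illustrative seed so the data are reproducible
pm <- data.frame(ii = 2000:2011, ss = rnorm(12, 0, 1))

# ii is numeric, so the x scale is continuous; passing every year as a break
# gives one tick label per year instead of ggplot2's default "pretty" breaks.
p <- ggplot(pm, aes(x = ii, y = ss)) +
  geom_line() +
  scale_x_continuous(breaks = pm$ii)

p                                # draw the plot
```

If the labels become crowded, `breaks = seq(2000, 2011, by = 2)` is a simple way to thin them out.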
Re: [R] how to aggregate T-test result in an elegant way?
Hi arun,

I'm sorry that wasn't helpful. One problem is that I don't know how to subset a small part of a 3-dimensional array, so I only showed its structure. I tried dput() to a file; what should I do to subset it?

Another question: my raw data is a melted data frame like this:

         IID  O2 variable value
1     TWF2H5 13%  EW.INCU 49.38
2     TWF2H6 13%  EW.INCU 48.02
3    TWF2H19 13%  EW.INCU 51.44
280 TWF2H101 13%  EW.17.5 42.26
281 TWF2H105 13%  EW.17.5 43.52
282 TWF2H106 13%  EW.17.5 42.83
472 TWF2N102 21%  EW.17.5 45.97
473 TWF2N104 21%  EW.17.5 43.32
474 TWF2N106 21%  EW.17.5 48.63
689   TWF2N2 21%      EMW 19.57
690   TWF2N6 21%      EMW 18.07
691  TWF2N10 21%      EMW  15.4
491   TWF2H5 13%      EMW 15.61
492   TWF2H6 13%      EMW 13.41
493  TWF2H19 13%      EMW 14.03
199   TWF2N2 21%  EW.INCU 48.69
200   TWF2N6 21%  EW.INCU 50.52
201  TWF2N10 21%  EW.INCU 42.04

If you had a t-test task like the one I described, would generating a high-dimensional array be a good approach?

Thank you!
Yao He

2013/1/7 arun <smartpink...@yahoo.com>:
Hi, I tried to create an example dataset (as you didn't provide the data).

set.seed(25)
a <- array(sample(1:50, 60, replace = TRUE), dim = c(2, 10, 3))
dimnames(a)[[1]] <- c("13%", "21%")
dimnames(a)[[2]] <- paste("TWF2H", 101:110, sep = "")
dimnames(a)[[3]] <- c("EW.INCU", "EW.17.5", "EMW")
str(a)
# int [1:2, 1:10, 1:3] 21 35 8 45 7 50 32 17 4 15 ...
# - attr(*, "dimnames")=List of 3
#  ..$ : chr [1:2] "13%" "21%"
#  ..$ : chr [1:10] "TWF2H101" "TWF2H102" "TWF2H103" "TWF2H104" ...
#  ..$ : chr [1:3] "EW.INCU" "EW.17.5" "EMW"
res <- lapply(lapply(seq_len(dim(a)[3]), function(i) t.test(a[dimnames(a)[[1]][1], , i], a[dimnames(a)[[1]][2], , i])), function(x) data.frame(mean = x$estimate, p.value = x$p.value))
names(res) <- dimnames(a)[[3]]  # name the results so rbind() prefixes each row name with the variable
res1 <- do.call(rbind, res)
row.names(res1)[grep("mean of x", row.names(res1))] <- gsub("(.*\\.).*$", "\\113%", row.names(res1)[grep("mean of x", row.names(res1))])
row.names(res1)[grep("mean of y", row.names(res1))] <- gsub("(.*\\.).*$", "\\121%", row.names(res1)[grep("mean of y", row.names(res1))])
res1
#             mean   p.value
# EW.INCU.13% 22.3 0.2754842
# EW.INCU.21% 29.3 0.2754842
# EW.17.5.13% 20.5 0.4705772
# EW.17.5.21% 16.0 0.4705772
# EMW.13%     23.9 0.9638679
# EMW.21%     24.2 0.9638679
A.K.

----- Original Message -----
From: Yao He <yao.h.1...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>
Sent: Sunday, January 6, 2013 11:21 PM
Subject: Re: [R] how to aggregate T-test result in an elegant way?
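A sketch of one way to handle the subsetting question, under stated assumptions: the array below is a hypothetical stand-in with the same shape as the real one (2 x 245 x 3), filled with random numbers and made-up animal IDs. Indexing only the second dimension keeps both O2 levels and all three variables for a handful of animals, and `drop = FALSE` guarantees the result stays 3-dimensional. dput() of that small piece is short enough to paste into a post.

```r
# Hypothetical stand-in for the real 2 x 245 x 3 array (random values, made-up IDs)
set.seed(1)
a <- array(round(rnorm(2 * 245 * 3, mean = 45, sd = 5), 2),
           dim = c(2, 245, 3),
           dimnames = list(c("13%", "21%"),
                           paste0("TWF2H", 1:245),
                           c("EW.INCU", "EW.17.5", "EMW")))

# First 10 animals only; every O2 level and every variable is kept.
# drop = FALSE stops R from dropping a dimension that ends up with extent 1.
small <- a[, 1:10, , drop = FALSE]
dim(small)

# Prints R code that recreates `small`; this is what to paste into the message.
dput(small)
```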
Re: [R] how to aggregate T-test result in an elegant way?
Hi arun,

Yes, I just want to do the t-test. I think it may not be necessary to generate a 3D array from the raw data.frame with acast() first.

Thanks a lot

2013/1/7 arun <smartpink...@yahoo.com>:
Hi Yao,
It's okay. How did you generate the 3D array? Using ?acast()?
I am not sure I understand your question about whether generating a high-dimensional array is a good way to handle such a t-test task. Do you want to do the t-test on the melted dataset?

b <- read.table(text = "
ID O2 variable value
1 TWF2H5 13% EW.INCU 49.38
2 TWF2H6 13% EW.INCU 48.02
3 TWF2H19 13% EW.INCU 51.44
280 TWF2H101 13% EW.17.5 42.26
281 TWF2H105 13% EW.17.5 43.52
282 TWF2H106 13% EW.17.5 42.83
472 TWF2N102 21% EW.17.5 45.97
473 TWF2N104 21% EW.17.5 43.32
474 TWF2N106 21% EW.17.5 48.63
689 TWF2N2 21% EMW 19.57
690 TWF2N6 21% EMW 18.07
691 TWF2N10 21% EMW 15.4
491 TWF2H5 13% EMW 15.61
492 TWF2H6 13% EMW 13.41
493 TWF2H19 13% EMW 14.03
199 TWF2N2 21% EW.INCU 48.69
200 TWF2N6 21% EW.INCU 50.52
201 TWF2N10 21% EW.INCU 42.04
", sep = "", header = TRUE, stringsAsFactors = FALSE)
res <- lapply(lapply(split(b, b$variable), function(x) t.test(x$value[x$O2 == "13%"], x$value[x$O2 == "21%"])), function(x) data.frame(mean = x$estimate, p.value = x$p.value))
res1 <- do.call(rbind, res)
row.names(res1)[grep("mean of x", row.names(res1))] <- gsub("(.*\\.).*$", "\\113%", row.names(res1)[grep("mean of x", row.names(res1))])
row.names(res1)[grep("mean of y", row.names(res1))] <- gsub("(.*\\.).*$", "\\121%", row.names(res1)[grep("mean of y", row.names(res1))])
res1
#                 mean    p.value
# EMW.13%     14.35000 0.09355374
# EMW.21%     17.68000 0.09355374
# EW.17.5.13% 42.87000 0.17464018
# EW.17.5.21% 45.97333 0.17464018
# EW.INCU.13% 49.61333 0.43689727
# EW.INCU.21% 47.08333 0.43689727
A.K.

----- Original Message -----
From: Yao He <yao.h.1...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>
Sent: Monday, January 7, 2013 4:00 AM
Subject: Re: [R] how to aggregate T-test result in an elegant way?
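A sketch of the long-format route with no 3D array at all. The data frame below re-types the 18 melted rows posted in this thread, and t.test()'s formula interface (`value ~ O2`) splits each variable's values into the two O2 groups. The helper name `tt_row` is illustrative, not from the thread. Like the default t.test() call above, this runs a Welch test, so the means and p-values should agree with arun's output.

```r
# The 18 melted rows posted in this thread
b <- data.frame(
  IID = c("TWF2H5", "TWF2H6", "TWF2H19", "TWF2H101", "TWF2H105", "TWF2H106",
          "TWF2N102", "TWF2N104", "TWF2N106", "TWF2N2", "TWF2N6", "TWF2N10",
          "TWF2H5", "TWF2H6", "TWF2H19", "TWF2N2", "TWF2N6", "TWF2N10"),
  O2 = rep(c("13%", "21%", "13%", "21%"), times = c(6, 6, 3, 3)),
  variable = rep(c("EW.INCU", "EW.17.5", "EMW", "EW.INCU"), times = c(3, 6, 6, 3)),
  value = c(49.38, 48.02, 51.44, 42.26, 43.52, 42.83, 45.97, 43.32, 48.63,
            19.57, 18.07, 15.40, 15.61, 13.41, 14.03, 48.69, 50.52, 42.04),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one Welch t-test (13% vs 21%) -> one summary row
tt_row <- function(d) {
  tt <- t.test(value ~ O2, data = d)   # group levels sort as "13%", "21%"
  data.frame(variable = d$variable[1],
             mean13   = unname(tt$estimate[1]),
             mean21   = unname(tt$estimate[2]),
             p.value  = tt$p.value)
}

res <- do.call(rbind, lapply(split(b, b$variable), tt_row))
res
```

Keeping the variable name as a real column, rather than encoding it into row names with gsub(), makes the table easier to sort, filter or write out.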
Re: [R] how to aggregate T-test result in an elegant way?
Yes, thanks a lot for your help!
Regards

2013/1/8 arun <smartpink...@yahoo.com>:
Hi Yao,
You could also have the results in a wide format:

res <- do.call(rbind, lapply(lapply(split(b, b$variable), function(x) t.test(x$value[x$O2 == "13%"], x$value[x$O2 == "21%"])), function(x) data.frame(mean13 = x$estimate[1], mean21 = x$estimate[2], p.value = x$p.value, CILow = x$conf.int[1], CIHigh = x$conf.int[2])))
res
#           mean13   mean21    p.value     CILow    CIHigh
# EMW     14.35000 17.68000 0.09355374 -7.682686  1.022686
# EW.17.5 42.87000 45.97333 0.17464018 -9.265622  3.058955
# EW.INCU 49.61333 47.08333 0.43689727 -7.119234 12.179234
A.K.

----- Original Message -----
From: Yao He <yao.h.1...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>
Sent: Monday, January 7, 2013 10:57 AM
Subject: Re: [R] how to aggregate T-test result in an elegant way?
[R] how to aggregate T-test result in an elegant way?
Dear all,

Plan 1: I want to run several t-tests of means for different variables in a loop, collect all the results in one object, and then dump() them to a text file. But I don't know how to append a t-test result to an object. I have already drawn the barplot, and now I want an elegant way to report the raw results. Can anybody give me some advice?

Yao He
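A minimal sketch of one way to do this, assuming long-format data (one row per measurement). The data frame `dat` is simulated and the output files come from tempfile(), so both are placeholders. Each t.test() result is an object of class "htest" and can be stored in a named list with `results[[v]] <- ...`. The list can then be printed to a text file with capture.output(), or condensed into a one-row-per-test table for write.csv(). dump() also accepts such a list, but it writes R code that recreates the objects, not a readable report.

```r
set.seed(42)
# Simulated long-format data: 10 animals per O2 group for each of 3 variables
dat <- data.frame(
  O2       = rep(c("13%", "21%"), each = 10, times = 3),
  variable = rep(c("EW.INCU", "EW.17.5", "EMW"), each = 20),
  value    = rnorm(60, mean = 45, sd = 4)
)

results <- list()                               # one "htest" object per variable
for (v in unique(dat$variable)) {
  d <- dat[dat$variable == v, ]
  results[[v]] <- t.test(value ~ O2, data = d)  # Welch two-sample t-test
}

# 1) Full printed output of every test in one text file
out_txt <- tempfile(fileext = ".txt")
capture.output(results, file = out_txt)

# 2) Compact table, one row per variable
summary_tab <- data.frame(
  variable = names(results),
  mean13   = sapply(results, function(x) unname(x$estimate[1])),
  mean21   = sapply(results, function(x) unname(x$estimate[2])),
  t        = sapply(results, function(x) unname(x$statistic)),
  df       = sapply(results, function(x) unname(x$parameter)),
  p.value  = sapply(results, function(x) x$p.value),
  row.names = NULL
)
write.csv(summary_tab, tempfile(fileext = ".csv"), row.names = FALSE)
summary_tab
```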
Re: [R] how to aggregate T-test result in an elegant way?
Thank you, it is really helpful every time. I didn't provide any example data because I thought it was just a question of how to report t.test() results in R. However, as you say, it is better to show more details so we can find an elegant way. In fact, I generated a 3-dimensional array like this:

str(a)
 num [1:2, 1:245, 1:3] 47.5 NA 48.9 NA 47.5 ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:2] "13%" "21%"
  ..$ : chr [1:245] "TWF2H101" "TWF2H105" "TWF2H106" "TWF2H110" ...
  ..$ : chr [1:3] "EW.INCU" "EW.17.5" "EMW"

I want to do a two-sample t-test of means between 13% and 21% for each variable (EW.INCU, EW.17.5, EMW). So I tried this code:

variable <- dimnames(a)[[3]]
O2 <- dimnames(a)[[1]]
for (i in variable) {
  print(i)
  print(O2[1])
  print(O2[2])
  # t.test() already drops NAs itself; na.rm is not one of its arguments and is ignored
  print(t.test(a[O2[1], , i], a[O2[2], , i], na.rm = T))
}

I don't think it is an elegant way, and I am inexperienced at reporting raw results. Could you give me more help?
Yao He

2013/1/7 arun <smartpink...@yahoo.com>:
Hi, You didn't provide any example data. So, I am not sure whether this helps.

set.seed(15)
dat1 <- data.frame(A = sample(10:20, 5, replace = TRUE), B = sample(18:28, 5, replace = TRUE), C = sample(25:35, 5, replace = TRUE), D = sample(20:30, 5, replace = TRUE))
dat2 <- dat1[, -1]  # dat2 is not defined in the post; columns B, C, D inferred from the AB/AC/AD labels
res <- lapply(lapply(seq_len(ncol(dat2)), function(i) t.test(dat2[, i], dat1[, 1], paired = TRUE)), function(x) data.frame(meanDiff = x$estimate, p.value = x$p.value))  # paired
names(res) <- paste("A", LETTERS[2:4], sep = "")
res <- do.call(rbind, res)
res
#    meanDiff     p.value
# AB      9.4 0.021389577
# AC     15.0 0.002570261
# AD     10.6 0.003971604
# or
res1 <- lapply(lapply(seq_len(ncol(dat2)), function(i) t.test(dat2[, i], dat1[, 1], paired = FALSE)), function(x) data.frame(mean = x$estimate, p.value = x$p.value))
names(res1) <- paste("A", LETTERS[2:4], sep = "")
res1 <- do.call(rbind, res1)
row.names(res1)[grep("mean of y", row.names(res1))] <- gsub("(.*\\.).*", "\\1A", row.names(res1)[grep("mean of y", row.names(res1))])
row.names(res1)[grep("mean of x", row.names(res1))] <- gsub("(\\w)(\\w)(\\.).*", "\\1\\2\\3\\2", row.names(res1)[grep("mean of x", row.names(res1))])
res1
#      mean      p.value
# AB.B 25.2 1.299192e-03
# AB.A 15.8 1.299192e-03
# AC.C 30.8 5.145519e-05
# AC.A 15.8 5.145519e-05
# AD.D 26.4 1.381339e-03
# AD.A 15.8 1.381339e-03
A.K.

----- Original Message -----
From: Yao He <yao.h.1...@gmail.com>
To: r-help@r-project.org
Sent: Sunday, January 6, 2013 10:20 PM
Subject: [R] how to aggregate T-test result in an elegant way?
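For the array layout described in this message, a loop-free sketch: sapply() over the third dimension runs one Welch t-test per variable and returns a matrix with one row per variable. The small array is hypothetical (random values, a few NAs, made-up IDs). Because t.test() removes NAs on its own, no na.rm argument is needed.

```r
# Hypothetical 2 x 8 x 3 array shaped like the one described above
set.seed(7)
a <- array(rnorm(2 * 8 * 3, mean = 45, sd = 5), dim = c(2, 8, 3),
           dimnames = list(c("13%", "21%"),
                           paste0("TWF2H", 101:108),
                           c("EW.INCU", "EW.17.5", "EMW")))
a["21%", c(1, 3), ] <- NA       # some animals have no 21% measurement

# One t-test per variable (third dimension); t() puts variables in rows
res <- t(sapply(dimnames(a)[[3]], function(v) {
  tt <- t.test(a["13%", , v], a["21%", , v])   # NAs are dropped by t.test()
  c(mean13  = unname(tt$estimate[1]),
    mean21  = unname(tt$estimate[2]),
    p.value = tt$p.value)
}))
res
```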
[R] how to read different files into different objects in one time?
Dear all,

I have a lot of files in a directory, as follows:
02-03.txt 03-04.txt 04-05.txt 05-06.txt 06-07.txt 07-08.txt 08-09.txt 09-10.txt G0.txt G1.txt raw_ped.txt ..

I want to read each of them into a separate object named after its file, such as:
02-03 <- read.table("02-03.txt", header = T)
03-04 <- read.table("03-04.txt", header = T)

I don't want to type hundreds of read.table() calls, so how can I read them all at once? I think the core problem is that I can't create different object names inside a loop or sapply(), but there may be a better way to do what I want.

Thanks a lot
Yao He
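A common R idiom for this, sketched under assumptions: rather than creating hundreds of separate objects, read every file into one named list. The two small files written into a temporary directory below are hypothetical stand-ins for the real ones. A name such as `02-03` is not a syntactic R name, so even as a separate object it could only be used with backticks. As a list element, it is simply `tabs[["02-03"]]`.

```r
# Hypothetical setup: two small whitespace-separated files in a temp directory
dir_path <- tempfile("tables")
dir.create(dir_path)
write.table(data.frame(id = 1:3, x = c(2.5, 3.1, 4.0)),
            file.path(dir_path, "02-03.txt"), row.names = FALSE)
write.table(data.frame(id = 1:2, x = c(1.0, 7.5)),
            file.path(dir_path, "03-04.txt"), row.names = FALSE)

# Every .txt file in the directory, one read.table() call each
files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
tabs  <- lapply(files, read.table, header = TRUE)
names(tabs) <- sub("\\.txt$", "", basename(files))   # "02-03", "03-04"

tabs[["02-03"]]   # access one table by its file name
```

If separate objects are really wanted, `list2env(tabs, envir = globalenv())` creates one per list element, but names like 02-03 would still need backticks every time they are used. Keeping them in a list also makes it easy to apply the same function to all tables with lapply().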
[R] how to handle NA values in aggregate()
Dear all,

I am trying to calculate the means of four columns in a data frame like this:

FID  MID     IID EW_INCU EW_17.5   EMW   EEratio
  1 4621  TWF2H5   45.26      NA 15.61        NA
  1 4621  TWF2H6   48.02   44.09 13.41 0.3041506
  2 4630 TWF2H19   51.44   47.81    NA        NA
  2 4631 TWF2H21      NA   52.72 16.70 0.3167678
  2 4632 TWF2H22   55.70   50.45 16.48 0.3266601
  2 4633 TWF2H23   44.42   40.89 12.96 0.3169479

I tried this code:
aggregate(df[, 4:7], df[, 1], mean)

But I couldn't set the argument na.rm = T in the mean() function, so the results are all NAs. Please tell me how to handle NA values when using aggregate().

Thanks a lot
Yao He
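A sketch of the usual answer, using the six rows posted above. With the data.frame method of aggregate(), any extra arguments after FUN are passed on to FUN, so `na.rm = TRUE` reaches mean(). Note also that `by` has to be a list, hence `list(FID = df$FID)` rather than `df[, 1]`. The formula interface behaves differently: its default `na.action = na.omit` drops every row that has an NA in any of the columns, which is not the same as averaging each column's non-missing values.

```r
# The six rows posted above
df <- data.frame(
  FID     = c(1, 1, 2, 2, 2, 2),
  MID     = c(4621, 4621, 4630, 4631, 4632, 4633),
  IID     = c("TWF2H5", "TWF2H6", "TWF2H19", "TWF2H21", "TWF2H22", "TWF2H23"),
  EW_INCU = c(45.26, 48.02, 51.44, NA, 55.70, 44.42),
  EW_17.5 = c(NA, 44.09, 47.81, 52.72, 50.45, 40.89),
  EMW     = c(15.61, 13.41, NA, 16.70, 16.48, 12.96),
  EEratio = c(NA, 0.3041506, NA, 0.3167678, 0.3266601, 0.3169479)
)

# Column-wise means per FID; NAs are skipped separately in each column
res <- aggregate(df[, 4:7], by = list(FID = df$FID), FUN = mean, na.rm = TRUE)
res
```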
[R] How to select a subset data to do a barplot in ggplot2
Hi everybody,

I have a data frame like this:

FID  IID  STATUS
  1 4621    live
  1 4628    dead
  2 4631    live
  2 4632    live
  2 4633    live
  2 4634    live
  6 4675    live
  6 4679    dead
 10 4716    dead
 10 4719    live
 10 4721    dead
 11 4726    live
 11 4728 nosperm
 11 4730 nosperm
 12 4732    live
 17 4783    live
 17 4783    live
 17 4784    live

I just want a barplot that counts "live" or "dead" in every FID, with the bars filled in different colours. I tried this code:

p <- ggplot(data, aes(x = FID))
p + geom_bar(aes(x = factor(FID), y = ..count.., fill = STATUS))

But how can I exclude "nosperm" (or other levels) using only ggplot2, without generating another data frame?

Thanks a lot
Yao He
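One way to do this without creating a named subset, sketched under assumptions: ggplot2 is installed, and the rows below re-type the posted data into a data frame called `data`, as in the question. Passing a subset() expression as the plot's data means the filtered copy exists only inside the plot object. geom_bar() counts rows by default, so `y = ..count..` is not needed.

```r
library(ggplot2)

# The posted data
data <- data.frame(
  FID = c(1, 1, 2, 2, 2, 2, 6, 6, 10, 10, 10, 11, 11, 11, 12, 17, 17, 17),
  IID = c(4621, 4628, 4631, 4632, 4633, 4634, 4675, 4679, 4716, 4719,
          4721, 4726, 4728, 4730, 4732, 4783, 4783, 4784),
  STATUS = c("live", "dead", "live", "live", "live", "live", "live", "dead",
             "dead", "live", "dead", "live", "nosperm", "nosperm", "live",
             "live", "live", "live")
)

# Keep only the levels of interest inside the ggplot() call itself
p <- ggplot(subset(data, STATUS %in% c("live", "dead")),
            aes(x = factor(FID), fill = STATUS)) +
  geom_bar() +                  # stat_count: one stacked segment per FID/STATUS
  labs(x = "FID", y = "count")
p
```

Listing the levels to keep with `%in%` scales to "nosperm or other levels"; `STATUS != "nosperm"` would work for the single case.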