[R] Is there an efficient way to find the overlapping, upstream and downstream ranges for a bunch of ranges

2016-04-05 Thread Yao He
I have a bunch of genes (~50000) from the whole genome, which I
read in as genomic ranges.

A range (gene) can be seen as an observation with three columns (chromosome,
start and end), like this:

      seqnames start end width strand
gene1     chr1     1   5     5      +
gene2     chr1    10  15     6      +
gene3     chr1    12  17     6      +
gene4     chr1    20  25     6      +
gene5     chr1    30  40    11      +

I am wondering whether there is an efficient way to find the *overlapping,
upstream and downstream genes for each gene in the GRanges*.

For example, assuming all_genes_gr is a genomic range of ~50000 genes, the
result I want looks like this:
gene_name upstream_gene downstream_gene overlapped_gene
gene1     NA            gene2           NA
gene2     gene1         gene4           gene3
gene3     gene1         gene4           gene2
gene4     gene3         gene5           NA

Currently, the strategy I use is this:

library(GenomicRanges)

find_overlapped_gene <- function(idx, all_genes_gr) {
  #cat(idx, "\n")
  curr_gene <- all_genes_gr[idx]
  other_genes <- all_genes_gr[-idx]
  n <- countOverlaps(curr_gene, other_genes)
  gene <- subsetByOverlaps(curr_gene, other_genes)
  return(list(n, gene))
}

system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))

However, for 100 genes it takes nearly 8 s according to system.time(). That
means that with ~50000 genes it would take about one hour just to find the
overlapping genes.

I am wondering whether there is any algorithm or strategy to do that
efficiently, perhaps ~50000 genes in ~10 min or even less.

Yao He
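
One possible vectorised alternative to the per-gene loop above, offered only as a sketch: findOverlaps(), follow() and precede() from GenomicRanges each process the whole object in a single call. The toy object gr below is rebuilt from the five example genes; mapping "upstream" to follow() and "downstream" to precede() assumes the strand-aware meaning of those words, and the comma-separated overlapped_gene column is an illustrative choice rather than something the post specifies.

```r
library(GenomicRanges)

# Toy data rebuilt from the example table in the post (all on chr1, + strand).
gr <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(1, 10, 12, 20, 30), end = c(5, 15, 17, 25, 40)),
  strand = "+"
)
names(gr) <- paste0("gene", 1:5)

# All pairwise overlaps in one call; drop the trivial self-hits.
hits <- findOverlaps(gr, gr)
hits <- hits[queryHits(hits) != subjectHits(hits)]

# Collapse the names of the overlapping genes per query gene.
by_query <- split(names(gr)[subjectHits(hits)], queryHits(hits))
overlapped <- rep(NA_character_, length(gr))
overlapped[as.integer(names(by_query))] <-
  vapply(by_query, paste, character(1), collapse = ",")

# follow(): nearest non-overlapping range that comes before each gene (upstream).
# precede(): nearest non-overlapping range that comes after each gene (downstream).
result <- data.frame(
  gene_name       = names(gr),
  upstream_gene   = names(gr)[follow(gr, gr)],
  downstream_gene = names(gr)[precede(gr, gr)],
  overlapped_gene = overlapped,
  stringsAsFactors = FALSE
)
print(result)
```

Because nothing is subset gene by gene, the same calls apply unchanged to all_genes_gr; how long that takes on the full data set has not been measured here.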


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Transform a list of multiple to a data.frame which I want

2015-02-01 Thread Yao He
Dear all:

I have a list like the one below, which is the standard output of the
str_locate_all() function (stringr package):
$K
   start end
$GSEGTCSCSSK
   start end
[1,] 6   6
[2,] 8   8
$GFSTTCPAHVDDLTPEQVLDGDVNELMDVVLHHVPEAK
   start end
[1,] 6   6
$LVECIGQELIFLLPNK
   start end
[1,] 4   4
$NFK
   start end
$HR
   start end
$AYASLFR
   start end

I want to transform this list into something like this:

ID                                      start.1  start.2
K                                       NA       NA
GSEGTCSCSSK                             6        8
GFSTTCPAHVDDLTPEQVLDGDVNELMDVVLHHVPEAK  6        NA
LVECIGQELIFLLPNK                        4        NA
NFK                                     NA       NA
HR                                      NA       NA
AYASLFR                                 NA       NA

I have already tried t() and lapply(), but I find it hard to handle the
NA values and the different number of rows in each matrix.

Thanks in advance
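
One way to get there in base R, as a minimal sketch: pad the start column of every matrix with NA up to the largest number of matches, then bind the padded vectors row by row. The helper loc_matrix() and the shortened list loc are hypothetical stand-ins that only mimic the shape of the str_locate_all() output shown above.

```r
# Hypothetical helper that builds a start/end matrix shaped like one element
# of str_locate_all() output (zero rows when there is no match).
loc_matrix <- function(pos) {
  matrix(as.integer(c(pos, pos)), ncol = 2,
         dimnames = list(NULL, c("start", "end")))
}

loc <- list(
  K                = loc_matrix(integer(0)),
  GSEGTCSCSSK      = loc_matrix(c(6, 8)),
  LVECIGQELIFLLPNK = loc_matrix(4),
  NFK              = loc_matrix(integer(0))
)

# Largest number of matches found in any string.
n_max <- max(vapply(loc, nrow, integer(1)))

# Indexing past the end of a vector yields NA, which pads the short rows.
starts <- do.call(rbind, lapply(loc, function(m) m[, "start"][seq_len(n_max)]))
dimnames(starts) <- list(NULL, paste0("start.", seq_len(n_max)))

res <- data.frame(ID = names(loc), starts, stringsAsFactors = FALSE)
print(res)
```

The same padding idea works for the end column if it is needed as well.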


[R] Do association study based on mixed linear model

2013-03-19 Thread Yao He
Dear All

I want to do an association study based on a mixed linear model.

My model includes not only several fixed effects and random effects but
also some covariates, such as birth weight.
The data comprise about 180 individuals and 12 variables, with 6
fixed-effect estimates.

As asreml-R is not free, are there any packages suitable for my study?
I have heard of nlme and lme4, but I'm not sure whether they can incorporate
covariates, and what about their computational efficiency?

Thanks for your recommendations

Yao He
—
Master's candidate, 2nd year
Department of Animal Genetics & Breeding
Room 436, College of Animal Science & Technology,
China Agricultural University, Beijing, 100193
E-mail: yao.h.1...@gmail.com
——



Re: [R] How to transpose it in a fast way?

2013-03-13 Thread Yao He
Thanks for everybody's help!

I learned a lot from this discussion!



2013/3/10 jim holtman jholt...@gmail.com:
 Did you check out the 'colbycol' package.

 On Fri, Mar 8, 2013 at 5:46 PM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 03/08/2013 06:01 AM, Jan van der Laan wrote:


 You could use the fact that scan() reads the data row-wise, and the fact that
 arrays are stored column-wise:

 # generate a small example dataset
 exampl <- array(letters[1:25], dim=c(5,5))
 write.table(exampl, file="example.dat", row.names=FALSE, col.names=FALSE,
      sep="\t", quote=FALSE)

 # and read...
 d <- scan("example.dat", what=character())
 d <- array(d, dim=c(5,5))

 t(exampl) == d


 Although this is probably faster, it doesn't help with the large size.
 You could use the n option of scan() to read chunks/blocks and feed those to,
 for example, an ff array (which you ideally have preallocated).


 I think it's worth asking what the overall goal is; all we get from this
 exercise is another large file that we can't easily manipulate in R!

 But nothing like a little challenge. The idea I think would be to
 transpose in chunks of rows by scanning in some number of rows and writing
 to a temporary file

 tpose1 <- function(fin, nrowPerChunk, ncol) {
     v <- scan(fin, character(), nmax=ncol * nrowPerChunk)
     m <- matrix(v, ncol=ncol, byrow=TRUE)
     fout <- tempfile()
     write(m, fout, nrow(m), append=TRUE)
     fout
 }

 Apparently the data is 60k x 60k, so we could maybe easily read 60k x 10k
 at a time from some file fl <- "big.txt"

 ncol <- 60000L
 nrowPerChunk <- 10000L
 nChunks <- ncol / nrowPerChunk

 fin <- file(fl); open(fin)
 fls <- replicate(nChunks, tpose1(fin, nrowPerChunk, ncol))
 close(fin)

 'fls' is now a vector of file paths, each containing a transposed slice of
 the matrix. The next task is to splice these together. We could do this by
 taking a slice of rows from each file, cbind'ing them together, and writing
 to an output

 splice <- function(fout, cons, nrowPerChunk, ncol) {
     slices <- lapply(cons, function(con) {
         v <- scan(con, character(), nmax=nrowPerChunk * ncol)
         matrix(v, nrowPerChunk, byrow=TRUE)
     })
     m <- do.call(cbind, slices)
     write(t(m), fout, ncol(m), append=TRUE)
 }

 We'd need to use open connections as inputs and output

 cons <- lapply(fls, file); for (con in cons) open(con)
 fout <- file("big_transposed.txt"); open(fout, "w")
 xx <- replicate(nChunks, splice(fout, cons, nrowPerChunk,
                                 nrowPerChunk))
 for (con in cons) close(con)
 close(fout)

 As another approach, it looks like the data are from genotypes. If they
 really only consist of pairs of A, C, G, T, then two pairs e.g., 'AA' 'CT'
 could be encoded as a single byte

 alf <- c("A", "C", "G", "T")
 nms <- outer(alf, alf, paste0)
 map <- outer(setNames(as.raw(0:15), nms),
              setNames(as.raw(bitwShiftL(0:15, 4)), nms),
              "|")

 with e.g.,

  > map[matrix(c("AA", "CT"), ncol=2)]
 [1] d0

 This translates the problem of representing the 60k x 60k array as a 3.6
 billion element vector of 60k * 60k * 8 bytes (approx. 30 Gbytes) to one of
 60k x 30k = 1.8 billion elements (fits in R-2.15 vectors) of approx 1.8
 Gbyte (probably usable in an 8 Gbyte laptop).

 Personally, I would probably put this data in a netCDF / rhdf5 file.
 Perhaps I'd use snpStats or GWASTools in Bioconductor
 http://bioconductor.org.

 Martin


 HTH,

 Jan




 peter dalgaard pda...@gmail.com wrote:

  On Mar 7, 2013, at 01:18 , Yao He wrote:

  Dear all:

 I have a big data file of 60000 columns and 60000 rows like this:

 AA AC AA AA ...AT
 CC CC CT CT...TC
 ..
 .

 I want to transpose it and the output is a new like that
 AA CC 
 AC CC
 AA CT.
 AA CT.
 
 
 AT TC.

 The key point is that I can't read it into R with read.table() because the
 data is too large, so I tried this:
 c <- file("silygenotype.txt", "r")
 geno_t <- list()
 repeat {
   line <- readLines(c, n=1)
   if (length(line) == 0) break  # end of file
   line <- unlist(strsplit(line, "\t"))
   geno_t <- cbind(geno_t, line)
 }
 write.table(geno_t, "xxx.txt")

 It works, but it is too slow. How can I optimize it?



 As others have pointed out, that's a lot of data!

 You seem to have the right idea: If you read the columns line by line
 there is nothing to transpose. A couple of points, though:

 - The cbind() is a potential performance hit since it copies the list
 every time around. geno_t <- vector("list", 60000) and then
 geno_t[[i]] <- etc

 - You might use scan() instead of readLines, strsplit

 - Perhaps consider the data type as you seem to be reading strings with
 16 possible values (I suspect that R already optimizes string storage to
 make this point moot, though.)

 --
 Peter Dalgaard, Professor
 Center for Statistics, Copenhagen Business School
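
The first two suggestions above (preallocate the list, read each line with scan()) can be sketched roughly as follows. The tiny temporary file and the names infile, n_rows and rows are made up for illustration; with the real data the row count would be 60000 and the result would still need enough memory to hold.

```r
# Write a tiny tab-separated genotype file to stand in for the real one.
infile <- tempfile(fileext = ".txt")
writeLines(c("AA\tAC\tAT", "CC\tCT\tTC"), infile)

n_rows <- 2L                       # number of lines in the input file
rows <- vector("list", n_rows)     # preallocated, so nothing is re-copied

con <- file(infile, "r")
for (i in seq_len(n_rows)) {
  # scan() on an open connection continues from the current position,
  # so each call consumes exactly one line of the file.
  rows[[i]] <- scan(con, what = character(), nlines = 1, sep = "\t",
                    quiet = TRUE)
}
close(con)

# Each input line becomes one column, which is the transpose of the file.
geno_t <- do.call(cbind, rows)
print(geno_t)
```

The single do.call(cbind, rows) at the end replaces the repeated cbind() inside the loop, which is where the copying cost comes from.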

[R] how to read a df like that and transform it?

2013-01-23 Thread Yao He
Dear all

I have a data.frame like this:

father  mother  num_daughter  daughter
291     3906    0             NULL
275     4219    0             NULL
273     4236    1             49410
281     4163    1             49408
274     4226    1             49406
295     3869    2             49403
49404
287     4113    0             NULL
295     3871    1             49401
292     3895    4             49396
49397
49398
49399
291     3900    3             49392

How can I read it into R and transform it like this:

father  mother  num_daughter  daughter1  daughter2  daughter3  daughter4
291     3906    0             NULL
275     4219    0             NULL
273     4236    1             49410
281     4163    1             49408
274     4226    1             49406
295     3869    2             49403      49404
287     4113    0             NULL
295     3871    1             49401
292     3895    4             49396      49397      49398      49399
291     3900    3             49392

library(plyr), library(reshape2) and other good packages are OK for me.

Thanks a lot!
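
A base-R sketch of one way to do this: treat every line with four fields as the start of a new family, let the one-field continuation lines inherit the family of the line above, and pad the daughters with NA. It assumes the columns split as father / mother / num_daughter / daughter and that continuation lines hold a single daughter id; the vector raw is a shortened copy of the example and all variable names are illustrative.

```r
raw <- c(
  "father mother num_daughter daughter",
  "291 3906 0 NULL",
  "273 4236 1 49410",
  "295 3869 2 49403",
  "49404",
  "292 3895 4 49396",
  "49397",
  "49398",
  "49399"
)

# Split every data line into whitespace-separated fields (header dropped).
fields <- strsplit(trimws(raw[-1]), "[[:space:]]+")

# A line with four fields starts a new family; shorter lines continue it.
is_parent <- lengths(fields) == 4
family <- cumsum(is_parent)
parents <- do.call(rbind, fields[is_parent])

# The last field of every line is a daughter id (or the NULL placeholder).
last_field <- vapply(fields, function(f) f[length(f)], character(1))
daughters <- lapply(split(last_field, family),
                    function(d) as.integer(d[d != "NULL"]))

# Pad each family with NA up to the largest number of daughters.
n_max <- max(lengths(daughters))
wide <- do.call(rbind, lapply(daughters, function(d) d[seq_len(n_max)]))
dimnames(wide) <- list(NULL, paste0("daughter", seq_len(n_max)))

res <- data.frame(
  father       = as.integer(parents[, 1]),
  mother       = as.integer(parents[, 2]),
  num_daughter = as.integer(parents[, 3]),
  wide
)
print(res)
```

For a real file, raw would come from readLines() on that file instead of being typed in.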

Yao He
—
Master's candidate, 2nd year
Department of Animal Genetics & Breeding
Room 436, College of Animal Science & Technology,
China Agricultural University, Beijing, 100193
E-mail: yao.h.1...@gmail.com
——


Re: [R] how to generate a matrix by an my data.frame

2013-01-10 Thread Yao He
Thanks a lot
it works!

2013/1/11 Rui Barradas ruipbarra...@sapo.pt:
 Hello,

 Here are two ways.

 dat <- read.table(text = "
 id1   id2   value
 2353  2353  0.096313
 2353  2409  0.301773
 [...etc...]

 2356  2356  0
 2356  2611  0
 2611  2611  0
 ", header = TRUE)

 mat1 <- matrix(nrow = 53, ncol = 53)  # initialize with NA's
 mat1[upper.tri(mat1, diag = TRUE)] <- dat$value

 mat2 <- matrix(0, nrow = 53, ncol = 53)  # initialize with zeros
 mat2[upper.tri(mat2, diag = TRUE)] <- dat$value


 Hope this helps,

 Rui Barradas
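
 A note on ordering, with a sketch of an order-independent variant: positions selected by upper.tri() are filled column by column, so the assignments above rely on the rows of dat arriving in exactly that order. Looking each id pair up by position avoids the dependence. The six rows below are copied from the posted data; the names ids and mat, and the final mirroring step that makes the matrix symmetric, are illustrative assumptions rather than part of the original reply.

```r
dat <- read.table(text = "
id1   id2   value
2353  2353  0.096313
2353  2409  0.301773
2353  2500  0.169518
2409  2409  0.096313
2409  2500  0.169518
2500  2500  0.048615
", header = TRUE)

# Every id that occurs, in order of first appearance.
ids <- unique(c(dat$id1, dat$id2))

mat <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
              dimnames = list(as.character(ids), as.character(ids)))

# A two-column index matrix addresses one (row, column) cell per data row.
mat[cbind(match(dat$id1, ids), match(dat$id2, ids))] <- dat$value

# Optionally mirror the upper triangle into the lower one.
mat[lower.tri(mat)] <- t(mat)[lower.tri(mat)]
print(mat)
```

 With the full data the same code gives a 53 x 53 matrix without the dimensions being typed in by hand.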
 On 10-01-2013 15:21, Yao He wrote:

 Dear All

 It is a little hard to give a good small example of my question, so I
 will show the full data at the bottom and in the attachment. Maybe
 someone could tell me an appropriate way
 to show it. I'm sorry for the inconvenience.


 Q: How can I generate a 53*53 diagonal matrix from my data?
 The problems that confuse me are:
 1. Since it is a diagonal matrix, I have tried to transform col1 and
 col2 into a row index and a column index, but I don't know how to
 generate a matrix from the indices of its values.
 2. As you can see, 2353 is paired with 53 ids in col2, whereas 2409 is
 paired with 52 ids in col2, 2500 with 51 values and so on, so it is
 hard to use matrix() to generate it.

 id1   id2   value
 2353  2353  0.096313
 2353  2409  0.301773
 2353  2500  0.169518
 2353  2598  0.11274
 2353  2610  0.107414
 2353  2300  0.034492
 2353  2507  0.037521
 2353  2530  0.064125
 2353  2327  0.029259
 2353  2389  0.036423
 2353  2408  0.029259
 2353  2463  0.036423
 2353  2420  0.04409
 2353  2563  0.055038
 2353  2462  0.046478
 2353  2292  0.036369
 2353  2405  0.036369
 2353  2543  0.053413
 2353  2557  0.058151
 2353  2583  0.081512
 2353  2322  0.044373
 2353  2535  0.04847
 2353  2536  0.035538
 2353  2581  0.035538
 2353  2570  0.07711
 2353  2476  0.047081
 2353  2534  0.047081
 2353  2280  0.088264
 2353  2316  0.073608
 2353  2339  0.067307
 2353  2331  0.061172
 2353  2343  0.060425
 2353  2352  0.041153
 2353  2293  0.040764
 2353  2338  0.045128
 2353  2449  0.040764
 2353  2296  0.061333
 2353  2453  0.046074
 2353  2460  0.060387
 2353  2474  0.060387
 2353  2603  0.060387
 2353  2282  0.048065
 2353  2313  0.05584
 2353  2538  0.050873
 2353  2522  0.065727
 2353  2489  0.041023
 2353  2564  0.039696
 2353  2594  0.056946
 2353  2274  0.060875
 2353  2451  0.037468
 2353  2321  0
 2353  2356  0
 2353  2611  0
 2409  2409  0.096313
 2409  2500  0.169518
 2409  2598  0.11274
 2409  2610  0.107414
 2409  2300  0.034492
 2409  2507  0.037521
 2409  2530  0.064125
 2409  2327  0.029259
 2409  2389  0.036423
 2409  2408  0.029259
 2409  2463  0.036423
 2409  2420  0.04409
 2409  2563  0.055038
 2409  2462  0.046478
 2409  2292  0.036369
 2409  2405  0.036369
 2409  2543  0.053413
 2409  2557  0.058151
 2409  2583  0.081512
 2409  2322  0.044373
 2409  2535  0.04847
 2409  2536  0.035538
 2409  2581  0.035538
 2409  2570  0.07711
 2409  2476  0.047081
 2409  2534  0.047081
 2409  2280  0.088264
 2409  2316  0.073608
 2409  2339  0.067307
 2409  2331  0.061172
 2409  2343  0.060425
 2409  2352  0.041153
 2409  2293  0.040764
 2409  2338  0.045128
 2409  2449  0.040764
 2409  2296  0.061333
 2409  2453  0.046074
 2409  2460  0.060387
 2409  2474  0.060387
 2409  2603  0.060387
 2409  2282  0.048065
 2409  2313  0.05584
 2409  2538  0.050873
 2409  2522  0.065727
 2409  2489  0.041023
 2409  2564  0.039696
 2409  2594  0.056946
 2409  2274  0.060875
 2409  2451  0.037468
 2409  2321  0
 2409  2356  0
 2409  2611  0
 2500  2500  0.048615
 2500  2598  0.051979
 2500  2610  0.041031
 2500  2300  0.032974
 2500  2507  0.052788
 2500  2530  0.041435
 2500  2327  0.038071
 2500  2389  0.051659
 2500  2408  0.038071
 2500  2463  0.051659
 2500  2420  0.052635
 2500  2563  0.07872
 2500  2462  0.048615
 2500  2292  0.044365
 2500  2405  0.044365
 2500  2543  0.04277
 2500  2557  0.051109
 2500  2583  0.047409
 2500  2322  0.054512
 2500  2535  0.039368
 2500  2536  0.041763
 2500  2581  0.041763
 2500  2570  0.063148
 2500  2476  0.043755
 2500  2534  0.043755
 2500  2280  0.063164
 2500  2316  0.083961
 2500  2339  0.074127
 2500  2331  0.051094
 2500  2343  0.060066
 2500  2352  0.038208
 2500  2293  0.048698
 2500  2338  0.048218
 2500  2449  0.048698
 2500  2296  0.073212
 2500  2453  0.070595
 2500  2460  0.073677
 2500  2474  0.073677
 2500  2603  0.073677
 2500  2282  0.073677
 2500  2313  0.068443
 2500  2538  0.079865
 2500  2522  0.059723
 2500  2489  0.049873
 2500  2564  0.087639
 2500  2594  0.05175
 2500  2274  0.043396
 2500  2451  0.046532
 2500  2321  0
 2500  2356

Re: [R] how to count A, C, T, G in each row in a big data.frame?

2013-01-09 Thread Yao He
,
 TT, TT, CC, TT, CC, CC, TT, CC, AG, GG,
 GA, GG, GT, CT, GA, CT, AA, AA, GA), X2460 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, AG, GG,
 GG, GG, TG, CT, GG, CC, AA, AA, AA), X2474 = c(AA,
 TC, TT, CC, TC, CC, CC, TT, CC, GA, AG,
 AG, GG, TT, CC, AG, TC, AA, AA, GA), X2603 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, AG,
 AG, GG, TT, CC, AG, CC, AA, AA, GA), X2282 = c(GA,
 TC, TT, CC, TC, CC, CC, TT, CC, GG, GG,
 AA, GG, TT, TT, AA, CC, AA, AA, GA), X2313 = c(AG,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, AG,
 GA, GG, GT, CC, GA, CT, AA, AA, AA), X2538 = c(AA,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, AA,
 AG, GG, TG, CC, AG, CC, AA, AA, AA), X2522 = c(AG,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GA, GG, TT, TC, GG, CC, AA, AG, GA), X2489 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GG, GG, GT, TC, AG, CC, AA, AA, AG), X2564 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GA, GG,
 GG, GG, TT, CC, AA, CT, AA, AA, AA), X2594 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, AA,
 AG, GG, TT, TC, AG, TC, AA, AA, AG), X2274 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GG, GG, TT, CT, GG, CC, AA, AA, GA), X2451 = c(AG,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GG, GG, TT, CT, GG, CC, AA, AA, GA), X2321 = c(GG,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, AG,
 AA, GG, TT, TT, AA, CC, AA, AA, AA), X2356 = c(AA,
 TC, TT, CC, TC, CC, CC, TT, CC, GA, AG,
 AG, GG, TG, TC, AG, TT, AA, AA, AA), X2611 = c(AG,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, GG,
 GA, GG, TT, CT, GA, TT, AA, AA, AG)), .Names = c(name,
 chr, pos, strand, X2353, X2409, X2500, X2598, X2610,
 X2300, X2507, X2530, X2327, X2389, X2408, X2463,
 X2420, X2563, X2462, X2292, X2405, X2543, X2557,
 X2583, X2322, X2535, X2536, X2581, X2570, X2476,
 X2534, X2280, X2316, X2339, X2331, X2343, X2352,
 X2293, X2338, X2449, X2296, X2453, X2460, X2474,
 X2603, X2282, X2313, X2538, X2522, X2489, X2564,
 X2594, X2274, X2451, X2321, X2356, X2611), row.names =
 27412:27431, class = data.frame)

 # create a 'key' of characters in the X columns
 indx <- which(grepl("^X", names(x)))

 x$key <- apply(x[, indx], 1, paste, collapse = '')

 # create counts
 counts <- t(apply(x, 1, function(z){
     c(A = nchar(gsub("[^A]", '', z['key']))
     , C = nchar(gsub("[^C]", '', z['key']))
     , G = nchar(gsub("[^G]", '', z['key']))
     , T = nchar(gsub("[^T]", '', z['key']))
     )
 }))

 # output

 > counts
       A.key C.key G.key T.key
 27412    81     0    25     0
 27413     0    29     0    77
 27414     0     0     0   106
 27415     0   106     0     0
 27416     0    27     0    79
 27417     0   106     0     0
 27418     0   106     0     0
 27419     0     0     0   106
 27420     0   106     0     0
 27421    10     0    96     0
 27422    37     0    69     0
 27423    39     0    67     0
 27424     4     0   102     0
 27425     0     0    20    86
 27426     0    65     0    41
 27427    40     0    66     0
 27428     0    78     0    28
 27429   106     0     0     0
 27430    97     0     9     0
 27431    68     0    38     0


 On Wed, Jan 9, 2013 at 9:23 AM, Yao He yao.h.1...@gmail.com wrote:
 Dear All

 I have a data.frame like that:
 structure(list(name = c(Gga_rs10722041, Gga_rs10722249, Gga_rs10722565,
 Gga_rs10723082, Gga_rs10723993, Gga_rs10724555, Gga_rs10726238,
 Gga_rs10726461, Gga_rs10726774, Gga_rs10726967, Gga_rs10727581,
 Gga_rs10728004, Gga_rs10728156, Gga_rs10728177, Gga_rs10728373,
 Gga_rs10728585, Gga_rs10729598, Gga_rs10729643, Gga_rs10729685,
 Gga_rs10729827), chr = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), pos = c(11248993L,
 20038370L, 16164457L, 38050527L, 20307106L, 13707090L, 12230458L,
 36732967L, 2790856L, 1305785L, 29631963L, 13606593L, 13656397L,
 2261611L, 32096703L, 13733153L, 16524147L, 558735L, 12514023L,
 3619538L), strand = c(+, +, +, +, +, +, +, +,
 +, +, +, +, +, +, +, +, +, +, +, +),
 X2353 = c(AA, TT, TT, CC, TT, CC, CC, TT,
 CC, GG, AG, AG, AG, TT, CC, AG, CC, AA,
 GG, GG), X2409 = c(AA, CT, TT, CC, CT, CC,
 CC, TT, CC, GG, GG, AG, AG, TT, CC, AG,
 CC, AA, AG, GA), X2500 = c(GA, TT, TT, CC,
 TT, CC, CC, TT, CC, GG, GG, GG, GG, GT,
 CT, GG, CC, AA, AA, AA), X2598 = c(AA, TT,
 TT, CC, TT, CC, CC, TT, CC, GG, AA, AG,
 GG, TT, CC, AG, TC, AA, AA, AG), X2610 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 GA, GG, TT, CC, GA, CC, AA, AA, GA), X2300 = c(GA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 AA, AG, TT, TC, AA, TC, AA, AG, AA), X2507 = c(AG,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GA, GG, TT, TC, GG, CC, AA, GA, AG), X2530 = c(AG,
 TC, TT, CC, TC, CC, CC, TT, CC, GG, AA,
 GG, GG, TT, CC, GG, CC, AA, AA, AA), X2327 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 GG, GG, TT, TC, GG, CC, AA, AA, AA), X2389 = c(AA,
 CC, TT, CC, CC, CC, CC, TT, CC, AG, GG,
 AG, GG, TT, TC, AG, CC, AA

Re: [R] how to count A, C, T, G in each row in a big data.frame?

2013-01-09 Thread Yao He
Thanks a lot.

The problem is that I don't know how to handle the output list, as I
want to calculate the frequency of A, G, T or C by row.


Yao He
2013/1/10 Jessica Streicher j.streic...@micromata.de:
 Sorry, you wanted rows; I wrote it for columns.

 #rows would be:
 test2 <- apply(test[,-c(1:4)], 1, function(x){table(t(x))})

 #find single values in a row
 sapply(test2, function(row){
     allVars <- paste(names(row), collapse="")
     u <- unique(strsplit(allVars, "")[[1]])
     parts <- sapply(names(row), function(x){u %in% strsplit(x, "")[[1]]})
     mat <- parts %*% row
     rownames(mat) <- u
     mat
 })

 Though I guess lists aren't ideal, but there's another answer as well, I see.

 On 09.01.2013, at 15:23, Yao He wrote:

 Dear All

 I have a data.frame like that:
 structure(list(name = c(Gga_rs10722041, Gga_rs10722249, Gga_rs10722565,
 Gga_rs10723082, Gga_rs10723993, Gga_rs10724555, Gga_rs10726238,
 Gga_rs10726461, Gga_rs10726774, Gga_rs10726967, Gga_rs10727581,
 Gga_rs10728004, Gga_rs10728156, Gga_rs10728177, Gga_rs10728373,
 Gga_rs10728585, Gga_rs10729598, Gga_rs10729643, Gga_rs10729685,
 Gga_rs10729827), chr = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), pos = c(11248993L,
 20038370L, 16164457L, 38050527L, 20307106L, 13707090L, 12230458L,
 36732967L, 2790856L, 1305785L, 29631963L, 13606593L, 13656397L,
 2261611L, 32096703L, 13733153L, 16524147L, 558735L, 12514023L,
 3619538L), strand = c(+, +, +, +, +, +, +, +,
 +, +, +, +, +, +, +, +, +, +, +, +),
X2353 = c(AA, TT, TT, CC, TT, CC, CC, TT,
CC, GG, AG, AG, AG, TT, CC, AG, CC, AA,
GG, GG), X2409 = c(AA, CT, TT, CC, CT, CC,
CC, TT, CC, GG, GG, AG, AG, TT, CC, AG,
CC, AA, AG, GA), X2500 = c(GA, TT, TT, CC,
TT, CC, CC, TT, CC, GG, GG, GG, GG, GT,
CT, GG, CC, AA, AA, AA), X2598 = c(AA, TT,
TT, CC, TT, CC, CC, TT, CC, GG, AA, AG,
GG, TT, CC, AG, TC, AA, AA, AG), X2610 = c(AA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
GA, GG, TT, CC, GA, CC, AA, AA, GA), X2300 = c(GA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
AA, AG, TT, TC, AA, TC, AA, AG, AA), X2507 = c(AG,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
GA, GG, TT, TC, GG, CC, AA, GA, AG), X2530 = c(AG,
TC, TT, CC, TC, CC, CC, TT, CC, GG, AA,
GG, GG, TT, CC, GG, CC, AA, AA, AA), X2327 = c(AA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
GG, GG, TT, TC, GG, CC, AA, AA, AA), X2389 = c(AA,
CC, TT, CC, CC, CC, CC, TT, CC, AG, GG,
AG, GG, TT, TC, AG, CC, AA, AA, AA), X2408 = c(AA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
GA, GG, TT, CC, GA, CC, AA, AA, AG), X2463 = c(AA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
GG, GG, TT, CT, GG, CC, AA, AA, GA), X2420 = c(GA,
TC, TT, CC, TC, CC, CC, TT, CC, GG, AG,
GG, GG, TG, TT, GG, CT, AA, AA, AA), X2563 = c(GA,
CC, TT, CC, TC, CC, CC, TT, CC, GG, GA,
GG, GG, GT, TT, GG, CT, AA, AA, AA), X2462 = c(AA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, AA,
GG, GG, GT, TC, GG, CC, AA, AA, AA), X2292 = c(GA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
AA, GG, TG, TC, AA, TC, AA, AA, AA), X2405 = c(GA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
AG, GG, TG, TT, AA, CT, AA, AA, AA), X2543 = c(AA,
TC, TT, CC, TC, CC, CC, TT, CC, GA, GA,
GA, GG, TT, CT, GA, TT, AA, AA, GG), X2557 = c(AG,
CT, TT, CC, CT, CC, CC, TT, CC, GG, AG,
GA, GG, GT, CT, GA, CT, AA, AA, AG), X2583 = c(GA,
CT, TT, CC, CT, CC, CC, TT, CC, GG, GA,
GG, GG, GG, CT, GA, CT, AA, AA, AG), X2322 = c(AG,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
GG, GG, GT, TT, GG, CC, AA, AA, GA), X2535 = c(AA,
TC, TT, CC, TT, CC, CC, TT, CC, GG, GA,
GG, GG, TT, CC, GG, CC, AA, AA, AG), X2536 = c(GA,
TC, TT, CC, TC, CC, CC, TT, CC, GG, GG,
AG, GG, TT, TC, AG, TC, AA, AA, GA), X2581 = c(AG,
CT, TT, CC, CT, CC, CC, TT, CC, GG, GG,
GA, GG, TT, CC, GA, CT, AA, AA, AG), X2570 = c(AA,
CT, TT, CC, CT, CC, CC, TT, CC, GG, GG,
GG, GG, TT, TC, GG, CC, AA, AA, GG), X2476 = c(AA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
GG, GG, GT, TC, AG, CC, AA, AA, AG), X2534 = c(GA,
TC, TT, CC, TC, CC, CC, TT, CC, GG, GA,
AG, GG, TG, CC, AG, TC, AA, AA, AA), X2280 = c(AA,
TC, TT, CC, TC, CC, CC, TT, CC, GG, AG,
AG, GG, TT, CC, GG, CC, AA, AA, AG), X2316 = c(AA,
CC, TT, CC, CC, CC, CC, TT, CC, AG, AA,
AA, AG, TT, TC, GG, CT, AA, GG, GG), X2339 = c(AA,
CC, TT, CC, CC, CC, CC, TT, CC, GA, AA,
GG, GG, GT, CT, GG, TT, AA, AA, AG), X2331 = c(AA,
TC, TT, CC, TC, CC, CC, TT, CC, GG, GG,
GG, GG, TT, CC, GG, CC, AA, AA, AG), X2343 = c(AA,
TC, TT, CC, TC, CC, CC, TT, CC, GG, GG,
GG, GG, TT, CT, GG, CC, AA, AA, GA), X2352 = c(AA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, AA,
GG, GG, TT, CC, GG, CC, AA, GA, AG), X2293 = c(GA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
AA, GG, TT, TC, AA, CT, AA, AA, AA), X2338 = c(GA,
TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
AG, GG, TT, TC, AG, TC, AA, AA, GA), X2449 = c(AA

Re: [R] how to count A, C, T, G in each row in a big data.frame?

2013-01-09 Thread Yao He
That is really good output. Maybe I could carry on from it.
Every time, I understand R a little better thanks to your help.
The first four columns are irrelevant; including them was an oversight.

2013/1/10 William Dunlap wdun...@tibco.com:
 Can you get what you need from the following, where 'd' is your data.frame,
 the first four columns of which are irrelevant to this problem?
    > dd <- d[,-(1:4)] ; table(rownames(dd)[row(dd)], unlist(dd))

   AA AG CC CT GA GG GT TC TG TT
 27412 29 10  0  0 13  1  0  0  0  0
 27413  0  0  4  9  0  0  0 12  0 28
 27414  0  0  0  0  0  0  0  0  0 53
 27415  0  0 53  0  0  0  0  0  0  0
 ...
 27430 46  3  0  0  2  2  0  0  0  0
 27431 19 15  0  0 15  4  0  0  0  0
 table() is pretty quick.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Yao He
 Sent: Wednesday, January 09, 2013 4:04 PM
 To: jim holtman
 Cc: R help
 Subject: Re: [R] how to count A, C, T, G in each row in a big 
 data.frame?

 In fact, I want to calculate the allele frequency of each SNP.

 The key problems are:
 1. My data.frame is large, about 50,000 rows, so it is very slow to
 split() it by row.

 2. The alleles at each SNP (each row) are different. Some are A/G, some
 are G/C. I am not sure how to handle that.

 Thank you for your help

 2013/1/9 jim holtman jholt...@gmail.com:
  Forgot the data. This will count the characters; you can add logic
  with 'table' to count groups.
 
  
  x <-
  structure(list(name = c(Gga_rs10722041, Gga_rs10722249, 
  Gga_rs10722565,
  Gga_rs10723082, Gga_rs10723993, Gga_rs10724555, Gga_rs10726238,
  Gga_rs10726461, Gga_rs10726774, Gga_rs10726967, Gga_rs10727581,
  Gga_rs10728004, Gga_rs10728156, Gga_rs10728177, Gga_rs10728373,
  Gga_rs10728585, Gga_rs10729598, Gga_rs10729643, Gga_rs10729685,
  Gga_rs10729827), chr = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
  7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), pos = c(11248993L,
  20038370L, 16164457L, 38050527L, 20307106L, 13707090L, 12230458L,
  36732967L, 2790856L, 1305785L, 29631963L, 13606593L, 13656397L,
  2261611L, 32096703L, 13733153L, 16524147L, 558735L, 12514023L,
  3619538L), strand = c(+, +, +, +, +, +, +, +,
  +, +, +, +, +, +, +, +, +, +, +, +),
  X2353 = c(AA, TT, TT, CC, TT, CC, CC, TT,
  CC, GG, AG, AG, AG, TT, CC, AG, CC, AA,
  GG, GG), X2409 = c(AA, CT, TT, CC, CT, CC,
  CC, TT, CC, GG, GG, AG, AG, TT, CC, AG,
  CC, AA, AG, GA), X2500 = c(GA, TT, TT, CC,
  TT, CC, CC, TT, CC, GG, GG, GG, GG, GT,
  CT, GG, CC, AA, AA, AA), X2598 = c(AA, TT,
  TT, CC, TT, CC, CC, TT, CC, GG, AA, AG,
  GG, TT, CC, AG, TC, AA, AA, AG), X2610 = c(AA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
  GA, GG, TT, CC, GA, CC, AA, AA, GA), X2300 = c(GA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
  AA, AG, TT, TC, AA, TC, AA, AG, AA), X2507 = c(AG,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
  GA, GG, TT, TC, GG, CC, AA, GA, AG), X2530 = c(AG,
  TC, TT, CC, TC, CC, CC, TT, CC, GG, AA,
  GG, GG, TT, CC, GG, CC, AA, AA, AA), X2327 = c(AA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
  GG, GG, TT, TC, GG, CC, AA, AA, AA), X2389 = c(AA,
  CC, TT, CC, CC, CC, CC, TT, CC, AG, GG,
  AG, GG, TT, TC, AG, CC, AA, AA, AA), X2408 = c(AA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
  GA, GG, TT, CC, GA, CC, AA, AA, AG), X2463 = c(AA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
  GG, GG, TT, CT, GG, CC, AA, AA, GA), X2420 = c(GA,
  TC, TT, CC, TC, CC, CC, TT, CC, GG, AG,
  GG, GG, TG, TT, GG, CT, AA, AA, AA), X2563 = c(GA,
  CC, TT, CC, TC, CC, CC, TT, CC, GG, GA,
  GG, GG, GT, TT, GG, CT, AA, AA, AA), X2462 = c(AA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, AA,
  GG, GG, GT, TC, GG, CC, AA, AA, AA), X2292 = c(GA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
  AA, GG, TG, TC, AA, TC, AA, AA, AA), X2405 = c(GA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
  AG, GG, TG, TT, AA, CT, AA, AA, AA), X2543 = c(AA,
  TC, TT, CC, TC, CC, CC, TT, CC, GA, GA,
  GA, GG, TT, CT, GA, TT, AA, AA, GG), X2557 = c(AG,
  CT, TT, CC, CT, CC, CC, TT, CC, GG, AG,
  GA, GG, GT, CT, GA, CT, AA, AA, AG), X2583 = c(GA,
  CT, TT, CC, CT, CC, CC, TT, CC, GG, GA,
  GG, GG, GG, CT, GA, CT, AA, AA, AG), X2322 = c(AG,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
  GG, GG, GT, TT, GG, CC, AA, AA, GA), X2535 = c(AA,
  TC, TT, CC, TT, CC, CC, TT, CC, GG, GA,
  GG, GG, TT, CC, GG, CC, AA, AA, AG), X2536 = c(GA,
  TC, TT, CC, TC, CC, CC, TT, CC, GG, GG,
  AG, GG, TT, TC, AG, TC, AA, AA, GA), X2581 = c(AG,
  CT, TT, CC, CT, CC, CC, TT, CC, GG, GG,
  GA, GG, TT, CC, GA, CT, AA, AA, AG), X2570 = c(AA,
  CT, TT, CC, CT, CC, CC, TT, CC, GG, GG,
  GG, GG, TT, TC, GG, CC, AA, AA, GG), X2476 = c(AA,
  TT, TT, CC, TT, CC, CC, TT, CC, GG, GG

Re: [R] how to count A, C, T, G in each row in a big data.frame?

2013-01-09 Thread Yao He
Hi arun
Then how could I split them and get a table of letter counts such as:
  id      A   T   C   G
 #1 27412 81  0   0   25
 #2 27413 0   77  29  0

 Thanks
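
One way to go from a genotype-count table to per-row letter counts, sketched in base R: build a weight matrix saying how many times each letter occurs in each genotype code, then multiply. The two rows of geno_counts are copied from the table output quoted in this thread; the names geno_counts, weights and allele_counts are illustrative.

```r
# Genotype counts for two SNPs, copied from the table() output in this thread.
geno_counts <- rbind(
  "27412" = c(AA = 29, AG = 10, CC = 0, CT = 0, GA = 13, GG = 1,
              GT = 0, TC = 0, TG = 0, TT = 0),
  "27413" = c(AA = 0, AG = 0, CC = 4, CT = 9, GA = 0, GG = 0,
              GT = 0, TC = 12, TG = 0, TT = 28)
)

alleles <- c("A", "T", "C", "G")

# weights[g, a] = number of times letter a occurs in genotype code g
# (e.g. "AA" contributes 2 to A, "AG" contributes 1 to A and 1 to G).
weights <- sapply(alleles, function(a) {
  vapply(strsplit(colnames(geno_counts), ""),
         function(g) sum(g == a), numeric(1))
})
rownames(weights) <- colnames(geno_counts)

# Matrix product: one row per SNP, one column per letter.
allele_counts <- geno_counts %*% weights
print(allele_counts)

# Per-row frequencies, if proportions are wanted instead of counts.
allele_freq <- allele_counts / rowSums(allele_counts)
```

Since this is a single matrix product, it should scale to the full table without a loop over rows.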

2013/1/10 arun smartpink...@yahoo.com:
 Hi Yao,
 You could also use:
 library(reshape2)
 dd <- dat1[,-(1:4)]
 res <- dcast(melt(within(dd,{id=row.names(dd)}),id.var="id"),id~value,length)
 head(res)
 #     id AA AG CC CT GA GG GT TC TG TT
 #1 27412 29 10  0  0 13  1  0  0  0  0
 #2 27413  0  0  4  9  0  0  0 12  0 28
 #3 27414  0  0  0  0  0  0  0  0  0 53
 #4 27415  0  0 53  0  0  0  0  0  0  0
 #5 27416  0  0  3  9  0  0  0 12  0 29
 #6 27417  0  0 53  0  0  0  0  0  0  0

 #Just for comparison:
 dat2 <- dat1[rep(row.names(dat1),2000),]
  nrow(dat2)
 #[1] 40000
  row.names(dat2) <- 1:40000
  dd <- dat2[,-(1:4)]
   system.time(res1 <- table(rownames(dd)[row(dd)], unlist(dd)))
 #   user  system elapsed
 #  5.840   0.104   5.954
  system.time(res2 <-
 dcast(melt(within(dd,{id=row.names(dd)}),id.var="id"),id~value,length))
 #   user  system elapsed
 #  3.100   0.064   3.167
  head(res1,3)

  #     AA AG CC CT GA GG GT TC TG TT
  # 1   29 10  0  0 13  1  0  0  0  0
  # 10   0  4  0  0  6 43  0  0  0  0
  # 100 19 15  0  0 15  4  0  0  0  0
  head(res2,3)
 #   id AA AG CC CT GA GG GT TC TG TT
 #1   1 29 10  0  0 13  1  0  0  0  0
 #2  10  0  4  0  0  6 43  0  0  0  0
 #3 100 19 15  0  0 15  4  0  0  0  0

 A.K.







 - Original Message -
 From: Yao He yao.h.1...@gmail.com
 To: R help r-help@r-project.org
 Cc:
 Sent: Wednesday, January 9, 2013 9:23 AM
 Subject: [R] how to count A,C,T,G in each row in a big data.frame?

 Dear All

 I have a data.frame like that:
 structure(list(name = c(Gga_rs10722041, Gga_rs10722249, Gga_rs10722565,
 Gga_rs10723082, Gga_rs10723993, Gga_rs10724555, Gga_rs10726238,
 Gga_rs10726461, Gga_rs10726774, Gga_rs10726967, Gga_rs10727581,
 Gga_rs10728004, Gga_rs10728156, Gga_rs10728177, Gga_rs10728373,
 Gga_rs10728585, Gga_rs10729598, Gga_rs10729643, Gga_rs10729685,
 Gga_rs10729827), chr = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), pos = c(11248993L,
 20038370L, 16164457L, 38050527L, 20307106L, 13707090L, 12230458L,
 36732967L, 2790856L, 1305785L, 29631963L, 13606593L, 13656397L,
 2261611L, 32096703L, 13733153L, 16524147L, 558735L, 12514023L,
 3619538L), strand = c(+, +, +, +, +, +, +, +,
 +, +, +, +, +, +, +, +, +, +, +, +),
 X2353 = c(AA, TT, TT, CC, TT, CC, CC, TT,
 CC, GG, AG, AG, AG, TT, CC, AG, CC, AA,
 GG, GG), X2409 = c(AA, CT, TT, CC, CT, CC,
 CC, TT, CC, GG, GG, AG, AG, TT, CC, AG,
 CC, AA, AG, GA), X2500 = c(GA, TT, TT, CC,
 TT, CC, CC, TT, CC, GG, GG, GG, GG, GT,
 CT, GG, CC, AA, AA, AA), X2598 = c(AA, TT,
 TT, CC, TT, CC, CC, TT, CC, GG, AA, AG,
 GG, TT, CC, AG, TC, AA, AA, AG), X2610 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 GA, GG, TT, CC, GA, CC, AA, AA, GA), X2300 = c(GA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 AA, AG, TT, TC, AA, TC, AA, AG, AA), X2507 = c(AG,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GA, GG, TT, TC, GG, CC, AA, GA, AG), X2530 = c(AG,
 TC, TT, CC, TC, CC, CC, TT, CC, GG, AA,
 GG, GG, TT, CC, GG, CC, AA, AA, AA), X2327 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 GG, GG, TT, TC, GG, CC, AA, AA, AA), X2389 = c(AA,
 CC, TT, CC, CC, CC, CC, TT, CC, AG, GG,
 AG, GG, TT, TC, AG, CC, AA, AA, AA), X2408 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 GA, GG, TT, CC, GA, CC, AA, AA, AG), X2463 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GG, GG, TT, CT, GG, CC, AA, AA, GA), X2420 = c(GA,
 TC, TT, CC, TC, CC, CC, TT, CC, GG, AG,
 GG, GG, TG, TT, GG, CT, AA, AA, AA), X2563 = c(GA,
 CC, TT, CC, TC, CC, CC, TT, CC, GG, GA,
 GG, GG, GT, TT, GG, CT, AA, AA, AA), X2462 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, AA,
 GG, GG, GT, TC, GG, CC, AA, AA, AA), X2292 = c(GA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 AA, GG, TG, TC, AA, TC, AA, AA, AA), X2405 = c(GA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 AG, GG, TG, TT, AA, CT, AA, AA, AA), X2543 = c(AA,
 TC, TT, CC, TC, CC, CC, TT, CC, GA, GA,
 GA, GG, TT, CT, GA, TT, AA, AA, GG), X2557 = c(AG,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, AG,
 GA, GG, GT, CT, GA, CT, AA, AA, AG), X2583 = c(GA,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, GA,
 GG, GG, GG, CT, GA, CT, AA, AA, AG), X2322 = c(AG,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG,
 GG, GG, GT, TT, GG, CC, AA, AA, GA), X2535 = c(AA,
 TC, TT, CC, TT, CC, CC, TT, CC, GG, GA,
 GG, GG, TT, CC, GG, CC, AA, AA, AG), X2536 = c(GA,
 TC, TT, CC, TC, CC, CC, TT, CC, GG, GG,
 AG, GG, TT, TC, AG, TC, AA, AA, GA), X2581 = c(AG,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, GG,
 GA, GG, TT, CC, GA, CT, AA, AA, AG), X2570 = c(AA,
 CT, TT, CC, CT, CC, CC, TT, CC, GG, GG,
 GG, GG, TT, TC, GG, CC, AA, AA, GG), X2476 = c(AA,
 TT, TT, CC, TT, CC, CC, TT, CC, GG, GG

Re: [R] ggplot not showing all the years on the x-axis

2013-01-08 Thread Yao He
Hi, this is a question of how to set the axis scale. Try adding
scale_x_continuous() with explicit breaks, like this:

plot <- tmpplot + geom_line() + scale_x_continuous(breaks=ii)


Yao He


2013/1/8 Francesco Sarracino f.sarrac...@gmail.com:
 Dear R helpers,

 I am currently having a hard time fixing the values on the x-axis of a plot
 with ggplot: even though I have 12 years, ggplot plots only 3 of them.
 Here is my example:

 library(ggplot2)
 ii <- 2000:2011
 ss <- rnorm(12,0,1)
 pm <- data.frame(ii,ss)
 tmpplot <- ggplot(pm, aes(x = ii, y = ss))
 plot <- tmpplot + geom_line()
 plot

 In my case, ggplot reports on the year 2000, 2004 and 2008 on the x-axis,
 but I'd like to have all the years from 2000 to 2011. I know how to fix
 this with the standard plot in R, but for consistency I'd like to use
 ggplot.
 Can anyone help?
 thanks in advance,
 f.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
—
Master's candidate, 2nd year
Department of Animal Genetics & Breeding
Room 436, College of Animal Science & Technology,
China Agricultural University, Beijing, 100193
E-mail: yao.h.1...@gmail.com
——



Re: [R] how to aggregate T-test result in an elegant way?

2013-01-07 Thread Yao He
Hi, arun
I'm sorry that my description wasn't helpful.
One problem is that I don't know how to subset a small part of the data, since
it is a 3-dimensional array, so I just showed its structure.
I tried dput() to a file, but then what should I do to subset it?

Another question:
My raw data is a melted data frame like this:
     IID       O2   variable  value
1    TWF2H5    13%  EW.INCU   49.38
2    TWF2H6    13%  EW.INCU   48.02
3    TWF2H19   13%  EW.INCU   51.44
280  TWF2H101  13%  EW.17.5   42.26
281  TWF2H105  13%  EW.17.5   43.52
282  TWF2H106  13%  EW.17.5   42.83
472  TWF2N102  21%  EW.17.5   45.97
473  TWF2N104  21%  EW.17.5   43.32
474  TWF2N106  21%  EW.17.5   48.63
689  TWF2N2    21%  EMW       19.57
690  TWF2N6    21%  EMW       18.07
691  TWF2N10   21%  EMW       15.4
491  TWF2H5    13%  EMW       15.61
492  TWF2H6    13%  EMW       13.41
493  TWF2H19   13%  EMW       14.03
199  TWF2N2    21%  EW.INCU   48.69
200  TWF2N6    21%  EW.INCU   50.52
201  TWF2N10   21%  EW.INCU   42.04

If you faced a t-test task like the one I described, would generating a
high-dimensional array be a good approach?
Thank you!

Yao He
2013/1/7 arun smartpink...@yahoo.com:
 HI,
 I tried to create an example dataset (as you didn't provide the data).
 set.seed(25)
 a <- array(sample(1:50,60,replace=TRUE),dim=c(2,10,3))
 dimnames(a)[[1]] <- c("13%","21%")
 dimnames(a)[[2]] <- paste("TWF2H",101:110,sep="")
 dimnames(a)[[3]] <- c("EW.INCU","EW.17.5","EMW")


 str(a)
 # int [1:2, 1:10, 1:3] 21 35 8 45 7 50 32 17 4 15 ...
 # - attr(*, "dimnames")=List of 3
 #  ..$ : chr [1:2] "13%" "21%"
 #  ..$ : chr [1:10] "TWF2H101" "TWF2H102" "TWF2H103" "TWF2H104" ...
 #  ..$ : chr [1:3] "EW.INCU" "EW.17.5" "EMW"

 res <- lapply(lapply(seq_len(dim(a)[3]),function(i)
   t.test(a[dimnames(a)[[1]][1],,i],a[dimnames(a)[[1]][2],,i])),function(x)
   data.frame(mean=x$estimate,p.value=x$p.value))
 # added: name the list so that rbind() builds row names such as
 # "EW.INCU.mean of x", which the gsub() calls and the output below rely on
 names(res) <- dimnames(a)[[3]]
 res1 <- do.call(rbind,res)
 row.names(res1)[grep("mean of x",row.names(res1))] <-
   gsub("(.*\\.).*$","\\113%",row.names(res1)[grep("mean of x",row.names(res1))])
 row.names(res1)[grep("mean of y",row.names(res1))] <-
   gsub("(.*\\.).*$","\\121%",row.names(res1)[grep("mean of y",row.names(res1))])
 res1
 #            mean   p.value
 #EW.INCU.13% 22.3 0.2754842
 #EW.INCU.21% 29.3 0.2754842
 #EW.17.5.13% 20.5 0.4705772
 #EW.17.5.21% 16.0 0.4705772
 #EMW.13%     23.9 0.9638679
 #EMW.21%     24.2 0.9638679
 A.K.





Re: [R] how to aggregate T-test result in an elegant way?

2013-01-07 Thread Yao He
Hi,arun

Yes, I just want to do the t-test.
I think it may not be necessary to first generate a 3D array from the raw
data.frame with acast().

Thanks a lot

2013/1/7 arun smartpink...@yahoo.com:
 Hi Yao,

 It's okay.

 How did you generate the 3 D array?
 Using ?acast()

 I am not sure I understand your question 

 if you meet a t-test task as I described  , is that generate a
 high-dimension array  a good way ?

 Do you want to do the t-test in the melt dataset?

 b <- read.table(text="
 ID O2 variable value
 1 TWF2H5 13% EW.INCU 49.38
 2 TWF2H6 13% EW.INCU 48.02
 3 TWF2H19 13% EW.INCU 51.44
 280 TWF2H101 13% EW.17.5 42.26
 281 TWF2H105 13% EW.17.5 43.52
 282 TWF2H106 13% EW.17.5 42.83
 472 TWF2N102 21% EW.17.5 45.97
 473 TWF2N104 21% EW.17.5 43.32
 474 TWF2N106 21% EW.17.5 48.63
 689 TWF2N2 21% EMW 19.57
 690 TWF2N6 21% EMW 18.07
 691 TWF2N10 21% EMW 15.4
 491 TWF2H5 13% EMW 15.61
 492 TWF2H6 13% EMW 13.41
 493 TWF2H19 13% EMW 14.03
 199 TWF2N2 21% EW.INCU 48.69
 200 TWF2N6 21% EW.INCU 50.52
 201 TWF2N10 21% EW.INCU 42.04
 ",sep="",header=TRUE,stringsAsFactors=FALSE)
 res <- lapply(lapply(split(b,b$variable),function(x)
   t.test(x$value[x$O2=="13%"],x$value[x$O2=="21%"])),function(x)
   data.frame(mean=x$estimate,p.value=x$p.value))
 res1 <- do.call(rbind,res)
 row.names(res1)[grep("mean of x",row.names(res1))] <-
   gsub("(.*\\.).*$","\\113%",row.names(res1)[grep("mean of x",row.names(res1))])
 row.names(res1)[grep("mean of y",row.names(res1))] <-
   gsub("(.*\\.).*$","\\121%",row.names(res1)[grep("mean of y",row.names(res1))])
 res1
 #                mean    p.value
 #EMW.13%     14.35000 0.09355374
 #EMW.21%     17.68000 0.09355374
 #EW.17.5.13% 42.87000 0.17464018
 #EW.17.5.21% 45.97333 0.17464018
 #EW.INCU.13% 49.61333 0.43689727
 #EW.INCU.21% 47.08333 0.43689727

 A.K.




Re: [R] how to aggregate T-test result in an elegant way?

2013-01-07 Thread Yao He
Yes, thanks a lot for your help!

Regards

2013/1/8 arun smartpink...@yahoo.com:
 Hi Yao,

 You could also have the results in a wide format:
 res <- do.call(rbind,lapply(lapply(split(b,b$variable),function(x)
   t.test(x$value[x$O2=="13%"],x$value[x$O2=="21%"])),function(x)
   data.frame(mean13=x$estimate[1],mean21=x$estimate[2],p.value=x$p.value,CILow=x$conf.int[1],CIHigh=x$conf.int[2])))
 res
 #          mean13   mean21    p.value     CILow    CIHigh
 #EMW     14.35000 17.68000 0.09355374 -7.682686  1.022686
 #EW.17.5 42.87000 45.97333 0.17464018 -9.265622  3.058955
 #EW.INCU 49.61333 47.08333 0.43689727 -7.119234 12.179234
 A.K.





[R] how to aggregate T-test result in an elegant way?

2013-01-06 Thread Yao He
Dear all:

Plan 1:
I want to run several t-tests of means for different variables in a loop,
so I want to add all the results to one object and then dump() them to a
text file. But I don't know how to append a t-test result to the object.

I have already drawn the barplot, and I would like to know an elegant way to
report the raw results.
Can anybody give me some advice?

Yao He
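
One possible way to approach the question above, as a minimal sketch only: keep each t.test() result in a named list, then pull the numbers of interest into one data frame and write that to a text file. The data here are simulated, and the names dat, vars, tests, summary_tab and out_file are made up for illustration; only the variable names and the 13%/21% groups follow the thread.

```r
# Hypothetical data: two oxygen groups measured on three variables.
set.seed(1)
dat <- data.frame(
  group   = rep(c("13%", "21%"), each = 10),
  EW.INCU = rnorm(20, 48, 3),
  EW.17.5 = rnorm(20, 44, 3),
  EMW     = rnorm(20, 16, 2)
)
vars <- c("EW.INCU", "EW.17.5", "EMW")

# Run one test per variable and keep each "htest" object in a named list.
tests <- lapply(setNames(vars, vars), function(v) {
  t.test(dat[[v]][dat$group == "13%"], dat[[v]][dat$group == "21%"])
})

# Extract the pieces of interest from each result into one row per variable.
summary_tab <- do.call(rbind, lapply(vars, function(v) {
  tt <- tests[[v]]
  data.frame(variable = v,
             mean_13  = unname(tt$estimate[1]),
             mean_21  = unname(tt$estimate[2]),
             t        = unname(tt$statistic),
             p.value  = tt$p.value)
}))

# Write the summary to a tab-separated text file (the path is a placeholder).
out_file <- file.path(tempdir(), "ttest_summary.txt")
write.table(summary_tab, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
print(summary_tab)
```

Keeping the full test objects in tests means other components (for example conf.int) can still be extracted later without rerunning anything.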


Re: [R] how to aggregate T-test result in an elegant way?

2013-01-06 Thread Yao He
Thank you, it is really helpful every time.

I didn't provide any example data because I thought it was just a
question of how to report t.test() results in R.
However, as you say, it is better to show more details to find an elegant way.

In fact I generated a 3-dimensional array like this:
str(a)
 num [1:2, 1:245, 1:3] 47.5 NA 48.9 NA 47.5 ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:2] "13%" "21%"
  ..$ : chr [1:245] "TWF2H101" "TWF2H105" "TWF2H106" "TWF2H110" ...
  ..$ : chr [1:3] "EW.INCU" "EW.17.5" "EMW"

I want to do a two-sample t-test of means between 13% and 21% for each
variable (EW.INCU, EW.17.5, EMW).

So I tried this code:
variable <- dimnames(a)[[3]]
O2 <- dimnames(a)[[1]]
for (i in variable) {
  print(i)
  print(O2[1])
  print(O2[2])
  print(t.test(a[O2[1],,i],a[O2[2],,i],na.rm=T))
}

I don't think it is an elegant way, and I am inexperienced at reporting raw results.
Could you give me more help?

Yao He

2013/1/7 arun smartpink...@yahoo.com:
 Hi,
 You didn't provide any example data.  So, I am not sure whether this helps.

 set.seed(15)
 dat1 <- data.frame(A=sample(10:20,5,replace=TRUE),B=sample(18:28,5,replace=TRUE),C=sample(25:35,5,replace=TRUE),D=sample(20:30,5,replace=TRUE))
 # dat2 is not defined in the post; dat1[,-1] (columns B, C, D) is assumed here,
 # which is consistent with the AB/AC/AD labels and the means shown below
 dat2 <- dat1[,-1]
 res <- lapply(lapply(seq_len(ncol(dat2)),function(i)
   t.test(dat2[,i],dat1[,1],paired=TRUE)),function(x)
   data.frame(meanDiff=x$estimate,p.value=x$p.value)) # paired
 names(res) <- paste("A",LETTERS[2:4],sep="")
 res <- do.call(rbind,res)
 res
 #   meanDiff     p.value
 #AB      9.4 0.021389577
 #AC     15.0 0.002570261
 #AD     10.6 0.003971604


 #or
 res1 <- lapply(lapply(seq_len(ncol(dat2)),function(i)
   t.test(dat2[,i],dat1[,1],paired=FALSE)),function(x)
   data.frame(mean=x$estimate,p.value=x$p.value))
 names(res1) <- paste("A",LETTERS[2:4],sep="")
 res1 <- do.call(rbind,res1)
 row.names(res1)[grep("mean of y",row.names(res1))] <-
   gsub("(.*\\.).*","\\1A",row.names(res1)[grep("mean of y",row.names(res1))])
 row.names(res1)[grep("mean of x",row.names(res1))] <-
   gsub("(\\w)(\\w)(\\.).*","\\1\\2\\3\\2",row.names(res1)[grep("mean of x",row.names(res1))])
 res1
 #     mean      p.value
 #AB.B 25.2 1.299192e-03
 #AB.A 15.8 1.299192e-03
 #AC.C 30.8 5.145519e-05
 #AC.A 15.8 5.145519e-05
 #AD.D 26.4 1.381339e-03
 #AD.A 15.8 1.381339e-03


 A.K.









[R] how to read different files into different objects in one time?

2012-12-19 Thread Yao He
Dear All

I have a lot of files in a directory, as follows:
02-03.txt   03-04.txt   04-05.txt   05-06.txt   06-07.txt
07-08.txt   08-09.txt   09-10.txt   G0.txt      G1.txt      raw_ped.txt
..

I want to read them into different objects according to their file names, such as:
02-03<-read.table("02-03.txt",header=T)
03-04<-read.table("03-04.txt",header=T)
I don't want to type hundreds of read.table() calls, so how can I read them all at once?
I think the core problem is that I can't create different object
names inside a loop or sapply(), but there may be a better way to
do what I want.

Thanks a lot

Yao He
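
A common pattern for this kind of task, sketched here under assumptions: read every file into one named list instead of creating a separate object per file. Names such as 02-03 are not syntactic R names (02-03 parses as a subtraction), so list elements addressed by name tend to be easier to work with. The directory data_dir and the two small files written below are hypothetical stand-ins so that the sketch runs on its own.

```r
# Hypothetical setup: write two small files into a temporary directory
# (the file names follow the post, the contents are made up).
data_dir <- file.path(tempdir(), "ped_files")
dir.create(data_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, x = c(0.5, 1.5)),
            file.path(data_dir, "02-03.txt"), row.names = FALSE)
write.table(data.frame(id = 3:4, x = c(2.5, 3.5)),
            file.path(data_dir, "03-04.txt"), row.names = FALSE)

# List the files, read each one, and name the list elements after the
# file names with the ".txt" extension removed.
files  <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(files, read.table, header = TRUE)
names(tables) <- sub("\\.txt$", "", basename(files))

# Each data frame is then available by name.
print(tables[["02-03"]])
```

With the data in a list, the same operation can be applied to every file through lapply(tables, ...) rather than repeating code per object.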



[R] how to handle NA values in aggregate()

2012-12-15 Thread Yao He
Dear All:

I am trying to calculate the means of four columns in a data frame like this:

FID  MID   IID      EW_INCU  EW_17.5  EMW    EEratio
1    4621  TWF2H5   45.26    NA       15.61  NA
1    4621  TWF2H6   48.02    44.09    13.41  0.3041506
2    4630  TWF2H19  51.44    47.81    NA     NA
2    4631  TWF2H21  NA       52.72    16.70  0.3167678
2    4632  TWF2H22  55.70    50.45    16.48  0.3266601
2    4633  TWF2H23  44.42    40.89    12.96  0.3169479

I tried this code:

 aggregate(df[,4:7],df[,1],mean)

But I couldn't set the argument na.rm=T in the mean() function, so the
results are all NAs.

Please tell me how to handle NA values when using aggregate().

Thanks a lot

Yao He
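
For reference, a minimal sketch of one way to do this in base R: aggregate() forwards any extra arguments placed after FUN to that function, so na.rm = TRUE can be passed straight through to mean(). Note also that the by argument has to be a list. The data frame below is retyped from the table in the post; res is just an illustrative name.

```r
# Example data retyped from the post.
df <- data.frame(
  FID     = c(1, 1, 2, 2, 2, 2),
  MID     = c(4621, 4621, 4630, 4631, 4632, 4633),
  IID     = c("TWF2H5", "TWF2H6", "TWF2H19", "TWF2H21", "TWF2H22", "TWF2H23"),
  EW_INCU = c(45.26, 48.02, 51.44, NA, 55.70, 44.42),
  EW_17.5 = c(NA, 44.09, 47.81, 52.72, 50.45, 40.89),
  EMW     = c(15.61, 13.41, NA, 16.70, 16.48, 12.96),
  EEratio = c(NA, 0.3041506, NA, 0.3167678, 0.3266601, 0.3169479)
)

# `by` is given as a named list; na.rm = TRUE is passed on to mean(),
# so missing values are dropped within each FID group.
res <- aggregate(df[, 4:7], by = list(FID = df$FID), FUN = mean, na.rm = TRUE)
print(res)
```

A group in which a column is entirely NA would still come back as NaN, since there is nothing left to average.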


[R] How to select a subset data to do a barplot in ggplot2

2012-12-13 Thread Yao He
Hi,everybody

I have a data frame like this:

FID  IID   STATUS
1    4621  live
1    4628  dead
2    4631  live
2    4632  live
2    4633  live
2    4634  live
6    4675  live
6    4679  dead
10   4716  dead
10   4719  live
10   4721  dead
11   4726  live
11   4728  nosperm
11   4730  nosperm
12   4732  live
17   4783  live
17   4783  live
17   4784  live

I just want a barplot that counts "live" or "dead" in every FID, and fills
the bars with different colours.

I tried this code:

p <- ggplot(data,aes(x=FID));
p+geom_bar(aes(x=factor(FID),y=..count..,fill=STATUS))

But how can I exclude "nosperm" or other levels using only
ggplot2, without generating another data frame?

Thanks a lot

Yao He