Hi--
This is a question with a trivial and obvious answer, I'm sure, but I can't
seem to find it in the help files and books that I have handy. I have a
dataframe consisting of two columns, "Gene_Name," a list of gene symbols, and
"Number," a numeric measure of how frequently a tag representing that gene
showed up in a SAGE library. Several of the genes are represented by multiple
tags, and therefore are present more than once in the list, e.g.:
     Gene_Name Number
1167    Zcchc8      6
1168    Zcwpw1      5
1169   Zdhhc18      6
1170   Zdhhc20      5
1171    Zdhhc3      6
1172    Zdhhc3      5
1173      Zeb2      9
1174      Zeb2      6
What I want is to collapse the list by gene name, such that duplicates are
summed up and appear only once in the final version:
Zcchc8     6
Zcwpw1     5
Zdhhc18    6
Zdhhc20    5
Zdhhc3    11
Zeb2      15
The only way I can figure out to do this is via rowsum:
> rowsum(Number, Gene_Name)
gives me exactly what I want, *except* that I am left with a matrix
containing the Number values, with the Gene_Names used as row names (so the
output looks exactly as printed above). What I want is a dataframe
equivalent to the starting table, with numbered rows and separate, accessible
columns containing the Gene_Name and Number values.
I was able to put such a dataframe together manually, by cobbling together
the row names of the above list with the values:
> genes.unique <- data.frame(rownames(rowsum(Number, Gene_Name)),
+                            rowsum(Number, Gene_Name))
but then I have to manually replace the row names of the dataframe with
numbers, to get back to what I wanted in the first place.
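In case it helps, here is a self-contained sketch of that workaround, using
only the eight rows quoted above. The object names "sage" and "sums" are made
up for this example, and the column names on the result are set by hand; my
real table is much longer, so treat this as an illustration rather than my
exact session.

```r
# Toy data: only the eight rows quoted above (row names 1167..1174).
# 'sage' and 'sums' are illustrative names, not objects from the real session.
sage <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number    = c(6, 5, 6, 5, 6, 5, 9, 6),
  row.names = 1167:1174
)

# rowsum() sums Number within each gene and returns a one-column matrix
# whose row names are the gene names.
sums <- rowsum(sage$Number, sage$Gene_Name)

# Rebuild a data frame by hand from the row names and the summed values ...
genes.unique <- data.frame(Gene_Name = rownames(sums),
                           Number    = sums[, 1])

# ... then drop the gene-name row names so the rows are numbered 1, 2, 3, ...
rownames(genes.unique) <- NULL

genes.unique
```

The result should have one row per gene, with the Zdhhc3 and Zeb2 duplicates
summed, which is the table I am after; it just takes several steps to get
there.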
I hope this makes some sort of sense. Is there an easier way to do this?
Thanks in advance!
Charlie Murtaugh
=====
L. Charles Murtaugh
Assistant Professor
University of Utah
Dept. of Human Genetics
15 N. 2030 E. Rm. 2100
Salt Lake City, UT 84112
tel 801-581-5958
fax 801-581-6463
email [EMAIL PROTECTED]