Re: [R] Any method to speed up this problem?

Marc Schwartz Thu, 18 Jun 2009 07:38:38 -0700

On Jun 18, 2009, at 9:28 AM, njhuang86 wrote:

Hi all,

Suppose I have a vector like this:
 [1] "STAT1"  "STAT1"  "STAT1"  "STAT1"  "GAPDH"  "GAPDH"  "GAPDH"  "ACTB"   "ACTB"
[10] "ACTB"   "DDR1"   "RFC2"   "HSPA6"  "PAX8"   "GUCA1A" "UBE1L"  "THRA"   "PTPN21"
[19] "CCL5"   "CYP2E1" "STAT1"  "THRA"   "PAX8"
I would like to produce a vector that has the same length as the one
above but tells me where the duplicates are. So essentially, if I could
represent each gene symbol as a specific number, and have the duplicates
be the same number, that would be ideal. Right now, I'm using the unique
command along with two nested for loops to do the job... But it's really
taking too long... Any suggestions would be greatly appreciated. Thank you!


Is this what you want?

> Vec
 [1] "STAT1"  "STAT1"  "STAT1"  "STAT1"  "GAPDH"  "GAPDH"  "GAPDH"
 [8] "ACTB"   "ACTB"   "ACTB"   "DDR1"   "RFC2"   "HSPA6"  "PAX8"
[15] "GUCA1A" "UBE1L"  "THRA"   "PTPN21" "CCL5"   "CYP2E1" "STAT1"
[22] "THRA"   "PAX8"

> as.numeric(factor(Vec))
 [1] 11 11 11 11  5  5  5  1  1  1  4 10  7  8  6 13 12  9  2  3 11 12
[23]  8

?
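
Note that factor() sorts its levels, so the codes above follow the sorted
order of the symbols rather than the order in which they first appear. If
codes numbered by first appearance are preferred, or if only a flag for the
repeated entries is needed, something along these lines might do. This is
only a sketch; the names ids and is_dup are made up for illustration.

```r
Vec <- c("STAT1", "STAT1", "STAT1", "STAT1", "GAPDH", "GAPDH", "GAPDH",
         "ACTB", "ACTB", "ACTB", "DDR1", "RFC2", "HSPA6", "PAX8",
         "GUCA1A", "UBE1L", "THRA", "PTPN21", "CCL5", "CYP2E1", "STAT1",
         "THRA", "PAX8")

# One integer per element, numbered by order of first appearance;
# identical symbols get identical numbers.
ids <- match(Vec, unique(Vec))

# TRUE for every element whose symbol occurs more than once in Vec
# (duplicated() alone would leave the first occurrence as FALSE).
is_dup <- duplicated(Vec) | duplicated(Vec, fromLast = TRUE)
```

Both calls are vectorized, so neither needs an explicit loop.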

HTH,

Marc Schwartz

