I have a vector of gene symbols, some of which have multiple aliases. When an element has aliases, they are separated by ' \\\ '. Here is a real-world example of a single element of my vector: Eif4g2 /// Eif4g2-ps1 /// LOC678831
What I would like to do is input the vector into a function and output a vector with just the first alias of each element (or, if there are no aliases, just the one symbol). So I wrote a simple little function to do this:

    get.first.id.func <- function(vec, splitter){
      vec.lst <- strsplit(vec, splitter)
      first.func <- function(vec1){vec1[1]}
      vec.out <- sapply(vec.lst, first.func)
      vec.out
    }

For a trivial example, this works:

    > a <- c("a_b", "c_d")
    > get.first.id.func(a, "_")
    [1] "a" "c"

I am running into problems, however, with the real-world split of ' \\\ '. I'm not even able to construct a sample vector of my own! Here is what I get:

    > a <- c('a \\\ b', 'a \\\ b')
    > a
    [1] "a \\ b" "a \\ b"
    > a <- c('a \\\\ b', 'a \\\\ b')
    > a
    [1] "a \\\\ b" "a \\\\ b"

I KNOW this is related to R's peculiarities with \ escapes, but I don't have the expertise to know how to get around it. I would be very interested to learn:

1. how to construct a vector such that a == c('a \\\ b', 'a \\\ b')
2. how to properly input my split into my function so that I get the split desired.

Thanks regex experts!

Mark

Mark W. Kimpel MD
Neuroinformatics, Dept. of Psychiatry
Indiana University School of Medicine
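
For reference, here is a minimal sketch of one way both questions might be approached. It assumes the separator really is a space, three backslashes, and a space. In R source each stored backslash is typed as two, so three stored backslashes take six typed ones; print() then shows them doubled again, while cat() shows the string as stored. The sketch also passes fixed = TRUE to strsplit() so the splitter is matched literally rather than as a regular expression, which removes the second layer of escaping. The sample vectors are illustrative only. Note that the element quoted at the top of the post uses forward slashes, which need no escaping at all, so that case is included as well.

```r
# Same idea as the function in the post, with one change:
# fixed = TRUE makes strsplit() treat the splitter as a literal string.
get.first.id.func <- function(vec, splitter) {
  vec.lst <- strsplit(vec, splitter, fixed = TRUE)
  # keep only the first piece of each element
  vapply(vec.lst, function(vec1) vec1[1], character(1))
}

# 1. Constructing the vector: six typed backslashes store three.
a <- c("a \\\\\\ b", "c \\\\\\ d")
nchar(a[1])        # 7: 'a', space, three backslashes, space, 'b'
cat(a[1], "\n")    # shows the stored string: a \\\ b

# 2. Splitting on the literal separator (illustrative data).
get.first.id.func(a, " \\\\\\ ")
# [1] "a" "c"

# The forward-slash form from the real element needs no escaping.
ids <- c("Eif4g2 /// Eif4g2-ps1 /// LOC678831", "Actb")
get.first.id.func(ids, " /// ")
# [1] "Eif4g2" "Actb"
```

Without fixed = TRUE the splitter would be read as a regular expression, where each literal backslash must itself be escaped, so the same separator would need twelve typed backslashes. An odd count such as 'a \\\ b' leaves a lone backslash followed by a space, which is not a valid escape sequence; current versions of R reject it with an error.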