I have a vector of gene symbols, some of which have multiple aliases. When
an element has aliases, they are separated by ' \\\ '.
Here is a real-world example, which represents one element of my
vector:
Eif4g2 /// Eif4g2-ps1 /// LOC678831

What I would like to do is input the vector into a function and output a
vector with just the first alias of each element (or, if there are no
aliases, just the one symbol).

So I wrote a simple little function to do this:
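get.first.id.func <- function(vec, splitter){
  # split each element on the separator (strsplit treats it as a regex)
  vec.lst <- strsplit(vec, splitter)
  # keep only the first piece of each split element
  first.func <- function(vec1){vec1[1]}
  vec.out <- sapply(vec.lst, first.func)
  vec.out
}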

For a trivial example, this works:
> a <- c("a_b", "c_d")
> get.first.id.func(a, "_")
[1] "a" "c"
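
Note that the real-world element quoted earlier is printed with forward
slashes (' /// '), which need no escaping in R. If that is what the data
actually contains, a sketch along the following lines may be enough. The
helper name first_alias and the second element "Actb" are made up for
illustration and are not part of the original function or data.

```r
# Hypothetical helper (an assumption, not the original function):
# split on a literal separator and keep the first piece of each element.
first_alias <- function(vec, sep) {
  # fixed = TRUE makes strsplit treat sep as a plain string, not a regex
  pieces <- strsplit(vec, sep, fixed = TRUE)
  # vapply guarantees a character vector with one value per element
  vapply(pieces, function(p) p[1], character(1))
}

# First element is the example from the post; "Actb" is illustrative only.
ids <- c("Eif4g2 /// Eif4g2-ps1 /// LOC678831", "Actb")
first_alias(ids, " /// ")
# [1] "Eif4g2" "Actb"
```

Elements without a separator come back unchanged, which matches the
"just the one symbol" case.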

I am running into problems, however, with the real-world separator ' \\\ '.
I'm not even able to construct a sample vector of my own! Here is what I
get:
> a <- c('a \\\ b', 'a \\\ b')
> a
[1] "a \\ b" "a \\ b"
> a <- c('a \\\\ b', 'a \\\\ b')
> a
[1] "a \\\\ b" "a \\\\ b"
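
The output above is consistent with two facts: in an R string literal each
literal backslash is written as \\, and print() shows backslashes escaped
again. Under that reading, three literal backslashes need six in the source.
A minimal sketch that checks this with writeLines() and nchar() follows.
Recent versions of R also reject an unrecognized escape such as a backslash
followed by a space, so the first attempt above may be an error there
rather than a silent change.

```r
# Each literal backslash is typed as \\ in source code,
# so three literal backslashes require six.
a <- c("a \\\\\\ b", "a \\\\\\ b")

print(a)       # print() re-escapes: six backslashes are displayed per element
writeLines(a)  # writeLines() shows the actual characters: a \\\ b
nchar(a)       # 7 each: 'a', space, three backslashes, space, 'b'
```

R 4.0.0 and later also accept raw strings such as r"(a \\\ b)", which avoid
the doubling altogether.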

I KNOW this is related to R's peculiarities with \ escapes, but I don't have
the expertise to know how to get around it.

I would be very interested to learn:
1. how to construct a vector such that a == c('a \\\ b', 'a \\\ b')
2. how to properly pass my separator to my function so that I get the
desired split.
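
One possible approach to both points, sketched under the assumption that the
separator really is a space, three backslashes, and a space: build the
strings with doubled backslashes, then split with fixed = TRUE so that the
separator is not interpreted as a regular expression. The vector below is
illustrative only.

```r
# Illustrative vector: each element contains three literal backslashes.
a <- c("a \\\\\\ b", "c \\\\\\ d")

# Separator: space, three literal backslashes, space.
sep <- " \\\\\\ "

# fixed = TRUE matches sep character for character (no regex escaping).
first <- vapply(strsplit(a, sep, fixed = TRUE),
                function(p) p[1], character(1))
first
# [1] "a" "c"

# Without fixed = TRUE the pattern is a regex, so every literal backslash
# must be escaped once more: twelve backslashes in the source.
first.regex <- vapply(strsplit(a, " \\\\\\\\\\\\ "),
                      function(p) p[1], character(1))
identical(first, first.regex)
# [1] TRUE
```

The twelve-backslash pattern is also what the function as written would
need for its splitter argument, since it calls strsplit without
fixed = TRUE.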

Thanks regex experts!
Mark

------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail

"The real problem is not whether machines think but whether men do." -- B.
F. Skinner
******************************************************************


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
