Re: [R] seqinr ?: Splitting a factor name into several columns. Dealing with metabarcoding data.

David.Kaethner Mon, 13 Oct 2014 12:42:06 -0700

I'm not sure I understood your problem, but maybe something like this:

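library(stringr)  # provides str_split()

# split identifiers into columns
df1 <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
                  Z.identifierA.B1298712 = factor(rep(LETTERS[1:2], each = 5)))

id <- names(df1)[3]                        # "Z.identifierA.B1298712"
x <- do.call(rbind, str_split(id, "\\."))  # 1 x 3 matrix of the name parts
# one copy of the original column per name part, named after that part
y <- sapply(x, function(z) df1[, id])

# drop the original column and append the three new ones
df1.goal <- data.frame(df1[, -3], y)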

-dk

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Anna Zakrisson Braeunlich
Sent: Sunday, 12 October 2014 09:25
To: r-help@r-project.org
Subject: [R] seqinr ?: Splitting a factor name into several columns. Dealing 
with metabarcoding data.

Hi,

I have a question about how to split a factor name into different columns. I 
have metabarcoding data and need to merge the FASTA file with the taxonomy and 
count table files (data frames). To do this merge, I need to isolate the 
common identifier, which unfortunately is baked in with a lot of other labels 
in the factor name, e.g.:
sequence identifier: 
M01271_77_000000000.A8J0P_1_1101_10150_1525.1.322519.sample_1.sample_2

I want to split this name at every "." to get several columns:
column1: M01271_77_000000000
column2: A8J0P_1_1101_10150_1525
column3: 1
column4: 322519
column5: sample_1
column6: sample_2
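
A minimal base R sketch of the split described above. It assumes the identifiers are held in a character vector (called `ids` here, a hypothetical name) and that every identifier has the same number of "."-separated fields; the second identifier is made up for illustration.

```r
# Hypothetical vector of sequence identifiers: the first is the example
# from the message, the second is invented for illustration only.
ids <- c("M01271_77_000000000.A8J0P_1_1101_10150_1525.1.322519.sample_1.sample_2",
         "M01271_77_000000000.A8J0P_1_1101_10151_1600.1.322520.sample_1.sample_2")

# fixed = TRUE treats "." as a literal dot instead of a regex wildcard
parts <- strsplit(ids, ".", fixed = TRUE)

# Stack the pieces row-wise; this assumes every identifier has six fields
id_cols <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(id_cols) <- paste0("column", seq_len(ncol(id_cols)))
```

Each row of `id_cols` then holds the six fields of one identifier, so the field that is shared with the other tables can be used as a merge key.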

I must add that I have no influence on how these names are given. This is how 
they are supplied by the Illumina MiSeq. I just need to be able to deal with it.

Here is some extremely simplified dummy data to further show the issue at hand:

df1 <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
                  Z.identifierA.B1298712 = factor(rep(LETTERS[1:2], each = 5)))
df2 <- data.frame(cbind(B = 13:22, K = rnorm(10)),
                  Q.identifierA.B4668726 = factor(rep(LETTERS[1:2], each = 5)))

# I have metabarcoding data with one FASTA file, one count table and one
# taxonomy file. The dummy data above just shows the issue at hand. I want to
# be able to merge my three original data frames (the dummy data has only
# two). The problem is that the only identifier that is common to the data
# frames is "hidden" in the factor name, e.g. Z.identifierA.B1298712 and
# Q.identifierA.B4668726. I hence need to be able to split this name up into
# different columns to get "identifierA" alone as one column name. Then I can
# merge the data frames.
# How can I do this in R? I know that it can be done in Excel, but I would like
# to produce a complete R script to get a fast pipeline and avoid copy-and-paste
# errors.
# This is what I want it to look like:

df1.goal <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
                  Z = factor(rep(LETTERS[1:2], each = 5)),
                  identifierA = factor(rep(LETTERS[1:2], each = 5)),
                  B1298712 = factor(rep(LETTERS[1:2], each = 5)))
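
One possible reading of the goal, sketched in base R: rename the third column of each dummy data frame to the middle part of its name so that both frames share a column called "identifierA", then merge on it. The helper `expose_identifier()` and the names `df1.id`, `df2.id` and `merged` are hypothetical, and the sketch assumes the shared identifier is always the second "."-separated part of the column name.

```r
# Dummy data as in the message
df1 <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
                  Z.identifierA.B1298712 = factor(rep(LETTERS[1:2], each = 5)))
df2 <- data.frame(cbind(B = 13:22, K = rnorm(10)),
                  Q.identifierA.B4668726 = factor(rep(LETTERS[1:2], each = 5)))

# Hypothetical helper: rename column `col` to the second "."-separated
# part of its current name, e.g. "Z.identifierA.B1298712" -> "identifierA"
expose_identifier <- function(df, col = 3) {
  parts <- strsplit(names(df)[col], ".", fixed = TRUE)[[1]]
  names(df)[col] <- parts[2]
  df
}

df1.id <- expose_identifier(df1)
df2.id <- expose_identifier(df2)

# Both data frames now share the column "identifierA"
merged <- merge(df1.id, df2.id, by = "identifierA")
```

With this dummy data each level occurs five times in both frames, so merge() returns every pairing within a level; real data with unique identifiers per row would give one row per matching identifier.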

# Many thanks and kind regards
Anna Zakrisson

Anna Zakrisson Braeunlich
PhD student

Department of Ecology, Environment and Plant Sciences Stockholm University 
Svante Arrheniusv. 21A
SE-106 91 Stockholm
Sweden/Sverige

