Re: [R] To build a new Df from 2 Df

2014-10-14 Thread David.Kaethner
Hello,

here's a draft of a solution. I hope it's not overly complicated.

# find all possible combinations
combi <- expand.grid(Dem$Nom, Rap$Nom); names(combi) <- c("Dem", "Rap")

# we need the corresponding departments and units
combi$DemDep <- apply(combi, 1, function(x) Dem$Departement[x[1] == Dem$Nom])
combi$DemUni <- apply(combi, 1, function(x) Dem$Unite[x[1] == Dem$Nom])
combi$RapDep <- apply(combi, 1, function(x) Rap$Departement[x[2] == Rap$Nom])
combi$RapUni <- apply(combi, 1, function(x) Rap$Unite[x[2] == Rap$Nom])

# we exclude the combinations that we don't want
# (expand.grid() turns the names into factors, so as.numeric() gives
# each Rapporteur's position: 1 = Rapporteur01, ..., 5 = Rapporteur05)
dep <- combi[combi$DemDep != combi$RapDep, c("Dem", "Rap")]
dep$id <- as.numeric(dep$Rap)
uni <- combi[combi$DemUni != combi$RapUni, c("Dem", "Rap")]
uni$id <- as.numeric(uni$Rap)

# preliminary result: one row per Demandeur, one column per Rapporteur
resDep <- reshape(dep,
  timevar = "id",
  idvar = "Dem",
  direction = "wide"
)

resUni <- reshape(uni,
  timevar = "id",
  idvar = "Dem",
  direction = "wide"
)

In resDep and resUni you will find the possible choices for Rapporteur1 and 
Rapporteur2, respectively. An NA indicates a pair for which the condition is 
not met. For Rapporteur1/Rapporteur2 you can now choose any column of 
resDep/resUni that is not NA for that specific Demandeur. I wasn't sure exactly 
what you meant by your third condition, so I'll leave that to you. But with the 
complete set of possible matches, you have a more general solution.
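For the third (balancing) condition, one possible approach, sketched here only as an illustration, is a greedy heuristic: handle the most constrained Demandeurs first (those with the fewest Rapporteurs from another Departement) and always give each one the eligible Rapporteur who has been assigned least often so far. The helper name assign_greedy() is hypothetical, and a greedy pass like this does not guarantee balanced counts on every data set, so the result should be checked with table().

```r
# Data from the original post (strings kept as character)
Dem <- data.frame(
  Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra",
          "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", "Cyril",
          "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"),
  Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D", "B", "B",
                  "B", "A", "C", "D", "B", "A", "D", "D"),
  Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9", "Unite1",
            "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6", "Unite8",
            "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9", "Unite5"),
  stringsAsFactors = FALSE)
Rap <- data.frame(
  Nom = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04",
          "Rapporteur05"),
  Departement = c("C", "D", "C", "C", "D"),
  Unite = c("Unite10", "Unite6", "Unite5", "Unite5", "Unite4"),
  stringsAsFactors = FALSE)

# Hypothetical helper: greedy, most-constrained-first assignment
assign_greedy <- function(Dem, Rap) {
  n1 <- setNames(integer(nrow(Rap)), Rap$Nom)  # times used as Rapporteur1
  n2 <- n1                                      # times used as Rapporteur2
  r1 <- r2 <- character(nrow(Dem))
  # Number of Rapporteurs with a different Departement, per Demandeur
  n_elig <- vapply(Dem$Departement,
                   function(d) sum(Rap$Departement != d), integer(1))
  # Fewest options first; order() keeps ties in their original order
  for (i in order(n_elig)) {
    # Rapporteur1: different Departement, least used so far
    cand <- Rap$Nom[Rap$Departement != Dem$Departement[i]]
    r1[i] <- cand[which.min(n1[cand])]
    n1[r1[i]] <- n1[r1[i]] + 1L
    # Rapporteur2: different Unite, not the same person, least used so far
    cand <- Rap$Nom[Rap$Unite != Dem$Unite[i] & Rap$Nom != r1[i]]
    r2[i] <- cand[which.min(n2[cand])]
    n2[r2[i]] <- n2[r2[i]] + 1L
  }
  data.frame(Demandeur = Dem$Nom, Rapporteur1 = r1, Rapporteur2 = r2,
             stringsAsFactors = FALSE)
}

dfnew <- assign_greedy(Dem, Rap)
table(dfnew$Rapporteur1)
table(dfnew$Rapporteur2)
```

On the data posted in this thread, both tables come out at four Demandeurs per Rapporteur. On other data the counts may be less even, and the loop stops with an error if some Demandeur has no eligible Rapporteur at all.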

Btw, you can construct data.frames just like this:

Dem <- data.frame(
  Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra",
          "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", "Cyril",
          "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"),
  Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D", "B", "B",
                  "B", "A", "C", "D", "B", "A", "D", "D"),
  Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9", "Unite1",
            "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6", "Unite8",
            "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9", "Unite5")
)
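The four apply() lookups in the draft solution process combi one row at a time, after apply() has converted it to a character matrix. A minimal sketch of a vectorized alternative uses match() to find each name's row once and then indexes the columns directly. The data here are only a three-row excerpt of Dem and the first two rows of Rap, chosen to keep the example short.

```r
# Excerpt of the posted data (strings kept as character)
Dem <- data.frame(Nom = c("John", "Jim", "Emma"),
                  Departement = c("D", "A", "B"),
                  Unite = c("Unite8", "Unite4", "Unite1"),
                  stringsAsFactors = FALSE)
Rap <- data.frame(Nom = c("Rapporteur01", "Rapporteur02"),
                  Departement = c("C", "D"),
                  Unite = c("Unite10", "Unite6"),
                  stringsAsFactors = FALSE)

# All Demandeur x Rapporteur pairs, kept as character
combi <- expand.grid(Dem = Dem$Nom, Rap = Rap$Nom, stringsAsFactors = FALSE)

# match() returns, for every pair, the row of that name in Dem / Rap
iDem <- match(combi$Dem, Dem$Nom)
iRap <- match(combi$Rap, Rap$Nom)
combi$DemDep <- Dem$Departement[iDem]
combi$DemUni <- Dem$Unite[iDem]
combi$RapDep <- Rap$Departement[iRap]
combi$RapUni <- Rap$Unite[iRap]

# Pairs allowed as Rapporteur1 (different Departement)
# and as Rapporteur2 (different Unite)
dep <- combi[combi$DemDep != combi$RapDep, c("Dem", "Rap")]
uni <- combi[combi$DemUni != combi$RapUni, c("Dem", "Rap")]
```

Here only the pair (John, Rapporteur02) is dropped from dep, because both are in Departement D; no Unite coincides, so uni keeps all six pairs.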

-dk

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Arnaud Michel
Sent: Tuesday, 14 October 2014 10:46
To: r-help@r-project.org
Subject: [R] To build a new Df from 2 Df

Hello

I have 2 data frames, Dem and Rap.
I would like to build the complete data frame dfnew by combining these two data 
frames (Dem and Rap) in the following way:

For each value of Dem$Nom (dfnew$Demandeur), I associate 2 different values of 
Rap$Nom (dfnew$Rapporteur1 and dfnew$Rapporteur2) such that

  * for each dfnew$Demandeur, dfnew$Rapporteur1 does not have the same
value for Departement as Dem$Departement
  * for each dfnew$Demandeur, dfnew$Rapporteur2 does not have the same
value for Unite as Dem$Unite
  * the counts in table(dfnew$Rapporteur1) and in
table(dfnew$Rapporteur2) must be balanced, i.e. they may differ by at most 1

table(dfnew$Rapporteur1)
Rapporteur01 Rapporteur02 Rapporteur03 Rapporteur04 Rapporteur05 
           4            4            4            4            4 
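To test a candidate dfnew against these three rules, a small checking function can help. This is a minimal sketch: check_dfnew() is a hypothetical name, it reads "2 different values" as Rapporteur1 != Rapporteur2, and it reads the balance rule as "the counts differ by at most 1", with unused Rapporteurs counted as 0. The toy rows are excerpts of the data below.

```r
# Hypothetical checker: returns one TRUE/FALSE per rule
check_dfnew <- function(dfnew, Dem, Rap) {
  iDem <- match(as.character(dfnew$Demandeur), Dem$Nom)
  i1 <- match(as.character(dfnew$Rapporteur1), Rap$Nom)
  i2 <- match(as.character(dfnew$Rapporteur2), Rap$Nom)
  # Counts per Rapporteur; levels = Rap$Nom so unused ones count as 0
  t1 <- table(factor(dfnew$Rapporteur1, levels = Rap$Nom))
  t2 <- table(factor(dfnew$Rapporteur2, levels = Rap$Nom))
  c(departement = all(Rap$Departement[i1] != Dem$Departement[iDem]),
    unite       = all(Rap$Unite[i2] != Dem$Unite[iDem]),
    different   = all(i1 != i2),
    balanced    = diff(range(t1)) <= 1 && diff(range(t2)) <= 1)
}

# Toy excerpt of the posted data
Dem <- data.frame(Nom = c("John", "Charles"), Departement = c("D", "C"),
                  Unite = c("Unite8", "Unite7"), stringsAsFactors = FALSE)
Rap <- data.frame(Nom = c("Rapporteur01", "Rapporteur02"),
                  Departement = c("C", "D"), Unite = c("Unite10", "Unite6"),
                  stringsAsFactors = FALSE)
dfnew <- data.frame(Demandeur = c("John", "Charles"),
                    Rapporteur1 = c("Rapporteur01", "Rapporteur02"),
                    Rapporteur2 = c("Rapporteur02", "Rapporteur01"),
                    stringsAsFactors = FALSE)
check_dfnew(dfnew, Dem, Rap)
```

Because the function converts the columns with as.character(), the same call also works with the factor columns of the full Dem, Rap and dfnew posted below.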

Thanks for your help
Michel

Dem <- structure(list(Nom = c("John", "Jim", "Julie", "Charles", "Michel", 
"Emma", "Sandra", "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", 
"Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"), 
Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D", "B", "B", 
"B", "A", "C", "D", "B", "A", "D", "D"), Unite = c("Unite8", "Unite4", 
"Unite4", "Unite7", "Unite9", "Unite1", "Unite6", "Unite5", "Unite7", "Unite3", 
"Unite2", "Unite6", "Unite8", "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", 
"Unite9", "Unite5")), .Names = c("Nom", "Departement", "Unite"
), row.names = c(NA, -20L), class = "data.frame")

Rap <- structure(list(Nom = c("Rapporteur01", "Rapporteur02", "Rapporteur03", 
"Rapporteur04", "Rapporteur05"), Departement = c("C", "D", "C", "C", "D"), 
Unite = c("Unite10", "Unite6", "Unite5", "Unite5", "Unite4")), .Names = 
c("Nom", "Departement", "Unite"), row.names = c(NA, -5L), class = "data.frame")

dfnew <- structure(list(Demandeur = structure(c(13L, 12L, 14L, 3L, 15L, 8L, 
17L, 7L, 18L, 1L, 10L, 9L, 16L, 4L, 5L, 11L, 19L, 6L, 20L, 2L), .Label = 
c("Albert", "Catherine", "Charles", "Cyril", "Damien", "Daniel", "Elodie", 
"Emma", "Francois", "Jean", "Jean-Michel", "Jim", "John", "Julie", "Michel", 
"Pierre", "Sandra", "Thierry", "Vincent", "Yvan"), class = "factor"), 
Rapporteur1 = structure(c(3L, 1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 4L, 2L, 4L, 2L, 
3L, 5L, 4L, 4L, 2L, 3L, 1L), .Label = c("Rapporteur01", "Rapporteur02", 
"Rapporteur03", "Rapporteur04", "Rapporteur05"), class = "factor"), Rapporteur2 
= structure(c(1L, 3L, 4L, 4L, 2L, 4L, 5L, 1L, 2L, 3L, 3L, 3L, 5L, 5L, 1L, 1L, 
2L, 5L, 4L, 2L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03", 
"Rapporteur04", "Rapporteur05"), class = "factor")), .Names = c("Demandeur", 
"Rapporteur1", "Rapporteur2"), row.names = c(NA, -20L), class = 
"data.frame")


--
Michel ARNAUD
Cirad Montpellier



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] seqinr ?: Splitting a factor name into several columns. Dealing with metabarcoding data.

2014-10-13 Thread David.Kaethner
I'm not sure I understood your problem correctly; maybe something like this:

library(stringr)  # provides str_split()

# split identifiers into columns
df1 <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
  Z.identifierA.B1298712 = factor(rep(LETTERS[1:2], each = 5)))

id <- names(df1)[3]
x <- do.call(rbind, str_split(id, "\\."))
# one new column per part of the name, each holding the original values
y <- sapply(x, function(z) df1[, id])

df1.goal <- data.frame(df1[, -3], y)
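If installing stringr is not an option, base R's strsplit() does the same split. A minimal sketch on the same dummy data; unlike the sapply() version, which goes through a character matrix, it copies the original factor directly, so the new columns stay factors.

```r
df1 <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
                  Z.identifierA.B1298712 = factor(rep(LETTERS[1:2], each = 5)))

id <- names(df1)[3]
# Split the column name at each literal "." (fixed = TRUE, no regex needed)
parts <- strsplit(id, ".", fixed = TRUE)[[1]]
# One copy of the original factor per part, named after that part
new_cols <- setNames(rep(list(df1[[id]]), length(parts)), parts)

df1.goal <- data.frame(df1[, -3], new_cols)
names(df1.goal)
```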

-dk

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Anna Zakrisson Braeunlich
Sent: Sunday, 12 October 2014 09:25
To: r-help@r-project.org
Subject: [R] seqinr ?: Splitting a factor name into several columns. Dealing 
with metabarcoding data.

Hi,

I have a question about how to split a factor name into different columns. I 
have metabarcoding data and need to merge the FASTA file with the taxonomy and 
count table files (data frames). To be able to do this merge, I need to isolate 
the common identifier, which unfortunately is baked into the factor name 
together with a lot of other labels, e.g.:
sequence identifier: 
M01271_77_0.A8J0P_1_1101_10150_1525.1.322519.sample_1.sample_2

I want to split this name at every "." to get several columns:
column1: M01271_77_0
column2: A8J0P_1_1101_10150_1525
column3: 1
column4: 322519
column5: sample_1
column6: sample_2

I must add that I have no influence on how these names are assigned. This is 
how they are supplied by the Illumina MiSeq. I just need to be able to deal with it.
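Splitting such an identifier at every "." can be done in base R. A minimal sketch with strsplit(), assuming every identifier has the same number of dot-separated fields (otherwise rbind() would recycle the shorter rows); ids can just as well be a whole vector of identifiers, giving one row per sequence.

```r
# The example identifier from the question
ids <- "M01271_77_0.A8J0P_1_1101_10150_1525.1.322519.sample_1.sample_2"

# fixed = TRUE treats "." literally (as a regex, "." would match any character)
parts <- do.call(rbind, strsplit(ids, ".", fixed = TRUE))
colnames(parts) <- paste0("column", seq_len(ncol(parts)))

as.data.frame(parts, stringsAsFactors = FALSE)
```

The underscores are left alone, so column2 is "A8J0P_1_1101_10150_1525", and all six fields come back as character strings.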

Here is some extremely simplified dummy data to further show the issue at hand:

df1 <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
  Z.identifierA.B1298712 = factor(rep(LETTERS[1:2], each = 5)))
df2 <- data.frame(cbind(B = 13:22, K = rnorm(10)),
  Q.identifierA.B4668726 = factor(rep(LETTERS[1:2], each = 5)))

# I have metabarcoding data with one FASTA file, one count table and one
# taxonomy file. The dummy data above just shows the issue at hand. I want to
# be able to merge my three original data frames (here, the dummy data is only
# two data frames). The problem is that the only identifier common to the
# data frames is hidden in the factor name, e.g. Z.identifierA.1298712 and
# Q.identifierA.4668726. I hence need to split this name into different
# columns to get identifierA alone as one column name. Then I can merge the
# data frames.
# How can I do this in R? I know that it can be done in Excel, but I would
# like to produce a complete R script to get a fast pipeline and avoid
# copy-and-paste errors.
# This is what I want it to look like:

df1.goal <- data.frame(cbind(X = 1:10, Y = rnorm(10)),
  Z = factor(rep(LETTERS[1:2], each = 5)),
  identifierA = factor(rep(LETTERS[1:2], each = 5)),
  B1298712 = factor(rep(LETTERS[1:2], each = 5)))
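Once the common field is isolated as a column, merge() can join the tables on it. A minimal sketch under stated assumptions: the two tables, their name column and the helper field() are all hypothetical, and the identifier is taken to be the second dot-separated field, as in Z.identifierA.B1298712.

```r
# Hypothetical tables: each has a 'name' column holding dot-separated labels
fasta  <- data.frame(name = c("Z.id1.B1", "Z.id2.B2"), seq = c("ACGT", "GGCA"),
                     stringsAsFactors = FALSE)
counts <- data.frame(name = c("Q.id2.B9", "Q.id1.B8"), n = c(5L, 3L),
                     stringsAsFactors = FALSE)

# Hypothetical helper: the k-th dot-separated field of each label
field <- function(x, k) {
  vapply(strsplit(x, ".", fixed = TRUE), `[`, character(1), k)
}

fasta$id  <- field(fasta$name, 2)
counts$id <- field(counts$name, 2)

# Join on the extracted identifier; rows are matched by id, not by position
merged <- merge(fasta, counts, by = "id", suffixes = c(".fasta", ".counts"))
merged
```

With three tables the merges can be chained, for example with Reduce(function(a, b) merge(a, b, by = "id"), list(...)).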

Many thanks and kind regards,
Anna Zakrisson


Anna Zakrisson Braeunlich
PhD student

Department of Ecology, Environment and Plant Sciences Stockholm University 
Svante Arrheniusv. 21A
SE-106 91 Stockholm
Sweden/Sverige

Lives in Berlin.
For paper mail:
Katzbachstr. 21
D-10965, Berlin
Germany/Deutschland

E-mail: anna.zakris...@su.se
Tel work: +49-(0)3091541281
Mobile: +49-(0)15777374888
LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b


