Hi, Excuse me for the indistinctness Le 13/07/2013 17:18, arun a écrit : > Hi, > "when the value of Debut of lines i = value Fin of lines i-1" > That part is not clear esp. when it is looked upon with the expected output > (df2). I want to group the lines which have the same caracteristics (Matricule, Nom, Sexe, DateNaissance, Contrat, Pays) and with period of time (Debut/start and Fin/end) without interruption of time. For exemple : The following three lines : Debut/Start Fin/End 1 VERON Féminin 02/09/1935 CDI commun France *24/01/1995* 30/04/1997 1 VERON Féminin 02/09/1935 CDI commun France 01/05/1997 30/12/1997 1 VERON Féminin 02/09/1935 CDI commun France 31/12/1997 *31/12/1997* are transformed into 1 line 1 VERON Féminin 02/09/1935 CDI commun France *24/01/1995* *31/12/1997* because same caracteristicsand period of time without interruption of time (from *24/01/1995* to *31/12/1997)*
The following six lines : 6 BENARD Masculin 01/04/1935 CDI commun Philippines *02/02/1995* 27/02/1995 6 BENARD Masculin 01/04/1935 CDI commun Philippines 28/02/1995 28/02/1995 6 BENARD Masculin 01/04/1935 CDI commun Philippines 01/03/1995 *12/03/1995* 6 BENARD Masculin 01/04/1935 CDI commun France *13/03/1995* *30/06/1995* 6 BENARD Masculin 01/04/1935 CDI commun France *01/01/1996* 30/01/1996 6 BENARD Masculin 01/04/1935 CDI commun France 31/01/1996 *31/01/1996* are transformed into 6 BENARD Masculin 01/04/1935 CDI commun Philippines *02/02/1995* *12/03/1995* 6 BENARD Masculin 01/04/1935 CDI commun France *13/03/1995* *30/06/1995* 6 BENARD Masculin 01/04/1935 CDI commun France *01/01/1996* *31/01/1996* because lines 1-3 identical for caracteristics and without interruption in time lines 4 and lines 5-6 are not grouped because there is an interruption in time beetween *30/06/1995 *and *01/01/1996* Thank you for your help Michel > Also, in your example dataset: > > df1$contrat[grep("^CDD",df1$contrat)] > #[1] "CDD détaché ext. Cirad" "CDD détaché ext. Cirad" "CDD détaché ext. > Cirad" > #[4] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. > Cirad" > #[7] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. > Cirad" > ##Looks like there are extra spaces in some of them. I guess these are the > same > df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad" > > > I tried this: > indx<-as.numeric(interaction(df1[,1:6],drop=FALSE)) > > df1New<- df1 > res2<-unique(within(df1New,{Debut<-ave(seq_along(indx),indx,FUN=function(x) > Debut[head(x,1)]);Fin<- ave(seq_along(indx),indx,FUN=function(x) > Fin[tail(x,1)])})) > row.names(res2)<- 1:nrow(res2) > > res2[,c(1,2,7:8)] > Matricule Nom Debut Fin > 1 1 VERON 24/01/1995 31/12/1997 > 2 6 BENARD 02/02/1995 12/03/1995 > 3 6 BENARD 13/03/1995 31/01/1996 ###here not correct > 4 8 DALNIC 24/01/1995 31/08/1995 > 5 8 DALNIC 01/09/1995 29/02/2000 > 6 934 FORNI 26/01/1995 31/08/2001 > 7 934 FORNI 01/09/2001 31/08/2004 > 8 934 FORNI 01/09/2004 31/08/2007 > 9 934 FORNI 01/09/2007 04/09/2012 > 10 934 FORNI 05/09/2012 31/12/4712 > > > df2[,c(1,2,7:8)] > Mat Nom Debut Fin > 1 1 VERON 24/01/1995 31/12/1997 > 2 6 BENARD 02/02/1995 12/03/1995 > 3 6 BENARD 13/03/1995 30/06/1995 > 4 6 BENARD 01/01/1996 31/01/1996 #missing this row > 5 8 DALNIC 24/01/1995 31/08/1995 > 6 8 DALNIC 01/09/1995 29/02/2000 > 7 934 FORNI 26/01/1995 31/08/2001 > 8 934 FORNI 01/09/2001 31/08/2004 > 9 934 FORNI 01/09/2004 31/08/2007 > 10 934 FORNI 01/09/2007 04/09/2012 > 11 934 FORNI 05/09/2012 31/12/4712 > > > Here, the dates look similar to the ones on df2 except for one row in df2. > > A.K. > > > > > ----- Original Message ----- > From: Arnaud Michel <michel.arn...@cirad.fr> > To: R help <r-help@r-project.org> > Cc: > Sent: Friday, July 12, 2013 3:45 PM > Subject: [R] simplify a dataframe > > Hello > > I have the following problem : group the lines of a dataframe when no > information change (Matricule, Nom, Sexe, DateNaissance, Contrat, Pays) > and when the value of Debut of lines i = value Fin of lines i-1 > I can obtain it with a do loop. Is it possible to avoid the loop ? > > The dataframe initial is df1 > dput(df1) > structure(list(Matricule = c(1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, > 6L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 934L, 934L, 934L, 934L, > 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, > 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, > 934L, 934L, 934L, 934L), Nom = c("VERON", "VERON", "VERON", "BENARD", > "BENARD", "BENARD", "BENARD", "BENARD", "BENARD", "DALNIC", "DALNIC", > "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", > "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", > "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", > "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", > "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", > "FORNI", "FORNI"), Sexe = c("Féminin", "Féminin", "Féminin", > "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", > "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", > "Féminin", "Féminin", "Féminin", "Masculin", "Masculin", "Masculin", > "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", > "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", > "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", > "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", > "Masculin", "Masculin", "Masculin"), DateNaissance = c("02/09/1935", > "02/09/1935", "02/09/1935", "01/04/1935", "01/04/1935", "01/04/1935", > "01/04/1935", "01/04/1935", "01/04/1935", "19/02/1940", "19/02/1940", > "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", > "19/02/1940", "19/02/1940", "10/07/1961", "10/07/1961", "10/07/1961", > "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", > "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", > "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", > "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", > "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", > "10/07/1961", "10/07/1961"), contrat = c("CDI commun", "CDI commun", > "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", > "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", > "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", > "CDI commun", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", > "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. > Cirad", > "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. > Cirad", > "CDD détaché ext. Cirad", "CDI Détachés Autres", "CDI Détachés Autres", > "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", > "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", > "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", > "CDI Détachés Autres", "CDI Détachés Autres", "CDI commun", "CDI commun", > "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", > "CDI commun"), Pays = c("France", "France", "France", "Philippines", > "Philippines", "Philippines", "France", "France", "France", "France", > "France", "Martinique", "Martinique", "Martinique", "Martinique", > "Martinique", "Martinique", "Martinique", "Cameroun", "Cameroun", > "Cameroun", "Cameroun", "Cameroun", "Cameroun", "Cameroun", "Cameroun", > "Cameroun", "France", "France", "France", "France", "France", > "France", "France", "Congo", "Congo", "Congo", "Congo", "Congo", > "Congo", "Gabon", "Gabon", "Gabon", "Gabon", "Gabon", "Gabon", > "Congo", "Congo"), Debut = c("24/01/1995", "01/05/1997", "31/12/1997", > "02/02/1995", "28/02/1995", "01/03/1995", "13/03/1995", "01/01/1996", > "31/01/1996", "24/01/1995", "01/07/1995", "01/09/1995", "01/07/1997", > "01/01/1998", "01/08/1998", "01/01/2000", "17/01/2000", "29/02/2000", > "26/01/1995", "01/07/1996", "16/09/1997", "01/01/1998", "01/07/1998", > "04/11/1999", "01/01/2001", "01/04/2001", "31/08/2001", "01/09/2001", > "02/09/2001", "01/12/2001", "01/02/2003", "01/04/2003", "01/01/2004", > "01/03/2004", "01/09/2004", "01/01/2005", "01/04/2005", "28/10/2006", > "01/01/2007", "01/04/2007", "01/09/2007", "01/01/2009", "01/04/2009", > "01/01/2010", "01/01/2011", "01/04/2011", "05/09/2012", "01/01/2013" > ), Fin = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995", > "28/02/1995", "12/03/1995", "30/06/1995", "30/01/1996", "31/01/1996", > "30/06/1995", "31/08/1995", "30/06/1997", "31/12/1997", "31/07/1998", > "31/12/1999", "16/01/2000", "28/02/2000", "29/02/2000", "30/06/1996", > "15/09/1997", "31/12/1997", "30/06/1998", "03/11/1999", "31/12/2000", > "31/03/2001", "30/08/2001", "31/08/2001", "01/09/2001", "30/11/2001", > "31/01/2003", "31/03/2003", "31/12/2003", "29/02/2004", "31/08/2004", > "31/12/2004", "31/03/2005", "27/10/2006", "31/12/2006", "31/03/2007", > "31/08/2007", "31/12/2008", "31/03/2009", "31/12/2009", "31/12/2010", > "31/03/2011", "04/09/2012", "31/12/2012", "31/12/4712")), .Names = > c("Matricule", > "Nom", "Sexe", "DateNaissance", "contrat", "Pays", "Debut", "Fin" > ), class = "data.frame", row.names = c(NA, -48L)) > > The dataframe to be obtained is df2 > dput(df2) > structure(list(Mat = c(1L, 6L, 6L, 6L, 8L, 8L, 934L, 934L, 934L, > 934L, 934L), Nom = c("VERON", "BENARD", "BENARD", "BENARD", "DALNIC", > "DALNIC", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI"), Sexe = c("Féminin", > "Masculin", "Masculin", "Masculin", "Féminin", "Féminin", "Masculin", > "Masculin", "Masculin", "Masculin", "Masculin"), DateNaissance = > c("02/09/1935", > "01/04/1935", "01/04/1935", "01/04/1935", "19/02/1940", "19/02/1940", > "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961" > ), contrat = c("CDI commun", "CDI commun", "CDI commun", "CDI commun", > "CDI commun", "CDI commun", "CDD détaché ext. Cirad", "CDI Détachés > Autres", > "CDI Détachés Autres", "CDI commun", "CDI commun"), Pays = c("France", > "Philippines", "France", "France", "France", "Martinique", "Cameroun", > "France", "Congo", "Gabon", "Congo"), Debut = c("24/01/1995", > "02/02/1995", "13/03/1995", "01/01/1996", "24/01/1995", "01/09/1995", > "26/01/1995", "01/09/2001", "01/09/2004", "01/09/2007", "05/09/2012" > ), Fin = c("31/12/1997", "12/03/1995", "30/06/1995", "31/01/1996", > "31/08/1995", "29/02/2000", "31/08/2001", "31/08/2004", "31/08/2007", > "04/09/2012", "31/12/4712")), .Names = c("Mat", "Nom", "Sexe", > "DateNaissance", "contrat", "Pays", "Debut", "Fin"), class = > "data.frame", row.names = c(NA, > -11L)) > > Thank you for your help > -- Michel ARNAUD Chargé de mission auprès du DRH DGDRD-Drh - TA 174/04 Av Agropolis 34398 Montpellier cedex 5 tel : 04.67.61.75.38 fax : 04.67.61.57.87 port: 06.47.43.55.31 [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.