An other remark : If I calculate df1$D<- as.numeric(as.Date(paste0(substr(df1$Debut,7,10),"-", substr(df1$Debut,4,5),"-",substr(df1$Debut,1,2)))) and df1$F <- as.numeric(as.Date(paste0(substr(df1$Fin,7,10),"-", substr(df1$Fin,4,5),"-",substr(df1$Fin,1,2)))) and if there is no interruption of time for the lines i and i+1 then df1$F[i] + 1 == df1$D[i+1] Michel
Le 14/07/2013 18:17, Arnaud Michel a écrit : > Hi, > Excuse me for the indistinctness > Le 13/07/2013 17:18, arun a écrit : >> Hi, >> "when the value of Debut of lines i = value Fin of lines i-1" >> That part is not clear esp. when it is looked upon with the expected output >> (df2). > I want to group the lines which have the same caracteristics (Matricule, > Nom, Sexe, DateNaissance, Contrat, Pays) and with period of time > (Debut/start and Fin/end) without interruption of time. > For exemple : > The following three lines : Debut/Start Fin/End > 1 VERON Féminin 02/09/1935 CDI commun France > *24/01/1995* 30/04/1997 > 1 VERON Féminin 02/09/1935 CDI commun France > 01/05/1997 30/12/1997 > 1 VERON Féminin 02/09/1935 CDI commun France > 31/12/1997 *31/12/1997* > are transformed into 1 line > 1 VERON Féminin 02/09/1935 CDI commun France > *24/01/1995* *31/12/1997* > because same caracteristicsand period of time without interruption of > time (from *24/01/1995* to *31/12/1997)* > > The following six lines : > 6 BENARD Masculin 01/04/1935 CDI commun Philippines > *02/02/1995* 27/02/1995 > 6 BENARD Masculin 01/04/1935 CDI commun Philippines > 28/02/1995 28/02/1995 > 6 BENARD Masculin 01/04/1935 CDI commun Philippines > 01/03/1995 *12/03/1995* > 6 BENARD Masculin 01/04/1935 CDI commun France > *13/03/1995* *30/06/1995* > 6 BENARD Masculin 01/04/1935 CDI commun France > *01/01/1996* 30/01/1996 > 6 BENARD Masculin 01/04/1935 CDI commun France > 31/01/1996 *31/01/1996* > are transformed into > 6 BENARD Masculin 01/04/1935 CDI commun Philippines > *02/02/1995* *12/03/1995* > 6 BENARD Masculin 01/04/1935 CDI commun France > *13/03/1995* *30/06/1995* > 6 BENARD Masculin 01/04/1935 CDI commun France > *01/01/1996* *31/01/1996* > because > lines 1-3 identical for caracteristics and without interruption in time > lines 4 and lines 5-6 are not grouped because there is an interruption > in time beetween *30/06/1995 *and *01/01/1996* > > Thank you for your help > Michel > >> Also, in your example dataset: >> >> df1$contrat[grep("^CDD",df1$contrat)] >> #[1] "CDD détaché ext. Cirad" "CDD détaché ext. Cirad" "CDD détaché ext. >> Cirad" >> #[4] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. >> Cirad" >> #[7] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. >> Cirad" >> ##Looks like there are extra spaces in some of them. I guess these are the >> same >> df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad" >> >> >> I tried this: >> indx<-as.numeric(interaction(df1[,1:6],drop=FALSE)) >> >> df1New<- df1 >> res2<-unique(within(df1New,{Debut<-ave(seq_along(indx),indx,FUN=function(x) >> Debut[head(x,1)]);Fin<- ave(seq_along(indx),indx,FUN=function(x) >> Fin[tail(x,1)])})) >> row.names(res2)<- 1:nrow(res2) >> >> res2[,c(1,2,7:8)] >> Matricule Nom Debut Fin >> 1 1 VERON 24/01/1995 31/12/1997 >> 2 6 BENARD 02/02/1995 12/03/1995 >> 3 6 BENARD 13/03/1995 31/01/1996 ###here not correct >> 4 8 DALNIC 24/01/1995 31/08/1995 >> 5 8 DALNIC 01/09/1995 29/02/2000 >> 6 934 FORNI 26/01/1995 31/08/2001 >> 7 934 FORNI 01/09/2001 31/08/2004 >> 8 934 FORNI 01/09/2004 31/08/2007 >> 9 934 FORNI 01/09/2007 04/09/2012 >> 10 934 FORNI 05/09/2012 31/12/4712 >> >> >> df2[,c(1,2,7:8)] >> Mat Nom Debut Fin >> 1 1 VERON 24/01/1995 31/12/1997 >> 2 6 BENARD 02/02/1995 12/03/1995 >> 3 6 BENARD 13/03/1995 30/06/1995 >> 4 6 BENARD 01/01/1996 31/01/1996 #missing this row >> 5 8 DALNIC 24/01/1995 31/08/1995 >> 6 8 DALNIC 01/09/1995 29/02/2000 >> 7 934 FORNI 26/01/1995 31/08/2001 >> 8 934 FORNI 01/09/2001 31/08/2004 >> 9 934 FORNI 01/09/2004 31/08/2007 >> 10 934 FORNI 01/09/2007 04/09/2012 >> 11 934 FORNI 05/09/2012 31/12/4712 >> >> >> Here, the dates look similar to the ones on df2 except for one row in df2. >> >> A.K. >> >> >> >> >> ----- Original Message ----- >> From: Arnaud Michel <michel.arn...@cirad.fr> >> To: R help <r-help@r-project.org> >> Cc: >> Sent: Friday, July 12, 2013 3:45 PM >> Subject: [R] simplify a dataframe >> >> Hello >> >> I have the following problem : group the lines of a dataframe when no >> information change (Matricule, Nom, Sexe, DateNaissance, Contrat, Pays) >> and when the value of Debut of lines i = value Fin of lines i-1 >> I can obtain it with a do loop. Is it possible to avoid the loop ? >> >> The dataframe initial is df1 >> dput(df1) >> structure(list(Matricule = c(1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, >> 6L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 934L, 934L, 934L, 934L, >> 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, >> 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, >> 934L, 934L, 934L, 934L), Nom = c("VERON", "VERON", "VERON", "BENARD", >> "BENARD", "BENARD", "BENARD", "BENARD", "BENARD", "DALNIC", "DALNIC", >> "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", >> "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", >> "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", >> "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", >> "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", >> "FORNI", "FORNI"), Sexe = c("Féminin", "Féminin", "Féminin", >> "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", >> "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", >> "Féminin", "Féminin", "Féminin", "Masculin", "Masculin", "Masculin", >> "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", >> "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", >> "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", >> "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", >> "Masculin", "Masculin", "Masculin"), DateNaissance = c("02/09/1935", >> "02/09/1935", "02/09/1935", "01/04/1935", "01/04/1935", "01/04/1935", >> "01/04/1935", "01/04/1935", "01/04/1935", "19/02/1940", "19/02/1940", >> "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", >> "19/02/1940", "19/02/1940", "10/07/1961", "10/07/1961", "10/07/1961", >> "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", >> "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", >> "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", >> "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", >> "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", >> "10/07/1961", "10/07/1961"), contrat = c("CDI commun", "CDI commun", >> "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", >> "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", >> "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", >> "CDI commun", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", >> "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. >> Cirad", >> "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. >> Cirad", >> "CDD détaché ext. Cirad", "CDI Détachés Autres", "CDI Détachés Autres", >> "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", >> "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", >> "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", >> "CDI Détachés Autres", "CDI Détachés Autres", "CDI commun", "CDI commun", >> "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", >> "CDI commun"), Pays = c("France", "France", "France", "Philippines", >> "Philippines", "Philippines", "France", "France", "France", "France", >> "France", "Martinique", "Martinique", "Martinique", "Martinique", >> "Martinique", "Martinique", "Martinique", "Cameroun", "Cameroun", >> "Cameroun", "Cameroun", "Cameroun", "Cameroun", "Cameroun", "Cameroun", >> "Cameroun", "France", "France", "France", "France", "France", >> "France", "France", "Congo", "Congo", "Congo", "Congo", "Congo", >> "Congo", "Gabon", "Gabon", "Gabon", "Gabon", "Gabon", "Gabon", >> "Congo", "Congo"), Debut = c("24/01/1995", "01/05/1997", "31/12/1997", >> "02/02/1995", "28/02/1995", "01/03/1995", "13/03/1995", "01/01/1996", >> "31/01/1996", "24/01/1995", "01/07/1995", "01/09/1995", "01/07/1997", >> "01/01/1998", "01/08/1998", "01/01/2000", "17/01/2000", "29/02/2000", >> "26/01/1995", "01/07/1996", "16/09/1997", "01/01/1998", "01/07/1998", >> "04/11/1999", "01/01/2001", "01/04/2001", "31/08/2001", "01/09/2001", >> "02/09/2001", "01/12/2001", "01/02/2003", "01/04/2003", "01/01/2004", >> "01/03/2004", "01/09/2004", "01/01/2005", "01/04/2005", "28/10/2006", >> "01/01/2007", "01/04/2007", "01/09/2007", "01/01/2009", "01/04/2009", >> "01/01/2010", "01/01/2011", "01/04/2011", "05/09/2012", "01/01/2013" >> ), Fin = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995", >> "28/02/1995", "12/03/1995", "30/06/1995", "30/01/1996", "31/01/1996", >> "30/06/1995", "31/08/1995", "30/06/1997", "31/12/1997", "31/07/1998", >> "31/12/1999", "16/01/2000", "28/02/2000", "29/02/2000", "30/06/1996", >> "15/09/1997", "31/12/1997", "30/06/1998", "03/11/1999", "31/12/2000", >> "31/03/2001", "30/08/2001", "31/08/2001", "01/09/2001", "30/11/2001", >> "31/01/2003", "31/03/2003", "31/12/2003", "29/02/2004", "31/08/2004", >> "31/12/2004", "31/03/2005", "27/10/2006", "31/12/2006", "31/03/2007", >> "31/08/2007", "31/12/2008", "31/03/2009", "31/12/2009", "31/12/2010", >> "31/03/2011", "04/09/2012", "31/12/2012", "31/12/4712")), .Names = >> c("Matricule", >> "Nom", "Sexe", "DateNaissance", "contrat", "Pays", "Debut", "Fin" >> ), class = "data.frame", row.names = c(NA, -48L)) >> >> The dataframe to be obtained is df2 >> dput(df2) >> structure(list(Mat = c(1L, 6L, 6L, 6L, 8L, 8L, 934L, 934L, 934L, >> 934L, 934L), Nom = c("VERON", "BENARD", "BENARD", "BENARD", "DALNIC", >> "DALNIC", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI"), Sexe = c("Féminin", >> "Masculin", "Masculin", "Masculin", "Féminin", "Féminin", "Masculin", >> "Masculin", "Masculin", "Masculin", "Masculin"), DateNaissance = >> c("02/09/1935", >> "01/04/1935", "01/04/1935", "01/04/1935", "19/02/1940", "19/02/1940", >> "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961" >> ), contrat = c("CDI commun", "CDI commun", "CDI commun", "CDI commun", >> "CDI commun", "CDI commun", "CDD détaché ext. Cirad", "CDI Détachés >> Autres", >> "CDI Détachés Autres", "CDI commun", "CDI commun"), Pays = c("France", >> "Philippines", "France", "France", "France", "Martinique", "Cameroun", >> "France", "Congo", "Gabon", "Congo"), Debut = c("24/01/1995", >> "02/02/1995", "13/03/1995", "01/01/1996", "24/01/1995", "01/09/1995", >> "26/01/1995", "01/09/2001", "01/09/2004", "01/09/2007", "05/09/2012" >> ), Fin = c("31/12/1997", "12/03/1995", "30/06/1995", "31/01/1996", >> "31/08/1995", "29/02/2000", "31/08/2001", "31/08/2004", "31/08/2007", >> "04/09/2012", "31/12/4712")), .Names = c("Mat", "Nom", "Sexe", >> "DateNaissance", "contrat", "Pays", "Debut", "Fin"), class = >> "data.frame", row.names = c(NA, >> -11L)) >> >> Thank you for your help >> > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Michel ARNAUD Chargé de mission auprès du DRH DGDRD-Drh - TA 174/04 Av Agropolis 34398 Montpellier cedex 5 tel : 04.67.61.75.38 fax : 04.67.61.57.87 port: 06.47.43.55.31 [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.