Re: [R] Change values in a dateframe-Speed TEST

Arnaud Michel Thu, 25 Jul 2013 01:01:02 -0700

On 25/07/2013 08:50, Berend Hasselman wrote:

On 25-07-2013, at 08:35, Arnaud Michel <michel.arn...@cirad.fr> wrote:

But I just noticed that the two solutions are not comparable:
the change concerns only Nom and Prenom in Berend's solution, whereas in
Arun's solution Sexe, Date.de.naissance or other variables can also be
changed. But my question was badly put.

Indeed:-)

But that can be remedied with (small correction w.r.t. initial solution: 
drop=TRUE removed; not relevant here)

r1 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
                     FUN=function(x) {x[,1:ncol(x)] <- x[1,1:ncol(x)];x})))

and

r2 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
                     FUN=function(x) {x[,1:ncol(x)] <- x[nrow(x),1:ncol(x)];x})))

Thank you, but I will keep

{x[,c("Nom","Prénom")] <- x[nrow(x),c("Nom","Prénom")];x}

because the data frame contains other variables that I do not want to change.
I want to change only "Nom" and "Prénom".
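The two requirements discussed here (only the name columns change, and the fast ave() indexing is kept) can probably be combined. The following is a minimal sketch, not code posted in the thread: the data frame `toy` and the column `Autre` are made up for illustration, and the column is spelled `Prenom` as in TEST.

```r
# Hypothetical toy data, not the poster's real data frame
toy <- data.frame(
  Matricule = c(1L, 1L, 2L, 3L, 3L, 3L),
  Nom       = c("A", "B", "C", "D", "D", "E"),
  Prenom    = c("a", "a", "c", "d", "e", "e"),
  Autre     = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# For every row, the index of the last row of its Matricule group
last_idx <- ave(seq_len(nrow(toy)), toy$Matricule, FUN = max)

# Overwrite only the two name columns; every other column is left alone
fixed <- toy
fixed[c("Nom", "Prenom")] <- toy[last_idx, c("Nom", "Prenom")]
fixed
```

Using FUN = min instead of FUN = max would take the names from the first row of each group. If the name columns are factors, as in TEST, a final droplevels() call should remove the levels that are no longer used.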


PS: what does "w.r.t." mean?
Michel

Less elegant than alternative with ave

Berend

Michel

On 25/07/2013 08:06, Arnaud Michel wrote:

Hi

For a data frame named PaysContrat1 with
nrow(PaysContrat1)
[1] 52366

system.time gives the following results:

system.time(droplevels(do.call(rbind,
    lapply(split(PaysContrat1, PaysContrat1$Matricule),
           FUN=function(x) {x[,c("Nom","Prénom")] <-
               x[nrow(x),c("Nom","Prénom"),drop=TRUE]; x}))))
   user  system elapsed
  14.03    0.00   14.04

system.time(droplevels(
    PaysContrat1[with(PaysContrat1, ave(seq_along(Matricule), Matricule, FUN=min)), ]))
   user  system elapsed
    0.2     0.0     0.2
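The two calls timed above do not compute the same thing (one rewrites two columns, the other replaces whole rows), so a like-for-like comparison may be more informative. The sketch below assumes simulated data; `sim`, `by_split` and `by_ave` are hypothetical names, and the timings will differ from machine to machine, so none are shown.

```r
set.seed(1)
n <- 2000
# Simulated data, sorted by Matricule so both approaches keep the row order
sim <- data.frame(
  Matricule = sort(sample(1:500, n, replace = TRUE)),
  Nom       = sample(LETTERS, n, replace = TRUE),
  Prenom    = sample(letters, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# split/lapply/rbind approach, restricted to the two name columns
by_split <- function(d) {
  out <- do.call(rbind, lapply(split(d, d$Matricule), function(x) {
    x[, c("Nom", "Prenom")] <- x[nrow(x), c("Nom", "Prenom")]
    x
  }))
  rownames(out) <- NULL
  out
}

# ave() index approach, restricted to the same two columns
by_ave <- function(d) {
  idx <- ave(seq_len(nrow(d)), d$Matricule, FUN = max)
  d[c("Nom", "Prenom")] <- d[idx, c("Nom", "Prenom")]
  d
}

t_split <- system.time(r_split <- by_split(sim))
t_ave   <- system.time(r_ave   <- by_ave(sim))

# Both approaches should give the same name columns
stopifnot(identical(r_split$Nom, r_ave$Nom),
          identical(r_split$Prenom, r_ave$Prenom))
rbind(split = t_split[1:3], ave = t_ave[1:3])
```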

Michel

On 24/07/2013 15:29, arun wrote:

Hi Michel,
You could try:


df1New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=min)),])
row.names(df1New)<-1:nrow(df1New)
df2New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=max)),])
row.names(df2New)<-1:nrow(df2New)
  identical(df1New,df1)
#[1] TRUE
  identical(df2New,df2)
#[1] TRUE
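To see why this works, it may help to look at the index vector that ave() returns on its own. A minimal sketch, using a shortened, made-up grouping vector `g` rather than TEST:

```r
# Hypothetical grouping vector (a shortened stand-in for TEST$Matricule)
g <- c(66L, 67L, 67L, 68L, 108L, 108L, 108L)

# ave() applies FUN to the row positions within each group and returns
# a vector of the same length as g
first_idx <- ave(seq_along(g), g, FUN = min)
last_idx  <- ave(seq_along(g), g, FUN = max)

first_idx  # 1 2 2 4 5 5 5
last_idx   # 1 3 3 4 7 7 7
```

Indexing a data frame with such a vector repeats the first (or last) row of each group once per group member, which is why whole rows are replaced rather than selected columns.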
A.K.



----- Original Message -----
From: Arnaud Michel <michel.arn...@cirad.fr>
To: R help <r-help@r-project.org>
Cc:
Sent: Wednesday, July 24, 2013 2:39 AM
Subject: [R] Change values in a dateframe

Hello

I have the following problem:
The data frame TEST has multiple rows for the same person because
there are different values of Nom or different values of Prenom,
while the values of Matricule, Sexe and Date.de.naissance are the same.
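As an illustration of the situation described, the Matricule values that carry more than one spelling could be listed as follows. This is only a sketch on a made-up data frame `toy` with the same column names, not on TEST itself.

```r
# Hypothetical toy data using the same column names as TEST
toy <- data.frame(
  Matricule = c(66L, 67L, 67L, 90L, 90L),
  Nom       = c("CHICHE", "GEOF", "GEOF", "LANGUE", "LANGUE-LOPEZ"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine", "Elodie", "Elodie"),
  stringsAsFactors = FALSE
)

# Number of distinct Nom/Prenom combinations per Matricule
n_variants <- tapply(paste(toy$Nom, toy$Prenom), toy$Matricule,
                     function(v) length(unique(v)))

# Matricule values with more than one spelling
inconsistent <- names(n_variants)[n_variants > 1]
inconsistent  # "67" "90"
```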

TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
"JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
"Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
"factor"),
      Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
      1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
      "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
"factor")), .Names = c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
row.names = c(NA,
-11L))


I would like to make the information homogeneous and build 2
data frames:
df1, which takes the values of Nom and Prenom from the first row of each
person in TEST when there are different values. The other values
(Matricule, Sexe and Date.de.naissance) are unchanged.

df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
"JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom =
structure(c(6L,
3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
"Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
"factor"),
      Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
      1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
      "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
"factor")), .Names = c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
row.names = c(NA,
-11L))

df2, which takes the values of Nom and Prenom from the last row of each
person in TEST when there are different values. The other values
(Matricule, Sexe and Date.de.naissance) are unchanged.

df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
"LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
      Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
      5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
      "Michèle", "Michelle", "Victor"), class = "factor"), Sexe =
structure(c(1L,
      1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
      "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
      2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
      "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
      "30/03/1935"), class = "factor")), .Names = c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
row.names = c(NA,
-11L))

Thanks for your help
Michel

--
Michel ARNAUD
Chargé de mission auprès du DRH
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31



