On 2011-04-25 10:19, Christoph Jäckel wrote:
Hi all,

I have a problem with the plyr package - more precisely with the ddply
function - and would be very grateful for any help. I hope the example
here is precise enough for someone to identify the problem. Basically,
in this step I want to identify observations that are identical in
terms of certain identifiers (ID1, ID2, ID3) and just want to save
those observations (in this step, without deleting any rows or
manipulating any data) in a separate data.frame. However, I get the
warning message below and the column with dates is messed up.
Interestingly, the value column (the type is factor here, but if you
change that with as.integer it doesn't make any difference) is handled
correctly. Any idea what I am doing wrong?

df <- data.frame(cbind(ID1=c(1,2,2,3,3,4,4),
                       ID2=c('a','b','b','c','d','e','e'),
                       ID3=c("v1","v1","v1","v1","v2","v1","v1"),
                       Date=c("1985-05-1","1985-05-2","1985-05-3","1985-05-4",
                              "1985-05-5","1985-05-6","1985-05-7"),
                       Value=c(1,2,3,4,5,6,7)))
df[,1]<- as.character(df[,1])
df[,2]<- as.character(df[,2])
df$Date<- strptime(df$Date,"%Y-%m-%d")

#Apparently there are two observations that have the same IDs: ID1=2 and ID1=4
ddply(df,.(ID1,ID2,ID3),nrow)
#I want to save those IDs in a separate data.frame, so the desired output is:
df[c(2:3,6:7),]

#My idea: Write a custom function that only returns observations with
#multiple rows.
#Seems to work except that the Date column doesn't make any sense anymore
#Warning message: In output[[var]][rng]<- df[[var]]: number of items
#to replace is not a multiple of replacement length
ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

#Notice that it works perfectly if I only have one observation with
#multiple rows
ddply(df[1:6,],.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

I would characterize your problem as:
a) using strptime - this is what gives ddply() fits;

b) not using str() to check whether R agrees with
   you with respect to your data;

c) using cbind() inside data.frame(). This isn't
   wrong, but is rarely (in my experience) useful.

If you use as.Date (or even nothing) on your Date
variable, you'll find that ddply does what you want.
To see why it doesn't work with strptime, check
str(df) and then ?POSIXlt. strptime() returns a
POSIXlt object, which is stored as a list of date-time
components, so you've converted your Date values to lists.
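
A minimal sketch of the difference, using base R only. The names
d_chr, lt and dd are made up for illustration and do not appear in
the original post:

```r
# Three of the date strings from the example above
d_chr <- c("1985-05-1", "1985-05-2", "1985-05-3")

# strptime() returns a POSIXlt object: a list of components per date-time
lt <- strptime(d_chr, "%Y-%m-%d")
class(lt)
is.list(lt)
names(unclass(lt))

# as.Date() returns a plain numeric vector with class "Date"
dd <- as.Date(d_chr, format = "%Y-%m-%d")
class(dd)
is.list(dd)
str(dd)
```

Because a Date column is an ordinary atomic vector, it can be split
and reassembled row by row like any other column.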

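My comment about cbind() is to warn you that your
Value variable, as you have constructed it, is
a factor: cbind() first coerces all columns to a
character matrix, so the numbers are no longer
numeric by the time data.frame() sees them.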
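
Putting the three points together, one possible version of the
example is sketched below. It assumes plyr is installed. The names m
and dups are made up for illustration, and stringsAsFactors = FALSE
is set explicitly so the result does not depend on the R version:

```r
library(plyr)

# cbind() on mixed inputs gives a character matrix, so numeric types are lost
m <- cbind(ID1 = c(1, 2), ID2 = c("a", "b"))
is.character(m)

# Build the data frame directly so that each column keeps its own type
df <- data.frame(
  ID1   = as.character(c(1, 2, 2, 3, 3, 4, 4)),
  ID2   = c("a", "b", "b", "c", "d", "e", "e"),
  ID3   = c("v1", "v1", "v1", "v1", "v2", "v1", "v1"),
  Date  = c("1985-05-1", "1985-05-2", "1985-05-3", "1985-05-4",
            "1985-05-5", "1985-05-6", "1985-05-7"),
  Value = c(1, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# Use as.Date() rather than strptime() for the date column
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")

# Check that R agrees with you about the column types
str(df)

# Keep only the ID combinations that occur more than once
dups <- ddply(df, .(ID1, ID2, ID3),
              function(d) if (nrow(d) > 1) d else NULL)
dups
```

Running str(df) before the ddply() call is a cheap way to catch a
column of an unexpected type.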

Peter Ehlers


Thanks in advance,

Christoph

--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Christoph Jäckel (Dipl.-Kfm.)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Research Assistant

Chair for Financial Management and Capital Markets | Lehrstuhls für
Finanzmanagement und Kapitalmärkte

TUM School of Management | Technische Universität München

Arcisstr. 21 | D-80333 München | Germany

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
