In a research project we are using a web-based tool for collecting data
from questionnaires. The system generates files that are simple to read
as a data frame in the "long" format, which is simple to convert to the
"wide" format.
Two things can go wrong in the input: (a) two (or more) rows refer to
the same cell, and (b) a cell has no row at all. For example, this data
set has two rows for S2/T2 and none for the S2/T1 combination:
> d
   values person time
1       1     S1   T1
2       2     S1   T2
3       3     S1   T3
4       4     S1   T4
5      22     S2   T2
6       6     S2   T2
7       7     S2   T3
8       8     S2   T4
9       9     S3   T1
10     10     S3   T2
11     11     S3   T3
12     12     S3   T4
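
To make the example reproducible, the data frame shown above could be
built along these lines. This is only a sketch: the printout does not
show the column types, so plain character columns (not factors) are an
assumption here.

```r
# Rebuild the example data frame from the printout above.
# Assumption: person and time are character columns, not factors.
d <- data.frame(
  values = c(1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12),
  person = rep(c("S1", "S2", "S3"), each = 4),
  time   = c("T1", "T2", "T3", "T4",
             "T2", "T2", "T3", "T4",   # S2: T2 twice, no T1
             "T1", "T2", "T3", "T4"),
  stringsAsFactors = FALSE
)
```
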
> reshape(d, idvar="person", v.names=c("values"), timevar="time",
+         direction="wide")
  person values.T1 values.T2 values.T3 values.T4
1     S1         1         2         3         4
5     S2        NA        22         7         8
9     S3         9        10        11        12
The missing cell gets an NA, as expected. The surprise is the case
where there are two references to the same cell: the *first* one is
used (22 rather than 6).
Is there some way of forcing reshape() to use the *last* value?
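
One possible workaround, offered as a sketch rather than a setting of
reshape() itself: drop all but the last row for each person/time pair
before reshaping, using duplicated() with fromLast=TRUE. This assumes
that row order in the file reflects the order in which the values were
entered; the names keep and d_last are only illustrative.

```r
# Example data as in the post (character columns are an assumption).
d <- data.frame(
  values = c(1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12),
  person = rep(c("S1", "S2", "S3"), each = 4),
  time   = c("T1", "T2", "T3", "T4", "T2", "T2", "T3", "T4",
             "T1", "T2", "T3", "T4"),
  stringsAsFactors = FALSE
)

# Flag every row that has a later row with the same person/time pair,
# and keep only the rows that are NOT flagged (i.e. the last of each pair).
keep   <- !duplicated(d[, c("person", "time")], fromLast = TRUE)
d_last <- d[keep, ]

# Reshape the de-duplicated data exactly as before.
w <- reshape(d_last, idvar = "person", v.names = "values",
             timevar = "time", direction = "wide")
```

With this filtering the S2/T2 cell holds 6 instead of 22, while the
S2/T1 cell is still NA because no row exists for it.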
Tom