Hi Martin,
it sounds like you want the difference between the first and the last
observation per user, not, e.g., all the date differences between
successive observations of each separate user. Correct me if I'm wrong.
That said, let's build some toy data:
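set.seed(1)  # make the toy data reproducible
# 100 observations: five users, dates drawn at random from 2014
dataset <- data.frame(User = sample(LETTERS[1:5], 100, replace = TRUE),
                      Date = sample(as.Date("2014-01-01") + 0:364, 100, replace = TRUE))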
Now we can calculate these differences and plot a histogram or tabulate:
# days between each user's first and last observation
foo <- with(dataset, by(Date, User, function(xx)diff(range(xx))))
hist(foo)
table(foo)
The key here is really the by() function, which applies a function
(here the anonymous function "function(xx)diff(range(xx))") to
some data (here dataset$Date) separately for each level of a grouping
factor (here dataset$User).
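In case the differences between successive observation days of each user
are wanted after all, here is a minimal sketch of that alternative reading.
It uses the eight example rows from the question (Otherdata omitted) and
replaces by() with split() plus lapply(); the names "toy", "gaps_by_user"
and "all_gaps" are just illustrative, and this is one possible approach
rather than the only one:

```r
# The example rows from the question; "toy" is an illustrative name
toy <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "C", "C"),
  Date = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-07",
                   "2014-01-01", "2014-01-05", "2014-02-05", "2014-02-20"))
)

# Per user: drop duplicate days, sort, then take the gaps between
# neighbouring days (as plain numbers of days)
gaps_by_user <- lapply(split(toy$Date, toy$User),
                       function(xx) as.numeric(diff(sort(unique(xx)))))

# Pool the gaps across all users and tabulate them (DeltaDays vs. Count)
all_gaps <- unlist(gaps_by_user, use.names = FALSE)
table(all_gaps)
```

For these rows the pooled gaps are 6, 4 and 15 days, each occurring once,
which matches the numbers in the question. hist(all_gaps) would give the
corresponding plot. A user with only one unique day simply contributes no
gaps.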
HTH,
Stephan
On 23.03.2014 01:32, Martin Tomko wrote:
Apologies if the question is a bit naïve; I am a novice in time series data
handling in R.
I have the following type of data, in long format (as the spacetime vignette
calls it; the table also contains space, not shown here):
User | Date | Otherdata |
A | 01/01/2014 | aa
A | 01/01/2014 | bb
A | 01/01/2014 | cc
B | 01/01/2014 | aa
B | 05/01/2014 | cc
A | 07/01/2014 | aa
C | 05/02/2014 | xx
C | 20/02/2014 | yy
Etc
[A,B,C,…] are user IDs (strings).
Date is converted into Date format (e.g. 2013-10-15).
The table is sorted by User and then by Date, and is over 800K records long.
There are about 20K users.
User | Date | Otherdata |
A | 2014-01-01 | aa
A | 2014-01-01 | bb
A | 2014-01-01 | cc
A | 2014-01-07 | aa
B | 2014-01-01 | aa
B | 2014-01-05 | cc
C | 2014-02-05 | xx
C | 2014-02-20 | yy
I want to:
Get a frequency table (and ultimately a plot) of the count of differences (in
days) between records of a user. That is, I would first get the unique days
recorded:
A | 2014-01-01
A | 2014-01-07
B | 2014-01-01
B | 2014-01-05
C | 2014-02-05
C | 2014-02-20
And then I want to compute the differences between timestamps within each
group defined by the user, in days:
A | 6
B | 4
C | 15
Imagining that I have tens of thousands of records, I then want the table with
the counts of differences (across all users); in our case it would be 6, 4
and 15, all with count = 1.
In the larger sample, something like this:
DeltaDays | Count
1 | 150
2 | 320
…
N | X
I know there are all sorts of packages for time analysis, but I could not find
a simple function like this (including searching here:
http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf
). I assume that something working on a simple data frame would be sufficient,
but I am happy (or would even prefer?) to use TS. I would appreciate any hints.
The ultimate analysis also involves space, so hints in the direction of
space-time are welcome. Ultimately, I would like to separate the records for
each user into a dataset that can be handled separately, but splitting them
into a large number of files does not seem wise. Any hints on that are also
appreciated.
Thanks,
Martin
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.