Hi Martin,

It sounds like you want the difference between the first and the last observation per user, not, e.g., all the date differences between successive observations of each user. Correct me if I'm wrong. With that assumption, let's build some toy data:

# Toy data: 100 records, 5 users (A-E), random dates in 2014
set.seed(1)
dataset <- data.frame(User=sample(LETTERS[1:5],100,replace=TRUE),
        Date=sample(as.Date("2014-01-01")+0:364,100,replace=TRUE))

Now we can calculate these differences per user and plot them as a histogram or tabulate them:

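# For each User, apply diff(range(...)) to that user's Dates,
# i.e. the number of days between the first and the last record
foo <- with(dataset,by(Date,User,function(xx)diff(range(xx))))
hist(foo)   # histogram of the per-user spans
table(foo)  # frequency table: span in days -> number of users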

The key here is really the by() function, which applies a function (here the anonymous function function(xx)diff(range(xx))) to some data (here dataset$Date) separately for each level of a grouping factor (here dataset$User).
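The same per-user computation can also be written with tapply(), which returns a plain named array instead of a "by" object. Below is a minimal sketch, checked against the small example from your post (quoted further down), where the expected spans are 6, 4 and 15 days. The name ex is just a hypothetical name for that example table, and converting the difftime to plain days with as.numeric(..., units = "days") is my choice, not something by() requires.

```r
# Martin's example table, typed in by hand (Otherdata omitted);
# 'ex' is a hypothetical name so the toy 'dataset' above is not overwritten
ex <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "C", "C"),
  Date = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-07",
                   "2014-01-01", "2014-01-05", "2014-02-05", "2014-02-20"))
)

# For each user: last date minus first date, as a plain number of days
span_days <- with(ex,
  tapply(Date, User, function(xx) as.numeric(diff(range(xx)), units = "days")))

span_days          # A: 6, B: 4, C: 15 (days)
table(span_days)   # each span occurs once in this tiny example
```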

HTH,
Stephan


On 23.03.2014 01:32, Martin Tomko wrote:
Apologies if the question is a bit naïve; I am a novice in time series data
handling in R.

I have the following type of data, in long format (as the spacetime vignette
calls it; the table also contains spatial data, not shown here):

User | Date       | Otherdata
A    | 01/01/2014 | aa
A    | 01/01/2014 | bb
A    | 01/01/2014 | cc
B    | 01/01/2014 | aa
B    | 05/01/2014 | cc
A    | 07/01/2014 | aa
C    | 05/02/2014 | xx
C    | 20/02/2014 | yy

etc.
[A, B, C, …] are user IDs (strings).
Date has been converted into Date format (e.g. 2013-10-15).

The table is sorted by User and then by Date, and is over 800K records long. 
There are about 20K users.

User | Date       | Otherdata
A    | 2014-01-01 | aa
A    | 2014-01-01 | bb
A    | 2014-01-01 | cc
A    | 2014-01-07 | aa
B    | 2014-01-01 | aa
B    | 2014-01-05 | cc
C    | 2014-02-05 | xx
C    | 2014-02-20 | yy
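If the raw dates are day/month/year strings as in the first table (an assumption based on 05/02/2014 becoming 2014-02-05 in the sorted table), as.Date() needs an explicit format string. A minimal sketch with a few of the values from the post; raw_dates and parsed are hypothetical names:

```r
# Dates as they appear in the first table: day/month/year strings
raw_dates <- c("01/01/2014", "05/01/2014", "07/01/2014", "05/02/2014", "20/02/2014")

# Parse with an explicit day/month/year format; R's default expects year first
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")
parsed
```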

I want to get a frequency table (and ultimately a plot) of the differences (in
days) between the records of each user. That is, I would first get the unique
days recorded:

A | 2014-01-01
A | 2014-01-07
B | 2014-01-01
B | 2014-01-05
C | 2014-02-05
C | 2014-02-20

And then I want to compute the differences between these dates within each
group defined by the user, in days:
A | 6
B | 4
C | 15

Given that I have tens of thousands of records, I then want a table with the
counts of these differences across all users (in our case the differences would
be 6, 4 and 15, each with count 1).
In the larger sample, something like this:
DeltaDays | Count
1 | 150
2 | 320
…
N | X
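The steps described above (unique days per user, then differences between successive days, then a frequency table) measure gaps between consecutive days rather than the first-to-last span in the reply; the two coincide in this example only because every user has exactly two distinct days. A minimal base-R sketch of the successive-gaps version on the example table; ex, days, gaps and delta_counts are hypothetical names, and the barplot is just one possible plot:

```r
# Example table from this message; 'ex' is a hypothetical name
ex <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "C", "C"),
  Date = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-07",
                   "2014-01-01", "2014-01-05", "2014-02-05", "2014-02-20")),
  Otherdata = c("aa", "bb", "cc", "aa", "aa", "cc", "xx", "yy")
)

# Step 1: one row per (User, Date) pair, i.e. the unique days per user
days <- unique(ex[, c("User", "Date")])

# Step 2: within each user, gaps in days between successive unique dates.
# A user with a single recorded day contributes no gaps (diff() is empty).
gaps <- unlist(lapply(split(days$Date, days$User),
                      function(d) as.numeric(diff(sort(d)), units = "days")),
               use.names = FALSE)

# Step 3: frequency table of the gaps across all users, plus a simple plot
delta_counts <- table(DeltaDays = gaps)
delta_counts
barplot(delta_counts, xlab = "DeltaDays", ylab = "Count")
```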

I know there are all sorts of packages for time series analysis, but I could not
find a simple function like this (including searching here:
http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf
). I assume that something working on a simple data frame would be sufficient,
but I am happy (prefer?) to use TS. I would appreciate any hints. The ultimate
analysis also involves space, so hints in the direction of space-time are
welcome. Ultimately, I would like to separate the records of each user into a
dataset that can be handled separately, but splitting them into a large number
of files does not seem wise. Any hints there are also appreciated.
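One way to keep each user's records separate without writing many files is base R's split(), which turns a data frame into a named list of data frames, one per user, held in memory. This is a suggestion beyond what the reply covers, and whether it suits the later space-time analysis is an assumption; ex, per_user and n_records are hypothetical names:

```r
# Small example with the same column layout as the post
ex <- data.frame(
  User = c("A", "A", "B", "C", "C"),
  Date = as.Date(c("2014-01-01", "2014-01-07", "2014-01-01",
                   "2014-02-05", "2014-02-20")),
  Otherdata = c("aa", "aa", "aa", "xx", "yy")
)

# One data frame per user, kept in a named list (no files written)
per_user <- split(ex, ex$User)

names(per_user)      # one list element per user ID
per_user[["C"]]      # all records of user C

# Apply the same function to every user's data frame, e.g. record counts
n_records <- vapply(per_user, nrow, integer(1))
n_records
```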

Thanks,
Martin






______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

