Re: [R] Odp: Problem with aggregating data across time points

David Winsemius Fri, 02 Jul 2010 09:24:19 -0700


On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote:

Hi

did you try aggregate?

aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)

  Group.1 verbal self.harm violence_objects violence
1               0         0                0        0
2 01/04/07     25        15                3        9
3 02/04/07     24         6                8       13
4 03/04/07     17        13                0       10

aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)


That address his A) request:

Here is the application of aggregate to his B) request (I think):

# Not e that Date is not of class Date but is rather a factor thatincludes "" as a level.

> aggregate(series[, 5:8],list(series$Date, series$Location), sum,na.rm=T)

    Group.1 Group.2 verbal self.harm violence_objects violence
1                        0         0                0        0
2                 A      0         0                0        0
3  01/04/07       A      7         1                0        3
4  02/04/07       A      8         2                0        1
5  03/04/07       A      0         0                0        2
6                 B      0         0                0        0
7  01/04/07       B      3         2                0        1
8  02/04/07       B      4         2                0        0
9  03/04/07       B      4         0                0        3
10                C      0         0                0        0
11 01/04/07       C      4         2                3        2
12 02/04/07       C      0         0                4        2
13 03/04/07       C      1         1                0        5
14                D      0         0                0        0
15 01/04/07       D      7         6                0        3
16 02/04/07       D      0         0                0        9
17 03/04/07       D      4        11                0        0
18                E      0         0                0        0
19 01/04/07       E      4         3                0        0
20 02/04/07       E      4         0                4        0
21 03/04/07       E      8         1                0        0
22                F      0         0                0        0
23 01/04/07       F      0         1                0        0
24 02/04/07       F      8         2                0        1

So perhaps an output with less extraneous input would be better:

> with(series[series$Date != "", ],

aggregate(list(verbal=verbal, self.harm=self.harm,viol_obj=violence_objects, violence=violence),

                  list(Date, Location),
                  sum, na.rm=T)
       )

    Group.1 Group.2 verbal self.harm viol_obj violence
1  01/04/07       A      7         1        0        3
2  02/04/07       A      8         2        0        1
3  03/04/07       A      0         0        0        2
4  01/04/07       B      3         2        0        1
5  02/04/07       B      4         2        0        0
6  03/04/07       B      4         0        0        3
7  01/04/07       C      4         2        3        2
8  02/04/07       C      0         0        4        2
9  03/04/07       C      1         1        0        5
10 01/04/07       D      7         6        0        3
11 02/04/07       D      0         0        0        9
12 03/04/07       D      4        11        0        0
13 01/04/07       E      4         3        0        0
14 02/04/07       E      4         0        4        0
15 03/04/07       E      8         1        0        0
16 01/04/07       F      0         1        0        0
17 02/04/07       F      8         2        0        1


BTW, why do you have empty rows?

Regards
Petr

Hello-

I have a dataset which basically looks like this:

Location   Sex       Date          Time   Verbal    Self harm
Violence_objects   Violence
 A             1      1-4-2007       1800      3             0
           1                       3
 A             1      1-4-2007       1230      2            1
          2                       4
 D             2      2-4-2007       1100      0            4
          0                       0
...

I've put a dput of the first section of the data at the end of this
email. Basically I have these data for several days across all of the
dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on

until 31-12-2009. The last four variables which you can see at theend

of the email are my dependent variables, they are different types of
violent and self harming behaviour shown by patients in a psychiatric
hospital.

What I want to do is:

A) sum each of the dependent variables for each of the dates (so e.g.

in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3,and

3+4=7 for each of the variables)

B) do this sum, but only in each location this time (location is the
first variable)- so the sum for 1-4-2007 in location A, sum for
1-4-2007 in location B, and so on and so on. Because this is divided
across locations, some dates will have no data going into them and
will return 0 sums. Crucially I still want these dates to appear- so
e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2
0 0, then 23-5-2008 0 0 0 0 again, and etc.

I've had several abortive attempts and done some Googling but havegot

nowhere. I'd greatly appreciate any advice.

Many thanks,
Chris Beeley
(Institute of Mental Health, UK)


structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L,
5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L,
1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L,
3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L,
3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L,
4L, 4L, 4L), .Label = c("", "A", "B", "C", "D", "E", "F"), class =

"factor"),

   Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L,
   2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L,
   1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L,
   2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA,
   1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L
   ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
   2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
   2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L,
   3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L,
   4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L,
   4L, 4L, 4L, 4L), .Label = c("", "01/04/07", "02/04/07", "03/04/07"
   ), class = "factor"), Time = structure(c(1L, 28L, 1L, 1L,
   1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L,
   34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L,
   10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L,
   36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L,
   29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L,
   7L, 12L, 1L, 15L), .Label = c("", " ", "02:24:00", "03:44:00",
   "04:30:00", "07:00:00", "08:35:00", "09:20:00", "09:30:00",
   "10:00:00", "10:15:00", "10:45:00", "11:00:00", "11:20:00",
   "11:30:00", "11:35:00", "11:50:00", "12:00:00", "12:25:00",
   "12:30:00", "12:45:00", "15:00:00", "15:15:00", "15:30:00",
   "15:35:00", "17:15:00", "17:50:00", "18:00:00", "19:00:00",
   "19:30:00", "19:50:00", "20:00:00", "20:30:00", "20:55:00",
   "22:15:00", "22:30:00", "22:35:00", "22:40:00", "23:10:00"
   ), class = "factor"), verbal = c(NA, 3L, NA, NA, NA, 3L,
   0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L,
   NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA,
   NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L,
   0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L,
   0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA,
   NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L,
   2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA,
   0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L,
   NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA,
   1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA,
   0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L,
   NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
   0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
   4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA,
   0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA,
   0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L,
   NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
   3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
   0L, 3L, 0L, NA, 0L, 0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA,
   0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names =

c("Location",

"Sex", "Date", "Time", "verbal", "self.harm", "violence_objects",
"violence"), class = "data.frame", row.names = c(NA, -73L))

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Odp: Problem with aggregating data across time points

Reply via email to