Re: [R] aggregate formula - differing results

2023-09-04 Thread Achim Zeileis


On Mon, 4 Sep 2023, Ivan Calandra wrote:


Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() completely 
unexpected and confusing? Especially considering that dplyr::summarise() and 
doBy::summaryBy() deal with NAs differently, even though they all use 
mean(na.rm = TRUE) to calculate the group stats.


I agree with Rui that this behaves as documented but I also agree with 
Ivan that the behavior is potentially confusing. Not so much because other 
packages behave differently but mostly because the handling of missing 
values differs between the different aggregate() methods.


Based on my teaching experience, I feel that a default of 
na.action=na.pass would be less confusing, especially in the case with 
multivariate "response".


In the univariate case the discrepancy can be surprising - in the default 
method you need na.rm=TRUE but in the formula method you get the same 
result without additional arguments (due to na.action=na.omit). But in the 
multivariate case the discrepancy is not obvious, especially for 
beginners, because the results in other variables without NAs are affected 
as well.


A minimal toy example is the following data with two groups (x = A vs. B) 
and two "responses" (y without NAs and z with NA):


d <- data.frame(x = c("A", "A", "B", "B"), y = 1:4, z = c(1:3, NA))
d
##   x y  z
## 1 A 1  1
## 2 A 2  2
## 3 B 3  3
## 4 B 4 NA

Except for naming of the columns, both of the following summaries for y by 
x (without NAs) yield the same result:


aggregate(d$y, list(d$x), FUN = mean)
aggregate(y ~ x, data = d, FUN = mean)
##   x   y
## 1 A 1.5
## 2 B 3.5

For a single variable _with_ NAs, the default method needs the 
na.rm = TRUE argument; the formula method does not. Again, except for 
naming of the columns:


aggregate(d$z, list(d$x), FUN = mean, na.rm = TRUE)
aggregate(z ~ x, data = d, FUN = mean)
##   x   z
## 1 A 1.5
## 2 B 3.0

Conversely, if you do want the NAs in the groups, the following two are 
the same (except for naming):


aggregate(d$z, list(d$x), FUN = mean)
aggregate(z ~ x, data = d, FUN = mean, na.action = na.pass)
##   x   z
## 1 A 1.5
## 2 B  NA

But in the multivariate case, it is not so obvious why the following two 
commands differ in their results for y (!), the variable without NAs, in 
group B:


aggregate(d[, c("y", "z")], list(d$x), FUN = mean, na.rm = TRUE)
##   Group.1   y   z
## 1       A 1.5 1.5
## 2       B 3.5 3.0
##           ^^^

aggregate(cbind(y, z) ~ x, data = d, FUN = mean)
##   x   y   z
## 1 A 1.5 1.5
## 2 B 3.0 3.0
##     ^^^

Hence, in my programming courses I tell students to use na.action=na.pass 
in the formula method and to handle NAs in the FUN argument themselves.
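
A minimal sketch of that advice on the toy data from above. The helper name mean_obs is hypothetical (it is not part of base R); it simply drops the NAs of each variable inside FUN, while na.action = na.pass keeps all rows in the data passed to FUN.

```r
# Toy data from the example above: y is complete, z has one NA in group B
d <- data.frame(x = c("A", "A", "B", "B"), y = 1:4, z = c(1:3, NA))

# Hypothetical helper (not part of base R): drop NAs per variable inside FUN
mean_obs <- function(v) mean(v[!is.na(v)])

# na.pass keeps all rows, so each column is summarised on its own observed values
res_pass <- aggregate(cbind(y, z) ~ x, data = d, FUN = mean_obs,
                      na.action = na.pass)
print(res_pass)
```

With this setup the NA in z only affects the z column; the y mean in group B is computed from both of its observations.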


I guess that this is not important enough to change the default in 
aggregate.formula. Or are there R core members who also find that this 
inconsistency between the different methods is worth addressing?


If not, maybe an explicit example could be added on the help page? Showing 
something like this might help:


## default: omit entire row 4 where z=NA
aggregate(cbind(y, z) ~ x, data = d, FUN = mean)
##   x   y   z
## 1 A 1.5 1.5
## 2 B 3.0 3.0

## alternatively: omit row 4 only for z result but not for y result
aggregate(cbind(y, z) ~ x, data = d, FUN = mean,
  na.action = na.pass, na.rm = TRUE)
##   x   y   z
## 1 A 1.5 1.5
## 2 B 3.5 3.0

Best wishes,
Achim


On 04/09/2023 13:46, Rui Barradas wrote:

At 10:44 on 04/09/2023, Ivan Calandra wrote:

Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot 
explain. Any help would be appreciated!


Sample data:
my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103", 
"HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41, 121.37, 
70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83, 1656.46, 
1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2), Length = 
c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854, 32.075, 21.337, 
35.459), Width = c(45.982, 67.303, 52.679, 26.42, 25.149, 33.427, 20.683, 
62.783, 26.417, 35.297), PLATWIDTH = c(38.84, NA, 15.33, 30.37, 11.44, 
14.88, 13.86, NA, NA, 26.71), PLATTHICK = c(8.67, NA, 7.99, 11.69, 3.3, 
16.52, 4.58, NA, NA, 9.35), EPA = c(78, NA, 78, 54, 72, 49, 56, NA, NA, 
56), THICKNESS = c(10.97, NA, 9.36, 6.4, 5.89, 11.05, 4.9, NA, NA, 10.08), 
WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9, 29.5, 4.5, NA, NA, 23), RAWMAT = 
c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT", "HORNFELS", "HORNFELS", 
"HORNFELS", "HORNFELS", "HORNFELS")), row.names = c(1L, 2L, 3L, 4L, 5L, 
111L, 112L, 113L, 114L, 115L), class = "data.frame")


1) Simple aggregation with 2 variables:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean, na.rm 
= TRUE)


2) Using the dot notation - different results:
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using

Re: [R] aggregate formula - differing results

2023-09-04 Thread Bert Gunter

Ivan:
Just one perhaps extraneous comment.

You said that you were surprised that aggregate() and group_by() did not
have the same behavior. That is a misconception on your part. As you know,
the tidyverse recapitulates the functionality of many base R functions; but
it makes no claims to do so in exactly the same way and, indeed, often
makes deliberate changes to "improve" behavior. So if you wish to use both,
you should *expect* such differences, which, of course, are documented in
the man pages (and often elsewhere).

Cheers,
Bert

On Mon, Sep 4, 2023 at 5:21 AM Ivan Calandra  wrote:

> Haha, got it now, there is an na.action argument (which defaults to
> na.omit) to aggregate() which is applied before calling mean(na.rm =
> TRUE). Thank you Rui for pointing this out.
>
> So running it with na.pass instead of na.omit gives the same results as
> dplyr::group_by()+summarise():
> aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE,
> na.action = na.pass)
>
> Cheers,
> Ivan
>

Re: [R] aggregate formula - differing results

2023-09-04 Thread Ivan Calandra

Haha, got it now, there is an na.action argument (which defaults to 
na.omit) to aggregate() which is applied before calling mean(na.rm = 
TRUE). Thank you Rui for pointing this out.


So running it with na.pass instead of na.omit gives the same results as 
dplyr::group_by()+summarise():
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE, 
na.action = na.pass)
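
A small sketch of the order of operations described here, using a four-row toy data frame d (an assumption for illustration, not the my_data set from the original post). All columns of d appear in the formula, so applying na.omit() to d by hand should mimic what the default na.action does before FUN is called.

```r
# Toy data (assumed for illustration): y is complete, z has one NA in group B
d <- data.frame(x = c("A", "A", "B", "B"), y = 1:4, z = c(1:3, NA))

# Step 1: what the default na.action = na.omit amounts to here,
# i.e. every row with an NA in any formula variable is dropped
d_complete <- na.omit(d)
print(nrow(d_complete))

# Step 2: FUN only sees the rows that survived step 1,
# so na.rm = TRUE has nothing left to remove
res_default <- aggregate(cbind(y, z) ~ x, data = d, FUN = mean, na.rm = TRUE)
res_manual  <- aggregate(cbind(y, z) ~ x, data = d_complete, FUN = mean)

print(res_default)
print(res_manual)
```

Both calls give the same group means, including for y, even though y itself has no missing values.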


Cheers,
Ivan

On 04/09/2023 13:56, Rui Barradas wrote:

At 12:51 on 04/09/2023, Ivan Calandra wrote:

Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() 
completely unexpected and confusing? Especially considering that 
dplyr::summarise() and doBy::summaryBy() deal with NAs differently, 
even though they all use mean(na.rm = TRUE) to calculate the group 
stats.


Best wishes,
Ivan

On 04/09/2023 13:46, Rui Barradas wrote:

At 10:44 on 04/09/2023, Ivan Calandra wrote:

Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot 
explain. Any help would be appreciated!


Sample data:
my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", 
"FLINT-100", "FLINT-101", "FLINT-102", "HORN-10", "HORN-100", 
"HORN-102", "HORN-103", "HORN-104"), EdgeLength = c(130.75, 168.77, 
142.79, 130.1, 140.41, 121.37, 70.52, 122.3, 71.01, 104.5), 
SurfaceArea = c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47, 
1169.26, 444.61, 1791.48, 461.15, 1127.2), Length = c(44.384, 
29.831, 43.869, 48.011, 54.109, 41.742, 23.854, 32.075, 21.337, 
35.459), Width = c(45.982, 67.303, 52.679, 26.42, 25.149, 33.427, 
20.683, 62.783, 26.417, 35.297), PLATWIDTH = c(38.84, NA, 15.33, 
30.37, 11.44, 14.88, 13.86, NA, NA, 26.71), PLATTHICK = c(8.67, NA, 
7.99, 11.69, 3.3, 16.52, 4.58, NA, NA, 9.35), EPA = c(78, NA, 78, 
54, 72, 49, 56, NA, NA, 56), THICKNESS = c(10.97, NA, 9.36, 6.4, 
5.89, 11.05, 4.9, NA, NA, 10.08), WEIGHT = c(34.3, NA, 25.5, 18.6, 
14.9, 29.5, 4.5, NA, NA, 23), RAWMAT = c("FLINT", "FLINT", "FLINT", 
"FLINT", "FLINT", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", 
"HORNFELS")), row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 
114L, 115L), class = "data.frame")


1) Simple aggregation with 2 variables:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = 
mean, na.rm = TRUE)


2) Using the dot notation - different results:
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using dplyr, I get the same results as #1:
group_by(my_data, RAWMAT) %>%
   summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))

4) It gets weirder: using all columns in #1 give the same results 
as in #2 but different from #1 and #3
aggregate(cbind(EdgeLength, SurfaceArea, Length, Width, PLATWIDTH, 
PLATTHICK, EPA, THICKNESS, WEIGHT) ~ RAWMAT, data = my_data, FUN = 
mean, na.rm = TRUE)


So it seems it is not only due to the notation (cbind() vs. dot). 
Is it a bug? A peculiar thing in my dataset? I tend to think this 
could be due to some variables (or their names) as all notations 
seem to agree when I remove some variables (although I haven't 
found out which variable(s) is (are) at fault), e.g.:


my_data2 <- structure(list(ID = c("FLINT-1", "FLINT-10", 
"FLINT-100", "FLINT-101", "FLINT-102", "HORN-10", "HORN-100", 
"HORN-102", "HORN-103", "HORN-104"), EdgeLength = c(130.75, 168.77, 
142.79, 130.1, 140.41, 121.37, 70.52, 122.3, 71.01, 104.5), 
SurfaceArea = c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47, 
1169.26, 444.61, 1791.48, 461.15, 1127.2), Length = c(44.384, 
29.831, 43.869, 48.011, 54.109, 41.742, 23.854, 32.075, 21.337, 
35.459), Width = c(45.982, 67.303, 52.679, 26.42, 25.149, 33.427, 
20.683, 62.783, 26.417, 35.297), RAWMAT = c("FLINT", "FLINT", 
"FLINT", "FLINT", "FLINT", "HORNFELS", "HORNFELS", "HORNFELS", 
"HORNFELS", "HORNFELS")), row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 
112L, 113L, 114L, 115L), class = "data.frame")


aggregate(cbind(EdgeLength, SurfaceArea, Length, Width) ~ RAWMAT, 
data = my_data2, FUN = mean, na.rm = TRUE)


aggregate(. ~ RAWMAT, data = my_data2[-1], FUN = mean, na.rm = TRUE)

group_by(my_data2, RAWMAT) %>%
   summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))


Thank you in advance for any hint.
Best wishes,
Ivan




 *LEIBNIZ-ZENTRUM*
*FÜR ARCHÄOLOGIE*

*Dr. Ivan CALANDRA*
**Head of IMPALA (IMaging Platform At LeizA)

*MONREPOS* Archaeological Research Centre, Schloss Monrepos
56567 Neuwied, Germany

T: +49 2631 9772 243
T: +49 6131 8885 543
ivan.calan...@leiza.de

leiza.de 

ORCID 
ResearchGate


LEIZA is a foundation under public law of the State of 
Rhineland-Palatinate and the City of Mainz. Its headquarters are in 
Mainz. Supervision is carried out by the Ministry of Science and 
Health of the State of Rhineland-Palatinate. LEIZA is a research 
museum of the Leibniz Association.

__

Re: [R] aggregate formula - differing results

2023-09-04 Thread Rui Barradas


At 12:51 on 04/09/2023, Ivan Calandra wrote:

Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() completely 
unexpected and confusing? Especially considering that dplyr::summarise() 
and doBy::summaryBy() deal with NAs differently, even though they all 
use mean(na.rm = TRUE) to calculate the group stats.


Best wishes,
Ivan

On 04/09/2023 13:46, Rui Barradas wrote:

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

You can define a vector of the columns of interest and subset the data 
with it. Then the default na.action = na.omit will no longer remove 
the rows with NA vals in at least one column and

Re: [R] aggregate formula - differing results

2023-09-04 Thread Ivan Calandra


Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() completely 
unexpected and confusing? Especially considering that dplyr::summarise() 
and doBy::summaryBy() deal with NAs differently, even though they all 
use mean(na.rm = TRUE) to calculate the group stats.


Best wishes,
Ivan

On 04/09/2023 13:46, Rui Barradas wrote:


Hello,

You can define a vector of the columns of interest and subset the data 
with it. Then the default na.action = na.omit will no longer remove 
the rows with NA vals in at least one column and the results are the 
same.


However, this will

Re: [R] aggregate formula - differing results

2023-09-04 Thread Rui Barradas



Hello,

You can define a vector of the columns of interest and subset the data 
with it. Then the default na.action = na.omit will no longer remove the 
rows with NA vals in at least one column and the results are the same.


However, this will not give the mean values of the other numeric 
columns, just of those two.
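
The same idea on a four-row toy data frame, as a sketch only: d and the column choice below are assumptions for illustration, not the my_data set. Once the column with the NA is left out of the data, the dot formula no longer loses a row.

```r
# Toy data (assumed for illustration): y is complete, z has one NA in group B
d <- data.frame(x = c("A", "A", "B", "B"), y = 1:4, z = c(1:3, NA))

# Columns of interest: the response y and the grouping variable x
cols <- c("y", "x")

# Dot notation on all columns: the NA in z removes row 4, which also changes y
res_all <- aggregate(. ~ x, data = d, FUN = mean, na.rm = TRUE)

# Dot notation on the selected columns only: no row is removed
res_sel <- aggregate(. ~ x, data = d[cols], FUN = mean, na.rm = TRUE)

print(res_all)
print(res_sel)
```

The price of this approach is the one stated above: res_sel contains no summary for z at all.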




# define a vector of columns of interest
cols <- c("Length", "Width", "RAWMAT")

# 1) Simple aggregation with 2 variables, select cols:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data[cols], FUN = 
mean, na.rm = TRUE)


# 2) Using the dot notation - if cols are selected, equal results:
aggregate(. ~ RAWMAT, data

Re: [R] aggregate formula - differing results

2023-09-04 Thread Ivan Calandra

Thanks Iago for the pointer.


It then means that na.rm = TRUE is not applied in the same way within 
aggregate() as opposed to dplyr::group_by() + summarise(), right? Within 
aggregate, it behaves like na.omit(), that is, it excludes the 
incomplete cases (whole rows), whereas with group_by() + summarise() it 
is applied on each vector (variable), which is what I actually would expect.


I hadn't shown it, but doBy::summaryBy() produces the same results as 
group_by() + summarise().
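
One way to see the difference is to count how many observations enter each group statistic. The sketch below uses a toy data frame d and a hypothetical helper n_obs (both assumptions for illustration). Note that the whole-row exclusion comes from the na.action argument of the formula method, as clarified elsewhere in this thread, rather than from na.rm itself.

```r
# Toy data (assumed for illustration): y is complete, z has one NA in group B
d <- data.frame(x = c("A", "A", "B", "B"), y = 1:4, z = c(1:3, NA))

# Hypothetical helper: number of non-missing values that FUN receives
n_obs <- function(v) sum(!is.na(v))

# Default na.action = na.omit: incomplete rows are dropped for all variables
n_omit <- aggregate(cbind(y, z) ~ x, data = d, FUN = n_obs)

# na.action = na.pass: each variable keeps its own observed values
n_pass <- aggregate(cbind(y, z) ~ x, data = d, FUN = n_obs,
                    na.action = na.pass)

print(n_omit)
print(n_pass)
```

In group B, y is based on one observation under the default and on two observations with na.pass, while z is based on one observation in both cases.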


Ivan


On 04/09/2023 12:45, Iago Giné Vázquez wrote:
> It seems that the issue is the missing values. If in #1 you use the 
> dataset na.omit(my_data) instead of my_data, you get the same output 
> as in #2 and in #4, where all observations with missing data are 
> removed since you are including all the variables.
>
>
> The second dataset has no issue since it has no missing data.
>
> Iago
> 
> *From:* R-help  on behalf of Ivan Calandra 
> 
> *Sent:* Monday, 4 September 2023 11:44
> *To:* R-help 
> *Subject:* [R] aggregate formula - differing results
> Dear useRs,
>
> I have just stumbled across a behavior in aggregate() that I cannot
> explain. Any help would be appreciated!
>
> Sample data:
> my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100",
> "FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103",
> "HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41,
> 121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83,
> 1656.46, 1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2),
> Length = c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854,
> 32.075, 21.337, 35.459), Width = c(45.982, 67.303, 52.679, 26.42,
> 25.149, 33.427, 20.683, 62.783, 26.417, 35.297), PLATWIDTH = c(38.84,
> NA, 15.33, 30.37, 11.44, 14.88, 13.86, NA, NA, 26.71), PLATTHICK =
> c(8.67, NA, 7.99, 11.69, 3.3, 16.52, 4.58, NA, NA, 9.35), EPA = c(78,
> NA, 78, 54, 72, 49, 56, NA, NA, 56), THICKNESS = c(10.97, NA, 9.36, 6.4,
> 5.89, 11.05, 4.9, NA, NA, 10.08), WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9,
> 29.5, 4.5, NA, NA, 23), RAWMAT = c("FLINT", "FLINT", "FLINT", "FLINT",
> "FLINT", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")),
> row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L), class =
> "data.frame")
>
> 1) Simple aggregation with 2 variables:
> aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean,
> na.rm = TRUE)
>
> 2) Using the dot notation - different results:
> aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)
>
> 3) Using dplyr, I get the same results as #1:
> group_by(my_data, RAWMAT) %>%
>    summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))
>
> 4) It gets weirder: using all columns in #1 give the same results as in
> #2 but different from #1 and #3
> aggregate(cbind(EdgeLength, SurfaceArea, Length, Width, PLATWIDTH,
> PLATTHICK, EPA, THICKNESS, WEIGHT) ~ RAWMAT, data = my_data, FUN = mean,
> na.rm = TRUE)
>
> So it seems it is not only due to the notation (cbind() vs. dot). Is it
> a bug? A peculiar thing in my dataset? I tend to think this could be due
> to some variables (or their names) as all notations seem to agree when I
> remove some variables (although I haven't found out which variable(s) is
> (are) at fault), e.g.:
>
> my_data2 <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100",
> "FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103",
> "HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41,
> 121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83,
> 1656.46, 1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2),
> Length = c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854,
> 32.075, 21.337, 35.459), Width = c(45.982, 67.303, 52.679, 26.42,
> 25.149, 33.427, 20.683, 62.783, 26.417, 35.297), RAWMAT = c("FLINT",
> "FLINT", "FLINT", "FLINT", "FLINT", "HORNFELS", "HORNFELS", "HORNFELS",
> "HORNFELS", "HORNFELS")), row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L,
> 113L, 114L, 115L), class = "data.frame")
>
> aggregate(cbind(EdgeLength, SurfaceArea, Length, Width) ~ RAWMAT, data =
> my_data2, FUN = mean, na.rm = TRUE)
>
> aggregate(. ~ RAWMAT, data = my_data2[-1], FUN = mean, na.rm = TRUE)
>
> group_by(my_data2, RAWMAT) %>%
>    summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
>
>
> Thank you in advance for any hint.
> Best wishes,
> Ivan
>
>
>
>
>     *LEIBNIZ-ZENTRUM*
> *FÜR ARCHÄOLOGIE*
>
> *Dr. Ivan CALANDRA*
> **Head of IMPALA (IMaging Platform At LeizA)
>
> *MONREPOS* Archaeological Research Centre, Schloss Monrepos
> 56567 Neuwied, Germany
>
> T: +49 2631 9772 243
> T: +49 6131 8885 543
> ivan.calan...@leiza.de
>
> leiza.de 
> 
> ORCID 
> ResearchGate
> 
>
> LEIZA is a foundation under public law

Re: [R] aggregate formula - differing results

2023-09-04 Thread Iago Giné Vázquez

It seems that the issue is the missing values. If in #1 you use the dataset 
na.omit(my_data) instead of my_data, you get the same output as in #2 and in 
#4, where all observations with missing data are removed because all the 
variables are included in the formula.


The second dataset has no issue since it has no missing data.
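
A minimal sketch of the effect described above, using the toy data from Achim Zeileis's reply in this thread. With the formula method's default na.action = na.omit, a row with an NA in any variable of the formula is dropped before FUN is called, so na.rm = TRUE has nothing left to remove; passing na.action = na.pass keeps the rows and lets na.rm work column by column. The object names res_omit and res_pass are only illustrative.

```r
# Toy data: y has no NAs, z has one NA in group B
d <- data.frame(x = c("A", "A", "B", "B"), y = 1:4, z = c(1:3, NA))

# Default na.action = na.omit: row 4 is dropped for *both* y and z,
# so the mean of y in group B is computed from y = 3 only.
res_omit <- aggregate(cbind(y, z) ~ x, data = d, FUN = mean, na.rm = TRUE)

# na.action = na.pass: all rows are kept and na.rm = TRUE removes the NA
# only from z, so the mean of y in group B uses y = 3 and y = 4.
res_pass <- aggregate(cbind(y, z) ~ x, data = d, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)

res_omit
res_pass
```

The second call should agree with what group_by() + summarise() returns for the same columns.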

Iago

From: R-help on behalf of Ivan Calandra

Sent: Monday, 4 September 2023 11:44
To: R-help
Subject: [R] aggregate formula - differing results

Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot
explain. Any help would be appreciated!

Sample data:
my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100",
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103",
"HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41,
121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83,
1656.46, 1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2),
Length = c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854,
32.075, 21.337, 35.459), Width = c(45.982, 67.303, 52.679, 26.42,
25.149, 33.427, 20.683, 62.783, 26.417, 35.297), PLATWIDTH = c(38.84,
NA, 15.33, 30.37, 11.44, 14.88, 13.86, NA, NA, 26.71), PLATTHICK =
c(8.67, NA, 7.99, 11.69, 3.3, 16.52, 4.58, NA, NA, 9.35), EPA = c(78,
NA, 78, 54, 72, 49, 56, NA, NA, 56), THICKNESS = c(10.97, NA, 9.36, 6.4,
5.89, 11.05, 4.9, NA, NA, 10.08), WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9,
29.5, 4.5, NA, NA, 23), RAWMAT = c("FLINT", "FLINT", "FLINT", "FLINT",
"FLINT", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")),
row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L), class =
"data.frame")

1) Simple aggregation with 2 variables:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean,
na.rm = TRUE)

2) Using the dot notation - different results:
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using dplyr, I get the same results as #1:
group_by(my_data, RAWMAT) %>%
   summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))

4) It gets weirder: using all columns in the call from #1 gives the same
results as in #2, but different from #1 and #3
aggregate(cbind(EdgeLength, SurfaceArea, Length, Width, PLATWIDTH,
PLATTHICK, EPA, THICKNESS, WEIGHT) ~ RAWMAT, data = my_data, FUN = mean,
na.rm = TRUE)

So it seems it is not only due to the notation (cbind() vs. dot). Is it
a bug? A peculiar thing in my dataset? I tend to think this could be due
to some variables (or their names) as all notations seem to agree when I
remove some variables (although I haven't found out which variable(s) is
(are) at fault), e.g.:

my_data2 <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100",
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103",
"HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41,
121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83,
1656.46, 1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2),
Length = c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854,
32.075, 21.337, 35.459), Width = c(45.982, 67.303, 52.679, 26.42,
25.149, 33.427, 20.683, 62.783, 26.417, 35.297), RAWMAT = c("FLINT",
"FLINT", "FLINT", "FLINT", "FLINT", "HORNFELS", "HORNFELS", "HORNFELS",
"HORNFELS", "HORNFELS")), row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L,
113L, 114L, 115L), class = "data.frame")

aggregate(cbind(EdgeLength, SurfaceArea, Length, Width) ~ RAWMAT, data =
my_data2, FUN = mean, na.rm = TRUE)

aggregate(. ~ RAWMAT, data = my_data2[-1], FUN = mean, na.rm = TRUE)

group_by(my_data2, RAWMAT) %>%
   summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))


Thank you in advance for any hint.
Best wishes,
Ivan




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate wind direction data with wind speed required

2023-06-10 Thread Stefano Sofia

Sorry for the long delay in reporting back; prolonged severe weather conditions 
often take a large amount of time and energy away from my studies.

Thanks to all of you for your suggestions.

I have not been able to implement Bert Gunter's hint; his code gave me an error 
that I could not fix.

Bill Dunlap's hint is smart and solved the problem.


Thank you again to all of you

Stefano




Re: [R] aggregate wind direction data with wind speed required

2023-05-13 Thread Jeff Newmiller

You don't have to bother with the subtracting from pi/2 bit ... just assume the 
cartesian complex values are (y,x) instead of (x,y).
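
One reading of this suggestion, as a hedged sketch: if the real axis is taken as north and the imaginary axis as east, the argument of the complex number is already the direction measured clockwise from north, so no 90-degree shift is needed. The function names windToComplexNE and complexToWindNE are made up for this illustration and do not appear in the thread.

```r
# Real part = northward component, imaginary part = eastward component,
# so Arg(z) is the direction in radians clockwise from north.
windToComplexNE <- function(speed, degreesCW) {
  complex(modulus = speed, argument = degreesCW * pi / 180)
}

complexToWindNE <- function(z) {
  stopifnot(is.complex(z))
  # %% 360 maps the (-180, 180] range of Arg() onto [0, 360)
  data.frame(speed = Mod(z), degreesCW = (Arg(z) * 180 / pi) %% 360)
}

# Two observations with equal speed, from 45 and 90 degrees
z <- windToComplexNE(c(7, 7), c(45, 90))
res <- complexToWindNE(mean(z))
res
```

Averaging the complex values averages the two components, which is the same vector mean that my_fun computes.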


Re: [R] aggregate wind direction data with wind speed required

2023-05-13 Thread Bill Dunlap

I think that using complex numbers to represent the wind velocity makes
this simpler.  You would need to write some simple conversion functions
since wind directions are typically measured clockwise from north and the
argument of a complex number is measured counterclockwise from east.  E.g.,

windToComplex <-
function(speed, degreesCW) {
  complex(mod=speed, arg=(90-degreesCW)/180*pi)
}
complexToWind <-
function(z) {
  # Convert complex velocity z to speed and direction (degrees clockwise
  # from north, in range [0,360)).
  stopifnot(is.complex(z))
  data.frame(speed = Mod(z), degreesCW = (pi - Arg(z*1i))/(2*pi)*360)
}

Then use FUN=mean instead of my_fun.
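
A sketch of how the two helpers above might be combined with aggregate(), using the example data frame from the original question. The column name z and the use of the data-frame method of aggregate() (rather than the formula method) are choices made for this illustration only.

```r
# Conversion helpers as given above
windToComplex <- function(speed, degreesCW) {
  complex(modulus = speed, argument = (90 - degreesCW) / 180 * pi)
}
complexToWind <- function(z) {
  stopifnot(is.complex(z))
  data.frame(speed = Mod(z), degreesCW = (pi - Arg(z * 1i)) / (2 * pi) * 360)
}

# Example data from the original question
df <- data.frame(day   = c(1, 1, 1, 2, 2, 2, 3, 3),
                 month = c(1, 1, 2, 2, 2, 2, 2, 2),
                 wd    = c(45, 90, 90, 135, 180, 270, 270, 315),
                 ws    = c(7, 7, 8, 3, 2, 7, 14, 13))

# One complex velocity per observation (z is a hypothetical column name)
df$z <- windToComplex(df$ws, df$wd)

# The mean of the complex values is the vector mean of the wind per group
agg <- aggregate(df["z"], by = df[c("day", "month")], FUN = mean)

# Convert the mean vectors back to speed and direction
res <- cbind(agg[c("day", "month")], complexToWind(agg$z))
res
```

Because speed and direction travel together in a single complex column, FUN only ever needs one input.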

-Bill


Re: [R] aggregate wind direction data with wind speed required

2023-05-13 Thread Bert Gunter

Sorry Rui; if you run your code you will get:
Error in FUN(X[[i]], ...) : object 'ws' not found

Moreover, even if you did this:
aggregate(wd ~ day + month, data=df, FUN = my_fun, ws1 = df$ws)
the answer would be wrong, because each call to my_fun needs only the subset
of ws1 corresponding to the split defined by day + month. aggregate() does not
do this (the ... argument is evaluated only once, not once per subset).

So I think the answer to Stefano's question is no, aggregate doesn't work.
You should use by() or tapply() instead:

##Warning: I haven't checked this carefully. So please do!

>  by(df, list(df$day,df$month), FUN = \(x)with(x, my_fun(wd, ws)))

Cheers,
Bert
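
The by() call above uses the \(x) function shorthand, which needs R 4.1.0 or later. As a hedged alternative, the same per-group idea can be sketched with split() and an ordinary function(), which also runs on older versions. The names groups and res are only illustrative.

```r
# Function and data from the original question
my_fun <- function(wd1, ws1) {
  u_component <- -ws1 * sin(2 * pi * wd1 / 360)
  v_component <- -ws1 * cos(2 * pi * wd1 / 360)
  mean_u <- mean(u_component, na.rm = TRUE)
  mean_v <- mean(v_component, na.rm = TRUE)
  (atan2(mean_u, mean_v) * 360 / 2 / pi) + 180
}

df <- data.frame(day   = c(1, 1, 1, 2, 2, 2, 3, 3),
                 month = c(1, 1, 2, 2, 2, 2, 2, 2),
                 wd    = c(45, 90, 90, 135, 180, 270, 270, 315),
                 ws    = c(7, 7, 8, 3, 2, 7, 14, 13))

# Split the whole data frame by day and month, dropping empty combinations,
# so that each piece carries matching wd and ws values.
groups <- split(df, list(df$day, df$month), drop = TRUE)

# Apply the two-argument function to each piece; names are "day.month"
res <- vapply(groups, function(g) my_fun(g$wd, g$ws), numeric(1))
res
```

Each piece keeps wd and ws aligned row by row, which is exactly what passing ws1 through ... cannot do.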




Re: [R] aggregate wind direction data with wind speed required

2023-05-13 Thread Rui Barradas


At 15:51 on 13/05/2023, Stefano Sofia wrote:

Dear list users,

I have to aggregate wind direction data (wd) using a function that requires 
also a second input variable, wind speed (ws).

This is the function that I need to use:


my_fun <- function(wd1, ws1){
   # u and v wind components computed from direction (degrees) and speed
   u_component <- -ws1*sin(2*pi*wd1/360)
   v_component <- -ws1*cos(2*pi*wd1/360)
   mean_u <- mean(u_component, na.rm=TRUE)
   mean_v <- mean(v_component, na.rm=TRUE)
   # direction of the mean wind vector, converted back to degrees
   mean_wd <- (atan2(mean_u, mean_v) * 360/2/pi) + 180
   mean_wd
}

Does the aggregate function work only with functions with a single input 
variable (the one that I want to aggregate), or can its use be extended to 
functions with two input variables?

Here is a simple example (which is meaningless; the important thing is the 
concept behind it):
df <- data.frame(day=c(1, 1, 1, 2, 2, 2, 3, 3), month=c(1, 1, 2, 2, 2, 2, 2, 
2), wd=c(45, 90, 90, 135, 180, 270, 270, 315), ws=c(7, 7, 8, 3, 2, 7, 14, 13))

aggregate(wd ~ day + month, data=df, FUN = my_fun)

cannot work, because ws is not taken into consideration.

I got lost. Any hint, any help?
I hope to have been able to explain my problem.
Thank you for your attention,
Stefano



Hello,

Use the dots argument to pass any number of named arguments to your 
aggregation function.

In this case, ws1 = ws at the end of the aggregate call.


aggregate(wd ~ day + month, data=df, FUN = my_fun, ws1 = ws)


You can also give the user the option to remove NAs or not by adding an 
na.rm argument:



my_fun <- function(wd1, ws1, na.rm = FALSE) {
  [...]
  mean_u <- mean(u_component, na.rm = na.rm)
  mean_v <- mean(v_component, na.rm = na.rm)
  [...]
}

aggregate(wd ~ day + month, data=df, FUN = my_fun, ws1 = ws, na.rm = TRUE)
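
As Bert Gunter notes in his reply in this thread, arguments passed through ... are evaluated once rather than once per group, so a vector such as ws cannot be handed over this way. One workaround that still uses aggregate() is to aggregate over row indices and look up both variables inside FUN. This is only a sketch; the column names idx and mean_wd are hypothetical.

```r
# Function and data from the original question
my_fun <- function(wd1, ws1) {
  u_component <- -ws1 * sin(2 * pi * wd1 / 360)
  v_component <- -ws1 * cos(2 * pi * wd1 / 360)
  mean_u <- mean(u_component, na.rm = TRUE)
  mean_v <- mean(v_component, na.rm = TRUE)
  (atan2(mean_u, mean_v) * 360 / 2 / pi) + 180
}

df <- data.frame(day   = c(1, 1, 1, 2, 2, 2, 3, 3),
                 month = c(1, 1, 2, 2, 2, 2, 2, 2),
                 wd    = c(45, 90, 90, 135, 180, 270, 270, 315),
                 ws    = c(7, 7, 8, 3, 2, 7, 14, 13))

# Aggregate the row numbers; FUN receives the indices of one group at a time
df$idx <- seq_len(nrow(df))
res <- aggregate(idx ~ day + month, data = df,
                 FUN = function(i) my_fun(df$wd[i], df$ws[i]))

# The aggregated column holds the mean direction, so rename it accordingly
names(res)[names(res) == "idx"] <- "mean_wd"
res
```

The scalar na.rm argument, by contrast, can safely be passed through ... because it is the same for every group.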


Hope this helps,

Rui Barradas


Re: [R] aggregate semi-hourly data not 00-24 but 9-9

2020-09-22 Thread Stefano Sofia

Yes, thank you so much.

Stefano



From: Eric Berger [ericjber...@gmail.com]
Sent: Tuesday, 22 September 2020 11:00
To: Jeff Newmiller
Cc: Stefano Sofia; r-help mailing list
Subject: Re: [R] aggregate semi-hourly data not 00-24 but 9-9

Thanks Jeff.
Stefano, per Jeff's comment, you can replace the line

df1$data_POSIXminus9 <- df1$data_POSIX - lubridate::hours(9)

by

df1$data_POSIXminus9 <- df1$data_POSIX - as.difftime(9,units="hours")
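
A small self-contained sketch of why the 9-hour shift works, using base R only. The four timestamps, the values in x and the UTC time zone are invented for this illustration; the thread's own data uses tz = "Etc/GMT-1".

```r
# Timestamps just before and at 09:00 on two consecutive days
ts <- as.POSIXct(c("2020-02-19 08:30", "2020-02-19 09:00",
                   "2020-02-20 08:30", "2020-02-20 09:00"), tz = "UTC")
x <- c(1, 2, 3, 4)

# Shift by 9 hours so that a 09:00-to-09:00 window becomes a calendar day
shifted <- ts - as.difftime(9, units = "hours")
day9 <- format(shifted, "%Y-%m-%d", tz = "UTC")

# 08:30 falls into the window that started the previous day at 09:00,
# while 09:00 opens a new window.
res <- aggregate(x ~ day9, data = data.frame(x = x, day9 = day9), FUN = sum)
res
```

The label of each window is the calendar date on which it starts.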

On Mon, Sep 21, 2020 at 8:06 PM Jeff Newmiller  wrote:
>
> The base R as.difftime function is perfectly usable to create this offset 
> without pulling in lubridate.
>
> On September 21, 2020 8:06:51 AM PDT, Eric Berger  
> wrote:
> >Hi Stefano,
> >If you mean from 9am on one day to 9am on the following day, you can
> >do a trick. Simply subtract 9hrs from each timestamp and then you want
> >midnight to midnight for these adjusted times, which you can get using
> >the method you followed.
> >
> >I googled and found that lubridate::hours() can be used to add or
> >subtract hours from a POSIXct.
> >
> >library(lubridate)
> >
> >day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M",
> >tz="Etc/GMT-1")
> >day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M",
> >tz="Etc/GMT-1")
> >df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
> >df1$hs <- rnorm(nrow(df1), 40, 10)
> >df1$diff[2:nrow(df1)] <- diff(df1$hs)
> >
> >df1$data_POSIXminus9 <- df1$data_POSIX - lubridate::hours(9)
> >df1$dayX <- format(df1$data_POSIXminus9,"%y-%m-%d")
> >df2X <- aggregate(diff ~ dayX, df1, sum)
> >df2X
> >
> >HTH,
> >Eric
> >
> >On Mon, Sep 21, 2020 at 5:30 PM Stefano Sofia
> > wrote:
> >>
> >> Dear R-list members,
> >> I have semi-hourly snowfall data.
> >> I need to sum the semi-hourly increments (only the positive ones, but
> >> this is not described in my example) day by day, not from 00 to 24 but
> >> from 9 to 9.
> >>
> >> I am able to use the diff function, create a list of days and use the
> >> aggregate function, but it works only from 0 to 24. Any suggestion for
> >> an efficient way to do it?
> >> Here is my code:
> >> day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M",
> >tz="Etc/GMT-1")
> >> day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M",
> >tz="Etc/GMT-1")
> >> df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
> >> df1$hs <- rnorm(nrows(df1), 40, 10)
> >> df1$diff[2:nrow(df1)] <- diff(df1$hs)
> >> df1$day <- format(df$data_POSIX,"%y-%m-%d")
> >> df2 <- aggregate(diff ~ day, df, sum)
> >>
> >> Thank you for your help
> >> Stefano
> >>
> >>  (oo)
> >> --oOO--( )--OOo
> >> Stefano Sofia PhD
> >> Civil Protection - Marche Region
> >> Meteo Section
> >> Snow Section
> >> Via del Colle Ameno 5
> >> 60126 Torrette di Ancona, Ancona
> >> Uff: 071 806 7743
> >> E-mail: stefano.so...@regione.marche.it
> >> ---Oo-oO

Re: [R] aggregate semi-hourly data not 00-24 but 9-9

2020-09-22 Thread Eric Berger

Thanks Jeff.
Stefano, per Jeff's comment, you can replace the line

df1$data_POSIXminus9 <- df1$data_POSIX - lubridate::hours(9)

by

df1$data_POSIXminus9 <- df1$data_POSIX - as.difftime(9,units="hours")
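
A minimal base-R sketch of the complete workflow with this replacement, using the simulated data from the thread. The set.seed() call, the four-digit year in the day label, and the use of c(NA, diff(...)) to build the increment column are assumptions added here for reproducibility and readability; they are not part of the original posts.

```r
# Sketch: daily sums over 09:00-09:00 windows, base R only.
set.seed(1)  # assumption: fixed seed so the toy data are reproducible
day_1 <- as.POSIXct("2020-02-19-00-00", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
day_2 <- as.POSIXct("2020-02-24-12-00", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
df1 <- data.frame(data_POSIX = seq(day_1, day_2, by = "30 min"))
df1$hs <- rnorm(nrow(df1), 40, 10)
df1$diff <- c(NA, diff(df1$hs))   # first row has no previous value

# Shift every timestamp back by 9 hours, so that 09:00 becomes 00:00.
df1$data_POSIXminus9 <- df1$data_POSIX - as.difftime(9, units = "hours")
df1$dayX <- format(df1$data_POSIXminus9, "%Y-%m-%d")

# The formula method drops the NA row by default (na.action = na.omit).
df2X <- aggregate(diff ~ dayX, data = df1, FUN = sum)
df2X
```

Each label in dayX names the calendar day on which the 09:00-09:00 window starts, so observations before 09:00 on 19 February fall under 2020-02-18.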

On Mon, Sep 21, 2020 at 8:06 PM Jeff Newmiller  wrote:
>
> The base R as.difftime function is perfectly usable to create this offset 
> without pulling in lubridate.
>
> On September 21, 2020 8:06:51 AM PDT, Eric Berger  
> wrote:
> >Hi Stefano,
> >If you mean from 9am on one day to 9am on the following day, you can
> >do a trick. Simply subtract 9hrs from each timestamp and then you want
> >midnight to midnight for these adjusted times, which you can get using
> >the method you followed.
> >
> >I googled and found that lubridate::hours() can be used to add or
> >subtract hours from a POSIXct.
> >
> >library(lubridate)
> >
> >day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M",
> >tz="Etc/GMT-1")
> >day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M",
> >tz="Etc/GMT-1")
> >df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
> >df1$hs <- rnorm(nrow(df1), 40, 10)
> >df1$diff[2:nrow(df1)] <- diff(df1$hs)
> >
> >df1$data_POSIXminus9 <- df1$data_POSIX - lubridate::hours(9)
> >df1$dayX <- format(df1$data_POSIXminus9,"%y-%m-%d")
> >df2X <- aggregate(diff ~ dayX, df1, sum)
> >df2X
> >
> >HTH,
> >Eric
> >
> >On Mon, Sep 21, 2020 at 5:30 PM Stefano Sofia
> > wrote:
> >>
> >> Dear R-list members,
> >> I have semi-hourly snowfall data.
> >> I need to sum the semi-hourly increments (only the positive ones, but
> >> this is not shown in my example) day by day, not from 00 to 24 but
> >> from 9 to 9.
> >>
> >> I am able to use the diff function, create a list of days and use the
> >> aggregate function, but it works only from 0 to 24. Any suggestion for
> >> an efficient way to do it?
> >> Here is my code:
> >> day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
> >> day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
> >> df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
> >> df1$hs <- rnorm(nrow(df1), 40, 10)
> >> df1$diff[2:nrow(df1)] <- diff(df1$hs)
> >> df1$day <- format(df1$data_POSIX,"%y-%m-%d")
> >> df2 <- aggregate(diff ~ day, df1, sum)
> >>
> >> Thank you for your help
> >> Stefano
> >>
> >>  (oo)
> >> --oOO--( )--OOo
> >> Stefano Sofia PhD
> >> Civil Protection - Marche Region
> >> Meteo Section
> >> Snow Section
> >> Via del Colle Ameno 5
> >> 60126 Torrette di Ancona, Ancona
> >> Uff: 071 806 7743
> >> E-mail: stefano.so...@regione.marche.it
> >> ---Oo-oO
>
> --
> Sent from my phone. Please excuse my brevity.


Re: [R] aggregate semi-hourly data not 00-24 but 9-9

2020-09-21 Thread Jeff Newmiller

The base R as.difftime function is perfectly usable to create this offset 
without pulling in lubridate.
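
A small sketch of what such an offset does to a single timestamp; the example time is an arbitrary value chosen here for illustration, not taken from the thread.

```r
# Subtracting a difftime from a POSIXct gives a POSIXct in the same time zone.
t0 <- as.POSIXct("2020-02-20 08:30", tz = "Etc/GMT-1")
offset <- as.difftime(9, units = "hours")
shifted <- t0 - offset
format(shifted, "%Y-%m-%d %H:%M")   # "2020-02-19 23:30"

# Subtracting the same span as plain seconds yields the same instant.
same_instant <- isTRUE(all.equal(as.numeric(shifted), as.numeric(t0 - 9 * 3600)))
same_instant
```

An 08:30 reading therefore gets the previous day's date label, which is what places it in the window that opened at 09:00 the day before.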

On September 21, 2020 8:06:51 AM PDT, Eric Berger  wrote:
>Hi Stefano,
>If you mean from 9am on one day to 9am on the following day, you can
>do a trick. Simply subtract 9hrs from each timestamp and then you want
>midnight to midnight for these adjusted times, which you can get using
>the method you followed.
>
>I googled and found that lubridate::hours() can be used to add or
>subtract hours from a POSIXct.
>
>library(lubridate)
>
>day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M",
>tz="Etc/GMT-1")
>day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M",
>tz="Etc/GMT-1")
>df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
>df1$hs <- rnorm(nrow(df1), 40, 10)
>df1$diff[2:nrow(df1)] <- diff(df1$hs)
>
>df1$data_POSIXminus9 <- df1$data_POSIX - lubridate::hours(9)
>df1$dayX <- format(df1$data_POSIXminus9,"%y-%m-%d")
>df2X <- aggregate(diff ~ dayX, df1, sum)
>df2X
>
>HTH,
>Eric
>
>On Mon, Sep 21, 2020 at 5:30 PM Stefano Sofia
> wrote:
>>
>> Dear R-list members,
>> I have semi-hourly snowfall data.
>> I need to sum the semi-hourly increments (only the positive ones, but
>> this is not shown in my example) day by day, not from 00 to 24 but
>> from 9 to 9.
>>
>> I am able to use the diff function, create a list of days and use the
>> aggregate function, but it works only from 0 to 24. Any suggestion for
>> an efficient way to do it?
>> Here is my code:
>> day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
>> day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
>> df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
>> df1$hs <- rnorm(nrow(df1), 40, 10)
>> df1$diff[2:nrow(df1)] <- diff(df1$hs)
>> df1$day <- format(df1$data_POSIX,"%y-%m-%d")
>> df2 <- aggregate(diff ~ day, df1, sum)
>>
>> Thank you for your help
>> Stefano
>>
>>  (oo)
>> --oOO--( )--OOo
>> Stefano Sofia PhD
>> Civil Protection - Marche Region
>> Meteo Section
>> Snow Section
>> Via del Colle Ameno 5
>> 60126 Torrette di Ancona, Ancona
>> Uff: 071 806 7743
>> E-mail: stefano.so...@regione.marche.it
>> ---Oo-oO

-- 
Sent from my phone. Please excuse my brevity.


Re: [R] aggregate semi-hourly data not 00-24 but 9-9

2020-09-21 Thread Eric Berger

Hi Stefano,
If you mean from 9am on one day to 9am on the following day, you can
do a trick. Simply subtract 9hrs from each timestamp and then you want
midnight to midnight for these adjusted times, which you can get using
the method you followed.

I googled and found that lubridate::hours() can be used to add or
subtract hours from a POSIXct.

library(lubridate)

day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
df1$hs <- rnorm(nrow(df1), 40, 10)
df1$diff[2:nrow(df1)] <- diff(df1$hs)

df1$data_POSIXminus9 <- df1$data_POSIX - lubridate::hours(9)
df1$dayX <- format(df1$data_POSIXminus9,"%y-%m-%d")
df2X <- aggregate(diff ~ dayX, df1, sum)
df2X

HTH,
Eric
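
The original question also asks to sum only the positive increments, which the code above does not cover. One possible sketch, assuming negative increments should simply count as zero; pmax() for the clipping, as.difftime() for the offset, the pos_diff and df2pos names, and the fixed seed are choices made here, not something taken from the thread.

```r
set.seed(42)  # assumption: fixed seed for a reproducible toy series
day_1 <- as.POSIXct("2020-02-19-00-00", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
day_2 <- as.POSIXct("2020-02-24-12-00", format = "%Y-%m-%d-%H-%M", tz = "Etc/GMT-1")
df1 <- data.frame(data_POSIX = seq(day_1, day_2, by = "30 min"))
df1$hs <- rnorm(nrow(df1), 40, 10)
df1$diff <- c(NA, diff(df1$hs))
df1$dayX <- format(df1$data_POSIX - as.difftime(9, units = "hours"), "%y-%m-%d")

# Clip negative increments to 0; the leading NA stays NA and is
# dropped by the formula method of aggregate().
df1$pos_diff <- pmax(df1$diff, 0)
df2pos <- aggregate(pos_diff ~ dayX, data = df1, FUN = sum)
df2pos
```

Filtering the rows with subset = diff > 0 inside aggregate() would give the same sums here, but a 9-to-9 day without any positive increment would then vanish from the result instead of showing 0.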

On Mon, Sep 21, 2020 at 5:30 PM Stefano Sofia
 wrote:
>
> Dear R-list members,
> I have semi-hourly snowfall data.
> I need to sum the semi-hourly increments (only the positive ones, but this is
> not shown in my example) day by day, not from 00 to 24 but from 9 to 9.
>
> I am able to use the diff function, create a list of days and use the
> aggregate function, but it works only from 0 to 24. Any suggestion for an
> efficient way to do it?
> Here is my code:
> day_1 <- as.POSIXct("2020-02-19-00-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
> day_2 <- as.POSIXct("2020-02-24-12-00", format="%Y-%m-%d-%H-%M", tz="Etc/GMT-1")
> df1 <- data.frame(data_POSIX=seq(day_1, day_2, by="30 min"))
> df1$hs <- rnorm(nrow(df1), 40, 10)
> df1$diff[2:nrow(df1)] <- diff(df1$hs)
> df1$day <- format(df1$data_POSIX,"%y-%m-%d")
> df2 <- aggregate(diff ~ day, df1, sum)
>
> Thank you for your help
> Stefano
>
>  (oo)
> --oOO--( )--OOo
> Stefano Sofia PhD
> Civil Protection - Marche Region
> Meteo Section
> Snow Section
> Via del Colle Ameno 5
> 60126 Torrette di Ancona, Ancona
> Uff: 071 806 7743
> E-mail: stefano.so...@regione.marche.it
> ---Oo-oO

Re: [R] Aggregate individual level data to age categories

2020-02-12 Thread stefan.d...@gmail.com

Thank you!
This is exactly what I was looking for!
Cheers!


On Wed, Feb 12, 2020 at 11:29 PM Jim Lemon  wrote:
>
> Hi Stefan,
> How about this:
>
> sddf<-read.table(text="age x
> 45   1
> 45   2
> 46   1
> 47   3
> 47   3",
> header=TRUE)
> library(prettyR)
> sdtab<-xtab(age~x,sddf)
> sdtab$counts
>
> Jim
>
> On Thu, Feb 13, 2020 at 7:40 AM stefan.d...@gmail.com
>  wrote:
> >
> > Dear All,
> >
> > I have a seemingly standard problem for which I somehow cannot find
> > a simple solution. I have individual-level data where x is a
> > categorical variable with 3 categories, which I would like to aggregate
> > by age.
> >
> > age x
> > 45   1
> > 45   2
> > 46   1
> > 47   3
> > 47   3
> > and so on.
> >
> > After transformation it should look like this:
> >
> > age x_1 x_2 x_3
> > 45  1   0   1
> > 46  1   0   0
> > 47  0   0   2
> >
> > Basically to calculate prevalences by age categories.
> >
> > Thanks for any pointers!
> >
> > Cheers!

Re: [R] Aggregate individual level data to age categories

2020-02-12 Thread Jim Lemon

Hi Stefan,
How about this:

sddf<-read.table(text="age x
45   1
45   2
46   1
47   3
47   3",
header=TRUE)
library(prettyR)
sdtab<-xtab(age~x,sddf)
sdtab$counts

Jim

On Thu, Feb 13, 2020 at 7:40 AM stefan.d...@gmail.com
 wrote:
>
> Dear All,
>
> I have a seemingly standard problem for which I somehow cannot find
> a simple solution. I have individual-level data where x is a
> categorical variable with 3 categories, which I would like to aggregate
> by age.
>
> age x
> 45   1
> 45   2
> 46   1
> 47   3
> 47   3
> and so on.
>
> After transformation it should look like this:
>
> age x_1 x_2 x_3
> 45  1   0   1
> 46  1   0   0
> 47  0   0   2
>
> Basically to calculate prevalences by age categories.
>
> Thanks for any pointers!
>
> Cheers!

Re: [R] Aggregate individual level data to age categories

2020-02-12 Thread stefan.d...@gmail.com

Thank you, this is already very helpful.

But how do I get it in the form
age  var_x=1  var_x=2  var_x=3
45   1        1        0
46   1        0        0

So it would be a data frame with 4 variables.

Cheers!
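
One way to get that wide layout in base R, sketched here under the assumption that one count column per category of x is wanted; the column names x_1, x_2, x_3 and the object names tab and wide are chosen for illustration.

```r
d <- data.frame(age = c(45L, 45L, 46L, 47L, 47L),
                x   = c(1L, 2L, 1L, 3L, 3L))

tab  <- table(d$age, d$x)            # ages in rows, categories of x in columns
wide <- as.data.frame.matrix(tab)    # one column per category
names(wide) <- paste0("x_", colnames(tab))
wide <- cbind(age = as.integer(rownames(tab)), wide)
rownames(wide) <- NULL
wide
##   age x_1 x_2 x_3
## 1  45   1   1   0
## 2  46   1   0   0
## 3  47   0   0   2
```

Calling as.data.frame() directly on the table would return the long format (one row per age/x combination) instead, which is why as.data.frame.matrix() is used.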

On Wed, Feb 12, 2020 at 10:25 PM William Dunlap  wrote:
>
> You didn't say how you wanted to use it as a data.frame, but here is one way
>
> d <- data.frame(
> check.names = FALSE,
> age = c(45L, 45L, 46L, 47L, 47L),
> x = c(1L, 2L, 1L, 3L, 3L))
> with(d, as.data.frame(table(age,x)))
>
> which gives:
>   age x Freq
> 1  45 1    1
> 2  46 1    1
> 3  47 1    0
> 4  45 2    1
> 5  46 2    0
> 6  47 2    0
> 7  45 3    0
> 8  46 3    0
> 9  47 3    2
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Wed, Feb 12, 2020 at 1:12 PM stefan.d...@gmail.com  
> wrote:
>>
>> well, if I think about it, it's actually a simple frequency table grouped
>> by age. But it should be usable as a matrix or data frame.
>>
>> On Wed, Feb 12, 2020 at 9:48 PM  wrote:
>> >
>> > So a pivot table?
>> >
>> > On 12 Feb 2020 20:39, stefan.d...@gmail.com wrote:
>> >
>> > Dear All,
>> >
>> > I have a seemingly standard problem for which I somehow cannot find
>> > a simple solution. I have individual-level data where x is a
>> > categorical variable with 3 categories, which I would like to aggregate
>> > by age.
>> >
>> > age x
>> > 45   1
>> > 45   2
>> > 46   1
>> > 47   3
>> > 47   3
>> > and so on.
>> >
>> > After transformation it should look like this:
>> >
>> > age x_1 x_2 x_3
>> > 45  1   0   1
>> > 46  1   0   0
>> > 47  0   0   2
>> >
>> > Basically to calculate prevalences by age categories.
>> >
>> > Thanks for any pointers!
>> >
>> > Cheers!

Re: [R] Aggregate individual level data to age categories

2020-02-12 Thread William Dunlap via R-help

You didn't say how you wanted to use it as a data.frame, but here is one way

d <- data.frame(
check.names = FALSE,
age = c(45L, 45L, 46L, 47L, 47L),
x = c(1L, 2L, 1L, 3L, 3L))
with(d, as.data.frame(table(age,x)))

which gives:
  age x Freq
1  45 1    1
2  46 1    1
3  47 1    0
4  45 2    1
5  46 2    0
6  47 2    0
7  45 3    0
8  46 3    0
9  47 3    2

Bill Dunlap
TIBCO Software
wdunlap tibco.com
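
Since the stated goal is prevalences by age, the counts can be turned into within-age proportions before converting to a data frame. A minimal sketch; the object names counts, prev and prev_df and the column name "prevalence" are assumptions made here.

```r
d <- data.frame(age = c(45L, 45L, 46L, 47L, 47L),
                x   = c(1L, 2L, 1L, 3L, 3L))

counts <- table(age = d$age, x = d$x)
prev   <- prop.table(counts, margin = 1)   # proportions within each age (rows sum to 1)

# Long format, with the proportion column named "prevalence" instead of "Freq"
prev_df <- as.data.frame(prev, responseName = "prevalence")
prev_df
```

With margin = 1 each age group is normalised separately, so age 45 gets 0.5 for x = 1 and 0.5 for x = 2, while age 47 gets 1 for x = 3.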


On Wed, Feb 12, 2020 at 1:12 PM stefan.d...@gmail.com 
wrote:

> well, if I think about it, it's actually a simple frequency table grouped
> by age. But it should be usable as a matrix or data frame.
>
> On Wed, Feb 12, 2020 at 9:48 PM  wrote:
> >
> > So a pivot table?
> >
> > On 12 Feb 2020 20:39, stefan.d...@gmail.com wrote:
> >
> > Dear All,
> >
> > I have a seemingly standard problem for which I somehow cannot find
> > a simple solution. I have individual-level data where x is a
> > categorical variable with 3 categories, which I would like to aggregate
> > by age.
> >
> > age x
> > 45   1
> > 45   2
> > 46   1
> > 47   3
> > 47   3
> > and so on.
> >
> > After transformation it should look like this:
> >
> > age x_1 x_2 x_3
> > 45  1   0   1
> > 46  1   0   0
> > 47  0   0   2
> >
> > Basically to calculate prevalences by age categories.
> >
> > Thanks for any pointers!
> >
> > Cheers!

Re: [R] Aggregate individual level data to age categories

2020-02-12 Thread stefan.d...@gmail.com

well, if I think about it, it's actually a simple frequency table grouped
by age. But it should be usable as a matrix or data frame.

On Wed, Feb 12, 2020 at 9:48 PM  wrote:
>
> So a pivot table?
>
> On 12 Feb 2020 20:39, stefan.d...@gmail.com wrote:
>
> Dear All,
>
> I have a seemingly standard problem for which I somehow cannot find
> a simple solution. I have individual-level data where x is a
> categorical variable with 3 categories, which I would like to aggregate
> by age.
>
> age x
> 45   1
> 45   2
> 46   1
> 47   3
> 47   3
> and so on.
>
> After transformation it should look like this:
>
> age x_1 x_2 x_3
> 45  1   0   1
> 46  1   0   0
> 47  0   0   2
>
> Basically to calculate prevalences by age categories.
>
> Thanks for any pointers!
>
> Cheers!

Re: [R] aggregate output to data frame

2019-03-29 Thread jim holtman

You can also use 'dplyr'

library(tidyverse)
result <- pcr %>%
  group_by(Gene, Type, Rep) %>%
  summarise(mean = mean(Ct),
            sd = sd(Ct),
            oth = sd(Ct) / sqrt(sd(Ct))
  )

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Wed, Mar 27, 2019 at 7:40 PM Jim Lemon  wrote:

> Hi Cyrus,
> Try this:
>
> pcr<-data.frame(Ct=runif(66,10,20),Gene=rep(LETTERS[1:22],3),
>  Type=rep(c("Std","Unkn"),33),Rep=rep(1:3,each=22))
> testagg<-aggregate(pcr$Ct,c(pcr["Gene"],pcr["Type"],pcr["Rep"]),
>  FUN=function(x){c(mean(x), sd(x), sd(x)/sqrt(sd(x)))})
> nxcol<-dim(testagg$x)[2]
> newxs<-paste("x",1:nxcol,sep="")
> for(col in 1:nxcol)
>  testagg[[newxs[col]]]<-testagg$x[,col]
> testagg$x<-NULL
>
> Jim
>
> On Thu, Mar 28, 2019 at 12:39 PM cir p via R-help 
> wrote:
> >
> > Dear users,
> > I am trying to summarize data using "aggregate" with the following command:
> >
> > aggregate(pcr$Ct,c(pcr["Gene"],pcr["Type"],pcr["Rep"]),FUN=function(x){c(mean(x), sd(x), sd(x)/sqrt(sd(x)))})
> >
> > and the structure of the resulting data frame is
> >
> > 'data.frame': 66 obs. of  4 variables:
> >  $ Gene: Factor w/ 22 levels "14-3-3e","Act5C",..: 1 2 3 4 5 6 7 8 9 10 ...
> >  $ Type: Factor w/ 2 levels "Std","Unkn": 2 2 2 2 2 2 2 2 2 2 ...
> >  $ Rep : int  1 1 1 1 1 1 1 1 1 1 ...
> >  $ x   : num [1:66, 1:3] 16.3 16.7 18.2 17.1 18.6 ...
> >
> > The actual data is "bundled" in a matrix $x of the data frame. I would
> > like to have the columns of this matrix as individual numeric columns in
> > the same data frame instead of a matrix, but I can't figure out how
> > to do this in an efficient way. Could someone help me construct this?
> > Thanks a lot,
> >
> > Cyrus

Re: [R] aggregate output to data frame

2019-03-27 Thread Jim Lemon

Hi Cyrus,
Try this:

pcr<-data.frame(Ct=runif(66,10,20),Gene=rep(LETTERS[1:22],3),
 Type=rep(c("Std","Unkn"),33),Rep=rep(1:3,each=22))
testagg<-aggregate(pcr$Ct,c(pcr["Gene"],pcr["Type"],pcr["Rep"]),
 FUN=function(x){c(mean(x), sd(x), sd(x)/sqrt(sd(x)))})
nxcol<-dim(testagg$x)[2]
newxs<-paste("x",1:nxcol,sep="")
for(col in 1:nxcol)
 testagg[[newxs[col]]]<-testagg$x[,col]
testagg$x<-NULL

Jim
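
A shorter base-R alternative to the loop, sketched on a small made-up data set with two replicates per group (the toy data, the object names agg and flat, and the named mean/sd outputs are assumptions made here): do.call(data.frame, ...) expands a matrix column into ordinary columns.

```r
set.seed(1)  # assumption: toy data, 2 observations per Gene/Type/Rep group
pcr <- data.frame(Ct   = runif(24, 10, 20),
                  Gene = rep(c("A", "B"), each = 12),
                  Type = rep(rep(c("Std", "Unkn"), each = 6), 2),
                  Rep  = rep(1:3, times = 8))

agg <- aggregate(pcr$Ct, pcr[c("Gene", "Type", "Rep")],
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))
str(agg$x)                          # a numeric matrix stored as one column

flat <- do.call(data.frame, agg)    # matrix column becomes x.mean and x.sd
names(flat)
```

Naming the elements returned by FUN gives readable column names (x.mean, x.sd) after flattening.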

On Thu, Mar 28, 2019 at 12:39 PM cir p via R-help  wrote:
>
> Dear users,
> I am trying to summarize data using "aggregate" with the following command:
>
> aggregate(pcr$Ct,c(pcr["Gene"],pcr["Type"],pcr["Rep"]),FUN=function(x){c(mean(x), sd(x), sd(x)/sqrt(sd(x)))})
>
> and the structure of the resulting data frame is
>
> 'data.frame': 66 obs. of  4 variables:
>  $ Gene: Factor w/ 22 levels "14-3-3e","Act5C",..: 1 2 3 4 5 6 7 8 9 10 ...
>  $ Type: Factor w/ 2 levels "Std","Unkn": 2 2 2 2 2 2 2 2 2 2 ...
>  $ Rep : int  1 1 1 1 1 1 1 1 1 1 ...
>  $ x   : num [1:66, 1:3] 16.3 16.7 18.2 17.1 18.6 ...
>
> The actual data is "bundled" in a matrix $x of the data frame. I would like
> to have the columns of this matrix as individual numeric columns in the same
> data frame instead of a matrix, but I can't figure out how to do this
> in an efficient way. Could someone help me construct this?
>
> Thanks a lot,
>
> Cyrus

Re: [R] aggregate and list elements of variables in data.frame

2018-06-08 Thread Eik Vettorazzi

Hi,
if you are willing to use dplyr, you can do all in one line of code:

library(dplyr)
df<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))

df%>%group_by(unique_A=A)%>%summarise(list_id=paste(id,collapse=", "))->r

cheers
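
For comparison, a base-R sketch of the same result without dplyr, relying on the fact that the formula method of aggregate() passes extra arguments (here collapse) on to FUN. The renaming step is only there to match the column names requested in the question.

```r
df <- data.frame(id = 1:10, A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# paste() is applied to the ids within each value of A
r <- aggregate(id ~ A, data = df, FUN = paste, collapse = ", ")
names(r) <- c("unique_A", "list_id")
r
##   unique_A    list_id
## 1      123 1, 3, 6, 9
## 2      345    2, 5, 8
## 3      678          4
## 4      789      7, 10
```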


On 06.06.2018 at 10:13, Massimo Bressan wrote:
> #given the following reproducible and simplified example 
> 
> t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789)) 
> t 
> 
> #I need to get the following result 
> 
> r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('1,3,6,9','2,5,8','4','7,10'))
> r
>
> # i.e. aggregate over the variable "A" and list all elements of the variable
> # "id" that share the same corresponding value of "A"
> # any help with that?
> 
> #so far I've just managed to "aggregate" and "count", like: 
> 
> library(sqldf) 
> sqldf('select count(*) as count_id, A as unique_A from t group by A') 
> 
> library(dplyr) 
> t%>%group_by(unique_A=A) %>% summarise(count_id = n()) 
> 
> # thank you 

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistrasse 52
building W 34
20246 Hamburg

Phone: +49 (0) 40 7410 - 58243
Fax:   +49 (0) 40 7410 - 57790
Web: www.uke.de/imbe

Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Bert Gunter

which() is unnecessary. Use logical subscripting:

... t$id[t$A ==x]

Further simplification can be gotten by using the with() function:

l <- with(t, sapply(unique(A), function(x) id[A ==x]))

Check this though -- there might be scoping issues.

Cheers,
Bert
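
A quick way to check the scoping concern on the data from the thread: the sketch below compares the which() version with the with() version and adds split() as a further base-R option. The object names l_which, l_with and l_split are chosen here for illustration.

```r
t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

l_which <- sapply(unique(t$A), function(x) t$id[which(t$A == x)])
l_with  <- with(t, sapply(unique(A), function(x) id[A == x]))
identical(l_which, l_with)   # TRUE: id and A are found inside with()

# split() does the grouping in one call; groups are ordered by the sorted values of A
l_split <- split(t$id, t$A)
l_split
```

The anonymous function is created inside with(), so it looks up id and A in the data frame first, which is why both versions agree here. Note that which() silently drops NA comparisons while plain logical subscripting keeps them as NA entries, so the two can differ if A contains missing values.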


Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

# OK, this is my final, most compact solution to the problem, combining the
# different contributions (thanks to all)

t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# collect the id values belonging to each unique value of A
l <- sapply(unique(t$A), function(x) t$id[which(t$A == x)])
r <- data.frame(unique_A = unique(t$A),
                list_id = unlist(lapply(l, paste, collapse = ", ")))
r
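
For comparison, here is a sketch of two base R alternatives, not taken from the thread itself: the formula interface of aggregate(), which passes extra arguments on to FUN, and split(), which always returns a list (sapply() may simplify to a matrix when all groups happen to have the same size). The names r_agg and r_split are illustrative only.

```r
t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# formula interface: paste the id values within each value of A
r_agg <- aggregate(id ~ A, data = t, FUN = paste, collapse = ", ")
names(r_agg) <- c("unique_A", "list_id")

# split() gives a named list with one element per value of A
l <- split(t$id, t$A)
r_split <- data.frame(unique_A = as.numeric(names(l)),
                      list_id = vapply(l, paste, character(1), collapse = ", "),
                      row.names = NULL)
```

Both variants sort the groups by A, whereas unique() keeps them in order of first appearance; for this data the two orders coincide.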



Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

Thank you for the help.

This is my solution based on your valuable hint, but without needing to go
through a 'tibble':

x <- data.frame(id = LETTERS[1:10], A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))
uA <- unique(x$A)
idx <- lapply(uA, function(v) which(x$A %in% v))
vals <- lapply(idx, function(index) x$id[index])
data.frame(unique_A = uA, list_vals = unlist(lapply(vals, paste, collapse = ", ")))

best 




Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Ben Tupper

Hi,

Does this do what you want?  I had to change the id values to something more 
obvious.  It uses tibbles which allow each variable to be a list.

library(tibble)
library(dplyr)
x   <- tibble(id=LETTERS[1:10],
A=c(123,345,123,678,345,123,789,345,123,789))
uA  <- unique(x$A)
idx <- lapply(uA, function(v) which(x$A %in% v))
vals<- lapply(idx, function(index) x$id[index])

r <- tibble(unique_A = uA, list_idx = idx, list_vals = vals)


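> r
# A tibble: 4 x 3
  unique_A list_idx  list_vals
     <dbl> <list>    <list>
1     123. <int [4]> <chr [4]>
2     345. <int [3]> <chr [3]>
3     678. <int [1]> <chr [1]>
4     789. <int [2]> <chr [2]>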
> r$list_idx[1]
[[1]]
[1] 1 3 6 9

> r$list_vals[1]
[[1]]
[1] "A" "C" "F" "I"
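
As a side note, a plain data.frame can also carry a list column if the list is wrapped in I(). The following is a minimal base R sketch of the same idea, not part of the original reply; stringsAsFactors = FALSE is set explicitly so that id stays character in older R versions as well.

```r
x <- data.frame(id = LETTERS[1:10],
                A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789),
                stringsAsFactors = FALSE)

uA   <- unique(x$A)
vals <- lapply(uA, function(v) x$id[x$A == v])

# I() protects the list so that data.frame() stores it as a single column
r <- data.frame(unique_A = uA, list_vals = I(vals))
first_group <- r$list_vals[[1]]
```

The tibble version prints list columns more readably, which may be reason enough to prefer it.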


Cheers,
ben




Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/







Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Ivan Calandra


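Using which() to subset t$id should do the trick (this assumes t$A has been
converted to a factor, as in the earlier solution; otherwise levels(t$A) is
NULL):

sapply(levels(t$A), function(x) t$id[which(t$A==x)])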

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra


Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

Sorry, but on looking further at the example I realised that the posted
solution is not quite what I need: I do not need the 'indices' back, but
rather the corresponding values of column id for each value of A.

# please consider this new example

t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))
t

# I need to get this result
r <- data.frame(unique_A = c(123, 345, 678, 789),
                list_id = c('18,20,27,4', '91,54,15', '68', '26,97'))
r

# any help for this, please? 





Re: [R] aggregate and list elements of variables in data.frame

2018-06-07 Thread Massimo Bressan

thanks for the help 

I'm posting here the complete solution 

t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789)) 
t$A <- factor(t$A) 
l<-sapply(levels(t$A), function(x) which(t$A==x)) 
r<-data.frame(list_id=unlist(lapply(l, paste, collapse = ", "))) 
r<-cbind(unique_A=row.names(r),r) 
row.names(r)<-NULL 
r 

best 




Re: [R] aggregate and list elements of variables in data.frame

2018-06-06 Thread Ivan Calandra


Hi Massimo,

Something along those lines could help you I guess:
t$A <- factor(t$A)
sapply(levels(t$A), function(x) which(t$A==x))

You can then play with the output using paste()
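
One way the paste() step might look, as a sketch on the example data from the question. simplify = FALSE is an addition here so that sapply() always returns a named list, and idx and out are illustrative names.

```r
t <- data.frame(id = 1:10,
                A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))
t$A <- factor(t$A)

# row positions for each level of A, kept as a named list
idx <- sapply(levels(t$A), function(x) which(t$A == x), simplify = FALSE)

# collapse each group of positions into a single string
out <- vapply(idx, paste, character(1), collapse = ",")
```

Because id is simply 1:10 here, the row positions and the id values coincide; with other ids, index t$id with the positions first.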

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 06/06/2018 10:13, Massimo Bressan wrote:

# given the following reproducible and simplified example

t <- data.frame(id = 1:10, A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))
t

# I need to get the following result

r <- data.frame(unique_A = c(123, 345, 678, 789),
                list_id = c('1,3,6,9', '2,5,8', '4', '7,10'))
r

# i.e. aggregate over the variable "A" and list all elements of the variable
# "id" that share the same corresponding value of "A"
# any help with that?

# so far I've only managed to "aggregate" and "count", like:

library(sqldf)
sqldf('select count(*) as count_id, A as unique_A from t group by A')

library(dplyr)
t %>% group_by(unique_A = A) %>% summarise(count_id = n())

# thank you
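
The count-based attempts above can probably be extended by swapping the counting function for paste(). This is a sketch, not an answer given in the thread: tapply() is the base R route, and the dplyr part runs only if the package is installed. The names res_base and res_dplyr are illustrative.

```r
t <- data.frame(id = 1:10,
                A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# base R: one comma-separated string of ids per value of A
res_base <- tapply(t$id, t$A, paste, collapse = ",")

# dplyr: same grouping as the count example, with paste() instead of n()
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  res_dplyr <- t %>%
    group_by(unique_A = A) %>%
    summarise(list_id = paste(id, collapse = ","))
}
```

In the sqldf query, SQLite's group_concat() aggregate should play the same role as paste() does here, although SQLite does not guarantee the order in which the ids are concatenated.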



Re: [R] Aggregate over multiple and unequal column length data frames

2018-02-28 Thread Ek Esawi

Thank you again, Pikal and Bert. Using lapply, as Bert suggested, was
the first thing I thought of for this question, and I mentioned it in
my original posting. I just did not know how to implement it to get
the results in the form I want. Below is what I did, but I could not
get it to produce the result given in Pikal's answer.

nlist <- list(df1$col2,df2$col2, df3$col2)

lapply(nlist, function(x) table(x))
[[1]]
x
aa bb cc dd
 2  1  1  1

[[2]]
x
bb cc
 3  1

[[3]]
x
aa
 2
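
The separate tables differ in length because each one only knows the levels present in its own data frame. A possible fix, sketched here with the sample data posted earlier in the thread, is to give every column the same set of levels before tabulating; lev and counts are illustrative names.

```r
df1 <- data.frame(col1 = c(1, 2, 3, 4, 5), col2 = c("aa", "aa", "bb", "cc", "dd"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c(1, 2, 4, 5), col2 = c("bb", "bb", "cc", "bb"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(col1 = c(1, 3), col2 = c("aa", "aa"),
                  stringsAsFactors = FALSE)

nlist <- list(df1$col2, df2$col2, df3$col2)

# all levels that occur anywhere, in sorted order
lev <- sort(unique(unlist(nlist)))

# with common levels every table has the same length, so sapply()
# can bind the results into one matrix (rows = levels, columns = data frames)
counts <- sapply(nlist, function(x) table(factor(x, levels = lev)))
```

Levels that are absent from a data frame then show up as zero counts rather than being dropped.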


Re: [R] Aggregate over multiple and unequal column length data frames

2018-02-27 Thread Bert Gunter

Then you need to rethink your data structure. Use a list instead of a data
frame. The components of a list can have different lengths, and the "apply"
family of functions (lapply(), etc.) can operate on them. Consult any good
R tutorial for details.
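
A minimal illustration of that point, using the col2 values from the sample data in this thread. The list cols and its element names are made up for the example.

```r
# components of a list may have different lengths
cols <- list(df1 = c("aa", "aa", "bb", "cc", "dd"),
             df2 = c("bb", "bb", "cc", "bb"),
             df3 = c("aa", "aa"))

n_per_df <- lengths(cols)   # number of elements in each component

# apply a function to every component, e.g. count how often "bb" occurs
n_bb <- vapply(cols, function(x) sum(x == "bb"), integer(1))
```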

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: [R] Aggregate over multiple and unequal column length data frames

2018-02-27 Thread PIKAL Petr

Hi

You do not need this:
dput(df1<-data.frame(col1=c(1,2,3,4,5),col2=c("aa","aa","bb","cc","dd")))

This is enough to represent any object exactly in a mail:
> dput(df1)
structure(list(col1 = c(1, 2, 3, 4, 5), col2 = structure(c(1L,
1L, 2L, 3L, 4L), .Label = c("aa", "bb", "cc", "dd"), class = "factor")),
.Names = c("col1", "col2"), row.names = c(NA, -5L), class = "data.frame")

Anyway (I am sure there is a better way, but):
# make list from your data frames
lll<-list(df1, df2, df3)

#add new column to each part
for (i in 1:3) lll[[i]]$n<-i

#concatenate data to one data frame
dat<-do.call(rbind, lll)

# make table (and data frame from it)
data.frame(unclass(table(dat$col2, dat$n)))
   X1 X2 X3
aa  2  0  2
bb  1  3  0
cc  1  1  0
dd  1  0  0
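
The same steps can be written without hard-coding the number of data frames. This is a sketch of a variant, not the code from the reply itself: Map() tags each data frame with its position in the list, and the names tagged and tab are illustrative.

```r
df1 <- data.frame(col1 = c(1, 2, 3, 4, 5), col2 = c("aa", "aa", "bb", "cc", "dd"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c(1, 2, 4, 5), col2 = c("bb", "bb", "cc", "bb"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(col1 = c(1, 3), col2 = c("aa", "aa"),
                  stringsAsFactors = FALSE)

lll <- list(df1, df2, df3)

# add a column n holding each data frame's position in the list
tagged <- Map(function(d, i) cbind(d, n = i), lll, seq_along(lll))

# stack everything and cross-tabulate level by source data frame
dat <- do.call(rbind, tagged)
tab <- table(dat$col2, dat$n)
```

table() fills combinations that never occur with zeros, which is what keeps dd in the result even though it appears only in df1.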

I believe there is a typo in the second row, first column of your dfn: with
your 3 data frames, that value should be 1.

Cheers
Petr


Re: [R] Aggregate over multiple and unequal column length data frames

2018-02-27 Thread Ek Esawi

Thank you Pikal and Bert. My apologies for posting parts of my previous
email in HTML. Bert's suggestion will work, but I am wondering if there
is an alternative, especially when the data frames are big, that is, when
the difference in lengths among them is large. Below is a list of sample
data frames and the desired result.

EK


dput(df1<-data.frame(col1=c(1,2,3,4,5),col2=c("aa","aa","bb","cc","dd")))
dput(df2<-data.frame(col1=c(1,2,4,5),col2=c("bb","bb","cc","bb")))
dput(df3<-data.frame(col1=c(1,3),col2=c("aa","aa")))
# desired result
dput(dfn<-data.frame(col1=c(2,2,1,1),col2=c(0,3,1,0),col3=c(2,0,0,0),
row.names = c("aa","bb","cc","dd")))


Re: [R] Aggregate over multiple and unequal column length data frames

2018-02-23 Thread PIKAL Petr

Hi

Your example is rather confusing, partly because of the HTML formatting and
partly because of the odd coding.

You could probably concatenate your data frames, e.g. with rbind or merge, and
then aggregate the result.

I could construct example data frames myself, but they would most probably
differ from yours, and the result would not necessarily be what you expect.

You should post those data frames as output from dput(data) and show us the
real desired result for those example data frames.

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ek Esawi
> Sent: Wednesday, February 21, 2018 3:34 AM
> To: r-help@r-project.org
> Subject: [R] Aggregate over multiple and unequal column length data frames
>
>  Hi All--
>
> I have generated several 2-column data frames of varying length. The data
> frames have the same column names and variable types. I was trying to
> aggregate over the 2nd column for all the data frames, but could not figure
> out how.
>
> I thought I could make them all of equal length and then combine them into 1
> data frame where I can use aggregate (the formula version), or put them in a
> list and loop with lapply, but I did not know how to do that and thought there
> might be a simpler way.
>
> Below is an example of 3 data frames and the desired result; note that some
> levels don't appear in all of them and may be null across all of them, like dd
> in the desired result. I would like to list all levels even if some are
> all null.
>
> Thanks in advance,
>
> EK
>
>df1   df2  df3
>
> c1 c2 c1 c2 c1 c2
> 1 aa 1 bb 1 aa
> 2 aa 2 bb 2 aa
> 3 bb 3 cc
> 4 cc 4 bb
> 5 bb
>
> desired result
>
> c1 c2 c2 c2
> aa 2 2
> bb 1 2 2
> cc 1 1
> dd
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Aggregate over multiple and unequal column length data frames

2018-02-20 Thread Bert Gunter

All columns in a data.frame **must** have the same length. So you cannot do
this unless empty values are filled with missings (NA's).

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: [R] Aggregate behaviour inconsistent (?) when FUN=table

2018-02-06 Thread Alain Guillet

Thank you for your response. Note that with R 3.4.3, I get the same 
result with simplify=TRUE or simplify=FALSE.

My problem was that the behaviour differed depending on whether I defined my 
columns as character or as numeric, but a few minutes ago I discovered that 
data.frame() also has a stringsAsFactors option. So yes, it was 
a stupid question and I apologize for it.

On 06/02/2018 18:07, William Dunlap wrote:

Don't use aggregate's simplify=TRUE when FUN() produces return
values of various dimensions.  In your case, the shape of table(subset)'s
return value depends on the number of levels in the factor 'subset'.
If you make B a factor before splitting it by C, each split will have the
same number of levels (2).  If you split it and then let table convert
each split to a factor, one split will have 1 level and the other 2.
To see the details of the output, use str() instead of print().

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Feb 6, 2018 at 12:20 AM, Alain Guillet wrote:

Dear R users,

When I use aggregate with table as FUN, I get what I would call a
strange behaviour if it involves numerical vectors and one "level"
of it is not present for every "levels" of the "by" variable:

---

> df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
> aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
  Group.1 A.0 A.1    B
1       0   1   2    3
2       1   3   2 2, 3

> table(df$C,df$B)

    0 1
  0 3 0
  1 2 3

---

As you can see, a comma appears in the column with the variable B
in the aggregate whereas when I call table I obtain the same
result as if B was defined as a factor (I suppose it comes from
the fact "non-factor arguments a are coerced via factor" according
to the details of the table help). I find it completely normal if
I remember that aggregate first splits the data into subsets and
then compute the table. But then I don't understand why it works
differently with character vectors. Indeed if I use character
vectors, I get the same result as with factors:

> df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
> aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
  Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

> df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
> aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
  Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

-

Would it be possible to document this behaviour in the
aggregate help, since the result does not quite match what
one would expect from the table help?
Or would it be possible to get the same results regardless of
the vector type? This post was rejected on the R-devel mailing
list, so I am asking my question here as suggested.

Best regards,
Alain Guillet

-- 


Re: [R] Aggregate behaviour inconsistent (?) when FUN=table

2018-02-06 Thread William Dunlap via R-help

Don't use aggregate's simplify=TRUE when FUN() produces return
values of various dimensions.  In your case, the shape of table(subset)'s
return value depends on the number of levels in the factor 'subset'.
If you make B a factor before splitting it by C, each split will have the
same number of levels (2).  If you split it and then let table convert
each split to a factor, one split will have 1 level and the other 2.  To see
the details of the output , use str() instead of print().
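
To illustrate the point about factor levels, here is a minimal sketch using the
data frame from the original question (the names res_num, res_fac and df_fac
are only for illustration):

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Numeric B: table() builds the levels separately within each split, so the
# two results have different lengths and aggregate() cannot simplify them
res_num <- aggregate(df["B"], list(df$C), table)
str(res_num)

# Factor B: both levels are known before splitting, so every split returns
# a table of length 2 and the results simplify to a matrix column
df_fac <- df
df_fac$B <- factor(df_fac$B)
res_fac <- aggregate(df_fac["B"], list(df_fac$C), table)
str(res_fac)
```

Comparing the two str() outputs should show B stored as a list in the first
case and as a two-column matrix in the second.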


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Feb 6, 2018 at 12:20 AM, Alain Guillet 
wrote:

> Dear R users,
>
> When I use aggregate with table as FUN, I get what I would call a strange
> behaviour if it involves numerical vectors and one "level" of it is not
> present for every "levels" of the "by" variable:
>
> ---
>
> > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1    B
> 1       0   1   2    3
> 2       1   3   2 2, 3
>
> > table(df$C,df$B)
>
>     0 1
>   0 3 0
>   1 2 3
>
> ---
>
> As you can see, a comma appears in the column with the variable B in the
> aggregate whereas when I call table I obtain the same result as if B was
> defined as a factor (I suppose it comes from the fact "non-factor arguments
> a are coerced via factor" according to the details of the table help). I
> find it completely normal if I remember that aggregate first splits the
> data into subsets and then compute the table. But then I don't understand
> why it works differently with character vectors. Indeed if I use character
> vectors, I get the same result as with factors:
>
> 
>
> > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
> 1       0   1   2   3   0
> 2       1   3   2   2   3
>
> > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
> > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
>   Group.1 A.0 A.1 B.0 B.1
> 1       0   1   2   3   0
> 2       1   3   2   2   3
>
> -
>
> Is it possible to precise anything about this behaviour in the aggregate
> help since the result is not completely compatible with the expectation of
> result we can have according to the table help? Or would it be possible to
> have the same results independently of the vector type? This post was
> rejected on the R-devel mailing list so I ask my question here as suggested.
>
>
> Best regards,
> Alain Guillet
>
> --
> Alain Guillet
> Statistician and Computer Scientist
>
> SMCS - IMMAQ - Université catholique de Louvain
> http://www.uclouvain.be/smcs
>
> Bureau c.316
> Voie du Roman Pays, 20 (bte L1.04.01)
> B-1348 Louvain-la-Neuve
> Belgium
>
> Tel: +32 10 47 30 50
>
> Accès: http://www.uclouvain.be/323631.html
>

Re: [R] Aggregate behaviour inconsistent (?) when FUN=table

2018-02-06 Thread Jeff Newmiller

The normal input to a factory that builds cars is car parts. Feeding whole 
trucks into such a factory is likely to yield odd-looking results.

Both aggregate and table do similar kinds of things, but yield differently 
constructed outputs. The output of the table function is not well-suited to be 
used as the aggregated value to be compiled into a data frame by the aggregate 
function, so having aggregate call the table function will yield surprises.

I am having some difficulty deciphering what it is you are trying to accomplish 
with all this, so I will guess that you are trying to reproduce the information 
output from

table( df$C, df$B )

so

aggregate( df$A, df[ , c( "C", "B" ) ], length )

but if that isn't what you want then perhaps you can clarify what result you 
want to see and we can help you get there. 
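
For comparison, a small sketch of the two approaches on the data frame from the 
question (counts_agg and counts_tab are illustrative names only):

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Count rows per (C, B) combination; only combinations that occur are returned
counts_agg <- aggregate(df$A, df[, c("C", "B")], length)
print(counts_agg)

# table() keeps the empty (C = 0, B = 1) cell as an explicit zero
counts_tab <- as.data.frame(table(C = df$C, B = df$B))
print(counts_tab)
```

The aggregate() version drops the combination that never occurs, whereas the 
table() version reports it with a count of zero.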
-- 
Sent from my phone. Please excuse my brevity.


Re: [R] Aggregate counts of observations with times surrounding a time?

2017-05-16 Thread Jim Lemon

Hi again,
Here is a version cleaned up a bit. Too tired to do it last night.

mndf<-data.frame(st=seq(1483360938,by=1700,length=10),
 et=seq(1483362938,by=1700,length=10),
 store=c(rep("gap",5),rep("starbucks",5)),
 zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),
 store_id=seq(50,59))
# orders the times and calculates number of simultaneous presences
count_simult<-function(x) {
 nrows<-dim(x)[1]
 # order the start and end times of the subset passed in (x), not the global mndf
 timeorder<-order(unlist(x[,c("st","et")]))
 # initialize result data frame - first time always has a value of 1
 interval_counts<-data.frame(time=c(x$st,x$et)[timeorder],
  startfin=rep(c("st","et"),each=nrows)[timeorder],
  count=c(1,rep(0,nrows*2-1)))
 for(i in 2:(nrows*2)) {
  interval_counts[i,"count"]<-
   interval_counts[i-1,"count"]+
   ifelse(interval_counts[i,"startfin"]=="st",1,-1)
 }
 return(interval_counts)
}
gap_counts<-count_simult(mndf[mndf$store=="gap",])
plot(gap_counts$time,gap_counts$count,type="l")
starbucks_counts<-count_simult(mndf[mndf$store=="starbucks",])
plot(starbucks_counts$time,starbucks_counts$count,type="l")
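
The same running count can probably be had without the loop by treating each
start as +1 and each end as -1 and taking a cumulative sum. A minimal sketch,
assuming no start time coincides exactly with an end time (count_simult_vec
and gap_vec are illustrative names, and the data frame is rebuilt here only so
that the snippet runs on its own):

```r
mndf <- data.frame(st = seq(1483360938, by = 1700, length.out = 10),
                   et = seq(1483362938, by = 1700, length.out = 10),
                   store = rep(c("gap", "starbucks"), each = 5))

count_simult_vec <- function(x) {
  # one event per start (+1) and one per end (-1)
  events <- data.frame(time = c(x$st, x$et),
                       change = rep(c(1, -1), each = nrow(x)))
  # sort the events by time and accumulate the changes
  events <- events[order(events$time), ]
  events$count <- cumsum(events$change)
  events
}

gap_vec <- count_simult_vec(mndf[mndf$store == "gap", ])
plot(gap_vec$time, gap_vec$count, type = "s")
```

A step plot (type = "s") is used here because the count only changes at the
event times.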

Jim



Re: [R] Aggregate counts of observations with times surrounding a time?

2017-05-16 Thread Jim Lemon

Hi Mark,
I think you might want something like this:

mndf<-data.frame(st=seq(1483360938,by=1700,length=10),
 et=seq(1483362938,by=1700,length=10),
 store=c(rep("gap",5),rep("starbucks",5)),
 zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),
 store_id=seq(50,59))
# orders the times and calculates number of simultaneous presences
count_simult<-function(x) {
 nrows<-dim(x)[1]
 # order the start and end times of the subset passed in (x), not the global mndf
 timeorder<-order(unlist(x[,c("st","et")]))
 interval_counts<-data.frame(time=c(x$st,x$et)[timeorder],
  startfin=rep(c("st","et"),each=nrows)[timeorder],count=rep(NA,nrows*2))
 interval_counts[1,"count"]<-1
 for(i in 2:(nrows*2)) {
  interval_counts[i,"count"]<-
   interval_counts[i-1,"count"]+
   ifelse(interval_counts[i,"startfin"]=="st",1,-1)
 }
 return(interval_counts)
}
gap_counts<-count_simult(mndf[1:5,])
plot(gap_counts$time,gap_counts$count,type="l")
starbucks_counts<-count_simult(mndf[6:10,])
plot(starbucks_counts$time,starbucks_counts$count,type="l")

There are a lot of ways to plot the counts by time. If you have any
preferences, let me know.

Jim


On Tue, May 16, 2017 at 2:48 PM, Mark Noworolski  wrote:
> I have a data frame that has a set of observed dwell times at a set of
> locations. The metadata for the locations includes things that have varying
> degrees of specificity. I'm interested in tracking the number of people
> present at a given time in a given store, type of store, or zip code.
>
> Here's an example of some sample data (here st=start_time, and et=end_time):
> data.frame(st=seq(1483360938,by=1700,length=10),et=seq(1483362938,by=1700,length=10),store=c(rep("gap",5),rep("starbucks",5)),zip=c(94000,94000,94100,94100,94200,94000,94000,94100,94100,94200),store_id=seq(50,59))
>st et store   zip store_id
> 1  1483360938 1483362938   gap 94000   50
> 2  1483362638 1483364638   gap 94000   51
> 3  1483364338 1483366338   gap 94100   52
> 4  1483366038 1483368038   gap 94100   53
> 5  1483367738 1483369738   gap 94200   54
> 6  1483369438 1483371438 starbucks 94000   55
> 7  1483371138 1483373138 starbucks 94000   56
> 8  1483372838 1483374838 starbucks 94100   57
> 9  1483374538 1483376538 starbucks 94100   58
> 10 1483376238 1483378238 starbucks 94200   59
>
> I'd like to be able to:
> a) create aggregates of the number of people present in each store_id at a
> given time
> b) create aggregates of the number of people present - grouped by zip or
> store
>
> I expect to be rolling up to hour or half hour buckets, but I don't think I
> should have to decide this up front and be able to do something clever to
> be able to use ggplot + some other library to plot the time evolution of
> this information, rolled up the way I want.
>
> Any clever solutions? I've trolled stackoverflow and this email list.. to
> no avail - but I'm willing to acknowledge I may have missed something.
>

Re: [R] Aggregate data to lower resolution

2017-01-19 Thread Adams, Jean

Milu,

To get the quickest help and keep everyone in the loop, you should cc the
help list.

I don't understand your question.  If you want the mean GDP, use the mean
function; if you want the sum of the GDP, use the sum function.
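
To see the difference, here is a small sketch using the ten rows posted with
the original question. Whether GDP in each 0.5 degree cell is a total or a
per-area figure is not stated in the thread, so which summary is appropriate is
an assumption left to the reader (gdp_mean and gdp_sum are illustrative names):

```r
temp <- data.frame(
  longitude = c(-68.25, -67.75, -67.25, -68.25, -67.75,
                -67.25, -71.25, -70.75, -69.25, -68.75),
  latitude  = c(-54.75, -54.75, -54.75, -54.25, -54.25,
                -54.25, -53.75, -53.75, -53.75, -53.75),
  GDP       = c(1.683046, 0.3212307, 0.0486207, 0.1223268, 0.0171909,
                0.0062104, 0.22379, 0.1406729, 0.0030038, 0.0057422))

# Assign each 0.5 degree cell to the 1 degree cell that contains it
temp$long1 <- floor(temp$longitude)
temp$lat1  <- floor(temp$latitude)

# Average of the fine cells within each coarse cell
gdp_mean <- aggregate(GDP ~ long1 + lat1, temp, mean)

# Total of the fine cells within each coarse cell
gdp_sum  <- aggregate(GDP ~ long1 + lat1, temp, sum)

print(merge(gdp_mean, gdp_sum, by = c("long1", "lat1"),
            suffixes = c(".mean", ".sum")))
```

With sum, the grand total over all coarse cells equals the grand total of the
input, which is usually what is wanted when each cell holds an amount; with
mean it generally does not.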

Jean

On Fri, Jan 13, 2017 at 5:33 PM, Miluji Sb  wrote:

> Dear Jean,
>
> Greetings of the new year. Hope you are doing well.
>
> I apologise for writing to you off-list but this might be a really silly
> question and I wanted to clarify without bothering everyone. Would you kindly
> help me out? Hope this is not too much of a bother.
>
> My original question was regarding aggregating data to 1 degree x 1
> degree. You had kindly provided the following solution:
>
> temp$long1 <- floor(temp$longitude)
> temp$lat1 <- floor(temp$latitude)
> temp1 <- aggregate(GDP ~ long1 + lat1, temp, mean)
>
> Everything works well, my only question (and confusion) is that for
> aggregating from 0.5 degree by 0.5 degree to 1 degree by 1 degree, should
> we use sum instead of mean?
>
> temp1 <- aggregate(GDP ~ long1 + lat1, temp, sum)
>
> I really hope I'm not bothering you too much. Thanks again.
>
> Sincerely,
>
> Milu
>
> On Fri, Jul 22, 2016 at 3:06 PM, Adams, Jean  wrote:
>
>> Milu,
>>
>> Perhaps an approach like this would work.  In the example below, I
>> calculate the mean GDP for each 1 degree by 1 degree.
>>
>> temp$long1 <- floor(temp$longitude)
>> temp$lat1 <- floor(temp$latitude)
>> temp1 <- aggregate(GDP ~ long1 + lat1, temp, mean)
>>
>>   long1 lat1        GDP
>> 1   -69  -55 0.90268640
>> 2   -68  -55 0.09831317
>> 3   -72  -54 0.22379000
>> 4   -71  -54 0.14067290
>> 5   -70  -54 0.00300380
>> 6   -69  -54 0.00574220
>>
>> Jean
>>
>> On Thu, Jul 21, 2016 at 3:57 PM, Miluji Sb  wrote:
>>
>>> Dear all,
>>>
>>> I have the following GDP data by latitude and longitude at 0.5 degree by
>>> 0.5 degree.
>>>
>>> temp <- dput(head(ptsDF,10))
>>> structure(list(longitude = c(-68.25, -67.75, -67.25, -68.25,
>>> -67.75, -67.25, -71.25, -70.75, -69.25, -68.75), latitude = c(-54.75,
>>> -54.75, -54.75, -54.25, -54.25, -54.25, -53.75, -53.75, -53.75,
>>> -53.75), GDP = c(1.683046, 0.3212307, 0.0486207, 0.1223268, 0.0171909,
>>> 0.0062104, 0.22379, 0.1406729, 0.0030038, 0.0057422)), .Names =
>>> c("longitude",
>>> "latitude", "GDP"), row.names = c(4L, 17L, 30L, 43L, 56L, 69L,
>>> 82L, 95L, 108L, 121L), class = "data.frame")
>>>
>>> I would like to aggregate the data 1 degree by 1 degree. I understand
>>> that the first step is to convert to raster. I have tried:
>>>
>>> rasterDF <- rasterFromXYZ(temp)
>>> r <- aggregate(rasterDF,fact=2, fun=sum)
>>>
>>> But this does not seem to work. Could anyone help me out please? Thank
>>> you in advance.
>>>
>>> Sincerely,
>>>
>>> Milu
>>>

Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread Karim Mezhoud

ocument
> about "I" argument for aggregate function.
> > >>>
> > >>> How can know which value for which date?
> > >>
> > >> You asked for a functional reshaping that did not have the date. Now
> you want the date? There was no date in the result you indicated was
> desired.   ??
> > >>
> > >> --
> > >> David.
> > >>
> > >>>
> > >>> I will save the reshaping/ordering dataframe  for later use.
> > >>> Many Thanks,
> > >>> Karim
> > >>>
> > >>>
> > >>> On Fri, Nov 18, 2016 at 11:34 AM, PIKAL Petr <petr.pi...@precheza.cz>
> wrote:
> > >>> Hi
> > >>>
> > >>> same result can be achieved by
> > >>>
> > >>> dat.ag<-aggregate(dat[ , c("DCE","DP")], by= list(dat$first.Name,
> dat$Name, dat$Department) , "I")
> > >>>
> > >>> Sorting according to the first row seems to be quite tricky. You
> could probably get closer by using some combination of split and order and
> arranging back chunks  of data
> > >>>
> > >>> ooo1<-order(split(dat$DCE,interaction(dat$first.Name, dat$Name,
> dat$Department, drop=T))[[1]])
> > >>> data.frame(sapply(split(dat$DCE,interaction(dat$first.Name,
> dat$Name, dat$Department, drop=T)), rbind))[ooo1,]
> > >>>   Ancient.Nation.QLH Amish.Wives.TAS Auction.Videos.YME
> > >>> 2               0.28              NA                 NA
> > >>> 4               0.28              NA                 NA
> > >>> 1               0.54            0.59               0.57
> > >>> 3               0.54            0.59               0.57
> > >>>
> > >>> however I wonder why the order according to the first row is
> necessary if all NAs are on correct positions?
> > >>>
> > >>> Cheers
> > >>> Petr
> > >>>
> > >>>
> > >>>> -Original Message-
> > >>>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> David
> > >>>> Winsemius
> > >>>> Sent: Friday, November 18, 2016 9:30 AM
> > >>>> To: Karim Mezhoud <kmezh...@gmail.com>
> > >>>> Cc: r-help@r-project.org
> > >>>> Subject: Re: [R] aggregate dataframe by multiple factors
> > >>>>
> > >>>>
> > >>>>> On Nov 17, 2016, at 11:27 PM, Karim Mezhoud <kmezh...@gmail.com>
> > >>>> wrote:
> > >>>>>
> > >>>>> Dear all,
> > >>>>>
> > >>>>> the dat  has missing values NA,
> > >>>>>
> > >>>>>     first.Name   Name Department  DCE   DP       date
> > >>>>> 5      Auction Videos        YME 0.57 0.56 2013-09-30
> > >>>>> 18       Amish  Wives        TAS 0.59 0.56 2013-09-30
> > >>>>> 34     Ancient Nation        QLH 0.54 0.58 2013-09-30
> > >>>>> 53     Auction Videos        YME   NA   NA 2013-12-28
> > >>>>> 66       Amish  Wives        TAS   NA   NA 2013-12-28
> > >>>>> 82     Ancient Nation        QLH 0.28 0.29 2013-12-28
> > >>>>> 102    Auction Videos        YME 0.57 0.56 2014-03-30
> > >>>>> 115      Amish  Wives        TAS 0.59 0.56 2014-03-30
> > >>>>> 131    Ancient Nation        QLH 0.54 0.58 2014-03-30
> > >>>>> 150    Auction Videos        YME   NA   NA 2014-06-28
> > >>>>> 163      Amish  Wives        TAS   NA   NA 2014-06-28
> > >>>>> 179    Ancient Nation        QLH 0.28 0.29 2014-06-28
> > >>>>>
> > >>>>>
> > >>>>> agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> > >>>>> list(dat$first.Name, dat$Name, dat$Department) , "sort"))
> > >>>>
> > >>>> The closest I could get on a few attempts was:
> > >>>>
> > >>>> (agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> > >>>> list(dat$first.Name, dat$Name, dat$Department) , function(d) {
> unlist(d)}))
> > >>>> )
> > >>>>
> > >>>>  Group.1 Group.2 Group.3 DCE.1 DCE.2 DCE.3 DCE.4 DP.1 DP.2 DP.3 DP.4
> > &g

Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread David Winsemius


Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread Karim Mezhoud


Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread David Winsemius


Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread David Winsemius


> On Nov 20, 2016, at 10:49 AM, Karim Mezhoud <kmezh...@gmail.com> wrote:
> 
> Yes, the result does not have a date, but the
> successive columns DCE.1, DCE.2, DCE.3 correspond to
> DCE.date1, DCE.date2, DCE.date3.
> I hope that the chronological order of the dates is preserved.

There are, however, 4 dates. See this pair of results. You can probably do 
something if you ever figure out what it is that you precisely want. I think 
the date ordering is automatic here:

> require(reshape2)
Loading required package: reshape2

> dcast(dat, first.Name +  Name + Department ~ date, value.var='DCE')
  first.Name   Name Department 2013-09-30 2013-12-28 2014-03-30 2014-06-28
1      Amish  Wives        TAS       0.59         NA       0.59         NA
2    Ancient Nation        QLH       0.54       0.28       0.54       0.28
3    Auction Videos        YME       0.57         NA       0.57         NA
> dcast(dat, first.Name +  Name + Department ~ date, value.var='DP')
  first.Name   Name Department 2013-09-30 2013-12-28 2014-03-30 2014-06-28
1      Amish  Wives        TAS       0.56         NA       0.56         NA
2    Ancient Nation        QLH       0.58       0.29       0.58       0.29
3    Auction Videos        YME       0.56         NA       0.56         NA
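
A base-R cross-check of the date ordering, offered only as a sketch: tapply() 
with the formatted date as the second grouping factor labels every column with 
its date, and the columns come out in sorted order, which is chronological for 
ISO-formatted dates. The data frame below is a hand-typed stand-in for dat, 
keyed by first.Name only to keep it short; the name "wide" is made up here.

```r
# Hand-typed stand-in for the poster's `dat` (assumption: same values,
# character keys instead of factors, only one key column).
dat <- data.frame(
  first.Name = rep(c("Auction", "Amish", "Ancient"), times = 4),
  DCE = c(0.57, 0.59, 0.54, NA, NA, 0.28, 0.57, 0.59, 0.54, NA, NA, 0.28),
  date = rep(as.Date(c("2013-09-30", "2013-12-28",
                       "2014-03-30", "2014-06-28")), each = 3)
)

# Each (name, date) cell holds exactly one value, so taking the first
# element loses nothing and keeps the NAs where they are.
wide <- tapply(dat$DCE, list(dat$first.Name, format(dat$date)),
               FUN = function(v) v[1])

print(wide)
```

Because the column names are the dates themselves, there is no need to rely on 
the position of DCE.1, DCE.2, ... to know which value belongs to which date.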



> Thanks,
> Karim
> 
> 

Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread Karim Mezhoud

Yes, the result does not have a date, but the
successive columns DCE.1, DCE.2, DCE.3 correspond to
DCE.date1, DCE.date2, DCE.date3.
I hope that the chronological order of the dates is preserved.
Thanks,
Karim



Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread David Winsemius


> On Nov 20, 2016, at 5:28 AM, Karim Mezhoud <kmezh...@gmail.com> wrote:
> 
> Sorry for the delay,
> Many thanks to Mr. David and Mr. Petr.
> I thought of using the "sort" function to arrange the values of each 
> variable (DCE, DP) chronologically by 'date' (without 'date' in the colnames).
> 
> 
> David's solution with the "unlist" function seems simple to understand.
> Petr's solution seems fancy. I did not find any documentation about the "I" 
> argument for the aggregate function.
> 
> How can I know which value belongs to which date?

You asked for a functional reshaping that did not have the date. Now you want 
the date? There was no date in the result you indicated was desired.   
??

-- 
David.

> 
> I will save the reshaping/ordering dataframe  for later use. 
> Many Thanks,
> Karim
> 
> 

Re: [R] aggregate dataframe by multiple factors

2016-11-20 Thread Karim Mezhoud

Sorry for the delay,
Many thanks to Mr. David and Mr. Petr.
I thought of using the "sort" function to arrange the values of each
variable (DCE, DP) chronologically by 'date' (without 'date' in the colnames).


David's solution with the "unlist" function seems simple to understand.
Petr's solution seems fancy. I did not find any documentation about the "I"
argument for the aggregate function.

How can I know which value belongs to which date?

I will save the reshaped/ordered dataframe for later use.
Many thanks,
Karim



Re: [R] aggregate dataframe by multiple factors

2016-11-18 Thread PIKAL Petr

Hi

The same result can be achieved with

dat.ag<-aggregate(dat[ , c("DCE","DP")], by= list(dat$first.Name, dat$Name, 
dat$Department) , "I")
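
A note on the "I" argument, as a minimal sketch rather than a statement of 
intent: aggregate() looks up a character FUN with match.fun(), so "I" should 
resolve to base R's I() (the AsIs wrapper), which returns its input unchanged 
apart from an added class. The vector x below is made up for illustration.

```r
# aggregate() resolves FUN via match.fun(), so the string "I" finds base::I.
f <- match.fun("I")
same_function <- identical(f, I)

# Hypothetical values for a single group (illustration only).
x <- c(0.57, NA, 0.57, NA)
wrapped <- f(x)

# I() only prepends the class "AsIs"; values and NAs are left untouched.
values_unchanged <- identical(unclass(wrapped), x)
has_asis_class <- inherits(wrapped, "AsIs")
```

This is why no separate documentation of an "I" argument turns up under 
?aggregate: it is an ordinary function passed by name, documented under ?I.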

Sorting according to the first row seems to be quite tricky. You could probably 
get closer by using some combination of split and order and then rearranging 
the chunks of data:

ooo1 <- order(split(dat$DCE, interaction(dat$first.Name, dat$Name, 
                                         dat$Department, drop = TRUE))[[1]])
data.frame(sapply(split(dat$DCE, interaction(dat$first.Name, dat$Name, 
                                             dat$Department, drop = TRUE)), 
                  rbind))[ooo1, ]
  Ancient.Nation.QLH Amish.Wives.TAS Auction.Videos.YME
2               0.28              NA                 NA
4               0.28              NA                 NA
1               0.54            0.59               0.57
3               0.54            0.59               0.57

However, I wonder why ordering according to the first row is necessary if all 
NAs are in the correct positions?

Cheers
Petr



Re: [R] aggregate dataframe by multiple factors

2016-11-18 Thread David Winsemius


> On Nov 17, 2016, at 11:27 PM, Karim Mezhoud  wrote:
> 
> Dear all,
> 
> the dat  has missing values NA,
> 
>     first.Name   Name Department  DCE   DP       date
> 5      Auction Videos        YME 0.57 0.56 2013-09-30
> 18       Amish  Wives        TAS 0.59 0.56 2013-09-30
> 34     Ancient Nation        QLH 0.54 0.58 2013-09-30
> 53     Auction Videos        YME   NA   NA 2013-12-28
> 66       Amish  Wives        TAS   NA   NA 2013-12-28
> 82     Ancient Nation        QLH 0.28 0.29 2013-12-28
> 102    Auction Videos        YME 0.57 0.56 2014-03-30
> 115      Amish  Wives        TAS 0.59 0.56 2014-03-30
> 131    Ancient Nation        QLH 0.54 0.58 2014-03-30
> 150    Auction Videos        YME   NA   NA 2014-06-28
> 163      Amish  Wives        TAS   NA   NA 2014-06-28
> 179    Ancient Nation        QLH 0.28 0.29 2014-06-28
> 
> 
> agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> list(dat$first.Name, dat$Name, dat$Department) , "sort"))

The closest I could get on a few attempts was:

(agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
list(dat$first.Name, dat$Name, dat$Department) , function(d) { unlist(d)}))
 )

  Group.1 Group.2 Group.3 DCE.1 DCE.2 DCE.3 DCE.4 DP.1 DP.2 DP.3 DP.4
1 Ancient  Nation     QLH  0.54  0.28  0.54  0.28 0.58 0.29 0.58 0.29
2   Amish   Wives     TAS  0.59    NA  0.59    NA 0.56   NA 0.56   NA
3 Auction  Videos     YME  0.57    NA  0.57    NA 0.56   NA 0.56   NA

I think the sort operation might be somewhat ambiguous in this instance. I 
tried:

 (agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
list(dat$first.Name, dat$Name, dat$Department) , function(d) { 
unlist(lapply(d,sort))}))
 )

With no success, not even a sorted result.
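
One possible explanation, offered as a sketch rather than a diagnosis: sort() 
drops NA by default (na.last = NA), so groups with missing values return 
shorter vectors than the others. With na.last = FALSE the NAs are kept and 
placed first, every group returns four values, and aggregate() can return 
matrix columns that do.call(data.frame, ...) flattens. The data frame below is 
a hand-typed stand-in for dat with character instead of factor keys; the names 
"agg" and "flat" are only illustrative.

```r
# Hand-typed stand-in for the poster's `dat` (assumption: same values,
# character keys instead of factors).
dat <- data.frame(
  first.Name = rep(c("Auction", "Amish", "Ancient"), times = 4),
  Name       = rep(c("Videos", "Wives", "Nation"), times = 4),
  Department = rep(c("YME", "TAS", "QLH"), times = 4),
  DCE = c(0.57, 0.59, 0.54, NA, NA, 0.28, 0.57, 0.59, 0.54, NA, NA, 0.28),
  DP  = c(0.56, 0.56, 0.58, NA, NA, 0.29, 0.56, 0.56, 0.58, NA, NA, 0.29),
  date = rep(as.Date(c("2013-09-30", "2013-12-28",
                       "2014-03-30", "2014-06-28")), each = 3)
)

# Keep the NAs (placed first) so every group yields a vector of length 4.
agg <- aggregate(dat[, c("DCE", "DP")],
                 by = list(dat$first.Name, dat$Name, dat$Department),
                 FUN = function(x) sort(x, na.last = FALSE))

# DCE and DP are matrix columns at this point; flatten them into
# ordinary columns (DCE.1 ... DCE.4, DP.1 ... DP.4).
flat <- do.call(data.frame, agg)

print(flat)
```

Note that sorting by value discards the link between a column position and a 
date, so this only fits the layout shown under "The goal" in the question.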

-- 
David.
> 
> 
> agg contains lists of values. I would like to separate the values into 
> different columns.
> 
>   Group.1 Group.2 Group.3                    DCE                     DP
> 1 Ancient  Nation     QLH 0.28, 0.28, 0.54, 0.54 0.29, 0.29, 0.58, 0.58
> 2   Amish   Wives     TAS             0.59, 0.59             0.56, 0.56
> 3 Auction  Videos     YME             0.57, 0.57             0.56, 0.56
> 
> The goal:
> 
>   Group.1 Group.2 Group.3 DCE.1 DCE.2 DCE.3 DCE.4 DP.1 DP.2 DP.3 DP.4
> 1 Ancient  Nation     QLH  0.28  0.28  0.54  0.54 0.29 0.29 0.58 0.58
> 2   Amish   Wives     TAS    NA    NA  0.59  0.59   NA   NA 0.56 0.56
> 3 Auction  Videos     YME    NA    NA  0.57  0.57   NA   NA 0.56 0.56
> 
> 
> 
> dat <- structure(list(first.Name = structure(c(3L, 1L, 2L, 3L, 1L, 2L,
> 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("Amish", "Ancient", "Auction",
> "Ax", "Bachelorette", "Basketball", "BBQ", "Cake", "Celebrity",
> "Chef", "Clean", "Colonial", "Comedy", "Comic", "Crocodile",
> "Dog", "Empire", "Extreme", "Farm", "Half Pint", "Hollywood",
> "House", "Ice Road", "Jersey", "Justice", "Love", "Mega", "Model",
> "Modern", "Mountain", "Mystery", "Myth", "New York", "Paradise",
> "Pioneer", "Queer", "Restaurant", "Road", "Royal", "Spouse",
> "Star", "Storage", "Survival", "The Great American", "Tool",
> "Treasure", "Wedding", "Wife"), class = "factor"), Name = structure(c(43L,
> 47L, 29L, 43L, 47L, 29L, 43L, 47L, 29L, 43L, 47L, 29L), .Label =
> c("Aliens",
> "Behavior", "Casino", "Casting Call", "Challenge", "Contest",
> "Crashers", "Crew", "Dad", "Dancing", "Date", "Disasters", "Dynasty",
> "Family", "Garage", "Greenlight", "Gypsies", "Haul", "Hot Rod",
> "Inventor", "Jail", "Job", "Justice", "Marvels", "Master", "Mates",
> "Model", "Moms", "Nation", "Ninja", "Patrol", "People", "Pitmasters",
> "Queens", "Rescue", "Rivals", "Room", "Rooms", "Rules", "Star",
> "Stars", "Superhero", "Videos", "VIP", "Wars", "Wishes", "Wives",
> "Wrangler"), class = "factor"), Department = structure(c(8L,
> 6L, 2L, 8L, 6L, 2L, 8L, 6L, 2L, 8L, 6L, 2L), .Label = c("HXW",
> "QLH", "RAR", "RYC", "SYI", "TAS", "VUV", "YME"), class = "factor"),
>DCE = c(0.57, 0.59, 0.54, NA, NA, 0.28, 0.57, 0.59, 0.54,
>NA, NA, 0.28), DP = c(0.56, 0.56, 0.58, NA, NA, 0.29, 0.56,
>0.56, 0.58, NA, NA, 0.29), date = structure(c(15978, 15978,
>15978, 16067, 16067, 16067, 16159, 16159, 16159, 16249, 16249,
>16249), class = "Date")), description = "", row.names = c(5L,
> 18L, 34L, 53L, 66L, 82L, 102L, 115L, 131L, 150L, 163L, 179L), class =
> "data.frame", .Names = c("first.Name",
> "Name", "Department", "DCE", "DP", "date"))
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


Re: [R] aggregate

2016-08-24 Thread Gang Chen

Yes, this works out perfectly! Thanks a lot, David. Have a wonderful day...

On Wed, Aug 24, 2016 at 4:24 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> This will work, but you should double-check to be certain that CP and 
> unique(myData[, 3:5]) are in the same order. It will fail if N is not 
> identical for all rows of the same S-Z combination.
>
>> CP <- sapply(split(myData, paste0(myData$S, myData$Z)), function(x)
> +   crossprod(x[, 1], x[, 2]))
>> data.frame(CP, unique(myData[, 3:5]))
> CP   N  S Z
> S1A 22 2.1 S1 A
> S1B 38 2.1 S1 B
> S2A 38 3.2 S2 A
> S2B 22 3.2 S2 B
>
> David C
>
> -Original Message-
> From: Gang Chen [mailto:gangch...@gmail.com]
> Sent: Wednesday, August 24, 2016 2:51 PM
> To: David L Carlson
> Cc: r-help mailing list
> Subject: Re: [R] aggregate
>
> Thanks again for patiently offering great help, David! I just learned
> dput() and paste0() now. Hopefully this is my last question.
>
> Suppose a new dataframe is as below (one more numeric column):
>
> myData <- structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
> 5, 4, 3, 2, 1), N =c(rep(2.1, 4), rep(3.2, 4)), S = structure(c(1L,
> 1L, 1L, 1L, 2L, 2L, 2L, 2L
> ), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
> 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")),
> .Names = c("X",
> "Y", "N", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")
>
>> myData
>
>   X Y   N  S Z
> 1 1 8 2.1 S1 A
> 2 2 7 2.1 S1 A
> 3 3 6 2.1 S1 B
> 4 4 5 2.1 S1 B
> 5 5 4 3.2 S2 A
> 6 6 3 3.2 S2 A
> 7 7 2 3.2 S2 B
> 8 8 1 3.2 S2 B
>
> Once I obtain the cross product,
>
>> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 
>> 1], x[, 2]))
> S1A S1B S2A S2B
>  22  38  38  22
>
> how can I easily add the other 3 columns (N, S, and Z) in a new
> dataframe? For S and Z, I can play with the names from the cross
> product output, but I have trouble dealing with the numeric column N.
>
>
>
>
> On Wed, Aug 24, 2016 at 1:07 PM, David L Carlson <dcarl...@tamu.edu> wrote:
>> You need to spend some time with a basic R tutorial. Your data is messed up 
>> because you did not use a simple text editor somewhere along the way. R 
>> understands ', but not ‘ or ’. The best way to send data to the list is to 
>> use dput:
>>
>>> dput(myData)
>> structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
>> 5, 4, 3, 2, 1), S = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
>> ), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
>> 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")), 
>> .Names = c("X",
>> "Y", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")
>>
>> Combining two labels just requires the paste0() function:
>>
>>> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 
>>> 1], x[, 2]))
>> S1A S1B S2A S2B
>>  22  38  38  22
>>
>> David C
>>
>> -Original Message-
>> From: Gang Chen [mailto:gangch...@gmail.com]
>> Sent: Wednesday, August 24, 2016 11:56 AM
>> To: David L Carlson
>> Cc: Jim Lemon; r-help mailing list
>> Subject: Re: [R] aggregate
>>
>> Thanks a lot, David! I want to further expand the operation a little
>> bit. With a new dataframe:
>>
>> myData <- data.frame(X=c(1, 2, 3, 4, 5, 6, 7, 8), Y=c(8, 7, 6, 5, 4,
>> 3, 2, 1), S=c(‘S1’, ‘S1’, ‘S1’, ‘S1’, ‘S2’, ‘S2’, ‘S2’, ‘S2’),
>> Z=c(‘A’, ‘A’, ‘B’, ‘B’, ‘A’, ‘A’, ‘B’, ‘B’))
>>
>>> myData
>>
>>   X Y  S Z
>> 1 1 8 S1 A
>> 2 2 7 S1 A
>> 3 3 6 S1 B
>> 4 4 5 S1 B
>> 5 5 4 S2 A
>> 6 6 3 S2 A
>> 7 7 2 S2 B
>> 8 8 1 S2 B
>>
>> I would like to obtain the same cross product between columns X and Y,
>> but at each combination level of factors S and Z. In other words, the
>> cross product would be still performed each two rows in the new
>> dataframe myData. How can I achieve that?
>>
>> On Wed, Aug 24, 2016 at 11:54 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>>> Yours is fine, but it will be a little simpler if you use sapply() instead:
>>>
>>>> data.frame(Z=levels(myData$Z), CP=sapply(split(myData, myData$Z),
>>> + function(x) crossprod(x[, 1], x[, 2])))
>>>   Z CP
>>> A A 10
>>> B B 10

Re: [R] aggregate

2016-08-24 Thread David L Carlson

This will work, but you should double-check to be certain that CP and 
unique(myData[, 3:5]) are in the same order. It will fail if N is not identical 
for all rows of the same S-Z combination. 

> CP <- sapply(split(myData, paste0(myData$S, myData$Z)), function(x)
+   crossprod(x[, 1], x[, 2]))
> data.frame(CP, unique(myData[, 3:5]))
    CP   N  S Z
S1A 22 2.1 S1 A
S1B 38 2.1 S1 B
S2A 38 3.2 S2 A
S2B 22 3.2 S2 B
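
A sketch of one way to avoid depending on row order (not from the thread; the object names pieces and res are illustrative): build a one-row data frame per group, taking N, S and Z from the group itself, and check inside the function that N really is constant within the group.

```r
myData <- data.frame(X = c(1, 2, 3, 4, 5, 6, 7, 8),
                     Y = c(8, 7, 6, 5, 4, 3, 2, 1),
                     N = c(rep(2.1, 4), rep(3.2, 4)),
                     S = rep(c("S1", "S2"), each = 4),
                     Z = rep(rep(c("A", "B"), each = 2), 2))

pieces <- lapply(split(myData, list(myData$S, myData$Z), drop = TRUE),
                 function(x) {
  # Fail loudly if N varies within an S-Z combination
  stopifnot(length(unique(x$N)) == 1)
  data.frame(CP = drop(crossprod(x$X, x$Y)),
             N = x$N[1], S = x$S[1], Z = x$Z[1])
})

# Stack the one-row pieces; each row carries its own labels
res <- do.call(rbind, pieces)
res
```

Because every row is assembled from a single group, CP cannot end up next to the labels of a different group, whatever order split() returns.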

David C

-----Original Message-----
From: Gang Chen [mailto:gangch...@gmail.com] 
Sent: Wednesday, August 24, 2016 2:51 PM
To: David L Carlson
Cc: r-help mailing list
Subject: Re: [R] aggregate

Thanks again for patiently offering great help, David! I just learned
dput() and paste0() now. Hopefully this is my last question.

Suppose a new dataframe is as below (one more numeric column):

myData <- structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
5, 4, 3, 2, 1), N =c(rep(2.1, 4), rep(3.2, 4)), S = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L
), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")),
.Names = c("X",
"Y", "N", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")

> myData

  X Y   N  S Z
1 1 8 2.1 S1 A
2 2 7 2.1 S1 A
3 3 6 2.1 S1 B
4 4 5 2.1 S1 B
5 5 4 3.2 S2 A
6 6 3 3.2 S2 A
7 7 2 3.2 S2 B
8 8 1 3.2 S2 B

Once I obtain the cross product,

> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 
> 1], x[, 2]))
S1A S1B S2A S2B
 22  38  38  22

how can I easily add the other 3 columns (N, S, and Z) in a new
dataframe? For S and Z, I can play with the names from the cross
product output, but I have trouble dealing with the numeric column N.




On Wed, Aug 24, 2016 at 1:07 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> You need to spend some time with a basic R tutorial. Your data is messed up 
> because you did not use a simple text editor somewhere along the way. R 
> understands ', but not ‘ or ’. The best way to send data to the list is to 
> use dput:
>
>> dput(myData)
> structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
> 5, 4, 3, 2, 1), S = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
> ), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
> 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")), .Names 
> = c("X",
> "Y", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")
>
> Combining two labels just requires the paste0() function:
>
>> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 
>> 1], x[, 2]))
> S1A S1B S2A S2B
>  22  38  38  22
>
> David C
>
> -Original Message-
> From: Gang Chen [mailto:gangch...@gmail.com]
> Sent: Wednesday, August 24, 2016 11:56 AM
> To: David L Carlson
> Cc: Jim Lemon; r-help mailing list
> Subject: Re: [R] aggregate
>
> Thanks a lot, David! I want to further expand the operation a little
> bit. With a new dataframe:
>
> myData <- data.frame(X=c(1, 2, 3, 4, 5, 6, 7, 8), Y=c(8, 7, 6, 5, 4,
> 3, 2, 1), S=c(‘S1’, ‘S1’, ‘S1’, ‘S1’, ‘S2’, ‘S2’, ‘S2’, ‘S2’),
> Z=c(‘A’, ‘A’, ‘B’, ‘B’, ‘A’, ‘A’, ‘B’, ‘B’))
>
>> myData
>
>   X Y  S Z
> 1 1 8 S1 A
> 2 2 7 S1 A
> 3 3 6 S1 B
> 4 4 5 S1 B
> 5 5 4 S2 A
> 6 6 3 S2 A
> 7 7 2 S2 B
> 8 8 1 S2 B
>
> I would like to obtain the same cross product between columns X and Y,
> but at each combination level of factors S and Z. In other words, the
> cross product would be still performed each two rows in the new
> dataframe myData. How can I achieve that?
>
> On Wed, Aug 24, 2016 at 11:54 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>> Yours is fine, but it will be a little simpler if you use sapply() instead:
>>
>>> data.frame(Z=levels(myData$Z), CP=sapply(split(myData, myData$Z),
>> + function(x) crossprod(x[, 1], x[, 2])))
>>   Z CP
>> A A 10
>> B B 10
>>
>> David C
>>
>>
>> -Original Message-
>> From: Gang Chen [mailto:gangch...@gmail.com]
>> Sent: Wednesday, August 24, 2016 10:17 AM
>> To: David L Carlson
>> Cc: Jim Lemon; r-help mailing list
>> Subject: Re: [R] aggregate
>>
>> Thank you all for the suggestions! Yes, I'm looking for the cross
>> product between the two columns of X and Y.
>>
>> A follow-up question: what is a nice way to merge the output of
>>
>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>>
>> with the column Z in myData so that I would get a new dataframe as the
>>

Re: [R] aggregate

2016-08-24 Thread Gang Chen

Thanks again for patiently offering great help, David! I just learned
dput() and paste0() now. Hopefully this is my last question.

Suppose a new dataframe is as below (one more numeric column):

myData <- structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
5, 4, 3, 2, 1), N =c(rep(2.1, 4), rep(3.2, 4)), S = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L
), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")),
.Names = c("X",
"Y", "N", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")

> myData

  X Y   N  S Z
1 1 8 2.1 S1 A
2 2 7 2.1 S1 A
3 3 6 2.1 S1 B
4 4 5 2.1 S1 B
5 5 4 3.2 S2 A
6 6 3 3.2 S2 A
7 7 2 3.2 S2 B
8 8 1 3.2 S2 B

Once I obtain the cross product,

> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 
> 1], x[, 2]))
S1A S1B S2A S2B
 22  38  38  22

how can I easily add the other 3 columns (N, S, and Z) in a new
dataframe? For S and Z, I can play with the names from the cross
product output, but I have trouble dealing with the numeric column N.




On Wed, Aug 24, 2016 at 1:07 PM, David L Carlson <dcarl...@tamu.edu> wrote:
> You need to spend some time with a basic R tutorial. Your data is messed up 
> because you did not use a simple text editor somewhere along the way. R 
> understands ', but not ‘ or ’. The best way to send data to the list is to 
> use dput:
>
>> dput(myData)
> structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6,
> 5, 4, 3, 2, 1), S = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
> ), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L,
> 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")), .Names 
> = c("X",
> "Y", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")
>
> Combining two labels just requires the paste0() function:
>
>> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 
>> 1], x[, 2]))
> S1A S1B S2A S2B
>  22  38  38  22
>
> David C
>
> -Original Message-
> From: Gang Chen [mailto:gangch...@gmail.com]
> Sent: Wednesday, August 24, 2016 11:56 AM
> To: David L Carlson
> Cc: Jim Lemon; r-help mailing list
> Subject: Re: [R] aggregate
>
> Thanks a lot, David! I want to further expand the operation a little
> bit. With a new dataframe:
>
> myData <- data.frame(X=c(1, 2, 3, 4, 5, 6, 7, 8), Y=c(8, 7, 6, 5, 4,
> 3, 2, 1), S=c(‘S1’, ‘S1’, ‘S1’, ‘S1’, ‘S2’, ‘S2’, ‘S2’, ‘S2’),
> Z=c(‘A’, ‘A’, ‘B’, ‘B’, ‘A’, ‘A’, ‘B’, ‘B’))
>
>> myData
>
>   X Y  S Z
> 1 1 8 S1 A
> 2 2 7 S1 A
> 3 3 6 S1 B
> 4 4 5 S1 B
> 5 5 4 S2 A
> 6 6 3 S2 A
> 7 7 2 S2 B
> 8 8 1 S2 B
>
> I would like to obtain the same cross product between columns X and Y,
> but at each combination level of factors S and Z. In other words, the
> cross product would be still performed each two rows in the new
> dataframe myData. How can I achieve that?
>
> On Wed, Aug 24, 2016 at 11:54 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>> Yours is fine, but it will be a little simpler if you use sapply() instead:
>>
>>> data.frame(Z=levels(myData$Z), CP=sapply(split(myData, myData$Z),
>> + function(x) crossprod(x[, 1], x[, 2])))
>>   Z CP
>> A A 10
>> B B 10
>>
>> David C
>>
>>
>> -Original Message-
>> From: Gang Chen [mailto:gangch...@gmail.com]
>> Sent: Wednesday, August 24, 2016 10:17 AM
>> To: David L Carlson
>> Cc: Jim Lemon; r-help mailing list
>> Subject: Re: [R] aggregate
>>
>> Thank you all for the suggestions! Yes, I'm looking for the cross
>> product between the two columns of X and Y.
>>
>> A follow-up question: what is a nice way to merge the output of
>>
>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>>
>> with the column Z in myData so that I would get a new dataframe as the
>> following (the 2nd column is the cross product between X and Y)?
>>
>> Z   CP
>> A   10
>> B   10
>>
>> Is the following legitimate?
>>
>> data.frame(Z=levels(myData$Z), CP= unlist(lapply(split(myData,
>> myData$Z), function(x) crossprod(x[, 1], x[, 2]))))
>>
>>
>> On Wed, Aug 24, 2016 at 10:37 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>>> Thank you for the reproducible example, but it is not clear what cross 
>>> product you want. Jim's solution gives you the cross product of the 
>>> 2-column matrix with itse

Re: [R] aggregate

2016-08-24 Thread David L Carlson

You need to spend some time with a basic R tutorial. Your data is messed up 
because you did not use a simple text editor somewhere along the way. R 
understands ', but not ‘ or ’. The best way to send data to the list is to use 
dput:

> dput(myData)
structure(list(X = c(1, 2, 3, 4, 5, 6, 7, 8), Y = c(8, 7, 6, 
5, 4, 3, 2, 1), S = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
), .Label = c("S1", "S2"), class = "factor"), Z = structure(c(1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor")), .Names = 
c("X", 
"Y", "S", "Z"), row.names = c(NA, -8L), class = "data.frame")

Combining two labels just requires the paste0() function:

> sapply(split(myData, paste0(myData$S, myData$Z)), function(x) crossprod(x[, 
> 1], x[, 2]))
S1A S1B S2A S2B 
 22  38  38  22
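
As an alternative sketch (not from the thread), split() also accepts a list of grouping variables, which avoids pasting labels together by hand. Pasted labels without a separator could collide in general (for example "S1" with "1A" and "S11" with "A" would both give "S11A"); the default "." separator used by split() makes that less likely.

```r
myData <- data.frame(X = c(1, 2, 3, 4, 5, 6, 7, 8),
                     Y = c(8, 7, 6, 5, 4, 3, 2, 1),
                     S = rep(c("S1", "S2"), each = 4),
                     Z = rep(rep(c("A", "B"), each = 2), 2))

# Group by every S-Z combination; names look like "S1.A"
cp <- sapply(split(myData, list(myData$S, myData$Z)),
             function(x) drop(crossprod(x$X, x$Y)))
cp
```

Note that the groups come back with the first factor varying fastest, so the order differs from the paste0() version even though the values per combination are the same.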

David C

-----Original Message-----
From: Gang Chen [mailto:gangch...@gmail.com] 
Sent: Wednesday, August 24, 2016 11:56 AM
To: David L Carlson
Cc: Jim Lemon; r-help mailing list
Subject: Re: [R] aggregate

Thanks a lot, David! I want to further expand the operation a little
bit. With a new dataframe:

myData <- data.frame(X=c(1, 2, 3, 4, 5, 6, 7, 8), Y=c(8, 7, 6, 5, 4,
3, 2, 1), S=c(‘S1’, ‘S1’, ‘S1’, ‘S1’, ‘S2’, ‘S2’, ‘S2’, ‘S2’),
Z=c(‘A’, ‘A’, ‘B’, ‘B’, ‘A’, ‘A’, ‘B’, ‘B’))

> myData

  X Y  S Z
1 1 8 S1 A
2 2 7 S1 A
3 3 6 S1 B
4 4 5 S1 B
5 5 4 S2 A
6 6 3 S2 A
7 7 2 S2 B
8 8 1 S2 B

I would like to obtain the same cross product between columns X and Y,
but at each combination level of factors S and Z. In other words, the
cross product would be still performed each two rows in the new
dataframe myData. How can I achieve that?

On Wed, Aug 24, 2016 at 11:54 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> Yours is fine, but it will be a little simpler if you use sapply() instead:
>
>> data.frame(Z=levels(myData$Z), CP=sapply(split(myData, myData$Z),
> + function(x) crossprod(x[, 1], x[, 2])))
>   Z CP
> A A 10
> B B 10
>
> David C
>
>
> -----Original Message-----
> From: Gang Chen [mailto:gangch...@gmail.com]
> Sent: Wednesday, August 24, 2016 10:17 AM
> To: David L Carlson
> Cc: Jim Lemon; r-help mailing list
> Subject: Re: [R] aggregate
>
> Thank you all for the suggestions! Yes, I'm looking for the cross
> product between the two columns of X and Y.
>
> A follow-up question: what is a nice way to merge the output of
>
> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>
> with the column Z in myData so that I would get a new dataframe as the
> following (the 2nd column is the cross product between X and Y)?
>
> Z   CP
> A   10
> B   10
>
> Is the following legitimate?
>
> data.frame(Z=levels(myData$Z), CP= unlist(lapply(split(myData,
> myData$Z), function(x) crossprod(x[, 1], x[, 2]))))
>
>
> On Wed, Aug 24, 2016 at 10:37 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>> Thank you for the reproducible example, but it is not clear what cross 
>> product you want. Jim's solution gives you the cross product of the 2-column 
>> matrix with itself. If you want the cross product between the columns you 
>> need something else. The aggregate function will not work since it will 
>> treat the columns separately:
>>
>>> A <- as.matrix(myData[myData$Z=="A", 1:2])
>>> A
>>   X Y
>> 1 1 4
>> 2 2 3
>>> crossprod(A) # Same as t(A) %*% A
>>X  Y
>> X  5 10
>> Y 10 25
>>> crossprod(A[, 1], A[, 2]) # Same as t(A[, 1]) %*% A[, 2]
>>  [,1]
>> [1,]   10
>>>
>>> # For all the groups
>>> lapply(split(myData, myData$Z), function(x) crossprod(as.matrix(x[, 1:2])))
>> $A
>>X  Y
>> X  5 10
>> Y 10 25
>>
>> $B
>>X  Y
>> X 25 10
>> Y 10  5
>>
>>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>> $A
>>  [,1]
>> [1,]   10
>>
>> $B
>>  [,1]
>> [1,]   10
>>
>> -
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>>
>>
>> -Original Message-
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Lemon
>> Sent: Tuesday, August 23, 2016 6:02 PM
>> To: Gang Chen; r-help mailing list
>> Subject: Re: [R] aggregate
>>
>> Hi Gang Chen,
>> If I have the right idea:
>>
>> for(zval in levels(myData$Z))
>> crossprod(as.matrix(myData[myData$Z==zval,c("X","Y")]))
>>
>> Jim
>>
>> On Wed, Aug 24, 2016 at 8:03 AM, Gang Chen <

Re: [R] aggregate

2016-08-24 Thread Gang Chen

Thanks a lot, David! I want to further expand the operation a little
bit. With a new dataframe:

myData <- data.frame(X=c(1, 2, 3, 4, 5, 6, 7, 8), Y=c(8, 7, 6, 5, 4,
3, 2, 1), S=c(‘S1’, ‘S1’, ‘S1’, ‘S1’, ‘S2’, ‘S2’, ‘S2’, ‘S2’),
Z=c(‘A’, ‘A’, ‘B’, ‘B’, ‘A’, ‘A’, ‘B’, ‘B’))

> myData

  X Y  S Z
1 1 8 S1 A
2 2 7 S1 A
3 3 6 S1 B
4 4 5 S1 B
5 5 4 S2 A
6 6 3 S2 A
7 7 2 S2 B
8 8 1 S2 B

I would like to obtain the same cross product between columns X and Y,
but at each combination level of factors S and Z. In other words, the
cross product would be still performed each two rows in the new
dataframe myData. How can I achieve that?

On Wed, Aug 24, 2016 at 11:54 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> Yours is fine, but it will be a little simpler if you use sapply() instead:
>
>> data.frame(Z=levels(myData$Z), CP=sapply(split(myData, myData$Z),
> + function(x) crossprod(x[, 1], x[, 2])))
>   Z CP
> A A 10
> B B 10
>
> David C
>
>
> -Original Message-
> From: Gang Chen [mailto:gangch...@gmail.com]
> Sent: Wednesday, August 24, 2016 10:17 AM
> To: David L Carlson
> Cc: Jim Lemon; r-help mailing list
> Subject: Re: [R] aggregate
>
> Thank you all for the suggestions! Yes, I'm looking for the cross
> product between the two columns of X and Y.
>
> A follow-up question: what is a nice way to merge the output of
>
> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>
> with the column Z in myData so that I would get a new dataframe as the
> following (the 2nd column is the cross product between X and Y)?
>
> Z   CP
> A   10
> B   10
>
> Is the following legitimate?
>
> data.frame(Z=levels(myData$Z), CP= unlist(lapply(split(myData,
> myData$Z), function(x) crossprod(x[, 1], x[, 2]))))
>
>
> On Wed, Aug 24, 2016 at 10:37 AM, David L Carlson <dcarl...@tamu.edu> wrote:
>> Thank you for the reproducible example, but it is not clear what cross 
>> product you want. Jim's solution gives you the cross product of the 2-column 
>> matrix with itself. If you want the cross product between the columns you 
>> need something else. The aggregate function will not work since it will 
>> treat the columns separately:
>>
>>> A <- as.matrix(myData[myData$Z=="A", 1:2])
>>> A
>>   X Y
>> 1 1 4
>> 2 2 3
>>> crossprod(A) # Same as t(A) %*% A
>>X  Y
>> X  5 10
>> Y 10 25
>>> crossprod(A[, 1], A[, 2]) # Same as t(A[, 1]) %*% A[, 2]
>>  [,1]
>> [1,]   10
>>>
>>> # For all the groups
>>> lapply(split(myData, myData$Z), function(x) crossprod(as.matrix(x[, 1:2])))
>> $A
>>X  Y
>> X  5 10
>> Y 10 25
>>
>> $B
>>X  Y
>> X 25 10
>> Y 10  5
>>
>>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
>> $A
>>  [,1]
>> [1,]   10
>>
>> $B
>>  [,1]
>> [1,]   10
>>
>> -
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>>
>>
>> -Original Message-
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Lemon
>> Sent: Tuesday, August 23, 2016 6:02 PM
>> To: Gang Chen; r-help mailing list
>> Subject: Re: [R] aggregate
>>
>> Hi Gang Chen,
>> If I have the right idea:
>>
>> for(zval in levels(myData$Z))
>> crossprod(as.matrix(myData[myData$Z==zval,c("X","Y")]))
>>
>> Jim
>>
>> On Wed, Aug 24, 2016 at 8:03 AM, Gang Chen <gangch...@gmail.com> wrote:
>>> This is a simple question: With a dataframe like the following
>>>
>>> myData <- data.frame(X=c(1, 2, 3, 4), Y=c(4, 3, 2, 1), Z=c('A', 'A', 'B', 
>>> 'B'))
>>>
>>> how can I get the cross product between X and Y for each level of
>>> factor Z? My difficulty is that I don't know how to deal with the fact
>>> that crossprod() acts on two variables in this case.
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate

2016-08-24 Thread David L Carlson

Yours is fine, but it will be a little simpler if you use sapply() instead:

> data.frame(Z=levels(myData$Z), CP=sapply(split(myData, myData$Z), 
+ function(x) crossprod(x[, 1], x[, 2])))
  Z CP
A A 10
B B 10
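
A small variation, offered only as a sketch: vapply() states the expected result type explicitly, and taking the labels from names() of the result keeps each label next to its value without relying on Z being a factor. The names cp and out are illustrative.

```r
myData <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 3, 2, 1),
                     Z = c("A", "A", "B", "B"))

# One numeric value per level of Z; drop() turns the 1x1 matrix into a scalar
cp <- vapply(split(myData, myData$Z),
             function(x) drop(crossprod(x$X, x$Y)),
             numeric(1))

# Labels come from the names of the result itself
out <- data.frame(Z = names(cp), CP = unname(cp))
out
```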

David C


-----Original Message-----
From: Gang Chen [mailto:gangch...@gmail.com] 
Sent: Wednesday, August 24, 2016 10:17 AM
To: David L Carlson
Cc: Jim Lemon; r-help mailing list
Subject: Re: [R] aggregate

Thank you all for the suggestions! Yes, I'm looking for the cross
product between the two columns of X and Y.

A follow-up question: what is a nice way to merge the output of

lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))

with the column Z in myData so that I would get a new dataframe as the
following (the 2nd column is the cross product between X and Y)?

Z   CP
A   10
B   10

Is the following legitimate?

data.frame(Z=levels(myData$Z), CP= unlist(lapply(split(myData,
myData$Z), function(x) crossprod(x[, 1], x[, 2]))))


On Wed, Aug 24, 2016 at 10:37 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> Thank you for the reproducible example, but it is not clear what cross 
> product you want. Jim's solution gives you the cross product of the 2-column 
> matrix with itself. If you want the cross product between the columns you 
> need something else. The aggregate function will not work since it will treat 
> the columns separately:
>
>> A <- as.matrix(myData[myData$Z=="A", 1:2])
>> A
>   X Y
> 1 1 4
> 2 2 3
>> crossprod(A) # Same as t(A) %*% A
>X  Y
> X  5 10
> Y 10 25
>> crossprod(A[, 1], A[, 2]) # Same as t(A[, 1]) %*% A[, 2]
>  [,1]
> [1,]   10
>>
>> # For all the groups
>> lapply(split(myData, myData$Z), function(x) crossprod(as.matrix(x[, 1:2])))
> $A
>X  Y
> X  5 10
> Y 10 25
>
> $B
>X  Y
> X 25 10
> Y 10  5
>
>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
> $A
>  [,1]
> [1,]   10
>
> $B
>  [,1]
> [1,]   10
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Lemon
> Sent: Tuesday, August 23, 2016 6:02 PM
> To: Gang Chen; r-help mailing list
> Subject: Re: [R] aggregate
>
> Hi Gang Chen,
> If I have the right idea:
>
> for(zval in levels(myData$Z))
> crossprod(as.matrix(myData[myData$Z==zval,c("X","Y")]))
>
> Jim
>
> On Wed, Aug 24, 2016 at 8:03 AM, Gang Chen <gangch...@gmail.com> wrote:
>> This is a simple question: With a dataframe like the following
>>
>> myData <- data.frame(X=c(1, 2, 3, 4), Y=c(4, 3, 2, 1), Z=c('A', 'A', 'B', 
>> 'B'))
>>
>> how can I get the cross product between X and Y for each level of
>> factor Z? My difficulty is that I don't know how to deal with the fact
>> that crossprod() acts on two variables in this case.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Re: [R] aggregate

2016-08-24 Thread Gang Chen

Thank you all for the suggestions! Yes, I'm looking for the cross
product between the two columns of X and Y.

A follow-up question: what is a nice way to merge the output of

lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))

with the column Z in myData so that I would get a new dataframe as the
following (the 2nd column is the cross product between X and Y)?

Z   CP
A   10
B   10

Is the following legitimate?

data.frame(Z=levels(myData$Z), CP= unlist(lapply(split(myData,
myData$Z), function(x) crossprod(x[, 1], x[, 2]))))


On Wed, Aug 24, 2016 at 10:37 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> Thank you for the reproducible example, but it is not clear what cross 
> product you want. Jim's solution gives you the cross product of the 2-column 
> matrix with itself. If you want the cross product between the columns you 
> need something else. The aggregate function will not work since it will treat 
> the columns separately:
>
>> A <- as.matrix(myData[myData$Z=="A", 1:2])
>> A
>   X Y
> 1 1 4
> 2 2 3
>> crossprod(A) # Same as t(A) %*% A
>X  Y
> X  5 10
> Y 10 25
>> crossprod(A[, 1], A[, 2]) # Same as t(A[, 1]) %*% A[, 2]
>  [,1]
> [1,]   10
>>
>> # For all the groups
>> lapply(split(myData, myData$Z), function(x) crossprod(as.matrix(x[, 1:2])))
> $A
>X  Y
> X  5 10
> Y 10 25
>
> $B
>X  Y
> X 25 10
> Y 10  5
>
>> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
> $A
>  [,1]
> [1,]   10
>
> $B
>  [,1]
> [1,]   10
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Lemon
> Sent: Tuesday, August 23, 2016 6:02 PM
> To: Gang Chen; r-help mailing list
> Subject: Re: [R] aggregate
>
> Hi Gang Chen,
> If I have the right idea:
>
> for(zval in levels(myData$Z))
> crossprod(as.matrix(myData[myData$Z==zval,c("X","Y")]))
>
> Jim
>
> On Wed, Aug 24, 2016 at 8:03 AM, Gang Chen <gangch...@gmail.com> wrote:
>> This is a simple question: With a dataframe like the following
>>
>> myData <- data.frame(X=c(1, 2, 3, 4), Y=c(4, 3, 2, 1), Z=c('A', 'A', 'B', 
>> 'B'))
>>
>> how can I get the cross product between X and Y for each level of
>> factor Z? My difficulty is that I don't know how to deal with the fact
>> that crossprod() acts on two variables in this case.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate

2016-08-24 Thread David L Carlson

Thank you for the reproducible example, but it is not clear what cross product 
you want. Jim's solution gives you the cross product of the 2-column matrix 
with itself. If you want the cross product between the columns you need 
something else. The aggregate function will not work since it will treat the 
columns separately:

> A <- as.matrix(myData[myData$Z=="A", 1:2])
> A
  X Y
1 1 4
2 2 3
> crossprod(A) # Same as t(A) %*% A 
   X  Y
X  5 10
Y 10 25
> crossprod(A[, 1], A[, 2]) # Same as t(A[, 1]) %*% A[, 2]
 [,1]
[1,]   10
> 
> # For all the groups
> lapply(split(myData, myData$Z), function(x) crossprod(as.matrix(x[, 1:2])))
$A
   X  Y
X  5 10
Y 10 25

$B
   X  Y
X 25 10
Y 10  5

> lapply(split(myData, myData$Z), function(x) crossprod(x[, 1], x[, 2]))
$A
 [,1]
[1,]   10

$B
 [,1]
[1,]   10
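
To illustrate the point about aggregate() with a short sketch (the names diag_only and inner are illustrative): FUN receives one column at a time, so only the diagonal entries X'X and Y'Y of the cross-product matrix are reachable that way. The off-diagonal entry needs both columns at once, and for two vectors crossprod(x, y) is just the inner product sum(x * y).

```r
myData <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 3, 2, 1),
                     Z = c("A", "A", "B", "B"))

# aggregate() applies FUN to X and Y separately within each level of Z
diag_only <- aggregate(cbind(X, Y) ~ Z, data = myData,
                       FUN = function(v) sum(v^2))
diag_only

# The X-by-Y term needs both columns, hence split() plus a function
inner <- sapply(split(myData, myData$Z), function(d) sum(d$X * d$Y))
inner
```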

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Lemon
Sent: Tuesday, August 23, 2016 6:02 PM
To: Gang Chen; r-help mailing list
Subject: Re: [R] aggregate

Hi Gang Chen,
If I have the right idea:

for(zval in levels(myData$Z))
crossprod(as.matrix(myData[myData$Z==zval,c("X","Y")]))

Jim

On Wed, Aug 24, 2016 at 8:03 AM, Gang Chen <gangch...@gmail.com> wrote:
> This is a simple question: With a dataframe like the following
>
> myData <- data.frame(X=c(1, 2, 3, 4), Y=c(4, 3, 2, 1), Z=c('A', 'A', 'B', 
> 'B'))
>
> how can I get the cross product between X and Y for each level of
> factor Z? My difficulty is that I don't know how to deal with the fact
> that crossprod() acts on two variables in this case.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate

2016-08-23 Thread Jim Lemon

Hi Gang Chen,
If I have the right idea:

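# print() is needed because results are not auto-printed inside a for loop;
# unique() works whether Z is a factor or a character vector
for(zval in unique(myData$Z))
  print(crossprod(as.matrix(myData[myData$Z==zval,c("X","Y")])))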

Jim

On Wed, Aug 24, 2016 at 8:03 AM, Gang Chen  wrote:
> This is a simple question: With a dataframe like the following
>
> myData <- data.frame(X=c(1, 2, 3, 4), Y=c(4, 3, 2, 1), Z=c('A', 'A', 'B', 
> 'B'))
>
> how can I get the cross product between X and Y for each level of
> factor Z? My difficulty is that I don't know how to deal with the fact
> that crossprod() acts on two variables in this case.
>

Re: [R] aggregate

2016-08-23 Thread David Winsemius

> On Aug 23, 2016, at 3:03 PM, Gang Chen  wrote:
> 
> This is a simple question: With a dataframe like the following
> 
> myData <- data.frame(X=c(1, 2, 3, 4), Y=c(4, 3, 2, 1), Z=c('A', 'A', 'B', 
> 'B'))
> 
> how can I get the cross product between X and Y for each level of
> factor Z? My difficulty is that I don't know how to deal with the fact
> that crossprod() acts on two variables in this case.
> 

Just make a function that takes a dataframe and does a crossprod on two of its 
columns.
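
A minimal sketch of that suggestion, assuming the myData frame from the question. The helper name cp_xy and its arguments are hypothetical, chosen only for illustration:

```r
myData <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 3, 2, 1),
                     Z = c("A", "A", "B", "B"))

# Hypothetical helper: cross product of two named columns of one data frame
cp_xy <- function(df, a = "X", b = "Y") {
  drop(crossprod(df[[a]], df[[b]]))  # drop() turns the 1x1 matrix into a scalar
}

# Split by Z and apply the helper to each piece
cp_by_group <- sapply(split(myData, myData$Z), cp_xy)
cp_by_group
```

split() hands each group to the helper as a whole data frame, so the function can see both columns at once, which is what aggregate() cannot do.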

-- 

David Winsemius
Alameda, CA, USA

Re: [R] Aggregate matrix in a 2 by 2 manor

2016-07-31 Thread Anthoni, Peter (IMK)

Hi Jeff,

many thanks, that one is the Speedy Gonzales of them all. It can also do some 
FUN stuff.

aggregate.nx.ny.array.aperm <- function( dta, nx = 2, ny = 2, FUN = colMeans, ... ) {
 # number of rows in result
 nnr <- nrow( dta ) %/% ny
 # number of columns in result
 nnc <- ncol( dta ) %/% nx
 # number of values in each nx-by-ny block
 nxny <- nx * ny
 # describe existing layout of values in dta as 4-d array
 a1 <- array( dta, dim = c( ny, nnr, nx, nnc ) )
 # swap data in dimensions 2 and 3
 a2 <- aperm( a1, c( 1, 3, 2, 4 ) )
 # one block per column: first two dimensions become rows, the rest columns
 a3 <- matrix( a2, nrow = nxny )
 # FUN must work column-wise on a matrix, e.g. colMeans or colSums
 v <- FUN( a3, ... )
 # reframe result vector as a matrix
 matrix( v, ncol = nnc )
}


> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.forloop(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
 14.003   0.271  14.663 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.interaction(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
 32.686   1.175  35.012 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.expand.grid(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
  9.590   0.197   9.951 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.array.apply(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
  8.391   0.174   8.737 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.array.aperm(tst,2,2,colMeans,na.rm=T)})
   user  system elapsed 
  0.195   0.019   0.216 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.array.aperm(tst,2,2,colSums,na.rm=T)})
   user  system elapsed 
  0.169   0.017   0.188 


> aggregate.nx.ny.array.aperm(tst.small,FUN=colMeans)
     [,1] [,2] [,3] [,4]
[1,]  3.5 11.5 19.5 27.5
[2,]  5.5 13.5 21.5 29.5
> aggregate.nx.ny.array.aperm(tst.small,FUN=colSums)
     [,1] [,2] [,3] [,4]
[1,]   14   46   78  110
[2,]   22   54   86  118
> aggregate.nx.ny.forloop(tst.small,FUN=sum)
     [,1] [,2] [,3] [,4]
[1,]   14   46   78  110
[2,]   22   54   86  118
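
To see why the reshape works, here is a small sketch on tst.small. It only walks through the steps inside the function with nx = ny = 2 and is not a new method:

```r
tst.small <- matrix(seq.int(8 * 4), ncol = 8, nrow = 4)
nx <- 2; ny <- 2
nnr <- nrow(tst.small) %/% ny   # 2 block rows
nnc <- ncol(tst.small) %/% nx   # 4 block columns

# dims: (row within block, block row, column within block, block column)
a1 <- array(tst.small, dim = c(ny, nnr, nx, nnc))
# bring the two "within block" dimensions to the front
a2 <- aperm(a1, c(1, 3, 2, 4))
# each column of a3 now holds the nx * ny values of one block
a3 <- matrix(a2, nrow = nx * ny)

a3[, 1]                          # values of the top-left block
as.vector(tst.small[1:2, 1:2])   # the same values taken directly
block.means <- matrix(colMeans(a3), ncol = nnc)
```

Because every block ends up as one column, any column-wise function (colMeans, colSums) can replace the per-block loop.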

cheers
Peter



> On 31 Jul 2016, at 08:13, Jeff Newmiller  wrote:
> 
> If you don't need all that FUN flexibility, you can get this done way faster 
> with the aperm and colMeans functions:
> 
> tst <- matrix( seq.int( 1440 * 360 )
> , ncol = 1440
> , nrow = 360
> )
> tst.small <- matrix( seq.int( 8 * 4 )
>   , ncol = 8
>   , nrow = 4
>   )
> aggregate.nx.ny.expand.grid <- function( dta, nx = 2, ny = 2, FUN = mean, ... 
> )
> {
>  ilon <- seq( 1, ncol( dta ), nx )
>  ilat <- seq( 1, nrow( dta ), ny )
>  cells <- as.matrix( expand.grid( ilat, ilon ) )
>  blocks <- apply( cells
> , 1
> , function( x ) dta[ x[ 1 ]:( x[ 1 ] + 1 ), x[ 2 ]:( x[ 2 ] + 
> 1 ) ] )
>  block.means <- colMeans( blocks )
>  matrix( block.means
>, nrow( dta ) / ny
>, ncol( dta ) / nx
>)
> }
> 
> aggregate.nx.ny.array.apply <- function( dta, nx = 2, ny = 2, FUN = mean, ... 
> ) {
>  a <- array( dta
>, dim = c( ny
> , nrow( dta ) %/% ny
> , nx
> , ncol( dta ) %/% nx
> )
>)
>  apply( a, c( 2, 4 ), FUN, ... )
> }
> 
> aggregate.nx.ny.array.aperm.mean <- function( dta, nx = 2, ny = 2, ... ) {
>  # number of rows in result
>  nnr <- nrow( dta ) %/% ny
>  # number of columns in result
>  nnc <- ncol( dta ) %/% nx
>  # number of values to take mean of
>  nxny <- nx * ny
>  # describe existing layout of values in dta as 4-d array
>  a1 <- array( dta, dim = c( ny, nnr, nx, nnc ) )
>  # swap data in dimensions 2 and 3
>  a2 <- aperm( a1, c( 1, 3, 2, 4 ) )
>  # treat first two dimensions as column vectors, remaining as columns
>  a3 <- matrix( a2, nrow = nxny )
>  # fast calculation of column means
>  v <- colMeans( a3, ... )
>  # reframe result vector as a matrix
>  matrix( v, ncol = nnc )
> }
> 
> aggregate.nx.ny.array.aperm.apply <- function( dta, nx = 2, ny = 2, FUN = 
> mean, ... ) {
>  # number of rows in result
>  nnr <- nrow( dta ) %/% ny
>  # number of columns in result
>  nnc <- ncol( dta ) %/% nx
>  # number of values to apply FUN to
>  nxny <- nx * ny
>  # describe existing layout of values in dta as 4-d array
>  a1 <- array( dta, dim = c( ny, nnr, nx, nnc ) )
>  # swap data in dimensions 2 and 3
>  a2 <- aperm( a1, c( 1, 3, 2, 4 ) )
>  # treat first two dimensions as column vectors, remaining as columns
>  a3 <- matrix( a2, nrow = nxny )
>  # apply FUN to column vectors
>  v <- apply( a3, 2, FUN = FUN, ... )
>  matrix( v, ncol = nnc )
> }
> test1 <- aggregate.nx.ny.expand.grid( tst )
> test2 <- aggregate.nx.ny.array.apply( tst )
> test3 <- aggregate.nx.ny.array.aperm.mean( tst )
> test4 <- aggregate.nx.ny.array.aperm.apply( tst )
> library(microbenchmark)
> microbenchmark(
>  aggregate.nx.ny.expand.grid( tst, 2, 2, mean, na.rm = TRUE )
> , aggregate.nx.ny.array.apply( tst, 2, 2, mean, na.rm = TRUE )
> ,

Re: [R] Aggregate matrix in a 2 by 2 manor

2016-07-31 Thread Jeff Newmiller

If you don't need all that FUN flexibility, you can get this done way 
faster with the aperm and colMeans functions:


tst <- matrix( seq.int( 1440 * 360 )
 , ncol = 1440
 , nrow = 360
 )
tst.small <- matrix( seq.int( 8 * 4 )
   , ncol = 8
   , nrow = 4
   )
aggregate.nx.ny.expand.grid <- function( dta, nx = 2, ny = 2, FUN = mean, ... )
{
  ilon <- seq( 1, ncol( dta ), nx )
  ilat <- seq( 1, nrow( dta ), ny )
  cells <- as.matrix( expand.grid( ilat, ilon ) )
  blocks <- apply( cells
                 , 1
                 , function( x ) dta[ x[ 1 ]:( x[ 1 ] + 1 ), x[ 2 ]:( x[ 2 ] + 1 ) ]
                 )
  block.means <- colMeans( blocks )
  matrix( block.means
        , nrow( dta ) / ny
        , ncol( dta ) / nx
        )
}

aggregate.nx.ny.array.apply <- function( dta, nx = 2, ny = 2, FUN = mean, ... ) {
  a <- array( dta
            , dim = c( ny
                     , nrow( dta ) %/% ny
                     , nx
                     , ncol( dta ) %/% nx
                     )
            )
  apply( a, c( 2, 4 ), FUN, ... )
}

aggregate.nx.ny.array.aperm.mean <- function( dta, nx = 2, ny = 2, ... ) {
  # number of rows in result
  nnr <- nrow( dta ) %/% ny
  # number of columns in result
  nnc <- ncol( dta ) %/% nx
  # number of values to take mean of
  nxny <- nx * ny
  # describe existing layout of values in dta as 4-d array
  a1 <- array( dta, dim = c( ny, nnr, nx, nnc ) )
  # swap data in dimensions 2 and 3
  a2 <- aperm( a1, c( 1, 3, 2, 4 ) )
  # treat first two dimensions as column vectors, remaining as columns
  a3 <- matrix( a2, nrow = nxny )
  # fast calculation of column means
  v <- colMeans( a3, ... )
  # reframe result vector as a matrix
  matrix( v, ncol = nnc )
}

aggregate.nx.ny.array.aperm.apply <- function( dta, nx = 2, ny = 2, FUN = mean, 
... ) {
  # number of rows in result
  nnr <- nrow( dta ) %/% ny
  # number of columns in result
  nnc <- ncol( dta ) %/% nx
  # number of values to apply FUN to
  nxny <- nx * ny
  # describe existing layout of values in dta as 4-d array
  a1 <- array( dta, dim = c( ny, nnr, nx, nnc ) )
  # swap data in dimensions 2 and 3
  a2 <- aperm( a1, c( 1, 3, 2, 4 ) )
  # treat first two dimensions as column vectors, remaining as columns
  a3 <- matrix( a2, nrow = nxny )
  # apply FUN to column vectors
  v <- apply( a3, 2, FUN = FUN, ... )
  matrix( v, ncol = nnc )
}
test1 <- aggregate.nx.ny.expand.grid( tst )
test2 <- aggregate.nx.ny.array.apply( tst )
test3 <- aggregate.nx.ny.array.aperm.mean( tst )
test4 <- aggregate.nx.ny.array.aperm.apply( tst )
library(microbenchmark)
microbenchmark(
  aggregate.nx.ny.expand.grid( tst, 2, 2, mean, na.rm = TRUE )
, aggregate.nx.ny.array.apply( tst, 2, 2, mean, na.rm = TRUE )
, aggregate.nx.ny.array.aperm.mean( tst, 2, 2, na.rm = TRUE )
, aggregate.nx.ny.array.aperm.apply( tst, 2, 2, mean, na.rm = TRUE )
)
#Unit: milliseconds
#                                                              expr        min
#        aggregate.nx.ny.expand.grid(tst, 2, 2, mean, na.rm = TRUE) 628.528322
#        aggregate.nx.ny.array.apply(tst, 2, 2, mean, na.rm = TRUE) 846.883314
#         aggregate.nx.ny.array.aperm.mean(tst, 2, 2, na.rm = TRUE)   8.904369
#  aggregate.nx.ny.array.aperm.apply(tst, 2, 2, mean, na.rm = TRUE) 619.691851
#          lq       mean     median        uq      max neval cld
#  675.470967  916.39630  778.54090  873.9754 2452.695   100  b
#  920.831966 1126.94691 1000.33830 1094.9233 3412.639   100   c
#    9.191747   21.98528   10.30099   15.9169  158.687   100 a
#  733.246331  936.73359  757.58383  844.2016 2824.557   100  b


On Sat, 30 Jul 2016, Jeff Newmiller wrote:


For the record, the array.apply code can be fixed as below, but then it is 
slower than the expand.grid version.

aggregate.nx.ny.array.apply <- function(dta,nx=2,ny=2, FUN=mean,...)
{
  a <- array(dta, dim = c(ny, nrow( dta ) %/% ny, nx, ncol( dta ) %/% nx))
 apply( a, c(2, 4), FUN, ... )
}

--
Sent from my phone. Please excuse my brevity.

On July 30, 2016 11:06:16 AM PDT, "Anthoni, Peter (IMK)" 
 wrote:

Hi all,

thanks for the suggestions, I did some timing tests, see below.
Unfortunately the aggregate.nx.ny.array.apply, does not produce the
expected result.
So the fastest seems to be the aggregate.nx.ny.expand.grid, though the
double for loop is not that much slower.

many thanks
Peter


tst=matrix(1:(1440*360),ncol=1440,nrow=360)
system.time( {for(i in 1:10)
  tst_2x2=aggregate.nx.ny.forloop(tst,2,2,mean,na.rm=T)})
   user  system elapsed
 11.227   0.073  11.371

system.time( {for(i in 1:10)
  tst_2x2=aggregate.nx.ny.interaction(tst,2,2,mean,na.rm=T)})
   user  system elapsed
 26.354   0.475  26.880

system.time( {for(i in 1:10)
  tst_2x2=aggregate.nx.ny.expand.grid(tst,2,2,mean,na.rm=T)})
   user  system elapsed
  9.683   0.055   9.763

system.time( {for(i in 1:10)
  tst_2x2=aggregate.nx.ny.array.apply(tst,2,2,mean,na.rm=T)})
   user  system elapsed
  7.693   0.055   7.800

Re: [R] Aggregate matrix in a 2 by 2 manor

2016-07-30 Thread Jeff Newmiller

For the record, the array.apply code can be fixed as below, but then it is 
slower than the expand.grid version. 

aggregate.nx.ny.array.apply <- function(dta,nx=2,ny=2, FUN=mean,...)
{
   a <- array(dta, dim = c(ny, nrow( dta ) %/% ny, nx, ncol( dta ) %/% nx))
  apply( a, c(2, 4), FUN, ... )
}
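
A quick check of the fixed version on the small 4 x 8 example from this thread. The name agg_apply is a hypothetical stand-in for the function above, and the comparison against an explicitly extracted block is only a sanity test:

```r
tst.small <- matrix(seq.int(8 * 4), ncol = 8, nrow = 4)

# Same body as the fixed function above, under a shorter (hypothetical) name
agg_apply <- function(dta, nx = 2, ny = 2, FUN = mean, ...) {
  a <- array(dta, dim = c(ny, nrow(dta) %/% ny, nx, ncol(dta) %/% nx))
  # margins 2 and 4 index the block row and block column
  apply(a, c(2, 4), FUN, ...)
}

res <- agg_apply(tst.small)
res
# the first entry should equal the mean of the top-left 2x2 block
mean(tst.small[1:2, 1:2])
```

The 4-d layout matters because R stores matrices column by column: the row index has to be split into (within block, block row) and the column index into (within block, block column) before anything is averaged.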

-- 
Sent from my phone. Please excuse my brevity.

On July 30, 2016 11:06:16 AM PDT, "Anthoni, Peter (IMK)" 
 wrote:
>Hi all,
>
>thanks for the suggestions, I did some timing tests, see below.
>Unfortunately the aggregate.nx.ny.array.apply, does not produce the
>expected result.
>So the fastest seems to be the aggregate.nx.ny.expand.grid, though the
>double for loop is not that much slower.
>
>many thanks
>Peter
>
>> tst=matrix(1:(1440*360),ncol=1440,nrow=360)
>> system.time( {for(i in 1:10)
>tst_2x2=aggregate.nx.ny.forloop(tst,2,2,mean,na.rm=T)})
>   user  system elapsed 
> 11.227   0.073  11.371 
>> system.time( {for(i in 1:10)
>tst_2x2=aggregate.nx.ny.interaction(tst,2,2,mean,na.rm=T)})
>   user  system elapsed 
> 26.354   0.475  26.880 
>> system.time( {for(i in 1:10)
>tst_2x2=aggregate.nx.ny.expand.grid(tst,2,2,mean,na.rm=T)})
>   user  system elapsed 
>  9.683   0.055   9.763 
>> system.time( {for(i in 1:10)
>tst_2x2=aggregate.nx.ny.array.apply(tst,2,2,mean,na.rm=T)})
>   user  system elapsed 
>  7.693   0.055   7.800 
>
>> tst.small=matrix(1:(8*4),ncol=8,nrow=4)
>> aggregate.nx.ny.forloop = function(data,nx=2,ny=2, FUN=mean,...) 
>+ {
>+   nlon=nrow(data)
>+   nlat=ncol(data)
>+   newdata=matrix(NA,nrow=nlon/nx,ncol=nlat/ny)
>+   dim(newdata)
>+   for(ilon in seq(1,nlon,nx)) {
>+ for(ilat in seq(1,nlat,ny)) {
>+   ilon_new=1+(ilon-1)/nx
>+   ilat_new=1+(ilat-1)/ny
>+   newdata[ilon_new,ilat_new] = FUN(data[ilon+0:1,ilat+0:1],...)
>+ }
>+   }
>+   newdata
>+ }
>> aggregate.nx.ny.forloop(tst.small)
> [,1] [,2] [,3] [,4]
>[1,]  3.5 11.5 19.5 27.5
>[2,]  5.5 13.5 21.5 29.5
>> 
>> aggregate.nx.ny.interaction = function(data,nx=2,ny=2, FUN=mean,...) 
>+ {
>+   
>+   nlon=nrow(data)
>+   nlat=ncol(data)
>+   newdata=matrix(NA,nrow=nlon/nx,ncol=nlat/ny)
>+   newdata[] <- tapply( data, interaction( (row(data)+1) %/% 2,
>(col(data)+1) %/% 2 ), FUN, ...)
>+   newdata
>+ }
>> aggregate.nx.ny.interaction(tst.small)
> [,1] [,2] [,3] [,4]
>[1,]  3.5 11.5 19.5 27.5
>[2,]  5.5 13.5 21.5 29.5
>> 
>> aggregate.nx.ny.expand.grid = function(data,nx=2,ny=2, FUN=mean,...) 
>+ {
>+   ilon <- seq(1,ncol(data),nx)
>+   ilat <- seq(1,nrow(data),ny)
>+   cells <- as.matrix(expand.grid(ilat, ilon))
>+   blocks <- apply(cells, 1, function(x)
>data[x[1]:(x[1]+1),x[2]:(x[2]+1)])
>+   block.means <- colMeans(blocks)
>+   matrix(block.means, nrow(data)/ny, ncol(data)/nx)
>+ }
>> aggregate.nx.ny.expand.grid(tst.small)
> [,1] [,2] [,3] [,4]
>[1,]  3.5 11.5 19.5 27.5
>[2,]  5.5 13.5 21.5 29.5
>> 
>> aggregate.nx.ny.array.apply = function(data,nx=2,ny=2, FUN=mean,...)
>{
>+   a <- array(data, dim = c(ny, nrow( data ) %/% ny, ncol( data ) %/%
>nx))
>+   apply( a, c(2, 3), FUN, ... )
>+ }
>> aggregate.nx.ny.array.apply(tst.small)
> [,1] [,2] [,3] [,4]
>[1,]  1.5  5.5  9.5 13.5
>[2,]  3.5  7.5 11.5 15.5
>
>
>
>> On 28 Jul 2016, at 00:26, David Winsemius 
>wrote:
>> 
>> 
>>> On Jul 27, 2016, at 12:02 PM, Jeff Newmiller
> wrote:
>>> 
>>> An alternative (more compact, not necessarily faster, because apply
>is still a for loop inside):
>>> 
>>> f <- function( m, nx, ny ) {
>>> # redefine the dimensions of m
>>> a <- array( m
>>>, dim = c( ny
>>>   , nrow( m ) %/% ny
>>>   , ncol( m ) %/% nx )
>>>   )
>>> # apply mean over dim 1
>>> apply( a, c( 2, 3 ), FUN=mean )
>>> }
>>> f( tst, nx, ny )
>> 
>> Here's an apparently loopless strategy, although I suspect the code
>for interaction (and maybe tapply as well?) uses a loop.
>> 
>> 
>> tst_2x2 <- matrix(NA, ncol=4, nrow=2)
>> 
>> tst_2x2[] <- tapply( tst, interaction( (row(tst)+1) %/% 2,
>(col(tst)+1) %/% 2 ), mean)
>> 
>> tst_2x2
>> 
>> [,1] [,2] [,3] [,4]
>> [1,]  3.5 11.5 19.5 27.5
>> [2,]  5.5 13.5 21.5 29.5
>> 
>> -- 
>> David.
>> 
>> 
>>> 
>>> -- 
>>> Sent from my phone. Please excuse my brevity.
>>> 
>>> On July 27, 2016 9:08:32 AM PDT, David L Carlson 
>wrote:
>>>> This should be faster. It uses apply() across the blocks.
>>>>
>>>>> ilon <- seq(1,8,nx)
>>>>> ilat <- seq(1,4,ny)
>>>>> cells <- as.matrix(expand.grid(ilat, ilon))
>>>>> blocks <- apply(cells, 1, function(x) tst[x[1]:(x[1]+1),
>>>> x[2]:(x[2]+1)])
>>>>> block.means <- colMeans(blocks)
>>>>> tst_2x2 <- matrix(block.means, 2, 4)
>>>>> tst_2x2
>>>>      [,1] [,2] [,3] [,4]
>>>> [1,]  3.5 11.5 19.5 27.5
>>>> [2,]  5.5 13.5 21.5 29.5
>>>>
>>>> -
>>>> David L Carlson
>>>> Department of Anthropology
>>>> Texas A&M University
>>>> College Station, TX 77840-4352
>>>>
>>>>
>>>>
>>>> -Original Message-
>>>> From: R-help

Re: [R] Aggregate matrix in a 2 by 2 manor

2016-07-30 Thread Anthoni, Peter (IMK)

Hi all,

thanks for the suggestions. I did some timing tests, see below.
Unfortunately, aggregate.nx.ny.array.apply does not produce the expected 
result.
So the fastest seems to be aggregate.nx.ny.expand.grid, though the double 
for loop is not that much slower.

many thanks
Peter

> tst=matrix(1:(1440*360),ncol=1440,nrow=360)
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.forloop(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
 11.227   0.073  11.371 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.interaction(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
 26.354   0.475  26.880 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.expand.grid(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
  9.683   0.055   9.763 
> system.time( {for(i in 1:10)
+   tst_2x2=aggregate.nx.ny.array.apply(tst,2,2,mean,na.rm=T)})
   user  system elapsed 
  7.693   0.055   7.800 

> tst.small=matrix(1:(8*4),ncol=8,nrow=4)
> aggregate.nx.ny.forloop = function(data,nx=2,ny=2, FUN=mean,...) 
+ {
+   nlon=nrow(data)
+   nlat=ncol(data)
+   newdata=matrix(NA,nrow=nlon/nx,ncol=nlat/ny)
+   dim(newdata)
+   for(ilon in seq(1,nlon,nx)) {
+     for(ilat in seq(1,nlat,ny)) {
+       ilon_new=1+(ilon-1)/nx
+       ilat_new=1+(ilat-1)/ny
+       newdata[ilon_new,ilat_new] = FUN(data[ilon+0:1,ilat+0:1],...)
+     }
+   }
+   newdata
+ }
> aggregate.nx.ny.forloop(tst.small)
     [,1] [,2] [,3] [,4]
[1,]  3.5 11.5 19.5 27.5
[2,]  5.5 13.5 21.5 29.5
> 
> aggregate.nx.ny.interaction = function(data,nx=2,ny=2, FUN=mean,...) 
+ {
+   
+   nlon=nrow(data)
+   nlat=ncol(data)
+   newdata=matrix(NA,nrow=nlon/nx,ncol=nlat/ny)
+   newdata[] <- tapply( data, interaction( (row(data)+1) %/% 2, (col(data)+1) %/% 2 ), FUN, ...)
+   newdata
+ }
> aggregate.nx.ny.interaction(tst.small)
     [,1] [,2] [,3] [,4]
[1,]  3.5 11.5 19.5 27.5
[2,]  5.5 13.5 21.5 29.5
> 
> aggregate.nx.ny.expand.grid = function(data,nx=2,ny=2, FUN=mean,...) 
+ {
+   ilon <- seq(1,ncol(data),nx)
+   ilat <- seq(1,nrow(data),ny)
+   cells <- as.matrix(expand.grid(ilat, ilon))
+   blocks <- apply(cells, 1, function(x) data[x[1]:(x[1]+1),x[2]:(x[2]+1)])
+   block.means <- colMeans(blocks)
+   matrix(block.means, nrow(data)/ny, ncol(data)/nx)
+ }
> aggregate.nx.ny.expand.grid(tst.small)
     [,1] [,2] [,3] [,4]
[1,]  3.5 11.5 19.5 27.5
[2,]  5.5 13.5 21.5 29.5
> 
> aggregate.nx.ny.array.apply = function(data,nx=2,ny=2, FUN=mean,...) {
+   a <- array(data, dim = c(ny, nrow( data ) %/% ny, ncol( data ) %/% nx))
+   apply( a, c(2, 3), FUN, ... )
+ }
> aggregate.nx.ny.array.apply(tst.small)
     [,1] [,2] [,3] [,4]
[1,]  1.5  5.5  9.5 13.5
[2,]  3.5  7.5 11.5 15.5



> On 28 Jul 2016, at 00:26, David Winsemius  wrote:
> 
> 
>> On Jul 27, 2016, at 12:02 PM, Jeff Newmiller  
>> wrote:
>> 
>> An alternative (more compact, not necessarily faster, because apply is still 
>> a for loop inside):
>> 
>> f <- function( m, nx, ny ) {
>> # redefine the dimensions of m
>> a <- array( m
>>, dim = c( ny
>>   , nrow( m ) %/% ny
>>   , ncol( m ) %/% nx )
>>   )
>> # apply mean over dim 1
>> apply( a, c( 2, 3 ), FUN=mean )
>> }
>> f( tst, nx, ny )
> 
> Here's an apparently loopless strategy, although I suspect the code for 
> interaction (and maybe tapply as well?) uses a loop.
> 
> 
> tst_2x2 <- matrix(NA, ncol=4, nrow=2)
> 
> tst_2x2[] <- tapply( tst, interaction( (row(tst)+1) %/% 2, (col(tst)+1) %/% 2 
> ), mean)
> 
> tst_2x2
> 
> [,1] [,2] [,3] [,4]
> [1,]  3.5 11.5 19.5 27.5
> [2,]  5.5 13.5 21.5 29.5
> 
> -- 
> David.
> 
> 
>> 
>> -- 
>> Sent from my phone. Please excuse my brevity.
>> 
>> On July 27, 2016 9:08:32 AM PDT, David L Carlson  wrote:
>>> This should be faster. It uses apply() across the blocks. 
>>> 
>>>> ilon <- seq(1,8,nx)
>>>> ilat <- seq(1,4,ny)
>>>> cells <- as.matrix(expand.grid(ilat, ilon))
>>>> blocks <- apply(cells, 1, function(x) tst[x[1]:(x[1]+1),
>>> x[2]:(x[2]+1)])
>>>> block.means <- colMeans(blocks)
>>>> tst_2x2 <- matrix(block.means, 2, 4)
>>>> tst_2x2
>>>      [,1] [,2] [,3] [,4]
>>> [1,]  3.5 11.5 19.5 27.5
>>> [2,]  5.5 13.5 21.5 29.5
>>> 
>>> -
>>> David L Carlson
>>> Department of Anthropology
>>> Texas A University
>>> College Station, TX 77840-4352
>>> 
>>> 
>>> 
>>> -Original Message-
>>> From: R-help [mailto:r-help-boun...@r-poject.org] On Behalf Of Anthoni,
>>> Peter (IMK)
>>> Sent: Wednesday, July 27, 2016 6:14 AM
>>> To: r-help@r-project.org
>>> Subject: [R] Aggregate matrix in a 2 by 2 manor
>>> 
>>> Hi all,
>>> 
>>> I need to aggregate some matrix data (1440x720) to a lower dimension
>>> (720x360) for lots of years and variables
>>> 
>>> I can do double for loop, but that will be slow. Anybody know a quicker
>>> way?
>>> 
>>> here an example with a smaller matrix size:
>>> 
>>> tst=matrix(1:(8*4),ncol=8,nrow=4)
>>> tst_2x2=matrix(NA,ncol=4,nrow=2)
>>> nx=2

Re: [R] Aggregate matrix in a 2 by 2 manor

2016-07-27 Thread David Winsemius


> On Jul 27, 2016, at 12:02 PM, Jeff Newmiller  wrote:
> 
> An alternative (more compact, not necessarily faster, because apply is still 
> a for loop inside):
> 
> f <- function( m, nx, ny ) {
>  # redefine the dimensions of m
>  a <- array( m
> , dim = c( ny
>, nrow( m ) %/% ny
>, ncol( m ) %/% nx )
>)
>  # apply mean over dim 1
>  apply( a, c( 2, 3 ), FUN=mean )
> }
> f( tst, nx, ny )

Here's an apparently loopless strategy, although I suspect the code for 
interaction (and maybe tapply as well?) uses a loop.


tst_2x2 <- matrix(NA, ncol=4, nrow=2)

tst_2x2[] <- tapply( tst, interaction( (row(tst)+1) %/% 2, (col(tst)+1) %/% 2 ), mean)

tst_2x2

     [,1] [,2] [,3] [,4]
[1,]  3.5 11.5 19.5 27.5
[2,]  5.5 13.5 21.5 29.5
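
The same idea can be written for arbitrary block sizes. This is only a sketch under the assumption that nrow(tst) is a multiple of ny and ncol(tst) a multiple of nx; row_blk and col_blk are names introduced here for illustration:

```r
tst <- matrix(1:(8 * 4), ncol = 8, nrow = 4)
nx <- 2; ny <- 2

# 0-based block index of every cell, computed from its row and column number
row_blk <- (row(tst) - 1) %/% ny
col_blk <- (col(tst) - 1) %/% nx

# a list of two grouping vectors makes tapply return a block-row x block-column table
res <- tapply(tst, list(row_blk, col_blk), mean)
res
```

Passing the two indices as a list instead of through interaction() means the result already has the right shape, so no pre-allocated matrix is needed.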

-- 
David.


> 
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On July 27, 2016 9:08:32 AM PDT, David L Carlson  wrote:
>> This should be faster. It uses apply() across the blocks. 
>> 
>>> ilon <- seq(1,8,nx)
>>> ilat <- seq(1,4,ny)
>>> cells <- as.matrix(expand.grid(ilat, ilon))
>>> blocks <- apply(cells, 1, function(x) tst[x[1]:(x[1]+1),
>> x[2]:(x[2]+1)])
>>> block.means <- colMeans(blocks)
>>> tst_2x2 <- matrix(block.means, 2, 4)
>>> tst_2x2
>>[,1] [,2] [,3] [,4]
>> [1,]  3.5 11.5 19.5 27.5
>> [2,]  5.5 13.5 21.5 29.5
>> 
>> -
>> David L Carlson
>> Department of Anthropology
>> Texas A University
>> College Station, TX 77840-4352
>> 
>> 
>> 
>> -Original Message-
>> From: R-help [mailto:r-help-boun...@r-poject.org] On Behalf Of Anthoni,
>> Peter (IMK)
>> Sent: Wednesday, July 27, 2016 6:14 AM
>> To: r-help@r-project.org
>> Subject: [R] Aggregate matrix in a 2 by 2 manor
>> 
>> Hi all,
>> 
>> I need to aggregate some matrix data (1440x720) to a lower dimension
>> (720x360) for lots of years and variables
>> 
>> I can do double for loop, but that will be slow. Anybody know a quicker
>> way?
>> 
>> here an example with a smaller matrix size:
>> 
>> tst=matrix(1:(8*4),ncol=8,nrow=4)
>> tst_2x2=matrix(NA,ncol=4,nrow=2)
>> nx=2
>> ny=2
>> for(ilon in seq(1,8,nx)) {
>> for (ilat in seq(1,4,ny)) {
>>   ilon_2x2=1+(ilon-1)/nx
>>   ilat_2x2=1+(ilat-1)/ny
>>   tst_2x2[ilat_2x2,ilon_2x2] = mean(tst[ilat+0:1,ilon+0:1])
>> }
>> }
>> 
>> tst
>> tst_2x2
>> 
>>> tst
>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
>> [1,]    1    5    9   13   17   21   25   29
>> [2,]    2    6   10   14   18   22   26   30
>> [3,]    3    7   11   15   19   23   27   31
>> [4,]    4    8   12   16   20   24   28   32
>> 
>>> tst_2x2
>>      [,1] [,2] [,3] [,4]
>> [1,]  3.5 11.5 19.5 27.5
>> [2,]  5.5 13.5 21.5 29.5
>> 
>> 
>> I though a cast to 3d-array might do the trick and apply over the new
>> dimension, but that does not work, since it casts the data along the
>> row.
>>> matrix(apply(array(tst,dim=c(nx,ny,8)),3,mean),nrow=nrow(tst)/ny)
>>[,1] [,2] [,3] [,4]
>> [1,]  2.5 10.5 18.5 26.5
>> [2,]  6.5 14.5 22.5 30.5
>> 
>> 
>> cheers
>> Peter

David Winsemius
Alameda, CA, USA

Re: [R] Aggregate matrix in a 2 by 2 manor

2016-07-27 Thread Jeff Newmiller

An alternative (more compact, not necessarily faster, because apply is still a 
for loop inside):

f <- function( m, nx, ny ) {
  # redefine the dimensions of m
  a <- array( m
            , dim = c( ny
                     , nrow( m ) %/% ny
                     , ncol( m ) %/% nx )
            )
  # apply mean over dim 1
  apply( a, c( 2, 3 ), FUN=mean )
}
f( tst, nx, ny )

-- 
Sent from my phone. Please excuse my brevity.

On July 27, 2016 9:08:32 AM PDT, David L Carlson  wrote:
>This should be faster. It uses apply() across the blocks. 
>
>> ilon <- seq(1,8,nx)
>> ilat <- seq(1,4,ny)
>> cells <- as.matrix(expand.grid(ilat, ilon))
>> blocks <- apply(cells, 1, function(x) tst[x[1]:(x[1]+1),
>x[2]:(x[2]+1)])
>> block.means <- colMeans(blocks)
>> tst_2x2 <- matrix(block.means, 2, 4)
>> tst_2x2
> [,1] [,2] [,3] [,4]
>[1,]  3.5 11.5 19.5 27.5
>[2,]  5.5 13.5 21.5 29.5
>
>-
>David L Carlson
>Department of Anthropology
>Texas A University
>College Station, TX 77840-4352
>
>
>
>-Original Message-
>From: R-help [mailto:r-help-boun...@r-poject.org] On Behalf Of Anthoni,
>Peter (IMK)
>Sent: Wednesday, July 27, 2016 6:14 AM
>To: r-help@r-project.org
>Subject: [R] Aggregate matrix in a 2 by 2 manor
>
>Hi all,
>
>I need to aggregate some matrix data (1440x720) to a lower dimension
>(720x360) for lots of years and variables
>
>I can do double for loop, but that will be slow. Anybody know a quicker
>way?
>
>here an example with a smaller matrix size:
>
>tst=matrix(1:(8*4),ncol=8,nrow=4)
>tst_2x2=matrix(NA,ncol=4,nrow=2)
>nx=2
>ny=2
>for(ilon in seq(1,8,nx)) {
>  for (ilat in seq(1,4,ny)) {
>ilon_2x2=1+(ilon-1)/nx
>ilat_2x2=1+(ilat-1)/ny
>tst_2x2[ilat_2x2,ilon_2x2] = mean(tst[ilat+0:1,ilon+0:1])
>  }
>}
>
>tst
>tst_2x2
>
>> tst
>     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
>[1,]    1    5    9   13   17   21   25   29
>[2,]    2    6   10   14   18   22   26   30
>[3,]    3    7   11   15   19   23   27   31
>[4,]    4    8   12   16   20   24   28   32
>
>> tst_2x2
>     [,1] [,2] [,3] [,4]
>[1,]  3.5 11.5 19.5 27.5
>[2,]  5.5 13.5 21.5 29.5
>
>
>I though a cast to 3d-array might do the trick and apply over the new
>dimension, but that does not work, since it casts the data along the
>row.
>> matrix(apply(array(tst,dim=c(nx,ny,8)),3,mean),nrow=nrow(tst)/ny)
> [,1] [,2] [,3] [,4]
>[1,]  2.5 10.5 18.5 26.5
>[2,]  6.5 14.5 22.5 30.5
>
>
>cheers
>Peter

Re: [R] Aggregate matrix in a 2 by 2 manor

2016-07-27 Thread David L Carlson

This should be faster. It uses apply() across the blocks. 

> ilon <- seq(1,8,nx)
> ilat <- seq(1,4,ny)
> cells <- as.matrix(expand.grid(ilat, ilon))
> blocks <- apply(cells, 1, function(x) tst[x[1]:(x[1]+1), x[2]:(x[2]+1)])
> block.means <- colMeans(blocks)
> tst_2x2 <- matrix(block.means, 2, 4)
> tst_2x2
     [,1] [,2] [,3] [,4]
[1,]  3.5 11.5 19.5 27.5
[2,]  5.5 13.5 21.5 29.5
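
The steps above can be wrapped in a function that honours nx, ny and an arbitrary summary function. This is a sketch, not code from the thread: agg_blocks is a hypothetical name, and it assumes the matrix dimensions are exact multiples of ny and nx:

```r
agg_blocks <- function(m, nx = 2, ny = 2, FUN = mean, ...) {
  ilat <- seq(1, nrow(m), by = ny)   # first row of each block
  ilon <- seq(1, ncol(m), by = nx)   # first column of each block
  # ilat varies fastest, which matches column-major filling of the result
  cells <- expand.grid(ilat = ilat, ilon = ilon)
  vals <- mapply(function(r, c) FUN(m[r:(r + ny - 1), c:(c + nx - 1)], ...),
                 cells$ilat, cells$ilon)
  matrix(vals, nrow = length(ilat), ncol = length(ilon))
}

tst <- matrix(1:(8 * 4), ncol = 8, nrow = 4)
agg_mean <- agg_blocks(tst)                 # 2 x 2 block means
agg_sum  <- agg_blocks(tst, FUN = sum)      # 2 x 2 block sums
agg_wide <- agg_blocks(tst, nx = 4, ny = 2) # blocks of 2 rows by 4 columns
```

Unlike the colMeans() shortcut, this calls FUN once per block, so it is flexible but will not be fast on large matrices.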

-
David L Carlson
Department of Anthropology
Texas A University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-poject.org] On Behalf Of Anthoni, Peter 
(IMK)
Sent: Wednesday, July 27, 2016 6:14 AM
To: r-help@r-project.org
Subject: [R] Aggregate matrix in a 2 by 2 manor

Hi all,

I need to aggregate some matrix data (1440x720) to a lower dimension (720x360) 
for lots of years and variables.

I can do a double for loop, but that will be slow. Does anybody know a quicker way?

Here is an example with a smaller matrix size:

tst=matrix(1:(8*4),ncol=8,nrow=4)
tst_2x2=matrix(NA,ncol=4,nrow=2)
nx=2
ny=2
for(ilon in seq(1,8,nx)) {
  for (ilat in seq(1,4,ny)) {
    ilon_2x2=1+(ilon-1)/nx
    ilat_2x2=1+(ilat-1)/ny
    tst_2x2[ilat_2x2,ilon_2x2] = mean(tst[ilat+0:1,ilon+0:1])
  }
}

tst
tst_2x2

> tst
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    5    9   13   17   21   25   29
[2,]    2    6   10   14   18   22   26   30
[3,]    3    7   11   15   19   23   27   31
[4,]    4    8   12   16   20   24   28   32

> tst_2x2
     [,1] [,2] [,3] [,4]
[1,]  3.5 11.5 19.5 27.5
[2,]  5.5 13.5 21.5 29.5


I thought a cast to a 3d-array might do the trick and apply over the new 
dimension, but that does not work, since it casts the data along the row.
> matrix(apply(array(tst,dim=c(nx,ny,8)),3,mean),nrow=nrow(tst)/ny)
     [,1] [,2] [,3] [,4]
[1,]  2.5 10.5 18.5 26.5
[2,]  6.5 14.5 22.5 30.5
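
A short sketch of what goes wrong with that cast, using the same tst. It only inspects the first slice of the 3-d array and compares it with the block that was intended:

```r
tst <- matrix(1:(8 * 4), ncol = 8, nrow = 4)

# R fills arrays column by column, so each 2 x 2 slice is one
# whole column of tst, not a 2 x 2 block
a <- array(tst, dim = c(2, 2, 8))
first_slice <- as.vector(a[, , 1])          # values actually averaged
wanted      <- as.vector(tst[1:2, 1:2])     # values of the intended block

first_slice
wanted
mean(first_slice)   # 2.5, the first entry of the wrong result
mean(wanted)        # 3.5, the first entry of the expected result
```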


cheers
Peter

Re: [R] Aggregate data to lower resolution

2016-07-22 Thread Miluji Sb

Dear Jean,

Thank you so much for your reply and the solution. This does work. I was
wondering: is this similar to 'rasterFromXYZ'? Thanks again!

Sincerely,

Milu

On Fri, Jul 22, 2016 at 3:06 PM, Adams, Jean  wrote:

> Milu,
>
> Perhaps an approach like this would work.  In the example below, I
> calculate the mean GDP for each 1 degree by 1 degree.
>
> temp$long1 <- floor(temp$longitude)
> temp$lat1 <- floor(temp$latitude)
> temp1 <- aggregate(GDP ~ long1 + lat1, temp, mean)
>
>   long1 lat1        GDP
> 1   -69  -55 0.90268640
> 2   -68  -55 0.09831317
> 3   -72  -54 0.22379000
> 4   -71  -54 0.14067290
> 5   -70  -54 0.00300380
> 6   -69  -54 0.00574220
>
> Jean
>
> On Thu, Jul 21, 2016 at 3:57 PM, Miluji Sb  wrote:
>
>> Dear all,
>>
>> I have the following GDP data by latitude and longitude at 0.5 degree by
>> 0.5 degree.
>>
>> temp <- dput(head(ptsDF,10))
>> structure(list(longitude = c(-68.25, -67.75, -67.25, -68.25,
>> -67.75, -67.25, -71.25, -70.75, -69.25, -68.75), latitude = c(-54.75,
>> -54.75, -54.75, -54.25, -54.25, -54.25, -53.75, -53.75, -53.75,
>> -53.75), GDP = c(1.683046, 0.3212307, 0.0486207, 0.1223268, 0.0171909,
>> 0.0062104, 0.22379, 0.1406729, 0.0030038, 0.0057422)), .Names =
>> c("longitude",
>> "latitude", "GDP"), row.names = c(4L, 17L, 30L, 43L, 56L, 69L,
>> 82L, 95L, 108L, 121L), class = "data.frame")
>>
>> I would like to aggregate the data 1 degree by 1 degree. I understand that
>> the first step is to convert to raster. I have tried:
>>
>> rasterDF <- rasterFromXYZ(temp)
>> r <- aggregate(rasterDF,fact=2, fun=sum)
>>
>> But this does not seem to work. Could anyone help me out please? Thank you
>> in advance.
>>
>> Sincerely,
>>
>> Milu
>>
>>
>
>


Re: [R] Aggregate data to lower resolution

2016-07-22 Thread Adams, Jean

Milu,

Perhaps an approach like this would work.  In the example below, I
calculate the mean GDP for each 1 degree by 1 degree cell.

temp$long1 <- floor(temp$longitude)
temp$lat1 <- floor(temp$latitude)
temp1 <- aggregate(GDP ~ long1 + lat1, temp, mean)

  long1 lat1        GDP
1   -69  -55 0.90268640
2   -68  -55 0.09831317
3   -72  -54 0.22379000
4   -71  -54 0.14067290
5   -70  -54 0.00300380
6   -69  -54 0.00574220
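
If totals per cell are wanted rather than means (the original attempt used fun=sum), the same binning should work with sum. A minimal sketch using the ten rows posted in the question; the name cell_sum is only for illustration:

```r
# The ten rows from the dput() output in the original question.
temp <- data.frame(
  longitude = c(-68.25, -67.75, -67.25, -68.25, -67.75, -67.25,
                -71.25, -70.75, -69.25, -68.75),
  latitude  = c(-54.75, -54.75, -54.75, -54.25, -54.25, -54.25,
                -53.75, -53.75, -53.75, -53.75),
  GDP       = c(1.683046, 0.3212307, 0.0486207, 0.1223268, 0.0171909,
                0.0062104, 0.22379, 0.1406729, 0.0030038, 0.0057422)
)

# floor() assigns each 0.5-degree point to the 1-degree cell containing it.
temp$long1 <- floor(temp$longitude)
temp$lat1  <- floor(temp$latitude)

# Sum GDP within each 1-degree cell instead of averaging it.
cell_sum <- aggregate(GDP ~ long1 + lat1, data = temp, FUN = sum)
cell_sum
```

Whether mean or sum is the right choice depends on whether GDP is meant as a density or as a total per cell.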

Jean

On Thu, Jul 21, 2016 at 3:57 PM, Miluji Sb  wrote:

> Dear all,
>
> I have the following GDP data by latitude and longitude at 0.5 degree by
> 0.5 degree.
>
> temp <- dput(head(ptsDF,10))
> structure(list(longitude = c(-68.25, -67.75, -67.25, -68.25,
> -67.75, -67.25, -71.25, -70.75, -69.25, -68.75), latitude = c(-54.75,
> -54.75, -54.75, -54.25, -54.25, -54.25, -53.75, -53.75, -53.75,
> -53.75), GDP = c(1.683046, 0.3212307, 0.0486207, 0.1223268, 0.0171909,
> 0.0062104, 0.22379, 0.1406729, 0.0030038, 0.0057422)), .Names =
> c("longitude",
> "latitude", "GDP"), row.names = c(4L, 17L, 30L, 43L, 56L, 69L,
> 82L, 95L, 108L, 121L), class = "data.frame")
>
> I would like to aggregate the data 1 degree by 1 degree. I understand that
> the first step is to convert to raster. I have tried:
>
> rasterDF <- rasterFromXYZ(temp)
> r <- aggregate(rasterDF,fact=2, fun=sum)
>
> But this does not seem to work. Could anyone help me out please? Thank you
> in advance.
>
> Sincerely,
>
> Milu
>
>


Re: [R] Aggregate rainfall data

2016-07-19 Thread roslinazairimah zakaria

Hi David,

Thank you, and the others, so much for your help.  Here is the code.

balok <- read.csv("G:/A_backup 11 mei 2015/DATA (D)/1 Universiti Malaysia Pahang/ISM-3 2016 UM/Data/Hourly Rainfall/balok2.csv",header=TRUE)
head(balok, 10); tail(balok, 10)
str(balok)

## Introduce NAs for
balok$Rain.mm2 <- as.numeric( as.character(balok$Rain.mm) )
head(balok$Rain.mm2); tail(balok$Rain.mm2)
head(balok, 10); tail(balok, 10)

## Change date format from DD/MM/ to Day, Month, Year separately
realdate <- as.Date(balok$Date,format="%d/%m/%Y")
dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))

balok2 <-cbind(dfdate,day,month,year,balok)
colnames(balok2)
head(balok2)

## New data format
balok2_new <- balok2[,c(-1,-5,-7)]
colnames(balok2_new)
head(balok2_new); tail(balok2_new)

## Aggregate data
## Sum rainfall amount from HOURLY to DAILY
dt <- balok2_new
str(dt)
aggbalok_day <- aggregate(dt[,5], by=dt[,c(1,2,3)],FUN=sum, na.rm=TRUE)
head(aggbalok_day)

## Sum rainfall amount from HOURLY to MONTHLY
dt <- balok2_new
str(dt)
aggbalok_mth <- aggregate(dt[,5], by=dt[,c(2,3)],FUN=sum, na.rm=TRUE)
head(aggbalok_mth)

Now I would like to compute basic summary statistics for the data
by month.
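
One possible way to get several statistics per month in a single aggregate() call is to let FUN return a named vector. This is only a sketch on made-up data: the toy data frame, the helper basic_stats and the choice of statistics are assumptions, not part of the code above.

```r
# Made-up hourly rainfall with one missing value.
toy <- data.frame(
  month = rep(c(7, 8), each = 3),
  year  = 2008,
  rain  = c(0, 0.2, 0.4, 1, NA, 3)
)

# Hypothetical helper: a few basic statistics, ignoring NAs.
basic_stats <- function(x) {
  c(n     = sum(!is.na(x)),
    total = sum(x, na.rm = TRUE),
    mean  = mean(x, na.rm = TRUE),
    max   = max(x, na.rm = TRUE))
}

# aggregate() stores the vector result as a matrix column ...
monthly <- aggregate(toy["rain"], by = toy[c("month", "year")],
                     FUN = basic_stats)

# ... which do.call(data.frame, ...) flattens into ordinary columns
# named rain.n, rain.total, rain.mean and rain.max.
monthly <- do.call(data.frame, monthly)
monthly
```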


Best regards




On Wed, Jul 13, 2016 at 10:37 PM, David Winsemius 
wrote:

>
> > On Jul 13, 2016, at 3:21 AM, roslinazairimah zakaria <
> roslina...@gmail.com> wrote:
> >
> > Dear David,
> >
> > I got your point.  How do I remove the data that contain "0.0?".
> >
> > I tried : balok <- cbind(balok3[,-5],
> balok3$Rain.mm[balok3$Rain.mm==0.0?] <- NA)
>
> If you had done as I suggested, the items with factor levels of "0.0?"
> would have automatically become NA and you would have gotten a warning
> message:
>
> > testfac <- factor( c(rep("0.0",7), "0.07", "0.0?", '0.01', '0.17'))
> > testfac
>  [1] 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.07 0.0? 0.01 0.17
> Levels: 0.0 0.0? 0.01 0.07 0.17
> > as.numeric(as.character( testfac))
>  [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07   NA 0.01 0.17
> Warning message:
> NAs introduced by coercion
>
>
>
> >
> > However all the Rain.mm column all become NA.
> >
> >day month year Time balok3$Rain.mm[balok3$Rain.mm == "0.0?"] <- NA
> > 1   30 7 2008  9:00:00 NA
> > 2   30 7 2008 10:00:00 NA
> > 3   30 7 2008 11:00:00 NA
> > 4   30 7 2008 12:00:00 NA
> > 5   30 7 2008 13:00:00 NA
> > 6   30 7 2008 14:00:00 NA
> > 7   30 7 2008 15:00:00 NA
> > 8   30 7 2008 16:00:00 NA
> > 9   30 7 2008 17:00:00 NA
> > 10  30 7 2008 18:00:00 NA
> >
> > Thank you so much.
> >
> >
> > On Wed, Jul 13, 2016 at 9:42 AM, David Winsemius 
> wrote:
> >
> > > On Jul 12, 2016, at 3:45 PM, roslinazairimah zakaria <
> roslina...@gmail.com> wrote:
> > >
> > > Dear R-users,
> > >
> > > I have these data:
> > >
> > > head(balok, 10); tail(balok, 10)
> > >Date Time Rain.mm
> > > 1  30/7/2008  9:00:00   0
> > > 2  30/7/2008 10:00:00   0
> > > 3  30/7/2008 11:00:00   0
> > > 4  30/7/2008 12:00:00   0
> > > 5  30/7/2008 13:00:00   0
> > > 6  30/7/2008 14:00:00   0
> > > 7  30/7/2008 15:00:00   0
> > > 8  30/7/2008 16:00:00   0
> > > 9  30/7/2008 17:00:00   0
> > > 10 30/7/2008 18:00:00   0
> > >   Date Time Rain.mm
> > > 63667 4/11/2015  3:00:00   0
> > > 63668 4/11/2015  4:00:00   0
> > > 63669 4/11/2015  5:00:00   0
> > > 63670 4/11/2015  6:00:00   0
> > > 63671 4/11/2015  7:00:00   0
> > > 63672 4/11/2015  8:00:00   0
> > > 63673 4/11/2015  9:00:00 0.1
> > > 63674 4/11/2015 10:00:00 0.1
> > > 63675 4/11/2015 11:00:00 0.1
> > > 63676 4/11/2015 12:00:000.1?
> > >
> > >> str(balok)
> > > 'data.frame':   63676 obs. of  3 variables:
> > > $ Date   : Factor w/ 2654 levels "1/1/2009","1/1/2010",..: 2056 2056
> 2056
> > > 2056 2056 2056 2056 2056 2056 2056 ...
> > > $ Time   : Factor w/ 24 levels "1:00:00","10:00:00",..: 24 2 3 4 5 6 7
> 8 9
> > > 10 ...
> > > $ Rain.mm: Factor w/ 352 levels "0","0.0?","0.1",..: 1 1 1 1 1 1 1 1 1
> 1
> >
> > Thar's your problem:
> >
> >   Rain.mm: Factor w/ 352 levels "0","0.0?","0.1"
> >
> > Need to use the standard fix for the screwed-up-factor-on-input-problem
> >
> >   balok$Rain.mm2 <- as.numeric( as.character(balok$Rain.mm) )
> >
> > Cannot just do as.numeric because factors are actually already numeric.
> >
> > --
> > David.
> >
> >
> > > ...
> > >
> > > and I

Re: [R] Aggregate rainfall data

2016-07-13 Thread David Winsemius


> On Jul 13, 2016, at 3:21 AM, roslinazairimah zakaria  
> wrote:
> 
> Dear David,
> 
> I got your point.  How do I remove the data that contain "0.0?".
> 
> I tried : balok <- cbind(balok3[,-5], balok3$Rain.mm[balok3$Rain.mm==0.0?] <- 
> NA)

If you had done as I suggested, the items with factor levels of "0.0?" would 
have automatically become NA and you would have gotten a warning message:

> testfac <- factor( c(rep("0.0",7), "0.07", "0.0?", '0.01', '0.17'))
> testfac
 [1] 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.07 0.0? 0.01 0.17
Levels: 0.0 0.0? 0.01 0.07 0.17
> as.numeric(as.character( testfac))
 [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07   NA 0.01 0.17
Warning message:
NAs introduced by coercion 



> 
> However all the Rain.mm column all become NA.
> 
>day month year Time balok3$Rain.mm[balok3$Rain.mm == "0.0?"] <- NA
> 1   30 7 2008  9:00:00 NA
> 2   30 7 2008 10:00:00 NA
> 3   30 7 2008 11:00:00 NA
> 4   30 7 2008 12:00:00 NA
> 5   30 7 2008 13:00:00 NA
> 6   30 7 2008 14:00:00 NA
> 7   30 7 2008 15:00:00 NA
> 8   30 7 2008 16:00:00 NA
> 9   30 7 2008 17:00:00 NA
> 10  30 7 2008 18:00:00 NA
> 
> Thank you so much.
> 
> 
> On Wed, Jul 13, 2016 at 9:42 AM, David Winsemius  
> wrote:
> 
> > On Jul 12, 2016, at 3:45 PM, roslinazairimah zakaria  
> > wrote:
> >
> > Dear R-users,
> >
> > I have these data:
> >
> > head(balok, 10); tail(balok, 10)
> >Date Time Rain.mm
> > 1  30/7/2008  9:00:00   0
> > 2  30/7/2008 10:00:00   0
> > 3  30/7/2008 11:00:00   0
> > 4  30/7/2008 12:00:00   0
> > 5  30/7/2008 13:00:00   0
> > 6  30/7/2008 14:00:00   0
> > 7  30/7/2008 15:00:00   0
> > 8  30/7/2008 16:00:00   0
> > 9  30/7/2008 17:00:00   0
> > 10 30/7/2008 18:00:00   0
> >   Date Time Rain.mm
> > 63667 4/11/2015  3:00:00   0
> > 63668 4/11/2015  4:00:00   0
> > 63669 4/11/2015  5:00:00   0
> > 63670 4/11/2015  6:00:00   0
> > 63671 4/11/2015  7:00:00   0
> > 63672 4/11/2015  8:00:00   0
> > 63673 4/11/2015  9:00:00 0.1
> > 63674 4/11/2015 10:00:00 0.1
> > 63675 4/11/2015 11:00:00 0.1
> > 63676 4/11/2015 12:00:000.1?
> >
> >> str(balok)
> > 'data.frame':   63676 obs. of  3 variables:
> > $ Date   : Factor w/ 2654 levels "1/1/2009","1/1/2010",..: 2056 2056 2056
> > 2056 2056 2056 2056 2056 2056 2056 ...
> > $ Time   : Factor w/ 24 levels "1:00:00","10:00:00",..: 24 2 3 4 5 6 7 8 9
> > 10 ...
> > $ Rain.mm: Factor w/ 352 levels "0","0.0?","0.1",..: 1 1 1 1 1 1 1 1 1 1
> 
> Thar's your problem:
> 
>   Rain.mm: Factor w/ 352 levels "0","0.0?","0.1"
> 
> Need to use the standard fix for the screwed-up-factor-on-input-problem
> 
>   balok$Rain.mm2 <- as.numeric( as.character(balok$Rain.mm) )
> 
> Cannot just do as.numeric because factors are actually already numeric.
> 
> --
> David.
> 
> 
> > ...
> >
> > and I have change the data as follows:
> >
> > realdate <- as.Date(balok$Date,format="%d/%m/%Y")
> > dfdate <- data.frame(date=realdate)
> > year=as.numeric (format(realdate,"%Y"))
> > month=as.numeric (format(realdate,"%m"))
> > day=as.numeric (format(realdate,"%d"))
> >
> > balok2 <-cbind(dfdate,day,month,year,balok[,2:3])
> > colnames(balok2)
> > head(balok2)
> >date day month year Time Rain.mm
> > 1 2008-07-30  30 7 2008  9:00:00   0
> > 2 2008-07-30  30 7 2008 10:00:00   0
> > 3 2008-07-30  30 7 2008 11:00:00   0
> > 4 2008-07-30  30 7 2008 12:00:00   0
> > 5 2008-07-30  30 7 2008 13:00:00   0
> > 6 2008-07-30  30 7 2008 14:00:00   0
> > ...
> >
> >> balok3 <- balok2[,-1]; head(balok3, n=100)
> >day month year Time Rain.mm
> > 130 7 2008  9:00:00   0
> > 230 7 2008 10:00:00   0
> > 330 7 2008 11:00:00   0
> > 430 7 2008 12:00:00   0
> > 530 7 2008 13:00:00   0
> > 630 7 2008 14:00:00   0
> > 730 7 2008 15:00:00   0
> > 830 7 2008 16:00:00   0
> > 930 7 2008 17:00:00   0
> > 10   30 7 2008 18:00:00   0
> > 11   30 7 2008 19:00:00   0
> > 12   30 7 2008 20:00:00   0
> > 13   30 7 2008 21:00:00   0
> > 14   30 7 2008 22:00:00   0
> > 15   30 7 2008 23:00:00   0
> > 16   30 7 2008 24:00:00   0
> > 17   31 7 2008  1:00:00   0
> > 18   31 7 2008  2:00:00   0
> > 19   31 7 2008  3:00:00   0
> > 20   31 7 2008

Re: [R] Aggregate rainfall data

2016-07-13 Thread PIKAL Petr

Hi

First you could check what levels are not numeric by

levels(balok3$Rain.mm)

If only 0.0? is the offending level you can either change it to 0 by

levels(balok3$Rain.mm)[levels(balok3$Rain.mm) == "0.0?"] <- "0.0"

and then change the factor to numeric by

balok3$Rain.mm <- as.numeric(as.character(balok3$Rain.mm))

Or you can treat those levels as missing.

levels(balok3$Rain.mm)[levels(balok3$Rain.mm) == "0.0?"] <- NA
balok3$Rain.mm <- as.numeric(as.character(balok3$Rain.mm))
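
Since the data shown also contain "0.1?", there may be more than one offending level. A small sketch, on a made-up factor, of how all non-numeric levels could be found and set to missing in one go (the names rain, lv and bad are only for illustration):

```r
# Made-up factor with two flagged readings.
rain <- factor(c("0", "0.1", "0.0?", "0.1?", "2"))

# Levels that cannot be converted to a number are the offending ones.
lv  <- levels(rain)
bad <- lv[is.na(suppressWarnings(as.numeric(lv)))]
bad

# Setting those levels to NA turns the corresponding values into NA.
levels(rain)[lv %in% bad] <- NA
rain_num <- as.numeric(as.character(rain))
rain_num
```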

Cheers
Petr



> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> roslinazairimah zakaria
> Sent: Wednesday, July 13, 2016 12:22 PM
> To: David Winsemius <dwinsem...@comcast.net>
> Cc: r-help mailing list <r-help@r-project.org>
> Subject: Re: [R] Aggregate rainfall data
>
> Dear David,
>
> I got your point.  How do I remove the data that contain "0.0?".
>
> I tried : balok <- cbind(balok3[,-5], balok3$Rain.mm[balok3$Rain.mm==0.0?]
> <- NA)
>
> However all the Rain.mm column all become NA.
>
>day month year Time balok3$Rain.mm[balok3$Rain.mm == "0.0?"] <- NA
> 1   30 7 2008  9:00:00 NA
> 2   30 7 2008 10:00:00 NA
> 3   30 7 2008 11:00:00 NA
> 4   30 7 2008 12:00:00 NA
> 5   30 7 2008 13:00:00 NA
> 6   30 7 2008 14:00:00 NA
> 7   30 7 2008 15:00:00 NA
> 8   30 7 2008 16:00:00 NA
> 9   30 7 2008 17:00:00 NA
> 10  30 7 2008 18:00:00 NA
>
> Thank you so much.
>
>
> On Wed, Jul 13, 2016 at 9:42 AM, David Winsemius
> <dwinsem...@comcast.net>
> wrote:
>
> >
> > > On Jul 12, 2016, at 3:45 PM, roslinazairimah zakaria <
> > roslina...@gmail.com> wrote:
> > >
> > > Dear R-users,
> > >
> > > I have these data:
> > >
> > > head(balok, 10); tail(balok, 10)
> > >Date Time Rain.mm
> > > 1  30/7/2008  9:00:00   0
> > > 2  30/7/2008 10:00:00   0
> > > 3  30/7/2008 11:00:00   0
> > > 4  30/7/2008 12:00:00   0
> > > 5  30/7/2008 13:00:00   0
> > > 6  30/7/2008 14:00:00   0
> > > 7  30/7/2008 15:00:00   0
> > > 8  30/7/2008 16:00:00   0
> > > 9  30/7/2008 17:00:00   0
> > > 10 30/7/2008 18:00:00   0
> > >   Date Time Rain.mm
> > > 63667 4/11/2015  3:00:00   0
> > > 63668 4/11/2015  4:00:00   0
> > > 63669 4/11/2015  5:00:00   0
> > > 63670 4/11/2015  6:00:00   0
> > > 63671 4/11/2015  7:00:00   0
> > > 63672 4/11/2015  8:00:00   0
> > > 63673 4/11/2015  9:00:00 0.1
> > > 63674 4/11/2015 10:00:00 0.1
> > > 63675 4/11/2015 11:00:00 0.1
> > > 63676 4/11/2015 12:00:000.1?
> > >
> > >> str(balok)
> > > 'data.frame':   63676 obs. of  3 variables:
> > > $ Date   : Factor w/ 2654 levels "1/1/2009","1/1/2010",..: 2056 2056 2056
> > > 2056 2056 2056 2056 2056 2056 2056 ...
> > > $ Time   : Factor w/ 24 levels "1:00:00","10:00:00",..: 24 2 3 4 5 6 7 8
> > 9
> > > 10 ...
> > > $ Rain.mm: Factor w/ 352 levels "0","0.0?","0.1",..: 1 1 1 1 1 1 1 1
> > > 1 1
> >
> > Thar's your problem:
> >
> >   Rain.mm: Factor w/ 352 levels "0","0.0?","0.1"
> >
> > Need to use the standard fix for the
> > screwed-up-factor-on-input-problem
> >
> >   balok$Rain.mm2 <- as.numeric( as.character(balok$Rain.mm) )
> >
> > Cannot just do as.numeric because factors are actually already numeric.
> >
> > --
> > David.
> >
> >
> > > ...
> > >
> > > and I have change the data as follows:
> > >
> > > realdate <- as.Date(balok$Date,format="%d/%m/%Y")
> > > dfdate <- data.frame(date=realdate)
> > > year=as.numeric (format(realdate,"%Y")) month=as.numeric
> > > (format(realdate,"%m")) day=as.numeric (format(realdate,"%d"))
> > >
> > > balok2 <-cbind(dfdate,day,month,year,balok[,2:3])
> > > colnames(balok2)

Re: [R] Aggregate rainfall data

2016-07-13 Thread boB Rudis

use `gsub()` after the `as.character()` conversion to remove
everything but valid numeric components from the strings.
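
A minimal sketch of that idea on made-up strings; the pattern below is just one possible choice and assumes non-negative values with no exponent notation, since it would also strip a minus sign:

```r
# Made-up readings, some carrying a trailing "?" flag.
raw <- c("0", "0.0?", "0.1", "0.1?", "12.5")

# Drop every character that is not a digit or a decimal point.
cleaned <- gsub("[^0-9.]", "", raw)

rain <- as.numeric(cleaned)
rain
```

Note that this keeps the flagged readings as ordinary numbers, whereas as.numeric(as.character(...)) alone turns them into NA; which is preferable depends on what the "?" flag means in the data.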

On Wed, Jul 13, 2016 at 6:21 AM, roslinazairimah zakaria
 wrote:
> Dear David,
>
> I got your point.  How do I remove the data that contain "0.0?".
>
> I tried : balok <- cbind(balok3[,-5], balok3$Rain.mm[balok3$Rain.mm==0.0?]
> <- NA)
>
> However all the Rain.mm column all become NA.
>
>day month year Time balok3$Rain.mm[balok3$Rain.mm == "0.0?"] <- NA
> 1   30 7 2008  9:00:00 NA
> 2   30 7 2008 10:00:00 NA
> 3   30 7 2008 11:00:00 NA
> 4   30 7 2008 12:00:00 NA
> 5   30 7 2008 13:00:00 NA
> 6   30 7 2008 14:00:00 NA
> 7   30 7 2008 15:00:00 NA
> 8   30 7 2008 16:00:00 NA
> 9   30 7 2008 17:00:00 NA
> 10  30 7 2008 18:00:00 NA
>
> Thank you so much.
>
>
> On Wed, Jul 13, 2016 at 9:42 AM, David Winsemius 
> wrote:
>
>>
>> > On Jul 12, 2016, at 3:45 PM, roslinazairimah zakaria <
>> roslina...@gmail.com> wrote:
>> >
>> > Dear R-users,
>> >
>> > I have these data:
>> >
>> > head(balok, 10); tail(balok, 10)
>> >Date Time Rain.mm
>> > 1  30/7/2008  9:00:00   0
>> > 2  30/7/2008 10:00:00   0
>> > 3  30/7/2008 11:00:00   0
>> > 4  30/7/2008 12:00:00   0
>> > 5  30/7/2008 13:00:00   0
>> > 6  30/7/2008 14:00:00   0
>> > 7  30/7/2008 15:00:00   0
>> > 8  30/7/2008 16:00:00   0
>> > 9  30/7/2008 17:00:00   0
>> > 10 30/7/2008 18:00:00   0
>> >   Date Time Rain.mm
>> > 63667 4/11/2015  3:00:00   0
>> > 63668 4/11/2015  4:00:00   0
>> > 63669 4/11/2015  5:00:00   0
>> > 63670 4/11/2015  6:00:00   0
>> > 63671 4/11/2015  7:00:00   0
>> > 63672 4/11/2015  8:00:00   0
>> > 63673 4/11/2015  9:00:00 0.1
>> > 63674 4/11/2015 10:00:00 0.1
>> > 63675 4/11/2015 11:00:00 0.1
>> > 63676 4/11/2015 12:00:000.1?
>> >
>> >> str(balok)
>> > 'data.frame':   63676 obs. of  3 variables:
>> > $ Date   : Factor w/ 2654 levels "1/1/2009","1/1/2010",..: 2056 2056 2056
>> > 2056 2056 2056 2056 2056 2056 2056 ...
>> > $ Time   : Factor w/ 24 levels "1:00:00","10:00:00",..: 24 2 3 4 5 6 7 8
>> 9
>> > 10 ...
>> > $ Rain.mm: Factor w/ 352 levels "0","0.0?","0.1",..: 1 1 1 1 1 1 1 1 1 1
>>
>> Thar's your problem:
>>
>>   Rain.mm: Factor w/ 352 levels "0","0.0?","0.1"
>>
>> Need to use the standard fix for the screwed-up-factor-on-input-problem
>>
>>   balok$Rain.mm2 <- as.numeric( as.character(balok$Rain.mm) )
>>
>> Cannot just do as.numeric because factors are actually already numeric.
>>
>> --
>> David.
>>
>>
>> > ...
>> >
>> > and I have change the data as follows:
>> >
>> > realdate <- as.Date(balok$Date,format="%d/%m/%Y")
>> > dfdate <- data.frame(date=realdate)
>> > year=as.numeric (format(realdate,"%Y"))
>> > month=as.numeric (format(realdate,"%m"))
>> > day=as.numeric (format(realdate,"%d"))
>> >
>> > balok2 <-cbind(dfdate,day,month,year,balok[,2:3])
>> > colnames(balok2)
>> > head(balok2)
>> >date day month year Time Rain.mm
>> > 1 2008-07-30  30 7 2008  9:00:00   0
>> > 2 2008-07-30  30 7 2008 10:00:00   0
>> > 3 2008-07-30  30 7 2008 11:00:00   0
>> > 4 2008-07-30  30 7 2008 12:00:00   0
>> > 5 2008-07-30  30 7 2008 13:00:00   0
>> > 6 2008-07-30  30 7 2008 14:00:00   0
>> > ...
>> >
>> >> balok3 <- balok2[,-1]; head(balok3, n=100)
>> >day month year Time Rain.mm
>> > 130 7 2008  9:00:00   0
>> > 230 7 2008 10:00:00   0
>> > 330 7 2008 11:00:00   0
>> > 430 7 2008 12:00:00   0
>> > 530 7 2008 13:00:00   0
>> > 630 7 2008 14:00:00   0
>> > 730 7 2008 15:00:00   0
>> > 830 7 2008 16:00:00   0
>> > 930 7 2008 17:00:00   0
>> > 10   30 7 2008 18:00:00   0
>> > 11   30 7 2008 19:00:00   0
>> > 12   30 7 2008 20:00:00   0
>> > 13   30 7 2008 21:00:00   0
>> > 14   30 7 2008 22:00:00   0
>> > 15   30 7 2008 23:00:00   0
>> > 16   30 7 2008 24:00:00   0
>> > 17   31 7 2008  1:00:00   0
>> > 18   31 7 2008  2:00:00   0
>> > 19   31 7 2008  3:00:00   0
>> > 20   31 7 2008  4:00:00   0
>> > 21   31 7 2008  5:00:00   0
>> > 22   31 7 2008  6:00:00   0
>> > 23   31 7 2008  7:00:00   0
>> > 24   31 7 2008  8:00:00   0
>> > 25   31 7 2008  9:00:00   0
>> > 26   31 7 2008 10:00:00   0
>> > 27   31

Re: [R] Aggregate rainfall data

2016-07-13 Thread roslinazairimah zakaria

Dear David,

I got your point.  How do I remove the data that contain "0.0?".

I tried : balok <- cbind(balok3[,-5], balok3$Rain.mm[balok3$Rain.mm==0.0?]
<- NA)

However, the whole Rain.mm column becomes NA.

   day month year Time balok3$Rain.mm[balok3$Rain.mm == "0.0?"] <- NA
1   30 7 2008  9:00:00 NA
2   30 7 2008 10:00:00 NA
3   30 7 2008 11:00:00 NA
4   30 7 2008 12:00:00 NA
5   30 7 2008 13:00:00 NA
6   30 7 2008 14:00:00 NA
7   30 7 2008 15:00:00 NA
8   30 7 2008 16:00:00 NA
9   30 7 2008 17:00:00 NA
10  30 7 2008 18:00:00 NA
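
A likely explanation, sketched on made-up data: an assignment used as an argument evaluates to its right-hand side, so cbind() only receives a single NA and recycles it down the column. The names rain_df, tmp, all_na and fixed are only for illustration.

```r
rain_df <- data.frame(day = c(30, 30, 31),
                      Rain.mm = c("0", "0.0?", "0.1"),
                      stringsAsFactors = TRUE)

# The value of `x[i] <- NA` is NA itself, so the new column is all NA.
tmp <- rain_df
all_na <- cbind(tmp["day"],
                Rain.mm = (tmp$Rain.mm[tmp$Rain.mm == "0.0?"] <- NA))

# Doing the replacement as its own statement changes only the flagged row.
fixed <- rain_df
fixed$Rain.mm[fixed$Rain.mm == "0.0?"] <- NA
fixed$Rain.mm <- as.numeric(as.character(fixed$Rain.mm))
fixed
```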

Thank you so much.


On Wed, Jul 13, 2016 at 9:42 AM, David Winsemius 
wrote:

>
> > On Jul 12, 2016, at 3:45 PM, roslinazairimah zakaria <
> roslina...@gmail.com> wrote:
> >
> > Dear R-users,
> >
> > I have these data:
> >
> > head(balok, 10); tail(balok, 10)
> >Date Time Rain.mm
> > 1  30/7/2008  9:00:00   0
> > 2  30/7/2008 10:00:00   0
> > 3  30/7/2008 11:00:00   0
> > 4  30/7/2008 12:00:00   0
> > 5  30/7/2008 13:00:00   0
> > 6  30/7/2008 14:00:00   0
> > 7  30/7/2008 15:00:00   0
> > 8  30/7/2008 16:00:00   0
> > 9  30/7/2008 17:00:00   0
> > 10 30/7/2008 18:00:00   0
> >   Date Time Rain.mm
> > 63667 4/11/2015  3:00:00   0
> > 63668 4/11/2015  4:00:00   0
> > 63669 4/11/2015  5:00:00   0
> > 63670 4/11/2015  6:00:00   0
> > 63671 4/11/2015  7:00:00   0
> > 63672 4/11/2015  8:00:00   0
> > 63673 4/11/2015  9:00:00 0.1
> > 63674 4/11/2015 10:00:00 0.1
> > 63675 4/11/2015 11:00:00 0.1
> > 63676 4/11/2015 12:00:000.1?
> >
> >> str(balok)
> > 'data.frame':   63676 obs. of  3 variables:
> > $ Date   : Factor w/ 2654 levels "1/1/2009","1/1/2010",..: 2056 2056 2056
> > 2056 2056 2056 2056 2056 2056 2056 ...
> > $ Time   : Factor w/ 24 levels "1:00:00","10:00:00",..: 24 2 3 4 5 6 7 8
> 9
> > 10 ...
> > $ Rain.mm: Factor w/ 352 levels "0","0.0?","0.1",..: 1 1 1 1 1 1 1 1 1 1
>
> Thar's your problem:
>
>   Rain.mm: Factor w/ 352 levels "0","0.0?","0.1"
>
> Need to use the standard fix for the screwed-up-factor-on-input-problem
>
>   balok$Rain.mm2 <- as.numeric( as.character(balok$Rain.mm) )
>
> Cannot just do as.numeric because factors are actually already numeric.
>
> --
> David.
>
>
> > ...
> >
> > and I have change the data as follows:
> >
> > realdate <- as.Date(balok$Date,format="%d/%m/%Y")
> > dfdate <- data.frame(date=realdate)
> > year=as.numeric (format(realdate,"%Y"))
> > month=as.numeric (format(realdate,"%m"))
> > day=as.numeric (format(realdate,"%d"))
> >
> > balok2 <-cbind(dfdate,day,month,year,balok[,2:3])
> > colnames(balok2)
> > head(balok2)
> >date day month year Time Rain.mm
> > 1 2008-07-30  30 7 2008  9:00:00   0
> > 2 2008-07-30  30 7 2008 10:00:00   0
> > 3 2008-07-30  30 7 2008 11:00:00   0
> > 4 2008-07-30  30 7 2008 12:00:00   0
> > 5 2008-07-30  30 7 2008 13:00:00   0
> > 6 2008-07-30  30 7 2008 14:00:00   0
> > ...
> >
> >> balok3 <- balok2[,-1]; head(balok3, n=100)
> >day month year Time Rain.mm
> > 130 7 2008  9:00:00   0
> > 230 7 2008 10:00:00   0
> > 330 7 2008 11:00:00   0
> > 430 7 2008 12:00:00   0
> > 530 7 2008 13:00:00   0
> > 630 7 2008 14:00:00   0
> > 730 7 2008 15:00:00   0
> > 830 7 2008 16:00:00   0
> > 930 7 2008 17:00:00   0
> > 10   30 7 2008 18:00:00   0
> > 11   30 7 2008 19:00:00   0
> > 12   30 7 2008 20:00:00   0
> > 13   30 7 2008 21:00:00   0
> > 14   30 7 2008 22:00:00   0
> > 15   30 7 2008 23:00:00   0
> > 16   30 7 2008 24:00:00   0
> > 17   31 7 2008  1:00:00   0
> > 18   31 7 2008  2:00:00   0
> > 19   31 7 2008  3:00:00   0
> > 20   31 7 2008  4:00:00   0
> > 21   31 7 2008  5:00:00   0
> > 22   31 7 2008  6:00:00   0
> > 23   31 7 2008  7:00:00   0
> > 24   31 7 2008  8:00:00   0
> > 25   31 7 2008  9:00:00   0
> > 26   31 7 2008 10:00:00   0
> > 27   31 7 2008 11:00:00   0
> > 28   31 7 2008 12:00:00   0
> > 29   31 7 2008 13:00:00   0
> > 30   31 7 2008 14:00:00   0
> > 31   31 7 2008 15:00:00   0
> > 32   31 7 2008 16:00:00   0
> > 33   31 7 2008 17:00:00   0
> > 34   31 7 2008 18:00:00   0
> > 35   31 7 2008 19:00:00   0
> > 36   31

Re: [R] Aggregate rainfall data

2016-07-12 Thread David Winsemius


> On Jul 12, 2016, at 3:45 PM, roslinazairimah zakaria  
> wrote:
> 
> Dear R-users,
> 
> I have these data:
> 
> head(balok, 10); tail(balok, 10)
>         Date     Time Rain.mm
> 1  30/7/2008  9:00:00       0
> 2  30/7/2008 10:00:00       0
> 3  30/7/2008 11:00:00       0
> 4  30/7/2008 12:00:00       0
> 5  30/7/2008 13:00:00       0
> 6  30/7/2008 14:00:00       0
> 7  30/7/2008 15:00:00       0
> 8  30/7/2008 16:00:00       0
> 9  30/7/2008 17:00:00       0
> 10 30/7/2008 18:00:00       0
>            Date     Time Rain.mm
> 63667 4/11/2015  3:00:00       0
> 63668 4/11/2015  4:00:00       0
> 63669 4/11/2015  5:00:00       0
> 63670 4/11/2015  6:00:00       0
> 63671 4/11/2015  7:00:00       0
> 63672 4/11/2015  8:00:00       0
> 63673 4/11/2015  9:00:00     0.1
> 63674 4/11/2015 10:00:00     0.1
> 63675 4/11/2015 11:00:00     0.1
> 63676 4/11/2015 12:00:00    0.1?
> 
>> str(balok)
> 'data.frame':   63676 obs. of  3 variables:
> $ Date   : Factor w/ 2654 levels "1/1/2009","1/1/2010",..: 2056 2056 2056
> 2056 2056 2056 2056 2056 2056 2056 ...
> $ Time   : Factor w/ 24 levels "1:00:00","10:00:00",..: 24 2 3 4 5 6 7 8 9
> 10 ...
> $ Rain.mm: Factor w/ 352 levels "0","0.0?","0.1",..: 1 1 1 1 1 1 1 1 1 1

Thar's your problem:

  Rain.mm: Factor w/ 352 levels "0","0.0?","0.1"

Need to use the standard fix for the screwed-up-factor-on-input-problem

  balok$Rain.mm2 <- as.numeric( as.character(balok$Rain.mm) )

Cannot just do as.numeric because factors are actually already numeric.
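
To illustrate the difference on a made-up factor (a sketch, not the poster's data): as.numeric() on a factor returns the internal level codes, while going through as.character() first recovers the printed values and turns non-numeric labels into NA.

```r
testfac <- factor(c("0", "0.1", "0.0?", "2.5"))

# Internal integer codes: positions in levels(testfac), not rainfall amounts.
codes <- as.numeric(testfac)

# Printed labels converted to numbers; "0.0?" cannot be parsed and becomes NA.
# suppressWarnings() only hides the "NAs introduced by coercion" warning.
values <- suppressWarnings(as.numeric(as.character(testfac)))

codes
values
```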

-- 
David.


> ...
> 
> and I have change the data as follows:
> 
> realdate <- as.Date(balok$Date,format="%d/%m/%Y")
> dfdate <- data.frame(date=realdate)
> year=as.numeric (format(realdate,"%Y"))
> month=as.numeric (format(realdate,"%m"))
> day=as.numeric (format(realdate,"%d"))
> 
> balok2 <-cbind(dfdate,day,month,year,balok[,2:3])
> colnames(balok2)
> head(balok2)
>         date day month year     Time Rain.mm
> 1 2008-07-30  30     7 2008  9:00:00       0
> 2 2008-07-30  30     7 2008 10:00:00       0
> 3 2008-07-30  30     7 2008 11:00:00       0
> 4 2008-07-30  30     7 2008 12:00:00       0
> 5 2008-07-30  30     7 2008 13:00:00       0
> 6 2008-07-30  30     7 2008 14:00:00       0
> ...
> 
>> balok3 <- balok2[,-1]; head(balok3, n=100)
>     day month year     Time Rain.mm
> 1    30     7 2008  9:00:00       0
> 2    30     7 2008 10:00:00       0
> 3    30     7 2008 11:00:00       0
> 4    30     7 2008 12:00:00       0
> 5    30     7 2008 13:00:00       0
> 6    30     7 2008 14:00:00       0
> 7    30     7 2008 15:00:00       0
> 8    30     7 2008 16:00:00       0
> 9    30     7 2008 17:00:00       0
> 10   30     7 2008 18:00:00       0
> 11   30     7 2008 19:00:00       0
> 12   30     7 2008 20:00:00       0
> 13   30     7 2008 21:00:00       0
> 14   30     7 2008 22:00:00       0
> 15   30     7 2008 23:00:00       0
> 16   30     7 2008 24:00:00       0
> 17   31     7 2008  1:00:00       0
> 18   31     7 2008  2:00:00       0
> 19   31     7 2008  3:00:00       0
> 20   31     7 2008  4:00:00       0
> 21   31     7 2008  5:00:00       0
> 22   31     7 2008  6:00:00       0
> 23   31     7 2008  7:00:00       0
> 24   31     7 2008  8:00:00       0
> 25   31     7 2008  9:00:00       0
> 26   31     7 2008 10:00:00       0
> 27   31     7 2008 11:00:00       0
> 28   31     7 2008 12:00:00       0
> 29   31     7 2008 13:00:00       0
> 30   31     7 2008 14:00:00       0
> 31   31     7 2008 15:00:00       0
> 32   31     7 2008 16:00:00       0
> 33   31     7 2008 17:00:00       0
> 34   31     7 2008 18:00:00       0
> 35   31     7 2008 19:00:00       0
> 36   31     7 2008 20:00:00       0
> 37   31     7 2008 21:00:00       0
> 38   31     7 2008 22:00:00       0
> 39   31     7 2008 23:00:00       0
> 40   31     7 2008 24:00:00       0
> 41    1     8 2008  1:00:00       0
> 42    1     8 2008  2:00:00       0
> 43    1     8 2008  3:00:00       0
> 44    1     8 2008  4:00:00       0
> 45    1     8 2008  5:00:00       0
> 46    1     8 2008  6:00:00       0
> 47    1     8 2008  7:00:00       0
> 48    1     8 2008  8:00:00       0
> 49    1     8 2008  9:00:00       0
> 50    1     8 2008 10:00:00       0
> 51    1     8 2008 11:00:00       0
> 52    1     8 2008 12:00:00       0
> 53    1     8 2008 13:00:00       0
> 54    1     8 2008 14:00:00       0
> 55    1     8 2008 15:00:00       0
> 56    1     8 2008 16:00:00       0
> 57    1     8 2008 17:00:00       0
> 58    1     8 2008 18:00:00       0
> 59    1     8 2008 19:00:00       0
> 60    1     8 2008 20:00:00       0
> 61    1     8 2008 21:00:00       0
> 62    1     8 2008 22:00:00       0
> 63    1     8 2008 23:00:00       0
> 64    1     8 2008 24:00:00       0
> 65    2     8 2008  1:00:00       0
> 66    2     8 2008  2:00:00       0
> 67    2     8 2008  3:00:00       0
> 68    2     8 2008  4:00:00       0
> 69    2     8 2008  5:00:00       0
> 70    2     8 2008  6:00:00       0
> 71    2     8

Re: [R] Aggregate FIPS data to State and Census divisions

2016-05-01 Thread David Winsemius


> On May 1, 2016, at 9:30 AM, Miluji Sb  wrote:
> 
> Dear Dennis,
> 
> Thank you for your reply. I can use the dplyr/data.table packages to
> aggregate - its the matching FIPS codes to their states that I am having
> trouble. Thanks again.

So post some example code that demonstrates you paid attention to the answer 
given. Neither dplyr nor data.table is limited to aggregation; both offer several 
kinds of matching, as does the base function `merge`. We do not yet know 
what sort of efforts you have made. r-help@r-project.org is not an online 
code-writing service.
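
For readers following along, a rough sketch of the kind of matching meant here, using base `merge`. The lookup table is hypothetical and deliberately tiny (a real one would list every state); it relies on the first two digits of a five-digit county FIPS code being the state code.

```r
# One column of the posted data (five Alabama counties).
pop <- data.frame(
  FIPS = c("01001", "01003", "01005", "01007", "01009"),
  death_2050A1 = c(18.19158, 101.63088, 13.18896, 10.30068, 131.91798),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table from state FIPS code to state and Census division.
lookup <- data.frame(
  state_fips = c("01", "02"),
  state      = c("Alabama", "Alaska"),
  division   = c("East South Central", "Pacific"),
  stringsAsFactors = FALSE
)

# Extract the state part of the county code, then match and aggregate.
pop$state_fips <- substr(pop$FIPS, 1, 2)
merged   <- merge(pop, lookup, by = "state_fips")
by_state <- aggregate(death_2050A1 ~ state + division, data = merged, FUN = sum)
by_state
```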

-- 
David
> 
> Sincerely,
> 
> Milu
> 
> On Sun, May 1, 2016 at 6:20 PM, Dennis Murphy  wrote:
> 
>> Hi:
>> 
>> Several such packages exist. Given the size of your data, it's likely
>> that the dplyr and data.table packages would be worth investigating.
>> Both are well documented.
>> 
>> Dennis
>> 
>> On Sun, May 1, 2016 at 8:30 AM, Miluji Sb  wrote:
>>> Dear all,
>>> 
>>> I have the following data by US FIPS code. Is there a package to
>> aggregate
>>> the data by State and Census divisions?
>>> 
>>> temp <- dput(head(pop1,5))
>>> structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009"
>>> ), death_2050A1 = c(18.19158, 101.63088, 13.18896, 10.30068,
>>> 131.91798), death_2050A2 = c(22.16349, 116.58387, 15.85324, 12.78564,
>>> 155.20506), death_2050B1 = c(21.38906, 76.23018, 21.38218, 17.14269,
>>> 151.64466), death_2050B2 = c(23.43543, 81.39378, 22.96802, 18.76926,
>>> 161.86404), death_2050BC = c(21.89947, 93.88002, 18.60352, 15.1032,
>>> 152.43414)), .Names = c("FIPS", "death_2050A1", "death_2050A2",
>>> "death_2050B1", "death_2050B2", "death_2050BC"), row.names = c(NA,
>>> 5L), class = "data.frame")
>>> 
>>> Thank you!
>>> 
>>> Sincerely,
>>> 
>>> Milu
>>> 
>> 
> 

David Winsemius
Alameda, CA, USA


Re: [R] Aggregate FIPS data to State and Census divisions

2016-05-01 Thread Miluji Sb

Dear Dennis,

Thank you for your reply. I can use the dplyr/data.table packages to
aggregate; it is matching the FIPS codes to their states that I am having
trouble with. Thanks again.

Sincerely,

Milu

On Sun, May 1, 2016 at 6:20 PM, Dennis Murphy  wrote:

> Hi:
>
> Several such packages exist. Given the size of your data, it's likely
> that the dplyr and data.table packages would be worth investigating.
> Both are well documented.
>
> Dennis
>
> On Sun, May 1, 2016 at 8:30 AM, Miluji Sb  wrote:
> > Dear all,
> >
> > I have the following data by US FIPS code. Is there a package to
> aggregate
> > the data by State and Census divisions?
> >
> > temp <- dput(head(pop1,5))
> > structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009"
> > ), death_2050A1 = c(18.19158, 101.63088, 13.18896, 10.30068,
> > 131.91798), death_2050A2 = c(22.16349, 116.58387, 15.85324, 12.78564,
> > 155.20506), death_2050B1 = c(21.38906, 76.23018, 21.38218, 17.14269,
> > 151.64466), death_2050B2 = c(23.43543, 81.39378, 22.96802, 18.76926,
> > 161.86404), death_2050BC = c(21.89947, 93.88002, 18.60352, 15.1032,
> > 152.43414)), .Names = c("FIPS", "death_2050A1", "death_2050A2",
> > "death_2050B1", "death_2050B2", "death_2050BC"), row.names = c(NA,
> > 5L), class = "data.frame")
> >
> > Thank you!
> >
> > Sincerely,
> >
> > Milu

Re: [R] aggregate combination data

2016-04-15 Thread ruipbarradas

Hello,

I'm cc'ing R-Help.

Sorry but your question was asked 3.5 years ago, I really don't  
remember it. Can you please post a question to R-Help, with a  
reproducible example that describes your problem?

Rui Barradas
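
For readers with the same question as in the quoted message below, here is a hedged sketch of one way to add a combination ID column. It assumes data shaped like `dat` in the 2012 code (columns `plot` and `d`); the small data frame and the column name `comb` are made up for illustration and are not from the original thread.

```r
# Hypothetical stand-in for the data used in the 2012 thread
dat <- data.frame(plot = c(1, 1, 2, 3, 3),
                  d = c(15, 27.5, 14, 10.5, 12.25))

uplot <- unique(dat$plot)
cmb <- combn(uplot, 2)   # one column per pair of plots

# One data frame per combination, each tagged with its ID (C1, C2, ...)
result <- lapply(seq_len(ncol(cmb)), function(i) {
  out <- dat[dat$plot %in% cmb[, i], ]
  out$comb <- paste0("C", i)
  out
})

# Stack everything into a single data frame; the ID column survives rbind
combined <- do.call(rbind, result)
```

Because the ID is stored in a column, the list index printed by R (the [[1]], [[2]], ... headers) is no longer needed once the pieces are stacked.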
 

Quoting catalin roibu:

> Dear Rui,
>  
> You helped me some time ago with some code for aggregating data over
> combinations of values. I have partly solved the problem, but I have one
> remaining question. The combinations give me a number of data frames. For
> each combination, I want to insert a column with the combination ID (C1
> for the first iteration up to Cn for the last one). Is it possible to
> do that?
>  
> Thank you very much!
>  
> best regards!
>  
> Catalin
>   On 15 November 2012 at 13:29, Rui Barradas  wrote:
>> Hello,
>>
>> Sorry but now I don't understand, what you are saying is that you  
>> want all the results in just one df?
>>
>> all <- do.call(rbind, result)
>>
>> But this creates just one very large df.
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 15-11-2012 10:42, catalin roibu wrote:
>>> Hello again,
>>> I solved that problem.
>>> But I have another one. I want my result in this form:
>>> plot d
>>> 1 15.00
>>> 1 27.50
>>> 1 10.50
>>> 1 12.25
>>> 2 14.00
>>> 2 32.50
>>> …
>>> 99 32.00
>>> 99 42.00
>>> 100 57.00
>>> 100 16.00
>>> 100 8.00
>>> 100 56.00
>>>
>>> that is, the final values of d for all possible combinations.
>>>
>>> Thank you very much!
>>>
>>> On 15 November 2012 12:10, catalin roibu  wrote:
>>>  
 Hello again,
 When I want to show all 100C3 combinations, I get this message:
   [ reached getOption("max.print") -- omitted 2498631 rows ]
 What can I do?

 On 14 November 2012 19:28, Rui Barradas  wrote:
  
> Simple
> Just use unlist(result).
>
> Hope this helps,
>
> Rui Barradas
> On 14-11-2012 17:13, catalin roibu wrote:
>
>> hello again,
>> It's ok now, but I have a little problem. I want to remove the
>> combination numbers (1 to 4950) so that all the data are continuous.
>> Thank you!
>> *[[4271]]*
>>
>>       plot     d
>> 218   74 11.50
>> 219   74 12.00
>> 220   74 10.50
>> 221   74 80.75
>> 251   87 15.25
>> 252   87 93.50
>> 253   87 14.50
>> 254   87 83.75
>> 255   87  9.75
>> 256   87 95.00
>>
>> *[[4272]]*
>>
>>       plot     d
>> 218   74 11.50
>> 219   74 12.00
>> 220   74 10.50
>> 221   74 80.75
>> 257   88 13.50
>> 258   88 16.25
>> 259   88  8.50
>> 260   88  8.50
>>
>> On 14 November 2012 18:56, catalin roibu  wrote:
>>
>>   thank you very much!
>>> On 14 November 2012 18:47, Rui Barradas  wrote:
>>>
>>>   Hello,
 Ok, I think this is it.

 fun <- function(x, k){
       n <- length(x)
       cmb <- combn(n, k)
       apply(cmb, 2, function(j) x[j])
 }

 fun2 <- function(x, p){
       idx <- x[["plot"]] %in% p
       x[idx, ]
 }

 uplot <- unique(dat$plot)
 plots <- fun(uplot, 2)
 apply(plots, 2, function(p) fun2(dat, p))

 There's a total of 4560 df's returned.

 Rui Barradas
 On 14-11-2012 14:22, catalin roibu wrote:

    Hello again,
  
> I want all d values for all possible combinations, 100C2 (all d values
> for plot 1 with all d values in plot 2 ... all d values from plot 1
> with all d values from plot 100, ... all d values from plot 99 with all d
> values from plot 100). Total 4950 values
> structure(list(plot = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L,
> 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 9L, 9L,
> 10L, 10L, 10L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L,
> 13L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L,
> 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 19L,
> 20L, 20L, 20L, 21L, 22L, 22L, 22L, 23L, 23L, 23L, 23L, 23L, 24L,
> 24L, 25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 27L, 27L, 27L,
> 27L, 27L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 30L,
> 32L, 32L, 32L, 32L, 33L, 34L, 34L, 34L, 35L, 36L, 36L, 36L, 36L,
> 37L, 37L, 37L, 38L, 38L, 38L, 38L, 38L, 38L, 39L, 39L, 39L, 39L,
> 39L, 39L, 40L, 40L, 40L, 41L, 41L, 42L, 42L, 42L, 42L, 42L, 42L,
> 42L, 43L, 44L, 44L, 44L, 45L, 45L, 46L, 46L, 47L, 48L, 48L, 48L,
> 49L, 50L, 50L, 50L, 50L, 50L, 50L, 51L, 51L, 52L, 52L, 53L, 53L,
> 53L, 54L, 54L, 54L, 54L, 55L, 56L, 56L, 57L, 57L, 57L, 58L, 58L,
> 58L, 59L, 60L, 60L, 60L, 61L, 61L, 62L, 62L, 63L, 63L, 64L, 64L,
> 64L, 65L, 65L, 66L, 66L, 66L,

Re: [R] aggregate and the $ operator

2016-01-22 Thread Ed Siefker

So that's how that works!  Thanks.

On Fri, Jan 22, 2016 at 1:32 PM, Joe Ceradini  wrote:
> Does this do what you want?
>
> aggregate(Nuclei ~ Slide, example, sum)
>
> On Fri, Jan 22, 2016 at 12:20 PM, Ed Siefker  wrote:
>>
>> Aggregate does the right thing with column names when passing it
>> numerical coordinates.
>> Given a dataframe like this:
>>
>>   Nuclei Positive Nuclei Slide
>> 1    133              96    A1
>> 2     96              70    A1
>> 3     62              52    A2
>> 4     60              50    A2
>>
>> I can call 'aggregate' like this:
>>
>> > aggregate(example[1], by=example[3], sum)
>>   Slide Nuclei
>> 1    A1    229
>> 2    A2    122
>>
>> But that means I have to keep track of which column is which number.
>> If I try it the easy way, it doesn't keep track of column names and it
>> forces me to coerce the 'by' to a list.
>>
>> > aggregate(example$Nuclei, by=list(example$Slide), sum)
>>   Group.1   x
>> 1      A1 229
>> 2      A2 122
>>
>> Is there a better way to do this?  Thanks
>> -Ed
>
>
>
>
> --
> Cooperative Fish and Wildlife Research Unit
> Zoology and Physiology Dept.
> University of Wyoming
> joecerad...@gmail.com / 914.707.8506
> wyocoopunit.org
>


Re: [R] aggregate and the $ operator

2016-01-22 Thread William Dunlap via R-help

Using column names where you used column numbers would work:

example <- data.frame(
    check.names = FALSE,
    Nuclei = c(133L, 96L, 62L, 60L),
    `Positive Nuclei` = c(96L, 70L, 52L, 50L),
    Slide = factor(c("A1", "A1", "A2", "A2"), levels = c("A1", "A2")))
aggregate(example["Nuclei"], by=example["Slide"], sum)
#  Slide Nuclei
#1    A1    229
#2    A2    122
aggregate(example[1], by=example[3], sum)
#  Slide Nuclei
#1    A1    229
#2    A2    122

Many people find that the functions in the dplyr or plyr packages
are worth the trouble to learn about.
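
The difference between the two calls in the original question comes down to what `$` returns: `example$Nuclei` is a bare vector with no name attached, while `example["Nuclei"]` is a one-column data frame that carries its name along. A small sketch of this, which also shows that naming the list elements is enough to keep the names when `$` is used (the object names `unnamed` and `named` are only for illustration):

```r
example <- data.frame(
  Nuclei = c(133L, 96L, 62L, 60L),
  `Positive Nuclei` = c(96L, 70L, 52L, 50L),
  Slide = factor(c("A1", "A1", "A2", "A2")),
  check.names = FALSE
)

# Bare vectors carry no names, so aggregate() falls back to "Group.1" and "x"
unnamed <- aggregate(example$Nuclei, by = list(example$Slide), sum)

# Naming the list elements restores meaningful column names
named <- aggregate(list(Nuclei = example$Nuclei),
                   by = list(Slide = example$Slide), sum)

names(unnamed)
names(named)
```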


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jan 22, 2016 at 11:20 AM, Ed Siefker  wrote:

> Aggregate does the right thing with column names when passing it
> numerical coordinates.
> Given a dataframe like this:
>
>   Nuclei Positive Nuclei Slide
> 1    133              96    A1
> 2     96              70    A1
> 3     62              52    A2
> 4     60              50    A2
>
> I can call 'aggregate' like this:
>
> > aggregate(example[1], by=example[3], sum)
>   Slide Nuclei
> 1    A1    229
> 2    A2    122
>
> But that means I have to keep track of which column is which number.
> If I try it the easy way, it doesn't keep track of column names and it
> forces me to coerce the 'by' to a list.
>
> > aggregate(example$Nuclei, by=list(example$Slide), sum)
>   Group.1   x
> 1      A1 229
> 2      A2 122
>
> Is there a better way to do this?  Thanks
> -Ed


Re: [R] aggregate and the $ operator

2016-01-22 Thread Joe Ceradini

Does this do what you want?

aggregate(Nuclei ~ Slide, example, sum)
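
The formula interface also extends to several columns at once by wrapping them in cbind() on the left-hand side. The sketch below rebuilds the example data from the question (the name `both` is only for illustration); the backticks are needed because "Positive Nuclei" contains a space.

```r
example <- data.frame(
  Nuclei = c(133L, 96L, 62L, 60L),
  `Positive Nuclei` = c(96L, 70L, 52L, 50L),
  Slide = factor(c("A1", "A1", "A2", "A2")),
  check.names = FALSE
)

# Sum both count columns within each slide in a single call
both <- aggregate(cbind(Nuclei, `Positive Nuclei`) ~ Slide,
                  data = example, FUN = sum)
both
```

Note that the formula method drops rows with missing values by default (na.action = na.omit), which does not matter for this complete data set but can change results when NAs are present.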

On Fri, Jan 22, 2016 at 12:20 PM, Ed Siefker  wrote:

> Aggregate does the right thing with column names when passing it
> numerical coordinates.
> Given a dataframe like this:
>
>   Nuclei Positive Nuclei Slide
> 1    133              96    A1
> 2     96              70    A1
> 3     62              52    A2
> 4     60              50    A2
>
> I can call 'aggregate' like this:
>
> > aggregate(example[1], by=example[3], sum)
>   Slide Nuclei
> 1    A1    229
> 2    A2    122
>
> But that means I have to keep track of which column is which number.
> If I try it the easy way, it doesn't keep track of column names and it
> forces me to coerce the 'by' to a list.
>
> > aggregate(example$Nuclei, by=list(example$Slide), sum)
>   Group.1   x
> 1      A1 229
> 2      A2 122
>
> Is there a better way to do this?  Thanks
> -Ed



-- 
Cooperative Fish and Wildlife Research Unit
Zoology and Physiology Dept.
University of Wyoming
joecerad...@gmail.com / 914.707.8506
wyocoopunit.org


Re: [R] aggregate and the $ operator

2016-01-22 Thread David Wolfskill

On Fri, Jan 22, 2016 at 01:20:59PM -0600, Ed Siefker wrote:
> Aggregate does the right thing with column names when passing it
> numerical coordinates.
> Given a dataframe like this:
>
>   Nuclei Positive Nuclei Slide
> 1    133              96    A1
> 2     96              70    A1
> 3     62              52    A2
> 4     60              50    A2
>
> I can call 'aggregate' like this:
>
> > aggregate(example[1], by=example[3], sum)
>   Slide Nuclei
> 1    A1    229
> 2    A2    122
>
> But that means I have to keep track of which column is which number.
> If I try it the easy way, it doesn't keep track of column names and it
> forces me to coerce the 'by' to a list.
>
> > aggregate(example$Nuclei, by=list(example$Slide), sum)
>   Group.1   x
> 1      A1 229
> 2      A2 122
>
> Is there a better way to do this?  Thanks
> -Ed
> 

Something like:

> aggregate(Nuclei ~ Slide, example, sum)
  Slide Nuclei
1    A1    229
2    A2    122

perhaps?

Peace,
david
-- 
David H. Wolfskill  r...@catwhisker.org
Those who would murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



Re: [R] Aggregate records to 10min

2016-01-17 Thread Fankhauser GEP Data Consulting


Hi Jim,
Thanks a lot! It works now. I didn't remember how to access the 
datetimes in w10min. names(...) is the solution!


Rolf

Jim Lemon wrote:

Hi Rolf,
If I get the above, perhaps if you change the names of w10min after 
applying the calculation:


raindata<-data.frame(value=round(runif(60,0,4),1),
ptime=paste("2016-01-17 ","15:",0:59,sep=""))
t10min <- 600*floor(as.integer(as.POSIXct(raindata$ptime))/600)
w10min <- tapply(raindata$value,t10min,sum)
names(w10min)<-format(as.POSIXct(as.numeric(names(w10min)),
 tz="AEST",origin="1970-01-01"),"%m/%d/%Y %H:%M")
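
The idea behind this approach is to group on the numeric bin start (seconds since the epoch), so that the groups sort chronologically, and to format the labels only at the end. A self-contained sketch of that idea follows. It uses tz = "GMT" throughout, as in the original question, and made-up data (one 0.1 mm tip per minute for an hour); the names `stamps` and `out` are only for illustration.

```r
# Made-up 1-minute records: one 0.1 mm tip per minute for one hour
ptime <- as.POSIXct("2016-01-17 15:00:00", tz = "GMT") + 60 * (0:59)
value <- rep(0.1, 60)

# Start of the 10-minute bin for each record, in seconds since the epoch
t10min <- 600 * floor(as.numeric(ptime) / 600)

# Grouping on the numeric bin start keeps the groups in chronological order
w10min <- tapply(value, t10min, sum)

# Convert the numeric group names back to date-times, then format for output
stamps <- as.POSIXct(as.numeric(names(w10min)), origin = "1970-01-01", tz = "GMT")
out <- data.frame(time = format(stamps, "%m/%d/%Y %H:%M"),
                  rain = as.vector(w10min),
                  stringsAsFactors = FALSE)
out
```

The resulting data frame can be passed to write.table() directly, with the rows already in time order regardless of how the label text would sort.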

Jim


On Sun, Jan 17, 2016 at 5:45 AM, Rolf Fankhauser wrote:


Hi

I would like to aggregate a rainfall series with 1min records
(timestamp and value of 0.1mm from a tipping bucket raingauge) to
10min values by summing up the values.

# ptime is a POSIXlt datetime value with tz="GMT"

t10min <- 600*floor(as.integer(as.POSIXct(data$ptime))/600)
w10min <- tapply(data$value, format(as.POSIXct(t10min, tz="GMT",
origin = "1970-01-01"), "%Y-%m-%d %H:%M"), sum)
write.table(as.matrix(w10min),"data 10min.txt", row.names=TRUE,
col.names=FALSE, quote=FALSE)

This code works but I would like to have the result in datetime
format of %m/%d/%Y %H:%M. When I output this format the records
are not chronologically sorted but text-sorted because dimnames of
w10min is of type character (because of the format function).
Is there an easier way summing up the records to 10min records?

Thanks,
Rolf






--
__

Fankhauser GEP Data Consulting
Hegenheimerstrasse 129
4055 Basel

Tel:++41-(0)61-321-4525
Mobile: ++41-(0)79-440-7706
rolf.fankhau...@gepdata.ch
www.gepdata.ch


Re: [R] aggregate counting variable factors

2015-09-17 Thread PIKAL Petr

Hi

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Kai Mx
> Sent: Wednesday, September 16, 2015 10:43 PM
> To: r-help mailing list
> Subject: [R] aggregate counting variable factors
>
> Hi everybody,
>
> From a questionnaire, I have a dataset like this one with some 40 items:
>
> df1 <- data.frame(subject=c('user1','user2', 'user3', 'user4'),
> item1=c(0,1,2,5), item2=c(1,2,1,2), item3=c(2,3,4,0), item4=c(0,3,3,2),
> item5=c(5,5,5,5))
>
> Users can choose an answer from 0 to 5 for each item.
>
> Now I want to reshape the dataset to have the items in rows and the count
> of each of the result factors in columns:
>
> result <- data.frame (item=c("item1", "item2", "item3", "item4", "item5"),
> result0=c(1,0,1,1,0), result1=c(1,2,0,0,0), result2=c(1,2,1,1,0),
> result3=c(0,0,1,2,0), result4=c(0,0,1,0,0), result5=c(1,0,0,0,4))
>
> I have been fiddling around with melt/plyr, but haven't been able to figure
> it out. What's the most elegant way to do this (preferably without typing
> in all the item names).

Perhaps,

m<-melt(df1)
m$value<-paste("res",m$value, sep="")
dcast(m, variable~value)
Aggregation function missing: defaulting to length
  variable res0 res1 res2 res3 res4 res5
1    item1    1    1    1    0    0    1
2    item2    0    2    2    0    0    0
3    item3    1    0    1    1    1    0
4    item4    1    0    1    2    0    0
5    item5    0    0    0    0    0    4
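
The same long-then-count idea can be sketched in base R, without extra packages, using stack() and table(). This is an alternative to the melt/dcast code above, not a description of it; the names `long`, `counts` and `wide` are only for illustration.

```r
df1 <- data.frame(subject = c("user1", "user2", "user3", "user4"),
                  item1 = c(0, 1, 2, 5), item2 = c(1, 2, 1, 2),
                  item3 = c(2, 3, 4, 0), item4 = c(0, 3, 3, 2),
                  item5 = c(5, 5, 5, 5))

# Long format: one row per (item, answer) pair; columns are "values" and "ind"
long <- stack(df1[-1])

# Cross-tabulate items against answers
counts <- table(item = long$ind, result = long$values)

# Optional: turn the table into a wide data frame with resultN column names
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("result", names(wide))
wide
```

An answer value that nobody chose for any item would not get a column here, because table() only sees the values that occur in the data.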

Cheers
Petr


>
> Thanks so much!
>
> Best,
>
> Kai


This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately 
accept such offer; The sender of this e-mail (offer) excludes any acceptance of 
the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an 
express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into 
any contracts on behalf of the company except for cases in which he/she is 
expressly authorized to do so in writing, and such authorization or power of 
attorney is submitted to the recipient or the person represented by the 
recipient, or the existence of such authorization is known to the recipient of 
the person represented by the recipient.

Re: [R] aggregate counting variable factors

2015-09-17 Thread Frank Schwidom

Hi

Where can I find 'melt' and 'dcast'?

Regards


On Thu, Sep 17, 2015 at 08:22:10AM +, PIKAL Petr wrote:
> Hi
> 
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Kai Mx
> > Sent: Wednesday, September 16, 2015 10:43 PM
> > To: r-help mailing list
> > Subject: [R] aggregate counting variable factors
> >
> > Hi everybody,
> >
> > From a questionnaire, I have a dataset like this one with some 40 items:
> >
> > df1 <- data.frame(subject=c('user1','user2', 'user3', 'user4'),
> > item1=c(0,1,2,5), item2=c(1,2,1,2), item3=c(2,3,4,0), item4=c(0,3,3,2),
> > item5=c(5,5,5,5))
> >
> > Users can choose an answer from 0 to 5 for each item.
> >
> > Now I want to reshape the dataset to have the items in rows and the count
> > of each of the result factors in columns:
> >
> > result <- data.frame (item=c("item1", "item2", "item3", "item4", "item5"),
> > result0=c(1,0,1,1,0), result1=c(1,2,0,0,0), result2=c(1,2,1,1,0),
> > result3=c(0,0,1,2,0), result4=c(0,0,1,0,0), result5=c(1,0,0,0,4))
> >
> > I have been fiddling around with melt/plyr, but haven't been able to figure
> > it out. What's the most elegant way to do this (preferably without typing
> > in all the item names).
> 
> Perhaps,
> 
> m<-melt(df1)
> m$value<-paste("res",m$value, sep="")
> dcast(m, variable~value)
> Aggregation function missing: defaulting to length
>   variable res0 res1 res2 res3 res4 res5
> 1    item1    1    1    1    0    0    1
> 2    item2    0    2    2    0    0    0
> 3    item3    1    0    1    1    1    0
> 4    item4    1    0    1    2    0    0
> 5    item5    0    0    0    0    0    4
> 
> Cheers
> Petr
> 
> 
> >
> > Thanks so much!
> >
> > Best,
> >
> > Kai

Re: [R] aggregate counting variable factors

2015-09-17 Thread Frank Schwidom

Hi

res <- sapply( df1[ , -1], function( x) table(x)[as.character( 0:5)])
rownames( res) <- paste( sep='', 'result', 0:5)
res[ is.na( res)] <- 0

res
        item1 item2 item3 item4 item5
result0     1     0     1     1     0
result1     1     2     0     0     0
result2     1     2     1     1     0
result3     0     0     1     2     0
result4     0     0     1     0     0
result5     1     0     0     0     4


t( res)
      result0 result1 result2 result3 result4 result5
item1       1       1       1       0       0       1
item2       0       2       2       0       0       0
item3       1       0       1       1       1       0
item4       1       0       1       2       0       0
item5       0       0       0       0       0       4
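
A possible variation on the code above, sketched here as an alternative rather than a correction: converting each column to a factor with levels 0:5 before calling table() makes table() report zero counts for unused answers, so the NA replacement step is not needed.

```r
df1 <- data.frame(subject = c("user1", "user2", "user3", "user4"),
                  item1 = c(0, 1, 2, 5), item2 = c(1, 2, 1, 2),
                  item3 = c(2, 3, 4, 0), item4 = c(0, 3, 3, 2),
                  item5 = c(5, 5, 5, 5))

# table() on a factor reports every level, including those with zero counts
res <- sapply(df1[-1], function(x) table(factor(x, levels = 0:5)))
rownames(res) <- paste0("result", 0:5)

# Items in rows, answers in columns
t(res)
```

Fixing the levels up front also guarantees a column for every possible answer, even one that no user chose for any item.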


Regards


On Wed, Sep 16, 2015 at 10:43:16PM +0200, Kai Mx wrote:
> Hi everybody,
> 
> From a questionnaire, I have a dataset like this one with some 40 items:
> 
> df1 <- data.frame(subject=c('user1','user2', 'user3', 'user4'),
> item1=c(0,1,2,5), item2=c(1,2,1,2), item3=c(2,3,4,0), item4=c(0,3,3,2),
> item5=c(5,5,5,5))
> 
> Users can choose an answer from 0 to 5 for each item.
> 
> Now I want to reshape the dataset to have the items in rows and the count
> of each of the result factors in columns:
> 
> result <- data.frame (item=c("item1", "item2", "item3", "item4", "item5"),
> result0=c(1,0,1,1,0), result1=c(1,2,0,0,0), result2=c(1,2,1,1,0),
> result3=c(0,0,1,2,0), result4=c(0,0,1,0,0), result5=c(1,0,0,0,4))
> 
> I have been fiddling around with melt/plyr, but haven't been able to figure
> it out. What's the most elegant way to do this (preferably without typing
> in all the item names).
> 
> Thanks so much!
> 
> Best,
> 
> Kai


Re: [R] aggregate counting variable factors

2015-09-17 Thread Rui Barradas


In package reshape2

Hope this helps,

Rui Barradas

On 17-09-2015 17:03, Frank Schwidom wrote:

Hi

Where can I find 'melt' and 'dcast'?

Regards


On Thu, Sep 17, 2015 at 08:22:10AM +, PIKAL Petr wrote:

Hi


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Kai Mx
Sent: Wednesday, September 16, 2015 10:43 PM
To: r-help mailing list
Subject: [R] aggregate counting variable factors

Hi everybody,

From a questionnaire, I have a dataset like this one with some 40 items:

df1 <- data.frame(subject=c('user1','user2', 'user3', 'user4'),
item1=c(0,1,2,5), item2=c(1,2,1,2), item3=c(2,3,4,0), item4=c(0,3,3,2),
item5=c(5,5,5,5))

Users can choose an answer from 0 to 5 for each item.

Now I want to reshape the dataset to have the items in rows and the count
of each of the result factors in columns:

result <- data.frame (item=c("item1", "item2", "item3", "item4", "item5"),
result0=c(1,0,1,1,0), result1=c(1,2,0,0,0), result2=c(1,2,1,1,0),
result3=c(0,0,1,2,0), result4=c(0,0,1,0,0), result5=c(1,0,0,0,4))

I have been fiddling around with melt/plyr, but haven't been able to figure
it out. What's the most elegant way to do this (preferably without typing
in all the item names).


Perhaps,

m<-melt(df1)
m$value<-paste("res",m$value, sep="")
dcast(m, variable~value)
Aggregation function missing: defaulting to length
  variable res0 res1 res2 res3 res4 res5
1    item1    1    1    1    0    0    1
2    item2    0    2    2    0    0    0
3    item3    1    0    1    1    1    0
4    item4    1    0    1    2    0    0
5    item5    0    0    0    0    0    4

Cheers
Petr




Thanks so much!

Best,

Kai

Re: [R] aggregate counting variable factors

2015-09-17 Thread Kai Mx

Thanks everybody!

On Thu, Sep 17, 2015 at 6:57 PM, Rui Barradas  wrote:

> In package reshape2
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 17-09-2015 17:03, Frank Schwidom wrote:
>
>> Hi
>>
>> Where can I find 'melt' and 'dcast'?
>>
>> Regards
>>
>>
>> On Thu, Sep 17, 2015 at 08:22:10AM +, PIKAL Petr wrote:
>>
>>> Hi
>>>
>>> -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Kai Mx
 Sent: Wednesday, September 16, 2015 10:43 PM
 To: r-help mailing list
 Subject: [R] aggregate counting variable factors

 Hi everybody,

 From a questionnaire, I have a dataset like this one with some 40 items:

 df1 <- data.frame(subject=c('user1','user2', 'user3', 'user4'),
 item1=c(0,1,2,5), item2=c(1,2,1,2), item3=c(2,3,4,0), item4=c(0,3,3,2),
 item5=c(5,5,5,5))

 Users can choose an answer from 0 to 5 for each item.

 Now I want to reshape the dataset to have the items in rows and the count
 of each of the result factors in columns:

 result <- data.frame (item=c("item1", "item2", "item3", "item4", "item5"),
 result0=c(1,0,1,1,0), result1=c(1,2,0,0,0), result2=c(1,2,1,1,0),
 result3=c(0,0,1,2,0), result4=c(0,0,1,0,0), result5=c(1,0,0,0,4))

 I have been fiddling around with melt/plyr, but haven't been able to figure
 it out. What's the most elegant way to do this (preferably without typing
 in all the item names).

>>>
>>> Perhaps,
>>>
>>> m<-melt(df1)
>>> m$value<-paste("res",m$value, sep="")
>>> dcast(m, variable~value)
>>> Aggregation function missing: defaulting to length
>>>   variable res0 res1 res2 res3 res4 res5
>>> 1    item1    1    1    1    0    0    1
>>> 2    item2    0    2    2    0    0    0
>>> 3    item3    1    0    1    1    1    0
>>> 4    item4    1    0    1    2    0    0
>>> 5    item5    0    0    0    0    0    4
>>>
>>> Cheers
>>> Petr
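
For readers without reshape2, the same item-by-answer counts can probably be obtained in base R as well. The following is only a sketch, not Petr's method: it assumes the df1 from the question and that every answer lies in 0 to 5; the names items and counts are made up for the illustration.

```r
# df1 as posted in the question
df1 <- data.frame(subject = c("user1", "user2", "user3", "user4"),
                  item1 = c(0, 1, 2, 5), item2 = c(1, 2, 1, 2),
                  item3 = c(2, 3, 4, 0), item4 = c(0, 3, 3, 2),
                  item5 = c(5, 5, 5, 5))

# Drop the subject column so only the item columns remain
items <- df1[-1]

# For each item, count how often each answer 0..5 occurs.
# factor(levels = 0:5) keeps answers that never occur as zero counts.
counts <- t(vapply(items,
                   function(x) as.integer(table(factor(x, levels = 0:5))),
                   integer(6)))
colnames(counts) <- paste0("res", 0:5)
counts
```

Because the item names are taken from the columns of df1, nothing has to be typed in by hand when there are 40 items.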
>>>
>>>
>>>
 Thanks so much!

 Best,

 Kai

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

>>>
>>> 

Re: [R] aggregate help

2014-04-23 Thread arun



Hi,
Please use ?dput() to show the datasets as one of the rows (Id four) in first 
dataset didn't show 11 elements.

df1 <- structure(list(Id = c("one", "one", "two", "two", "three", "three", 
"three", "four", "five", "five"), col1 = c("a1", NA, "b1", "b1", 
NA, NA, "c1", "d1", "e1", NA), col2 = c("a2", NA, "b2", "b2", 
"c2", NA, "c2", "D2", "e2", "e2"), col3 = c("a3", "a3", "b3", 
"b3", "c3", "c3", "c3", "d3", "E3", "e3"), col4 = c("a4", "a4", 
"B4", "b4", "c4", "c4", "c4", "d4", "e4", "E4"), col5 = c("a5", 
"a5", "b5", "b5", "C5", "c5", "c5", "d5", "e5", "e5"), col6 = c("A6", 
"a6", "b6", "B6", "c6", "c6", "C6", "d6", "e6", "e6"), col7 = c("a7", 
"A7", "b7", "b7", "c7", "c7", "c7", NA, "e7", "e7"), col8 = c("a8", 
"a8", "b8", "b8", "c8", "c8", "c8", NA, "e8", "e8"), col9 = c("a9", 
"a9", "b9", "b9", "c9", "C9", NA, "", "e9", "e9"), col10 = c(NA, 
"a10", "b10", "b10", NA, "c10", NA, "", NA, "e10")), .Names = c("Id", 
"col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", 
"col9", "col10"), class = "data.frame", row.names = c(NA, -10L
))


df2 <- structure(list(Id = c("one", "one", "two", "two", "three", "three", 
"three", "four", "five", "five"), colnew = c("A6", "A7", "B4", 
"B6", "C5", "C9", "C6", "D2", "E3", "E4")), .Names = c("Id", 
"colnew"), class = "data.frame", row.names = c(NA, -10L))

#expected result

res <- structure(list(Id = c("one", "two", "three", "four", "five"), 
    col1 = c("a1", "b1", "c1", "d1", "e1"), col2 = c("a2", "b2", 
    "c2", "D2", "e2"), col3 = c("a3", "b3", "c3", "d3", "E3"), 
    col4 = c("a4", "B4", "c4", "d4", "E4"), col5 = c("a5", "b5", 
    "C5", "d5", "e5"), col6 = c("A6", "B6", "C6", "d6", "e6"), 
    col7 = c("A7", "b7", "c7", NA, "e7"), col8 = c("a8", "b8", 
    "c8", NA, "e8"), col9 = c("a9", "b9", "C9", "", "e9"), col10 = c("a10", 
    "b10", "c10", "", "e10")), .Names = c("Id", "col1", "col2", 
"col3", "col4", "col5", "col6", "col7", "col8", "col9", "col10"
), class = "data.frame", row.names = c(NA, -5L))


##there would be simple ways to perform this operation.

res1 <- as.data.frame(t(sapply(split(df1, df1$Id), function(x) {
    x1 <- x[, -1]
    c(Id = unique(x[, 1]), apply(x1, 2, function(y) {
        y1 <- unique(y[!is.na(y)])
        y2 <- if (length(y1) == 0) NA else y1
        if (any(y2 %in% df2$colnew)) unique(toupper(y2)) else y2
    }))
})), stringsAsFactors = FALSE)
res1 <- res1[order(gsub("\\d+", "", res1$col1)), ]
row.names(res1) <- 1:nrow(res1)
identical(res, res1)
# [1] TRUE
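
Since the thread is about aggregate(), here is a hedged sketch of how the same idea might look with the data frame method of aggregate(). It is a simplified illustration, not the code above: the toy data d, the vector keys and the helper pick() are all made up, and pick() simply prefers a value listed in keys over any other non-NA value instead of using toupper().

```r
# Toy data: two Ids, NAs scattered, "A2" is a key value
d <- data.frame(Id   = c("one", "one", "two", "two"),
                col1 = c("a1", NA, "b1", "b1"),
                col2 = c("a2", "A2", NA, NA),
                stringsAsFactors = FALSE)
keys <- c("A2")  # hypothetical stand-in for df2$colnew

# Collapse one column of one group to a single value:
# NA if everything is missing, the key value if one is present,
# otherwise the first distinct non-NA value.
pick <- function(v) {
  v <- unique(v[!is.na(v)])
  if (length(v) == 0) return(NA_character_)
  hit <- v[v %in% keys]
  if (length(hit) > 0) hit[1] else v[1]
}

# The data frame method passes NAs through to FUN; only rows with
# NA in the grouping variable would be dropped.
out <- aggregate(d[-1], by = d["Id"], FUN = pick)
out
```

The formula method would not be a drop-in replacement here, because its default na.action = na.omit removes every row that contains an NA before pick() sees it.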


A.K.




I am stuck in a situation and seek urgent help!

I have a DF something like this:

Id     col1  col2  col3  col4  col5  col6  col7  col8  col9  col10
one    a1    a2    a3    a4    a5    A6    a7    a8    a9    NA
one    NA    NA    a3    a4    a5    a6    A7    a8    a9    a10
two    b1    b2    b3    B4    b5    b6    b7    b8    b9    b10
two    b1    b2    b3    b4    b5    B6    b7    b8    b9    b10
three  NA    c2    c3    c4    C5    c6    c7    c8    c9    NA
three  NA    NA    c3    c4    c5    c6    c7    c8    C9    c10
three  c1    c2    c3    c4    c5    C6    c7    c8    NA    NA
four   d1    D2    d3    d4    d5    d6    NA    NA
five   e1    e2    E3    e4    e5    e6    e7    e8    e9    NA
five   NA    e2    e3    E4    e5    e6    e7    e8    e9    e10


* Each row is different and some have NA.
* The capital letters in some cells are key values which will be useful for 
further analysis.

I have another DF which has only the key values

Id     colnew
one    A6
one    A7
two    B4
two    B6
three  C5
three  C9
three  C6
four   D2
five   E3
five   E4


Now,
I need to aggregate the first DF by Id so that each Id has a single row of 
unique entries. The output should look like this:

Id     col1  col2  col3  col4  col5  col6  col7  col8  col9  col10
one    a1    a2    a3    a4    a5    A6    A7    a8    a9    a10
two    b1    b2    b3    B4    b5    B6    b7    b8    b9    b10
three  c1    c2    c3    c4    C5    C6    c7    c8    C9    c10
four   d1    D2    d3    d4    d5    d6    NA    NA
five   e1    e2    E3    E4    e5    e6    e7    e8    e9    e10


Thanks for the help
Regards,
karthick 



Re: [R] Aggregate time series from daily to monthly by date and site

2014-04-05 Thread Rui Barradas


Hello,

Maybe the following will do.

dat <- structure(...)

aggregate(dat[5:8], dat[c(1, 2, 4)], FUN = mean)
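
To make the call above easier to follow, here is a small sketch on made-up data. The six-row dat below is a hypothetical stand-in with only two Sim columns, so the value columns are dat[5:6] rather than dat[5:8]; grouping by Year, Month and Site averages over the days within each month and site.

```r
# Hypothetical daily data: two sites, two months, a few days each
dat <- data.frame(Year   = 1971,
                  Month  = c(1, 1, 1, 1, 2, 2),
                  Day    = c(1, 2, 1, 2, 1, 1),
                  Site   = c("GGG1", "GGG1", "GGG2", "GGG2", "GGG1", "GGG2"),
                  Sim001 = c(1, 3, 10, 20, 5, 7),
                  Sim002 = c(2, 4, 0, 2, 6, 8))

# Columns 5:6 hold the values, columns 1, 2 and 4 define the groups;
# Day (column 3) is left out of the grouping, so it is averaged away.
monthly <- aggregate(dat[5:6], dat[c(1, 2, 4)], FUN = mean)
monthly
```

The result has one row per Year, Month and Site combination and keeps the Sim columns, which matches the request to keep the number of value columns unchanged.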


Hope this helps,

Rui Barradas


Re: [R] Aggregate time series from daily to monthly by date and site

2014-04-05 Thread Zilefac Elvis

Thanks, Rui.
It works great.
Atem.

Re: [R] Aggregate time series from daily to monthly by date and site

2014-04-04 Thread Jeff Newmiller

You have been around long enough that we should not have to tell you how to 
provide data in a reproducible manner... read ?dput.
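
As a rough illustration of what ?dput is for: it prints an R expression that rebuilds the object, so readers can paste it instead of retyping a printed table. The two-row data frame d below is made up for the example, and the round trip through capture.output() only simulates copying the text into an e-mail and back.

```r
# A small made-up data frame standing in for the real data
d <- data.frame(Year = c(1971, 1971),
                Site = c("GGG1", "GGG2"),
                Sim001 = c(8.58, 11.82))

# dput() writes a structure(...) call; capture it as text,
# as if it had been pasted into a message
txt <- paste(capture.output(dput(d)), collapse = "\n")
cat(txt, "\n")

# Evaluating that text on the reader's side rebuilds the data frame
d2 <- eval(parse(text = txt))
all.equal(d, d2)
```

For large data, dput(head(d, 20)) keeps the post short while still being reproducible.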
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On April 4, 2014 9:02:03 PM PDT, Zilefac Elvis zilefacel...@yahoo.com wrote:
Hi,
I have daily data arranged by date and site. Keeping the number of
columns as there are, I will like to aggregate (FUN=mean) from daily to
monthly the following data (only part is shown here) which starts in
1971 and ends in 1980.

    Year Month Day Site Sim001 Sim002 Sim003 Sim004
1   1971     1   1 GGG1   8.58 -12.67   4.45  -1.31
2   1971     1   1 GGG2  11.82  -9.94  -3.37   4.94
3   1971     1   1 GGG3   7.72 -11.94  -1.17   4.70
4   1971     1   1 GGG4   8.93 -10.81   4.66   2.88
5   1971     1   1 GGG5   9.82  -6.78  -4.19  -0.01
6   1971     1   1 GGG6  13.93  -3.39  -3.84   1.83
7   1971     1   1 GGG7  10.94  -7.58   1.74  -7.51
8   1971     1   1 GGG8   5.07 -16.09   1.26   1.12
9   1971     1   1 GGG9  11.13  -9.96  -7.06   5.25
10  1971     1   1 GG10   7.66  -8.68  -2.65   5.25
11  1971     1   1 GG11   1.06   6.14  -4.88   3.78
12  1971     1   1 GG12  14.93 -12.43  -4.06   4.94
13  1971     1   1 GG13   7.56 -10.81  -2.32   2.32
14  1971     1   1 GG14   6.18  -7.58  -1.64   9.83
15  1971     1   1 GG15  10.96  -0.62   0.56  -1.59
16  1971     1   1 GG16   4.94   1.52   0.31   6.45
17  1971     1   1 GG17   0.79   0.83  -0.35   4.26
18  1971     1   1 GG18   4.91  -3.29  -5.69   3.10
19  1971     1   1 GG19   0.68  -0.50   3.35   5.50
20  1971     1   1 GG20   4.50   1.14   4.84   6.94
21  1971     1   1 GG21   3.13   3.35   3.62   2.76
22  1971     1   1 GG22   2.91   1.10   0.77   5.10
23  1971     1   1 GG23  -2.27  -5.25  -3.05   1.95
24  1971     1   1 GG24   8.18   2.00  -0.42  15.13
25  1971     1   1 GG25   3.87  -4.09  -2.55  -9.18
26  1971     1   1 GG26   5.10   2.28   1.34   2.88
27  1971     1   1 GG27   7.23   2.46   2.89   4.28
28  1971     1   1 GG28   8.55   5.64   3.09  -5.01
29  1971     1   1 GG29   1.39   4.64   9.79  -0.27
30  1971     1   1 GG30   6.85 -12.11   4.98   1.91
31  1971     1   1 GG31   4.25  -2.21   9.59  -1.46
32  1971     1   1 GG32  -0.24 -16.54   4.99  -0.60
33  1971     1   1 GG33   9.86  -7.38  11.77  -8.99
34  1971     1   1 GG34   9.92 -16.33  13.07  -8.79
35  1971     1   1 GG35   5.11  -7.63   0.41  -3.09
36  1971     1   1 GG36   6.14 -11.61  10.38  -7.09
37  1971     1   1 GG37   8.14 -12.78  11.01  -5.20
38  1971     1   1 GG38   7.52 -12.86   3.43  -7.55
39  1971     1   1 GG39   4.19  -9.99   6.08  -4.04
40  1971     1   1 GG40   1.02   4.84   0.55  -3.80
41  1971     1   1 GG41  -2.43 -13.75   6.49 -10.66
42  1971     1   1 GG42   6.85 -12.33   2.85  -6.34
43  1971     1   1 GG43   4.94 -13.43  11.17  -3.62
44  1971     1   1 GG44   8.11 -21.13  11.32  -8.49
45  1971     1   1 GG45   7.34 -12.63  -0.89  -2.29
46  1971     1   1 GG46  10.56  -3.16  -0.48   0.38
47  1971     1   1 GG47  -6.52   1.61  10.80   5.25
48  1971     1   1 GG48   2.66  -2.36   1.86   8.60
49  1971     1   1 GG49  -4.89   5.54   6.63   5.83
50  1971     1   1 GG50   0.11   3.59   5.14   8.94
51  1971     1   1 GG51   3.90   1.23   4.13   9.86
52  1971     1   1 GG52   3.87  -0.25   8.72   4.62
53  1971     1   1 GG53   2.55  -1.49  15.01   4.33
54  1971     1   1 GG54  -0.20  -1.65   4.78  10.15
55  1971     1   1 GG55   5.09   0.90   5.56   7.87
56  1971     1   1 GG56  -2.40  -2.29   5.69   9.07
57  1971     1   1 GG57   1.32  -2.35  10.39   0.04
58  1971     1   1 GG58   3.49  -2.01   8.99   2.85
59  1971     1   1 GG59   4.93  -2.07   6.95   6.00
60  1971     1   1 GG60  -9.58   1.37  10.59   4.54
61  1971     1   1 GG61   9.08  -0.64   3.92  13.50
62  1971     1   1 GG62   2.85   4.75   3.40  12.39
63  1971     1   1 GG63   7.71   3.02   3.95  11.79
64  1971     1   1 GG64   4.50   5.44   0.87   6.29
65  1971     1   1 GG65   0.99   3.76   2.28  15.45
66  1971     1   1 GG66   8.72   5.16   1.11  15.82
67  1971     1   1 GG67  12.45   2.54   4.36  19.79
68  1971     1   1 GG68   8.83   4.11   6.21  13.12
69  1971     1   1 GG69   8.94   5.03   1.73   6.50
70  1971     1   1 GG70   5.05  -1.12   2.50  -4.63
71  1971     1   1 GG71   9.82   4.53   4.19   1.79
72  1971     1   1 GG72  11.72  -0.15   1.85  -0.80
73  1971     1   1 GG73   1.21  -4.98   8.65   1.29
74  1971     1   1 GG74   7.92   0.85   6.24   8.88
75  1971     1   1 GG75   3.45  -3.04   7.82   1.28
76  1971     1   1 GG76   1.34

Re: [R] Aggregate time series from daily to monthly by date and site

2014-04-04 Thread Zilefac Elvis

Hi,

I have daily data arranged by date and site. Keeping the number of columns as 
there are, I will like to aggregate (FUN=mean) from daily to monthly the 
following data (only part is shown here) which starts in 1971 and ends in 1980.


structure(list(Year = c(1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 
1971, 1971, 1971, 1971, 1971), Month = c(1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1), Day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), Site = c("GGG1", "GGG2", "GGG3", "GGG4", 
"GGG5", "GGG6", "GGG7", "GGG8", "GGG9", "GG10", "GG11", "GG12", 
"GG13", "GG14", "GG15", "GG16", "GG17", "GG18", "GG19", "GG20", 
"GG21", "GG22", "GG23", "GG24", "GG25", "GG26", "GG27", "GG28", 
"GG29", "GG30", "GG31", "GG32", "GG33", "GG34", "GG35", "GG36", 
"GG37", "GG38", "GG39", "GG40", "GG41", "GG42", "GG43", "GG44", 
"GG45", "GG46", "GG47", "GG48", "GG49", "GG50", "GG51", "GG52", 
"GG53", "GG54", "GG55", "GG56", "GG57", "GG58", "GG59", "GG60", 
"GG61", "GG62", "GG63", "GG64", "GG65", "GG66", "GG67", "GG68", 
"GG69", "GG70", "GG71", "GG72", "GG73", "GG74", "GG75", "GG76", 
"GG77", "GG78", "GG79", "GG80", "GG81", "GG82", "GG83", "GG84", 
"GG85", "GG86", "GG87", "GG88", "GG89", "GG90", "GG91", "GG92", 
"GG93", "GG94", "GG95", "GG96", "GG97", "GG98", "GG99", "G100"
), Sim001 = c(8.58, 11.82, 7.72, 8.93, 9.82, 13.93, 10.94, 5.07, 
11.13, 7.66, 1.06, 14.93, 7.56, 6.18, 10.96, 4.94, 0.79, 4.91, 
0.68, 4.5, 3.13, 2.91, -2.27, 8.18, 3.87, 5.1, 7.23, 8.55, 1.39, 
6.85, 4.25, -0.24, 9.86, 9.92, 5.11, 6.14, 8.14, 7.52, 4.19, 
1.02, -2.43, 6.85, 4.94, 8.11, 7.34, 10.56, -6.52, 2.66, -4.89, 
0.11, 3.9, 3.87, 2.55, -0.2, 5.09, -2.4, 1.32, 3.49, 4.93, -9.58, 
9.08, 2.85, 7.71, 4.5, 0.99, 8.72, 12.45, 8.83, 8.94, 5.05, 9.82, 
11.72, 1.21, 7.92, 3.45, 1.34, 8.25, 2.92, 2.1, -3.19, 4.75, 
-0.3, 1.69, 3.31, 5.18, 2.43, 3.02, -5.31, -6.7, -5.9, -4.73, 
-8.13, -7.67, 3.73, -2.4, 1.46, -4.71, 0.33, -3.11, 2.45), Sim002 = c(-12.67, 
-9.94, -11.94, -10.81, -6.78, -3.39, -7.58, -16.09, -9.96, -8.68, 
6.14, -12.43, -10.81, -7.58, -0.62, 1.52, 0.83, -3.29, -0.5, 
1.14, 3.35, 1.1, -5.25, 2, -4.09, 2.28, 2.46, 5.64, 4.64, -12.11, 
-2.21, -16.54, -7.38, -16.33, -7.63, -11.61, -12.78, -12.86, 
-9.99, 4.84, -13.75, -12.33, -13.43, -21.13, -12.63, -3.16, 1.61, 
-2.36, 5.54, 3.59, 1.23, -0.25, -1.49, -1.65, 0.9, -2.29, -2.35, 
-2.01, -2.07, 1.37, -0.64, 4.75, 3.02, 5.44, 3.76, 5.16, 2.54, 
4.11, 5.03, -1.12, 4.53, -0.15, -4.98, 0.85, -3.04, -0.06, -3.01, 
-1.1, 4.02, 1.77, -3.36, 1.56, -1.63, 1.12, -2.39, -2.05, 4.51, 
1.52, -0.61, 2.54, 2.88, 6.79, 5.5, -2.36, 4.18, -0.13, 5.68, 
1.82, 3.21, 0.21), Sim003 = c(4.45, -3.37, -1.17, 4.66, -4.19, 
-3.84, 1.74, 1.26, -7.06, -2.65, -4.88, -4.06, -2.32, -1.64, 
0.56, 0.31, -0.35, -5.69, 3.35, 4.84, 3.62, 0.77, -3.05, -0.42, 
-2.55, 1.34, 2.89, 3.09, 9.79, 4.98, 9.59, 4.99, 11.77, 13.07, 
0.41, 10.38, 11.01, 3.43, 6.08, 0.55, 6.49, 2.85, 11.17, 11.32, 
-0.89, -0.48, 10.8, 1.86, 6.63, 5.14, 4.13, 8.72, 15.01, 4.78, 
5.56, 5.69, 10.39, 8.99, 6.95, 10.59, 3.92, 3.4, 3.95, 0.87, 
2.28, 1.11, 4.36, 6.21, 1.73, 2.5, 4.19, 1.85, 8.65, 6.24, 7.82, 
7.43, 5.19, -1.71, -3.16, -2.66, -7, -2.08, 0.36, 8.61, 3.22, 
7.99, -1.19, 11.38, 10.2, 8.87, 7.23, 8.07, 2.77, 9.61, -1.1, 
-2.05, 6.39, 6.6, -2.89, -6.41), Sim004 = c(-1.31, 4.94, 4.7, 
2.88, -0.01, 1.83, -7.51, 1.12, 5.25, 5.25, 3.78, 4.94, 2.32, 
9.83, -1.59, 6.45, 4.26, 3.1, 5.5, 6.94, 2.76, 5.1, 1.95, 15.13, 
-9.18, 2.88, 4.28, -5.01, -0.27, 1.91, -1.46, -0.6, -8.99, -8.79, 
-3.09, -7.09, -5.2, -7.55, -4.04, -3.8, -10.66, -6.34, -3.62, 
-8.49, -2.29, 0.38, 5.25, 8.6, 5.83, 8.94, 9.86, 4.62, 4.33, 
10.15, 7.87, 9.07, 0.04, 2.85, 6, 4.54, 13.5, 12.39, 11.79, 6.29, 
15.45, 15.82, 19.79, 13.12, 6.5, -4.63, 1.79, -0.8, 1.29, 8.88, 
1.28, 6.55, 5.78, 5.46, 2.83, 8, 6.25, 4.94, 5.01, 5.32, 2.95, 
7.46, 5.71, -3.51, 3.51, 9.46, 8.55, 6.71, 7.36, 10.96, 3.47, 
-1.99, 5.75, 1.56, -4.38, 0.67)), .Names = c("Year", "Month", 
"Day", "Site", "Sim001", "Sim002", "Sim003", "Sim004"), row.names = c(NA, 
100L), class = "data.frame")

Thanks for your useful solution.
Atem. 

Re: [R] aggregate and sum on some columns fromduplicate rows

2014-02-28 Thread ltdm

Hi again,

Sorry for disturbing. After posting I suddenly found a solution.
As it may help someone else here it goes.

df
du1 <- duplicated(df[, c("St.Sam", "Species")], fromLast = FALSE)
du2 <- duplicated(df[, c("St.Sam", "Species")], fromLast = TRUE)
X <- df[du1 | du2, ]
aggRows <- aggregate(cbind(NT, PT) ~ St.Sam + Species + Var1 + Var2, X, sum)
dfNew <- rbind(df[!(du1 | du2), ], aggRows)

Just need to polish dfNew and it's OK. But maybe there is a more elegant
solution.
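
One possibly simpler variant, sketched on made-up data: summing a group that has a single row returns that row unchanged, so aggregate() can be applied to the whole data frame and the duplicated()/rbind() steps may not be needed. The four-row df below is hypothetical, and the sketch assumes, as the formula above already does, that Var1 and Var2 are constant within each St.Sam and Species combination.

```r
# Hypothetical data: the first two rows are duplicates on St.Sam + Species
df <- data.frame(St.Sam  = c("ST1.S1", "ST1.S1", "ST1.S1", "ST1.S2"),
                 Species = c("Sp1", "Sp1", "Sp2", "Sp1"),
                 Var1    = c(12, 12, 32, 25),
                 Var2    = c("aa", "aa", "bb", "dd"),
                 NT      = c(10, 13, 45, 29),
                 PT      = c(30, 7, 26, 66))

# Sum NT and PT within each group; unique rows pass through unchanged
sums <- aggregate(cbind(NT, PT) ~ St.Sam + Species + Var1 + Var2,
                  data = df, FUN = sum)
sums
```

If Var1 or Var2 can differ within a group, they would have to be left out of the formula and merged back in afterwards.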

Cheers,

Tito





--
View this message in context: 
http://r.789695.n4.nabble.com/aggregate-and-sum-on-some-columns-fromduplicate-rows-tp4686040p4686043.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] aggregate and sum on some columns fromduplicate rows

2014-02-28 Thread arun

Hi,
You could use ?ddply
library(plyr)

cbind(
  ddply(dat, .(St.Sam, Sp), function(x) x[!duplicated(x$Var1), ])[, -c(5:6)],
  ddply(dat, .(St.Sam, Sp), colwise(sum, .(NT, PT)))[, -c(1:2)]
)
   St.Sam  Sp Var1 Var2 NT PT
1  ST1.S1 Sp1   12   aa 23 37
2  ST1.S1 Sp2   32   bb 45 26
3  ST1.S1 Sp3   47   cc 89 35
4  ST1.S2 Sp1   25   dd 29 66
5  ST1.S2 Sp2   59   ee 89 35
6  ST2.S1 Sp1   15   aa 30 45
7  ST2.S1 Sp2   45   cc 55 23
8  ST2.S1 Sp3   27   aa 85 12
9  ST2.S1 Sp4   42   cc  8  3
10 ST3.S1 Sp1   25   aa 26 69
11 ST3.S1 Sp2   36   bb 65 48
A.K.




