> On Jul 5, 2016, at 2:27 AM, g.maub...@weinwolf.de wrote:
>
> Hi guys,
>
> I checked out your example but I can't follow the results:
>
> > mtcars %>%
> +   group_by(am, gear) %>%
> +   summarise(n = n()) %>%
> +   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
> +   ungroup() %>%
> +   mutate(row.tot = sum(n))
> Source: local data frame [4 x 5]
>
>      am  gear     n rel.freq row.tot
>   (dbl) (dbl) (int)    (chr)   (int)
> 1     0     3    15      79%      32
> 2     0     4     4      21%      32
> 3     1     4     8      62%      32
> 4     1     5     5      38%      32
>
> We have a total of 32 cases and 15 * 100 / 32 = 46.9 %, not 79 %.
> The same goes for the other rows. How is 79 % calculated?
>
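The arithmetic in the question can be checked in base R alone. This is only a sketch, and the object names (tab, within_am, overall) are made up for illustration. Dividing each count by its am row total reproduces the 79 %, 21 %, 62 % and 38 % shown above, while dividing by the grand total of 32 gives the 46.9 % the question expected.

```r
# Cross-tabulate transmission type (am) against number of gears (gear).
tab <- with(mtcars, table(am, gear))

# Share of each cell within its am row, e.g. 15/19 for am = 0, gear = 3.
within_am <- prop.table(tab, margin = 1)

# Share of each cell in the grand total of 32 cars, e.g. 15/32.
overall <- prop.table(tab)

round(100 * within_am)   # row-wise percentages
round(100 * overall, 1)  # percentages of all 32 cars
```
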
It is apparently the number of items in the first "group determinant": summarise() drops only the last grouping variable (gear), so the result is still grouped by am, and sum(n) is computed within each am level (19 for am = 0, 13 for am = 1). Hence 15/19 = 79 %.

> mtcars %>%
+   group_by(am, gear) %>%
+   summarise(n = n()) %>%
+   mutate(sum = sum(n)) %>%
+   ungroup()
Source: local data frame [4 x 4]

     am  gear     n   sum
  (dbl) (dbl) (int) (int)
1     0     3    15    19
2     0     4     4    19
3     1     4     8    13
4     1     5     5    13
> ?n
> with(mtcars, table(am, gear))
   gear
am   3  4  5
  0 15  4  0
  1  0  8  5

The documentation for the `n` function is particularly unhelpful in letting one know what to expect from it:

"Description
This function is implemented special for each data source and can only be used from within summarise, mutate and filter"

-- David.

> When searching the web I saw this example:
>
> -- cut --
>
> #-- not run --
> library(httr)
> url <- "http://www.lock5stat.com/datasets/HollywoodMovies2011.csv"
> response <- GET(url)
> # parse the CSV body of the response (`as` must be a string, not data.frame)
> Hollywoodmovies2011 <- content(response, as = "parsed", type = "text/csv")
> #-- end not run
>
> Hollywoodmovies2011 %>%
>   group_by(genre) %>%
>   summarize(count = n()) %>%
>   mutate(rf = count / sum(count))
>
> -- cut --
>
> which gives
>
> Source: local data frame [9 x 3]
>
>       Genre count           %
>      (fctr) (int)       (dbl)
> 1    Action    32 0.235294118
> 2 Adventure     1 0.007352941
> 3 Animation    12 0.088235294
> 4    Comedy    27 0.198529412
> 5     Drama    21 0.154411765
> 6   Fantasy     2 0.014705882
> 7    Horror    17 0.125000000
> 8   Romance    11 0.080882353
> 9  Thriller    13 0.095588235
>
> Here the % corresponds to the count divided by the sum of count, e.g.
> sum = 136 and 32 / 136 = 0.2352941.
>
> What is the difference when counting? What do the relative counts in the
> first example mean?
>
> Kind regards
>
> Georg
>
>
> From: Ulrik Stervbo <ulrik.ster...@gmail.com>
> To: David Winsemius <dwinsem...@comcast.net>,
> Cc: r-help@r-project.org, mai...@infomed.sld.cu
> Date: 05.07.2016 06:06
> Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
> Sent by: "R-help" <r-help-boun...@r-project.org>
>
>
> That will give you the wrong result when used on summarised data
>
> David Winsemius <dwinsem...@comcast.net> wrote on Tue, 5 July 2016, 02:10:
>
>> I thought there was an nrow() function?
>>
>> Sent from my iPhone
>>
>> On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:
>>
>> If you want the total number of rows in the original data.frame after
>> counting the rows in each group, you can ungroup and sum the row counts,
>> like:
>>
>> library("dplyr")
>>
>> mtcars %>%
>>   group_by(am, gear) %>%
>>   summarise(n = n()) %>%
>>   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>>   ungroup() %>%
>>   mutate(row.tot = sum(n))
>>
>> HTH
>> Ulrik
>>
>> On Mon, 4 Jul 2016 at 18:23 David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>>>
>>>> On Jul 4, 2016, at 6:56 AM, mai...@infomed.sld.cu wrote:
>>>>
>>>> Hello,
>>>> How can I aggregate row total for all groups in dplyr summarise ?
>>>
>>> Row total … of what? Aggregate … how? What is the desired answer?
>>>
>>>> library(dplyr)
>>>> mtcars %>%
>>>>   group_by(am, gear) %>%
>>>>   summarise(n = n()) %>%
>>>>   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
>>>>
>>>> Best regards
>>>> Maicel Monzon
>>>>
>>>> ----------------------------------------------------------------
>>>>
>>>> --
>>>> This message reached you through the e-mail service that Infomed
>>>> provides to support the missions of the National Health System. The
>>>> person sending this e-mail undertakes to use the service for those
>>>> purposes and to comply with the established regulations.
>>>>
>>>> Infomed: http://www.sld.cu/
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
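To make the difference between the two ways of counting in this thread concrete, here is a minimal sketch. The names counts, within_am and overall are illustrative and do not come from the thread. It relies on summarise() dropping only the last grouping variable by default, so the two-variable result stays grouped by am. Recent dplyr versions print a message about the resulting grouping, which is harmless here.

```r
library(dplyr)

# Count cars per (am, gear). By default summarise() drops only `gear`,
# so `counts` is still grouped by `am`.
counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

# Grouped mutate: sum(n) is the am subtotal (19 or 13), as in the thread.
within_am <- counts %>%
  mutate(rel.freq = n / sum(n)) %>%
  ungroup()

# Ungroup first: sum(n) is now the grand total of 32 rows.
overall <- counts %>%
  ungroup() %>%
  mutate(rel.freq = n / sum(n), row.tot = sum(n))
```

With a single grouping variable, as in the movie example, summarise() drops that one variable and returns an ungrouped result, so sum(count) there is already the grand total. That would explain why the two examples behave differently.
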