Re: [R] about data manipulation
Hi lily,

If you want to use aggregate, supply the name of the function as a character string:

aggregate(flow~year, data=df, "sum")

You can also use "by" like this:

by(df$flow, df$year, FUN=sum)

I assume that you don't have to worry about missing months in a year.

Jim

On Thu, Dec 1, 2016 at 3:06 PM, lily li wrote:
> I want to calculate total flow for each year, and use the code below:
> aggregate(flow~year, data=df, sum)
> But it gave the error message:
> Error in get(as.character(FUN), mode = "function", envir = envir) :
>   object 'FUN' of mode 'function' was not found
>
> What is the problem and how to solve it? Thanks for your help.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
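As a rough sketch of the two suggestions above, the example below builds a small data frame and computes yearly totals three ways. The month and flow values are an assumed reading of the flattened table in the question, and `tapply` is an extra base R alternative that the reply does not mention.

```r
# Assumed reading of the poster's table (columns: year, month, flow).
df <- data.frame(
  year  = c(2006, 2006, 2006, 2006, 2007, 2007),
  month = c(3, 4, 5, 6, 3, 4),
  flow  = c(3.5, 3.8, 21, 32, 4.1, 4.4)
)

# Formula interface, naming the function as a string.
totals_agg <- aggregate(flow ~ year, data = df, FUN = "sum")

# by() returns one result per year as an object of class "by".
totals_by <- by(df$flow, df$year, FUN = sum)

# tapply() returns a named vector, which is often easier to reuse.
totals_tap <- tapply(df$flow, df$year, sum)

print(totals_agg)
```

`aggregate` returns a data frame with one row per year, which is usually the most convenient form for further processing.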
Re: [R] about data manipulation
> On 1 Dec 2016, at 05:06, lily li wrote:
>
> I want to calculate total flow for each year, and use the code below:
> aggregate(flow~year, data=df, sum)
> But it gave the error message:
> Error in get(as.character(FUN), mode = "function", envir = envir) :
>   object 'FUN' of mode 'function' was not found
>
> What is the problem and how to solve it? Thanks for your help.

Not enough information. If I try this

df <- read.table(text="year month flow
2006 3 3.5
2006 4 3.8
2006 5 21
2006 6 32
2007 3 4.1
2007 4 4.4
", header=TRUE)
df
aggregate(flow~year, data=df, sum)

I get a correct answer. So you are likely doing something weird and not showing us everything.

> [[alternative HTML version deleted]]

Plain text mail, please. This has been asked many, many times before.

Berend Hasselman
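One plausible cause of this error, which the thread does not confirm, is a non-function object named `sum` in the workspace that shadows `base::sum`. `aggregate` resolves its `FUN` argument with `match.fun`, which fails in that situation. The sketch below reproduces the suspected problem on a made-up data frame and shows two ways around it.

```r
# Made-up data, just large enough to aggregate.
df <- data.frame(year = c(2006, 2006, 2007), flow = c(3.5, 3.8, 4.1))

# Simulate the suspected problem: a variable that shadows base::sum.
sum <- 42

# With `sum` now a number, passing it as FUN fails inside match.fun().
res <- try(aggregate(flow ~ year, data = df, FUN = sum), silent = TRUE)
failed <- inherits(res, "try-error")

# Workaround 1: name the function as a string, so the lookup
# only considers objects that are functions.
ok_string <- aggregate(flow ~ year, data = df, FUN = "sum")

# Workaround 2: remove the shadowing object; `sum` then
# resolves to base::sum again.
rm(sum)
ok_clean <- aggregate(flow ~ year, data = df, FUN = sum)
```

If this is what happened, it would also explain why the string form `"sum"` suggested elsewhere in the thread works while the bare name does not.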
[R] about data manipulation
Hi R users,

I'm trying to manipulate a dataset, but ran into some difficulties.

df
year month flow
2006 3 3.5
2006 4 3.8
2006 5 21
2006 6 32
2007 3 4.1
2007 4 4.4
...

I want to calculate the total flow for each year, and used the code below:

aggregate(flow~year, data=df, sum)

But it gave the error message:

Error in get(as.character(FUN), mode = "function", envir = envir) :
  object 'FUN' of mode 'function' was not found

What is the problem and how can I solve it? Thanks for your help.
Re: [R] About data manipulation
Just assign it to an object: x <- DT .

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Nov 27, 2016 at 2:03 AM, lily li wrote:
> Thanks Jim, this method is very convenient and is what I want. Could I
> know how to save the resulted dataframe? It printed in the console directly.
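As a sketch of what assigning the result looks like in practice: the value of the pipeline is stored under a name instead of being printed, and can then be reused or written to disk. The data, the object name `monthly_avg`, and the temporary CSV file are all made up for this illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical input in the shape described in the thread.
DT <- data.frame(
  month = c(1, 1, 2, 2),
  total = c(98, 100, 99, 80),
  id    = c("GA", "GA", "GA", "GB"),
  note  = c(1, 1, 1, 1)
)

# Assigning captures the summarised table; nothing is printed.
monthly_avg <- DT %>%
  group_by(month, id, note) %>%
  summarise(avg = mean(total), .groups = "drop")

# Convert the tibble to a plain data frame if preferred.
monthly_avg <- as.data.frame(monthly_avg)

# Optionally persist it; tempfile() stands in for a real path.
out_file <- tempfile(fileext = ".csv")
write.csv(monthly_avg, out_file, row.names = FALSE)
restored <- read.csv(out_file)
```

Typing `monthly_avg` at the prompt afterwards prints the stored table again.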
Re: [R] About data manipulation
Thanks Jim, this method is very convenient and is what I want. How can I save the resulting data frame? It was printed directly to the console.

On Sat, Nov 26, 2016 at 5:55 PM, jim holtman wrote:
> You did not provide any data, but I will take a stab at it using the
> "dplyr" package
>
> library(dplyr)
> DT %>%
>   group_by(month, id, note) %>%
>   summarise(avg = mean(total))
Re: [R] About data manipulation
You did not provide any data, but I will take a stab at it using the "dplyr" package:

library(dplyr)
DT %>%
  group_by(month, id, note) %>%
  summarise(avg = mean(total))

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Nov 26, 2016 at 11:11 AM, lily li wrote:
> I want to calculate one average value for each month in each time slice.
> [...]
> I tried the script: DF_GA = aggregate(total~year+month,data=subset(DF,
> id==GA&note==1)), but it did not give me the ideal dataframe. How to do
> then?
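To make the pipeline above concrete, here is a minimal sketch on a small data frame. The rows follow an assumed reading of the flattened table in the question, and the frame is named `DT` only to match the reply (the question calls it `DF`). The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Rows as read from the question (assumed column split).
DT <- data.frame(
  year  = c(2000, 2001, 2002, 2002, 2012),
  month = c(1, 1, 2, 2, 1),
  total = c(98, 100, 99, 80, 78),
  id    = c("GA", "GA", "GA", "GB", "GA"),
  note  = c(1, 1, 1, 1, 2)
)

# One mean per (month, id, note) combination; 'year' drops out of the result.
monthly_avg <- DT %>%
  group_by(month, id, note) %>%
  summarise(avg = mean(total), .groups = "drop")

print(monthly_avg)
```

Because `year` is not a grouping variable, values from different years in the same time slice are averaged together, which is what the question asks for.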
Re: [R] About data manipulation
A reproducible example was not provided, but I think what is wanted is either ?tapply or ?ave; e.g.

within(DF, means <- ave(total, note, month, FUN = mean))

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Nov 26, 2016 at 3:42 PM, P Tennant wrote:
> Hi,
>
> It may help that:
>
> aggregate(DF$total, list(DF$note, DF$id, DF$month), mean)
>
> should give you means broken down by time slice (note), id and month. You
> could then subset means for GA or GB from the aggregated dataframe.
>
> Philip
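The difference between the two functions mentioned above is easy to miss: `ave` returns a vector as long as its input, so every row receives its group mean, while `tapply` returns one value per group. A small sketch on invented data, grouping by `note` and `month` only, as in the reply:

```r
# Invented rows; only the columns used by the example.
DF <- data.frame(
  month = c(1, 1, 2, 2, 1),
  total = c(98, 100, 99, 80, 78),
  note  = c(1, 1, 1, 1, 2)
)

# ave(): the group mean is repeated on every row of its group.
DF_with_means <- within(DF, means <- ave(total, note, month, FUN = mean))

# tapply(): a note-by-month table with one mean per cell (NA where no data).
mean_table <- tapply(DF$total, list(note = DF$note, month = DF$month), mean)
```

Use `ave` when the means should sit next to the original rows, and `tapply` (or `aggregate`) when a collapsed summary is wanted.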
Re: [R] About data manipulation
Hi,

It may help that:

aggregate(DF$total, list(DF$note, DF$id, DF$month), mean)

should give you means broken down by time slice (note), id and month. You could then subset means for GA or GB from the aggregated dataframe.

Philip

On 27/11/2016 3:11 AM, lily li wrote:
> I want to calculate one average value for each month in each time slice.
> [...]
> I tried the script: DF_GA = aggregate(total~year+month,data=subset(DF,
> id==GA&note==1)), but it did not give me the ideal dataframe. How to do
> then?
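A sketch of the aggregate-then-subset approach above, on invented rows. Naming the elements of the grouping list is an addition here, not part of the reply; it gives readable column names instead of `Group.1`, `Group.2`, `Group.3`. The value column is still called `x`.

```r
# Invented rows in the shape described in the question.
DF <- data.frame(
  month = c(1, 1, 2, 2, 1),
  total = c(98, 100, 99, 80, 78),
  id    = c("GA", "GA", "GA", "GB", "GA"),
  note  = c(1, 1, 1, 1, 2)
)

# Means broken down by time slice (note), id and month.
means <- aggregate(DF$total,
                   list(note = DF$note, id = DF$id, month = DF$month),
                   mean)

# Keep only group GA from the aggregated result.
means_GA <- subset(means, id == "GA")
```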
[R] About data manipulation
Hi R users,

I'm trying to manipulate a dataframe and have some difficulties.

The original dataset is like this:

DF
year month total id note
2000 1 98 GA 1
2001 1 100 GA 1
2002 2 99 GA 1
2002 2 80 GB 1
...
2012 1 78 GA 2
...

The structure is like this: when year is between 2000-2005, note is 1; when year is between 2006-2010, note is 2; GA, GB, etc. represent different groups, but they all have years 2000-2005, 2006-2010, 2011-2015.

I want to calculate one average value for each month in each time slice. For example, between 2000-2005, when note is 1, for GA, there is one value in month 1, one value in month 2, etc.; for GB, there is one value in month 1, one value in month 2, within this time period. So the result has no 'year' column, but keeps the other columns.

I tried the script:

DF_GA = aggregate(total~year+month,data=subset(DF, id==GA&note==1))

but it did not give me the ideal dataframe. How should I do this? Thanks for your help.
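For reference, the attempted call has three likely problems: `GA` needs quotes to be compared as a string, `aggregate` is given no summary function, and grouping by `year` keeps one row per year instead of averaging across years. Below is a hedged sketch of a corrected call, using the rows shown in the question under the assumption that the flattened columns split as year, month, total, id, note.

```r
# Rows as read from the question (assumed column split).
DF <- data.frame(
  year  = c(2000, 2001, 2002, 2002, 2012),
  month = c(1, 1, 2, 2, 1),
  total = c(98, 100, 99, 80, 78),
  id    = c("GA", "GA", "GA", "GB", "GA"),
  note  = c(1, 1, 1, 1, 2)
)

# Mean per month for group GA in the first time slice (note == 1).
DF_GA <- aggregate(total ~ month,
                   data = subset(DF, id == "GA" & note == 1),
                   FUN = mean)

# All groups and time slices at once; 'year' is averaged away.
DF_all <- aggregate(total ~ month + id + note, data = DF, FUN = mean)
```

The second call avoids subsetting group by group: each combination of month, id and note gets its own row.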