Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.
There are many techniques, Calum, and yours is an interesting twist I had not considered. Yes, you can specify what integer a factor uses to represent things, but that is not what I meant. Of course, your trick does not work for some other forms of data, such as real numbers in double format.

There is a cost to converting a column to a factor, and it is best recouped if the conversion speeds things up multiple times.

The point I was making was that when you will be using group_by, especially if done many times, it might speed things up if the column is already a normal factor, perhaps just indexed from 1 onward. My guess is that under the covers, some programs implicitly do such a factor conversion if needed. An example might be aspects of ggplot, where you may get a mysterious order of presentation in the graph unless you create a factor with the order you wish to have used, rather than letting it make one invisibly.

From: CALUM POLWART
Sent: Saturday, November 4, 2023 7:14 PM
To: avi.e.gr...@gmail.com
Cc: Jorgen Harmse ; r-help@r-project.org; mkzama...@gmail.com
Subject: Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

I might have factored the gender.

I'm not sure it would in any way be quicker. But it might be to some extent easier to develop variations of. And it is sort of what factors should be doing...

# make dummy data
gender <- c("Male", "Female", "Male", "Female")
WC <- c(70, 60, 75, 65)
TG <- c(0.9, 1.1, 1.2, 1.0)
myDf <- data.frame(gender, WC, TG)

# label a factor; levels are given explicitly so each label pairs with the intended gender
myDf$GF <- factor(myDf$gender, levels = c("Male", "Female"), labels = c(65, 58))

# do the maths; as.numeric() on a factor returns the level codes (1, 2),
# so go through as.character() to recover the labels as numbers
myDf$LAP <- (myDf$WC - as.numeric(as.character(myDf$GF))) * myDf$TG

# show results
head(myDf)

  gender WC  TG GF  LAP
1   Male 70 0.9 65  4.5
2 Female 60 1.1 58  2.2
3   Male 75 1.2 65 12.0
4 Female 65 1.0 58  7.0

(Reality: I'd have probably used case_when in tidy to create a new numeric column)

> The equation to calculate LAP is different for male and females. I am giving both equations below.
>
> LAP for male = (WC-65)*TG
> LAP for female = (WC-58)*TG
>
> My question is 'how can I calculate the LAP and create a single new column?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
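As a small illustration of the point about factor order, here is a minimal base R sketch. The `size` vector and its values are made up for the example; the idea is only that a factor built with explicit `levels` carries the order you chose, while a default factor sorts its levels alphabetically.

```r
# Hypothetical data: three size categories, in no particular order.
size <- c("medium", "small", "large", "small", "large")

# Default factor: levels are sorted alphabetically.
f_default <- factor(size)
levels(f_default)

# Explicit factor: levels follow the order we ask for.
f_ordered <- factor(size, levels = c("small", "medium", "large"))
levels(f_ordered)

# Functions that walk the levels, such as table(), now use that order.
counts <- table(f_ordered)
counts

# The underlying integer codes are indexed from 1 onward.
codes <- as.integer(f_ordered)
codes
```

Plotting code that uses a factor's levels for a discrete axis would be expected to pick up the same order, which is why building the factor yourself avoids the surprise.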
Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.
I might have factored the gender.

I'm not sure it would in any way be quicker. But it might be to some extent easier to develop variations of. And it is sort of what factors should be doing...

# make dummy data
gender <- c("Male", "Female", "Male", "Female")
WC <- c(70, 60, 75, 65)
TG <- c(0.9, 1.1, 1.2, 1.0)
myDf <- data.frame(gender, WC, TG)

# label a factor; levels are given explicitly so each label pairs with the intended gender
myDf$GF <- factor(myDf$gender, levels = c("Male", "Female"), labels = c(65, 58))

# do the maths; as.numeric() on a factor returns the level codes (1, 2),
# so go through as.character() to recover the labels as numbers
myDf$LAP <- (myDf$WC - as.numeric(as.character(myDf$GF))) * myDf$TG

# show results
head(myDf)

  gender WC  TG GF  LAP
1   Male 70 0.9 65  4.5
2 Female 60 1.1 58  2.2
3   Male 75 1.2 65 12.0
4 Female 65 1.0 58  7.0

(Reality: I'd have probably used case_when in tidy to create a new numeric column)

> The equation to calculate LAP is different for male and females. I am giving both equations
> below.
>
> LAP for male = (WC-65)*TG
> LAP for female = (WC-58)*TG
>
> My question is 'how can I calculate the LAP and create a single new column?
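The case_when alternative mentioned at the end of the message might look something like the sketch below. It assumes the dplyr package is installed and reuses the dummy data from the message; the helper column name `offset` is made up for the example.

```r
library(dplyr)  # assumed to be installed

myDf <- data.frame(
  gender = c("Male", "Female", "Male", "Female"),
  WC     = c(70, 60, 75, 65),
  TG     = c(0.9, 1.1, 1.2, 1.0)
)

# Pick the gender-specific offset first, then apply the common formula.
# `offset` is a hypothetical helper column, not part of the original data.
myDf <- myDf %>%
  mutate(
    offset = case_when(
      gender == "Male"   ~ 65,
      gender == "Female" ~ 58
    ),
    LAP = (WC - offset) * TG
  )

myDf
```

Rows whose gender matches neither condition get NA for `offset`, and therefore for LAP, which makes unexpected codings easy to spot.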
Re: [R] Sum data according to date in sequence
Hi all,

Thank you very much. I learn a lot from your suggested solution.

On Sun, Nov 5, 2023 at 12:56 AM Rui Barradas wrote:

> > I would like to sum EnergykW data by the date. E.g. all values for
> > EnergykWh on 1/14/2016
Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.
Thanks Everyone,

My variables are in a dataframe with multiple other variables.

Thanks
- *Md Kamruzzaman*

On Sat, Nov 4, 2023 at 1:13 AM Bert Gunter wrote:

> Well, something like:
>
> LAP <- ifelse(gender =='male', (WC-65)*TG, (WC-58)*TG)
>
> The exact code depends on whether your variables are in a data frame or
> list or whatever, which you failed to specify. If so, ?with may be useful.
>
> Cheers,
> Bert
>
> On Fri, Nov 3, 2023 at 3:43 AM Md. Kamruzzaman wrote:
>
>> Hello Everyone,
>> I have three variables: Waist circumference (WC), serum triglyceride (TG)
>> level and gender. Waist circumference and serum triglyceride is numeric and
>> gender (male and female) is categorical. From these three variables, I want
>> to calculate the "Lipid Accumulation Product (LAP) Index". The equation to
>> calculate LAP is different for male and females. I am giving both equations
>> below.
>>
>> LAP for male = (WC-65)*TG
>> LAP for female = (WC-58)*TG
>>
>> My question is 'how can I calculate the LAP and create a single new
>> column?
>>
>> Your cooperation will be highly appreciated.
>>
>> Thanks in advance.
>>
>> With Regards
>>
>> *Md Kamruzzaman*
>> *PhD Research Fellow (Medicine)*
>> Discipline of Medicine and Centre of Research Excellence in Translating
>> Nutritional Science to Good Health
>> Adelaide Medical School | Faculty of Health and Medical Sciences
>> The University of Adelaide
>> Adelaide SA 5005
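Since the variables turn out to be columns of a data frame, the ifelse() suggestion can be combined with with(), as hinted in the quoted reply. A minimal sketch follows; the data frame name `dat` and its values are made up, and gender is assumed to be coded in lower case as in the quoted code.

```r
# Hypothetical data frame standing in for the poster's data.
dat <- data.frame(
  gender = c("male", "female", "male", "female"),
  WC     = c(70, 60, 75, 65),
  TG     = c(0.9, 1.1, 1.2, 1.0)
)

# with() evaluates the expression using the data frame's columns,
# so the column names can be written without the dat$ prefix.
dat$LAP <- with(dat, ifelse(gender == "male", (WC - 65) * TG, (WC - 58) * TG))

dat
```

Note that every value other than "male" (including a differently capitalised "Male") falls through to the female formula here, so it is worth checking how gender is actually coded before relying on this.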
Re: [R] Sum data according to date in sequence
There may be a point to consider about the field containing dates in the request below. Yes, much code will "work" just fine if the column is seen as text, as you can group by that too. The results will perhaps not be in the row order you expected, but you can re-sort, perhaps even more efficiently, after your summarise(), either by converting the fewer remaining rows to a form of date or by transforming the text dates into a year/month/day order that then sorts properly in forward or reverse order as needed.

Converting lots of rows to dates is not a cheap process, and grouping by that more complex date data structure may be harder. Heck, it may even make sense to use the text form of the dates organized as a factor, as the grouping becomes sort of pre-done.

The above comments are not saying any other solutions offered are wrong, but simply discussing whether, especially for larger data sets, there are ways that could be more efficient.

-----Original Message-----
From: R-help On Behalf Of Rui Barradas
Sent: Saturday, November 4, 2023 12:56 PM
To: roslinazairimah zakaria ; jim holtman
Cc: r-help mailing list
Subject: Re: [R] Sum data according to date in sequence

At 01:49 on 03/11/2023, roslinazairimah zakaria wrote:
> I would like to sum EnergykWh data by the date. E.g. all values for
> EnergykWh on 1/14/2016
>
>>> I have tried this:
>>> library(dplyr)
>>> sums <- dt1 %>%
>>>   group_by(date) %>%
>>>   summarise(EnergykWh = sum(EnergykWh))
>>>
>>> The date is not by daily sequence but by year sequence.
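The "group on the text, sort afterwards" idea discussed above can be sketched in base R as follows. The rows are made up for illustration (only the m/d/Y text format matches the posted data), and the approach assumes every date string parses with the same format.

```r
# A few made-up rows in the same shape as the posted data (text dates, m/d/Y).
dt1 <- data.frame(
  date      = c("1/10/2017", "1/2/2017", "1/10/2017", "12/31/2016", "1/2/2017"),
  EnergykWh = c(1.5, 2.0, 0.5, 4.0, 3.0)
)

# Group on the text column as-is; the result has one row per distinct date.
sums <- aggregate(EnergykWh ~ date, dt1, sum)

# Convert only the summary rows to Date and reorder them chronologically.
sums <- sums[order(as.Date(sums$date, format = "%m/%d/%Y")), ]
rownames(sums) <- NULL

sums
```

Only the handful of summary rows is parsed as dates, rather than every row of the original data, which is where any saving would come from on a large data set.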
Re: [R] Sum data according to date in sequence
At 01:49 on 03/11/2023, roslinazairimah zakaria wrote:
> Hi all,
>
> This is the data:
>
> dput(head(dt1,20))
> structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
> "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
> "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
> "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
> "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
> "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
> "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
> "13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
> 1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
> 2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
> 3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
> 20L), class = "data.frame")
>
> I would like to sum EnergykWh data by the date. E.g. all values for
> EnergykWh on 1/14/2016
>
> On Fri, Nov 3, 2023 at 8:10 AM jim holtman wrote:
>
>> How about send a 'dput' of some sample data. My guess is that your date
>> is 'character' and not 'Date'.
>>
>> Thanks
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>> *What is the problem that you are trying to solve? Tell me what you want to
>> do, not how you want to do it.*
>>
>> On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <roslina...@gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I have this set of data. I would like to sum the EnergykWh according to date
>>> sequences.
>>>
>>> head(dt1,20)
>>>                    StationName      date  time EnergykWh
>>> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
>>> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
>>> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
>>> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
>>> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>>> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
>>> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
>>> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
>>> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
>>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
>>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
>>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
>>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
>>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
>>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
>>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
>>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
>>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
>>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
>>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
>>>
>>> I have tried this:
>>> library(dplyr)
>>> sums <- dt1 %>%
>>>   group_by(date) %>%
>>>   summarise(EnergykWh = sum(EnergykWh))
>>>
>>> head(sums,20)
>>>
>>> The date is not by daily sequence but by year sequence.
>>>
>>> head(sums,20)
>>> # A tibble: 20 × 2
>>>    date      EnergykWh
>>>  1 1/1/2017     25.3
>>>  2 1/1/2018     61.0
>>>  3 1/1/2019      0.627
>>>  4 1/1/2020     10.7
>>>  5 1/10/2017    69.4
>>>  6 1/10/2018    54.5
>>>  7 1/10/2019    49.1
>>>  8 1/10/2020    45.9
>>>  9 1/11/2017    73.9
>>> 10 1/11/2018    53.3
>>> 11 1/11/2019    93.5
>>> 12 1/11/2020    66.7
>>> 13 1/12/2017    78.6
>>> 14 1/12/2018    42.2
>>> 15 1/12/2019    22.7
>>> 16 1/12/2020    80.9
>>> 17 1/13/2017    85.6
>>> 18 1/13/2018    46.4
>>> 19 1/13/2019    40.0
>>> 20 1/13/2020   121.
>>>
>>> Thank you very much for any help given.
>>>
>>> --
>>> *Roslinazairimah Zakaria*
>>> *Tel: +609-5492370; Fax. No.+609-5492766*
>>> *Email: roslinazairi...@ump.edu.my ; roslina...@gmail.com*
>>> Faculty of Industrial Sciences & Technology
>>> University Malaysia Pahang
>>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia

Hello,

Here are two solutions.

1. Base R

Though I don't coerce the date column to class "Date", it seems to work.

aggregate(EnergykWh ~ date, dt1, sum)
#>        date EnergykWh
#> 1 1/14/2016  11.98569
#> 2 1/15/2016  32.56938
#> 3 1/16/2016  21.29181
#> 4 1/17/2016  22.88083
#> 5 1/18/2016   9.05750

2. Package dplyr.

First column date is coerced from class
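The second solution is cut off in the archive, so what follows is not the original poster's code. It is only a minimal sketch of the general idea of coercing the text dates to class "Date" before grouping, assuming dplyr is installed and that the dates are in month/day/year form as in the posted data. The 20 posted rows are rebuilt compactly, with StationName and time left out.

```r
library(dplyr)  # assumed to be installed

# The 20 posted rows, rebuilt compactly (StationName and time omitted).
dt1 <- data.frame(
  date = rep(c("1/14/2016", "1/15/2016", "1/16/2016", "1/17/2016", "1/18/2016"),
             times = c(3, 4, 7, 5, 1)),
  EnergykWh = c(4.680496, 6.272414, 1.032782, 11.004884, 10.096824, 6.658797,
                4.808874, 1.469384, 2.996239, 0.303222, 4.988339, 8.131804,
                0.117156, 3.285669, 1.175608, 3.677487, 1.068393, 8.820755,
                8.138583, 9.0575)
)

sums <- dt1 %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%  # text -> Date
  group_by(date) %>%
  summarise(EnergykWh = sum(EnergykWh)) %>%
  arrange(date)                                          # chronological order

sums
```

Because the grouping column is now a real Date, the summary rows sort chronologically rather than alphabetically, which is the ordering problem described in the original question.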
Re: [R] Adding columns to a tibble based on a value in a different tibble
Yes, Bert. At first glance I thought it was one of the merges/joins, and then I wondered about the wording, which made it sound like the ids may not be one per column.

IFF the need is the simpler case, it is a straightforward and common one. An example might make it clear enough that actual code can be shared, rather than talk about a first and second tibble.

Here is one reference to consider:

https://r4ds.hadley.nz/joins.html#:~:text=dplyr%20provides%20six%20join%20functions,is%20primarily%20determined%20by%20x%20.

A left_join may be what works, and of course more basic R includes the merge() function:

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge

If the column were to contain multiple IDs, that changes things and a more complex approach could be needed.

-----Original Message-----
From: R-help On Behalf Of Bert Gunter
Sent: Saturday, November 4, 2023 10:35 AM
To: Alessandro Puglisi
Cc: r-help@r-project.org
Subject: Re: [R] Adding columns to a tibble based on a value in a different tibble

I think a simple reproducible example ("reprex") may be necessary for you to get a useful reply. Questions with vague specifications such as yours often result in going round and round with attempts to clarify what you mean without a satisfactory answer. Clarification at the outset with a reprex may save you and others a lot of frustration.

Cheers,
Bert

On Sat, Nov 4, 2023 at 1:41 AM Alessandro Puglisi <alessandro.pugl...@gmail.com> wrote:

> Hi everyone,
>
> I have a tibble with various ids and associated information.
>
> I need to add a new column to this tibble that retrieves a specific 'y'
> value from a different tibble that has some of the mentioned ids in the
> first column and a 'y' value in the second one. If the id, and so the 'y'
> value is found, it will be included; otherwise, 'NA' will be used.
>
> Could you please help me?
>
> Thanks,
> Alessandro
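For the simpler case, where each id appears at most once in the second tibble, the left_join suggestion could be sketched as below. The tibble names `first` and `second`, the column names `id`, `info` and `y`, and all values are made up for the example, since no sample data was posted; dplyr is assumed to be installed.

```r
library(dplyr)  # assumed to be installed

# Hypothetical first tibble: ids plus other information.
first <- tibble(id = c("a", "b", "c", "d"), info = c(10, 20, 30, 40))

# Hypothetical second tibble: only some of the ids, each with a y value.
second <- tibble(id = c("b", "d"), y = c(1.5, 2.5))

# Keep every row of `first`; y is NA where the id has no match in `second`.
result <- left_join(first, second, by = "id")

result
```

A base R equivalent would be merge(first, second, by = "id", all.x = TRUE). If an id occurs more than once in the second tibble, either approach returns one output row per match, so the row count of the result can grow.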
Re: [R] Adding columns to a tibble based on a value in a different tibble
I think a simple reproducible example ("reprex") may be necessary for you to get a useful reply. Questions with vague specifications such as yours often result in going round and round with attempts to clarify what you mean without a satisfactory answer. Clarification at the outset with a reprex may save you and others a lot of frustration.

Cheers,
Bert

On Sat, Nov 4, 2023 at 1:41 AM Alessandro Puglisi <alessandro.pugl...@gmail.com> wrote:

> Hi everyone,
>
> I have a tibble with various ids and associated information.
>
> I need to add a new column to this tibble that retrieves a specific 'y'
> value from a different tibble that has some of the mentioned ids in the
> first column and a 'y' value in the second one. If the id, and so the 'y'
> value is found, it will be included; otherwise, 'NA' will be used.
>
> Could you please help me?
>
> Thanks,
> Alessandro
[R] Adding columns to a tibble based on a value in a different tibble
Hi everyone,

I have a tibble with various ids and associated information.

I need to add a new column to this tibble that retrieves a specific 'y' value from a different tibble that has some of the mentioned ids in the first column and a 'y' value in the second one. If the id, and so the 'y' value is found, it will be included; otherwise, 'NA' will be used.

Could you please help me?

Thanks,
Alessandro