Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

2023-11-04 Thread avi.e.gross
There are many techniques, Calum, and yours is an interesting twist I had not 
considered. 
 
Yes, you can specify what a factor uses to represent things, but that is not 
what I meant. Of course, your trick does not work for some other forms of data, 
like real numbers in double format. There is a cost to converting a column to a 
factor, and it is recouped best if it speeds things up multiple times.
 
The point I was making was that when you will be using group_by, especially if 
done many times, it might speed things up if the column is already a normal 
factor, perhaps just indexed from 1 onward. My guess is that underneath the 
covers, some programs implicitly do such a factor conversion if needed. An 
example might be aspects of the ggplot program where you may get a mysterious 
order of presentation in the graph unless you create a factor with the order 
you wish to have used and avoid it making one invisibly.
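
As a small illustration of the ordering point, here is a base R sketch. The vector `sizes` and its values are made up for the example and are not from the thread. Left to itself, factor() sorts the levels alphabetically, while an explicit levels= argument fixes the order that grouping or plotting code would then follow.

```r
# Hypothetical data, not from the thread
sizes <- c("small", "large", "medium", "small", "large")

# Default: levels are sorted alphabetically
f_default <- factor(sizes)
levels(f_default)

# Explicit: levels appear in the order you ask for
f_ordered <- factor(sizes, levels = c("small", "medium", "large"))
levels(f_ordered)

# The integer codes underneath follow the level order
as.integer(f_ordered)

# A grouped summary (here a simple count) comes out in level order
table(f_ordered)
```

Whether this saves time in a given pipeline is something to measure rather than assume; the sketch only shows how the order is controlled.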
 
From: CALUM POLWART  
Sent: Saturday, November 4, 2023 7:14 PM
To: avi.e.gr...@gmail.com
Cc: Jorgen Harmse ; r-help@r-project.org; mkzama...@gmail.com
Subject: Re: [R] I need to create new variables based on two numeric variables 
and one dichotomize conditional category variables.
 
I might have factored the gender.
 
I'm not sure it would in any way be quicker.  But might be to some extent 
easier to develop variations of. And is sort of what factors should be doing... 
 
# make dummy data
gender <- c("Male", "Female", "Male", "Female")
WC <- c(70,60,75,65)
TG <- c(0.9, 1.1, 1.2, 1.0)
myDf <- data.frame( gender, WC, TG )
 
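# label a factor: give the level order explicitly so each label lands on the
# intended level (the default level order is alphabetical: Female, Male)
myDf$GF <- factor(myDf$gender, levels = c("Male", "Female"), labels = c(65, 58))
 
# do the maths: go through as.character() to recover the label values;
# as.numeric() on a factor returns the internal codes 1, 2 instead
myDf$LAP <- (myDf$WC - as.numeric(as.character(myDf$GF))) * myDf$TG
 
#show results
head(myDf)
 
  gender WC  TG GF  LAP
1   Male 70 0.9 65  4.5
2 Female 60 1.1 58  2.2
3   Male 75 1.2 65 12.0
4 Female 65 1.0 58  7.0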
 
 
(Reality: I'd have probably used case_when in tidy to create a new numeric 
column)
 
 
 
 
The equation to
calculate LAP is different for male and females. I am giving both equations
below.

LAP for male = (WC-65)*TG
LAP for female = (WC-58)*TG

My question is 'how can I calculate the LAP and create a single new column?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

2023-11-04 Thread CALUM POLWART
I might have factored the gender.

I'm not sure it would in any way be quicker.  But might be to some extent
easier to develop variations of. And is sort of what factors should be
doing...

# make dummy data
gender <- c("Male", "Female", "Male", "Female")
WC <- c(70,60,75,65)
TG <- c(0.9, 1.1, 1.2, 1.0)
myDf <- data.frame( gender, WC, TG )

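# label a factor: give the level order explicitly so each label lands on the
# intended level (the default level order is alphabetical: Female, Male)
myDf$GF <- factor(myDf$gender, levels = c("Male", "Female"), labels = c(65, 58))

# do the maths: go through as.character() to recover the label values;
# as.numeric() on a factor returns the internal codes 1, 2 instead
myDf$LAP <- (myDf$WC - as.numeric(as.character(myDf$GF))) * myDf$TG

#show results
head(myDf)

  gender WC  TG GF  LAP
1   Male 70 0.9 65  4.5
2 Female 60 1.1 58  2.2
3   Male 75 1.2 65 12.0
4 Female 65 1.0 58  7.0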


(Reality: I'd have probably used case_when in tidy to create a new numeric
column)
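
For comparison, here is a base R sketch of the "new numeric column" idea that needs no package: look the offset up by name and subtract it. The name `offset` is introduced here for illustration; the data and the constants 65 and 58 are the ones from the question. dplyr's case_when() would express the same mapping inside a pipeline.

```r
# Same dummy data as in the post
gender <- c("Male", "Female", "Male", "Female")
WC <- c(70, 60, 75, 65)
TG <- c(0.9, 1.1, 1.2, 1.0)
myDf <- data.frame(gender, WC, TG)

# Named lookup vector (illustrative name): one offset per gender
offset <- c(Male = 65, Female = 58)

# Index the lookup by the gender text; as.character() guards against
# gender having been stored as a factor, unname() drops the names
myDf$offset <- unname(offset[as.character(myDf$gender)])
myDf$LAP <- (myDf$WC - myDf$offset) * myDf$TG
myDf
```

A gender value that is not in the lookup vector yields NA rather than silently taking one of the two branches, which can make miscoded rows easier to spot.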





> The equation to
> calculate LAP is different for male and females. I am giving both equations
> below.
>
> LAP for male = (WC-65)*TG
> LAP for female = (WC-58)*TG
>
> My question is 'how can I calculate the LAP and create a single new column?
>
>



Re: [R] Sum data according to date in sequence

2023-11-04 Thread roslinazairimah zakaria
Hi all,
Thank you very much.
I learn a lot from your suggested solution.



On Sun, Nov 5, 2023 at 12:56 AM Rui Barradas  wrote:

> At 01:49 on 03/11/2023, roslinazairimah zakaria wrote:
> > Hi all,
> >
> > This is the data:
> >
> > > dput(head(dt1,20))
> > structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> > "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
> > "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
> > "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
> > "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
> > "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
> > "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
> > "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
> > "13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
> > 1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
> > 2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
> > 3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
> > 20L), class = "data.frame")
> >
> >
> > I would like to sum EnergykWh data by the date. E.g. all values for
> > EnergykWh on 1/14/2016
> >
> >
> > On Fri, Nov 3, 2023 at 8:10 AM jim holtman  wrote:
> >
> >> How about send a 'dput' of some sample data.  My guess is that your date
> >> is 'character' and not 'Date'.
> >>
> >> Thanks
> >>
> >> Jim Holtman
> >> *Data Munger Guru*
> >>
> >>
> >> *What is the problem that you are trying to solve?Tell me what you want
> to
> >> do, not how you want to do it.*
> >>
> >>
> >> On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <
> >> roslina...@gmail.com> wrote:
> >>
> >>> Dear all,
> >>>
> >>> I have this set of data. I would like to sum the EnergykWh according
> date
> >>> sequences.
> >>>
> >>> > head(dt1,20)
> >>>                    StationName      date  time EnergykWh
> >>> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
> >>> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
> >>> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
> >>> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
> >>> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
> >>> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
> >>> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
> >>> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
> >>> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
> >>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
> >>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
> >>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
> >>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
> >>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
> >>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
> >>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
> >>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
> >>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
> >>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
> >>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
> >>>
> >>> I have tried this:
> >>> library(dplyr)
> >>> sums <- dt1 %>%
> >>>   group_by(date) %>%
> >>>   summarise(EnergykWh = sum(EnergykWh))
> >>>
> >>> head(sums,20)
> >>>
> >>> The date is not by daily sequence but by year sequence.
> >>>
> >>> > head(sums,20)
> >>> # A tibble: 20 × 2
> >>>    date      EnergykWh
> >>>  1 1/1/2017     25.3
> >>>  2 1/1/2018     61.0
> >>>  3 1/1/2019      0.627
> >>>  4 1/1/2020     10.7
> >>>  5 1/10/2017    69.4
> >>>  6 1/10/2018    54.5
> >>>  7 1/10/2019    49.1
> >>>  8 1/10/2020    45.9
> >>>  9 1/11/2017    73.9
> >>> 10 1/11/2018    53.3
> >>> 11 1/11/2019    93.5
> >>> 12 1/11/2020    66.7
> >>> 13 1/12/2017    78.6
> >>> 14 1/12/2018    42.2
> >>> 15 1/12/2019    22.7
> >>> 16 1/12/2020    80.9
> >>> 17 1/13/2017    85.6
> >>> 18 1/13/2018    46.4
> >>> 19 1/13/2019    40.0
> >>> 20 1/13/2020   121.
> >>>
> >>>
> >>>
> >>> Thank you very much for any help given.
> >>>
> >>>
> >>> --
> >>> *Roslinazairimah Zakaria*
> >>> *Tel: +609-5492370; Fax. No.+609-5492766*
> >>>
> >>> *Email: roslinazairi...@ump.edu.my ;
> >>> roslina...@gmail.com *
> >>> Faculty of Industrial Sciences & Technology
> >>> University Malaysia Pahang
> >>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
> >>>

Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

2023-11-04 Thread Md. Kamruzzaman
Thanks Everyone,
My variables are in a dataframe with multiple other variables.
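
With the variables inside a data frame, the ifelse() suggestion quoted below can be wrapped in with(), which evaluates the expression using the data frame's columns. This is a minimal sketch under assumptions: the data frame name `dat`, the `id` column, and the lowercase "male"/"female" coding are all made up here and need adjusting to the real data.

```r
# Hypothetical data frame; id stands in for the "multiple other variables"
dat <- data.frame(
  id     = 1:4,
  gender = c("male", "female", "male", "female"),
  WC     = c(70, 60, 75, 65),
  TG     = c(0.9, 1.1, 1.2, 1.0)
)

# with() looks up gender, WC and TG inside dat; the result becomes a new column
dat$LAP <- with(dat, ifelse(gender == "male", (WC - 65) * TG, (WC - 58) * TG))
dat
```

Note that ifelse() sends every row that is not exactly "male" down the female branch, so differences in spelling or capitalisation are worth checking first, for example with table(dat$gender).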

Thanks

-

*Md Kamruzzaman*


On Sat, Nov 4, 2023 at 1:13 AM Bert Gunter  wrote:

> Well, something like:
>
> LAP <- ifelse(gender =='male', (WC-65)*TG, (WC-58)*TG)
>
> The exact code depends on whether your variables are in a data frame or
> list or whatever, which you failed to specify. If so, ?with  may be useful.
>
> Cheers,
> Bert
>
>
>
> On Fri, Nov 3, 2023 at 3:43 AM Md. Kamruzzaman 
> wrote:
>
>> Hello Everyone,
>> I have three variables: Waist circumference (WC), serum triglyceride (TG)
>> level and gender. Waist circumference and serum triglyceride is numeric
>> and
>> gender (male and female) is categorical. From these three variables, I
>> want
>> to calculate the "Lipid Accumulation Product (LAP) Index". The equation to
>> calculate LAP is different for male and females. I am giving both
>> equations
>> below.
>>
>> LAP for male = (WC-65)*TG
>> LAP for female = (WC-58)*TG
>>
>> My question is 'how can I calculate the LAP and create a single new
>> column?
>>
>> Your cooperation will be highly appreciated.
>>
>> Thanks in advance.
>>
>> With Regards
>>
>> **
>>
>> *Md Kamruzzaman*
>>
>> *PhD **Research Fellow (**Medicine**)*
>> Discipline of Medicine and Centre of Research Excellence in Translating
>> Nutritional Science to Good Health
>> Adelaide Medical School | Faculty of Health and Medical Sciences
>> The University of Adelaide
>> Adelaide SA 5005
>>
>>
>



Re: [R] Sum data according to date in sequence

2023-11-04 Thread avi.e.gross
There may be a point to consider about the field containing dates in the 
request below. Yes, much code will "work" just fine if the column is seen 
as text, since you can group by that too. The results will perhaps not be in the 
row order that you expected, but you can do your re-sorting, perhaps even more 
efficiently, after your summarise(), either by converting the fewer remaining 
rows to a form of date or by transforming the text dates into a 
year/month/day order that then sorts properly in forward or reverse order as needed. 

Converting lots of rows to date is not a cheap process and grouping by that 
more complex date data structure may be harder. Heck, it may even make sense to 
use the text form of dates organized as a factor as the grouping becomes sort 
of pre-done.

The above comments are not saying any other solutions offered are wrong but 
simply discussing whether, especially for larger data sets, there are ways that 
could be more efficient.
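
The "group on text, sort afterwards" idea can be sketched in base R as follows. The small data frame `dt` is made up for the example (same month/day/year text format as the data in the request), and the format string "%m/%d/%Y" is an assumption about how the dates are written. No claim is made here about how much time this saves; that would need measuring on the real data.

```r
# Made-up sample in the same m/d/Y text format as the thread's data
dt <- data.frame(
  date = c("1/14/2016", "1/2/2017", "1/14/2016", "12/31/2016", "1/2/2017"),
  EnergykWh = c(4.7, 1.0, 6.3, 2.5, 3.0)
)

# Group on the text column as it is; rows come back in text order
sums <- aggregate(EnergykWh ~ date, dt, sum)

# Convert only the few summary rows to Date, then sort chronologically
sums$date <- as.Date(sums$date, format = "%m/%d/%Y")
sums <- sums[order(sums$date), ]
sums
```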

-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Saturday, November 4, 2023 12:56 PM
To: roslinazairimah zakaria ; jim holtman 

Cc: r-help mailing list 
Subject: Re: [R] Sum data according to date in sequence

At 01:49 on 03/11/2023, roslinazairimah zakaria wrote:
> Hi all,
> 
> This is the data:
> 
> > dput(head(dt1,20))
> structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
> "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
> "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
> "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
> "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
> "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
> "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
> "13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
> 1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
> 2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
> 3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
> 20L), class = "data.frame")
> 
> 
> I would like to sum EnergykWh data by the date. E.g. all values for
> EnergykWh on 1/14/2016
> 
> 
> On Fri, Nov 3, 2023 at 8:10 AM jim holtman  wrote:
> 
>> How about send a 'dput' of some sample data.  My guess is that your date
>> is 'character' and not 'Date'.
>>
>> Thanks
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>>
>> *What is the problem that you are trying to solve?Tell me what you want to
>> do, not how you want to do it.*
>>
>>
>> On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <
>> roslina...@gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I have this set of data. I would like to sum the EnergykWh according date
>>> sequences.
>>>
>>> > head(dt1,20)
>>>                    StationName      date  time EnergykWh
>>> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
>>> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
>>> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
>>> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
>>> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>>> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
>>> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
>>> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
>>> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
>>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
>>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
>>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
>>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
>>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
>>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
>>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
>>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
>>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
>>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
>>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
>>>
>>> I have tried this:
>>> library(dplyr)
>>> sums <- dt1 %>%
>>>   group_by(date) %>%
>>>   summarise(EnergykWh = sum(EnergykWh))
>>>
>>> head(sums,20)
>>>
>>> The date is not by daily sequence but by year sequence.
>>>
>>> > head(sums,20)
>>> # A tibble: 20 × 2
>>> date  EnergykWh
>>> 

Re: [R] Sum data according to date in sequence

2023-11-04 Thread Rui Barradas

At 01:49 on 03/11/2023, roslinazairimah zakaria wrote:

Hi all,

This is the data:


dput(head(dt1,20))
structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",

"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
"1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
"1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
"1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
"1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
"20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
"14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
"13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
20L), class = "data.frame")


I would like to sum EnergykWh data by the date. E.g. all values for
EnergykWh on 1/14/2016


On Fri, Nov 3, 2023 at 8:10 AM jim holtman  wrote:


How about send a 'dput' of some sample data.  My guess is that your date
is 'character' and not 'Date'.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?Tell me what you want to
do, not how you want to do it.*


On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <
roslina...@gmail.com> wrote:


Dear all,

I have this set of data. I would like to sum the EnergykWh according date
sequences.


head(dt1,20)
                   StationName      date  time EnergykWh

1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500

I have tried this:
library(dplyr)
sums <- dt1 %>%
   group_by(date) %>%
   summarise(EnergykWh = sum(EnergykWh))

head(sums,20)

The date is not by daily sequence but by year sequence.


head(sums,20)
# A tibble: 20 × 2
   date      EnergykWh
 1 1/1/2017     25.3
 2 1/1/2018     61.0
 3 1/1/2019      0.627
 4 1/1/2020     10.7
 5 1/10/2017    69.4
 6 1/10/2018    54.5
 7 1/10/2019    49.1
 8 1/10/2020    45.9
 9 1/11/2017    73.9
10 1/11/2018    53.3
11 1/11/2019    93.5
12 1/11/2020    66.7
13 1/12/2017    78.6
14 1/12/2018    42.2
15 1/12/2019    22.7
16 1/12/2020    80.9
17 1/13/2017    85.6
18 1/13/2018    46.4
19 1/13/2019    40.0
20 1/13/2020   121.



Thank you very much for any help given.


--
*Roslinazairimah Zakaria*
*Tel: +609-5492370; Fax. No.+609-5492766*

*Email: roslinazairi...@ump.edu.my ;
roslina...@gmail.com *
Faculty of Industrial Sciences & Technology
University Malaysia Pahang
Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia







Hello,

Here are two solutions.

1. Base R

Though I don't coerce the date column to class "Date", it seems to work.


aggregate(EnergykWh ~ date, dt1, sum)
#>        date EnergykWh
#> 1 1/14/2016  11.98569
#> 2 1/15/2016  32.56938
#> 3 1/16/2016  21.29181
#> 4 1/17/2016  22.88083
#> 5 1/18/2016   9.05750


2. Package dplyr.
First column date is coerced from class 

Re: [R] Adding columns to a tibble based on a value in a different tibble

2023-11-04 Thread avi.e.gross
Yes, Bert. At first glance I thought it was one of the merge/joins and then 
wondered at the wording that made it sound like the ids may not be one per 
column.

IFF the need is the simpler case, it is a straightforward and common 
need. An example might make it clear enough that actual code can be shared, as 
compared to talking about a first and second tibble.

Here is one reference to consider:

https://r4ds.hadley.nz/joins.html


A left_join may be what works, and of course more basic R includes the merge() 
function:

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge
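
For the simple case (one row per id in the second table), a minimal base R sketch with merge() might look like this. The table names `first` and `second` and the columns `id`, `info` and `y` are hypothetical, since no example data was posted.

```r
# Hypothetical tables standing in for the two tibbles
first  <- data.frame(id = c("a", "b", "c", "d"), info = c(10, 20, 30, 40))
second <- data.frame(id = c("b", "d"), y = c(2.5, 4.5))

# all.x = TRUE keeps every row of first; ids absent from second get NA in y
out <- merge(first, second, by = "id", all.x = TRUE)
out
```

With dplyr loaded, left_join(first, second, by = "id") should give the same columns; merge() sorts the result by the key, whereas left_join() keeps the row order of the first table.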

If the column were to contain multiple IDs, that changes things and a more 
complex approach could be needed.

-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Saturday, November 4, 2023 10:35 AM
To: Alessandro Puglisi 
Cc: r-help@r-project.org
Subject: Re: [R] Adding columns to a tibble based on a value in a different 
tibble

I think a simple reproducible example ("reprex") may be necessary for you
to get a useful reply. Questions with vague specifications such as yours
often result in going round and round with attempts to clarify what you
mean without a satisfactory answer. Clarification at the outset with a
reprex may save you and others a lot of frustration.

Cheers,
Bert

On Sat, Nov 4, 2023 at 1:41 AM Alessandro Puglisi <
alessandro.pugl...@gmail.com> wrote:

> Hi everyone,
>
> I have a tibble with various ids and associated information.
>
> I need to add a new column to this tibble that retrieves a specific 'y'
> value from a different tibble that has some of the mentioned ids in the
> first column and a 'y' value in the second one. If the id, and so the 'y'
> value is found, it will be included; otherwise, 'NA' will be used.
>
> Could you please help me?
>
> Thanks,
> Alessandro
>
>



Re: [R] Adding columns to a tibble based on a value in a different tibble

2023-11-04 Thread Bert Gunter
I think a simple reproducible example ("reprex") may be necessary for you
to get a useful reply. Questions with vague specifications such as yours
often result in going round and round with attempts to clarify what you
mean without a satisfactory answer. Clarification at the outset with a
reprex may save you and others a lot of frustration.

Cheers,
Bert

On Sat, Nov 4, 2023 at 1:41 AM Alessandro Puglisi <
alessandro.pugl...@gmail.com> wrote:

> Hi everyone,
>
> I have a tibble with various ids and associated information.
>
> I need to add a new column to this tibble that retrieves a specific 'y'
> value from a different tibble that has some of the mentioned ids in the
> first column and a 'y' value in the second one. If the id, and so the 'y'
> value is found, it will be included; otherwise, 'NA' will be used.
>
> Could you please help me?
>
> Thanks,
> Alessandro
>
>



[R] Adding columns to a tibble based on a value in a different tibble

2023-11-04 Thread Alessandro Puglisi
Hi everyone,

I have a tibble with various ids and associated information.

I need to add a new column to this tibble that retrieves a specific 'y'
value from a different tibble that has some of the mentioned ids in the
first column and a 'y' value in the second one. If the id, and so the 'y'
value is found, it will be included; otherwise, 'NA' will be used.

Could you please help me?

Thanks,
Alessandro
