Re: [R] Advice on starting to analyze smokestack emissions?

2023-12-16 Thread Kevin Zembower via R-help
Just to follow up on this thread: I didn't run into the problems I had
anticipated in accessing the air monitoring data from the US EPA's Air
Quality System (AQS) Data Mart database with the RAQSAPI package. I
didn't have to document an agency affiliation at all, just provide an
email address.

Thanks again, Karl, for suggesting this.

-Kevin

On Fri, 2023-12-15 at 08:29 -0500, Kevin Zembower wrote:
> Bert, Tim, Karl and Richard, thank you all for your suggestions and
> help.
> 
> I will try the R-sig-ecology list.
> 
> Karl, I wasn't aware of the RAQSAPI package, but it looked promising.
> However, when I went to the source of the data it uses, the United
> States Environmental Protection Agency’s (US EPA) Air Quality System
> (AQS) Data Mart database, it looked like interactive access to the
> data is restricted to those who can document a professional agency
> affiliation. I don't have that. I'll work with the package to see
> whether the same restriction applies to obtaining the data through it.
> Thanks for the suggestion.
> 
> Richard, the Canada study of crematoriums was very useful. Thanks.
> 
> Thanks, again, all, for your help.
> 
> -Kevin



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Advice on starting to analyze smokestack emissions?

2023-12-15 Thread Kevin Zembower via R-help
Bert, Tim, Karl and Richard, thank you all for your suggestions and
help.

I will try the R-sig-ecology list.

Karl, I wasn't aware of the RAQSAPI package, but it looked promising.
However, when I went to the source of the data it uses, the United
States Environmental Protection Agency’s (US EPA) Air Quality System
(AQS) Data Mart database, it looked like interactive access to the data
is restricted to those who can document a professional agency
affiliation. I don't have that. I'll work with the package to see
whether the same restriction applies to obtaining the data through it.
Thanks for the suggestion.

Richard, the Canada study of crematoriums was very useful. Thanks.

Thanks, again, all, for your help.

-Kevin




[R] Advice on starting to analyze smokestack emissions?

2023-12-12 Thread Kevin Zembower via R-help
Hello, all,

[Originally sent to r-sig-geo list, with no response. Cross-posting
here, in the hope of a wider audience. Anyone with any experience in
this topic? Thanks.]

I'm trying to get started analyzing the concentrations of smokestack
emissions. I don't have any professional background or training for
this; I'm just an old, retired guy who thinks playing with numbers is
fun.

A local funeral home in my neighborhood (less than 1200 ft from my
home) is proposing to construct a crematorium for human remains. I have
some experience with the tidycensus package and thought it might be
interesting to construct a model for the changes in concentrations of
the pollutants from the smokestack and, using recorded wind speeds and
directions, see which US Census blocks would be affected.
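
For the wind-direction step, each receptor (for example a census block centroid) has to be expressed in the plume's own frame: distance downwind of the stack and distance across the wind. Below is a minimal sketch under stated assumptions: positions are already projected to metres relative to the stack (for instance after sf::st_transform() to a local projected CRS), and the wind direction follows the meteorological convention of the direction the wind blows from, in degrees clockwise from north. The function name to_plume_coords() is hypothetical.

```r
# Hypothetical helper: express a receptor's position in plume coordinates.
# dx_east, dy_north: offsets from the stack in metres (projected coordinates)
# wind_from_deg: direction the wind blows FROM, degrees clockwise from north
to_plume_coords <- function(dx_east, dy_north, wind_from_deg) {
  # compass bearing toward which the plume travels, in radians
  theta <- (wind_from_deg + 180) * pi / 180
  list(
    # distance along the direction of travel (negative = upwind)
    x = dx_east * sin(theta) + dy_north * cos(theta),
    # distance across the wind (positive = left of the direction of travel)
    y = -dx_east * cos(theta) + dy_north * sin(theta)
  )
}

# A receptor 300 m east and 50 m north of the stack, wind from the west
to_plume_coords(300, 50, 270)  # x is about 300 (downwind), y about 50
```

Receptors with x <= 0 are upwind of the stack for that hour, so a steady-state plume would assign them no contribution.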

I have the output of the US EPA's SCREEN3 model showing how
concentration varies with distance from the smokestack.
See 
https://www.epa.gov/scram/air-quality-dispersion-modeling-screening-models#screen3
if curious. As a first task, I'd like to see if I can calculate similar
results in R. I'm aware of the 'plume' steady-state Gaussian dispersion
package
(https://rdrr.io/github/holstius/plume/f/inst/doc/plume-intro.pdf), but
am a little concerned that this package was last updated 11 years ago.
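
As a point of comparison with SCREEN3 and 'plume', the textbook steady-state Gaussian plume with ground reflection can be written directly in base R. This is a minimal sketch, not SCREEN3's algorithm: it assumes a fixed effective release height (no plume-rise or building-downwash calculation), a single wind speed, and, purely as an illustrative assumption, the Briggs (1973) open-country dispersion coefficients for neutral (class D) stability. The function names and the inputs Q = 1 g/s, u = 3 m/s and H = 20 m are hypothetical.

```r
# Assumed dispersion curves: Briggs (1973) open-country, neutral (class D).
# x is downwind distance in metres; returns sigma_y and sigma_z in metres.
briggs_rural_D <- function(x) {
  list(sigma_y = 0.08 * x / sqrt(1 + 0.0001 * x),
       sigma_z = 0.06 * x / sqrt(1 + 0.0015 * x))
}

# Steady-state Gaussian plume with reflection at the ground (z = 0).
# Q: emission rate (g/s); u: wind speed (m/s); H: effective release height (m)
# x > 0: downwind distance, y: crosswind distance, z: receptor height (all m)
# Returns concentration in g/m^3.
plume_conc <- function(x, y = 0, z = 0, Q, u, H, sigmas = briggs_rural_D) {
  stopifnot(all(x > 0), u > 0)
  s <- sigmas(x)
  sy <- s$sigma_y
  sz <- s$sigma_z
  Q / (2 * pi * u * sy * sz) *
    exp(-y^2 / (2 * sy^2)) *
    (exp(-(z - H)^2 / (2 * sz^2)) + exp(-(z + H)^2 / (2 * sz^2)))
}

# Ground-level centreline concentration versus distance, the same kind of
# table SCREEN3 prints (Q = 1 g/s, so results scale linearly with Q)
x <- seq(50, 5000, by = 50)
c_ground <- plume_conc(x, Q = 1, u = 3, H = 20)
x[which.max(c_ground)]  # distance of the maximum for these assumed inputs
```

SCREEN3's full-meteorology option looks for the worst case across stability classes and wind speeds, so a closer comparison would loop this function over those combinations, each with its own sigma curves.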

Do you have any recommendations for me on how to get started analyzing
this problem? Is 'plume' still the way to go? I'm aware that there are
many atmospheric dispersion models from the US EPA, but I was hoping to
keep my work within R, which I'm really enjoying using and learning
about. Are SCREEN3 and 'plume' comparable? Is this the best R list to
ask questions about this topic?

Thanks for any advice or guidance you have for me.

-Kevin






Re: [R] Missing shapes in legend with scale_shape_manual

2023-10-31 Thread Kevin Zembower via R-help
Tim, thanks, it helps very much. It works like a charm.

Wow, there's so much I don't understand about ggplot2 functions,
especially the aes() function. I just discovered the ggplot2-book.org
site, and I hope to read it slowly and carefully over the next couple
of weeks.

Thanks again, Tim, for all your help.

-Kevin

On Tue, 2023-10-31 at 12:35 +, Howard, Tim G (DEC) wrote:
> I believe the missing shapes are because you had set alpha=0 for the
> last geom point. 
> 
> I expect there are better ways, but one way to handle it would be to
> avoid the filtering, adding columns with med and exercise status,
> like the following:
> 
> # setup with data provided
> Date <- c('2023-10-17', '2023-10-16', '2023-10-15', '2023-10-14',
>  '2023-10-13', '2023-10-12', '2023-10-11')
> Time <- c('08:50', '06:58', '09:17', '09:04', '08:44', '08:55',
> '07:55') 
> bg <- c(128, 144, 137, 115, 136, 122, 150)
> missed_meds <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
> no_exercise <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
> 
> b2 <- data.frame(Date, Time, bg, missed_meds, no_exercise)
> 
> b2$Date <- as.Date(b2$Date)
> # add "status" columns, could also be defined as factor. 
> b2$medStat <- c("missed_meds",NA, NA, NA, NA, NA, "missed_meds")
> b2$exercise <- c(NA, NA, "missed_exercise",NA,"missed_exercise",
> "missed_exercise", "missed_exercise")
> 
> Then your ggplot call would be like this:
> 
> ggplot(data = b2, aes(x = Date, y = bg)) +
>   geom_line() +
>   geom_point(aes(shape = medStat), size = 3)+
>   geom_point(aes(shape = exercise),size = 3)+
>   scale_y_continuous(name = "Blood glucose (mg/dL)",
>  breaks = seq(100, 230, by = 20)
>   ) +
>   geom_hline(yintercept = 130) +
>   scale_shape_manual(name = "Conditions",
>  labels = c("Missed meds",
>     "Missed exercise"),
>  values = c(20, 4)
>   )
> 
> 
> Note that this method then gets very close without the
> scale_shape_manual, too. 
> 
> Hope that helps. 
> Tim
> 
> 
> > Date: Mon, 30 Oct 2023 20:55:17 +
> > From: Kevin Zembower 
> > To: r-help@r-project.org 
> > Subject: [R] Missing shapes in legend with scale_shape_manual
> > 
> > Hello,
> > 
> > I'm trying to plot a graph of blood glucose versus date. I also
> > record conditions, such as missing the previous night's medications,
> > and missing exercise on the previous day. My data looks like:
> > 
> > > b2[68:74,]
> > # A tibble: 7 × 5
> >   Date       Time     bg missed_meds no_exercise
> > 1 2023-10-17 08:50   128 TRUE        FALSE
> > 2 2023-10-16 06:58   144 FALSE       FALSE
> > 3 2023-10-15 09:17   137 FALSE       TRUE
> > 4 2023-10-14 09:04   115 FALSE       FALSE
> > 5 2023-10-13 08:44   136 FALSE       TRUE
> > 6 2023-10-12 08:55   122 FALSE       TRUE
> > 7 2023-10-11 07:55   150 TRUE        TRUE
> > 
> > This gets me most of the way to what I want:
> > 
> > ggplot(data = b2, aes(x = Date, y = bg)) +
> >     geom_line() +
> >     geom_point(data = filter(b2, missed_meds),
> >                shape = 20,
> >                size = 3) +
> >     geom_point(data = filter(b2, no_exercise),
> >                shape = 4,
> >                size = 3) +
> >     geom_point(aes(x = Date, y = bg, shape = missed_meds),
> >                alpha = 0) + #Invisible point layer for shape mapping
> >     scale_y_continuous(name = "Blood glucose (mg/dL)",
> >                        breaks = seq(100, 230, by = 20)
> >                        ) +
> >     geom_hline(yintercept = 130) +
> >     scale_shape_manual(name = "Conditions",
> >                        labels = c("Missed meds",
> >                                   "Missed exercise"),
> >                        values = c(20, 4),
> >                        ## size = 3
> >                        )
> > 
> > However, the legend just prints an empty square in front of the
> > labels. What I want is a small filled circle (shape 20) in front of
> > "Missed meds" and an x-shaped cross (shape 4) in front of "Missed
> > exercise."
> > 
> > My questions are:
> >  1. How can I fix my plot to show the shapes in the legend?
> >  2. Can my overall plotting method be improved? Would you do it
> >     this way?
> > 
> > Thanks so much for your advice and guidance.
> > 
> > -Kevin
> > 
> > 
> > 
> > 





[R] Missing shapes in legend with scale_shape_manual

2023-10-30 Thread Kevin Zembower via R-help
Hello,

I'm trying to plot a graph of blood glucose versus date. I also record
conditions, such as missing the previous night's medications, and
missing exercise on the previous day. My data looks like:

> b2[68:74,]
# A tibble: 7 × 5
  Date       Time     bg missed_meds no_exercise
1 2023-10-17 08:50   128 TRUE        FALSE
2 2023-10-16 06:58   144 FALSE       FALSE
3 2023-10-15 09:17   137 FALSE       TRUE
4 2023-10-14 09:04   115 FALSE       FALSE
5 2023-10-13 08:44   136 FALSE       TRUE
6 2023-10-12 08:55   122 FALSE       TRUE
7 2023-10-11 07:55   150 TRUE        TRUE

This gets me most of the way to what I want:

ggplot(data = b2, aes(x = Date, y = bg)) +
    geom_line() +
    geom_point(data = filter(b2, missed_meds),
               shape = 20,
               size = 3) +
    geom_point(data = filter(b2, no_exercise),
               shape = 4,
               size = 3) +
    geom_point(aes(x = Date, y = bg, shape = missed_meds),
               alpha = 0) + #Invisible point layer for shape mapping
    scale_y_continuous(name = "Blood glucose (mg/dL)",
                       breaks = seq(100, 230, by = 20)
                       ) +
    geom_hline(yintercept = 130) +
    scale_shape_manual(name = "Conditions",
                       labels = c("Missed meds",
                                  "Missed exercise"),
                       values = c(20, 4),
                       ## size = 3
                       )

However, the legend just prints an empty square in front of the labels.
What I want is a small filled circle (shape 20) in front of "Missed meds"
and an x-shaped cross (shape 4) in front of "Missed exercise."

My questions are:
 1. How can I fix my plot to show the shapes in the legend?
 2. Can my overall plotting method be improved? Would you do it this way?
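
One way to approach question 2 is to reshape the two logical columns into a single condition column, so that one geom_point() layer maps shape to it and the legend is built from points that are actually drawn. A sketch, assuming ggplot2 and tidyr are installed; the seven rows are the ones shown above, without the Time column.

```r
library(ggplot2)
library(tidyr)

# The seven rows shown in the question (Time omitted; not needed here)
b2 <- data.frame(
  Date = as.Date(c("2023-10-17", "2023-10-16", "2023-10-15", "2023-10-14",
                   "2023-10-13", "2023-10-12", "2023-10-11")),
  bg = c(128, 144, 137, 115, 136, 122, 150),
  missed_meds = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  no_exercise = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# One row per (date, condition), keeping only conditions that occurred
flags <- pivot_longer(b2, c(missed_meds, no_exercise),
                      names_to = "condition", values_to = "present")
flags <- flags[flags$present, ]

p <- ggplot(b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_point(data = flags, aes(shape = condition), size = 3) +
  geom_hline(yintercept = 130) +
  scale_y_continuous(name = "Blood glucose (mg/dL)",
                     breaks = seq(100, 230, by = 20)) +
  # named values tie each shape to its level regardless of level order
  scale_shape_manual(name = "Conditions",
                     breaks = c("missed_meds", "no_exercise"),
                     labels = c("Missed meds", "Missed exercise"),
                     values = c(missed_meds = 20, no_exercise = 4))
p
```

On 2023-10-11 both conditions apply, so the two symbols overlap at the same point.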

Thanks so much for your advice and guidance.

-Kevin





Re: [R] Help with plotting and date-times for climate data

2023-09-13 Thread Kevin Zembower via R-help

Re: [R] Help with plotting and date-times for climate data

2023-09-13 Thread Kevin Zembower via R-help
Well, I looked for this, on both the NWS and WeatherUnderground, but
couldn't find what I was looking for. Didn't check Weather.com, but if
you can find a chart of the average high and low temperatures in Ely,
MN between about the middle of September to the middle of October, I'll
buy you a beer.

-Kevin

On Wed, 2023-09-13 at 17:39 +, Ebert,Timothy Aaron wrote:
> I admire the dedication to R and data science, but the Weather
> Channel might be a simpler approach. Weather.com. I can search for
> (city name) and either weather (current values) or climate. It
> depends on how far away the trip will be.
> 
> -Original Message-
> From: Kevin Zembower  
> Sent: Wednesday, September 13, 2023 1:22 PM
> To: Richard O'Keefe ; Ebert,Timothy Aaron
> 
> Cc: r-help@r-project.org
> Subject: Re: [R] Help with plotting and date-times for climate data
> 
> [External Email]
> 
> Tim, Richard, y'all are reading too much into this. I believe that
> TMAX is the high temperature of the day, and TMIN is the low. I'm
> trying to compute the average or median high and low temperatures for
> the data I have (2011 to present). I'm going on a trip to this area,
> and want to know how to pack.
> 
> Thanks for your interest.
> 
> -Kevin
> 
> On Thu, 2023-09-14 at 03:07 +1200, Richard O'Keefe wrote:
> > I am well aware of the physiological implications of temperature,
> > and 
> > that is *why* I view recorded TMIN and TMAX at a single point with
> > an 
> > extremely jaundiced eye.  TMAX at shoulder height has very little 
> > relevance to an insect living in grass, for example.  And if TMAX
> > is 
> > sustained for one second, that has very different consequences from
> > if 
> > TMAX is sustained for five minutes.  I can see the usefulness of 
> > "proportion of day above Thi/below Tlo", but that is quite
> > different.
> > 
> > OK, so my interest in weather data was mainly based around water 
> > management: precipitation, evaporation, herd and crop water needs, 
> > that kind of thing.  And the first thing you learn from that 
> > experience is that ANY kind of single-point summary is seriously 
> > misleading.
> > 
> > Let's end this digression.
> > 
> > 
> > On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron 
> > wrote:
> > > I had the same question.
> > > However, I can partly answer the off-topic question. Min and max
> > > can 
> > > be important as lower and upper development thresholds. Below the
> > > min, no growth or development occurs because reaction rates are too
> > > slow to enable such. Above max, temperatures are too hot.
> > > Protein function is impaired, and systems stop functioning. There
> > > is 
> > > a considerable range between where systems shut down (but
> > > recover) and tissue death.
> > > In a simple form the growth and physiological stage of plants, 
> > > insects, and many others, can be modeled as a function of 
> > > temperature. These are often called growing degree day models (or
> > > some version of that). This is the number of thermal units needed for
> > > the organism to develop to the next stage (e.g. instar for an 
> > > insect, or fruit/flower formation for a plant). However, better 
> > > accuracy is obtained if the model includes both min and max 
> > > thresholds.
> > > 
> > > All I have done is provide an example where min and max could
> > > have a 
> > > real world use. I use max(temp) over some interval and then
> > > update 
> > > an accumulated thermal units variable based on the outcome.
> > > That detail is not evident in the original request.
> > > 
> > > Tim
> > > 
> > > -Original Message-
> > > From: R-help  On Behalf Of Richard 
> > > O'Keefe
> > > Sent: Wednesday, September 13, 2023 9:58 AM
> > > To: Kevin Zembower 
> > > Cc: r-help@r-project.org
> > > Subject: Re: [R] Help with plotting and date-times for climate
> > > data
> > > 
> > > [External Email]
> > > 
> > > Off-topic, but what is a "mean temperature max"
> > > and what good would it do you to know it if you did?
> > > I've been looking at a lot of weather station data and for no 
> > > question I've ever had (except "would the newspapers get excited 
> > > about this") was "max" (or min) the answer.  Considering the way 
> > > that temperature can change by several degrees in a few

Re: [R] Help with plotting and date-times for climate data

2023-09-13 Thread Kevin Zembower via R-help
Rui, thanks so much for your clear explanation, solution to my problem,
and additional help with making the graph come out exactly as I was
hoping. I learned a lot from your solution. Thanks, again, for your
help.

-Kevin

On Tue, 2023-09-12 at 23:06 +0100, Rui Barradas wrote:
> On 12 Sep 2023 at 21:50, Kevin Zembower via R-help wrote:
> > Hello,
> > 
> > I'm trying to calculate the mean temperature max from a file of
> > climate data, and plot it over a range of days in the year. I've
> > downloaded the data and cleaned it up the way I think it should be.
> > However, when I plot it, the geom_smooth line doesn't show up. I
> > think that's because my x axis is characters or factors. Here's
> > what I have so far:
> > 
> > library(tidyverse)
> > 
> > data <- read_csv("Ely_MN_Weather.csv")
> > 
> > start_day = yday(as_date("2023-09-22"))
> > end_day = yday(as_date("2023-10-15"))
> > 
> > d <- as_tibble(data) %>%
> >   select(DATE,TMAX,TMIN) %>%
> >   mutate(DATE = as_date(DATE),
> >          yday = yday(DATE),
> >          md = sprintf("%02d-%02d", month(DATE), mday(DATE))
> >          ) %>%
> >   filter(yday >= start_day & yday <= end_day) %>%
> >   mutate(md = as.factor(md))
> > 
> > d_sum <- d %>%
> >   group_by(md) %>%
> >   summarize(tmax_mean = mean(TMAX, na.rm=TRUE))
> > 
> > ## Here's the filtered data:
> > dput(d_sum)
> > 
> > structure(list(md = structure(1:25, levels = c("09-21", "09-22",
> > "09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
> > "09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
> > "10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
> > "10-14", "10-15"), class = "factor"), tmax_mean = c(65, 62.2,
> > 61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9, 61.2, 63.7, 59.5, 59.6,
> > 61.6, 59.4, 58.8, 55.9, 58.125, 58, 55.7, 57, 55.4, 49.8, 48.75,
> > 43.7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
> > -25L))
> > 
> > ggplot(data = d_sum, aes(x = md)) +
> >   geom_point(aes(y = tmax_mean, color = "blue")) +
> >   geom_smooth(aes(y = tmax_mean, color = "blue"))
> > 
> > My questions are:
> > 1. Why isn't my geom_smooth plotting? How can I fix it?
> > 2. I don't think I'm handling the month and day combination
> > correctly.
> > Is there a way to encode month and day (but not year) as a date?
> > 3. (Minor point) Why does my graph of tmax_mean come out red when I
> > specify "blue"?
> > 
> > Thanks for any advice or guidance you can offer. I really
> > appreciate
> > the expertise of this group.
> > 
> > -Kevin
> > 
> Hello,
> 
> The problem is that the dates are factors, not real dates. And 
> geom_smooth is not interpolating along a discrete axis (the x axis).
> 
> Paste a fake year with md, coerce to date and plot.
> I have simplified the aes() calls and added a date scale in order to 
> make the x axis more readable.
> 
> Without the formula and method arguments, geom_smooth prints a
> message, so they are now made explicit.
> 
> 
> 
> suppressPackageStartupMessages({
>   library(dplyr)
>   library(ggplot2)
> })
> 
> d_sum %>%
>   mutate(md = paste("2023", md, sep = "-"),
>          md = as.Date(md)) %>%
>   ggplot(aes(x = md, y = tmax_mean)) +
>   geom_point(color = "blue") +
>   geom_smooth(
>     formula = y ~ x,
>     method = loess,
>     color = "blue"
>   ) +
>   scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")
> 
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
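
On question 3 (the red points): a constant inside aes() is treated as a data value, not as a colour. The sketch below, using a small made-up data frame, shows the difference with ggplot2::layer_data(), which returns the values ggplot2 actually draws.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 4, 3))  # made-up points

# "blue" inside aes() becomes a one-level categorical variable; the default
# discrete colour scale gives it its first hue (a reddish colour) and adds
# a legend entry labelled "blue"
p_mapped <- ggplot(df, aes(x, y)) + geom_point(aes(colour = "blue"))

# outside aes(), colour is a fixed setting and no legend is created
p_set <- ggplot(df, aes(x, y)) + geom_point(colour = "blue")

unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```

When colour names really are stored in a data column, scale_colour_identity() is the alternative that uses them as-is.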





Re: [R] Help with plotting and date-times for climate data

2023-09-13 Thread Kevin Zembower via R-help
Tim, Richard, y'all are reading too much into this. I believe that TMAX
is the high temperature of the day, and TMIN is the low. I'm trying to
compute the average or median high and low temperatures for the data I
have (2011 to present). I'm going on a trip to this area, and want to
know how to pack.
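
For that goal, a per-calendar-day mean and median of both TMAX and TMIN can come from one summarize() call. A sketch with dplyr, assuming the DATE/TMAX/TMIN columns from the original post; the six rows below are made up purely for illustration.

```r
library(dplyr)

# Made-up rows in the same layout as the downloaded file (DATE, TMAX, TMIN)
wx <- data.frame(
  DATE = as.Date(c("2021-09-22", "2022-09-22", "2023-09-22",
                   "2021-09-23", "2022-09-23", "2023-09-23")),
  TMAX = c(60, 66, 75, 58, 62, 70),
  TMIN = c(40, 44, NA, 38, 41, 35)
)

normals <- wx %>%
  mutate(md = format(DATE, "%m-%d")) %>%  # calendar day, year dropped
  group_by(md) %>%
  summarize(across(c(TMAX, TMIN),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        median = ~ median(.x, na.rm = TRUE))))
# columns: md, TMAX_mean, TMAX_median, TMIN_mean, TMIN_median
normals
```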

Thanks for your interest.

-Kevin

On Thu, 2023-09-14 at 03:07 +1200, Richard O'Keefe wrote:
> I am well aware of the physiological implications
> of temperature, and that is *why* I view recorded
> TMIN and TMAX at a single point with an extremely
> jaundiced eye.  TMAX at shoulder height has very
> little relevance to an insect living in grass, for
> example.  And if TMAX is sustained for one second,
> that has very different consequences from if TMAX
> is sustained for five minutes.  I can see the usefulness
> of "proportion of day above Thi/below Tlo", but that
> is quite different.
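
Richard's "proportion of day above Thi/below Tlo" needs sub-daily readings rather than the daily TMAX/TMIN summary. A minimal sketch, assuming equally spaced readings (e.g. hourly) so that the fraction of readings approximates the fraction of time; the function name frac_outside() and the 24 temperatures are made up for illustration.

```r
# Hypothetical helper: share of readings above t_hi and below t_lo.
# Assumes equally spaced readings; missing values are dropped.
frac_outside <- function(temp, t_hi, t_lo) {
  temp <- temp[!is.na(temp)]
  c(above_hi = mean(temp > t_hi), below_lo = mean(temp < t_lo))
}

# 24 made-up hourly readings (deg C)
hourly <- c(8, 7, 7, 6, 6, 7, 9, 12, 15, 18, 21, 23,
            25, 26, 26, 25, 23, 20, 17, 14, 12, 11, 10, 9)
frac_outside(hourly, t_hi = 24, t_lo = 8)  # 4/24 above, 5/24 below
```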
> 
> OK, so my interest in weather data was mainly based
> around water management: precipitation, evaporation,
> herd and crop water needs, that kind of thing.  And
> the first thing you learn from that experience is
> that ANY kind of single-point summary is seriously
> misleading.
> 
> Let's end this digression.
> 
> 
> On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron 
> wrote:
> > I had the same question.
> > However, I can partly answer the off-topic question. Min and max
> > can be important as lower and upper development thresholds. Below
> > the min, no growth or development occurs because reaction rates are
> > too slow to enable it. Above the max, temperatures are too hot.
> > Protein function is impaired, and systems stop functioning. There
> > is a considerable range between where systems shut down (but
> > recover) and tissue death.
> > In a simple form the growth and physiological stage of plants,
> > insects, and many others, can be modeled as a function of
> > temperature. These are often called growing degree day models (or
> > some version of that). This is the number of thermal units needed for
> > the organism to develop to the next stage (e.g. instar for an
> > insect, or fruit/flower formation for a plant). However, better
> > accuracy is obtained if the model includes both min and max
> > thresholds.
> > 
> > All I have done is provide an example where min and max could have
> > a real world use. I use max(temp) over some interval and then
> > update an accumulated thermal units variable based on the outcome.
> > That detail is not evident in the original request.
> > 
> > Tim
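
For readers unfamiliar with growing degree days, one common variant with both a lower and an upper threshold is the simple-average method with horizontal cutoffs (the form used in the 86/50 °F corn GDD calculation). It is shown only as an illustration; it is not necessarily the model Tim uses, and the function name gdd_cutoff() is hypothetical.

```r
# Simple-average growing degree days with horizontal cutoffs:
# daily highs and lows are clamped to [t_base, t_upper] before averaging,
# so a day entirely below t_base contributes 0.
gdd_cutoff <- function(tmax, tmin, t_base, t_upper) {
  tmax_c <- pmin(pmax(tmax, t_base), t_upper)
  tmin_c <- pmin(pmax(tmin, t_base), t_upper)
  (tmax_c + tmin_c) / 2 - t_base
}

# Three days with the corn thresholds (base 50 F, upper 86 F)
g <- gdd_cutoff(tmax = c(90, 70, 48), tmin = c(45, 55, 40),
                t_base = 50, t_upper = 86)
g           # 18.0 12.5 0.0
cumsum(g)   # accumulated thermal units: 18.0 30.5 30.5
```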
> > 
> > -Original Message-
> > From: R-help  On Behalf Of Richard
> > O'Keefe
> > Sent: Wednesday, September 13, 2023 9:58 AM
> > To: Kevin Zembower 
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Help with plotting and date-times for climate data
> > 
> > [External Email]
> > 
> > Off-topic, but what is a "mean temperature max"
> > and what good would it do you to know it if you did?
> > I've been looking at a lot of weather station data and for no
> > question I've ever had (except "would the newspapers get excited
> > about this") was "max" (or min) the answer.  Considering the way
> > that temperature can change by several degrees in a few minutes, or
> > a few metres -- I meant horizontally when I wrote that, but as you
> > know your head and feet don't experience the same temperature,
> > again by more than one degree -- I am at something of a loss to
> > ascribe much practical significance to TMAX.  Are you sure this is
> > the analysis you want to do?  Is this the most informative data you
> > can get?
> > 
> > On Wed, 13 Sept 2023 at 08:51, Kevin Zembower via R-help <
> > r-help@r-project.org> wrote:
> > 
> > > Hello,
> > > 
> > > I'm trying to calculate the mean temperature max from a file of
> > > climate data, and plot it over a range of days in the year. I've
> > > downloaded the data and cleaned it up the way I think it should
> > > be. However, when I plot it, the geom_smooth line doesn't show up.
> > > I think that's because my x axis is characters or factors. Here's
> > > what I have so far:
> > > 
> > > library(tidyverse)
> > > 
> > > data <- read_csv("Ely_MN_Weather.csv")
> > > 
> > > start_day = yday(as_date("2023-09-22"))
> > > end_day = yday(as_date("2023-10-15"))
> > > 
> >

[R] Help with plotting and date-times for climate data

2023-09-12 Thread Kevin Zembower via R-help
Hello,

I'm trying to calculate the mean temperature max from a file of climate
data, and plot it over a range of days in the year. I've downloaded the
data and cleaned it up the way I think it should be. However, when I
plot it, the geom_smooth line doesn't show up. I think that's because
my x axis is characters or factors. Here's what I have so far:

library(tidyverse)

data <- read_csv("Ely_MN_Weather.csv")

start_day = yday(as_date("2023-09-22"))
end_day = yday(as_date("2023-10-15"))

d <- as_tibble(data) %>%
  select(DATE,TMAX,TMIN) %>%
  mutate(DATE = as_date(DATE),
         yday = yday(DATE),
         md = sprintf("%02d-%02d", month(DATE), mday(DATE))
         ) %>%
  filter(yday >= start_day & yday <= end_day) %>%
  mutate(md = as.factor(md))

d_sum <- d %>%
  group_by(md) %>%
  summarize(tmax_mean = mean(TMAX, na.rm=TRUE))

## Here's the filtered data:
dput(d_sum)

structure(list(md = structure(1:25, levels = c("09-21", "09-22",
"09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
"09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
"10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
"10-14", "10-15"), class = "factor"), tmax_mean = c(65, 62.2,
61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9, 61.2, 63.7, 59.5, 59.6,
61.6, 59.4, 58.8, 55.9, 58.125, 58, 55.7, 57, 55.4, 49.8, 48.75,
43.7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-25L))

ggplot(data = d_sum, aes(x = md)) +
  geom_point(aes(y = tmax_mean, color = "blue")) +
  geom_smooth(aes(y = tmax_mean, color = "blue"))

My questions are:
1. Why isn't my geom_smooth plotting? How can I fix it?
2. I don't think I'm handling the month and day combination correctly.
Is there a way to encode month and day (but not year) as a date?
3. (Minor point) Why does my graph of tmax_mean come out red when I
specify "blue"?
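
A related pitfall in the filtering step: yday() depends on the year, so in leap years (2012, 2016 and 2020 in this data) September 21 has the same day-of-year as September 22 does in 2023, and October 15 falls one day past end_day. That is likely why "09-21" appears in d_sum although the window starts on 09-22. Below is a base-R sketch that filters on month and day directly and, for question 2, maps every date onto one common leap year (2000 here, an arbitrary choice so that February 29 stays valid). The helper names are hypothetical.

```r
# Hypothetical helper: TRUE where the month-day of `date` lies in a window
# given as MMDD integers (922 = Sept 22). Assumes no wrap past Dec 31.
in_md_window <- function(date, start_mmdd = 922L, end_mmdd = 1015L) {
  mmdd <- as.integer(format(date, "%m%d"))
  mmdd >= start_mmdd & mmdd <= end_mmdd
}

# Hypothetical helper: move every date onto one leap year so month and day
# can sit on a real Date axis (label it with date_labels = "%m-%d")
to_common_year <- function(date, year = 2000) {
  as.Date(paste0(year, format(date, "-%m-%d")))
}

dates <- as.Date(c("2020-09-21", "2020-09-22", "2021-09-21",
                   "2021-10-15", "2020-10-15"))
in_md_window(dates)   # FALSE TRUE FALSE TRUE TRUE
to_common_year(dates)

# The day-of-year collision behind the extra "09-21" level
format(as.Date(c("2020-09-21", "2023-09-22")), "%j")  # "265" "265"
```

In the original pipeline this would replace the yday comparison with filter(in_md_window(DATE)) and let md <- to_common_year(DATE) stand in for the factor.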

Thanks for any advice or guidance you can offer. I really appreciate
the expertise of this group.

-Kevin



[R] Suggestions for hierarchical row names?

2023-07-31 Thread Kevin Zembower via R-help
Hello,

On 28 June I asked a question with the subject "Processing a 
hierarchical string name". Folks here were very generous in helping me, 
and I'm very pleased with the solutions. Now, I'm asking about a related 
topic, and I have both technical and stylistic questions.

I'm still working on my US census report for my neighborhood, and have 
tables with labels like this:

> p16_tab[,1]
# A tibble: 9 × 1
   label
1 " !!Total:"
2 " !!Total:!!Family households:"
3 " !!Total:!!Family households:!!Married couple family"
4 " !!Total:!!Family households:!!Other family:"
5 " !!Total:!!Family households:!!Other family:!!Male householder, no spouse present"
6 " !!Total:!!Family households:!!Other family:!!Female householder, no spouse present"
7 " !!Total:!!Nonfamily households:"
8 " !!Total:!!Nonfamily households:!!Householder living alone"
9 " !!Total:!!Nonfamily households:!!Householder not living alone"
>

A sample table can be obtained with:
library(tidyverse)
library(tidycensus)

get_us <- function(table, summary_var) {
  get_decennial(
    geography = "us",
    table = table,
    cache_table = TRUE,
    year = 2020,
    sumfile = "dhc",
    summary_var = summary_var) %>%
    mutate(GEOID = NULL,
           NAME = NULL,
           "US_pc" = value / summary_value * 100,
           value = NULL,
           summary_value = NULL)
}

tableID <- "P16"
summary_var <- "P16_001N"

(us_P16 <- get_us(tableID, summary_var))

labels <- load_variables(2020, "dhc", cache = TRUE)

(p16_tab <- us_P16 %>%
   left_join(labels, by = c("variable" = "name")) %>%
   mutate(variable = NULL, concept = NULL) %>%
   relocate(label)
)

Initially, I thought that I would indent the lines by a single space for
every piece of text starting with "!!" and ending with ":" except for
the last one. This would work fine if the final output were just ASCII
text.

However, I'm trying to output my report in LaTeX, using sweave and 
knitr/kable. When I output my report using spaces, LaTeX deletes them. I 
then tried replacing the spaces with "\hspace{1em}":

p16_tab$label <- p16_tab$label %>%
 str_replace("^ !!", "") %>% #Drop the leading ' !!'
 str_replace_all("[^!]*!!", "hspace{1em}")

kable(p16_tab, format = "latex", booktabs = TRUE,
   col.names = c("label", "United States %-age")
   )

This results in the "\" of "\hspace" being replaced with 
"\textbackslash{}", so the output contains "\textbackslash{}hspace".

I also thought that there was a way to suppress formatting in 
kableExtra, but I can't find it now. Regardless, I remember it didn't 
work the way I wanted it to, either.
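For what it's worth, knitr's kable() escapes LaTeX special characters by default, which is what turns "\" into "\textbackslash{}"; its escape argument switches that off. Below is a minimal sketch, assuming the same " !!"-separated labels (the three here are copied from the P16 table above): it computes each label's nesting depth from the number of "!!" separators, prefixes one "\hspace{1em}" per level with strrep(), and calls kable() with escape = FALSE. Note that escape = FALSE turns off escaping for the whole table, so other special characters may then need escaping by hand (for example the "%" in the "United States %-age" column name).

```r
## Labels in the census " !!A:!!B:!!C" format (first three rows of P16).
labels <- c(" !!Total:",
            " !!Total:!!Family households:",
            " !!Total:!!Family households:!!Married couple family")

## Nesting depth = number of "!!" separators minus one (" !!Total:" is 0).
depth <- lengths(regmatches(labels, gregexpr("!!", labels, fixed = TRUE))) - 1L

## Keep the text after the last "!!" and prefix one \hspace{1em} per level.
## "\\hspace" in R source is the string \hspace (a single backslash).
latex_labels <- paste0(strrep("\\hspace{1em}", depth), sub(".*!!", "", labels))

if (requireNamespace("knitr", quietly = TRUE)) {
  ## escape = FALSE passes the LaTeX commands through to the .tex unchanged.
  print(knitr::kable(data.frame(label = latex_labels),
                     format = "latex", booktabs = TRUE, escape = FALSE))
}
```

Building the prefix from the depth, rather than with a regex replacement, also sidesteps the extra backslash escaping that replacement strings require.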

kableExtra has the add_indent() function, which looks promising:

(p16_tab <- us_P16 %>%
  left_join(labels, by = c("variable" = "name")) %>%
  mutate(variable = NULL, concept = NULL) %>%
  relocate(label)
)

p16_tab$label <- p16_tab$label %>%
 str_replace("^ !!", "") %>% #Drop the leading ' !!'
 str_replace_all("[^!]*!!", "") #Replace each !!.* with nothing

Unfortunately, this doesn't work:

kable(p16_tab, format = "latex", booktabs = TRUE,
   col.names = c("label", "United States %-age")
   ) %>%
   add_indent(c(2:9), level_of_indent = c(1,2,2,3,3,1,2,2))

and I have to do this:

kable(p16_tab, format = "latex", booktabs = TRUE,
   col.names = c("label", "United States %-age")
   ) %>%
 add_indent(c(2,7), level_of_indent = 1) %>%
 add_indent(c(3,4,8,9), level_of_indent = 2) %>%
 add_indent(c(5,6), level_of_indent = 3)

However, this is manual, and therefore not really satisfactory.
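The hand-written add_indent() calls above can be generated from the labels themselves. A minimal sketch, assuming (as the working manual version suggests) that add_indent() takes a single level_of_indent value per call: count the "!!" separators to get each row's depth, group the row numbers by depth with split(), and call add_indent() once per level. The labels are hard-coded from the P16 printout here; in the report they would come from p16_tab$label.

```r
## Labels from table P16, in the census " !!A:!!B:!!C" format.
labels <- c(" !!Total:",
            " !!Total:!!Family households:",
            " !!Total:!!Family households:!!Married couple family",
            " !!Total:!!Family households:!!Other family:",
            " !!Total:!!Family households:!!Other family:!!Male householder, no spouse present",
            " !!Total:!!Family households:!!Other family:!!Female householder, no spouse present",
            " !!Total:!!Nonfamily households:",
            " !!Total:!!Nonfamily households:!!Householder living alone",
            " !!Total:!!Nonfamily households:!!Householder not living alone")

## Depth = number of "!!" separators minus one; leaf = text after the last "!!".
depth <- lengths(regmatches(labels, gregexpr("!!", labels, fixed = TRUE))) - 1L
leaf  <- sub(".*!!", "", labels)

## Row numbers grouped by depth; level 0 ("Total:") needs no indent.
rows_by_level <- split(seq_along(depth), depth)
rows_by_level <- rows_by_level[names(rows_by_level) != "0"]

if (requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("kableExtra", quietly = TRUE)) {
  tab <- knitr::kable(data.frame(label = leaf), format = "latex", booktabs = TRUE)
  ## One add_indent() call per level, replacing the three manual calls.
  for (lvl in names(rows_by_level)) {
    tab <- kableExtra::add_indent(tab, rows_by_level[[lvl]],
                                  level_of_indent = as.numeric(lvl))
  }
  print(tab)
}
```

For this table, rows_by_level works out to rows 2 and 7 at level 1, rows 3, 4, 8 and 9 at level 2, and rows 5 and 6 at level 3, the same positions as the manual version.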

I have two questions:

1. If I want to use spaces, and would like a programmatic solution 
(versus a manual one), can this be done?

2. Stylistically, is there a better way to represent the nesting of 
lower rows in a table below upper rows? If I don't fixate on using 
spaces, is there another way?

Thanks for your suggestions and advice.

-Kevin


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Processing a hierarchical string name

2023-06-29 Thread Kevin Zembower via R-help
Ivan and Bert, thank you so much for your help.

Ivan, your solution worked perfectly. I didn't really understand how to 
do string processing on a vector of strings, and your solution 
demonstrated it for me. I modified it to work with the tidyverse's 
stringr package in this way:

bg3_race_sum <- bg3_race %>%
 left_join(pl_vars, by=c("variable" = "name")) %>%
 group_by(variable) %>%
 summarize(count = sum(value)) %>%
 left_join(pl_vars, by=c("variable" = "name")) %>%
 filter(count > 0) %>%
 .$label %>%
 str_replace("^ !!", "") %>% #Drop the leading ' !!'
 str_replace_all("[^!]*!!", " ") #Replace each !!.* with space

Bert, your solution was close to correct. It correctly dropped the right 
text, but didn't insert a space for each piece of text between "!!" and 
":". I'm using those spaces to show the hierarchical nature of the 
numbers: lower numbers (in the chart) are included in higher numbers. 
For instance, the "Total:" number is the sum of "Population of one race" 
and "Population of two or more races".

Thank you both for helping me with this specific problem and for 
increasing my knowledge and abilities with R.

-Kevin

On 6/28/23 16:56, Ivan Krylov wrote:
> On Wed, 28 Jun 2023 20:29:23 +
> Kevin Zembower via R-help  wrote:
> 
>> I think my algorithm for the labels is:
>> 1. keep everything from the last "!!" up to and including the last
>> character
>> 2. for everything remaining, replace each "!!.*:" group with a single
>> space.
> 
> If you remove the initial ' !!', the problem becomes a more tractable
> "replace each group of non-'!' followed by '!!' with one space":
> 
> bg3_race_sum$label |>
>   (\(.) sub('^ !!', '', .))() |>
>   (\(.) gsub('[^!]*!!', ' ', .))()
> 
> But that solution could have been impossible if the task was slightly
> different.
> 
>> I can split the label using str_split(label, pattern = "!!") to get a
>> vector of strings, but don't know how to work on the last string and
>> all the rest of the strings separately.
> 
> str_split() would have given you a list of character vectors. You can
> use lapply to evaluate a function on each vector inside that list.
> Inside the function, use length(x) (if `x` is the argument of the
> function) to find out how many spaces to produce and which element of
> the vector is the last one. (For code golf points, use rev(x)[1] to get
> the last element.)
>
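Ivan's split-then-lapply suggestion can be sketched in base R as follows. This is a sketch, not code from the thread: it uses base strsplit() in place of stringr::str_split() (both return a list of character vectors), vapply() in place of lapply() so the result is a plain character vector, and a hypothetical helper named indent_label(). It assumes every label starts with " !!", as the census labels do.

```r
labels <- c(" !!Total:",
            " !!Total:!!Population of one race:",
            " !!Total:!!Population of one race:!!White alone",
            " !!Total:!!Population of one race:!!Asian alone")

## strsplit() returns a list: one character vector of pieces per label.
pieces <- strsplit(labels, "!!", fixed = TRUE)

## Hypothetical helper: works on one label's pieces.
indent_label <- function(x) {
  x <- x[-1]        # drop the " " that preceded the first "!!"
  n <- length(x)
  ## One space per ancestor level, then the last piece (rev(x)[1] also works).
  paste0(strrep(" ", n - 1), x[n])
}

out <- vapply(pieces, indent_label, character(1))
out
## Values: "Total:", " Population of one race:", "  White alone", "  Asian alone"
```

Because the last piece and the count of earlier pieces are handled separately inside the function, the same structure still works if the task changes, for example if the ancestor pieces had to be kept rather than replaced by spaces.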

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Processing a hierarchical string name

2023-06-28 Thread Kevin Zembower via R-help
Hello, all

I'm trying to process the names of the variables in the US Census 
database, that I'm retrieving with tidycensus. My end goal is to produce 
nicely formatted tables with natural labels.

The labels as downloaded from the US Census look like this:

library(tidyverse)
library(tidycensus)

## Get the P1 table for block group 3 in census tract 2711.01:
bg3_race <- get_decennial(
 geography = "block group",
 state = "MD",
 county = "Baltimore city",
 table = "P1",
 cache_table = TRUE,
 year = "2020",
 sumfile = "pl")%>%
 filter(substr(GEOID, 6, 12) == "2711013")

## Load the names and labels of the variables:
pl_vars <- load_variables(year = "2020", dataset = "pl", cache = TRUE)

## Join the labels to the variables, and drop the zero counts
bg3_race_sum <- bg3_race %>%
 left_join(pl_vars, by=c("variable" = "name")) %>%
 filter(value > 0) %>%
 select(c(GEOID, value, label))

head(bg3_race_sum$label)
[1] " !!Total:"
[2] " !!Total:!!Population of one race:"
[3] " !!Total:!!Population of one race:!!White alone"
[4] " !!Total:!!Population of one race:!!Black or African American alone"
[5] " !!Total:!!Population of one race:!!American Indian and Alaska Native alone"
[6] " !!Total:!!Population of one race:!!Asian alone"

I think my algorithm for the labels is:
1. keep everything from the last "!!" up to and including the last character
2. for everything remaining, replace each "!!.*:" group with a single space.

This turns head() into:
"Total:"
" Population of one race:"
"  White alone"
"  Black or African American alone"
"  American Indian and Alaska Native alone"
"  Asian alone"
[may not be clearly visible if not rendered in a monospaced font]

I think that I need lapply here, but I'm not sure of that, and of what 
to do next. I can split the label using str_split(label, pattern = "!!") 
to get a vector of strings, but don't know how to work on the last 
string and all the rest of the strings separately.

Thank you for any suggestions to nudge me along towards a workable solution.

-Kevin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Rmarkdown code rendering as LaTeX, not executing?

2023-06-13 Thread Kevin Zembower via R-help
Hi, all,

I'm trying to compose an Rmarkdown document and render it as a PDF file. 
My first block of R code seems to work okay, but the second one seems to 
be interpreted as LaTeX code, and not executed as R code. In the output, 
the three backticks that mark the R code block are rendered as an 
opening double quote, followed by an opening single quote.

Here's my test file:

---
title: "An analysis of US 2020 Census Data for the Radnor-Winston neighborhood"
author: "E. Kevin Zembower"
date: "29 May 2023"
output:
  pdf_document:
    extra_dependencies: ["array", "booktabs", "dcolumn"]

---

```{r setup, include = FALSE}

```

\section{Abstract}
In this document, ...

\section{Boundaries of the Radnor-Winston neighborhood}

...

  For the purposes of this report, the
boundaries of RW are as shown in figure \ref{RWneigh}. ...

```{r rw_map,  fig.width = 6, fig.height = 4, out.width = "80%", dev = 
"pdf",
fig.cap = "Map of RW neighborhood\label{RWneigh}"}

## Creating a polygon for RW neighborhood, based on CRS 6487 (NAD83
## (2011) / Maryland ) map in meters:
base_x <- 433000
base_y <- 186000
rw_neigh_pg_m <- data.frame(
 matrix(
 c(540, 1140,
   540, 1070,
   480, 1060,
   490, 1000,
   570, 1000,
   570, 940,
   550, 930,
   550, 890,
   580, 890,
   590, 820,
   640, 820,
   650, 590,
   520, 580,
   470, 580,
   350, 660,
   350, 710,
   180, 725,
   190, 900,
   220, 900,
   220, 1030,
   240, 1030,
   240, 1110
 ),
 ncol = 2, byrow = TRUE)
) %>% + matrix(c(rep(base_x, nrow(.)), rep(base_y, nrow(.))),
nrow = nrow(.)) %>%
sf::st_as_sf(coords = c(1,2), dim = "XY") %>%
summarize(geometry = st_combine(geometry)) %>%
st_cast("POLYGON") %>%
st_set_crs(6487)

## Map it:
rw_base_blocks <- read_osm(bb(rw_neigh_pg_m, ext = 1.3))

## Line below gives map in meters
(RW_block_map <- tm_shape(rw_base_blocks, projection = 6487) +
## Line below gives map in degrees
## (RW_block_map <- tm_shape(rw_base_blocks, projection = 6487) +
  tm_rgb() +
  tm_shape(rw_neigh_pg_m) +
  tm_fill(col = "green", alpha = 0.2) +
  tm_borders(lwd = 2, alpha = 1) +
  tm_scale_bar() +
  ## tm_grid() + tm_xlab("Long") + tm_ylab("Lat") +
  tm_grid() +
  tm_layout(title = "Radnor-Winston Neighborhood")
)

## tmap_save(RW_block_map, "rw_map.png")

```


This code block can also be obtained from 
https://gist.github.com/kzembower/f9ad52abf82975102cbf715bcfbc0f51.

I'm using Emacs and ESS to create this document. This seems to produce 
its own weirdness, as the text style and font color and sizes change in 
the R code block as I edit it and add spaces and lines.

If the block above is saved as "RW_test.Rmd", I use these lines to 
create the PDF:
===
library(rmarkdown)
render("RW_test.Rmd")


No errors are generated.

Can anyone help me understand what I'm doing wrong? A much shorter test 
file I created seems to work okay.

Thanks in advance for any advice.

-Kevin

 > sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=en_US.UTF-8        LC_ADDRESS=en_US.UTF-8
[10] LC_TELEPHONE=en_US.UTF-8   LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] kableExtra_1.3.4 tidycensus_1.4   lubridate_1.9.2  forcats_1.0.0    stringr_1.5.0    dplyr_1.1.2
 [7] purrr_1.0.1      readr_2.1.4      tidyr_1.3.0      tibble_3.2.1     ggplot2_3.4.2    tidyverse_2.0.0
[13] rmarkdown_2.22

loaded via a namespace (and not attached):
 [1] gtable_0.3.3       xfun_0.39          raster_3.6-20      tigris_2.0.3       rJava_1.0-6
 [6] lattice_0.21-8     tzdb_0.4.0         vctrs_0.6.2        tools_4.3.0        generics_0.1.3
[11] curl_5.0.0         proxy_0.4-27       fansi_1.0.4        pkgconfig_2.0.3    KernSmooth_2.23-21
[16] webshot_0.5.4      uuid_1.1-0         lifecycle_1.0.3    compiler_4.3.0     munsell_0.5.0
[21] tinytex_0.45       terra_1.7-29       codetools_0.2-19   htmltools_0.5.5    class_7.3-22
[26] yaml_2.3.7         crayon_1.5.2       pillar_1.9.0       classInt_0.4-9     tidyselect_1.2.0
[31] rvest_1.0.3        digest_0.6.31      stringi_

[R] Recommended ways to draw US Census map on Open Street Map base map?

2023-06-06 Thread Kevin Zembower via R-help
Hello, all,

I asked a version of this question on the R-sig-geo list, but didn't get 
any response. I'm asking here in the hopes of a wider audience.

I'm trying to draw US Census map data, fetched with tigris, on top of a 
base map fetched by the package OpenStreetMap. I'm hoping for the most 
straightforward solution. I made significant progress with leaflet(), 
but I don't need the interactivity of the map. I just need a 2D, static 
map that I can print and include in a document.

Here's some of what I've tried so far:
==
library(tidyverse)
library(tigris)
options(tigris_use_cache = TRUE)
library(OpenStreetMap)
library(ggplot2)

## Get an Open Street Map
## (nw and se, the map's corner points, are not defined in this excerpt):
rw_map <- openmap(nw, se,
                  type = "osm",
                  mergeTiles = TRUE) %>%
  openproj(projection = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")

## Get an example census map:
rw_tract <- tracts(state = "MD",
 county = "Baltimore city",
 year = "2020") %>%
 filter(NAME == "2711.01")

## This works:
autoplot.OpenStreetMap(rw_map)

## So does this:
plot(rw_tract$geometry)

## These don't:
autoplot.OpenStreetMap(rw_map) +
 geom_sf(rw_tract$geometry)

ggplot(map_data(rw_map), aes(long, lat))


ggplot(aes(x="long", y="lat")) +
 geom_sf(rw_map$geometry)
=

I think my problem in part is failing to fully understand the formats of 
the rw_map and rw_tract containers. rw_tract says it's a simple feature 
collection, but rw_map just gives me lists of the data.

Can anyone help nudge me along in getting my rw_tract to be drawn on my 
rw_map? Any advice or guidance on putting together map data from 
different sources?

And an overarching question: Is moving in this direction, with ggplot2, 
the way you would recommend accomplishing this task? Is there a simpler, 
more straightforward way of doing this?

Thanks in advance for your help and efforts.

-Kevin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [External Email] Newbie: Controlling legends in graphs

2023-05-16 Thread Kevin Zembower via R-help
See below.

On 5/16/23 10:52, Christopher Ryan wrote:
> I'm more of a lattice guy than a ggplot guy, but perhaps this is part of 
> the problem:
> 
> .
>       geom_point(aes(y = m_K, color = "red")) +  # >> you've associated "K" with the color red
>       geom_smooth(aes(y = m_K, color = "red")) +
>       geom_point(aes(y = m_J, color = "blue")) +   ## >> and "J" with the color blue
>       geom_smooth(aes(y = m_J, color = "blue")) +
> 
> .
> 
Yes, I was confused that I had associated "K" with the color red, yet the 
line and points for K's data were blue, and in the legend they were 
labeled with the word "red".

But, I think I've got it straightened out now. Thanks for your help.

-Kevin
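The confusion described above comes from putting a constant inside aes(): there, color = "red" is treated as a data value, not as a colour, so ggplot2's default discrete scale assigns it a hue from its own palette and uses the string "red" as the legend label. A minimal sketch of one fix, with made-up yearly means standing in for the real summary: map each series to the label wanted in the legend, then pick the colours and the legend order with scale_color_manual().

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ## Made-up yearly means, only so there is something to plot.
  d <- data.frame(year = 2000:2004,
                  m_K  = c(186, 185, 184, 185, 183),
                  m_J  = c(134, 135, 133, 134, 136))
  p <- ggplot(d, aes(x = year)) +
    ## The string inside aes() is now the legend label, not a colour.
    geom_point(aes(y = m_K, color = "K")) +
    geom_smooth(aes(y = m_K, color = "K"), method = "lm", formula = y ~ x, se = FALSE) +
    geom_point(aes(y = m_J, color = "J")) +
    geom_smooth(aes(y = m_J, color = "J"), method = "lm", formula = y ~ x, se = FALSE) +
    ## Named values pick each label's colour; breaks fix the legend order.
    scale_color_manual(name = "Person",
                       values = c(K = "blue", J = "red"),
                       breaks = c("K", "J"))
  ## print(p) draws the plot.
}
```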


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Newbie: Drawing fitted lines on subset of data

2023-05-16 Thread Kevin Zembower via R-help
Yep, that did it. I didn't know that you could have pipelines within 
pipelines.

Thanks, again, for all your help.

-Kevin

On 5/16/23 11:44, Rui Barradas wrote:
> Às 15:29 de 16/05/2023, Kevin Zembower via R-help escreveu:
>> Hello,
>>
>> I'm still working with my tsibble of weight data for the last 20 years.
>> In addition to drawing an overall trend line, using lm, for the whole
>> data set, I'd like to draw short lines that would recompute lm and draw
>> it, say, just for the years from 2010:2015.
>>
>> Here's a short example that I think illustrates what I'm trying to do.
>> The commented out sections show what I've tried so far:
>>
>> ## Short example to test segments:
>>
>> w <- tsibble(
>>   date = as.Date("2022-01-01") + 0:99,
>>   value = rnorm(100)
>> )
>>
>> ggplot(data = w, mapping = aes(date, value)) +
>>   geom_smooth(method = "lm", se = FALSE) +
>>   geom_point()
>>   ## Below gives error about ignoring data
>>   ## geom_abline( data = w$date[25:75] )
>>   ## Gives error ''data' must be in '
>>   ## geom_smooth(data = w$date[25:35],
>>   ## method = lm,
>>   ## color = "black",
>>   ## se = FALSE)
>>
>> I'm thinking that this is probably easily done, but I'm struggling with
>> how to subset the data in the middle of the pipeline.
>>
>> Thanks for any advice and help.
>>
>> -Kevin
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
> 
> Try the following.
> In the 2nd geom_smooth you need a subset of the data, not of just one of 
> its columns.
> 
> 
> 
> suppressPackageStartupMessages({
>    library(tsibble)
>    library(dplyr)
>    library(ggplot2)
>    library(lubridate)
> })
> 
> ggplot(data = w, mapping = aes(date, value)) +
>    geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
>    geom_point() +
>    geom_smooth(
>      data = w %>% filter(year(date) >= 2010, year(date) <= 2015),
>      mapping = aes(date, value),
>      formula = y ~ x,
>      method = lm,
>      color = "black",
>      se = FALSE
>    )
> 
> 
> Other ways to subset the data are
> 
> 
> # dplyr
> data = w %>% filter(year(date) %in% 2010:2015)
> # base R
> data = subset(w, year(date) %in% 2010:2015)
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Newbie: Drawing fitted lines on subset of data

2023-05-16 Thread Kevin Zembower via R-help
Hello,

I'm still working with my tsibble of weight data for the last 20 years. 
In addition to drawing an overall trend line, using lm, for the whole 
data set, I'd like to draw short lines that would recompute lm and draw 
it, say, just for the years from 2010:2015.

Here's a short example that I think illustrates what I'm trying to do. 
The commented out sections show what I've tried so far:

## Short example to test segments:
library(tsibble)
library(ggplot2)

w <- tsibble(
 date = as.Date("2022-01-01") + 0:99,
 value = rnorm(100)
)

ggplot(data = w, mapping = aes(date, value)) +
 geom_smooth(method = "lm", se = FALSE) +
 geom_point()
 ## Below gives error about ignoring data
 ## geom_abline( data = w$date[25:75] )
 ## Gives error ''data' must be in '
 ## geom_smooth(data = w$date[25:35],
 ## method = lm,
 ## color = "black",
 ## se = FALSE)

I'm thinking that this is probably easily done, but I'm struggling with 
how to subset the data in the middle of the pipeline.

Thanks for any advice and help.

-Kevin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Newbie: Controlling legends in graphs

2023-05-16 Thread Kevin Zembower via R-help
Rui, thanks so much for your help. Your explanation and example were 
clear and concise. Thanks for taking the time and effort to help me.

-Kevin

On 5/12/23 16:06, Rui Barradas wrote:
> Às 14:24 de 12/05/2023, Kevin Zembower via R-help escreveu:
>> Hello, I'm trying to create a line graph with a legend, but have no
>> success controlling the legend. Since nothing I've tried seems to work,
>> I must be doing something systematically wrong. Can anyone point this
>> out to me?
>>
>> Here's my data:
>>   > weights
>> # A tibble: 1,246 × 3
>>  Date   J K
>>    
>>    1 2000-02-13   133  188
>>    2 2000-02-20   134  185
>>    3 2000-02-27   135  187
>>    4 2000-03-05   135  185
>>    5 2000-03-12    NA  184
>>    6 2000-03-19    NA  184.
>>    7 2000-03-26   136  184.
>>    8 2000-04-02   134  185
>>    9 2000-04-09   133  186
>> 10 2000-04-16    NA  186
>> # ℹ 1,236 more rows
>> # ℹ Use `print(n = ...)` to see more rows
>>   >
>>
>> Here are my attempts. You can see some of the things I've tried in the
>> commented out sections:
>> weights %>%
>>   group_by(year(Date)) %>%
>>   summarize(
>>   m_K = mean(K, na.rm = TRUE),
>>   m_J = mean(J, na.rm = TRUE),
>>   ) %>%
>>   ggplot(aes(x = `year(Date)`)) +
>>   geom_point(aes(y = m_K, color = "red")) +
>>   geom_smooth(aes(y = m_K, color = "red")) +
>>   geom_point(aes(y = m_J, color = "blue")) +
>>   geom_smooth(aes(y = m_J, color = "blue")) +
>>   guides(size = "legend",
>>  shape = "legend")
>>   ## scale_shape_discrete(name="Person",
>>   ##  breaks=c("m_K", "m_J"),
>>   ##  labels=c("K", "J"))
>>   ## theme(legend.title=element_blank())
>>
>> When this runs, the blue line for "K" is above the red line for "J", as
>> I expect, but in the legend, the red is shown first, and labeled "blue."
>>
>> I'd like to be able to create a legend where the first entry shows a
>> blue line and is labeled "K" and the second is red and labeled "J".
>>
>> On a different but related topic, I'd welcome any advice or suggestions
>> on my methodology in this example. Is this the correct way to summarize
>> with a mean? Do I need the two sets of geom_point and geom_line clauses
>> to create this graph, or is there a better way?
>>
>> Thanks for all your advice and guidance.
>>
>> -Kevin
>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
> 
> This is mainly a data reshaping problem. Instead of plotting two 
> variables, J and K, if the data is in the long format you will map the 
> column with these variables names to the color aesthetic and call each 
> geom_* only once. Then, assign the colors you want.
> 
> As for placing K above J, note that ggplot places them in alphabetical 
> order unless you coerce to factor with the levels in the order you want.
> 
> Also, if you want to compute aggregate statistics for several columns, 
> use ?across. See the code below.
> 
> Here is a complete example. I have augmented your data set in order to 
> have more years to plot.
> 
> 
> 
> # augment the data set
> weights <- " Date   J K
>    1 2000-02-13   133  188
>    2 2000-02-20   134  185
>    3 2000-02-27   135  187
>    4 2000-03-05   135  185
>    5 2000-03-12    NA  184
>    6 2000-03-19    NA  184.
>    7 2000-03-26   136  184.
>    8 2000-04-02   134  185
>    9 2000-04-09   133  186
> 10 2000-04-16    NA  186"
> weights <- read.table(text = weights, header = TRUE)
> weights$Date <- as.Date(weights$Date)
> tmp <- weights
> tmp <- lapply(1:10, \(y) {
>    tmp$Date <- years(y) + tmp$Date
>    tmp$J <- tmp$J + sample(-10:10, nrow(weights), TRUE)
>    tmp$K <- tmp$K + sample(-10:10, nrow(weights), TRUE)
>    tmp
> })
> weights <- do.call(rbind, tmp)
> 
> #---
> 
> # plot code
> library(ggplot2)
> library(dplyr)
> library(tidyr)
> library(lubridate)
> 
> weights %>%
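Rui's plotting code is cut off in the archive at this point. The following is not his code but a minimal sketch of the approach he describes, using made-up yearly means in place of the summarized weights: reshape to long format with tidyr::pivot_longer() so a single person column is mapped to colour, set the factor levels to put K before J (the default order would be alphabetical, J then K), and call each geom only once.

```r
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("tidyr", quietly = TRUE)) {
  library(ggplot2)
  ## Made-up yearly means in wide format: one column per person.
  wide <- data.frame(year = 2000:2004,
                     K = c(186, 185, 184, 185, 183),
                     J = c(134, 135, 133, 134, 136))
  ## Long format: one row per (year, person) pair.
  long <- tidyr::pivot_longer(wide, cols = c(K, J),
                              names_to = "person", values_to = "weight")
  ## Factor levels set the legend order.
  long$person <- factor(long$person, levels = c("K", "J"))
  p <- ggplot(long, aes(year, weight, color = person)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    scale_color_manual(values = c(K = "blue", J = "red"))
  ## print(p) draws the plot.
}
```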

[R] Newbie: Controlling legends in graphs

2023-05-12 Thread Kevin Zembower via R-help
Hello, I'm trying to create a line graph with a legend, but have no 
success controlling the legend. Since nothing I've tried seems to work, 
I must be doing something systematically wrong. Can anyone point this 
out to me?

Here's my data:
 > weights
# A tibble: 1,246 × 3
Date   J K
  
  1 2000-02-13   133  188
  2 2000-02-20   134  185
  3 2000-02-27   135  187
  4 2000-03-05   135  185
  5 2000-03-12    NA  184
  6 2000-03-19    NA  184.
  7 2000-03-26   136  184.
  8 2000-04-02   134  185
  9 2000-04-09   133  186
10 2000-04-16    NA  186
# ℹ 1,236 more rows
# ℹ Use `print(n = ...)` to see more rows
 >

Here are my attempts. You can see some of the things I've tried in the 
commented out sections:
weights %>%
 group_by(year(Date)) %>%
 summarize(
 m_K = mean(K, na.rm = TRUE),
 m_J = mean(J, na.rm = TRUE),
 ) %>%
 ggplot(aes(x = `year(Date)`)) +
 geom_point(aes(y = m_K, color = "red")) +
 geom_smooth(aes(y = m_K, color = "red")) +
 geom_point(aes(y = m_J, color = "blue")) +
 geom_smooth(aes(y = m_J, color = "blue")) +
 guides(size = "legend",
shape = "legend")
 ## scale_shape_discrete(name="Person",
 ##  breaks=c("m_K", "m_J"),
 ##  labels=c("K", "J"))
 ## theme(legend.title=element_blank())

When this runs, the blue line for "K" is above the red line for "J", as 
I expect, but in the legend, the red is shown first, and labeled "blue."

I'd like to be able to create a legend where the first entry shows a 
blue line and is labeled "K" and the second is red and labeled "J".

On a different but related topic, I'd welcome any advice or suggestions 
on my methodology in this example. Is this the correct way to summarize 
with a mean? Do I need the two sets of geom_point and geom_line clauses 
to create this graph, or is there a better way?

Thanks for all your advice and guidance.

-Kevin


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.