Re: [R] Advice on starting to analyze smokestack emissions?
Just to follow up on this thread: I didn't run into any of the problems I had anticipated, based on the US EPA's Air Quality System (AQS) Data Mart database website, when accessing the air monitoring data with the RAQSAPI package. I didn't have to document an agency affiliation at all, just an email address.

Thanks again, Karl, for suggesting this.

-Kevin

On Fri, 2023-12-15 at 08:29 -0500, Kevin Zembower wrote:
> Bert, Tim, Karl and Richard, thank you all for your suggestions and
> help.
>
> I will try the R-sig-ecology list.
>
> Karl, I wasn't aware of the RAQSAPI package, but it looked promising.
> However, when I went to the source of the data it uses, the United
> States Environmental Protection Agency’s (US EPA) Air Quality System
> (AQS) Data Mart database, it looks like interactive access to the
> data is restricted to those who can document a professional agency
> affiliation. I don't have that. I'll work with the package to see
> whether this is also true when obtaining the data through it. Thanks
> for the suggestion.
>
> Richard, the Canadian study of crematoriums was very useful. Thanks.
>
> Thanks, again, all, for your help.
>
> -Kevin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Advice on starting to analyze smokestack emissions?
Bert, Tim, Karl and Richard, thank you all for your suggestions and help.

I will try the R-sig-ecology list.

Karl, I wasn't aware of the RAQSAPI package, but it looked promising. However, when I went to the source of the data it uses, the United States Environmental Protection Agency’s (US EPA) Air Quality System (AQS) Data Mart database, it looks like interactive access to the data is restricted to those who can document a professional agency affiliation. I don't have that. I'll work with the package to see whether this is also true when obtaining the data through it. Thanks for the suggestion.

Richard, the Canadian study of crematoriums was very useful. Thanks.

Thanks, again, all, for your help.

-Kevin
[R] Advice on starting to analyze smokestack emissions?
Hello, all,

[Originally sent to the r-sig-geo list, with no response. Cross-posting here in the hope of a wider audience. Does anyone have experience with this topic? Thanks.]

I'm trying to get started analyzing the concentrations of smokestack emissions. I don't have any professional background or training in this; I'm just an old, retired guy who thinks playing with numbers is fun.

A local funeral home in my neighborhood (less than 1200 ft from my home) is proposing to construct a crematorium for human remains. I have some experience with the tidycensus package and thought it might be interesting to model how the concentrations of pollutants from the smokestack change with distance and, using recorded wind speeds and directions, see which US Census blocks would be affected.

I have the output of the US EPA's SCREEN3 model showing how concentration varies with distance from the smokestack. See https://www.epa.gov/scram/air-quality-dispersion-modeling-screening-models#screen3 if curious. As a first task, I'd like to see if I can calculate similar results in R.

I'm aware of the 'plume' steady-state Gaussian dispersion package (https://rdrr.io/github/holstius/plume/f/inst/doc/plume-intro.pdf), but am a little concerned that this package was last updated 11 years ago.

Do you have any recommendations for me on how to get started analyzing this problem? Is 'plume' still the way to go? I'm aware that there are many atmospheric dispersion models from the US EPA, but I was hoping to keep my work within R, which I'm really enjoying using and learning about. Are SCREEN3 and 'plume' comparable? Is this the best R list to ask questions about this topic?

Thanks for any advice or guidance you have for me.

-Kevin
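For readers wondering what a steady-state Gaussian plume model actually computes, here is a minimal R sketch of the textbook ground-level concentration with ground reflection. It is not SCREEN3 and not the 'plume' package: the dispersion coefficients are the Briggs open-country curves for neutral stability (class D), and the emission rate `Q`, wind speed `u` and effective release height `H` are placeholders chosen only for illustration. As far as I understand, SCREEN3 also handles plume rise, building downwash and a range of stability classes and wind speeds, so its numbers will differ.

```r
# Briggs open-country dispersion curves for neutral stability
# (Pasquill class D); x is downwind distance in metres.
briggs_rural_D <- function(x) {
  list(
    sigma_y = 0.08 * x / sqrt(1 + 0.0001 * x),
    sigma_z = 0.06 * x / sqrt(1 + 0.0015 * x)
  )
}

# Steady-state Gaussian plume concentration at (x, y, z) with a reflecting
# ground surface. Q: emission rate (g/s); u: wind speed at release height
# (m/s); H: effective release height (stack height plus plume rise, m).
# Returns g/m^3.
plume_conc <- function(x, y = 0, z = 0, Q, u, H) {
  s <- briggs_rural_D(x)
  lateral  <- exp(-y^2 / (2 * s$sigma_y^2))
  vertical <- exp(-(z - H)^2 / (2 * s$sigma_z^2)) +
              exp(-(z + H)^2 / (2 * s$sigma_z^2))  # image (reflection) term
  Q / (2 * pi * u * s$sigma_y * s$sigma_z) * lateral * vertical
}

# Placeholder inputs, NOT the proposed crematorium's parameters.
x    <- seq(50, 2000, by = 50)                     # downwind distance, m
conc <- plume_conc(x, Q = 1, u = 3, H = 15) * 1e6  # ug/m^3 per 1 g/s emitted

# Distance and value of the maximum ground-level concentration.
peak <- data.frame(distance_m = x, conc_ugm3 = conc)[which.max(conc), ]
print(peak)
```

Changing `H` or `u` moves the peak; comparing with other stability classes would mean replacing `briggs_rural_D()` with the corresponding curves.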
Re: [R] Missing shapes in legend with scale_shape_manual
Tim, thanks, it helps very much. It works like a charm. Wow, there's so much I don't understand about ggplot2 functions, especially the aes() function. I just discovered the ggplot2-book.org site, and I hope to read it slowly and carefully over the next couple of weeks.

Thanks again, Tim, for all your help.

-Kevin

On Tue, 2023-10-31 at 12:35 +, Howard, Tim G (DEC) wrote:
> I believe the missing shapes are because you had set alpha=0 for the
> last geom_point() layer.
>
> I expect there are better ways, but one way to handle it would be to
> avoid the filtering and instead add columns with the medication and
> exercise status, like the following:
>
> # setup with data provided
> Date <- c('2023-10-17', '2023-10-16', '2023-10-15', '2023-10-14',
>           '2023-10-13', '2023-10-12', '2023-10-11')
> Time <- c('08:50', '06:58', '09:17', '09:04', '08:44', '08:55',
>           '07:55')
> bg <- c(128, 144, 137, 115, 136, 122, 150)
> missed_meds <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
> no_exercise <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
>
> b2 <- data.frame(Date, Time, bg, missed_meds, no_exercise)
>
> b2$Date <- as.Date(b2$Date)
> # add "status" columns, could also be defined as factor.
> b2$medStat <- c("missed_meds", NA, NA, NA, NA, NA, "missed_meds")
> b2$exercise <- c(NA, NA, "missed_exercise", NA, "missed_exercise",
>                  "missed_exercise", "missed_exercise")
>
> Then your ggplot call would be like this:
>
> ggplot(data = b2, aes(x = Date, y = bg)) +
>   geom_line() +
>   geom_point(aes(shape = medStat), size = 3) +
>   geom_point(aes(shape = exercise), size = 3) +
>   scale_y_continuous(name = "Blood glucose (mg/dL)",
>                      breaks = seq(100, 230, by = 20)
>                      ) +
>   geom_hline(yintercept = 130) +
>   scale_shape_manual(name = "Conditions",
>                      # explicit breaks and named values keep each label
>                      # with its shape (levels otherwise sort alphabetically)
>                      breaks = c("missed_meds", "missed_exercise"),
>                      labels = c("Missed meds",
>                                 "Missed exercise"),
>                      values = c(missed_meds = 20, missed_exercise = 4)
>                      )
>
> Note that this method then gets very close without the
> scale_shape_manual, too.
>
> Hope that helps.
> Tim
>
> > Date: Mon, 30 Oct 2023 20:55:17 +
> > From: Kevin Zembower
> > To: r-help@r-project.org
> > Subject: [R] Missing shapes in legend with scale_shape_manual
> >
> > Hello,
> >
> > I'm trying to plot a graph of blood glucose versus date. I also
> > record conditions, such as missing the previous night's medications,
> > and missing exercise on the previous day. My data looks like:
> >
> > > b2[68:74,]
> > # A tibble: 7 × 5
> >   Date       Time     bg missed_meds no_exercise
> > 1 2023-10-17 08:50   128 TRUE        FALSE
> > 2 2023-10-16 06:58   144 FALSE       FALSE
> > 3 2023-10-15 09:17   137 FALSE       TRUE
> > 4 2023-10-14 09:04   115 FALSE       FALSE
> > 5 2023-10-13 08:44   136 FALSE       TRUE
> > 6 2023-10-12 08:55   122 FALSE       TRUE
> > 7 2023-10-11 07:55   150 TRUE        TRUE
> >
> > This gets me most of the way to what I want:
> >
> > ggplot(data = b2, aes(x = Date, y = bg)) +
> >   geom_line() +
> >   geom_point(data = filter(b2, missed_meds),
> >              shape = 20,
> >              size = 3) +
> >   geom_point(data = filter(b2, no_exercise),
> >              shape = 4,
> >              size = 3) +
> >   geom_point(aes(x = Date, y = bg, shape = missed_meds),
> >              alpha = 0) + #Invisible point layer for shape mapping
> >   scale_y_continuous(name = "Blood glucose (mg/dL)",
> >                      breaks = seq(100, 230, by = 20)
> >                      ) +
> >   geom_hline(yintercept = 130) +
> >   scale_shape_manual(name = "Conditions",
> >                      labels = c("Missed meds",
> >                                 "Missed exercise"),
> >                      values = c(20, 4),
> >                      ## size = 3
> >                      )
> >
> > However, the legend just prints an empty square in front of the
> > labels. What I want is a filled circle (shape 20) in front of
> > "Missed meds" and a cross (shape 4) in front of "Missed exercise."
> >
> > My questions are:
> > 1. How can I fix my plot to show the shapes in the legend?
> > 2. Can my overall plotting method be improved? Would you do it
> > this way?
> >
> > Thanks so much for your advice and guidance.
> >
> > -Kevin
[R] Missing shapes in legend with scale_shape_manual
Hello,

I'm trying to plot a graph of blood glucose versus date. I also record conditions, such as missing the previous night's medications, and missing exercise on the previous day. My data looks like:

> b2[68:74,]
# A tibble: 7 × 5
  Date       Time     bg missed_meds no_exercise
1 2023-10-17 08:50   128 TRUE        FALSE
2 2023-10-16 06:58   144 FALSE       FALSE
3 2023-10-15 09:17   137 FALSE       TRUE
4 2023-10-14 09:04   115 FALSE       FALSE
5 2023-10-13 08:44   136 FALSE       TRUE
6 2023-10-12 08:55   122 FALSE       TRUE
7 2023-10-11 07:55   150 TRUE        TRUE

This gets me most of the way to what I want:

ggplot(data = b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_point(data = filter(b2, missed_meds),
             shape = 20,
             size = 3) +
  geom_point(data = filter(b2, no_exercise),
             shape = 4,
             size = 3) +
  geom_point(aes(x = Date, y = bg, shape = missed_meds),
             alpha = 0) + #Invisible point layer for shape mapping
  scale_y_continuous(name = "Blood glucose (mg/dL)",
                     breaks = seq(100, 230, by = 20)
                     ) +
  geom_hline(yintercept = 130) +
  scale_shape_manual(name = "Conditions",
                     labels = c("Missed meds",
                                "Missed exercise"),
                     values = c(20, 4),
                     ## size = 3
                     )

However, the legend just prints an empty square in front of the labels. What I want is a filled circle (shape 20) in front of "Missed meds" and a cross (shape 4) in front of "Missed exercise."

My questions are:
1. How can I fix my plot to show the shapes in the legend?
2. Can my overall plotting method be improved? Would you do it this way?

Thanks so much for your advice and guidance.

-Kevin
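For question 2, one alternative to both the invisible layer and the extra status columns is to reshape the two logical flags into long format, so a single geom_point() maps `shape` to the condition name. This is a sketch under a couple of choices of my own: tidyr's pivot_longer() does the reshaping, and the `condition`/`occurred` column names are made up for illustration. Setting `breaks` and naming `values` explicitly keeps each legend label paired with the intended shape, since ggplot2 otherwise orders discrete levels alphabetically.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The seven rows shown in the question (Time omitted; it is not plotted).
b2 <- tibble(
  Date = as.Date(c("2023-10-17", "2023-10-16", "2023-10-15", "2023-10-14",
                   "2023-10-13", "2023-10-12", "2023-10-11")),
  bg = c(128, 144, 137, 115, 136, 122, 150),
  missed_meds = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  no_exercise = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# One row per (day, condition) that actually occurred.
conditions <- b2 %>%
  pivot_longer(c(missed_meds, no_exercise),
               names_to = "condition", values_to = "occurred") %>%
  filter(occurred)

p <- ggplot(b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_hline(yintercept = 130) +
  geom_point(data = conditions, aes(shape = condition), size = 3) +
  scale_shape_manual(
    name   = "Conditions",
    breaks = c("missed_meds", "no_exercise"),        # legend order
    labels = c("Missed meds", "Missed exercise"),    # one label per break
    values = c(missed_meds = 20, no_exercise = 4)    # matched by name
  ) +
  scale_y_continuous(name = "Blood glucose (mg/dL)",
                     breaks = seq(100, 230, by = 20))
```

Printing `p` draws the plot. On 2023-10-11, when both conditions occurred, the dot and the cross are drawn on top of each other.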
Re: [R] Help with plotting and date-times for climate data
Well, I looked for this on both the NWS and Weather Underground sites, but couldn't find what I was looking for. I didn't check Weather.com, but if you can find a chart of the average high and low temperatures in Ely, MN from about the middle of September to the middle of October, I'll buy you a beer.

-Kevin

On Wed, 2023-09-13 at 17:39 +, Ebert,Timothy Aaron wrote:
> I admire the dedication to R and data science, but the Weather
> Channel might be a simpler approach: Weather.com. I can search for
> (city name) and either weather (current values) or climate. It
> depends on how far away the trip will be.
>
> -----Original Message-----
> From: Kevin Zembower
> Sent: Wednesday, September 13, 2023 1:22 PM
> To: Richard O'Keefe ; Ebert,Timothy Aaron
> Cc: r-help@r-project.org
> Subject: Re: [R] Help with plotting and date-times for climate data
>
> Tim, Richard, y'all are reading too much into this. I believe that
> TMAX is the high temperature of the day, and TMIN is the low. I'm
> trying to compute the average or median high and low temperatures for
> the data I have (2011 to present). I'm going on a trip to this area,
> and want to know how to pack.
>
> Thanks for your interest.
>
> -Kevin
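Kevin's actual goal (typical highs and lows for packing) can be sketched with dplyr: per-calendar-day means and medians of TMAX and TMIN across years, plus a 10th to 90th percentile range over the whole trip window, which speaks to Richard's point that a single summary number can mislead. The values below are made up; only the DATE/TMAX/TMIN column names come from the thread, and the percentile choice is my own assumption.

```r
library(dplyr)

# Made-up daily records (degrees F); replace with the real Ely, MN data.
d <- tibble(
  DATE = as.Date(c("2021-09-22", "2022-09-22", "2023-09-22",
                   "2021-09-23", "2022-09-23", "2023-09-23")),
  TMAX = c(60, 65, 70, 58, 62, NA),
  TMIN = c(40, 42, 50, 38, 41, 39)
)

# Per calendar day: mean and median of the highs and lows across years.
daily <- d %>%
  mutate(md = format(DATE, "%m-%d")) %>%
  group_by(md) %>%
  summarize(
    tmax_mean   = mean(TMAX, na.rm = TRUE),
    tmax_median = median(TMAX, na.rm = TRUE),
    tmin_mean   = mean(TMIN, na.rm = TRUE),
    tmin_median = median(TMIN, na.rm = TRUE),
    .groups = "drop"
  )

# Whole trip window: the range of likely highs and lows.
trip_range <- d %>%
  summarize(
    high_p10 = unname(quantile(TMAX, 0.10, na.rm = TRUE)),
    high_p90 = unname(quantile(TMAX, 0.90, na.rm = TRUE)),
    low_p10  = unname(quantile(TMIN, 0.10, na.rm = TRUE)),
    low_p90  = unname(quantile(TMIN, 0.90, na.rm = TRUE))
  )
```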
Re: [R] Help with plotting and date-times for climate data
Rui, thanks so much for your clear explanation, the solution to my problem, and the additional help with making the graph come out exactly as I was hoping. I learned a lot from your solution.

Thanks, again, for your help.

-Kevin

On Tue, 2023-09-12 at 23:06 +0100, Rui Barradas wrote:
> At 21:50 on 12/09/2023, Kevin Zembower via R-help wrote:
> > Hello,
> >
> > I'm trying to calculate the mean temperature max from a file of
> > climate data, and plot it over a range of days in the year. I've
> > downloaded the data, and cleaned it up the way I think it should be.
> > However, when I plot it, the geom_smooth line doesn't show up. I
> > think that's because my x axis is characters or factors. Here's what
> > I have so far:
> >
> > library(tidyverse)
> >
> > data <- read_csv("Ely_MN_Weather.csv")
> >
> > start_day = yday(as_date("2023-09-22"))
> > end_day = yday(as_date("2023-10-15"))
> >
> > d <- as_tibble(data) %>%
> >   select(DATE,TMAX,TMIN) %>%
> >   mutate(DATE = as_date(DATE),
> >          yday = yday(DATE),
> >          md = sprintf("%02d-%02d", month(DATE), mday(DATE))
> >          ) %>%
> >   filter(yday >= start_day & yday <= end_day) %>%
> >   mutate(md = as.factor(md))
> >
> > d_sum <- d %>%
> >   group_by(md) %>%
> >   summarize(tmax_mean = mean(TMAX, na.rm=TRUE))
> >
> > ## Here's the filtered data:
> > dput(d_sum)
> >
> > structure(list(md = structure(1:25, levels = c("09-21", "09-22",
> > "09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
> > "09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
> > "10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
> > "10-14", "10-15"), class = "factor"), tmax_mean = c(65, 62.2,
> > 61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9, 61.2, 63.7, 59.5, 59.6,
> > 61.6, 59.4, 58.8, 55.9, 58.125, 58, 55.7, 57, 55.4, 49.8, 48.75,
> > 43.7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
> > -25L))
> >
> > ggplot(data = d_sum, aes(x = md)) +
> >   geom_point(aes(y = tmax_mean, color = "blue")) +
> >   geom_smooth(aes(y = tmax_mean, color = "blue"))
> >
> > My questions are:
> > 1. Why isn't my geom_smooth plotting? How can I fix it?
> > 2. I don't think I'm handling the month and day combination
> > correctly. Is there a way to encode month and day (but not year) as
> > a date?
> > 3. (Minor point) Why does my graph of tmax_mean come out red when I
> > specify "blue"?
> >
> > Thanks for any advice or guidance you can offer. I really appreciate
> > the expertise of this group.
> >
> > -Kevin
>
> Hello,
>
> The problem is that the dates are factors, not real dates, and
> geom_smooth does not fit a smooth along a discrete axis (the x axis).
>
> Paste a fake year onto md, coerce to Date and plot.
> I have simplified the aes() calls and added a date scale in order to
> make the x axis more readable.
>
> Without the formula and method arguments, geom_smooth prints a
> message, so they are now made explicit.
>
> suppressPackageStartupMessages({
>   library(dplyr)
>   library(ggplot2)
> })
>
> d_sum %>%
>   mutate(md = paste("2023", md, sep = "-"),
>          md = as.Date(md)) %>%
>   ggplot(aes(x = md, y = tmax_mean)) +
>   geom_point(color = "blue") +
>   geom_smooth(
>     formula = y ~ x,
>     method = loess,
>     color = "blue"
>   ) +
>   scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")
>
> Hope this helps,
>
> Rui Barradas
Re: [R] Help with plotting and date-times for climate data
Tim, Richard, y'all are reading too much into this. I believe that TMAX is the high temperature of the day, and TMIN is the low. I'm trying to compute the average or median high and low temperatures for the data I have (2011 to present). I'm going on a trip to this area, and want to know how to pack.

Thanks for your interest.

-Kevin

On Thu, 2023-09-14 at 03:07 +1200, Richard O'Keefe wrote:
> I am well aware of the physiological implications
> of temperature, and that is *why* I view recorded
> TMIN and TMAX at a single point with an extremely
> jaundiced eye. TMAX at shoulder height has very
> little relevance to an insect living in grass, for
> example. And if TMAX is sustained for one second,
> that has very different consequences than if TMAX
> is sustained for five minutes. I can see the usefulness
> of "proportion of day above Thi/below Tlo", but that
> is quite different.
>
> OK, so my interest in weather data was mainly based
> around water management: precipitation, evaporation,
> herd and crop water needs, that kind of thing. And
> the first thing you learn from that experience is
> that ANY kind of single-point summary is seriously
> misleading.
>
> Let's end this digression.
>
>
> On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron
> wrote:
> > I had the same question.
> > However, I can partly answer the off-topic question. Min and max
> > can be important as lower and upper development thresholds. Below
> > the min, no growth or development occurs because reaction rates are
> > too slow. Above the max, temperatures are too hot: protein function
> > is impaired, and systems stop functioning. There is a considerable
> > range between where systems shut down (but recover) and tissue
> > death.
> > In a simple form, the growth and physiological stage of plants,
> > insects, and many other organisms can be modeled as a function of
> > temperature. These are often called growing degree day models (or
> > some version of that). This is the number of thermal units needed
> > for the organism to develop to the next stage (e.g. instar for an
> > insect, or fruit/flower formation for a plant). However, better
> > accuracy is obtained if the model includes both min and max
> > thresholds.
> >
> > All I have done is provide an example where min and max could have
> > a real-world use. I use max(temp) over some interval and then
> > update an accumulated thermal units variable based on the outcome.
> > That detail is not evident in the original request.
> >
> > Tim
> >
> > -----Original Message-----
> > From: R-help On Behalf Of Richard O'Keefe
> > Sent: Wednesday, September 13, 2023 9:58 AM
> > To: Kevin Zembower
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Help with plotting and date-times for climate data
> >
> > Off-topic, but what is a "mean temperature max"
> > and what good would it do you to know it if you did?
> > I've been looking at a lot of weather station data, and for no
> > question I've ever had (except "would the newspapers get excited
> > about this") has "max" (or min) been the answer. Considering the way
> > that temperature can change by several degrees in a few minutes, or
> > a few metres -- I meant horizontally when I wrote that, but as you
> > know your head and feet don't experience the same temperature,
> > again by more than one degree -- I am at something of a loss to
> > ascribe much practical significance to TMAX. Are you sure this is
> > the analysis you want to do? Is this the most informative data you
> > can get?
> >
> > On Wed, 13 Sept 2023 at 08:51, Kevin Zembower via R-help <
> > r-help@r-project.org> wrote:
> >
> > > Hello,
> > >
> > > I'm trying to calculate the mean temperature max from a file of
> > > climate data, and plot it over a range of days in the year. I've
> > > downloaded the data, and cleaned it up the way I think it should
> > > be. However, when I plot it, the geom_smooth line doesn't show up.
> > > I think that's because my x axis is characters or factors. Here's
> > > what I have so far:
> > >
> > > library(tidyverse)
> > >
> > > data <- read_csv("Ely_MN_Weather.csv")
> > >
> > > start_day = yday(as_date("2023-09-22"))
> > > end_day = yday(as_date("2023-10-15"))
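Tim's growing degree day description can be illustrated with one common textbook variant, the "modified average" method with horizontal cutoffs: each day's high and low are clamped to the range from the lower (base) threshold to the upper threshold before averaging, and the daily values are summed into accumulated thermal units. This is a sketch, not necessarily Tim's own method (he describes using max(temp) over an interval). The 50 °F and 86 °F defaults are thresholds often quoted for corn and serve only as examples; the temperatures are made up.

```r
# Daily growing degree days (GDD) with lower and upper development
# thresholds: clamp high and low to [t_base, t_upper], average, subtract
# the base. Clamping the low to t_base keeps every day's GDD >= 0.
gdd_daily <- function(tmax, tmin, t_base = 50, t_upper = 86) {
  clamp <- function(t) pmin(pmax(t, t_base), t_upper)
  (clamp(tmax) + clamp(tmin)) / 2 - t_base
}

# Made-up daily highs and lows (degrees F) for four days.
tmax <- c(90, 75, 60, 45)
tmin <- c(65, 55, 40, 30)

gdd <- gdd_daily(tmax, tmin)
accumulated <- cumsum(gdd)  # running total of thermal units
```

A development stage would then be predicted on the first day `accumulated` passes the organism's published requirement for that stage.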
[R] Help with plotting and date-times for climate data
Hello,

I'm trying to calculate the mean temperature max from a file of climate data, and plot it over a range of days in the year. I've downloaded the data, and cleaned it up the way I think it should be. However, when I plot it, the geom_smooth line doesn't show up. I think that's because my x axis is characters or factors. Here's what I have so far:

library(tidyverse)

data <- read_csv("Ely_MN_Weather.csv")

start_day = yday(as_date("2023-09-22"))
end_day = yday(as_date("2023-10-15"))

d <- as_tibble(data) %>%
  select(DATE,TMAX,TMIN) %>%
  mutate(DATE = as_date(DATE),
         yday = yday(DATE),
         md = sprintf("%02d-%02d", month(DATE), mday(DATE))
         ) %>%
  filter(yday >= start_day & yday <= end_day) %>%
  mutate(md = as.factor(md))

d_sum <- d %>%
  group_by(md) %>%
  summarize(tmax_mean = mean(TMAX, na.rm=TRUE))

## Here's the filtered data:
dput(d_sum)

structure(list(md = structure(1:25, levels = c("09-21", "09-22",
"09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
"09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
"10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
"10-14", "10-15"), class = "factor"), tmax_mean = c(65, 62.2,
61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9, 61.2, 63.7, 59.5, 59.6,
61.6, 59.4, 58.8, 55.9, 58.125, 58, 55.7, 57, 55.4, 49.8, 48.75,
43.7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-25L))

ggplot(data = d_sum, aes(x = md)) +
  geom_point(aes(y = tmax_mean, color = "blue")) +
  geom_smooth(aes(y = tmax_mean, color = "blue"))

My questions are:
1. Why isn't my geom_smooth plotting? How can I fix it?
2. I don't think I'm handling the month and day combination correctly. Is there a way to encode month and day (but not year) as a date?
3. (Minor point) Why does my graph of tmax_mean come out red when I specify "blue"?

Thanks for any advice or guidance you can offer. I really appreciate the expertise of this group.

-Kevin
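Rui's answer in this thread fixes the smoothing line. Questions 2 and 3 can be sketched as follows, under a few assumptions. (a) Re-dating every observation into one fixed leap year (2000 here, an arbitrary choice) gives month-day a real Date class and keeps Feb 29 valid. Filtering on that date also avoids the leap-year shift in yday(), which is likely why "09-21" appears in the dput() output above even though the window starts on 09-22. (b) A colour written inside aes() is mapped as a data value, so it gets the first colour of ggplot2's default discrete palette (a reddish "#F8766D") rather than blue. The data below are made up.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Made-up TMAX values (degrees F) from a leap year (2020) and an
# ordinary year (2023), using the question's DATE/TMAX column names.
data <- tibble(
  DATE = as.Date(c("2020-09-21", "2020-09-22", "2023-09-21", "2023-09-22",
                   "2020-10-15", "2023-10-15", "2023-10-16")),
  TMAX = c(66, 62, 64, 61, 45, 43, 40)
)

d_sum <- data %>%
  # Move every observation into the same leap year so each calendar day
  # lines up across years.
  mutate(md = make_date(2000, month(DATE), mday(DATE))) %>%
  # Filter on calendar dates rather than yday(), which numbers every date
  # after Feb 28 one higher in leap years.
  filter(md >= as.Date("2000-09-22"), md <= as.Date("2000-10-15")) %>%
  group_by(md) %>%
  summarize(tmax_mean = mean(TMAX, na.rm = TRUE), .groups = "drop")

# Colour inside aes() is *mapped*: "blue" becomes a one-level category.
p_mapped <- ggplot(d_sum, aes(x = md, y = tmax_mean)) +
  geom_point(aes(colour = "blue"))

# Colour outside aes() is *set*: the points really are blue.
p_set <- ggplot(d_sum, aes(x = md, y = tmax_mean)) +
  geom_point(colour = "blue") +
  scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")
```

Because `md` is now a Date, scale_x_date() can label the axis as month-day while the fake year never appears.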
[R] Suggestions for hierarchical row names?
Hello,

On 28 June I asked a question with the subject "Processing a hierarchical string name". Folks here were very generous in helping me, and I'm very pleased with the solutions. Now I'm asking about a related topic, and I have both technical and stylistic questions.

I'm still working on my US census report for my neighborhood, and have tables with labels like this:

> p16_tab[,1]
# A tibble: 9 × 1
  label
1 " !!Total:"
2 " !!Total:!!Family households:"
3 " !!Total:!!Family households:!!Married couple family"
4 " !!Total:!!Family households:!!Other family:"
5 " !!Total:!!Family households:!!Other family:!!Male householder, no spouse present"
6 " !!Total:!!Family households:!!Other family:!!Female householder, no spouse present"
7 " !!Total:!!Nonfamily households:"
8 " !!Total:!!Nonfamily households:!!Householder living alone"
9 " !!Total:!!Nonfamily households:!!Householder not living alone"

A sample table can be obtained with:

library(tidyverse)
library(tidycensus)

get_us <- function(table, summary_var) {
  get_decennial(
    geography = "us",
    table = table,
    cache_table = TRUE,
    year = 2020,
    sumfile = "dhc",
    summary_var = summary_var) %>%
    mutate(GEOID = NULL,
           NAME = NULL,
           "US_pc" = value / summary_value * 100,
           value = NULL,
           summary_value = NULL)
}

tableID <- "P16"
summary_var <- "P16_001N"
(us_P16 <- get_us(tableID, summary_var))

labels <- load_variables(2020, "dhc", cache = TRUE)

(p16_tab <- us_P16 %>%
   left_join(labels, by = c("variable" = "name")) %>%
   mutate(variable = NULL,
          concept = NULL) %>%
   relocate(label)
)

Initially, I thought that I would indent the lines by a single space for every piece of text starting with "!!" and ending with ":", except for the last one. This works fine if the final output is just ASCII text. However, I'm trying to output my report in LaTeX, using Sweave and knitr/kable. When I output my report using spaces, LaTeX ignores them. I then tried replacing the spaces with "\hspace{1em}":

p16_tab$label <- p16_tab$label %>%
  str_replace("^ !!", "") %>% #Drop the leading ' !!'
  str_replace_all("[^!]*!!", "\\\\hspace{1em}")

kable(p16_tab,
      format = "latex",
      booktabs = TRUE,
      col.names = c("label", "United States %-age")
      )

This results in the "\" of "\hspace" being replaced with "\textbackslash{}hspace". I also thought that there was a way to suppress formatting in kableExtra, but I can't find it now. Regardless, I remember it didn't work the way I wanted it to, either.

kableExtra has the add_indent() function, which looks promising:

(p16_tab <- us_P16 %>%
   left_join(labels, by = c("variable" = "name")) %>%
   mutate(variable = NULL,
          concept = NULL) %>%
   relocate(label)
)

p16_tab$label <- p16_tab$label %>%
  str_replace("^ !!", "") %>% #Drop the leading ' !!'
  str_replace_all("[^!]*!!", "") #Drop each "segment!!", keeping only the last segment

Unfortunately, this doesn't work:

kable(p16_tab,
      format = "latex",
      booktabs = TRUE,
      col.names = c("label", "United States %-age")
      ) %>%
  add_indent(c(2:9), level_of_indent = c(1,2,2,3,3,1,2,2))

and I have to do this:

kable(p16_tab,
      format = "latex",
      booktabs = TRUE,
      col.names = c("label", "United States %-age")
      ) %>%
  add_indent(c(2,7), level_of_indent = 1) %>%
  add_indent(c(3,4,8,9), level_of_indent = 2) %>%
  add_indent(c(5,6), level_of_indent = 3)

However, this is manual, and therefore not really satisfactory.

I have two questions:
1. If I want to use spaces, and would like a programmatic solution (versus a manual one), can this be done?
2. Stylistically, is there a better way to represent the nesting of lower rows in a table below upper rows? If I don't fixate on using spaces, is there another way?

Thanks for your suggestions and advice.

-Kevin
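For question 1, the manual add_indent() calls can be generated from the labels themselves: the nesting depth is the number of "!!" separators minus one. The sketch below shows two options under my own naming (`depth`, `leaf`, `tab1`, `tab2` are illustrative); the percentage column is a placeholder (NA), not census output. Option 1 calls add_indent() once per depth level. Option 2 writes the \hspace commands directly and passes `escape = FALSE` to knitr's kable(), which I believe is the "suppress formatting" switch mentioned above; with escaping off, LaTeX specials such as `%` must then be escaped by hand.

```r
library(stringr)
library(knitr)
library(kableExtra)

# The nine labels shown above, with a placeholder percentage column.
p16_tab <- data.frame(
  label = c(" !!Total:",
            " !!Total:!!Family households:",
            " !!Total:!!Family households:!!Married couple family",
            " !!Total:!!Family households:!!Other family:",
            " !!Total:!!Family households:!!Other family:!!Male householder, no spouse present",
            " !!Total:!!Family households:!!Other family:!!Female householder, no spouse present",
            " !!Total:!!Nonfamily households:",
            " !!Total:!!Nonfamily households:!!Householder living alone",
            " !!Total:!!Nonfamily households:!!Householder not living alone"),
  US_pc = NA_real_
)

# Depth = number of "!!" separators minus one, so " !!Total:" is depth 0.
depth <- str_count(p16_tab$label, "!!") - 1
# Display text = whatever follows the last "!!" (the regex is greedy).
leaf <- str_remove(p16_tab$label, "^.*!!")

# Option 1: one add_indent() call per nonzero depth level.
tab1 <- kable(data.frame(label = leaf, US_pc = p16_tab$US_pc),
              format = "latex", booktabs = TRUE,
              col.names = c("label", "United States %-age"))
for (k in setdiff(sort(unique(depth)), 0)) {
  tab1 <- add_indent(tab1, positions = which(depth == k),
                     level_of_indent = k)
}

# Option 2: raw LaTeX spacing with kable's escaping turned off.
label_tex <- paste0(strrep("\\hspace{1em}", depth), leaf)
tab2 <- kable(data.frame(label = label_tex, US_pc = p16_tab$US_pc),
              format = "latex", booktabs = TRUE, escape = FALSE,
              col.names = c("label", "United States \\%-age"))
```

The computed depths reproduce the manual indent levels above (rows 2 and 7 at level 1; rows 3, 4, 8 and 9 at level 2; rows 5 and 6 at level 3).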
Re: [R] Processing a hierarchical string name
Ivan and Bert, thank you so much for your help.

Ivan, your solution worked perfectly. I didn't really understand how to do string processing on a vector of strings, and your solution demonstrated it for me. I modified it to work with the tidyverse's stringr package in this way:

bg3_race_sum <- bg3_race %>%
    left_join(pl_vars, by=c("variable" = "name")) %>%
    group_by(variable) %>%
    summarize(count = sum(value)) %>%
    left_join(pl_vars, by=c("variable" = "name")) %>%
    filter(count > 0) %>%
    .$label %>%
    str_replace("^ !!", "") %>%       #Drop the leading ' !!'
    str_replace_all("[^!]*!!", " ")   #Replace each !!.* with space

Bert, your solution was close to correct. It dropped the right text, but didn't insert a space for each piece of text between "!!" and ":". I'm using those spaces to preserve the hierarchical nature of the numbers, that is, how lower numbers (in the chart) are included in higher numbers. For instance, the "Total:" number is the sum of "Population of one race" and "Population of two or more races".

Thank you both for helping me with this specific problem and for increasing my knowledge and abilities with R.

-Kevin

On 6/28/23 16:56, Ivan Krylov wrote:
> On Wed, 28 Jun 2023 20:29:23 +
> Kevin Zembower via R-help wrote:
>
>> I think my algorithm for the labels is:
>> 1. keep everything from the last "!!" up to and including the last
>> character
>> 2. for everything remaining, replace each "!!.*:" group with a single
>> space.
>
> If you remove the initial ' !!', the problem becomes a more tractable
> "replace each group of non-'!' followed by '!!' with one space":
>
> bg3_race_sum$label |>
>  (\(.) sub('^ !!', '', .))() |>
>  (\(.) gsub('[^!]*!!', ' ', .))()
>
> But that solution could have been impossible if the task was slightly
> different.
>
>> I can split the label using str_split(label, pattern = "!!") to get a
>> vector of strings, but don't know how to work on the last string and
>> all the rest of the strings separately.
>
> str_split() would have given you a list of character vectors. You can
> use lapply to evaluate a function on each vector inside that list.
> Inside the function, use length(x) (if `x` is the argument of the
> function) to find out how many spaces to produce and which element of
> the vector is the last one. (For code golf points, use rev(x)[1] to get
> the last element.)
>
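Ivan's lapply() suggestion can be sketched in base R as follows. This is a minimal illustration using three labels copied from the question; it assumes strsplit() in place of stringr::str_split() (both return a list of character vectors) and uses vapply(), a type-checked relative of lapply() that returns a character vector directly.

```r
# Three labels copied from the head() output in the question
race_labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone"
)

# strsplit() returns a list with one character vector per label,
# e.g. c(" ", "Total:", "Population of one race:") for the second label
parts <- strsplit(race_labels, "!!", fixed = TRUE)

# Element 1 is the leading " ", element n is the leaf text, and each of the
# n - 2 elements in between contributes one space of indentation
indented <- vapply(parts, function(x) {
  n <- length(x)
  paste0(strrep(" ", n - 2L), x[n])
}, character(1))

print(indented)
```

Because the last element is handled separately from the others, this version keeps working if the task changes slightly, for example if the intermediate pieces (`x[-c(1, n)]`) had to be kept as text rather than replaced by spaces.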
[R] Processing a hierarchical string name
Hello, all,

I'm trying to process the names of the variables in the US Census database that I'm retrieving with tidycensus. My end goal is to produce nicely formatted tables with natural labels. The labels as downloaded from the US Census look like this:

## Get the P1 table for block group 3 in census tract 2711.01:
bg3_race <- get_decennial(
    geography = "block group",
    state = "MD",
    county = "Baltimore city",
    table = "P1",
    cache_table = TRUE,
    year = "2020",
    sumfile = "pl") %>%
    filter(substr(GEOID, 6, 12) == "2711013")

## Load the names and labels of the variables:
pl_vars <- load_variables(year = "2020", dataset = "pl", cache = TRUE)

## Join the labels to the variables, and drop the zero counts
bg3_race_sum <- bg3_race %>%
    left_join(pl_vars, by=c("variable" = "name")) %>%
    filter(value > 0) %>%
    select(c(GEOID, value, label))

head(bg3_race_sum$label)
[1] " !!Total:"
[2] " !!Total:!!Population of one race:"
[3] " !!Total:!!Population of one race:!!White alone"
[4] " !!Total:!!Population of one race:!!Black or African American alone"
[5] " !!Total:!!Population of one race:!!American Indian and Alaska Native alone"
[6] " !!Total:!!Population of one race:!!Asian alone"

I think my algorithm for the labels is:
1. keep everything from the last "!!" up to and including the last character
2. for everything remaining, replace each "!!.*:" group with a single space.

This turns head() into:
"Total:"
" Population of one race:"
"  White alone"
"  Black or African American alone"
"  American Indian and Alaska Native alone"
"  Asian alone"
[may not be clearly visible if not rendered in a monospaced font]

I think that I need lapply here, but I'm not sure of that, or of what to do next. I can split the label using str_split(label, pattern = "!!") to get a vector of strings, but don't know how to work on the last string and all the rest of the strings separately.

Thank you for any suggestions to nudge me along towards a workable solution.
-Kevin
[R] Rmarkdown code rendering as LaTeX, not executing?
Hi, all,

I'm trying to compose an R Markdown document and render it as a PDF file. My first block of R code seems to work okay, but the second one seems to be interpreted as LaTeX code, and not executed as R code. In the output, the three backticks that mark the R code block are rendered as an opening double quote followed by an opening single quote. Here's my test file:

---
title: "An analysis of US 2020 Census Data for the Radnor-Winston neighborhood"
author: "E. Kevin Zembower"
date: "29 May 2023"
output:
  pdf_document:
    extra_dependencies: ["array", "booktabs", "dcolumn"]
---

```{r setup, include = FALSE}
```

\section{Abstract}
In this document, ...

\section{Boundaries of the Radnor-Winston neighborhood}
...
For the purposes of this report, the boundaries of RW are as shown in figure \ref{RWneigh}.
...

```{r rw_map, fig.width = 6, fig.height = 4, out.width = "80%", dev = "pdf", fig.cap = "Map of RW neighborhood\label{RWneigh}"}
## Creating a polygon for RW neighborhood, based on CRS 6487 (NAD83
## (2011) / Maryland ) map in meters:
base_x <- 433000
base_y <- 186000
rw_neigh_pg_m <- data.frame(
    matrix(
        c(540, 1140,
          540, 1070,
          480, 1060,
          490, 1000,
          570, 1000,
          570, 940,
          550, 930,
          550, 890,
          580, 890,
          590, 820,
          640, 820,
          650, 590,
          520, 580,
          470, 580,
          350, 660,
          350, 710,
          180, 725,
          190, 900,
          220, 900,
          220, 1030,
          240, 1030,
          240, 1110
          ),
        ncol = 2,
        byrow = TRUE)
) %>%
    + matrix(c(rep(base_x, nrow(.)), rep(base_y, nrow(.))), nrow = nrow(.)) %>%
    sf::st_as_sf(coords = c(1,2), dim = "XY") %>%
    summarize(geometry = st_combine(geometry)) %>%
    st_cast("POLYGON") %>%
    st_set_crs(6487)

## Map it:
rw_base_blocks <- read_osm(bb(rw_neigh_pg_m, ext = 1.3))

## Line below gives map in meters
(RW_block_map <- tm_shape(rw_base_blocks, projection = 6487) +
## Line below gives map in degrees
## (RW_block_map <- tm_shape(rw_base_blocks, projection = 6487) +
     tm_rgb() +
     tm_shape(rw_neigh_pg_m) +
     tm_fill(col = "green", alpha = 0.2) +
     tm_borders(lwd = 2, alpha = 1) +
     tm_scale_bar() +
     ## tm_grid() +
     tm_xlab("Long") +
     tm_ylab("Lat") +
     tm_grid() +
     tm_layout(title = "Radnor-Winston Neighborhood")
)
## tmap_save(RW_block_map, "rw_map.png")
```

This code block can also be obtained from https://gist.github.com/kzembower/f9ad52abf82975102cbf715bcfbc0f51.

I'm using Emacs and ESS to create this document. This seems to produce its own weirdness, as the text style, font color and sizes change in the R code block as I edit it and add spaces and lines.

If the block above is saved as "RW_test.Rmd", I use these lines to create the PDF:
===
library(rmarkdown)
render("RW_test.Rmd")

No errors are generated. Can anyone help me understand what I'm doing wrong? A much shorter test file I created seems to work okay.

Thanks in advance for any advice.

-Kevin

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=en_US.UTF-8        LC_ADDRESS=en_US.UTF-8
[10] LC_TELEPHONE=en_US.UTF-8   LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] kableExtra_1.3.4 tidycensus_1.4   lubridate_1.9.2  forcats_1.0.0    stringr_1.5.0    dplyr_1.1.2
 [7] purrr_1.0.1      readr_2.1.4      tidyr_1.3.0      tibble_3.2.1     ggplot2_3.4.2    tidyverse_2.0.0
[13] rmarkdown_2.22

loaded via a namespace (and not attached):
 [1] gtable_0.3.3       xfun_0.39          raster_3.6-20      tigris_2.0.3       rJava_1.0-6
 [6] lattice_0.21-8     tzdb_0.4.0         vctrs_0.6.2        tools_4.3.0        generics_0.1.3
[11] curl_5.0.0         proxy_0.4-27       fansi_1.0.4        pkgconfig_2.0.3    KernSmooth_2.23-21
[16] webshot_0.5.4      uuid_1.1-0         lifecycle_1.0.3    compiler_4.3.0     munsell_0.5.0
[21] tinytex_0.45       terra_1.7-29       codetools_0.2-19   htmltools_0.5.5    class_7.3-22
[26] yaml_2.3.7         crayon_1.5.2       pillar_1.9.0       classInt_0.4-9     tidyselect_1.2.0
[31] rvest_1.0.3        digest_0.6.31      stringi_
[R] Recommended ways to draw US Census map on Open Street Map base map?
Hello, all,

I asked a version of this question on the R-sig-geo list, but didn't get any response. I'm asking here in the hope of reaching a wider audience.

I'm trying to draw US Census map data, fetched with tigris, on top of a base map fetched by the package OpenStreetMap. I'm hoping for the most straightforward solution. I made significant progress with leaflet(), but I don't need the interactivity of the map. I just need a 2D, static map that I can print and include in a document.

Here's some of what I've tried so far:
==
library(tidyverse)
library(tigris)
options(tigris_use_cache = TRUE)
library(OpenStreetMap)
library(ggplot2)

## Get an Open Street Map:
rw_map <- openmap(nw, se,
                  type = "osm",
                  mergeTiles = TRUE) %>%
    openproj(projection = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")

## Get an example census map:
rw_tract <- tracts(state = "MD",
                   county = "Baltimore city",
                   year = "2020") %>%
    filter(NAME == "2711.01")

## This works:
autoplot.OpenStreetMap(rw_map)

## So does this:
plot(rw_tract$geometry)

## These don't:
autoplot.OpenStreetMap(rw_map) +
    geom_sf(rw_tract$geometry)

ggplot(map_data(rw_map), aes(long, lat))

ggplot(aes(x="long", y="lat")) +
    geom_sf(rw_map$geometry)
==

I think my problem is, in part, that I don't fully understand the formats of the rw_map and rw_tract containers. rw_tract says it's a simple feature collection, but rw_map just gives me lists of the data. Can anyone help nudge me along in getting my rw_tract drawn on my rw_map? Any advice or guidance on putting together map data from different sources?

And an overarching question: is moving in this direction, with ggplot2, the way you would recommend accomplishing this task? Is there a simpler, more straightforward way of doing this?

Thanks in advance for your help and efforts.
-Kevin
Re: [R] [External Email] Newbie: Controlling legends in graphs
See below.

On 5/16/23 10:52, Christopher Ryan wrote:
> I'm more of a lattice guy than a ggplot guy, but perhaps this is part of
> the problem:
>
> .
> geom_point(aes(y = m_K, color = "red")) + # >> you've
> associated "K" with the color red
> geom_smooth(aes(y = m_K, color = "red")) +
> geom_point(aes(y = m_J, color = "blue")) + ## >> and "J"
> with the color blue
> geom_smooth(aes(y = m_J, color = "blue")) +
>
> .
>

Yes, I was confused that I had associated "K" with the color red, yet the line and points for K's data were blue, while in the legend they were labeled with the word "red". But I think I've got it straightened out now.

Thanks for your help.

-Kevin
Re: [R] Newbie: Drawing fitted lines on subset of data
Yep, that did it. I didn't know that you could have pipelines within pipelines.

Thanks, again, for all your help.

-Kevin

On 5/16/23 11:44, Rui Barradas wrote:
> On 16/05/2023 at 15:29, Kevin Zembower via R-help wrote:
>> Hello,
>>
>> I'm still working with my tsibble of weight data for the last 20 years.
>> In addition to drawing an overall trend line, using lm, for the whole
>> data set, I'd like to draw short lines that would recompute lm and draw
>> it, say, just for the years from 2010:2015.
>>
>> Here's a short example that I think illustrates what I'm trying to do.
>> The commented out sections show what I've tried so far:
>>
>> ## Short example to test segments:
>>
>> w <- tsibble(
>>     date = as.Date("2022-01-01") + 0:99,
>>     value = rnorm(100)
>> )
>>
>> ggplot(data = w, mapping = aes(date, value)) +
>>     geom_smooth(method = "lm", se = FALSE) +
>>     geom_point()
>>     ## Below gives error about ignoring data
>>     ## geom_abline( data = w$date[25:75] )
>>     ## Gives error ''data' must be in '
>>     ## geom_smooth(data = w$date[25:35],
>>     ##             method = lm,
>>     ##             color = "black",
>>     ##             se = FALSE)
>>
>> I'm thinking that this is probably easily done, but I'm struggling with
>> how to subset the data in the middle of the pipeline.
>>
>> Thanks for any advice and help.
>>
>> -Kevin
>>
> Hello,
>
> Try the following.
> In the 2nd geom_smooth you need a subset of the data, not of just one of
> its columns.
>
>
> suppressPackageStartupMessages({
>   library(tsibble)
>   library(dplyr)
>   library(ggplot2)
>   library(lubridate)
> })
>
> ggplot(data = w, mapping = aes(date, value)) +
>   geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
>   geom_point() +
>   geom_smooth(
>     data = w %>% filter(year(date) >= 2010, year(date) <= 2015),
>     mapping = aes(date, value),
>     formula = y ~ x,
>     method = lm,
>     color = "black",
>     se = FALSE
>   )
>
>
> Other ways to subset the data are
>
>
> # dplyr
> data = w %>% filter(year(date) %in% 2010:2015)
> # base R
> data = subset(w, year(date) %in% 2010:2015)
>
>
> Hope this helps,
>
> Rui Barradas
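One thing worth checking when trying Rui's example on the sample data: `as.Date("2022-01-01") + 0:99` runs from 1 January to 10 April 2022, so a 2010 to 2015 filter selects no rows and the second line has nothing to fit. The base-R sketch below makes that visible and fits lm() to a window that does fall inside the sample. It assumes a plain data.frame in place of the tsibble and a seeded rnorm(); the February 2022 window and the names `w_feb` and `fit_feb` are arbitrary choices for illustration.

```r
set.seed(42)  # assumption: seeded so the random values are reproducible
w <- data.frame(
  date  = as.Date("2022-01-01") + 0:99,
  value = rnorm(100)
)

range(w$date)  # every date in the sample falls in 2022

# A 2010-2015 filter therefore selects no rows from this sample
in_2010_2015 <- subset(w, as.integer(format(date, "%Y")) %in% 2010:2015)
nrow(in_2010_2015)

# Hypothetical window that does fall inside the sample: February 2022
w_feb <- subset(w, date >= as.Date("2022-02-01") & date <= as.Date("2022-02-28"))

# The straight lines geom_smooth(method = "lm") would draw, as ordinary fits
fit_all <- lm(value ~ as.numeric(date), data = w)
fit_feb <- lm(value ~ as.numeric(date), data = w_feb)
coef(fit_feb)
```

In the plot, `w_feb` would take the place of the filtered data, e.g. `geom_smooth(data = w_feb, formula = y ~ x, method = "lm", color = "black", se = FALSE)`. With the real 20-year weight data, Rui's `year(date)` filter should select rows as intended.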
[R] Newbie: Drawing fitted lines on subset of data
Hello,

I'm still working with my tsibble of weight data for the last 20 years. In addition to drawing an overall trend line, using lm, for the whole data set, I'd like to draw short lines that would recompute lm and draw it, say, just for the years from 2010:2015.

Here's a short example that I think illustrates what I'm trying to do. The commented out sections show what I've tried so far:

## Short example to test segments:

w <- tsibble(
    date = as.Date("2022-01-01") + 0:99,
    value = rnorm(100)
)

ggplot(data = w, mapping = aes(date, value)) +
    geom_smooth(method = "lm", se = FALSE) +
    geom_point()
    ## Below gives error about ignoring data
    ## geom_abline( data = w$date[25:75] )
    ## Gives error ''data' must be in '
    ## geom_smooth(data = w$date[25:35],
    ##             method = lm,
    ##             color = "black",
    ##             se = FALSE)

I'm thinking that this is probably easily done, but I'm struggling with how to subset the data in the middle of the pipeline.

Thanks for any advice and help.

-Kevin
Re: [R] Newbie: Controlling legends in graphs
Rui, thanks so much for your help. Your explanation and example were clear and concise. Thanks for taking the time and effort to help me.

-Kevin

On 5/12/23 16:06, Rui Barradas wrote:
> On 12/05/2023 at 14:24, Kevin Zembower via R-help wrote:
>> Hello, I'm trying to create a line graph with a legend, but have no
>> success controlling the legend. Since nothing I've tried seems to work,
>> I must be doing something systematically wrong. Can anyone point this
>> out to me?
>>
>> Here's my data:
>> > weights
>> # A tibble: 1,246 × 3
>>    Date           J     K
>>  1 2000-02-13   133   188
>>  2 2000-02-20   134   185
>>  3 2000-02-27   135   187
>>  4 2000-03-05   135   185
>>  5 2000-03-12    NA   184
>>  6 2000-03-19    NA   184.
>>  7 2000-03-26   136   184.
>>  8 2000-04-02   134   185
>>  9 2000-04-09   133   186
>> 10 2000-04-16    NA   186
>> # ℹ 1,236 more rows
>> # ℹ Use `print(n = ...)` to see more rows
>> >
>>
>> Here are my attempts. You can see some of the things I've tried in the
>> commented out sections:
>> weights %>%
>>     group_by(year(Date)) %>%
>>     summarize(
>>         m_K = mean(K, na.rm = TRUE),
>>         m_J = mean(J, na.rm = TRUE),
>>     ) %>%
>>     ggplot(aes(x = `year(Date)`)) +
>>     geom_point(aes(y = m_K, color = "red")) +
>>     geom_smooth(aes(y = m_K, color = "red")) +
>>     geom_point(aes(y = m_J, color = "blue")) +
>>     geom_smooth(aes(y = m_J, color = "blue")) +
>>     guides(size = "legend",
>>            shape = "legend")
>>     ## scale_shape_discrete(name="Person",
>>     ##                      breaks=c("m_K", "m_J"),
>>     ##                      labels=c("K", "J"))
>>     ## theme(legend.title=element_blank())
>>
>> When this runs, the blue line for "K" is above the red line for "J", as
>> I expect, but in the legend, the red is shown first, and labeled "blue."
>>
>> I'd like to be able to create a legend where the first entry shows a
>> blue line and is labeled "K" and the second is red and labeled "J".
>>
>> On a different but related topic, I'd welcome any advice or suggestions
>> on my methodology in this example. Is this the correct way to summarize
>> with a mean? Do I need the two sets of geom_point and geom_line clauses
>> to create this graph, or is there a better way?
>>
>> Thanks for all your advice and guidance.
>>
>> -Kevin
>>
> Hello,
>
> This is mainly a data reshaping problem. Instead of plotting two
> variables, J and K, if the data is in the long format you will map the
> column with these variables' names to the color aesthetic and call each
> geom_* only once. Then, assign the colors you want.
>
> As for placing K above J, note that ggplot places them in alphabetical
> order unless you coerce to factor with the levels in the order you want.
>
> Also, if you want to compute aggregate statistics for several columns,
> use ?across. See the code below.
>
> Here is a complete example. I have augmented your data set in order to
> have more years to plot.
>
>
>
> # augment the data set
> weights <- "       Date   J   K
> 1  2000-02-13 133 188
> 2  2000-02-20 134 185
> 3  2000-02-27 135 187
> 4  2000-03-05 135 185
> 5  2000-03-12  NA 184
> 6  2000-03-19  NA 184.
> 7  2000-03-26 136 184.
> 8  2000-04-02 134 185
> 9  2000-04-09 133 186
> 10 2000-04-16  NA 186"
> weights <- read.table(text = weights, header = TRUE)
> weights$Date <- as.Date(weights$Date)
> tmp <- weights
> tmp <- lapply(1:10, \(y) {
>   tmp$Date <- years(y) + tmp$Date
>   tmp$J <- tmp$J + sample(-10:10, nrow(weights), TRUE)
>   tmp$K <- tmp$K + sample(-10:10, nrow(weights), TRUE)
>   tmp
> })
> weights <- do.call(rbind, tmp)
>
> #---
>
> # plot code
> library(ggplot2)
> library(dplyr)
> library(tidyr)
> library(lubridate)
>
> weights %>%
[R] Newbie: Controlling legends in graphs
Hello,

I'm trying to create a line graph with a legend, but have had no success controlling the legend. Since nothing I've tried seems to work, I must be doing something systematically wrong. Can anyone point this out to me?

Here's my data:
> weights
# A tibble: 1,246 × 3
   Date           J     K
 1 2000-02-13   133   188
 2 2000-02-20   134   185
 3 2000-02-27   135   187
 4 2000-03-05   135   185
 5 2000-03-12    NA   184
 6 2000-03-19    NA   184.
 7 2000-03-26   136   184.
 8 2000-04-02   134   185
 9 2000-04-09   133   186
10 2000-04-16    NA   186
# ℹ 1,236 more rows
# ℹ Use `print(n = ...)` to see more rows
>

Here are my attempts. You can see some of the things I've tried in the commented out sections:

weights %>%
    group_by(year(Date)) %>%
    summarize(
        m_K = mean(K, na.rm = TRUE),
        m_J = mean(J, na.rm = TRUE),
    ) %>%
    ggplot(aes(x = `year(Date)`)) +
    geom_point(aes(y = m_K, color = "red")) +
    geom_smooth(aes(y = m_K, color = "red")) +
    geom_point(aes(y = m_J, color = "blue")) +
    geom_smooth(aes(y = m_J, color = "blue")) +
    guides(size = "legend",
           shape = "legend")
    ## scale_shape_discrete(name="Person",
    ##                      breaks=c("m_K", "m_J"),
    ##                      labels=c("K", "J"))
    ## theme(legend.title=element_blank())

When this runs, the blue line for "K" is above the red line for "J", as I expect, but in the legend, the red is shown first, and labeled "blue."

I'd like to be able to create a legend where the first entry shows a blue line and is labeled "K" and the second is red and labeled "J".

On a different but related topic, I'd welcome any advice or suggestions on my methodology in this example. Is this the correct way to summarize with a mean? Do I need the two sets of geom_point and geom_line clauses to create this graph, or is there a better way?

Thanks for all your advice and guidance.

-Kevin
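Rui's reply above recommends reshaping to long format; here is a minimal base-R sketch of that idea. The yearly means are made-up illustration values standing in for the summarize() output, not Kevin's data, and the names `yearly` and `long` are hypothetical. The sketch shows the two ingredients that fix the legend: a single column (`person`) to map to colour, and a factor whose levels put K before J. In tidyverse style, `tidyr::pivot_longer()` would do the reshaping step.

```r
# Made-up yearly means, shaped like the summarize() output in the question
yearly <- data.frame(
  year = 2000:2002,
  m_K  = c(186, 185, 184),
  m_J  = c(134, 135, 133)
)

# Reshape from wide (one column per person) to long (one row per year and person)
long <- data.frame(
  year   = rep(yearly$year, times = 2),
  person = rep(c("K", "J"), each = nrow(yearly)),
  mean   = c(yearly$m_K, yearly$m_J)
)

# Explicit factor levels control the legend order: K first, then J
long$person <- factor(long$person, levels = c("K", "J"))

print(long)

# With ggplot2 loaded, each geom is then needed only once, e.g.:
# ggplot(long, aes(year, mean, colour = person)) +
#   geom_point() +
#   geom_smooth() +
#   scale_colour_manual(values = c(K = "blue", J = "red"), name = "Person")
```

Writing `color = "red"` inside aes() maps the constant string "red" as if it were a data value, which is why the legend showed the word "red"; with the long format, the actual colours are assigned in scale_colour_manual() instead.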