Re: [R] geom_ribbon removes missing values
Hi William,

On 6/10/10 2:07 AM, William Dunlap wrote:
> I'm not sure exactly what you want in poly_ids, but
> if x is a vector of numbers that might contain NA's
> and you want a vector of integers that identify each
> run of non-NA's and are NA for each NA, then you can get
> it with
>    poly_id <- cumsum(is.na(x)) + 1 # bump count for each NA seen
>    poly_id[is.na(x)] <- NA
> E.g.,
> > x<-c(1.5, 2.5, NA, 4.5, 5.5, 6.5, NA, 8.5, 9.5, NA, NA, 12.5)
> > poly_ids <- cumsum(is.na(x)) + 1
> > poly_ids[is.na(x)] <- NA
> > rbind(x, poly_ids) # to line up input and output
>          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
> x         1.5  2.5   NA  4.5  5.5  6.5   NA  8.5  9.5    NA    NA  12.5
> poly_ids  1.0  1.0   NA  2.0  2.0  2.0   NA  3.0  3.0    NA    NA   5.0

Great! That's exactly what I want in poly_ids. Thanks!

Please find the new patch below. I also put a new branch on GitHub that
is based on ggplot2 master and that has this patch. Note that I still
don't know how to run ggplot2 from sources, so you'll have to trust in
my copy-and-paste fu:

http://github.com/kloesing/ggplot2/commit/177e69ae654da074

--- ggplot2-orig	2010-06-06 14:02:25.0 +0200
+++ ggplot2	2010-06-10 08:31:02.0 +0200
@@ -5044,9 +5044,16 @@
   draw <- function(., data, scales, coordinates, na.rm = FALSE, ...) {
-    data <- remove_missing(data, na.rm,
-      c("x","ymin","ymax"), name = "geom_ribbon")
     data <- data[order(data$group, data$x), ]
+
+    # Instead of removing NA values from the data and plotting a single
+    # polygon, we want to "stop" plotting the polygon whenever we're missing
+    # values and "start" a new polygon as soon as we have new values. We do
+    # this by creating an id vector for polygonGrob that has distinct
+    # polygon numbers for sequences of non-NA values and NA for NA values in
+    # the original data. Example: c(NA, 2, 2, 2, NA, NA, 4, 4, 4, NA)
+    poly_ids <- cumsum(is.na(data$ymin) | is.na(data$ymax)) + 1
+    poly_ids[is.na(data$ymin) | is.na(data$ymax)] <- NA
     tb <- with(data,
       coordinates$munch(data.frame(x=c(x, rev(x)), y=c(ymax, rev(ymin))), scales)
@@ -5054,12 +5061,12 @@
     with(data, ggname(.$my_name(), gTree(children=gList(
       ggname("fill", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=alpha(fill, alpha), col=NA)
       )),
       ggname("outline", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=NA, col=colour, lwd=size * .pt, lty=linetype)
       ))

Best,
--Karsten

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
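A standalone sketch of what this patch does, assuming only base R and the grid package (no ggplot2 internals). `ribbon_ids()` and `ribbon_grob()` are hypothetical helper names, not ggplot2 functions. Unlike the patch, which passes NA ids straight to polygonGrob(), this sketch drops the vertices of incomplete rows before building the grob, so it does not depend on how grid treats NA ids.

```r
library(grid)

# Hypothetical helper (not part of ggplot2): give each run of complete
# (ymin, ymax) pairs its own id and mark incomplete rows as NA, using
# the cumsum() trick from this thread.
ribbon_ids <- function(ymin, ymax) {
  missing <- is.na(ymin) | is.na(ymax)
  ids <- cumsum(missing) + 1   # bump the counter at every missing row
  ids[missing] <- NA
  ids
}

# Hypothetical helper: one polygonGrob holding one sub-polygon per run.
# The outline follows the upper edge left to right, then the lower edge
# right to left, so the id vector is duplicated and reversed the same way.
ribbon_grob <- function(x, ymin, ymax, ...) {
  ids  <- ribbon_ids(ymin, ymax)
  px   <- c(x, rev(x))
  py   <- c(ymax, rev(ymin))
  pid  <- c(ids, rev(ids))
  keep <- !is.na(pid)          # drop vertices of incomplete rows explicitly
  polygonGrob(px[keep], py[keep], id = pid[keep], ...)
}

# The ribbon from the original post, with the dates replaced by 1:10.
x    <- 1:10
low  <- c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5)
high <- c(12, 13, 12, 13, NA, NA, 12, 13, 12, 13)
g <- ribbon_grob(x, low, high, default.units = "native",
                 gp = gpar(fill = "lightblue", col = "blue"))

print(ribbon_ids(low, high))   # 1 1 1 1 NA NA 3 3 3 3
print(unique(g$id))            # 1 3
```

The ids only need to be distinct per run, not consecutive (here 1 and 3), because polygonGrob() groups vertices that share an id. Drawing `g` inside a viewport whose native scale covers the data, e.g. `pushViewport(viewport(xscale = c(0, 11), yscale = c(0, 14)))`, should show two separate ribbon segments.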
Re: [R] geom_ribbon removes missing values
Hi Paul,

On 6/9/10 1:12 AM, Paul Murrell wrote:
> grid.polygon() can do multiple polygons in a single call, but rather
> than using NA's to separate sub-polygons, it uses an 'id' argument (or
> an 'id.lengths' argument) to identify sub-polygons within the vectors of
> x- and y-values (see the examples in ?grid.polygon). So a ggplot2 patch
> that makes use of that facility might make more sense.

That's a great idea! And it makes the patch look far less ugly. Thanks
for that!

I still can't get rid of the loop, but I'd guess that going through the
vector once is not a performance killer. If someone has an idea how we
can get a similar vector as the one mentioned in the comment, but
without using a loop, please do tell!

Here's the new patch:

--- ggplot2-orig	2010-06-06 14:02:25.0 +0200
+++ ggplot2	2010-06-10 01:22:20.0 +0200
@@ -5044,9 +5044,19 @@
   draw <- function(., data, scales, coordinates, na.rm = FALSE, ...) {
-    data <- remove_missing(data, na.rm,
-      c("x","ymin","ymax"), name = "geom_ribbon")
     data <- data[order(data$group, data$x), ]
+
+    # Instead of removing NA values from the data and plotting a single
+    # polygon, we want to "stop" plotting the polygon whenever we're missing
+    # values and "start" a new polygon as soon as we have new values. We do
+    # this by creating an id vector for polygonGrob that has distinct
+    # polygon numbers for sequences of non-NA values and NA for NA values in
+    # the original data. Example: c(NA, 2, 2, 2, NA, NA, 7, 7, 7, NA)
+    poly_ids <- 1:length(data$x)
+    poly_ids[is.na(data$ymin) | is.na(data$ymax)] <- NA
+    for (i in 2:length(poly_ids))
+      if (!is.na(poly_ids[i]) & !is.na(poly_ids[i-1]))
+        poly_ids[i] <- poly_ids[i-1]
     tb <- with(data,
       coordinates$munch(data.frame(x=c(x, rev(x)), y=c(ymax, rev(ymin))), scales)
@@ -5054,12 +5064,12 @@
     with(data, ggname(.$my_name(), gTree(children=gList(
       ggname("fill", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=alpha(fill, alpha), col=NA)
       )),
       ggname("outline", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=NA, col=colour, lwd=size * .pt, lty=linetype)
       ))

Thanks,
--Karsten
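On the question of getting this vector without a loop: William's cumsum() answer yields different but equivalent ids. If you want exactly the ids described in this patch's comment (each run labelled with the index of its first row), here is a loop-free sketch in base R. `loop_ids()` and `run_start_ids()` are hypothetical names, and `ok` is assumed to be `!is.na(data$ymin) & !is.na(data$ymax)`.

```r
# The loop from the patch, wrapped as a function of a logical vector
# `ok` (TRUE where both ymin and ymax are present). seq_along(ids)[-1]
# is used instead of 2:length(ids), which would count down to 1 for
# length-one input.
loop_ids <- function(ok) {
  ids <- seq_along(ok)
  ids[!ok] <- NA
  for (i in seq_along(ids)[-1])
    if (!is.na(ids[i]) && !is.na(ids[i - 1]))
      ids[i] <- ids[i - 1]
  ids
}

# Loop-free version: a run starts wherever `ok` is TRUE and the previous
# row is not. cumsum() over the start flags numbers the runs, and each
# run number is mapped back to the row index where that run began.
run_start_ids <- function(ok) {
  idx     <- seq_along(ok)
  starts  <- ok & !c(FALSE, head(ok, -1))
  ids     <- rep(NA_integer_, length(ok))
  ids[ok] <- idx[starts][cumsum(starts)[ok]]
  ids
}

ok <- c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)
print(run_start_ids(ok))                            # NA 2 2 2 NA NA 7 7 7 NA
print(identical(loop_ids(ok), run_start_ids(ok)))   # TRUE
```

Both versions only label runs; polygonGrob() needs the ids to differ between runs, not to take any particular values.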
Re: [R] geom_ribbon removes missing values
Hi

grid.polygon() can do multiple polygons in a single call, but rather
than using NA's to separate sub-polygons, it uses an 'id' argument (or
an 'id.lengths' argument) to identify sub-polygons within the vectors of
x- and y-values (see the examples in ?grid.polygon). So a ggplot2 patch
that makes use of that facility might make more sense.

Paul

On 6/7/2010 5:46 AM, Karsten Loesing wrote:
> Hi Hadley,
>
> On 5/31/10 9:51 PM, Hadley Wickham wrote:
>> There's no easy way to do this because behind the scenes geom_ribbon
>> uses grid.polygon.
>
> A possible workaround might be to have grid.polygon draw multiple
> polygons, one for each interval. We can do this by constructing vectors
> with coordinates for the first polygon, then NA, then coordinates for
> the second polygon, etc. Here are the vectors for my initial example:
>
> x <- c(x[1:4], x[4:1], NA, x[7:10], x[10:7])
> y <- c(ymax[1:4], ymin[4:1], NA, ymax[7:10], ymin[10:7])
>
> I worked on a simple (but ugly) patch to GeomRibbon in ggplot2 that does
> the job using an iteration:
[...]
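A small sketch of the two forms Paul describes, using only the grid package: the same two squares are built once with `id` (one label per vertex) and once with `id.lengths` (one vertex count per sub-polygon). The coordinates are made up for illustration, and drawing goes to a null PDF device so nothing is written to disk.

```r
library(grid)

# Two unit squares side by side, stored back to back in one pair of
# coordinate vectors (4 vertices each).
x <- c(1, 2, 2, 1,   3, 4, 4, 3)
y <- c(1, 1, 2, 2,   1, 1, 2, 2)

# Variant 1: an id per vertex says which sub-polygon it belongs to.
g_id <- polygonGrob(x, y, id = rep(1:2, each = 4),
                    default.units = "native")

# Variant 2: id.lengths gives the number of consecutive vertices in
# each sub-polygon instead.
g_len <- polygonGrob(x, y, id.lengths = c(4, 4),
                     default.units = "native")

# For ids that are already grouped together, rle() converts one form
# into the other.
print(rle(rep(1:2, each = 4))$lengths)   # 4 4

# Draw both on a null PDF device, inside a viewport whose native scale
# covers the coordinates.
pdf(NULL)
grid.newpage()
pushViewport(viewport(xscale = c(0, 5), yscale = c(0, 3)))
grid.draw(g_id)
grid.draw(g_len)
popViewport()
invisible(dev.off())
```

Only one of `id` and `id.lengths` may be given per grob; `id` is the more convenient form when the sub-polygon membership comes from a per-row computation like the run ids discussed in this thread.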
Re: [R] geom_ribbon removes missing values
On 6/6/10 7:46 PM, Karsten Loesing wrote:
> I worked on a simple (but ugly) patch to GeomRibbon in ggplot2 that does
> the job using an iteration:
[...]
>>           polyy <- c(polyy, data$ymax[start:(i-1)],
>>             data$ymin[start:(i-1)], NA)

Whoops, change that to:

            data$ymin[(i-1):start], NA)

[...]
Re: [R] geom_ribbon removes missing values
Hi Hadley,

On 5/31/10 9:51 PM, Hadley Wickham wrote:
> There's no easy way to do this because behind the scenes geom_ribbon
> uses grid.polygon.

A possible workaround might be to have grid.polygon draw multiple
polygons, one for each interval. We can do this by constructing vectors
with coordinates for the first polygon, then NA, then coordinates for
the second polygon, etc. Here are the vectors for my initial example:

x <- c(x[1:4], x[4:1], NA, x[7:10], x[10:7])
y <- c(ymax[1:4], ymin[4:1], NA, ymax[7:10], ymin[10:7])

I worked on a simple (but ugly) patch to GeomRibbon in ggplot2 that does
the job using an iteration:

/Library/Frameworks/R.framework/Versions/2.10/Resources/library/ggplot2/R$ diff ggplot2-orig ggplot2
5047,5048d5046
<     data <- remove_missing(data, na.rm,
<       c("x","ymin","ymax"), name = "geom_ribbon")
5050a5049,5069
>     start <- 0
>     polyx <- c()
>     polyy <- c()
>     for (i in 1:(length(data$x)+1)) {
>       if (i > length(data$x) || is.na(data$ymin[i]) ||
>           is.na(data$ymax[i])) {
>         if (start > 0) {
>           polyx <- c(polyx, data$x[start:(i-1)],
>             data$x[(i-1):start], NA)
>           polyy <- c(polyy, data$ymax[start:(i-1)],
>             data$ymin[start:(i-1)], NA)
>           start <- 0
>         }
>       } else {
>         if (start == 0) {
>           start <- i
>         }
>       }
>     }
>     polyx <- head(polyx, length(polyx) - 1)
>     polyy <- head(polyy, length(polyy) - 1)
5052c5071
<       coordinates$munch(data.frame(x=c(x, rev(x)), y=c(ymax, rev(ymin))), scales)
---
>       coordinates$munch(data.frame(x = polyx, y = polyy), scales)

Do you like the described approach? Can you help me make my patch better?

In particular, I'd like to avoid iterating over the data frame to
extract the start and end index of each interval separated by NA. Is
there a function for this, or at least a better approach?

Also, probably a stupid question: How do I tell R to use the cloned
ggplot2 sources instead of the installed ggplot2 package? As you can
see, I modified the installed package, but I'd rather work with Git here.
Thanks,
--Karsten
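Regarding the question above about extracting the start and end index of each NA-separated interval without iterating: base R's rle() finds runs directly. This sketch assumes the x, ymin and ymax vectors of the initial example; `ribbon_runs()` and `ribbon_outline()` are hypothetical helpers, not ggplot2 code. It also walks the lower edge backwards (ymin[e:s]), which is the correction Karsten posted in a follow-up.

```r
# Hypothetical helper: start and end index of every run of rows where
# both ymin and ymax are present, found with rle() instead of a loop.
ribbon_runs <- function(ymin, ymax) {
  r      <- rle(!is.na(ymin) & !is.na(ymax))
  ends   <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  data.frame(start = starts[r$values], end = ends[r$values])
}

# Hypothetical helper: NA-separated outline vectors, one block per run.
# Each block follows the upper edge left to right, then the lower edge
# right to left.
ribbon_outline <- function(x, ymin, ymax) {
  runs <- ribbon_runs(ymin, ymax)
  pieces <- Map(function(s, e) {
    list(x = c(x[s:e], x[e:s]), y = c(ymax[s:e], ymin[e:s]))
  }, runs$start, runs$end)
  # Join the blocks with a single NA between consecutive polygons.
  join <- function(key) {
    v <- unlist(lapply(pieces, function(p) c(p[[key]], NA)))
    head(v, -1)   # drop the trailing NA after the last block
  }
  list(x = join("x"), y = join("y"))
}

x    <- 1:10
low  <- c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5)
high <- c(12, 13, 12, 13, NA, NA, 12, 13, 12, 13)
print(ribbon_runs(low, high))   # runs 1-4 and 7-10
out <- ribbon_outline(x, low, high)
print(out$x)                    # 1 2 3 4 4 3 2 1 NA 7 8 9 10 10 9 8 7
```

Base graphics polygon() treats each NA in such vectors as a break between complete polygons (see ?polygon), so `polygon(out$x, out$y)` on an existing plot should draw two pieces; grid.polygon(), as Paul points out in this thread, uses `id` or `id.lengths` instead.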
Re: [R] geom_ribbon removes missing values
Hi Karsten,

There's no easy way to do this because behind the scenes geom_ribbon
uses grid.polygon.

Hadley

On Sun, May 30, 2010 at 7:26 AM, Karsten Loesing wrote:
> Hi everyone,
>
> it looks like geom_ribbon removes missing values and plots a single
> ribbon over the whole interval of x values. However, I'd rather want it
> to act like geom_line, that is, interrupt the ribbon for the interval of
> missing values and continue once there are new values. Here's an example:
>
> library(ggplot2)
> df <- data.frame(
>   date = seq(from = as.Date("2010-05-15"),
>              to = as.Date("2010-05-24"),
>              by = "1 day"),
>   low = c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5),
>   mid = c(8, 9, 8, 9, NA, NA, 8, 9, 8, 9),
>   high = c(12, 13, 12, 13, NA, NA, 12, 13, 12, 13))
> ggplot(df, aes(x = date, y = mid, ymin = low, ymax = high)) +
>   geom_line() +
>   geom_ribbon(fill = alpha("blue", 0.5))
>
> When running this code, R tells me:
>
> Warning message:
> Removed 2 rows containing missing values (geom_ribbon).
>
> When you look at the graph, you can see that the line stops at May 18
> and starts again on May 21. But the ribbon reaches from May 15 to 24,
> even though there are no values on May 19 and 20.
>
> Is there an option that I could set? Or a geom/stat that I should use
> instead? In my pre-ggplot2 times I used polygon(), but I figured there
> must be something better in ggplot2 (as there has always been so far).
>
> Thanks,
> --Karsten
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/