Re: [R] geom_ribbon removes missing values

2010-06-10 Thread Karsten Loesing
Hi William,

On 6/10/10 2:07 AM, William Dunlap wrote:
> I'm not sure exactly what you want in poly_ids, but
> if x is a vector of numbers that might contain NA's
> and you want a vector of integers that identify each
> run of non-NA's and are NA for each NA, then you can get
> it with
> poly_id <- cumsum(is.na(x)) + 1 # bump count for each NA seen
> poly_id[is.na(x)] <- NA
> E.g.,
>   > x<-c(1.5, 2.5, NA, 4.5, 5.5, 6.5, NA, 8.5, 9.5, NA, NA, 12.5)
>   > poly_ids <- cumsum(is.na(x)) + 1
>   > poly_ids[is.na(x)] <- NA
>   > rbind(x, poly_ids) # to line up input and output
>            [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
>   x         1.5  2.5   NA  4.5  5.5  6.5   NA  8.5  9.5    NA    NA  12.5
>   poly_ids  1.0  1.0   NA  2.0  2.0  2.0   NA  3.0  3.0    NA    NA   5.0


Great! That's exactly what I want in poly_ids. Thanks! Please find the
new patch below.

I also put a new branch on GitHub that is based on ggplot2 master and
that has this patch. Note that I still don't know how to run ggplot2
from sources, so you'll have to trust in my copy-and-paste fu:

  http://github.com/kloesing/ggplot2/commit/177e69ae654da074



--- ggplot2-orig 2010-06-06 14:02:25.0 +0200
+++ ggplot2 2010-06-10 08:31:02.0 +0200
@@ -5044,9 +5044,16 @@


   draw <- function(., data, scales, coordinates, na.rm = FALSE, ...) {
-    data <- remove_missing(data, na.rm,
-      c("x","ymin","ymax"), name = "geom_ribbon")
     data <- data[order(data$group, data$x), ]
+
+    # Instead of removing NA values from the data and plotting a single
+    # polygon, we want to "stop" plotting the polygon whenever we're missing
+    # values and "start" a new polygon as soon as we have new values.  We do
+    # this by creating an id vector for polygonGrob that has distinct
+    # polygon numbers for sequences of non-NA values and NA for NA values in
+    # the original data.  Example: c(NA, 2, 2, 2, NA, NA, 4, 4, 4, NA)
+    poly_ids <- cumsum(is.na(data$ymin) | is.na(data$ymax)) + 1
+    poly_ids[is.na(data$ymin) | is.na(data$ymax)] <- NA

     tb <- with(data,
       coordinates$munch(data.frame(x=c(x, rev(x)), y=c(ymax, rev(ymin))), scales)
@@ -5054,12 +5061,12 @@

     with(data, ggname(.$my_name(), gTree(children=gList(
       ggname("fill", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=alpha(fill, alpha), col=NA)
       )),
       ggname("outline", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=NA, col=colour, lwd=size * .pt, lty=linetype)
       ))

Best,
--Karsten
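
For readers who want to try the id-vector idea outside ggplot2, here is a minimal standalone sketch using only the grid package. The helper name ribbon_grob is hypothetical, the y values are taken from the example data frame in this thread, and x = 1:10 stands in for the dates. Unlike the patch, the sketch drops the missing rows before building the grob, so no NA values are handed to grid; the ids of the remaining rows still change at every gap.

```r
library(grid)

# Hypothetical helper: one closed ribbon polygon per run of complete rows.
ribbon_grob <- function(x, ymin, ymax) {
  miss <- is.na(x) | is.na(ymin) | is.na(ymax)
  ids  <- cumsum(miss) + 1   # the id goes up by one at every missing row
  keep <- !miss              # drop missing rows; surviving ids still differ per run
  polygonGrob(
    x  = c(x[keep], rev(x[keep])),        # left to right, then back again
    y  = c(ymax[keep], rev(ymin[keep])),  # upper edge, then lower edge
    id = c(ids[keep], rev(ids[keep])),    # same id on both edges of a run
    default.units = "native"
  )
}

x    <- 1:10
ymin <- c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5)
ymax <- ymin + 8
g <- ribbon_grob(x, ymin, ymax)

# To look at the result, draw it in a viewport whose native scale covers the data:
# grid.newpage()
# pushViewport(viewport(xscale = c(0, 11), yscale = c(0, 15)))
# grid.draw(g)
```

Because the second half of each vector is reversed, the points that share an id trace the upper edge of a run from left to right and then its lower edge from right to left, which closes each ribbon segment on its own.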

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] geom_ribbon removes missing values

2010-06-09 Thread Karsten Loesing
Hi Paul,

On 6/9/10 1:12 AM, Paul Murrell wrote:
> grid.polygon() can do multiple polygons in a single call, but rather
> than using NA's to separate sub-polygons, it uses an 'id' argument (or
> an 'id.lengths' argument) to identify sub-polygons within the vectors of
> x- and y-values (see the examples in ?grid.polygon).  So a ggplot2 patch
> that makes use of that facility might make more sense.

That's a great idea! And it makes the patch look far less ugly. Thanks
for that!

I still can't get rid of the loop, but I'd guess that going through the
vector once is not a performance killer. If someone has an idea how we
can get a similar vector as the one mentioned in the comment, but
without using a loop, please do tell!

Here's the new patch:


--- ggplot2-orig 2010-06-06 14:02:25.0 +0200
+++ ggplot2 2010-06-10 01:22:20.0 +0200
@@ -5044,9 +5044,19 @@


   draw <- function(., data, scales, coordinates, na.rm = FALSE, ...) {
-    data <- remove_missing(data, na.rm,
-      c("x","ymin","ymax"), name = "geom_ribbon")
     data <- data[order(data$group, data$x), ]
+
+    # Instead of removing NA values from the data and plotting a single
+    # polygon, we want to "stop" plotting the polygon whenever we're missing
+    # values and "start" a new polygon as soon as we have new values.  We do
+    # this by creating an id vector for polygonGrob that has distinct
+    # polygon numbers for sequences of non-NA values and NA for NA values in
+    # the original data.  Example: c(NA, 2, 2, 2, NA, NA, 7, 7, 7, NA)
+    poly_ids <- 1:length(data$x)
+    poly_ids[is.na(data$ymin) | is.na(data$ymax)] <- NA
+    for (i in 2:length(poly_ids))
+      if (!is.na(poly_ids[i]) & !is.na(poly_ids[i-1]))
+        poly_ids[i] <- poly_ids[i-1]

     tb <- with(data,
       coordinates$munch(data.frame(x=c(x, rev(x)), y=c(ymax, rev(ymin))), scales)
@@ -5054,12 +5064,12 @@

     with(data, ggname(.$my_name(), gTree(children=gList(
       ggname("fill", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=alpha(fill, alpha), col=NA)
       )),
       ggname("outline", polygonGrob(
-        tb$x, tb$y,
+        tb$x, tb$y, id=c(poly_ids, rev(poly_ids)),
         default.units="native",
         gp=gpar(fill=NA, col=colour, lwd=size * .pt, lty=linetype)
       ))


Thanks,
--Karsten
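
As a side note on the loop question, the sketch below wraps the loop in a function and compares it with a loop-free variant based on cumsum() and is.na(). The function names loop_ids and cumsum_ids are made up for this illustration, and the input vector is invented to reproduce the example in the patch comment. The loop version uses seq_along() instead of 2:length(), an adjustment that is not in the patch, so that a one-row input does not iterate over c(2, 1).

```r
# Loop-based ids, following the patch, with a guard for very short inputs.
loop_ids <- function(ymin, ymax) {
  ids <- seq_along(ymin)
  ids[is.na(ymin) | is.na(ymax)] <- NA
  for (i in seq_along(ids)[-1]) {
    if (!is.na(ids[i]) && !is.na(ids[i - 1])) {
      ids[i] <- ids[i - 1]   # continue the run started earlier
    }
  }
  ids
}

# Loop-free ids: count how many missing rows have been seen so far.
cumsum_ids <- function(ymin, ymax) {
  miss <- is.na(ymin) | is.na(ymax)
  ids <- cumsum(miss) + 1
  ids[miss] <- NA
  ids
}

ymin <- c(NA, 4, 5, 4, NA, NA, 4, 5, 4, NA)
ymax <- ymin + 8

a <- loop_ids(ymin, ymax)    # NA 2 2 2 NA NA 7 7 7 NA
b <- cumsum_ids(ymin, ymax)  # NA 2 2 2 NA NA 4 4 4 NA

# The labels differ, but both split the rows into the same runs.
same_runs <- identical(match(a, unique(a)), match(b, unique(b)))
```

Only the grouping matters when the vector is used as a polygon id, so the different labels (7 versus 4 for the second run) should make no difference to what is drawn.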



Re: [R] geom_ribbon removes missing values

2010-06-08 Thread Paul Murrell

Hi

grid.polygon() can do multiple polygons in a single call, but rather 
than using NA's to separate sub-polygons, it uses an 'id' argument (or 
an 'id.lengths' argument) to identify sub-polygons within the vectors of 
x- and y-values (see the examples in ?grid.polygon).  So a ggplot2 patch 
that makes use of that facility might make more sense.


Paul
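
A small sketch of the two arguments mentioned above, using only the grid package. The coordinates are invented for the illustration and describe two separate rectangles in npc units; both calls are meant to describe the same pair of polygons.

```r
library(grid)

# Two rectangles stored back to back in the same x and y vectors.
x <- c(0.1, 0.4, 0.4, 0.1,   0.6, 0.9, 0.9, 0.6)
y <- c(0.1, 0.1, 0.4, 0.4,   0.6, 0.6, 0.9, 0.9)

# 'id' labels every point with the polygon it belongs to ...
by_id <- polygonGrob(x, y, id = rep(1:2, each = 4))

# ... while 'id.lengths' gives the number of consecutive points per polygon.
by_len <- polygonGrob(x, y, id.lengths = c(4, 4))

# grid.newpage(); grid.draw(by_id)   # uncomment to draw
```

With 'id' the points of one polygon do not have to be adjacent in the vectors, which is what makes the c(x, rev(x)) layout of a ribbon workable; 'id.lengths' assumes each polygon's points are stored consecutively.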

On 6/7/2010 5:46 AM, Karsten Loesing wrote:
> Hi Hadley,
>
> On 5/31/10 9:51 PM, Hadley Wickham wrote:
>> There's no easy way to do this because behind the scenes geom_ribbon
>> uses grid.polygon.
>
> A possible workaround might be to have grid.polygon draw multiple
> polygons, one for each interval. We can do this by constructing vectors
> with coordinates for the first polygon, then NA, then coordinates for
> the second polygon, etc. Here are the vectors for my initial example:
>
> x <- c(x[1:4], x[4:1], NA, x[7:10], x[10:7])
> y <- c(ymax[1:4], ymin[4:1], NA, ymax[7:10], ymin[10:7])
>
> I worked on a simple (but ugly) patch to GeomRibbon in ggplot2 that does
> the job using an iteration:
>
>
> /Library/Frameworks/R.framework/Versions/2.10/Resources/library/ggplot2/R$
> diff ggplot2-orig ggplot2
> 5047,5048d5046
> <     data <- remove_missing(data, na.rm,
> <       c("x","ymin","ymax"), name = "geom_ribbon")
> 5050a5049,5069
>>     start <- 0
>>     polyx <- c()
>>     polyy <- c()
>>     for (i in 1:(length(data$x)+1)) {
>>       if (i > length(data$x) || is.na(data$ymin[i]) ||
>>           is.na(data$ymax[i])) {
>>         if (start > 0) {
>>           polyx <- c(polyx, data$x[start:(i-1)],
>>               data$x[(i-1):start], NA)
>>           polyy <- c(polyy, data$ymax[start:(i-1)],
>>               data$ymin[start:(i-1)], NA)
>>           start <- 0
>>         }
>>       } else {
>>         if (start == 0) {
>>           start <- i
>>         }
>>       }
>>     }
>>     polyx <- head(polyx, length(polyx) - 1)
>>     polyy <- head(polyy, length(polyy) - 1)
> 5052c5071


Re: [R] geom_ribbon removes missing values

2010-06-06 Thread Karsten Loesing
On 6/6/10 7:46 PM, Karsten Loesing wrote:
> Hi Hadley,
> 
> On 5/31/10 9:51 PM, Hadley Wickham wrote:
>> There's no easy way to do this because behind the scenes geom_ribbon
>> uses grid.polygon.
> 
> A possible workaround might be to have grid.polygon draw multiple
> polygons, one for each interval. We can do this by constructing vectors
> with coordinates for the first polygon, then NA, then coordinates for
> the second polygon, etc. Here are the vectors for my initial example:
> 
> x <- c(x[1:4], x[4:1], NA, x[7:10], x[10:7])
> y <- c(ymax[1:4], ymin[4:1], NA, ymax[7:10], ymin[10:7])
> 
> I worked on a simple (but ugly) patch to GeomRibbon in ggplot2 that does
> the job using an iteration:
> 
> 
> /Library/Frameworks/R.framework/Versions/2.10/Resources/library/ggplot2/R$
> diff ggplot2-orig ggplot2
> 5047,5048d5046
> <     data <- remove_missing(data, na.rm,
> <       c("x","ymin","ymax"), name = "geom_ribbon")
> 5050a5049,5069
>>     start <- 0
>>     polyx <- c()
>>     polyy <- c()
>>     for (i in 1:(length(data$x)+1)) {
>>       if (i > length(data$x) || is.na(data$ymin[i]) ||
>>           is.na(data$ymax[i])) {
>>         if (start > 0) {
>>           polyx <- c(polyx, data$x[start:(i-1)],
>>               data$x[(i-1):start], NA)
>>           polyy <- c(polyy, data$ymax[start:(i-1)],
>>               data$ymin[start:(i-1)], NA)

Whoops, change that to:

              data$ymin[(i-1):start], NA)

>>           start <- 0
>>         }
>>       } else {
>>         if (start == 0) {
>>           start <- i
>>         }
>>       }
>>     }
>>     polyx <- head(polyx, length(polyx) - 1)
>>     polyy <- head(polyy, length(polyy) - 1)
> 5052c5071
> <       coordinates$munch(data.frame(x=c(x, rev(x)), y=c(ymax, rev(ymin))), scales)
> ---
>>       coordinates$munch(data.frame(x = polyx, y = polyy), scales)
> 
> 
> Do you like the described approach? Can you help me make my patch better?
> 
> In particular, I'd want to avoid iterating over the data frame and
> extract start and end index of intervals separated by NA. Is there a
> function for this or at least a better approach?
> 
> Also, probably a stupid question: How do I tell R to use the cloned
> ggplot2 sources instead of the installed ggplot2 package? As you can
> see, I modified the installed package, but I'd rather work with Git here.
> 
> Thanks,
> --Karsten
> 
> 
>> On Sun, May 30, 2010 at 7:26 AM, Karsten Loesing wrote:
>>> Hi everyone,
>>>
>>> it looks like geom_ribbon removes missing values and plots a single
>>> ribbon over the whole interval of x values. However, I'd rather want it
>>> to act like geom_line, that is, interrupt the ribbon for the interval of
>>> missing values and continue once there are new values. Here's an example:
>>>
>>> library(ggplot2)
>>> df <- data.frame(
>>>  date = seq(from = as.Date("2010-05-15"),
>>>             to = as.Date("2010-05-24"),
>>>             by = "1 day"),
>>>  low = c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5),
>>>  mid = c(8, 9, 8, 9, NA, NA, 8, 9, 8, 9),
>>>  high = c(12, 13, 12, 13, NA, NA, 12, 13, 12, 13))
>>> ggplot(df, aes(x = date, y = mid, ymin = low, ymax = high)) +
>>>  geom_line() +
>>>  geom_ribbon(fill = alpha("blue", 0.5))
>>>
>>> When running this code, R tells me:
>>>
>>> Warning message:
>>> Removed 2 rows containing missing values (geom_ribbon).
>>>
>>> When you look at the graph, you can see that the line stops at May 18
>>> and starts again on May 21. But the ribbon reaches from May 15 to 24,
>>> even though there are no values on May 19 and 20.
>>>
>>> Is there an option that I could set? Or a geom/stat that I should use
>>> instead? In my pre-ggplot2 times I used polygon(), but I figured there
>>> must be something better in ggplot2 (as there has always been so far).
>>>
>>> Thanks,
>>> --Karsten
>>>
>>>
>>
>>
>>
>



Re: [R] geom_ribbon removes missing values

2010-06-06 Thread Karsten Loesing
Hi Hadley,

On 5/31/10 9:51 PM, Hadley Wickham wrote:
> There's no easy way to do this because behind the scenes geom_ribbon
> uses grid.polygon.

A possible workaround might be to have grid.polygon draw multiple
polygons, one for each interval. We can do this by constructing vectors
with coordinates for the first polygon, then NA, then coordinates for
the second polygon, etc. Here are the vectors for my initial example:

x <- c(x[1:4], x[4:1], NA, x[7:10], x[10:7])
y <- c(ymax[1:4], ymin[4:1], NA, ymax[7:10], ymin[10:7])

I worked on a simple (but ugly) patch to GeomRibbon in ggplot2 that does
the job using an iteration:


/Library/Frameworks/R.framework/Versions/2.10/Resources/library/ggplot2/R$
diff ggplot2-orig ggplot2
5047,5048d5046
<     data <- remove_missing(data, na.rm,
<       c("x","ymin","ymax"), name = "geom_ribbon")
5050a5049,5069
>     start <- 0
>     polyx <- c()
>     polyy <- c()
>     for (i in 1:(length(data$x)+1)) {
>       if (i > length(data$x) || is.na(data$ymin[i]) ||
>           is.na(data$ymax[i])) {
>         if (start > 0) {
>           polyx <- c(polyx, data$x[start:(i-1)],
>               data$x[(i-1):start], NA)
>           polyy <- c(polyy, data$ymax[start:(i-1)],
>               data$ymin[start:(i-1)], NA)
>           start <- 0
>         }
>       } else {
>         if (start == 0) {
>           start <- i
>         }
>       }
>     }
>     polyx <- head(polyx, length(polyx) - 1)
>     polyy <- head(polyy, length(polyy) - 1)
5052c5071
<       coordinates$munch(data.frame(x=c(x, rev(x)), y=c(ymax, rev(ymin))), scales)
---
>       coordinates$munch(data.frame(x = polyx, y = polyy), scales)


Do you like the described approach? Can you help me make my patch better?

In particular, I'd want to avoid iterating over the data frame and
extract start and end index of intervals separated by NA. Is there a
function for this or at least a better approach?

Also, probably a stupid question: How do I tell R to use the cloned
ggplot2 sources instead of the installed ggplot2 package? As you can
see, I modified the installed package, but I'd rather work with Git here.

Thanks,
--Karsten
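
On the question of finding the start and end index of each interval without a loop, one option that may be worth a look is base R's rle(), which reports the lengths of consecutive runs of equal values. The sketch below is only an illustration under that assumption; the variable names are invented, and the values are the low and high columns of the example data frame.

```r
ymin <- c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5)
ymax <- c(12, 13, 12, 13, NA, NA, 12, 13, 12, 13)

ok   <- !(is.na(ymin) | is.na(ymax))  # TRUE where both bounds are present
runs <- rle(ok)                       # run lengths of TRUE / FALSE stretches

end   <- cumsum(runs$lengths)         # last index of every run
start <- end - runs$lengths + 1       # first index of every run

# Keep only the runs of complete rows.
intervals <- data.frame(start = start[runs$values], end = end[runs$values])
# start: 1, 7; end: 4, 10
```

The two intervals match the hand-written index ranges 1:4 and 7:10 used for the example vectors above.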


> On Sun, May 30, 2010 at 7:26 AM, Karsten Loesing wrote:
>> Hi everyone,
>>
>> it looks like geom_ribbon removes missing values and plots a single
>> ribbon over the whole interval of x values. However, I'd rather want it
>> to act like geom_line, that is, interrupt the ribbon for the interval of
>> missing values and continue once there are new values. Here's an example:
>>
>> library(ggplot2)
>> df <- data.frame(
>>  date = seq(from = as.Date("2010-05-15"),
>>             to = as.Date("2010-05-24"),
>>             by = "1 day"),
>>  low = c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5),
>>  mid = c(8, 9, 8, 9, NA, NA, 8, 9, 8, 9),
>>  high = c(12, 13, 12, 13, NA, NA, 12, 13, 12, 13))
>> ggplot(df, aes(x = date, y = mid, ymin = low, ymax = high)) +
>>  geom_line() +
>>  geom_ribbon(fill = alpha("blue", 0.5))
>>
>> When running this code, R tells me:
>>
>> Warning message:
>> Removed 2 rows containing missing values (geom_ribbon).
>>
>> When you look at the graph, you can see that the line stops at May 18
>> and starts again on May 21. But the ribbon reaches from May 15 to 24,
>> even though there are no values on May 19 and 20.
>>
>> Is there an option that I could set? Or a geom/stat that I should use
>> instead? In my pre-ggplot2 times I used polygon(), but I figured there
>> must be something better in ggplot2 (as there has always been so far).
>>
>> Thanks,
>> --Karsten
>>
>>
> 
> 
>



Re: [R] geom_ribbon removes missing values

2010-05-31 Thread Hadley Wickham
Hi Karsten,

There's no easy way to do this because behind the scenes geom_ribbon
uses grid.polygon.

Hadley

On Sun, May 30, 2010 at 7:26 AM, Karsten Loesing wrote:
> Hi everyone,
>
> it looks like geom_ribbon removes missing values and plots a single
> ribbon over the whole interval of x values. However, I'd rather want it
> to act like geom_line, that is, interrupt the ribbon for the interval of
> missing values and continue once there are new values. Here's an example:
>
> library(ggplot2)
> df <- data.frame(
>  date = seq(from = as.Date("2010-05-15"),
>             to = as.Date("2010-05-24"),
>             by = "1 day"),
>  low = c(4, 5, 4, 5, NA, NA, 4, 5, 4, 5),
>  mid = c(8, 9, 8, 9, NA, NA, 8, 9, 8, 9),
>  high = c(12, 13, 12, 13, NA, NA, 12, 13, 12, 13))
> ggplot(df, aes(x = date, y = mid, ymin = low, ymax = high)) +
>  geom_line() +
>  geom_ribbon(fill = alpha("blue", 0.5))
>
> When running this code, R tells me:
>
> Warning message:
> Removed 2 rows containing missing values (geom_ribbon).
>
> When you look at the graph, you can see that the line stops at May 18
> and starts again on May 21. But the ribbon reaches from May 15 to 24,
> even though there are no values on May 19 and 20.
>
> Is there an option that I could set? Or a geom/stat that I should use
> instead? In my pre-ggplot2 times I used polygon(), but I figured there
> must be something better in ggplot2 (as there has always been so far).
>
> Thanks,
> --Karsten
>
>



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
