Re: [R] Violin plot of categorical/binned data

2012-11-06 Thread Brian Diggs

On 11/3/2012 5:47 PM, Jim Lemon wrote:

On 11/04/2012 06:27 AM, Nathan Miller wrote:





Expanding on the idea of the battleship.plot, you can draw rectangles of 
the right width with ggplot2 if you want.


Original data:

data2 <- read.csv(text = "count,bin
7,0
11,1-10
6,11-100
13,101-1000
7,1001-10000
3,10001-100000
2,100001-1000000")
data2$bin <- ordered(data2$bin,
  levels = c("0", "1-10", "11-100", "101-1000", "1001-10000",
             "10001-100000", "100001-1000000"))


Define the lower and upper reaches of each bin:

data2$low <- c(0, 1, 11, 101, 1001, 10001, 100001)
data2$high <- c(0, 10, 100, 1000, 10000, 100000, 1000000)
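Because the bins are exponential, the edges and labels can also be generated rather than typed. A minimal sketch, assuming the same seven bins (a zero bin, then 1-10 up to 100001-1000000); `labels` is an illustrative name:

```r
# Exponential bin edges: a zero bin, then 1-10, 11-100, ..., 100001-1000000
low  <- c(0, 1, 10^(1:5) + 1)
high <- c(0, 10^(1:6))

# Labels built from the edges; "%.0f" prints whole numbers without "1e+05"-style notation
labels <- c("0", sprintf("%.0f-%.0f", low[-1], high[-1]))
labels
```

The generated `labels` can then be passed as the `levels` of `ordered()` in place of the typed list.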

And make multiple ones for different vessels (or whatever grouping):

data3 <- rbind(data2, data2, data2, data2)
data3$vessel <- rep(c("Barnacle", "Maelstrom", "Poopdeck", "Seasick"),
                    each = 7)
# jitter the counts so that the four vessels differ
data3$count <- abs(data2$count + sample(-5:5, 7*4, replace = TRUE))

With each bin taking the same size, regardless of its extent:

library(ggplot2)
ggplot(data3) +
  geom_blank(aes(x = count/2, y = bin)) +
  geom_rect(aes(ymin = as.numeric(bin) - 0.5, ymax = as.numeric(bin) + 0.5,
                xmin = -count/2, xmax = count/2)) +
  facet_grid(~vessel)
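For readers without ggplot2, the same equal-height layout can be sketched in base graphics with `rect()`. This is an illustrative sketch for a single vessel, assuming the bin counts of the original data; `counts`, `bins` and `rects` are illustrative names:

```r
# Bin counts for one vessel (the counts in the original data) and bin labels
counts <- c(7, 11, 6, 13, 7, 3, 2)
bins <- c("0", "1-10", "11-100", "101-1000", "1001-10000",
          "10001-100000", "100001-1000000")

# One band of height 1 per bin, centred on the bin's index, as with
# ymin = as.numeric(bin) - 0.5 and ymax = as.numeric(bin) + 0.5;
# the width equals the count and is centred on zero, giving the mirrored shape
idx <- seq_along(counts)
rects <- data.frame(xleft = -counts / 2, xright = counts / 2,
                    ybottom = idx - 0.5, ytop = idx + 0.5)

# Empty plot region, then the bin labels on the y axis and the rectangles
op <- par(mar = c(4, 8, 1, 1))
plot(NA, xlim = c(-1, 1) * max(counts) / 2, ylim = c(0.5, length(counts) + 0.5),
     xlab = "Count", ylab = "", yaxt = "n")
axis(2, at = idx, labels = bins, las = 1, cex.axis = 0.8)
rect(rects$xleft, rects$ybottom, rects$xright, rects$ytop, col = "grey70")
par(op)
```

Centring each rectangle on zero is what gives the violin-like symmetry; drawing from `xleft = 0` instead would give an ordinary horizontal bar chart.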

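Here the width (height, really) of each rectangle is based on the range
of its bin. Because the scale is logarithmic and the binning exponential,
the rectangles come out roughly the same height (with small gaps, e.g.
between 10 and 11, due to the discrete bins). Because of the log scale,
the zeros are still a problem.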
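The statement about equal heights and gaps can be checked numerically: on a log10 axis a bin from `low` to `high` spans `log10(high) - log10(low)`. A short sketch using the positive bin edges (the zero bin is left out because it cannot be placed on a log axis); `heights` and `gaps` are illustrative names:

```r
# Edges of the positive bins; the zero bin is left out because log10(0) is -Inf
low  <- c(1, 11, 101, 1001, 10001, 100001)
high <- c(10, 100, 1000, 10000, 100000, 1000000)

# Height of each rectangle on a log10 axis
heights <- log10(high) - log10(low)

# Gap on a log10 axis between the top of one bin and the bottom of the next
gaps <- log10(low[-1]) - log10(high[-length(high)])

round(heights, 3)
round(gaps, 3)
log10(0)  # -Inf, so a bin whose edges are 0 cannot be drawn on this scale
```

The 1-10 bin spans exactly one decade and the 11-100 bin about 0.96 of one; the largest gap (10 to 11) is about 0.04 decades, and the gaps shrink quickly for the higher bins.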


ggplot(data3) +
  geom_rect(aes(ymin=low, ymax=high, xmin=-count/2, xmax=count/2)) +
  facet_grid(~vessel) +
  scale_y_log10()




--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Violin plot of categorical/binned data

2012-11-03 Thread Nathan Miller
Hi,

I'm trying to create a plot showing the density distribution of some
shipping data. I like the look of violin plots, but my data is not
continuous but rather binned and I want to make sure its binned nature (not
smooth) is apparent in the final plot. So for example, I have the number of
individuals per vessel, but rather than having the actual number of
individuals I have data in the format of: 7 values of zero, 11 values
between 1-10, 6 values between 10-100, 13 values between 100-1000, etc. To
plot this data I generated a new dataset with the first 7 values being 0,
representing the 7 values of 0, the next 11 values being 5.5, representing
the 11 values between 1-10, etc. Sample data below.
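The expansion described above, one row per individual with each row set to a stand-in value for its bin, can be sketched with `rep()`. This assumes the bin counts of the sample data (7, 11, 6, 13, 7, 3, 2) and takes the stand-in for each positive bin to be the arithmetic midpoint of its nominal end points; `counts` and `reps` are illustrative names:

```r
# Number of individuals counted in each bin (from the sample data)
counts <- c(7, 11, 6, 13, 7, 3, 2)

# Stand-in value for each bin: 0 for the zero bin, otherwise the arithmetic
# midpoint of the nominal end points, e.g. (1 + 10) / 2 = 5.5, (10 + 100) / 2 = 55
reps <- c(0, 5.5, 55, 550, 5500, 55000, 550000)

# rep() with a vector 'times' repeats reps[i] counts[i] times
values <- rep(reps, times = counts)
data2 <- data.frame(vessel = "rec", values = values)
table(data2$values)
```

Every individual in a bin gets the same value, so a density estimated from `values` is built from only seven distinct points; the smoothing in a violin plot is what hides that.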

I can make a violin plot (code below) using a log y-axis, which looks
alright (though I do have to deal with the zeros still), but in its default
format it hides the fact that these are binned data, which seems a bit
misleading. Is it possible to make a violin plot that looks a bit more
angular (more corners, less smoothing) or in some way shows the
distribution, but also clearly shows the true nature of these data? I've
tried playing with the bandwidth adjustment and the kernel but haven't been
able to get a figure that seems to work.

Anyone have some thoughts on this?

Thanks,
Nate

library(ggplot2)
library(scales)

# Sample data: one row per individual, each set to its bin's stand-in value
data2 <- read.table(text = "
vessel values
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+04
rec 5.5e+04
rec 5.5e+04
rec 5.5e+05
rec 5.5e+05
", header = TRUE)

p <- ggplot(data2, aes(vessel, values))
p + geom_violin() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))




Re: [R] Violin plot of categorical/binned data

2012-11-03 Thread Jim Lemon

On 11/04/2012 06:27 AM, Nathan Miller wrote:



Hi Nate,
I'm not exactly sure what you are doing in the data transformation, but 
you can display this type of information as a single polygon for each 
instance (kiteChart) or separate rectangles (battleship.plot).


library(plotrix)
vessels <- matrix(c(zero = sample(1:10, 5), one2ten = sample(5:20, 5),
 ten2hundred = sample(15:36, 5), hundred2thousand = sample(10:16, 5)),
 ncol = 4)
battleship.plot(vessels, xlab = "Number of passengers",
 yaxlab = c("Barnacle", "Maelstrom", "Poopdeck", "Seasick", "Wallower"),
 xaxlab = c("0", "1-10", "10-100", "100-1000"))
kiteChart(vessels, xlab = "Number of passengers", ylab = "Vessel",
 varlabels = c("Barnacle", "Maelstrom", "Poopdeck", "Seasick", "Wallower"),
 timelabels = c("0", "1-10", "10-100", "100-1000"))
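The matrix above is filled with random numbers for illustration. With real per-trip counts, a matrix in the same layout (vessels as rows, bins as columns) could be built with `cut()` and `table()`. A sketch in which the data frame `trips` and all of its values are hypothetical:

```r
# Hypothetical raw data: the number of individuals recorded on each trip,
# together with the vessel that made the trip (values invented for illustration)
trips <- data.frame(
  vessel = c("Barnacle", "Barnacle", "Barnacle", "Maelstrom", "Maelstrom", "Poopdeck"),
  n      = c(0, 4, 250, 12, 12, 999)
)

# cut() uses right-closed intervals (a, b] by default, so for whole numbers
# these breaks give the bins 0, 1-10, 11-100 and 101-1000
bins <- cut(trips$n, breaks = c(-Inf, 0, 10, 100, 1000),
            labels = c("0", "1-10", "11-100", "101-1000"))

# Cross-tabulate: vessels as rows, bins as columns, empty bins kept as zeros
vessels <- table(trips$vessel, bins)
vessels
```

Because `cut()` returns a factor, empty bins still appear as columns of zeros; `unclass(vessels)` gives a plain integer matrix if one is needed.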

Jim
