Re: [R] Violin plot of categorical/binned data
On 11/3/2012 5:47 PM, Jim Lemon wrote:
On 11/04/2012 06:27 AM, Nathan Miller wrote:
Hi,

I'm trying to create a plot showing the density distribution of some shipping data. I like the look of violin plots, but my data is not continuous but rather binned, and I want to make sure its binned nature (not smooth) is apparent in the final plot. So for example, I have the number of individuals per vessel, but rather than having the actual number of individuals I have data in the format of: 7 values of zero, 11 values between 1-10, 6 values between 10-100, 13 values between 100-1000, etc.

To plot this data I generated a new dataset with the first 7 values being 0, representing the 7 values of 0, the next 11 values being 5.5, representing the 11 values between 1-10, etc. Sample data below.

I can make a violin plot (code below) using a log y-axis, which looks alright (though I do have to deal with the zeros still), but in its default format it hides the fact that these are binned data, which seems a bit misleading. Is it possible to make a violin plot that looks a bit more angular (more corners, less smoothing) or in some way shows the distribution, but also clearly shows the true nature of these data? I've tried playing with the bandwidth adjustment and the kernel but haven't been able to get a figure that seems to work.

Anyone have some thoughts on this?

Hi Nate,
I'm not exactly sure what you are doing in the data transformation, but you can display this type of information as a single polygon for each instance (kiteChart) or as separate rectangles (battleship.plot).
library(plotrix)
vessels <- matrix(c(zero=sample(1:10,5), one2ten=sample(5:20,5),
  ten2hundred=sample(15:36,5), hundred2thousand=sample(10:16,5)), ncol=4)
battleship.plot(vessels, xlab="Number of passengers",
  yaxlab=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
  xaxlab=c("0","1-10","10-100","100-1000"))
kiteChart(vessels, xlab="Number of passengers", ylab="Vessel",
  varlabels=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
  timelabels=c("0","1-10","10-100","100-1000"))

Jim

Expanding on the idea of the battleship.plot, you can draw rectangles of the right width with ggplot2 if you want.

Original data:

library(ggplot2)
data2 <- read.csv(text="count,bin
7,0
11,1-10
6,11-100
13,101-1000
7,1001-10000
3,10001-100000
2,100001-1000000")
data2$bin <- ordered(data2$bin,
  levels=c("0","1-10","11-100","101-1000","1001-10000","10001-100000","100001-1000000"))

Define the lower and upper reaches of each bin:

data2$low <- c(0, 1, 11, 101, 1001, 10001, 100001)
data2$high <- c(0, 10, 100, 1000, 10000, 100000, 1000000)

And make multiple copies for different vessels (or whatever grouping):

data3 <- rbind(data2, data2, data2, data2)
data3$vessel <- rep(c("Barnacle","Maelstrom","Poopdeck","Seasick"), each=7)
data3$count <- abs(data2$count + sample(-5:5, 7*4, replace=TRUE))

With each bin taking the same height, regardless of its extent:

ggplot(data3) +
  geom_blank(aes(x=count/2, y=bin)) +
  geom_rect(aes(ymin=as.numeric(bin)-0.5, ymax=as.numeric(bin)+0.5,
                xmin=-count/2, xmax=count/2)) +
  facet_grid(~vessel)

Here the width (height, really) of each rectangle is based on the range of its bin. Because the y-axis is logarithmic and the binning is exponential, the rectangles come out about the same height (with some gaps due to the discrete nature of the bins). Because of the log scale, the zeros are still a problem.

ggplot(data3) +
  geom_rect(aes(ymin=low, ymax=high, xmin=-count/2, xmax=count/2)) +
  facet_grid(~vessel) +
  scale_y_log10()

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
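The claim that the rectangles come out about the same height on a log scale can be checked numerically. This is a minimal base-R sketch, assuming the bin edges `low` and `high` from the reply above; `extent`, `height` and `gap_below` are names made up for illustration.

```r
# Bin edges as defined in the reply above.
low  <- c(0, 1, 11, 101, 1001, 10001, 100001)
high <- c(0, 10, 100, 1000, 10000, 100000, 1000000)

# On a log10 axis each rectangle runs from log10(low) to log10(high).
extent <- data.frame(
  bin    = c("0", "1-10", "11-100", "101-1000", "1001-10000",
             "10001-100000", "100001-1000000"),
  bottom = log10(low),
  top    = log10(high)
)

# Height of each rectangle in decades; NaN for the zero bin (-Inf minus -Inf).
extent$height <- extent$top - extent$bottom

# Space between the top of the previous bin and the bottom of this one.
extent$gap_below <- c(NA, extent$bottom[-1] - extent$top[-nrow(extent)])

print(extent, digits = 3)
```

Every nonzero bin spans close to one decade (1-10 exactly one, 11-100 about 0.96, the larger bins closer still to 1). The gaps come from starting each bin one unit above the previous upper edge, and the zero bin maps to -Inf, which is why it cannot be drawn on the log axis.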
[R] Violin plot of categorical/binned data
Hi,

I'm trying to create a plot showing the density distribution of some shipping data. I like the look of violin plots, but my data is not continuous but rather binned, and I want to make sure its binned nature (not smooth) is apparent in the final plot. So for example, I have the number of individuals per vessel, but rather than having the actual number of individuals I have data in the format of: 7 values of zero, 11 values between 1-10, 6 values between 10-100, 13 values between 100-1000, etc.

To plot this data I generated a new dataset with the first 7 values being 0, representing the 7 values of 0, the next 11 values being 5.5, representing the 11 values between 1-10, etc. Sample data below.

I can make a violin plot (code below) using a log y-axis, which looks alright (though I do have to deal with the zeros still), but in its default format it hides the fact that these are binned data, which seems a bit misleading. Is it possible to make a violin plot that looks a bit more angular (more corners, less smoothing) or in some way shows the distribution, but also clearly shows the true nature of these data? I've tried playing with the bandwidth adjustment and the kernel but haven't been able to get a figure that seems to work.

Anyone have some thoughts on this?
Thanks,
Nate

library(ggplot2)
library(scales)

# Sample data: one vessel type ("rec"), 49 values (bin midpoints repeated by bin count)
data2 <- data.frame(vessel = "rec",
                    values = c(rep(0.0e+00, 7), rep(5.5e+00, 11), rep(5.5e+01, 6),
                               rep(5.5e+02, 13), rep(5.5e+03, 7), rep(5.5e+04, 3),
                               rep(5.5e+05, 2)))

# Violin plot on a log10 y-axis; the zeros become -Inf on this scale
p <- ggplot(data2, aes(vessel, values))
p + geom_violin() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))
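The data transformation described in the question can be made explicit. This is a minimal base-R sketch, assuming decade-wide bins (0, 1-10, 10-100, 100-1000, and so on up to 1e6; the bins above 1000 and their counts are read off the sample values rather than stated in the text). It shows that the posted values are the arithmetic midpoint of each bin repeated by that bin's count, so the counts can be recovered exactly with `table()`. The names `mid`, `recovered` and `n_zero` are illustrative only.

```r
# Bin edges and counts; bins above 100-1000 inferred from the sample values.
low    <- c(0, 1, 10, 100, 1000, 10000, 100000)
high   <- c(0, 10, 100, 1000, 10000, 100000, 1000000)
counts <- c(7, 11, 6, 13, 7, 3, 2)

# Replace every observation by the arithmetic midpoint of its bin.
mid    <- (low + high) / 2           # 0, 5.5, 55, 550, 5500, 55000, 550000
values <- rep(mid, times = counts)   # 49 pseudo-values, as in the sample data

# The expansion loses nothing: the per-bin counts come straight back.
recovered <- as.vector(table(values))

# On a log10 axis the zeros become -Inf, so they cannot be plotted.
n_zero <- sum(!is.finite(log10(values)))
```

Working from `counts` and the bin edges directly, rather than from the expanded values, is what makes the rectangle-based plots in the replies possible.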
Re: [R] Violin plot of categorical/binned data
On 11/04/2012 06:27 AM, Nathan Miller wrote:

Hi Nate,
I'm not exactly sure what you are doing in the data transformation, but you can display this type of information as a single polygon for each instance (kiteChart) or as separate rectangles (battleship.plot).
library(plotrix)
vessels <- matrix(c(zero=sample(1:10,5), one2ten=sample(5:20,5),
  ten2hundred=sample(15:36,5), hundred2thousand=sample(10:16,5)), ncol=4)
battleship.plot(vessels, xlab="Number of passengers",
  yaxlab=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
  xaxlab=c("0","1-10","10-100","100-1000"))
kiteChart(vessels, xlab="Number of passengers", ylab="Vessel",
  varlabels=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
  timelabels=c("0","1-10","10-100","100-1000"))

Jim
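For readers without plotrix, the separate-rectangles idea can be sketched with base graphics alone. This is not how `battleship.plot()` is implemented; `binned_violin()` is a hypothetical helper written for illustration. It assumes a count matrix laid out like Jim's `vessels` (one row per vessel, one column per bin) and draws, for each vessel, one centred bar per bin whose width is proportional to the count, with every bin given the same height.

```r
# Hypothetical helper: one column of centred bars per vessel (row of `counts`),
# one bar per bin (column of `counts`), bar width proportional to the count.
binned_violin <- function(counts, group_labels, bin_labels, fill = "grey80") {
  ngrp <- nrow(counts)
  nbin <- ncol(counts)
  half <- 0.45 / max(counts)          # widest bar spans 0.9 of a column
  op <- par(mar = c(3, 6, 1, 1))      # room for the bin labels on the left
  on.exit(par(op))
  plot.new()
  plot.window(xlim = c(0.5, ngrp + 0.5), ylim = c(0.5, nbin + 0.5))
  for (g in seq_len(ngrp)) {
    w <- counts[g, ] * half           # half-widths for this vessel
    rect(g - w, seq_len(nbin) - 0.5, g + w, seq_len(nbin) + 0.5, col = fill)
  }
  axis(1, at = seq_len(ngrp), labels = group_labels)
  axis(2, at = seq_len(nbin), labels = bin_labels, las = 1)
  box()
  invisible(2 * counts * half)        # full bar widths in x-axis units
}

# Usage with random counts shaped like Jim's example matrix.
set.seed(42)
vessels <- matrix(c(sample(1:10, 5), sample(5:20, 5),
                    sample(15:36, 5), sample(10:16, 5)), ncol = 4)
widths <- binned_violin(vessels,
  group_labels = c("Barnacle", "Maelstrom", "Poopdeck", "Seasick", "Wallower"),
  bin_labels   = c("0", "1-10", "10-100", "100-1000"))
```

Because adjacent bins share their edges, each vessel's outline reads as a stepped, deliberately angular violin, which is the look asked for in the original question.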