hadley wickham wrote:
> On 9/29/07, hadley wickham <[EMAIL PROTECTED]> wrote:
>> On 9/29/07, Michael Friendly <[EMAIL PROTECTED]> wrote:
>>> hadley wickham wrote:
>>>> I was interested to see that you have code for drawing scatterplots
>>>> with multiple y-axes. As far as I know the only legitimate use for a
>>>> double-axis plot is to confuse or mislead the reader (and this is not
>>>> a very ethical use case). Perhaps you have a counter-example?
>>>>
>>>> Hadley
>>>>
>>> While it is true that the double-Y-axis graph is generally considered
>>> sinful, it can be used effectively to show the relation of two time
>>> series in ways that other graphs can't do as well.
>>>
>>> For one striking example,
>>> a political, presentation graphic, see:
>>> http://www.math.yorku.ca/SCS/Gallery/images/commonsenserevolution6.pdf
>>> described on my Graphical Excellence page,
>>> http://www.math.yorku.ca/SCS/Gallery/excellence.html
>>> I found it easy to excuse the sin by the 'wow effect' produced by the
>>> graph.
>> While I agree that the double y-axis plot can be used to compare two
>> time series, I'm not sure whether or not it actually is effective.
>> The appearance of the display is so critically dependent on the
>> relative scales of the axes, that it is easy to draw the wrong
>> conclusion. Why not use a scatterplot or path plot (i.e. connect
>> subsequent observations with edges) if you want to understand the
>> relationship between two variables?
>
> To compare the scatterplot vs double axis plot, I used graphclick
> (http://www.arizona-software.ch/graphclick/) to digitise the graphic,
> to get the following dataset:
>
> csr <- structure(list(year = c(1985, 1986, 1987, 1988, 1989, 1990, 1991,
> 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
> 2003, 2004, 2005, 2006), deaths = c(1, 1, 7, 5, 12, 3, 7, 5,
> 4, 6, 8, 19, 26, 20, 42, 41, 45, 41, 27, 52, 67, 50), income = c(NA,
> 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
> 8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA
> )), .Names = c("year", "deaths", "income"), row.names = c(NA,
> -22L), class = "data.frame")
>
> and produce the attached graphic (I'm not sure if the attachment will
> make it to r-help, but the code should be reproducible on any system):
>
> library(ggplot2)
> ggplot(csr, aes(x=deaths, y=income)) +
>   geom_path(colour="grey80") + geom_point()
>
> # or without connecting lines
> ggplot(csr, aes(x=deaths, y=income)) + geom_point()
>
> I find this graph much easier to interpret - one can see outliers, the
> suggestion of non-linearity etc. It would also be easy to add the
> political party with colour or shape.
>
> I'm not sure if it's a good idea to include the line or not - the
> gestalt principle of connectedness makes it very difficult to
> interpret the points as separate objects even when the line connecting
> them is so faint.
>
> Hadley
>
Thanks for trying this, Hadley; the comparison is instructive about the
different communication goals of analysis graphs and presentation graphs.

Actually, one should regard income as the independent variable and deaths
as the response, so what you want is

  ggplot(csr, aes(y=deaths, x=income)) +
    geom_path(colour="grey80") + geom_point()

but instead of (or in addition to) geom_path, a bolder loess smooth would
show the trend better.

This does indeed show the inverse, non-linear relation between welfare
income and deaths more directly, along with a few outliers. That makes it
good as an analysis graph, but it fails the Interocular Traumatic Test for
a presentation graph: the message should hit you between the eyes. Even
using color or shape to represent the party in power, the stark message of
the original is lost: when the Mike Harris Conservatives came to power in
Ontario in June 1995, they slashed welfare payments, and the number of
deaths of homeless people increased dramatically. This trend continued
under the McGuinty Liberals, elected in October 2003. It's particularly
poignant that the bars for deaths in the original graphic are made from
the names of the homeless people who died (and sad to see the number of
John/Jane Does among them).

To explore this further, I added a column for party to the csr data frame,
but the transitions between parties occurred partway through the year, in
different months, so one would need a separate data frame to represent
them precisely.
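A minimal sketch of the two ideas above (a bolder loess smooth with income
on the x-axis, and a separate data frame recording when each party held
power), assuming ggplot2 3.4.0 or later is installed (for the `linewidth`
argument). The party column reproduces the one in the table in this
message; the June 1995 and October 2003 transitions come from the text,
while the October 1990 start of the NDP term, and centring each annual bar
at year + 0.5 (treating the counts as calendar-year totals), are
assumptions. The names p_analysis, p_time, terms and ym() are illustrative
only.

```r
# Sketch only: assumes ggplot2 >= 3.4.0 is installed (needed for `linewidth`).
library(ggplot2)

# Hadley's digitised data, plus the party column from the table in this message
csr <- data.frame(
  year   = 1985:2006,
  deaths = c(1, 1, 7, 5, 12, 3, 7, 5, 4, 6, 8, 19, 26, 20, 42, 41, 45, 41,
             27, 52, 67, 50),
  income = c(NA, 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
             8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA),
  party  = rep(c("Liberal", "NDP", "Conservative", "Liberal"),
               times = c(5, 5, 8, 4))
)

# 1. Analysis graph: income as predictor, deaths as response, a faint path in
#    year order, points coloured by party, and one bold overall loess smooth.
#    Years with no income value are dropped before plotting.
p_analysis <- ggplot(na.omit(csr), aes(x = income, y = deaths)) +
  geom_path(colour = "grey80") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              colour = "black", linewidth = 1.5) +
  geom_point(aes(colour = party), size = 2.5)

# 2. A separate data frame of party terms in decimal years, so mid-year
#    transitions are placed where they occurred, not at year boundaries.
ym <- function(y, m) y + (m - 1) / 12  # hypothetical helper: year + month -> decimal year
terms <- data.frame(
  party = c("Liberal", "NDP", "Conservative", "Liberal"),
  # first start and last end are just the range of the data;
  # October 1990 is an assumption, June 1995 and October 2003 are from the text
  start = c(1985, ym(1990, 10), ym(1995, 6), ym(2003, 10)),
  end   = c(ym(1990, 10), ym(1995, 6), ym(2003, 10), 2007)
)

# Counts are assumed to be calendar-year totals, so each bar is centred at
# year + 0.5 and covers most of that year on the decimal-year axis.
p_time <- ggplot() +
  geom_rect(data = terms,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = party),
            alpha = 0.25) +
  geom_col(data = csr, aes(x = year + 0.5, y = deaths), width = 0.9) +
  labs(x = "Year", y = "Deaths of homeless people", fill = "Party in power")

print(p_analysis)
print(p_time)
```

Shading the terms behind the annual bars makes the point above concrete:
the 1995 and 2003 changes of government fall partway through a year, so a
year-level party column cannot place them exactly.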
The csr data frame with the party column added:

   year deaths income        party
1  1985      1     NA      Liberal
2  1986      1   8572      Liberal
3  1987      7     NA      Liberal
4  1988      5     NA      Liberal
5  1989     12   9264      Liberal
6  1990      3  10071          NDP
7  1991      7  10338          NDP
8  1992      5  10687          NDP
9  1993      4  10666          NDP
10 1994      6  10666          NDP
11 1995      8   9907 Conservative
12 1996     19   8141 Conservative
13 1997     26   8059 Conservative
14 1998     20   7997 Conservative
15 1999     42   7874 Conservative
16 2000     41   7648 Conservative
17 2001     45   7484 Conservative
18 2002     41   7319 Conservative
19 2003     27   7135      Liberal
20 2004     52   7135      Liberal
21 2005     67   7011      Liberal
22 2006     50     NA      Liberal

-Michael

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.