Re: [R] log y 'axis' of histogram

David Scott Mon, 30 Aug 2010 14:05:26 -0700

 On 31/08/10 03:37, Derek M Jones wrote:

Hadley,

I have counts ranging over 4-6 orders of magnitude with peaks
occurring at various 'magic' values.  Using a log scale for the
y-axis enables the smaller peaks, which would otherwise
be almost invisible bumps along the x-axis, to be seen

That doesn't justify the use of a _histogram_  - and regardless of

The usage highlights meaningful characteristics of the data.
What better justification for any method of analysis and display is
there?

what distributional display you use, logging the counts imposes some
pretty heavy restrictions on the shape of the distribution (e.g. that
it must not drop to zero).

Does there have to be a recognized statistical distribution to use R?
In my case I am using R for all of the analysis and graphics in a
new book.  This means that sometimes I have to deal with data sets
that are more or less a jumble of numbers with patterns in a few
places.  For instance, the numeric value of integer constants
appearing as one operand of the binary bitwise-AND operator (see
figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data
at: www.knosof.co.uk/cbook/bandcons.hist.gz)

qplot(band, binwidth=8, geom="histogram") + scale_y_log()
does a good job of highlighting the peaks.

It may be useful for your purposes, but that doesn't necessarily make
it a meaningful graphic.

Doesn't being useful for my purpose make it meaningful, at least for me
and I hope my readers?

Hadley is correct about the problem of where to end the bars when trying to draw a log-histogram: basically you have to decide to cut them off somewhere. He is also right that a log-histogram is perhaps not a great graphic to use. However, they are used and indeed there is one in the Fieller, Flenley, Olbricht paper (published in Applied Statistics, now JRSS C) for example. I haven't searched for others, but certainly when I wrote a log-histogram routine it wasn't because I thought of doing such a plot all on my own.
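
To make the cut-off problem concrete, here is a minimal base-R sketch. The data are simulated and the baseline `floorY` is an arbitrary choice of mine for illustration; neither comes from the thread. On a log scale there is no zero to anchor the bars to, so a lower edge has to be picked, and empty bins have to be dropped.

```r
# Simulated counts spanning several orders of magnitude (illustrative only)
set.seed(42)
x <- rgeom(10000, prob = 0.05)

# Compute the histogram without drawing it
h <- hist(x, breaks = 30, plot = FALSE)

# Arbitrary baseline: the bars must end somewhere, since log(0) is -Inf
floorY <- 0.5

# Empty bins cannot be drawn on a log scale, so keep only non-empty ones
keep  <- h$counts > 0
left  <- h$breaks[-length(h$breaks)][keep]
right <- h$breaks[-1][keep]

# Set up an empty plot with a logged y-axis, then draw the bars by hand
plot(1, 1, type = "n", log = "y",
     xlim = range(h$breaks), ylim = c(floorY, max(h$counts)),
     xlab = "x", ylab = "count (log scale)")
rect(left, floorY, right, h$counts[keep], col = "grey")
```

Changing `floorY` changes the apparent height of every bar, which is one reason the bars of a log-histogram are hard to interpret.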

A number of authors, including Barndorff-Nielsen in at least some of his papers (I haven't gone back and checked all his older work) just plot the midpoints of the tops of the log-histogram. (That is an option in logHist). Another approach is to fit an empirical density to the data and plot the log-density. That matches the advice often seen in this forum that plotting empirical density functions is preferable to drawing histograms. My feeling is that either of these two approaches is probably preferable to using log-histograms for the reasons Hadley enunciated. When plotting data plus a fitted curve, the midpoints approach does have the advantage of distinguishing data and theoretical curve more clearly.
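
The two alternatives can be sketched in base R as follows. This is an illustration under my own assumptions (simulated data, default bandwidth, 40 suggested breaks) and does not use logHist itself.

```r
# Simulated two-sided exponential data (illustrative only)
set.seed(1)
n <- 5000
x <- rexp(n, rate = 1) * sample(c(-1, 1), n, replace = TRUE)

# Compute the histogram without drawing it
h <- hist(x, breaks = 40, plot = FALSE)

# Drop empty bins, whose log-density would be -Inf
keep    <- h$counts > 0
mids    <- h$mids[keep]
logDens <- log(h$density[keep])

# Approach 1: midpoints of the tops of the log-histogram, drawn as points
plot(mids, logDens, pch = 16, xlab = "x", ylab = "log density")

# Approach 2: log of a kernel density estimate, drawn as a curve
d  <- density(x)
ok <- d$y > 0                      # guard against taking log(0)
lines(d$x[ok], log(d$y[ok]), col = "blue")
```

Because the data appear as points and the estimate as a line, a fitted theoretical log-density added with a further `lines()` call stays easy to tell apart from the data.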

Overall the idea of a plot with a logged y-axis is definitely a good one and its use is endemic in literature concerned with heavy-tailed distributions, particularly finance. The advantage is the clarity offered regarding tail behaviour, where for example exponential tails in the density correspond to straight lines in the logged y-axis plot.
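
A quick check of the straight-line claim, as a sketch with distributions and parameters chosen by me for illustration: the log of an exponential density is linear in x, so its second differences on an even grid vanish, while the log of a normal density is a downward parabola.

```r
# Evenly spaced grid on the right tail
x <- seq(0, 6, by = 0.5)

# Log-densities: exponential (rate 2) and standard normal
logExp  <- dexp(x, rate = 2, log = TRUE)   # log(2) - 2 * x, linear in x
logNorm <- dnorm(x, log = TRUE)            # -x^2 / 2 + constant, quadratic

# Second differences are zero for a straight line, negative for a concave curve
d2Exp  <- diff(logExp,  differences = 2)
d2Norm <- diff(logNorm, differences = 2)

# On a logged y-axis the exponential tail is a line, the normal tail bends away
plot(x, logExp, type = "l", ylim = range(logExp, logNorm),
     xlab = "x", ylab = "log density")
lines(x, logNorm, lty = 2)
legend("bottomleft", legend = c("exponential", "normal"), lty = c(1, 2))
```

The slope of the straight line is minus the rate, so the tail decay can be read directly off the plot.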


Hope this helps.

David Scott


--
_________________________________________________________________
David Scott     Department of Statistics
                The University of Auckland, PB 92019
                Auckland 1142,    NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics
