I was distracted enough by the possibility of hijacking hist() for this
to give it a go. 

The following code implements a basic hanging rootogram based on a
normal density with hist() breaks used as bins and bin midpoints used as
the hanging location (not exact, I suspect, but perhaops good enough).
Extensions to other distributions are reasonably obvious.

S Ellison


rootonorm <- function(x, breaks="Sturges", col="lightgrey", gap=0.2,
...) {
        h<-hist(x, breaks=breaks)
        nbins<-length(h$counts)
        mu<-mean(x)
        s<-sd(x)
        normdens<-dnorm(h$mids, mu, s)
        
        plot.range <- range(pretty(h$breaks))
        
        plot(z <- seq(plot.range[1], plot.range[2], length.out=200),
                        dens<-dnorm(z, mu,s), type="n", ...)
        
        d.gap <- min(diff(h$breaks)) * gap /2
        
        for(i in 1:nbins) {
                rect(h$breaks[i]+d.gap, normdens[i]-h$density[i],
h$breaks[i+1]-d.gap, normdens[i], col=col)
        
        }
        
        lines(z, dens, lwd=2)
        
        points(h$mids, normdens) 
        
}

set.seed(17*13)
y <- rnorm(500, 10,3)
rootonorm(y)
 

>>> Deepayan Sarkar <deepayan.sar...@gmail.com> 17/01/2011 05:06:54
>>>
On Sun, Jan 16, 2011 at 11:58 AM, Hugo Mildenberger
<hugo.mildenber...@web.de> wrote:
> Thank you very much for your qualified answers, and also for the
> link to the Tukey paper. I appreciate Tukey's writings very much.

Yes, thanks to Hadley for the nice reference, I hadn't seen it before.

> Looking at the lattice code (below), a possible implementation might
> involve  binning, not so?
>
> I see a problematic part here:
>
>   xx <- sort(unique(x))
>
> Unique certainly works well with Poisson distributed data, but is
> essentially a no-op when confronted with continous floating-point
> numbers.

True, but as Achim said, rootogram() is intended to work with data
arising from discrete distributions, not continuous ones. I see now
that this is not as explicit as it could be in the help page (although
"frequency distribution" gives a hint), which I will try to improve.

I don't think automatic handling of continuous distributions is simple
(because it is not clear how you would specify the reference
distribution). However, a little preliminary work will get you close
with the current implementation:

xnorm <- rnorm(1000)

## 'discretize' by binning and replacing data by bin midpoints
h <- hist(xnorm, plot = FALSE) # add arguments for more control
xdisc <- with(h, rep(mids, counts))

## Option 1: Assume bin probabilities proportional to dnorm()
norm.factor <- sum(dnorm(h$mids, mean(xnorm), sd(xnorm)))

rootogram(~ xdisc,
          dfun = function(x) {
              dnorm(x, mean(xnorm), sd(xnorm)) / norm.factor
          })

## Option 2: Compute probabilities explicitly using pnorm()

## pdisc <- diff(pnorm(h$breaks)) ## or estimated:
pdisc <- diff(pnorm(h$breaks, mean = mean(xnorm), sd = sd(xnorm)))
pdisc <- pdisc / sum(pdisc)

rootogram(~ xdisc,
          dfun = function(x) {
              f <- factor(x, levels = h$mids)
              pdisc[f]
          })

-Deepayan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to