Re: [R] sample function

Ted Harding Fri, 11 Mar 2005 03:18:04 -0800

On 11-Mar-05 Martin C. Martin wrote:
> "hist" is lumping things together.
> 
> Try:
> sum(temp == 0)
> 
> compare to the height of the left most bar.
> 
> Is this a bug in hist?
> 
> - Martin


Well, not a bug strictly speaking since "it works as documented",
but I do think it's not necessarily a happy choice.

The unsuspecting (like Martin) will step into holes even after
reading "?hist", since the truths are rather deeply (and I think
somewhat obliquely) hidden ("?hist" leads you to look up
"?nclass.Sturges" which in turn only mentions "Sturges' formula"
and invites you to read V&R's MASS book and other references
in the hope of further clarification -- all a bit much when
you just want to draw a histogram, which ought to be kid's
stuff! Not to mention the things to do with parameters
"include.lowest" and "right" whose combined effect is not
too obvious).
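
To make the "include.lowest" and "right" point concrete, here is a minimal sketch on a made-up vector (the names x, b and the two count variables are mine, purely for illustration). It only shows which bin a value sitting exactly on a break ends up in:

```r
x <- c(0, 1, 1, 2)   # toy data, every value sits exactly on a break
b <- 0:2             # two bins, with breaks at 0, 1, 2

# Defaults (right = TRUE, include.lowest = TRUE): bins are [0,1] and (1,2]
counts_right <- hist(x, breaks = b, plot = FALSE)$counts
print(counts_right)  # 3 1

# right = FALSE (include.lowest still TRUE): bins are [0,1) and [1,2]
counts_left <- hist(x, breaks = b, right = FALSE, plot = FALSE)$counts
print(counts_left)   # 1 3
```

The same four numbers give two different pictures, depending only on which side of each bin is closed.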

I'd like to repeat the sort of hint I occasionally give:

In using R, if there's any doubt it is best to spell out exactly
what you want rather than expecting the functions to agree with
what you want. R functions are often more complex and subtle
than you might suspect.

In this particular case,

  hist(temp, breaks = -0.5 + (0:14))

will produce the sort of thing which is wanted. One could
interpret the results which Martin reported as due to a
sort of "confusion" (but on whose part -- R or Martin?)
over the fact that "hist" is designed to deal with
"continuous" values, while his sample consists of integers.
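
As a rough check of this, the sketch below compares the leftmost bar with the number of zeros, as Martin suggested. I am assuming the data look something like a sample from 0:12; the seed, the sample size of 200 and the variable names are my own choices, not Martin's.

```r
set.seed(1)                                 # assumed seed, for reproducibility only
temp <- sample(0:12, 200, replace = TRUE)   # assumed stand-in for the original data

h_default <- hist(temp, plot = FALSE)                         # breaks chosen by R
h_centred <- hist(temp, breaks = -0.5 + (0:14), plot = FALSE) # one bin per integer

# The default first bin is closed at both ends, so it counts every value
# from breaks[1] to breaks[2] inclusive, not just the zeros.
first_bin     <- h_default$breaks[1:2]
default_first <- h_default$counts[1]
in_first_bin  <- sum(temp >= first_bin[1] & temp <= first_bin[2])

# With half-integer breaks the first bar is exactly the number of zeros.
centred_first <- h_centred$counts[1]
zeros         <- sum(temp == 0)

print(c(default_first = default_first, in_first_bin = in_first_bin))
print(c(centred_first = centred_first, zeros = zeros))
```

Placing the breaks half-way between the integers means no data value can ever fall on a break, so the settings of "right" and "include.lowest" stop mattering.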

For that particular case, one could also use "table" or
"barchart", as has been suggested by David Scott, which
would produce a plot of similar appearance; but this is
not in the "histogram family" despite appearances, since
it is not primarily a "quantitative" plot (i.e. respecting
the numerical values and their numerical comparisons), but
more a "category count". In particular, natural variants
of the above "hist" command such as

  hist(temp,breaks= -0.5+2*(0:7) )

(which corresponds to binning by different intervals) do
not lie so easily in the "table" or "barchart" domain.
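
A small sketch of what the coarser binning does, again on assumed stand-in data (seed, sample size and variable names are mine): each bar of the second "hist" call should simply be the sum of two adjacent per-value counts.

```r
set.seed(1)                                 # assumed seed, for reproducibility only
temp <- sample(0:12, 200, replace = TRUE)   # assumed stand-in for the original data

# Bins of width 2: (-0.5, 1.5], (1.5, 3.5], ..., (11.5, 13.5]
h_pairs <- hist(temp, breaks = -0.5 + 2 * (0:7), plot = FALSE)

# Per-value counts for 0..13, then add them up in adjacent pairs
per_value <- tabulate(temp + 1, nbins = 14)
pair_sums <- per_value[seq(1, 13, by = 2)] + per_value[seq(2, 14, by = 2)]

print(rbind(hist_counts = h_pairs$counts, pair_sums = pair_sums))
```

With "hist" the regrouping is a one-argument change; with "table" one has to recode or re-aggregate the values first.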

And I don't agree with David's comment that "No, hist
is the wrong thing to use to display this data."

In so far as these data are considered to be numerical
values of which one wants a view of their distribution,
then "hist" is entirely appropriate, as for any other
numerical variable. The only question is how to get
this to happen appropriately.

Would David make the same comment about data sampled
from (0:5000) instead of (0:12)?

Best wishes to all,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Mar-05                                       Time: 10:59:55
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
