RE: [R] Generating a vector for breaks in a histogram

2003-07-06 Thread Mulholland, Tom
My gut feeling is that stacked dot plots would have given you the same
insight. In general terms it's about getting the right tool for the
right job. My comment was about the order in which to choose tools, not
about ignoring histograms altogether. If I recall correctly, the article
about dot plots dealt with old-fashioned hand-drawn dot plots, where dots
were either stacked above each other or, where more appropriate, placed
next to each other as near as possible to where they belong on the axis.
This results in a pattern that looks very similar to a histogram. The
argument being made, if I recall correctly, is that if you choose the
wrong bins for a histogram you may well end up with the same type of
result that you had with the density plot.

My practical way of looking at this is to see what happens to the
overall shape of the histogram when you change the bins. The issue is
how quickly and reliably you get to the "truth" using the various
techniques. As you've noted, the density plot doesn't seem to deal with
some types of data as well as it does with others. So when I am looking
at data I use a variety of methods; histograms come later than rug plots
or density plots, and I tend to do those two together.
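
A rough sketch of that workflow, assuming simulated two-peaked data (the
variable names and the mixture below are illustrative only, not data from
this thread):

```r
# Illustrative data: a two-component normal mixture (an assumption).
set.seed(1)
x <- c(rnorm(150, mean = 0), rnorm(50, mean = 4))

# Same data, three different bin widths: watch how the apparent shape changes.
op <- par(mfrow = c(2, 2))
for (w in c(0.25, 1, 3)) {
  lo <- floor(min(x) / w) * w      # lowest break: multiple of w at or below min(x)
  hi <- ceiling(max(x) / w) * w    # highest break: multiple of w at or above max(x)
  hist(x, breaks = seq(lo, hi, by = w), main = paste("bin width", w))
}

# Kernel density estimate with a rug of the raw observations underneath.
d <- density(x)
plot(d, main = "density + rug")
rug(x)
par(op)
```

If the overall shape survives the change of bin width and agrees with the
density-plus-rug panel, that is some reassurance that it is not an artefact
of the binning.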

I'm just learning and welcome guidance in a field that I do not claim
expertise in.

_
 
Tom Mulholland
Senior Policy Officer
WA Country Health Service
189 Royal St, East Perth, WA, 6004
 
Tel: (08) 9222 4062
e-mail: [EMAIL PROTECTED]
 

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] Generating a vector for breaks in a histogram

2003-07-05 Thread M. Edward Borasky
Well ... In my own recent example, it was plotting the raw data as a
histogram that finally directed me to the "truth" of what the data had to
say. As you may recall, the dataset was inter-arrival times of calls to a
computer routine, known only from timestamps truncated (not rounded) to
whole seconds. I started with kernel density (sm.density, with the default
parameters, to be precise) and was unsatisfied with the result. Yesterday,
when I plotted the raw counts (how many values were 0, how many 1, etc.) as
a histogram, I was struck by two things:

1. There really are only two peaks -- the "stuff" in between them is, for
the purpose of business decisions, irrelevant.

2. The inter-arrival time value "0" in such a dataset represents all the
values that are greater than or equal to zero and *less than 1*, and so on.
There is a natural "histogramming" going on via the timestamp truncation,
which implies to me that the *midpoint* of the "bin" -- say, for the 0
values, 0.5 -- is the "natural" value to choose for the "x-axis" in the
absence of any better information. This also rather neatly disposes of the
issue of zero-valued inter-arrival times. :)
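
A small sketch of plotting such raw counts with unit-width bins and reading
off the midpoints. The timestamps here are simulated (exponential gaps are
an assumption, as are all the variable names); this is not the original
dataset:

```r
# Simulated stand-in for the real data (assumed exponential gaps).
set.seed(42)
arrivals <- cumsum(rexp(500, rate = 0.5))  # "true" arrival times in seconds
stamps   <- floor(arrivals)                # timestamps truncated to whole seconds
gaps     <- diff(stamps)                   # observed inter-arrival times: 0, 1, 2, ...

# Raw counts: how many gaps were 0, how many 1, and so on.
counts <- table(gaps)

# Unit-width bins closed on the left, so the bar for value k spans k to k + 1.
h <- hist(gaps, breaks = seq(0, max(gaps) + 1, by = 1), right = FALSE,
          main = "Inter-arrival times (truncated timestamps)")

# Bin midpoints (0.5, 1.5, ...) used as the x values for each count.
h$mids
```

With `right = FALSE` each integer value lands in the bar that starts at that
value, so `h$counts` lines up with the table of raw counts.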

Are the "old ways" best? Maybe not. Can I make reasonable business decisions
without histograms? I'm not convinced that's the case; it certainly wasn't
the case this time.

Finally, while I've never been fortunate enough to use S, the existence of R
has caused a revolution in the way I do the analysis of computer performance
data. Before R came along, the only tools I had available were Excel,
Minitab, and any special-purpose code I was willing to write to accomplish
tasks not in the vocabulary of Excel or Minitab. For example, it's
difficult, though not impossible, to do a non-linear regression or kernel
density estimation with either tool. In R, they're one-liners. If there were
a Nobel Prize for scientific software, I'd nominate R and its creators. (Of
course, there *is* a Nobel in Economics.) :)
-- 
M. Edward (Ed) Borasky
mailto:[EMAIL PROTECTED]
http://www.borasky-research.net

> -----Original Message-----
> Things have moved on since the ASH work too, but I would agree that
> density estimation is often a better way than histograms.  However, close
> to state-of-the-art density estimation is built into R (?density) and
> packages `polspline', `KernSmooth' and `sm' are also much more advanced
> than `ash'.
>
> It was the advent of enough computing power that changed this, and the S
> language has been in the forefront of making the state of the art
> available.  You'll see that MASS (the book) covers histograms and
> alternatives in its chapter on Univariate Distributions, and it has since
> its 1994 first edition (when did you go to `school'?)



RE: [R] Generating a vector for breaks in a histogram

2003-07-03 Thread Prof Brian Ripley
Things have moved on since the ASH work too, but I would agree that
density estimation is often a better way than histograms.  However, close 
to state-of-the-art density estimation is built into R (?density) and
packages `polspline', `KernSmooth' and `sm' are also much more advanced 
than `ash'. 

It was the advent of enough computing power that changed this, and the S 
language has been in the forefront of making the state of the art 
available.  You'll see that MASS (the book) covers histograms and 
alternatives in its chapter on Univariate Distributions, and it has since 
its 1994 first edition (when did you go to `school'?)

One often overlooked alternative is to plot ECDFs.

If distributions are not really continuous other techniques may be 
appropriate -- such as dotplots.
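
As a hedged illustration of those two alternatives in base R (the samples
below are simulated and the variable names are made up for the example):

```r
set.seed(7)
x1 <- rnorm(100)               # illustrative continuous sample
x2 <- rnorm(100, mean = 0.5)   # second sample to compare against

# ECDFs need no bins at all, so two samples can be overlaid directly.
plot(ecdf(x1), verticals = TRUE, do.points = FALSE,
     main = "ECDFs of two samples")
plot(ecdf(x2), verticals = TRUE, do.points = FALSE, add = TRUE, col = "red")

# For data that are not really continuous, a stacked dot plot shows
# every observation at its actual value.
k <- rpois(60, lambda = 3)
stripchart(k, method = "stack", pch = 16, offset = 0.5)
title("Stacked dot plot")
```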

On Fri, 4 Jul 2003, Mulholland, Tom wrote:

> One of my discoveries while learning the art of R is that time has
> moved on since I did my basic statistics in school (although, to my
> dismay, the teaching of statistics in school appears not to have
> noticed the movement). When people want to pie chart something, I have
> seen the advice "find a better way" a few times. I've been
> reading some of the ASH work (see the package of the same name and loads
> of papers on the web), and also some interesting work on dot plots as an
> alternative to histograms. They make me feel that unless the data in
> both of your histograms happen to work well with the same set of
> bins, you may not get the comparative assessment that you think you are
> getting.
> 
> I am beginning to form the opinion that in most cases (if not all) there
> are better alternatives to histograms.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



RE: [R] Generating a vector for breaks in a histogram

2003-07-03 Thread Mulholland, Tom
One of my discoveries while learning the art of R is that time has
moved on since I did my basic statistics in school (although, to my
dismay, the teaching of statistics in school appears not to have
noticed the movement). When people want to pie chart something, I have
seen the advice "find a better way" a few times. I've been
reading some of the ASH work (see the package of the same name and loads
of papers on the web), and also some interesting work on dot plots as an
alternative to histograms. They make me feel that unless the data in
both of your histograms happen to work well with the same set of
bins, you may not get the comparative assessment that you think you are
getting.

I am beginning to form the opinion that in most cases (if not all) there
are better alternatives to histograms.


Re: [R] Generating a vector for breaks in a histogram

2003-07-03 Thread Duncan Murdoch
On Thu, 3 Jul 2003 14:48:04 +0100 , "michael watson (IAH-C)"
<[EMAIL PROTECTED]> wrote :

>Now this makes sense of course, my bins probably DON'T span the entire range of X.  
>SO I am still left with the same problem:
>
>1) two variables
>2) I want to draw histograms of both
>3) I want them to have the SAME x-y scale on the graph
>4) I want them to have the SAME bin range
>
>How do I do it?  Any suggestions?

I'd expand the range to cover both (otherwise your histogram will miss
some observations), but if you really want to do what you're asking,
use both xlim and breaks.  For example:

> x <- rnorm(100)

This gives the error you saw, because there are some negative values:

> hist(x, breaks=seq(0,5, len=6))
Error in hist.default(x, breaks = seq(0, 5, len = 6)) : 
some `x' not counted; maybe `breaks' do not span range of `x'

But this works:

> hist(x, breaks=seq(-5,5, len=11),xlim=c(0,5))

It only shows the bins that are between 0 and 5, and a bit of the one
from -1 to 0.  If you don't even want that bit, then you could do a
histogram of a subset of your data:

> hist(x[x>0], breaks=seq(0,5,len=6))

You'll probably want to use ylim to force the vertical scales to match
between plots.

If none of these work, you should always be able to construct the plot
you want using barplot(), where you do all the calculations for
positioning yourself.
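
One way the barplot() route might look, as a sketch only: the data are
simulated, the bin width of 1 is an arbitrary choice, and the counting is
done with cut() and table() rather than hist():

```r
set.seed(3)
x1 <- rnorm(100)                       # illustrative samples
x2 <- rnorm(100, mean = 1, sd = 1.5)

# One set of breaks spanning both samples (integer edges, width 1).
b <- seq(floor(min(x1, x2)), ceiling(max(x1, x2)), by = 1)

# Do the counting ourselves; cut() assigns each value to a bin.
c1 <- table(cut(x1, breaks = b, include.lowest = TRUE))
c2 <- table(cut(x2, breaks = b, include.lowest = TRUE))

# Shared vertical scale, then one barplot per sample.
yl <- c(0, max(c1, c2))
op <- par(mfrow = c(1, 2))
barplot(c1, space = 0, ylim = yl, las = 2, main = "x1")
barplot(c2, space = 0, ylim = yl, las = 2, main = "x2")
par(op)
```

Because both tables come from the same breaks, the bars line up bin for bin,
and the common ylim keeps the heights comparable.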

Duncan Murdoch



RE: [R] Generating a vector for breaks in a histogram

2003-07-03 Thread Prof Brian Ripley
On Thu, 3 Jul 2003, michael watson (IAH-C) wrote:

> Fantastic.  You're right, I was looking for seq().
> 
> However, my plan for using it for hist() was foiled!  
> 
> I thought if I did something like:
> 
> > b <- seq(0,500,10)
> > hist(myvble,breaks=b)
> 
> It would bin myvble into the bins 0-10, 10-20, 20-30 etc. and in that way I could 
> ensure that two histograms are on the same scale with the same bins!
> 
> I get the following error:
> 
> Error in hist.default(Cy5, breaks = s) : some 'x' not counted; maybe 'breaks' do not 
> span range of 'x'
> 
> Now this makes sense of course, my bins probably DON'T span the entire
> range of X.  SO I am still left with the same problem:
> 
> 1) two variables
> 2) I want to draw histograms of both
> 3) I want them to have the SAME x-y scale on the graph
> 4) I want them to have the SAME bin range
> 
> How do I do it?  Any suggestions?

SO choose a range of breaks to span the whole range of x!

Hint: hist(c(x1,  x2)) knows how to do it, and may even provide you a
suitable vector of breaks in its return value.
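
Spelled out, the hint might look like this (a sketch with simulated x1 and
x2; the data and the two-panel layout are assumptions for the example):

```r
set.seed(11)
x1 <- rnorm(100)             # illustrative samples
x2 <- rnorm(100, mean = 2)

# Let hist() choose breaks for the pooled data, without drawing anything.
b <- hist(c(x1, x2), plot = FALSE)$breaks

# Bin each sample with those common breaks.
h1 <- hist(x1, breaks = b, plot = FALSE)
h2 <- hist(x2, breaks = b, plot = FALSE)

# Common axis limits so the two plots are directly comparable.
yl <- c(0, max(h1$counts, h2$counts))
op <- par(mfrow = c(1, 2))
plot(h1, xlim = range(b), ylim = yl, main = "x1")
plot(h2, xlim = range(b), ylim = yl, main = "x2")
par(op)
```

Since the breaks come from the pooled data they span the range of both
samples, so neither call can fail with the "some 'x' not counted" error.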




RE: [R] Generating a vector for breaks in a histogram

2003-07-03 Thread michael watson (IAH-C)
Fantastic.  You're right, I was looking for seq().

However, my plan for using it for hist() was foiled!  

I thought if I did something like:

> b <- seq(0,500,10)
> hist(myvble,breaks=b)

It would bin myvble into the bins 0-10, 10-20, 20-30 etc. and in that way I could 
ensure that two histograms are on the same scale with the same bins!

I get the following error:

Error in hist.default(Cy5, breaks = s) : some 'x' not counted; maybe 'breaks' do not 
span range of 'x'

Now this makes sense of course, my bins probably DON'T span the entire range of X.  
SO I am still left with the same problem:

1) two variables
2) I want to draw histograms of both
3) I want them to have the SAME x-y scale on the graph
4) I want them to have the SAME bin range

How do I do it?  Any suggestions?

Cheers
Mick



RE: [R] Generating a vector for breaks in a histogram

2003-07-03 Thread Hotz, T.
Dear Mick,

Have a look at ?seq - seq(1,20,length=20) should do it.
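
For instance (a small sketch; the 0 to 500 range and the width of 10 are
only example values):

```r
# Twenty equally spaced values from 1 to 20, i.e. 1, 2, ..., 20.
s <- seq(1, 20, length.out = 20)
s

# For histogram breaks it is often easier to fix the bin width instead.
b <- seq(0, 500, by = 10)
length(b)   # number of break points; the number of bins is one fewer
```

Note that n break points define n - 1 bins, so the length of the breaks
vector is one more than the number of bars in the histogram.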

HTH

Thomas

---

Thomas Hotz
Research Associate in Medical Statistics
University of Leicester
United Kingdom

Department of Epidemiology and Public Health
22-28 Princess Road West
Leicester
LE1 6TP
Tel +44 116 252-5410
Fax +44 116 252-5423

Division of Medicine for the Elderly
Department of Medicine
The Glenfield Hospital
Leicester
LE3 9QP
Tel +44 116 256-3643
Fax +44 116 232-2976


> -----Original Message-----
> From: michael watson (IAH-C) [mailto:[EMAIL PROTECTED]
> Sent: 03 July 2003 14:02
> To: '[EMAIL PROTECTED]'
> Subject: [R] Generating a vector for breaks in a histogram
> 
> 
> Hi
> 
> I have two lots of numbers which I would like to histogram 
> using the hist() function.  For comparative reasons, I want 
> them to be on the same scale, which I can use the xlim and 
> ylim options to achieve.
> 
> However, having them on the same scale is meaningless unless 
> they have the same "breaks".  Consulting the documentation, 
> there are 4 ways of defining the number of breaks, only one 
> of which is "definite", the others merely form suggestions, 
> which I have found is not good enough.
> 
> The only definite way is to pass hist() a vector of the break
> points for the histogram.  So I need to generate a vector that
> contains, say, 500 numbers, equally spaced between a min and a
> max.  EG:
> 
> > myfunc(n=20,min=1,max=20) 
> 
> would provide a vector, length 20, with the numbers 1 through 
> 20 in it.
> 
> Is there a function in R that can do this?
> 
> Thanks
> Mick
> 