RE: [R] Generating a vector for breaks in a histogram
My gut feeling is that stacked dotplots would have given you the same insight. In general terms it's about getting the right tool for the right job. My comment was about the order in which to choose the tools, not about ignoring histograms altogether.

If I recall correctly, the article about dot plots described old-fashioned, hand-drawn dot plots, where dots were either stacked above each other or, if more appropriate, placed next to each other, as near as possible to where they belong on the axis. This results in a pattern that looks very similar to a histogram. The argument, again if I recall correctly, is that if you choose the wrong bins for a histogram you may well end up with the same type of result that you had with the density plot.

My practical approach is to look at what happens to the overall shape of the histogram when you change the bins. The issue is how quickly and reliably you get to the "truth" using the various techniques. As you've noted, the density plot doesn't seem to deal with some types of data as well as it does with others. So when I am looking at data I use a variety of methods: histograms come after rug plots and density plots, which I tend to do together.

I'm just learning, and I welcome guidance in a field in which I do not claim expertise.

_
Tom Mulholland
Senior Policy Officer
WA Country Health Service
189 Royal St, East Perth, WA, 6004

Tel: (08) 9222 4062
e-mail: [EMAIL PROTECTED]

The contents of this e-mail transmission are confidential an...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
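Tom's routine (rug plot and density estimate together, a stacked dot plot, then histograms with several bin choices to see how the shape moves) might look something like this in base R. This is a minimal sketch: the two-component sample `x` and the bin widths are made-up assumptions, not data from this thread.

```r
# Made-up example data: a main cluster plus a smaller second cluster.
set.seed(42)
x <- c(rnorm(150, mean = 10), rnorm(50, mean = 14))

# 1. Kernel density estimate with a rug plot, drawn together.
plot(density(x), main = "Density estimate with rug")
rug(x)

# 2. Stacked dot plot: values rounded to 0.1 are piled on top of each other.
stripchart(round(x, 1), method = "stack", pch = 19, offset = 0.5,
           main = "Stacked dot plot")

# 3. The same data binned three ways; compare the overall shapes.
bin_widths <- c(0.25, 1, 2)
op <- par(mfrow = c(length(bin_widths), 1))
fits <- lapply(bin_widths, function(w) {
  lo <- floor(min(x))
  hi <- lo + w * ceiling((max(x) - lo) / w)  # last break >= max(x)
  hist(x, breaks = seq(lo, hi, by = w), main = paste("Bin width", w))
})
par(op)
```

Because each break vector is built to span the data, every histogram counts all observations; only the binning differs.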
RE: [R] Generating a vector for breaks in a histogram
Well ... In my own recent example, it was plotting the raw data as a histogram that finally led me to the "truth" of what the data had to say. As you may recall, the dataset was inter-arrival times of calls to a computer routine, known only from timestamps truncated (not rounded) to whole seconds. I started with kernel density estimation (sm.density, with the default parameters, to be precise) and was unsatisfied with the result. Yesterday, when I plotted the raw counts (how many values were 0, how many 1, etc.) as a histogram, I was struck by two things:

1. There really are only two peaks -- the "stuff" in between them is, for the purpose of business decisions, irrelevant.

2. The inter-arrival time value "0" in such a dataset represents all the values that are greater than or equal to zero and *less than 1*, and so on. The timestamp truncation is doing a natural "histogramming", which implies to me that the *midpoint* of the "bin" -- say, 0.5 for the 0 values -- is the "natural" value to choose for the x-axis in the absence of any better information. This also rather neatly disposes of the issue of zero-valued inter-arrival times. :)

Are the "old ways" best? Maybe not. Can I make reasonable business decisions without histograms? I'm not convinced I can; I certainly couldn't this time.

Finally, while I've never been fortunate enough to use S, the existence of R has caused a revolution in the way I analyze computer performance data. Before R came along, the only tools I had available were Excel, Minitab, and any special-purpose code I was willing to write for tasks outside the vocabulary of Excel or Minitab. For example, it's difficult, though not impossible, to do a non-linear regression or a kernel density estimate with either tool. In R, they're one-liners. If there were a Nobel Prize for scientific software, I'd nominate R and its creators. (Of course, there *is* a Nobel in Economics.) :)

-- M.
Edward (Ed) Borasky mailto:[EMAIL PROTECTED] http://www.borasky-research.net
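Ed's observation that truncated timestamps already define left-closed one-second bins ([0,1), [1,2), ...) maps directly onto hist() with `right = FALSE`. A minimal sketch, assuming a hypothetical vector `secs` of truncated inter-arrival times (the values are invented):

```r
# Hypothetical inter-arrival times, already truncated to whole seconds.
secs <- c(0, 0, 0, 1, 1, 2, 5, 5, 5, 5)

# One bin per whole second; right = FALSE makes each interval closed on
# the left, [k, k+1), which matches what truncation does to the data.
br <- seq(0, max(secs) + 1, by = 1)
h <- hist(secs, breaks = br, right = FALSE,
          main = "Inter-arrival times (truncated seconds)",
          xlab = "seconds")

# hist() reports each bin's midpoint, e.g. 0.5 for the [0,1) bin.
h$mids

# Using midpoints as representative values also removes exact zeros,
# which helps if a later step (say, taking logs) cannot handle them.
secs_mid <- secs + 0.5
```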
RE: [R] Generating a vector for breaks in a histogram
Things have moved on since the ASH work too, but I would agree that density estimation is often a better way than histograms. However, close to state-of-the-art density estimation is built into R (?density), and packages `polspline', `KernSmooth' and `sm' are also much more advanced than `ash'.

It was the advent of enough computing power that changed this, and the S language has been at the forefront of making the state of the art available. You'll see that MASS (the book) covers histograms and alternatives in its chapter on Univariate Distributions, and has done so since its 1994 first edition (when did you go to `school'?)

One often overlooked alternative is to plot ECDFs. If distributions are not really continuous, other techniques, such as dotplots, may be appropriate.

On Fri, 4 Jul 2003, Mulholland, Tom wrote:
> I am beginning to form the opinion that in most cases (if not all) there
> are better alternatives to histograms.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
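Ripley's ECDF suggestion and the built-in density() can both compare two samples without choosing any bins. A minimal sketch; the samples `x1` and `x2` are invented for illustration, and ecdf() and density() are from base R's stats package.

```r
# Two made-up samples standing in for the variables being compared.
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100, mean = 0.5)

# Empirical CDFs need no bins at all.
F1 <- ecdf(x1)
F2 <- ecdf(x2)
plot(F1, main = "ECDFs of x1 and x2", xlim = range(x1, x2))
plot(F2, add = TRUE, col = "red")

# Kernel density estimates from density(), on shared axes.
d1 <- density(x1)
d2 <- density(x2)
plot(d1, xlim = range(d1$x, d2$x), ylim = range(0, d1$y, d2$y),
     main = "Kernel density estimates")
lines(d2, col = "red")
```

The ECDF uses every observation exactly, so differences in location or spread show up as horizontal or slope differences between the two step curves.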
RE: [R] Generating a vector for breaks in a histogram
One of my discoveries while learning the art of R is that time has moved on since I did my basic statistics in school (although, to my dismay, the teaching of statistics in schools appears not to have noticed the movement either). I have seen a few cases where someone wants to draw a pie chart and the advice is "find a better way." I've been reading some of the ash work (see the package of the same name, and loads of papers on the web), and also some interesting work on dot plots as an alternative to histograms. They make me feel that unless the data in both of your histograms happen to work well with the same set of bins, you may not get the comparative assessment you think you are getting.

I am beginning to form the opinion that in most cases (if not all) there are better alternatives to histograms.

Tom Mulholland
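For readers new to the ash work Tom mentions: Scott's averaged shifted histogram averages m histograms of bin width h whose origins are shifted by h/m, which is equivalent to applying triangular weights to counts on a fine grid of width h/m. The function `ash_sketch` below is a hypothetical base-R illustration of that idea, not the interface of the `ash` package; the data and the choices h = 1, m = 8 are assumptions.

```r
# Hypothetical base-R sketch of an averaged shifted histogram (ASH).
ash_sketch <- function(x, h, m = 5) {
  delta <- h / m
  # Fine grid padded by m cells on each side so every window fits.
  lo <- min(x) - m * delta
  nb <- ceiling((max(x) - lo) / delta) + m
  br <- lo + delta * (0:nb)
  nu <- hist(x, breaks = br, plot = FALSE)$counts
  offsets <- (1 - m):(m - 1)
  w <- 1 - abs(offsets) / m          # triangular weights, summing to m
  f <- numeric(length(nu))
  for (k in seq_along(nu)) {
    idx <- k + offsets
    ok <- idx >= 1 & idx <= length(nu)
    f[k] <- sum(w[ok] * nu[idx[ok]])
  }
  list(mids = (br[-1] + br[-length(br)]) / 2,
       density = f / (length(x) * h))
}

set.seed(3)
x <- c(rnorm(200), rnorm(100, mean = 4))
a <- ash_sketch(x, h = 1, m = 8)
plot(a$mids, a$density, type = "s", xlab = "x", ylab = "ASH density")
```

Averaging over shifted origins is what reduces the sensitivity to where the bins start, which is the comparability problem Tom raises.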
Re: [R] Generating a vector for breaks in a histogram
On Thu, 3 Jul 2003 14:48:04 +0100 , "michael watson (IAH-C)" <[EMAIL PROTECTED]> wrote :

>Now this makes sense of course, my bins probably DON'T span the entire range of X.
>SO I am still left with the same problem:
>
>1) two variables
>2) I want to draw histograms of both
>3) I want them to have the SAME x-y scale on the graph
>4) I want them to have the SAME bin range
>
>How do i do it? Any suggestions?

I'd expand the range to cover both (otherwise your histogram will miss some observations), but if you really want to do what you're asking, use both xlim and breaks. For example:

> x <- rnorm(100)

This gives the error you saw, because there are some negative values:

> hist(x, breaks=seq(0,5, len=6))
Error in hist.default(x, breaks = seq(0, 5, len = 6)) :
        some `x' not counted; maybe `breaks' do not span range of `x'

But this works:

> hist(x, breaks=seq(-5,5, len=11),xlim=c(0,5))

It only shows the bins that are between 0 and 5, and a bit of the one from -1 to 0. If you don't even want that bit, you could draw a histogram of a subset of your data:

> hist(x[x>0], breaks=seq(0,5,len=6))

You'll probably want to use ylim to force the vertical scales to match between plots.

If none of these works, you should always be able to construct the plot you want using barplot(), doing all the positioning calculations yourself.

Duncan Murdoch
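Duncan's barplot() fallback might look like this: count both variables in one shared set of bins with cut() and table(), then draw the counts side by side. A minimal sketch; the variables `x1`, `x2` and the bin width of 0.5 are assumptions.

```r
# Two made-up variables to compare.
set.seed(2)
x1 <- rnorm(100)
x2 <- rnorm(80, mean = 1)

# Shared break points spanning both variables (bin width 0.5 assumed).
br <- seq(floor(min(x1, x2)), ceiling(max(x1, x2)), by = 0.5)

# Count each variable in the same bins; include.lowest keeps the minimum.
c1 <- table(cut(x1, br, include.lowest = TRUE))
c2 <- table(cut(x2, br, include.lowest = TRUE))

# One row per variable, one column per bin.
counts <- rbind(x1 = as.vector(c1), x2 = as.vector(c2))
colnames(counts) <- names(c1)

# Bars for the two variables drawn side by side within each bin.
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        las = 2, cex.names = 0.7)
```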
RE: [R] Generating a vector for breaks in a histogram
On Thu, 3 Jul 2003, michael watson (IAH-C) wrote:

> Now this makes sense of course, my bins probably DON'T span the entire
> range of X. SO I am still left with the same problem:
>
> 1) two variables
> 2) I want to draw histograms of both
> 3) I want them to have the SAME x-y scale on the graph
> 4) I want them to have the SAME bin range
>
> How do i do it? Any suggestions?

SO choose a range of breaks to span the whole range of x!

Hint: hist(c(x1, x2)) knows how to do it, and may even provide you with a suitable vector of breaks in its return value.

-- 
Brian D. Ripley
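Following Ripley's hint, one way to get identical bins and scales for two variables is to take the breaks that hist() chooses for the pooled data and reuse them for each variable. A minimal sketch; `x1` and `x2` are simulated stand-ins for the two variables, and the two-panel layout is just one choice.

```r
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100, mean = 1)

# Let hist() choose breaks for the pooled data without drawing anything;
# these breaks necessarily span the range of both variables.
common <- hist(c(x1, x2), plot = FALSE)$breaks

h1 <- hist(x1, breaks = common, plot = FALSE)
h2 <- hist(x2, breaks = common, plot = FALSE)

# Same bins, same x range and same y range for both panels.
ymax <- max(h1$counts, h2$counts)
op <- par(mfrow = c(2, 1))
plot(h1, xlim = range(common), ylim = c(0, ymax), main = "x1")
plot(h2, xlim = range(common), ylim = c(0, ymax), main = "x2")
par(op)
```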
RE: [R] Generating a vector for breaks in a histogram
Fantastic. You're right, I was looking for seq().

However, my plan for using it for hist() was foiled!

I thought if I did something like:

> b <- seq(0,500,50)
> hist(myvble,breaks=b)

it would bin myvble into the bins 0-50, 50-100, 100-150 etc., and in that way I could ensure that two histograms are on the same scale with the same bins!

I get the following error:

Error in hist.default(Cy5, breaks = s) :
        some 'x' not counted; maybe 'breaks' do not span range of 'x'

Now this makes sense of course, my bins probably DON'T span the entire range of X. SO I am still left with the same problem:

1) two variables
2) I want to draw histograms of both
3) I want them to have the SAME x-y scale on the graph
4) I want them to have the SAME bin range

How do I do it? Any suggestions?

Cheers
Mick
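To keep Mick's fixed bin width of 50 while avoiding the "some 'x' not counted" error, the breaks can be widened to multiples of the width that cover every value in both variables. The helper `covering_breaks` is hypothetical, and the uniform data (including values above 500) are made up.

```r
# Hypothetical helper: fixed-width breaks, aligned to multiples of
# `width`, that cover every value in all the supplied vectors.
covering_breaks <- function(width, ...) {
  r <- range(c(...))
  seq(floor(r[1] / width) * width, ceiling(r[2] / width) * width,
      by = width)
}

set.seed(4)
a <- runif(50, 0, 730)   # made-up data, some values above 500
b <- runif(50, 20, 480)
br <- covering_breaks(50, a, b)

# The same breaks work for both variables, so the bins are identical.
ha <- hist(a, breaks = br, plot = FALSE)
hb <- hist(b, breaks = br, plot = FALSE)
```

Passing `br` (plus matching xlim and ylim) to both plotting calls then gives the shared bins and scales Mick asked for.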
RE: [R] Generating a vector for breaks in a histogram
Dear Mick,

Have a look at ?seq - seq(1,20,length=20) should do it.

HTH

Thomas

---

Thomas Hotz
Research Associate in Medical Statistics
University of Leicester
United Kingdom

Department of Epidemiology and Public Health
22-28 Princess Road West
Leicester
LE1 6TP
Tel +44 116 252-5410
Fax +44 116 252-5423

Division of Medicine for the Elderly
Department of Medicine
The Glenfield Hospital
Leicester
LE3 9QP
Tel +44 116 256-3643
Fax +44 116 232-2976

> -----Original Message-----
> From: michael watson (IAH-C) [mailto:[EMAIL PROTECTED]
> Sent: 03 July 2003 14:02
> To: '[EMAIL PROTECTED]'
> Subject: [R] Generating a vector for breaks in a histogram
>
> Hi
>
> I have two lots of numbers which I would like to histogram
> using the hist() function. For comparative reasons, I want
> them to be on the same scale, which I can use the xlim and
> ylim options to achieve.
>
> However, having them on the same scale is meaningless unless
> they have the same "breaks". Consulting the documentation,
> there are 4 ways of defining the number of breaks, only one
> of which is "definite"; the others are merely suggestions,
> which I have found is not good enough.
>
> The only definite way is to provide a vector to the hist()
> function which is a vector of the break points for the
> histogram. So I need to generate a vector that contains, say,
> 500 numbers, equally spaced between a min and a max. E.g.:
>
> > myfunc(n=20,min=1,max=20)
>
> would provide a vector, length 20, with the numbers 1 through
> 20 in it.
>
> Is there a function in R that can do this?
>
> Thanks
> Mick
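Hotz's answer in code: seq() with `length.out` returns n equally spaced values between two end points, and `by` gives the same result when the spacing is known instead. The wrapper `myfunc` is hypothetical and simply mirrors the interface Mick sketched.

```r
# Hypothetical wrapper with the interface from Mick's question.
myfunc <- function(n, min, max) seq(min, max, length.out = n)

myfunc(n = 20, min = 1, max = 20)    # 1 2 3 ... 20

# Two equivalent ways to get break points 0, 50, ..., 500 (10 bins).
b_len <- seq(0, 500, length.out = 11)
b_by  <- seq(0, 500, by = 50)
```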