> On 22 Dec 2016, at 18:08, William Dunlap via R-help <r-help@r-project.org> wrote:
>
> As a practical matter, 'continuous' data must be discretized, so if you
> have long vectors of it you will run into this problem.
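To make Bill's point concrete, here is a small R sketch (not from the thread). It assumes "continuous" measurements that have been recorded to two decimals, and it counts how many of them land exactly on the breakpoints hist() picks by default. The seed, the sample size and the names z, brks and on_edge are arbitrary choices for illustration. The count uses an exact-equality test, so it can only undercount.

```r
## Sketch: normal draws rounded to a 0.01 grid (an assumed recording
## precision), checked against hist()'s default breakpoints.
set.seed(42)                            # arbitrary seed
z <- round(rnorm(1e5, mean = 10), 2)    # "continuous" data, discretized
brks <- hist(z, plot = FALSE)$breaks    # default ("pretty") breakpoints
on_edge <- sum(z %in% brks)             # observations exactly on a boundary
print(on_edge)
```

With the unrounded rnorm() output, the same count would almost surely be 0. The rounding creates the ties, and hist()'s `right` and `include.lowest` arguments then decide which bin each tied value falls into.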
Yep, and it is a bit unfortunate that hist() tries to use "pretty" breakpoints, so that you will have data points on the boundaries, causing all the left/right/endpoint business to come into play. The truehist() function in MASS does somewhat better.

For the case at hand, things are much improved by setting the breaks explicitly:

  hist(y, freq = TRUE, col = 'red', breaks = 0.5:6.5)

but as pointed out by others, it is a much better idea to do

  plot(factor(y, levels = 1:6))

or similar.

Incidentally, what is the handiest way to get a plot with percentages instead of counts? This works, but seems a bit ham-fisted:

  barplot(prop.table(table(factor(y, levels = 1:6))))

-pd

>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Dec 22, 2016 at 8:19 AM, Martin Maechler <maech...@stat.math.ethz.ch>
> wrote:
>
>>>>>>> itpro <itp...@yandex.ru>
>>>>>>> on Thu, 22 Dec 2016 16:17:28 +0300 writes:
>>
>>> Hi, everyone.
>>> I stumbled upon weird histogram behaviour.
>>
>>> Consider this "dice emulator":
>>> Step 1: Generate uniform random array x of size N.
>>> Step 2: Multiply each item by six and round up to the next integer
>>> to get numbers 1 to 6.
>>> Step 3: Plot histogram.
>>
>>>> x<-runif(N)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='orange')
>>
>>
>>> Here is what I get with N=100000:
>>
>>>> x<-runif(100000)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='green')
>>
>>> At first glance it looks OK.
>>
>>> Now try N=100:
>>
>>>> x<-runif(100)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='red')
>>
>>> Now the first bar is not where it should be.
>>> Hmm. Look again at the 100000 histogram... The first bar is not where
>>> I want it either; it's only less striking due to the narrow bars.
>>
>>> So the first bar is always in the wrong position. How do I fix it to
>>> get perfectly spaced bars?
>>
>> Don't use histograms *at all* for such discrete integer data.
>>
>> N <- rpois(100, 5)
>> plot(table(N), lwd = 4)
>>
>> Histograms should only be used for continuous data (or discrete data
>> with "many" possible values).
>>
>> It's a pain to see them so often "misused" for data like the 'N' above.
>>
>> Martin Maechler,
>> ETH Zurich
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
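For reference, here is a sketch that pulls the suggestions above together on the dice example from this thread. It is not code from any of the posters. It compares hist()'s default breaks with the explicit half-integer breaks Peter suggests, then draws one possible percentage bar plot: Peter's barplot(prop.table(...)) scaled by 100, with the axis labelled. This is an option, not a claim that it is the handiest one. The seed and the names h_default, h_fixed, tab and pct are arbitrary.

```r
## Sketch: dice rolls as in the original post, N = 100.
set.seed(1)                         # arbitrary seed
x <- runif(100)
y <- ceiling(x * 6)                 # dice faces 1..6

## Default breaks come from pretty(); for data like these they typically
## fall on the integers themselves, so observations sit on bin boundaries.
h_default <- hist(y, plot = FALSE)
print(h_default$breaks)

## Breaks at 0.5, 1.5, ..., 6.5 give one bin centred on each face.
h_fixed <- hist(y, breaks = 0.5:6.5, plot = FALSE)
print(h_fixed$mids)                 # bin centres 1 to 6
print(h_fixed$counts)

## Counts via a factor, so a face that never occurs still gets a 0 entry.
tab <- table(factor(y, levels = 1:6))

## Percentages: scale the proportions by 100 and label the axis.
pct <- 100 * prop.table(tab)
barplot(pct, xlab = "Face", ylab = "Percent")
```

Another option is hist(y, breaks = 0.5:6.5, freq = FALSE). Because every bin has width 1, the densities it plots equal the proportions, but they are fractions, not percentages.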