Re: [R] Including percentage values inside columns of a histogram

Rui Barradas Tue, 17 Aug 2021 12:04:18 -0700

Hello,



At 19:28 on 17/08/21, Bert Gunter wrote:

Inline below.



On Tue, Aug 17, 2021 at 4:09 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:


Hello,

I had forgotten about plot.histogram, it does make everything simpler.
To have percentages on the bars, in the code below I use package scales.

Note that it seems to me that you do not want densities. To have
percentages, the proportions of counts are given by either of


Under the default of equal width bins -- which is what Sturges gives


Right.

if I read the docs correctly -- since the densities sum to 1,


The "densities" do not sum to 1. From ?hist, section Value:

density

values f^(x[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy

sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].

If all(diff(breaks) == 1) is FALSE, the density list member must be multiplied by diff(.$breaks).



h <- hist(datasetregs$Amount, plot = FALSE)
sum(h$density)
#[1] 1e-04
diff(h$breaks)
#[1] 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000
sum(h$density*diff(h$breaks))
#[1] 1
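
To make the distinction concrete, here is a minimal sketch with unequal bin widths. The vector x and the breaks are made up for illustration and are not the data from this thread; the names prop_from_counts and prop_from_density are likewise just illustrative.

```r
# Illustrative values only (not the datasetregs data from this thread)
x <- c(1, 2, 2, 3, 7, 8, 12, 15, 18, 19)

# Unequal bin widths: 5, 5 and 10
h <- hist(x, breaks = c(0, 5, 10, 20), plot = FALSE)

# Two equivalent ways to get the proportion of cases per bin
prop_from_counts  <- h$counts / sum(h$counts)
prop_from_density <- h$density * diff(h$breaks)

sum(h$density)          # not 1 unless every bin has width 1
sum(prop_from_density)  # sums to 1 (up to floating point)
```

Either vector can be multiplied by 100 and used for percentage labels; the raw density values cannot, unless every bin has width 1.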


Hope this helps,

Rui Barradas

they are already the proportion of counts in each histogram bin, no?

-- Bert


h$counts/sum(h$counts)
h$density*diff(h$breaks)



# One histogram for all dates
h <- hist(datasetregs$Amount, plot = FALSE)
plot(h, labels = scales::percent(h$counts/sum(h$counts)),
       ylim = c(0, 1.1*max(h$counts)))
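
If the extra dependency on scales is not wanted, the same kind of labels can be built in base R. This is only a sketch; pct_labels is a hypothetical helper name, not a function from any package, and the counts are made up.

```r
# Hypothetical helper: format proportions of counts as percentage strings
pct_labels <- function(counts, digits = 1) {
  p <- counts / sum(counts)
  sprintf(paste0("%.", digits, "f%%"), 100 * p)
}

# Illustrative counts only
pct_labels(c(4, 2, 4))
```

The result could then be passed as labels = pct_labels(h$counts) in a plot() call like the one above.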



# Histograms by date
sp <- split(datasetregs, datasetregs$Date)
old_par <- par(mfrow = c(1, 3))  # three panels side by side
h_list <- lapply(seq_along(sp), function(i){
    hist_title <- paste("Histogram of", names(sp)[i])
    h <- hist(sp[[i]]$Amount, plot = FALSE)
    plot(h, main = hist_title, xlab = "Amount",
         labels = scales::percent(h$counts/sum(h$counts)),
         ylim = c(0, 1.1*max(h$counts)))
    h  # return the histogram object so h_list is not a list of NULLs
})
par(old_par)  # restore the previous graphics settings


Hope this helps,

Rui Barradas

At 01:49 on 17/08/21, Bert Gunter wrote:

I may well misunderstand, but proffered solutions seem more complicated
than necessary.
Note that the return of hist() can be saved as a list of class "histogram"
and then plotted with plot.histogram(), which already has a "labels"
argument that seems to be what you want. A simple example is:

dat <- runif(50, 0, 10)
myhist <- hist(dat, freq = TRUE, breaks ="Sturges")

plot(myhist, col = "darkgray",
       labels = as.character(round(myhist$density*100,1) ),
       ylim = c(0, 1.1*max(myhist$counts)))
## note that this is plot.histogram because myhist has class "histogram"

Note that I expanded the y axis a bit to be sure to include the labels. You
can, of course, plot your separate years as Rui has indicated or via e.g.
?layout.
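
As a rough sketch of the layout() route mentioned above, the following draws three histograms in one row with percentage labels. The three groups are simulated purely for illustration, and the names groups and hists are assumptions of this example.

```r
# Illustrative data only: three made-up groups
set.seed(1)
groups <- list(A = runif(30, 0, 10), B = runif(30, 0, 10), C = runif(30, 0, 10))

layout(matrix(1:3, nrow = 1))   # one row, three plotting regions
hists <- lapply(names(groups), function(nm) {
  h <- hist(groups[[nm]], plot = FALSE)
  plot(h, main = paste("Histogram of", nm), xlab = "Value",
       labels = paste0(round(100 * h$counts / sum(h$counts), 1), "%"),
       ylim = c(0, 1.1 * max(h$counts)))
  h                             # keep the histogram object
})
layout(1)                       # back to a single plotting region
```

The labels here are computed from the counts rather than from the density, so they remain percentages whatever the bin width.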

Apologies if I have misunderstood. Just ignore this in that case.
Otherwise, I leave it to you to fill in details.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Aug 16, 2021 at 4:14 PM Paul Bernal <paulberna...@gmail.com> wrote:

Dear Jim,

Thank you so much for your kind reply. Yes, this is what I am looking for;
however, I can't see clearly how the bars correspond to the bins on the
x-axis. Maybe there is a way to align the amounts so that they match the
columns. Sorry if I sound picky, but I just want to learn if there is a way
to accomplish this.

Best regards,

Paul
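
One possible way to make the bars line up visibly with the bins is to suppress the default x-axis and draw a tick at every break. This is a sketch on a handful of made-up amounts, not the full dataset from this thread.

```r
# Illustrative amounts only (not the full dataset from this thread)
amount <- c(15000, 15000, 16600, 20100, 25600, 35000, 40100, 45100, 67100)

h <- hist(amount, plot = FALSE)
plot(h, xaxt = "n", col = "darkgray", xlab = "Amount",
     main = "Ticks drawn at the bin edges")
# Put one tick at every break so each bar sits between two labelled edges
axis(1, at = h$breaks, labels = format(h$breaks, big.mark = ","), las = 2)
```

With las = 2 the tick labels are drawn perpendicular to the axis, which helps when the amounts are long numbers.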

On Mon, 16 Aug 2021 at 17:57, Jim Lemon (<drjimle...@gmail.com>) wrote:

Hi Paul,
I just worked out your first request:

datasetregs<-structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"), class =
"factor"),
      Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
      15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
      15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
      15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
      15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
      16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
      15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000, 15000,
      15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
      15000, 15000, 15000, 15000)), row.names = c(NA, -74L), class =
"data.frame")
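library(plotrix)
# groups= and scale= are arguments of the Hist() call in the original post,
# not of graphics::hist(), so they are dropped here
histval<-with(datasetregs, hist(Amount, breaks="Sturges", col="darkgray"))
# percentage of all cases falling in each bin
histpcts<-paste0(round(100*histval$counts/sum(histval$counts),1),"%")
barlabels(histval$mids,histval$counts,histpcts)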

I think that's what you asked for.

Jim

On Tue, Aug 17, 2021 at 8:44 AM Paul Bernal <paulberna...@gmail.com> wrote:


This is way better, now, how could I put the frequency labels in the
columns as a percentage, instead of presenting them as counts?

Thank you so much.

Paul
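
A minimal sketch of one way to do this with text(): compute the percentages from the counts and place them at half the bar height. The amounts are made up for illustration, and empty bins get an empty label so that no "0%" is printed.

```r
# Illustrative amounts only
amount <- c(15000, 15000, 15000, 16600, 20100, 25600, 35000, 40100, 67100)

h <- hist(amount, col = "darkgray", xlab = "Amount", main = "Percent per bin")
pct <- 100 * h$counts / sum(h$counts)
# Empty bins get an empty string so nothing is drawn for them
lbls <- ifelse(h$counts == 0, "", paste0(round(pct, 1), "%"))
# Centre each label horizontally on the bar and halfway up its height
text(h$mids, h$counts / 2, labels = lbls)
```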

On Mon, 16 Aug 2021 at 17:33, Rui Barradas (<ruipbarra...@sapo.pt>) wrote:

Hello,

You forgot to cc the list.

Here are two ways; both of them apply hist() and text() to Amount split
by Date. The return value of hist() is saved because it is a list whose
members include the midpoints of the histogram's bars and the counts.
Those are used to know where to put the text labels.
A vector lbls is created to get rid of counts of zero.

The main difference between the two ways is the histogram's titles.


old_par <- par(mfrow = c(1, 3))
h_list <- with(datasetregs, tapply(Amount, Date, function(x){
     h <- hist(x)
     # replace zero counts with NA to get rid of labels for empty bins
     lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
     text(h$mids, h$counts/2, labels = lbls)
     h  # return the histogram object so h_list holds the histograms
}))
par(old_par)



old_par <- par(mfrow = c(1, 3))
sp <- split(datasetregs, datasetregs$Date)
h_list <- lapply(seq_along(sp), function(i){
     # this version sets a title naming the fiscal year
     hist_title <- paste("Histogram of", names(sp)[i])
     h <- hist(sp[[i]]$Amount, main = hist_title)
     lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
     text(h$mids, h$counts/2, labels = lbls)
     h  # return the histogram object so h_list holds the histograms
})
par(old_par)


Hope this helps,

Rui Barradas

At 23:16 on 16/08/21, Paul Bernal wrote:

Dear Rui,

The hist() function comes from the graphics package, from what I could
see. The thing is that I want to divide the Amount column into several
bins and then generate three different histograms, one for each AF
period (AF refers to fiscal years). As you can see, the data contains
three fiscal years (2017, 2020 and 2021). I want to see the percentage
of cases that fall into different amount categories, from 15,000 and
below, 16,000 to 17,000, from 18,000 to 19,000, and so on.

Thanks for your kind help.

Paul
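
As a sketch of what such a breakdown might look like, cut() can assign each amount to a category and prop.table() can turn the resulting table into percentages per fiscal year. The six amounts, their years and the bin edges below are assumptions chosen for the example, not the categories requested above.

```r
# Illustrative subset only (values and years picked for the example)
amount <- c(15000, 15000, 16600, 20100, 35000, 40100)
year   <- factor(c("AF 2020", "AF 2020", "AF 2021", "AF 2021", "AF 2021", "AF 2020"))

# Assumed bin edges; adjust to the categories actually wanted
breaks <- seq(10000, 50000, by = 10000)

# cut() assigns each amount to an interval (a, b]; dig.lab avoids 1e+04-style labels
bins <- cut(amount, breaks = breaks, dig.lab = 6)

# Row percentages: share of each fiscal year's cases falling in each bin
pct <- 100 * prop.table(table(year, bins), margin = 1)
round(pct, 1)
```

The same breaks vector can also be passed to hist(breaks = breaks) so that the plotted bars and the table use identical categories.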

On Mon, 16 Aug 2021 at 17:07, Rui Barradas (<ruipbarra...@sapo.pt>) wrote:

      Hello,

      The function Hist comes from what package?

      Are you sure you don't want a bar plot?


      agg <- aggregate(Amount ~ Date, datasetregs, sum)
      bp <- barplot(Amount ~ Date, agg)
      with(agg, text(bp, Amount/2, labels = Amount))


      Hope this helps,

      Rui Barradas

      At 22:54 on 16/08/21, Paul Bernal wrote:
       > Hello everyone,
       >
       > I am currently working with R version 4.1.0 and I am trying to include
       > (inside the columns of the histogram) the percentage distribution, and I
       > want to generate three histograms, one for each fiscal year (in the Date
       > column, there are three fiscal years: AF 2017, AF 2020 and AF 2021). However,
       > I can't seem to accomplish this.
       >
       > Here is my data:
       >
       > structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
       > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
       > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
       > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
       > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
       > 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"), class =
       > "factor"),
       >      Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
       >      15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
       >      15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
       >      15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
       >      15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
       >      16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
       >      15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000, 15000,
       >      15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
       >      15000, 15000, 15000, 15000)), row.names = c(NA, -74L), class =
       > "data.frame")
       >
       > I would like to modify the following script:
       >
       >> with(datasetregs, Hist(Amount, groups=Date, scale="frequency",
       > +   breaks="Sturges", col="darkgray"))
       >
       > #The only thing missing here are the percentages corresponding to each bin
       > (I would like to see the percentages inside each column, or on top outside
       > if possible)
       >
       > Any help will be greatly appreciated.
       >
       > Best regards,
       >
       > Paul.
       >
       >       [[alternative HTML version deleted]]
       >
       > ______________________________________________
       > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
       > https://stat.ethz.ch/mailman/listinfo/r-help
       > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
       > and provide commented, minimal, self-contained, reproducible code.

