Re: [R] Histogram omitting/collapsing groups

2012-01-01 Thread peter dalgaard

On Jan 1, 2012, at 07:40 , Joshua Wiley wrote:

 If you just want a plot of the frequencies at each hour why not just call 
 barplot on the output of table?  Histograms create bins and count in those, 
 which doesn't sound like what you're after.
 

Exactly. If what you want is a barplot, make a barplot; histograms are for 
continuous data.   Just remember that you may need to set the levels explicitly 
in case of empty groups: barplot(table(factor(x,levels=0:23))). (This is 
irrelevant with 100K data samples, but not with 100 of them).
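
A minimal sketch of the empty-group point, using made-up data rather than the poster's: the vector `x` below is hypothetical, with hour 13 deliberately left out so that one group is empty.

```r
set.seed(1)
# Hypothetical small sample of hours; 13 is excluded so that one group is empty
x <- sample(setdiff(0:23, 13), 100, replace = TRUE)

# table() alone only reports values that actually occur in the data
tab_observed <- table(x)

# Declaring all 24 levels keeps the empty hours as zero counts
tab_all <- table(factor(x, levels = 0:23))

length(tab_all)     # 24 entries, one per hour
tab_all[["13"]]     # 0: the empty hour still gets a (zero-height) bar

barplot(tab_all, xlab = "hour", ylab = "count")
```

Without the explicit levels, the bar for hour 13 would simply be missing and the remaining bars would close up, which is easy to overlook on a 24-bar plot.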

That being said, the fact that hist() tends to create breakpoints which 
coincide with data points due to discretization is arguably a bit of a design 
error, but it is age-old and hard to change now. One way out is to use 
truehist() from MASS, another is to explicitly set the breaks to intermediate 
values, as in hist(x, breaks=seq(-.5, 23.5, 1))
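
As a rough illustration of the second suggestion (the data here are simulated, not the crash data), breaks at the half-integers put every hour in the middle of its own bin, so the histogram counts agree with `table()`:

```r
set.seed(1)
x <- sample(0:23, 1000, replace = TRUE)   # hypothetical hourly data

# Breaks at -0.5, 0.5, ..., 23.5: each integer hour sits mid-bin,
# so no data value ever coincides with a break point
h <- hist(x, breaks = seq(-0.5, 23.5, by = 1), plot = FALSE)

length(h$counts)    # 24 bins
h$mids              # 0, 1, ..., 23

# The bin counts match a plain tabulation of the hours
all(h$counts == as.vector(table(factor(x, levels = 0:23))))
```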

 Cheers,
 
 Josh
 
 

Re: [R] Histogram omitting/collapsing groups

2012-01-01 Thread Aren Cambre
Thanks, everybody. I'll definitely switch to barplot.

As for continuous, it's all relative. Even the most continuous dataset
at a scale that looks pretty to humans may have gaps between the
values when you zoom in a lot.

Aren

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Histogram omitting/collapsing groups

2012-01-01 Thread Joshua Wiley
Hi Aren,

I was busy thinking about how to make what you wanted, and I missed
that you were working with hours from a day.  That being the case, you
may think about a circular graph.  The attached plots show two
different ways of working with the same data.

Cheers,

Josh

set.seed(10)
x <- sample(0:23, 10000, TRUE, prob = sin(0:23)+1)

require(ggplot2) # graphing package

## regular barplot
## (stat = "identity" plots the precomputed Freq values as bar heights)
p <- ggplot(as.data.frame(table(x)), aes(x = x, y = Freq)) +
  geom_bar(stat = "identity")

## using circular coordinates
p2 <- p + coord_polar()

## print them
print(p)
print(p2)


## just if you're interested, the code to
## put the two plots side by side
require(grid)

dev.new(height = 6, width = 12)
grid.newpage()
pushViewport(vpList(
  viewport(x = 0, width = .5,  just = "left", name = "barplot"),
  viewport(x = .5, width = .5, just = "left", name = "windrose")))
seekViewport("barplot")
grid.draw(ggplotGrob(p))
seekViewport("windrose")
grid.draw(ggplotGrob(p2))



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


plots.pdf
Description: Adobe PDF document


Re: [R] Histogram omitting/collapsing groups

2012-01-01 Thread Aren Cambre
This is helpful, although I can't seem to adapt it to my own data.

If I run your sample as is, I do get the nice graphs.

However, this doesn't work:
(Assume you already have a data frame dallas with 2057980 rows. It
has column offense_hour, and each row has a value between 0 and 23,
inclusive.)
 p <- ggplot(as.data.frame(table(dallas$offense_hour)),
             aes(x = dallas$offense_hour, y = Freq)) + geom_bar()
 print(p)
Error in data.frame(x = c(9, 8, 10, 9, 10, 15, 11, 13, 0, 16, 13, 20,  :
  arguments imply differing number of rows: 2057980, 24

Seems like dallas$offense_hour corresponds to x in your example. I'm
confused why yours works even though your x has 10,000 values, yet
mine fails complaining that the row count is way off. Either way, the
length of x or dallas$offense_hour grossly exceeds 24.

Aren



Re: [R] Histogram omitting/collapsing groups

2012-01-01 Thread Joshua Wiley
Sorry, that was probably a really confusing example...too many xs
floating around.

set.seed(10)
rawdata <- sample(0:23, 10000, TRUE, prob = sin(0:23)+1)

## do this step first for your data
tableddata <- as.data.frame(table(rawdata))
## use these names in ggplot
colnames(tableddata)

require(ggplot2)
## stat = "identity" plots the precomputed Freq values as bar heights
p <- ggplot(tableddata, aes(x = rawdata, y = Freq)) +
  geom_bar(stat = "identity")

Cheers,

Josh



Re: [R] Histogram omitting/collapsing groups

2012-01-01 Thread Aren Cambre
Thanks. That did it!

And I get it now--in your original example, aes(x = x, y = Freq), x
refers to the column name in as.data.frame(table(x)), not the x
vector(?) you created.

Aren
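
To spell out the column-name point, here is a hedged sketch using a simulated stand-in for the `dallas` data frame (the data and the names `counts_default` and `counts_named` are invented for illustration). It shows what the tabulated data frame's columns are called, which is what `aes()` looks up:

```r
# Hypothetical stand-in for the real 'dallas' data frame
set.seed(10)
dallas <- data.frame(offense_hour = sample(0:23, 10000, replace = TRUE))

# table() on an expression such as dallas$offense_hour has no dimension name,
# so the resulting column is called "Var1", not "offense_hour"
counts_default <- as.data.frame(table(dallas$offense_hour))
colnames(counts_default)     # "Var1" "Freq"

# Naming the argument makes the column name explicit
counts_named <- as.data.frame(table(offense_hour = dallas$offense_hour))
colnames(counts_named)       # "offense_hour" "Freq"

# aes() then refers to columns of the 24-row table, never to dallas$... itself
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts_named, aes(x = offense_hour, y = Freq)) +
    geom_bar(stat = "identity")
  print(p)
}
```

Writing `aes(x = dallas$offense_hour, ...)` instead hands ggplot the full raw vector alongside a 24-row data frame, which is consistent with the "differing number of rows" error quoted earlier in the thread.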



Re: [R] Histogram omitting/collapsing groups

2011-12-31 Thread Sarah Goslee
Hi,

I think you're not understanding quite what's going on with hist. Reread the
help, and take a look at this small example. The solution I'd use is the last
item.

 x <- rep(1:10, times=1:10)
 table(x)
x
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4  5  6  7  8  9 10


 hist(x, plot=FALSE, right=TRUE)$counts
[1]  3  3  4  5  6  7  8  9 10
 hist(x, plot=FALSE, right=TRUE)$breaks
 [1]  1  2  3  4  5  6  7  8  9 10
 hist(x, plot=FALSE, right=TRUE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5


 hist(x, plot=FALSE, right=FALSE)$counts
[1]  1  2  3  4  5  6  7  8 19
 hist(x, plot=FALSE, right=FALSE)$breaks
 [1]  1  2  3  4  5  6  7  8  9 10
 hist(x, plot=FALSE, right=FALSE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5


 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
 [1]  1  2  3  4  5  6  7  8  9 10
 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
 [1]  1  2  3  4  5  6  7  8  9 10


Sarah
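
One way to see which values share a bin in the example above is to label the intervals with `cut()`. This is only a sketch of the same binning rule, not what `hist()` runs internally; `include.lowest = TRUE` mirrors the default of `hist()`.

```r
x <- rep(1:10, times = 1:10)
breaks <- 1:10

# right = TRUE: bins are (a, b], except that the first one is closed as [1, 2],
# so the values 1 and 2 land in the same bin
bins_right <- table(cut(x, breaks, right = TRUE, include.lowest = TRUE))
bins_right

# right = FALSE: bins are [a, b), except that the last one is closed as [9, 10],
# so the values 9 and 10 land in the same bin
bins_left <- table(cut(x, breaks, right = FALSE, include.lowest = TRUE))
bins_left

# Both tabulations agree with the counts reported by hist()
all(bins_right == hist(x, breaks = breaks, plot = FALSE, right = TRUE)$counts)
all(bins_left  == hist(x, breaks = breaks, plot = FALSE, right = FALSE)$counts)
```

With ten distinct values and ten break points there are only nine bins, so two values must share one whichever side is closed; half-integer breaks avoid that.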

On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre a...@arencambre.com wrote:
 I have two large datasets (156K and 2.06M records). Each row has the
 hour that an event happened, represented by an integer from 0 to 23.

 R's histogram is combining some data.

 Here's the command I ran to get the histogram:
 histinfo <- hist(crashes$hour, right=FALSE)

 Here's histinfo:
 histinfo
 $breaks
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

 $counts
  [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601
  6596  7152  7490  8166
 [16]  9758 11301 11745  9943  7494  6272  6220 11669

 $intensities
  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
 0.02937602 0.03930449
  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
 0.05223967 0.06242403
 [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
 0.07464911

 $density
  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
 0.02937602 0.03930449
  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
 0.05223967 0.06242403
 [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
 0.07464911

 $mids
  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5
 13.5 14.5 15.5 16.5 17.5
 [19] 18.5 19.5 20.5 21.5 22.5

 $xname
 [1] "crashes$hour"

 $equidist
 [1] TRUE

 attr(,"class")
 [1] "histogram"

 Note how the last value in counts is 11669. It's relevant to the
 output of table(crashes$hour):
      0     1     2     3     4     5     6     7     8     9    10    11
   4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596
     12    13    14    15    16    17    18    19    20    21    22    23
   7152  7490  8166  9758 11301 11745  9943  7494  6272  6220  6000  5669

 Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is
 that correct for the histogram to combine hours 22 and 23? Since I
 specified right = FALSE, I figured there's no way 23 would be combined
 with 22?

 Adding breaks=24 to the hist makes no difference; it's still stuck at
 23 breaks. I also tried breaks=25 and 23 and several other values, in
 case I am misinterpreting breaks's meaning, but none of them make a
 difference.

 I imagine this is a n00b question, so my apologies if this is obvious.

 Aren


-- 
Sarah Goslee
http://www.functionaldiversity.org
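
On the `breaks=24` part of the question, a possible explanation (based on the `hist` help page, which describes a single number as a suggestion only) is that the break points are then chosen as "pretty" values over the data range. A small sketch with made-up data:

```r
# A single number for 'breaks' is only a suggestion; for data ranging from
# 0 to 23 the "pretty" break points come out as the integers 0..23
pretty(c(0, 23), n = 24)

x <- rep(0:23, times = 5)     # hypothetical data covering every hour
h <- hist(x, breaks = 24, plot = FALSE)
length(h$breaks)              # 24 break points
length(h$counts)              # 23 bins, as in the output in the original post

# An explicit vector of break points is used exactly as given
h2 <- hist(x, breaks = seq(-0.5, 23.5, by = 1), plot = FALSE)
length(h2$counts)             # 24 bins, one per hour
```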



Re: [R] Histogram omitting/collapsing groups

2011-12-31 Thread jim holtman
Here is a test I ran and looks fine, but then I created the data, so
it might have something to do with your data:

 x <- sample(0:23, 100000, TRUE)
 a <- hist(x, breaks = 24)
 a[1:5]
$breaks
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

$counts
 [1] 8262 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132
4139 4231 4216 4158 4054 4185 4153
[21] 4281 4110 4221

$intensities
 [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155
0.04157 0.04203 0.04186 0.04158
[13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153
0.04281 0.04110 0.04221

$density
 [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155
0.04157 0.04203 0.04186 0.04158
[13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153
0.04281 0.04110 0.04221

$mids
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5
13.5 14.5 15.5 16.5 17.5 18.5 19.5
[21] 20.5 21.5 22.5

 table(x)
x
   0    1    2    3    4    5    6    7    8    9   10   11
4168 4094 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186
  12   13   14   15   16   17   18   19   20   21   22   23
4158 4132 4139 4231 4216 4158 4054 4185 4153 4281 4110 4221




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are 

Re: [R] Histogram omitting/collapsing groups

2011-12-31 Thread jim holtman
Fast fingers; notice that there is still a problem in the counts;  I
was only looking at the last.

Happy New Year -- up too late.
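
The problem in the counts can be checked directly. The sketch below uses freshly simulated data (so the numbers differ from the test run) and integer breaks 0..23: with the default `right = TRUE` the first bin absorbs hours 0 and 1, and with `right = FALSE` the last bin absorbs hours 22 and 23.

```r
set.seed(42)
x <- sample(0:23, 100000, replace = TRUE)   # simulated hours, not the real data
tab <- table(x)

h_right <- hist(x, breaks = 0:23, plot = FALSE)                  # default right = TRUE
h_left  <- hist(x, breaks = 0:23, plot = FALSE, right = FALSE)

# Default: first bin is [0, 1], so it holds hours 0 and 1 together
h_right$counts[1] == tab[["0"]] + tab[["1"]]

# right = FALSE: last bin is [22, 23], so it holds hours 22 and 23 together
h_left$counts[23] == tab[["22"]] + tab[["23"]]
```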

On Sun, Jan 1, 2012 at 12:33 AM, jim holtman jholt...@gmail.com wrote:
 Here is a test I ran and looks fine, but then I created the data, so
 it might have something to do with your data:

 x - sample(0:23, 10, TRUE)
 a - hist(x, breaks = 24)
 a[1:5]
 $breaks
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

 $counts
  [1] 8262 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132
 4139 4231 4216 4158 4054 4185 4153
 [21] 4281 4110 4221

 $intensities
  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155
 0.04157 0.04203 0.04186 0.04158
 [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153
 0.04281 0.04110 0.04221

 $density
  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155
 0.04157 0.04203 0.04186 0.04158
 [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153
 0.04281 0.04110 0.04221

 $mids
  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5
 13.5 14.5 15.5 16.5 17.5 18.5 19.5
 [21] 20.5 21.5 22.5

 table(x)
 x
   0    1    2    3    4    5    6    7    8    9   10   11   12   13
  14   15   16   17   18   19   20
 4168 4094 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132
 4139 4231 4216 4158 4054 4185 4153
  21   22   23
 4281 4110 4221



 On Sat, Dec 31, 2011 at 11:20 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Hi,

 I think you're not understanding quite what's going on with hist. Reread the
 help, and take a look at this small example. The solution I'd use is the last
 item.

 x - rep(1:10, times=1:10)
 table(x)
 x
  1 2 3 4 5 6 7 8 9 10
  1 2 3 4 5 6 7 8 9 10


 hist(x, plot=FALSE, right=TRUE)$counts
 [1] 3 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, right=TRUE)$breaks
  [1] 1 2 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, right=TRUE)$mids
 [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5


 hist(x, plot=FALSE, right=FALSE)$counts
 [1]  1  2  3  4  5  6  7  8 19
 hist(x, plot=FALSE, right=FALSE)$breaks
  [1] 1 2 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, right=FALSE)$mids
 [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5


 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
  [1] 1 2 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
  [1] 1 2 3 4 5 6 7 8 9 10


 Sarah

 On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre a...@arencambre.com wrote:
 I have two large datasets (156K and 2.06M records). Each row has the
 hour that an event happened, represented by an integer from 0 to 23.

 R's histogram is combining some data.

 Here's the command I ran to get the histogram:
 histinfo <- hist(crashes$hour, right=FALSE)

 Here's histinfo:
 histinfo
 $breaks
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

 $counts
  [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601
  6596  7152  7490  8166
 [16]  9758 11301 11745  9943  7494  6272  6220 11669

 $intensities
  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
 0.02937602 0.03930449
  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
 0.05223967 0.06242403
 [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
 0.07464911

 $density
  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
 0.02937602 0.03930449
  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
 0.05223967 0.06242403
 [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
 0.07464911

 $mids
  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5
 13.5 14.5 15.5 16.5 17.5
 [19] 18.5 19.5 20.5 21.5 22.5

 $xname
 [1] "crashes$hour"

 $equidist
 [1] TRUE

 attr(,"class")
 [1] "histogram"

 Note how the last value in counts is 11669. It's relevant to the
 output of table(crashes$hour):
     0     1     2     3     4     5     6     7     8     9    10
 11    12    13    14
  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601
 6596  7152  7490  8166
    15    16    17    18    19    20    21    22    23
  9758 11301 11745  9943  7494  6272  6220  6000  5669

 Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is
 that correct for the histogram to combine hours 22 and 23? Since I
 specified right = FALSE, I figured there's no way 23 would be combined
 with 22?
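
A minimal sketch of why the two hours end up together, assuming the behaviour documented in ?hist: include.lowest defaults to TRUE, and with right = FALSE that closes the last bin on the right. The data and the names hour, h_short and h_full are made up for illustration and are not the poster's.

```r
# Hypothetical data (not the poster's): hour h occurs h + 1 times.
hour <- rep(0:23, times = 1:24)

# Breaks stop at 23, so the last bin is [22, 23]. hist() closes it on the
# right because include.lowest = TRUE by default, so 22 and 23 share a bin.
h_short <- hist(hour, breaks = 0:23, right = FALSE, plot = FALSE)
tail(h_short$counts, 1)   # 47, i.e. 23 + 24

# Extending the breaks to 24 gives hour 23 its own bin, [23, 24].
h_full <- hist(hour, breaks = 0:24, right = FALSE, plot = FALSE)
length(h_full$counts)     # 24
```

So right = FALSE only changes which side of each interior bin is closed; it does not add a bin for the largest value.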

 Adding breaks=24 to the hist makes no difference; it's still stuck at
 23 breaks. I also tried breaks=25 and 23 and several other values, in
 case I am misinterpreting breaks's meaning, but none of them make a
 difference.
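
A hedged sketch of the difference between the two forms of breaks, assuming ?hist as documented: a single number is only a suggestion that gets rounded to "pretty" breakpoints, while a vector is used exactly. The data and variable names below are hypothetical.

```r
# Hypothetical data (not the poster's): hour h occurs h + 1 times.
hour <- rep(0:23, times = 1:24)

# A single number is a hint; hist() picks the actual breakpoints itself.
h_hint <- hist(hour, breaks = 24, plot = FALSE)
h_hint$breaks             # inspect what was chosen

# A vector of breakpoints is taken literally. Half-integer breaks put
# each hour in the middle of its own bin.
h_exact <- hist(hour, breaks = seq(-0.5, 23.5, by = 1), plot = FALSE)
length(h_exact$counts)    # 24
```

With explicit half-integer breaks the counts agree with table(hour), which is the quickest check that nothing was merged.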

 I imagine this is a n00b question, so my apologies if this is obvious.

 Aren


 --
 Sarah Goslee
 http://www.functionaldiversity.org

 

Re: [R] Histogram omitting/collapsing groups

2011-12-31 Thread Joshua Wiley
If you just want a plot of the frequencies at each hour why not just call 
barplot on the output of table?  Histograms create bins and count in those, 
which doesn't sound like what you're after.
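
A minimal sketch of that suggestion, using made-up data (the names hour and counts are illustrative only). Wrapping the values in factor() with explicit levels is an extra assumption on my part: it keeps a bar for every hour even when a small sample has no events in some hour.

```r
# Hypothetical data: 100 event hours drawn from 0..23 (not the poster's data).
set.seed(1)
hour <- sample(0:23, 100, replace = TRUE)

# Fixing the levels keeps all 24 hours, including any hour with zero events.
counts <- table(factor(hour, levels = 0:23))

barplot(counts, xlab = "Hour of day", ylab = "Number of events")
```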

Cheers,

Josh


On Dec 31, 2011, at 21:37, jim holtman jholt...@gmail.com wrote:

 Fast fingers; notice that there is still a problem in the counts;  I
 was only looking at the last.
 
 Happy New Year -- up too late.
 
 On Sun, Jan 1, 2012 at 12:33 AM, jim holtman jholt...@gmail.com wrote:
 Here is a test I ran and looks fine, but then I created the data, so
 it might have something to do with your data:
 
 x <- sample(0:23, 100000, TRUE)
 a <- hist(x, breaks = 24)
 a[1:5]
 $breaks
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 
 $counts
  [1] 8262 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132
 4139 4231 4216 4158 4054 4185 4153
 [21] 4281 4110 4221
 
 $intensities
  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155
 0.04157 0.04203 0.04186 0.04158
 [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153
 0.04281 0.04110 0.04221
 
 $density
  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155
 0.04157 0.04203 0.04186 0.04158
 [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153
 0.04281 0.04110 0.04221
 
 $mids
  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5
 13.5 14.5 15.5 16.5 17.5 18.5 19.5
 [21] 20.5 21.5 22.5
 
 table(x)
 x
    0    1    2    3    4    5    6    7    8    9   10   11   12   13
  14   15   16   17   18   19   20
 4168 4094 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132
 4139 4231 4216 4158 4054 4185 4153
  21   22   23
 4281 4110 4221
 
 
 
 On Sat, Dec 31, 2011 at 11:20 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Hi,
 
 I think you're not understanding quite what's going on with hist. Reread the
 help, and take a look at this small example. The solution I'd use is the last
 item.
 
 x <- rep(1:10, times=1:10)
 table(x)
 x
  1 2 3 4 5 6 7 8 9 10
  1 2 3 4 5 6 7 8 9 10
 
 
 hist(x, plot=FALSE, right=TRUE)$counts
 [1] 3 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, right=TRUE)$breaks
  [1] 1 2 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, right=TRUE)$mids
 [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
 
 
 hist(x, plot=FALSE, right=FALSE)$counts
 [1]  1  2  3  4  5  6  7  8 19
 hist(x, plot=FALSE, right=FALSE)$breaks
  [1] 1 2 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, right=FALSE)$mids
 [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
 
 
 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
  [1] 1 2 3 4 5 6 7 8 9 10
 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
 hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
  [1] 1 2 3 4 5 6 7 8 9 10
 
 
 Sarah
 
 On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre a...@arencambre.com wrote:
 I have two large datasets (156K and 2.06M records). Each row has the
 hour that an event happened, represented by an integer from 0 to 23.
 
 R's histogram is combining some data.
 
 Here's the command I ran to get the histogram:
 histinfo <- hist(crashes$hour, right=FALSE)
 
 Here's histinfo:
 histinfo
 $breaks
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 23
 
 $counts
  [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601
  6596  7152  7490  8166
 [16]  9758 11301 11745  9943  7494  6272  6220 11669
 
 $intensities
  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
 0.02937602 0.03930449
  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
 0.05223967 0.06242403
 [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
 0.07464911
 
 $density
  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844
 0.02937602 0.03930449
  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515
 0.05223967 0.06242403
 [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068
 0.07464911
 
 $mids
  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5
 13.5 14.5 15.5 16.5 17.5
 [19] 18.5 19.5 20.5 21.5 22.5
 
 $xname
 [1] "crashes$hour"
 
 $equidist
 [1] TRUE
 
 attr(,"class")
 [1] "histogram"
 
 Note how the last value in counts is 11669. It's relevant to the
 output of table(crashes$hour):
     0     1     2     3     4     5     6     7     8     9    10
 11    12    13    14
  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601
 6596  7152  7490  8166
    15    16    17    18    19    20    21    22    23
  9758 11301 11745  9943  7494  6272  6220  6000  5669
 
 Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is
 that correct for the histogram to combine hours 22 and 23? Since I
 specified right = FALSE, I figured there's no way 23 would be combined
 with 22?
 
 Adding breaks=24 to the hist makes no difference; it's still stuck at
 23 breaks. I also tried breaks=25 and 23 and several other values, in
 case I am