Re: [R] Counting things

2009-08-05 Thread Gabor Grothendieck
Try this using the built-in data frame iris:

> length(subset(iris, Sepal.Length >= 7, Sepal.Width)[[1]])
[1] 13
> length(subset(iris, Sepal.Length >= 7 & Species == 'virginica', Sepal.Width)[[1]])
[1] 12
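A slightly more direct base-R variant (a sketch, not part of the original reply) counts the rows returned by subset() with nrow(), so no column has to be selected first:

```r
# Count rows of the built-in iris data frame that satisfy each condition
n_long <- nrow(subset(iris, Sepal.Length >= 7))
n_long_virginica <- nrow(subset(iris, Sepal.Length >= 7 & Species == "virginica"))
n_long            # 13, matching the output above
n_long_virginica  # 12
```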

> # or the following (note that the dot in Sepal.Length is automatically
> # converted to _ because a dot has a special meaning in SQL)

> library(sqldf)
> sqldf("select count(*) from iris where Sepal_Length >= 7")
  count(*)
1   13
> sqldf("select count(*) from iris where Sepal_Length >= 7 and Species = 'virginica'")
  count(*)
1   12

For the second part, use cut to create a factor with the levels you want:

 iris$Sepal.Length.factor <- cut(iris$Sepal.Length, 4:8)

and then summarize as desired using SQL, for example:

> sqldf("select Sepal_Length_factor, avg(Sepal_Length), count(Sepal_Length) from iris group by Sepal_Length_factor")
  Sepal_Length_factor avg(Sepal_Length) count(Sepal_Length)
1   (4,5]  4.787500  32
2   (5,6]  5.550877  57
3   (6,7]  6.473469  49
4   (7,8]  7.475000  12

or use summaryBy in the doBy package.
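If sqldf is not installed, the same per-bin mean and count can be sketched in base R with cut(), tapply() and table(); this is an alternative route, not the one used in the reply:

```r
# Right-closed bins (4,5], (5,6], (6,7], (7,8], as in the cut() call above
bins <- cut(iris$Sepal.Length, 4:8)
summ <- data.frame(
  bin   = levels(bins),
  mean  = as.vector(tapply(iris$Sepal.Length, bins, mean)),  # mean per bin
  count = as.vector(table(bins))                              # rows per bin
)
summ
```

The counts should agree with the SQL output (32, 57, 49 and 12). An empty bin would show NA for its mean, because tapply() has no values to average there.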

See ?cut and ?subset, and in doBy see ?summaryBy. Also see
http://sqldf.googlecode.com
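For the binning asked about in the question, a minimal sketch along the same lines follows. It assumes a data frame named preds with a numeric column probability in [0, 1] and a column label holding "T"/"F"; those names and the simulated data are hypothetical, chosen only so the example runs. include.lowest = TRUE keeps a probability of exactly 0 in the first bin instead of turning it into NA.

```r
# Hypothetical data: 'preds', 'probability' and 'label' are illustrative names
set.seed(1)
preds <- data.frame(probability = runif(500))
preds$label <- ifelse(runif(500) < preds$probability, "T", "F")

# Ten bins [0,0.1], (0.1,0.2], ..., (0.9,1]
preds$bin <- cut(preds$probability, seq(0, 1, by = 0.1), include.lowest = TRUE)

calib <- data.frame(
  range         = levels(preds$bin),
  n             = as.vector(table(preds$bin)),                          # items per bin
  mean_prob     = as.vector(tapply(preds$probability, preds$bin, mean)),
  true_accuracy = as.vector(tapply(preds$label == "T", preds$bin, mean)) # share labelled "T"
)
calib

# Reliability plot: observed proportion true against mean predicted probability
plot(calib$mean_prob, calib$true_accuracy, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "mean predicted probability", ylab = "observed proportion true",
     type = "b")
abline(0, 1, lty = 2)  # perfectly calibrated bins would lie on this line
```

Bins close to the dashed diagonal are ones where the predicted probabilities match the observed proportion of true labels. The rows come out in increasing order; calib[nrow(calib):1, ] lists them from the top bin down, as in the question.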

On Tue, Aug 4, 2009 at 11:40 PM, Noah Silverman wrote:
> I've completed an experiment and want to summarize the results.
>
> There are two things I'd like to create.
>
> 1) A simple count of things from the data.frame with predictions
>    1a) Number of predictions with probability greater than x
>    1b) Number of predictions with probability greater than x that are really true
>
>    In SQL, this would be:
>        "Select count(predictions) from data.frame where probability > x"
>        "Select count(predictions) from data.frame where probability > x and label = 'T'"
>
> How can I do this one in R?
>
>
> 2) I'd like to create what we call "binning".  It is a simple list of
> probability ranges and how accurate our model is.  The idea is to see how
> "true" our probabilities are.
> for example
>
> range     number of items   mean(probability)   true_accuracy
> 100-90%   20                .924                .90
> 90-80%    50                .825                .84
> 80-70%    214               .75                 .71
> etc...
>
> It would be really great if I could also graph this!
>
> Is there any kind of package or way to do this in R?
>
> Thanks!
>
> -N
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Counting things

2009-08-05 Thread William Dunlap

> -----Original Message-----
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Noah Silverman
> Sent: Tuesday, August 04, 2009 8:40 PM
> To: r help
> Subject: [R] Counting things
> 
> I've completed an experiment and want to summarize the results.
> 
> There are two things I'd like to create.
> 
> 1) A simple count of things from the data.frame with predictions
>  1a) Number of predictions with probability greater than x

sum(logicalVector) returns the number of TRUEs in logicalVector,
because it converts TRUE to 1 and FALSE to 0 before doing the sum.
You will have to use na.rm=TRUE if there are NAs (missing values)
in the logical vector.  Hence you can compute 1a with
   sum(probabilities>x)
mean(probabilities>x) will give the proportion of times probabilities>x
is TRUE, and table(probabilities>x) will give a count of both the
FALSEs and the TRUEs.
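A small illustration of these three calls, on a made-up probabilities vector that includes one NA:

```r
probabilities <- c(0.95, 0.40, 0.72, NA, 0.88)  # made-up values, one missing
x <- 0.7
n_true    <- sum(probabilities > x, na.rm = TRUE)   # 3 TRUEs
prop_true <- mean(probabilities > x, na.rm = TRUE)  # 3 of 4 non-missing: 0.75
counts    <- table(probabilities > x)               # FALSE 1, TRUE 3 (NA left out)
sum(probabilities > x)                              # NA when na.rm is not set
n_true
prop_true
counts
```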

>  1b) Number of predictions with probability greater than x that are really true

sum(probabilities>x & label=="T")
(I'm guessing that label is a character or factor vector with values
"T" and "F".)
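Putting 1a and 1b together on a small made-up example, with label stored as a factor coded "T"/"F" as guessed above (an assumption about the data):

```r
probabilities <- c(0.95, 0.40, 0.72, 0.81, 0.88)  # made-up values
label <- factor(c("T", "F", "F", "T", "T"))       # assumed "T"/"F" coding
x <- 0.7
n_above      <- sum(probabilities > x)                 # 1a: 4
n_above_true <- sum(probabilities > x & label == "T")  # 1b: 3
n_above_true / n_above                                 # 0.75 of those are true
```

If the two columns live in a data frame, with() saves repeating its name, e.g. with(myPredictions, sum(probability > x & label == "T")), where myPredictions is a hypothetical placeholder.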

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 
