Re: [R] How to group by then count?

2015-01-06 Thread Marc Schwartz

 On Jan 6, 2015, at 3:29 PM, Monnand monn...@gmail.com wrote:
 
 Thank you, all! Your replies are very useful, especially Don's explanation!
 
 One complaint I have is: the function name (table) is really not very
 informative.


Why not? You used the word 'table' in your original post, except as Don noted, 
you were overthinking the problem.

The basic concept is a tabulation of discrete values in a vector, which is a 
basic analytic method.

Using commands like:

  ??table
  ??frequency

would have led you to the table() function, as well as others.
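
A minimal sketch of the search tools in base R, assuming a default session with the utils package attached. `??table` is shorthand for help.search("table"); the apropos() and exists() calls are an additional illustration, not part of the original reply.

```r
# `??table` is shorthand for help.search("table"); both search the installed help pages.
# help.search("frequency")   # lists matching help topics (best run interactively)

# apropos() lists objects on the search path whose names match a pattern.
matches <- apropos("^table$")
print(matches)

# exists() confirms that table() is available without installing any package.
has_table <- exists("table", mode = "function")
print(has_table)
```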

Believe it or not, taking a few minutes to read or search An Introduction 
to R, which is the basic R manual, would have led you to the same solution:

  
http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Frequency-tables-from-factors

Regards,

Marc Schwartz


 
 On Sun Jan 04 2015 at 5:03:47 PM MacQueen, Don macque...@llnl.gov wrote:
 
 This seems to me to be a case where thinking in terms of computer
 programming concepts is getting in the way a bit. Approach it as a data
 analysis task; the S language (upon which R is based) is designed in part
 for data analysis so there is a function that does most of the job for you.
 
 (I changed your vector of strings to make the result more easily
 interpreted)
 
 x = c(1, 1, 2, 1, 5, 2,'3','5','5','2','2')
 tmp <- table(x)  ## counts the number of appearances of each element
 tmp[tmp==max(tmp)]   ## finds which one occurs most often
 2
 4
 
 Meaning that the element '2' appears 4 times.  The table() function should
 be fast even with long vectors. Here's an example with a vector of length
 1 million:
 
 foo <- table( sample(letters, 1e6, replace=TRUE) )
 
 
 One of the seminal books on the S language is John M. Chambers' "Programming
 with Data" -- and I would emphasize the "with Data" part of that title.
 
 --
 
 Don MacQueen
 
 Lawrence Livermore National Laboratory
 7000 East Ave., L-627
 Livermore, CA 94550
 925-423-1062
 
 
 
 
 
 On 1/4/15, 1:02 AM, Monnand monn...@gmail.com wrote:
 
 Hi all,
 
 I thought this was a very naive problem, but I have not found any solution
 that is idiomatic to R.
 
 The problem is this:
 
 Assume we have a vector of strings:
 x = c("1", "1", "2", "1", "5", "2")
 
 We want to count the number of appearances of each string; i.e., in vector x,
 string "1" appears 3 times, "2" appears twice, and "5" appears once. Then I
 want to know which string is the majority. In this case, it is "1".
 
 For imperative languages like C, C++, Java, and Python, I would use a hash
 table to count the strings, where the keys are the strings and the values are
 the numbers of appearances. For functional languages like Clojure, there are
 higher-order functions like group-by.
 
 However, for R, I can hardly find a good solution to this simple problem. I
 found a hash package, which implements a hash table. However, installing a
 package simply for a hash table is really annoying to me. I did find
 aggregate and other functions that operate on data frames. But in my
 case, it is a simple vector. Converting it to a data frame may not be
 desirable. (Or is it?)
 
 Could anyone suggest an idiomatic way of doing such a job in R? I would
 appreciate your help!
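
For comparison, the hash-table and group-by approaches described in the question can be sketched in base R without any extra package. This is only an illustration under the assumption that x is a character vector; the variable names (counts_env, hash_counts, group_counts) are made up for the example.

```r
x <- c("1", "1", "2", "1", "5", "2")

# Hash-table style: an environment maps each string to a running count.
counts_env <- new.env()
for (s in x) {
  current <- if (exists(s, envir = counts_env, inherits = FALSE)) {
    get(s, envir = counts_env)
  } else {
    0L
  }
  assign(s, current + 1L, envir = counts_env)
}
hash_counts <- unlist(mget(sort(ls(counts_env)), envir = counts_env))
print(hash_counts)

# Group-by style: split the vector by its own values, then measure each group.
group_counts <- lengths(split(x, x))
print(group_counts)

# table(x) produces the same counts in a single call.
tab_counts <- table(x)
print(tab_counts)
```

All three give 3, 2 and 1 for "1", "2" and "5"; table() is simply the shortest route to that result.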
 
 -Monnand

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to group by then count?

2015-01-06 Thread Monnand
Thank you, all! Your replies are very useful, especially Don's explanation!

One complaint I have is: the function name (table) is really not very
informative.

 


Re: [R] How to group by then count?

2015-01-04 Thread Christian Brandstätter
Dear Monnand,

One possible way would be to use as.factor(); calling summary() on the result 
gives the counts for every level.

Like this:

  x = c(1, 1, 2, 1, 5, 2)
  summary(as.factor(x))
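
A small sketch comparing this approach with table(), assuming the vector holds strings as in the original question. The names by_summary and by_table are illustrative only.

```r
x <- c("1", "1", "2", "1", "5", "2")

by_summary <- summary(as.factor(x))  # named integer vector of counts per level
by_table   <- table(x)               # one-dimensional contingency table

print(by_summary)
print(by_table)

# The counts and labels agree; only the class of the result differs.
same_counts <- all(as.vector(by_table) == as.vector(by_summary))
same_labels <- identical(names(by_table), names(by_summary))
print(c(same_counts, same_labels))
```

One caveat, as far as I know: summary() on a factor has a maxsum argument (100 by default), and levels beyond that limit are lumped into "(Other)", so table() is probably the safer choice when there are many distinct values.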

Cheers, Christian





Re: [R] How to group by then count?

2015-01-04 Thread Berend Hasselman


Have a look at table:

?table
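
As a rough illustration of what the help page leads to, here is a sketch that tabulates the vector from the question, orders the counts, and converts them to a data frame. The variable names are hypothetical.

```r
x <- c("1", "1", "2", "1", "5", "2")

counts <- table(x)

# Order the counts from most to least frequent.
sorted_counts <- sort(counts, decreasing = TRUE)
print(sorted_counts)

# Convert to a data frame with one row per distinct value
# (columns are named after the tabulated variable and "Freq").
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
print(counts_df)
```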

Berend



Re: [R] How to group by then count?

2015-01-04 Thread MacQueen, Don
This seems to me to be a case where thinking in terms of computer
programming concepts is getting in the way a bit. Approach it as a data
analysis task; the S language (upon which R is based) is designed in part
for data analysis so there is a function that does most of the job for you.

(I changed your vector of strings to make the result more easily
interpreted)

 x = c(1, 1, 2, 1, 5, 2,'3','5','5','2','2')
 tmp <- table(x)  ## counts the number of appearances of each element
 tmp[tmp==max(tmp)]   ## finds which one occurs most often
2 
4 

Meaning that the element '2' appears 4 times.  The table() function should
be fast even with long vectors. Here's an example with a vector of length
1 million:

foo <- table( sample(letters, 1e6, replace=TRUE) )
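
If only the label of the most frequent element is wanted, rather than the one-cell table, something along these lines should work. It is a sketch, not part of the original reply; winners, first_winner and big are illustrative names, and the timing will vary by machine.

```r
x <- c("1", "1", "2", "1", "5", "2", "3", "5", "5", "2", "2")
tmp <- table(x)

# Label(s) of the most frequent element; every tied winner is kept.
winners <- names(tmp)[tmp == max(tmp)]
print(winners)

# which.max() returns only the first maximum, so ties are dropped.
first_winner <- names(which.max(tmp))
print(first_winner)

# Rough timing on a vector of length 1 million.
big <- sample(letters, 1e6, replace = TRUE)
print(system.time(table(big)))
```

The comparison against max(tmp) is the safer form when ties matter, since which.max() reports a single position.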


One of the seminal books on the S language is John M. Chambers' "Programming
with Data" -- and I would emphasize the "with Data" part of that title.

-- 

Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





