Re: [R] Counting non-empty levels of a factor

David Winsemius Sun, 08 Nov 2009 06:26:42 -0800


On Nov 8, 2009, at 9:11 AM, David Winsemius wrote:

On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:
Hi everyone,
I've been struggling with a little problem for a while, and I'm wondering if
anyone could help...
I have a dataset (from the retailing industry) that indicates which brands
are present in a panel of 500 stores,

store , brand
1 , B1
1 , B2
1 , B3
2 , B1
2 , B3
3 , B2
3 , B3
3 , B4

I would like to know how many brands are present in each store,

I tried:
result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)

but I got:
Group.1 x
1 , 4
2 , 4
3 , 4

which is not exactly the result I expected.
I would like to get something like:
Group.1 x
1 , 3
2 , 2
3 , 3

Try:

result <- aggregate(MyData$brand , by=list(MyData$store) , length)
Quick, easy, and generalizes to other situations. The factor levels got carried along identically, but length counts the number of elements in the list returned by tapply.

Which may not have been what you asked for, as this would demonstrate. You probably want the second solution:

> mydata2 <- rbind(MyData, MyData)
> result <- aggregate(mydata2$brand , by=list(mydata2$store) , length)
> result
  Group.1 x
1       1 6
2       2 4
3       3 6

> result <- aggregate(mydata2$brand , by=list(mydata2$store) , function(x) nlevels(factor(x)))

> result
  Group.1 x
1       1 3
2       2 2
3       3 3


Looking around, I found I can delete empty levels of factor using:
problem.factor <- problem.factor[,drop=TRUE]

If you reapply the function factor(), you get the same result. So you could have done this:

> result <- aggregate(MyData$brand , by=list(MyData$store) , function(x) nlevels(factor(x)))

> result
  Group.1 x
1       1 3
2       2 2
3       3 3
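
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The data frame is rebuilt from the toy example in the question, and the names res_nlevels, res_factor and res_unique are made up for illustration. The length(unique(x)) variant is an alternative that was not part of the original reply; it should give the same counts without relying on factor levels at all.

```r
# Toy data mirroring the example in the question (assumed structure)
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Duplicate the rows to mimic repeated observations of the same brand
mydata2 <- rbind(MyData, MyData)

# nlevels() on each group still sees every level of the full factor,
# because splitting a factor keeps its unused levels
res_nlevels <- aggregate(mydata2$brand, by = list(mydata2$store), nlevels)

# Re-applying factor() inside the function drops the unused levels per group
res_factor <- aggregate(mydata2$brand, by = list(mydata2$store),
                        function(x) nlevels(factor(x)))

# Alternative: count distinct values directly
res_unique <- tapply(mydata2$brand, mydata2$store,
                     function(x) length(unique(x)))

print(res_nlevels)
print(res_factor)
print(res_unique)
```

Both res_factor and res_unique are unaffected by the duplicated rows, which is the property the plain length version lacks.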

But this solution isn't handy for me, as I have many stores and would have to
make a subset of my data for each store before dropping empty factor levels.

I can't simply count the lines for each store (N) either, because the same
brand can appear several times in each store (several products for the
same brand, and/or several weeks of observation).

I used to do this calculation using SAS with:
proc freq data = MyData noprint ; by store ;
tables  brand / out = result ;
run ;
(the cool thing was that I got a dataset I could merge with MyData)

Any idea how to do that in R?
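
One possible way to get a mergeable result in R, roughly in the spirit of the PROC FREQ output above, is sketched below. This is an illustration rather than something from the thread: the object names n_brands, freq and merged are hypothetical, and the toy data frame is rebuilt from the example in the question.

```r
# Toy data mirroring the example in the question (assumed structure)
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# One row per store, holding the number of distinct brands in that store
n_brands <- aggregate(brand ~ store, data = MyData,
                      FUN = function(x) length(unique(x)))
names(n_brands)[2] <- "n_brands"

# Attach the per-store count to every row of the original data
merged <- merge(MyData, n_brands, by = "store")

# Store-by-brand frequency table as a data frame, keeping only the
# combinations that actually occur
freq <- as.data.frame(table(store = MyData$store, brand = MyData$brand))
freq <- freq[freq$Freq > 0, ]

print(n_brands)
print(head(merged))
print(freq)
```

Note that table() turns store into a factor in freq, so it may need converting back before merging on a numeric store column.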

Thanks in advance,

Kind regards,

Sylvain Willart,
PhD Marketing,
IAE Lille, France

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT
