On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:
Hi everyone,
I've been struggling with a little problem for a while, and I'm wondering if
anyone could help...
I have a dataset (from retailing industry) that indicates which brands
are present in a panel of 500 stores,
store , brand
1 , B1
1 , B2
1 , B3
2 , B1
2 , B3
3 , B2
3 , B3
3 , B4
I would like to know how many brands are present in each store,
I tried:
result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
but I got:
Group.1 x
1 , 4
2 , 4
3 , 4
which is not exactly the result I expected
I would like to get something like:
Group.1 x
1 , 3
2 , 2
3 , 3
Try:
result <- aggregate(MyData$brand , by=list(MyData$store) , length)
Quick, easy and generalizes to other situations. The factor levels get
carried along unchanged into every group (which is why nlevels returns 4
each time), but length counts the number of elements in each group that
aggregate passes to the function.
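A minimal sketch of that behaviour, assuming the toy data from the question plus one hypothetical repeated row (store 1, brand B1) that is not in the original post. It suggests that a subset of a factor keeps every level, and that length counts rows, so a repeated brand is counted more than once:

```r
# Toy data from the question; the last row is a hypothetical duplicate
# (store 1, brand B1) added only to illustrate repeated observations.
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3, 1),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4", "B1"))
)

# Subsetting a factor keeps all of the original levels.
store2 <- MyData$brand[MyData$store == 2]
nlevels(store2)   # 4: the levels B1..B4 are all still attached
length(store2)    # 2: the number of rows for store 2

# length counts rows per store, duplicates included.
by_length <- aggregate(MyData$brand, by = list(MyData$store), length)
by_length
```

With the duplicate row present, store 1 is reported with 4 rows even though it carries only 3 distinct brands.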
Looking around, I found I can delete the empty levels of a factor using:
problem.factor <- problem.factor[,drop=TRUE]
If you reapply the function, factor, you get the same result. So you
could have done this:
> result <- aggregate(MyData$brand , by=list(MyData$store) ,
function(x) nlevels(factor(x)))
> result
Group.1 x
1 1 3
2 2 2
3 3 3
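An equivalent way to write this, sketched here under the assumption that brands may repeat within a store, is to count unique values instead of re-creating the factor. The data frame below is the toy data from the question plus one hypothetical duplicate row, and the column name n_brands is chosen only for illustration:

```r
# Toy data from the question; the last row is a hypothetical duplicate
# (store 1, brand B1) standing in for repeated products or weeks.
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3, 1),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4", "B1"))
)

# Count distinct brands per store; duplicates within a store are ignored.
n_brands <- aggregate(MyData$brand, by = list(store = MyData$store),
                      FUN = function(x) length(unique(x)))
names(n_brands)[2] <- "n_brands"   # illustrative column name
n_brands

# The same counts as a named vector, using tapply directly.
per_store <- tapply(MyData$brand, MyData$store,
                    function(x) length(unique(x)))
per_store
```

Both forms work on the whole data set at once, so no per-store subsetting is needed.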
But this solution isn't handy for me as I have many stores and would have
to make a subset of my data for each store before dropping the empty factor
levels.
I can't simply count the lines for each store (N) either, because the same
brand can appear several times in each store (several products for the
same brand, and/or several weeks of observation)
I used to do this calculation using SAS with:
proc freq data = MyData noprint ; by store ;
tables brand / out = result ;
run ;
(the cool thing was I got a database I can merge with MyData)
any idea for doing that in R ?
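One possible R counterpart to the proc freq step, offered as a sketch rather than an exact translation: table gives the store-by-brand frequencies, as.data.frame turns them into a data set, and merge attaches a per-store brand count back onto MyData. The data are the toy rows from the question plus one hypothetical duplicate, and the names count and n_brands are illustrative:

```r
# Toy data from the question; the last row is a hypothetical duplicate
# (store 1, brand B1).
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3, 1),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4", "B1"))
)

# Frequency of each store/brand pair, as a data frame.
freq <- as.data.frame(table(store = MyData$store, brand = MyData$brand),
                      responseName = "count")
freq <- freq[freq$count > 0, ]                       # drop absent pairs
freq$store <- as.numeric(as.character(freq$store))   # back to numeric ids

# One row per store/brand pair remains, so counting rows per store
# gives the number of distinct brands.
n_brands <- aggregate(brand ~ store, data = freq, FUN = length)
names(n_brands)[2] <- "n_brands"

# Attach the per-store count to the original observations.
merged <- merge(MyData, n_brands, by = "store")
head(merged)
```

The intermediate freq table keeps the per-brand counts, which is roughly what the out= data set holds in the SAS version.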
Thanks in advance,
Kind regards,
Sylvain Willart,
PhD Marketing,
IAE Lille, France
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT