Thanks a lot for those solutions.
Both work great, and they do slightly different (but both very interesting) things.
Moreover, I learned about the length() function ... one more to add to my personal cheat sheet.

Kind regards
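For reference, here is a minimal sketch of how the two suggestions in this thread differ. It assumes the toy data from the original question, rebuilt here as a data frame with brand stored as a factor, and duplicated as in the thread so that each brand appears more than once per store. The column names store and n_brands, and the final merge() step, are illustrative choices rather than anything prescribed in the thread.

```r
# Toy data reconstructed from the example in the thread (an assumption:
# the real MyData has 500 stores and repeated brand rows).
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Duplicate the rows so each brand occurs twice per store, mimicking
# several products or several weeks of observation per brand.
mydata2 <- rbind(MyData, MyData)

# length() counts rows per store, so repeated brands are counted repeatedly.
n_rows <- aggregate(mydata2$brand, by = list(store = mydata2$store),
                    FUN = length)

# Re-applying factor() within each group drops the unused levels, so
# nlevels() then counts the distinct brands present in that store.
n_brands <- aggregate(mydata2$brand, by = list(store = mydata2$store),
                      FUN = function(x) nlevels(factor(x)))

# Give the count column a descriptive name and merge it back onto the data,
# roughly what the SAS output data set was used for in the question.
names(n_brands)[2] <- "n_brands"
merged <- merge(mydata2, n_brands, by = "store")

print(n_rows)
print(n_brands)
```

With this duplicated data, length() gives 6, 4 and 6 rows for stores 1 to 3, while nlevels(factor(x)) gives 3, 2 and 3 distinct brands. Using function(x) length(unique(x)) as the aggregating function should give the same distinct counts.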
2009/11/8 David Winsemius <dwinsem...@comcast.net>:
>
> On Nov 8, 2009, at 9:11 AM, David Winsemius wrote:
>
>>
>> On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:
>>
>>> Hi everyone,
>>>
>>> I've been struggling with a little problem for a while, and I'm wondering if
>>> anyone could help...
>>>
>>> I have a dataset (from the retailing industry) that indicates which brands
>>> are present in a panel of 500 stores,
>>>
>>> store , brand
>>> 1 , B1
>>> 1 , B2
>>> 1 , B3
>>> 2 , B1
>>> 2 , B3
>>> 3 , B2
>>> 3 , B3
>>> 3 , B4
>>>
>>> I would like to know how many brands are present in each store,
>>>
>>> I tried:
>>> result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
>>>
>>> but I got:
>>> Group.1 x
>>> 1 , 4
>>> 2 , 4
>>> 3 , 4
>>>
>>> which is not exactly the result I expected
>>> I would like to get something like:
>>> Group.1 x
>>> 1 , 3
>>> 2 , 2
>>> 3 , 3
>>
>> Try:
>>
>> result <- aggregate(MyData$brand , by=list(MyData$store) , length)
>>
>> Quick, easy and generalizes to other situations. The factor levels got
>> carried along identically, but length counts the number of elements in the
>> list returned by tapply.
>
> Which may not have been what you asked for, as this would demonstrate. You
> probably want the second solution:
>
> mydata2 <- rbind(MyData, MyData)
> > result <- aggregate(mydata2$brand , by=list(mydata2$store) , length)
> > result
>   Group.1 x
> 1       1 6
> 2       2 4
> 3       3 6
>
> > result <- aggregate(mydata2$brand , by=list(mydata2$store) , function(x) nlevels(factor(x)))
> > result
>   Group.1 x
> 1       1 3
> 2       2 2
> 3       3 3
>
>>>
>>> Looking around, I found I can delete empty levels of factor using:
>>> problem.factor <- problem.factor[,drop=TRUE]
>>
>> If you reapply the function, factor, you get the same result. So you could
>> have done this:
>>
>> > result <- aggregate(MyData$brand , by=list(MyData$store) , function(x) nlevels(factor(x)))
>> > result
>>   Group.1 x
>> 1       1 3
>> 2       2 2
>> 3       3 3
>>
>>> But this solution isn't handy for me, as I have many stores and would have
>>> to make a subset of my data for each store before dropping empty factor
>>> levels.
>>>
>>> I can't simply count the lines for each store (N) either, because the same
>>> brand can appear several times in each store (several products for the
>>> same brand, and/or several weeks of observation)
>>>
>>> I used to do this calculation using SAS with:
>>> proc freq data = MyData noprint ; by store ;
>>> tables brand / out = result ;
>>> run ;
>>> (the cool thing was I got a database I can merge with MyData)
>>>
>>> any idea for doing that in R ?
>>>
>>> Thanks in advance,
>>>
>>> Kind regards,
>>>
>>> Sylvain Willart,
>>> PhD Marketing,
>>> IAE Lille, France
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT