Hi Carol, all,

You are right, numbers without context mean nothing. It is all about correlation. Still, one has to start measuring before any insights can be drawn. Measuring must not be the end goal, though: it must all be seen in relation to adoption, community growth and health.

Best regards,

Pierre Smits

ORRTIZ.COM <http://www.orrtiz.com>
OFBiz based solutions & services

OFBiz Extensions Marketplace
http://oem.ofbizci.net/oci-2/

On Wed, Mar 30, 2016 at 10:13 AM, Carol Pearson <[email protected]> wrote:

> Hi,
>
> I've looked at a bunch of things to get a handle on our users, growth, and
> what some other Apache projects have for committers and community.
> Trafodion is a database project, so I went looking for real data,
> everything from participation on our email lists (and new posts there) to
> Jira activity to Github forks and pulls and commits. I also monitor some
> more fanciful stats, looking for references to Trafodion on Twitter,
> stackoverflow, etc.
>
> As far as email list activity goes, I use the data from the mailing list
> archive. The user list was very quiet (fewer than 20 emails total from
> when Trafodion started incubating through December). That's not very
> inviting - our users who drove by to check us out didn't see much activity,
> even though there was a lot. So I don't pay too much attention to data in
> that range. Our user list has shown a big jump in usage since that period,
> slightly cannibalizing the dev list.
>
> Here are the numbers I have for Jan/Feb/March. Sorry for the funky ascii
> formatting, but mailing lists don't do attachments and tables very well:
>
> User List:
>
> MON      Total Posts  Distinct  Non-Esgyn
>                       Posters   Posters
> =========================================
> JAN2016  19           12        2
> FEB2016  291          42        6
> MAR2016  126          25        1
>
> Dev List:
>
> MON      Total Posts  Distinct  Non-Esgyn
>                       Posters   Posters
> =========================================
> DEC2015  243          29        6
> JAN2016  199          24        3
> FEB2016  181          24        4
> MAR2016  200          31        4
>
>
> Note that Dec2015 was a release month and the Non-Esgyn posters were mostly
> IPMC posters helping guide our release with respect to things like
> licensing guidance.
>
> So we're seeing some additional participation but it's still heavily
> dominated by Esgyn.
>
> I count distinct posters by email address, so posters that use two
> different emails count twice.
>
> We have google analytics on the newly-redesigned website. It shows similar
> numbers of hits between new users and returning users, but I'm not sure how
> significant that is, since many returning users from Esgyn don't need to
> re-hit the website.
>
> Still, data is data, and here's a sample for the period from 29Feb through
> today, 29Mar:
>
> Metric                New User  Returning User  Total
> =====================================================
> Sessions              885       895             1780
> %New Sessions         100%      0%              49.72%
> Bounce Rate           60%       48.83%          54.38%
> Pages/Session         2.09      2.39            2.24
> Avg Session Duration  02:01     02:57           02:29
>
> And so on.
>
>
> But one thing I've learned over the years is that numbers are just....
> numbers. These are nice (and I have plenty more), but the real question
> is, "what's a good score?" What's typical for Apache projects for
> committer distribution? What's typical for user list activity?
>
> I started with the first question: Where do committers come from and what's
> their distribution? I used the Apache committer lists and the websites
> that indicated committer affiliation. This wasn't perfect: Some projects
> don't have committer affiliation; I can't trust others to be perfectly
> up-to-date. Further, it doesn't indicate committer activity. Still, it
> gives some targets.
>
> After I started, I refined the data a little bit by looking for projects
> similar to Trafodion along a couple of possible vectors: data management
> or Hadoop/Big Data ecosystem and recently graduated. The latter category
> is particularly interesting to me because I would expect more diversity of
> committers over time, if only because developers move around.
>
> I was not able to collect data on currently incubating projects because the
> list of committers I worked from on ASF did not include incubating projects
> in the phonebook, though the reports have them and many project websites
> have them. I was more interested in projects that climbed the mountain
> we're trying to climb:
>
> Here's some of the data I collected back in February
>
> Trafodion:
> ORG                   Count  Pct
> ==================================
> Esgyn                 10     66.67%
> orrtiz.com            1      6.67%
> Unavailable/Inactive  4      26.67%
> Total                 15
>
> HBase:
> ==========================
> Cloudera        12     26%
> Continuuity     1      2%
> Dropbox         1      2%
> Explorys        1      2%
> Facebook        9      19%
> Hortonworks     7      15%
> IBM             1      2%
> Intel           2      4%
> Salesforce.com  3      6%
> Scaled Risk     1      2%
> Taobao          1      2%
> unaffiliated    1      2%
> WANdisco        1      2%
> Xiaomi          4      9%
> Yahoo!          1      2%
> Yuantiku        1      2%
>
>
> Formatting this is getting crazy and it's getting late since I was up early
> travelling. I'll just C&P and my apologies for the alignment
>
> Ignite: Graduated Sept 2015
> ChronoTrack         1   4%
> CyberAgent, Inc.    1   4%
> Engiweb Security    1   4%
> Evosent Consulting  1   4%
> Fitech Source       1   4%
> GridGain            14  58%
> Pivotal             1   4%
> Shoutlet            1   4%
> Trend Micro         1   4%
> WANdisco            2   8%
> Grand Total         24
>
> Calcite: Graduated Nov 2015
> Dremio       1   7%
> Hortonworks  7   47%
> Intel        1   7%
> MapR         3   20%
> NetCracker   1   7%
> NGData       1   7%
> Salesforce   1   7%
> Grand Total  15
> > Or
> > Count
> > Spark:
> > Alibaba                            1   2%
> > Bizo                               1   2%
> > ClearStory Data                    1   2%
> > Cloudera                           4   9%
> > Databricks                         15  34%
> > Databricks, MIT                    1   2%
> > Facebook                           1   2%
> > Hortonworks                        1   2%
> > IBM                                1   2%
> > Intel                              2   5%
> > Mxit                               1   2%
> > Netflix                            1   2%
> > NTT Data                           1   2%
> > Quantifind                         1   2%
> > QuestTec B.V.                      1   2%
> > Tachyon Nexus                      1   2%
> > UC Berkeley                        5   11%
> > University of Michigan, Ann Arbor  1   2%
> > Webtrends                          1   2%
> > Yahoo!                             3   7%
> > Grand Total                        44
> >
> > I have a spreadsheet with a bunch more companies. I'll send it to anyone who
> > asks - the data was all gleaned publicly.
>
> > Anyway, the upshot from what I saw was that even recently graduated projects
> > had 50-60% at most of active committers from one company (and I would guess
> > are moving away from that as a part of the Apache way).
> >
>
> I have a spreadsheet that I'm happy to send to anyone who wants it - the
> data was all gleaned publicly.
>
> The upshot from what I saw was that even recently graduated projects are
> typically in the 50-60% range of committers from a single company. The
> largest percent I saw was 76% on the Ambari project.
>
> So that's some of the user data/growth data I have. Apparently, I'm more
> of a data junky than I thought....
>
> -Carol P.
>
>
> ---------------------------------------------------------------
> Email: [email protected]
> Twitter: @CarolP222
> ---------------------------------------------------------------
>
> On Tue, Mar 29, 2016 at 6:57 PM, Andrew Purtell <[email protected]>
> wrote:
>
> > On Tue, Mar 29, 2016 at 10:01 AM, Pierre Smits <[email protected]>
> > wrote:
> >
> > > A
> > > distribution with Apache only elements (Hadoop, HBase, Zookeeper, Ambari,
> > > etc) would surely be a nice-to-have, and also a means to show cross-selling
> > > Apache products that could lead to cross-pollination (adoption and
> > > community growth wise).
> > >
> >
> > That's known as Apache Bigtop.
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
