Hi,
I've looked at a bunch of things to get a handle on our users, growth, and
what some other Apache projects have for committers and community.
Trafodion is a database project, so I went looking for real data,
everything from participation on our email lists (and new posts there) to
Jira activity to GitHub forks, pulls, and commits. I also monitor some
more fanciful stats, looking for references to Trafodion on Twitter,
Stack Overflow, etc.
As far as email list activity goes, I use the data from the mailing list
archive. The user list was very quiet (fewer than 20 emails total) from
when Trafodion started incubating through December. That's not very
inviting - users who drove by to check us out didn't see much activity,
even though there was a lot going on. So I don't pay too much attention to
data in that range. Since then, our user list has shown a big jump in
usage, slightly cannibalizing the dev list.
Here are the numbers I have for Jan/Feb/March. Sorry for the funky ascii
formatting, but mailing lists don't do attachments and tables very well:
User List:
MON      Total Posts  Distinct  Non-Esgyn
                      Posters   Posters
=========================================
JAN2016  19           12        2
FEB2016  291          42        6
MAR2016  126          25        1
Dev List:
MON      Total Posts  Distinct  Non-Esgyn
                      Posters   Posters
=========================================
DEC2015  243          29        6
JAN2016  199          24        3
FEB2016  181          24        4
MAR2016  200          31        4
Note that Dec2015 was a release month and the Non-Esgyn posters were mostly
IPMC posters helping guide our release on things like licensing.
So we're seeing some additional participation but it's still heavily
dominated by Esgyn.
I count distinct posters by email address, so posters that use two
different emails count twice.
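For anyone who wants to reproduce counts like these from a list archive, here is a minimal Python sketch. It assumes the archive is available as an mbox file and that "Non-Esgyn" can be approximated by the sender's email domain; both are assumptions for illustration, not necessarily how the numbers above were produced. The function names, sample addresses, and domains are hypothetical.

```python
import mailbox
from collections import defaultdict
from email.utils import parseaddr, parsedate_to_datetime


def headers_from_mbox(path):
    """Yield (From, Date) header strings for each message in an mbox file."""
    for msg in mailbox.mbox(path):
        yield str(msg.get("From", "")), str(msg.get("Date", ""))


def monthly_poster_stats(headers, company_domain):
    """Return {month: (total posts, distinct posters, posters outside company_domain)}.

    Posters are distinguished by lowercased email address only, so one person
    posting from two addresses is counted twice.
    """
    posts = defaultdict(int)
    posters = defaultdict(set)
    for from_hdr, date_hdr in headers:
        if not from_hdr or not date_hdr:
            continue  # skip messages with missing headers
        address = parseaddr(from_hdr)[1].lower()
        try:
            month = parsedate_to_datetime(date_hdr).strftime("%Y-%m")
        except (TypeError, ValueError):
            continue  # skip messages with an unparseable Date header
        posts[month] += 1
        posters[month].add(address)

    stats = {}
    for month in sorted(posts):
        # Assumption: affiliation is inferred from the address domain.
        outside = {a for a in posters[month] if not a.endswith("@" + company_domain)}
        stats[month] = (posts[month], len(posters[month]), len(outside))
    return stats


# Made-up sample headers; the addresses and domains are placeholders.
sample = [
    ("Alice <alice@corp.example>", "Mon, 04 Jan 2016 10:00:00 +0000"),
    ("ALICE@corp.example", "Tue, 05 Jan 2016 11:00:00 +0000"),
    ("Alice <alice@home.example>", "Wed, 06 Jan 2016 12:00:00 +0000"),
    ("Bob <bob@other.example>", "Wed, 03 Feb 2016 09:30:00 +0000"),
]
stats = monthly_poster_stats(sample, "corp.example")
for month, (total, distinct, outside) in stats.items():
    print(month, total, distinct, outside)
```

With a real archive, the same function could be fed from `headers_from_mbox("path/to/archive.mbox")`, where the path is a placeholder. Note how the second address for Alice counts as a separate (and outside) poster, which is the double-counting caveat mentioned above.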
We have Google Analytics on the newly redesigned website. It shows similar
numbers of hits between new users and returning users, but I'm not sure how
significant that is, since many returning users from Esgyn don't need to
re-hit the website.
Still, data is data, and here's a sample for the period from 29Feb through
today, 29Mar:
Metric                New User  Returning User  Total
=====================================================
Sessions              885       895             1780
% New Sessions        100%      0%              49.72%
Bounce Rate           60%       48.83%          54.38%
Pages/Session         2.09      2.39            2.24
Avg Session Duration  02:01     02:57           02:29
And so on.
But one thing I've learned over the years is that numbers are just....
numbers. These are nice (and I have plenty more), but the real question
is, "what's a good score?" What's typical for Apache projects for
committer distribution? What's typical for user list activity?
I started with the first question: Where do committers come from and what's
their distribution? I used the Apache committer lists and the websites
that indicated committer affiliation. This wasn't perfect: Some projects
don't have committer affiliation; I can't trust others to be perfectly
up-to-date. Further, it doesn't indicate committer activity. Still, it
gives some targets.
After I started, I refined the data a little bit by looking for projects
similar to Trafodion along a couple of possible vectors: projects in data
management or the Hadoop/Big Data ecosystem, and projects that recently
graduated. The latter category is particularly interesting to me because I
would expect more diversity of committers over time, if only because
developers move around.
I was not able to collect data on currently incubating projects because the
list of committers I worked from on ASF did not include incubating projects
in the phonebook, though the reports have them and many project websites
have them. I was more interested in projects that climbed the mountain
we're trying to climb.
Here's some of the data I collected back in February:
Trafodion:
ORG                   Count  Pct
================================
Esgyn                 10     66.67%
orrtiz.com            1      6.67%
Unavailable/Inactive  4      26.67%
Total                 15
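The percentages in these tables are just each organization's committer count divided by the project total. As a quick sanity check, here is a small Python sketch using the Trafodion counts above; the function name is made up for illustration.

```python
def affiliation_shares(counts):
    """Return each organization's share of committers as a percentage."""
    total = sum(counts.values())
    return {org: round(100 * n / total, 2) for org, n in counts.items()}


# Committer counts taken from the Trafodion table in this message.
trafodion = {"Esgyn": 10, "orrtiz.com": 1, "Unavailable/Inactive": 4}

shares = affiliation_shares(trafodion)
largest_org = max(shares, key=shares.get)

for org, pct in shares.items():
    print(f"{org:<22}{trafodion[org]:>3}  {pct:.2f}%")
print("Largest single-organization share:", largest_org, shares[largest_org])
```

The same function can be pointed at any of the other per-project counts to find the largest single-company share.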
HBase:
========================
Cloudera        12  26%
Continuuity     1   2%
Dropbox         1   2%
Explorys        1   2%
Facebook        9   19%
Hortonworks     7   15%
IBM             1   2%
Intel           2   4%
Salesforce.com  3   6%
Scaled Risk     1   2%
Taobao          1   2%
unaffiliated    1   2%
WANdisco        1   2%
Xiaomi          4   9%
Yahoo!          1   2%
Yuantiku        1   2%
Formatting this is getting crazy, and it's getting late since I was up early
travelling. I'll just C&P, with my apologies for the alignment.
Ignite: Graduated Sept 2015
ChronoTrack         1   4%
CyberAgent, Inc.    1   4%
Engiweb Security    1   4%
Evosent Consulting  1   4%
Fitech Source       1   4%
GridGain            14  58%
Pivotal             1   4%
Shoutlet            1   4%
Trend Micro         1   4%
WANdisco            2   8%
Grand Total         24
Calcite: Graduated Nov 2015
Dremio       1   7%
Hortonworks  7   47%
Intel        1   7%
MapR         3   20%
NetCracker   1   7%
NGData       1   7%
Salesforce   1   7%
Grand Total  15
Spark:
Alibaba                            1   2%
Bizo                               1   2%
ClearStory Data                    1   2%
Cloudera                           4   9%
Databricks                         15  34%
Databricks, MIT                    1   2%
Facebook                           1   2%
Hortonworks                        1   2%
IBM                                1   2%
Intel                              2   5%
Mxit                               1   2%
Netflix                            1   2%
NTT Data                           1   2%
Quantifind                         1   2%
QuestTec B.V.                      1   2%
Tachyon Nexus                      1   2%
UC Berkeley                        5   11%
University of Michigan, Ann Arbor  1   2%
Webtrends                          1   2%
Yahoo!                             3   7%
Grand Total                        44
I have a spreadsheet with a bunch more companies that I'm happy to send to
anyone who wants it - the data was all gleaned publicly.
The upshot from what I saw was that even recently graduated projects are
typically in the 50-60% range of committers from a single company (and I
would guess they are moving away from that as a part of the Apache Way).
The largest percent I saw was 76% on the Ambari project.
So that's some of the user data/growth data I have. Apparently, I'm more
of a data junky than I thought....
-Carol P.
---------------------------------------------------------------
Email: [email protected]
Twitter: @CarolP222
---------------------------------------------------------------
On Tue, Mar 29, 2016 at 6:57 PM, Andrew Purtell <[email protected]> wrote:
> On Tue, Mar 29, 2016 at 10:01 AM, Pierre Smits <[email protected]>
> wrote:
>
> > A
> > distribution with Apache only elements (Hadoop, HBase, Zookeeper, Ambari,
> > etc) would surely be a nice-to-have, and also a means to show
> cross-selling
> > Apache products that could lead to cross-pollination (adoption and
> > community growth wise).
> >
>
> That's known as Apache Bigtop.
>
>
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>