Hi all,

As those of you who have been around for a while might know, I am very
interested in community metrics, and I think it's important that those
metrics be as visible and transparent as possible.

I've been working on what needs to happen to get a community dashboard
going, with a web page updated in real time (or close to it) with the
data from the various sources. I'd like to support some more complex
querying & reports, and ideally allow integration with office suites too.

The goal is to save time for the people who have been manually extracting
this data monthly for Dawn (and of course to save Dawn some time in
preparing her monthly reports), and also to ensure that everyone has
timely access to this data in a convenient form.

You can follow the progress and help out on the wiki:
http://wiki.meego.com/Metrics/Dashboard

The basic idea is to automate this:

* Gather data from various data sources into a usable form:
 * Drupal (user accounts, active users, ...)
 * git (commits, active developers, ...)
 * Mailing lists (emails per list, per user, per month, ...)
 * Wiki (popular pages, active users, total edits, ...)
 * Bugzilla (new bugs, closed bugs, new comments, patch proposals, patch
review, ...)
 * Forums (new posts, new users, active users, ...)
 * IRC (active users, total activity, across various channels)
 * Transifex (new translations, active translators, ...)
 * Community OBS (uploads, active users, popular downloads, ...)
 * SDK downloads and other web analytics (would be nice to get)
* Transform the data into useful metrics via queries
* Present the data in graphical form showing the evolution of the
various stats from week to week or month to month
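
As a rough illustration of the three steps above (gather, transform,
present), here is a minimal Python sketch. It is not tied to Pentaho or
Jaspersoft; the function names, the (date, user) record shape and the
sample data are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import date


def monthly_metrics(events):
    """Transform step: turn (date, user) event pairs from one source
    into per-month totals and active-user counts."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for day, user in events:
        month = day.strftime("%Y-%m")
        totals[month] += 1
        users[month].add(user)
    return {
        month: {"total": totals[month], "active_users": len(users[month])}
        for month in sorted(totals)
    }


def text_chart(metrics, key="total"):
    """Present step: a crude text bar chart, one line per month."""
    lines = []
    for month, row in metrics.items():
        lines.append(f"{month} {'#' * row[key]} {row[key]}")
    return "\n".join(lines)


# Gather step (hypothetical data): in practice these pairs would come
# from Bugzilla, git, the forums, and so on.
sample = [
    (date(2010, 9, 3), "alice"),
    (date(2010, 9, 17), "bob"),
    (date(2010, 9, 20), "alice"),
    (date(2010, 10, 1), "carol"),
]

if __name__ == "__main__":
    print(text_chart(monthly_metrics(sample)))
```

The same (date, user) shape would work for commits, bug comments or
forum posts, which is why the sketch keeps the transform step
independent of any one data source.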


The usual way of doing this is to use some kind of Business Intelligence
platform to suck data in using an ETL engine, store it in some kind of
local data store, create and store reports using a Reports engine, and
present the data in a dashboard which can be a thin or thick client.

The two open source BI engines worth considering are Pentaho and Jaspersoft.

So far, I'm leaning towards Pentaho, primarily because there seems to be
better support for dashboards within the community, but I am open to
input & any experiences that people might have.

I am also open to any input people might have in helping integrate some
of these services together. Right now, I have local instances of much
(but not all) of the software, and we will need to figure out
interchange formats for everything - anything with a MySQL database
should be straightforward, but for things like downloads, web analytics,
mailing lists, IRC and the forum, where the data will be going through a
different format, the integration might be a little trickier.
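
To make the interchange question a bit more concrete, here is one
possible sketch for the mailing list case: each message is reduced to a
flat row that could be written out as CSV and loaded by an ETL tool.
The column names (source, timestamp, user, kind) and the helper names
are assumptions for illustration, not an agreed format.

```python
import csv
import io
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical common columns shared by all non-database sources.
FIELDS = ["source", "timestamp", "user", "kind"]


def mail_to_row(raw_message, list_name):
    """Reduce one raw RFC 2822 message to a flat interchange row."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    sent = parsedate_to_datetime(msg["Date"])
    return {
        "source": "mail:" + list_name,
        "timestamp": sent.isoformat(),
        "user": address.lower(),
        "kind": "email",
    }


def rows_to_csv(rows):
    """Serialise rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up message for the example.
raw = (
    "From: Example User <User@example.org>\n"
    "Date: Mon, 04 Oct 2010 10:15:00 +0000\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(rows_to_csv([mail_to_row(raw, "meego-community")]))
```

IRC logs, forum posts and download logs could be mapped onto the same
columns by their own small converters, so the BI side only has to
understand one format besides the MySQL databases.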

For the time being, I will be investigating Kettle and JasperETL (which,
as far as I can tell, is just a rebranded Talend Open Studio) and
figuring out how to get data into the BI server from the various apps.

So what specific feedback would I like?

* What data would people find useful to know about the various
meego.com services?
* Anyone have experience with Jasper & Pentaho and can give sensible
feedback on the advantages & disadvantages of each?
* Does anyone want to help with the integration of the various services,
once I get a public BI platform installed?
* Anyone think that I'm smoking crack with this basic architecture, and
care to suggest something simpler/quicker/easier?

Thanks!
Dave.

-- 
Email: dne...@maemo.org
Jabber: bo...@jabber.org

_______________________________________________
MeeGo-community mailing list
MeeGo-community@meego.com
http://lists.meego.com/listinfo/meego-community
