Hello,

As a project we currently don't have much insight into how CloudStack
is being used. Surveys tell us a lot, but not everybody fills in the
survey, so we still miss a lot of information.

That's why I've written the Usage Reporting functionality for the
management server, which automatically sends back anonymous information
about a CloudStack deployment.

It's currently in the 'reporter' branch. [0]

By default, every 7 days it generates a JSON document with:
- Hosts (Number, version, type, hypervisor)
- Clusters (Hypervisor and Management type)
- Primary storage (Type and provider)
- Zones (Network type and providers)
- Instances (Number and types)
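To make the contents above a bit more concrete, here is a minimal Python sketch of how such a report could be assembled and serialized. The field names ("hypervisors", "primaryStorage", "management_type" and so on) and the input record shapes are hypothetical and only illustrate the idea; the actual report in the 'reporter' branch is generated by the management server and may be structured differently.

```python
import json
from collections import Counter


def build_report(hosts, clusters, pools, zones, instances):
    """Assemble an anonymous report dict (hypothetical schema).

    Each argument is a list of plain dicts; only aggregate counts and
    types end up in the report, never names, IPs or other identifiers.
    """
    return {
        "hosts": {
            "count": len(hosts),
            "hypervisors": dict(Counter(h["hypervisor"] for h in hosts)),
            "versions": dict(Counter(h["version"] for h in hosts)),
            "types": dict(Counter(h["type"] for h in hosts)),
        },
        "clusters": [
            {"hypervisor": c["hypervisor"], "managementType": c["management_type"]}
            for c in clusters
        ],
        "primaryStorage": [
            {"type": p["type"], "provider": p["provider"]} for p in pools
        ],
        "zones": [
            {"networkType": z["network_type"], "providers": sorted(z["providers"])}
            for z in zones
        ],
        "instances": {
            "count": len(instances),
            "types": dict(Counter(i["type"] for i in instances)),
        },
    }


if __name__ == "__main__":
    report = build_report(
        hosts=[{"hypervisor": "KVM", "version": "4.6.0", "type": "Routing"}],
        clusters=[{"hypervisor": "KVM", "management_type": "CloudManaged"}],
        pools=[{"type": "RBD", "provider": "DefaultPrimary"}],
        zones=[{"network_type": "Advanced", "providers": ["VirtualRouter"]}],
        instances=[{"type": "User"}, {"type": "DomainRouter"}],
    )
    # The JSON document that would be submitted every reporting interval.
    print(json.dumps(report, indent=2, sort_keys=True))
```

Counting with Counter keeps the report aggregate-only, which fits the goal of sending anonymous information.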

This report is not complete yet. I'd like to add more information, but
that will be about the Management Server itself.

The code that generates this report is obviously 100% Open Source, so
end-users can always see exactly how the information was compiled.

I want to discuss this new feature for CloudStack and the possible
implications it might have.

I'm opting for opt-out, so every new or upgraded install of 4.6.0
(master) will have this enabled. Yes, we have to be very explicit in the
Release Notes that this has been added.

Why? It's the small price we as a project ask for using CloudStack. We
want a little bit of information on how CloudStack is being used so that
we can use this to make CloudStack even better.

Turning it off takes just one global setting, and once turned off it
will never turn itself on again.
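A small sketch of the opt-out logic combined with the default 7-day interval, assuming a hypothetical global setting named "usage.report.enabled" (the real key in the branch may be named differently). A missing value counts as enabled, which is what opt-out means; only an explicit "false" disables reporting, and since nothing in this logic ever writes the setting, it stays off once an admin turns it off.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical setting name, used for illustration only.
REPORT_ENABLED_KEY = "usage.report.enabled"
# Default reporting interval described in the mail.
REPORT_INTERVAL = timedelta(days=7)


def should_send_report(settings, last_sent, now):
    """Decide whether a report is due.

    settings  -- dict of global settings (string values)
    last_sent -- datetime of the last successful report, or None
    now       -- current datetime (timezone-aware, same zone as last_sent)
    """
    # Opt-out: absent means enabled; only an explicit "false" disables it.
    enabled = settings.get(REPORT_ENABLED_KEY, "true").strip().lower() != "false"
    if not enabled:
        return False
    # Never reported before: send the first report right away.
    if last_sent is None:
        return True
    return now - last_sent >= REPORT_INTERVAL


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_send_report({}, None, now))
    print(should_send_report({REPORT_ENABLED_KEY: "false"}, None, now))
```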

On the server side there is a Python Flask application [1] (found in the
reporter directory) which stores all incoming information in an
Elasticsearch database. From there, analytics can be gathered on
CloudStack deployments.
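The collector itself is the Flask application in [1]; the sketch below is not that code. It is a dependency-free illustration of the same flow as a plain WSGI app (Flask apps are WSGI apps too): accept a POST to /report, parse the JSON body, stamp it with a receive time, and store it. An in-memory list stands in for the Elasticsearch index so the example runs with only the standard library; the "receivedAt" field name is an assumption.

```python
import json
from datetime import datetime, timezone

# Stand-in for the Elasticsearch index. A real collector would pass each
# document to an Elasticsearch client here instead of appending to a list.
STORED_REPORTS = []


def _respond(start_response, status, body, content_type="text/plain"):
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]


def report_app(environ, start_response):
    """Minimal WSGI collector accepting POST /report with a JSON body."""
    if environ.get("PATH_INFO") != "/report":
        return _respond(start_response, "404 Not Found", b"not found")
    if environ.get("REQUEST_METHOD") != "POST":
        return _respond(start_response, "405 Method Not Allowed", b"POST only")
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    raw = environ["wsgi.input"].read(length)
    try:
        report = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return _respond(start_response, "400 Bad Request", b"invalid JSON")
    if not isinstance(report, dict):
        return _respond(start_response, "400 Bad Request", b"expected a JSON object")
    # Record when the report arrived so deployments can be charted over time.
    report["receivedAt"] = datetime.now(timezone.utc).isoformat()
    STORED_REPORTS.append(report)
    return _respond(start_response, "200 OK", b'{"status": "ok"}', "application/json")
```

For local testing it can be served with the standard library via `wsgiref.simple_server.make_server("", 8080, report_app).serve_forever()`.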

It currently points to http://cs-report.widodh.nl/report which will NOT
be the endpoint when this is merged into master.

For 'production' I want all reports to be submitted to
https://report.cloudstack.apache.org/report.
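On the management server the submission is done in Java; purely to show the shape of the HTTP exchange, here is a Python sketch that POSTs a JSON report to the proposed production endpoint with the standard library's urllib. The Content-Type header and the 30-second timeout are assumptions, not taken from the branch. Building the request is split from sending it so the request can be inspected without network access.

```python
import json
import urllib.request

# Production endpoint proposed in this mail (the branch currently uses a test URL).
REPORT_URL = "https://report.cloudstack.apache.org/report"


def build_report_request(report, url=REPORT_URL):
    """Build (but do not send) an HTTP POST carrying the JSON report."""
    data = json.dumps(report).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_report(report, url=REPORT_URL, timeout=30):
    """Submit the report and return the HTTP status code."""
    req = build_report_request(report, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Using HTTPS for the production endpoint means the anonymous data is at least encrypted in transit.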

For every setup, a unique ID is determined by hashing the first row of
the 'version' table: the version and timestamp in that row are hashed
using SHA256. Using this unique ID we can track changes in deployments
and see how they grow or shrink.
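A sketch of the ID derivation and how it lets us follow one deployment over time. It assumes the version and timestamp strings are simply concatenated before hashing; the exact string the branch feeds into SHA256 (separator, timestamp format) may differ. The report shape in host_count_change is the hypothetical one used for illustration, with the ID stored under an "id" key.

```python
import hashlib


def deployment_id(version, timestamp):
    """SHA256 hex digest of the first 'version' table row.

    Assumption: version and timestamp are concatenated without a separator.
    The hash reveals nothing about the setup beyond being stable per install.
    """
    return hashlib.sha256((version + timestamp).encode("utf-8")).hexdigest()


def host_count_change(previous, current):
    """Change in host count between two reports of the same deployment."""
    if previous["id"] != current["id"]:
        raise ValueError("reports belong to different deployments")
    return current["hosts"]["count"] - previous["hosts"]["count"]
```

Because the first row of the 'version' table is written at install time and never changes, the ID stays stable across upgrades, which is what makes the growth comparison possible.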

Technically this wasn't that hard to implement, but the politics
surrounding it might be the hardest part.

What do others have to say about this? Should there be a VOTE for this
feature to come into CloudStack? Opt-in, opt-out?

Wido

[0]:
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=shortlog;h=refs/heads/reporter
[1]:
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=blob;f=reporter/usage-report-collector.py;h=500a4d284b5172fd93acea08f5460cfff5520855;hb=reporter
