Hi All,
Have you ever dreamt of running a full *Mesos* cluster but never had the
time to play with it?
Better, a Mesos cluster running Spark code on *Cassandra*?
Or wait. Maybe I'd also like to have it in *ElasticSearch AND Cassandra*.
Hmm, now we're talking!
*Aerospike* is really fast, I'd also like to have it.
At the same time, for the data lake, you're running *Hadoop* in prod, so
you'd also like to dump your stuff there.
While we are at it, *MongoDB* is nice as well, and our devs could use that
for the node.js apps.
Or maybe you don't want to choose, and you want them ALL at the same time.
You're 20 lines away from making your dream come true. Keep reading.
Spoiler: As you're all Juju users, you're also 35 min away from having it
running for real, in HA, in your preferred cloud.
A month ago I attended Strata+Hadoop in London, where I discovered a pretty
awesome piece of technology called Stratio (www.stratio.com).
Stratio is an open source Big Data analytics platform based on Spark. It
uses a data pipeline built on Kafka and Flume, backed by one or more of
Cassandra, MongoDB, ElasticSearch, HDFS or Aerospike (WIP) for the
resilient storage.
Analytics is provided by running Spark either in Standalone or in a Mesos
cluster, managed by ZooKeeper.
The latest version of Stratio is called Sparkta. It lets you describe data
processing in a very simple JSON language (6 words only) that specifies the
input, the output, the processing to apply, etc. Sparkta is due for GA
sometime this month.
The Stratio deployer is based on Chef and runs from a specific node (Stratio
Admin). Charming the whole thing was therefore pretty easy: the charm is a
wrapper around the Chef-based deployer, as if Juju were only managing the
resources and declaring them to the Chef Server. Each node is then built
according to the relation that's created with the admin node (ZK, Mesos...).
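To make the idea concrete, here is a rough sketch of "the relation decides
what the node becomes". This is not the actual charm code: the relation names
and role labels are made up for the example.

```python
# Hypothetical mapping from the relation a node joins with the admin node
# to the role the Chef-based deployer is asked to build on that node.
# None of these names come from the real charms.
ROLE_BY_RELATION = {
    "zookeeper": "zookeeper",
    "mesos-master": "mesos-master",
    # Storage nodes also run a Mesos slave, for data locality.
    "cassandra": "cassandra+mesos-slave",
}


def role_for(relation_name):
    """Return the role to build for a node, given the relation it joined."""
    try:
        return ROLE_BY_RELATION[relation_name]
    except KeyError:
        raise ValueError("no role defined for relation %r" % relation_name)
```

In the real setup, that decision is made by the admin node and applied by
Chef; the charm only passes the information along.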
I also designed 4 reference architectures, each based on one of the storage
backends. Each reference arch has:
* 1x Stratio Admin (there is no HA yet)
* 3x ZooKeeper
* 2x Mesos Master
* 3x storage instances, also running Mesos Slaves for data locality. For
HDFS, it's actually 8 nodes (3x data, 3x journal, 2x name)
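If you want to size your cloud quota beforehand, the list above adds up as in
the small sketch below. It assumes that every unit gets its own machine; the
dictionary keys are just labels picked for the example, not actual charm or
service names.

```python
# Unit counts taken from the reference architecture list above.
COMMON = {"stratio-admin": 1, "zookeeper": 3, "mesos-master": 2}

# HDFS needs more storage nodes than the other backends.
HDFS_UNITS = {"data": 3, "journal": 3, "name": 2}


def total_nodes(backend):
    """Total machine count for one reference architecture."""
    storage = sum(HDFS_UNITS.values()) if backend == "hdfs" else 3
    return sum(COMMON.values()) + storage


for backend in ("cassandra", "hdfs"):
    print(backend, total_nodes(backend))
```

That gives 9 machines for a non-HDFS backend and 14 for HDFS.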
The code repositories live on GitHub, but I also push each version to
Launchpad in my personal namespace (samuel-cozannet).
* Bundles: https://github.com/SaMnCo/bundle-stratio
* Charms:
  * Admin: https://github.com/SaMnCo/charm-stratio-admin
  * Node: https://github.com/SaMnCo/charm-stratio-node
* Discussion tracker:
https://groups.google.com/forum/?hl=fr#!topic/stratio-admin/KCth-xqZdM4
Next Steps:
* Clean up the code, make it faster (~35min deployment for now, should use
the framework to speed that up)
* Add a demo use case, with Spark code that runs out of the box
* Charm Sparkta when it's ready. There is little documentation yet as the
project itself is really young. I'll be working with Stratio to make it
happen, hopefully supported by them over time.
* Charm Sparkta dashboard that shows results of analytics
Any feedback/questions are more than welcome. I hope you'll find this
platform or some of its components useful. The Stratio people are very nice
and answer questions quickly, so don't hesitate to reach out to them.
Best,
Samuel
--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu http://ubuntu.com / Canonical UK LTD http://canonical.com / Juju
https://jujucharms.com
samuel.cozan...@canonical.com
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23