Re: [PROPOSAL] Superset Proposal for Apache Incubator

Jean-Baptiste Onofré Wed, 12 Apr 2017 12:51:34 -0700

Hi Maxime,

The proposal looks interesting.


Just a note,  it's PPMC (not PMC) during incubation.

Are you seeking for other mentor (I see you only have one mentor and onechampion for now) ?


Regards
JB

On 04/12/2017 09:41 PM, Maxime Beauchemin wrote:

Hi all,

We would love feedback on the proposal. Do the veterans on this mailing
list think that the proposal is ready for a vote!?

Thanks,

Max

On Tue, Apr 4, 2017 at 5:26 PM, Luke Han <luke...@gmail.com> wrote:

Hi Jeff,
    This is great project which have been mentioned many times in
community. It looks cool and fun for data works.

    Thanks to proposal Superset to be Apache Incubator Project, please let
me know if there's anything I could help.

    Thanks.
Luke


Best Regards!
---------------------

Luke Han

On Sun, Apr 2, 2017 at 7:45 AM, Jeff Feng <jeff.f...@airbnb.com.invalid>
wrote:

Dear Apache Incubator Community,

We are excited to share our proposal for discussion and feedback for
entering Apache Incubation.  Superset is an enterprise-ready web
application for data exploration, data visualization and dashboarding.

Our Incubation proposal is at the following Wiki as well as copied in the
email below:

https://wiki.apache.org/incubator/SupersetProposal

We have an active Superset community including 400+ members and nearly

topics.  The Google Group can be found below.  We plan to move the
discussion to the ASF:

https://groups.google.com/forum/#!forum/airbnb_superset

Thank you and look forward to the discussion!

Jeff, Max & Alanna




= Superset =

== Abstract ==

Superset is an enterprise-ready web application for data exploration,

data

visualization and dashboarding.

== Proposal ==

Superset is business intelligence (BI) software that helps modern
organizations visualize and interact with their data. Superset enables
users explore data from a variety of databases, assemble beautiful
dashboards and share their findings.  Superset works neatly with all

modern

SQL-speaking databases, and integrates with Druid.io to provide

real-time,

interactive, blazing fast data access to large datasets.

== Background ==

Data is mission critical. To succeed in this era, organizations need to
provide low-friction, intuitive and interactive access to data. It is
paramount for knowledge workers to be capable of answering their own
questions by querying, exploring and visualizing data.

The entire business intelligence industry has pivoted from a model of
centralized top-down platforms driven by IT organizations to self-service
analytics and agile workflows by any user.  This shift unblocks

centralized

service bottlenecks for creating data visualizations while also creating

an

environment that is iterative and fast-moving.  This means that business
intelligence software must also be easy and delightful to use.
Self-service analytics doesn’t mean that admin and governance features

are

not needed.

Modern BI tools provide fine-grain access controls and auditing
capabilities to understand how data is being used.  Superset is a

solution

that delivers on all of these vectors.

The technology stack is also constantly morphing - vendors are struggling
to provide cheap, quick and easy solutions to access data.  Business
intelligence users are finding existing solutions lacking as these

software

products either disregard or react slowly to recent game-changing
technologies like Druid.io, PrestoDB, Apache Drill, Apache Kylin, d3.js,
React.js and iPython’s Jupyter for instance.

== Rationale ==

Business intelligence is more relevant today than at any other point in
history.  Organizations are currently very limited in options for open
source data visualization solutions, especially solutions that are both
self-service and enterprise-ready.  Every company informing their

decisions

with data needs a BI tool.

We believe that Superset will be a strong compliment to existing Apache
Software Foundation technologies by offering scalable user interactions

to

distributed storage and computation solutions.  Users will often find

that

Superset can act as a catalyst for tooling that can visualize the

byproduct

of data and computation infrastructure.

Superset has many key design elements that help fill a gap in current
solutions for organizations:

* Easy, low friction access to data through a simple, web-based data
exploration interface.  Composing charts and dashboards are intuitive.
Eliminating the need to write code or SQL empowers anyone to use it.

* Access to a wide array of rich, interactive data visualization types.

* Enterprise-ready: Integration with different authentication mechanisms
and granular permissions centered around actions and data access.

* Realtime & fast: Superset provides realtime analytics at the speed of
thought on very large datasets when integrated with Druid.io.

* Broad data access: Consume data out of any SQL-speaking relational
database.

* Extensible: Can be extended to talk to many noSQL databases like Apache
Drill, Elastic Search, and other popular database engines.

* Fast loading dashboards with configurable web-scale caching.

* Plug-in framework that enables organizations to build custom analytical
applications with new UI/UX interfaces.

* SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users

with

more flexibility.  SQL Lab integrates with the visualization engine
seamlessly.

== Initial Goals ==

The initial goals of the Superset project are several-fold:

Move the existing codebase to Apache and integrate with the Apache
development process.

Redesign the user interface and interaction model for creating
visualizations/dashboards and connecting to data sources

Build robust support for security and governance of the tool including
popular authorization modules (including Apache Ranger and Apache Sentry)
and a more sophisticated permissions system

Grow the extensibility of the project both in terms of enhanced
connectivity to NoSQL-based data sources and creating a plug-in framework
that enables organizations to build custom analytical applications which
require a new UI/UX

== Current Status ==

By many standards, Superset is already a successful open source project.

As

of March 2017, Superset is officially used in production at about a dozen
companies, has received contributions from over one hundred contributors

on

Github, 1500+ forks, and 12k+ stars.

Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
significant contributions, and expressed their commitment to the project.
The product is feature complete and has been viable for months. It

already

serves as the main interface for consuming data at many companies of
different sizes.

While the product is usable, there’s room for improvement across the

board,

starting with providing a smoother user experience around content

creation,

making sure all features work out-of-the-box on more platforms and
databases, providing better user training guides and videos, having a
predictable release process, and increasing the overall quality of the
Superset releases.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the
requirements in an open forum. Several companies have expressed interest

in

this project, and we intend to invite additional developers to

participate.

We will encourage and monitor community participation so that privileges
can be extended to those that contribute.

=== Community ===

The need for an enterprise-ready data visualization and exploration
platform in the open source community is tremendous.  While Superset is
fairly well known, recognized and used within the Druid.io community,
adoption is currently limited outside of that niche. There is a huge
opportunity to grow the community to hundreds if not thousands of
organizations, and we are hoping that embracing “the Apache way” will
accelerate the growth of our community.

We have already been active at seeking and inviting contributions, and

are

planning to scale the project by investing time and growing the support
structure to grow the community.

=== Core Developers ===

The initial committers for Superset include experienced full stack,
front-end and data engineers:

* Maxime Beauchemin (Airbnb)

* Alanna Scott (Airbnb)

* Bogdan Kyryliuk (Airbnb)

* Vera Liu  (Airbnb)

* Jeff Feng (Airbnb)

* Ashutosh Chauhan (Hortonworks)

* Nishant Bangarwa (Hortonworks)

* Slim Bouguerra (Hortonworks)

* Priyank Shah (Hortonworks)

* Sriharsha Chintalapani (Hortonworks)

* Daniel Dai (Hortonworks)

We realize that additional employer diversity is needed, and we will work
aggressively to recruit developers from additional companies.

=== Alignment ===

The initial committers strongly believe that a system for interactive
visualization of data will gain broader adoption as an open source,
community driven project, where the community can contribute not only to
the core components, but also to a growing collection of connectors,
visualizations and improving integration a all potential data sources.
Superset already integrates closely with Apache Hive, the Hive metastore,
as well as most SQL-speaking databases found in modern data ecosystems.

== Known Risks ==

=== Orphaned Products ===

Superset is a vital component for both visualizing, accessing and
democratizing data at Airbnb.  Also at Hortonworks, Superset is a core
component of the DataFlow product offering.  Thus, the risk of the

project

being orphaned is relatively low.  The project could be at risk if Airbnb
changes their approach for democratizing data or if Hortonworks changes
their strategy in the market.  In such an event, the committers plan to
continue working on the project on their own time, thought the progress
will likely be slower.  We plan to mitigate this risk by recruiting
additional committers.

=== Inexperience with Open Source ===

The initial committers include veteran Apache members (committers and PMC
members) and other developers who have varying degrees of experience with
open source projects. All have been involved with source code that has

been

released under an open source license, and several also have experience
developing code with an open source development process.

=== Homogenous Developers ===

The initial committers are employed by Airbnb Inc., and Hortonworks. We

are

committed to recruiting additional committers from other companies.

=== Reliance on Salaried Developers ===

It is expected that Superset development will occur on both salaried time
and on volunteer time, after hours. The majority of initial committers

are

paid by their employer to contribute to this project. However, they are

all

passionate about the project, and we are confident that the project will
continue even if no salaried developers contribute to the project. We are
committed to recruiting additional committers including non-salaried
developers.

=== Relationships with Other Apache Products ===

To the knowledge of the Initial Committers, there are no direct

competitors

to Superset within the Apache Software Foundation.  That said, Apache
Zeppelin is an indirect competitor, but it solves a different use case.

Apache Zeppelin is a web-based notebook that enables interactive data
analytics. It enables the creation of beautiful data-driven, interactive
and collaborative documents with SQL, Scala and more.  Although a user

can

create data visualizations using this project, it leverages a notebook
style user interfaces and it is geared towards the Spark community where
Scala and SQL co-exist

We look forward to collaborating with those communities, as well as other
Apache communities.

=== An Excessive Fascination with the Apache Brand ===

Superset is solving two huge challenges:

The challenge of enabling every knowledge worker to make data informed
decisions, particularly those who are not deeply skilled at writing SQL.

The challenge of visualizing huge amounts of data interactively and in
real-time

Superset was first developed as a data visualization solution for

Druid.io

as a way to visualize billions of rows of data.  Since then, usage of
Superset has expanded to address data visualization use cases across SQL
speaking data sources as well.

Our rationale for developing Superset as an Apache project is detailed in
the Rationale Section.  We believe that the Apache brand and community
process will help us attract more contributors to this project, and help
grow the footprint of the project through usage at other organizations

and

within other applications.  Establishing consensus among users and
developers will result in a more valuable tool for everyone.

== Documentation ==

References to further reading material:

* [[http://airbnb.io/superset/|Superset Documentation]]

* [[https://medium.com/airbnb-engineering/caravel-airbnb-s-dat
a-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post:  Superset:
Airbnb’s Data Exploration Platform]]

* [[https://medium.com/airbnb-engineering/superset-scaling-dat
a-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog

Post:

 Superset: Scaling Data Access & Visual Insights at Airbnb]]

== Initial Source ==

The origin of the proposed code base can be found at
https://github.com/airbnb/superset.  The code base is primarily in

Python.


== Source and Intellectual Property Submission Plan ==

We do not expect any complications for the submission of the Superset

code

base.  Our code is already in Github and there is only a single code

base.


== External Dependencies ==

List of Python packages, from the Python Package Index (Pypi):

* boto3

* celery

* cryptography

* flask-appbuilder

* flask-cache

* flask-migrate

* flask-script

* flask-sqlalchemy

* flask-testing

* humanize

* gunicorn

* markdown

* pandas

* parsedatetime

* pydruid

* PyHive

* python-dateutil

* requests

* simplejson

* six

* sqlalchemy

* sqlalchemy-utils

* sqlparse

* thrift

* thrift-sasl

* werkzeug

List of Javascript packages, from NPM:

* autobind-decorator

* bootstrap

* bootstrap-datepicker

* brace

* brfs

* cal-heatmap

* classnames

* d3

* d3-cloud

* d3-sankey

* d3-scale

* d3-tip

* datamaps

* datatables-bootstrap3-plugin

* datatables.net-bs

* font-awesome

* gridster

* immutability-helper

* immutable

* jquery

* lodash.throttle

* mapbox-gl

* moment

* moments

* mustache

* nvd3

* react

* react-ace

* react-bootstrap

* react-bootstrap-table

* react-dom

* react-draggable

* react-gravatar

* react-grid-layout

* react-map-gl

* react-redux

* react-resizable

* react-select

* react-syntax-highlighter

* reactable

* redux

* redux-localstorage

* redux-thunk

* shortid

* style-loader

* supercluster

* topojson

* victory

* viewport-mercator-project

== Cryptography ==

The proposal does not include cryptographic code.

== Required Resources ==

=== Mailing List ===

There is a current mailing list as a Google Group “airbnb_superset” that

we

are planning on deprecating as the Apache.org become ready to serve our
community.

* superset-private

* superset-dev

* superset-user

=== Subversion Directory ===

Git is the preferred source control system.

http://svn.apache.org/repos/as

f/incubator/superset

== Git Repository ==

Git is the preferred source control system, we’re assuming
https://github.com/apache/incubator-superset based on the naming scheme

== Issue Tracking ==

JIRA Superset (SUPERSET). If possible, we’d like to use Github issues &

PRs

to manage our project as much as possible. It’s been said that there are
ways to keep Github’s issues in sync with Jira, allowing us to get best

of

both worlds. If that is not possible, we will comply to using Jira.

== Other Resources ==

We currently use a set of Github integrated services that are free to the
open source community, like Travis-ci, Code Climate, Coveralls,
Landscape.io, Requires.io, david-dm and Gitter. We would like to keep

using

these services as they allow us to scale contributions and optimize our
development flows. These services require some elevated rights on the
Github repository in order to set up or tune and we would like for the
committers to have the required rights.


== Initial Committers ==

* Maxime Beauchemin <maxime.beauche...@airbnb.com> - PMC & Committer

* Alanna Scott <alanna.sc...@airbnb.com> - PMC & Committer

* Bogdan Kyryliuk <b.kyryl...@gmail.com> - PMC & Committer

* Vera Liu <vera....@airbnb.com> - Committer

* Jeff Feng <jeff.f...@airbnb.com> - PMC & Committer

* Ashutosh Chauhan <hashut...@apache.org> - Mentor & Committer

* Nishant Bangarwa <nbanga...@hortonworks.com> - PMC & Committer

* Slim Bouguerra <sbougue...@hortonworks.com> - Committer

* Priyank Shah <ps...@hortonworks.com> - Committer

* Harsha Chintalapani <schintalap...@hortonworks.com> - Committer

* Daniel Dai <da...@apache.org> - Champion & Committer

== Affiliations ==

The initial committers are employees of Airbnb Inc. and Hortonworks.

== Sponsors ==

=== Champion ===

Daniel Dai <da...@apache.org>

=== Nominated Mentors ===

Ashutosh Chauhan <hashut...@apache.org>

=== Sponsoring Entity ===

Incubator PMC


--

*Jeff Feng*
Product Manager
m: (949)-610-5108 <(949)%20610-5108>
twitter: @jtfeng


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org

Re: [PROPOSAL] Superset Proposal for Apache Incubator

Reply via email to