Re: [PROPOSAL] DLab for Apache Incubator

2018-08-14 Thread P. Taylor Goetz
If there’s no further discussion, I will call for a VOTE tomorrow.

-Taylor

> On Aug 6, 2018, at 10:47 AM, P. Taylor Goetz wrote:
> 
> I would like to propose DLab as an Apache Incubator project.
> 
> The text of the proposal can be found below as well as on the Incubator wiki:
> 
> https://wiki.apache.org/incubator/DLabProposal
> 
> We are seeking additional mentors and would welcome anyone who would like to 
> volunteer.
> 
> -Taylor
> 
> 
> = DLab Proposal =
> 
> == Abstract ==
> DLab is a platform for creating self-service, exploratory data science 
> environments in the cloud using best-of-breed data science tools.
> 
> DLab includes a self-service web console, used to create and manage 
> exploratory environments. It allows teams to spin up analytical environments 
> with just a single click of a mouse. Once established, the environment can be
> managed by the analytical team itself through a simple, easy-to-use
> web-based interface.
> 
> == Proposal ==
> In order to work effectively, data scientists rely on a varying suite of 
> analytics tools that are readily available. However, many of those tools are 
> non-trivial to set up in terms of hardware provisioning, software 
> installation, configuration, and deployment. Setting up a collaborative, 
> multi-tenant development environment for data scientists consumes substantial 
> IT and DevOps resources, as well as time. These factors often combine to 
> hinder the agility and effectiveness of data science teams within an 
> organization. Current solutions are largely closed source and/or proprietary, 
> and committing to a given solution introduces the potential for vendor 
> lock-in.
> 
> EPAM Systems developed DLab in response to the lack of open source, 
> permissively licensed solutions to better enable data science workflows. The
> ALv2 was selected to encourage open development and user adoption. DLab was 
> open sourced on Dec 29, 2016 and is under active development with support 
> from EPAM Systems.
> 
> We believe DLab is a unique solution with no current open source equivalent. 
> Our primary goals of incubation are to grow and diversify the DLab community 
> to ensure its long-term sustainability.
> 
> == Rationale ==
> DLab is a platform that provides data scientists with the ability to 
> self-provision, without IT support, exploratory and production environments 
> with their preferred set of tools installed and pre-configured. Tool options 
> include, but are not limited to:
> 
> * Apache Spark
> * Apache Flink (planned)
> * Apache Zeppelin
> * Jupyter
> * TensorFlow + Jupyter
> * Deep Learning + Jupyter
> 
> DLab leverages cloud computing providers for virtual hardware provisioning 
> and currently supports the following:
> 
> * Amazon Web Services (AWS)
> * Microsoft Azure
> * Google Cloud Platform (GCP) (under development)
> 
> DLab offers git-based collaboration tools for data scientists and developers 
> and integrates with the following git service providers:
> 
> * GitHub
> * GitLab
> * Bitbucket
> 
> Additionally, DLab includes the option to configure the UnGit tool in an 
> environment to facilitate collaboration.
>
> Finally, DLab integrates closely with many security and SSO offerings, 
> including:
> 
> * LDAP
> * Microsoft Active Directory
> * AWS Identity and Access Management (IAM)
> 
> DLab was designed from the ground up to be a highly configurable, flexible,
> and extensible platform. We believe these qualities will encourage community
> growth by enabling contributors to easily add new integrations and extensions.
> 
> == Initial Goals ==
> The initial goal will be to move the existing codebase to Apache and 
> integrate with the Apache development process and infrastructure. A primary 
> goal of incubation will be to grow and diversify the DLab PPMC. We are well 
> aware that the project community is comprised of individuals from a single 
> company. We aim to change that during incubation.
> 
> == Current Status ==
> As previously mentioned, DLab is under active development at EPAM Systems, 
> and is being used in a number of production deployments:
> 
> * [An investment company] is using DLab as an AWS-based analytics platform
> that gives its data scientists a convenient way to perform multi-tenant
> data analytics. Data scientists can easily provision work environments
> that use Apache Spark and integrate data sources based on Elasticsearch,
> Apache HBase, and Neo4j. This provides a “one-click”, self-service option
> for users to provision an environment with the necessary tools and data.
> 
> * [An electronics manufacturing company] leverages DLab for data quality, 
> data exploration, and analytics. The company’s data scientists leverage DLab 
> to work with data sources that have been transferred to the cloud in order to 
> find new insights on the data, and help the implementation team define 
> requirements for data engineering. The main goal is to increase the 

Re: [PROPOSAL] DLab for Apache Incubator

2018-08-07 Thread P. Taylor Goetz
Thanks Debo,

Traditionally/in practice/lately, those who are not IPMC members are listed as 
“Interested Contributors”. Are you okay with that?

I’ll gladly add it and invite anyone else who is interested to join that group.

As we tried to make clear in the proposal, we understand the current community 
is limited to one company, and the first order of business is to diversify the 
community.

-Taylor

> On Aug 7, 2018, at 9:11 PM, Debo Dutta (dedutta)  
> wrote:
> 
> I am happy to help (either mentor or volunteer). This is a good idea. Have 
> helped out in Apache projects before. 
> 
> Debo 
> 
> Sent from my iPhone
> 

Re: [PROPOSAL] DLab for Apache Incubator

2018-08-07 Thread Debo Dutta (dedutta)
I am happy to help (either mentor or volunteer). This is a good idea. Have 
helped out in Apache projects before. 

Debo 

Sent from my iPhone

> On Aug 7, 2018, at 6:08 PM, P. Taylor Goetz  wrote:
> 
> Henry Saputra (hsaputra) has been added to the mentor list.
> 
> We are still interested in proposal feedback and mentor volunteers.
> 
> -Taylor
> 

Re: [PROPOSAL] DLab for Apache Incubator

2018-08-07 Thread P. Taylor Goetz
Henry Saputra (hsaputra) has been added to the mentor list.

We are still interested in proposal feedback and mentor volunteers.

-Taylor
