Re: [DISCUSS] SystemML Incubator Proposal
On Sun, Oct 25, 2015 at 11:02 PM, Henry Saputra wrote:

> Thanks Luciano, I got my answer but would probably helped to
> distinguish option to run it as Apache Hadoop MapReduce or YARN
> application, and with abstraction of Apache Spark.

Thanks for the feedback. I have updated the proposal to say Hadoop MapReduce explicitly instead of just mentioning Hadoop.

> Looking forward possibility of having it run with Apache Flink :)
>
> - Henry

--
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
Re: [DISCUSS] SystemML Incubator Proposal
Thanks Luciano, I got my answer, but it would probably have helped to distinguish the option to run it as an Apache Hadoop MapReduce or YARN application, and with the abstraction of Apache Spark.

Looking forward to the possibility of having it run with Apache Flink :)

- Henry

On Sat, Oct 24, 2015 at 12:32 PM, Luciano Resende wrote:

> On Sat, Oct 24, 2015 at 11:31 AM, Henry Saputra wrote:
>
>> I have one question about the proposal, it keep mentioning that it
>> could run on "Hadoop or Spark", but technically Spark can run on
>> Hadoop YARN.
>> Was it trying to say it could be run in Hadoop YARN (maybe via
>> MapReduce) or Spark?
>
> Exactly, if this is a point of confusion i can clarify it on the proposal.
>
>> I would love to see if the execution abstraction is well enough
>> defined to be able to run it on the others distributed framework like
>> Flink or Tez (maybe via Crunch?)
>
> Yes, this is definitely a possibility, we have talked about Flink before as
> a possible next runtime.
>
>> Thanks,
>>
>> Henry
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
Re: [DISCUSS] SystemML Incubator Proposal
Hello Luciano,

I recently heard the presentation on SystemML at the Apache BigData conference and it sounds exciting. Looking forward to Apache Incubation.

Regards,
Seshu Adunuthula

On 10/23/15, 5:34 PM, "Luciano Resende" wrote:

>On Fri, Oct 23, 2015 at 5:30 PM, Henry Saputra wrote:
>
>> Hi Luciano,
>>
>> Good proposal, but looks like
>> https://wiki.apache.org/incubator/SystemM does not exist?
>
>Good catch, it's a typo on the original link and it's missing the L at the
>end, here is the correct link
>
>https://wiki.apache.org/incubator/SystemML
>
>> Also, Reynold Xin and Patrick Wendell are not member of IPMCs so I
>> don't they could be mentors of this project, yet.
>>
>> They can ask to be member of IPMCs since both are already member of
>> ASF. But for now need to remove it from proposal.
>
>Yes, they are aware of the requirement, and this will be fixed before we
>call a vote on the proposal.
>
>> - Henry
Re: [DISCUSS] SystemML Incubator Proposal
I have one question about the proposal: it keeps mentioning that it could run on "Hadoop or Spark", but technically Spark can run on Hadoop YARN. Was it trying to say it could be run on Hadoop YARN (maybe via MapReduce) or on Spark?

I would love to see if the execution abstraction is well enough defined to be able to run it on other distributed frameworks like Flink or Tez (maybe via Crunch?).

Thanks,

Henry

On Fri, Oct 23, 2015 at 4:34 PM, Luciano Resende wrote:

> We would like to start a discussion on accepting SystemML as an Apache
> Incubator project.
>
> The proposal is available at :
> https://wiki.apache.org/incubator/SystemM
Re: [DISCUSS] SystemML Incubator Proposal
On Sat, Oct 24, 2015 at 11:31 AM, Henry Saputra wrote:

> I have one question about the proposal, it keep mentioning that it
> could run on "Hadoop or Spark", but technically Spark can run on
> Hadoop YARN.
> Was it trying to say it could be run in Hadoop YARN (maybe via
> MapReduce) or Spark?

Exactly. If this is a point of confusion, I can clarify it in the proposal.

> I would love to see if the execution abstraction is well enough
> defined to be able to run it on the others distributed framework like
> Flink or Tez (maybe via Crunch?)

Yes, this is definitely a possibility; we have talked about Flink before as a possible next runtime.

> Thanks,
>
> Henry

--
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
Re: [DISCUSS] SystemML Incubator Proposal
Hi Luciano,

If you need any additional mentors, let me know. I would be interested in helping out.

thanks
-- Hitesh

On Oct 23, 2015, at 4:34 PM, Luciano Resende wrote:

> We would like to start a discussion on accepting SystemML as an Apache
> Incubator project.
>
> The proposal is available at :
> https://wiki.apache.org/incubator/SystemM
Re: [DISCUSS] SystemML Incubator Proposal
Hi Luciano,

Good proposal, but it looks like https://wiki.apache.org/incubator/SystemM does not exist?

Also, Reynold Xin and Patrick Wendell are not members of the IPMC, so I don't think they can be mentors of this project yet.

They can ask to become IPMC members since both are already ASF members. But for now they need to be removed from the proposal.

- Henry

On Fri, Oct 23, 2015 at 4:34 PM, Luciano Resende wrote:

> We would like to start a discussion on accepting SystemML as an Apache
> Incubator project.
>
> The proposal is available at :
> https://wiki.apache.org/incubator/SystemM
Re: [DISCUSS] SystemML Incubator Proposal
On Fri, Oct 23, 2015 at 5:30 PM, Henry Saputra wrote:

> Hi Luciano,
>
> Good proposal, but looks like
> https://wiki.apache.org/incubator/SystemM does not exist?

Good catch, there is a typo in the original link: it is missing the L at the end. Here is the correct link:

https://wiki.apache.org/incubator/SystemML

> Also, Reynold Xin and Patrick Wendell are not member of IPMCs so I
> don't they could be mentors of this project, yet.
>
> They can ask to be member of IPMCs since both are already member of
> ASF. But for now need to remove it from proposal.

Yes, they are aware of the requirement, and this will be fixed before we call a vote on the proposal.

> - Henry
[DISCUSS] SystemML Incubator Proposal
We would like to start a discussion on accepting SystemML as an Apache Incubator project.

The proposal is available at:
https://wiki.apache.org/incubator/SystemM

Its contents are also copied below.

Thanks in advance for your time reviewing and providing feedback.

==

= SystemML =

== Abstract ==

SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single-node, in-memory computations to distributed computations on Apache Hadoop and Apache Spark. ML algorithms are expressed in an R-like syntax that includes linear algebra primitives, statistical functions, and ML-specific constructs. This high-level language significantly increases the productivity of data scientists as it provides (1) full flexibility in expressing custom analytics, and (2) data independence from the underlying input formats and physical data representations. Automatic optimization according to data characteristics, such as distribution on the disk file system and sparsity, as well as processing characteristics in the distributed environment, like number of nodes, CPU, and memory per node, ensures both efficiency and scalability.

== Proposal ==

The goal of SystemML is to create a commercially friendly, scalable and extensible machine learning framework for data scientists to create or extend machine learning algorithms using a declarative syntax. The machine learning framework enables data scientists to develop algorithms locally without the need for a distributed cluster, and to scale up and scale out the execution of these algorithms to distributed Hadoop or Spark clusters.

== Background ==

SystemML started as a research project at the IBM Almaden Research Center around 2010, aiming to enable data scientists to develop machine learning algorithms independent of data and cluster characteristics.

== Rationale ==

SystemML enables the specification of machine learning algorithms using a declarative machine learning (DML) language. DML includes linear algebra primitives, statistical functions, and additional constructs. This high-level language significantly increases the productivity of data scientists as it provides (1) full flexibility in expressing custom analytics and (2) data independence from the underlying input formats and physical data representations.

SystemML computations can be executed in a variety of different modes. It supports single-node, in-memory computations and large-scale distributed cluster computations. This allows the user to quickly prototype new algorithms in local environments and then automatically scale to large data sizes without changing the algorithm implementation.

Algorithms specified in DML are dynamically compiled and optimized based on data and cluster characteristics using rule-based and cost-based optimization techniques. The optimizer automatically generates hybrid runtime execution plans ranging from in-memory, single-node execution to distributed computations on Spark or Hadoop. This ensures both efficiency and scalability. Automatic optimization reduces or eliminates the need to hand-tune distributed runtime execution plans and system configurations.

== Initial Goals ==

The initial goal of moving SystemML to the Apache Incubator is to broaden the community and foster contributions from data scientists to develop new machine learning algorithms and enhance the existing ones. Ultimately, this may lead to the creation of an industry standard for specifying machine learning algorithms.

== Current Status ==

The initial code has been developed at the IBM Almaden Research Center in California and has recently been made available on GitHub under the Apache License 2.0. The project currently supports a single node (in-memory computation) as well as distributed computations utilizing Hadoop or Spark clusters.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. Several companies have already expressed interest in this project, and we intend to invite additional developers to participate. We will encourage and monitor community participation so that privileges can be extended to those who contribute, in keeping with the standard of meritocracy that Apache emphasizes.

=== Community ===

The need for a generic, scalable and declarative machine learning approach in open source is tremendous, so there is potential for a very large community. We believe that SystemML’s extensible architecture, declarative syntax, cost-based optimizer and its alignment with Spark will further encourage community participation, not only in enhancing the infrastructure but also in speeding up the creation of algorithms for a wide range of use cases. We expect that over time SystemML will attract a large community.

=== Alignment ===

The initial committers strongly believe that a generic scalable and declarative