Thanks Roman.

1. Apache Beam looks promising. I agree it could be extremely useful in, for example, the Data Preparator of a PredictionIO DASE-architecture engine, so that it can leverage Spark, Flink, or Google Dataflow. I look forward to hearing more about it.
2. The integration with Apache Zeppelin is definitely a great suggestion. In fact, Lee Moon Soo, an initial committer of Zeppelin, is also listed as a committer in this proposal. Some work has been done previously (https://docs.prediction.io/datacollection/analytics-zeppelin/), but I anticipate a tighter collaboration with Apache Zeppelin after PredictionIO becomes an Apache project.

Regards,
Simon

On Saturday, May 14, 2016, Andrew Purtell <andrew.purt...@gmail.com> wrote:

> Yikes, apologies for the formatting. It looked fine in Gmail when I sent
> it, alas.
>
> I must let the proposers respond to the technical questions, but I think I
> can make the general observation that would-be contributors proposing and
> performing work on new and better Apache ecosystem integrations would be
> excellent for the health of the new podling and the ecosystem at large.
>
> > On May 14, 2016, at 5:32 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> >
> > Super excited to see this proposal! This will finally allow us to have
> > an ASF managed backend for next generation data-driven apps that I see
> > emerging quite rapidly.
> >
> > The proposal looks great to me (although I'd recommend calling out Scala
> > as an implementation language more prominently, since it may attract
> > additional developers with affinity to it).
> >
> > I do have two questions about technology:
> > 1. do you think it would be possible to leverage Apache Beam (incubating)
> > for abstracting away dependency on execution frameworks? My understanding
> > is that PredictionIO currently only runs on Spark.
> > 2. is there a potential integration with Apache Zeppelin possible?
> >
> > Thanks,
> > Roman.
> >
> >> On Fri, May 13, 2016 at 1:41 PM, Andrew Purtell <apurt...@apache.org> wrote:
> >> Greetings,
> >>
> >> It is my pleasure to propose the PredictionIO project for incubation at
> >> the Apache Software Foundation.
> >>
> >> PredictionIO is a popular open source Machine Learning Server built on
> >> top of a state-of-the-art open source stack, including several Apache
> >> technologies, that enables developers to manage and deploy
> >> production-ready predictive services for various kinds of machine
> >> learning tasks, with more than 400 production deployments around the
> >> world and a growing contributor community.
> >>
> >> The text of the proposal is included below and is also available at
> >> https://wiki.apache.org/incubator/PredictionIO
> >>
> >> Best regards,
> >> Andrew Purtell
> >>
> >> = PredictionIO Proposal =
> >>
> >> === Abstract ===
> >> PredictionIO is an open source Machine Learning Server built on top of
> >> a state-of-the-art open source stack, that enables developers to manage
> >> and deploy production-ready predictive services for various kinds of
> >> machine learning tasks.
> >>
> >> === Proposal ===
> >> The PredictionIO platform consists of the following components:
> >>
> >> * PredictionIO framework - provides the machine learning stack for
> >> building, evaluating and deploying engines with machine learning
> >> algorithms. It uses Apache Spark for processing.
> >>
> >> * Event Server - the machine learning analytics layer for unifying
> >> events from multiple platforms. It can use Apache HBase or any JDBC
> >> backend as its data store.
> >>
> >> The PredictionIO community also maintains a Template Gallery, a place
> >> to publish and download (free or proprietary) engine templates for
> >> different types of machine learning applications, which is a
> >> complementary part of the project. At this point we exclude the Template
> >> Gallery from the proposal, as it has a separate set of contributors and
> >> we’re not familiar with an Apache approved mechanism to maintain such a
> >> gallery.
> >>
> >> You can find the Template Gallery at https://templates.prediction.io/
> >>
> >> === Background ===
> >> PredictionIO was started with a mission to democratize and bring machine
> >> learning to the masses.
> >>
> >> Machine learning has traditionally been a luxury for big companies like
> >> Google, Facebook, and Netflix. There are ML libraries and tools lying
> >> around the internet, but putting them all together as a production-ready
> >> infrastructure is a very resource-intensive task that is hardly within
> >> reach of individuals or small businesses.
> >>
> >> PredictionIO is a production-ready, full stack machine learning system
> >> that allows organizations of any scale to quickly deploy machine learning
> >> capabilities. It comes with official and community-contributed machine
> >> learning engine templates that are easy to customize.
> >>
> >> === Rationale ===
> >> As the usage of and number of contributors to PredictionIO have grown
> >> larger and more diverse, we have sought an independent framework for the
> >> project to keep thriving. We believe the Apache foundation is a great
> >> fit. Joining Apache would ensure that tried and true processes and
> >> procedures are in place for the growing number of organizations
> >> interested in contributing to PredictionIO. PredictionIO is also a good
> >> fit for the Apache foundation. PredictionIO was built on top of several
> >> Apache projects (HBase, Spark, Hadoop). We are familiar with the Apache
> >> process and believe that the democratic and meritocratic nature of the
> >> foundation aligns with the project goals.
> >>
> >> === Initial Goals ===
> >> The initial milestones will be to move the existing codebase to Apache
> >> and integrate with the Apache development process. Once this is
> >> accomplished, we plan for incremental development and releases that
> >> follow the Apache guidelines, as well as growing our developer and user
> >> communities.
> >>
> >> === Current Status ===
> >> PredictionIO has undergone nine minor releases and many patches.
> >> PredictionIO is being used in production by Salesforce.com as well as
> >> many other organizations and apps. The PredictionIO codebase is
> >> currently hosted at GitHub, which will form the basis of the Apache git
> >> repository.
> >>
> >> ==== Meritocracy ====
> >> We plan to invest in supporting a meritocracy. We will discuss the
> >> requirements in an open forum. We intend to invite additional developers
> >> to participate. We will encourage and monitor community participation so
> >> that privileges can be extended to those that contribute.
> >>
> >> ==== Community ====
> >> Acceptance into the Apache foundation would bolster the already strong
> >> user and developer community around PredictionIO. That community
> >> includes many contributors from various other companies, and an active
> >> mailing list composed of hundreds of users.
> >>
> >> ==== Core Developers ====
> >> The core developers of our project are listed in our contributors and
> >> initial PPMC below. Though many are employed at Salesforce.com, there
> >> are also engineers from ActionML, and independent developers.
> >>
> >> === Alignment ===
> >> The ASF is the natural choice to host the PredictionIO project, as its
> >> goal is democratizing Machine Learning by making it more easily
> >> accessible to every user/developer. PredictionIO is built on top of
> >> several top-level Apache projects as outlined above.
> >>
> >> === Known Risks ===
> >>
> >> ==== Orphaned products ====
> >> PredictionIO has a solid and growing community. It is deployed in
> >> production environments by companies of all sizes to run various kinds
> >> of predictive engines.
> >>
> >> In addition to the community contribution to the PredictionIO framework,
> >> the community is also actively contributing new engines to the Template
> >> Gallery as well as SDKs and documentation for the project.
> >> Salesforce is committed to utilizing and advancing the PredictionIO code
> >> base and to supporting its user community.
> >>
> >> ==== Inexperience with Open Source ====
> >> PredictionIO has existed as a healthy open source project for almost two
> >> years and is the most starred Scala project on GitHub. All of the
> >> proposed committers have contributed to ASF and Linux Foundation open
> >> source projects. Several current committers on Apache projects and
> >> Apache Members are involved in this proposal and intend to provide
> >> mentorship.
> >>
> >> ==== Homogeneous Developers ====
> >> The initial list of committers includes developers from several
> >> institutions, including Salesforce, ActionML, Channel4, USC as well as
> >> unaffiliated developers.
> >>
> >> ==== Reliance on Salaried Developers ====
> >> Like most open source projects, PredictionIO receives substantial
> >> support from salaried developers. PredictionIO development is partially
> >> supported by Salesforce.com, but there are many contributors from
> >> various other companies, and an active mailing list composed of hundreds
> >> of users. We will continue our efforts to keep stewardship of the
> >> project independent of salaried developers by meritocratically promoting
> >> those contributors to committers.
> >>
> >> ==== Relationships with Other Apache Products ====
> >> PredictionIO relies heavily on top-level Apache projects such as Apache
> >> Spark, HBase and Hadoop. However, it brings distinct functionality
> >> rather than just an abstraction: machine learning in a plug-and-play
> >> fashion.
> >>
> >> Compared to Apache Mahout, which focuses on the development of a wide
> >> variety of algorithms, PredictionIO offers a platform to manage the
> >> whole machine learning workflow, including data collection, data
> >> preparation, modeling, deployment and management of predictive services
> >> in production environments.
> >>
> >> ==== An Excessive Fascination with the Apache Brand ====
> >> PredictionIO is already a widely known open source project. This
> >> proposal is not for the purpose of generating publicity. Rather, the
> >> primary benefits to joining Apache are those outlined in the Rationale
> >> section.
> >>
> >> === Documentation ===
> >> PredictionIO boasts rich and live documentation, which is included in
> >> the code repo (docs/manual directory), built with Middleman, and
> >> publicly hosted at https://docs.prediction.io
> >>
> >> === Initial Source and Intellectual Property Submission Plan ===
> >> Currently, the PredictionIO codebase is distributed under the Apache 2.0
> >> License and hosted on GitHub:
> >> https://github.com/PredictionIO/PredictionIO
> >>
> >> === External Dependencies ===
> >> PredictionIO has the following external dependencies:
> >> * Apache Hadoop 2.4.0 (optional, required only if YARN and HDFS are
> >> needed)
> >> * Apache Spark 1.3.0 for Hadoop 2.4
> >> * Java SE Development Kit 8
> >> * and one of the following sets:
> >>
> >>  * PostgreSQL 9.1
> >>
> >> or
> >>
> >>  * MySQL 5.1
> >>
> >> or
> >>
> >>  * Apache HBase 0.98.6
> >>  * Elasticsearch 1.4.0
> >>
> >> Upon acceptance to the incubator, we would begin a thorough analysis of
> >> all transitive dependencies to verify this information and introduce
> >> license checking into the build and release process by integrating with
> >> Apache RAT.
> >>
> >> === Cryptography ===
> >> PredictionIO does not include cryptographic code. We utilize standard
> >> JCE and JSSE APIs provided by the Java Runtime Environment.
> >>
> >> === Required Resources ===
> >> We request that the following resources be created for the project to
> >> use:
> >>
> >> ==== Mailing lists ====
> >>
> >> predictionio-priv...@incubator.apache.org (with moderated subscriptions)
> >>
> >> predictionio-dev
> >>
> >> predictionio-user
> >>
> >> predictionio-commits
> >>
> >> We will migrate the existing PredictionIO mailing lists.
> >>
> >> ==== Git repository ====
> >> The PredictionIO team would like to use Git for source control, due to
> >> our current use of GitHub.
> >>
> >> git://git.apache.org/incubator-predictionio
> >>
> >> ==== Documentation ====
> >> https://predictionio.incubator.apache.org/docs/
> >>
> >> ==== JIRA instance ====
> >> PredictionIO currently uses the GitHub issue tracking system associated
> >> with its repository: https://github.com/PredictionIO/PredictionIO/issues.
> >> We will migrate to Apache JIRA.
> >>
> >> JIRA PREDICTIONIO
> >> https://issues.apache.org/jira/browse/PREDICTIONIO
> >>
> >> ==== Other Resources ====
> >> * TravisCI for builds and test running.
> >>
> >> * PredictionIO's documentation, included in the code repo (docs/manual
> >> directory), is built with Middleman and publicly hosted at
> >> https://docs.prediction.io
> >>
> >> * A blog to drive adoption and excitement at https://blog.prediction.io
> >>
> >> === Initial Committers ===
> >>
> >> * Pat Ferrell
> >> * Tamas Jambor
> >> * Justin Yip
> >> * Xusen Yin
> >> * Lee Moon Soo
> >> * Donald Szeto
> >> * Kenneth Chan
> >> * Tom Chan
> >> * Simon Chan
> >> * Marco Vivero
> >> * Matthew Tovbin
> >> * Yevgeny Khodorkovsky
> >> * Felipe Oliveira
> >> * Vitaly Gordon
> >>
> >> === Affiliations ===
> >>
> >> * Pat Ferrell - ActionML
> >> * Tamas Jambor - Channel4
> >> * Justin Yip - independent
> >> * Xusen Yin - USC
> >> * Lee Moon Soo - NFLabs
> >> * Donald Szeto - Salesforce
> >> * Kenneth Chan - Salesforce
> >> * Tom Chan - Salesforce
> >> * Simon Chan - Salesforce
> >> * Marco Vivero - Salesforce
> >> * Matthew Tovbin - Salesforce
> >> * Yevgeny Khodorkovsky - Salesforce
> >> * Felipe Oliveira - Salesforce
> >> * Vitaly Gordon - Salesforce
> >>
> >> === Sponsors ===
> >>
> >> ==== Champion ====
> >>
> >> Andrew Purtell <apurtell at apache dot org>
> >>
> >> ==== Nominated Mentors ====
> >>
> >> * Andrew Purtell <apurtell at apache dot org>
> >> * James Taylor <jtaylor at apache dot org>
> >> * Lars Hofhansl <larsh at apache dot org>
> >> * Suneel Marthi <smarthi at apache dot org>
> >> * Xiangrui Meng <meng at apache dot org>
> >> * Luciano Resende <lresende at apache dot org>
> >>
> >> ==== Sponsoring Entity ====
> >>
> >> Apache Incubator PMC
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org